In [1]:
import pandas as pd
import numpy as np
import re

# Flag for Overheating

## Motivation
We noticed that a surprising number of violations mentioned "overheating." We also found a large number of violations mentioning extreme cold and hypothermia. With the climate changing rapidly, these issues may become more common in the future. This flag captures violations related to extreme ambient temperatures.

## Logic Overview

We use several include and exclude conditions to capture relevant reports. This logic is a starting point; we will continue to iterate so that the flag captures violations more accurately.

#### Include Conditions
1. We use keywords that name temperature units or measurements (Fahrenheit, Celsius, degrees).
2. "F" is used as shorthand for Fahrenheit but also appears in various codes mentioned in the reports. We use a regex pattern to capture only reports that mention "F" in a temperature context, that is, directly after a number (52 F).
3. We use keywords related to overheating and excessive cold. To avoid capturing reports that mention heating equipment (heating pads, heated blankets), most keywords describe the results of extreme ambient temperatures: heat stress, hypothermia, heat stroke, frostbite, etc.

#### Exclude Conditions
1. Many relevant reports give temperatures in degrees (80.5 degrees, 52 F, 25 Celsius), but "degrees" also appears in non-temperature phrases (degrees of rust, degrees of hair loss). We exclude reports containing the phrase "degrees of".
2. We exclude mentions of "180 degrees" and "180 F", the water temperature required for proper sanitization and an extremely improbable ambient temperature on Earth.


## Output
Creates a citations CSV in the flagged_citations folder with the following new columns, all with the prefix "flag_".
Indicator columns: 
- 'flag_include_1'
- 'flag_include_2'
- 'flag_include_3'
- 'flag_exclude_1'
- 'flag_exclude_2'
- 'flag_overheating'

In [2]:
# Read in the most recent APHIS inspections-citations.csv
combined_dir = '../aphis-inspection-reports/data/combined/'

citations = pd.read_csv(combined_dir + 'inspections-citations.csv')
citations.shape

(38749, 6)

## Flag Conditions

### Include Conditions

In [3]:
# 1 
temp_measurement_keywords = [
    'fahrenheit',
    'celsius',
    'deg f', 
    'degrees'
]

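# Whole-token comparison: a keyword matches only if it equals a whitespace-separated token,
# so the two-word keyword 'deg f' can never match here.
citations['flag_include_1'] = citations['narrative'].apply(lambda x: any(word in x.lower().split() for word in temp_measurement_keywords))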
citations['flag_include_1'].value_counts()

flag_include_1
False    38136
True       613
Name: count, dtype: int64
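
The cell above compares each keyword against whole whitespace-separated tokens. As a minimal sketch of what that implies, the snippet below contrasts token comparison with a word-boundary regex search, which is one possible way to also match multi-word keywords and words followed by punctuation. The helper names `token_match` and `phrase_match` and the sample sentences are hypothetical and not part of the notebook.

```python
import re

# Hypothetical helpers for illustration; not part of the original notebook.
def token_match(text, keywords):
    """Whole-token comparison, equivalent to the matching used in the cell above."""
    tokens = text.lower().split()
    return any(keyword in tokens for keyword in keywords)

def phrase_match(text, keywords):
    """Word-boundary regex search, which also handles multi-word phrases."""
    alternatives = '|'.join(re.escape(keyword) for keyword in keywords)
    pattern = re.compile(r'\b(?:' + alternatives + r')\b')
    return bool(pattern.search(text.lower()))

keywords = ['fahrenheit', 'celsius', 'deg f', 'degrees']

# Made-up sentences, not taken from the inspection data.
sample = 'The kennel was 95 deg F at the time of inspection.'
punctuated = 'It was 30 degrees.'

print(token_match(sample, keywords))       # False: 'deg f' spans two tokens
print(phrase_match(sample, keywords))      # True
print(token_match(punctuated, keywords))   # False: the token is 'degrees.'
print(phrase_match(punctuated, keywords))  # True
```

Switching the notebook to phrase matching would change the counts shown above, so any such change would need a fresh spot-check.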

In [4]:
# 2
temperature_pattern = re.compile(r'\b\d+(\.\d+)? f\b')

citations['flag_include_2'] = citations['narrative'].apply(lambda x: bool(temperature_pattern.search(x.lower())))
citations['flag_include_2'].value_counts()

flag_include_2
False    38697
True        52
Name: count, dtype: int64
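
To illustrate what `temperature_pattern` accepts and rejects, here is a small sketch that runs the same regex over a few made-up strings (the example sentences are assumptions for illustration, not narratives from the data).

```python
import re

# Same pattern as in the cell above.
temperature_pattern = re.compile(r'\b\d+(\.\d+)? f\b')

# Made-up example strings for illustration only.
examples = [
    'Temperature in the shelter was 52 F.',     # number, one space, "F"
    'The reading was 80.5 f inside the barn.',  # decimal value
    'Cited under section 3.1 (f) of the AWA.',  # "f" as a code, not a unit
    'The thermometer read 95F.',                # no space before "F"
]

results = [bool(temperature_pattern.search(text.lower())) for text in examples]
print(results)  # [True, True, False, False]
```

The last example shows that the pattern requires exactly one space between the number and "F", so readings written as "95F" are not captured.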

In [5]:
# 3
overheating_keywords = [
    'climatic',
    'overheating',
    'heat stress', 
    'hot weather',
    'extreme heat', 
    'high body temperature',
    'heat index',
    'heat stroke', 
    'cold weather', 
    'extreme cold',
    'hypothermia', 
    'frostbite', 
    'low body temperature',  
    'cold stress', 
    'hypothermic' 
]

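# Whole-token comparison: only single-word keywords (e.g. 'hypothermia') can match here;
# multi-word keywords such as 'heat stress' never equal a single token.
citations['flag_include_3'] = citations['narrative'].apply(lambda x: any(word in x.lower().split() for word in overheating_keywords))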
citations['flag_include_3'].value_counts()

flag_include_3
False    38434
True       315
Name: count, dtype: int64

In [6]:
#citations[citations['flag_include_1'] == True]['narrative'].tolist()

### Exclude Conditions

In [7]:
# 1
citations['flag_exclude_1'] = citations['narrative'].apply(lambda x: 'degrees of' in x.lower())
citations['flag_exclude_1'].value_counts()

flag_exclude_1
False    38695
True        54
Name: count, dtype: int64
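
The exclusion above is a plain substring test for "degrees of". As a hedged alternative sketch, a single regex could require a number before "degrees" and reject a following "of". The pattern name `recorded_degrees` and the example sentences are assumptions, and this is not the logic the notebook currently runs.

```python
import re

# Assumed pattern: a number, whitespace, "degrees", and no "of" right after it.
recorded_degrees = re.compile(r'\b\d+(?:\.\d+)?\s+degrees\b(?!\s+of\b)')

# Made-up sentences mapped to whether they should count as a recorded temperature.
examples = {
    'The room was 80.5 degrees at noon.': True,
    'The cage showed varying degrees of rust.': False,
    'Several dogs had differing degrees of hair loss.': False,
}

matches = {text: bool(recorded_degrees.search(text.lower())) for text in examples}
for text, matched in matches.items():
    print(matched, '|', text)
```

Unlike the report-level "degrees of" exclusion, a pattern like this would still keep a report that mentions both a real temperature and "degrees of rust".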

In [8]:
# 2
overheating_negative_keywords = [
    '180 f', 
    '180 degrees'
]

citations['flag_exclude_2'] = citations['narrative'].apply(lambda x: any(keyword in x.lower() for keyword in overheating_negative_keywords))
citations['flag_exclude_2'].value_counts()

flag_exclude_2
False    38722
True        27
Name: count, dtype: int64
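
Because the check above uses substring matching, "180 f" also matches text such as "180 feet". The sketch below compares it with a word-boundary variant. The helper names and sentences are hypothetical, and the stricter pattern is an assumption rather than the notebook's method.

```python
import re

negative_keywords = ['180 f', '180 degrees']

def substring_exclude(text):
    """Plain substring test, equivalent to the check in the cell above."""
    return any(keyword in text.lower() for keyword in negative_keywords)

# Assumed stricter variant: require word boundaries around the phrase.
boundary_pattern = re.compile(r'\b180 (?:f|degrees)\b')

def boundary_exclude(text):
    return bool(boundary_pattern.search(text.lower()))

# Made-up sentences for illustration.
rinse = 'Rinse water must reach 180 F to sanitize.'
fence = 'The fence runs 180 feet along the pasture.'

print(substring_exclude(rinse), boundary_exclude(rinse))  # True True
print(substring_exclude(fence), boundary_exclude(fence))  # True False
```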

## Creating Overheating Flag Column

In [9]:
# Combined overheating flag
citations['flag_overheating'] = (
    (citations['flag_include_1'] | citations['flag_include_2'])
    & citations['flag_include_3']
    & ~citations['flag_exclude_1']
    & ~citations['flag_exclude_2']
)
citations['flag_overheating'].value_counts()

flag_overheating
False    38646
True       103
Name: count, dtype: int64
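
The combined flag requires a temperature mention (include 1 or include 2) together with an overheating or cold keyword (include 3), and no exclusion. A minimal plain-Python sketch of that boolean logic on made-up rows follows; the function name `overheating_flag` and the row values are assumptions for illustration.

```python
def overheating_flag(include_1, include_2, include_3, exclude_1, exclude_2):
    """Same boolean combination as the pandas expression above, for a single row."""
    return (include_1 or include_2) and include_3 and not exclude_1 and not exclude_2

# Made-up rows: (include_1, include_2, include_3, exclude_1, exclude_2)
rows = [
    (True,  False, True,  False, False),  # degrees + keyword -> flagged
    (False, True,  True,  False, False),  # "52 F" + keyword -> flagged
    (True,  False, False, False, False),  # temperature but no keyword
    (True,  False, True,  True,  False),  # excluded by "degrees of"
    (False, False, True,  False, False),  # keyword but no temperature
]

flags = [overheating_flag(*row) for row in rows]
print(flags)  # [True, True, False, False, False]
```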

## Spot-checking Flag

In [10]:
# Spot-check for positives
citations[citations['flag_overheating'] == True]['narrative'].sample(100).tolist()

["There is 1 water buffalo, 1 watusi, 2 brahman, 6 domestic cows, and 4 equine in one enclosure. These animals have a\nshelter that is 12' by 14'. At time of inspection there was heavy, wet snow and temperatures were hovering around 30\ndegrees F.\nNatural or artificial shelter appropriate to the local climatic conditions for the species concerned shall be provided for all\nanimals kept outdoors to afford them protection and to prevent discomfort to such animals.\nAppropriate shelter for the weather conditions must be provided for all animals housed outdoors to protect them from the\ninclement weather and prevent discomfort.\nTo be corrected by January 27, 2023\nThis inspection and exit interview were conducted with licensee.\nEnd Section",
 '(b) Shelter from the elements. (4) Contain clean, dry, bedding material.\n***In the out door enclosures, the housing units had a carpet rug on the floors. The facility representative stated that\nshavings are put into the housing units when it is 

In [11]:
# Spot-check for negatives
citations[citations['flag_overheating'] == False]['narrative'].sample(100).tolist()

['* Documentation of the last recorded physical examination for 25 adult dogs could not be found at the time of inspection.\nPhysical examination of every dog on the premise is necessary to evaluate each dog for its overall internal and external\nphysical body condition. In addition, the physical examination will be able to evaluate the overall condition and health of\nthe hair coat, feet, ears, and eyes. The licensee must have a complete physical examination from head to tail of each dog\nby the attending veterinarian not less than once every 12 months. Correct by 10/7/23.',
 'Opie, a four-year-old neutered male corgi dog in good physical condition per the consignor and transporter, was accepted\nfor consigned transport at approximately 6:50 PM ET on 30December2022 in Townsend, Delaware. The dog was\ntransported in a crate in the transporter’s minivan. The dog was delivered to the receiver in Jesup, Iowa at approximately\nmidmorning on 05January2023. The dog was found cold, stiff, and

In [12]:
# Filtering for flag
flagged_citations = citations[citations['flag_overheating'] == True]
flagged_citations.shape

(103, 12)

In [13]:
# % of flagged citations
(flagged_citations.shape[0]/citations.shape[0]) * 100

0.2658133113112597

In [14]:
# Save the flagged citations, including the new flag columns
flagged_citations.to_csv('../flagged_citations/overheating.csv')