# Project 2 - Analyzing Adverse Food Events

Dylan Schwartz, Jenny Zhu

## Project Overview
Our project aims to predict the symptoms a person may experience after consuming a particular food, given the food type and the person’s age and gender. We want to determine whether certain food brands or food types can lead to adverse health effects in college students, as well as in other age and gender groups. We propose a machine learning model that classifies records into groups of potential health symptoms based on features such as the person’s age and gender, the food’s brand name, and the food category.



## Questions:

1. What symptoms are most prevalent? 
2. Given a person’s age, gender, reaction date, and food brand, what is the most likely health outcome, and what are the most likely symptoms to be experienced?
3. Can we predict the food a person consumed based on their symptoms?
4. What kinds of foods should certain demographics avoid eating?
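
As a rough illustration of how question 1 could be approached, the sketch below counts how many reports mention each symptom. It assumes each report's symptoms are already available as a Python list; `toy_events` and `symptom_counts` are hypothetical names used only for this example, and the toy rows stand in for the real data.

```python
from collections import Counter

import pandas as pd

# Toy stand-in for the cleaned table (hypothetical data, not taken from the CSV)
toy_events = pd.DataFrame({
    "SYM_One Row Coded Symptoms": [
        ["NAUSEA", "VOMITING"],
        ["NAUSEA", "DIARRHOEA"],
        ["CHOKING"],
    ]
})


def symptom_counts(symptom_lists):
    """Count how many reports mention each symptom."""
    counts = Counter()
    for symptoms in symptom_lists:
        # set() so a symptom repeated within one report is counted only once
        counts.update(set(symptoms))
    return counts


counts = symptom_counts(toy_events["SYM_One Row Coded Symptoms"])
print(counts.most_common(1))
```

Counting per report rather than per mention is a design choice; dropping the `set()` call would count raw mentions instead.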


## Terminology: 

### *Cleaning the data*

In [53]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
import datetime
import collections
from IPython.display import Image

%matplotlib inline

In [88]:
# Importing the Adverse Food Events table
app_cols = ['RA_Report #', 'RA_CAERS Created Date', 'AEC_Event Start Date',
            'PRI_Product Role', 'PRI_Reported Brand/Product Name',
            'PRI_FDA Industry Code', 'PRI_FDA Industry Name',
            'CI_Age at Adverse Event', 'CI_Age Unit', 'CI_Gender',
            'AEC_One Row Outcomes', 'SYM_One Row Coded Symptoms']
pre_events = pd.read_csv('adversefoodevents.csv', dtype=object, names=app_cols,
                         encoding='latin-1')
# Dropping row 0 (presumably the file's own header row, read in as data because names= is given)
pre_events = pre_events.drop(0)
# Deleting columns that are not needed
pre_events = pre_events.drop(['RA_Report #', 'RA_CAERS Created Date'], axis=1)

pre_events.head()


| | AEC_Event Start Date | PRI_Product Role | PRI_Reported Brand/Product Name | PRI_FDA Industry Code | PRI_FDA Industry Name | CI_Age at Adverse Event | CI_Age Unit | CI_Gender | AEC_One Row Outcomes | SYM_One Row Coded Symptoms |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8/4/2003 | Suspect | MIDWEST COUNTRY FAIR CHOCOLATE FLAVORED CHIPS | 3 | Bakery Prod/Dough/Mix/Icing | 2.0 | Year(s) | Female | VISITED AN ER, VISITED A HEALTH CARE PROVIDER,... | SWELLING FACE, RASH, WHEEZING, COUGH, HOSPITAL... |
| 2 | 8/4/2003 | Suspect | MIDWEST COUNTRY FAIR CHOCOLATE FLAVORED CHIPS | 3 | Bakery Prod/Dough/Mix/Icing | 2.0 | Year(s) | Female | VISITED AN ER, VISITED A HEALTH CARE PROVIDER,... | SWELLING FACE, WHEEZING, COUGH, RASH, HOSPITAL... |
| 3 | | Suspect | KROGER CLASSIC CREAM-DE-MINT CANDY MINT CHIP I... | 13 | Ice Cream Prod | | Not Available | Female | VISITED AN ER | NAUSEA, DYSGEUSIA, DIARRHOEA |
| 4 | 11/24/2003 | Suspect | ENFAMIL LIPIL BABY FORMULA | 40 | Baby Food Prod | 3.0 | Month(s) | Not Available | NON-SERIOUS INJURIES/ ILLNESS | GASTROINTESTINAL DISORDER, VOMITING |
| 5 | | Suspect | ENFIMIL LIPIL BABY FORMULA | 40 | Baby Food Prod | | Not Available | Not Available | VISITED A HEALTH CARE PROVIDER | GASTROINTESTINAL DISORDER, PHYSICAL EXAMINATION |
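
The preview above shows two kinds of gaps: truly empty cells and the literal placeholder `Not Available`. Before cleaning, it can help to count both per column. The following is a minimal sketch on toy rows that mimic the preview; `toy_raw` and `missing_summary` are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the raw table (values copied from the preview above)
toy_raw = pd.DataFrame({
    "AEC_Event Start Date": ["8/4/2003", np.nan, "11/24/2003"],
    "CI_Age Unit": ["Year(s)", "Not Available", "Month(s)"],
    "CI_Gender": ["Female", "Female", "Not Available"],
})


def missing_summary(df, placeholder="Not Available"):
    """Return per-column counts of null cells and placeholder strings."""
    return pd.DataFrame({
        "n_null": df.isnull().sum(),
        "n_placeholder": (df == placeholder).sum(),
    })


summary = missing_summary(toy_raw)
print(summary)
```

On the real table, the same call would be `missing_summary(pre_events)`, assuming `Not Available` is the only placeholder string in use.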


In [111]:
# Creating the events table (cleaned version of the pre_events table)

##### HELPER FUNCTIONS #####

# Helper function that converts M/D/YYYY start dates to YYYY-MM-DD
# and assigns empty start date values to the current date
def clean_start_date(v):
    if pd.isnull(v):
        return datetime.datetime.now().strftime("%Y-%m-%d")
    return datetime.datetime.strptime(v, "%m/%d/%Y").strftime("%Y-%m-%d")
    
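# Helper function that returns the patient's age in years (-1.0 if unknown)
def get_age(row):
    try:
        age = float(row['CI_Age at Adverse Event'])
    except (TypeError, ValueError):
        # Non-numeric age values are treated as unknown
        return -1.0
    if math.isnan(age):
        return -1.0
    if row['CI_Age Unit'] == 'Month(s)':
        return age / 12
    return age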

# Helper function that converts comma-separated symptoms into a list
def list_symptoms(x):
    if isinstance(x, str):
        # Strip the whitespace after each comma so symptom names compare equal
        return [s.strip() for s in x.split(",")]
    return []


##########################

# Creating the cleaned food events dataframe
events = pre_events.copy(deep=True)

# Cleaning the adverse reaction start date
events['AEC_Event Start Date'] = pre_events['AEC_Event Start Date'].apply(clean_start_date)

# Making a new column for the patient's age in years
events['year_age'] = pre_events.apply(get_age, axis=1)
events = events.drop(['CI_Age at Adverse Event', 'CI_Age Unit'], axis=1)

# Converting comma-separated symptoms into a list of symptoms
events['SYM_One Row Coded Symptoms'] = pre_events['SYM_One Row Coded Symptoms'].apply(list_symptoms)

events.head(50)

| | AEC_Event Start Date | PRI_Product Role | PRI_Reported Brand/Product Name | PRI_FDA Industry Code | PRI_FDA Industry Name | CI_Gender | AEC_One Row Outcomes | SYM_One Row Coded Symptoms | year_age |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2003-08-04 | Suspect | MIDWEST COUNTRY FAIR CHOCOLATE FLAVORED CHIPS | 3 | Bakery Prod/Dough/Mix/Icing | Female | VISITED AN ER, VISITED A HEALTH CARE PROVIDER,... | [SWELLING FACE, RASH, WHEEZING, COUGH, HOS... | 2.0 |
| 2 | 2003-08-04 | Suspect | MIDWEST COUNTRY FAIR CHOCOLATE FLAVORED CHIPS | 3 | Bakery Prod/Dough/Mix/Icing | Female | VISITED AN ER, VISITED A HEALTH CARE PROVIDER,... | [SWELLING FACE, WHEEZING, COUGH, RASH, HOS... | 2.0 |
| 3 | 2018-04-24 | Suspect | KROGER CLASSIC CREAM-DE-MINT CANDY MINT CHIP I... | 13 | Ice Cream Prod | Female | VISITED AN ER | [NAUSEA, DYSGEUSIA, DIARRHOEA] | -1.0 |
| 4 | 2003-11-24 | Suspect | ENFAMIL LIPIL BABY FORMULA | 40 | Baby Food Prod | Not Available | NON-SERIOUS INJURIES/ ILLNESS | [GASTROINTESTINAL DISORDER, VOMITING] | 0.25 |
| 5 | 2018-04-24 | Suspect | ENFIMIL LIPIL BABY FORMULA | 40 | Baby Food Prod | Not Available | VISITED A HEALTH CARE PROVIDER | [GASTROINTESTINAL DISORDER, PHYSICAL EXAMINAT... | -1.0 |
| 6 | 2003-12-21 | Suspect | FRITO LAY FUNYUNS ONION FLAVOR, ONION RINGS | 7 | Snack Food Item | Male | NON-SERIOUS INJURIES/ ILLNESS | [CHOKING] | 10.0 |
| 7 | 2018-04-24 | Suspect | GRAPE | 20 | Fruit/Fruit Prod | Not Available | DEATH | [DEATH, CHOKING] | -1.0 |
| 8 | 2003-12-01 | Suspect | HERBALIFE RELAX NOW | 54 | Vit/Min/Prot/Unconv Diet(Human/Animal) | Female | VISITED A HEALTH CARE PROVIDER | [PARANOIA, PHYSICAL EXAMINATION, DELUSION] | -1.0 |
| 9 | 2003-12-01 | Suspect | HERBALIFE TOTAL CONTROL | 54 | Vit/Min/Prot/Unconv Diet(Human/Animal) | Female | VISITED A HEALTH CARE PROVIDER | [PARANOIA, PHYSICAL EXAMINATION, DELUSION] | -1.0 |
| 10 | 2018-04-24 | Suspect | YOHIMBE | 54 | Vit/Min/Prot/Unconv Diet(Human/Animal) | Male | REQ. INTERVENTION TO PRVNT PERM. IMPRMNT. | [BLOOD PRESSURE INCREASED] | 66.0 |
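
The cleaning above fills missing start dates with the date the notebook was run (hence the `2018-04-24` entries) and marks unknown ages as `-1`. A possible alternative, sketched below, keeps both as missing values so they cannot be mistaken for real observations later on. This is not the notebook's method: `clean_vectorized` and `UNIT_TO_YEARS` are hypothetical names, and the `Week(s)` and `Day(s)` labels are assumptions, since only `Year(s)`, `Month(s)`, and `Not Available` appear in the excerpt.

```python
import numpy as np
import pandas as pd

# Assumed conversion factors to years. Only 'Year(s)' and 'Month(s)' are
# confirmed by the preview; the other unit labels are hypothetical.
UNIT_TO_YEARS = {
    "Year(s)": 1.0,
    "Month(s)": 1.0 / 12,
    "Week(s)": 1.0 / 52,
    "Day(s)": 1.0 / 365,
}


def clean_vectorized(df):
    """Parse dates and normalize ages, leaving unknown values as NaT/NaN."""
    out = df.copy()
    # Missing or unparseable dates become NaT instead of today's date
    out["AEC_Event Start Date"] = pd.to_datetime(
        out["AEC_Event Start Date"], format="%m/%d/%Y", errors="coerce"
    )
    age = pd.to_numeric(out["CI_Age at Adverse Event"], errors="coerce")
    factor = out["CI_Age Unit"].map(UNIT_TO_YEARS)
    # NaN wherever the age or its unit is unknown
    out["year_age"] = age * factor
    return out.drop(columns=["CI_Age at Adverse Event", "CI_Age Unit"])


toy = pd.DataFrame({
    "AEC_Event Start Date": ["8/4/2003", np.nan, "11/24/2003"],
    "CI_Age at Adverse Event": ["2", np.nan, "3"],
    "CI_Age Unit": ["Year(s)", "Not Available", "Month(s)"],
})
cleaned = clean_vectorized(toy)
print(cleaned)
```

One behavioral difference to keep in mind: this sketch treats an age with an unknown unit as missing, whereas `get_age` above treats every unit other than `Month(s)` as years.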
