This notebook prepares data on smoke estimates, asthma hospitalization rates, and respiratory disease mortality for Tulare County, CA, for later analysis. The primary objectives are:

1. Data Retrieval:
   - Importing CSV files containing smoke estimates, asthma hospitalization rates, and respiratory disease mortality data.
   

2. Data Filtering:
   - Extracting respiratory disease mortality data for Tulare County, identified by FIPS code 6107.
   - Filtering the asthma hospitalization dataset to Tulare County, identified by county name (Tulare).
   - Removing unnecessary columns from the respiratory disease dataset.


3. Data Segmentation:
   - Separating respiratory disease data into categories based on disease types such as Chronic obstructive pulmonary disease (COPD) and Asthma.
   - Dividing the disease data further by sex (Male, Female, Both) for detailed analysis.


4. Data Export:
   - Saving the segmented and filtered datasets into separate CSV files for subsequent in-depth analysis and visualization.

In [1]:
# Import necessary libraries
import pandas as pd

In [2]:
# File paths for smoke estimate, asthma hospitalization, and respiratory disease data
smoke_estimate_file_path = '../intermediate data/annual_smoke_estimate.csv'
asthma_hospitalization_file_path = '../data/Asthma_Hospitalization_Rates_For_California_Counties_from_2015_to_2020.csv'
resp_disease_data_file_path = '../data/IHME_USA_COUNTY_RESP_DISEASE_MORTALITY_1980_2014_CALIFORNIA_Y2017M09D26.csv'

In [3]:
# Reading data into DataFrames
smoke_estimate_df = pd.read_csv(smoke_estimate_file_path)
asthma_hospitalization_df = pd.read_csv(asthma_hospitalization_file_path)
resp_disease_df = pd.read_csv(resp_disease_data_file_path)
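The later cells index columns such as `FIPS`, `cause_name`, and `sex` by name, so a quick check right after loading can surface a renamed or missing column early. The following is a minimal sketch, not part of the original notebook: `missing_columns` is a hypothetical helper, and the in-memory CSV is made-up data standing in for the real file.

```python
import io

import pandas as pd


def missing_columns(df, required):
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# In-memory CSV standing in for the respiratory disease file (contents are made up)
toy_csv = io.StringIO("FIPS,cause_name,sex\n6107,Asthma,Male\n")
toy_df = pd.read_csv(toy_csv)

# 'lower' is deliberately absent from the toy data to show a non-empty result
missing = missing_columns(toy_df, ["FIPS", "cause_name", "sex", "lower"])
print(missing)
```

An empty list means every expected column is present; a non-empty list names the columns to investigate before filtering.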

In [4]:
# Filtering asthma hospitalization data for Tulare County
asthma_hospitalization_filtered_df = asthma_hospitalization_df[asthma_hospitalization_df['COUNTY'] == 'Tulare']
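The equality filter above keeps only rows whose `COUNTY` value is exactly `'Tulare'`. Whether the real CSV contains stray whitespace or inconsistent casing is not stated in this notebook, so the following is only a defensive sketch under that assumption. `filter_county` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def filter_county(df, county, column="COUNTY"):
    """Return rows whose county name matches, ignoring case and surrounding whitespace."""
    normalized = df[column].astype(str).str.strip().str.casefold()
    return df[normalized == county.strip().casefold()].copy()


# Toy data standing in for the asthma hospitalization CSV (values are made up)
toy_asthma_df = pd.DataFrame({
    "COUNTY": ["Tulare", " tulare ", "Fresno"],
    "YEAR": [2015, 2016, 2015],
})

tulare_toy_df = filter_county(toy_asthma_df, "Tulare")
print(len(tulare_toy_df))
```

If the filtered frame comes back empty on the real data, inspecting `df['COUNTY'].unique()` is a quick way to see how the county is actually spelled.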

In [5]:
# Filtering respiratory disease data for Tulare County (FIPS code 6107) and dropping unnecessary columns
resp_disease_filtered_df = resp_disease_df[resp_disease_df['FIPS'] == 6107]
resp_disease_filtered_df = resp_disease_filtered_df.drop(['lower', 'upper', 'measure_id'], axis=1)
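Tulare County's standard FIPS code is 06107; the comparison with the integer 6107 above works when pandas parses the `FIPS` column as integers, which drops the leading zero. If the column were instead read as strings (for example `"06107"`), that comparison would silently match nothing. The sketch below shows one way to make the filter independent of the stored type. `filter_fips` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def filter_fips(df, fips, column="FIPS"):
    """Keep rows whose FIPS code equals `fips`, whether stored as int or as a string."""
    codes = pd.to_numeric(df[column], errors="coerce")
    return df[codes == int(fips)].copy()


# Toy data with FIPS stored as strings, some zero-padded (values are made up)
toy_resp_df = pd.DataFrame({
    "FIPS": ["06107", "6019", "06107"],
    "cause_name": ["Asthma", "Asthma", "Chronic obstructive pulmonary disease"],
    "lower": [0.1, 0.2, 0.3],
})

# errors="ignore" lets the drop succeed even if a listed column is absent
tulare_resp_df = filter_fips(toy_resp_df, 6107).drop(columns=["lower"], errors="ignore")
print(tulare_resp_df)
```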

In [6]:
# Separating data for Chronic obstructive pulmonary disease (COPD) and Asthma
resp_disease_copd_df = resp_disease_filtered_df[resp_disease_filtered_df['cause_name'] == 'Chronic obstructive pulmonary disease']
resp_disease_asthma_df = resp_disease_filtered_df[resp_disease_filtered_df['cause_name'] == 'Asthma']

In [7]:
# Separating data by sex (Male, Female, Both) for Asthma and COPD
resp_disease_male_asthma_df = resp_disease_asthma_df[resp_disease_asthma_df['sex'] == 'Male']
resp_disease_female_asthma_df = resp_disease_asthma_df[resp_disease_asthma_df['sex'] == 'Female']
resp_disease_both_asthma_df = resp_disease_asthma_df[resp_disease_asthma_df['sex'] == 'Both']

resp_disease_male_copd_df = resp_disease_copd_df[resp_disease_copd_df['sex'] == 'Male']
resp_disease_female_copd_df = resp_disease_copd_df[resp_disease_copd_df['sex'] == 'Female']
resp_disease_both_copd_df = resp_disease_copd_df[resp_disease_copd_df['sex'] == 'Both']
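The six assignments above follow one pattern: a cause filter combined with a sex filter. As an alternative sketch (not the notebook's own approach), the same segmentation can be built in a loop and stored in a dictionary. The toy rows, the `rate` column, and the short labels `asthma` and `copd` are assumptions made for illustration.

```python
import pandas as pd

# Toy rows standing in for the Tulare County respiratory disease data (values are made up)
toy_disease_df = pd.DataFrame({
    "cause_name": ["Asthma", "Asthma", "Chronic obstructive pulmonary disease", "Asthma"],
    "sex": ["Male", "Female", "Both", "Male"],
    "rate": [1.0, 2.0, 3.0, 4.0],
})

# Short labels for the two causes of interest (the labels are this sketch's choice)
cause_labels = {
    "Asthma": "asthma",
    "Chronic obstructive pulmonary disease": "copd",
}

# Build one DataFrame per (sex, cause) pair, keyed like "male_asthma"
segments = {}
for cause_name, label in cause_labels.items():
    for sex in ["Male", "Female", "Both"]:
        mask = (toy_disease_df["cause_name"] == cause_name) & (toy_disease_df["sex"] == sex)
        segments[f"{sex.lower()}_{label}"] = toy_disease_df[mask]

print(sorted(segments))
```

A dictionary keeps the segments addressable by name and avoids six near-identical variable definitions, at the cost of slightly less explicit code.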

In [8]:
# Saving filtered data into separate CSV files for further analysis
resp_disease_male_asthma_df.to_csv('../intermediate data/resp_disease_male_asthma.csv', index=False)
resp_disease_female_asthma_df.to_csv('../intermediate data/resp_disease_female_asthma.csv', index=False)
resp_disease_both_asthma_df.to_csv('../intermediate data/resp_disease_both_asthma.csv', index=False)
resp_disease_male_copd_df.to_csv('../intermediate data/resp_disease_male_copd.csv', index=False)
resp_disease_female_copd_df.to_csv('../intermediate data/resp_disease_female_copd.csv', index=False)
resp_disease_both_copd_df.to_csv('../intermediate data/resp_disease_both_copd.csv', index=False)
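`DataFrame.to_csv` does not create missing directories, so the calls above fail if `../intermediate data` does not exist yet. The sketch below shows a loop-based export that creates the output directory first. It writes to a temporary directory so it can run anywhere; the `segments` dictionary and its toy contents are assumptions for illustration, while the `resp_disease_<name>.csv` naming mirrors the file names used above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy segments standing in for the six filtered DataFrames (values are made up)
segments = {
    "male_asthma": pd.DataFrame({"year": [2000], "rate": [1.0]}),
    "both_copd": pd.DataFrame({"year": [2000], "rate": [3.0]}),
}

# A temporary directory stands in for '../intermediate data'
output_dir = Path(tempfile.mkdtemp()) / "intermediate data"
output_dir.mkdir(parents=True, exist_ok=True)

# Write each segment to its own CSV, without the index column
written_paths = []
for name, df in segments.items():
    path = output_dir / f"resp_disease_{name}.csv"
    df.to_csv(path, index=False)
    written_paths.append(path)

# Read one file back to confirm the export round-trips
round_trip_df = pd.read_csv(written_paths[0])
print(round_trip_df.shape)
```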