# Overview

This is the procedure for taking the original data in `"CountyHealthData_2014-2015.csv"` and subsetting it into a new dataset that contains information specifically on **uninsured children in North Carolina by county in 2014**.

This document walks through 13 steps for subsetting the data: importing pandas and the CSV file, identifying the "Series" of interest, filtering and copying the data, and exporting the final subset as a new CSV file.

## Step 1 -- Importing Pandas 

Pandas is a package that provides data-analysis tools, such as dataframes, that are not built into Python.

The pandas package can be imported by using the command: 

In [14]:
import numpy as np 
import pandas as pd

## Step 2 -- Import the CSV file

To import the CSV file we have pandas read it with `pd.read_csv()` and assign the result to a variable, here **County_Health_Data**, which holds the data as a dataframe.

To import the data set we can use the code:

In [15]:
County_Health_Data=pd.read_csv("CountyHealthData_2014-2015.csv")
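As a minimal sketch of what `pd.read_csv()` does, the example below reads a tiny in-memory CSV instead of the real file. The three rows and the name `toy` are illustrative stand-ins, not the actual dataset.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the real CSV file.
csv_text = """State,Region,County,Year,Uninsured children
AL,South,Autauga County,1/1/2014,0.047
NC,South,Alamance County,1/1/2014,0.073
NC,South,Alamance County,1/1/2015,0.088
"""

# read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)   # (number of rows, number of columns)
print(toy.dtypes)  # data type that pandas inferred for each column
```

The first line of the CSV becomes the column names, and numeric columns such as "Uninsured children" are read as floats.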

## Step 3 -- Explore the Dataframe 

Exploring the dataset first helps us understand the dataframe before subsetting it.

To see how many rows and columns there are in the dataframe we can use the attribute `.shape`, which returns the dimensions as (rows, columns).

In [16]:
County_Health_Data.shape

(6109, 64)

To see all of the column names we can use the attribute `.columns`.

In [17]:
County_Health_Data.columns

Index(['State', 'Region', 'Division', 'County', 'FIPS', 'GEOID', 'SMS Region',
       'Year', 'Premature death', 'Poor or fair health',
       'Poor physical health days', 'Poor mental health days',
       'Low birthweight', 'Adult smoking', 'Adult obesity',
       'Food environment index', 'Physical inactivity',
       'Access to exercise opportunities', 'Excessive drinking',
       'Alcohol-impaired driving deaths', 'Sexually transmitted infections',
       'Teen births', 'Uninsured', 'Primary care physicians', 'Dentists',
       'Mental health providers', 'Preventable hospital stays',
       'Diabetic screening', 'Mammography screening', 'High school graduation',
       'Some college', 'Unemployment', 'Children in poverty',
       'Income inequality', 'Children in single-parent households',
       'Social associations', 'Violent crime', 'Injury deaths',
       'Air pollution - particulate matter', 'Drinking water violations',
       'Severe housing problems', 'Driving alone to work'
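
The exploration above can be sketched on a small hand-made dataframe. The frame `toy` and its values are illustrative assumptions, not rows guaranteed to match the real file.

```python
import pandas as pd

# Small illustrative dataframe (not the real data).
toy = pd.DataFrame({
    "State": ["AL", "AL", "NC", "NC"],
    "County": ["Autauga County", "Autauga County",
               "Alamance County", "Alamance County"],
    "Uninsured children": [0.047, 0.037, 0.073, 0.088],
})

# .shape is a tuple, so it can be unpacked into two variables.
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# .columns holds the column labels; tolist() turns them into a plain list.
print(toy.columns.tolist())

# Check that a column exists before trying to select it.
print("Uninsured children" in toy.columns)

# .head(n) shows the first n rows.
print(toy.head(2))
```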

## Step 4 -- Identifying the "Series"

We need to use bracket notation to select a single Series (one column) of the dataframe.

We want to know what is contained in the column **"Uninsured children"**.

We can do that by using the code below:

In [9]:
County_Health_Data["Uninsured children"]

0       0.250
1       0.176
2       0.096
3       0.123
4       0.124
        ...  
6104    0.090
6105    0.086
6106    0.101
6107    0.084
6108    0.092
Name: Uninsured children, Length: 6109, dtype: float64

This can also be done to see the information contained in the column **"Region"**.

In [18]:
County_Health_Data["Region"]

0       West
1       West
2       West
3       West
4       West
        ... 
6104    West
6105    West
6106    West
6107    West
6108    West
Name: Region, Length: 6109, dtype: object

Using dot notation we can see how many rows belong to each region in the dataframe.

To do this we can use `.value_counts()`, a method suited to categorical values, as shown below:

In [19]:
County_Health_Data.Region.value_counts()

South        2803
Midwest      2038
West          834
Northeast     434
Name: Region, dtype: int64
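
To compare the two notations, here is a short sketch on an illustrative dataframe (the name `toy` and its values are made up for the example). Dot notation only works when the column name is a valid Python identifier, so a name with a space, such as "Uninsured children", needs brackets.

```python
import pandas as pd

# Small illustrative dataframe (not the real data).
toy = pd.DataFrame({
    "Region": ["West", "South", "South", "Midwest"],
    "Uninsured children": [0.250, 0.047, 0.073, 0.090],
})

# Bracket and dot notation return the same Series for a simple column name.
region = toy["Region"]
same = toy.Region
print(region.equals(same))

# A column name containing a space can only be selected with brackets.
uninsured = toy["Uninsured children"]
print(type(uninsured))

# value_counts() counts how many rows hold each distinct value.
counts = toy["Region"].value_counts()
print(counts)
```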

## Step 5 -- Filtering for Region 

Using nested square brackets we can filter the data with a logical condition.

The statement used is `County_Health_Data[County_Health_Data["Region"] == "South"]`.

The inner expression compares every entry in the **"Region"** column with **"South"** and produces a Series of **True/False values**; the outer brackets then keep only the rows where the value is True.


In [20]:
County_Health_Data[County_Health_Data["Region"] == "South"]

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
46,AL,South,East South Central,Autauga County,1001,1001,Region 16,1/1/2014,8376.0,0.228,...,7.42,0.180,0.047,10219.0,0.156,18.0,51441,0.361,3.57,0.237
47,AL,South,East South Central,Autauga County,1001,1001,Region 16,1/1/2015,8405.0,0.228,...,8.23,0.169,0.037,9939.0,0.156,18.0,51868,0.383,4.60,
48,AL,South,East South Central,Baldwin County,1003,1003,Region 16,1/1/2014,7770.0,0.127,...,14.71,0.209,0.056,9624.0,0.144,27.0,48867,0.368,4.18,0.193
49,AL,South,East South Central,Baldwin County,1003,1003,Region 16,1/1/2015,7457.0,0.127,...,15.29,0.199,0.054,9502.0,0.144,29.0,47539,0.344,4.30,
50,AL,South,East South Central,Barbour County,1005,1005,Insuff Data,1/1/2014,9458.0,0.234,...,,0.242,0.061,10809.0,0.169,11.0,30287,0.664,5.13,0.176
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6058,WV,South,South Atlantic,Wirt County,54105,54105,Insuff Data,1/1/2015,10747.0,0.161,...,,0.235,0.055,10459.0,0.131,119.0,37132,0.526,,
6059,WV,South,South Atlantic,Wood County,54107,54107,Region 27,1/1/2014,7944.0,0.205,...,11.03,0.222,0.044,10707.0,0.170,89.0,42065,0.433,3.29,0.178
6060,WV,South,South Atlantic,Wood County,54107,54107,Region 27,1/1/2015,8111.0,0.205,...,12.84,0.217,0.040,10355.0,0.170,96.0,41205,0.450,3.30,
6061,WV,South,South Atlantic,Wyoming County,54109,54109,Region 27,1/1/2014,14864.0,0.348,...,59.77,0.213,0.050,10662.0,0.242,13.0,32880,0.513,8.29,0.242
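
To make the True/False step visible, the sketch below stores the condition in its own variable before using it. The dataframe `toy` and the name `mask` are illustrative assumptions.

```python
import pandas as pd

# Small illustrative dataframe (not the real data).
toy = pd.DataFrame({
    "State": ["AK", "AL", "NC", "IL"],
    "Region": ["West", "South", "South", "Midwest"],
})

# The comparison yields one True/False value per row.
mask = toy["Region"] == "South"
print(mask.tolist())

# Passing the mask in brackets keeps only the rows where it is True.
south = toy[mask]
print(south)
```

The filtered rows keep their original index labels, which is why the output above starts at 46 rather than 0.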


## Step 6 -- Copying Filtered Data 

To work with this subset we make a copy of the filtered dataframe, so that the new dataframe is independent of the original and later changes to it do not affect **County_Health_Data**.

We can do this by adding the `.copy()` method at the end of the code.


In [22]:
South_subset = County_Health_Data[County_Health_Data["Region"] == "South"].copy()

In [9]:
South_subset

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
46,AL,South,East South Central,Autauga County,1001,1001,Region 16,1/1/2014,8376.0,0.228,...,7.42,0.180,0.047,10219.0,0.156,18.0,51441,0.361,3.57,0.237
47,AL,South,East South Central,Autauga County,1001,1001,Region 16,1/1/2015,8405.0,0.228,...,8.23,0.169,0.037,9939.0,0.156,18.0,51868,0.383,4.60,
48,AL,South,East South Central,Baldwin County,1003,1003,Region 16,1/1/2014,7770.0,0.127,...,14.71,0.209,0.056,9624.0,0.144,27.0,48867,0.368,4.18,0.193
49,AL,South,East South Central,Baldwin County,1003,1003,Region 16,1/1/2015,7457.0,0.127,...,15.29,0.199,0.054,9502.0,0.144,29.0,47539,0.344,4.30,
50,AL,South,East South Central,Barbour County,1005,1005,Insuff Data,1/1/2014,9458.0,0.234,...,,0.242,0.061,10809.0,0.169,11.0,30287,0.664,5.13,0.176
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6058,WV,South,South Atlantic,Wirt County,54105,54105,Insuff Data,1/1/2015,10747.0,0.161,...,,0.235,0.055,10459.0,0.131,119.0,37132,0.526,,
6059,WV,South,South Atlantic,Wood County,54107,54107,Region 27,1/1/2014,7944.0,0.205,...,11.03,0.222,0.044,10707.0,0.170,89.0,42065,0.433,3.29,0.178
6060,WV,South,South Atlantic,Wood County,54107,54107,Region 27,1/1/2015,8111.0,0.205,...,12.84,0.217,0.040,10355.0,0.170,96.0,41205,0.450,3.30,
6061,WV,South,South Atlantic,Wyoming County,54109,54109,Region 27,1/1/2014,14864.0,0.348,...,59.77,0.213,0.050,10662.0,0.242,13.0,32880,0.513,8.29,0.242
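
A brief sketch of why the copy is useful: after `.copy()`, changes to the subset leave the original untouched. The dataframe `toy` is an illustrative stand-in, and the conversion to percentages is only an example of a later modification, not a step in this procedure.

```python
import pandas as pd

# Small illustrative dataframe (not the real data).
toy = pd.DataFrame({
    "Region": ["West", "South", "South"],
    "Uninsured children": [0.250, 0.047, 0.073],
})

# Filter and copy, as in the step above.
south = toy[toy["Region"] == "South"].copy()

# Modify the copy, for example by converting proportions to percentages.
south["Uninsured children"] = south["Uninsured children"] * 100

print(toy)    # the original still holds proportions
print(south)  # only the copy holds percentages
```

In older pandas versions, modifying a filtered dataframe that was not explicitly copied can also raise a `SettingWithCopyWarning`; calling `.copy()` makes the intent clear.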


## Step 7 -- Filtering the New Subset 

To narrow the dataframe down to just the **"Uninsured children"** column, along with **State**, **County**, and **Year**, we can filter the data again.

We will use the `.loc` indexer to select the columns we want.

In the statement we replace **County_Health_Data** with the new **South_subset**.

Inside `.loc[]`, the `:` selects every row and the list of column names selects the columns to keep.


In [29]:
South_subset.loc[:,["State","County","Year", "Uninsured children"]]

Unnamed: 0,State,County,Year,Uninsured children
46,AL,Autauga County,1/1/2014,0.047
47,AL,Autauga County,1/1/2015,0.037
48,AL,Baldwin County,1/1/2014,0.056
49,AL,Baldwin County,1/1/2015,0.054
50,AL,Barbour County,1/1/2014,0.061
...,...,...,...,...
6058,WV,Wirt County,1/1/2015,0.055
6059,WV,Wood County,1/1/2014,0.044
6060,WV,Wood County,1/1/2015,0.040
6061,WV,Wyoming County,1/1/2014,0.050
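
The column selection can be sketched on an illustrative dataframe as follows. The names `toy`, `cols`, and `narrow` are assumptions made for the example.

```python
import pandas as pd

# Small illustrative dataframe (not the real data).
toy = pd.DataFrame({
    "State": ["AL", "NC"],
    "Region": ["South", "South"],
    "County": ["Autauga County", "Alamance County"],
    "Year": ["1/1/2014", "1/1/2014"],
    "Uninsured children": [0.047, 0.073],
})

cols = ["State", "County", "Year", "Uninsured children"]

# .loc[rows, columns]: ":" means all rows, the list names the columns to keep.
narrow = toy.loc[:, cols]
print(narrow)

# Passing the list directly in brackets selects the same columns.
print(narrow.equals(toy[cols]))
```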


## Step 8 -- Naming the New Subset

To save this result as a new subset we repeat the assignment from Step 6, replacing **South_subset** with a new name.

The name on the left of the equal sign is the new subset, and the expression on the right tells Python what it contains.


In [30]:
Uninsured_children_subset = South_subset.loc[:,["State","County","Year", "Uninsured children"]]

In [31]:
Uninsured_children_subset

Unnamed: 0,State,County,Year,Uninsured children
46,AL,Autauga County,1/1/2014,0.047
47,AL,Autauga County,1/1/2015,0.037
48,AL,Baldwin County,1/1/2014,0.056
49,AL,Baldwin County,1/1/2015,0.054
50,AL,Barbour County,1/1/2014,0.061
...,...,...,...,...
6058,WV,Wirt County,1/1/2015,0.055
6059,WV,Wood County,1/1/2014,0.044
6060,WV,Wood County,1/1/2015,0.040
6061,WV,Wyoming County,1/1/2014,0.050


## Step 9 -- Filtering the New Subset by State

Now that we have filtered the dataset to include the columns that we want, we can filter it down to a specific state.

We can use this statement to tell Python that we want only the rows for **North Carolina**, which keeps the **County** and **Uninsured children** columns for that state.

`Uninsured_children_subset[Uninsured_children_subset["State"] == "NC"]`



In [46]:
Uninsured_children_subset[Uninsured_children_subset["State"] == "NC"]

Unnamed: 0,State,County,Year,Uninsured children
3243,NC,Alamance County,1/1/2014,0.073
3244,NC,Alamance County,1/1/2015,0.088
3245,NC,Alexander County,1/1/2014,0.077
3246,NC,Alexander County,1/1/2015,0.076
3247,NC,Alleghany County,1/1/2014,0.131
...,...,...,...,...
3438,NC,Wilson County,1/1/2015,0.079
3439,NC,Yadkin County,1/1/2014,0.097
3440,NC,Yadkin County,1/1/2015,0.094
3441,NC,Yancey County,1/1/2014,0.110


We can then assign this result to a name so that entering the name in Python brings up the new subset.

We can do this by using the following code:

In [55]:
Uninsured_children_1=Uninsured_children_subset[Uninsured_children_subset["State"] == "NC"]

## Step 10 -- Copying the New Subset 

We can add the `.copy()` method so that the new subset is an independent dataframe rather than a selection from **Uninsured_children_subset**.

We can use the following code:


In [47]:
Uninsured_children_1=Uninsured_children_subset[Uninsured_children_subset["State"] == "NC"].copy()

In [48]:
Uninsured_children_1

Unnamed: 0,State,County,Year,Uninsured children
3243,NC,Alamance County,1/1/2014,0.073
3244,NC,Alamance County,1/1/2015,0.088
3245,NC,Alexander County,1/1/2014,0.077
3246,NC,Alexander County,1/1/2015,0.076
3247,NC,Alleghany County,1/1/2014,0.131
...,...,...,...,...
3438,NC,Wilson County,1/1/2015,0.079
3439,NC,Yadkin County,1/1/2014,0.097
3440,NC,Yadkin County,1/1/2015,0.094
3441,NC,Yancey County,1/1/2014,0.110


Now entering the name of the new subset displays the data that it contains.

In [39]:
Uninsured_children_1

Unnamed: 0,State,County,Year,Uninsured children
3243,NC,Alamance County,1/1/2014,0.073
3244,NC,Alamance County,1/1/2015,0.088
3245,NC,Alexander County,1/1/2014,0.077
3246,NC,Alexander County,1/1/2015,0.076
3247,NC,Alleghany County,1/1/2014,0.131
...,...,...,...,...
3438,NC,Wilson County,1/1/2015,0.079
3439,NC,Yadkin County,1/1/2014,0.097
3440,NC,Yadkin County,1/1/2015,0.094
3441,NC,Yancey County,1/1/2014,0.110


## Step 11 -- Filtering by Year 

We can repeat the filtering approach from the previous steps, changing the name of the dataframe and the column we filter by.

Here we filter by year so that each county has a single row for **2014** instead of rows for both 2014 and 2015. Because the values in the **Year** column have the form `1/1/2014`, we compare against that exact string.

We can do this by using this code:

In [43]:
Uninsured_children_1[Uninsured_children_1["Year"] == "1/1/2014"]

Unnamed: 0,State,County,Year,Uninsured children
3243,NC,Alamance County,1/1/2014,0.073
3245,NC,Alexander County,1/1/2014,0.077
3247,NC,Alleghany County,1/1/2014,0.131
3249,NC,Anson County,1/1/2014,0.063
3251,NC,Ashe County,1/1/2014,0.093
...,...,...,...,...
3433,NC,Wayne County,1/1/2014,0.076
3435,NC,Wilkes County,1/1/2014,0.071
3437,NC,Wilson County,1/1/2014,0.081
3439,NC,Yadkin County,1/1/2014,0.097
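
As an optional alternative, the sketch below parses the year strings into dates and filters on the year number. It assumes the **Year** column is stored as text in month/day/year form, as the outputs above suggest; the dataframe `toy` is an illustrative stand-in.

```python
import pandas as pd

# Small illustrative dataframe (not the real data).
toy = pd.DataFrame({
    "County": ["Alamance County", "Alamance County", "Alexander County"],
    "Year": ["1/1/2014", "1/1/2015", "1/1/2014"],
    "Uninsured children": [0.073, 0.088, 0.077],
})

# Parse the text into datetimes (assumed format: month/day/year).
parsed = pd.to_datetime(toy["Year"], format="%m/%d/%Y")

# Filter on the numeric year instead of the exact string.
only_2014 = toy[parsed.dt.year == 2014]

# The string comparison used in this step selects the same rows.
same_rows = toy[toy["Year"] == "1/1/2014"]
print(only_2014.equals(same_rows))
```

Filtering on the parsed year would still work if the file contained other dates within 2014, whereas the string comparison matches only `1/1/2014`.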


#### Naming the New Subset

We can now name this subset by assigning the filtered result to a new name.

We can use this code to make the new subset:

In [49]:
Uninsured_children_2=Uninsured_children_1[Uninsured_children_1["Year"] == "1/1/2014"]

## Step 12 -- Copying the New Subset 

This makes a copy of the dataset that we have filtered to include only 2014.

We can now repeat the approach from Step 10 using the `.copy()` method.


In [51]:
Uninsured_children_2=Uninsured_children_1[Uninsured_children_1["Year"] == "1/1/2014"].copy()

In [52]:
Uninsured_children_2

Unnamed: 0,State,County,Year,Uninsured children
3243,NC,Alamance County,1/1/2014,0.073
3245,NC,Alexander County,1/1/2014,0.077
3247,NC,Alleghany County,1/1/2014,0.131
3249,NC,Anson County,1/1/2014,0.063
3251,NC,Ashe County,1/1/2014,0.093
...,...,...,...,...
3433,NC,Wayne County,1/1/2014,0.076
3435,NC,Wilkes County,1/1/2014,0.071
3437,NC,Wilson County,1/1/2014,0.081
3439,NC,Yadkin County,1/1/2014,0.097
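
For comparison, the separate filters from Steps 5 through 12 can also be combined into one expression. This is a sketch of an equivalent shortcut rather than part of the procedure; the dataframe `toy` and the names `keep` and `result` are illustrative assumptions.

```python
import pandas as pd

# Small illustrative dataframe (not the real data).
toy = pd.DataFrame({
    "State": ["AL", "NC", "NC", "NC"],
    "Region": ["South", "South", "South", "South"],
    "County": ["Autauga County", "Alamance County",
               "Alamance County", "Alexander County"],
    "Year": ["1/1/2014", "1/1/2014", "1/1/2015", "1/1/2014"],
    "Uninsured children": [0.047, 0.073, 0.088, 0.077],
})

# Combine conditions with & and wrap each one in parentheses.
keep = (
    (toy["Region"] == "South")
    & (toy["State"] == "NC")
    & (toy["Year"] == "1/1/2014")
)

# Select the matching rows and the wanted columns in one .loc call, then copy.
result = toy.loc[keep, ["State", "County", "Year", "Uninsured children"]].copy()
print(result)
```

Working step by step, as this document does, makes each intermediate subset easier to inspect; the combined form is shorter once the logic is understood.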


## Step 13 -- Exporting the New Subset as a CSV file  

To export the new subset as a CSV we use the method `.to_csv()`.

To apply it to our subset we put the filename, including the `.csv` extension, in the parentheses.

This writes the CSV file to our current working directory.

`Uninsured_children_2.to_csv("Uninsured_children_2.csv")`


In [54]:
Uninsured_children_2.to_csv("Uninsured_children_2.csv")
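
By default `.to_csv()` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. The sketch below illustrates this and the `index=False` option; the dataframe `toy`, the file name `toy_subset.csv`, and the temporary directory are assumptions made for the example.

```python
import os
import tempfile
import pandas as pd

# Small illustrative dataframe with leftover index labels from filtering.
toy = pd.DataFrame(
    {"County": ["Alamance County", "Alexander County"],
     "Uninsured children": [0.073, 0.077]},
    index=[3243, 3245],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_subset.csv")

    # Default: the index is written as an extra, unnamed first column.
    toy.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False leaves the index out of the file.
    toy.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

Whether to keep the index depends on whether the original row labels are needed later.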