### Overview

You will begin by mounting your Google Drive so that you can access the CSV file stored in your Colab Notebooks folder.

Notice that we mount Google Drive at `/content/gdrive`, so the files in our Drive become available to Python under that path.
    

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive
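
Once the Drive is mounted, it can help to confirm that the file is where we expect it before trying to read it. The sketch below is only an illustration: `drive_csv_path` is a hypothetical helper, not part of Colab, and it assumes the mount point and folder names used in this notebook.

```python
from pathlib import Path

# Assumption: the Drive is mounted at /content/gdrive, as in the cell above.
MOUNT_POINT = Path("/content/gdrive")


def drive_csv_path(filename, folder="Colab Notebooks", mount_point=MOUNT_POINT):
    """Build the full path to a file in a Drive folder (hypothetical helper)."""
    return mount_point / "My Drive" / folder / filename


csv_path = drive_csv_path("CountyHealthData_2014-2015.csv")
print(csv_path)

# exists() is False wherever the Drive is not mounted or the file is missing.
print("File found:", csv_path.exists())
```

If `File found:` prints `False`, the folder name or file name most likely does not match what is in the Drive.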


### Importing

Next, we'll import the packages that we need to use with Python.

We load pandas with the statement `import pandas as pd`. The `as pd` part gives the package a shorter alias, so we can call its functions with `pd.<function>` instead of `pandas.<function>` and don't have to write out `pandas` every time. The alias is optional, but it is a common convention and saves typing.

We also import the `numpy` package `as np`, which provides numerical functions that make calculations easier for us.

In [3]:
import numpy as np
import pandas as pd
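
As a small illustration of how the two aliases are used, the sketch below applies a NumPy function and the matching pandas method to the same values. The three numbers are copied from the uninsured adults output in this notebook; the variable name `values` is only for this example.

```python
import numpy as np
import pandas as pd

# Three values copied from the "Uninsured adults" output in this notebook.
values = pd.Series([0.374, 0.314, 0.218])

# A NumPy function called through the np alias, applied to a pandas Series.
print(np.mean(values))

# The equivalent pandas method on the Series itself.
print(values.mean())
```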

### Colab Notebook

Next, we'll make sure that the CSV file we want to use is located inside a folder in our Drive. In this instance, I have it inside a folder named 'Colab Notebooks'.

We will also make sure that the folder path and file name match exactly what is written in the line of code below. Otherwise, pandas will not be able to find the file.

The line `df = pd.read_csv(...)` reads the entire CSV file into a pandas data frame and assigns it to the variable `df`, which is short for data frame. This, again, can be named whatever you would like and is only for convenience.

In [4]:
df=pd.read_csv('gdrive/My Drive/Colab Notebooks/CountyHealthData_2014-2015.csv')
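
After loading a file, it is usually worth a quick look at its size and column names, since the exact column names are needed for the selections that follow. The sketch below uses a tiny stand-in built from three rows shown in this notebook's outputs rather than the real file; `sample_csv` and `sample_df` are names made up for the example.

```python
import io

import pandas as pd

# Stand-in for the real file: three rows copied from the outputs in this notebook.
sample_csv = io.StringIO(
    "State,County,Uninsured adults\n"
    "AK,Aleutians West Census Area,0.374\n"
    "AK,Anchorage Borough,0.218\n"
    "AK,Bethel Census Area,0.394\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # exact column names, spelling and case included
print(sample_df.head())         # first rows of the data frame
```

Running the same three calls on `df` would show the same kind of summary for the full dataset.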

### Uninsured Adults

Next, we'll create our first data subset from the larger CSV file.

We can name the subset whatever is convenient. In this case, I named it `Uninsured_Adults`. Python variable names cannot contain spaces, so an underscore is used to separate the words.

We use `df.loc[:, [...]]` to select columns by name: the `:` keeps every row, and the list names the columns to keep. Adding `.copy()` makes the subset an independent copy of the data.

We then include the names of the columns wanted. In this case, we wanted the uninsured adults data for every state and county.

In [5]:
Uninsured_Adults = df.loc[:,["State","County","Uninsured adults"]].copy()
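
To see what `.loc` and `.copy()` do in a selection like this one, here is a minimal sketch on a two-row stand-in data frame. The values are copied from this notebook's outputs, and `sample_df` and `subset` are illustrative names only.

```python
import pandas as pd

# Stand-in data frame; values copied from the outputs in this notebook.
sample_df = pd.DataFrame({
    "State": ["AK", "AK"],
    "County": ["Aleutians West Census Area", "Anchorage Borough"],
    "Uninsured adults": [0.374, 0.218],
    "Poor or fair health": [0.122, 0.125],
})

# ":" keeps every row; the list keeps only the named columns, in that order.
subset = sample_df.loc[:, ["State", "County", "Uninsured adults"]].copy()

# The subset is an independent copy, so editing it leaves the original untouched.
subset.loc[0, "Uninsured adults"] = 0.0
print(sample_df.loc[0, "Uninsured adults"])
print(list(subset.columns))
```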

### Uninsured Adults cont.

Now that the subset is stored in the variable `Uninsured_Adults`, we can simply enter that name in a cell and our dataset should be visible.


In [6]:
Uninsured_Adults

Unnamed: 0,State,County,Uninsured adults
0,AK,Aleutians West Census Area,0.374
1,AK,Aleutians West Census Area,0.314
2,AK,Anchorage Borough,0.218
3,AK,Anchorage Borough,0.227
4,AK,Bethel Census Area,0.394
...,...,...,...
6104,WY,Uinta County,0.192
6105,WY,Washakie County,0.225
6106,WY,Washakie County,0.226
6107,WY,Weston County,0.201
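
The selection above keeps every row, including any where the value might be missing. If we wanted only the counties that actually have a value, one option is `dropna`. This is a sketch under assumptions: the row with the missing value and the name "Example County" are hypothetical, and whether the real dataset has missing values in this column is not shown here.

```python
import numpy as np
import pandas as pd

# Stand-in data; the NaN row is hypothetical and only illustrates a missing value.
sample = pd.DataFrame({
    "State": ["AK", "AK", "AK"],
    "County": ["Aleutians West Census Area", "Anchorage Borough", "Example County"],
    "Uninsured adults": [0.374, 0.218, np.nan],
})

# Count the missing values in the column of interest.
print(sample["Uninsured adults"].isna().sum())

# Keep only the rows where "Uninsured adults" has a value.
with_data = sample.dropna(subset=["Uninsured adults"])
print(len(with_data))
```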


### Poor Health subset

Congrats! You have now completed your first data subset!

Now we will move on to our next data subset.

This process is very similar to the previous one.

We begin by naming our subset whatever we would like. In this case, I named it `Poor_Health`.

Use `Poor_Health = df.loc[...]` to select only the columns you want included.

Again, I wanted the poor or fair health data for every state and county, so those were the three columns I included in my list.


In [7]:
Poor_Health = df.loc[:,["State","County","Poor or fair health"]].copy()

### Poor health cont.

Now that the subset is stored in the variable `Poor_Health`, we can simply enter that name in a cell and our dataset should be visible.

In [8]:
Poor_Health

Unnamed: 0,State,County,Poor or fair health
0,AK,Aleutians West Census Area,0.122
1,AK,Aleutians West Census Area,0.122
2,AK,Anchorage Borough,0.125
3,AK,Anchorage Borough,0.125
4,AK,Bethel Census Area,0.211
...,...,...,...
6104,WY,Uinta County,0.135
6105,WY,Washakie County,0.106
6106,WY,Washakie County,0.106
6107,WY,Weston County,0.162


### Median Household Income subset

Congrats! You have now completed your next data subset!

Now we will move on to the second-to-last data subset.

This process is, again, very similar to the previous one.

We begin by naming our subset whatever we would like. In this case, I named it `Median_household_income`. Python is case-sensitive, so the name must be typed exactly the same way each time it is used.

This process should seem very familiar by now, as we are simply repeating the same line of code and listing every column we want included. The subset could be altered by choosing other columns, such as regions rather than states and counties.

Use `Median_household_income = df.loc[...]` to select only the columns you want included.

Again, I wanted the median household income for every state and county, so those were the three columns I included in my list.

In [9]:
Median_household_income = df.loc[:,["State","County","Median household income"]].copy()

### Median Household income cont.

We can now simply enter `Median_household_income` into our next cell and our data subset should show up.



In [10]:
Median_household_income

Unnamed: 0,State,County,Median household income
0,AK,Aleutians West Census Area,69192
1,AK,Aleutians West Census Area,74088
2,AK,Anchorage Borough,71094
3,AK,Anchorage Borough,76362
4,AK,Bethel Census Area,41722
...,...,...,...
6104,WY,Uinta County,60953
6105,WY,Washakie County,49533
6106,WY,Washakie County,50740
6107,WY,Weston County,53665


### Final subset

Finally, we want to combine all three of our measures into one large subset.

This should feel familiar, as you will be writing the same kind of line of code as before.

Begin by naming your subset whatever you would like. I decided upon `Final` for my subset.

Next, use `Final = df.loc[...]` to include only the columns desired.

Lastly, you will include the state and county columns along with all three measures: Median household income, Poor or fair health, and Uninsured adults.


In [11]:
Final= df.loc[:,["State","County","Median household income","Poor or fair health","Uninsured adults"]].copy()

### Final subset cont.

You should now be able to enter whatever you named your subset, mine being `Final`, and see a final data subset that includes all the columns we wanted in one large subset.



In [12]:
Final

Unnamed: 0,State,County,Median household income,Poor or fair health,Uninsured adults
0,AK,Aleutians West Census Area,69192,0.122,0.374
1,AK,Aleutians West Census Area,74088,0.122,0.314
2,AK,Anchorage Borough,71094,0.125,0.218
3,AK,Anchorage Borough,76362,0.125,0.227
4,AK,Bethel Census Area,41722,0.211,0.394
...,...,...,...,...,...
6104,WY,Uinta County,60953,0.135,0.192
6105,WY,Washakie County,49533,0.106,0.225
6106,WY,Washakie County,50740,0.106,0.226
6107,WY,Weston County,53665,0.162,0.201
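
Because `Final` is built from the same data frame as the earlier subsets, picking the matching columns out of it should reproduce each of them. The sketch below checks this on a two-row stand-in (values copied from this notebook's outputs); `sample_df` and `matches` are illustrative names.

```python
import pandas as pd

# Stand-in data frame; values copied from the outputs in this notebook.
sample_df = pd.DataFrame({
    "State": ["AK", "AK"],
    "County": ["Aleutians West Census Area", "Anchorage Borough"],
    "Median household income": [69192, 71094],
    "Poor or fair health": [0.122, 0.125],
    "Uninsured adults": [0.374, 0.218],
})

Uninsured_Adults = sample_df.loc[:, ["State", "County", "Uninsured adults"]].copy()
Final = sample_df.loc[:, ["State", "County", "Median household income",
                          "Poor or fair health", "Uninsured adults"]].copy()

# Selecting the same three columns from Final should reproduce the earlier subset.
matches = Final[["State", "County", "Uninsured adults"]].equals(Uninsured_Adults)
print(matches)
```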


### Exporting our New Subsets

Once we've finished making our data subsets, which are more **useful** for deeper analysis, we can export them as new .csv files. This lets us reuse and further review these datasets, and makes them available for others to view and analyze once they are posted on our public GitHub website.

#### Exporting to .csv file

To do so, we can use the `.to_csv()` method, putting the filename and extension inside the parentheses.

For example, for our final subset we would write `Final.to_csv("Final.csv")`, which exports a `.csv` file to the current working folder.

However, we only want to include the column names, not the index numbers, when exporting.

To leave out the index numbers, we add `index=False` to our line of code (note the capital `F`, since Python is case-sensitive), which tells pandas not to write those index numbers.

`Final.to_csv("Final.csv", index=False)`
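
To see the difference the `index` argument makes, the sketch below writes a small stand-in subset both ways and compares the header lines. It relies on `to_csv()` returning the CSV text as a string when no filename is given; the data values are copied from this notebook's outputs, and the variable names are illustrative.

```python
import io

import pandas as pd

# Stand-in subset; values copied from the outputs in this notebook.
sample = pd.DataFrame({
    "State": ["AK", "WY"],
    "County": ["Anchorage Borough", "Weston County"],
    "Uninsured adults": [0.218, 0.201],
})

# With no filename, to_csv() returns the CSV text instead of writing a file.
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header holds only the column names

# Reading the indexed version back in produces an extra "Unnamed: 0" column.
reloaded_columns = list(pd.read_csv(io.StringIO(with_index)).columns)
print(reloaded_columns)
```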

We will repeat this for every data subset we want to export and upload to GitHub.

In [13]:
Uninsured_Adults.to_csv("Uninsured_Adults.csv", index=False)

In [14]:
Poor_Health.to_csv("Poor_Health.csv", index=False)

In [15]:
Median_household_income.to_csv("Median_household_income.csv", index=False)

In [16]:
Final.to_csv("Final.csv", index=False)
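
The four export lines follow the same pattern, so they could also be written as a loop over a dictionary of names and subsets. This is just one possible sketch: it uses two small stand-in subsets and a temporary folder so that it runs on its own, and `subsets` and `out_dir` are names made up for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in subsets; values copied from the outputs in this notebook.
subsets = {
    "Uninsured_Adults": pd.DataFrame({
        "State": ["AK"], "County": ["Anchorage Borough"], "Uninsured adults": [0.218],
    }),
    "Poor_Health": pd.DataFrame({
        "State": ["AK"], "County": ["Anchorage Borough"], "Poor or fair health": [0.125],
    }),
}

# Temporary folder used only for this sketch; in the notebook you would pick your own.
out_dir = Path(tempfile.mkdtemp())

# Write each subset to "<name>.csv" without the index column.
for name, frame in subsets.items():
    frame.to_csv(out_dir / f"{name}.csv", index=False)

written = sorted(p.name for p in out_dir.glob("*.csv"))
print(written)
```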

#### Congrats!

You have now created and exported four total data subsets from the original csv file!

You are now able to upload these to your public GitHub website!