# **Overview**

#### This manual walks you through isolating a subset of a larger data set using a few lines of Python code.

#### Specifically, this manual uses the Public County Health Data Set, which includes records from across the US for the years 2014-2015. However, the same steps can be applied to any other data set.

# **Getting Started**

#### First, save your data set to your Google Drive. Once you have done so, you can mount your drive in your Colab notebook using the following code.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


#### Next, import numpy and pandas, which provide the functions you will use to work with your data set. The "import" keyword brings a library into your Colab notebook, and the "as" keyword gives it a shorter name for easier use.

#### Below, numpy is named "np" and pandas is named "pd".

In [None]:
import numpy as np
import pandas as pd

# **Accessing the Data Set**

#### Now that we have imported everything, we can use pandas to read our data set's csv file.

#### Use the following code and, within the parentheses, include the path to your csv file in quotation marks.


```
name=pd.read_csv()
```


#### You can name the data set whatever you like by changing the left side of the equal sign. In the example below, we have named the data set "data".

In [None]:
data=pd.read_csv('gdrive/My Drive/CountyHealthData_2014-2015.csv')
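#### The path above is relative, so it only works while the notebook's working directory is "/content". As a minimal sketch, assuming the drive is mounted at "/content/gdrive" as shown earlier and the file sits at the top level of "My Drive", the full path can be built like this (the variable names are only for illustration):

```python
from pathlib import PurePosixPath

# Mount point used in the drive.mount() call earlier in this manual
mount_point = PurePosixPath("/content/gdrive")

# Assumption: the csv file sits directly inside "My Drive"
csv_path = str(mount_point / "My Drive" / "CountyHealthData_2014-2015.csv")

print(csv_path)
```

#### The resulting string can be passed to pd.read_csv() in place of the relative path.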

#### We can now simply type the name of our data set to display it and ensure that our data is being read correctly.

In [None]:
data

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
0,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/2014,,0.122,...,,0.374,0.250,3791.0,0.185,216.0,69192,0.127,,0.287
1,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/2015,,0.122,...,,0.314,0.176,4837.0,0.185,254.0,74088,0.133,,
2,AK,West,Pacific,Anchorage Borough,2020,2020,Region 22,1/1/2014,6827.0,0.125,...,15.37,0.218,0.096,6588.0,0.119,135.0,71094,0.319,6.29,0.160
3,AK,West,Pacific,Anchorage Borough,2020,2020,Region 22,1/1/2015,6856.0,0.125,...,17.08,0.227,0.123,6582.0,0.119,148.0,76362,0.334,5.60,
4,AK,West,Pacific,Bethel Census Area,2050,2050,Insuff Data,1/1/2014,13345.0,0.211,...,,0.394,0.124,5860.0,0.200,169.0,41722,0.668,12.77,0.477
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6104,WY,West,Mountain,Uinta County,56041,56041,Insuff Data,1/1/2015,7436.0,0.135,...,18.66,0.192,0.090,7600.0,0.123,47.0,60953,0.273,,
6105,WY,West,Mountain,Washakie County,56043,56043,Insuff Data,1/1/2014,6580.0,0.106,...,,0.225,0.086,8202.0,0.099,47.0,49533,0.328,,0.133
6106,WY,West,Mountain,Washakie County,56043,56043,Insuff Data,1/1/2015,7572.0,0.106,...,,0.226,0.101,7940.0,0.099,47.0,50740,0.309,,
6107,WY,West,Mountain,Weston County,56045,56045,Insuff Data,1/1/2014,5633.0,0.162,...,,0.201,0.084,6906.0,0.130,28.0,53665,0.232,,0.171
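#### For a large data set, a few quick checks can be easier to read than the full table. The sketch below uses a small made-up table (not the real county data) to show how to get the number of rows and columns, the exact column names, and the first few rows.

```python
import pandas as pd

# Hypothetical stand-in for the real data set; the values are made up
sample = pd.DataFrame({
    "State": ["AK", "AK", "NY", "WY"],
    "Region": ["West", "West", "Northeast", "West"],
    "Adult obesity": [0.30, 0.31, 0.25, 0.28],
})

n_rows, n_cols = sample.shape        # (number of rows, number of columns)
column_names = list(sample.columns)  # exact column names, including capitalization
first_rows = sample.head(2)          # only the first two rows

print(n_rows, n_cols)
print(column_names)
print(first_rows)
```

#### Checking the exact column names is useful before filtering, because pandas treats "Region" and "region" as different names.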


# **Isolating a Subset of Data**

#### In this example, we would like to only look at the Northeastern region of the data set.

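#### We can do this by keeping only the rows where the "Region" column equals "Northeast", using the code below. You can do this with any column by changing "Region" to the name of another column and "Northeast" to the value you want to keep. Column names and values must match the data set exactly, including capitalization.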


In [None]:
Northeast_data = data[data["Region"] == "Northeast"].copy()
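#### To see how this filter works, it can help to look at its pieces separately. The sketch below uses a small made-up table (the county names are placeholders) to list the distinct values in a column, build the True/False mask, and then apply it.

```python
import pandas as pd

# Hypothetical stand-in for the real data set; the values are made up
sample = pd.DataFrame({
    "County": ["A", "B", "C", "D"],
    "Region": ["West", "Northeast", "South", "Northeast"],
})

# Distinct values in the column, in order of first appearance
regions = sample["Region"].unique()

# One True/False value per row: True where the region is "Northeast"
mask = sample["Region"] == "Northeast"

# Keep only the rows where the mask is True
northeast_sample = sample[mask].copy()

print(list(regions))
print(mask.tolist())
print(northeast_sample)
```

#### Listing the distinct values first is a quick way to confirm the exact spelling of the value you want to filter on.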

#### Note that we have also named the new subset "Northeast_data". You can change the name by changing the text on the left side of the equal sign.

#### Now that we have isolated the Northeastern data, we can make this subset more focused and only look at the columns "Adult obesity" and "Food environment index", alongside "Region".

#### We can do this by using the ".loc" indexer as seen below. The colon before the comma keeps every row, and the list after the comma names the columns to keep.

In [None]:
Northeast_subset = Northeast_data.loc[:,["Region","Adult obesity","Food environment index"]].copy()
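#### The row filter and the column selection can also be combined into a single ".loc" call. This is a sketch on a small made-up table, not a replacement for the two steps above; the name "wanted_columns" is only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real data set; the values are made up
sample = pd.DataFrame({
    "County": ["A", "B", "C", "D"],
    "Region": ["West", "Northeast", "South", "Northeast"],
    "Adult obesity": [0.30, 0.25, 0.33, 0.22],
    "Food environment index": [7.1, 8.4, 6.5, 8.0],
})

wanted_columns = ["Region", "Adult obesity", "Food environment index"]

# Rows go before the comma, columns after it
northeast_sample = sample.loc[sample["Region"] == "Northeast", wanted_columns].copy()

print(northeast_sample)
```

#### The result keeps the original row labels, which is why the displayed row numbers of a subset do not start at 0.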

#### Once again, note that we have named the new subset "Northeast_subset".

#### Now, to ensure that we have correctly isolated the rows and columns that we want, we can type the name of our subset to display it.

In [None]:
Northeast_subset

Unnamed: 0,Region,Adult obesity,Food environment index
593,Northeast,0.186,8.572
594,Northeast,0.196,8.300
595,Northeast,0.246,7.962
596,Northeast,0.261,7.700
597,Northeast,0.219,9.020
...,...,...,...
5726,Northeast,0.225,8.400
5727,Northeast,0.235,8.242
5728,Northeast,0.225,8.000
5729,Northeast,0.240,8.364


#### Finally, we can export the subset we created as a csv file using the following code. We include "index=False" in our code so that the row numbers from the old data set are not written to the new file.

In [None]:
Northeast_subset.to_csv("Northeast_subset.csv", index=False)
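#### One way to confirm the export worked is to read the file back in. The sketch below writes a small made-up subset to a temporary folder (the file name "example_subset.csv" is hypothetical) and checks that the old row numbers were left out.

```python
import os
import tempfile

import pandas as pd

# Hypothetical subset whose row labels come from a larger table
subset = pd.DataFrame(
    {
        "Region": ["Northeast", "Northeast"],
        "Adult obesity": [0.186, 0.196],
        "Food environment index": [8.572, 8.300],
    },
    index=[593, 594],
)

# Write to a temporary folder so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), "example_subset.csv")
subset.to_csv(path, index=False)

# Read the file back; the rows are numbered from 0 again
reloaded = pd.read_csv(path)

print(list(reloaded.columns))
print(list(reloaded.index))
```

#### Without "index=False", the old row numbers would be saved as an extra unnamed column in the file.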

#### To download this new csv file to your device, open the Files tab on the left, find the file, and choose "Download" from its menu.