# Creating a Data Compilation of Adult Obesity Rates

This notebook serves as a guide to extracting the subset of Adult Obesity Rates in North Carolina, by county, from the core data set.

## Importing Pandas and Data

We begin by importing the pandas and numpy packages under their usual aliases: "pd" for pandas and "np" for numpy.

In [2]:
import pandas as pd
import numpy as np

We can't extract data without a data set, so let's import our data file. We can do this by using pd.read_csv("name of file.csv") and assigning the result to "df".

In [3]:
df=pd.read_csv("CountyHealthData_2014-2015.csv")
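Before filtering, it can help to confirm that the file loaded the way we expect. The sketch below is only an illustration: it reads a few rows typed in by hand (the `sample_csv` text and the "XX" row are placeholders, not taken from the real file) so that it runs without CountyHealthData_2014-2015.csv on disk. The same checks can be run on "df".

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file; the "XX" row is a placeholder
sample_csv = io.StringIO(
    "State,County,Adult obesity\n"
    "NC,Alamance County,0.341\n"
    "NC,Alexander County,0.272\n"
    "XX,Example County,0.300\n"
)

sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # column names exactly as pandas read them
```

Checking the column names early is useful because the filters that follow depend on exact spellings such as "State" and "Adult obesity".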

## Indexing and Filtering Data

Now we want to begin filtering our data to get our desired subset. 

Let's begin by filtering our data to show only the data for North Carolina. We can do this by using df[df["State"]=="NC"].

In [4]:
df[df["State"]=="NC"]

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
3243,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2014,7123.0,0.192,...,10.48,0.259,0.073,8640.0,0.167,46.0,41394,0.444,4.94,0.202
3244,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2015,7291.0,0.192,...,12.38,0.249,0.088,9050.0,0.167,56.0,43001,0.455,4.60,
3245,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2014,7974.0,0.178,...,22.74,0.240,0.077,9316.0,0.205,30.0,39655,0.417,6.27,0.273
3246,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2015,8079.0,0.178,...,24.04,0.239,0.076,9242.0,0.205,32.0,46064,0.449,7.20,
3247,NC,South,South Atlantic,Alleghany County,37005,37005,Insuff Data,1/1/2014,8817.0,0.234,...,18.18,0.320,0.131,9585.0,0.210,55.0,34046,0.523,,0.215
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3438,NC,South,South Atlantic,Wilson County,37195,37195,Region 20,1/1/2015,8028.0,0.159,...,7.31,0.262,0.079,9450.0,0.107,77.0,40772,0.556,9.60,
3439,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2014,7893.0,0.207,...,18.45,0.252,0.097,10084.0,0.158,32.0,40012,0.422,3.76,0.241
3440,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2015,7258.0,0.207,...,20.21,0.242,0.094,10998.0,0.158,32.0,40998,0.455,,
3441,NC,South,South Atlantic,Yancey County,37199,37199,Region 15,1/1/2014,6872.0,0.193,...,20.79,0.268,0.110,7707.0,0.158,79.0,36019,0.477,,0.176
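
The filter above works in two steps: df["State"]=="NC" produces one True/False value per row, and df[...] keeps only the rows marked True. A minimal sketch on a small hand-made table (the `toy` frame and its "XX" row are placeholders, not real data):

```python
import pandas as pd

# Hand-made example table; the "XX" row is a placeholder, not real data
toy = pd.DataFrame({
    "State": ["NC", "XX", "NC"],
    "County": ["Alamance County", "Example County", "Yancey County"],
})

is_nc = toy["State"] == "NC"   # Boolean Series: one True/False per row
print(is_nc.tolist())          # [True, False, True]

nc_rows = toy[is_nc]           # keep only the rows where the mask is True
print(len(nc_rows))            # 2
```

Note that the filter returns a new table and leaves the original unchanged, so the result has to be assigned to a name if we want to reuse it.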


Next, we want to narrow our data to only the columns we wish to see. In this case, we want Adult Obesity rates by county in North Carolina, so we can select those columns with .loc. Using df.loc[:,["State","County","Adult obesity"]] keeps every row (that is what the ":" means) but only the three listed columns. However, since we want data for North Carolina only, we must end the command with [df["State"]== "NC"], which applies the same row filter as before.

In [5]:
df.loc[:,["State","County","Adult obesity"]][df["State"]== "NC"]

Unnamed: 0,State,County,Adult obesity
3243,NC,Alamance County,0.341
3244,NC,Alamance County,0.332
3245,NC,Alexander County,0.272
3246,NC,Alexander County,0.283
3247,NC,Alleghany County,0.247
...,...,...,...
3438,NC,Wilson County,0.373
3439,NC,Yadkin County,0.297
3440,NC,Yadkin County,0.301
3441,NC,Yancey County,0.287
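
The same subset can also be selected in a single .loc call by passing the row condition and the column list together, which avoids indexing twice. This is an alternative to the command above, not a correction of it. A sketch on a hand-made table (the `toy` frame, its "XX" row, and the names `columns`, `nc_obesity`, and `chained` are placeholders chosen for this example):

```python
import pandas as pd

# Hand-made example table; the "XX" row is a placeholder, not real data
toy = pd.DataFrame({
    "State": ["NC", "XX", "NC"],
    "County": ["Alamance County", "Example County", "Yancey County"],
    "Adult obesity": [0.341, 0.300, 0.287],
    "Year": ["1/1/2014", "1/1/2014", "1/1/2014"],
})

columns = ["State", "County", "Adult obesity"]

# Rows and columns chosen together: .loc[row condition, column list]
nc_obesity = toy.loc[toy["State"] == "NC", columns]

# The two-step form used in this notebook, for comparison
chained = toy.loc[:, columns][toy["State"] == "NC"]

print(nc_obesity.equals(chained))  # both approaches select the same table
```

Assigning the result to a name such as `nc_obesity` also makes it available for later steps.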


## Exporting Our Data

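Now that we've extracted our desired subset from the core data, we need to export it into its own csv file. The filters above only displayed their results; "df" itself still holds every state and every column, so we first assign the subset to a new name and then export that with the .to_csv command. The file name should end in ".csv" (the name used here is only an example). Additionally, since we don't need our index values in the file, we can set index=False.

In [8]:
nc_obesity = df.loc[df["State"] == "NC", ["State", "County", "Adult obesity"]]
nc_obesity.to_csv("NC_Adult_Obesity.csv", index=False)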
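
One way to check an export is to read the file back and compare it with the table that was written. The sketch below writes to a temporary folder so it leaves nothing behind; the `toy` table and the file name "nc_obesity_check.csv" are placeholders chosen for this example.

```python
import os
import tempfile
import pandas as pd

# Hand-made stand-in for the filtered subset
toy = pd.DataFrame({
    "State": ["NC", "NC"],
    "County": ["Alamance County", "Yancey County"],
    "Adult obesity": [0.341, 0.287],
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "nc_obesity_check.csv")
    toy.to_csv(path, index=False)   # index=False: do not write the row labels
    reloaded = pd.read_csv(path)    # read the file back before the folder is removed

print(list(reloaded.columns))
print(reloaded["County"].tolist() == toy["County"].tolist())
```

If the index had been written, reading the file back would typically show it as an extra column named "Unnamed: 0", which is why index=False is convenient here.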