## Instructions for Compiling a Data Set
This notebook provides instructions for compiling a data set of mental health statistics for North Carolina counties.

To get started, locate and download the County Health .csv file from the repository. (It can be found near the top of the repository.)

#### **Uploading A .csv File Into Notebook**

1. Once the County Health .csv file is downloaded, click on the file icon found on the left side bar of the screen.
2. Locate the *content* folder.
3. Use the three dots to *upload* the .csv file from your desktop.

Once the file is successfully uploaded, create a new code cell using the *+Code* button on the top bar.
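
Before moving on, it can help to confirm from code that the upload worked. The following is a minimal sketch: the helper name `list_csv_files` is made up for illustration, and it assumes the notebook's working directory is the folder the file was uploaded to.

```python
import os

def list_csv_files(folder="."):
    """Return the sorted names of all .csv files in `folder` (hypothetical helper)."""
    return sorted(name for name in os.listdir(folder) if name.lower().endswith(".csv"))

# List the .csv files in the current working directory.
print(list_csv_files())
```

If the County Health file does not appear in the printed list, repeat the upload steps before continuing.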

#### **Importing Pandas**

In order to compile the new data subset, we need to import the Pandas package into our notebook.
- This can be done by typing `import pandas as pd`

**Press Shift+Enter to run the code. You will need to do this for every code cell.**



In [None]:
import pandas as pd

### Applying Pandas

Now, we need to have pandas read the .csv file so that we can analyze the County Health data set. The function we use for this is `pd.read_csv`.
- Before typing that function, we need to assign a name to the data. This will make it easier to reference the data later on. An example would be a name such as `raw_data`, which is the name used in the rest of this notebook. The name itself does not matter, as long as you will be able to recognize what data it refers to.
- After choosing a name, type an `=` between the name and `pd.read_csv`.
  - Example: `raw_data=pd.read_csv`
- Following `pd.read_csv`, type the exact name of the County Health .csv file inside quotation marks, with the quotation marks inside parentheses.
  - The end result should look like this: `raw_data=pd.read_csv("CountyHealthData_2014-2015.csv")`

**Note:** Capitalization and spaces do matter, so make sure the file name is typed correctly. Always make sure to check for spelling mistakes throughout your code.


In [4]:
raw_data=pd.read_csv("CountyHealthData_2014-2015.csv")
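
A quick way to confirm that a file loaded as expected is to look at its size, column titles, and first rows. The sketch below reads a tiny made-up table from a string so that it runs without the real file; the values and the name `sample` are placeholders, not data from the County Health file. With the real data, the same calls can be made on `raw_data`.

```python
import io
import pandas as pd

# Made-up stand-in for the real file so the example runs on its own.
csv_text = """State,County,Poor mental health days,Mental health providers
AA,First County,3.0,10.0
AA,Second County,4.0,20.0
BB,Third County,2.5,30.0
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # exact column titles, including capitalization
print(sample.head(2))        # first two rows
```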

### Using the .loc Method

1. We first need to determine the titles of the columns we want to use for the new data set. In this case we will be using the `State`, `County`, `Poor mental health days`, and `Mental health providers` columns. The titles must be typed exactly as they appear in the file, including capitalization.
2. In a new code cell, type the reference name we previously came up with followed by `.loc`.
3. After the `.loc`, type an open bracket followed by a colon (:). The colon tells pandas that we want to keep every row.
4. Type a comma after the colon, then type another open bracket.
5. Type in the title of each column being used, with each title inside its own set of quotation marks. The titles should also be separated by commas.
6. Now, close both sets of brackets and run the code. You should get a table, with rows numbered from 0, that contains mental health statistics for each state and county in the County Health data set.

In [5]:
raw_data.loc[:,["State","County","Poor mental health days","Mental health providers"]]

Unnamed: 0,State,County,Poor mental health days,Mental health providers
0,AK,Aleutians West Census Area,2.1,99.0
1,AK,Aleutians West Census Area,2.1,163.0
2,AK,Anchorage Borough,3.0,204.0
3,AK,Anchorage Borough,3.0,285.0
4,AK,Bethel Census Area,2.6,734.0
...,...,...,...,...
6104,WY,Uinta County,3.5,313.0
6105,WY,Washakie County,2.3,198.0
6106,WY,Washakie County,2.3,319.0
6107,WY,Weston County,3.0,68.0
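
To see what each part of the `.loc` brackets does, here is a small sketch on a made-up table. The name `toy` and its values are invented for illustration and are not taken from the County Health file.

```python
import pandas as pd

toy = pd.DataFrame({
    "State": ["AA", "AA", "BB"],
    "County": ["First County", "Second County", "Third County"],
    "Population": [100, 200, 300],
})

# ":" keeps every row; the list keeps only the named columns, in that order.
all_rows = toy.loc[:, ["State", "County"]]

# A label range instead of ":" keeps only those rows (both ends are included).
first_two = toy.loc[0:1, ["State", "County"]]

print(all_rows)
print(first_two)
```

The part before the comma always chooses rows and the part after the comma always chooses columns.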


### Creating a New Subset

In this step, we are simply going to assign a new name to the table previously made. Because this new table only contains mental health statistics, we can make the name related to mental health.

- To store this table, type the new name for the subset. For example, we could use the name `MH_subset`.
- Type an equal sign after the new name, then insert the exact line of code you used to create the table using the `.loc` method.


In [6]:
MH_subset = raw_data.loc[:,["State","County","Poor mental health days","Mental health providers"]]

You can test that you've done this step correctly by typing the newly assigned subset name by itself and running the cell. You should once again receive the table which displays mental health statistics for states and counties.

In [7]:
MH_subset

Unnamed: 0,State,County,Poor mental health days,Mental health providers
0,AK,Aleutians West Census Area,2.1,99.0
1,AK,Aleutians West Census Area,2.1,163.0
2,AK,Anchorage Borough,3.0,204.0
3,AK,Anchorage Borough,3.0,285.0
4,AK,Bethel Census Area,2.6,734.0
...,...,...,...,...
6104,WY,Uinta County,3.5,313.0
6105,WY,Washakie County,2.3,198.0
6106,WY,Washakie County,2.3,319.0
6107,WY,Weston County,3.0,68.0


### Narrowing the Range

You may have noticed that the table we came up with has data for every state in the file, but we only want data for North Carolina. To fix this, we can use filtering.

1. Type the name of the mental health subset, followed by an open bracket. No space is needed.
2. Type the name of the mental health subset a second time, again followed by an open bracket.
3. Inside the second set of brackets, type in `"State"`.
4. Close the second set of brackets.
5. Type a space, then two equal signs.
6. Type another space then `"NC"`.
7. Close the first set of brackets and run the code.

The result should be a new table with only data for North Carolina.

In [8]:
MH_subset[MH_subset["State"] == "NC"]

Unnamed: 0,State,County,Poor mental health days,Mental health providers
3243,NC,Alamance County,3.6,76.0
3244,NC,Alamance County,3.6,140.0
3245,NC,Alexander County,4.6,18.0
3246,NC,Alexander County,4.6,43.0
3247,NC,Alleghany County,4.4,145.0
...,...,...,...,...
3438,NC,Wilson County,3.1,119.0
3439,NC,Yadkin County,4.6,39.0
3440,NC,Yadkin County,4.6,60.0
3441,NC,Yancey County,4.1,67.0
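
The filter works in two stages, which the following sketch separates on a made-up table. The names `toy`, `mask`, and `only_aa` and the values are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "State": ["AA", "BB", "AA"],
    "County": ["First County", "Second County", "Third County"],
})

# The inner expression compares every row's State to "AA" and gives True or False.
mask = toy["State"] == "AA"
print(mask.tolist())  # [True, False, True]

# Using the mask inside brackets keeps only the rows marked True.
only_aa = toy[mask]
print(only_aa)
```

The kept rows hold on to their original row numbers, which is why the North Carolina rows in the table above start at 3243 rather than 0.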


### Exporting A New .csv File

Now that we have our new data set compiled, it is time to export it as a .csv file. The filtered North Carolina table from the previous step was displayed but not saved under a name, so the filter is repeated here. Exporting `MH_subset` by itself would write the data for every state.

1. Type the filtering code from the previous step: `MH_subset[MH_subset["State"] == "NC"]`.
2. Follow it with `.to_csv` with no spaces between.
3. Type an open parenthesis and a quotation mark, then type a name for the new .csv file.
4. Close the quotation mark and type a comma.
5. Type a space, followed by `index=False`. This keeps the row numbers out of the exported file.
6. Close the parenthesis and run the code. The final line of code should look like: `MH_subset[MH_subset["State"] == "NC"].to_csv("NC_MentalHealth.csv", index=False)`

Check that the export was successful by clicking on the file icon on the left side bar.
- Go to the content folder and your new .csv file should be there.
- Click on the three dots and click download.


In [9]:
MH_subset[MH_subset["State"] == "NC"].to_csv("NC_MentalHealth.csv", index=False)
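
One way to double-check an export from code is to read the new file back in and compare it with the table that was written. This is a minimal sketch on a made-up table; the name `toy`, the file name `toy_export.csv`, and the temporary folder are assumptions made for illustration.

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({"State": ["AA", "BB"], "County": ["First County", "Second County"]})

# Write to a temporary folder so the example does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "toy_export.csv")
toy.to_csv(path, index=False)

# Read the file back and compare it with the table that was exported.
check = pd.read_csv(path)
print(check.equals(toy))
print(list(check.columns))  # no extra index column because index=False was used
```

With the real data, reading the exported file back and checking that its `State` column contains only `"NC"` serves the same purpose.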