# Data Beyond the Limit USA!

This notebook will walk us through the steps I took to create a new subset of data from the pre-processed set.

The data comes from The Washington Post's *data-2C-beyond-the-limit-usa* repository on GitHub.



## Overview of Steps

* Background
* Importation of Data
* Filtering of Data
* Exporting New Dataset

### **Background**
1. Begin by creating a folder on your device to be the destination for your downloaded data.
2. Then, open the GitHub website for the dataset. It can be accessed [here](https://github.com/washingtonpost/data-2C-beyond-the-limit-usa/tree/main).
3. From this link, click on the "data" folder. From there, click on the "processed" folder.
>You will be brought to a screen listing all the processed data in this set. By the end, we will have a smaller, more specific version of this data.
4. Download the set titled "model_state.csv" and place it in the folder on your device designated as the destination for this data.
5. Import this file into your Google Drive - *specifically in the "Colab Notebooks" folder.*

####**Congrats! You've completed the background steps to begin working!**


---

### **Importation of Data**

1. Open up a new Google Colaboratory Notebook and title it something related to this project.
> It doesn't have to be anything special. Mine was simply named Climate Data!
2. Mount your Google Drive to the Notebook by running the following code.



In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


> A permission will likely pop up asking to sign into a Google account. Connect to the account that has your data.

3. Import the pandas and NumPy packages by copying the code exactly as it appears below.

In [None]:
import pandas as pd

In [None]:
import numpy as np

> Importing the packages as these abbreviations (pd and np) allows for quicker usage later on in the coding process.

4. Use Pandas to read the csv file from your Google Drive.
>pandas needs the path to the file: first through your Google Drive, then to the specific file. This can be done by running the code below.

In [None]:
# Read the CSV from the mounted Drive into a DataFrame named MSDF
MSDF = pd.read_csv('/content/gdrive/My Drive/Colab Notebooks/model_state.csv')

> When you read the file, give the dataset a name so you can refer to it later. I named the set above *MSDF* (MS standing for Model State, DF for dataframe). To name a set, write the chosen name, an equal sign (=), and then the code.

What you type may differ slightly from what I have, but only if you renamed the file after downloading it. If so, be sure that after the last slash in the path you type **the exact name of the file, including the ".csv" at the end.**
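If the path or file name is mistyped, `pd.read_csv` raises a `FileNotFoundError`. The sketch below shows one way to check the path first and list the CSV files that are actually in the folder. The helper name `read_csv_checked` is hypothetical (it is not part of pandas), and the commented-out path assumes the same mount point used above.

```python
from pathlib import Path

import pandas as pd


def read_csv_checked(path):
    """Read a CSV with pandas after confirming the file exists.

    Hypothetical helper for illustration; not part of pandas.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        # List the CSV files that are in the folder to help spot a renamed file.
        parent = csv_path.parent
        nearby = sorted(p.name for p in parent.glob("*.csv")) if parent.is_dir() else []
        raise FileNotFoundError(
            f"{csv_path} not found. CSV files in that folder: {nearby}"
        )
    return pd.read_csv(csv_path)


# Assumed location, matching the mount point used in this notebook:
# MSDF = read_csv_checked("/content/gdrive/My Drive/Colab Notebooks/model_state.csv")
```

If the file was renamed, the error message shows the names that are present, which makes the mismatch easy to find.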

P.S. you can type in the name you chose to display the rows and columns of the dataset! I did this below!

In [None]:
MSDF

Unnamed: 0,fips,Fall,Spring,Summer,Winter,max_warming_season,Annual,STUSAB,STATE_NAME,STATENS
0,1,-0.195668,-0.105862,-0.325009,0.458526,Winter,-0.035048,AL,Alabama,1779775
1,4,1.203951,1.38448,1.274455,1.388388,Winter,1.31988,AZ,Arizona,1779777
2,5,-0.04254,0.266399,0.058596,0.532247,Winter,0.214074,AR,Arkansas,68085
3,6,1.570921,1.449242,1.478335,1.41243,Fall,1.480561,CA,California,1779778
4,8,1.055309,1.43691,1.367845,1.838758,Winter,1.438589,CO,Colorado,1779779
5,9,1.453093,1.543407,1.580628,2.633975,Winter,1.801492,CT,Connecticut,1779780
6,10,1.378949,1.537848,1.522878,2.201002,Winter,1.661683,DE,Delaware,1779781
7,12,1.076586,0.860797,0.914455,1.300233,Winter,1.034878,FL,Florida,294478
8,13,0.251217,0.174462,-0.016056,1.109362,Winter,0.384049,GA,Georgia,1705317
9,16,0.686631,1.004868,1.07272,1.409905,Winter,1.046917,ID,Idaho,1779783
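Beyond displaying the whole table, a few quick checks can confirm that the file loaded as expected. The sketch below rebuilds the first three rows shown above as a small stand-in. The names `csv_text` and `sample` are for illustration only, and the leading "Unnamed: 0" column is left out on the assumption that it is just the row index. With the real data you would run the same calls on `MSDF`.

```python
import io

import pandas as pd

# Stand-in for the real file: the first three rows from the output above.
# Assumption: the leading "Unnamed: 0" column is the row index, so it is omitted.
csv_text = """fips,Fall,Spring,Summer,Winter,max_warming_season,Annual,STUSAB,STATE_NAME,STATENS
1,-0.195668,-0.105862,-0.325009,0.458526,Winter,-0.035048,AL,Alabama,1779775
4,1.203951,1.38448,1.274455,1.388388,Winter,1.31988,AZ,Arizona,1779777
5,-0.04254,0.266399,0.058596,0.532247,Winter,0.214074,AR,Arkansas,68085
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # exact column names, useful when filtering later
print(sample.head(2))        # first two rows only
```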


####**You're doing great! Now that your data is in the program, we can start filtering it down!**


---



### **Filtering of Data**

The dataset has several attributes that could be interesting to explore. To replicate this process, though, we only want to compare the winter and annual temperature changes. We can do this by filtering some of the columns out of our set.

1. You can select only specific columns in your code by writing their names in quotation marks inside double square brackets. To select several, add commas between them.
>**Remember! You must type the names of the columns exactly as they appear in the dataset.**
2. Start your code by writing the name you chose for the data (MSDF for me). Then add the names of your columns.
3. Also specify which rows to display.
>We want to show the rows for each of the states in the study. This means you have to add [0:49] to the end of your code. The slice covers rows 0 through 48, because the end value is not included.

It should appear similar to what is written below. The only possible difference would be due to differing names for the datasets.

In [None]:
MSDF[["Winter", "Annual", "STUSAB"]][0:49]

Unnamed: 0,Winter,Annual,STUSAB
0,0.458526,-0.035048,AL
1,1.388388,1.31988,AZ
2,0.532247,0.214074,AR
3,1.41243,1.480561,CA
4,1.838758,1.438589,CO
5,2.633975,1.801492,CT
6,2.201002,1.661683,DE
7,1.300233,1.034878,FL
8,1.109362,0.384049,GA
9,1.409905,1.046917,ID
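To make the bracket and slice behavior concrete, here is a minimal sketch on a four-row stand-in frame built from values shown above. The names `demo`, `cols`, `first_two`, and `all_rows` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Small stand-in frame using values from the output above.
demo = pd.DataFrame(
    {
        "Fall": [-0.195668, 1.203951, -0.04254, 1.570921],
        "Winter": [0.458526, 1.388388, 0.532247, 1.41243],
        "Annual": [-0.035048, 1.31988, 0.214074, 1.480561],
        "STUSAB": ["AL", "AZ", "AR", "CA"],
    }
)

# Double brackets: the inner list names the columns to keep, in that order.
cols = demo[["Winter", "Annual", "STUSAB"]]

# A slice stops before its end value, so [0:2] keeps rows 0 and 1.
first_two = cols[0:2]

# Leaving the slice off keeps every row.
all_rows = cols

print(first_two)
print(len(all_rows))
```

The same rule explains `[0:49]`: it keeps rows 0 through 48.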


After creating your new subset, be sure to give it a name so you can come back to it.
4. To do this, first type the name you chose for the subset, followed by an equal sign.
5. Then copy and paste the entire previous code after the equal sign.
>I named my subset MSDFSub. Yours should appear similar to what is shown below.

In [None]:
MSDFSub = MSDF[["Winter", "Annual", "STUSAB"]][0:49]

It can be beneficial to run this code to make sure the new subset has been successfully named. This will make it easier to export this new dataset in the next section.

After running this code, you will be left with a smaller set. This will allow for easier analysis of the impacts of the winter on the annual temperature changes in the states!

####**Awesome! You've finished the tough part. Now that your data is filtered, all you have to do is download this new set!**

---

### **Exporting New Dataset**
Exporting data works much like the other steps in this coding process. You use the pandas .to_csv method to do it.

1. Type the name of your new subset in first.
>In my case, this is MSDFSub
2. Then add .to_csv
3. At the end, add parentheses containing the file name in quotation marks: the name of your subset followed by .csv.
> Mine looked like what is coded below.

In [None]:
MSDFSub.to_csv("MSDFSub.csv")

4. Ensure your new csv file is somewhere you can find it!
> The new file will appear in the side "Files" menu. I downloaded mine and added it to my Colab Notebooks Google Drive Folder!
5. Celebrate, you did it!
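By default, `to_csv` also writes the row index as an unnamed first column. The sketch below passes `index=False` to leave it out and then reads the file back to confirm what was written. The temporary folder and the two-row stand-in frame named `subset` are assumptions for illustration; with the real data you would call the same method on `MSDFSub`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for MSDFSub, using values shown earlier in this notebook.
subset = pd.DataFrame(
    {
        "Winter": [0.458526, 1.388388],
        "Annual": [-0.035048, 1.31988],
        "STUSAB": ["AL", "AZ"],
    }
)

with tempfile.TemporaryDirectory() as folder:
    out_path = Path(folder) / "MSDFSub.csv"
    # index=False leaves the row numbers out of the file.
    subset.to_csv(out_path, index=False)

    # Read the file back to confirm the columns and row count.
    check = pd.read_csv(out_path)
    print(list(check.columns))
    print(len(check))
```

Whether to keep the index is a matter of preference. Leaving it out avoids an extra "Unnamed: 0" column when the file is read again later.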

---

##***Congratulations! You've successfully filtered your data and created a new subset from it!***