# Project 1 - Planet Earth's Global Temperature


In this project we will download temperature data from NASA’s Goddard Institute for Space Studies (**GISS**) and analyze it using **pandas** and **seaborn**.

Include your name below:


### Introduction

First, we must understand the data collection method. This method was originally documented by Hansen and Lebedeff in [1987](https://pubs.giss.nasa.gov/docs/1987/1987_Hansen_ha00700d.pdf), and the study has been repeated many times, including in [2010](https://pubs.giss.nasa.gov/docs/2010/2010_Hansen_ha00510u.pdf). A summary of the data collection method is as follows:
1. Thousands of meteorological stations around the world (land and sea) measure the surface temperature. All measurements discussed here will be in Celsius.
1. The world average temperature for each month is calculated from all stations. Systematic effects are corrected and the uncertainty is estimated.
1. The world average temperature for each month, over all meteorological stations during the three-decade period 1951-1980, is defined to be the "reference monthly average temperature".
1. The **temperature anomaly** is defined as:
   * "temperature anomaly" = "world monthly average temperature" - "reference monthly average temperature"
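
As a worked illustration of the definition above, the sketch below computes anomalies for three months. The temperatures and the names `reference` and `observed` are made up for illustration; they are not GISS values.

```python
import pandas as pd

# Hypothetical reference monthly averages (1951-1980 baseline), in Celsius.
reference = pd.Series({"Jan": 12.0, "Feb": 12.1, "Mar": 12.7})

# Hypothetical world monthly averages for a single year, in Celsius.
observed = pd.Series({"Jan": 12.4, "Feb": 12.0, "Mar": 13.2})

# Temperature anomaly = world monthly average - reference monthly average.
anomaly = observed - reference
print(anomaly.round(2))
```

A positive anomaly means the month was warmer than the 1951-1980 reference for that month; a negative one means it was cooler.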

### Accessing the data

The GISS data can be found here: https://data.giss.nasa.gov/gistemp/

From this link please find and download the CSV file
"Global-mean monthly, seasonal, and annual means, 1880-present, updated through most recent month". The file name is GLB.Ts+dSST.csv.

Use pandas' `read_csv()` with the option `skiprows=1` to read in the data and set up a dataframe. The first line of the file is a title rather than the column header, which is why it is skipped.
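
The effect of `skiprows=1` can be seen in the minimal sketch below. So that it runs without the download, it reads a small in-memory sample instead of the real file. The sample values and the name `df_sample` are made up, and the sample only imitates the layout of a title line followed by the header.

```python
import io
import pandas as pd

# Made-up sample imitating the file layout: a title line,
# then the column header, then one row per year.
sample = io.StringIO(
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb,Mar\n"
    "1880,-0.10,-0.20,-0.30\n"
    "1881,0.10,0.20,0.30\n"
)

# skiprows=1 discards the title line, so the next line is used as the header.
df_sample = pd.read_csv(sample, skiprows=1)
print(df_sample.columns.tolist())
```

With the downloaded file, the equivalent call would be `pd.read_csv("GLB.Ts+dSST.csv", skiprows=1)`, assuming the file sits in the same folder as the notebook.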

### Data analysis Instructions

1. Data inspection - Take a look at the dataframe you have created using `df.tail()` where df is the name I have given to my dataframe.
    * You should see a "Year" column and a column for every month ("Jan", "Feb", etc.).
        * There are also columns for averages:
            * "J-D" = 1 year average temperature (January -> December)
            * "D-N" = 1 year average temperature (December -> November)
            * "DJF" = 3 month average temperature (Dec, Jan, Feb)
            * "MAM" = 3 month average temperature (Mar, Apr, May)
            * "JJA" = 3 month average temperature (Jun, Jul, Aug)
            * "SON" = 3 month average temperature (Sep, Oct, Nov)<br><br>
            
2. Setup - Set the "Year" column as the dataframe index<br><br>

3. Data cleaning - Note that the 2021 data is very recent (up to August 2021) and there is data missing for September->December. The missing data appears as three stars ("`***`"). We need to clean this up by replacing the stars with something that pandas understands: "NaN" = Not a Number. Also we should make sure that all datapoints are floats and not strings. We can accomplish this using the following lines (where df is the name I have given to the dataframe):
~~~~
df = df.replace("***", "NaN")
df = df.astype(float)
~~~~
<br><br>
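
Steps 2 and 3 can be tried out on a toy table before applying them to the real data. In the sketch below the values and the name `df_toy` are made up for illustration.

```python
import pandas as pd

# Toy table with made-up values; "***" marks months that are not yet available.
df_toy = pd.DataFrame(
    {"Year": [2020, 2021], "Nov": ["1.10", "***"], "Dec": ["0.81", "***"]}
)

# Step 2: use the year as the index.
df_toy = df_toy.set_index("Year")

# Step 3: turn the placeholder into "NaN", then convert every column to float.
df_toy = df_toy.replace("***", "NaN")
df_toy = df_toy.astype(float)

print(df_toy.dtypes)
print(df_toy.isna().sum())
```

Setting the index first matters here: otherwise `astype(float)` would also convert the "Year" column to floats.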

4. Make a new dataframe "df_drop" containing only the month data (dropping the averages 'J-D','D-N','DJF','MAM','JJA', and 'SON' using the pandas function `drop()`).<br><br>

5. Make plots using your dataframe df_drop<br><br>
    1. Make a plot of the January Temperature Anomaly (column "Jan"). The plot should contain the temperature anomaly in degrees Celsius on the y-axis and the year on the x-axis. In other words, you are plotting the data contained in one column of the dataframe vs. the index of the dataframe (the year). Please label your axes and include temperature units. Note that you can make bigger plots and label the axes using the following syntax (example from a previous notebook):
~~~
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
df_pop3.loc["United Kingdom"].NumericPopulation.plot(ax=ax,kind='bar')
ax.set_ylabel("Population")
~~~
<br><br>
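
A sketch of the same pattern applied to an anomaly table follows: it drops an average column as in step 4, then plots a single month column against the year index. The values and the names `df_toy` and `df_toy_drop` are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy anomaly table with made-up values, indexed by year.
df_toy = pd.DataFrame(
    {"Jan": [-0.2, 0.1, 0.4], "Feb": [-0.1, 0.2, 0.5], "J-D": [-0.15, 0.15, 0.45]},
    index=pd.Index([1900, 1960, 2020], name="Year"),
)

# Step 4: drop the average column, keeping only the month columns.
df_toy_drop = df_toy.drop(columns=["J-D"])

# Step 5: plot one column against the index and label both axes.
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
df_toy_drop["Jan"].plot(ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("January temperature anomaly (°C)")
```

`drop()` returns a new dataframe by default, so the original table keeps its average columns.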
    1. Repeat the same plot but with option `kind='bar'`<br><br>
    1. Make a new column in your new dataframe with column name "Yearly Avg" containing the 12-month average temperature for the year and plot this column ("Yearly Average Temperature Anomaly" vs. "Year"). Calculate this average on your own (instead of relying on the "J-D" column which we just dropped). To do this you can use the pandas [mean()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html) function. My recommendation: read the documentation linked above so that you can use the `axis` option to take the mean across each dataframe row instead of down each column.<br><br>
    1. Again plot "Temperature Anomaly" vs. "Year", this time as a line graph. Plot the temperature anomaly for all 12 months and for the yearly average. You should have a total of 13 lines drawn. Draw the yearly average line with a larger line width and in black to make it stand out. Please label your axes and draw a legend.<br><br>
    1. This time we want to plot "Temperature Anomaly" vs. "Month". We will do this for every year (so well over a hundred lines will be drawn). The easiest way to do this is to use the transpose() function to convert the dataframe indices to columns and the columns to indices. I also recommend setting the color palette using the plot() option `cmap="BuPu"`. This will ensure that one can tell the old years from the more recent years. You may also want to draw the legend off to the side of the plot using the following syntax:
~~~
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
~~~
    <br><br>
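
The row-wise mean and the transpose trick from the steps above can be sketched on a toy table. The values and the names `df_toy` and `yearly_avg` are made up for illustration, and only three months and three years are used to keep the output small.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy month-only table with made-up values, indexed by year.
df_toy = pd.DataFrame(
    {"Jan": [-0.2, 0.1, 0.4], "Feb": [-0.1, 0.2, 0.6], "Mar": [0.0, 0.3, 0.5]},
    index=pd.Index([1900, 1960, 2020], name="Year"),
)

# axis=1 averages across the columns, giving one mean per row (per year).
yearly_avg = df_toy.mean(axis=1)

# Transposing swaps index and columns: months become rows and years become
# columns, so plot() draws one line per year with the month on the x-axis.
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
df_toy.T.plot(ax=ax, cmap="BuPu")
ax.set_xlabel("Month")
ax.set_ylabel("Temperature anomaly (°C)")
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
```

If a "Yearly Avg" column has already been added to the dataframe, it would need to be dropped before transposing; otherwise it shows up on the x-axis as an extra "month".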
    1. Use seaborn to plot the dataframe as a heatmap. The y-axis should be the month and the x-axis should be the year. The z-axis (the color of the heatmap) should represent the temperature anomaly. Again, the easiest way to make the y-axis the month is to use the transpose() function on the dataframe. Choose a different color palette that you think represents the data well: https://matplotlib.org/examples/color/colormaps_reference.html<br><br>
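
A minimal heatmap sketch on a toy table follows. The values and the name `df_toy` are made up. The `"coolwarm"` palette and `center=0` are one possible choice, assumed here because a diverging palette centered on zero separates negative from positive anomalies; other palettes may suit the data equally well.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Toy month-only table with made-up values, indexed by year.
df_toy = pd.DataFrame(
    {"Jan": [-0.2, 0.1, 0.4], "Feb": [-0.1, 0.2, 0.6], "Mar": [0.0, 0.3, 0.5]},
    index=pd.Index([1900, 1960, 2020], name="Year"),
)

# Transpose so that months run along the y-axis and years along the x-axis.
fig, ax = plt.subplots(1, 1, figsize=(12, 4))
sns.heatmap(
    df_toy.T,
    ax=ax,
    cmap="coolwarm",  # assumed palette choice, not prescribed by the project
    center=0,         # anchor the middle of the palette at zero anomaly
    cbar_kws={"label": "Temperature anomaly (°C)"},
)
ax.set_xlabel("Year")
ax.set_ylabel("Month")
```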
    
    
    
    
### Code:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#  sometimes necessary for plotting with Jupyter+pandas+pyplot
%matplotlib inline

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}
