# Pandas and NumPy - pair-up

With a partner of your choosing, complete the following in a Jupyter notebook.

1. Read the following data into a pandas DataFrame:
 `ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt`
 
2. Select columns 0, 1, and 3
`[[0, 1, 3]]`

3. Use a for loop to find all rows where the CO2 entry (column 3) has the value -99.99 (these are missing values) and replace those entries with NaN (try using `np.nan`. Do you know what it is?)

4. Change the names of the columns to year, month, and CO2 (use the `columns` attribute or `rename`)

5. Add a column 'Day' and specify day 15 for all entries

6. Add a date column built from the 'year', 'month' and 'Day' columns (options: use `apply` with a lambda, or a for loop, together with `datetime.date`; make sure to import it)

7. Drop the 'Day' column


### Solution

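First, we read the linked text file into a pandas DataFrame. The file is whitespace-delimited rather than comma-separated, so we pass `sep=r'\s+'`. Its header lines start with `#`, so `comment='#'` skips them, and we supply the column names ourselves through `names`.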

In [2]:
import pandas as pd
import numpy as np
from datetime import date

In [3]:
co2_df = pd.read_csv('ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt', 
                     sep=r'\s+', comment='#', header=None, index_col=False,
                     names=['year', 'month', 'dec_date', 'average', 'interpolated', 'trend', 'days'])
co2_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 737 entries, 0 to 736
Data columns (total 7 columns):
year            737 non-null int64
month           737 non-null int64
dec_date        737 non-null float64
average         737 non-null float64
interpolated    737 non-null float64
trend           737 non-null float64
days            737 non-null int64
dtypes: float64(4), int64(3)
memory usage: 40.4 KB


In [4]:
co2_df.head(10)

| | year | month | dec_date | average | interpolated | trend | days |
|---|---|---|---|---|---|---|---|
| 0 | 1958 | 3 | 1958.208 | 315.71 | 315.71 | 314.62 | -1 |
| 1 | 1958 | 4 | 1958.292 | 317.45 | 317.45 | 315.29 | -1 |
| 2 | 1958 | 5 | 1958.375 | 317.5 | 317.5 | 314.71 | -1 |
| 3 | 1958 | 6 | 1958.458 | -99.99 | 317.1 | 314.85 | -1 |
| 4 | 1958 | 7 | 1958.542 | 315.86 | 315.86 | 314.98 | -1 |
| 5 | 1958 | 8 | 1958.625 | 314.93 | 314.93 | 315.94 | -1 |
| 6 | 1958 | 9 | 1958.708 | 313.2 | 313.2 | 315.91 | -1 |
| 7 | 1958 | 10 | 1958.792 | -99.99 | 312.66 | 315.61 | -1 |
| 8 | 1958 | 11 | 1958.875 | 313.33 | 313.33 | 315.31 | -1 |
| 9 | 1958 | 12 | 1958.958 | 314.67 | 314.67 | 315.61 | -1 |
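
To see what the `sep`, `comment`, `header` and `names` arguments do without depending on the FTP server, the same call can be tried on a small in-memory sample. This is a minimal sketch: `sample_text` is a hypothetical stand-in that only imitates the file layout (a `#` comment line followed by whitespace-separated values), using three of the rows shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for co2_mm_mlo.txt: one '#' comment line,
# then whitespace-separated values (rows copied from the output above).
sample_text = """\
# year month decimal_date average interpolated trend days
1958   3    1958.208      315.71      315.71      314.62     -1
1958   4    1958.292      317.45      317.45      315.29     -1
1958   6    1958.458      -99.99      317.10      314.85     -1
"""

names = ['year', 'month', 'dec_date', 'average', 'interpolated', 'trend', 'days']

sample_df = pd.read_csv(
    io.StringIO(sample_text),
    sep=r'\s+',    # fields are separated by one or more whitespace characters
    comment='#',   # ignore everything after '#', which drops the comment line
    header=None,   # the data rows carry no header row of their own
    names=names,   # so the column names are supplied explicitly
)

print(sample_df.shape)
print(sample_df.dtypes)
```

Without `comment='#'` the first line would be parsed as data, and without `sep=r'\s+'` each row would end up in a single column.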


In [5]:
co2_df_subset = co2_df.iloc[:,[0, 1, 3]]

In [6]:
co2_df_subset.head()

| | year | month | average |
|---|---|---|---|
| 0 | 1958 | 3 | 315.71 |
| 1 | 1958 | 4 | 317.45 |
| 2 | 1958 | 5 | 317.5 |
| 3 | 1958 | 6 | -99.99 |
| 4 | 1958 | 7 | 315.86 |
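
Note that the following cells keep working on `co2_df`, not on `co2_df_subset`. To continue with only the three selected columns, one option is to take an explicit copy of the selection, since assigning into a slice of another frame can trigger a `SettingWithCopyWarning` in some pandas versions. The sketch below uses a small hand-built frame as a stand-in for `co2_df`.

```python
import pandas as pd

# Small stand-in for co2_df, built from rows shown above.
co2_df = pd.DataFrame({
    'year': [1958, 1958, 1958],
    'month': [3, 4, 6],
    'dec_date': [1958.208, 1958.292, 1958.458],
    'average': [315.71, 317.45, -99.99],
})

# .copy() makes the subset an independent frame, so later assignments
# neither modify co2_df nor warn about writing to a slice of it.
co2_df_subset = co2_df.iloc[:, [0, 1, 3]].copy()
co2_df_subset.loc[co2_df_subset['average'] == -99.99, 'average'] = float('nan')

print(co2_df_subset)
print(co2_df['average'].tolist())  # the original frame is unchanged
```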


In [7]:
# Boolean mask selects the rows holding the -99.99 sentinel; .loc assigns NaN in place
co2_df.loc[co2_df['average'] == -99.99, 'average'] = np.nan

In [8]:
co2_df.head()

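| | year | month | dec_date | average | interpolated | trend | days |
|---|---|---|---|---|---|---|---|
| 0 | 1958 | 3 | 1958.208 | 315.71 | 315.71 | 314.62 | -1 |
| 1 | 1958 | 4 | 1958.292 | 317.45 | 317.45 | 315.29 | -1 |
| 2 | 1958 | 5 | 1958.375 | 317.5 | 317.5 | 314.71 | -1 |
| 3 | 1958 | 6 | 1958.458 | NaN | 317.1 | 314.85 | -1 |
| 4 | 1958 | 7 | 1958.542 | 315.86 | 315.86 | 314.98 | -1 |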
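
The exercise asks for a for loop, while cell 7 uses a vectorized boolean mask. A loop-based version could look like the sketch below, shown on a small stand-in frame rather than the full data set. It also prints what `np.nan` is: an ordinary Python float that compares unequal to everything, including itself, which is why `isna()` is used to detect it.

```python
import numpy as np
import pandas as pd

# Stand-in data containing two -99.99 sentinel values.
df = pd.DataFrame({'average': [315.71, -99.99, 315.86, -99.99]})

# Visit every index label and overwrite the sentinel one cell at a time.
for idx in df.index:
    if df.at[idx, 'average'] == -99.99:
        df.at[idx, 'average'] = np.nan

print(df)
print(df['average'].isna().sum())      # number of missing values
print(type(np.nan), np.nan == np.nan)  # a float that is not equal to itself
```

On a frame of this size either approach is fine. The mask version is usually preferred because it is shorter and avoids Python-level iteration.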


In [9]:
co2_df.rename(columns={'average': 'CO2'}, inplace=True)

In [10]:
co2_df.head()

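| | year | month | dec_date | CO2 | interpolated | trend | days |
|---|---|---|---|---|---|---|---|
| 0 | 1958 | 3 | 1958.208 | 315.71 | 315.71 | 314.62 | -1 |
| 1 | 1958 | 4 | 1958.292 | 317.45 | 317.45 | 315.29 | -1 |
| 2 | 1958 | 5 | 1958.375 | 317.5 | 317.5 | 314.71 | -1 |
| 3 | 1958 | 6 | 1958.458 | NaN | 317.1 | 314.85 | -1 |
| 4 | 1958 | 7 | 1958.542 | 315.86 | 315.86 | 314.98 | -1 |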
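
Only `average` needs renaming here because the other names were already supplied to `read_csv`. If the file had instead been read with `header=None` and no `names`, the selected columns would carry the integer labels 0, 1 and 3. The sketch below assumes that situation (the frame `subset` is a hypothetical stand-in) and shows two ways to set all three names.

```python
import pandas as pd

# Hypothetical subset whose columns still carry integer labels 0, 1, 3.
subset = pd.DataFrame({0: [1958, 1958], 1: [3, 4], 3: [315.71, 317.45]})

# Option 1: rename through a mapping; returns a new frame, subset is untouched.
renamed = subset.rename(columns={0: 'year', 1: 'month', 3: 'CO2'})

# Option 2: overwrite all labels at once; the list length must match
# the number of columns.
subset.columns = ['year', 'month', 'CO2']

print(renamed.columns.tolist())
print(subset.columns.tolist())
```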


In [11]:
co2_df['Day'] = 15
co2_df.head()

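| | year | month | dec_date | CO2 | interpolated | trend | days | Day |
|---|---|---|---|---|---|---|---|---|
| 0 | 1958 | 3 | 1958.208 | 315.71 | 315.71 | 314.62 | -1 | 15 |
| 1 | 1958 | 4 | 1958.292 | 317.45 | 317.45 | 315.29 | -1 | 15 |
| 2 | 1958 | 5 | 1958.375 | 317.5 | 317.5 | 314.71 | -1 | 15 |
| 3 | 1958 | 6 | 1958.458 | NaN | 317.1 | 314.85 | -1 | 15 |
| 4 | 1958 | 7 | 1958.542 | 315.86 | 315.86 | 314.98 | -1 | 15 |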


In [12]:
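# Build a datetime.date per row; int() is needed because apply(axis=1)
# hands each row over as floats when the frame mixes ints and floats
change_date = lambda row: date(year=int(row.year), month=int(row.month), day=int(row.Day))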

co2_df['date'] = co2_df.apply(change_date, axis=1)
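
As an alternative to `apply` with `datetime.date`, `pd.to_datetime` can assemble dates directly from a frame whose columns are named `year`, `month` and `day`. The sketch below is not part of the original solution and uses a small stand-in frame. The result is a `datetime64` column rather than a column of Python `date` objects, and no temporary 'Day' column needs to be added and dropped.

```python
import pandas as pd

# Stand-in frame with the year, month and CO2 columns from the exercise.
df = pd.DataFrame({
    'year': [1958, 1958],
    'month': [3, 4],
    'CO2': [315.71, 317.45],
})

# Build a temporary frame with year, month and a constant day of 15,
# then let to_datetime combine the three columns into timestamps.
parts = df[['year', 'month']].assign(day=15)
df['date'] = pd.to_datetime(parts)

print(df)
print(df['date'].dtype)
```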

In [13]:
co2_df.drop(columns = 'Day', inplace=True)

In [14]:
co2_df.head(10)

| | year | month | dec_date | CO2 | interpolated | trend | days | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1958 | 3 | 1958.208 | 315.71 | 315.71 | 314.62 | -1 | 1958-03-15 |
| 1 | 1958 | 4 | 1958.292 | 317.45 | 317.45 | 315.29 | -1 | 1958-04-15 |
| 2 | 1958 | 5 | 1958.375 | 317.5 | 317.5 | 314.71 | -1 | 1958-05-15 |
| 3 | 1958 | 6 | 1958.458 | NaN | 317.1 | 314.85 | -1 | 1958-06-15 |
| 4 | 1958 | 7 | 1958.542 | 315.86 | 315.86 | 314.98 | -1 | 1958-07-15 |
| 5 | 1958 | 8 | 1958.625 | 314.93 | 314.93 | 315.94 | -1 | 1958-08-15 |
| 6 | 1958 | 9 | 1958.708 | 313.2 | 313.2 | 315.91 | -1 | 1958-09-15 |
| 7 | 1958 | 10 | 1958.792 | NaN | 312.66 | 315.61 | -1 | 1958-10-15 |
| 8 | 1958 | 11 | 1958.875 | 313.33 | 313.33 | 315.31 | -1 | 1958-11-15 |
| 9 | 1958 | 12 | 1958.958 | 314.67 | 314.67 | 315.61 | -1 | 1958-12-15 |
