# My Programming for Data Analytics Project

**By Joanne Feeney**
***

For this project, I will:

- Analyse CO2 vs temperature anomaly from 800 kyr ago to the present.
- Examine one other (paleo/modern) feature (e.g. CH4 or polar ice coverage).
- Examine the Irish context, e.g. climate change signals (see the Maynooth study: The emergence of a climate change signal in long-term Irish meteorological observations - ScienceDirect).
- Fuse and analyse data from various data sources, format the fused data set as a pandas dataframe and export it to csv and json formats.
- For all of the above variables, analyse the data, the trends and the relationships between them (temporal leads/lags/frequency analysis).
- Predict the global temperature anomaly over the next few decades (synthesise data) and compare it to published climate models if atmospheric CO2 trends continue.
- Comment on accelerated warming based on the very latest features (e.g. temperature/polar ice coverage).

Importing different packages that I will use in this notebook

In [None]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Reading in data provided by the lecturer & skipping rows that are not required.

Composite (Co2) data:

In [None]:
# Naming as df1 and reading it into python
# (forward slash avoids backslash escape issues and works on every operating system)
df1 = pd.read_csv("data/Temperature_data_from_NOAA_3.csv", skiprows=14)

Jouzel (temperature) data:

In [None]:
# Naming as df2 and reading it into python
# (forward slash avoids backslash escape issues and works on every operating system)
df2 = pd.read_csv("data/Temperature_data_from_Jouzel.csv")

The Luthi et al. data is already included as part of the NOAA csv.

[1]

I begin by comparing columns to see what information I require for this project. From the paper by F. Parrenin et al. [2], I can see that EDC3 / yr BP, and the variations of this column name in the other datasets, mean years before AD1950, which clears things up for me.
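To make the "years before AD1950" convention concrete, here is a minimal sketch that converts ages to calendar years. The helper name `bp_to_calendar_year` and the sample ages are hypothetical and are not taken from the NOAA or Jouzel files.

```python
import pandas as pd


def bp_to_calendar_year(years_bp):
    """Convert ages given as 'years before AD1950' to calendar years."""
    return 1950 - years_bp


# Hypothetical sample ages, chosen only for illustration
ages = pd.Series([-51.0, 0.0, 1000.0, 800000.0])
calendar_years = bp_to_calendar_year(ages)
print(calendar_years.tolist())
```

A negative age means a date after 1950, and a very large age gives a negative calendar year, i.e. a date far in the past.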

In [None]:
# Dropping NaN columns in composite data (Geeksforgeeks)
df1.drop(['Unnamed: 3', 'Unnamed: 4','Unnamed: 5','Unnamed: 6', 'Unnamed: 7'], axis=1, inplace=True)

[3]
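As an alternative to naming each empty column, the sketch below drops every column that contains only NaN values. The `demo` dataframe is a hypothetical stand-in for the csv, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a csv that was read in with empty trailing columns
demo = pd.DataFrame({
    'age': [0.0, 100.0, 200.0],
    'co2': [280.0, 279.5, 281.2],
    'Unnamed: 3': [np.nan, np.nan, np.nan],
    'Unnamed: 4': [np.nan, np.nan, np.nan],
})

# Drop every column whose values are all NaN, without listing the names
cleaned = demo.dropna(axis=1, how='all')
print(cleaned.columns.tolist())
```

This only removes columns that are completely empty, so a column with even one real value would be kept.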

Using df1.info() to find out more information about the dataset. From this we can see there are 1901 entries and there are only float data types.

In [None]:
df1.info()

Using df1.describe() to see the maximum, minimum, mean etc. of each of the variables.

In [None]:
df1.describe()

Using df1.isnull().sum() to check if there are any null/NaN entries in the dataset.

In [None]:
df1.isnull().sum()

Renaming the columns with more descriptive names.

In [None]:
# Renaming columns
df1.columns = ['Years before AD1950', 'Co2 levels', 'Sigma mean of Co2']

By using the quick checks above, we can already make some observations about the composite CO2 dataset. The columns that contained only missing values have been dropped, and each of the three variables we require has a count of 1901.

There is only one data type, float, and there are no null values in the three columns that we will be using for CO2.

Plotting the entire dataset Co2 history to compare levels in the atmosphere from 800,000 years ago to present day

In [None]:
# Plotting all variables and inverting x axis (Stackoverflow.com)
df1.plot(x='Years before AD1950', y='Co2 levels', title='level of Co2 in atmosphere 800Kyr-present').invert_xaxis();

[4]

Taking a closer look at the last 50 years

In [None]:
# Creating a variable for recent years
df1_present = df1[0:62]

In [None]:
# Plotting from 1950 on and inverting x axis (Stackoverflow.com) (Pythonguides.com)
ax = df1_present.plot(x='Years before AD1950', y='Co2 levels', title='level of Co2 in atmosphere 1950AD-present')
# invert_xaxis() returns None, so it is called on its own line to keep ax as the Axes object
ax.invert_xaxis()
plt.xlabel('1950-present', size=20);
plt.tick_params(axis='x', labelbottom=False)

[5]

From the above, we can see that the level of CO2 in the atmosphere has increased significantly compared with any other period in the past 800,000 years. According to this dataset, the highest CO2 level previously seen on this planet was about 300 ppm, roughly 330,000 years ago, whereas between 1950 and the present day we have gone well over this maximum and are now sitting at roughly 375 ppm.
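The comparison above can also be checked numerically rather than read off the plot. The sketch below uses a small `demo` dataframe with made-up values that only mirror the shape of the real data; with the real data, `df1` would take its place.

```python
import pandas as pd

# Made-up values for illustration only, not readings from the NOAA file
demo = pd.DataFrame({
    'Years before AD1950': [-50.0, 0.0, 1000.0, 330000.0, 800000.0],
    'Co2 levels': [370.0, 311.0, 279.0, 299.0, 190.0],
})

# Positive ages are dates before 1950
before_1950 = demo[demo['Years before AD1950'] > 0]

# Row holding the highest CO2 level before 1950
peak_row = before_1950.loc[before_1950['Co2 levels'].idxmax()]

# Most recent row, i.e. the smallest "years before 1950" value
latest_row = demo.loc[demo['Years before AD1950'].idxmin()]

print(peak_row['Years before AD1950'], peak_row['Co2 levels'])
print(latest_row['Years before AD1950'], latest_row['Co2 levels'])
```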

Now let's look at temperature data.

In [None]:
# Dropping all columns except year and temperature (Geeksforgeeks)
df2.drop(['bag', 'ztop', 'AICC2012', 'deutfinal', 'acc-EDC3beta'], axis=1, inplace=True)

In [None]:
df2.head()

In [None]:
# Renaming columns
df2.columns = ['Years before AD1950', 'Temperature (Kelvin)']

In [None]:
# Plotting all variables and inverting x axis (Stackoverflow.com)
df2.plot(x='Years before AD1950', y='Temperature (Kelvin)', title='temperature 800Kyr-present', color='r').invert_xaxis();

Creating a variable which contains both temperature & Co2 data.

In [None]:
# Merging datasets together (Python for MBAs)
# Both frames share the key column name, so a single on= is enough
df_fused = pd.merge(df1,
                    df2,
                    on='Years before AD1950',
                    how='outer')

[1]
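An outer merge keeps every row from both datasets and only lines rows up where the year values are exactly equal. If the two records are sampled at different ages (an assumption here, I have not checked it against the files), most merged rows will have a NaN in one of the value columns. A minimal sketch of one possible alternative, linear interpolation onto a common set of ages with `np.interp`, using made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
co2 = pd.DataFrame({'Years before AD1950': [0.0, 100.0, 200.0],
                    'Co2 levels': [280.0, 270.0, 260.0]})
temp = pd.DataFrame({'Years before AD1950': [50.0, 150.0],
                     'Temperature (Kelvin)': [215.0, 213.0]})

# Outer merge: the ages never coincide, so every row has one NaN
outer = pd.merge(co2, temp, on='Years before AD1950', how='outer')
print(outer.isna().sum())

# Interpolate CO2 onto the temperature ages instead
# (np.interp expects the known x values in ascending order)
aligned = temp.copy()
aligned['Co2 levels'] = np.interp(temp['Years before AD1950'],
                                  co2['Years before AD1950'],
                                  co2['Co2 levels'])
print(aligned)
```

Interpolation invents values between measurements, so it is only reasonable where the gaps between samples are small compared with the changes being studied.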

Below is a plot with both Co2 and temperature data on the one graph.

In [None]:
# Plot containing Co2 & temperature data (Datacamp.com & Stackoverflow)
fig, ax = plt.subplots()
# x must be the year column, not the whole dataframe
ax.plot(df1['Years before AD1950'], df1['Co2 levels'], color='blue')
ax.set_xlabel('Years before AD1950')
ax.set_ylabel('Co2', color='blue')
# Second y axis sharing the same x axis
ax2 = ax.twinx()
ax2.plot(df2['Years before AD1950'], df2['Temperature (Kelvin)'], color='red')
ax2.set_ylabel('Temp', color='red')
plt.gca().invert_xaxis()
plt.show()


[11] & [12]

As we can see above, 

The one other feature that I will be investigating as part of this project will be polar ice coverage.

In [None]:
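# Reading in ice dataset that I found online (European Environment Agency) [7]
# (forward slash avoids backslash escape issues and works on every operating system)
df_ice_mass_loss = pd.read_csv("data/cumulative-ice-mass-loss-and.csv")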

In [None]:
# Dropping columns
df_ice_mass_loss.drop(['Cumulative ice mass loss (Greenland):number', 'Greenland Cumulative ice mass loss:number','Greenland Lower bound:number',
                       'Greenland Upper bound:number', 'Antarctic Cumulative mass loss uncertanty:number', 
                       'Antarctic Lower bound:number', 'Antarctic Upper bound:number'], axis=1, inplace=True)

In [None]:
# Removing numbers after decimal point (Geeksforgeeks.com)
year = df_ice_mass_loss['Year:number']

lst = [] 
for each in year: 
    lst.append(str(each).split('.')[0]) 
  
# Converting to integer data type 
final_list = [int(i) for i in lst]

[9]

In [None]:
df_ice_mass_loss.insert(1, 'Year', final_list)

In [None]:
df_ice_mass_loss.drop('Year:number', axis=1, inplace=True)

In [None]:
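# Assigning Year column to datetime format
# (the result has to be assigned back, otherwise the column stays unchanged)
df_ice_mass_loss['Year'] = pd.to_datetime(df_ice_mass_loss['Year'], format='%Y')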

[10]

In [None]:
# Plotting all variables
df_ice_mass_loss.plot(x='Year', y='Cumulative ice mass loss (Antarctica):number', title='Ice mass loss in Antarctica 1992-2014', color='g');

In [None]:
# Reading in another ice dataset that I found online (NOAA) [8]
# (forward slash avoids backslash escape issues and works on every operating system)
df_sea_ice = pd.read_csv("data/Sea_Ice_Index_Monthly_Data_by_Year.csv")

In [None]:
#Dropping NaN column
df_sea_ice.drop('Unnamed: 13', axis=1, inplace=True)

In [None]:
# Assigning Year column to datetime format
# (the result has to be assigned back, otherwise the column stays unchanged)
df_sea_ice['Year'] = pd.to_datetime(df_sea_ice['Year'], format='%Y')

In [None]:
df_sea_ice.head()

In [None]:
# Scatterplot of data (dataframe passed by keyword so x and y are read as column names)
ax = sns.scatterplot(data=df_sea_ice, x='Year', y='Annual')
ax.set(xlabel='Years', ylabel='Sea Ice Amount', title='Sea ice 1978-present');

For the Irish context, I read: 

"The emergence of a climate change signal in long-term Irish meteorological observations" (https://www.sciencedirect.com/science/article/pii/S2212094723000610#bib13) 

I will speak briefly about what I think the data that I am investigating means for our country.

I have fused the Jouzel and Luthi et al. (temperature & CO2) data into a variable called df_fused and will export that to csv and JSON formats.
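A minimal sketch of the export step, assuming pandas' `to_csv` and `to_json` are used. The `demo_fused` dataframe, its values and the output file names are hypothetical; in the notebook, `df_fused` and a path inside the data folder would take their place.

```python
import json
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical stand-in for df_fused, with made-up values
demo_fused = pd.DataFrame({
    'Years before AD1950': [0.0, 50.0, 100.0],
    'Co2 levels': [280.0, np.nan, 270.0],
    'Temperature (Kelvin)': [np.nan, 215.0, np.nan],
})

# Writing to a temporary folder so the sketch does not touch the project files
out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / 'fused_data.csv'
json_path = out_dir / 'fused_data.json'

# index=False leaves the row numbers out of the csv
demo_fused.to_csv(csv_path, index=False)
# orient='records' writes one JSON object per row; NaN becomes null
demo_fused.to_json(json_path, orient='records')

# Reading both files back to confirm the export worked
reloaded = pd.read_csv(csv_path)
records = json.loads(json_path.read_text())
print(reloaded.shape, len(records))
```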

Below, I will predict global temperature anomaly for the next few decades as best I can and will compare my findings to current published data.
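One simple way to approach the prediction is to fit a straight line to recent values and extend it forward. The sketch below does this with `np.polyfit` on a synthetic, noise-free series. The numbers are invented purely to show the mechanics and are not real temperature anomalies, and assuming a linear trend is a strong simplification compared with published climate models.

```python
import numpy as np

# Synthetic series for illustration only: 0.1 in 1980, rising by 0.02 per year
years = np.arange(1980, 2021)
anomaly = 0.1 + 0.02 * (years - 1980)

# Fitting against years since the start keeps the numbers small and stable
x = years - years[0]
slope, intercept = np.polyfit(x, anomaly, 1)

# Extending the fitted line to a few future years
future_years = np.array([2030, 2040, 2050])
predicted = intercept + slope * (future_years - years[0])

for year, value in zip(future_years, predicted):
    print(year, round(float(value), 2))
```

With real data the fit would not be exact, so it would be worth plotting the line against the observations before trusting any extrapolation.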

### Conclusion
***

### References

[1] Python for MBAs, Griffel & Guetta, Columbia Business School Publishing, 2021, eBook Academic Collection (EBSCOhost), (https://web.s.ebscohost.com/ehost/ebookviewer/ebook/ZTAwMHh3d19fMjQ1ODcyM19fQU41?sid=9d53254f-59d9-4f57-baa5-1b1ed8837cce@redis&vid=3&format=EB), chapter 7.6 JOINS IN PANDAS, last accessed 20/12/23

[2] The EDC3 chronology for the EPICA Dome C icecore, F.Parrenin et. al., 2007 (https://cp.copernicus.org/articles/3/485/2007/cp-3-485-2007.pdf), last accessed 20/12/23

[3] Geeksforgeeks.com, (https://www.geeksforgeeks.org/how-to-drop-one-or-multiple-columns-in-pandas-dataframe/), last accessed 20/12/23

[4] Stackoverflow.com, (https://stackoverflow.com/questions/28837123/pyplot-reverse-x-axis-and-reverse-table-subplot), last accessed 22/12/23

[5] Pythonguides.com, (https://pythonguides.com/matplotlib-tick-params/), last accessed 22/12/23

[6] Stackoverflow.com, (https://stackoverflow.com/questions/26045779/how-to-turn-all-numbers-in-a-list-into-their-negative-counterparts), last accessed 22/12/23

[9] Geeksforgeeks.com, (https://www.geeksforgeeks.org/how-to-remove-all-decimals-from-a-number-using-python/), last accessed 29/12/23

[10] Saturncloud.io, (https://saturncloud.io/blog/converting-object-column-in-pandas-dataframe-to-datetime-a-data-scientists-guide/#:~:text=To%20convert%20this%20column%20to,to_datetime()%20method.&text=In%20this%20example%2C%20we%20used,'%20)%2C%20and%20the%20pd.), last accessed 29/12/23

[11] Datacamp.com, (https://campus.datacamp.com/courses/introduction-to-data-visualization-with-matplotlib/plotting-time-series?ex=5#:~:text=Using%20twin%20axes,-00%3A00%20%2D%2000&text=Again%2C%20we%20start%20by%20adding,object%20and%20show%20the%20figure.), last accessed 04/01/23

[12] Stackoverflow.com, (https://stackoverflow.com/questions/2051744/how-to-invert-the-x-or-y-axis), last accessed 04/01/23

### Datasets

[7] European Environment Agency, [cumulative-ice-mass-loss-and.csv] (https://www.eea.europa.eu/data-and-maps/daviz/cumulative-ice-mass-loss-and#tab-dashboard-01), last accessed 27/12/23

[8] NOAA, [Sea_Ice_Index_Monthly_Data_by_Year.csv] (https://noaadata.apps.nsidc.org/NOAA/G02135/seaice_analysis/), last accessed 27/12/23

***
## The End