# PfDA Assignment 2 2023

## An analysis of paleo-present climate data 

# Table of Contents
1. [Introduction](#overview)  
    - [Problem Statement](#problem-statement)

<img src="images/climate_montage.png" alt="Climate change images">

<a id="overview"></a>

# 1. Introduction and Project Overview: 

This notebook contains my submission for the 2023 Programming for Data Analysis module at ATU, taken as part of the Higher Diploma in Computing and Data Analytics.

<a id="problem-statement"></a>
## Problem statement:

- Analyse CO2 vs temperature anomaly from 800 kyr ago to the present.
- Examine one other (paleo/modern) feature (e.g. CH4 or polar ice coverage).
- Examine the Irish context: climate change signals (see the Maynooth study, "The emergence of a climate change signal in long-term Irish meteorological observations", ScienceDirect: https://www.sciencedirect.com/science/article/pii/S2212094723000610#bib13).
- Fuse and analyse data from various data sources, format the fused data set as a pandas DataFrame, and export it to CSV and JSON formats.
- For all of the above variables, analyse the data, the trends and the relationships between them (temporal leads/lags/frequency analysis).
- Predict the global temperature anomaly over the next few decades (synthesise data) and compare it to published climate models, assuming atmospheric CO2 trends continue.
- Comment on accelerated warming based on the very latest features (e.g. temperature/polar ice coverage).

Use a Jupyter notebook for your analysis and track your progress using GitHub. 

Use an academic referencing style 

## Background

Paleoclimatology is the study of the climates that existed during Earth's different geologic ages. The data gathered can be used to identify the causes of past climate changes, in order to better understand our present and future climate.

Paleoclimatology has also helped scientists study and understand how other environmental factors, such as continental drift, solar energy, greenhouse gases in the atmosphere, and variations in Earth’s orbit, have affected Earth's climate over time.

The science of paleoclimatology is vital to our understanding of climate on Earth. As scientists become increasingly aware of how climates have been influenced in the past, they can develop models that help predict how increased carbon dioxide levels and other changes might impact the climate of Earth in the future.

https://education.nationalgeographic.org/resource/paleoclimatology-RL/

## Importing Python libraries and modules

In [1]:
# Importing libraries and modules necessary for this task
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd
import datetime as dt
import math



# CO2 Data

The CO2 data that we're looking at is a composite of atmospheric CO2 records from Antarctic ice cores and can be found in this repository as a .xls file [CO2_IPCC](/Data/CO2_IPCC.xls). The data in this file spans roughly 800,000 years before present (BP), where "present" is defined as 1950. This version, compiled by Bereiter et al. in 2014, replaces the older version of Lüthi et al. (2008), which contained an analytical bias described in the Bereiter et al. article, as well as lower-quality data in many other sections.

The ice core data is gathered by drilling into ice sheets and extracting ice core samples, which are then analysed to identify deposits trapped within the ice, such as pollen and gas. The information gathered from these ice cores allows paleoclimatologists to better understand the atmospheric and climatic conditions that existed when particular layers of the sheet formed.


In [2]:
# Read in the CO2 data from IPCC xls file, skipping the first 14 rows. 
# https://agupubs.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2F2014GL061957&file=grl52461-sup-0003-supplementary.xls

co2_IPCC = pd.read_excel('data/CO2_IPCC.xls', sheet_name='CO2 Composite', skiprows=range(14))

In [3]:
co2_IPCC.head()

| | Gasage (yr BP) | CO2 (ppmv) | sigma mean CO2 (ppmv) |
|---|---|---|---|
| 0 | -51.03 | 368.022488 | 0.060442 |
| 1 | -48.0 | 361.780737 | 0.37 |
| 2 | -46.279272 | 359.647793 | 0.098 |
| 3 | -44.405642 | 357.10674 | 0.159923 |
| 4 | -43.08 | 353.946685 | 0.043007 |


In [4]:
co2_IPCC.describe()

| | Gasage (yr BP) | CO2 (ppmv) | sigma mean CO2 (ppmv) |
|---|---|---|---|
| count | 1901.0 | 1901.0 | 1901.0 |
| mean | 242810.270113 | 235.566624 | 1.340519 |
| std | 274261.195468 | 35.902698 | 0.924188 |
| min | -51.03 | 173.71362 | 0.01 |
| 25% | 14606.209 | 204.826743 | 0.639335 |
| 50% | 74525.645 | 232.456008 | 1.073871 |
| 75% | 504177.187879 | 257.93 | 1.8 |
| max | 805668.868405 | 368.022488 | 9.96 |



https://en.wikipedia.org/wiki/Before_Present

To make the data more relatable and easier to compare with the other datasets, I will convert the 'Gasage (yr BP)' column, which counts years before 1950, to the calendar-year format used in the other datasets.


In [5]:
years = 1950 - co2_IPCC['Gasage (yr BP)']

In [6]:
# Add the calendar year as a new integer column.
# astype(int) truncates the fractional part of each year toward zero.
co2_IPCC.loc[:, "year"] = years.astype(int)


In [7]:
print(co2_IPCC.dtypes) 


Gasage (yr BP)           float64
CO2 (ppmv)               float64
sigma mean CO2 (ppmv)    float64
year                       int32
dtype: object
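The conversion from years BP to calendar years can be checked on a few values. The sketch below is only an illustration: the series `gas_age_bp` and its contents are hypothetical (two of the ages are taken from the ice core file, the others are made up), and rounding is shown as one possible alternative to the truncation used above, not as a correction.

```python
import pandas as pd

# Hypothetical ages in years before present (BP), where "present" is 1950.
# Negative ages are dates after 1950.
gas_age_bp = pd.Series([-51.03, 0.0, 1949.2, 805668.868405])

# Calendar year = 1950 - age BP
calendar_year = 1950 - gas_age_bp

# astype(int) drops the fractional part (truncation toward zero)
year_truncated = calendar_year.astype(int)

# Rounding to the nearest whole year is an alternative
year_rounded = calendar_year.round().astype(int)

print(pd.DataFrame({
    "age_bp": gas_age_bp,
    "truncated": year_truncated,
    "rounded": year_rounded,
}))
```

Truncation and rounding can differ by one year for individual samples, which is negligible on an 800,000-year scale but worth keeping in mind when matching years against modern annual records.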


In [8]:
# Read in the CO2 data from the Mauna Loa .csv file, skipping the first 43 rows.
# https://gml.noaa.gov/ccgg/trends/data.html

co2_maunaloa = pd.read_csv('data/co2_mauna_loa.csv', skiprows=range(43))

In [9]:
co2_maunaloa.head()

#co2_maunaloa.tail()

| | year | mean | unc |
|---|---|---|---|
| 0 | 1959 | 315.98 | 0.12 |
| 1 | 1960 | 316.91 | 0.12 |
| 2 | 1961 | 317.64 | 0.12 |
| 3 | 1962 | 318.45 | 0.12 |
| 4 | 1963 | 318.99 | 0.12 |
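Both data files start with descriptive text before the actual table, which is why `skiprows` is used when reading them. The sketch below shows the idea on a small in-memory file. The preamble text and the variable names (`raw`, `by_count`, `by_range`) are made up for illustration, and the three-line preamble is an assumption, not the layout of the real Mauna Loa file.

```python
import io
import pandas as pd

# Hypothetical file contents: three preamble lines, then a normal CSV table
raw = (
    "Example preamble line 1\n"
    "Example preamble line 2\n"
    "Example preamble line 3\n"
    "year,mean,unc\n"
    "1959,315.98,0.12\n"
    "1960,316.91,0.12\n"
)

# skiprows accepts either a count of leading lines to skip...
by_count = pd.read_csv(io.StringIO(raw), skiprows=3)

# ...or a list-like of line numbers, which is what range(3) provides
by_range = pd.read_csv(io.StringIO(raw), skiprows=range(3))

print(by_count)
print(by_count.equals(by_range))
```

For a contiguous block of leading lines the two forms give the same result, so the choice between them is mostly a matter of style.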


To analyse the two datasets together, they need to be combined into a single DataFrame with matching column names.

In [10]:
# https://stackoverflow.com/questions/46066685/rename-the-column-inside-csv-file
co2_maunaloa = co2_maunaloa.rename(columns={'mean': 'CO2 (ppmv)'})


In [14]:
# https://pandas.pydata.org/docs/user_guide/merging.html
co2_relevant_columns = co2_maunaloa[['year', 'CO2 (ppmv)']]

co2_merged = pd.concat([co2_relevant_columns, co2_IPCC])

In [15]:
print(co2_merged)

        year  CO2 (ppmv)  Gasage (yr BP)  sigma mean CO2 (ppmv)
0       1959  315.980000             NaN                    NaN
1       1960  316.910000             NaN                    NaN
2       1961  317.640000             NaN                    NaN
3       1962  318.450000             NaN                    NaN
4       1963  318.990000             NaN                    NaN
...      ...         ...             ...                    ...
1896 -801975  202.921723   803925.284376               2.064488
1897 -802059  207.498645   804009.870607               0.915083
1898 -802572  204.861938   804522.674630               1.642851
1899 -803182  202.226839   805132.442334               0.689587
1900 -803718  207.285440   805668.868405               2.202808

[1965 rows x 4 columns]
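The combined frame above keeps the row order and index labels of its two inputs, so the years are not in chronological order and index labels repeat. A minimal sketch of one way to tidy this is shown below, using small stand-in frames with a few values copied from the outputs above. The `source` column and the names `modern`, `ice_core` and `combined` are hypothetical additions for illustration and are not part of the notebook's own code.

```python
import pandas as pd

# Small stand-ins for the two sources (values copied from the outputs above)
modern = pd.DataFrame({"year": [1959, 1960], "CO2 (ppmv)": [315.98, 316.91]})
ice_core = pd.DataFrame({
    "year": [2001, 1998, -803718],
    "CO2 (ppmv)": [368.022488, 361.780737, 207.285440],
    "sigma mean CO2 (ppmv)": [0.060442, 0.37, 2.202808],
})

# Hypothetical "source" column so each row's origin survives the merge
modern = modern.assign(source="mauna_loa")
ice_core = ice_core.assign(source="ice_core")

# Stack the frames, order chronologically, and rebuild a unique 0..n-1 index
combined = (
    pd.concat([modern, ice_core], ignore_index=True)
    .sort_values("year")
    .reset_index(drop=True)
)

print(combined)
```

Columns that exist in only one source, such as the ice core uncertainty, are filled with NaN for the other source's rows, which matches the behaviour seen in the merged output above.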


## Resources

https://www.sciencedirect.com/science/article/pii/S2212094723000610#bib13

https://xlrd.readthedocs.io/en/latest/

https://gml.noaa.gov/ccgg/trends/data.html

https://www.met.ie/climate/available-data/long-term-data-sets/


## Background Reading

https://www.ipcc.ch/site/assets/uploads/2018/03/srccs_chapter2-1.pdf

https://education.nationalgeographic.org/resource/paleoclimatology-RL/