# Programming for Data Analysis Assignment 2

Author - Sean Humphreys

## Contents

1. [Problem Statement](#problem-statement)

2. [Software Libraries](#software-libraries)

3. [CO2 vs Temperature Anomaly 800k Yr - Present](#CO2-vs-Temperature-Anomaly-800k-Yr---Present)

    3.1. [Data Cleansing](#data-cleansing)

4. [References](#references)

5. [Associated Reading](#associated-reading)

---

## Problem Statement <a id="problem-statement"></a>

+ Analyse CO2 vs Temperature Anomaly from 800k yrs to the present.

+ Examine one other (paleo/modern) feature (e.g. CH4 or polar ice coverage).

+ Examine Irish context:
    
    + [Climate change signals](/literature/the_emergence_of_a_climate_change_signal_in_long_term_irish_meteorological_observations.pdf) (see the Maynooth study: *The emergence of a climate change signal in long-term Irish meteorological observations*, ScienceDirect).

+ Fuse and analyse data from the various data sources, format the fused dataset as a pandas DataFrame, and export it to CSV and JSON formats.

+ For all of the above variables, analyse the data, the trends and the relationships between them (temporal leads/lags/frequency analysis).

+ Predict the global temperature anomaly over the next few decades (synthesise data) and compare it to published climate models if atmospheric CO2 trends continue.

+ Comment on accelerated warming based on the very latest features (e.g. temperature/polar ice coverage).

---

## Software Libraries <a id="software-libraries"></a>

- [Matplotlib](https://matplotlib.org/) (https://matplotlib.org/ - last accessed 13 Dec. 2023) is an open-source software library for creating static, animated, and interactive visualisations in Python.

- [Pandas](https://pandas.pydata.org/) (https://pandas.pydata.org/ - last accessed 3 Nov. 2023) is an open-source Python library for data analysis and manipulation, built on top of NumPy. Its primary data structure, the DataFrame, is a dictionary-like container for Series objects.
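As a minimal illustration of the dictionary analogy (using made-up values, not real measurements), a DataFrame can be built from a dict of Series, and selecting a column by its label returns a Series, much like a dictionary lookup:

```python
import pandas as pd

# Two Series sharing the same default index (made-up values for illustration only)
year = pd.Series([2020, 2021, 2022])
co2 = pd.Series([410.0, 412.0, 414.0])

# The dict keys become the DataFrame's column labels
df = pd.DataFrame({'year': year, 'co2_ppmv': co2})

# Looking up a column by label returns a Series
print(isinstance(df['co2_ppmv'], pd.Series))  # True
print(list(df.columns))                       # ['year', 'co2_ppmv']
```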

```python
# import the required software libraries
import pandas as pd
import matplotlib.pyplot as plt
```

---

## CO2 vs Temperature Anomaly 800k Yr - Present <a id="CO2-vs-Temperature-Anomaly-800k-Yr---Present"></a>

Pandas is used to clean and process the datasets: each dataset file is read into a Pandas DataFrame.

### Data Cleansing <a id="data-cleansing"></a>

The most recent annual mean atmospheric CO2 data, measured at Mauna Loa, is available from the NOAA [Global Monitoring Laboratory](https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_annmean_mlo.csv) (https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_annmean_mlo.csv last accessed 13 Dec. 2023).[1]

This dataset covers the period 1959–2022.

Using Pandas, the comma-separated values (CSV) file is read in as a DataFrame.

[1] Dr. Pieter Tans, NOAA/GML (gml.noaa.gov/ccgg/trends/) and Dr. Ralph Keeling, Scripps Institution of Oceanography (scrippsco2.ucsd.edu/)

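```python
# https://gml.noaa.gov/ccgg/trends/data.html
# comment='#' skips the descriptive header lines that begin with '#', so the read
# does not depend on a hard-coded number of lines to skip
mauna_loa = pd.read_csv('https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_annmean_mlo.csv', comment='#')
```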
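The GML file opens with a block of descriptive lines beginning with `#` before its header row. The sketch below uses a hypothetical miniature of that layout (placeholder comment text and values, and an assumed `year,mean,unc` header) to show that `comment='#'` recovers the header and data rows without counting lines:

```python
import io
import pandas as pd

# Hypothetical miniature of the file layout; comment text and values are placeholders
sample = """# Mauna Loa CO2 annual mean data (illustrative header)
# ------------------------------------------------------
# year: calendar year, mean: annual mean, unc: uncertainty
year,mean,unc
1959,315.0,0.1
1960,316.0,0.1
"""

# Lines that begin with '#' are ignored entirely, so the first
# remaining line ('year,mean,unc') is used as the header
df = pd.read_csv(io.StringIO(sample), comment='#')
print(df)
```

A fixed `skiprows` value works only while the number of header lines stays the same; the comment-based approach tolerates the header block growing or shrinking between file revisions.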

The columns in the dataset are renamed to descriptive names.

```python
# code adapted from https://sparkbyexamples.com/pandas/rename-columns-with-list-in-pandas-dataframe/
# assigning a list relabels the columns by position: year, annual mean CO2, uncertainty
cols = ['year', 'co2_ppmv', 'unc']
mauna_loa.columns = cols
```
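Assigning a list to `.columns` relabels purely by position. An alternative, sketched here on a small stand-in frame with placeholder values, is `DataFrame.rename` with an old-to-new mapping, which only touches the named column and does not depend on column order:

```python
import pandas as pd

# Stand-in frame using the raw column names (placeholder values)
raw = pd.DataFrame({'year': [1959, 1960], 'mean': [315.0, 316.0], 'unc': [0.1, 0.1]})

# Map old label -> new label; unlisted columns keep their names.
# rename() returns a new DataFrame and leaves `raw` unchanged.
tidy = raw.rename(columns={'mean': 'co2_ppmv'})
print(tidy.columns.tolist())  # ['year', 'co2_ppmv', 'unc']
```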

The uncertainty column (`unc`) is not needed for this analysis and is removed.

```python
# code adapted from https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html [Accessed 13 Dec. 2023]
mauna_loa = mauna_loa.drop(columns=['unc'])
```

An additional column, `years_before_present`, is added, giving the number of years between each measurement year and 2023.

```python
# number of years between each measurement year and 2023
mauna_loa['years_before_present'] = 2023 - mauna_loa['year']

# sort the data by years before present. Based on code from https://saturncloud.io/blog/how-to-sort-pandas-dataframe-from-one-column/ [Accessed 13 Dec. 2023].
mauna_loa = mauna_loa.sort_values('years_before_present')
```
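In paleoclimate records, "before present" (BP) conventionally means before 1950, whereas the column above counts years before 2023. If the ice-core data to be fused uses the 1950 convention (an assumption to check against that dataset's documentation), both records need the same reference year to line up. A minimal sketch of the conversion, using a hypothetical helper `calendar_year_to_bp`:

```python
import pandas as pd

# Conventional paleoclimate reference year for "before present" (BP)
BP_REFERENCE_YEAR = 1950

def calendar_year_to_bp(year, reference=BP_REFERENCE_YEAR):
    """Hypothetical helper: convert calendar year(s) to years before `reference`.

    Works on scalars or pandas Series; years after the reference come out negative.
    """
    return reference - year

years = pd.Series([1959, 2022])
print(calendar_year_to_bp(years).tolist())        # [-9, -72]
print(calendar_year_to_bp(years, 2023).tolist())  # [64, 1]
```

Negative BP values for post-1950 measurements are expected under this convention.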

The columns in the dataset are reordered.

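```python
# reorder the columns; every label must already exist, because reindex()
# adds an all-NaN column for any label it does not recognise
mauna_loa = mauna_loa.reindex(columns=['years_before_present', 'co2_ppmv', 'year'])
```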
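`reindex` does not complain about a column label that is missing from the DataFrame; it silently creates that column filled with NaN. Selecting with a plain list of labels raises a `KeyError` instead, which surfaces a mistyped name immediately. A small sketch with a placeholder frame:

```python
import pandas as pd

# Placeholder frame with the same column names as the cleaned dataset
df = pd.DataFrame({'co2_ppmv': [418.0], 'year': [2022], 'years_before_present': [1]})

# reindex inserts the unknown label 'yr_bp' as an all-NaN column
reindexed = df.reindex(columns=['yr_bp', 'co2_ppmv', 'year'])
print(reindexed['yr_bp'].isna().all())  # True

# list selection refuses unknown labels
try:
    df[['yr_bp', 'co2_ppmv', 'year']]
    raised = False
except KeyError as err:
    raised = True
    print('KeyError:', err)
```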

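The historic dataset is read from a local copy of the supplementary spreadsheet `grl52461-sup-0003-supplementary.xls`.

```python
# reading a legacy .xls workbook requires the optional xlrd package;
# by default only the first sheet of the workbook is read
co2_temp = pd.read_excel('datasets/historic/grl52461-sup-0003-supplementary.xls')
```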
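The problem statement calls for fusing the datasets into a single DataFrame and exporting it to CSV and JSON. A minimal sketch, assuming both records have first been reduced to hypothetical common columns `yr_bp` and `co2_ppmv` on the same age reference (the spreadsheet's actual sheet and column names are not shown here); all values are placeholders:

```python
import pandas as pd

# Hypothetical ice-core rows (placeholder values, assumed column names)
ice_core = pd.DataFrame({'yr_bp': [800000, 400000], 'co2_ppmv': [180.0, 280.0]})
# Hypothetical instrumental rows already expressed in the same columns
instrumental = pd.DataFrame({'yr_bp': [1, 64], 'co2_ppmv': [420.0, 315.0]})

# Stack the two records, tag each row with its source, then order by age
fused = (
    pd.concat([ice_core.assign(source='ice_core'),
               instrumental.assign(source='mauna_loa')],
              ignore_index=True)
      .sort_values('yr_bp')
      .reset_index(drop=True)
)

# With no path argument, both exporters return the serialised text
csv_text = fused.to_csv(index=False)
json_text = fused.to_json(orient='records')
print(csv_text)
print(json_text)
```

Passing a file path, e.g. `fused.to_csv('co2_fused.csv', index=False)` or `fused.to_json('co2_fused.json', orient='records')`, writes to disk instead; both filenames are hypothetical. Keeping a `source` column preserves which instrument each row came from after the records are combined.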

---

## Examine one other (paleo/modern) feature

---

## Irish context

---

## Fused Dataset

---

## Data Analysis

---

## Predictive Model

---

## References <a id="references"></a>

Naveen (2022). How to Rename Columns With List in Pandas. [online] Spark By {Examples}. Available at: https://sparkbyexamples.com/pandas/rename-columns-with-list-in-pandas-dataframe/ [Accessed 13 Dec. 2023].

pandas.pydata.org. (n.d.). pandas.DataFrame.drop — pandas 1.2.4 documentation. [online] Available at: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html. [Accessed 13 Dec. 2023].

saturncloud.io. (2023). How to Sort Pandas DataFrame by One or Multiple Column | Saturn Cloud Blog. [online] Available at: https://saturncloud.io/blog/how-to-sort-pandas-dataframe-from-one-column/ [Accessed 13 Dec. 2023].

---

## Associated Reading <a id="associated-reading"></a>

Matplotlib (2012). Matplotlib: Python plotting — Matplotlib 3.1.1 documentation. [online] Matplotlib.org. Available at: https://matplotlib.org/. [Accessed 13 Dec. 2023].

Pandas (2018). Python Data Analysis Library — pandas: Python Data Analysis Library. [online] Pydata.org. Available at: https://pandas.pydata.org/. [Accessed 13 Dec. 2023].

---

Notebook Ends