<h1>EOSC 442 - Final Project</h1>
Members: Isaiah Youm, Bernice Huynh, Ting Gu, Yicheng Ma

Research Objective: // TODO 


In [41]:
# Import libraries to manage data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

<hr>
<h3>List of Functions Used</h3>

In [42]:
# Converts the year and month columns of one dataset row into a datetime
def convert_to_datetime(year: str, month: str, index: int, DataFrame: pd.DataFrame) -> datetime:
    """
    year = name of the year column in DataFrame (string)
    month = name of the month column in DataFrame (string)
    index = positional row number (the iteration number in a loop)
    DataFrame = the target pandas.DataFrame

    Returns a datetime for the first day of that month (YYYY-MM-01).
    """
    years = str(int(DataFrame.iloc[index][year]))
    months = str(int(DataFrame.iloc[index][month]))
    form_prep = years + '/' + months
    # '%Y/%m' has no day field, so strptime defaults the day to 1
    date_time = datetime.strptime(form_prep, '%Y/%m')

    return date_time

<hr>
<h3> Setting up Data for Ice Extent </h3>

The ice-extent data are split across 12 csv files, one per month, each covering the years 1979 to 2022.
The year and month, however, are stored in separate columns, and the files have no proper index.
Therefore, when setting up this data, the month and year must be combined into a usable DateTime index.

<strong> Throughout this data analysis, every DateTime index should be in the form YYYY-MM-01.</strong>
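
The year and month columns can also be combined without a row-by-row loop. The following is a minimal sketch, not the notebook's own method: it assumes the column names `year` and `mo` used for the ice-extent files, while the `sample` table, the `extent` column and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated ice-extent table.
# Only the 'year' and 'mo' column names come from the notebook;
# the 'extent' column and its values are made up.
sample = pd.DataFrame({
    'year': [2022, 1979, 1979],
    'mo': [6, 1, 12],
    'extent': [1.0, 2.0, 3.0],
})

# pd.to_datetime can assemble dates from columns named year, month and day.
# Fixing day to 1 gives the YYYY-MM-01 form used throughout this analysis.
parts = pd.DataFrame({
    'year': sample['year'].astype(int),
    'month': sample['mo'].astype(int),
    'day': 1,
})
sample['DateTime'] = pd.to_datetime(parts)

# Index by date, sort chronologically, and drop the now-redundant columns.
sample = sample.set_index('DateTime').sort_index().drop(columns=['year', 'mo'])
print(sample)
```

Because the conversion runs on whole columns at once, it avoids calling `iloc` once per row, which matters for the larger tables later in the notebook.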

In [43]:
# Setting up data for Ice Extent.
# NOTE: The ice-extent data are split across 12 csv files (one per month), each covering 1979 to 2022.


# Create an empty DataFrame for the Ice-Extent data.
# The monthly files share the same columns, so pd.concat() can stack them row-wise
# (pd.concat() aligns on column names and fills any missing columns with NaN).
ice_extent = pd.DataFrame()

# Loop through the twelve monthly datasets (N_01 ... N_12)
for num in range(1, 13):
    # Zero-pad the month number so that 1 becomes '01'
    relative_file_path = f'./Ice Extent/N_{num:02d}_extent_v3.0.csv'
    # Raw string so that \s reaches the regex engine unchanged
    data_point = pd.read_csv(relative_file_path, delimiter=r',\s+', engine='python')
    # print(data_point)
    ice_extent = pd.concat([ice_extent, data_point], sort=True)

##############

# The ice_extent data must be indexed by date (YYYY-MM-01).

# To do this, we first convert the year and mo columns to DateTime format.
datetime_iceextent = []
for num in range(0, len(ice_extent)):
    date_time = convert_to_datetime('year', 'mo', num, ice_extent)
    datetime_iceextent.append(date_time)

# Add list of datetime made to the Ice Extent DataFrame as 'DateTime'
ice_extent['DateTime'] = datetime_iceextent

# Set DateTime as the index using pandas.DataFrame.set_index(column_name)
ice_extent = ice_extent.set_index('DateTime')

# Sort the data by DateTime (ascending chronological order)
ice_extent = ice_extent.sort_index()

# Delete redundant columns (year and mo in this case)
ice_extent = ice_extent.drop('year', axis=1)
ice_extent = ice_extent.drop('mo', axis=1)


<hr>
<h3> Setting up Data for Ice Thickness</h3>

<h5>Data Parameters:</h5>
<hr>
<pre>
AIR-EM_summaries:

- Parameters:
'Year' 'Month' 'Lat' 'Lon' 'Avg_thkns' 'Min_thkns' 'Max_thkns' 'SD_thkns'
- Data Range: 
2001 ~ 2005
</pre>
<hr>
<pre>
CanCoast_summaries:

- Parameters:
'Year' 'Month' 'Lat' 'Lon' 'Avg_thkns' 'Min_thkns' 'Max_thkns' 'SD_thkns' 'Avg_snow' 'Min_snow' 'Max_snow' 'SD_snow'
- Data Range:
1947 ~ 2013
- Missing Value: -999.00
</pre>
<hr>
<pre>
CryoSat-AWI_summaries:

- Parameters:
'Year' 'Month' 'Lat' 'Lon' 'Avg_thkns' 'Min_thkns' 'Max_thkns' 'SD_thkns'
- Data Range:
2010 ~ 2016
</pre>
<hr>
<pre>
IceBridge-QL.summaries:

- Parameters:
'Year' 'Month' 'Lat' 'Lon' 'Avg_thkns' 'Min_thkns' 'Max_thkns' 'SD_thkns' 'Avg_snow' 'Min_snow' 'Max_snow' 'SD_snow'
- Data Range:
2012 ~ 2015
</pre>
<hr>
<pre>
IceBridge-V2.summaries:

- Parameters:
'Year' 'Month' 'Lat' 'Lon' 'Avg_thkns' 'Min_thkns' 'Max_thkns' 'SD_thkns' 'Avg_snow' 'Min_snow' 'Max_snow' 'SD_snow'
- Data Range:
2009 ~ 2013
</pre>
<hr>
<pre>
ICESAT1-G_summaries:

- Parameters:
'Year' 'Month' 'Lat' 'Lon' 'Avg_thkns' 'Min_thkns' 'Max_thkns' 'SD_thkns'
- Data Range:
2003 ~ 2008
</pre>
<hr>
<pre>
ICESAT1-SH_summaries:

- Parameters:
'Year' 'Month' 'Lat' 'Lon' 'Avg_thkns' 'Min_thkns' 'Max_thkns' 'SD_thkns'
- Data Range:
2003 ~ 2008
</pre>
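
CanCoast_summaries lists -999.00 as its missing-value marker. If that sentinel is left in place it is treated as a real measurement and distorts any average. The sketch below shows one way to convert it to NaN; the `sample` table and its values are made up, and whether the sentinel appears in the thickness columns actually read by this notebook is an assumption that should be checked against the files.

```python
import numpy as np
import pandas as pd

MISSING = -999.00  # missing-value marker listed for CanCoast_summaries

# Hypothetical rows with made-up values; the column names follow the
# parameter lists above.
sample = pd.DataFrame({
    'Avg_thkns': [0.12, MISSING, 0.54],
    'Avg_snow': [MISSING, 0.10, 0.20],
})

# Replace the sentinel with NaN so pandas skips it in statistics.
cleaned = sample.replace(MISSING, np.nan)

print(sample['Avg_thkns'].mean())   # dragged far below zero by the sentinel
print(cleaned['Avg_thkns'].mean())  # mean of the two real measurements
```

`pd.read_csv` also accepts an `na_values` argument, so the same conversion could be done while the file is being read.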

In [44]:
# Setting up data for Ice Thickness
# There are multiple datasets that we need to combine.
# We read the same set of columns from every file, then stack all of the data into one DataFrame.
# (pd.concat() aligns on column names, so reading matching columns avoids NaN-filled extras)

# Create empty DataFrame that will store all of the data
ice_thickness = pd.DataFrame()

# Create a list that contains the names of the text files we'll iterate over.
ice_thickness_names = [
    './Ice Thickness/AIR-EM_summaries.txt',
    './Ice Thickness/CanCoast_summaries.txt',
    './Ice Thickness/CryoSat-AWI_summaries.txt',
    './Ice Thickness/IceBridge-QL.summaries.txt',
    './Ice Thickness/IceBridge-V2.summaries.txt',
    './Ice Thickness/ICESAT1-G_summaries.txt',
    './Ice Thickness/ICESAT1-SH_summaries.txt',
]


# Combine all of the data together
for file_name in ice_thickness_names:
    # Raw string so that \s reaches the regex engine unchanged
    df = pd.read_csv(file_name, usecols=[3, 7, 8, 9, 24, 25, 26, 27], sep=r'\s+', engine='python')
    ice_thickness = pd.concat([ice_thickness, df], sort=True)

# The data must be indexed by date (YYYY-MM-01).
# Build a DateTime value for every row from its Year and Month columns.
datetime_icethkness = []
for num in range(0, len(ice_thickness)):
    date_time = convert_to_datetime('Year', 'Month', num, ice_thickness)
    datetime_icethkness.append(date_time)
ice_thickness['DateTime'] = datetime_icethkness

# Set index to DateTime
ice_thickness = ice_thickness.set_index('DateTime')


# Sort the data by DateTime (ascending chronological order)
ice_thickness = ice_thickness.sort_index()

# Delete the Year and Month columns as they are now unnecessary
ice_thickness = ice_thickness.drop('Year', axis=1)
ice_thickness = ice_thickness.drop('Month', axis=1)

print(ice_thickness)


            Avg_thkns    Lat     Lon  Max_thkns  Min_thkns  SD_thkns
DateTime                                                            
1947-09-01       0.12  79.98  -85.95       0.16       0.08      0.06
1947-10-01       0.33  74.72  -94.98       0.38       0.28      0.05
1947-10-01       0.35  79.98  -85.95       0.47       0.23      0.10
1947-11-01       0.54  74.72  -94.98       0.61       0.43      0.08
1947-11-01       0.66  79.98  -85.95       0.76       0.53      0.10
...               ...    ...     ...        ...        ...       ...
2016-02-01       1.26  61.70  -92.07       1.49       0.89      0.25
2016-02-01       1.33  71.80  -93.02       1.39       1.24      0.06
2016-02-01       2.27  77.17  -94.53       2.45       2.01      0.20
2016-02-01       3.37  81.72  -97.03       3.90       3.13      0.30
2016-02-01       1.89  76.24  163.35       2.06       1.73      0.15

[184728 rows x 6 columns]


<hr>
<h3> Setting up Data for Precipitation </h3>

<strong>cmap-mean.csv</strong> contains a lot of data that is unnecessary for this analysis.

We only need data from latitude 60° to 90°, since that is the latitude band we are studying.
The Arctic is around 76° latitude, and we want some leeway in the data we collect.
Therefore, we keep everything from 60° to 90° latitude.
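
The latitude cut described above can also be written as a boolean mask, which does not depend on the order of the rows. This is a sketch under stated assumptions: `lat` and `time` are the column names used in this notebook, while the `precip` column and every value in `sample` are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of cmap-mean.csv. Only 'time' and 'lat' are
# column names taken from the notebook; 'precip' and all values are made up.
sample = pd.DataFrame({
    'time': ['1979-01-01', '1979-01-01', '1979-02-01', '1979-02-01'],
    'lat': [88.75, 58.75, 61.25, -1.25],
    'precip': [0.1, 0.2, 0.3, 0.4],
})

# Keep only rows whose latitude lies in [60, 90] (both ends inclusive).
arctic = sample[sample['lat'].between(60, 90)]

print(f"{len(sample)} rows before filter, {len(arctic)} rows after filter")
```

A mask checks every row, so it gives the same result whether or not the file is sorted by latitude.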

In [45]:
# Setting up data for Precipitation

# Reading the raw cmap-mean.csv.
# Objective: Filter out all data outside of latitude 60 ~ 90
precipitation = pd.read_csv('./Precipitation/cmap-mean.csv')
print(f"{len(precipitation)} = total number of data in original csv file.")


# Most of the rows lie outside of the 60 ~ 90 range.
# To keep the loop short, it stops at the first row with a latitude below 60
# and keeps everything before it.
# NOTE: this assumes the rows are ordered from high to low latitude.

for num in range(0, len(precipitation)):
    if precipitation.iloc[num]['lat'] < 60:
        precipitation = precipitation[:num]
        break

print(f"{len(precipitation)} = the number of leftover data-points after filter")

# Set the 'time' column (YYYY-MM-DD) as the index for the precipitation data.
# This is done after filtering, because the loop above iterates by row position.
precipitation = precipitation.set_index('time')

# Sort the data by DateTime (ascending chronological order)
precipitation = precipitation.sort_index()


5432832 = total number of data in original csv file.
905472 = the number of leftover data-points after filter


<h4 style='color:cyan;'> Only run this code if you need to save a new .csv file (with <strong>filtered</strong> data from <strong>cmap-mean.csv</strong>)</h4>

Export the wanted range of data to a csv file. <br>
This step is optional; it exists so we don't have a huge dataset to download. The index is written out, because <code>time</code> is now the index and would otherwise be lost.<br>
<p><code>precipitation.to_csv("./Precipitation/filtered_precipitation_data.csv")</code></p>
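
Since `time` has been set as the index before the export, the `index` argument of `to_csv` decides whether the dates reach the file. The sketch below illustrates the difference on a made-up two-row table, writing to strings instead of files. Only the `time` and `lat` column names come from the notebook.

```python
import io
import pandas as pd

# Hypothetical two-row table indexed by 'time'; the values are made up.
sample = pd.DataFrame({
    'time': ['1979-01-01', '1979-02-01'],
    'lat': [88.75, 61.25],
}).set_index('time')

# With no path argument, to_csv returns the csv text as a string.
with_index = sample.to_csv()                # index is written as the first column
without_index = sample.to_csv(index=False)  # index (the dates) is dropped

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the file back with index_col restores the 'time' index.
restored = pd.read_csv(io.StringIO(with_index), index_col='time')
print(restored)
```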

<hr>
<h3>Data Analysis - Plotting PARAMETER1 vs PARAMETER2</h3>