## Final Project: Reading the sea-ice parameters
### Importing a .csv file into a dataframe and array 
The imported file is a .csv file converted from the original .nc (NetCDF) format. It contains two variables, Sea Ice Thickness (SIT) and the Zonal Wave 3 Index (ZW3), each averaged over the global grid.

In [126]:
import numpy as np
import pandas as pd

path = 'sea_ice_parameters.csv'
df = pd.read_csv(path)
sea_ice=df.to_numpy()

df


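|   | Date | SIT | Zonal_Wave_3 |
|---|------|-----|--------------|
| 0 | 10-02 | -4985 | 0.381 |
| 1 | 5-03 | -33087 | 0.262 |
| 2 | 6-03 | -9663 | -0.788 |
| 3 | 7-03 | 12339 | 0.0 |
| 4 | 8-03 | 16822 | 0.177 |
| 5 | 9-03 | 14002 | -0.888 |
| 6 | 10-03 | -18128 | -0.707 |
| 7 | 5-04 | -28029 | 0.273 |
| 8 | 6-04 | 423 | 0.518 |
| 9 | 7-04 | 23952 | -0.923 |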


In [129]:
print("Dimensions of the csv file: \n\n",sea_ice.shape)


Dimensions of the csv file: 

 (55, 3)


In [120]:
print("Average of SIT: \n\n",np.mean(sea_ice[:,1]))


print("\n\n Average of Zonal Wave 3: \n\n",np.mean(sea_ice[:,2]))

Average of SIT: 

 0.03636363636363636


 Average of Zonal Wave 3: 

 -0.09767272727272726


### Average:
The mean values give an idea of the central tendency of each variable.
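
As a minimal sketch, the same averages can be taken from the labeled DataFrame columns rather than from positional array indices. The `sample` frame below is an illustrative stand-in that holds only the ten rows displayed above (the full file has 55 rows), so its means will differ from the ones printed for the whole dataset.

```python
import pandas as pd

# Only the ten rows displayed above; the full file has 55 rows.
sample = pd.DataFrame({
    "Date": ["10-02", "5-03", "6-03", "7-03", "8-03",
             "9-03", "10-03", "5-04", "6-04", "7-04"],
    "SIT": [-4985, -33087, -9663, 12339, 16822,
            14002, -18128, -28029, 423, 23952],
    "Zonal_Wave_3": [0.381, 0.262, -0.788, 0.0, 0.177,
                     -0.888, -0.707, 0.273, 0.518, -0.923],
})

# Converting mixed text/number columns gives a generic object array.
mixed = sample.to_numpy()
print(mixed.dtype)

# Selecting the numeric columns by name keeps their numeric dtypes.
means = sample[["SIT", "Zonal_Wave_3"]].mean()
print(means)
```

Selecting by name also avoids relying on SIT happening to be column 1 of the array.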

### Standard Deviation 
The SD values indicate how variable each parameter is. In this case, the very high SD of Sea Ice Thickness (SIT) means that the values deviate strongly from their mean, i.e. a very high variability is observed over the given time period. Because SIT and ZW3 are on very different scales, their SD values are not directly comparable.

In [124]:
print("Standard Deviation of SIT: \n\n",np.std(sea_ice[:,1]))

print("\n\n Standard Dev. of Zonal Wave 3: \n\n",np.std(sea_ice[:,2]))

Standard Deviation of SIT: 

 23333.009911409852


 Standard Dev. of Zonal Wave 3: 

 0.582585617727962
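
One detail worth keeping in mind: `np.std` computes the population SD (`ddof=0`) by default, whereas the pandas `Series.std` method computes the sample SD (`ddof=1`) by default. The sketch below illustrates the difference on the ten SIT values displayed earlier (not the full 55-row dataset); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Ten SIT values from the rows displayed earlier (not the full dataset).
sit = pd.Series([-4985, -33087, -9663, 12339, 16822,
                 14002, -18128, -28029, 423, 23952], name="SIT")
n = len(sit)

pop_sd = np.std(sit.to_numpy())               # divides by n (ddof=0)
sample_sd = sit.std()                         # divides by n - 1 (ddof=1)
sample_sd_np = np.std(sit.to_numpy(), ddof=1) # NumPy with the pandas convention

print(pop_sd, sample_sd, sample_sd_np)
```

With 55 observations the two conventions differ by about 1%, but it is safer to state which one is being reported.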


### Sorting by SIT to see when the maximum and minimum values were reached, within a year and over the entire time period
The sorting gives the values we expected from the dataset. The start of winter in the Southern Hemisphere (May and June) shows the minimum SIT values. These values are anomalies, defined as deviations from the mean. Negative anomalies mean that the ice is thinner than average, because snow has yet to fall and accumulate. The maximum (positive) SIT anomalies are seen at the end of winter or the start of spring (August-September), when the snow season ends and accumulation is at its maximum.

In [130]:
df.sort_values(by='SIT')

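|    | Date | SIT | Zonal_Wave_3 |
|----|------|-----|--------------|
| 25 | 5-07 | -47351 | 0.013 |
| 19 | 5-06 | -46043 | -0.087 |
| 49 | 5-11 | -41217 | -0.823 |
| 13 | 5-05 | -38101 | -0.927 |
| 43 | 5-10 | -37565 | 0.259 |
| 31 | 5-08 | -33366 | 0.092 |
| 1 | 5-03 | -33087 | 0.262 |
| 37 | 5-09 | -32704 | -0.393 |
| 7 | 5-04 | -28029 | 0.273 |
| 12 | 10-04 | -20650 | -0.189 |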
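
When only the extremes are of interest, a full sort is not needed. The sketch below uses `nsmallest`, `nlargest`, `idxmin` and `idxmax` on an illustrative `sample` frame built from the first ten rows of the dataset (an assumption made to keep the example self-contained), so the rows it returns are not the extremes of the full 55-row file.

```python
import pandas as pd

# First ten rows of the dataset only; the full file has 55 rows.
sample = pd.DataFrame({
    "Date": ["10-02", "5-03", "6-03", "7-03", "8-03",
             "9-03", "10-03", "5-04", "6-04", "7-04"],
    "SIT": [-4985, -33087, -9663, 12339, 16822,
            14002, -18128, -28029, 423, 23952],
})

lowest = sample.nsmallest(3, "SIT")   # three most negative anomalies
highest = sample.nlargest(3, "SIT")   # three most positive anomalies

# Row labels of the single minimum and maximum, then the rows themselves.
min_row = sample.loc[sample["SIT"].idxmin()]
max_row = sample.loc[sample["SIT"].idxmax()]

print(lowest)
print(highest)
print(min_row["Date"], max_row["Date"])
```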


### Boolean Indexing
#### Selecting data based on the value of a particular column 
This gives results similar to the sorting function, but it lets us look only at the rows that satisfy a given condition.

We notice that May (05) has no positive anomalies in any of the years, while 2004 is the only year in which June shows a positive SIT anomaly. This suggests that winter arrived early in 2004 and also ended early, since melting had started by October, resulting in a negative anomaly for that month.

In [137]:
df[df['SIT'] > 0]

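|    | Date | SIT | Zonal_Wave_3 |
|----|------|-----|--------------|
| 3 | 7-03 | 12339 | 0.0 |
| 4 | 8-03 | 16822 | 0.177 |
| 5 | 9-03 | 14002 | -0.888 |
| 8 | 6-04 | 423 | 0.518 |
| 9 | 7-04 | 23952 | -0.923 |
| 10 | 8-04 | 22424 | -0.081 |
| 11 | 9-04 | 13711 | -0.344 |
| 15 | 7-05 | 12548 | 0.288 |
| 16 | 8-05 | 16349 | -0.018 |
| 17 | 9-05 | 19395 | 0.597 |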
17,9-05,19395,0.597


### Correlation coefficients 

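This statistical operation measures the strength of the linear relationship between the two variables. In this case, the Pearson correlation coefficient between Sea Ice Thickness and Zonal Wave 3 is about 0.10, which is a weak positive correlation (r² ≈ 0.01, i.e. roughly 1% of shared variance).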

In [125]:
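# Restrict to the numeric columns; recent pandas versions raise an error on the text Date column.
print("Correlation between SIT and Zonal Wave3: \n\n", df[['SIT', 'Zonal_Wave_3']].corr())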

Correlation between SIT and Zonal Wave3: 

                    SIT  Zonal_Wave_3
SIT           1.000000      0.102306
Zonal_Wave_3  0.102306      1.000000
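
To put the coefficient in context, the sketch below squares the value reported above to get the fraction of variance the two series share. It also shows that `Series.corr` and `np.corrcoef` agree, using an illustrative `sample` frame with only the first ten rows of the dataset, so the sample coefficient is not the one for the full file.

```python
import numpy as np
import pandas as pd

# Coefficient reported above for the full 55-row dataset.
r_full = 0.102306
r_full_squared = r_full ** 2  # fraction of shared variance
print(round(r_full_squared, 4))

# First ten rows of the dataset only, to show two equivalent calls.
sample = pd.DataFrame({
    "SIT": [-4985, -33087, -9663, 12339, 16822,
            14002, -18128, -28029, 423, 23952],
    "Zonal_Wave_3": [0.381, 0.262, -0.788, 0.0, 0.177,
                     -0.888, -0.707, 0.273, 0.518, -0.923],
})

r_pandas = sample["SIT"].corr(sample["Zonal_Wave_3"])  # Pearson by default
r_numpy = np.corrcoef(sample["SIT"], sample["Zonal_Wave_3"])[0, 1]
print(r_pandas, r_numpy)
```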
