<a href="https://colab.research.google.com/github/mrsferret/Code-Division/blob/main/Projects/Sea_Level_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using numpy to look for a correlation between time data and sea level rise
---

### Data Source
Global Average Absolute Sea Level Change, 1880-2014 from the US Environmental Protection Agency using data from CSIRO, 2015; NOAA, 2015.
https://datahub.io/core/sea-level-rise

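The data describes annual sea levels from 1880 to 2013, with each value given as the change relative to 1880 (which is 0.0). Measurements are adjusted using data from two organisations: the Commonwealth Scientific and Industrial Research Organisation (CSIRO) and the National Oceanic and Atmospheric Administration (NOAA). NOAA-adjusted values are present in only 21 of the 134 rows.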

Raw Data file:  https://raw.githubusercontent.com/freeCodeCamp/boilerplate-sea-level-predictor/master/epa-sea-level.csv

For this exercise:
*  import the pandas library
*  import the numpy library
*  read the CSV dataset containing sea level data for the years 1880 to 2013 into a dataframe (df)
*  use df.head() and df.info() to inspect the data and the column data types
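
A minimal offline sketch of these setup steps, assuming a three-row excerpt of the CSV inlined as a string (values as displayed by `df.head()`, rounded to six decimal places) so that it runs without downloading the file. It renames the columns with a mapping passed to `DataFrame.rename` rather than assigning a positional list to `df.columns`; that is a design choice for this sketch, not the approach taken in the notebook's own code cell. Note that `df.info()` prints its summary itself and returns `None`, so it does not need to be wrapped in `print()`.

```python
import io

import pandas as pd

# Assumption: three rows from the head of epa-sea-level.csv, inlined so the
# sketch runs offline. The NOAA column is empty for these early years.
csv_text = """Year,CSIRO Adjusted Sea Level,Lower Error Bound,Upper Error Bound,NOAA Adjusted Sea Level
1880,0.0,-0.952756,0.952756,
1881,0.220472,-0.732283,1.173228,
1882,-0.440945,-1.346457,0.464567,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Rename by explicit mapping so a change in column order cannot silently
# mislabel the data (positional assignment to df.columns would).
df = df.rename(columns={
    "CSIRO Adjusted Sea Level": "CSIRO_Adj_Sea_Level",
    "Lower Error Bound": "Lower_Err_Bound",
    "Upper Error Bound": "Upper_Err_Bound",
    "NOAA Adjusted Sea Level": "NOAA_Adj_Sea_Level",
})

df.info()          # prints the summary itself; its return value is None
print(df.head())
```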



In [6]:
import pandas as pd
import numpy as np

def calc_sea_level_stats(df):
  #============================================================================
  # Create 2 numpy arrays from input df - one of Years and other of Sea Levels
  #============================================================================
  year_array = df['Year'].to_numpy(np.int16)
  #display("\nYear Array: ", year_array)

  level_array = df['CSIRO_Adj_Sea_Level'].to_numpy()
  #display("\nLevel Array: ", level_array)

  #============================================================================
  # calculate the mean Sea Level change
  #============================================================================
  level_mean = np.mean(level_array)
  print("\nmean of level_array: ", level_mean)

  #============================================================================
  # calculate the standard deviation of the variation in Sea Levels
  # (np.std defaults to ddof=0, the population std; df.describe() uses ddof=1)
  #============================================================================
  level_std = np.std(level_array)
  print("\nstd of level_array: ", level_std)

  #============================================================================
  # calculate the total of the Sea Level values
  #============================================================================
  level_total = np.sum(level_array)
  print("\ntotal of level_array: ", level_total)

  #============================================================================
  # retrieve and save the max and min sea level rise
  #============================================================================
  max_rise = np.max(level_array)
  print("\nmax_rise: ", max_rise)

  min_rise = np.min(level_array)
  print("\nmin_rise: ", min_rise)

  #============================================================================
  # find the index of the element in the level array with value = max_rise
  # and use it to find the corresponding year in the Year array
  #============================================================================
  max_index = np.argmax(level_array)
  print("\nmax arr_index: ", max_index)

  max_rise_year = year_array[max_index]
  print("\nmax_rise_year: ", max_rise_year)

  #============================================================================
  # find the index of the element in the level array with value = min_rise
  # and use it to find the corresponding year in the Year array
  #============================================================================
  min_index = np.argmin(level_array)
  print("\nmin arr_index: ", min_index)

  min_rise_year = year_array[min_index]
  print("\nmin_rise_year: ", min_rise_year)

  #===========================================================================
  # Calculate the Pearson product-moment correlation coefficient.
  # np.corrcoef returns a 2x2 matrix; r is the off-diagonal entry [0, 1]
  #===========================================================================
  pearson_corr_coef = np.corrcoef(year_array, level_array)[0, 1]
  print("\npearson_corr_coef: ", round(pearson_corr_coef, 2))

#===============================================================================
# Start Here
#===============================================================================
# read csv file into a dataframe
url = 'https://raw.githubusercontent.com/freeCodeCamp/boilerplate-sea-level-predictor/master/epa-sea-level.csv'
df = pd.read_csv(url)

# Change the column names
df.columns = ['Year', 'CSIRO_Adj_Sea_Level', 'Lower_Err_Bound', 'Upper_Err_Bound', 'NOAA_Adj_Sea_Level']

# get some info about the dataframe (df.info() prints its own output and returns None)
df.info()
print(df.head())
print(df.describe())

# get expected values to check against
max_df = df[df.CSIRO_Adj_Sea_Level == df.CSIRO_Adj_Sea_Level.max()]
expctd_max_sea_rise = max_df.CSIRO_Adj_Sea_Level.iloc[0]
expctd_max_sea_rise_yr = max_df.Year.iloc[0]

print("\nmax sea rise: ", expctd_max_sea_rise)
print("\nmax sea rise year: ", expctd_max_sea_rise_yr)

# Convert df to numpy arrays and calculate some stats
calc_sea_level_stats(df)



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 134 entries, 0 to 133
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Year                 134 non-null    int64  
 1   CSIRO_Adj_Sea_Level  134 non-null    float64
 2   Lower_Err_Bound      134 non-null    float64
 3   Upper_Err_Bound      134 non-null    float64
 4   NOAA_Adj_Sea_Level   21 non-null     float64
dtypes: float64(4), int64(1)
memory usage: 5.4 KB
   Year  CSIRO_Adj_Sea_Level  Lower_Err_Bound  Upper_Err_Bound  \
0  1880             0.000000        -0.952756         0.952756   
1  1881             0.220472        -0.732283         1.173228   
2  1882            -0.440945        -1.346457         0.464567   
3  1883            -0.232283        -1.129921         0.665354   
4  1884             0.590551        -0.283465         1.464567   

   NOAA_Adj_Sea_Level  
0                 NaN  
1                 NaN  
2                 NaN 



---
**Feedback Notes**


---



The Pearson product-moment correlation coefficient (Pearson's r) measures the strength and direction of the linear relationship between two variables (in this case, year and sea level). It takes a value between -1 and 1, where 0 means no linear correlation, 1 a perfect positive correlation, and -1 a perfect negative correlation. As a rule of thumb, a value of 0.7 or more indicates a strong positive relationship between the two variables. Here the correlation between year and CSIRO-adjusted sea level is 0.98. Much more analysis would be needed to confirm this, but such a strong positive correlation is a clear indication that sea level has been rising over time.
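
As a worked illustration of the definition above, Pearson's r for paired values $x_i$, $y_i$ is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

The following sketch computes it by hand with NumPy on small made-up arrays and checks it against `np.corrcoef`. The helper name `pearson_r` is hypothetical, introduced only for this example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Hypothetical data (not the real dataset)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # exactly linear and increasing in x, so r is 1
z = [10, 8, 6, 4, 2]   # exactly linear but decreasing, so r is -1

print(pearson_r(x, y), pearson_r(x, z))
print(np.corrcoef(x, y)[0, 1])   # NumPy's value for comparison
```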

### Then
---
1.  Calculate some statistics on the level array, e.g.:
*  mean
*  standard deviation
*  total 

2.  Use the fact that the arrays are aligned (e.g. the first number in the level array belongs to the first year in the year array) and display:

*  the year with the biggest rise in level (the maximum value in the level array)
*  the year with the smallest rise in level (the minimum value in the level array)

*(**Hint**: to do this you can use a new numpy function, `np.where()`)*
```
np.where(array == value_to_find)
```
*There is some reference material [here](https://thispointer.com/find-the-index-of-a-value-in-numpy-array/)*

**Note**: `np.where(...)` returns a tuple of index arrays, one per dimension. For a 1-D array the tuple has a single element: an array of every index where the value was found. You can print all of them, or just the first (there is likely to be only one in this case) using `[0][0]`. *With the correct code you should get an answer of 2012.*


3.  Calculate the Pearson product-moment correlation coefficient between year and the rise in sea level.  (*Expected output:  0.98 when rounded to 2 decimal places*)
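
A minimal sketch of steps 2 and 3 on hypothetical aligned arrays (made-up values, not the real dataset). It finds the years of the maximum and minimum levels with `np.where()`, confirms that `np.argmax`/`np.argmin` give the same answer, and shows that `np.corrcoef` returns a 2x2 matrix from which r has to be picked out before rounding.

```python
import numpy as np

# Hypothetical aligned arrays (made-up values, not the real dataset):
# level_array[i] is the level recorded in year_array[i]
year_array = np.array([2000, 2001, 2002, 2003, 2004])
level_array = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Step 2: np.where returns a tuple of index arrays, one per dimension;
# for a 1-D array, [0] is the array of matches and [0][0] the first match
max_indexes = np.where(level_array == np.max(level_array))
max_rise_year = year_array[max_indexes[0][0]]

min_indexes = np.where(level_array == np.min(level_array))
min_rise_year = year_array[min_indexes[0][0]]

print("Year with biggest rise:", max_rise_year)   # 2003
print("Year with smallest rise:", min_rise_year)  # 2000

# np.argmax / np.argmin return the first matching index directly
assert year_array[np.argmax(level_array)] == max_rise_year
assert year_array[np.argmin(level_array)] == min_rise_year

# Step 3: np.corrcoef returns a 2x2 matrix; the diagonal holds each
# variable's correlation with itself (1.0), the off-diagonal holds r
corr_matrix = np.corrcoef(year_array, level_array)
r = corr_matrix[0, 1]
print("Pearson r:", round(r, 2))   # Pearson r: 0.8
```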

In [None]:
display(df.describe())

| | Year | CSIRO Adjusted Sea Level | Lower Error Bound | Upper Error Bound | NOAA Adjusted Sea Level |
|---|---|---|---|---|---|
| count | 134.0 | 134.0 | 134.0 | 134.0 | 21.0 |
| mean | 1946.5 | 3.650341 | 3.204666 | 4.096016 | 7.363746 |
| std | 38.826537 | 2.485692 | 2.663781 | 2.312581 | 0.691038 |
| min | 1880.0 | -0.440945 | -1.346457 | 0.464567 | 6.297493 |
| 25% | 1913.25 | 1.632874 | 1.07874 | 2.240157 | 6.84869 |
| 50% | 1946.5 | 3.312992 | 2.915354 | 3.71063 | 7.488353 |
| 75% | 1979.75 | 5.587598 | 5.329724 | 5.845472 | 7.907365 |
| max | 2013.0 | 9.326772 | 8.992126 | 9.661417 | 8.546648 |
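
Two details explain why these summary numbers may not match plain NumPy calls. First, `describe()` reports the sample standard deviation (`ddof=1`), whereas `np.std` defaults to the population standard deviation (`ddof=0`). Second, the NOAA column has a count of only 21 because the other 113 rows are missing (NaN); pandas skips NaN by default, but `np.mean`, `np.std` and `np.sum` do not, so `np.nanmean`, `np.nanstd` and `np.nansum` (or `dropna()`) are needed for that column. A minimal sketch on a hypothetical column with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing values, shaped like NOAA_Adj_Sea_Level
s = pd.Series([np.nan, np.nan, 6.0, 7.0, 8.0])
arr = s.to_numpy()

print(np.mean(arr))                  # nan: NaN propagates through np.mean
print(np.nanmean(arr), s.mean())     # 7.0 7.0: NaN values are skipped
print(np.nansum(arr))                # 21.0: total of the non-missing values

# Population (ddof=0) vs sample (ddof=1) standard deviation
print(np.nanstd(arr))                   # NumPy default, ddof=0
print(np.nanstd(arr, ddof=1), s.std())  # both 1.0, as describe() would report
```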


# Reflection
----

## What skills have you demonstrated in completing this notebook?

Your answer: 

## What caused you the most difficulty?

Your answer: 