## EPS/ESE 135: Observing the Ocean
### Data Analysis Assignment 4 (Intro): Sea level records

For this assignment, you will download and analyze a sea level record of your choice from the [Permanent Service for Mean Sea Level (PSMSL)](https://psmsl.org).

### Accessing tide gauge records via PSMSL

The National Oceanography Centre of the UK maintains a [database of tide gauge records](https://psmsl.org/data/obtaining/) from around the world. You can choose any location you'd like, provided it has about 50 or more years of data (this is a rough guideline; a slightly shorter record is okay) and also offers real-time data access. For example, here is a screenshot of the station page for the Boston tide gauge:

![screenshot of Boston tide gauge station](PSMSL_screenshot.png)

At the top, the listed time span is 1921-2023 (102 years). You can also see a plot of monthly data showing that there is good data coverage over the past century. You can download the monthly data next to the arrow labeled `2`.

Next to the arrow labeled `1`, the page lists "Nearby real time stations from VLIZ" with a link to "boma" (shorthand for Boston, MA). You will use this link for the first few parts of the assignment, so check that the station you choose has real-time data.

### 1. Short term sea level changes

Click the link next to arrow #1 for your chosen location. Here's what the real-time data viewer should look like:

![real time data view for Boston](boston_realtime.png)

You will use this interface to plot different timeframes (under "Period" in the lower left), remove spikes/outliers if needed (under "Signals" just to the right), and then add the plots to your assignment by downloading the images to the folder containing your .ipynb file.

### 2. Long-term sea level changes

Then you will download the monthly-average sea level data by clicking the link next to arrow #2. The files downloaded from this site have the extension `.rlrdata`. Open your data file in a text editor. It should look something like this:

![rlrdata screenshot](data_screenshot.png)

Even though it has a different file extension, this is functionally a csv-type file, except it is delimited by semicolons instead of commas. 

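The first column is the date in "decimal years": each month is stored as the year plus a fraction of a year (PSMSL uses the middle of the month, so January 1921 appears as roughly 1921.0417). (You could convert these to datetime, but these values work well for analyzing long-term trends.)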

The second column is sea level in millimeters. (These specific values are called the [Revised Local Reference](https://psmsl.org/data/obtaining/rlr.php) -- it's not important to understand the details but if you want them they're there.)

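The final two columns hold data-quality information (PSMSL uses them for the number of missing days in the month and a flag for values that need attention). We can ignore them for this assignment.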
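If you do want calendar dates, the decimal years in the first column can be converted back to a year and month. The snippet below is a minimal sketch, assuming the values sit at the middle of each month; the function name `decimal_year_to_month` and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

def decimal_year_to_month(decimal_year):
    """Split decimal years into integer year and month (1-12).

    Assumes each value falls inside its month (e.g. at mid-month),
    so flooring the fractional part * 12 gives the month index.
    """
    year = np.floor(decimal_year).astype(int)
    month = np.floor((decimal_year - year) * 12).astype(int) + 1
    return year, month

# illustrative decimal years (January, February, December 1921)
dec = np.array([1921.0417, 1921.1250, 1921.9583])
year, month = decimal_year_to_month(dec)

# build timestamps on the first day of each month
dates = pd.to_datetime(pd.DataFrame({'year': year, 'month': month, 'day': 1}))
print(dates)
```

For long-term trends the decimal years are easier to work with, because the slope from a fit comes out directly in mm/year.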

Here's a suggestion of how to read in the file:

In [None]:
# import the usual python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# this uses the same read_csv function to read in the file
# because columns are separated by semicolons, need to add the 'delimiter' argument
# the file doesn't include column headers, so I add them manually with 'names'
#    the only columns we really need are year and sea level (RLR)
#    the last 2 column names are just placeholders
data = pd.read_csv('[station_number].rlrdata',delimiter=';',names=['year','SL','x','xx'])

# quick plot to visualize data
data.plot(x='year',y='SL')
plt.ylabel('sea level [mm]')
plt.xlabel('year')

If you run this code (or look closely at the screenshot above) you will see there are some values of -99999. These are placeholders for missing data, which will really throw off your plots and calculations. One quick way to remove them is:

In [None]:
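# create a new version of dataframe that drops negative values of RLR
# .copy() makes this an independent dataframe, so adding columns to it later
#    won't trigger a SettingWithCopyWarning
data_clean = data[data.SL>=0].copy()
data_clean.plot(x='year',y='SL')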
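Dropping rows is the quickest fix, but it also removes those months from the table, so the remaining rows are no longer evenly spaced in time. An alternative, sketched below on a tiny made-up dataframe (`demo` and its values are not real data), is to keep every row and replace the placeholder with `NaN`, which pandas and matplotlib skip automatically.

```python
import numpy as np
import pandas as pd

# tiny made-up example with one missing month flagged as -99999
demo = pd.DataFrame({
    'year': [1921.0417, 1921.1250, 1921.2083, 1921.2917],
    'SL':   [6930, -99999, 6950, 6960],
})

# keep all rows, but turn the placeholder into NaN
demo_nan = demo.copy()
demo_nan['SL'] = demo_nan['SL'].mask(demo_nan['SL'] == -99999)

print(demo_nan)
print('mean ignoring the gap:', demo_nan['SL'].mean())
```

Either approach works for this assignment. Keeping the gaps as `NaN` mainly matters for row-based operations, where a fixed number of rows is meant to represent a fixed span of time.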

One simple way to smooth a timeseries is to calculate a rolling mean, also known as a boxcar filter. You choose a window size and this window is effectively dragged along the timeseries and the mean value inside the window is calculated at each step. 

To calculate the rolling mean, use the [pandas function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html) `.rolling(_____).mean()`:

In [None]:
# to calculate a 10-year mean, average over 10 years * 12 months/year = 120 months
n_months = 120

# the arguments to .rolling() are:
#   window=n_months (because the time interval for these data is monthly)
#   center=True (so the mean is plotted at the center of each window)
#   min_periods=n_months//2 (this will do the calculation for windows that are only half full,
#                   to avoid gaps at the edges; // is integer division, since min_periods must be an integer)
data_clean['SL_10yr'] = data_clean['SL'].rolling(window=n_months, center=True, min_periods=n_months//2).mean()

# plot the monthly data together with the rolling mean
# give each line a label so that we can get a useful legend
plt.plot(data_clean['year'],data_clean['SL'],label='monthly mean')
plt.plot(data_clean['year'],data_clean['SL_10yr'],label='10-year rolling mean')
plt.ylabel('sea level [mm]')
plt.xlabel('year')
plt.legend()
plt.title('Tide gauge record from Boston')
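To see what `center` and `min_periods` actually do, it can help to try them on a series short enough to check by hand. This is a small illustrative sketch with made-up numbers and a 3-point window, not part of the tide gauge analysis.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# default min_periods equals the window size: edges with a partial window become NaN
strict = s.rolling(window=3, center=True).mean()

# min_periods=1 averages whatever values fall inside the window
relaxed = s.rolling(window=3, center=True, min_periods=1).mean()

print(strict.tolist())   # [nan, 2.0, 3.0, 4.0, nan]
print(relaxed.tolist())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

The edge values in the relaxed version are averages over fewer points, so they are noisier than the interior values. The same is true of the first and last few years of the 10-year rolling mean above.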

To calculate the linear trend, use `numpy.polyfit()` ([documentation here](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html)) to fit a 1st-order polynomial to your data. This function will return a slope and intercept value that you can use to calculate and plot the best fit line.

In [None]:
# choose order of polynomial to fit -- for a linear trend this is 1
poly_order = 1

# inputs to np.polyfit are (x-data, y-data, order)
# returns slope m and intercept b
m, b = np.polyfit(data_clean['year'], data_clean['SL'], poly_order)

print(f'The linear trend over this record is {m:.2f} mm/year.')
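One way to build confidence in the fit is to test `np.polyfit` on synthetic data where the answer is known in advance. The sketch below generates a made-up monthly record with a trend of 2.5 mm/year plus random noise (all numbers here are invented for the test) and checks that the fitted slope comes out close to the true value.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the result is repeatable

# 100 years of made-up monthly data, with times at mid-month
years = 1920 + (np.arange(1200) + 0.5) / 12
true_slope = 2.5  # mm/year, chosen arbitrarily for this test
sea_level = 6900 + true_slope * (years - 1920) + rng.normal(0, 20, years.size)

# np.polyfit returns coefficients from highest order to lowest: slope, then intercept
m_est, b_est = np.polyfit(years, sea_level, 1)

# total change implied by the trend over the whole record
total_change = m_est * (years[-1] - years[0])

print(f'true slope: {true_slope} mm/year, fitted slope: {m_est:.2f} mm/year')
print(f'total change over record: {total_change:.0f} mm')
```

Note that the intercept `b` is the fitted sea level at year 0, far outside the record, so it is only useful as part of the line equation and has no physical meaning on its own.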

In [None]:
# calculate trend line over years in timeseries
linfit = m*data_clean['year'] + b

# re-plot lines from previous section
plt.plot(data_clean['year'],data_clean['SL'],label='monthly mean')
plt.plot(data_clean['year'],data_clean['SL_10yr'],label='10-year rolling mean')

# add trend line to plot
plt.plot(data_clean['year'],linfit,color='pink',label='linear trend')

# add labels, title, legend
plt.ylabel('sea level [mm]')
plt.xlabel('year')
plt.legend()
plt.title('Tide gauge record from Boston')

To do these calculations for shorter periods within the timeseries, you can save a new dataframe for each section and then apply the same steps to each.
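As a sketch of that idea, the helper below selects the rows within a range of years and fits a line to just that subset. The function name `trend_for_period` and the synthetic record are hypothetical; with your own data you would pass in `data_clean` and choose periods that make sense for your station.

```python
import numpy as np
import pandas as pd

def trend_for_period(df, start_year, end_year):
    """Linear trend (mm/year) for rows with start_year <= year < end_year."""
    subset = df[(df['year'] >= start_year) & (df['year'] < end_year)]
    slope, intercept = np.polyfit(subset['year'], subset['SL'], 1)
    return slope

# made-up record: 1 mm/year before 1980, 3 mm/year afterwards
years = np.arange(1950, 2010, 1/12) + 1/24
sl = 7000 + np.where(years < 1980,
                     1.0 * (years - 1950),
                     30.0 + 3.0 * (years - 1980))
demo = pd.DataFrame({'year': years, 'SL': sl})

early = trend_for_period(demo, 1950, 1980)
late = trend_for_period(demo, 1980, 2010)
print(f'1950-1980: {early:.2f} mm/year')
print(f'1980-2010: {late:.2f} mm/year')
```

Trends over short periods are more sensitive to year-to-year variability than the full-record trend, so it is worth comparing a few different period lengths before drawing conclusions.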