# Interactive exploration with widgets - an example

In this example, we work with atmospheric CO$_2$ data acquired at Mauna Loa. We will compare the original data with the seasonally adjusted data and see whether a moving average gives similar results.

Data source: https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html

<img src="https://scrippsco2.ucsd.edu/assets/images/mlo_station_map.png" width=60%>

## Imports

In [None]:
import numpy as np
import pandas as pd
from scipy import signal
import matplotlib.pyplot as plt
import ipywidgets

Set a readable default font size for the plots.

In [None]:
from matplotlib import rcParams
rcParams["font.size"]=14

## Load the data into pandas

In [None]:
co2_data_source = "https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/monthly/monthly_in_situ_co2_mlo.csv"

In [None]:
# read the data: skip the first 56 lines of the file and treat -99.99 as missing (NaN)
co2_data = pd.read_csv(
    co2_data_source, skiprows=np.arange(0, 56), na_values="-99.99",
)
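The `skiprows` and `na_values` arguments do the cleaning in the cell above: the first skips lines at the top of the file, and the second turns the `-99.99` missing-value marker into `NaN`. A minimal sketch on a small made-up table shows the effect. The rows, the column names, and the name `demo` are illustrative assumptions and are not taken from the Scripps file.

```python
import io

import pandas as pd

# Made-up miniature file: two comment lines, then data rows in which
# -99.99 marks a missing measurement (values are illustrative only).
raw = io.StringIO(
    "comment line 1\n"
    "comment line 2\n"
    "year,month,co2\n"
    "1958,1,-99.99\n"
    "1958,3,315.71\n"
    "1958,4,317.45\n"
)

# Skip the two comment lines and treat -99.99 as missing.
demo = pd.read_csv(raw, skiprows=2, na_values="-99.99")

# Count how many CO2 entries ended up as NaN.
n_missing = int(demo["co2"].isna().sum())
print(demo)
print("missing values:", n_missing)
```

Passing an integer to `skiprows` skips that many lines from the top, which should behave the same as passing the explicit list of line indices used above.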

In [None]:
co2_data

## What are our data? 


> The data file below contains 10 columns. Columns 1-4 give the dates in several redundant formats. Column 5 below gives monthly Mauna Loa CO2 concentrations in micro-mol CO2 per mole (ppm), reported on the 2008A SIO manometric mole fraction scale. This is the standard version of the data most often sought. The monthly values have been adjusted to 24:00 hours on the 15th of each month. Column 6 gives the same data after a seasonal adjustment to remove the quasi-regular seasonal cycle. The adjustment involves subtracting from the data a 4-harmonic fit with a linear gain factor. Column 7 is a smoothed version of the data generated from a stiff cubic spline function plus 4-harmonic functions with linear gain. Column 8 is the same smoothed version with the seasonal cycle removed. Column 9 is identical to Column 5 except that the missing values from Column 5 have been filled with values from Column 7. Column 10 is identical to Column 6 except missing values have been filled with values from Column 8. Missing values are denoted by -99.99

In [None]:
co2_data.columns = [
    "year", "month", "date (int)", "date", "co2", "seasonally adjusted",
    "fit", "seasonally adjusted fit", "co2 filled", "seasonally adjusted filled" 
]

In [None]:
co2_data

## Simple data cleaning

Here we remove the rows where the CO$_2$ value is NaN.

In [None]:
# keep only the rows where the co2 column is not NaN
inds = ~np.isnan(co2_data["co2"])  
co2_data_clean = co2_data[inds]
co2_data_clean
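The same filtering can also be written with `DataFrame.dropna`. The sketch below compares the two approaches on a small made-up frame. The values and the names `demo`, `by_mask`, and `by_dropna` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing CO2 value (illustrative values only).
demo = pd.DataFrame({
    "date": [1958.04, 1958.13, 1958.20],
    "co2": [np.nan, 315.71, 317.45],
})

# Boolean-mask approach, as in the cell above.
by_mask = demo[~np.isnan(demo["co2"])]

# Equivalent pandas idiom: drop rows where the "co2" column is NaN.
by_dropna = demo.dropna(subset=["co2"])

# Both results contain the same rows with the same row labels.
same = by_mask.equals(by_dropna)
print(by_dropna)
print("identical:", same)
```

Both versions keep the original row labels, so the cleaned frame no longer starts at label 0. Positional access (for example `.iloc` or `.to_numpy()`) avoids surprises when indexing it later.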

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.plot(
    co2_data_clean["date"], co2_data_clean["co2"], 
    label="CO$_2$ [ppm]"
)
ax.plot(
    co2_data_clean["date"], co2_data_clean["seasonally adjusted"], 
    label="seasonally adjusted",
)
ax.set_xlabel("Year")
ax.set_ylabel("CO$_2$ Concentration (ppm)")
ax.grid()
ax.legend()

## Moving average to remove seasonal variations

Does a moving average produce results similar to the seasonally adjusted data provided by Scripps?

**Note**: we are being a bit sloppy with the averaging here. It assumes that all of the NaNs are at the beginning or end of the time series, so that the rows left after cleaning are consecutive months.

In [None]:
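window_size = 10

n_data = co2_data_clean.shape[0]
co2_values = co2_data_clean["co2"].to_numpy()  # plain array: positional indexing, independent of row labels
average = np.full(n_data, np.nan)  # stays NaN where a full window does not fit
half_window = window_size // 2
for i in range(half_window, n_data - half_window):
    # mean over the 2 * half_window samples in [i - half_window, i + half_window)
    average[i] = np.mean(co2_values[i - half_window: i + half_window])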
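As an alternative to the explicit loop, pandas offers `Series.rolling`. The sketch below uses a synthetic series (a made-up linear trend plus a 12-month sinusoid, not the Mauna Loa record) to illustrate why the window length matters. A window spanning a full 12-month cycle averages this idealized seasonal term to zero, while a 10-month window leaves part of it behind. The helper `leftover` and all numbers here are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (made-up numbers): a linear trend plus a
# sinusoidal cycle with a 12-month period.
months = np.arange(240)
trend = 315.0 + 0.1 * months
seasonal = 3.0 * np.sin(2 * np.pi * months / 12)
series = pd.Series(trend + seasonal)

# Centered moving averages; edges without a full window stay NaN.
smooth_10 = series.rolling(window=10, center=True).mean()
smooth_12 = series.rolling(window=12, center=True).mean()


def leftover(smoothed):
    """Largest deviation of a smoothed series from its straight-line fit."""
    valid = smoothed.dropna()
    x = valid.index.to_numpy(dtype=float)
    y = valid.to_numpy()
    fit = np.polyval(np.polyfit(x, y, 1), x)
    return float(np.max(np.abs(y - fit)))


leftover_10 = leftover(smooth_10)
leftover_12 = leftover(smooth_12)
print(f"10-month window leftover: {leftover_10:.3f} ppm")
print(f"12-month window leftover: {leftover_12:.3g} ppm")
```

The real seasonal cycle is not a pure sinusoid, so a 12-month window is only expected to suppress it approximately. Comparing against the seasonally adjusted column is still the meaningful check.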

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.plot(
    co2_data_clean["date"], co2_data_clean["co2"], 
    label="CO$_2$ [ppm]"
)
ax.plot(
    co2_data_clean["date"], co2_data_clean["seasonally adjusted"], 
    label="seasonally adjusted"
)
ax.plot(
    co2_data_clean["date"], average, 
    label=f"moving average: {window_size}"
)
ax.set_xlabel("Year")
ax.set_ylabel("CO$_2$ Concentration (ppm)")
ax.grid()
ax.legend()

ax.set_xlim([2010, 2020])
ax.set_ylim([380, 420])