# The Mauna Loa Observatory Data

### 2014-05-28

::: {.cell .markdown}

<!-- Do not edit this file locally. -->

## Setup

In [None]:
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 22})

<!--setupplotcode{import seaborn as sns
sns.set_style('darkgrid')
sns.set_context('paper')
sns.set_palette('colorblind')}-->

## notutils

This small package provides helper utilities for working in notebooks.

The software can be installed using

In [None]:
%pip install notutils

from a notebook cell. From a command prompt where you can access your
Python installation, use `pip install notutils` instead.

The code is also available on GitHub:
<https://github.com/lawrennd/notutils>

Once `notutils` is installed, it can be imported in the usual manner.

In [None]:
import notutils

## pods

In Sheffield we created a suite of software tools for ‘Open Data
Science.’ Open data science is an approach to sharing code, models and
data that should make it easier for companies, health professionals and
scientists to gain access to data science techniques.

You can also check this blog post on [Open Data
Science](http://inverseprobability.com/2014/07/01/open-data-science).

The software can be installed using

In [None]:
%pip install pods

from a notebook cell. From a command prompt where you can access your
Python installation, use `pip install pods` instead.

The code is also available on GitHub: <https://github.com/lawrennd/ods>

Once `pods` is installed, it can be imported in the usual manner.

In [None]:
import pods

## mlai

The `mlai` software is a suite of helper functions for teaching and
demonstrating machine learning algorithms. It was first used in the
Machine Learning and Adaptive Intelligence course in Sheffield in 2013.

The software can be installed using

In [None]:
%pip install mlai

from a notebook cell. From a command prompt where you can access your
Python installation, use `pip install mlai` instead.

The code is also available on GitHub: <https://github.com/lawrennd/mlai>

Once `mlai` is installed, it can be imported in the usual manner.

In [None]:
import mlai

## Mauna Loa Data

The Mauna Loa data consists of monthly mean carbon dioxide measurements
taken at Mauna Loa Observatory, Hawaii. According to the website,
<https://www.esrl.noaa.gov/gmd/ccgg/trends/>:

> The carbon dioxide data on Mauna Loa constitute the longest record of
> direct measurements of CO2 in the atmosphere. They were started by C.
> David Keeling of the Scripps Institution of Oceanography in March of
> 1958 at a facility of the National Oceanic and Atmospheric
> Administration (Keeling et al., 1976). NOAA started its own CO2
> measurements in May of 1974, and they have run in parallel with those
> made by Scripps since then (Thoning et al., 1989).

In [None]:
import numpy as np
import pods

In [None]:
data = pods.datasets.mauna_loa()

If you have downloaded the data before, the cached version is used. To
download a fresh copy of the data, set `refresh_data=True`.

In [None]:
data = pods.datasets.mauna_loa(refresh_data=True)
x = data['X']
y = data['Y']

# Mean and standard deviation of the training targets,
# used below to standardise the data.
offset = y.mean()
scale = np.sqrt(y.var())

The data dictionary contains the standard keys `X` and `Y`, which give a
one-dimensional regression problem.

In [None]:
import matplotlib.pyplot as plt
import mlai.plot as plot
import mlai

In [None]:
xlim = (1950,2020)
ylim = (310, 420)
yhat = (y-offset)/scale

fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=2)
ax.set_xlabel('year')
ax.set_ylabel('CO$_2$ concentration in ppm')
ax.set_xlim(xlim)
ax.set_ylim(ylim)

mlai.write_figure(filename='mauna-loa.svg', 
                  directory='./datasets')

<img src="https://inverseprobability.com/talks/./slides/diagrams//datasets/mauna-loa.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>Mauna Loa data shows carbon dioxide monthly average
measurements from the Mauna Loa Observatory in Hawaii.</i>

Additionally, there are keys `Xtest` and `Ytest` which provide test data.
The number of points treated as *training data* is controlled by the
`num_train` argument, which defaults to 545. This number is chosen to
match that used in the [Gaussian Processes for Machine
Learning](http://www.gaussianprocess.org/gpml/chapters/RW5.pdf) book
(Rasmussen and Williams, 2006, Chapter 5). Below we plot the test and
training data.
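To make the role of `num_train` concrete, here is a minimal sketch on
synthetic stand-in data. It assumes the split is chronological, with the
first `num_train` monthly values used for training and the remainder
for testing; that is an assumption about how `pods` behaves, not
something confirmed here. The helper `split_train_test` is hypothetical
and is not part of `pods`.

```python
import numpy as np

# Synthetic stand-in for the monthly record (NOT the real Mauna Loa data):
# decimal years in one column, made-up CO2 values in another.
num_months = 600
years = 1958.0 + np.arange(num_months)[:, None] / 12.0
co2 = 315.0 + 1.5 * (years - 1958.0)

def split_train_test(X, Y, num_train=545):
    """Hypothetical helper: first num_train rows for training, the rest for testing."""
    return {
        'X': X[:num_train], 'Y': Y[:num_train],
        'Xtest': X[num_train:], 'Ytest': Y[num_train:],
    }

split = split_train_test(years, co2)
print(split['X'].shape, split['Xtest'].shape)
```

With 600 synthetic months and the default `num_train=545`, the sketch
leaves 55 points for testing, all of them later in time than the
training points.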

In [None]:
xtest = data['Xtest']
ytest = data['Ytest']
ytesthat = (ytest-offset)/scale
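The cells above standardise both `y` and `ytest` with the `offset` and
`scale` computed from the training targets. The following sketch, using
made-up numbers in place of the real data, shows that the training
targets end up with zero mean and unit variance and that the transform
can be inverted to recover values in ppm.

```python
import numpy as np

# Made-up values standing in for data['Y'] and data['Ytest'].
y = np.array([[315.0], [320.0], [325.0], [330.0]])
ytest = np.array([[335.0], [340.0]])

# Statistics come from the training targets only.
offset = y.mean()
scale = np.sqrt(y.var())  # equivalent to y.std()

# Standardise training and test targets with the same offset and scale.
yhat = (y - offset) / scale
ytesthat = (ytest - offset) / scale

# Invert the transform to get back to the original units.
y_back = yhat * scale + offset
ytest_back = ytesthat * scale + offset
```

Reusing the training statistics for the test targets keeps both sets on
the same scale, so the standardised test values need not have zero mean
themselves.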

In [None]:
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=2)
_ = ax.plot(xtest, ytest, 'g.',markersize=2)
ax.set_xlabel('year')
ax.set_ylabel('CO$_2$ concentration in ppm')
ax.set_xlim(xlim)
ax.set_ylim(ylim)

mlai.write_figure(filename='mauna-loa-test.svg', 
                  directory='./datasets')

<img src="https://inverseprobability.com/talks/./slides/diagrams//datasets/mauna-loa-test.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>Mauna Loa training data (red) and test data (green): carbon
dioxide monthly average measurements from the Mauna Loa Observatory in
Hawaii.</i>

The citation information for the data is also included.

In [None]:
print(data['citation'])

And extra information about the data is included, as standard, under the
keys `info` and `details`.

In [None]:
print(data['info'])
print()
print(data['details'])

And, importantly, for reference you can also check the license for the
data:

In [None]:
print(data['license'])

## References

Keeling, C.D., Bacastow, R.B., Bainbridge, A.E., Ekdahl Jr., C.A.,
Guenther, P.R., Waterman, L.S., Chin, J.F.S., 1976. Atmospheric carbon
dioxide variations at Mauna Loa Observatory, Hawaii. Tellus 28, 538–551.
<https://doi.org/10.1111/j.2153-3490.1976.tb00701.x>

Rasmussen, C.E., Williams, C.K.I., 2006. Gaussian processes for machine
learning. MIT Press, Cambridge, MA.

Thoning, K.W., Tans, P.P., Komhyr, W.D., 1989. Atmospheric carbon
dioxide at Mauna Loa Observatory: 2. Analysis of the NOAA GMCC data,
1974–1985. Journal of Geophysical Research: Atmospheres 94, 8549–8565.
<https://doi.org/10.1029/JD094iD06p08549>