# Datasets: Downloading Data from the Mauna Loa Observatory

## Open Data Science Initiative

### 28th May 2014 Neil D. Lawrence

This data set comes from the Mauna Loa Observatory, which records atmospheric carbon dioxide concentrations. The data was used by [Rasmussen and Williams (2006)](http://www.gaussianprocess.org/gpml/chapters/RW5.pdf) to demonstrate hyperparameter setting in Gaussian processes. When first called, or if called with `refresh_data=True`, the latest version of the data set is downloaded. Otherwise, the cached version of the data set is loaded from disk.

In [None]:
import pods
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
data = pods.datasets.mauna_loa() 

Here, because I've downloaded the data before, I have a cached version. To download a fresh version of the data, I can set `refresh_data=True`.


In [None]:
data = pods.datasets.mauna_loa(refresh_data=True)
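The cache-or-download behaviour described above can be illustrated with a small, self-contained sketch. This is not how `pods` is implemented: the helper `load_with_cache`, the stand-in downloader `fake_fetch` and the temporary cache path are all hypothetical names invented for this example, and the "download" simply returns placeholder text rather than contacting the observatory.

```python
import tempfile
from pathlib import Path


def load_with_cache(cache_path, fetch, refresh_data=False):
    """Return the cached text, calling fetch() only when it is needed.

    Hypothetical helper for illustration; not part of the pods API.
    """
    cache_path = Path(cache_path)
    # Download on the first call (no cache yet) or when a refresh is requested.
    if refresh_data or not cache_path.exists():
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(fetch())
    # In every case the result is read back from disk.
    return cache_path.read_text()


calls = []  # records how often the stand-in download runs


def fake_fetch():
    """Stand-in for a real download; returns placeholder text only."""
    calls.append(1)
    return "placeholder contents, download #{}".format(len(calls))


cache_file = Path(tempfile.mkdtemp()) / "mauna_loa.txt"

first = load_with_cache(cache_file, fake_fetch)    # no cache yet: downloads
second = load_with_cache(cache_file, fake_fetch)   # cache present: reads disk
third = load_with_cache(cache_file, fake_fetch, refresh_data=True)  # forced

print(len(calls))
```

Only the first and the refreshed call trigger the stand-in download; the second call is served from the file on disk.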

The data dictionary contains the standard keys `X` and `Y`, which give a one-dimensional regression problem.

In [None]:
plt.plot(data['X'], data['Y'], 'rx')
plt.xlabel('year')
plt.ylabel('CO$_2$ concentration in ppm')

Additionally, there are keys `Xtest` and `Ytest`, which provide test data. The number of points considered to be *training data* is controlled by the `num_train` argument, which defaults to 545. This number is chosen because it matches the one used in the [Gaussian Processes for Machine Learning](http://www.gaussianprocess.org/gpml/chapters/RW5.pdf) book. Below we plot the test and training data.

In [None]:
plt.plot(data['X'], data['Y'], 'rx')
plt.plot(data['Xtest'], data['Ytest'], 'go')
plt.xlabel('year')
plt.ylabel('CO$_2$ concentration in ppm')
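To make the role of `num_train` concrete, here is a minimal sketch of a train/test split. It assumes the split simply takes the first `num_train` rows, in time order, as training data and the remainder as test data; that is an assumption about the behaviour, not a copy of the `pods` source. The helper `split_train_test` is a hypothetical name, and the arrays are synthetic stand-ins (600 monthly time stamps with placeholder zeros as targets), not real observations.

```python
import numpy as np


def split_train_test(all_x, all_y, num_train=545):
    """Split time-ordered data into the first num_train rows and the rest.

    Hypothetical helper for illustration; not part of the pods API.
    """
    return {
        'X': all_x[:num_train],
        'Y': all_y[:num_train],
        'Xtest': all_x[num_train:],
        'Ytest': all_y[num_train:],
    }


# Synthetic stand-in inputs: 600 monthly time stamps as a column vector.
all_x = (1958.0 + np.arange(600) / 12.0)[:, None]
# Placeholder targets; these are NOT real CO2 measurements.
all_y = np.zeros_like(all_x)

split = split_train_test(all_x, all_y)
print(split['X'].shape, split['Xtest'].shape)
```

Under this assumption every test input lies later in time than every training input, so the test set measures extrapolation rather than interpolation.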

Of course, we have included the citation information for the data.

In [None]:
print(data['citation'])

Extra information about the data is included, as standard, under the keys `info` and `details`.

In [None]:
print(data['info'])
print()
print(data['details'])

And, importantly, for reference you can also check the license for the data:

In [None]:
print(data['license'])