<a href="https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/overview_of_notebooks_keelingcurve.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 style="color: #269142; font-size: 2.6em; font-family:serif">Notebooks to create graphics that appear on the keelingcurve.ucsd.edu website</h1>

The Keeling Curve is a record of atmospheric carbon dioxide concentration measured at the Mauna Loa Observatory, Hawaii, starting in 1958. The Keeling Curve website presents a series of graphics of the Keeling Curve over various time periods, alongside ice core records going back 800,000 years. Each of those plots can be generated by its own notebook.

<h2 style="font-size: 2em; background:#87bcf5; padding:0.2em">Overview</h2>

This notebook, a combination of text and python code, provides background information, instructions, and an introduction to the python code used in the notebooks that create the graphics seen on the home page of the Keeling Curve website [keelingcurve.ucsd.edu](https://keelingcurve.ucsd.edu). The PDF (vector) formats of the graphics can be downloaded from [https://keelingcurve.ucsd.edu/pdf-downloads/](https://keelingcurve.ucsd.edu/pdf-downloads/).

The notebooks listed below are stored in the [Keeling Curve GitHub repository](https://github.com/sio-co2o2/keelingcurve_notebooks). The notebooks can be accessed either there or from the notebook links on this page. 

From the GitHub repository page, the file links can be clicked on and this will open up a preview of the notebook. Look for the Google Colab badge link at the top of each notebook. When this badge is clicked on, the notebook will open in [Google Colab](https://colab.research.google.com/) where it can be run to generate graphics seen on the home page of the Keeling Curve website. All the code in the notebook can be modified without affecting the original version. And any changes can be saved to the user's Google Drive or GitHub repository. The saved notebook can be opened again in Google Colab by clicking on the Google Colab badge link at the top of the notebook. 

Each notebook contains a series of functions written in python which are used to fetch the data, process it for plotting, and then run a plot command to display the plots and enable them to be downloaded locally. Data is "fetched", or downloaded, by each notebook to generate the different plots. Mauna Loa Observatory (MLO) data is fetched from the data folder of the Keeling Curve GitHub repository, and icecore data is fetched from the NCEI website. Some notebooks contain code to generate basic animations of the plots showing CO<sub>2</sub> concentration as a function of time.

### MLO data from the most recent month is preliminary

Mauna Loa carbon dioxide data from the most recent month is preliminary and subject to subsequent updates to account for retrospective calibration and quality control. See [scrippsco2.ucsd.edu](https://scrippsco2.ucsd.edu) for data that has passed these routine quality checks and updates. The datasets are archived once a month on the [scrippsco2.ucsd.edu](https://scrippsco2.ucsd.edu) website and in the Scripps CO2 Program [library archive](https://library.ucsd.edu/dc/collection/bb3381541w) database at UCSD.

### A sample of the plots that can be generated

![title](../images/overview/mlo_full_record.png)

![title](../images/overview/co2_800k.png)

<h2 style="font-size: 2em; background:#87bcf5; padding:0.2em">License</h2>

<h2 style="font-size: 2em; background:#87bcf5; padding:0.2em">Data sharing policy</h2>


The notebooks, data, and graphics in this GitHub repository are made freely available, with the understanding that appropriate credit will be given. For applications supporting peer-reviewed scientific publications, coauthorship may sometimes be appropriate. An example would be if an important result or conclusion depends on this product, such as the first account of a previously unreported phenomenon. Ethical usage requires disclosing intentions at early stages of the work in order to avoid duplicating ongoing studies at Scripps. For applications where coauthorship is not needed, which includes all applications outside of the peer-reviewed scientific literature, it is sufficient to acknowledge the Scripps CO2 program as the source. Please direct queries to Ralph Keeling (rkeeling@ucsd.edu).

<a id="toc"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.2em">Table of Contents</h2>

1. [Notebooks for the Keeling Curve](#notebooks-for-the-keeling-curve)
2. [Using Google Colab](#using-google-colab)
3. [Data Sources and Citations](#data-sources-and-citations)
4. [Notebooks GitHub Repository](#notebooks-github-repository)
5. [Overview of python](#overview-of-python)
6. [Requests package to fetch data files](#requests-package-to-fetch-data-files)
7. [NumPy package to work with numbers](#numpy-package-to-work-with-numbers)
8. [Pandas package to read in and manipulate data](#pandas-package-to-read-in-and-manipulate-data)
9. [Matplotlib package for plotting](#matplotlib-package-for-plotting)
10. [Overview of functions defined in the notebooks](#overview-of-functions-defined-in-the-notebooks)


<a id="notebooks-for-the-keeling-curve"></a>

<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">1. Notebooks for the Keeling Curve</h2>

### Plot the MLO CO<sub>2</sub> record at various time intervals

- [Plot the full MLO Record](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks)

- [Plot the MLO Record one week previous](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_mlo_one_week_keelingcurve.ipynb)

- [Plot the MLO Record one month previous](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_mlo_one_month_keelingcurve.ipynb)

- [Plot the MLO Record six months previous](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_mlo_six_months_keelingcurve.ipynb)

- [Plot the MLO Record one year previous](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_mlo_one_year_keelingcurve.ipynb)

- [Plot the MLO Record two years previous](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_mlo_two_years_keelingcurve.ipynb)

### Plot the combined icecore and MLO CO<sub>2</sub> record at various time intervals

- [Plot icecore and MLO records from 1700 to the present](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_icecore_start_1700_ce_keelingcurve.ipynb)

- [Plot icecore and MLO records back 2000 years from the present](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_icecore_back_2K_ce_keelingcurve.ipynb)

- [Plot icecore and MLO records back 10K years from the present](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_icecore_back_10K_keelingcurve.ipynb)

- [Plot icecore and MLO records back 800K years from the present](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/plot_icecore_back_800K_keelingcurve.ipynb)


[TOC](#toc)

<a id="using-google-colab"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">2. Using Google Colab</h2>

- [Overview of Google Colab](#Overview-of-Google-Colab)
- [Running a notebook](#Running-a-notebook)
- [Saving a notebook](#Saving-a-notebook-and-opening-it-later)

### Overview of Google Colab

To run notebooks served by Google Colab, you need to have a Google account and be logged into it. Without a logged-in account, you can only view the notebook.

Google Colab is a service provided by Google that enables users to run code from Jupyter notebooks on Google's servers. A Jupyter notebook is a file format that combines text and runnable python code in the same browser window. Colab provides a virtual environment with many python packages already installed, so the user doesn't have to set up their own python environment on their computer.

All the code can be modified without affecting the original. If you modify this notebook and want a fresh copy, go to this link [Google Colab link to original notebook](https://colab.research.google.com/github/sio-co2o2/keelingcurve_notebooks/blob/main/notebooks/overview_of_notebooks_keelingcurve.ipynb).

You may see the following warning. It's Google's way of asking for permission to use the notebook, but not so friendly. The notebook is run completely on Google Colab and the notebook is stored in a GitHub Repository. Google Colab either calls notebooks from a GitHub repository or a Google Drive. The only user interaction is a button to download the plots via the browser. This notebook does not read any user information and does not require access to your Google Drive. If you want to save the notebook, you will need to save it to your Google Drive or a GitHub repository. This notebook does not save itself and can only be saved using the Google Colab File menu. 

![title](../images/overview/google_warning.png)

### python environment

A python environment is a way to keep the packages, and the specific package versions, needed by one project separate from those of other projects. To use the installed packages, an import statement is required. This tells Google Colab you want to use these specific packages in this notebook environment.

You can download this notebook from the Keeling Curve GitHub repository and run it on your own computer. It will require setting up a python environment on your computer. Information on how to do this can be found here [find good page to link to]()

#### Running a notebook

At the top of the screen, under the notebook name, click on the menu item "Runtime" and then in the dropdown menu, click on "Run all". This will run all the code in the notebook, which accesses the data, processes it, configures the plot properties, creates the plot, and creates buttons for downloading the image.

#### Parts of a notebook
Text and code seen in the notebook exist in box areas called cells. There are text cells and code cells. The top of the notebook has buttons for adding a markdown (text cell) or code cell.


![title](../images/overview/code_text.png)

#### To delete a cell

When a cell is selected, a menu pops up in the right portion of the cell. To delete the cell, click on the trash can icon.

![title](../images/overview/cell_menu.png)

To access a cell to make changes, hover over the text with your mouse and then double click it; for code cells, just click inside. If you change text or code in a cell, you can either use the menu option 'Run all' again or, while the cell is selected, use the key combination shift+enter to run it. This runs that single cell and not the entire notebook.

When shift+enter is pressed on a cell with a function call in it, the cell will run the function call. Running a cell loads its contents into Colab's memory: a text cell is displayed as formatted text, and a code cell is executed. Cells run in the order in which you run them, which is not necessarily the order in which they appear in the notebook.

Text is written using [Markdown](https://www.markdownguide.org/basic-syntax/) which is a set of text symbols used to create headers, bold text, and other format features. There is no color option or font size options with Markdown. 

![title](../images/overview/run_all_menu.png)

### Saving a notebook and opening it later

To save any changes made to the notebook, click on "Save a copy in Drive" in the File dropdown menu to save the notebook file to your Google Drive. You can also save it to your GitHub repository. If you save to your Google Drive, the notebook will be stored on your drive in the folder called "Colab Notebooks". To open it again in Google Colab, double click on the file link in that folder.


![title](../images/overview/file_menu.png)

[TOC](#toc)

<a id="data-sources-and-citations"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">3. Data Sources and Citations</h2>

### Mauna Loa CO<sub>2</sub> Data
Mauna Loa carbon dioxide data from the most recent month is preliminary and subject to subsequent updates to account for retrospective calibration and quality control. See [scrippsco2.ucsd.edu](https://scrippsco2.ucsd.edu) for data that has passed these routine quality checks and updates. The datasets are archived once a month on the [scrippsco2.ucsd.edu](https://scrippsco2.ucsd.edu) website and the [Scripps CO<sub>2</sub> Program library archive](https://library.ucsd.edu/dc/collection/bb3381541w) at UCSD.

**CO<sub>2</sub> data from 1958 onward are from the Scripps CO<sub>2</sub> program**

Site: [http://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record](http://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record)

DOI: [http://doi.org/10.6075/J08W3BHW](http://doi.org/10.6075/J08W3BHW)

Citation: C. D. Keeling, S. C. Piper, R. B. Bacastow, M. Wahlen, T. P. Whorf, M. Heimann, and H. A. Meijer, Exchanges of atmospheric CO2 and 13CO2 with the terrestrial biosphere and oceans from 1978 to 2000. I. Global aspects, SIO Reference Series, No. 01-06, Scripps Institution of Oceanography, San Diego, 88 pages, 2001. [http://escholarship.org/uc/item/09v319r9](http://escholarship.org/uc/item/09v319r9)



### Icecore Data

**CO<sub>2</sub> data before 1958 going back 2000 years**

Site: [https://www.ncei.noaa.gov/access/paleo-search/study/9959](https://www.ncei.noaa.gov/access/paleo-search/study/9959)

DOI: [https://doi.org/10.1029/2006GL026152](https://doi.org/10.1029/2006GL026152)

Dataset: [https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/law/law2006.txt](https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/law/law2006.txt)

Citation: MacFarling Meure, C., D. Etheridge, C. Trudinger, P. Steele, R. Langenfelds, T. van Ommen, A. Smith, and J. Elkins. 2006. The Law Dome CO2, CH4 and N2O Ice Core Records Extended to 2000 years BP. Geophysical Research Letters, Vol. 33, No. 14, L14810 10.1029/2006GL026152.

**CO<sub>2</sub> data before 1958 going back 800,000 years**

Site: [https://www.ncei.noaa.gov/access/paleo-search/study/6091](https://www.ncei.noaa.gov/access/paleo-search/study/6091)

DOI: [https://doi.org/10.1038/nature06949](https://doi.org/10.1038/nature06949)

Dataset: [https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc-co2-2008.txt](https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc-co2-2008.txt)

Citation: Lüthi, D., M. Le Floch, B. Bereiter, T. Blunier, J.-M. Barnola, U. Siegenthaler, D. Raynaud, J. Jouzel, H. Fischer, K. Kawamura, and T.F. Stocker. 2008. High-resolution carbon dioxide concentration record 650,000-800,000 years before present. Nature, Vol. 453, pp. 379-382, 15 May 2008.


[TOC](#toc)

<a id="notebooks-github-repository"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">4. Notebooks GitHub Repository</h2>

The Keeling Curve Google Colab Notebooks can be found at the GitHub repository [keelingcurve_notebooks](https://github.com/sio-co2o2/keelingcurve_notebooks)

This repository contains Jupyter notebooks that open in Google Colab, a UCSD/SIO logo used in the MLO plots, and MLO data used to create the plots. The data is updated when there is a tweet from the [Keeling Curve twitter](https://twitter.com/Keeling_curve) account, which occurs nearly daily. Icecore data is fetched from [https://www.ncei.noaa.gov](https://www.ncei.noaa.gov). The last month of MLO data is preliminary and is subject to corrections due to factors discussed in the [data sources](#data-sources-and-citations) section.

[TOC](#toc)

<a id="overview-of-python"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">5. Overview of python</h2>

- [Python code basics](#Python-code-basics)
- [Python functions](#Python-functions)
- [Importing code packages](#Importing-code-packages)
- [requests package to fetch data files](#requests-package-to-fetch-data-files)
- [NumPy package to work with numbers](#NumPy-package-to-work-with-numbers)
- [pandas package to read in and manipulate data](#pandas-package-to-read-in-and-manipulate-data)
- [matplotlib package for plotting](#matplotlib-package-for-plotting)


### Python code basics

Python code basics include numbers, strings (characters enclosed in quotes), collections of numbers and strings (lists and dicts), looping and true/false conditions, and functions. 

With Google Colab notebooks, python works right away and doesn't have to be installed. The Google Colab environment includes python and multiple packages.

An overview of python is at [https://www.knowledgehut.com/tutorials/python-tutorial](https://www.knowledgehut.com/tutorials/python-tutorial) and [https://www.codecademy.com/learn/learn-python-3](https://www.codecademy.com/learn/learn-python-3) along with many YouTube videos and online courses.  

#### lists

In python, keeping track of a series of numbers can either be done with a "list" or a "NumPy array". The NumPy array is better for math, and the list is good for basic use. The NumPy package is discussed below. A python list is a way of representing a collection of comma separated values inside a pair of square brackets '[]'. For example, [1, 2, 3, 4, 5] and ['one', 'two', 'three']. And a list can contain both numbers and strings at the same time. The first element is at index 0 since python starts at 0 instead of 1. To get the first element in a list, type the following:

    a_list = [1, 2, 3, 4, 5]
    variable = a_list[0]
    
where variable = 1.

To get a part of a list, use a colon ':' to separate the start and end indices. One tricky detail is that python does not include the element at the end index when selecting with a ':'. So the following code for a slice of a list results in a list with one number and not two.

    a_value = a_list[2:3]
    
The variable a_value = [3]. The number 3 is at index = 2. Python list selections are thus [start_index: end_index] where the value at end_index is not included in the result. Notice that a list is returned when a colon ':' is used, and that a single value of 3 is returned if you type a_list[2]. 

    b_value = a_list[2]
    
Lists are great for collecting similar items together.
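
As a minimal sketch of the indexing and slicing rules above (the variable names and sample values are purely illustrative):

```python
# A list can mix numbers and strings.
mixed = [1, 'two', 3.0]

a_list = [1, 2, 3, 4, 5]

first = a_list[0]        # index 0 is the first element
last = a_list[-1]        # a negative index counts from the end
middle = a_list[1:4]     # indices 1, 2, 3 (index 4 is excluded)
from_start = a_list[:2]  # an omitted start means "from the beginning"

a_list.append(6)         # add an element to the end of the list

print(first, last, middle, from_start, len(a_list))
# prints: 1 5 [2, 3, 4] [1, 2] 6
```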


#### dicts

It's helpful to use names associated with numbers and strings. There are variables of course, but there are also dicts, short for dictionaries. A dict is represented with a pair of curly braces '{}' and is a collection of names (called keys) and their associated values. These values can be numbers, strings, lists, more dicts, etc. An example of a dict is:

    a_dict = {'x': 5}
    b_dict = {'x': 5, 'y': 7}
    
where the names called keys and their values are separated by a colon ':'. Dicts are a helpful way of collecting a lot of variable names and values into one identifier. To get the value of a key, type the following:

    variable = a_dict['x'] 
    
where variable is the value 5. In general, a dict is written as a pair of curly braces enclosing key/value pairs, where each key is joined to its value by a colon ':' and the pairs are separated by commas ','.

    variable = {key1: value1, key2: value2, key3: value3}
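
The sketch below shows a few common dict operations. The dict name, its keys, and its values are hypothetical and chosen only for illustration; they are not taken from the plotting notebooks.

```python
# Hypothetical plot settings collected in one dict (illustrative names only).
plot_settings = {'title': 'A plot', 'linewidth': 1.5, 'limits': [0, 10]}

title = plot_settings['title']        # look up a value by its key
plot_settings['color'] = 'black'      # add a new key/value pair
plot_settings['linewidth'] = 2.0      # replace an existing value

# .get() returns a fallback value instead of raising an error for a missing key.
fontsize = plot_settings.get('fontsize', 12)

print(title, plot_settings['linewidth'], fontsize, 'color' in plot_settings)
# prints: A plot 2.0 12 True
```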



#### tuples

Tuples are another way of representing a collection. Tuples are different from lists and use a pair of parentheses '()'. Tuples are ordered, unchangeable, and allow duplicate values. You can read more about tuples here [https://www.geeksforgeeks.org/python-tuples/](https://www.geeksforgeeks.org/python-tuples/).
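
A short sketch of what "ordered and unchangeable" means in practice, using made-up values:

```python
pair = ('x', 5)            # a tuple holding a name and a number

name, number = pair        # "unpacking" assigns each element to a variable
first = pair[0]            # indexing works the same way as for lists

is_changeable = True
try:
    pair[0] = 'y'          # trying to change an element raises a TypeError
except TypeError:
    is_changeable = False

print(name, number, first, is_changeable)
# prints: x 5 x False
```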

#### true/false conditions

In programming, there are many times where a decision has to be made, and to represent these, a True result means the decision was 'yes' and a False result means the decision was 'no'. To represent this decision, the words 'if', 'else', and 'elif' are used. And to separate the decisions to be made, a colon ':' is added after the decision statement. For python to know what code is associated with an if statement, that code is indented, usually by 4 spaces. For example:

    if True:
        print('The decision is yes')
    else:
        print('The decision is no')
        
And if there are more decisions, add in an elif statement.

    variable = 10

    if variable == 5:
        print('The decision is yes, the variable is equal to 5')
    elif variable == 10:
        print('The decision is yes, the variable is equal to 10')   
    else:
        print("The decision wasn't made, so do this instead")

The double equal sign '==' stands for checking if the value on the left is the same as the value on the right. 

And in python, code is indented by 4 spaces after a statement that ends in a colon ':'. The 'print' statement means display the result.
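
Besides '==', conditions can use other comparisons and can be combined. The following is a small illustrative sketch with an arbitrary value:

```python
value = 7

# Comparison operators: == (equal), != (not equal), <, <=, >, >=
if value < 0:
    description = 'negative'
elif value == 0:
    description = 'zero'
elif value > 0 and value < 10:   # 'and' requires both conditions to be True
    description = 'small positive'
else:
    description = 'large positive'

print(description)
# prints: small positive
```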



#### loops

Python represents looping with a 'for' statement. 

To do something for each element of a list, use the following code:

    a_list = ['a', 'b', 'c']
    
    for elem in a_list:
        print(elem)

This will look at each element in the list, set the value to a variable named 'elem' and then do something with that variable. In this case, display it. 

The for statement also works with dicts, but in a different way to get to the keys and values inside.

    a_dict = {'key1': 'value1', 'key2': 'value2'}
    
    for a_key, a_val in a_dict.items():
        print(a_key)
        print(a_val)
        
This will look at each key/value item pair, and then extract out the keys to a variable a_key and the values to a variable a_val. And for each loop, both the key and the value are printed to the display.  



There is also another way to do looping called list comprehension. 

The following for loop

    y = []
    for x in [1,2,3]:
        x = x*2
        y.append(x)
       
is the same as

    y = [x*2 for x in [1,2,3]]

For both, y = [2, 4, 6]
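
Two further looping patterns that often come in handy are sketched here with made-up values: a list comprehension with a condition, and `enumerate`, which gives the index along with each element.

```python
values = [1, 2, 3, 4, 5, 6]

# Keep only the even numbers, then double them.
doubled_evens = [x * 2 for x in values if x % 2 == 0]

# enumerate() gives the index along with each element.
labels = []
for index, x in enumerate(values[:3]):
    labels.append(str(index) + ':' + str(x))

print(doubled_evens)   # prints: [4, 8, 12]
print(labels)          # prints: ['0:1', '1:2', '2:3']
```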



### Python functions

In python, you can type commands without a function, but if you want to contain them and be able to use the same code in many places, functions are necessary. Functions are created using the 'def' keyword followed by the function name and a pair of parentheses '()', and finally a colon ':'. All code of the function is indented so that python knows which code is part of the function. 

    def function_name():
        print("I'm inside a function")
        
and then to call the function elsewhere, type the function name followed by a pair of parentheses '()'.

    function_name()

If there are values to pass to the function, place them inside the parentheses. And to return a calculated value from the function, use the keyword 'return'. If nothing is being returned, then you can skip writing a return statement. You can think of 'def' as standing for 'definition' since a function is being defined.

    def function_name(x):
        y = x + 2
        return y
       
and to call it,

    answer = function_name(5)
    
where the value of answer is 7.
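
Functions can also take several values, and a value can be given a default that is used when the caller leaves it out. The function below is a hypothetical example written for this overview:

```python
def scale(values, factor=2):
    """Return a new list with every element multiplied by factor."""
    return [v * factor for v in values]


doubled = scale([1, 2, 3])             # uses the default factor of 2
tripled = scale([1, 2, 3], factor=3)   # overrides the default by name

print(doubled, tripled)
# prints: [2, 4, 6] [3, 6, 9]
```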

In python, a function has to be defined before it is called. Imagine python reading the program from top to bottom: by the time a line that calls a function runs, the 'def' block for that function must already have been read. In a notebook, this means the cell that defines a function must be run before any cell that calls it. In the example below, one function calls another, and both are defined before the final line calls function_calling_another().

    def function_being_called():
        print("I'm being called")
        
    def function_calling_another():
        function_being_called()

    function_calling_another()
        
and the result is "I'm being called" printed to the console.


### Importing code packages

The import statements at the top of every notebook make external python packages, code written by someone else, available to use in the notebook. Google Colab makes this easy because many common packages are already installed and only need an import statement. If you were to download the Jupyter notebook and use it on your own computer, it is necessary to create a "python environment" which is like a container that holds downloaded code packages. Even though the packages have been downloaded, it is required to use the import statement in a notebook to let python know you will be using them. Google Colab notebooks create the python environment automatically and so packages don't need to be downloaded and installed first, but just imported.
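
The three common forms of the import statement are sketched below. Packages from python's standard library are used here only because they are always available; the notebooks import other packages in the same way.

```python
import math                  # import a whole package
import datetime as dt        # import a package under a shorter alias
from pathlib import Path     # import a single name from a package

root = math.sqrt(16)                # call a function through the package name
day = dt.date(2000, 1, 1)           # call through the alias
suffix = Path('plot.pdf').suffix    # use the imported name directly

print(root, day.year, suffix)
# prints: 4.0 2000 .pdf
```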


[TOC](#toc)

<a id="requests-package-to-fetch-data-files"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">6. Requests package to fetch data files</h2>

The [requests package](https://docs.python-requests.org/en/latest/) enables data to be downloaded from a website link and saved into a variable instead of a file. Well-formatted data can also be read into a variable using the pandas package, which is described below. The notebooks use the requests package instead of pandas for the icecore data because those files are text based with many comments and unneeded data mixed in. The notebooks use the pandas package for the MLO data because it is well formed, with clear comment lines and column names. To use requests to fetch data, use the following code:

    variable = requests.get('file location')
    
But the code doesn't know if the data is to be formatted as text or json, so you have to tell it with the following code:

    variable_text = variable.text
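
A slightly fuller sketch of this pattern follows. The helper names `fetch_text` and `drop_comment_lines` are hypothetical, and the assumption that comment lines start with '#' is only for illustration; the functions in the plotting notebooks may be organized differently. The demonstration at the end uses a small made-up string so that it runs without a network connection.

```python
def fetch_text(url):
    """Download a URL and return its contents as a string."""
    import requests  # imported here so the helper below also works offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()   # raise an error for a 404, 500, etc.
    return response.text


def drop_comment_lines(text, comment_char='#'):
    """Return the non-empty lines of text that do not start with comment_char."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_char):
            kept.append(stripped)
    return kept


# Offline demonstration: a made-up string standing in for response.text.
sample_text = "# a comment\n\nyear value\n1 2\n"
data_lines = drop_comment_lines(sample_text)
print(data_lines)
# prints: ['year value', '1 2']
```

Checking the response with `raise_for_status()` before using `.text` avoids silently processing an error page as if it were data.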
    
[TOC](#toc)

<a id="numpy-package-to-work-with-numbers"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">7. NumPy package to work with numbers</h2>

With the NumPy package, you can work with arrays. Lists are one dimensional (only rows or only columns). And arrays can hold multiple dimensions (rows and columns at the same time along with other dimensions like time, etc.). NumPy arrays can also just be one dimensional. NumPy is used because it is much faster than working with lists. 

As a shortcut, NumPy is imported such that an abbreviation of 'np' can be used.

    import numpy as np
    
    
A python list can be turned into a numpy array by the following

    a_list = [1, 2, 3, 4]
    
    a_numpy_array = np.array(a_list)
    
    
For a two dimensional array, type the following (notice the square brackets '[]' enclosing everything):

    arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
    
NumPy includes many functions that work on arrays. You call a function by placing a dot (.) after np, followed by the function name. For example:
    
    variable = np.shape(arr)

This gives the shape of the array which is (2,4) which means there are two rows and 4 columns. The first dimension = 2 and the second dimension = 4.
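
As an illustrative sketch of why arrays are convenient for math, the example below applies arithmetic to a whole array at once and selects rows and columns by index:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

doubled = arr * 2                 # arithmetic applies to every element at once
first_row = arr[0]                # first row: [1, 2, 3, 4]
second_column = arr[:, 1]         # all rows, column index 1: [2, 6]
column_means = arr.mean(axis=0)   # mean down each column: [3, 4, 5, 6]

print(arr.shape)
print(second_column)
print(column_means)
```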


[TOC](#toc)

<a id="pandas-package-to-read-in-and-manipulate-data"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">8. Pandas package to read in and manipulate data</h2>

The [pandas package](https://pandas.pydata.org/) can be thought of as a python representation of an Excel worksheet. Data is stored in rows and column cells with column headers. Data stored in a variable represented by a combination of columns and rows is referred to as a dataframe. This makes it easy to work with one column at a time or multiple. And it's intuitive to work with data having a label rather than just the numbers themselves.

In the notebooks, the pandas package is called by abbreviating it as pd using the statement: 

    import pandas as pd

Functions associated with the pandas package are called using the format pd followed by a dot (.) and the function name. An example function call is "pd.read_csv(filename)" where filename is the name of a file you want to read in data from. This file can either be found locally or remotely. The function read_csv 'reads' or extracts data from the file and stores it in a variable. To avoid having functions with the same name conflicting with each other, the 'pd' part is used in front of the function name. You'll also notice the variable itself can call pandas functions by using a dot (.) and a function name after it. An example is

    df = pd.read_csv('filename.txt')
    df.head()
    
The first code line reads in the data from the file filename.txt into the variable df (called a dataframe if the data is more than one dimension, meaning rows and columns), and the next code line tells the notebook to display the first 5 lines of the dataframe. A file with the extension '.csv' is a text file containing values separated by commas. Each value separated by a comma in a row is placed in a "column", meaning you can think of it like a spreadsheet column. If there are header lines, the columns are named using them. To get all the data in one column as a variable, use the following code:

    variable = df['column name']
    
To get a subset of a dataframe such as two columns, use the following code (notice the double square brackets):

    variable = df[['column name 1', 'column name 2']]

A single set of square brackets for one column, represents a "Series" or a one dimensional representation of the data. For two columns, a double set of square brackets is used which represents a "DataFrame" or a two dimensional representation of the data.

The row labels of a dataframe are referred to as the index. As an aside, the index is not the same as the line number. An index is a way to label each row so it is unique. For example, after a row is removed from a dataframe with 5 rows, the 4 remaining rows may have an index of 0, 1, 3, 4 instead of 0, 1, 2, 3. Note that in python, an index starts at 0 and not at 1. To get back to row labels that start at 0 and increase by 1, use the reset_index function, e.g. df.reset_index(drop=True).

To convert a pandas dataframe into a numpy array, run
    
    x = df['column_name'].to_numpy()
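
The steps above can be tried on a small made-up table, as sketched below. The column names, the values, and the use of '%' as a comment marker are assumptions for illustration and do not describe the actual MLO data file. The text is read from a string with `io.StringIO` so that no file is needed.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a data file; '%' marks a comment line here.
csv_text = """% this line is a comment
date,co2
2000.0,1.0
2000.5,-1.0
2001.0,3.0
"""

df = pd.read_csv(io.StringIO(csv_text), comment='%')

# Keep only rows with a positive value; the kept rows retain index labels 0 and 2.
valid = df[df['co2'] > 0]

# reset_index(drop=True) relabels the rows 0, 1, ... again.
valid = valid.reset_index(drop=True)

co2 = valid['co2'].to_numpy()
print(list(valid.index), co2)
```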

This is a nice introduction to pandas [https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/](https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/).


[TOC](#toc)

<a id="matplotlib-package-for-plotting"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">9. Matplotlib package for plotting</h2>

Matplotlib is a plotting package that enables plots to display when run in a Jupyter code cell.

#### Setup

For matplotlib commands, the package is imported with the abbreviation 'plt' and then all functions follow with a dot '.' and the function name. The import command is

    import matplotlib.pyplot as plt

To display plots inside the notebook, place the following line near the top of the notebook

    %matplotlib inline
    
This command makes plot outputs appear, and be stored, within the notebook. Google Colab displays plots inline by default, so there the line is optional.


#### Plotting
There are two attributes associated with a plot, the Figure and the axes. 

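The Figure refers to the plot that is created. One way to create a plot and keep track of it so that it can be saved later is to set a variable fig equal to the output of calling plt.figure(), e.g. fig = plt.figure(). Note the lowercase 'f': plt.figure() creates a new figure that the following plt commands draw on, whereas plt.Figure with an uppercase 'F' is the underlying class and is not connected to plt.plot. The variable fig is referred to as a 'handle'. A handle is a way to keep track of the plot so that you can save it later.

    fig = plt.figure()
    
    x = [1,5]
    y = [3,4]
    
    plt.plot(x,y)
    
    filename = 'plot.png'  # example output file name
    fig.savefig(filename)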
    
The axes refers to the graphic settings of the plot. When a plot is created, use a variable, called a handle, to keep track of the axes so that the plot can be configured to customize the graphic. The statements below show one way to get an axes handle, where 'ax' is the axes handle and fig is the figure (plot) handle. To apply a function to the axes, such as creating a label for the axis or setting the limits of the plot, type the name of the axes handle, ax, followed by a dot '.' and then the function name. Below is a function call to set the plot limits of the x axis from 0 to 5.

    fig, ax = plt.subplots()
    
    x = [1,5]
    y = [3,4] 
    
    ax.set_xlim(0, 5)

Notice that 'subplots' is used. In the previous discussion about figures, plt.figure() creates a figure and returns only the figure handle, with no axes handle to configure the plot properties. To get an axes handle, you can either define it right away with fig, ax = plt.subplots() or use fig = plt.figure() followed by ax = fig.add_subplot(111), where 111 means a grid of 1 row and 1 column of plots, with ax being plot number 1 in that grid.

The x and y axes have many configurations that can be set using functions on the axes handle. Some options include setting the limits of the plot, including labels for the x and y axis, and adding a title to the plot.
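
As a minimal sketch of these settings, the example below creates a figure and axes, plots the two points used above, and sets the limits, axis labels, and title through the axes handle. The label text, the title, and the output file name are placeholders chosen for illustration.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Create the figure handle and the axes handle together
fig, ax = plt.subplots()

x = [1, 5]
y = [3, 4]

# Plot on the axes, then customize the plot through the same handle
ax.plot(x, y)
ax.set_xlim(0, 5)
ax.set_ylim(0, 5)
ax.set_xlabel("x values")       # placeholder label
ax.set_ylabel("y values")       # placeholder label
ax.set_title("Example plot")    # placeholder title

# Save through the figure handle (placeholder file name in a temporary folder)
filename = os.path.join(tempfile.gettempdir(), "example_plot.png")
fig.savefig(filename)
```

The figure handle is used for saving, while every visual setting goes through the axes handle.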

Along with setting properties, the axes handle is used to call the plot function. 

    ax.plot(x, y)
    
You can also plot directly without an axes handle, plt.plot(x, y), but customizing the plot is then less direct because pyplot applies each setting to whichever axes is current. 

Another feature of matplotlib is the ability to set font and line width information once for all plots, rather than each time a plot is created. This is done through rcParams. To set the font family, type the following:

    plt.rcParams.update({
        "font.family": "sans-serif"
        })

A great introduction to plotting with Matplotlib is [https://matplotlib.org/stable/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py](https://matplotlib.org/stable/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py).


[TOC](#toc)

<a id="overview-of-functions-defined-in-the-notebooks"></a>
<h2 style="font-size: 2em; background:#87bcf5; padding:0.15em">10. Overview of functions defined in the notebooks</h2>

### Functions to fetch data

Two methods are used to fetch data.

1. One is using Pandas and its function read_csv, which can read a csv file or text file either from the local computer or from a remote location. Here a web address, url, is used since the notebook runs on Google Colab and the data are located in a remote GitHub repository.

    df = pd.read_csv(url)

2. Two is using the requests package, a python package that calls a url and retrieves its content. The response can provide that content in more than one form, so the response is followed by a dot '.' and 'text' here because the file to be fetched is a text file. 

    response = requests.get(icecore_2K_url)
    file_text = response.text
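
Because read_csv accepts a file-like object as well as a path or a url, the first method can be tried without a network connection. In the sketch below a short in-memory string stands in for the remote file. The values and the column names are made up for illustration and are not taken from the real data files.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a remote file (illustrative values only)
csv_text = "date,co2\n2000.0,300.0\n2001.0,301.0\n"

# read_csv accepts a url, a local path, or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

Replacing io.StringIO(csv_text) with a url string gives the remote form shown in method 1.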




### Examples of functions used in the plotting notebooks

This line gets the decimal date of the seasonally adjusted data out of the pandas dataframe and converts it to a numeric numpy array. The plots can actually use pandas dataframes, but numpy arrays were used for consistency since, on some occasions, the data are used in the numpy numeric format outside of plotting.

    mlo_date = df_mlo['date_seas_adj'].to_numpy()

This line takes the file that was read into a string and splits it on the newline character, which results in a list of strings representing the lines of the file. This gets the data into a list form that can be read into a pandas dataframe. A pandas dataframe is a very convenient form to hold and transform the data, such as removing any lines with NaN CO2 values.

    text_lines = file_text.split('\n')

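This line uses a list comprehension (a compact form of for loop) to iterate over the lines of the text file and collect the index of each line that begins with the header at the start of the icecore data to be extracted. The [0] at the end takes the first such index. 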

    start_section = [i for i in range(len(text_lines)) if text_lines[i].startswith('2. CO2 by Core')][0]

This line gets a range of lines from the text file, starting at the row where the data to be retrieved begin and stopping just before the row where the data section ends. 

    data_lines = section_lines[start_data: end_section]
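
The splitting, header search, and slicing steps can be tried together on a small string. The sketch below uses made-up file text, not the real ice core file. The end-of-section marker and the assumption that the data start two lines after the header are hypothetical, and the slice is taken directly from text_lines for simplicity.

```python
# Made-up file text standing in for response.text (not the real ice core file)
file_text = (
    "Notes\n"
    "2. CO2 by Core\n"
    "year co2\n"
    "1000 280.0\n"
    "1500 282.0\n"
    "3. Other section\n"
)

# Split the text into a list of lines
text_lines = file_text.split('\n')

# Index of the first line that starts with the section header
start_section = [i for i, line in enumerate(text_lines)
                 if line.startswith('2. CO2 by Core')][0]

# Hypothetical end marker: the first later line that starts the next section
end_section = [i for i, line in enumerate(text_lines)
               if i > start_section and line.startswith('3.')][0]

# Assumed layout: skip the header line and the column-name line
start_data = start_section + 2

# A slice includes the start index and excludes the end index
data_lines = text_lines[start_data:end_section]
print(data_lines)
```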

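This line creates a regular expression to retrieve numeric values (\d represents a numeric digit). The pattern is written as a raw string (r'...') so that Python passes the backslashes to the regular expression unchanged.

    r = re.compile(r'(.+\d+.*\d+.*\d)\s.*')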

These lines convert a list of strings into a pandas dataframe and then name the column 'data'.

    df_icecore_2K = pd.DataFrame(data_list)
    df_icecore_2K.columns = ['data']
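
As a small illustration of this step, the sketch below builds a one-column dataframe from a made-up list of strings. The follow-on split into numeric columns is an assumption about what a typical next step might look like, and the column name 'co2' is illustrative.

```python
import pandas as pd

# Made-up list of data lines, one string per row
data_list = ["1000 280.0", "1500 282.0"]

# One column of strings, named 'data'
df = pd.DataFrame(data_list)
df.columns = ['data']

# Assumed next step: split each string on whitespace into separate columns
df_split = df['data'].str.split(expand=True)
df_split.columns = ['date_ce', 'co2']   # illustrative column names
df_split = df_split.astype(float)       # convert the strings to numbers

print(df_split)
```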

This line removes any lines with a NaN value of CO2.

    df_icecore_800K = df_icecore_800K.dropna()

This line filters the 800K-year icecore data to keep only the rows dated earlier than min_2K, so that this record ends where the 2K-year record begins.

    df_icecore_800K = df_icecore_800K[df_icecore_800K['date_ce'] < min_2K]
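
The two filtering steps can be seen on a tiny dataframe. This is a sketch with made-up values: the 'co2' column name and the value of min_2K are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values; 'co2' is an illustrative column name
df_icecore_800K = pd.DataFrame({
    'date_ce': [-5000.0, -100.0, 500.0, 1500.0],
    'co2': [260.0, np.nan, 278.0, 281.0],
})

# Assumed earliest date covered by the 2K record
min_2K = 1.0

# Drop every row that contains a NaN in any column
df_icecore_800K = df_icecore_800K.dropna()

# Keep only the rows dated before the start of the 2K record
df_icecore_800K = df_icecore_800K[df_icecore_800K['date_ce'] < min_2K]

print(df_icecore_800K)
```

The comparison inside the square brackets produces a column of True/False values, and only the rows marked True are kept.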

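This line combines two dataframes vertically: the rows of the second dataframe are appended below the rows of the first, and ignore_index=True renumbers the rows from 0. Columns with the same name in both dataframes line up in the result.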

    df_combined = pd.concat([df_combined_icecore, df_mlo], ignore_index=True)
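
A minimal sketch of this call with two made-up dataframes is shown below. The column names and values are illustrative only.

```python
import pandas as pd

# Two small made-up dataframes with the same column names
df_combined_icecore = pd.DataFrame({'date': [1000.0, 1500.0],
                                    'co2': [280.0, 282.0]})
df_mlo = pd.DataFrame({'date': [1960.0, 2000.0],
                       'co2': [317.0, 369.0]})

# By default concat stacks rows; ignore_index=True renumbers them 0..n-1
df_combined = pd.concat([df_combined_icecore, df_mlo], ignore_index=True)

print(df_combined)
```

Without ignore_index=True, the result would keep the original row labels 0, 1, 0, 1.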

This function converts a datetime (a date and time value) into a decimal year.

    dt2t(adatetime)
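
One way such a conversion can work is to add to the year the fraction of that year that has elapsed. The function below is a hypothetical sketch under that assumption, not necessarily the dt2t implementation in the notebooks, and it assumes datetimes without time zone information.

```python
from datetime import datetime


def datetime_to_decimal_year(adatetime):
    """Hypothetical helper: return the year plus the elapsed fraction of it."""
    year = adatetime.year
    start_of_year = datetime(year, 1, 1)
    start_of_next_year = datetime(year + 1, 1, 1)

    # Seconds elapsed since the start of the year, and seconds in the whole year
    elapsed = (adatetime - start_of_year).total_seconds()
    year_length = (start_of_next_year - start_of_year).total_seconds()

    return year + elapsed / year_length


# Noon on 2 July 2021 is exactly halfway through a 365-day year
decimal_date = datetime_to_decimal_year(datetime(2021, 7, 2, 12))
print(decimal_date)
```

Using the actual length of each year keeps leap years consistent, since the fraction always runs from 0 up to just under 1.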


### Functions to configure plot properties

The gradient_fill function is used to apply a color gradient underneath a line.

The function below sets font and line width settings of the plots.

    set_matplotlib_properties

The function below sets properties of the plot like ticks, tick labels, and axes labels.

    set_website_plot_props

The function below customizes how tick labels are displayed. For example, when the x axis is labeled with dates as days, the left and right ticks of each interval represent the start and end of a day, not noon, so the labels are shifted to be centered between the ticks.

    create_xtick_labels
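
One common way to center labels between ticks in Matplotlib is to hide the labels on the major ticks and attach the labels to unmarked minor ticks placed at the midpoints. The sketch below shows that technique with placeholder positions and day names. It is not necessarily how create_xtick_labels is implemented.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter

fig, ax = plt.subplots()
ax.set_xlim(0, 3)

# Major ticks mark the start and end of each day (placeholder positions)
ax.set_xticks([0, 1, 2, 3])
ax.xaxis.set_major_formatter(NullFormatter())   # no labels at day boundaries

# Minor ticks sit at the middle of each day and carry the labels
ax.set_xticks([0.5, 1.5, 2.5], minor=True)
ax.set_xticklabels(["Mon", "Tue", "Wed"], minor=True)   # placeholder labels

# Hide the minor tick marks so that only their labels show
ax.tick_params(axis="x", which="minor", length=0)
```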

The function below applies a title at a custom distance from the top axis and sets its font properties.
    
    add_plot_title

The function below adds arrows at specific years by finding the CO2 value at specific years and pointing to those CO2 values. 

    apply_arrow_annotations

The function below adds a high resolution png UCSD/SIO logo to the plot.

    add_sio_logo



### Functions to save the plot

The function below saves the plots created by matplotlib into PDF and png formats. It saves them at a specified size and resolution.

    save_plot_for_website
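
Saving one figure in both formats at a chosen size and resolution can be sketched as follows. The helper name, its parameters, and the default size and dpi are hypothetical and are not taken from save_plot_for_website.

```python
import os
import tempfile

import matplotlib.pyplot as plt


def save_figure_as_pdf_and_png(fig, basename, width_in=10, height_in=5, dpi=300):
    """Hypothetical helper: save fig as <basename>.pdf and <basename>.png."""
    # Set the figure size in inches before saving
    fig.set_size_inches(width_in, height_in)

    pdf_path = basename + ".pdf"
    png_path = basename + ".png"

    fig.savefig(pdf_path)             # vector format
    fig.savefig(png_path, dpi=dpi)    # raster format at the requested resolution
    return pdf_path, png_path


fig, ax = plt.subplots()
ax.plot([1, 5], [3, 4])

# Placeholder output location in a temporary folder
basename = os.path.join(tempfile.gettempdir(), "example_saved_plot")
pdf_path, png_path = save_figure_as_pdf_and_png(fig, basename)
```

Matplotlib picks the output format from the file extension, so the same savefig call produces either a PDF or a png.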

The function below makes use of Google Colab's ability to download plot files saved in the Google Colab virtual environment. It uses ipywidgets to create a clickable button. 
    
    download_files



[TOC](#toc)