# Causality analysis - Paleoclimate timeseries

This notebook showcases the current capabilities of Pyleoclim, a Python package for the analysis of paleoclimate data. In particular, it summarizes the effort to include the Liang causality algorithm, implemented in Python by Han Wu as part of the CKIDS DataFest in Spring 2019.

<div class="alert alert-warning" role="alert" style="margin: 10px">
<p>NOTE</p>
<p> The Liang causality algorithm highlighted here is still in development as of April 25th, 2019 and has not yet been released as part of a stable Pyleoclim version. It can be accessed through the development branch of the project on <a href='https://github.com/LinkedEarth/Pyleoclim_util'>GitHub</a>.</p>
</div>  

**Table of Contents**
* [Overview](#overview)
    * [Overall project goals](#autoTS)
    * [Pyleoclim](#pyleoclim)
        * [What is it?](#what)
        * [Current capabilities](#current)
* [Available data](#data)
    * [Paleoclimate Science Version](#sci)
    * [Computer Science Version](#comp)
    * [LinkedEarth](#linkedearth)
* [Correlation](#corr)
* [Causality](#caus)
* [Future Work](#future)
    * [Testing the Liang Algorithm](#testing)
    * [Pyleoclim updates](#updates)
    * [Applying Pyleoclim (autoTS) to a science problem](#application) 

## <a name='overview'> Overview </a>
### <a name='autoTS'> Overall Project Goals </a>
### <a name = 'pyleoclim'> Pyleoclim </a>
#### <a name = 'what'> What is it? </a>

Pyleoclim is a Python package primarily geared towards the analysis and visualization of paleoclimate data. Such data often come in the form of **timeseries** with missing values and age uncertainties, so the package includes several low-level methods to deal with these issues, as well as high-level methods that re-use those within scientific workflows.

High-level modules assume that data are stored in the Linked Paleo Data (<a href='http://www.clim-past.net/12/1093/2016/'>LiPD</a>) format and make extensive use of the <a href='http://nickmckay.github.io/LiPD-utilities/'>LiPD utilities</a>.

#### <a name = 'current'>Current Capabilities</a>
* Plotting maps, timeseries, and basic age model information - <span style="background-color: #FFFF00">Migration from Basemap to Cartopy during DataFest Spring 2019</span>
* Binning and interpolation
* Standardization
* Paleo-aware correlation analysis (isopersistent, isospectral and classical t-test)
* Weighted wavelet Z transform (WWZ) and associated coherence analysis
* Age modelling through Bchron, a Bayesian algorithm
* Liang information flow analysis - <span style="background-color: #FFFF00">NEW from DataFest Spring 2019</span>
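
As a rough illustration of what the first capabilities in this list involve (binning, interpolation and standardization of an unevenly spaced timeseries), here is a minimal NumPy sketch. It does not use or reproduce the Pyleoclim API: the function names `bin_series`, `interp_series` and `standardize`, as well as the toy values, are assumptions made for this example only.

```python
import numpy as np

def bin_series(t, y, bin_width):
    """Average y inside consecutive time bins of width bin_width."""
    n_bins = max(1, int(np.ceil((t.max() - t.min()) / bin_width)))
    edges = t.min() + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(t, bins=edges)
    sums, _ = np.histogram(t, bins=edges, weights=y)
    means = np.full(n_bins, np.nan)  # bins without samples stay NaN
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    centers = edges[:-1] + bin_width / 2
    return centers, means

def interp_series(t, y, step):
    """Linearly interpolate y onto an evenly spaced time axis."""
    n = int(np.floor((t.max() - t.min()) / step)) + 1
    t_even = t.min() + step * np.arange(n)
    return t_even, np.interp(t_even, t, y)

def standardize(y):
    """Remove the mean and scale to unit standard deviation."""
    return (y - np.mean(y)) / np.std(y)

# Toy, made-up series with uneven time spacing (not real proxy data)
t = np.array([0.0, 1.0, 2.5, 3.0, 7.0, 8.5, 9.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

centers, means = bin_series(t, y, bin_width=5.0)
t_even, y_even = interp_series(t, y, step=1.0)
z = standardize(y_even)
print(centers, means)
```

Binning trades resolution for robustness (each bin averages whatever samples fall in it), whereas interpolation keeps the nominal resolution but invents values inside long gaps, such as the one between `t = 3` and `t = 7` here.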

```python
import pyleoclim as pyleo
```

## <a name='data'> Available Data </a>
### <a name='sci'> Paleoclimate Science Version </a>

Climate observations prior to the instrumental era are necessarily indirect. These observations are made on climate proxies in various geological (e.g. lake or marine sediments, living or fossil coral reefs, cave deposits), glaciological (ice cores or snow pits) or biological (trees) archives. Many types of data can often be collected from each archive, each sensing a different aspect of the environment (sometimes, several aspects at once). A paleoclimate dataset is almost always a time series of observations made on an archive.


<img src="http://wiki.linked.earth/wiki/images/5/5f/PSM.jpg">
<i>Conceptual proxy system model after Evans et al. (2013). An archive is the medium in which the response of a sensor to environmental forcing is recorded. Source: <a href='http://wiki.linked.earth/Climate_Proxy'> LinkedEarth Wiki </a></i>


<img src="https://www2.usgs.gov/landresources/lcs/paleoclimate/images/arch.png" height="600" width="600">
<i> Examples of climate proxies. Source: <a href='https://www2.usgs.gov/landresources/lcs/paleoclimate/'>USGS</a>.</i>

### <a name='comp'> Computer Science Version </a>

* **What? Timeseries data**
    * The sampling rate varies from sample to sample and is usually uneven, which requires either imputation or methods designed to deal with unevenly spaced data.
* **How? LiPD**
    * The Linked Paleo Data (LiPD) format is a universally readable container that organizes data and metadata in a uniform way. The data are stored in CSV files, while the metadata are stored in JSON-LD.
    
<img src="http://wiki.linked.earth/wiki/images/d/db/Lipd_structure.png" height="600" width="600">
<i> Structure of a LiPD file. Source: <a href='http://wiki.linked.earth/Linked_Paleo_Data'> LinkedEarth Wiki </a> </i>
    
* **Where? LinkedEarth**
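
To make the split between CSV data and JSON metadata described under "How? LiPD" more concrete, the following sketch pairs a small header-less CSV table with a JSON description of its columns, using only the Python standard library. It is not the LiPD schema and does not use the LiPD utilities: the field names (`dataSetName`, `columns`, `number`, `variableName`, `units`), the helper `load_record` and all values are hypothetical and chosen for illustration only.

```python
import csv
import io
import json

# Hypothetical metadata; field names are illustrative, not the LiPD schema.
metadata_text = """
{
  "dataSetName": "ExampleSite",
  "archiveType": "coral",
  "columns": [
    {"number": 1, "variableName": "year", "units": "AD"},
    {"number": 2, "variableName": "d18O", "units": "permil"}
  ]
}
"""

# Made-up, header-less data table: the meaning of each column
# lives in the metadata above, not in the CSV itself.
data_text = "1900,-4.9\n1901,-5.1\n1903,-5.0\n"

def load_record(metadata_text, data_text):
    """Attach column names and units from the metadata to the CSV values."""
    metadata = json.loads(metadata_text)
    rows = list(csv.reader(io.StringIO(data_text)))
    series = {}
    for col in metadata["columns"]:
        i = col["number"] - 1  # metadata column numbers are 1-based here
        series[col["variableName"]] = {
            "units": col["units"],
            "values": [float(row[i]) for row in rows],
        }
    return metadata["dataSetName"], series

name, series = load_record(metadata_text, data_text)
print(name, series["d18O"]["units"], series["d18O"]["values"])
```

Keeping the numbers in plain CSV and everything descriptive in a separate structured file means the table stays readable by any tool, while the metadata can grow without touching the data.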




### 