In [1]:
%%html
<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }

  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>

# GSE52431 Analysis Notebook
## Overview
##### Introduction
This notebook contains an analysis of GEO dataset [GSE52431](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52431). It was generated programmatically with the Jupyter Notebook Generator, available at the following [link](https://amp.pharm.mssm.edu/notebook-generator-web/notebook?acc=GSE68719).

##### Sections
The report is divided into two sections:

1. *Data Processing*. This section covers downloading and preprocessing the dataset.
2. *Data Analysis*. This section covers the scripts and computational tools applied to analyze the dataset.

## 1. Data Processing
Here we download the dataset from the ARCHS4 library and load the data and metadata into pandas DataFrames.

In [2]:
%matplotlib inline
import sys
from importlib import reload

import archs4

# code_library_lily is loaded from the local pd-collection/ directory
sys.path.append('pd-collection/')
import code_library_lily

ModuleNotFoundError: No module named 'archs4'
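
The import error above stops every later cell from running. As a minimal sketch, assuming only the standard library, the required modules can be checked up front so the failure is reported before any analysis starts. The helper name `missing_modules` is hypothetical and not part of the notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two non-standard modules imported by this notebook.
required = ["archs4", "code_library_lily"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Because `code_library_lily` is found through `sys.path`, this check is only meaningful after `pd-collection/` has been appended to the path.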

In [None]:
rawcount_dataframe, sample_metadata_dataframe = archs4.fetch_dataset("GSE68719")
rawcount_dataframe.head()

In [None]:
import pandas as pd

data = rawcount_dataframe

# Load the sample metadata, indexed by accession number
sample_characteristics = pd.read_csv('metadata_GSE68719.csv', sep=';',
                                     index_col='Accession #',
                                     names=["Accession #", "Tissue", "Gender",
                                            "Post-mortem interval (PMI)",
                                            "RIN", "Age at Death", "Proteomics Study", "8"])

# Hide the placeholder column "8" when previewing the table
sample_characteristics.drop('8', axis=1).head()
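
The metadata is later used to color samples, which only works if both tables refer to the same samples in the same order. The sketch below uses toy stand-ins and assumes that the count matrix has one column per sample and that those column labels match the metadata index; neither assumption is confirmed by the notebook.

```python
import pandas as pd

# Toy stand-ins for the count matrix (genes x samples) and the metadata table.
counts = pd.DataFrame(
    {"S1": [10, 0, 5], "S2": [3, 7, 1], "S3": [8, 2, 4]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
metadata = pd.DataFrame(
    {"RIN": [8.0, 6.5, 7.1]},
    index=["S3", "S1", "S4"],  # different order, and S4 has no counts
)

# Keep only samples present in both tables, in the column order of the counts.
shared = [sample for sample in counts.columns if sample in metadata.index]
counts_aligned = counts[shared]
metadata_aligned = metadata.loc[shared]

print(counts_aligned)
print(metadata_aligned)
```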

## 2. Data Analysis
Here we analyze the processed dataset.

In [None]:
code_library_lily.sample_barchart(data)

In [None]:
code_library_lily.generate_figure_legend(1, 'Bar chart of the sample sum values.')
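
The internals of `sample_barchart` are not shown here. Going by the figure legend, the plotted quantity is presumably the sum of counts per sample, which could be computed as in this sketch on toy data (assuming genes as rows and samples as columns).

```python
import pandas as pd

# Toy count matrix: genes as rows, samples as columns (an assumption).
counts = pd.DataFrame(
    {"S1": [10, 0, 5], "S2": [3, 7, 1], "S3": [8, 2, 4]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)

# Total counts per sample, largest first, ready to pass to a bar plot.
sample_sums = counts.sum(axis=0).sort_values(ascending=False)
print(sample_sums)
```

Samples whose totals differ strongly from the rest are the ones worth inspecting before normalization.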

In [None]:
code_library_lily.gene_histogram(data)

In [None]:
code_library_lily.generate_figure_legend(2, 'Histogram of the gene median values in log scale.')
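
The quantity described in the legend can be sketched as follows: take the median of each gene across samples, move to a log scale, and bin the values. The pseudocount of 1, the base-10 logarithm, and the number of bins are all assumptions; `gene_histogram` may use different choices.

```python
import numpy as np
import pandas as pd

# Toy count matrix: genes as rows, samples as columns (an assumption).
counts = pd.DataFrame(
    {"S1": [10, 0, 5], "S2": [3, 7, 1], "S3": [8, 2, 4]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)

# Median expression of each gene across all samples.
gene_medians = counts.median(axis=1)

# Log scale; the pseudocount avoids log(0) for genes with a zero median.
log_medians = np.log10(gene_medians + 1)

# Bin the log-medians; these are the bar heights of the histogram.
bin_counts, bin_edges = np.histogram(log_medians, bins=10)
print(bin_counts)
```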

In [None]:
code_library_lily.plot_pca_3d(data, color_by_continuous=sample_characteristics['RIN'])
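
How `plot_pca_3d` prepares the data is not visible in this notebook. The following is a minimal sketch of one common approach, under stated assumptions: log2-transform the counts with a pseudocount, center each gene, and obtain the first three principal components from a singular value decomposition. The toy data are randomly generated and carry no biological meaning.

```python
import numpy as np
import pandas as pd

# Randomly generated toy counts: 50 genes x 6 samples.
rng = np.random.default_rng(0)
counts = pd.DataFrame(
    rng.poisson(lam=20, size=(50, 6)),
    index=[f"gene_{i}" for i in range(50)],
    columns=[f"sample_{j}" for j in range(6)],
)

# Log-transform, then arrange samples as rows and center each gene.
log_counts = np.log2(counts + 1)
X = log_counts.T.to_numpy()
X = X - X.mean(axis=0)

# PCA via SVD: sample coordinates are U * S, variance share is S^2 / sum(S^2).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = S**2 / (S**2).sum()
pca_coords = pd.DataFrame(
    U[:, :3] * S[:3],
    index=counts.columns,
    columns=["PC1", "PC2", "PC3"],
)
print(pca_coords)
```

Coloring the points by a continuous variable such as RIN then amounts to looking up each sample's metadata value by its index label.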

In [None]:
code_library_lily.plot_clustermap(data)

In [None]:
code_library_lily.generate_figure_legend(4, 'Clustermap of the dataset.')
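
Clustermaps of expression data are often drawn on a subset of highly variable genes, z-scored per gene so that colors are comparable across rows. Whether `plot_clustermap` does this is not confirmed by the notebook; the sketch below only illustrates that preprocessing, and `n_top` is a hypothetical parameter.

```python
import numpy as np
import pandas as pd

# Randomly generated toy counts: 40 genes x 5 samples.
rng = np.random.default_rng(1)
counts = pd.DataFrame(
    rng.poisson(lam=30, size=(40, 5)),
    index=[f"gene_{i}" for i in range(40)],
    columns=[f"sample_{j}" for j in range(5)],
)

log_counts = np.log2(counts + 1)

# Select the genes with the highest variance across samples.
n_top = 10  # hypothetical cutoff
top_genes = log_counts.var(axis=1).nlargest(n_top).index
subset = log_counts.loc[top_genes]

# Z-score each gene (row): subtract its mean, divide by its standard deviation.
z_scores = subset.sub(subset.mean(axis=1), axis=0).div(subset.std(axis=1), axis=0)
print(z_scores.round(2))
```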

In [None]:
code_library_lily.plot_correlation_heatmap(data)

In [None]:
code_library_lily.generate_figure_legend(5, 'Correlation heatmap of the dataset.')
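
The matrix behind a sample correlation heatmap can be sketched with pandas alone. The log transform and the choice of Pearson correlation are assumptions; `plot_correlation_heatmap` may differ.

```python
import numpy as np
import pandas as pd

# Toy count matrix: genes as rows, samples as columns (an assumption).
counts = pd.DataFrame(
    {"S1": [10, 0, 5], "S2": [3, 7, 1], "S3": [8, 2, 4]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)

# DataFrame.corr correlates columns, so this yields a samples x samples matrix.
log_counts = np.log2(counts + 1)
sample_corr = log_counts.corr(method="pearson")
print(sample_corr.round(3))
```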

In [None]:
code_library_lily.plot_clustergram(data)

In [None]:
code_library_lily.generate_figure_legend(6, 'Clustergram of the dataset.')