# SUMMARY notebook

This notebook scans the directory in which it lives to find all Jupyter notebooks (other than itself) in that directory. For every notebook it finds, it prints (1) a hyperlink to the notebook and (2) the first cell of the notebook (which is always markdown). This gives you an automatically generated summary of all the notebooks without having to open each one. If you find a notebook that you want to explore further, click on its link to open it.

In [1]:
# Version: 2
import os
import json
from IPython.display import display, Markdown

# the name of this file
this_fname = 'SUMMARY.ipynb'
# maps each notebook file name to the text of its first cell
fname_to_md = {}
for fname in sorted(os.listdir('./')):
    if fname.endswith('.ipynb') and fname != this_fname:
        with open(fname, 'r', encoding="utf-8") as f:
            fdata = json.load(f)
        # first cell of the notebook (assumed to be markdown)
        fname_to_md[fname] = ''.join(fdata['cells'][0]['source'])
# horizontal rule that separates the notebook summaries
pre_sep = '\n\n<hr style="height:10px; background-color: blue;">\n\n'
full_md = ''
num_nb = len(fname_to_md)
project_name = "gene_causal_mapper"
who = "rrtucci"
where = "jupyter_notebooks"
for k, (fname, md) in enumerate(fname_to_md.items(), start=1):
    local_link = f' [<a href="{fname}" target= "_blank">local link</a>] '
    github_link = (
        f' [<a href="https://github.com/{who}/{project_name}/blob/master/{where}/'
        f'{fname}">github link</a>] '
    )
    # header line: file name, both links, and position k/num_nb
    sep = pre_sep + fname + local_link + github_link + f'{k}/{num_nb}\n\n'
    full_md += sep + md
display(Markdown(full_md))
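
The cell above relies on the first cell of every notebook being markdown. A minimal sketch of a more defensive extraction follows, assuming the standard notebook JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`). The helper name `first_markdown_text` and the in-memory example notebook are hypothetical and not part of this repo.

```python
import json


def first_markdown_text(nb_dict):
    """Return the text of the first markdown cell, or '' if there is none.

    `nb_dict` is a notebook that has already been parsed with json.load().
    """
    for cell in nb_dict.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            # `source` may be stored as one string or as a list of strings
            return source if isinstance(source, str) else "".join(source)
    return ""


# Tiny made-up notebook whose first cell is code, not markdown
example_nb = json.loads(
    '{"cells": ['
    '{"cell_type": "code", "source": ["import os\\n"]},'
    '{"cell_type": "markdown", "source": ["# Title\\n", "Some text."]}'
    ']}'
)
print(first_markdown_text(example_nb))
```

With a helper like this, a notebook that starts with a code cell (or has no cells at all) would contribute its first markdown cell or an empty string instead of raising an error or printing code as markdown.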



<hr style="height:10px; background-color: blue;">

dream3_DAG_20.ipynb [<a href="dream3_DAG_20.ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/gene_causal_mapper/blob/master/jupyter_notebooks/dream3_DAG_20.ipynb">github link</a>] 1/2

# Dream3 DAG for 20 genes

In this notebook, we use the Python methods in this repo to find a DAG for 20 yeast genes out of the ~6,000 genes possessed by wild yeast. The same functions can be used to find a DAG for all 5,112 genes measured in the DREAM3 dataset, but in this notebook, for simplicity, we deal with only the first 20. Our goal is to understand the performance of our software for a small number of genes before graduating to all of them.

NOTE: For this notebook to run properly, you must first split the full DREAM3 dataset into 4 "strain" datasets (see `jupyter_notebooks/dream3_dataset.ipynb`).

<hr style="height:10px; background-color: blue;">

exploring-dream3_dataset.ipynb [<a href="exploring-dream3_dataset.ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/gene_causal_mapper/blob/master/jupyter_notebooks/exploring-dream3_dataset.ipynb">github link</a>] 2/2

# Exploring DREAM3 dataset

In this notebook, we explore the DREAM3 dataset.

The third annual DREAM challenge (DREAM3, 2008) was a gene expression
prediction challenge.

The original challenge was to predict a block
of missing data. Here we use the same dataset
for a different purpose: to discover a gene regulatory
network for yeast.

Yeast (Saccharomyces cerevisiae) has about 6,000 genes.

E. coli has about 5,000 genes
but is a prokaryote. Yeast is a eukaryote, like plants and humans.

Humans have ~21,000 genes.

Fruit flies (Drosophila) have ~14,000 genes.

GAT1, GCN4 and LEU3 are TFs (transcription factors, i.e., proteins that bind to a DNA segment and regulate transcription).

4 yeast strains:
1. wild type (wt) 
2. GAT1 deletion strain (gat1$\Delta$)
3. GCN4 deletion strain (gcn4$\Delta$) 
4. LEU3 deletion strain (leu3$\Delta$)

Time course (8 time points):
T = 0, 10, 20, 30, 45, 60, 90 and 120 minutes.

T is the time since 3-aminotriazole (3AT) was added. 3AT is an
inhibitor of an enzyme in the histidine biosynthesis pathway.
There is no 3AT at T = 0.

Data: expression levels for these 4 strains of yeast, with missing data.

Missing data: for the gat1$\Delta$ strain, a block of data of shape 50 by 8 (50 genes and 8 times).

The dataset contains 9,335 rows even though yeast has ~6,000 genes. The reason for
9,335 is explained in `misc/Affymetric-microarray.md`.

The dataset contains 35 columns: (4 strains) X (8 times) + 3 = 35

The 3 extra columns are:
1. probeID
2. geneName
3. L0,  expression level (arbitrary units) for probeID of parental strain at t=0
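
The column count can be checked with a short sketch. Only the counts (4 strains, 8 times, 3 extra columns) come from the description above; the strain labels and the `strain_Ttime` naming pattern are illustrative assumptions and may not match the actual headers of the dataset file.

```python
# Assumed labels for the 4 strains (hypothetical, for illustration only)
strains = ["wt", "gat1", "gcn4", "leu3"]
# The 8 time points, in minutes
times = [0, 10, 20, 30, 45, 60, 90, 120]
# The 3 extra columns listed above
extra_cols = ["probeID", "geneName", "L0"]

# One data column per (strain, time) pair
data_cols = [f"{strain}_T{t}" for strain in strains for t in times]
all_cols = extra_cols + data_cols

print(len(data_cols), len(all_cols))  # 32 35
```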

The other 32 columns contain $\log_2(L/L0)$, where L is the expression level.
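
Because each data column stores the base-2 log ratio of L to L0, an expression level can be recovered as L = L0 * 2^x, where x is the stored value. A minimal sketch with made-up numbers follows; the function names are hypothetical and not taken from this repo.

```python
import math


def log_ratio(level, level0):
    """Base-2 log ratio of an expression level to the reference level L0."""
    return math.log2(level / level0)


def level_from_log_ratio(x, level0):
    """Invert the log ratio: recover L from x = log2(L / L0)."""
    return level0 * 2.0 ** x


L0 = 200.0  # made-up reference level, arbitrary units
print(log_ratio(400.0, L0))            # a doubling gives 1.0
print(level_from_log_ratio(-1.0, L0))  # a halving gives 100.0
```

A value of 0 therefore means no change relative to the parental strain at t=0, positive values mean higher expression, and negative values mean lower expression.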

Slots for missing data have been filled with the string "PREDICT".
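
One way to handle these placeholders when loading the data is to map "PREDICT" to a missing value. The sketch below uses only the standard library and a made-up two-row sample; the data column names, the tab delimiter, and the numbers are assumptions for illustration and may not match the actual file.

```python
import csv
import io

MISSING = "PREDICT"  # placeholder string described above
ID_COLS = ("probeID", "geneName")


def parse_cell(text):
    """Return None for a PREDICT slot, otherwise the value as a float."""
    return None if text.strip() == MISSING else float(text)


# Made-up sample standing in for the real dataset file
sample = (
    "probeID\tgeneName\tgat1_T0\tgat1_T10\n"
    "p1\tgeneA\t0.25\tPREDICT\n"
    "p2\tgeneB\t-1.5\t0.75\n"
)

rows = []
for rec in csv.DictReader(io.StringIO(sample), delimiter="\t"):
    values = {k: parse_cell(v) for k, v in rec.items() if k not in ID_COLS}
    rows.append((rec["probeID"], values))

# Count the slots that still need to be predicted
num_missing = sum(v is None for _, vals in rows for v in vals.values())
print(num_missing)  # 1
```

If the missing block has the 50 by 8 shape described above, the same count on the full dataset would be expected to give 50 x 8 = 400.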


Refs.
-----

https://dreamchallenges.org/dream-3-gene-expression-prediction/

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0008944

https://github.com/s-nakhawa/DREAM3-Gene-Expression-Prediction-Challenge

https://en.wikipedia.org/wiki/DREAM_Challenges