# Data Science | Final Project | Group 02

## Identify epigenetic alterations associated with Alzheimer’s disease and classification of gene expressions between healthy and sick patients

#### Carmen Calle Huerta, Christina Kirschbaum, Pushpa Koirala, Melika Moradi

## Our Project

Alzheimer’s disease is the most prevalent kind of dementia and a fatal brain ailment. More study into this illness might lead to a better understanding of the condition and more effective treatment options. 

In this project, ChIP-seq data for H3K27ac, H3K9ac, H3K122ac and H3K4me1 as well as RNA-seq data from the from the lateral temporal lobe of the human brain for young healthy patients, old heathy patients and patients with Alzheimers disease will be analyzed and the differences between normal aging and cognitive impairment will be explored. The epigenomic or transcriptomic profiles will be analyzed to find relevant epigenetic alterations associated with the disease and help to better understand the molecular pathophysiology underlying.

Afterwards, with Machine Learning models the presence and absence of Alzheimer’s disease based on the data. The models will be built with Support Vector Machines and Random Forest.

Finally, the findings will be compared to them in related papers, we will look into relevant *in vivo* experiments with model organisms and use the Genome Browser to generate ChIP-seq tracks.

For our project we were inspirated by the paper:
Nativio R, Lan Y, Donahue G et al. ["An integrated multi-omics approach identifies
epigenetic alterations associated with Alzheimer’s disease."](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8098004/)

The data is derived from GEO, [Series GSE153875](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE153875), for the CHiP-seq data and the raw RNA-seq data [Series GSE159699](https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA670209&o=acc_s%3Aa). 

## Import packages

First, we import some of the most frequently used packages in Python. [NumPy](https://numpy.org/doc/stable/) for working with arrays, matrices and linear algebra, [pandas](https://pandas.pydata.org/docs/) for data analysis and manipulation, [matplotlib](https://matplotlib.org/) and [seaborn](https://seaborn.pydata.org/) for visualizations.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Data Integration and Preprocessing RNA-seq

In the next step, we import our RNA-seq data from the SubSeries GSE159699 of the SuperSeries GSE153875 with SRA-Toolkit and fastq-dump. To integrate this process into Python, the package subrocess is used.

In [3]:
import subprocess

# sra numbers from accession list of our RNA-seq data from SuperSeries GSE153875 / SubSeries GSE159699
sra_numbers = [
    "SRR12850830", "SRR12850831", "SRR12850832", "SRR12850833", "SRR12850834",
    "SRR12850835", "SRR12850836", "SRR12850837", "SRR12850838", "SRR12850839",
    "SRR12850840", "SRR12850841", "SRR12850842", "SRR12850843", "SRR12850844",
    "SRR12850845", "SRR12850846", "SRR12850847", "SRR12850848", "SRR12850849",
    "SRR12850850", "SRR12850851", "SRR12850852", "SRR12850853", "SRR12850854",
    "SRR12850855", "SRR12850856", "SRR12850857", "SRR12850858", "SRR12850859"
    ]

In [None]:
# code from https://erilu.github.io/python-fastq-downloader/

# this will download the .sra files to ~/ncbi/public/sra/ 
for sra_id in sra_numbers:
    print ("Currently downloading: " + sra_id)
    prefetch = "prefetch " + sra_id
    print ("The command used was: " + prefetch)
    subprocess.call(prefetch, shell=True)

# this will extract the .sra files from above into a folder named 'fastq'
for sra_id in sra_numbers:
    print ("Generating fastq for: " + sra_id)
    fastq_dump = "fastq-dump --outdir fastq --gzip --skip-technical  --readids --read-filter pass --dumpbase --split-3 --clip ~/ncbi/public/sra/" + sra_id + ".sra"
    print ("The command used was: " + fastq_dump)
    subprocess.call(fastq_dump, shell=True)

## Data Integration for ChIP-seq

The .bed and .bw files for the ChIP-seq data were downloaded. They are available as supplementary files of the SuperSeries GSE153875 from GEO. We split them into folders for H3K27ac, H3K9ac, H3K122ac and H3K4me1 (and peaks) to get smaller sets.

In [None]:
# The following code can get the metadata (if needed) for the ChIP-seq data with the library GEOparse (similar to GEOquery).
import GEOparse

gse = GEOparse.get_GEO(geo="GSE153875", destdir="./")

In [None]:
# prints the metadata for the first sample
for gsm_name, gsm in gse.gsms.items():
    print("Name: ", gsm_name)
    print("Metadata:",)
    for key, value in gsm.metadata.items():
        print(" - %s : %s" % (key, ", ".join(value)))
    break

## Quality Control RNA-seq

### FastQC

For the Quality Control of the RNA-seq data, [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was downloaded and reports were generated for every sequence. The reports can be found in the repository in the folder MultiQC_RNA.

In [1]:
import subprocess

path = "/mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/data/"
path_genomeDir = "/mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/GenomeDir/"
path_fastq = "/mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/data/fastq/"

In [None]:
# unzip data
for fastq in sra_numbers:
    gunzip = "gunzip " + path_fastq + fastq + "_pass.fastq.gz"
    subprocess.call(gunzip, shell=True)

In [5]:
for fastq in sra_numbers:
    fastqc = "/home/christina/FastQC/fastqc \
              --readFilesIn " + path_fastq + fastq + "_pass.fastq"
    subprocess.call(fastqc, shell=True)

Unknown option: readfilesin
Started analysis of SRR12850830_pass.fastq
Approx 5% complete for SRR12850830_pass.fastq
Approx 10% complete for SRR12850830_pass.fastq
Approx 15% complete for SRR12850830_pass.fastq
Approx 20% complete for SRR12850830_pass.fastq
Approx 25% complete for SRR12850830_pass.fastq
Approx 30% complete for SRR12850830_pass.fastq
Approx 35% complete for SRR12850830_pass.fastq
Approx 40% complete for SRR12850830_pass.fastq
Approx 45% complete for SRR12850830_pass.fastq
Approx 50% complete for SRR12850830_pass.fastq
Approx 55% complete for SRR12850830_pass.fastq
Approx 60% complete for SRR12850830_pass.fastq
Approx 65% complete for SRR12850830_pass.fastq
Approx 70% complete for SRR12850830_pass.fastq
Approx 75% complete for SRR12850830_pass.fastq
Approx 80% complete for SRR12850830_pass.fastq
Approx 85% complete for SRR12850830_pass.fastq
Approx 90% complete for SRR12850830_pass.fastq
Approx 95% complete for SRR12850830_pass.fastq


Analysis complete for SRR12850830_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850831_pass.fastq
Approx 5% complete for SRR12850831_pass.fastq
Approx 10% complete for SRR12850831_pass.fastq
Approx 15% complete for SRR12850831_pass.fastq
Approx 20% complete for SRR12850831_pass.fastq
Approx 25% complete for SRR12850831_pass.fastq
Approx 30% complete for SRR12850831_pass.fastq
Approx 35% complete for SRR12850831_pass.fastq
Approx 40% complete for SRR12850831_pass.fastq
Approx 45% complete for SRR12850831_pass.fastq
Approx 50% complete for SRR12850831_pass.fastq
Approx 55% complete for SRR12850831_pass.fastq
Approx 60% complete for SRR12850831_pass.fastq
Approx 65% complete for SRR12850831_pass.fastq
Approx 70% complete for SRR12850831_pass.fastq
Approx 75% complete for SRR12850831_pass.fastq
Approx 80% complete for SRR12850831_pass.fastq
Approx 85% complete for SRR12850831_pass.fastq
Approx 90% complete for SRR12850831_pass.fastq
Approx 95% complete for SRR12850831_pass.fastq


Analysis complete for SRR12850831_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850832_pass.fastq
Approx 5% complete for SRR12850832_pass.fastq
Approx 10% complete for SRR12850832_pass.fastq
Approx 15% complete for SRR12850832_pass.fastq
Approx 20% complete for SRR12850832_pass.fastq
Approx 25% complete for SRR12850832_pass.fastq
Approx 30% complete for SRR12850832_pass.fastq
Approx 35% complete for SRR12850832_pass.fastq
Approx 40% complete for SRR12850832_pass.fastq
Approx 45% complete for SRR12850832_pass.fastq
Approx 50% complete for SRR12850832_pass.fastq
Approx 55% complete for SRR12850832_pass.fastq
Approx 60% complete for SRR12850832_pass.fastq
Approx 65% complete for SRR12850832_pass.fastq
Approx 70% complete for SRR12850832_pass.fastq
Approx 75% complete for SRR12850832_pass.fastq
Approx 80% complete for SRR12850832_pass.fastq
Approx 85% complete for SRR12850832_pass.fastq
Approx 90% complete for SRR12850832_pass.fastq
Approx 95% complete for SRR12850832_pass.fastq


Analysis complete for SRR12850832_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850833_pass.fastq
Approx 5% complete for SRR12850833_pass.fastq
Approx 10% complete for SRR12850833_pass.fastq
Approx 15% complete for SRR12850833_pass.fastq
Approx 20% complete for SRR12850833_pass.fastq
Approx 25% complete for SRR12850833_pass.fastq
Approx 30% complete for SRR12850833_pass.fastq
Approx 35% complete for SRR12850833_pass.fastq
Approx 40% complete for SRR12850833_pass.fastq
Approx 45% complete for SRR12850833_pass.fastq
Approx 50% complete for SRR12850833_pass.fastq
Approx 55% complete for SRR12850833_pass.fastq
Approx 60% complete for SRR12850833_pass.fastq
Approx 65% complete for SRR12850833_pass.fastq
Approx 70% complete for SRR12850833_pass.fastq
Approx 75% complete for SRR12850833_pass.fastq
Approx 80% complete for SRR12850833_pass.fastq
Approx 85% complete for SRR12850833_pass.fastq
Approx 90% complete for SRR12850833_pass.fastq
Approx 95% complete for SRR12850833_pass.fastq


Analysis complete for SRR12850833_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850834_pass.fastq
Approx 5% complete for SRR12850834_pass.fastq
Approx 10% complete for SRR12850834_pass.fastq
Approx 15% complete for SRR12850834_pass.fastq
Approx 20% complete for SRR12850834_pass.fastq
Approx 25% complete for SRR12850834_pass.fastq
Approx 30% complete for SRR12850834_pass.fastq
Approx 35% complete for SRR12850834_pass.fastq
Approx 40% complete for SRR12850834_pass.fastq
Approx 45% complete for SRR12850834_pass.fastq
Approx 50% complete for SRR12850834_pass.fastq
Approx 55% complete for SRR12850834_pass.fastq
Approx 60% complete for SRR12850834_pass.fastq
Approx 65% complete for SRR12850834_pass.fastq
Approx 70% complete for SRR12850834_pass.fastq
Approx 75% complete for SRR12850834_pass.fastq
Approx 80% complete for SRR12850834_pass.fastq
Approx 85% complete for SRR12850834_pass.fastq
Approx 90% complete for SRR12850834_pass.fastq
Approx 95% complete for SRR12850834_pass.fastq


Analysis complete for SRR12850834_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850835_pass.fastq
Approx 5% complete for SRR12850835_pass.fastq
Approx 10% complete for SRR12850835_pass.fastq
Approx 15% complete for SRR12850835_pass.fastq
Approx 20% complete for SRR12850835_pass.fastq
Approx 25% complete for SRR12850835_pass.fastq
Approx 30% complete for SRR12850835_pass.fastq
Approx 35% complete for SRR12850835_pass.fastq
Approx 40% complete for SRR12850835_pass.fastq
Approx 45% complete for SRR12850835_pass.fastq
Approx 50% complete for SRR12850835_pass.fastq
Approx 55% complete for SRR12850835_pass.fastq
Approx 60% complete for SRR12850835_pass.fastq
Approx 65% complete for SRR12850835_pass.fastq
Approx 70% complete for SRR12850835_pass.fastq
Approx 75% complete for SRR12850835_pass.fastq
Approx 80% complete for SRR12850835_pass.fastq
Approx 85% complete for SRR12850835_pass.fastq
Approx 90% complete for SRR12850835_pass.fastq
Approx 95% complete for SRR12850835_pass.fastq


Analysis complete for SRR12850835_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850836_pass.fastq
Approx 5% complete for SRR12850836_pass.fastq
Approx 10% complete for SRR12850836_pass.fastq
Approx 15% complete for SRR12850836_pass.fastq
Approx 20% complete for SRR12850836_pass.fastq
Approx 25% complete for SRR12850836_pass.fastq
Approx 30% complete for SRR12850836_pass.fastq
Approx 35% complete for SRR12850836_pass.fastq
Approx 40% complete for SRR12850836_pass.fastq
Approx 45% complete for SRR12850836_pass.fastq
Approx 50% complete for SRR12850836_pass.fastq
Approx 55% complete for SRR12850836_pass.fastq
Approx 60% complete for SRR12850836_pass.fastq
Approx 65% complete for SRR12850836_pass.fastq
Approx 70% complete for SRR12850836_pass.fastq
Approx 75% complete for SRR12850836_pass.fastq
Approx 80% complete for SRR12850836_pass.fastq
Approx 85% complete for SRR12850836_pass.fastq
Approx 90% complete for SRR12850836_pass.fastq
Approx 95% complete for SRR12850836_pass.fastq


Analysis complete for SRR12850836_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850837_pass.fastq
Approx 5% complete for SRR12850837_pass.fastq
Approx 10% complete for SRR12850837_pass.fastq
Approx 15% complete for SRR12850837_pass.fastq
Approx 20% complete for SRR12850837_pass.fastq
Approx 25% complete for SRR12850837_pass.fastq
Approx 30% complete for SRR12850837_pass.fastq
Approx 35% complete for SRR12850837_pass.fastq
Approx 40% complete for SRR12850837_pass.fastq
Approx 45% complete for SRR12850837_pass.fastq
Approx 50% complete for SRR12850837_pass.fastq
Approx 55% complete for SRR12850837_pass.fastq
Approx 60% complete for SRR12850837_pass.fastq
Approx 65% complete for SRR12850837_pass.fastq
Approx 70% complete for SRR12850837_pass.fastq
Approx 75% complete for SRR12850837_pass.fastq
Approx 80% complete for SRR12850837_pass.fastq
Approx 85% complete for SRR12850837_pass.fastq
Approx 90% complete for SRR12850837_pass.fastq
Approx 95% complete for SRR12850837_pass.fastq


Analysis complete for SRR12850837_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850838_pass.fastq
Approx 5% complete for SRR12850838_pass.fastq
Approx 10% complete for SRR12850838_pass.fastq
Approx 15% complete for SRR12850838_pass.fastq
Approx 20% complete for SRR12850838_pass.fastq
Approx 25% complete for SRR12850838_pass.fastq
Approx 30% complete for SRR12850838_pass.fastq
Approx 35% complete for SRR12850838_pass.fastq
Approx 40% complete for SRR12850838_pass.fastq
Approx 45% complete for SRR12850838_pass.fastq
Approx 50% complete for SRR12850838_pass.fastq
Approx 55% complete for SRR12850838_pass.fastq
Approx 60% complete for SRR12850838_pass.fastq
Approx 65% complete for SRR12850838_pass.fastq
Approx 70% complete for SRR12850838_pass.fastq
Approx 75% complete for SRR12850838_pass.fastq
Approx 80% complete for SRR12850838_pass.fastq
Approx 85% complete for SRR12850838_pass.fastq
Approx 90% complete for SRR12850838_pass.fastq
Approx 95% complete for SRR12850838_pass.fastq


Analysis complete for SRR12850838_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850839_pass.fastq
Approx 5% complete for SRR12850839_pass.fastq
Approx 10% complete for SRR12850839_pass.fastq
Approx 15% complete for SRR12850839_pass.fastq
Approx 20% complete for SRR12850839_pass.fastq
Approx 25% complete for SRR12850839_pass.fastq
Approx 30% complete for SRR12850839_pass.fastq
Approx 35% complete for SRR12850839_pass.fastq
Approx 40% complete for SRR12850839_pass.fastq
Approx 45% complete for SRR12850839_pass.fastq
Approx 50% complete for SRR12850839_pass.fastq
Approx 55% complete for SRR12850839_pass.fastq
Approx 60% complete for SRR12850839_pass.fastq
Approx 65% complete for SRR12850839_pass.fastq
Approx 70% complete for SRR12850839_pass.fastq
Approx 75% complete for SRR12850839_pass.fastq
Approx 80% complete for SRR12850839_pass.fastq
Approx 85% complete for SRR12850839_pass.fastq
Approx 90% complete for SRR12850839_pass.fastq
Approx 95% complete for SRR12850839_pass.fastq


Analysis complete for SRR12850839_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850840_pass.fastq
Approx 5% complete for SRR12850840_pass.fastq
Approx 10% complete for SRR12850840_pass.fastq
Approx 15% complete for SRR12850840_pass.fastq
Approx 20% complete for SRR12850840_pass.fastq
Approx 25% complete for SRR12850840_pass.fastq
Approx 30% complete for SRR12850840_pass.fastq
Approx 35% complete for SRR12850840_pass.fastq
Approx 40% complete for SRR12850840_pass.fastq
Approx 45% complete for SRR12850840_pass.fastq
Approx 50% complete for SRR12850840_pass.fastq
Approx 55% complete for SRR12850840_pass.fastq
Approx 60% complete for SRR12850840_pass.fastq
Approx 65% complete for SRR12850840_pass.fastq
Approx 70% complete for SRR12850840_pass.fastq
Approx 75% complete for SRR12850840_pass.fastq
Approx 80% complete for SRR12850840_pass.fastq
Approx 85% complete for SRR12850840_pass.fastq
Approx 90% complete for SRR12850840_pass.fastq
Approx 95% complete for SRR12850840_pass.fastq


Analysis complete for SRR12850840_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850841_pass.fastq
Approx 5% complete for SRR12850841_pass.fastq
Approx 10% complete for SRR12850841_pass.fastq
Approx 15% complete for SRR12850841_pass.fastq
Approx 20% complete for SRR12850841_pass.fastq
Approx 25% complete for SRR12850841_pass.fastq
Approx 30% complete for SRR12850841_pass.fastq
Approx 35% complete for SRR12850841_pass.fastq
Approx 40% complete for SRR12850841_pass.fastq
Approx 45% complete for SRR12850841_pass.fastq
Approx 50% complete for SRR12850841_pass.fastq
Approx 55% complete for SRR12850841_pass.fastq
Approx 60% complete for SRR12850841_pass.fastq
Approx 65% complete for SRR12850841_pass.fastq
Approx 70% complete for SRR12850841_pass.fastq
Approx 75% complete for SRR12850841_pass.fastq
Approx 80% complete for SRR12850841_pass.fastq
Approx 85% complete for SRR12850841_pass.fastq
Approx 90% complete for SRR12850841_pass.fastq
Approx 95% complete for SRR12850841_pass.fastq


Analysis complete for SRR12850841_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850842_pass.fastq
Approx 5% complete for SRR12850842_pass.fastq
Approx 10% complete for SRR12850842_pass.fastq
Approx 15% complete for SRR12850842_pass.fastq
Approx 20% complete for SRR12850842_pass.fastq
Approx 25% complete for SRR12850842_pass.fastq
Approx 30% complete for SRR12850842_pass.fastq
Approx 35% complete for SRR12850842_pass.fastq
Approx 40% complete for SRR12850842_pass.fastq
Approx 45% complete for SRR12850842_pass.fastq
Approx 50% complete for SRR12850842_pass.fastq
Approx 55% complete for SRR12850842_pass.fastq
Approx 60% complete for SRR12850842_pass.fastq
Approx 65% complete for SRR12850842_pass.fastq
Approx 70% complete for SRR12850842_pass.fastq
Approx 75% complete for SRR12850842_pass.fastq
Approx 80% complete for SRR12850842_pass.fastq
Approx 85% complete for SRR12850842_pass.fastq
Approx 90% complete for SRR12850842_pass.fastq
Approx 95% complete for SRR12850842_pass.fastq


Analysis complete for SRR12850842_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850843_pass.fastq
Approx 5% complete for SRR12850843_pass.fastq
Approx 10% complete for SRR12850843_pass.fastq
Approx 15% complete for SRR12850843_pass.fastq
Approx 20% complete for SRR12850843_pass.fastq
Approx 25% complete for SRR12850843_pass.fastq
Approx 30% complete for SRR12850843_pass.fastq
Approx 35% complete for SRR12850843_pass.fastq
Approx 40% complete for SRR12850843_pass.fastq
Approx 45% complete for SRR12850843_pass.fastq
Approx 50% complete for SRR12850843_pass.fastq
Approx 55% complete for SRR12850843_pass.fastq
Approx 60% complete for SRR12850843_pass.fastq
Approx 65% complete for SRR12850843_pass.fastq
Approx 70% complete for SRR12850843_pass.fastq
Approx 75% complete for SRR12850843_pass.fastq
Approx 80% complete for SRR12850843_pass.fastq
Approx 85% complete for SRR12850843_pass.fastq
Approx 90% complete for SRR12850843_pass.fastq
Approx 95% complete for SRR12850843_pass.fastq


Analysis complete for SRR12850843_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850844_pass.fastq
Approx 5% complete for SRR12850844_pass.fastq
Approx 10% complete for SRR12850844_pass.fastq
Approx 15% complete for SRR12850844_pass.fastq
Approx 20% complete for SRR12850844_pass.fastq
Approx 25% complete for SRR12850844_pass.fastq
Approx 30% complete for SRR12850844_pass.fastq
Approx 35% complete for SRR12850844_pass.fastq
Approx 40% complete for SRR12850844_pass.fastq
Approx 45% complete for SRR12850844_pass.fastq
Approx 50% complete for SRR12850844_pass.fastq
Approx 55% complete for SRR12850844_pass.fastq
Approx 60% complete for SRR12850844_pass.fastq
Approx 65% complete for SRR12850844_pass.fastq
Approx 70% complete for SRR12850844_pass.fastq
Approx 75% complete for SRR12850844_pass.fastq
Approx 80% complete for SRR12850844_pass.fastq
Approx 85% complete for SRR12850844_pass.fastq
Approx 90% complete for SRR12850844_pass.fastq
Approx 95% complete for SRR12850844_pass.fastq


Analysis complete for SRR12850844_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850845_pass.fastq
Approx 5% complete for SRR12850845_pass.fastq
Approx 10% complete for SRR12850845_pass.fastq
Approx 15% complete for SRR12850845_pass.fastq
Approx 20% complete for SRR12850845_pass.fastq
Approx 25% complete for SRR12850845_pass.fastq
Approx 30% complete for SRR12850845_pass.fastq
Approx 35% complete for SRR12850845_pass.fastq
Approx 40% complete for SRR12850845_pass.fastq
Approx 45% complete for SRR12850845_pass.fastq
Approx 50% complete for SRR12850845_pass.fastq
Approx 55% complete for SRR12850845_pass.fastq
Approx 60% complete for SRR12850845_pass.fastq
Approx 65% complete for SRR12850845_pass.fastq
Approx 70% complete for SRR12850845_pass.fastq
Approx 75% complete for SRR12850845_pass.fastq
Approx 80% complete for SRR12850845_pass.fastq
Approx 85% complete for SRR12850845_pass.fastq
Approx 90% complete for SRR12850845_pass.fastq
Approx 95% complete for SRR12850845_pass.fastq


Analysis complete for SRR12850845_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850846_pass.fastq
Approx 5% complete for SRR12850846_pass.fastq
Approx 10% complete for SRR12850846_pass.fastq
Approx 15% complete for SRR12850846_pass.fastq
Approx 20% complete for SRR12850846_pass.fastq
Approx 25% complete for SRR12850846_pass.fastq
Approx 30% complete for SRR12850846_pass.fastq
Approx 35% complete for SRR12850846_pass.fastq
Approx 40% complete for SRR12850846_pass.fastq
Approx 45% complete for SRR12850846_pass.fastq
Approx 50% complete for SRR12850846_pass.fastq
Approx 55% complete for SRR12850846_pass.fastq
Approx 60% complete for SRR12850846_pass.fastq
Approx 65% complete for SRR12850846_pass.fastq
Approx 70% complete for SRR12850846_pass.fastq
Approx 75% complete for SRR12850846_pass.fastq
Approx 80% complete for SRR12850846_pass.fastq
Approx 85% complete for SRR12850846_pass.fastq
Approx 90% complete for SRR12850846_pass.fastq
Approx 95% complete for SRR12850846_pass.fastq


Analysis complete for SRR12850846_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850847_pass.fastq
Approx 5% complete for SRR12850847_pass.fastq
Approx 10% complete for SRR12850847_pass.fastq
Approx 15% complete for SRR12850847_pass.fastq
Approx 20% complete for SRR12850847_pass.fastq
Approx 25% complete for SRR12850847_pass.fastq
Approx 30% complete for SRR12850847_pass.fastq
Approx 35% complete for SRR12850847_pass.fastq
Approx 40% complete for SRR12850847_pass.fastq
Approx 45% complete for SRR12850847_pass.fastq
Approx 50% complete for SRR12850847_pass.fastq
Approx 55% complete for SRR12850847_pass.fastq
Approx 60% complete for SRR12850847_pass.fastq
Approx 65% complete for SRR12850847_pass.fastq
Approx 70% complete for SRR12850847_pass.fastq
Approx 75% complete for SRR12850847_pass.fastq
Approx 80% complete for SRR12850847_pass.fastq
Approx 85% complete for SRR12850847_pass.fastq
Approx 90% complete for SRR12850847_pass.fastq
Approx 95% complete for SRR12850847_pass.fastq


Analysis complete for SRR12850847_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850848_pass.fastq
Approx 5% complete for SRR12850848_pass.fastq
Approx 10% complete for SRR12850848_pass.fastq
Approx 15% complete for SRR12850848_pass.fastq
Approx 20% complete for SRR12850848_pass.fastq
Approx 25% complete for SRR12850848_pass.fastq
Approx 30% complete for SRR12850848_pass.fastq
Approx 35% complete for SRR12850848_pass.fastq
Approx 40% complete for SRR12850848_pass.fastq
Approx 45% complete for SRR12850848_pass.fastq
Approx 50% complete for SRR12850848_pass.fastq
Approx 55% complete for SRR12850848_pass.fastq
Approx 60% complete for SRR12850848_pass.fastq
Approx 65% complete for SRR12850848_pass.fastq
Approx 70% complete for SRR12850848_pass.fastq
Approx 75% complete for SRR12850848_pass.fastq
Approx 80% complete for SRR12850848_pass.fastq
Approx 85% complete for SRR12850848_pass.fastq
Approx 90% complete for SRR12850848_pass.fastq
Approx 95% complete for SRR12850848_pass.fastq


Analysis complete for SRR12850848_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850849_pass.fastq
Approx 5% complete for SRR12850849_pass.fastq
Approx 10% complete for SRR12850849_pass.fastq
Approx 15% complete for SRR12850849_pass.fastq
Approx 20% complete for SRR12850849_pass.fastq
Approx 25% complete for SRR12850849_pass.fastq
Approx 30% complete for SRR12850849_pass.fastq
Approx 35% complete for SRR12850849_pass.fastq
Approx 40% complete for SRR12850849_pass.fastq
Approx 45% complete for SRR12850849_pass.fastq
Approx 50% complete for SRR12850849_pass.fastq
Approx 55% complete for SRR12850849_pass.fastq
Approx 60% complete for SRR12850849_pass.fastq
Approx 65% complete for SRR12850849_pass.fastq
Approx 70% complete for SRR12850849_pass.fastq
Approx 75% complete for SRR12850849_pass.fastq
Approx 80% complete for SRR12850849_pass.fastq
Approx 85% complete for SRR12850849_pass.fastq
Approx 90% complete for SRR12850849_pass.fastq
Approx 95% complete for SRR12850849_pass.fastq


Analysis complete for SRR12850849_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850850_pass.fastq
Approx 5% complete for SRR12850850_pass.fastq
Approx 10% complete for SRR12850850_pass.fastq
Approx 15% complete for SRR12850850_pass.fastq
Approx 20% complete for SRR12850850_pass.fastq
Approx 25% complete for SRR12850850_pass.fastq
Approx 30% complete for SRR12850850_pass.fastq
Approx 35% complete for SRR12850850_pass.fastq
Approx 40% complete for SRR12850850_pass.fastq
Approx 45% complete for SRR12850850_pass.fastq
Approx 50% complete for SRR12850850_pass.fastq
Approx 55% complete for SRR12850850_pass.fastq
Approx 60% complete for SRR12850850_pass.fastq
Approx 65% complete for SRR12850850_pass.fastq
Approx 70% complete for SRR12850850_pass.fastq
Approx 75% complete for SRR12850850_pass.fastq
Approx 80% complete for SRR12850850_pass.fastq
Approx 85% complete for SRR12850850_pass.fastq
Approx 90% complete for SRR12850850_pass.fastq
Approx 95% complete for SRR12850850_pass.fastq


Analysis complete for SRR12850850_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850851_pass.fastq
Approx 5% complete for SRR12850851_pass.fastq
Approx 10% complete for SRR12850851_pass.fastq
Approx 15% complete for SRR12850851_pass.fastq
Approx 20% complete for SRR12850851_pass.fastq
Approx 25% complete for SRR12850851_pass.fastq
Approx 30% complete for SRR12850851_pass.fastq
Approx 35% complete for SRR12850851_pass.fastq
Approx 40% complete for SRR12850851_pass.fastq
Approx 45% complete for SRR12850851_pass.fastq
Approx 50% complete for SRR12850851_pass.fastq
Approx 55% complete for SRR12850851_pass.fastq
Approx 60% complete for SRR12850851_pass.fastq
Approx 65% complete for SRR12850851_pass.fastq
Approx 70% complete for SRR12850851_pass.fastq
Approx 75% complete for SRR12850851_pass.fastq
Approx 80% complete for SRR12850851_pass.fastq
Approx 85% complete for SRR12850851_pass.fastq
Approx 90% complete for SRR12850851_pass.fastq
Approx 95% complete for SRR12850851_pass.fastq


Analysis complete for SRR12850851_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850852_pass.fastq
Approx 5% complete for SRR12850852_pass.fastq
Approx 10% complete for SRR12850852_pass.fastq
Approx 15% complete for SRR12850852_pass.fastq
Approx 20% complete for SRR12850852_pass.fastq
Approx 25% complete for SRR12850852_pass.fastq
Approx 30% complete for SRR12850852_pass.fastq
Approx 35% complete for SRR12850852_pass.fastq
Approx 40% complete for SRR12850852_pass.fastq
Approx 45% complete for SRR12850852_pass.fastq
Approx 50% complete for SRR12850852_pass.fastq
Approx 55% complete for SRR12850852_pass.fastq
Approx 60% complete for SRR12850852_pass.fastq
Approx 65% complete for SRR12850852_pass.fastq
Approx 70% complete for SRR12850852_pass.fastq
Approx 75% complete for SRR12850852_pass.fastq
Approx 80% complete for SRR12850852_pass.fastq
Approx 85% complete for SRR12850852_pass.fastq
Approx 90% complete for SRR12850852_pass.fastq
Approx 95% complete for SRR12850852_pass.fastq


Analysis complete for SRR12850852_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850853_pass.fastq
Approx 5% complete for SRR12850853_pass.fastq
Approx 10% complete for SRR12850853_pass.fastq
Approx 15% complete for SRR12850853_pass.fastq
Approx 20% complete for SRR12850853_pass.fastq
Approx 25% complete for SRR12850853_pass.fastq
Approx 30% complete for SRR12850853_pass.fastq
Approx 35% complete for SRR12850853_pass.fastq
Approx 40% complete for SRR12850853_pass.fastq
Approx 45% complete for SRR12850853_pass.fastq
Approx 50% complete for SRR12850853_pass.fastq
Approx 55% complete for SRR12850853_pass.fastq
Approx 60% complete for SRR12850853_pass.fastq
Approx 65% complete for SRR12850853_pass.fastq
Approx 70% complete for SRR12850853_pass.fastq
Approx 75% complete for SRR12850853_pass.fastq
Approx 80% complete for SRR12850853_pass.fastq
Approx 85% complete for SRR12850853_pass.fastq
Approx 90% complete for SRR12850853_pass.fastq
Approx 95% complete for SRR12850853_pass.fastq


Analysis complete for SRR12850853_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850854_pass.fastq
Approx 5% complete for SRR12850854_pass.fastq
Approx 10% complete for SRR12850854_pass.fastq
Approx 15% complete for SRR12850854_pass.fastq
Approx 20% complete for SRR12850854_pass.fastq
Approx 25% complete for SRR12850854_pass.fastq
Approx 30% complete for SRR12850854_pass.fastq
Approx 35% complete for SRR12850854_pass.fastq
Approx 40% complete for SRR12850854_pass.fastq
Approx 45% complete for SRR12850854_pass.fastq
Approx 50% complete for SRR12850854_pass.fastq
Approx 55% complete for SRR12850854_pass.fastq
Approx 60% complete for SRR12850854_pass.fastq
Approx 65% complete for SRR12850854_pass.fastq
Approx 70% complete for SRR12850854_pass.fastq
Approx 75% complete for SRR12850854_pass.fastq
Approx 80% complete for SRR12850854_pass.fastq
Approx 85% complete for SRR12850854_pass.fastq
Approx 90% complete for SRR12850854_pass.fastq
Approx 95% complete for SRR12850854_pass.fastq


Analysis complete for SRR12850854_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850855_pass.fastq
Approx 5% complete for SRR12850855_pass.fastq
Approx 10% complete for SRR12850855_pass.fastq
Approx 15% complete for SRR12850855_pass.fastq
Approx 20% complete for SRR12850855_pass.fastq
Approx 25% complete for SRR12850855_pass.fastq
Approx 30% complete for SRR12850855_pass.fastq
Approx 35% complete for SRR12850855_pass.fastq
Approx 40% complete for SRR12850855_pass.fastq
Approx 45% complete for SRR12850855_pass.fastq
Approx 50% complete for SRR12850855_pass.fastq
Approx 55% complete for SRR12850855_pass.fastq
Approx 60% complete for SRR12850855_pass.fastq
Approx 65% complete for SRR12850855_pass.fastq
Approx 70% complete for SRR12850855_pass.fastq
Approx 75% complete for SRR12850855_pass.fastq
Approx 80% complete for SRR12850855_pass.fastq
Approx 85% complete for SRR12850855_pass.fastq
Approx 90% complete for SRR12850855_pass.fastq
Approx 95% complete for SRR12850855_pass.fastq


Analysis complete for SRR12850855_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850856_pass.fastq
Approx 5% complete for SRR12850856_pass.fastq
Approx 10% complete for SRR12850856_pass.fastq
Approx 15% complete for SRR12850856_pass.fastq
Approx 20% complete for SRR12850856_pass.fastq
Approx 25% complete for SRR12850856_pass.fastq
Approx 30% complete for SRR12850856_pass.fastq
Approx 35% complete for SRR12850856_pass.fastq
Approx 40% complete for SRR12850856_pass.fastq
Approx 45% complete for SRR12850856_pass.fastq
Approx 50% complete for SRR12850856_pass.fastq
Approx 55% complete for SRR12850856_pass.fastq
Approx 60% complete for SRR12850856_pass.fastq
Approx 65% complete for SRR12850856_pass.fastq
Approx 70% complete for SRR12850856_pass.fastq
Approx 75% complete for SRR12850856_pass.fastq
Approx 80% complete for SRR12850856_pass.fastq
Approx 85% complete for SRR12850856_pass.fastq
Approx 90% complete for SRR12850856_pass.fastq
Approx 95% complete for SRR12850856_pass.fastq


Analysis complete for SRR12850856_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850857_pass.fastq
Approx 5% complete for SRR12850857_pass.fastq
Approx 10% complete for SRR12850857_pass.fastq
Approx 15% complete for SRR12850857_pass.fastq
Approx 20% complete for SRR12850857_pass.fastq
Approx 25% complete for SRR12850857_pass.fastq
Approx 30% complete for SRR12850857_pass.fastq
Approx 35% complete for SRR12850857_pass.fastq
Approx 40% complete for SRR12850857_pass.fastq
Approx 45% complete for SRR12850857_pass.fastq
Approx 50% complete for SRR12850857_pass.fastq
Approx 55% complete for SRR12850857_pass.fastq
Approx 60% complete for SRR12850857_pass.fastq
Approx 65% complete for SRR12850857_pass.fastq
Approx 70% complete for SRR12850857_pass.fastq
Approx 75% complete for SRR12850857_pass.fastq
Approx 80% complete for SRR12850857_pass.fastq
Approx 85% complete for SRR12850857_pass.fastq
Approx 90% complete for SRR12850857_pass.fastq
Approx 95% complete for SRR12850857_pass.fastq


Analysis complete for SRR12850857_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850858_pass.fastq
Approx 5% complete for SRR12850858_pass.fastq
Approx 10% complete for SRR12850858_pass.fastq
Approx 15% complete for SRR12850858_pass.fastq
Approx 20% complete for SRR12850858_pass.fastq
Approx 25% complete for SRR12850858_pass.fastq
Approx 30% complete for SRR12850858_pass.fastq
Approx 35% complete for SRR12850858_pass.fastq
Approx 40% complete for SRR12850858_pass.fastq
Approx 45% complete for SRR12850858_pass.fastq
Approx 50% complete for SRR12850858_pass.fastq
Approx 55% complete for SRR12850858_pass.fastq
Approx 60% complete for SRR12850858_pass.fastq
Approx 65% complete for SRR12850858_pass.fastq
Approx 70% complete for SRR12850858_pass.fastq
Approx 75% complete for SRR12850858_pass.fastq
Approx 80% complete for SRR12850858_pass.fastq
Approx 85% complete for SRR12850858_pass.fastq
Approx 90% complete for SRR12850858_pass.fastq
Approx 95% complete for SRR12850858_pass.fastq


Analysis complete for SRR12850858_pass.fastq


Unknown option: readfilesin
Started analysis of SRR12850859_pass.fastq
Approx 5% complete for SRR12850859_pass.fastq
Approx 10% complete for SRR12850859_pass.fastq
Approx 15% complete for SRR12850859_pass.fastq
Approx 20% complete for SRR12850859_pass.fastq
Approx 25% complete for SRR12850859_pass.fastq
Approx 30% complete for SRR12850859_pass.fastq
Approx 35% complete for SRR12850859_pass.fastq
Approx 40% complete for SRR12850859_pass.fastq
Approx 45% complete for SRR12850859_pass.fastq
Approx 50% complete for SRR12850859_pass.fastq
Approx 55% complete for SRR12850859_pass.fastq
Approx 60% complete for SRR12850859_pass.fastq
Approx 65% complete for SRR12850859_pass.fastq
Approx 70% complete for SRR12850859_pass.fastq
Approx 75% complete for SRR12850859_pass.fastq
Approx 80% complete for SRR12850859_pass.fastq
Approx 85% complete for SRR12850859_pass.fastq
Approx 90% complete for SRR12850859_pass.fastq
Approx 95% complete for SRR12850859_pass.fastq


Analysis complete for SRR12850859_pass.fastq


### STAR

Afterwards, we will now preprocess the data with STAR, an aligner for RNA-seq data mapping. Like in the paper of *Nativio et al.*, we will align our RNA-seq reads to the human reference genome (assembly GRCh37.75/hg19) using [STAR](https://github.com/alexdobin/STAR) with default parameters.
The [fasta](http://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/) and the [gtf](http://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/) file used for the genomeGenerate run were downloaded from Ensembl. 

This will create .sam files out of the .fastq files. The logfiles for all sequences are saved in the folder MultiQC_RNA.

In [None]:
# get Genome Index (assembly GRCh37.75)
genomeIndex = "STAR --runThreadN 4 \
               --runMode genomeGenerate \
               --genomeDir " + path_genomeDir + " \
               --genomeFastaFiles " + path + "Homo_sapiens.GRCh37.75.dna_sm.primary_assembly.fa \
               --sjdbGTFfile " + path + "Homo_sapiens.GRCh37.75.gtf"
subprocess.call(genomeIndex, shell=True)

The logfile for genomeGenerate included many warnings for assembly GRCh37.75/h19, so the most recent version available was taken from Ensembl to check. The [fasta](http://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/) (primary assembly soft masked) and the [gtf](http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/) file can be found following the links.

In the check with the most recent version of GRCh38, the warnings were absent. 
The problem was that the patches, which suggest how gaps in the genome sequence can be filled, are not integrated in the version GRCh37.75/h19, but in the most recent version of GRCh38.

In [None]:
# get Genome Index (most recent)
genomeIndex = "STAR --runThreadN 4 \
               --runMode genomeGenerate \
               --genomeDir " + path_genomeDir + " \
               --genomeFastaFiles " + path + "Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa \
               --sjdbGTFfile " + path + "Homo_sapiens.GRCh38.106.gtf"
subprocess.call(genomeIndex, shell=True)

To get rid of the warnings, the patches were removed from the assembly and the Genome Index was built with the new file. Due to the removed patches the read count can be distorted because reads that should align to the patches align somewhere else.

The argument `genomeSAindexNbases` was added and set to 12 according to the formula min(14, log2(GenomeLength)/2-1). The value is 14 by default and typically between 10 and 15. Longer strings use more memory but allow faster searches. Also, `genomeSAsparseD` was added and set to 3 because of the memory. This are also the recommended values for 16GB RAM from the inventor of STAR.

In [None]:
genomeIndex = "STAR --runThreadN 4 \
               --runMode genomeGenerate \
               --genomeDir " + path_genomeDir + " \
               --genomeFastaFiles " + path + "Homo_sapiens.GRCh37.75.dna_sm.primary_assembly.fa \
               --sjdbGTFfile " + path + "Homo_sapiens.GRCh37.75.without_patch.gtf \
               --genomeSAindexNbases 10"
subprocess.call(genomeIndex, shell=True)

In [None]:
# Run mapping
for fastq in sra_numbers:
    mapping = "STAR --runThreadN 4 \
               --genomeDir " + path_genomeDir + " \
               --readFilesIn " + path_fastq + fastq + "_pass.fastq \
               --outFileNamePrefix " + path + fastq + "/"
    subprocess.call(mapping, shell=True)

In the end, the STAR process was killed on several computers in the step "... sorting Suffix Array chunks and saving them to disk...". This also happened with `genomeSAindexNbases 10`. We decided to apply a pseudoalignment with kallisto instead, since it needs much less memory.

### kallisto

First of all, we build an index with [kallisto](https://github.com/pachterlab/kallisto) to from the file [Homo_sapiens.GRCh37.75.cdna.all.fa.gz](http://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna/).

In [4]:
import subprocess

kallistoIndex = "kallisto index -i Homo_sapiens.idx /mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/data/Homo_sapiens.GRCh37.75.cdna.all.fa.gz"
subprocess.call(kallistoIndex, shell=True)


[build] loading fasta file /mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/data/Homo_sapiens.GRCh37.75.cdna.all.fa.gz
[build] k-mer length: 31
        from 1401 target sequences
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 1022307 contigs and contains 101446106 k-mers 



0

In the second step, we align all our RNA sequences. Because we want .bam files for featureCounts, the parameters `genomebam` and `gtf` need to be added.

In [2]:
for fastq in sra_numbers:
    kallisto = "(kallisto quant -t 2 --single -l 200 -s 30 \
                --genomebam --gtf /mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/data/Homo_sapiens.GRCh37.75.without_patch.gtf \
                -i Homo_sapiens.idx \
                -o /mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/data/" + fastq + " \
                /mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/data/fastq/" + fastq + "_pass.fastq) \
                2>" + fastq + ".log"
    subprocess.call(kallisto, shell=True)

### featureCounts

Finally, they used featureCounts in the paper. featureCounts is part of the [Subread/RSubread](http://subread.sourceforge.net/) package.

In [9]:
path = "/mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/"

featurecounts = "(/home/christina/subread-2.0.3-Linux-x86_64/bin/featureCounts -T 2 -t exon -g gene_id \
                 -a " + path + "data/Homo_sapiens.GRCh37.75.gtf \
                 -o " + path + "/MultiQC_RNA/counts.txt \
                 " + path + "/data/*.bam) \
                 2> counts.log"
subprocess.call(featurecounts, shell=True)

0

### MultiQC

Now all the generated reports and logfiles for the RNA-seq will be summarized by the tool [MultiQC](https://multiqc.info/).

Note that MultiQC can not run in JupyterLab because JavaScript is blocked and so the figures in the report are not created. Alternatively, it can run in JupyterNotebook or Visual Studio Code. It can also be applied through running the multiqc command in the terminal.

In [None]:
#multiqc = "multiqc /mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/MultiQC_RNA/"
#subprocess.call(multiqc, shell=True)

import multiqc
multiqc.run("/mnt/c/Users/Christina/Documents/GitHub/DataScience_finalProjekt_Group2/MultiQC_RNA/")

## Quality Control ChIP-seq

To get an insight into the quality of the BigWig files which are used for the peak analysis, they were converted to fastq files so that FastQC could be applied. The following commands were used for the conversion, in this example the file GSM3752862_O-10A-H3K27ac.bw was converted:

`bigWigTowig GSM3752862_O-10A-H3K27ac.bw GSM3752862_O-10A-H3K27ac.wig`

For the `wig2bed` command, the -x was added. `bigWigToWig` generates zero-indexed, half-open WIG files. The option allows to generate coordinate output without reindexing.

`wig2bed -x  < GSM3752862_O-10A-H3K27ac.wig > GSM3752862_O-10A-H3K27ac.bed`

The file [hg19.fa](https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/) was downloaded from the link and unzipped before the first usage of bedtools.

`bedtools getfasta -fi hg19.fa -bed GSM3752862_O-10A-H3K27ac.bed -fo GSM3752862_O-10A-H3K27ac.fa`

The perl script [fasta_to_fastq.pl](https://github.com/ekg/fasta-to-fastq) was downloaded from GitHub.

`perl fasta_to_fastq.pl GSM3752862_O-10A-H3K27ac.fa > GSM3752862_O-10A-H3K27ac.fq`

Afterwards, FastQC was applied and a summary report was created with MultiQC. The reports can be found in the repository.

## Data Analysis RNA-seq

## Data Analysis ChIP-seq