Haries Ramdhani edited this page Feb 7, 2017 · 3 revisions

Day 13 - February 3, 2017

I started the day reading the DESeq2 documentation. I tried to understand what is in the result (differential expression) data and compared it with the data containing the normalized read counts, to get a grip on what to do next.

Differential Expression Data

[Screenshot: Ubuntu terminal]

Normalized Counts Data

[Screenshot: Ubuntu terminal]

I found that the differential expression data has eight columns. Two of them are the same as in the normalized data (EnsemblID and symbol), and the other six have to be calculated using statistical methods. To calculate these values I decided to code in a Jupyter Notebook. The code for this analysis is saved in the /src folder as parkinsonDE.ipynb (the code will keep being updated).
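The column comparison described above can be sketched in pandas. This is only an illustration: the six result column names are an assumption (DESeq2's standard result columns, which this page does not list), the sample column names are hypothetical, and the frames are empty stand-ins rather than the real files.

```python
import pandas as pd

# Empty stand-ins for the two tables. The six result columns are ASSUMED to be
# DESeq2's standard ones; the sample_* column names are hypothetical.
de = pd.DataFrame(columns=[
    "EnsemblID", "symbol",
    "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue", "padj",
])
norm = pd.DataFrame(columns=["EnsemblID", "symbol", "sample_1", "sample_2"])

# Columns present in both tables (the identifiers).
shared = [c for c in de.columns if c in norm.columns]

# Columns that only exist in the differential expression table,
# i.e. the ones that have to be calculated.
de_only = [c for c in de.columns if c not in norm.columns]

print("shared:", shared)
print("to calculate:", de_only)
```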

Since the real data is big, I created a copy called test.txt that contains only three genes, so I don't have to run on the full data until the code is fully written. Using the Python pandas and scipy libraries I could calculate several values such as the base mean, fold change and log2 fold change, but they are very different from the DESeq2 results. As far as I know, this is because the values in the DESeq2 results were shrunk for more precision.
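A minimal sketch of that kind of naive calculation is shown here. It is not the code from parkinsonDE.ipynb: the counts are made-up toy numbers (not the contents of test.txt), the control_*/disease_* column names and the `naive_de` helper are hypothetical, and the fold change is a plain ratio of group means with no shrinkage, so it is not expected to reproduce DESeq2's reported values.

```python
import numpy as np
import pandas as pd

# Toy normalized counts for three genes (made-up numbers, NOT the real test.txt).
# The control_*/disease_* column names are hypothetical.
counts = pd.DataFrame(
    {
        "EnsemblID": ["GENE_A", "GENE_B", "GENE_C"],
        "symbol": ["A", "B", "C"],
        "control_1": [100.0, 50.0, 8.0],
        "control_2": [120.0, 30.0, 8.0],
        "disease_1": [210.0, 40.0, 2.0],
        "disease_2": [230.0, 40.0, 2.0],
    }
)

control_cols = ["control_1", "control_2"]
disease_cols = ["disease_1", "disease_2"]


def naive_de(df, control_cols, disease_cols):
    """Unshrunken per-gene summary: base mean, fold change, log2 fold change."""
    out = df[["EnsemblID", "symbol"]].copy()
    control_mean = df[control_cols].mean(axis=1)
    disease_mean = df[disease_cols].mean(axis=1)
    # Base mean: average normalized count over all samples.
    out["baseMean"] = df[control_cols + disease_cols].mean(axis=1)
    # Fold change: ratio of group means (disease over control).
    out["foldChange"] = disease_mean / control_mean
    out["log2FoldChange"] = np.log2(out["foldChange"])
    return out


result = naive_de(counts, control_cols, disease_cols)
print(result)
```

A gene with a zero control mean would give an infinite or undefined ratio here, which is one reason a plain ratio like this behaves poorly for low counts.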

[Screenshot: Ubuntu terminal]

I then decided to just analyze the differentially expressed gene data, because I can come up with more complicated mathematical equations.
