Day 13
I started the day by reading the DESeq2 documentation. I tried to understand what the result (differential expression) data contains and compared it with the data containing normalized read counts to get a grip on what to do next.
Differential Expression Data
Normalized Counts Data
I found that the differential expression data has eight columns. Two of them (EnsemblID and symbol) are the same as the ones in the normalized data, and the other six have to be calculated using statistical methods. To calculate these values I decided to code in a Jupyter Notebook. The code for this analysis is saved in the /src folder as parkinsonDE.ipynb (the notebook will keep being updated).
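The column comparison can be sketched with pandas. This is only a rough illustration: the two tables below are made-up one-row stand-ins, the sample column names are hypothetical, and the six extra column names are assumed to be the standard DESeq2 result columns (the actual files may label them differently).

```python
import pandas as pd

# Hypothetical stand-in for the differential expression table.
# The six statistic columns are assumed to follow DESeq2's standard names.
de = pd.DataFrame(
    {
        "EnsemblID": ["ENSG_A"],
        "symbol": ["geneA"],
        "baseMean": [0.0],
        "log2FoldChange": [0.0],
        "lfcSE": [0.0],
        "stat": [0.0],
        "pvalue": [1.0],
        "padj": [1.0],
    }
)

# Hypothetical stand-in for the normalized counts table (sample names invented).
norm = pd.DataFrame(
    {
        "EnsemblID": ["ENSG_A"],
        "symbol": ["geneA"],
        "sample_1": [10.0],
        "sample_2": [12.0],
    }
)

# Columns present in both tables, and columns that still have to be computed.
shared = [c for c in de.columns if c in norm.columns]
to_compute = [c for c in de.columns if c not in norm.columns]

print("shared:", shared)
print("to compute:", to_compute)
```

Listing the columns this way makes it explicit which values come straight from the normalized data and which ones need a statistical calculation.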
Since the real data is big, I created a copy of the data called test.txt which contains only three different genes, so I don't have to run the code on all of the data until it is fully written. Using the Python libraries pandas and scipy I could calculate several values, such as the base mean, fold change and log 2 fold change, but the values are very different from the DESeq2 results. As far as I know, this is because the values in the DESeq2 results were shrunk for more precision.
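A minimal sketch of this kind of naive calculation, assuming a small table of normalized counts with two control and two disease samples. The gene IDs, sample names, counts and the helper `naive_de` are all invented for illustration and are not taken from test.txt or from the notebook. DESeq2 estimates its log 2 fold changes from a fitted model rather than from a plain ratio of group means, so numbers computed this way are not expected to match its output.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: three genes, normalized counts for four samples.
counts = pd.DataFrame(
    {
        "EnsemblID": ["ENSG_A", "ENSG_B", "ENSG_C"],
        "symbol": ["geneA", "geneB", "geneC"],
        "control_1": [100.0, 50.0, 10.0],
        "control_2": [120.0, 50.0, 30.0],
        "disease_1": [200.0, 20.0, 20.0],
        "disease_2": [240.0, 30.0, 20.0],
    }
)

control_cols = ["control_1", "control_2"]
disease_cols = ["disease_1", "disease_2"]


def naive_de(counts, control_cols, disease_cols):
    """Compute unshrunk summary values from normalized counts (illustrative helper)."""
    out = counts[["EnsemblID", "symbol"]].copy()
    # Base mean: mean of the normalized counts over all samples.
    out["baseMean"] = counts[control_cols + disease_cols].mean(axis=1)
    control_mean = counts[control_cols].mean(axis=1)
    disease_mean = counts[disease_cols].mean(axis=1)
    # Plain ratio of group means; a control mean of zero would give inf here.
    out["foldChange"] = disease_mean / control_mean
    out["log2FoldChange"] = np.log2(out["foldChange"])
    return out


result = naive_de(counts, control_cols, disease_cols)
print(result)
```

On this toy table geneA doubles (log 2 fold change of 1), geneB halves (-1) and geneC is unchanged (0), which makes it easy to check the arithmetic by hand before moving to real data.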
I then decided to just analyze the differentially expressed gene data because I can come up with more complicated mathematical equations.