Day 6
Today I decided to learn (again) how to find differentially expressed genes and then generate heatmaps from the RNA-seq data using the ComplexHeatmap library from Bioconductor in R. For this purpose I followed the edgeR tutorial (command edgeRUsersGuide()) for finding the differentially expressed genes, and this tutorial to learn how to generate heatmaps from biological data.
For the data I used GSE93299, RNA-seq of zebrafish ZMEL1 melanoma cells versus BRAF inhibitor-resistant ZMELR1 melanoma cells (actually, I chose this dataset because I couldn't find any appropriate data that fitted the tutorial, and this one could be easily manipulated). I used the raw data file titled 'GSE93299_ZMELR1_HTSeq_Counts_GEO.xlsx'. Since it had an .xlsx extension, I converted it to a .tsv file so it'd be easier to manipulate (for this purpose I also uploaded the data to the ./data folder).
The heatmap was generated from the number of reads. It shows only the top 50 genes by read abundance, and the genes were also clustered.
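The selection step behind the heatmap can be sketched as follows. This is only a minimal illustration in plain Python, not the R/ComplexHeatmap code actually used here. It assumes a tab-separated count table whose first column is the gene ID and whose remaining columns are integer read counts per sample; the function name `top_genes_by_reads` and the tiny example table are made up for illustration.

```python
import csv
import io


def top_genes_by_reads(tsv_text, n=50):
    """Return the n gene IDs with the highest total read count across samples.

    Assumes: first column = gene ID, remaining columns = integer counts.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row (gene ID column + sample names)
    totals = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene, counts = row[0], row[1:]
        totals[gene] = sum(int(c) for c in counts)
    # Rank genes by total reads, most abundant first
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:n]


# Made-up example table, not the GSE93299 data
example = "gene\tsampleA\tsampleB\ng1\t10\t5\ng2\t100\t80\ng3\t0\t1\n"
print(top_genes_by_reads(example, n=2))  # ['g2', 'g1']
```

The rows kept by such a filter would then be passed to the heatmap function, which takes care of the clustering.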
In the meantime I also:
- Took a look at the Text Mining in R article[1]
- Took a look at heatmap examples on Bioconductor and tried to understand heatmaps more deeply[2][3][4][5][6]
- Learned about clustering RNA-seq data[7][8][9][10][11]
- Tried to understand and analyze how the GEOparse code works, and read its documentation[12]
- Discovered new databases for cancer data[13]
- Learned how to distinguish RPKM from FPKM, and raw count data from normalized data and log2-transformed data.
- Took a look at the SRA database[14]
and here comes the best part:
- I got to learn Python pandas again[15]
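For the RPKM/FPKM and counts/normalized/log2 item in the list above, here is a small sketch of how the quantities relate, using the standard definitions as I understand them. RPKM divides the read count by gene length (in kilobases) and by library size (in millions of mapped reads). FPKM uses the same formula but counts fragments (read pairs) instead of individual reads, so the two coincide for single-end data. The function names and the example numbers are made up for illustration.

```python
import math


def cpm(count, library_size):
    """Counts per million: normalizes for sequencing depth only."""
    return count * 1e6 / library_size


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads.

    Normalizes for both gene length and sequencing depth. FPKM has the
    same form, with fragments counted instead of reads.
    """
    return count * 1e9 / (gene_length_bp * library_size)


def log2_transform(value, pseudocount=1.0):
    """log2 of a value plus a pseudocount, so zero counts stay defined."""
    return math.log2(value + pseudocount)


# Made-up numbers: 500 reads on a 2,000 bp gene in a 10-million-read library
raw = 500
print(cpm(raw, 10_000_000))         # 50.0
print(rpkm(raw, 2000, 10_000_000))  # 25.0
print(log2_transform(3))            # 2.0
```

Raw counts are what edgeR expects as input, while normalized and log2-transformed values are usually the ones plotted in heatmaps.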