Haries Ramdhani edited this page Jan 26, 2017 · 2 revisions

Day 6 - January 25, 2017

Today I decided to learn (again) how to find differentially expressed genes and then generate heatmaps from the RNA-Seq data using the ComplexHeatmap library from Bioconductor (R). To find the differentially expressed genes I followed the edgeR user's guide (opened with the command edgeRUsersGuide()), and I followed this tutorial to learn how to generate heatmaps from biological data.

For the data I used GSE93299, RNA-seq of zebrafish ZMEL1 melanoma cells versus BRAF inhibitor resistant ZMELR1 melanoma cells (I chose this dataset because I couldn't find any other data that fitted the tutorial, and this one could be manipulated easily). I used the raw data file titled 'GSE93299_ZMELR1_HTSeq_Counts_GEO.xlsx'. Since it came as an .xlsx file, I converted it to a .tsv file so it would be easier to work with (I also uploaded the data to the ./data folder for this purpose).

The heatmap was generated from the read counts. It shows only the top 50 genes by read abundance, and the genes were also clustered.
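The gene selection behind the heatmap can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code I actually ran (that was done in R): it assumes the .tsv has the gene ID in the first column and one column of raw counts per sample, it reads "abundant" as "largest total read count across samples", and the function names and the tiny example table are made up. The log2 step with a pseudocount is a common choice for heatmap colours, and the clustering itself is left to the plotting library.

```python
import csv
import io
import math


def read_counts(tsv_text):
    """Parse a tab-separated count table.

    Assumed layout: first column is the gene ID, remaining columns
    are per-sample raw read counts.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return samples, counts


def top_genes(counts, n=50):
    """Return the n gene IDs with the largest total read count."""
    ranked = sorted(counts, key=lambda g: sum(counts[g]), reverse=True)
    return ranked[:n]


def log2_matrix(counts, genes, pseudocount=1):
    """Build log2(count + pseudocount) rows for the selected genes."""
    return [[math.log2(v + pseudocount) for v in counts[g]] for g in genes]


# Tiny made-up table (hypothetical gene and sample names) to show the flow.
example = "gene\tS1\tS2\nA\t10\t30\nB\t500\t700\nC\t0\t3\n"
samples, counts = read_counts(example)
selected = top_genes(counts, n=2)
matrix = log2_matrix(counts, selected)
print(samples, selected)
```

With the real file, the text of the .tsv would be passed to `read_counts` and `n` left at 50.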

-h Ubuntu terminal

In the meantime I also:

  • Took a look at the Text Mining in R article[1]
  • Took a look at heatmap examples on Bioconductor and tried to understand heatmaps more deeply[2][3][4][5][6]
  • Learned about clustering RNA-seq data[7][8][9][10][11]
  • Tried to understand and analyze how the GEOparse code works, and read its documentation[12]
  • Discovered new databases for cancer data[13]
  • Learned how to distinguish RPKM from FPKM, and raw count data from normalized data and log2 data.
  • Took a look at the SRA database[14]
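As a rough sketch of the difference between raw counts, normalized values and log2 values from the list above: counts per million (CPM) corrects only for sequencing depth, RPKM additionally corrects for gene length, and FPKM uses the same formula as RPKM but counts fragments (read pairs) instead of single reads. The function names and the numbers below are made up for illustration.

```python
import math


def cpm(count, library_size):
    """Counts per million: scale a raw count by sequencing depth only."""
    return count * 1e6 / library_size


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads.

    FPKM uses the same formula with fragment counts instead of read counts.
    """
    return count * 1e9 / (gene_length_bp * library_size)


def log2_value(value, pseudocount=1.0):
    """log2 transform with a pseudocount so that zero stays defined."""
    return math.log2(value + pseudocount)


# Made-up example: 500 reads on a 2,000 bp gene, 10 million mapped reads.
raw = 500
norm_cpm = cpm(raw, 10_000_000)
norm_rpkm = rpkm(raw, 2000, 10_000_000)
log_rpkm = log2_value(norm_rpkm)
print(raw, norm_cpm, norm_rpkm, log_rpkm)
```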

and here comes the best part:



  • I got to learn Python pandas again[15]