Skip to content

rforbiodatascience21/2021_group20_final_project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

R Notebook

2021_group20_final_project

Final project for the course “22100 - R for Bio Data Science”. The contributors to the project are Mikkel Spallou Eriksen, Rut Mas de Les Valls, Anna Oliver Almirall, Paul Jeremy Simon and Elisa Catafal Tardos.

Dataset

Original paper : Yeoh EJ, Ross ME, Shurtleff SA, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002;1(2):133-143. doi:10.1016/s1535-6108(02)00032-6

The dataset has been downloaded from

library(stjudem)

It has been tidied to reach a clean dataset similar to the one given here.

Outline

All R scripts are available in the folder R/
1. 01_load loads the dataset and writes it into tsv.gz files
2. 02_clean filters cancer types and convert probe names into gene names
3. 03_pre_analysis selects the top40 differetially expressed gene for each cancer subtype
4. 04_plot performs several kind of statistical descriptive plots
5. 05_plots_heatmap aims at reproducing heatmaps from the paper
6. 06_kmeans performs a k means clustering on raw dataset and top40 genes dataset
7. 07_pca performs a PCA on raw dataset and top40 genes dataset
8. 08_regression.R performs a simple multinomial regression

A shiny app with plots has been created and is available here.

Presentation

The final presentation can be found under doc/presentation.html

Releases

No releases published

Packages

No packages published

Languages