Skip to content

leofabrizio/somatic_mutation_ml

Repository files navigation

somatic_mutation_ml

Classification and prediction of cancer type using somatic mutation profiles and machine learning approaches

This repository contains a dataset including information about somatic mutations in cancer patients as well as the type of cancer for each patient. In particular, data nare contained in three text files:

snvs.txt -> mutation file for each patient. Each row is assigned to a specific patient. The first element of the row is the patient ID whereas the following elements are the names of mutated genes in that patient. Each element is separated by a TAB character.

samples_labels.txt -> each row is a patient. The first element of the row is the patient ID whereas the second element is the cancer type.

Compendium_Cancer_Genes.txt -> it is the list of genes that are considered as relevant in cancer development. This list is useful in case it would not be possible to consider all the genes in the analyses.

Please see the following website for more details: https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga

The project aimed at answering to three different questions, i.e.:

1) Is it possible to predict cancer type based on genes with somatic mutation in a patient?

2) Is there a «small» set of genes having a good predictive power, or at least as good as the entire set of genes?

3) Does the patient grouping based on similarity of mutated genes reflect the grouping based on cancer type?

Using R software I answered to these three questions. The file preprocessing.R contains the R code I wrote to do some data preprocessing whereas the file analyses.R contains the R code I wrote to answer the research questions using machine learning methods.

Finally, the file project_work_ENG.pdf contains a powerpoint presentation highlighting the main results of my analyses.

This project was done and defended as a requirement for the 2nd level Master in "Machine learning and big data for precision medicine and biomedical research" of the Università degli Studi di Padova, Italy.

About

Classification and prediction of cancer type using somatic mutation profiles and machine learning approaches

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages