## About

In light of the success of the ProTargetMiner study (Saei et al., "ProTargetMiner as a proteome signature library of anticancer molecules for functional discovery"), this project will conduct further data mining on the same data set, extended with the proteome of dying cells. The overall goal is to analyze the proteomes of living and dying cells to better understand cell death. Specifically, we want to find proteome signatures of cell death that hold regardless of the treatment or cell line, and to identify different apoptotic processes in the cell. The data set consists of living and dead cells from three cancer cell lines (A549, RKO and MCF-7). Apoptosis was induced with nine different anticancer drugs (8-azaguanine, Raltitrexed, Topotecan, Floxuridine, Nutlin, Dasatinib, Gefitinib, Vincristine and Bortezomib), and an untreated control is included. With 3 cell lines × 10 treatment groups (nine drugs plus control) × 2 states (alive and dead), there are 60 different conditions in total, each measured in three replicates.

For this project there are a couple of things we would like to investigate:

1) The number of apoptotic pathways. It has been suggested that there are at least two major signalling pathways that trigger apoptotic cell death ([Geen et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4665079/)), but we would like to investigate exactly how many there are.
2) Proteome signatures of surviving and dying cells regardless of treatment or cell line. To find these, we would also need to investigate target regulation in dying vs. surviving cells for each treatment and cell line.

The project is related to:

[ProTargetMiner as a proteome signature library of anticancer molecules for functional discovery](https://www.nature.com/articles/s41467-019-13582-8)

[Comparative Proteomics of Dying and Surviving Cancer Cells Improves the Identification of Drug Targets and Sheds Light on Cell Life / Death Decisions](https://pubmed.ncbi.nlm.nih.gov/29572246/)


## Problem

The apoptotic process is related to many human diseases, which may result when cells that should survive die, or when cells that should die survive. Modulating apoptotic processes may therefore offer valuable methods of treatment ([Renehan et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1120576/)), which makes understanding the apoptotic process very important. The dataset used in [Saei et al.](https://www.nature.com/articles/s41467-019-13582-8) proved valuable for uncovering protein signatures related to drug targets and mechanisms of action. Extending this dataset with the proteome of dead cells may therefore help uncover information about the apoptotic process.


## Preliminary Research Question

The project aims to answer the following question:
- Can we uncover information about the apoptotic processes from the extended dataset?

We approach this by answering the following sub-questions:
- Can we differentiate live and dead cells by their proteome, i.e., is it possible to identify protein signatures for the alive and dead states?
- If we can identify the difference between the alive and dead states, what are the up- and down-regulated proteins regardless of drug targets?
- How are drug targets regulated in dying vs. surviving cells?
- Are there specific proteins that remain unaltered throughout?
- (What is the difference in protein correlation between dying and surviving cells?)


## Limitations 
My role in this project will be limited to the data analysis of the provided data set.

## Previous Studies

### The importance of apoptosis for medicine
Apoptosis is the mechanism of controlled cell death. It plays a complementary but opposite role to mitosis (cell division) in the regulation of cell populations ([Kerr et al. 1972](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2008650/)). Understanding apoptosis is of wide interest in the biomedical research community because of its link to many biological processes ([Renehan et al. 2001](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1120576/)), ranging from cellular homeostasis and tumorigenesis ([Renehan et al. 2001](https://pubmed.ncbi.nlm.nih.gov/11264570/)), teratogenesis ([Norimura et al. 1996](https://pubmed.ncbi.nlm.nih.gov/8616719/)), pathogenesis ([Thompson C.B.](https://pubmed.ncbi.nlm.nih.gov/7878464/)) and carcinogenesis ([Wyllie et al. 1999](https://pubmed.ncbi.nlm.nih.gov/10466759/), [O'Connell et al. 1999](https://pubmed.ncbi.nlm.nih.gov/10086376/)), as well as to the development of therapeutic agents ([Nicholson D.W.](https://pubmed.ncbi.nlm.nih.gov/11048733/)).

### Analysis of dying and surviving cells
The hypothesis that dead cells contain valuable information has previously been demonstrated in [Saei et al. 2018](https://pubmed.ncbi.nlm.nih.gov/29572246/). [Chernobrovkin et al. 2015](https://pubmed.ncbi.nlm.nih.gov/26052917/) showed that the abundance change in late apoptosis is exceptional compared to what is expected from the abundances of co-regulated proteins, which further supports the idea that dead cells contain valuable information. In addition, [Chernobrovkin et al. 2016](https://www.nature.com/articles/cddiscovery201668) found that the regulation patterns of the 200-500 most abundant proteins, typically attributed to housekeeping proteins (proteins that every cell type needs and that are highly abundant; [Geiger et al. 2012](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3316730/)), reflect the regulation of the proteins directly interacting with the drug more accurately than any other protein subset grouped by common function or biological process, including cell death. Because the proteome carries information about the mechanics of death, proteomics can be used to assess death modalities ([Chernobrovkin et al. 2016](https://www.nature.com/articles/cddiscovery201668)).

### Data

The data consists of three cell lines (A549, MCF7 and RKO), two states (alive and dead), nine treatments (8-azaguanine, Raltitrexed, Topotecan, Floxuridine, Nutlin, Dasatinib, Gefitinib, Vincristine and Bortezomib) plus a control group, and three replicates of each sample. Each sample contains 11 511 proteins. Proteins are quantified with TMT (tandem mass tag) labelling, which means that the protein abundances are relative to each other rather than absolute.
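
The size of the design can be checked by enumerating it. The sketch below is only an illustration: the string labels (including `"control"` for the untreated group) and the tuple layout are assumptions, not the project's actual sample annotation. It confirms the 60 conditions mentioned in the About section and shows that three replicates give 180 samples.

```python
from itertools import product

# Experimental design as described in the Data section (labels are illustrative).
CELL_LINES = ["A549", "MCF7", "RKO"]
STATES = ["alive", "dead"]
TREATMENT_GROUPS = [
    "8-azaguanine", "Raltitrexed", "Topotecan", "Floxuridine", "Nutlin",
    "Dasatinib", "Gefitinib", "Vincristine", "Bortezomib", "control",
]
REPLICATES = [1, 2, 3]

# One condition per (cell line, state, treatment group) combination.
conditions = list(product(CELL_LINES, STATES, TREATMENT_GROUPS))
# Every condition is measured once per replicate.
samples = list(product(CELL_LINES, STATES, TREATMENT_GROUPS, REPLICATES))

print(len(conditions), len(samples))  # 60 180
```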

### Method

There was a normalization problem with the TMT data, which (to be verified) made it impossible to work with the absolute intensity values.

Should we work with relative values instead?

- Can we differentiate live and dead cells by their proteome, i.e., is it possible to identify protein signatures for the alive and dead states?



- If we can identify the difference between the alive and dead states, what are the up- and down-regulated proteins regardless of drug targets?

For each treatment group, we compute the log fold change (logFC) between treatment and control, separately for the alive and dead states. We then inspect the top-ranked proteins and check which proteins correlate with each other across the samples. When grouped by treatment and state, each protein has 9 intensities (3 cell lines × 3 replicates).
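
A minimal sketch of the ranking and correlation steps, assuming strictly positive relative abundances and a logFC defined as log2 of mean treated over mean control (pooling the 9 values across cell lines and replicates). This pooling is one possible choice rather than the project's confirmed method; a per-cell-line logFC would be a reasonable alternative. The function names and protein IDs are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(treated, control):
    """log2 of mean treated abundance over mean control abundance.

    Assumes strictly positive relative (TMT) abundances, e.g. the 9 values
    (3 cell lines x 3 replicates) for one protein, treatment and state.
    """
    return math.log2(mean(treated) / mean(control))


def rank_by_logfc(treated_by_protein, control_by_protein):
    """Return (protein, logFC) pairs, largest absolute change first."""
    fcs = {
        protein: log2_fold_change(values, control_by_protein[protein])
        for protein, values in treated_by_protein.items()
    }
    return sorted(fcs.items(), key=lambda item: abs(item[1]), reverse=True)


def pearson(x, y):
    """Pearson correlation between two proteins' abundances across samples.

    Raises ZeroDivisionError if either vector is constant.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data with hypothetical protein IDs; 9 values per protein.
treated = {"P1": [4.0] * 9, "P2": [1.0] * 9, "P3": [0.5] * 9}
control = {"P1": [1.0] * 9, "P2": [1.0] * 9, "P3": [1.0] * 9}
ranked = rank_by_logfc(treated, control)
print(ranked)  # [('P1', 2.0), ('P3', -1.0), ('P2', 0.0)]
```

Ranking by absolute logFC puts strongly up- and down-regulated proteins at the top together; the correlation function can then be applied to pairs of top-ranked proteins.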


- How are drug targets regulated in dying vs. surviving cells?

We compute the logFC between treatment and control and look at the up- and down-regulated proteins for each treatment. (As I recall, this was not possible because the abundances are relative. Why?)
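
One way to compare regulation in the two states is to label each protein's logFC as up, down or unchanged in surviving and dying cells separately. The sketch below assumes logFC values have already been computed per state for one treatment; the cut-off of 1.0 (a 2-fold change) and the protein IDs are hypothetical choices for illustration.

```python
def classify(logfc, threshold=1.0):
    """Label a log2 fold change as up, down or unchanged.

    threshold=1.0 (a 2-fold change) is an illustrative cut-off, not one
    taken from the project.
    """
    if logfc >= threshold:
        return "up"
    if logfc <= -threshold:
        return "down"
    return "unchanged"


def compare_states(alive_fc, dead_fc, threshold=1.0):
    """Map each protein to its (surviving, dying) regulation labels."""
    return {
        protein: (classify(alive_fc[protein], threshold),
                  classify(dead_fc[protein], threshold))
        for protein in alive_fc
    }


# Hypothetical logFC values for one treatment; "T" stands in for a drug target.
alive = {"T": 0.2, "Q": -1.5}
dead = {"T": 1.8, "Q": -1.2}
print(compare_states(alive, dead))
# {'T': ('unchanged', 'up'), 'Q': ('down', 'down')}
```

Proteins whose labels differ between the two states (like `T` here) are the ones where dying and surviving cells disagree.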

- Are there specific proteins that remain unaltered throughout?

Should these be random? And how do we check this?
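
A simple, threshold-based way to make "unaltered" concrete is to keep the proteins whose absolute logFC stays below a cut-off in every condition. This is only a sketch: the cut-off of 0.5 is a hypothetical value, and a statistical equivalence test would be a more rigorous alternative.

```python
def unaltered_proteins(logfc_by_protein, threshold=0.5):
    """Return proteins whose |logFC| stays below threshold in every condition.

    logfc_by_protein maps a protein ID to its logFC values across all
    conditions (treatments, cell lines and states). The threshold is a
    hypothetical cut-off chosen for illustration.
    """
    return sorted(
        protein
        for protein, values in logfc_by_protein.items()
        if all(abs(v) < threshold for v in values)
    )


logfcs = {"A": [0.1, -0.2, 0.3], "B": [0.1, 2.0, 0.0], "C": [-0.4, 0.0, 0.2]}
print(unaltered_proteins(logfcs))  # ['A', 'C']
```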

- (What is the difference in protein correlation between dying and surviving cells?)
