---
title: "Home"
output:
  html_document:
    toc: TRUE
    toc_float: FALSE
---

---

## Overview

* [Data description](data-overview.html)

---

## Summary analysis

* [Seminar talk 2018-04-16](talk-20180416.html)
* Some code for generating plots for the paper
    * `code/paper.plot.images.R`

---

## Prediction results

* Evaluate the withheld samples [on top 101 cyclical genes](method-eval-withheld.html), and on [top 10 cyclical genes](method-eval-withheld-top10.html)
* Explore method performance: [compare peco with Seurat](method-eval-withheld-explore.html)
* Evaluate Leng et al. data [on top 101 cyclical genes](method-eval-leng.html)
* Evaluate Bottcher et al. data [on top 101 cyclical genes](method-eval-bottcher.html)
* Evaluate Buettner et al. data [on top 101 cyclical genes](method-eval-buettner.html)

---

## Model training

* [Early analysis using trendfilter](method-npreg.html) (to be updated...)
* Investigate the properties and quality of [the cell time labels derived from GFP and RFP](method-labels.html): PVE of intensities by cell time; PVE by cell times derived from DAPI and FUCCI versus from FUCCI only; prediction before and after removing PC outliers (slightly better after removal); and prediction in different sets of genes.
* Training dataset
    * Prediction results from trendfilter order 2 and order 3 with npcirc.nw [on the odd-numbered genes in the top 101 cyclical genes](method-npreg-prelim-results.html)
    * Prediction results of the various classifiers (supervised, PC, unsupervised) [on top 101 cyclical genes](method-train-classifiers.html), and on [top 10 cyclical genes](method-train-classifiers-top10.html).
      method-train-classifiers-non101.html

---

## Explore approaches to fitting cyclical trend

* Apply spherically projected multivariate linear model
    * [First trying out spml package](model-spml.pilot.html)
    * [To predict cell times using a training/test sample set-up](method-spml.html)
* Learning about correlation for circular response variable: some simulations to check [its properties and its relation to Fisher's z transformation](circ-simulation-correlation.html)
* CellcycleR 0.1.6
    * [Model convergence assessment](images-cellcycleR-convergence.html)
    * [Fitting on intensities across plates and individuals](images-cellcycleR.html)
    * [Fitting on Leng data](cellcycler-seqdata-leng.html)
    * [Fitting on fucci-seq RNA-seq data](cellcycler-seqdata-fucci.html)
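
The list above mentions simulations relating a correlation for circular variables to Fisher's z transformation. As a rough illustration of the two quantities, the Python sketch below computes one common circular-circular correlation coefficient (the Jammalamadaka-SenGupta form; this choice is an assumption, and the linked analysis may use a different definition) and applies Fisher's z, which is `atanh(r)`. The function names and toy angles are hypothetical.

```python
import math


def circular_mean(angles):
    """Mean direction (radians) of a sample of angles."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def circular_correlation(alpha, beta):
    """Circular-circular correlation (Jammalamadaka-SenGupta form, assumed here)."""
    a_bar = circular_mean(alpha)
    b_bar = circular_mean(beta)
    sa = [math.sin(a - a_bar) for a in alpha]
    sb = [math.sin(b - b_bar) for b in beta]
    num = sum(x * y for x, y in zip(sa, sb))
    den = math.sqrt(sum(x * x for x in sa) * sum(y * y for y in sb))
    return num / den


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient with |r| < 1."""
    return math.atanh(r)


# Hypothetical toy angles concentrated on an arc, so the mean direction is
# well defined (it is poorly defined when angles cover the circle uniformly).
alpha = [0.2 + 0.2 * i for i in range(12)]
jitter = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03,
          -0.02, 0.04, -0.01, 0.02, -0.05, 0.03]
beta = [a + 0.5 + e for a, e in zip(alpha, jitter)]

rho = circular_correlation(alpha, beta)
z = fisher_z(rho)
```

A constant phase shift between the two variables leaves the coefficient close to 1, and reflecting one variable (negating its angles) flips the sign.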

---

## Cell cycle signal in gene expression data

1. We investigated cell cycle signals in the sequencing data alone.
    * Consider [transgene count in sequencing data](images-transgene.html)
    * Compute linear correlation between gene expression levels [with DAPI and FUCCI intensities](images-seq-correlation.html)
2. We then assigned categorical cell cycle labels and explored the expression profiles of these categories.
    * Cluster samples by [Partitioning around medoids (PAM)](images-pam.html)
    * Cluster samples by [Gaussian mixture modeling](images-mclust.html)
    * Select a subset of samples that are closest to the cluster centers (cluster representatives) [using silhouette index](images-subset-silhouette.html)
    * Examine gene expression scores defined by the Macosko paper in the selected cluster representatives [before confounding correction](images-classify-fucci.html) and [after confounding correction](images-classify-fucci-adjusted.html)
    * Examine gene expression scores defined by the Macosko paper [in the sorted cells of the Leng et al. 2015 paper](images-classify-leng.html)
3. We ordered cells on a circle using FUCCI intensities alone.
    * First, I used GFP and RFP intensities to estimate a [least-squares fit of a unit circle that approximates the relative ordering of cells along the cell cycle](images-circle-ordering.html)
    * I computed the PCs of GFP and RFP and used these to infer the relative ordering on a circle (i.e., polar coordinates), and [to evaluate the circle fit, I computed circular-circular and circular-linear correlation with DAPI and gene expression](images-circle-ordering-eval.html)
    * Here I put together lists of genes identified as significantly correlated with the PC-based fit [by linear correlation and circular-linear correlation, also cyclical genes by smash](images-circle-ordering-sigcorgenes.html)
4. I used nonparametric methods to identify genes that may be cyclical along cell cycle phases.
    * Fit [smash and kernel regression on circular variables](npreg-methods.html) on a subset of genes with detection rate > 0.8.
    * Fit [trendfilter](npreg-trendfilter-prelim.html) on a subset of 5 genes that visually show a cyclical pattern. trendfilter is robust to a small proportion of undetected cells (approximately 2 to 3%). In simulations where the proportion of undetected cells was increased to 20%, the fitted expression was a flat line for genes previously identified as tending to a cyclical pattern.
    * Next, we fit trendfilter on all genes after transforming the data to follow a standard normal distribution; permutation-based p-values for PVE were used to select [101 significant cyclical genes](npreg-trendfilter-quantile.html).
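
The last step above selects genes by a permutation-based p-value for the proportion of variance explained (PVE). The Python sketch below illustrates the general idea only. It replaces trendfilter with a much simpler stand-in (the mean expression within equal-width bins of cell time) and omits the transformation to a standard normal distribution. The function names, bin count, number of permutations, and toy gene are all assumptions made for illustration.

```python
import math
import random


def pve_binned(theta, y, n_bins=8):
    """PVE of a piecewise-constant fit: mean of y within equal-width angle bins."""
    width = 2 * math.pi / n_bins
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(theta, y):
        bins[int((t % (2 * math.pi)) / width) % n_bins].append(v)
    grand_mean = sum(y) / len(y)
    ss_total = sum((v - grand_mean) ** 2 for v in y)
    ss_resid = 0.0
    for b in bins:
        if b:
            bin_mean = sum(b) / len(b)
            ss_resid += sum((v - bin_mean) ** 2 for v in b)
    return 1.0 - ss_resid / ss_total


def permutation_pvalue(theta, y, n_perm=200, seed=1):
    """Observed PVE and its permutation p-value (expression shuffled across cells)."""
    rng = random.Random(seed)
    observed = pve_binned(theta, y)
    shuffled = list(y)
    n_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pve_binned(theta, shuffled) >= observed:
            n_as_large += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (n_as_large + 1) / (n_perm + 1)


# Hypothetical toy gene: expression follows a cosine of cell time plus noise.
noise = random.Random(0)
theta = [2 * math.pi * i / 100 for i in range(100)]
y_cyclic = [math.cos(t) + noise.gauss(0, 0.1) for t in theta]

pve_obs, p_value = permutation_pvalue(theta, y_cyclic)
```

Shuffling expression across cells breaks any relationship with cell time, so the shuffled PVEs approximate the null distribution. With `n_perm` permutations, the smallest attainable p-value is `1 / (n_perm + 1)`.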

---

## RNA-seq data preprocessing

1. The first step in preprocessing RNA-seq data consists of QC and filtering.
    * Sample QC and filtering
        * [Sample QC criteria](sampleqc.html)
        * [Sequencing depth](totals.html)
        * [Reads versus molecules](reads-v-molecules.html)
    * Gene QC and filtering
        * [Gene filtering](gene-filtering.html)
        * [PCA with technical factors](pca-tf.html)
2. We then analyzed and corrected for batch effects due to C1 plate in the sequencing data.
    * [Estimate variance explained in IBD and correct for batch effects](seqdata-batch-correction.html)

---

## Microscopy image analysis

* [Processing images - from images to intensities](images-process.html)

We evaluated and pre-processed the results of image analysis as follows:

1. We visually inspected images detected to have no nucleus or more than one nucleus. For cases inconsistent with visual inspection, we corrected the number of nuclei detected.
    * [Inspect images with multiple nuclei](images-multiple-nuclei.html)
    * [Inspect images with no nucleus](images-zero-nuclei.html)
2. We applied background correction to the intensity measurements of GFP, RFP and DAPI based on the following analyses.
    * [CONFESS results](confess-prelim.html)
    * [QC analysis including no. nuclei detected, DAPI, and intensity variation](images-qc.html)
    * [Explore using log10 sum pixel intensity for signal metrics](images-qc-followup.html)
    * [Compare correction approaches using median versus mean background](images-metrics.html)
    * [Explore associations between nucleus shape metrics vs intensities](images-metrics-cell-shape.html)
3. We analyzed intensity variation across individuals and batches and considered approaches for removing batch effects in the data.
    * [Visualize signal variation by plate and individual identity](images-qc-labels.html)
    * [Visualize the structure of signal variation by individual identity](images-qc-variation.html)
    * [Quantile normalization for GFP, RFP and DAPI](images-normalize-quantile.html)
    * [Estimate variance explained in IBD and correct for batch effects in intensities](images-normalize-anova.html)
4. We investigated the cell time estimates based on FUCCI intensities.
    * Considering the mean-adjusted FUCCI intensities, what is the relationship between cell time [estimates and DAPI and sample molecule count](images-time-eval.html)?
    * Consider the quantile-normalized FUCCI intensities versus the mean-adjusted intensities (for adjusting batch effects). What are the differences between cell time estimates from [the two normalization approaches](images-time-compare-normalize.html)?
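
Item 4 above concerns cell time estimates based on FUCCI intensities. The "Cell cycle signal in gene expression data" section describes inferring the ordering on a circle from the PCs of GFP and RFP, read as polar coordinates. The Python sketch below shows one way such an angle could be computed. It assumes the two intensities are centered but not rescaled, and it fixes the PC signs arbitrarily; the linked analyses may differ on both points. The function name and the toy intensities are hypothetical.

```python
import math


def cell_times_from_fucci(gfp, rfp):
    """Angle in [0, 2*pi) of each cell in the plane of the two PCs of (GFP, RFP)."""
    n = len(gfp)
    mean_g, mean_r = sum(gfp) / n, sum(rfp) / n
    g = [v - mean_g for v in gfp]
    r = [v - mean_r for v in rfp]
    # Entries of the 2x2 sample covariance matrix.
    s_gg = sum(x * x for x in g) / (n - 1)
    s_rr = sum(y * y for y in r) / (n - 1)
    s_gr = sum(x * y for x, y in zip(g, r)) / (n - 1)
    # Orientation of the first principal axis of a 2x2 covariance matrix.
    phi = 0.5 * math.atan2(2 * s_gr, s_gg - s_rr)
    c, s = math.cos(phi), math.sin(phi)
    pc1 = [c * x + s * y for x, y in zip(g, r)]
    pc2 = [-s * x + c * y for x, y in zip(g, r)]
    # Polar angle of each cell in PC space, wrapped to [0, 2*pi).
    return [math.atan2(b, a) % (2 * math.pi) for a, b in zip(pc1, pc2)]


# Hypothetical toy intensities: eight cells placed around an ellipse.
true_t = [0.3 + 2 * math.pi * i / 8 for i in range(8)]
gfp = [2.0 + 2.0 * math.cos(t) for t in true_t]
rfp = [3.0 + math.sin(t) for t in true_t]

times = cell_times_from_fucci(gfp, rfp)
order = sorted(range(len(times)), key=lambda i: times[i])
```

The angles recover the relative ordering of cells only up to the choice of origin and direction on the circle, so comparisons between normalization approaches would need to allow for a rotation or reflection.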

---

## One-time investigations

* Why do some gene symbols (genes) correspond to [multiple Ensembl IDs?](ensembl.html)
* I selected a set of cell cycle [genes that belong to the GO term Cell Cycle and looked at the overlap with the detected genes in our data.](seqdata-select-cellcyclegenes.html)
* [Replicate Seurat example of scoring cell cycle phases](seurat-cellcycle.Rmd)