update the course website

Former-commit-id: f597bbf
hemberg-lab · Oct 30, 2017 · f77d10e
1 parent 9f48740
Showing 43 changed files with 38,765 additions and 188 deletions.
diff --git a/deng/deng-reads.rds b/deng/deng-reads.rds
diff --git a/deng/deng-reads.rds.REMOVED.git-id b/deng/deng-reads.rds.REMOVED.git-id
diff --git a/docs/07-exprs-qc.md b/docs/07-exprs-qc.md
@@ -521,10 +521,10 @@ Perform exactly the same QC analysis with read counts of the same Blischak data.
 ##  [27] shinydashboard_0.6.1    shiny_1.0.5            
 ##  [29] rrcov_1.4-3             compiler_3.4.2         
 ##  [31] backports_1.1.1         assertthat_0.2.0       
-##  [33] Matrix_1.2-7.1          lazyeval_0.2.0         
+##  [33] Matrix_1.2-7.1          lazyeval_0.2.1         
 ##  [35] htmltools_0.3.6         quantreg_5.34          
 ##  [37] tools_3.4.2             bindrcpp_0.2           
-##  [39] gtable_0.2.0            glue_1.1.1             
+##  [39] gtable_0.2.0            glue_1.2.0             
 ##  [41] GenomeInfoDbData_0.99.0 reshape2_1.4.2         
 ##  [43] dplyr_0.7.4             Rcpp_0.12.13           
 ##  [45] trimcluster_0.1-2       sgeostat_1.0-27        

diff --git a/docs/08-exprs-qc-reads.md b/docs/08-exprs-qc-reads.md
@@ -428,10 +428,10 @@ sessionInfo()
 ##  [27] shinydashboard_0.6.1    shiny_1.0.5            
 ##  [29] rrcov_1.4-3             compiler_3.4.2         
 ##  [31] backports_1.1.1         assertthat_0.2.0       
-##  [33] Matrix_1.2-7.1          lazyeval_0.2.0         
+##  [33] Matrix_1.2-7.1          lazyeval_0.2.1         
 ##  [35] htmltools_0.3.6         quantreg_5.34          
 ##  [37] tools_3.4.2             bindrcpp_0.2           
-##  [39] gtable_0.2.0            glue_1.1.1             
+##  [39] gtable_0.2.0            glue_1.2.0             
 ##  [41] GenomeInfoDbData_0.99.0 reshape2_1.4.2         
 ##  [43] dplyr_0.7.4             Rcpp_0.12.13           
 ##  [45] trimcluster_0.1-2       sgeostat_1.0-27        

diff --git a/docs/09-exprs-overview.md b/docs/09-exprs-overview.md
@@ -265,7 +265,7 @@ Perform the same analysis with read counts of the Blischak data. Use `tung/reads
 ##  [7] blob_1.1.0              GenomeInfoDbData_0.99.0
 ##  [9] vipor_0.4.5             yaml_2.1.14            
 ## [11] RSQLite_2.0             backports_1.1.1        
-## [13] lattice_0.20-34         glue_1.1.1             
+## [13] lattice_0.20-34         glue_1.2.0             
 ## [15] limma_3.32.10           digest_0.6.12          
 ## [17] XVector_0.16.0          colorspace_1.3-2       
 ## [19] cowplot_0.8.0           htmltools_0.3.6        
@@ -275,7 +275,7 @@ Perform the same analysis with read counts of the Blischak data. Use `tung/reads
 ## [27] bookdown_0.5            zlibbioc_1.22.0        
 ## [29] xtable_1.8-2            scales_0.5.0           
 ## [31] Rtsne_0.13              tibble_1.3.4           
-## [33] lazyeval_0.2.0          magrittr_1.5           
+## [33] lazyeval_0.2.1          magrittr_1.5           
 ## [35] mime_0.5                memoise_1.1.0          
 ## [37] evaluate_0.10.1         beeswarm_0.2.3         
 ## [39] shinydashboard_0.6.1    tools_3.4.2            

diff --git a/docs/10-exprs-overview-reads.md b/docs/10-exprs-overview-reads.md
@@ -178,7 +178,7 @@ sessionInfo()
 ##  [7] blob_1.1.0              GenomeInfoDbData_0.99.0
 ##  [9] vipor_0.4.5             yaml_2.1.14            
 ## [11] RSQLite_2.0             backports_1.1.1        
-## [13] lattice_0.20-34         glue_1.1.1             
+## [13] lattice_0.20-34         glue_1.2.0             
 ## [15] limma_3.32.10           digest_0.6.12          
 ## [17] XVector_0.16.0          colorspace_1.3-2       
 ## [19] cowplot_0.8.0           htmltools_0.3.6        
@@ -188,7 +188,7 @@ sessionInfo()
 ## [27] bookdown_0.5            zlibbioc_1.22.0        
 ## [29] xtable_1.8-2            scales_0.5.0           
 ## [31] Rtsne_0.13              tibble_1.3.4           
-## [33] lazyeval_0.2.0          magrittr_1.5           
+## [33] lazyeval_0.2.1          magrittr_1.5           
 ## [35] mime_0.5                memoise_1.1.0          
 ## [37] evaluate_0.10.1         beeswarm_0.2.3         
 ## [39] shinydashboard_0.6.1    tools_3.4.2            

diff --git a/docs/11-confounders.md b/docs/11-confounders.md
@@ -158,7 +158,7 @@ Perform the same analysis with read counts of the Blischak data. Use `tung/reads
 ##  [7] blob_1.1.0              GenomeInfoDbData_0.99.0
 ##  [9] vipor_0.4.5             yaml_2.1.14            
 ## [11] RSQLite_2.0             backports_1.1.1        
-## [13] lattice_0.20-34         glue_1.1.1             
+## [13] lattice_0.20-34         glue_1.2.0             
 ## [15] limma_3.32.10           digest_0.6.12          
 ## [17] XVector_0.16.0          colorspace_1.3-2       
 ## [19] cowplot_0.8.0           htmltools_0.3.6        
@@ -167,7 +167,7 @@ Perform the same analysis with read counts of the Blischak data. Use `tung/reads
 ## [25] pkgconfig_2.0.1         biomaRt_2.32.1         
 ## [27] bookdown_0.5            zlibbioc_1.22.0        
 ## [29] xtable_1.8-2            scales_0.5.0           
-## [31] tibble_1.3.4            lazyeval_0.2.0         
+## [31] tibble_1.3.4            lazyeval_0.2.1         
 ## [33] magrittr_1.5            mime_0.5               
 ## [35] memoise_1.1.0           evaluate_0.10.1        
 ## [37] beeswarm_0.2.3          shinydashboard_0.6.1   

diff --git a/docs/12-confounders-reads.md b/docs/12-confounders-reads.md
@@ -70,7 +70,7 @@ knit: bookdown::preview_chapter
 ##  [7] blob_1.1.0              GenomeInfoDbData_0.99.0
 ##  [9] vipor_0.4.5             yaml_2.1.14            
 ## [11] RSQLite_2.0             backports_1.1.1        
-## [13] lattice_0.20-34         glue_1.1.1             
+## [13] lattice_0.20-34         glue_1.2.0             
 ## [15] limma_3.32.10           digest_0.6.12          
 ## [17] XVector_0.16.0          colorspace_1.3-2       
 ## [19] cowplot_0.8.0           htmltools_0.3.6        
@@ -79,7 +79,7 @@ knit: bookdown::preview_chapter
 ## [25] pkgconfig_2.0.1         biomaRt_2.32.1         
 ## [27] bookdown_0.5            zlibbioc_1.22.0        
 ## [29] xtable_1.8-2            scales_0.5.0           
-## [31] tibble_1.3.4            lazyeval_0.2.0         
+## [31] tibble_1.3.4            lazyeval_0.2.1         
 ## [33] magrittr_1.5            mime_0.5               
 ## [35] memoise_1.1.0           evaluate_0.10.1        
 ## [37] beeswarm_0.2.3          shinydashboard_0.6.1   

diff --git a/docs/13-exprs-norm.md b/docs/13-exprs-norm.md
@@ -664,7 +664,7 @@ Perform the same analysis with read counts of the `tung` data. Use `tung/reads.r
 ##  [5] tools_3.4.2             backports_1.1.1        
 ##  [7] DT_0.2                  R6_2.2.2               
 ##  [9] hypergeo_1.2-13         vipor_0.4.5            
-## [11] DBI_0.7                 lazyeval_0.2.0         
+## [11] DBI_0.7                 lazyeval_0.2.1         
 ## [13] colorspace_1.3-2        gridExtra_2.3          
 ## [15] moments_0.14            bit_1.1-12             
 ## [17] compiler_3.4.2          orthopolynom_1.0-5     
@@ -691,7 +691,7 @@ Perform the same analysis with read counts of the `tung` data. Use `tung/reads.r
 ## [59] splines_3.4.2           locfit_1.5-9.1         
 ## [61] igraph_1.1.2            rjson_0.2.15           
 ## [63] reshape2_1.4.2          biomaRt_2.32.1         
-## [65] XML_3.98-1.9            glue_1.1.1             
+## [65] XML_3.98-1.9            glue_1.2.0             
 ## [67] evaluate_0.10.1         data.table_1.10.4-3    
 ## [69] deSolve_1.20            httpuv_1.3.5           
 ## [71] gtable_0.2.0            assertthat_0.2.0       

diff --git a/docs/14-exprs-norm-reads.md b/docs/14-exprs-norm-reads.md
@@ -211,7 +211,7 @@ output: html_document
 ##  [5] tools_3.4.2             backports_1.1.1        
 ##  [7] DT_0.2                  R6_2.2.2               
 ##  [9] hypergeo_1.2-13         vipor_0.4.5            
-## [11] DBI_0.7                 lazyeval_0.2.0         
+## [11] DBI_0.7                 lazyeval_0.2.1         
 ## [13] colorspace_1.3-2        gridExtra_2.3          
 ## [15] moments_0.14            bit_1.1-12             
 ## [17] compiler_3.4.2          orthopolynom_1.0-5     
@@ -238,7 +238,7 @@ output: html_document
 ## [59] splines_3.4.2           locfit_1.5-9.1         
 ## [61] igraph_1.1.2            rjson_0.2.15           
 ## [63] reshape2_1.4.2          biomaRt_2.32.1         
-## [65] XML_3.98-1.9            glue_1.1.1             
+## [65] XML_3.98-1.9            glue_1.2.0             
 ## [67] evaluate_0.10.1         data.table_1.10.4-3    
 ## [69] deSolve_1.20            httpuv_1.3.5           
 ## [71] gtable_0.2.0            assertthat_0.2.0       

diff --git a/docs/15-remove-conf.md b/docs/15-remove-conf.md
@@ -558,10 +558,10 @@ Perform the same analysis with read counts of the `tung` data. Use `tung/reads.r
 ## [19] shinydashboard_0.6.1    shiny_1.0.5            
 ## [21] compiler_3.4.2          backports_1.1.1        
 ## [23] assertthat_0.2.0        Matrix_1.2-7.1         
-## [25] lazyeval_0.2.0          htmltools_0.3.6        
+## [25] lazyeval_0.2.1          htmltools_0.3.6        
 ## [27] tools_3.4.2             igraph_1.1.2           
 ## [29] bindrcpp_0.2            gtable_0.2.0           
-## [31] glue_1.1.1              GenomeInfoDbData_0.99.0
+## [31] glue_1.2.0              GenomeInfoDbData_0.99.0
 ## [33] dplyr_0.7.4             Rcpp_0.12.13           
 ## [35] rtracklayer_1.36.6      stringr_1.2.0          
 ## [37] mime_0.5                hypergeo_1.2-13        

diff --git a/docs/16-remove-conf-reads.md b/docs/16-remove-conf-reads.md
@@ -415,10 +415,10 @@ ggplot(dod, aes(Normalisation, Individual, fill=kBET)) +
 ## [19] shinydashboard_0.6.1    shiny_1.0.5            
 ## [21] compiler_3.4.2          backports_1.1.1        
 ## [23] assertthat_0.2.0        Matrix_1.2-7.1         
-## [25] lazyeval_0.2.0          htmltools_0.3.6        
+## [25] lazyeval_0.2.1          htmltools_0.3.6        
 ## [27] tools_3.4.2             igraph_1.1.2           
 ## [29] bindrcpp_0.2            gtable_0.2.0           
-## [31] glue_1.1.1              GenomeInfoDbData_0.99.0
+## [31] glue_1.2.0              GenomeInfoDbData_0.99.0
 ## [33] dplyr_0.7.4             Rcpp_0.12.13           
 ## [35] rtracklayer_1.36.6      stringr_1.2.0          
 ## [37] mime_0.5                hypergeo_1.2-13        

diff --git a/docs/18-clustering.md b/docs/18-clustering.md
@@ -67,6 +67,7 @@ plotPCA(deng, colour_by = "cell_type2")
 
 
 \begin{center}\includegraphics{18-clustering_files/figure-latex/unnamed-chunk-5-1} \end{center}
+As you can see, the early cell types separate quite well, but the three blastocyst timepoints are more difficult to distinguish.
 
 ### SC3
 
@@ -196,14 +197,16 @@ adjustedRandIndex(colData(deng)$cell_type2, colData(deng)$sc3_10_clusters)
 ## [1] 0.7705208
 ```
 
-Note, that one can also run `SC3` in an interactive `Shiny` session:
+__Note__ `SC3` can also be run in an interactive `Shiny` session:
 
 ```r
 sc3_interactive(deng)
 ```
 
 This command will open `SC3` in a web browser.
 
+__Note__ Due to direct calculation of distances, `SC3` becomes very slow when the number of cells is $>5000$. For large datasets containing up to $10^5$ cells we recommend using `Seurat` (see chapter \@ref(seurat-chapter)).
+
 * __Exercise 1__: Run `SC3` for $k$ from 8 to 12 and explore different clustering solutions in your web browser.
 
 * __Exercise 2__: Which clusters are the most stable when $k$ is changed from 8 to 12? (Look at the "Stability" tab)
@@ -258,7 +261,7 @@ __Our solution__:
 \caption{Clustering solutions of pcaReduce method for $k=2$.}(\#fig:clust-pca-reduce2)
 \end{figure}
 
-__Exercise 6__: Compare the results between `SC3` and the original publication cell types for $k=10$.
+__Exercise 6__: Compare the results between `pcaReduce` and the original publication cell types for $k=10$.
 
 __Our solution__:
 
@@ -490,10 +493,10 @@ __Exercise 11__: Is using the singleton cluster criteria for finding __k__ a goo
 ##  [17] shinydashboard_0.6.1    shiny_1.0.5            
 ##  [19] rrcov_1.4-3             compiler_3.4.2         
 ##  [21] backports_1.1.1         assertthat_0.2.0       
-##  [23] Matrix_1.2-7.1          lazyeval_0.2.0         
+##  [23] Matrix_1.2-7.1          lazyeval_0.2.1         
 ##  [25] limma_3.32.10           htmltools_0.3.6        
 ##  [27] tools_3.4.2             bindrcpp_0.2           
-##  [29] gtable_0.2.0            glue_1.1.1             
+##  [29] gtable_0.2.0            glue_1.2.0             
 ##  [31] GenomeInfoDbData_0.99.0 reshape2_1.4.2         
 ##  [33] dplyr_0.7.4             doRNG_1.6.6            
 ##  [35] Rcpp_0.12.13            gdata_2.18.0           

diff --git a/docs/19-dropouts.md b/docs/19-dropouts.md
@@ -332,7 +332,7 @@ Plot the expression of the features for each of the other methods. Which appear
 ## [27] gdata_2.18.0            magrittr_1.5           
 ## [29] backports_1.1.1         gplots_3.0.1           
 ## [31] htmltools_0.3.6         MASS_7.3-45            
-## [33] bbmle_1.0.19            KernSmooth_2.23-15     
+## [33] bbmle_1.0.20            KernSmooth_2.23-15     
 ## [35] stringi_1.1.5           RCurl_1.95-4.8         
 ## [37] hypergeo_1.2-13
 ```
diff --git a/docs/20-pseudotime.md b/docs/20-pseudotime.md
@@ -549,7 +549,7 @@ __Exercise 7__: Repeat the exercise using a subset of the genes, e.g. the set of
 ## loaded via a namespace (and not attached):
 ##   [1] backports_1.1.1         Hmisc_4.0-3            
 ##   [3] RcppEigen_0.3.3.3.0     plyr_1.8.4             
-##   [5] igraph_1.1.2            lazyeval_0.2.0         
+##   [5] igraph_1.1.2            lazyeval_0.2.1         
 ##   [7] sp_1.2-5                densityClust_0.3       
 ##   [9] fastICA_1.2-1           digest_0.6.12          
 ##  [11] htmltools_0.3.6         gdata_2.18.0           
@@ -562,7 +562,7 @@ __Exercise 7__: Repeat the exercise using a subset of the genes, e.g. the set of
 ##  [25] lme4_1.1-14             spatstat_1.53-2        
 ##  [27] spatstat.data_1.1-1     bindr_0.1              
 ##  [29] survival_2.40-1         zoo_1.8-0              
-##  [31] glue_1.1.1              polyclip_1.6-1         
+##  [31] glue_1.2.0              polyclip_1.6-1         
 ##  [33] gtable_0.2.0            zlibbioc_1.22.0        
 ##  [35] XVector_0.16.0          MatrixModels_0.4-1     
 ##  [37] car_2.1-5               DEoptimR_1.0-8         
@@ -605,6 +605,6 @@ __Exercise 7__: Repeat the exercise using a subset of the genes, e.g. the set of
 ## [111] grid_3.4.2              rpart_4.1-10           
 ## [113] class_7.3-14            minqa_1.2.4            
 ## [115] rmarkdown_1.6           Rtsne_0.13             
-## [117] TTR_0.23-2              bbmle_1.0.19           
+## [117] TTR_0.23-2              bbmle_1.0.20           
 ## [119] shiny_1.0.5             base64enc_0.1-3
 ```
diff --git a/docs/21-imputation.md b/docs/21-imputation.md
@@ -22,9 +22,9 @@ As discussed previously, one of the main challenges when analyzing scRNA-seq dat
 * The gene was expressed, but for some reason the transcripts were lost somewhere prior to sequencing
 * The gene was expressed and transcripts were captured and turned into cDNA, but the sequencing depth was not sufficient to produce any reads.
 
-Thus, dropouts could be result of experimental shortcomings, and if this is the case then we would like to provide computational corrections. One possible solution is to impute the dropouts in the expression matrix. To be able to impute gene expression values, one must have an underlying model. However, since we do not know which dropout events are technical artefacts and which correspond to the transcript being truly absent, imputation is a difficult challenges.
+Thus, dropouts could be the result of experimental shortcomings, and if this is the case then we would like to provide computational corrections. One possible solution is to impute the dropouts in the expression matrix. To be able to impute gene expression values, one must have an underlying model. However, since we do not know which dropout events are technical artefacts and which correspond to the transcript being truly absent, imputation is a difficult challenge.
 
-To the best of our knowledge, there are currently two different imputation methods available: MAGIC [@Van_Dijk2017-bh] and scImpute [@Li2017-tz]. [MAGIC](https://github.com/pkathail/magic) is only available for Python or Matlab, but we will run it from within R.
+To the best of our knowledge, there are currently two different imputation methods available: [MAGIC](https://github.com/pkathail/magic) [@Van_Dijk2017-bh] and [scImpute](https://github.com/Vivianstats/scImpute) [@Li2017-tz]. MAGIC is only available for Python or Matlab, but we will run it from within R.
 
 ### scImpute
 
@@ -243,7 +243,7 @@ __Exercise:__ What is the difference between `scImpute` and `MAGIC` based on the
 ##  [5] tools_3.4.2             backports_1.1.1        
 ##  [7] doRNG_1.6.6             R6_2.2.2               
 ##  [9] vipor_0.4.5             KernSmooth_2.23-15     
-## [11] DBI_0.7                 lazyeval_0.2.0         
+## [11] DBI_0.7                 lazyeval_0.2.1         
 ## [13] colorspace_1.3-2        gridExtra_2.3          
 ## [15] bit_1.1-12              compiler_3.4.2         
 ## [17] pkgmaker_0.22           labeling_0.3           
@@ -271,7 +271,7 @@ __Exercise:__ What is the difference between `scImpute` and `MAGIC` based on the
 ## [61] splines_3.4.2           locfit_1.5-9.1         
 ## [63] rjson_0.2.15            rngtools_1.2.4         
 ## [65] reshape2_1.4.2          codetools_0.2-15       
-## [67] biomaRt_2.32.1          glue_1.1.1             
+## [67] biomaRt_2.32.1          glue_1.2.0             
 ## [69] XML_3.98-1.9            evaluate_0.10.1        
 ## [71] data.table_1.10.4-3     httpuv_1.3.5           
 ## [73] gtable_0.2.0            assertthat_0.2.0       

diff --git a/docs/22-de-intro.md b/docs/22-de-intro.md
@@ -8,7 +8,7 @@ knit: bookdown::preview_chapter
 
 ### Bulk RNA-seq
 
-One of the most common types of analyses when analyzing bulk RNA-seq
+One of the most common types of analyses when working with bulk RNA-seq
 data is to identify differentially expressed genes. By comparing the
 genes that change between two conditions, e.g. mutant and wild-type or
 stimulated and unstimulated, it is possible to characterize the
@@ -21,7 +21,7 @@ have been developed for bulk RNA-seq. Moreover, there are also
 extensive
 [datasets](http://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-9-r95)
 available where the RNA-seq data has been validated using
-RT-qPCR. These data can be used to benchmark DE finding algorithms.
+RT-qPCR. These data can be used to benchmark DE finding algorithms, and the available evidence suggests that the algorithms perform quite well.
 
 ### Single cell RNA-seq
 
@@ -37,7 +37,7 @@ Unlike bulk RNA-seq, we generally have a large number of samples (i.e. cells) fo
 
 There are two main approaches to comparing distributions. Firstly, we can use existing statistical models/distributions and fit the same type of model to the expression in each group, then test for differences in the parameters for each model, or test whether the model fits better if a particular parameter is allowed to be different according to group. For instance, in Chapter \@ref(dealing-with-confounders) we used edgeR to test whether allowing mean expression to be different in different batches significantly improved the fit of a negative binomial model of the data.
 
-Alternatively, we can use a non-parametric test which does not assume expression values follow any particular distribution, e.g. the [Kolmogorov-Smirnov test (KS-test)](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test). Non-parametric tests generally convert observed expression values to ranks and test whether the distribution of ranks for one group are signficantly different from the distribution of ranks for the other group. However, some non-parametric methods fail in the presence of a large number of tied values, such as the case for dropouts (zeros) in single-cell RNA-seq expression data. Moreover, if the conditions for a parametric test hold, then it will typically be more powerful than a non-parametric test.
+Alternatively, we can use a non-parametric test which does not assume that expression values follow any particular distribution, e.g. the [Kolmogorov-Smirnov test (KS-test)](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test). Non-parametric tests generally convert observed expression values to ranks and test whether the distribution of ranks for one group is significantly different from the distribution of ranks for the other group. However, some non-parametric methods fail in the presence of a large number of tied values, such as the case for dropouts (zeros) in single-cell RNA-seq expression data. Moreover, if the conditions for a parametric test hold, then it will typically be more powerful than a non-parametric test.
 
 ### Models of single-cell RNASeq data