Skip to content

Commit

Permalink
Merge pull request #205 from nf-core/dev
Browse files Browse the repository at this point in the history
Dev -> master for v1.4.0
  • Loading branch information
pinin4fjords committed Nov 27, 2023
2 parents 44c449b + 47ef421 commit a3d664c
Show file tree
Hide file tree
Showing 23 changed files with 294 additions and 97 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ jobs:
strategy:
matrix:
NXF_VER:
- "23.04.0"
- "23.10.0"
- "latest-everything"
profile:
- "test"
Expand Down
22 changes: 21 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,30 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.3.1 - 2023-10-26
## v1.4.0 - 2023-11-27

### `Added`

- [[#203](https://github.com/nf-core/differentialabundance/pull/203)] - Transcript lengths for DESeq2 ([@pinin4fjords](https://github.com/pinin4fjords), review by [@maxulysse](https://github.com/maxulysse))
- [[#193](https://github.com/nf-core/differentialabundance/pull/193)] - Add DESeq2 text to report ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#192](https://github.com/nf-core/differentialabundance/pull/192)] - Add scree plot in report ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#189](https://github.com/nf-core/differentialabundance/pull/189)] - Add DE models to report ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#188](https://github.com/nf-core/differentialabundance/pull/188)] - Add option to cluster all features ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#198](https://github.com/nf-core/differentialabundance/pull/200)] - Document correct RNAseq matrix usage ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO))

### `Fixed`

- [[#201](https://github.com/nf-core/differentialabundance/issues/201)] - DESeq2/Limma update, fix incorrect column names issue ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#191](https://github.com/nf-core/differentialabundance/issues/191)] - Fix sample metadata table in the html report not paginating ([@davidecarlson](https://github.com/davidecarlson), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#190](https://github.com/nf-core/differentialabundance/pull/190)] - Fix GSEA indent in report ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))

### `Changed`

- [[#194](https://github.com/nf-core/differentialabundance/pull/194)] - Change report volcano colors ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#188](https://github.com/nf-core/differentialabundance/pull/188)] - Update min nextflow version ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))

## v1.3.1 - 2023-10-26

### `Fixed`

- [[#183](https://github.com/nf-core/differentialabundance/pull/183)] - Fix logging for dendrograms ([@pinin4fjords](https://github.com/pinin4fjords), review by [@WackerO](https://github.com/WackerO))
Expand Down
11 changes: 10 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
[![GitHub Actions CI Status](https://github.com/nf-core/differentialabundance/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/differentialabundance/actions?query=workflow%3A%22nf-core+CI%22)
[![GitHub Actions Linting Status](https://github.com/nf-core/differentialabundance/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/differentialabundance/actions?query=workflow%3A%22nf-core+linting%22)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/differentialabundance/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.7568000-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.7568000)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.10.0-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
Expand Down Expand Up @@ -51,6 +51,15 @@ RNA-seq:
-profile rnaseq,<docker/singularity/podman/shifter/charliecloud/conda/institute>
```

:::note
If you are using the outputs of the nf-core rnaseq workflow as input here **either**:

- supply the raw count matrices (file names like **gene_counts.tsv**) alongide the transcript length matrix via `--transcript_length_matrix` (rnaseq versions >=3.12.0, preferred)
- **or** supply the **gene_counts_length_scaled.tsv** or **gene_counts_scaled.tsv** matrices.

See the [usage documentation](https://nf-co.re/differentialabundance/usage) for more information.
:::

Affymetrix microarray:

```bash
Expand Down
103 changes: 79 additions & 24 deletions assets/differentialabundance_report.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ params:
report_title: NULL,
report_author: NULL,
report_description: NULL,
report_scree: NULL
observations_type: NULL
observations: NULL # GSE156533.samplesheet.csv
observations_id_col: NULL
Expand All @@ -36,7 +37,7 @@ params:
features_gtf_feature_type: NULL
features_gtf_table_first_field: NULL
features_log2_assays: NULL
raw_matrix: null # e.g. 0_salmon.merged.gene_counts.tsv
raw_matrix: null # e.g. 0_salmon.merged.gene_counts_length_scaled.tsv
normalised_matrix: null
variance_stabilised_matrix: null # e.g. test_files/3_treatment-WT-P23H.vst.tsv
contrasts_file: null # e.g. GSE156533.contrasts.csv
Expand Down Expand Up @@ -426,7 +427,7 @@ display_columns <- head(union(display_columns, additional_useful_cols), 5)
display_columns <- unique(c(display_columns, informative_variables))
observations_to_print <- observations[,unique(display_columns)]
colnames(observations_to_print) <- prettifyVariablename(colnames(observations_to_print))
print( htmltools::tagList(datatable(observations_to_print, caption = paste(ucfirst(params$observations_type), 'metadata'), rownames = FALSE, options = list(dom = 't')) ))
print( htmltools::tagList(datatable(observations_to_print, caption = paste(ucfirst(params$observations_type), 'metadata'), rownames = FALSE, options = list(dom = 'tb')) ))
```

Expand All @@ -437,6 +438,19 @@ Comparisons were made between `r params$observations_type` groups defined using
```{r, echo=FALSE, results='asis'}
contrasts_to_print <- contrasts
colnames(contrasts_to_print) <- prettifyVariablename(colnames(contrasts_to_print))
# Add design/model formulae to report
de_tool <- ifelse(params$study_type %in% c('rnaseq'), "deseq2", "limma")
contrasts_to_print$model <- sapply(contrasts_to_print$Id, function(id) {
model_file <- paste0(id, ".", de_tool, ".model.txt")
if (file.exists(model_file)) {
first_line <- readLines(model_file, n = 1)
return(first_line)
} else {
return(NA)
}
})
print( htmltools::tagList(datatable(contrasts_to_print, caption = paste0("Table of contrasts"), rownames = FALSE, options = list(dom = 't')) ))
```

Expand Down Expand Up @@ -495,10 +509,11 @@ cat(paste0("\n### ", ucfirst(params$observations_type), " relationships\n"))
Principal components analysis was conducted based on the `r params$exploratory_n_features` most variable `r params$features_type`s. Each component was annotated with its percent contribution to variance.

```{r, echo=FALSE, results='asis'}
# Create nested list to save the percentVars for reusing in the scree plot
percentVar_list <- list()
for (assay_type in rev(names(assay_data))){
pca_data <- pca_datas[[assay_type]]
for (iv in informative_variables){
cat(paste0("\n##### ", prettifyVariablename(assay_type), " (", iv, ")\n"))
Expand Down Expand Up @@ -543,10 +558,35 @@ for (assay_type in rev(names(assay_data))){
print(htmltools::tagList(do.call("plotly_scatterplot", plot_args)))
}
if (! assay_type %in% names(percentVar_list)){
percentVar_list[[assay_type]] <- percentVar
}
}
}
```

```{r, echo=FALSE, results='asis', eval=params$report_scree}
cat(paste0("\n#### Scree plot {.tabset}"))
cat(paste0("\nThe following scree plot visualizes what percentage of total variation in the data can be explained by each of the principal components computed.\n"))
#iv <- informative_variables[1]
for (assay_type in names(percentVar_list)) {
percentVarData <- data.frame(percentVar_list[[assay_type]])
colnames(percentVarData) <- c("var_explained")
percentVarData$PCA <- as.numeric(rownames(percentVarData))
cat(paste0("\n##### ", prettifyVariablename(assay_type), "\n"))
print(
ggplot(percentVarData, aes(x=factor(PCA),y=var_explained, group=1)) +
theme_bw() +
geom_point(size=4) +
geom_line(linetype="dashed") +
xlab("PC") +
ylab("Percent variance explained")
)
cat("\n")
}
```

#### Principal components/ metadata associations

For the variance stabilised matrix, an ANOVA test was used to determine assocations between continuous principal components and categorical covariates (including the variable of interest).
Expand Down Expand Up @@ -592,13 +632,13 @@ for (variable in rownames(pca_vs_meta)){

#### Clustering dendrograms {.tabset}

A hierarchical clustering of `r params$features_type`s was undertaken based on the top `r params$exploratory_n_features` most variable `r params$features_type`s. Distances between `r params$features_type`s were estimated based on `r params$exploratory_cor_method` correlation, which were then used to produce a clustering via the `r params$exploratory_clustering_method` method with `hclust()` in R.
A hierarchical clustering of `r params$features_type`s was undertaken based on `r ifelse(params$exploratory_n_features == -1, paste0("all ", params$features_type), paste0("the ", params$exploratory_n_features, " most variable ", params$features_type))`s. Distances between `r params$features_type`s were estimated based on `r params$exploratory_cor_method` correlation, which were then used to produce a clustering via the `r params$exploratory_clustering_method` method with `hclust()` in R.

```{r, echo=FALSE, results='asis'}
for (assay_type in rev(names(assay_data))){
for (iv in informative_variables){
cat(paste0("\n##### ", prettifyVariablename(assay_type), " (", iv, ")\n"))
variable_genes <- selectVariableGenes(matrix = assay_data[[assay_type]], ntop = params$exploratory_n_features)
variable_genes <- selectVariableGenes(matrix = assay_data[[assay_type]], ntop = ifelse(params$exploratory_n_features == -1, nrow(assay_data[[assay_type]]), params$exploratory_n_features))
dendroColorScale <- makeColorScale(length(unique(observations[[iv]])), palette = params$exploratory_palette_name)
p <- clusteringDendrogram(
Expand All @@ -608,8 +648,7 @@ for (assay_type in rev(names(assay_data))){
cor_method = params$exploratory_cor_method,
plot_title = paste0(
paste0(params$observations_type," clustering dendrogram, "),
params$exploratory_n_features,
" most variable ",
ifelse(params$exploratory_n_features == -1, nrow(assay_data[[assay_type]]), paste0(params$exploratory_n_features, " most variable")), " ",
params$features_type,
"s\n(", params$exploratory_clustering_method, " clustering, ", params$exploratory_cor_method, " correlation)"),
cluster_method = params$exploratory_clustering_method,
Expand Down Expand Up @@ -671,11 +710,17 @@ foo <- lapply(informative_variables[iv_min_group_sizes > 2], function(iv){
}
}
})
```

## Differential analysis

```{r, echo=FALSE, results='asis', eval=params$study_type %in% c('rnaseq')}
# For DESeq2, add some more explanation to the report
cat(paste0(
"The `DESeq2 R` package was used for differential analysis. p-values were adjusted with the ", params$deseq2_p_adjust_method, " method to reduce the number of false positives. ", ucfirst(params$features_type), "s were considered differential if, for the respective contrast, the adjusted p-value was equal to or lower than ", params$deseq2_alpha, " and the absolute log2 fold change was equal to or higher than ", params$deseq2_lfc_threshold, "."
))
```

### Differential `r params$features_type` `r params$study_abundance_type` {.tabset}

```{r, echo=FALSE, results='asis'}
Expand Down Expand Up @@ -733,28 +778,41 @@ for (i in 1:nrow(contrasts)){
cat("\n##### ", pvt, " p values\n")
pval_column <- p_value_types[[pvt]]
full_de$differential_status <- FALSE
full_de$differential_status[abs(full_de[[params$differential_fc_column]]) > log2(params$differential_min_fold_change) & full_de[[pval_column]] < p_value_thresholds[[pvt]]] <- TRUE
de_fc <- abs(full_de[[params$differential_fc_column]]) >= log2(params$differential_min_fold_change)
de_fc_label <- paste("abs(logFC) >=", params$differential_min_fold_change)
de_pval <- full_de[[pval_column]] <= p_value_thresholds[[pvt]]
de_pval_label <- paste(pvt, "<=", p_value_thresholds[[pvt]])
de_pval_fc_label <- paste(de_fc_label, '&', de_pval_label)
full_de$differential_status <- "Not significant"
full_de$differential_status[de_fc] <- de_fc_label
full_de$differential_status[de_pval] <- de_pval_label
full_de$differential_status[de_fc & de_pval] <- de_pval_fc_label
full_de$differential_status <- factor(full_de$differential_status, levels = c("Not significant", de_fc_label, de_pval_label, de_pval_fc_label), ordered = TRUE) # Factorize status so that non-significant is always first
# Define the thresholds we'll draw
hline_thresholds = vline_thresholds = list()
hline_thresholds[[paste(pval_column, '=', p_value_thresholds[[pvt]])]] = -log10(p_value_thresholds[[pvt]])
vline_thresholds[[paste(params$differential_fc_column, '<-', log2(params$differential_min_fold_change))]] = -log2(params$differential_min_fold_change)
vline_thresholds[[paste(params$differential_fc_column, '>', log2(params$differential_min_fold_change))]] = log2(params$differential_min_fold_change)
vline_thresholds[[paste(params$differential_fc_column, '<=', log2(params$differential_min_fold_change))]] = -log2(params$differential_min_fold_change)
vline_thresholds[[paste(params$differential_fc_column, '>=', log2(params$differential_min_fold_change))]] = log2(params$differential_min_fold_change)
palette_volcano <- append(c('#999999'), makeColorScale(3, params$differential_palette_name)) # set non-significant to gray
plot_args <- list(
x = full_de[[params$differential_fc_column]],
y = -log10(full_de[[pval_column]]),
colorby = full_de$differential_status,
ylab = paste("-log(10)", pval_column),
xlab = xlabel <- paste("higher in", contrasts$reference[i], " <<", params$differential_fc_column, ">> higher in", contrasts$target[i]) ,
xlab = xlabel <- paste("higher in", contrasts$reference[i], " <<", params$differential_fc_column, ">> higher in", contrasts$target[i]),
labels = full_de[[label_col]],
hline_thresholds = hline_thresholds,
vline_thresholds = vline_thresholds,
show_labels = FALSE,
legend_title = "Differential status",
palette = makeColorScale(2, params$differential_palette_name)
palette = palette_volcano
)
# Let's equalize the axes
Expand Down Expand Up @@ -793,21 +851,17 @@ if (any(unlist(params[paste0(possible_gene_set_methods, '_run')]))){
for (gene_set_method in possible_gene_set_methods){
if (unlist(params[paste0(gene_set_method, '_run')])){
cat("\n### ", toupper(gene_set_method) ," {.tabset}\n")
cat("\n#### ", toupper(gene_set_method) ," {.tabset}\n")
for (gmt_file in simpleSplit(params$gsea_gene_sets)) {
gmt_name <- basename(tools::file_path_sans_ext(gmt_file))
cat("\n#### ", gmt_name ," {.tabset}\n")
cat("\n##### ", gmt_name ," {.tabset}\n")
reference_gsea_tables <- paste0(contrasts$id, ".", gmt_name, '.gsea_report_for_', contrasts$reference, '.tsv')
target_gsea_tables <- paste0(contrasts$id, ".", gmt_name, '.gsea_report_for_', contrasts$target, '.tsv')
for (i in 1:nrow(contrasts)){
cat("\n##### ", contrast_descriptions[i], "\n")
cat("\n###### ", contrast_descriptions[i], "\n")
target_gsea_results <- read_metadata(target_gsea_tables[i])[,c(-2,-3)]
print( htmltools::tagList(datatable(target_gsea_results, caption = paste0("\nTarget (", contrasts$target[i], ")\n"), rownames = FALSE) ))
ref_gsea_results <- read_metadata(reference_gsea_tables[i])[,c(-2,-3)]
print( htmltools::tagList(datatable(ref_gsea_results, caption = paste0("\nReference (", contrasts$reference[i], ")\n"), rownames = FALSE) ))
}
Expand Down Expand Up @@ -857,6 +911,7 @@ make_params_table('exploratory analysis', 'exploratory_', remove_pattern = TRUE)

## Differential analysis


```{r, echo=FALSE, results='asis'}
if (params$study_type == 'rnaseq'){
make_params_table('DESeq2', 'deseq2_', remove_pattern = TRUE)
Expand All @@ -874,7 +929,7 @@ if (any(unlist(params[paste0(possible_gene_set_methods, '_run')]))){
for (gene_set_method in possible_gene_set_methods){
if (unlist(params[paste0(gene_set_method, '_run')])){
cat("\n### ", toupper(gene_set_method) ," {.tabset}\n")
cat("\n#### ", toupper(gene_set_method) ," {.tabset}\n")
make_params_table(toupper(gene_set_method), paste0(gene_set_method, '_'), remove_pattern = TRUE)
}
}
Expand Down Expand Up @@ -902,4 +957,4 @@ print( htmltools::tagList(datatable(versions_table, caption = "Software versions

```{r, echo=FALSE, results='asis'}
htmltools::includeMarkdown(params$citations)
```
```
4 changes: 2 additions & 2 deletions assets/multiqc_config.yml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/differentialabundance/releases/tag/1.3.1" target="_blank">nf-core/differentialabundance</a>
This report has been generated by the <a href="https://github.com/nf-core/differentialabundance/releases/tag/1.4.0" target="_blank">nf-core/differentialabundance</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/differentialabundance/1.3.1/docs/output" target="_blank">documentation</a>.
<a href="https://nf-co.re/differentialabundance/1.4.0/docs/output" target="_blank">documentation</a>.
report_section_order:
"nf-core-differentialabundance-methods-description":
order: -1000
Expand Down
2 changes: 1 addition & 1 deletion conf/affy.config
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ params {
differential_fc_column = "logFC"
differential_pval_column = "P.Value"
differential_qval_column = "adj.P.Val"
differential_feature_id_column = "probe_id"
differential_feature_id_column = "PROBEID"
differential_feature_name_column = "SYMBOL"

// A small amount of upstream work is required to get the app building
Expand Down
2 changes: 1 addition & 1 deletion conf/maxquant.config
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ params {
differential_fc_column = "logFC"
differential_pval_column = "P.Value"
differential_qval_column = "adj.P.Val"
differential_feature_id_column = "probe_id"
differential_feature_id_column = "Majority protein IDs"
differential_feature_name_column = "Majority protein IDs"

// Proteus options
Expand Down
2 changes: 1 addition & 1 deletion conf/soft.config
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ params {
differential_fc_column = "logFC"
differential_pval_column = "P.Value"
differential_qval_column = "adj.P.Val"
differential_feature_id_column = "probe_id"
differential_feature_id_column = "ID"
differential_feature_name_column = "Symbol"

}
Expand Down
Loading

0 comments on commit a3d664c

Please sign in to comment.