Add multiome support (scATAC + scRNA) #174

Open
heylf opened this issue Nov 2, 2022 · 13 comments
Labels
enhancement New feature or request

Comments

@heylf
Contributor

heylf commented Nov 2, 2022

Description of feature

Just wanted to note that I am currently working on an implementation of cellranger-arc (modules + subworkflow). It may be worth discussing whether this should be integrated into scrnaseq or become its own pipeline, since it requires a different samplesheet format and different input checks.

@heylf heylf added the enhancement New feature or request label Nov 2, 2022
@grst
Member

grst commented Nov 4, 2022

Could you elaborate on what the samplesheet would need to look like? If it's just a matter of adding columns, I think it would be fine.

More generally, we should think about which modalities (ATAC, CITE, VDJ, spatial, ...) we want to support in the future and which of them should be processed by the same workflow.

@heylf
Contributor Author

heylf commented Nov 7, 2022

I will implement cellranger-arc in scrnaseq and then see how it goes. But yes, it would be nice to discuss the modalities.

@apeltzer
Member

apeltzer commented Nov 8, 2022

Would still be great to discuss here before merging an entire subworkflow first 😉

@heylf
Contributor Author

heylf commented Nov 9, 2022

Technically, cellranger-arc needs a samplesheet (lib.csv) as input, which looks like this:

fastqs,sample,library_type
/home/jdoe/runs/HNGEXSQXXX/outs/fastq_path,example,Gene Expression
/home/jdoe/runs/HNATACSQXX/outs/fastq_path,example,Chromatin Accessibility

Thus, lib.csv defines the folder locations of the scRNA and scATAC FASTQs for the sample.

My thinking was the following:

  1. Keep the samplesheet definitions, subworkflows, and modules as they are currently defined for scrnaseq.
  2. Add two new optional columns to the samplesheet: folder_GEX and folder_ATAC.
  3. Write a separate input_check_multiome.nf that creates an input channel for scrnaseq.nf carrying [meta, [folders]] instead of [meta, [reads]].
  4. Write a module and script that generate the lib.csv for cellranger-arc from the input channel of step 3.
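
Step 4 could be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not the actual module script: the function name build_libraries_csv is hypothetical, folder_GEX and folder_ATAC are the proposed samplesheet columns from step 2, and the library_type values are copied from the lib.csv example above.

```python
import csv
import io

# Proposed (hypothetical) samplesheet columns mapped to the library_type
# values shown in the lib.csv example.
LIBRARY_TYPES = {
    "folder_GEX": "Gene Expression",
    "folder_ATAC": "Chromatin Accessibility",
}


def build_libraries_csv(sample, folders):
    """Return the lib.csv text for one sample.

    `folders` maps the column names folder_GEX / folder_ATAC to FASTQ
    directories. Both columns are optional, so missing or empty entries
    are skipped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["fastqs", "sample", "library_type"])
    for column, library_type in LIBRARY_TYPES.items():
        folder = folders.get(column)
        if folder:
            writer.writerow([folder, sample, library_type])
    return buffer.getvalue()


if __name__ == "__main__":
    # Paths taken from the lib.csv example in this comment.
    print(
        build_libraries_csv(
            "example",
            {
                "folder_GEX": "/home/jdoe/runs/HNGEXSQXXX/outs/fastq_path",
                "folder_ATAC": "/home/jdoe/runs/HNATACSQXX/outs/fastq_path",
            },
        )
    )
```

Keeping the column-to-library_type mapping in one dictionary means a further modality would only need one more entry, provided cellranger-arc accepts the corresponding library_type.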

This approach has two advantages:

  1. It can be easily adapted to future data modalities (e.g., additional methylation data), should they arrive at some point.
  2. The user does not have to generate the lib.csv and can stay with the samplesheet definition used for scrnaseq.

What are your thoughts about this? @grst and @apeltzer

@grst
Member

grst commented Nov 9, 2022

Personally I like to have all files explicitly listed in the samplesheet, also for consistency with other aligners.

Possible alternative:
Add an additional column sample_type to the samplesheet. The column is optional; if nothing is specified, "gex" is assumed:

sample,fastq_1,fastq_2,expected_cells,sample_type
pbmc8k,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L007_R1_001.fastq.gz,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L007_R2_001.fastq.gz,"10000",gex
pbmc8k,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L008_R1_001.fastq.gz,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L008_R2_001.fastq.gz,"10000",gex
pbmc8k,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L007_R1_001.fastq.gz,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L007_R2_001.fastq.gz,"10000",atac
pbmc8k,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L008_R1_001.fastq.gz,s3://nf-core-awsmegatests/scrnaseq/input_data/pbmc8k_S1_L008_R2_001.fastq.gz,"10000",atac
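
To illustrate how such a samplesheet might be read, here is a minimal Python sketch, not the pipeline's actual input check. The function name group_by_sample_type is hypothetical, and the set of accepted values ("gex", "atac") is an assumption based only on the example rows above. A missing or empty sample_type falls back to "gex", so existing samplesheets keep working.

```python
import csv
import io
from collections import defaultdict

# Assumption: only the two values used in the example are accepted.
VALID_TYPES = {"gex", "atac"}


def group_by_sample_type(samplesheet_text):
    """Group FASTQ pairs by (sample, sample_type).

    The sample_type column is optional; rows without a value, or sheets
    without the column at all, are treated as "gex".
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        sample_type = (row.get("sample_type") or "gex").strip().lower()
        if sample_type not in VALID_TYPES:
            raise ValueError(f"Unknown sample_type: {sample_type!r}")
        grouped[(row["sample"], sample_type)].append(
            (row["fastq_1"], row["fastq_2"])
        )
    return dict(grouped)


if __name__ == "__main__":
    # Shortened, made-up file names for illustration only.
    sheet = (
        "sample,fastq_1,fastq_2,expected_cells,sample_type\n"
        "pbmc8k,gex_L007_R1.fastq.gz,gex_L007_R2.fastq.gz,10000,gex\n"
        "pbmc8k,atac_L007_R1.fastq.gz,atac_L007_R2.fastq.gz,10000,atac\n"
    )
    for key, reads in group_by_sample_type(sheet).items():
        print(key, reads)
```

Grouping by both sample and sample_type keeps the GEX and ATAC reads of one sample apart while still letting them be paired up later for a multiome run.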

@heylf
Contributor Author

heylf commented Nov 9, 2022

Works for me.

@apeltzer
Member

apeltzer commented Nov 9, 2022

I tend towards the option proposed by @grst, as it will not break existing solutions / setups 👍🏻

@heylf
Contributor Author

heylf commented Nov 9, 2022

Perfect. Then I am on it!

@apeltzer
Member

apeltzer commented Nov 9, 2022

One question remains for me: how should we generally handle these multiome analysis types? scrnaseq (as the name suggests ;-)) is for scRNA analysis; if we continue to add more types of analyses, we might overlap with other nf-core pipelines (atacseq, ...). Maybe I will put this up for discussion on the general nf-core Slack, to see how to deal with these sorts of things in the future 👍🏻

@heylf
Contributor Author

heylf commented Nov 9, 2022

Thanks @apeltzer. Indeed, also for findability, because users might not immediately realize that you could use scrnaseq for scATAC and multiome.

@apeltzer
Member

apeltzer commented Nov 9, 2022

Link to discussion on Slack also added here to do proper x-ref: https://nfcore.slack.com/archives/CE4K7FEHE/p1667987775811819

@apeltzer
Member

apeltzer commented Nov 9, 2022

Please chime in there too - there are some opinions out there.

@grst
Member

grst commented Jul 13, 2023

The corresponding module has been merged:
nf-core/modules#3229

@grst grst removed this from the 2.3.0 milestone Jan 18, 2024

3 participants