The question about .config file #4

Closed
houruiyan opened this issue Oct 5, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@houruiyan

Hi, thanks for the great tool. I am trying to use it in my project. I have 10x data that I aligned to the human reference with cellranger, so I now have a BAM file and want to set up the .config file. However, the config does not seem friendly to input files other than SICILIAN output, and I cannot work out how to write the input_file and meta file. Could you please give me some examples? I also do not understand the definitions of "grouping_level_1" and "grouping_level_2"; could you explain them? Thank you in advance!

@houruiyan houruiyan added the bug Something isn't working label Oct 5, 2021
@kaitlinchaung
Collaborator

kaitlinchaung commented Oct 5, 2021

Hello! Thank you for your question.

It sounds like you have some cellranger-aligned bams, and you have not run SICILIAN on that bam, is that correct?
In that case, I think you would want to have the following options:
SICILIAN = false
samplesheet = YOUR_SAMPLESHEET_HERE.csv

For 10X data, I would follow the instructions in the first block to create the samplesheet: https://github.com/salzmanlab/SpliZ#samplesheets
You should have 2 comma-separated columns:

  • the name of the bam file (this becomes the bam_ID)
  • the path to that bam file
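
As an illustration of the two-column samplesheet described above, here is a minimal Python sketch. It is not part of SpliZ. The BAM paths are hypothetical, using the file name without `.bam` as the bam_ID is just one possible convention, and the absence of a header row is an assumption that should be checked against the README.

```python
import csv
import io
from pathlib import PurePosixPath


def samplesheet_rows(bam_paths):
    """Return (bam_ID, path) pairs; the bam_ID here is the file name without .bam."""
    rows = []
    for p in bam_paths:
        path = PurePosixPath(p)
        rows.append((path.stem, str(path)))
    return rows


def write_samplesheet(rows, handle):
    """Write two comma-separated columns (assumption: no header row)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows(rows)


# Hypothetical paths, for illustration only.
rows = samplesheet_rows(["/data/run1/lung_1.bam", "/data/run2/kidney_1.bam"])
buffer = io.StringIO()
write_samplesheet(rows, buffer)
samplesheet_text = buffer.getvalue()
print(samplesheet_text)
```

To produce a real file, pass an open file handle (for example `open("samplesheet.csv", "w", newline="")`) instead of the in-memory buffer.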

For the metadata, that file should have at least 3 columns:

  • cell_id formatted as ${bam_ID}_${cellranger_barcode}
  • grouping_level_1, the metadata unit within which you would like to perform differential analysis (the analysis is run separately for each of its values)
  • grouping_level_2, the metadata unit whose groups you would like to compare in the differential analysis
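
A minimal Python sketch of building such a metadata file follows. It is not part of SpliZ. The barcodes and group values are made up, the tab delimiter is an assumption, and whether the barcode should keep a cellranger suffix such as `-1` is not stated here, so check that against your own data.

```python
import csv
import io


def make_cell_id(bam_id, barcode):
    """Join the samplesheet bam_ID and the cellranger barcode with an underscore."""
    return f"{bam_id}_{barcode}"


def write_metadata(cells, handle):
    """cells: iterable of (bam_id, barcode, grouping_level_1, grouping_level_2)."""
    # Assumption: tab-separated output with a header row.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell_id", "grouping_level_1", "grouping_level_2"])
    for bam_id, barcode, level_1, level_2 in cells:
        writer.writerow([make_cell_id(bam_id, barcode), level_1, level_2])


# Hypothetical cells, for illustration only.
cells = [
    ("lung_1", "AAACCTGAGAAACCAT", "lung", "endothelial"),
    ("kidney_1", "AAACCTGAGAAACCGC", "kidney", "blood"),
]
buffer = io.StringIO()
write_metadata(cells, buffer)
metadata_text = buffer.getvalue()
print(metadata_text)
```

The important part is that the bam_ID prefix of each cell_id matches the first column of the samplesheet.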

It is possible that you only have one group over which you'd like to perform differential analysis (#2), in which case you can leave grouping_level_1 blank, and your metadata would look like:

  • cell_id formatted as ${bam_ID}_${cellranger_barcode}
  • grouping_level_2, the metadata unit whose groups you would like to compare in the differential analysis

An example I can provide is if you have data from multiple tissues (e.g. lung, kidney, and heart) and multiple cell types (e.g. endothelial, blood, capillary) within each tissue.

  • If grouping_level_1 = tissue and grouping_level_2 = cell_type, then you would be looking for differential SpliZ in endothelial vs blood vs capillary FOR EACH tissue.
  • If grouping_level_2 = cell_type and there is no grouping_level_1, then you would be looking for differential SpliZ in endothelial vs blood vs capillary, irrespective of tissue.
  • If grouping_level_2 = tissue and there is no grouping_level_1, then you would be looking for differential SpliZ in lung vs kidney vs heart, irrespective of cell_type.
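
To make the grouping semantics concrete, here is a small Python sketch that lists which groups would be compared for a given choice of grouping columns. It only illustrates the idea on toy data and is not how SpliZ computes anything. The function name `planned_comparisons` and the stratum label "all" are made up for this example.

```python
from collections import defaultdict


def planned_comparisons(cells, level_2, level_1=None):
    """Map each grouping_level_1 value to the grouping_level_2 values compared within it.

    With level_1=None, every cell falls into one stratum labelled "all".
    """
    strata = defaultdict(set)
    for cell in cells:
        key = cell[level_1] if level_1 is not None else "all"
        strata[key].add(cell[level_2])
    return {key: sorted(values) for key, values in strata.items()}


# Toy metadata, for illustration only.
cells = [
    {"tissue": "lung", "cell_type": "endothelial"},
    {"tissue": "lung", "cell_type": "blood"},
    {"tissue": "kidney", "cell_type": "endothelial"},
    {"tissue": "kidney", "cell_type": "capillary"},
]

# Cell types compared separately within each tissue.
per_tissue = planned_comparisons(cells, level_2="cell_type", level_1="tissue")
# Cell types compared across all cells, ignoring tissue.
across_tissues = planned_comparisons(cells, level_2="cell_type")
# Tissues compared across all cells, ignoring cell type.
tissue_only = planned_comparisons(cells, level_2="tissue")
print(per_tissue, across_tissues, tissue_only)
```

In every case the values being compared come from the grouping_level_2 column, and grouping_level_1 only decides whether the comparison is repeated per stratum.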

I hope that helps, and feel free to paste in your config file/metadata/samplesheets to check. And thanks again for your question, I'll update the readme to clarify the parameters a bit.

@houruiyan
Author

Thank you very much! Your explanation is very clear! I wrote the .config file and built the metadata and samplesheet according to your instructions. One more point is worth noting: when we use a BAM file, we do not need to set a value for "input_file". I think it works.
This is my metadata.
image

This is my config.
image

But a new problem has appeared.
image
image

I don't know what is causing this problem. I hope you can help. Thank you!

@kaitlinchaung
Collaborator

Can you please navigate to the 'Work dir' of that failed job, and paste the results of *.log?

The 'Work dir' path is shown at the bottom of your second image, i.e. /storage/yhhuang/../work/..

@kaitlinchaung
Collaborator

It may also be helpful to paste in a couple lines of your MS_ann_splices.tsv file.

@houruiyan
Author

Dear Dr Chaung,

This is my calc_splizvd.log in the "work dir":
image

This is the MS_ann_splices.tsv file in my "work dir"
image

Thank you!

@kaitlinchaung
Collaborator

Hi, if the column names of your metadata file are grouping_level_1 and grouping_level_2, then your config file should have:
grouping_level_1 = grouping_level_1
grouping_level_2 = grouping_level_2
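
One way to catch this kind of mismatch before launching the pipeline is to compare the column names given in the config against the header of the metadata file. The Python sketch below is a hypothetical helper, not part of SpliZ, and it assumes a tab-separated metadata file with a header row.

```python
import csv
import io


def missing_grouping_columns(metadata_handle, grouping_columns, delimiter="\t"):
    """Return required column names that are absent from the metadata header.

    grouping_columns holds the config values of grouping_level_1 and
    grouping_level_2; empty or None entries (a blank grouping_level_1) are skipped.
    """
    header = next(csv.reader(metadata_handle, delimiter=delimiter))
    required = ["cell_id"] + [c for c in grouping_columns if c]
    return [c for c in required if c not in header]


# Toy metadata whose columns are named tissue and cell_type.
metadata = "cell_id\ttissue\tcell_type\nlung_1_AAAC\tlung\tblood\n"

# Config values that match the header: nothing is missing.
ok = missing_grouping_columns(io.StringIO(metadata), ["tissue", "cell_type"])
# Config values that do not match the header: both are reported.
bad = missing_grouping_columns(
    io.StringIO(metadata), ["grouping_level_1", "grouping_level_2"]
)
print(ok, bad)
```

An empty result means every name set in the config exists as a column in the metadata file.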

@houruiyan
Author

OK, thank you very much! I will try it. Thank you again!

@houruiyan
Author

It works. Thank you!

@kaitlinchaung
Collaborator

No problem!

@wlei-amu

Hello,
I want to run this tool on non-SICILIAN inputs, but I don't know what code to run. Can you show me yours? Thanks!

@wlei-amu

Hello, I want to run this tool on non-SICILIAN inputs, but I don't know what code to run. Can you show me yours? Thanks!

If I need to configure the .config file, which parts should I modify, and what code should I run? Thanks!

@juliaolivieri
Collaborator

Hello @wlei-amu, what kind of data do you want to run on? 10X cellranger BAMs?

@tjhwangxiong

tjhwangxiong commented Apr 1, 2022

Hello @wlei-amu, what kind of data do you want to run on? 10X cellranger BAMs?

Dear juliaolivieri, I built SpliZ as follows:

git clone https://github.com/salzmanlab/SpliZ.git
cd SpliZ
conda env create --name spliz_env --file=environment.yml
conda activate spliz_env
conda install nextflow

I have run the test data successfully by modifying small.config to set input_file = "small_data/small.pq".

My question is: if we run SpliZ on 10X cellranger BAMs, which config file should we edit or generate? Can I just modify the nextflow.config file as follows:

// Global default params, used in configs
params {
  // Workflow flags for SpliZ
  // TODO nf-core: Specify your pipeline's command line flags
  // String values (names and file paths) must be quoted in a Nextflow config
  dataname = "wx"
  input_file = "wx_1.bam"
  SICILIAN = false
  pin_S = 0.01
  pin_z = 0.0
  bounds = 5
  light = false
  svd_type = "normdonor"
  n_perms = 100
  grouping_level_1 = "grouping_level_1"
  grouping_level_2 = "grouping_level_2"
  libraryType = null
  run_analysis = false
  samplesheet = "samplesheet.csv"
  annotator_pickle = "hg38_refseq.pkl"
  exon_pickle = "hg38_refseq_exon_bounds.pkl"
  splice_pickle = "hg38_refseq_splices.pkl"
  meta = "metadata.tsv"
  gtf = "GRCh38_genomic.gtf"
  rank_quant = 0
  // Double quotes are needed for ${params.dataname} to be interpolated
  outdir = "./results/${params.dataname}"
  publish_dir_mode = 'copy'
}

Or should I generate a new config file? If so, how do I load the new config file?

Thanks a lot.
