Unable to recognize a combination of multiple barcode sequences #360

Closed
shangyf-stu opened this issue Apr 29, 2023 · 3 comments

@shangyf-stu

Hello,
I am processing libraries built with BD Rhapsody and inDrop. Both use combined barcodes. No barcodes appear in the *.BCstats.txt file.
The barcode entries in the file are null, yet there are counts. I don't know what caused this error. Additionally, in the inDrop data, the W1 sequence (GAGTGATTGCTTGTGACGCCTT) may contain 1-2 mismatches in the fastq file. How can I allow for this in zUMIs?
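
To make the mismatch question concrete, here is a minimal Python sketch that scans a barcode read for W1 and reports the best position within a given Hamming distance. It is not part of zUMIs and does not change how zUMIs handles `correct_frameshift`; the function names `hamming` and `find_w1` are made up for this illustration, and the W1 sequence is the one from the YAML file in this issue.

```python
# W1 adapter as given in the correct_frameshift field of the YAML in this issue.
W1 = "GAGTGATTGCTTGTGACGCCTT"


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_w1(read, max_mismatches=2, w1=W1):
    """Return (offset, mismatches) of the best W1 hit in `read`, or None.

    Only substitutions are considered; insertions and deletions inside W1
    are not handled by this sketch.
    """
    best = None
    for offset in range(len(read) - len(w1) + 1):
        d = hamming(read[offset:offset + len(w1)], w1)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (offset, d)
    return best


if __name__ == "__main__":
    # Hypothetical read: 8 barcode bases, W1 with one substitution, then more bases.
    example = "ACGTACGT" + "GAGAGATTGCTTGTGACGCCTT" + "TTTTTTTTAAAAAA"
    print(find_w1(example))
```

Running this over a sample of reads would show how often W1 is found exactly versus with one or two substitutions, which helps judge how many reads an exact-match pattern would miss.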

To Reproduce
Yaml file:
project: dataset467
sequence_files:
  file1:
    name: /dataset467/fastq1/dataset467_1.fastq.gz
    base_definition:
      - cDNA(1-35)
  file2:
    name: /dataset467/fastq1/dataset467_2.fastq.gz
    base_definition:
      - barcode(1-8,31-38)
      - UMI(39-44)
    correct_frameshift: GAGTGATTGCTTGTGACGCCTT
reference:
  STAR_index: /zUMIs_v2.9.7d/star_index_2.7.1a_nogtf
  GTF_file: /3.2.0_star_index/Homo_sapiens.GRCh38.102.gtf
  exon_extension: no
  extension_length: 0
  scaffold_length_min: 0
  additional_files:
  additional_STAR_params:
out_dir: /dataset467/zUMIs
num_threads: 8
mem_limit: 20
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20
barcodes:
  barcode_num: null
  barcode_file: null
  barcode_sharing: null
  automatic: yes
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: no
counting_opts:
  introns: yes
  intronProb: no
  downsampling: '0'
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  multi_overlap: no
  fraction_overlap: 0
  twoPass: yes
make_stats: yes
which_Stage: Filtering
samtools_exec: samtools
Rscript_exec: Rscript
STAR_exec: STAR
pigz_exec: pigz

Screenshots
Error message:
Error in uik(bccount$cellindex, bccount$cs/1000) :
Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_unknown -> .FindBCcut -> uik
Execution halted
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) :
File '/dataset467/zUMIs/zUMIs_output/dataset467kept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/dataset467/zUMIs'
Execution halted
Loading required package: yaml
Loading required package: Matrix
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/dataset467/zUMIs/zUMIs_output/expression/dataset467.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, :
File '/dataset467/zUMIs/zUMIs_output/dataset467kept_barcodes.txt' does not exist or is non-readable. getwd()=='/dataset467/zUMIs'
Execution halted

I hope you can provide me with some help. Thank you!
Best wishes,
Shang

@shangyf-stu
Author

The YAML file above is for the inDrop data, and the YAML file below is for BD Rhapsody:
project: dataset466
sequence_files:
  file1:
    name: /dataset466/filter_fastp/dataset466_clean_1.fastq.gz
    base_definition:
      - BD(1-9,22-30,44-52)
      - UMI(53-60)
  file2:
    name: /dataset466/filter_fastp/dataset466_clean_2.fastq.gz
    base_definition:
      - cDNA(1-150)
reference:
  STAR_index: /zUMIs_v2.9.7d/star_index_2.7.1a_nogtf
  GTF_file: /3.2.0_star_index/Homo_sapiens.GRCh38.102.gtf
  exon_extension: no
  extension_length: 0
  scaffold_length_min: 0
  additional_files:
  additional_STAR_params:
out_dir: /dataset466/zUMIs
num_threads: 8
mem_limit: 20
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20
barcodes:
  barcode_num: null
  barcode_file: null
  barcode_sharing: null
  automatic: yes
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: no
counting_opts:
  introns: yes
  intronProb: no
  downsampling: '0'
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  multi_overlap: no
  fraction_overlap: 0
  twoPass: yes
make_stats: yes
which_Stage: Filtering
samtools_exec: samtools
Rscript_exec: Rscript
STAR_exec: STAR
pigz_exec: pigz

@cziegenhain
Collaborator

Hi,

To your questions:

  1. zUMIs does not support mismatches in the frameshift correction pattern.
  2. You do not receive useful output because your YAML files are incorrect. The barcode base ranges must be specified with BC, but you have written barcode (inDrop file) and BD (BD Rhapsody file). See https://github.com/sdparekh/zUMIs/wiki/Protocol-specific-setup

Please check all documentation carefully before opening issues. Thank you.
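
A mistake like this can be caught before launching a run with a small pre-flight check on the `base_definition` entries. The Python sketch below is not part of zUMIs; the helper name `check_base_definition` is hypothetical, and the allowed tag set (`BC`, `UMI`, `cDNA`) is an assumption based on the reply above and the entries used in this thread, so verify it against the wiki page linked above.

```python
import re

# Assumed set of valid tags; confirm against the zUMIs wiki before relying on it.
ALLOWED_TAGS = {"BC", "UMI", "cDNA"}

# Matches entries such as "BC(1-8,31-38)": a tag followed by one or more ranges.
ENTRY = re.compile(r"^([A-Za-z]+)\((\d+-\d+(?:,\d+-\d+)*)\)$")


def check_base_definition(entries):
    """Return a list of problem descriptions; an empty list means no issue found."""
    problems = []
    for entry in entries:
        match = ENTRY.match(entry.strip())
        if match is None:
            problems.append(f"{entry!r}: expected TAG(start-end[,start-end...])")
            continue
        tag = match.group(1)
        if tag not in ALLOWED_TAGS:
            problems.append(
                f"{entry!r}: unknown tag {tag!r}, expected one of {sorted(ALLOWED_TAGS)}"
            )
    return problems


if __name__ == "__main__":
    # Entries as written in the two YAML files of this issue.
    print(check_base_definition(["barcode(1-8,31-38)", "UMI(39-44)"]))
    print(check_base_definition(["BD(1-9,22-30,44-52)", "UMI(53-60)"]))
    # The same ranges with the BC tag pass the check.
    print(check_base_definition(["BC(1-8,31-38)", "UMI(39-44)"]))
```

The check only looks at tag names and range syntax; it does not verify that the ranges fit the read length or match the protocol.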

@shangyf-stu
Author

Thank you for your answer. All samples now run successfully. I'm sorry for the mistake caused by my carelessness!
