spikeIn sample scaling factor calculation discrepancies #953

Closed
sunta3iouxos opened this issue Nov 8, 2023 · 2 comments

Comments

@sunta3iouxos

I am getting different results from multiBamSummary on my local PC than on the server. I expected the following command to match what runs on the server side, and I suspect that the difference between the two is the issue.
My command:

multiBamSummary bins --binSize 10000 --blackListFileName /home/tgeorgom/CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikesTEST/annotation/blacklist.bed --ignoreDuplicates -p 8 --bamfiles /mnt/c/AP04/split_bam/*host*bam  -out /mnt/c/AP04/mergedBam/deeptools/multiBAM_SPIKE_bin1000.out.npz --scalingFactors /mnt/c/AP04/mergedBam/deeptools/multiBAM_spike_scaling_q2.txt  --minMappingQuality 2

Number of bins found: 273590
The output:

sample	scalingFactor
A006850324_209957_S1_L000_spikein.bam	1.0099
A006850324_209960_S2_L000_spikein.bam	1.1728
A006850324_209962_S3_L000_spikein.bam	0.8376
A006850324_209964_S4_L000_spikein.bam	1.0653
A006850324_209966_S5_L000_spikein.bam	0.8244
A006850324_209968_S6_L000_spikein.bam	0.9989
A006850324_209970_S7_L000_spikein.bam	1.0220
A006850324_209972_S8_L000_spikein.bam	1.0535
A006850324_209974_S9_L000_spikein.bam	1.1026
A006850324_209976_S10_L000_spikein.bam	1.0160
A006850324_209978_S11_L000_spikein.bam	1.0301
A006850324_209980_S12_L000_spikein.bam	1.0347
A006850324_209982_S13_L000_spikein.bam	1.0306
A006850324_209984_S14_L000_spikein.bam	0.9753
A006850324_209986_S15_L000_spikein.bam	0.9549
A006850324_209988_S16_L000_spikein.bam	0.9313
A006850324_209990_S17_L000_spikein.bam	0.8619
A006850324_209992_S18_L000_spikein.bam	1.0217
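For reference when comparing these tables: the multiBamSummary documentation describes --scalingFactors as computing DESeq2-style scaling factors for use with bamCoverage's --scaleFactor, i.e. a median-of-ratios over the bins. The Python sketch below is an illustration under assumptions, not deepTools' own code: it takes a bins x samples count matrix, keeps only bins with a non-zero count in every sample, takes the median of the log ratios to a per-bin geometric mean, and reports 1 / size factor, on the assumption that the reciprocal is what gets written so it can be passed to --scaleFactor. The function name median_of_ratios_factors is hypothetical.

```python
import math
from statistics import median


def median_of_ratios_factors(counts):
    """DESeq2-style median-of-ratios on a bins x samples count matrix.

    counts: list of bins, each a list with one read count per sample.
    Returns one scaling factor per sample, reported as 1 / size factor
    (assumed to mirror what multiBamSummary --scalingFactors writes).
    """
    n_samples = len(counts[0])
    # A bin with a zero in any sample has no finite log geometric mean,
    # so only bins with a positive count in every sample are used.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no bin has a non-zero count in every sample")
    # Per-bin log geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # log(count / reference) for sample j in every usable bin.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        # exp(median log ratio) is the size factor; report its reciprocal.
        size_factor = math.exp(median(log_ratios))
        factors.append(1.0 / size_factor)
    return factors


# Toy matrix: sample 2 has exactly twice the reads of sample 1 in every bin.
print(median_of_ratios_factors([[10, 20], [30, 60], [5, 10]]))  # about [1.414, 0.707]
```

Because the result is a median over bins, anything that changes which bins enter it (bin size, blacklist, duplicate filtering, mapping-quality cutoff, or host vs spike-in BAMs) can shift the factors.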

And this is what I am running on the server side:

ChIP-seq -d /scratch/tgeorgom/AP04/ --useSpikeInForNorm --getSizeFactorsFrom genome --sampleSheet /scratch/tgeorgom/AP04/pSer5POLII.tsv --windowSize 500 --plotFormat pdf mm10_gencodeM19_spikesTEST /scratch/tgeorgom/AP04/PolII_ChIPtype_all.yalm

Number of bins found: 4687

And the output is as follows:

sample	scalingFactor
A006850324_209957_S1_L000	0.9813
A006850324_209960_S2_L000	1.1281
A006850324_209962_S3_L000	0.8268
A006850324_209964_S4_L000	1.0348
A006850324_209966_S5_L000	0.8089
A006850324_209968_S6_L000	0.9749
A006850324_209970_S7_L000	0.9942
A006850324_209972_S8_L000	1.0490
A006850324_209974_S9_L000	1.0823
A006850324_209976_S10_L000	0.9764
A006850324_209978_S11_L000	0.9942
A006850324_209980_S12_L000	1.0081
A006850324_209982_S13_L000	0.9971
A006850324_209984_S14_L000	0.9630
A006850324_209986_S15_L000	0.9409
A006850324_209988_S16_L000	0.9228
A006850324_209990_S17_L000	0.8261
A006850324_209992_S18_L000	1.0030
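To compare the two tables sample by sample, the labels first need to match: the local table uses BAM file names ending in _spikein.bam, while the server table uses bare sample names. A small stdlib-only Python sketch (the function names parse_factors and compare are hypothetical) that parses two tab-separated sample/scalingFactor tables like the ones above and returns the local/server ratio for every shared sample:

```python
def parse_factors(text, suffix=""):
    """Parse a 'sample<TAB>scalingFactor' table into {sample: factor}.

    If suffix is given, it is stripped from sample names that end with it,
    so BAM-file labels can be matched to bare sample names.
    """
    factors = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue  # tolerate blank lines
        sample, value = line.split("\t")
        if suffix and sample.endswith(suffix):
            sample = sample[: -len(suffix)]
        factors[sample] = float(value)
    return factors


def compare(local, server):
    """Return {sample: local / server} for samples present in both tables."""
    return {s: local[s] / server[s] for s in sorted(local.keys() & server.keys())}


# Usage with the first row of each table from this issue.
local = parse_factors(
    "sample\tscalingFactor\nA006850324_209957_S1_L000_spikein.bam\t1.0099\n",
    suffix="_spikein.bam",
)
server = parse_factors("sample\tscalingFactor\nA006850324_209957_S1_L000\t0.9813\n")
print(compare(local, server))
```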

Using a different --binSize to reproduce the number of bins found on the server did not solve the issue:

multiBamSummary bins --binSize 500000 --ignoreDuplicates -p 8 --bamfiles /mnt/c/AP04/split_bam/*spikein*bam -out /mnt/c/AP04/mergedBam/deeptools/multiBAM_SPIKE_bin1000.out.npz --scalingFactors /mnt/c/AP04/mergedBam/deeptools/multiBAM_spike_scaling_q2Noblack_bin500000.txt  --minMappingQuality 2

Number of bins found: 5518

sample	scalingFactor
A006850324_209957_S1_L000_spikein.bam	1.0093
A006850324_209960_S2_L000_spikein.bam	1.1931
A006850324_209962_S3_L000_spikein.bam	0.8369
A006850324_209964_S4_L000_spikein.bam	1.0730
A006850324_209966_S5_L000_spikein.bam	0.8210
A006850324_209968_S6_L000_spikein.bam	1.0097
A006850324_209970_S7_L000_spikein.bam	1.0212
A006850324_209972_S8_L000_spikein.bam	1.0618
A006850324_209974_S9_L000_spikein.bam	1.1137
A006850324_209976_S10_L000_spikein.bam	1.0174
A006850324_209978_S11_L000_spikein.bam	1.0432
A006850324_209980_S12_L000_spikein.bam	1.0476
A006850324_209982_S13_L000_spikein.bam	1.0343
A006850324_209984_S14_L000_spikein.bam	0.9812
A006850324_209986_S15_L000_spikein.bam	0.9624
A006850324_209988_S16_L000_spikein.bam	0.9431
A006850324_209990_S17_L000_spikein.bam	0.8646
A006850324_209992_S18_L000_spikein.bam	1.0306
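To illustrate why the bin size alone can move the factors, a hypothetical sketch on toy counts (not data from this issue): merge_bins sums runs of k adjacent bins to roughly emulate a k-times larger --binSize, and a compact DESeq2-style median-of-ratios function (bins with any zero count dropped, reciprocal reported, both assumptions about the deepTools calculation) is recomputed on the merged matrix. Merging is only an approximation of re-binning from the reads.

```python
import math
from statistics import median


def merge_bins(counts, k):
    """Emulate a k-times larger bin size by summing runs of k adjacent bins.

    Approximation only: assumes the bins are consecutive on one chromosome
    and ignores reads that a real re-binning would assign differently.
    A trailing group shorter than k is kept as-is.
    """
    merged = []
    for start in range(0, len(counts), k):
        group = counts[start:start + k]
        # zip(*group) walks the samples (columns) of this group of bins.
        merged.append([sum(column) for column in zip(*group)])
    return merged


def scaling_factors(counts):
    """Compact DESeq2-style median-of-ratios, reported as 1 / size factor."""
    n = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_gm = [sum(math.log(c) for c in row) / n for row in usable]
    return [
        1.0 / math.exp(median([math.log(row[j]) - g for row, g in zip(usable, log_gm)]))
        for j in range(n)
    ]


# Toy two-sample counts over four fine bins.
toy = [[1, 3], [3, 1], [2, 2], [0, 4]]
print(scaling_factors(toy))                 # [1.0, 1.0]
print(scaling_factors(merge_bins(toy, 2)))  # about [1.316, 0.760]
```

With fine bins the zero-count bin is discarded and the median lands on the balanced bin; after merging, that bin's reads are pooled into a usable bin and the median shifts.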
@katsikora
Contributor

Hi,
the default spikein_bin_size for calculating scaling factors from the spike-in genome is 1000. This should be visible in the workflow's config.yaml in the output folder.
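A quick way to confirm which value a given run actually used is to look up spikein_bin_size in the config.yaml written to the output folder. A minimal sketch, assuming the key appears on its own unindented line such as "spikein_bin_size: 1000"; it is not a YAML parser, and with PyYAML installed yaml.safe_load would be the more robust choice. The helper name find_top_level_key is hypothetical.

```python
def find_top_level_key(yaml_text, key):
    """Return the raw value of an unindented 'key: value' line, or None.

    Not a YAML parser: it ignores nesting, quoting and multi-line values.
    """
    prefix = key + ":"
    for line in yaml_text.splitlines():
        # Top-level YAML keys start at column 0.
        if line.startswith(prefix):
            # Drop an inline comment, then surrounding whitespace.
            value = line[len(prefix):].split("#", 1)[0].strip()
            return value or None
    return None


# Hypothetical usage against the config.yaml in the pipeline output folder:
# with open("config.yaml") as fh:
#     print(find_top_level_key(fh.read(), "spikein_bin_size"))
```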

In the first command line you pasted, I noticed you are passing host BAM files to calculate spike-in size factors, but the size factors you then cite come from a file in which they appear to have been calculated on spike-in BAM files. Could you revisit that?

If you run snakePipes with --verbose, the full shell commands will be written to the log file. You can then see the complete multiBamSummary command with all the parameters passed. Would that be helpful?
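Once a --verbose log is available, a small Python sketch can pull the multiBamSummary options out of it, so that the pipeline's --binSize, --blackListFileName, --minMappingQuality and BAM list can be compared with a local run. Assumptions: each logged command sits on one line, the mode (bins or BED-file) directly follows multiBamSummary, and the rest of the line is valid shell quoting (shlex.split raises ValueError otherwise). The function name multibamsummary_options is hypothetical.

```python
import shlex


def multibamsummary_options(log_text):
    """Return one {option: value} dict per multiBamSummary call in a log."""
    calls = []
    for line in log_text.splitlines():
        if "multiBamSummary" not in line:
            continue
        tokens = shlex.split(line[line.index("multiBamSummary"):])
        grouped = {}
        current = None
        # tokens[0] is the program, tokens[1] the mode; options start after.
        for tok in tokens[2:]:
            if tok.startswith("-"):
                current = tok
                grouped[current] = []
            elif current is not None:
                grouped[current].append(tok)
        parsed = {}
        for opt, values in grouped.items():
            if not values:
                parsed[opt] = True        # bare switch, e.g. --ignoreDuplicates
            elif len(values) == 1:
                parsed[opt] = values[0]   # single value, e.g. --binSize
            else:
                parsed[opt] = values      # several values, e.g. --bamfiles
        calls.append(parsed)
    return calls
```

Setting the resulting dict next to the local command line makes differences such as the bin size or a missing --blackListFileName easy to spot.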

Best,

Katarzyna

@sunta3iouxos
Author

In the first commandline you pasted, I've noticed you are passing host bam files to calculate spikein size factors. Then you're citing size factors from a file, in which they appear to be calculated on spikein bam files. Could you revisit that?

That was a typo on my side. I will come back after testing a few things.
