error while running HiCPro2FitHiC.py #389

Closed
priyatamapandey opened this issue Dec 12, 2020 · 23 comments

@priyatamapandey

Hi,
I used the Singularity container to run HiC-Pro, and it generated all the files. Next, I want to use FitHiC. To do that, I am using HiCPro2FitHiC.py, but it ended with an error. It did generate 3 files, but I think they are incomplete. I tried using those files in FitHiC and got an error again.

The command I used is

singularity shell --bind /project/roselai_228  /project/wiemels_260/priya/myTools/hicpro_3.0.0_ubuntu.img 
/HiC-Pro-devel_py3/bin/utils/hicpro2fithic.py -i matrix/SRR1030745/raw/1000000/SRR1030745_1000000.matrix -b matrix/SRR1030745/raw/1000000/SRR1030745_1000000_abs.bed -s matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix.biases -o output

And the error I got is below

[Screenshot: error output (2020-12-11)]

My guess is that there may be an issue with the output generated by HiC-Pro. FYI,
the command I used for HiC-Pro is:

singularity shell --bind /project/roselai_228 /project/wiemels_260/priya/myTools/hicpro_3.0.0_ubuntu.img
Singularity> HiC-Pro -i /project/roselai_228/priyatap/HiC_work/fastq/SRR1030745 -o /project/roselai_228/priyatap/HiC_work/output -c /project/roselai_228/priyatap/HiC_work/annotationFiles/config-hicpro.txt

Please help me figure out this issue.
Thank you,
Priya

@nservant
Owner

Hi
Would you mind sharing the input files with me, so that I can try to reproduce the bug?
Thanks

@priyatamapandey
Author

priyatamapandey commented Dec 14, 2020

Hi,
Thank you for your reply. I also asked the FitHiC group, and they found that my _abs.bed file has one fewer row than the biases file, which seems to cause the error.

Attached please find the input files.
troubleshooting.zip

Thank you so much for help,
Priya

@priyatamapandey
Author

Hi,
Have you been able to reproduce the error? Any idea what is causing it?

Thank you,
Priya

@priyatamapandey
Author

Hi,

In the meantime, I tried to run the older version 2.11.4 from https://zerkalo.curie.fr/partage/HiC-Pro/singularity_images/hicpro_latest_ubuntu.img
and received an error. I used the same command as before.
Here is the error
[Screenshot: error output (2020-12-16)]

Please let me know how to resolve this issue.

Thank you,

@nservant
Owner

Hi,
I figured out the issue, but I do not understand how it is possible...
In your matrix you have 3114 bins, but in the bias file you have 3115 values.
Could you please check which iced version you are using, and update it to the latest version if necessary?
Thanks
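A quick way to confirm this kind of mismatch is to count the rows of the `_abs.bed` file and the values in the `.biases` file. The sketch below is a hypothetical helper (the script name `check_bins.py` is made up), assuming one bin per non-blank line in `_abs.bed` and one bias value per non-blank line in the biases file, as in the files discussed in this thread.

```python
import sys


def count_records(lines):
    """Count non-blank lines: one bin per line in *_abs.bed, one value per line in *.biases."""
    return sum(1 for line in lines if line.strip())


def compare_bins(bed_lines, bias_lines):
    """Return (number of bins, number of bias values); FitHiC needs them to match."""
    return count_records(bed_lines), count_records(bias_lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python check_bins.py SRR1030745_1000000_abs.bed SRR1030745_1000000_iced.matrix.biases
    bed_path, bias_path = sys.argv[1], sys.argv[2]
    with open(bed_path) as bed, open(bias_path) as bias:
        n_bins, n_bias = compare_bins(bed, bias)
    print(f"bins in abs.bed: {n_bins}, values in biases: {n_bias}")
    if n_bins != n_bias:
        sys.exit("Mismatch: the biases file does not have exactly one value per bin")
```

With the files from this thread, a 3114 vs 3115 result would point to the same off-by-one reported above.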

@priyatamapandey
Author

Hi,

Thank you for checking it. It used iced version 0.5.6. Here is the ice log file:

```
cat ice_1000000.log
ice --results_filename hic_results/matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --max_iter 100 --eps 0.1 --remove-all-zeros-loci --output-bias 1 --verbose 1 hic_results/matrix//SRR1030745/raw/1000000/SRR1030745_1000000.matrix
/usr/local/anaconda/lib/python3.7/site-packages/iced/normalization/_ca_utils.py:9: UserWarning: The API of this module is likely to change. Use only for testing purposes
"The API of this module is likely to change. "
Using iced version 0.5.6
Loading files...
Normalizing...
Filter 264 out of 3115 bins ...
Matrix is triangular superior
Writing results...
```

I also tried the older HiC-Pro Singularity container (2.11.4), and it worked; the hicpro2fithic.py utility also worked on its output. However, I am now getting an error in FitHiC. Do you think the newer version may have a bug? Here is the corresponding ice log generated with version 2.11.4; in this case the iced version is 0.4.2.

```
cat ice_1000000.log
ice --results_filename hic_results/matrix/SRR1030745/iced/1000000/SRR1030745_1000000_iced.matrix --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --max_iter 100 --eps 0.1 --remove-all-zeros-loci --output-bias 1 --verbose 1 hic_results/matrix//SRR1030745/raw/1000000/SRR1030745_1000000.matrix
Using iced version 0.4.2
Loading files...
Normalizing...
Filter 263 out of 3114 bins ...
Matrix is triangular superior
ICE at iteration 1 156.18873409511477
ICE at iteration 2 41.63980704529987
ICE at iteration 3 11.578281107643033
ICE at iteration 4 3.2393413338099606
ICE at iteration 5 0.911371468541986
ICE at iteration 6 0.2572246911103202
break at iteration 7
Writing results..
```
Thank you,
Priya

@nservant
Owner

Thank you Priya.
That's good to know. I'll check with the iced developer.

For FitHiC, please use the FitHiC Google group to get help.

@nservant
Owner

Hi Priya
One short question: did you compare the two bias files that you have (with iced 0.5.6 and iced 0.4.2)?
Are they similar? I would expect the output of iced 0.5.6 to have an extra line, hopefully the first one?
Thank you
best

@priyatamapandey
Author

Hi,
That is right.
Here are the first 10 lines of each file.

iced 0.4.2
3114 lines

[priyatap@discovery1 1000000]$ head -n 10 SRR1030745_1000000_iced.matrix.biases
1.259179284464612414e-01
4.220329226397311895e-01
4.778662319336556830e-01
4.565832414325673438e-01
7.838514289932088097e-01
9.350828020805725949e-01
7.184684395304455906e-01
7.690394973140728396e-01
9.585290308679312865e-01
6.438992048300308246e-01

iced 0.5.6
3115 lines

[priyatap@discovery2 1000000]$ head -n 10 SRR1030745_1000000_iced.matrix.biases
nan
1.259179284464615189e-01
4.220329226397306899e-01
4.778662319336555719e-01
4.565832414325678434e-01
7.838514289932090318e-01
9.350828020805728169e-01
7.184684395304433702e-01
7.690394973140743939e-01
9.585290308679310645e-01

Is that the sole cause of the error?

Thanks,
Priya
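The two `head` outputs are consistent with the iced 0.5.6 file being the 0.4.2 file with one `nan` prepended: 1.259179284464612414e-01, the first 0.4.2 value, reappears (up to the last digits) as the second 0.5.6 value. Assuming that holds for the whole file, one stopgap would be to drop that leading value before running hicpro2fithic.py. The sketch below is only a hypothetical workaround (the script name `trim_bias.py` is made up), not a fix from the HiC-Pro or iced maintainers; running with iced 0.4.2, as in the 2.11.4 container, avoids the problem altogether.

```python
import math
import sys


def trim_leading_nan(bias_values, n_bins):
    """Return bias values with one spurious leading NaN removed, if present.

    Assumption (based on the iced 0.5.6 output in this thread): when there is
    exactly one value too many and the first one is NaN, that first value is
    the extra one. Any other mismatch raises instead of being guessed at.
    """
    if len(bias_values) == n_bins + 1 and math.isnan(bias_values[0]):
        return bias_values[1:]
    if len(bias_values) != n_bins:
        raise ValueError(f"{len(bias_values)} bias values for {n_bins} bins")
    return bias_values


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python trim_bias.py ABS_BED IN_BIASES OUT_BIASES
    bed_path, in_path, out_path = sys.argv[1:4]
    with open(bed_path) as fh:
        n_bins = sum(1 for line in fh if line.strip())
    with open(in_path) as fh:
        values = [float(line) for line in fh if line.strip()]
    with open(out_path, "w") as fh:
        for value in trim_leading_nan(values, n_bins):
            fh.write(f"{value:.18e}\n")
```

Raising on any other kind of mismatch is deliberate: silently shifting values onto the wrong bins would be worse than failing.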

@nservant
Owner

Thank you Priya
I asked @NelleV to look at it!
best

@priyatamapandey
Author

priyatamapandey commented Dec 24, 2020

Hi,
I have a few paired-end files and one single-end FASTQ file. How should I proceed with the single-end FASTQ file? I keep each sample in a different subfolder under the main input folder.

Thank you,
Priya

@nservant
Owner

Hi Priya
HiC-Pro cannot handle single-end data. Actually, this is the first time I have seen single-end Hi-C data...
Sorry about that.
Best

@priyatamapandey
Author

Hi,
I downloaded this SRA file from the below link.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1267209

If you check the description at the link, it lists only one file of the pair (maybe just the forward-reads file):

Description - CTCF_NP_R1.fastq.gz

Is there any way I can run this file using HiC-Pro?

Thank you,
Priya

@nservant
Owner

nservant commented Jan 4, 2021

Hi Priya
This is not a Hi-C dataset; I think it is a CTCF ChIP-seq dataset ;)
Best

@priyatamapandey
Author

The detailed description confused me. Thanks for noticing.

Priya

@priyatamapandey
Author

Hi,
I want to do an SNP-specific analysis. I want to run your sample data first, but I am unable to find the example input file mgp.v2.snps.annot.reformat.vcf. Can you please point me to the exact link?

Thank you,
Priya

@nservant
Owner

Hi Priya,
Are you working on mouse? What are the parental strains?
Best

@priyatamapandey
Author

Hi,
I am working on human brain cells. I want to limit my analysis to a few functional SNPs, so that I can get more interactions around an anchor/SNP on a chromosome. Is there any parameter or other way to limit interactions per chromosome instead of genome-wide? That way I might be able to generate more interactions in the region of interest.
Thank you,
Priya

@nservant
Owner

Hi Priya,
If you only want to look at a few SNPs, I think you can do it by hand (with a custom script).
The allele-specific mode of HiC-Pro is mainly useful if you have a list of all phased SNPs genome-wide and you want to distinguish the interactions between parental chromosomes.
Best
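As a rough illustration of the "custom script" approach, the sketch below looks up the bin containing a SNP and lists the contacts involving that bin, optionally keeping only intra-chromosomal (cis) contacts. It assumes the HiC-Pro sparse matrix layout `bin_i bin_j count`, with bin IDs matching the fourth column of `*_abs.bed`; the function names are hypothetical.

```python
def load_bins(bed_lines):
    """Map bin id -> (chrom, start, end) from HiC-Pro *_abs.bed lines."""
    bins = {}
    for line in bed_lines:
        if line.strip():
            chrom, start, end, bin_id = line.split()[:4]
            bins[int(bin_id)] = (chrom, int(start), int(end))
    return bins


def bin_of(bins, chrom, pos):
    """Return the id of the bin whose [start, end) interval contains chrom:pos, or None.

    pos is 0-based (BED-style); subtract 1 from a 1-based VCF position.
    """
    for bin_id, (c, start, end) in bins.items():
        if c == chrom and start <= pos < end:
            return bin_id
    return None


def snp_contacts(bins, matrix_lines, chrom, pos, cis_only=True):
    """List ((chrom, start, end), count) for every contact of the bin holding the SNP.

    matrix_lines are assumed to be HiC-Pro sparse triplets: bin_i bin_j count.
    With cis_only=True, contacts with other chromosomes are dropped.
    """
    target = bin_of(bins, chrom, pos)
    if target is None:
        return []
    hits = []
    for line in matrix_lines:
        if not line.strip():
            continue
        i, j, count = line.split()[:3]
        i, j = int(i), int(j)
        if target not in (i, j):
            continue
        other = j if i == target else i  # the partner bin (the target itself for a self-contact)
        if cis_only and bins[other][0] != chrom:
            continue
        hits.append((bins[other], float(count)))
    return sorted(hits)
```

Because the matrix is read as a stream of lines, an open file handle on the raw `.matrix` file can be passed directly as `matrix_lines`.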

@priyatamapandey
Author

Hi, thank you for your reply. I am exploring more of the options in HiC-Pro so that I can make use of your tool. I want to generate a result similar to the screenshot below, which is focused around a genomic position. In the image, my region of interest is the window in the first 2 columns, which interacts with a moving window of 40 kb.

[Screenshot: interactions of a region of interest with a moving 40 kb window (2021-01-11)]

I found that a capture target BED file can be given as input to HiC-Pro. What exactly is this file? Can I give a window of 40 kb or something similar?

[CAPTURE_TARGET] | BED file of target regions to focus on (mainly used for capture Hi-C data)

Thanks for bearing with me,
Priya
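The capture target file is a BED file of regions (chromosome, start, end). Assuming a window of fixed width centred on each SNP of interest is what is wanted (the width and the centring are assumptions, not HiC-Pro recommendations), such lines could be generated with a small function like the hypothetical `capture_window` below. SNP positions are taken as 1-based, while BED coordinates are 0-based and half-open.

```python
def capture_window(chrom, pos, width=40_000, name=None):
    """Return one BED line for a window of `width` bp centred on 1-based `pos`.

    BED is 0-based and half-open, so the SNP base itself is [pos - 1, pos).
    """
    center = pos - 1                     # 0-based coordinate of the SNP base
    start = max(0, center - width // 2)  # clip at the chromosome start
    end = center + width // 2            # not clipped at the chromosome end
    fields = [chrom, str(start), str(end)]
    if name:
        fields.append(name)              # optional fourth BED column
    return "\t".join(fields)
```

For example, `capture_window("chr1", 100001)` returns the tab-separated line `chr1 80000 120000`; one such line per SNP can be written to the file referenced by `CAPTURE_TARGET`.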

@priyatamapandey
Author

Hi,
After running hicpro2fithic.py, I found that the output file named fragmentMappability.gz is not exactly in the input format that FitHiC requires.

Here is the screenshot.
[Screenshot: fragmentMappability.gz produced by hicpro2fithic.py (2021-01-17)]

This file is the fragments file for FitHiC; I am pasting its description below:

The -f argument is used to pass in a full path to what we deem a 'fragments file,' Each line will have 5 entries. The second and fifth fields can be any integer as they are not needed in most cases. The first field is the chromosome name or number, the third field is the coordinate of the midpoint of the fragment on that chromosome, the fourth field is the total number of observed mid-range reads (contact counts) that involve the specified fragment. The fields can be separated by space or tab. All possible fragments need to be listed in this file.

whereas it should look like this:
[Screenshot: expected FitHiC fragments file format (2021-01-17)]

I can see that my file has fragment entries in both column 2 and column 3. Could you suggest why I am seeing this kind of output?

Thanks,
Priya
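For reference, the five-column layout quoted above can also be built directly from HiC-Pro's `_abs.bed` and raw `.matrix` files. The sketch below is an independent illustration, not a reproduction of hicpro2fithic.py. It assumes HiC-Pro's sparse `bin_i bin_j count` matrix format, uses 0 and 1 as the placeholder integers for the second and fifth columns (the description allows any integer there), and sums all contacts rather than only mid-range ones, which simplifies FitHiC's definition. The script name `make_fragments.py` is hypothetical.

```python
import gzip
import sys
from collections import defaultdict


def fragments_rows(bed_lines, matrix_lines):
    """Yield FitHiC fragments-file rows from HiC-Pro bins and raw contacts.

    Columns: chrom, 0 (placeholder), bin midpoint, total contacts, 1 (placeholder).
    Assumptions: matrix lines are HiC-Pro triplets "bin_i bin_j count", all
    contacts are summed (no mid-range distance filter), and a self-contact
    (bin_i == bin_j) is counted once.
    """
    totals = defaultdict(float)
    for line in matrix_lines:
        if not line.strip():
            continue
        i, j, count = line.split()[:3]
        i, j, c = int(i), int(j), float(count)
        totals[i] += c
        if j != i:
            totals[j] += c
    # Every bin is listed, including bins without any contact.
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end, bin_id = line.split()[:4]
        mid = (int(start) + int(end)) // 2
        total = int(round(totals[int(bin_id)]))
        yield f"{chrom}\t0\t{mid}\t{total}\t1"


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python make_fragments.py ABS_BED RAW_MATRIX OUT.gz
    bed_path, matrix_path, out_path = sys.argv[1:4]
    with open(bed_path) as bed, open(matrix_path) as mat, gzip.open(out_path, "wt") as out:
        for row in fragments_rows(bed, mat):
            out.write(row + "\n")
```

The matrix is consumed completely before the first row is produced, so both inputs can be streamed from open file handles.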

@nservant
Owner

Hi Priya,
hicpro2fithic.py was actually written by the FitHiC team :) so I'm not really an expert.
Would you mind asking the question on the FitHiC Google group?
Thanks

@priyatamapandey
Author

Hi Nicolas,
I am using capture targets to see interactions around a region. I did not find any example of a capture target BED file, so I am experimenting with the length of the region, i.e. changing the start and end positions in my capture target BED file from a few base pairs up to 40 kb. My goal is to get the interactions of the target region within 1 Mb on either side of it.

So my question is: if only my capture target file changes, do I have to rerun HiC-Pro from the beginning every time, or can I start from the build_contact_maps step?
I would also appreciate it if you could suggest a bin size for the capture target file.

Thank you,
Priya
