Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing cooler on capture Hi-C data #251

Open
nservant opened this issue Apr 16, 2021 · 9 comments
Open

testing cooler on capture Hi-C data #251

nservant opened this issue Apr 16, 2021 · 9 comments

Comments

@nservant
Copy link

Hi @nvictus
As quickly discussed, I tried to build a cooler object on a 3Mb Hi-C region from capture Hi-C data.
I'm facing several issues according to the test I run ...

I have a bed file with genomic intervals of 1kb, from chrX:150125000-153125000
Accordinly, I extracted my pairs within the same genomic range ;

>>zcat contacts.txt.gz | head -1
NB501764:1043:HM5FKBGXF:2:23302:4878:10125	chrX	150125462	chrX	150163625	+	+
>>zcat contacts.txt.gz | tail -1
NB501764:1043:HM5FKBGXF:3:21505:23621:19037	chrX	153121783	chrX	153123912	-	-

Then, I simply try to ingest the data with cload pairs

>>cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 target.bed contacts.txt.gz test.cool
INFO:cooler.create:Writing chunk 0: /data/kdi_prod/.kdi/project_workspace_0/1309/acl/01.00/downstreamAnalysis/scripts/tmpenmswn78.multi.cool::0
INFO:cooler.create:Creating cooler at "/data/kdi_prod/.kdi/project_workspace_0/1309/acl/01.00/downstreamAnalysis/scripts/tmpenmswn78.multi.cool::/0"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Writing indexes
INFO:cooler.create:Writing info
INFO:cooler.create:Done
INFO:cooler.create:Merging into test.cool
INFO:cooler.create:Creating cooler at "test.cool::/"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.reduce:nnzs: [377101]
INFO:cooler.reduce:current: [0]
Traceback (most recent call last):
  File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/bin/cooler", line 10, in <module>
    sys.exit(cli())
  File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/cli/cload.py", line 492, in pairs
    ordered=False
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 944, in create_cooler
    max_merge=max_merge)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 687, in create_from_unordered
    **kwargs)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 577, in create
    file_path, target, meta.columns, iterable, h5opts, lock)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 213, in write_pixels
    for i, chunk in enumerate(iterable):
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/reduce.py", line 163, in __iter__
    ignore_index=True)
  File "conda/nf-core-hic/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 255, in concat
    sort=sort,
  File "conda/nf-core-hic/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 304, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

Then, I had a try with cooler pairx

cooler csort -c1 2 -p1 3 -s1 6 -c2 4 -p2 5 -s2 7 -o contacts.csort.txt.gz contacts.txt.gz chrX.sizes
cooler cload pairix target.bed contacts.csort.txt.gz test.cool       
INFO:cooler.cli.csort:Enumerating requested chromosomes...
INFO:cooler.cli.csort:chrX	1
INFO:cooler.cli.csort:Input: '../Jarid_NPC_B1/cooler/Jarid_NPC_B1_contacts.txt.gz'
INFO:cooler.cli.csort:Output: '../Jarid_NPC_B1/cooler/Jarid_NPC_B1_contacts.csort.txt.gz'
INFO:cooler.cli.csort:Reordering pair mates and sorting pair records...
INFO:cooler.cli.csort:Sort order: block (chrom1, chrom2, pos1, pos2)
INFO:cooler.cli.csort:sort -k2,2 -k4,4 -k3,3n -k5,5n --parallel=8 --buffer-size=50%
INFO:cooler.cli.csort:Indexing...
INFO:cooler.cli.csort:Indexer: pairix
INFO:cooler.cli.csort:pairix -f -s2 -d4 -b3 -e3 -u5 -v5 ../Jarid_NPC_B1/cooler/Jarid_NPC_B1_contacts.csort.txt.gz
INFO:cooler.cli.cload:Using 8 cores
INFO:cooler.create:Creating cooler at "../Jarid_NPC_B1/cooler/Jarid_NPC_B1_1000.cool::/"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Binning chrX:150125000-151625000|*
INFO:cooler.create:Binning chrX:151625000-153125000|*
INFO:cooler.create:Finished chrX:151625000-153125000|*
INFO:cooler.create:Finished chrX:150125000-151625000|*
Traceback (most recent call last):
  File "conda/nf-core-hic/bin/cooler", line 10, in <module>
    sys.exit(cli())
  File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/cli/cload.py", line 238, in pairix
    ordered=True)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 925, in create_cooler
    lock=lock)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 577, in create
    file_path, target, meta.columns, iterable, h5opts, lock)
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 213, in write_pixels
    for i, chunk in enumerate(iterable):
  File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_ingest.py", line 292, in _validate_pixels
    "Found a bin ID that exceeds the declared number of bins. "
cooler.create._ingest.BadInputError: Found a bin ID that exceeds the declared number of bins. Check whether your bin table is correct.

The csort command works (although I put here the entire chrX size ? not sure what to put otherwise ...), but the cload pairix also crashed ...

Of note, I also reported the same error in cooltools earlier, when trying to bin a small genome
open2c/cooltools#237

cooler -V
cooler, version 0.8.6.post0

@nvictus
Copy link
Member

nvictus commented Apr 19, 2021

Hi @nservant,

I have a bed file with genomic intervals of 1kb, from chrX:150125000-153125000

Bin tables must start at 0 and should end at the chromosome length, even if the data is restricted to a smaller region. The first bin could simply be [0, 150125000) and the last [153125000, chrX_size).

Let me know if that works.

@nservant
Copy link
Author

ok. I'll try and let you know if it works.
What would be the best way to go for this type of data according to you ?
Should I add the first/last bins as you just suggested, or should I continue to use the whole chromosome ?
I'm a bit concerned about the impact on balancing and downstream analysis (insulation, dots, compartments with cooltools) ?
Thanks

@Phlya
Copy link
Member

Phlya commented Apr 19, 2021

I support the idea we should look more into this btw in general. More and more people do this kind of analysis.

I've just been using regular genome-wide equal binning for this sort of data, this way you also get a view of the contacts that are only captured on one end, and they are not completely useless potentially.

@nservant
Copy link
Author

Hi @Phlya, I agree with you. The only point is that I frequently have some balancing issues using the contacts that are only captured on one end ... while focusing on the targeted region usually works well. Though I never really deeply investigate the reason. If you want to make additional tests, let me know, I'll be happy to exchange on that.

@Phlya
Copy link
Member

Phlya commented Apr 19, 2021

You don't even need to use them for balancing, just storing them in the cooler so you can see them if you want... In my limited experience balancing small regions is not 100% reliable in general unfortunately, need to modify filtering sometimes. But would be good to assemble some test data to check how the tools work with it.

@nservant
Copy link
Author

nservant commented Apr 19, 2021

One interesting feature which could help on that would simply be to have a tool which could extract a sub-matrix from a cooler object. Then you could imagine to generate a genome-wide object and restrict the downstream analysis such as TADs calling ...

@nservant
Copy link
Author

Hi,

An update of this topic.
@nvictus, I tried to add the first/last bins.

It is now working with csort + cload pairix.
However, it's still crashing with cload pairs

current_peek = inbuffer.peek()
TypeError: peek() missing 1 required positional argument: 'n'

Then, using the cool files generated by csort + cload pairix ...

  • balance is OK
  • zoomify crashes: TypeError: cannot convert the series to <class 'float'>
  • diamond-insulation crashes: TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'

N

@aakashsur
Copy link

I've run into the same zoomify issue.

@Phlya
Copy link
Member

Phlya commented Aug 9, 2021

I just wanted to say that I always use whole-genome binning, as if it was whole-genome Hi-C, and never had any issues like that. Just provide a blacklist to balancing, which would make it ignore most of the genome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants