Cooler files for single-cell Hi-C #186
Comments
I was actually thinking about this too; it would be great to have multiple cells in one file! What I had in mind is a somewhat different design: keep all cells in one pixel table, with a differently named interaction count column for each cell. But regardless of implementation, I just wanted to support the general idea!
Thanks for supporting this idea, @Phlya. Storing the cells as cool matrices within the "mcool" file has the nice effect of making the computations very easy to parallelize. You need to read in the list of stored matrices via However, I am always open to better ideas and implementations. What benefits would storing the data the way you proposed bring?
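As a rough illustration of the parallelization argument above: once the list of stored matrices is known, each cell becomes an independent task. The sketch below uses only the standard library and a hypothetical in-memory stand-in (`CELLS`, `list_cells`, `total_contacts` are made-up names) instead of a real file. With real files the list could come from a listing helper such as cooler's `cooler.fileops.list_coolers`, which is not necessarily the call meant above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a container file: group path -> pixel records
# (bin1_id, bin2_id, count). A real tool would read these from disk.
CELLS = {
    "/cell_A": [(0, 0, 3), (0, 1, 1), (1, 1, 2)],
    "/cell_B": [(0, 0, 1), (1, 2, 4)],
    "/cell_C": [(0, 2, 5)],
}


def list_cells(container):
    """Return the group paths of all matrices stored in the container."""
    return sorted(container)


def total_contacts(container, cell_path):
    """Example per-cell computation: sum of all interaction counts."""
    return cell_path, sum(count for _, _, count in container[cell_path])


def process_all(container, workers=4):
    """Run the per-cell computation for every cell, one task per cell."""
    paths = list_cells(container)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda p: total_contacts(container, p), paths))


totals = process_all(CELLS)
print(totals)
```

Because no task depends on another cell, the same pattern should carry over to a process pool or a cluster scheduler for heavier per-cell work.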
Hi Joachim! This is a great idea! I think it would be great to gather input from single-cell users on a good file layout for storing all single cells together, one that could optionally include bulk/pooled data as well. As for the file extension convention, I guess I'd put in a vote for As you said, I don't think we can store data @Phlya's suggested way with the current API. However, querying many cells is straightforward, and you could just do that and merge all your pixels into a dataframe labelled by cell ID to construct such a thing on the fly. Btw, how are you labeling the groups referring to the single cells? Are you grouping them by resolution? One idea I've considered in the past is:
Of course, one could also do It would be good to have the 4DN DCIC @burakalver and @hbbrandao chime in for additional feedback.
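To make the "merge on the fly" idea concrete, here is a minimal sketch that stacks per-cell pixels into one long table labelled by cell ID and then pivots it into the wide layout @Phlya described (one count column per cell). It uses plain Python structures rather than a dataframe, and the input `per_cell` and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-cell pixel tables: cell ID -> list of (bin1_id, bin2_id, count).
per_cell = {
    "cell_A": [(0, 0, 3), (0, 1, 1)],
    "cell_B": [(0, 1, 2), (1, 1, 4)],
}


def merge_long(per_cell):
    """Stack all pixels into one long table with a cell_id label per row."""
    return [
        {"cell_id": cell, "bin1_id": b1, "bin2_id": b2, "count": n}
        for cell, pixels in sorted(per_cell.items())
        for b1, b2, n in pixels
    ]


def pivot_wide(long_rows, cells):
    """One row per bin pair, one count column per cell (0 where unobserved)."""
    table = defaultdict(lambda: dict.fromkeys(cells, 0))
    for row in long_rows:
        table[(row["bin1_id"], row["bin2_id"])][row["cell_id"]] = row["count"]
    return dict(sorted(table.items()))


long_rows = merge_long(per_cell)
wide = pivot_wide(long_rows, sorted(per_cell))
print(wide)
```

The wide form fills unobserved pixels with zeros, so for sparse single-cell data it can grow much larger than the per-cell tables it was built from.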
Hi Nezar, Thanks for your positive feedback. I took the name 'scool' for the single-cell cool file and am using it on my webserver and in the related publication. However, as soon as we have an agreement, I will update my tools to whatever format we settle on. I am a supporter of the principle 'Keep it simple'; therefore I think Last, the bulk matrix. I am not sure if we should keep it together with the scool file Best, Joachim
I also like the simple It would be good to include some top-level metadata for introspection and versioning.
If we agree that all single cells must use the same bin segmentation, I would propose including at least:
I would argue that allowing multiple resolutions is a good thing. Perhaps two flavours, as for regular coolers: .scool and .smcool? How about thinking about it a bit more generally, not just for single-cell data, but for storing multiple datasets together in one file? Why not keep all samples from an experiment together, for example? It seems the schema would be identical to the ideas proposed here; it just needs to be named and "marketed" appropriately.
Well, if you treat the
Well, that is basically why the cooler "data collection" is defined as a tree that can be rooted anywhere. Having a few standardized layouts is good, but I'm skeptical of finding a general directory hierarchy that everyone agrees with. Maybe some recommended good practices? Otherwise, just making people more aware of the introspection tools would be useful (
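For readers unfamiliar with the "rooted anywhere" point: cooler addresses a collection inside a file with a `file::group` URI, so any group in the tree can serve as a root. Below is a minimal sketch of splitting such a URI. The helper `split_uri`, the file name `cells.scool`, and the group paths are hypothetical and only illustrate the convention; this is not cooler's own parser.

```python
def split_uri(uri):
    """Split 'file::group' into (file path, group path); the default root is '/'."""
    path, sep, group = uri.partition("::")
    if not sep or not group:
        group = "/"  # no group given: the collection sits at the file root
    if not group.startswith("/"):
        group = "/" + group  # normalize relative group names
    return path, group


print(split_uri("cells.scool::/cells/cell_A"))
print(split_uri("single.cool"))
```

Under this convention a per-cell layout and a per-resolution layout differ only in the group paths, which is why a fixed hierarchy is a matter of agreement rather than a technical requirement.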
The Concerning to store
Hi, I have not received an answer to my mail to you, so I am trying here again. I really want to push this idea forward and have a single-cell cool format. However, please just say so if you don't have the time or are no longer interested. Best, Joachim
Joachim, I apologize for the long delay. I sent you an email.
Added in #201
Hi Nezar,
As you know, I am using your cool format in HiCExplorer, and now also for single-cell Hi-C analysis in a tool called scHiCExplorer.
To use your file format, I create one cool file per cell and store all of them in one mcool file. This works quite well; however, as pointed out by a referee of a publication of mine, it is misleading. (And if that referee reads this thread: thanks for the positive feedback and the comments.) I call the file "mcool", but of course it is not in compliance with your definition of a multiresolution cool file.
My question to you is: is there any interest on your side in coming up with a new file ending for single-cell Hi-C data? The concept would be to simply store the individual cool files in a "single-cell mcool" file and use it as a container. In the future we could discuss adding additional metadata and/or bulk matrices. What I have now I simply call version 1: it has only the root folder of the mcool, and all individual cool files are stored in it. In a possible version 2 the concept could look like:
It would be important to me to have a solution that is in agreement with the developers of cooler (aka you :) ) to prevent a branching of file formats and names.
Possible suggestions I have for a single-cell Hi-C cool format name:
Any other, possibly better, name proposal is welcome.
Thanks for giving this some thought.
Best,
Joachim