Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Scool format support #201

Merged
merged 19 commits into from
Jul 14, 2020
Merged

[WIP] Scool format support #201

merged 19 commits into from
Jul 14, 2020

Conversation

joachimwolff
Copy link
Collaborator

Hi Nezar,

We discussed a few weeks back the scool format. I propose here a very first implementation, a function named create_scool added to _create.py.

The idea is to pass the name of the file, the bins, and a list of a) pixels and b) cell names. All other parameters (except mode) are equal to the create function and are passed through. As discussed, the bins for each cell are hard links, and much of the code for create_scool I copied from the create function where I thought it is necessary.

Furthermore, I had to extend the create function with two parameters append_scool=False and scool_root_uri=None. The first is responsible that the bins are not created in a new way, but get a hard link to the bins group in the root. The second one is the name of the root file.

I also added a very small test case to test_create.py test_create_scool at the bottom of the document.

What is missing is the documentation within the src and for readthedocs, however, I first want to make sure the function is working as expected and follows our formal definition of the scool format. I hope you can give me a review of the propose function.

Best,

Joachim

@joachimwolff joachimwolff changed the title Scool format support [WIP] Scool format support Jun 3, 2020
cooler/create/_create.py Show resolved Hide resolved
cooler/create/_create.py Outdated Show resolved Hide resolved
cooler/create/_create.py Show resolved Hide resolved
cooler/create/_create.py Show resolved Hide resolved
cooler/create/_create.py Show resolved Hide resolved
@joachimwolff
Copy link
Collaborator Author

Hi Nezar,

thanks for the good and constructive telco yesterday. As discussed, I changed the following:

  • pixels per dictionary instead of two lists
  • bins is a dictionary too instead of just one pandas dataframe. The first element of this dict is used to create the root bins group, and just chrom, start and end column are used for it. The reason I changed this was that I think it is the easiest for the users to simple pass a list of bins dataframes with all its columns, instead of removing chrom, start, end for all manually. We have some overhead with this but I think it should manageable.
  • the hard links are not set to the group bins anymore, but to the datasets bins/chrom, bins/start and bins/end. A function to add the individual additional columns per cell is added.
  • extended the test case to cover the new interface and implementation
  • added a MAGIC_SCOOL string and scool_format_version
  • extended the function _is_cooler to check and accept for scool format
  • extended the documentation with the format description of scool format

Best,

Joachim

tests/test_create.py Outdated Show resolved Hide resolved
@nvictus
Copy link
Member

nvictus commented Jul 9, 2020

Thank you for all these updates!

the hard links are not set to the group bins anymore, but to the datasets bins/chrom, bins/start and bins/end. A function to add the individual additional columns per cell is added.

This is great!

extended the function _is_cooler to check and accept for scool format

Hmm. So right now is_cooler is supposed to return True for URIs that can be opened with the Cooler class. For example, it doesn't consider the mcool format, which is checked via is_multires_file. I was thinking scool could have a similar is_singlecell_file or something. My reason to suggest removing it from _is_cooler for now is because it breaks that existing assumption.

We could extend is_cooler with a keyword arg that makes it also check for "higher-order" coolers too, but we can also save that for another PR.

Finally, I added one minor comment about using the temp filesystem context manager in the tests, just to keep things consistent.

Other than that, I think this is good to go!

@joachimwolff
Copy link
Collaborator Author

Hi Nezar,

I added the functions is_scool_file and list_scool_cells, restoring the old content of _is_cooler and remove there the URL in the if condition.
Moreover, I changed the test case for create_scool as requested and added a test case for the two new functions in fileops.

Best,

Joachim

@nvictus nvictus merged commit 16afcb8 into open2c:develop Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants