segment_matrix improvements (#30) #31
Is this ready for review? If you care about JSON size, I have a branch where I changed all the lists to dictionaries or sets, which gives a radically smaller file: https://github.com/graph-genome/component_segmentation/tree/experimental_v6_sparse_matrix. I ran into issues with this format, particularly on the Schematize side. Now I think it wouldn't be worth it with all the other JSON format changes. Something to keep in mind, though: if we can precompute things to make the browser faster, that's good. However, if the larger file makes loading slow in the browser, that's counter-productive. I'll leave it to your good judgement.
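A rough way to see the size difference is to serialize the same mostly-empty matrix both ways. This is a hypothetical sketch, not the format actually used on the experimental_v6_sparse_matrix branch: the dense rows stand in for the current lists, and the sparse form keeps only the column indices of non-zero cells, keyed by row. (Python sets are not JSON-serializable, so they end up as lists in the file either way.)

```python
import json

# Hypothetical dense occupancy matrix: one row per path, one column per bin.
# 1 means the path visits that bin, 0 means it does not (mostly zeros).
def make_dense(n_paths, n_bins, step):
    return [[1 if (b + p) % step == 0 else 0 for b in range(n_bins)]
            for p in range(n_paths)]

def to_sparse(dense):
    """Keep only non-zero cells as {row index: [column indices]}.

    Keys are strings because JSON object keys must be strings.
    """
    return {str(i): [j for j, v in enumerate(row) if v]
            for i, row in enumerate(dense) if any(row)}

def from_sparse(sparse, n_paths, n_bins):
    """Rebuild the dense matrix on the reading side."""
    dense = [[0] * n_bins for _ in range(n_paths)]
    for i, cols in sparse.items():
        for j in cols:
            dense[int(i)][j] = 1
    return dense

dense = make_dense(n_paths=20, n_bins=500, step=50)
sparse = to_sparse(dense)
dense_size = len(json.dumps(dense))
sparse_size = len(json.dumps(sparse))
print(dense_size, sparse_size)
```

The savings depend entirely on how sparse the data is, and the reader has to do the extra rebuilding step (like `from_sparse` here), which is the kind of added work on the Schematize side that the comment warns about.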
The Python code here is really clean. I can't find anything to gripe about at all. If I understand correctly, you're saying the order in which the link columns are listed in the JSON is not deterministic. Example of them switching:
Since you've now seen the deep insides, I'd appreciate it if you read Minutiae: Link Column Ordering. It describes the specification rather than the current reality. Short version: link columns should stack from the inside out on a component as they're traversed, and there are likely cases where no consistent sort is possible across all individuals.
I see no reason not to merge this in. Really, well done.
Question: since we have data checked into the repo, is it going to generate a diff every time the same command is run?
No, it won't. The order now corresponds to the traversal of the sorted dataframe; there are no random choices involved.
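To illustrate why a sorted traversal makes repeated runs reproducible, here is a minimal sketch that uses plain `sorted()` as a stand-in for the sorted dataframe. The link tuples, the `build_departures` helper, and the `participants` field are hypothetical and not the project's actual schema; the point is only that the output depends on the set of links, not on the order they arrive in.

```python
import json
import random

# Hypothetical link records: (upstream bin, downstream bin, path name).
LINKS = [(3, 7, "pathA"), (1, 4, "pathB"), (3, 7, "pathB"),
         (5, 2, "pathA"), (1, 4, "pathA")]

def build_departures(links):
    """Group links into columns by traversing them in sorted order.

    Sorting first means the result does not depend on input order,
    so the same input always serializes to the same JSON.
    """
    columns = {}
    for upstream, downstream, path in sorted(links):
        columns.setdefault((upstream, downstream), []).append(path)
    # dicts keep insertion order, i.e. the sorted traversal order
    return [{"upstream": u, "downstream": d, "participants": paths}
            for (u, d), paths in columns.items()]

def run_once(seed):
    """Shuffle the input with a given seed, then build the columns."""
    links = LINKS[:]
    random.Random(seed).shuffle(links)
    return json.dumps(build_departures(links))

print(run_once(1) == run_once(2))
```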
Note: there are a few changes in the output JSON files (only one dataset made it into the commit, since that's the one I test on). This is because the order of elements in the arrivals and departures arrays is not fixed. To test against the original results, I run

```shell
jq '.components[].arrivals |= sort_by(.upstream, .downstream) | .components[].departures |= sort_by(.upstream, .downstream)'
```

on each chunk.
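For anyone who prefers to do this comparison in Python, here is a rough equivalent of the jq filter. It assumes, as the filter implies, that each chunk is a JSON object with a top-level `components` list whose `arrivals`/`departures` entries carry `upstream` and `downstream` keys. `normalize` and `same_chunk` are hypothetical helper names, not part of the repo.

```python
import json
import sys

def normalize(chunk):
    """Sort each component's arrivals and departures by (upstream, downstream).

    Mirrors the jq filter so two chunks can be compared regardless of the
    (unfixed) element order in those arrays. Modifies `chunk` in place.
    """
    for component in chunk.get("components", []):
        for field in ("arrivals", "departures"):
            if field in component:
                component[field] = sorted(
                    component[field],
                    key=lambda link: (link["upstream"], link["downstream"]))
    return chunk

def same_chunk(path_a, path_b):
    """True if two chunk files match once link order is normalized."""
    with open(path_a) as fa, open(path_b) as fb:
        return normalize(json.load(fa)) == normalize(json.load(fb))

if __name__ == "__main__" and len(sys.argv) == 3:
    print(same_chunk(sys.argv[1], sys.argv[2]))
```

Like jq's `sort_by`, Python's `sorted` is stable, so links with equal keys keep their relative order.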