mrmatrix more flexible #71

Merged: 26 commits into develop from mccalluc/built-in-csv, May 21, 2019

@mccalluc mccalluc commented Apr 22, 2019

Fix #63, Fix #64

Description

What was changed in this pull request?

  • Removed the option to take TSV from STDIN. Supporting it is difficult because we need to know the full size before we start reading out the data. If only the width mattered, there would be no problem, but if the input could be taller than it is wide, then two passes really are necessary, and I'm not sure how this would be done memory-efficiently if we only have STDIN.
  • Use Python's built-in csv module for TSV parsing.
  • Add new command-line arguments.
  • Fix all linting problems in these two files, and enforce linting in Travis.
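
To illustrate why dropping STDIN makes the two-pass approach straightforward, here is a minimal sketch, not the code from this PR. It assumes the input is a regular file that can be opened twice; the function names `tsv_shape` and `iter_rows` are hypothetical.

```python
import csv


def tsv_shape(path, delimiter="\t"):
    """First pass: count rows and find the widest row without storing any data."""
    height = 0
    width = 0
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            height += 1
            width = max(width, len(row))
    return height, width


def iter_rows(path, delimiter="\t"):
    """Second pass: re-open the file and stream one numeric row at a time."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            yield [float(cell) for cell in row]
```

Because the file is re-opened for the second pass, only one row is held in memory at a time. A stream such as STDIN cannot be rewound this way without buffering it somewhere first.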
$ python ./scripts/tsv_to_mrmatrix.py --help
usage: tsv_to_mrmatrix.py [-h] [-d D] [-n N] [-s] [-l] input_file output_file

Given a tab-delimited file, produces an HDF5 file with mrmatrix ("multi-
resolution matrix") structure: Under the "resolutions" group are datasets,
named with successive powers of 2, which represent successively higher
aggregations of the input.

positional arguments:
  input_file           TSV file path
  output_file          HDF5 file

optional arguments:
  -h, --help           show this help message and exit
  -d D, --delimiter D  Delimiter; defaults to tab
  -n N, --first-n N    Only read first N columns from first N rows
  -s, --square         Row labels are assumed to match column labels
  -l, --labelled       TSV Matrix has column and row labels

Tests I wrote earlier still pass, but I need to add tests of newer functionality.

Why is it necessary?

The data I'll want to load is taller than it is wide, and has no row or column labels. This tests that case.
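
As a concrete picture of the tall, unlabelled case, here is a pure-Python sketch of how a matrix might be reduced into levels named with successive powers of 2, as the help text describes. The details are assumptions, not taken from this PR: aggregation by summing 2x2 blocks, treating missing cells at the edges as zero, starting the level names at "1", and the function names `aggregate_2x2` and `build_levels`.

```python
def aggregate_2x2(matrix):
    """Sum each 2x2 block; cells past the edge are treated as zero."""
    height = len(matrix)
    width = len(matrix[0])
    out = []
    for r in range(0, height, 2):
        out_row = []
        for c in range(0, width, 2):
            total = 0
            for rr in range(r, min(r + 2, height)):
                for cc in range(c, min(c + 2, width)):
                    total += matrix[rr][cc]
            out_row.append(total)
        out.append(out_row)
    return out


def build_levels(matrix):
    """Map '1', '2', '4', ... to successively coarser copies of the matrix."""
    levels = {"1": matrix}
    factor = 1
    current = matrix
    # Keep halving until the matrix has collapsed to a single cell.
    while len(current) > 1 or len(current[0]) > 1:
        current = aggregate_2x2(current)
        factor *= 2
        levels[str(factor)] = current
    return levels


# A 4x2 input: taller than it is wide, with no row or column labels.
tall = [[1, 2], [3, 4], [5, 6], [7, 8]]
levels = build_levels(tall)
```

With a tall input, the height is what determines how many levels are produced, which is why knowing only the width is not enough.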

Checklist

  • Unit tests added or updated
  • Updated CHANGELOG.md

@mccalluc mccalluc changed the title from "Mccalluc/built in csv" to "mrmatrix more flexible" Apr 22, 2019
Review comment on scripts/tsv_to_mrmatrix.py (resolved)
Review comment on test/tsv_to_mrmatrix_test.py (outdated, resolved)
@mccalluc (Contributor, Author) commented

@pkerpedjiev: Let me resolve the conflicts and get back to you about the other questions...

@mccalluc (Contributor, Author) commented

The PR build for 3d1aca9 failed because it couldn't download... Hopefully this is just a transient network issue on Travis.

@mccalluc mccalluc merged commit dafc422 into develop May 21, 2019
@mccalluc mccalluc deleted the mccalluc/built-in-csv branch May 21, 2019 16:18
Successfully merging this pull request may close these issues.

  • Make sure rectangular data is not truncated
  • Use built-in csv instead of parsing by hand
2 participants