Issue 176 #194

Merged
JSKenyon merged 3 commits into master from issue-176 on Apr 6, 2022

JSKenyon
Collaborator

  • Tests added / passed

    $ py.test -v -s daskms/tests

    If the pep8 tests fail, the quickest way to correct
    this is to run autopep8 and then flake8 and
    pycodestyle to fix the remaining issues.

    $ pip install -U autopep8 flake8 pycodestyle
    $ autopep8 -r -i daskms
    $ flake8 daskms
    $ pycodestyle daskms
    
  • Fully documented, including HISTORY.rst for all changes
    and one of the docs/*-api.rst files for new API

    To build the docs locally:

    pip install -r requirements.readthedocs.txt
    cd docs
    READTHEDOCS=True make html
    

@JSKenyon
Collaborator Author

This PR adds support for index_columns during conversion from measurement set to parquet/zarr. This means that output data can be written with a different ordering than input data. This is not necessarily efficient as it will use TAQL and fragmented reads to accomplish the reordering.
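
To illustrate why reordering can lead to fragmented reads, here is a minimal pure-Python sketch. It is not the dask-ms implementation: the helper names `reorder_indices` and `contiguous_runs` and the toy column data are hypothetical, and the real conversion relies on TAQL rather than an in-memory sort. The idea is simply that sorting rows by a set of index columns yields a row permutation, and the more that permutation jumps around, the more separate reads are needed.

```python
def reorder_indices(columns, index_columns):
    """Return row numbers sorted lexicographically by the given columns."""
    nrow = len(columns[index_columns[0]])
    return sorted(
        range(nrow),
        key=lambda row: tuple(columns[name][row] for name in index_columns),
    )


def contiguous_runs(order):
    """Count maximal runs of consecutive row numbers in a read order.

    Each run could be served by one contiguous read, so a higher count
    means more fragmented access to the input.
    """
    runs = 1 if order else 0
    for prev, cur in zip(order, order[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


# Toy table stored in TIME-major order (hypothetical data).
columns = {
    "TIME": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    "ANTENNA1": [0, 1, 0, 1, 0, 1],
}

# Same ordering as the input: a single contiguous read suffices.
time_major = reorder_indices(columns, ("TIME", "ANTENNA1"))

# Different ordering from the input: every row becomes its own read.
antenna_major = reorder_indices(columns, ("ANTENNA1", "TIME"))

print(time_major, contiguous_runs(time_major))
print(antenna_major, contiguous_runs(antenna_major))
```

In this toy case the antenna-major ordering touches the six input rows in six separate runs, which is the kind of access pattern the inefficiency caveat above refers to.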

In addition to exposing this feature, this PR fixes incorrect conversion of boolean values when using parquet. The bug arose because pyarrow represents boolean values as bits whereas Python represents them as bytes. The fix in this PR works but is a prelude to a slight rewrite of the arrow extensions. Currently, the extensions use from_buffer to convert from arrow to numpy. This relies on memory layout (hence the aforementioned bug) and may be unnecessary. A future PR will instead use array.storage.flatten in conjunction with to_numpy and reshape to move between representations. This should be less vulnerable to discrepancies in memory layout.
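
The layout mismatch can be reproduced without pyarrow. The sketch below is an illustration only: `pack_bits_lsb` is a hypothetical helper that mimics a bit-packed boolean buffer (one bit per value, least-significant bit first, which is my understanding of the Arrow layout), and the "naive" step stands in for any code that reinterprets such a buffer as one byte per value.

```python
def pack_bits_lsb(flags):
    """Pack booleans into bytes, one bit per value, least-significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


flags = [True, False, True, True, False, False, False, False, True, False]

# Bit-packed layout: 10 values fit in 2 bytes.
packed = pack_bits_lsb(flags)

# Byte-per-value layout: 10 values occupy 10 bytes.
byte_per_value = bytes(flags)

# Reinterpreting the packed buffer as one boolean per byte gives the
# wrong number of values, and the values themselves are wrong too.
naive = [bool(b) for b in packed]

print(len(packed), len(byte_per_value))
print(naive)
```

Ten booleans occupy two bytes in the packed form and ten bytes in the unpacked form, so any conversion that assumes the two layouts are interchangeable will misread the data.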

@JSKenyon
Collaborator Author

JSKenyon commented Apr 6, 2022

I changed my mind - I imagine that the use of buffers is future-proofing against ragged data. I figured out how to use buffers with booleans and have implemented a fix. Note that this is still not a zero-copy operation in the case of booleans, as we have to move from a bit to a byte representation.
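
As a rough sketch of why the boolean case cannot be zero-copy, the following expands a bit-packed buffer into one byte per value. This is not the code merged in this PR: `unpack_bits_lsb` and its `offset` parameter are hypothetical, and least-significant-bit-first ordering is assumed. With numpy available, `np.unpackbits(..., bitorder="little")` should perform the equivalent expansion in vectorised form.

```python
def unpack_bits_lsb(buf, length, offset=0):
    """Expand a bit-packed buffer into one byte (0 or 1) per value.

    `offset` is the index of the first bit to read, which allows a
    slice of a larger buffer to be unpacked.
    """
    # A new output buffer is always allocated: this is the copy that
    # makes the boolean conversion non-zero-copy.
    out = bytearray(length)
    for i in range(length):
        bit = offset + i
        out[i] = (buf[bit // 8] >> (bit % 8)) & 1
    return out


# Two bytes holding ten packed boolean values (the rest is padding).
packed = bytes([0b00001101, 0b00000001])

unpacked = unpack_bits_lsb(packed, length=10)
sliced = unpack_bits_lsb(packed, length=3, offset=2)

print(list(unpacked))
print(list(sliced))
```

The output buffer is eight times the size of the bits it was read from, so it cannot alias the original memory the way a byte-for-byte view of other dtypes can.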

@JSKenyon JSKenyon merged commit e1f024c into master Apr 6, 2022
@JSKenyon JSKenyon deleted the issue-176 branch April 6, 2022 08:39
JSKenyon added a commit that referenced this pull request Apr 6, 2022
This reverts commit e1f024c.