General opener, using file info. #198

Merged
merged 4 commits into from
May 6, 2018

Owner

@mhvk mhvk commented Apr 4, 2018

My fun project for the evening. It does somewhat work, though clearly at least the sample rate should be checked:

In [1]: from baseband.data import SAMPLE_DADA, SAMPLE_VDIF, SAMPLE_MARK4, SAMPLE_MARK5B; import baseband

In [2]: baseband.file_info(SAMPLE_DADA)
Out[2]: 
{'fmt': 'dada',
 'sample_shape': SampleShape(npol=2, nchan=1),
 'start_time': <Time object: scale='utc' format='mjd' value=56475.06898148148>}

In [3]: baseband.open(SAMPLE_MARK4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-e1748998f20e> in <module>()
----> 1 baseband.open(SAMPLE_MARK4)

/home/mhvk/packages/baseband/baseband/core.py in open(name, mode, fmt, **kwargs)
     38         if 'missing' in info and 's' in mode:
     39             raise ValueError("file format is {}, but required arguments {} "
---> 40                              "are missing.".format(fmt, info['missing']))
     41 
     42     module = importlib.import_module('.' + fmt, package='baseband')

ValueError: file format is mark4, but required arguments ['decade', 'ref_time'] are missing.

In [4]: baseband.open(SAMPLE_MARK4, decade=2010)
Out[4]: 
<Mark4StreamReader name=/home/mhvk/packages/baseband/baseband/data/sample.m4 offset=0
    sample_rate=32.0 MHz, samples_per_frame=80000,
    sample_shape=SampleShape(nchan=8), bps=2,
    start_time=2014-06-16T07:38:12.47500>
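The traceback above hints at the overall flow: probe the file, complain if stream mode needs arguments the probe could not infer, then hand off to the format's own opener. A minimal, self-contained sketch of that flow follows. The names `general_open`, `toy_probe`, `toy_mark4_open`, and `OPENERS` are hypothetical, and a plain dict of callables stands in for the `importlib.import_module` lookup seen in the traceback so that the example runs without baseband installed. It is an illustration under those assumptions, not the code in `baseband/core.py`.

```python
def general_open(name, mode="rs", *, probe, openers, **kwargs):
    """Open ``name`` with the opener registered for its detected format."""
    # Hypothetical probe: returns a dict with 'fmt' and, optionally, 'missing'.
    info = probe(name, **kwargs)
    fmt = info["fmt"]
    # Stream mode ('s') needs every argument required to interpret the file.
    if "missing" in info and "s" in mode:
        raise ValueError("file format is {}, but required arguments {} "
                         "are missing.".format(fmt, info["missing"]))
    # Assumption: a registry of callables replaces the per-format module import.
    return openers[fmt](name, mode, **kwargs)


def toy_probe(name, **kwargs):
    """Toy stand-in: treat every file as Mark 4, needing decade or ref_time."""
    info = {"fmt": "mark4"}
    if not {"decade", "ref_time"} & set(kwargs):
        info["missing"] = ["decade", "ref_time"]
    return info


def toy_mark4_open(name, mode, **kwargs):
    """Toy opener that just echoes what it was asked to open."""
    return ("mark4", name, mode, kwargs)


OPENERS = {"mark4": toy_mark4_open}

# Passing decade lets the toy probe resolve the format without complaints.
reader = general_open("sample.m4", "rs", probe=toy_probe, openers=OPENERS,
                      decade=2010)
```

Keeping the probe separate from the dispatch mirrors the split between `file_info` and `open` described in this PR: the probe can be used on its own for inspection, and the opener only adds the "is anything still missing?" check.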

Owner Author

mhvk commented Apr 4, 2018

I separated out file_info since it seemed obvious that would be the easier one to start with.

@mhvk mhvk force-pushed the baseband-open branch 2 times, most recently from 164eb7a to d753414 Compare April 16, 2018 00:30
Owner Author

mhvk commented Apr 16, 2018

Rebased this. The opener is now such a minor contribution that one might as well look at this one... (especially as I changed the file_info function...).

@mhvk mhvk changed the title WIP: general opener, using file info. General opener, using file info. Apr 20, 2018
Owner Author

mhvk commented Apr 20, 2018

I think this is getting ready for prime time!

Owner Author

mhvk commented Apr 24, 2018

OK, with some further cleanup, I think this is ready. But probably good to have a thorough review, since this new baseband.open function might well become the standard way files are opened (which is partially why I now check consistency for any argument passed in).

@mhvk mhvk requested a review from cczhu April 24, 2018 16:21
Contributor

@cczhu cczhu left a comment

Highly incomplete review - will add more tonight.

baseband/core.py Outdated
"""
if format is None:
format = tuple(FILE_FORMATS.keys())
All keyword arguments passed in and are classified, ending up in one of
Contributor

"passed in and are classified" -> "passed in are classified"

elif key == 'nchan':
    sample_shape = info_dict.get('sample_shape')
    if sample_shape is not None:
        # If we passed nchan, and info doesn't have it, but does have a
Contributor

Not sure I'm happy with nchan being treated like the total size of the sample shape - couldn't the user, for a two channel, two thread file pass nthread = 2, nchan = 2? In that case, this check would return inconsistent.

Owner Author

I guess one could allow it if either sample_shape.nchan == nchan, or sample_shape.nchan == 1 and the product equals nchan. I guess for a consistency check one does not have to be all that strict.

Contributor

Actually, on second thought I think what's below is fine (I was pretty sleep-deprived when looking at this yesterday).

Owner Author

I still changed it to be a bit more generous: sample_shape.nchan == nchan should be OK too.
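The rule discussed in this thread can be sketched as follows. This is one plausible reading of the comments, not the merged code: `nchan_consistent` is a hypothetical helper, the `SampleShape` namedtuple here is a local stand-in, and treating a shape without an `nchan` field as having `nchan == 1` is an assumption made for the sketch.

```python
from collections import namedtuple
from math import prod  # Python 3.8+

# Local stand-in for a format's sample shape (hypothetical field names).
SampleShape = namedtuple("SampleShape", "nthread nchan")


def nchan_consistent(sample_shape, nchan):
    """Return True if a user-supplied ``nchan`` fits the file's sample shape."""
    # Assumption: a shape with no 'nchan' field counts as a single channel.
    shape_nchan = getattr(sample_shape, "nchan", 1)
    # Direct match, e.g. two threads of two channels with nchan=2.
    if shape_nchan == nchan:
        return True
    # Otherwise accept nchan as the total number of sample components,
    # but only when the shape itself does not claim multiple channels.
    return shape_nchan == 1 and prod(sample_shape) == nchan


two_by_two = SampleShape(nthread=2, nchan=2)
eight_threads = SampleShape(nthread=8, nchan=1)
print(nchan_consistent(two_by_two, 2), nchan_consistent(eight_threads, 8))
```

The first branch covers the reviewer's two-thread, two-channel case; the second keeps the original behaviour of comparing `nchan` with the total size of the sample shape.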

Contributor

@cczhu cczhu left a comment

Overall looks good, and I found no problems reading some sample files of various formats. The only non-nitpick I have is I'd like it to be recorded somewhere (maybe in Getting Started?) that only GSB timestamp files are recognized by the file info and general opener routines.

@pytest.mark.parametrize('sample', (SAMPLE_M4, SAMPLE_M5B))
def test_open_missing_args(sample):
Contributor

Maybe:

def test_open_missing_args(sample):
    with pytest.raises(TypeError) as excinfo:
        baseband_open(sample, 'rs')
    assert "missing required arguments" in str(excinfo.value)

Owner Author

good idea

# reader, but then fail on an incorrect sample rate.
mark4_args = {'nchan': 8,
              'ref_time': Time('2014-01-01')}
with pytest.raises(ValueError):  # wrong sample_rate
Contributor

As above, would be nice to also test that "arguments inconsistent" or "got unexpected keyword" show up in the error messages. I don't think it's essential to check these (or the above), but I feel it's more precise and also helps a bit when reading the tests at a later date.

Owner Author

Given that I felt the need to add comments to the tests, this is a good point. I made the checks somewhat minimal, though, as I don't like to be stuck with particular error messages.
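One way to keep such checks minimal is to match only a short keyword in the message rather than its full wording. The sketch below is dependency-free and therefore uses the standard library's `assertRaisesRegex`; with pytest the same idea is `pytest.raises(TypeError, match="missing")`. The `open_stream` function and its message are hypothetical stand-ins, not baseband's actual opener or error text.

```python
import unittest


def open_stream(name, **kwargs):
    """Hypothetical opener that insists on a 'decade' argument."""
    if "decade" not in kwargs:
        raise TypeError("file format is mark4, but it is missing required "
                        "arguments ['decade'].")
    return name, kwargs


class TestMissingArgs(unittest.TestCase):
    def test_missing_args(self):
        # Match one keyword only, so rewording the message later does not
        # break the test as long as it still mentions what is missing.
        with self.assertRaisesRegex(TypeError, "missing"):
            open_stream("sample.m4")

    def test_complete_args(self):
        self.assertEqual(open_stream("sample.m4", decade=2010),
                         ("sample.m4", {"decade": 2010}))

# Run with: python -m unittest <this_file>
```

Matching a single word still documents why the exception is expected, which is the readability benefit raised in the review, without pinning the test to one exact phrasing.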

- Added a general file opener, ``baseband.open`` which for a set of formats
  will check whether the file is of that format, and then load it using the
  corresponding module. [#198]

API Changes
-----------

Contributor

In "Other Changes and Additions" should mention that sample data files are now listed in the documentation.

Owner Author

done

.. _data:

*****************
Sample data files
Contributor

"Sample data files" -> "Sample Data Files"

Owner Author

we're not consistent with the capitalization (top level index.rst doesn't do it), but changed anyway.

@@ -51,15 +51,17 @@ troubleshooting help and APIs for each.
Core framework and utilities
Contributor

Nothing to do with this PR, but seems silly not to fix considering the table of contents is being modified anyway: in the Overview section "authors_for_sphinx" is still listed, which repeats the authors and contributors listed under Project Details.

Owner Author

Removed.

This tutorial covers the basic features of Baseband. It assumes that
`NumPy <http://www.numpy.org/>`_ and the `Astropy`_ units module have been
imported::
For some file formats, one can simply import baseband and use `baseband.open` to
Contributor

This is great!

basics of :ref:`inspecting files <getting_started_inspecting>`, :ref:`reading
<getting_started_reading>` from and :ref:`writing <getting_started_writing>`
to files, and :ref:`converting <getting_started_converting>` from one format
to another. We assume that baseband as well as `NumPy
Contributor

Nitpick: "baseband" should be capitalized for consistency (though I admit that capitalizing at all is a personal preference).

Owner Author

Since we do it for astropy and numpy in this context, let's do it here too. I have not capitalized it in "baseband module", though...

formats. When opening Mark 4 and Mark 5B files, however, some additional
arguments may need to be passed (as was the case above for inspecting a Mark
5B file). Notes on such features and quirks of individual formats can be
found in the API entries of their ``open`` functions, and within the
Contributor

Could also note that file_info can also return missing arguments that are necessary.

Owner Author

ok, done

As shown at the very start, files can be opened with the general
`baseband.open` function. This will try to determine the file type using
`~baseband.file_info`, load the corresponding baseband module, and then open
the file using that modele's master input/output function.
Contributor

"modele" -> "module"

Owner Author

done

mhvk added 2 commits May 6, 2018 12:40
In particular, decade, kday, and ref_time should be consistent with
the file being opened, and nchan with the sample_shape.
Also make sure ``data`` is documented.
@mhvk mhvk merged commit 77cdeee into master May 6, 2018
@mhvk mhvk deleted the baseband-open branch May 6, 2018 16:51