Write wideband data with CPHD metadata and add a Python list-of-dicts interface for PVP parameters #351

Closed
wants to merge 37 commits into from

Conversation

@elliot-klein commented Sep 21, 2020:

  • Add cphd::CPHDWriter::writeWideband() to write a NumPy array of complex64 wideband data along with CPHD Metadata and PVPBlock objects
  • Add methods to cphd::PVPBlock to convert to/from a list of Python dictionaries of PVP data (a list of channels, each with a dict of PVP parameter names mapping to NumPy arrays)
  • Make cphd::PVPBlock use std::map instead of std::unordered_map to store added PVP parameters. This was needed to support older versions of SWIG.
  • Add Python tests for these features and move some common test code into util.py. To install util.py along with the test folder, use waf install (not just waf install --target=test_.....py)
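For illustration, the list-of-dicts layout described above might look like the following. The parameter names, channel count, and vector count here are hypothetical placeholders, not taken from this PR:

```python
import numpy as np

num_vectors = 128  # vectors in this channel (hypothetical)

# One dict per channel; each key is a PVP parameter name mapping to a NumPy array
pvp_data = [
    {
        "TxTime": np.zeros(num_vectors, dtype="float64"),      # size-1 parameter
        "TxPos": np.zeros((num_vectors, 3), dtype="float64"),  # size-3 parameter
    },
]
```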

Issues

  • cphd::CPHDWriter::writeWideband() doesn't support compressed data, and nothing currently prevents a caller from attempting (and failing) to write compressed data with it
  • cphd::CPHDWriter::writeWideband() calls cphd::CPHDWriter::write(), which was previously %ignored by SWIG, so the Python CPHDWriter now has both a write() and writeWideband() method, and it may be unclear which one to use.
  • Using PVPBlock list-of-dicts with custom PVP parameters and/or multiple components is not fully supported:
    • Using custom float or complex PVP parameters requires using astype('float32') and astype('complex64'), otherwise the values are slightly inconsistent when reading to/from a list-of-dicts
    • Mixing data types within a single parameter (e.g. X='F8';Y='I8';) is not supported; parameters with multiple components can only have one type. This only affects custom parameters, since none of the default parameters currently require mixed types.
    • Custom PVP parameters with multiple components aren't supported
    • Default PVP parameters that are not of size 1 or 3 are not supported (not currently an issue since there are none)
    • If using a string PVP parameter that is longer than the specified size, NumPy will silently truncate it (e.g. a parameter specified as 'S8' but given '123456789' will store '12345678')
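The silent truncation in the last point is standard NumPy behavior for fixed-width byte strings, easy to reproduce on its own:

```python
import numpy as np

# An 'S8' dtype keeps at most 8 bytes; the ninth character is dropped silently
value = np.array("123456789", dtype="S8")
print(value.item())  # b'12345678'
```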

…mplate CPHDWriter::write() for complex<float>
@dstarinshak self-requested a review September 22, 2020 11:59
* \param cols
* Expected number of cols in widebandArray
*/
void writeWideband(PVPBlock pvpBlock, PyObject* widebandArray, size_t rows, size_t cols)
@dstarinshak (Contributor) commented:
I think I see what you were trying to do here, but I don't think it will work in general. I think a cleaner approach would be to circumvent using CPHDWriter::write() altogether.

The write() routine expects to see a giant concatenated blob of wideband data for all the channels at once. It would be nice if all the per-channel wideband were the same size. Then you could have a single array that's shaped (num_channels, num_rows, num_cols). Getting the underlying array pointer would ensure everything's contiguous in memory.

However, each channel can have its own dimensions. In that case, it seems more natural to have the wideband data expressed as a collection (e.g., list, tuple, etc) of numpy arrays. You can't call the write() routine directly because there's no guarantee that the individual numpy arrays are contiguous in memory with each other. We could make them contiguous by doing a copy, but that can get very expensive very fast. Instead, we can modify this extended function to do the following:

void writeWideband(PVPBlock pvpBlock, PyObject* widebandArrays)
{
    const size_t numChannels =
        static_cast<size_t>(PySequence_Length(widebandArrays));
    ASSERT(numChannels == mMetadata.data.getNumChannels());
    writeMetadata(pvpBlock);
    if (mMetadata.data.getNumSupportArrays() != 0)
    {
        // supportData would be an additional parameter, as in CPHDWriter::write()
        if (supportData == nullptr)
        {
            throw except::Exception(Ctxt("SupportData is not provided"));
        }
        writeSupportData(supportData);
    }
    writePVPData(pvpBlock);

    for (size_t ii = 0; ii < numChannels; ++ii)
    {
        const types::RowCol<size_t> dims(
            mMetadata.data.getNumVectors(ii),
            mMetadata.data.getNumSamples(ii));
        PyObject* widebandArray = PySequence_GetItem(widebandArrays, ii);
        const types::RowCol<size_t> arrayDims =
            numpyutils::getDimensionsRC(widebandArray);
        ASSERT(dims == arrayDims);
        auto widebandPtr =
            numpyutils::getBuffer<std::complex<float>>(widebandArray);
        writeCPHDData<std::complex<float>>(widebandPtr, dims.area(), ii);
    }
}

Although this duplicates a lot of CPHDWriter::write(), it leaves the Python level looking clean.
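As an aside, the copy cost mentioned above is easy to see with made-up channel shapes: per-channel arrays held in a Python list live in unrelated buffers, so producing the single contiguous blob that write() expects forces a full copy of every channel:

```python
import numpy as np

ch0 = np.zeros((100, 50), dtype="complex64")
ch1 = np.zeros((200, 75), dtype="complex64")  # channels may differ in shape

# The arrays are not adjacent in memory, so concatenating copies both
blob = np.concatenate([ch0.ravel(), ch1.ravel()])
```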

I would suggest also adding the following to simplify writing from Python. Hide the CPHDWriter object from the Python API:

%pythoncode
%{
    import numpy as np  # needed for the ndarray check below

    def write_cphd(
        pathname,
        metadata,
        pvp_block,
        wideband_arrays,
        schema_paths=None,
        num_threads=None,
        scratch_space=None,
    ):
        """Docstring"""
        if schema_paths is None:
            schema_paths = []  # Not sure if this is the best default or checking the user's env
        if num_threads is None:
            num_threads = 0
        if scratch_space is None:
            scratch_space = 4 * 1024 * 1024

        # Support writing a single channel with a single ndarray
        if isinstance(wideband_arrays, np.ndarray):
            wideband_arrays = [wideband_arrays]

        writer = CPHDWriter(
            metadata,
            pathname,
            schema_paths,
            num_threads,
            scratch_space,
        )
        writer.writeWideband(pvp_block, wideband_arrays)
%}

@elliot-klein (Author) commented:

Thanks @dstarinshak! I updated writeWideband() and added the write_cphd() method.

How would you recommend handling support arrays? write_cphd() currently does not accept any as a parameter (but writeWideband() does). CPHDWriter has two writeSupportData() methods -- one that writes support arrays individually given their string ID and a data pointer (without including any padding from the metadata), and another that writes all the support arrays given a pointer to contiguous support array data. Two ideas:

  • write_cphd() could receive a sequence of NumPy support arrays, make a copy so that the data is contiguous, and pass that to writeWideband() (no changes needed to writeWideband())
  • write_cphd() could receive a Python dictionary mapping string support array IDs to NumPy support array data, and writeWideband() could individually write each array by ID (maybe modify that writeSupportData() version to include padding)
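A rough sketch of the first option, with hypothetical array shapes (note a real implementation would also need to honor any padding specified in the metadata):

```python
import numpy as np

# Per-ID support arrays as the user might pass them in
support_arrays = [
    np.ones((4, 8), dtype="float32"),
    np.zeros((2, 8), dtype="float32"),
]

# Flatten into one contiguous buffer suitable for the all-at-once
# writeSupportData() overload
contiguous = np.concatenate([arr.ravel() for arr in support_arrays])
```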

Thank you!

@dstarinshak (Contributor) commented:

I think both approaches for writing support arrays sound good. The first one sounds like less work. It would only involve adding a support_arrays=None kwarg to write_cphd and parsing out what to do with it inside. As long as you document what form support_arrays should take in the docstring, I think we're good.

@elliot-klein (Author) commented Jan 7, 2021:

How would you recommend reading the support arrays back in for test_cphd_write_wideband? The first method works well for writing, but I haven't tested it since it doesn't look like there are Python bindings yet for reading support arrays.

CPHDReader loads a SupportBlock, and we can get the support array metadata (string identifiers and dimensions) from Metadata.data.supportArrayMap.

SupportBlock has methods to read support arrays by string ID into a pre-allocated BufferView or a ScopedArray. Based on Wideband.read (which calls (1), (2), (3)), we can cast the __array_interface__ NumPy data pointer from long long to a sys::ubyte* to create a BufferView<sys::ubyte> for SupportBlock::read(). That compiles and runs, but the data in the NumPy array is not equal to what was written. Do those casts and methods sound correct?
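For reference, the pointer being cast comes from NumPy's array interface; on the Python side it is just an integer address paired with a read-only flag:

```python
import numpy as np

arr = np.zeros(16, dtype=np.ubyte)
ptr, readonly = arr.__array_interface__["data"]  # (address, read-only flag)
```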

Thank you!

@dstarinshak (Contributor) commented:

Sorry, I lost track of this thread. I suggest simply leaving support arrays out of the bindings for the time being. Is there a clean way to recognize if a CPHD contains support data in the bindings? If so, we could log a warning to the user that "support array parsing is not implemented" and that we're only reading and returning a subset of their CPHD. I would rather not raise a NotImplementedError, since the remainder of the CPHD should be valid for parsing.

@elliot-klein (Author) commented:

We can get the number of support arrays with metadata.data.getNumSupportArrays(), but for the moment there's not a great place to check that in the bindings, since we aren't extending CPHDReader or SupportBlock at all.

It's possible to get a SupportBlock object with CPHDReader::getSupportBlock(), but then there's no Python interface to get the actual support block data. To write a CPHD, you need to provide the support array data (as well as the SupportBlock from the metadata), so I don't think it would be possible for someone to load a CPHD file that has support arrays and then e.g. write it to another file without realizing that the support arrays are missing.
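If a check point does become available, the warning itself could be as simple as the following sketch, where a plain integer stands in for metadata.data.getNumSupportArrays():

```python
import warnings

def warn_if_support_arrays(num_support_arrays):
    """Warn (rather than raise) when support arrays are present but unread.

    num_support_arrays stands in for metadata.data.getNumSupportArrays().
    """
    if num_support_arrays != 0:
        warnings.warn(
            "support array parsing is not implemented; "
            "returning the CPHD without its support arrays"
        )

warn_if_support_arrays(0)  # no support arrays: silent
```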

@dstarinshak (Contributor) commented:

Consider running pylint on any (non-SWIG-generated) Python files to check style

@elliot-klein (Author) left a comment:

Sorry, I ran out of time to completely wrap this up. I think this is what remains:

  • Fix the 'U1' and 'I1' PVP params not being correctly converted from the list-of-dicts
  • Decide what to do about support array reading (might be just fine as-is for now). I pushed my WIP changes to the wip-support-arrays branch of my fork in case you want to use those as a starting point.
  • I flagged some duplicated code for generating the list of PVP params in from_list_of_dicts(), you might be able to come up with a cleaner way to do that
  • Test string and multi-size custom PVP params
  • Make any final style fixes
  • Update the SWIG-generated files

Thanks again for all your help with this

# Populate PVPBlock object from pvp_data
for channel_index, channel_data in enumerate(pvp_data):
    for vector_index in range(metadata.getNumVectors(channel_index)):
        for name, data in channel_data.items():
@elliot-klein (Author) commented:

It might be good to check that the list of keys in channel_data matches the list of PVP params (the required ones, any optional ones, and any added ones)
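That check could be a small helper along these lines (parameter names here are hypothetical placeholders):

```python
def validate_pvp_keys(channel_data, expected_params):
    """Raise if a channel dict's keys don't match the expected PVP set."""
    missing = expected_params - set(channel_data)
    extra = set(channel_data) - expected_params
    if missing or extra:
        raise ValueError(
            f"PVP parameter mismatch: missing={sorted(missing)}, "
            f"extra={sorted(extra)}"
        )

# Passes: keys match the expected required/optional/added parameters
validate_pvp_keys({"TxTime": [], "TxPos": []}, {"TxTime", "TxPos"})
```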

pvp_info = {}

# Set required and optional PVP parameters
for name in dir(metadata.pvp):
@elliot-klein (Author) commented:

There's a bit of code that's duplicated here from above. Here, we need to get the list of PVPBlock setter methods for all default PVP params (required ones plus any optional parameters that are in use) and ideally build a dict that maps PVP param names to setter methods
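One way to build that name-to-setter map is to scan dir() for setter methods. This is sketched against a fake stand-in class, since the real SWIG-generated PVPBlock isn't available here:

```python
class FakePVPBlock:
    """Stand-in for the SWIG PVPBlock bindings (hypothetical methods)."""
    def setTxTime(self, channel, vector, value):
        pass
    def setTxPos(self, channel, vector, value):
        pass

def build_setter_map(pvp_block):
    """Map PVP parameter names to their PVPBlock setter methods."""
    return {
        name[len("set"):]: getattr(pvp_block, name)
        for name in dir(pvp_block)
        if name.startswith("set") and callable(getattr(pvp_block, name))
    }

setters = build_setter_map(FakePVPBlock())
```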
