Write wideband data with CPHD metadata and add a Python list-of-dicts interface for PVP parameters #351
Conversation
…mplate CPHDWriter::write() for complex<float>
…rdered_map for cphd::Pvp.addedPVP)
```cpp
 * \param cols
 *   Expected number of cols in widebandArray
 */
void writeWideband(PVPBlock pvpBlock, PyObject* widebandArray, size_t rows, size_t cols)
```
I think I see what you were trying to do here, but I don't think it will work in general. A cleaner approach would be to circumvent using `CPHDWriter::write()` altogether.

The `write()` routine expects to see a giant concatenated blob of wideband data for all the channels at once. It would be nice if all the per-channel wideband were the same size; then you could have a single array shaped `(num_channels, num_rows, num_cols)`, and getting the underlying array pointer would ensure everything's contiguous in memory.

However, each channel can have its own dimensions. In that case, it seems more natural to express the wideband data as a collection (e.g., list, tuple, etc.) of NumPy arrays. You can't call the `write()` routine directly because there's no guarantee that the individual NumPy arrays are contiguous in memory with each other. We could make them contiguous by doing a copy, but that can get very expensive very fast. Instead, we can modify this extended function to do the following:
```cpp
void writeWideband(PVPBlock pvpBlock, PyObject* widebandArrays)
{
    const size_t numChannels =
        static_cast<size_t>(PySequence_Length(widebandArrays));
    ASSERT(numChannels == mMetadata.data.getNumChannels());

    writeMetadata(pvpBlock);

    if (mMetadata.data.getNumSupportArrays() != 0)
    {
        // supportData would need to be passed in (or stored) separately
        if (supportData == nullptr)
        {
            throw except::Exception(Ctxt("SupportData is not provided"));
        }
        writeSupportData(supportData);
    }

    writePVPData(pvpBlock);

    for (size_t ii = 0; ii < numChannels; ++ii)
    {
        const types::RowCol<size_t> dims(
            mMetadata.data.getNumVectors(ii),
            mMetadata.data.getNumSamples(ii));

        // PySequence_GetItem returns a new reference; a real
        // implementation should Py_DECREF it when done
        PyObject* const array = PySequence_GetItem(widebandArrays, ii);
        const types::RowCol<size_t> arrayDims =
            numpyutils::getDimensionsRC(array);
        ASSERT(dims == arrayDims);

        auto widebandPtr = numpyutils::getBuffer<std::complex<float>>(array);
        writeCPHDData<std::complex<float>>(widebandPtr, dims.area(), ii);
    }
}
```
Although this duplicates a lot of `CPHDWriter::write()`, it leaves the Python level looking clean.

I would also suggest adding the following to simplify writing from Python. Hide the `CPHDWriter` object from the Python API:
```python
%pythoncode
%{
def write_cphd(
    pathname,
    metadata,
    pvp_block,
    wideband_arrays,
    schema_paths=None,
    num_threads=None,
    scratch_space=None,
):
    """Docstring"""
    if schema_paths is None:
        schema_paths = []  # Not sure if this is the best default or checking the user's env
    if num_threads is None:
        num_threads = 0
    if scratch_space is None:
        scratch_space = 4 * 1024 * 1024

    # Support writing a single channel with a single ndarray
    if isinstance(wideband_arrays, np.ndarray):
        wideband_arrays = [wideband_arrays,]

    writer = CPHDWriter(
        metadata,
        pathname,
        schema_paths,
        num_threads,
        scratch_space,
    )
    writer.writeWideband(pvp_block, wideband_arrays)
%}
```
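The single-ndarray convenience in a wrapper like the one above can be factored into a small helper. This is a sketch only; `normalize_wideband` is an illustrative name, not part of the bindings. It also forces each channel to contiguous `complex64`, which is the layout the C++ writer expects:

```python
import numpy as np

def normalize_wideband(wideband_arrays):
    """Accept a single ndarray or a sequence of per-channel ndarrays and
    return a list of contiguous complex64 arrays (copying only if needed)."""
    if isinstance(wideband_arrays, np.ndarray):
        wideband_arrays = [wideband_arrays]
    return [np.ascontiguousarray(arr, dtype='complex64')
            for arr in wideband_arrays]

channels = normalize_wideband(np.ones((4, 8), dtype='complex64'))
assert len(channels) == 1 and channels[0].flags['C_CONTIGUOUS']
```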
Thanks @dstarinshak! I updated `writeWideband()` and added the `write_cphd()` method.

How would you recommend handling support arrays? `write_cphd()` currently does not accept any as a parameter (but `writeWideband()` does). `CPHDWriter` has two `writeSupportData()` methods: one writes support arrays individually given their string ID and a data pointer (without including any padding from the metadata), and the other writes all the support arrays given a pointer to contiguous support array data. Two ideas:

1. `write_cphd()` could receive a sequence of NumPy support arrays, make a copy so that the data is contiguous, and pass that to `writeWideband()` (no changes needed to `writeWideband()`).
2. `write_cphd()` could receive a Python dictionary mapping string support array IDs to NumPy support array data, and `writeWideband()` could individually write each array by ID (maybe modifying that `writeSupportData()` version to include padding).

Thank you!
I think both approaches for writing support arrays sound good. The first one sounds like less work: it would only involve adding a `support_arrays=None` kwarg to `write_cphd` and parsing out what to do with it inside. As long as you document what form `support_arrays` should take in the docstring, I think we're good.
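The first idea (copying a sequence of NumPy support arrays into one contiguous buffer) could look roughly like this. `pack_support_arrays` is a hypothetical helper name, not part of the bindings:

```python
import numpy as np

def pack_support_arrays(support_arrays):
    """Copy a sequence of NumPy support arrays into one contiguous byte
    buffer, so the C++ side sees a single blob of support data."""
    if support_arrays is None:
        return None
    # Flatten each array and reinterpret its raw bytes, then concatenate.
    return np.concatenate(
        [np.ascontiguousarray(arr).reshape(-1).view(np.uint8)
         for arr in support_arrays])
```

This does pay the copy cost mentioned earlier, but only for support arrays, which are typically much smaller than the wideband data.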
How would you recommend reading the support arrays back in for `test_cphd_write_wideband`? The first method works well for writing, but I haven't tested it since it doesn't look like there are Python bindings yet for reading support arrays.

`CPHDReader` loads a `SupportBlock`, and we can get the support array metadata (string identifiers and dimensions) from `Metadata.data.supportArrayMap`.

`SupportBlock` has methods to read support arrays by string ID into a pre-allocated `BufferView` or a `ScopedArray`. Based on `Wideband.read` (which calls (1), (2), (3)), we can cast the `__array_interface__` NumPy data pointer from `long long` to a `sys::ubyte*` to create a `BufferView<sys::ubyte>` for `SupportBlock::read()`. That compiles and runs, but the data in the NumPy array is not equal to what was written. Do those casts and methods sound correct?

Thank you!
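For reference, the pointer handoff described here can be exercised from pure Python, with `ctypes` standing in for the C++ cast. This only demonstrates that `__array_interface__['data']` yields a usable buffer address (the source array must stay alive while the pointer is in use); it does not diagnose why the read-back data mismatched:

```python
import ctypes
import numpy as np

arr = np.arange(4, dtype=np.uint8)

# __array_interface__['data'] is (address, read_only_flag); the address
# is the same integer the SWIG layer would cast to sys::ubyte*.
addr, read_only = arr.__array_interface__['data']

# View the same memory through ctypes, as the C++ BufferView would.
buf = (ctypes.c_ubyte * arr.nbytes).from_address(addr)
assert bytes(buf) == arr.tobytes()
```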
Sorry, I lost track of this thread. I suggest simply leaving support arrays out of the bindings for the time being. Is there a clean way to recognize in the bindings whether a CPHD contains support data? If so, we could log a warning to the user that support array parsing is not implemented and that we're only reading and returning a subset of their CPHD. I would rather not raise a `NotImplementedError`, since the remainder of the CPHD should be valid for parsing.
We can get the number of support arrays with `metadata.data.getNumSupportArrays()`, but for the moment there's not a great place to check that in the bindings, since we aren't extending `CPHDReader` or `SupportBlock` at all.

It's possible to get a `SupportBlock` object with `CPHDReader::getSupportBlock()`, but then there's no Python interface to get the actual support block data. To write a CPHD, you need to provide the support array data (as well as the support array metadata), so I don't think it would be possible for someone to load a CPHD file that has support arrays and then, e.g., write it to another file without realizing that the support arrays are missing.
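A minimal sketch of the suggested warning path, assuming the count comes from `metadata.data.getNumSupportArrays()`; `warn_if_support_arrays` is an illustrative name:

```python
import warnings

def warn_if_support_arrays(num_support_arrays):
    """Warn (rather than raise) when a CPHD contains support arrays,
    so the rest of the file can still be parsed."""
    if num_support_arrays > 0:
        warnings.warn(
            'support array parsing is not implemented; only a subset '
            'of the CPHD will be read', RuntimeWarning)
```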
…rrays and add write_cphd() Python method
Sorry, I ran out of time to completely wrap this up. I think this is what remains:

- Fix the `'U1'` and `'I1'` PVP params not being correctly converted from the list-of-dicts
- Decide what to do about support array reading (might be just fine as-is for now). I pushed my WIP changes to the `wip-support-arrays` branch of my fork in case you want to use those as a starting point.
- I flagged some duplicated code for generating the list of PVP params in `from_list_of_dicts()`; you might be able to come up with a cleaner way to do that
- Test string and multi-size custom PVP params
- Make any final style fixes
- Update the SWIG-generated files

Thanks again for all your help with this
```python
# Populate PVPBlock object from pvp_data
for channel_index, channel_data in enumerate(pvp_data):
    for vector_index in range(metadata.getNumVectors(channel_index)):
        for name, data in channel_data.items():
```
It might be good to check that the list of keys in `channel_data` matches the list of PVP params (the required ones, any optional ones, and any added ones).
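That key check could be a small helper along these lines; `check_pvp_keys` is an illustrative name, and the expected-name list would come from the metadata's required, optional-in-use, and added parameters:

```python
def check_pvp_keys(channel_data, expected_names):
    """Raise if the per-channel dict's keys don't exactly match the
    expected PVP parameter names."""
    actual = set(channel_data)
    expected = set(expected_names)
    if actual != expected:
        raise ValueError(
            'PVP parameter mismatch: missing=%s extra=%s'
            % (sorted(expected - actual), sorted(actual - expected)))
```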
```python
pvp_info = {}

# Set required and optional PVP parameters
for name in dir(metadata.pvp):
```
There's a bit of code here that's duplicated from above. Here, we need to get the list of `PVPBlock` setter methods for all default PVP params (required ones plus any optional parameters that are in use) and ideally build a dict that maps PVP param names to setter methods.
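One way to build that dict is to scan `dir()` for `set*` methods. This sketch assumes the SWIG-generated setters follow a `set<ParamName>` naming convention, which may not hold for every binding:

```python
def build_setter_map(pvp_block):
    """Map PVP parameter names to their PVPBlock setter methods, assuming
    a 'set<ParamName>' naming convention (e.g. setTxTime -> 'TxTime')."""
    return {name[len('set'):]: getattr(pvp_block, name)
            for name in dir(pvp_block)
            if name.startswith('set') and callable(getattr(pvp_block, name))}
```

Both code paths could then share one lookup instead of duplicating the `dir()` scan.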
- Extend `cphd::CPHDWriter::writeWideband()` to write a NumPy array of `complex64` wideband data along with CPHD `Metadata` and `PVPBlock` objects
- Extend `cphd::PVPBlock` to convert to/from a list of Python dictionaries of PVP data (a list of channels, each with a dict of PVP parameter names mapping to NumPy arrays)
- `cphd::PVPBlock` now uses `std::map` to store added PVP parameters instead of `std::unordered_map`. This was needed to support older versions of SWIG.
- `util.py`: to install `util.py` along with the test folder, use `waf install` (not just `waf install --target=test_.....py`)

Issues

- `cphd::CPHDWriter::writeWideband()` doesn't support compressed data (and there is currently nothing preventing someone from using it to write compressed data and failing)
- `cphd::CPHDWriter::writeWideband()` calls `cphd::CPHDWriter::write()`, which was previously `%ignore`d by SWIG, so the Python `CPHDWriter` now has both a `write()` and a `writeWideband()` method, and it may be unclear which one to use.
- `PVPBlock` list-of-dicts with custom PVP parameters and/or multiple components is not fully supported:
  - `float` or `complex` PVP parameters require using `astype('float32')` and `astype('complex64')`; otherwise the values are slightly inconsistent when converting to/from a list-of-dicts
  - Mixed-type multi-component format (e.g., `X='F8';Y='I8';`) isn't supported; parameters with multiple components can only have one type (this is only an issue for custom parameters, as none of the default parameters currently require this)
  - Strings longer than the declared size are silently truncated (e.g., a parameter declared `'S8'` but given `'123456789'` will store `'12345678'`)