what's the best way to apply a channel slice? #109

o-smirnov opened this issue Apr 24, 2020 · 7 comments

@o-smirnov
Contributor

Been asking elsewhere, but it occurs to me it's a general enough question to be usefully asked here....

So, what's the best way to read a subset of channels? I can apply a slice to the array objects of course, but is there a way to do this up front? Or a better way?

@o-smirnov o-smirnov assigned o-smirnov and sjperkins and unassigned o-smirnov Apr 24, 2020
@sjperkins
Member

To do this with maximal efficiency, you'd have to use a pre-processing step to figure out the optimal chunking strategy for the channel dimensions:

# Desired channel selection (example values)
chanslice = slice(37, 56)

# Initial dataset partition on FIELD_ID and DATA_DESC_ID
ddids = [ds.DATA_DESC_ID for ds in xds_from_ms("3C286.ms")]
# Read the very small DATA_DESCRIPTION table into memory
ddid = xds_from_table("3C286.ms::DATA_DESCRIPTION")[0].compute()
# Create a dataset per row of SPECTRAL_WINDOW
spws = xds_from_table("3C286.ms::SPECTRAL_WINDOW", group_cols="__row__")
# Number of channels for each dataset
# (CHAN_FREQ has shape (1, nchan) for per-row datasets)
nchan = [spws[ddid.SPECTRAL_WINDOW_ID.values[d]].CHAN_FREQ.shape[1]
         for d in ddids]
# Channel chunking schema for each dataset: chunk boundaries at the
# slice edges (note Python slices have .start/.stop, not .end)
chan_chunks = [(chanslice.start, chanslice.stop - chanslice.start,
                nc - chanslice.stop)
               for nc in nchan]

# Chunking schema for each dataset
chunks = [{'row': 100000, 'chan': cc} for cc in chan_chunks]

# Re-open exact same datasets with a different chunking strategy
datasets = xds_from_ms("3C286.ms", chunks=chunks)

# This should slice the channel selection optimally without dask block overlap
datasets[0].DATA[:, chanslice, :]

I typed this out without running it, but it should illustrate the idea.

The above is clunky; I'm thinking about general improvements to the process in #86.
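The chunk arithmetic in the recipe above can be pulled out into a small pure-Python helper. This is just a sketch of the pre-processing step, not part of the dask-ms API; the name chan_chunking is hypothetical, and it additionally drops zero-sized chunks (e.g. when the slice starts at channel 0), which the inline version above doesn't handle:

```python
def chan_chunking(chanslice, nchan):
    """Return a dask 'chan' chunk tuple whose boundaries align with chanslice."""
    start = chanslice.start or 0
    stop = chanslice.stop if chanslice.stop is not None else nchan
    # Chunks before, inside and after the slice; drop any that are empty
    return tuple(c for c in (start, stop - start, nchan - stop) if c > 0)

# Build a chunking schema per dataset, as in the recipe above
chunks = [{'row': 100000, 'chan': chan_chunking(slice(37, 56), nc)}
          for nc in (64, 64)]
print(chunks[0]['chan'])  # (37, 19, 8)
```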

@Mulan-94
Contributor

@sjperkins, I'm unable to follow properly. I usually just do dataset[0].sel(chan=slice(start, end)), and the chunking for the channel seems to be reconstructed automatically by dask.
Is this the wrong way :( ?

@sjperkins
Member

@sjperkins, I'm unable to follow properly. I usually just do dataset[0].sel(chan=slice(start, end)), and the chunking for the channel seems to be reconstructed automatically by dask.
Is this the wrong way :( ?

It's not wrong, but that way will end up reading more data from the MS than necessary. What the example is trying to demonstrate is that to prevent this behaviour the dask chunking must be correctly set up in xds_from_ms(..., chunks={...}).

I agree that it's not necessarily easy to follow, I simply haven't found a reasonable way of making this easy yet. #86 should probably form the nucleus for dealing with this.
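To illustrate why the un-aligned version reads more than necessary, here is a small plain-Python sketch (no MS required) of the I/O model: a chunk is the unit of reading, so every chunk the slice overlaps is read in full. The chunks_read function is purely illustrative, not a dask-ms function:

```python
def chunks_read(chunks, sl):
    """Elements read to service slice sl over an array with the given chunks."""
    read, offset = 0, 0
    for size in chunks:
        # A chunk is read in full if the slice overlaps it at all
        if sl.start < offset + size and sl.stop > offset:
            read += size
        offset += size
    return read

# Single 64-channel chunk: the full column is read for a 19-channel slice
print(chunks_read((64,), slice(37, 56)))        # 64
# Slice-aligned chunks: only the matching 19-channel chunk is read
print(chunks_read((37, 19, 8), slice(37, 56)))  # 19
```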

@sjperkins
Member

Let's look at this with a concrete example:

nchan = 64
chanslice = slice(37, 56)
chan_chunks = (37, 19, 8)  # i.e. (37 - 0, 56 - 37, 64 - 56)

datasets = xds_from_ms(..., chunks={..., 'chan': chan_chunks})
ds.DATA.data[:, chanslice, :]

That will create DATA dask arrays with channel chunks of (37, 19, 8). Each channel chunk will be read from the MS exactly, using getcolslice.

If, by contrast, we don't ask for channel chunking:

datasets = xds_from_ms(...)
ds.DATA.data[:, chanslice, :]

there'll be a single channel chunk for the dask DATA arrays containing the full 64 channels. So there'll be a getcol operation behind the scenes that reads all the channel data, followed by a slice that returns the channel subset.
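Putting numbers on the example above (plain arithmetic, no MS required): the default single 64-channel chunk reads all 64 channels to return 19, so roughly 70% of the channel I/O is discarded, whereas the aligned chunking reads exactly the 19 channels wanted:

```python
nchan = 64
chanslice = slice(37, 56)
wanted = chanslice.stop - chanslice.start  # 19 channels kept

unchunked_read = nchan   # getcol reads every channel
aligned_read = wanted    # getcolslice reads just the 19-channel chunk

print(unchunked_read, aligned_read)    # 64 19
print(1 - wanted / unchunked_read)     # 0.703125 of the read is discarded
```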

@sjperkins
Member

I also want to say it's good that people are pointing this out as a conceptually difficult thing, because I do want to make things easier and reduce the cognitive overhead.

@o-smirnov
Contributor Author

As an aging professor, I can only applaud the sentiment. My cognitive is all overhead!

@Mulan-94
Contributor

That will create DATA dask arrays with channel chunks of (37, 19, 8). Each channel chunk will be read from the MS exactly, using getcolslice.

This makes it clearer, thanks :) !
