Gijs Molenaar edited this page Feb 13, 2014 · 5 revisions

Working with MSs

The first thing to know is that the MeqTimba kernel itself does not deal directly with AIPS++ MSs. Instead, the kernel deals with visibility streams, which consist of:

  • a VisHeader record, describing the data layout, and carrying auxiliary information (phase centers, antenna positions, etc.) normally found in MS subtables.
  • a number of VisTiles. Each VisTile is an N timeslots by M channels by K correlations chunk of data and flags for one interferometer. All data is tiled in the time direction only.
  • a VisFooter record, indicating end of data.

The source of an input stream or the destination of the output stream may be a real MS, a flat file, or another process (possibly across the network). The kernel itself does not know or care about this. The mapping between visibility streams and the actual data source is done by objects called event channels.
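As a rough illustration of the stream layout described above, the following Python sketch emits a header, a run of tiles cut along the time axis only, and a footer. The function name `toy_vis_stream`, the tuple tags, and the choice to let the last tile be shorter than the others are assumptions made for illustration; real VisTiles also carry per-interferometer data and flags, which this toy omits.

```python
def toy_vis_stream(timeslots, tile_size):
    """Yield (kind, payload) events: one header, N tiles, one footer.

    Hypothetical toy model, not the kernel's actual stream classes.
    """
    # The header describes the data layout up front.
    yield ("header", {"num_timeslots": len(timeslots), "tile_size": tile_size})
    # Tiles are cut along the time axis only; channels and correlations stay whole.
    for start in range(0, len(timeslots), tile_size):
        yield ("tile", timeslots[start:start + tile_size])
    # The footer marks end of data.
    yield ("footer", {})


events = list(toy_vis_stream(list(range(12)), tile_size=5))
tile_lengths = [len(payload) for kind, payload in events if kind == "tile"]
print(tile_lengths)  # [5, 5, 2]
```

A consumer of such a stream only ever sees these three kinds of event, which is why it does not need to know whether they came from an MS, a flat file, or another process.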

Columns And Columns

VisTiles contain columns of data (generally, one or more of DATA, PREDICT and RESIDUALS for correlations, FLAGS for flags, etc.). These columns are distinct from AIPS++ MS columns, so I will try to refer to them as tile columns and MS columns to avoid confusion.

When reading/writing an MS, the system in effect does a double mapping on each side of the kernel:

input mapping:
MS columns --[via input channel]--> tile columns --[via MeqSpigot]--> VellSets

output mapping:
VellSets --[via MeqSink]--> tile columns --[via output channel]--> MS columns

The inner mapping, i.e. between VisTiles and VellSets, is determined by the state records of MeqSink and MeqSpigot nodes. These can be configured to work with any column of the tile (DATA, PREDICT, RESIDUALS), and to treat correlations and flags in various interesting ways. See MeqSpigot for details. This page only deals with the outer mapping of VisTiles to/from MSs.

Event Channels

The MeqServer process maintains two objects called an input channel and an output channel. These objects control where the MeqSpigot and MeqSink nodes get and put their data. The channels are initialized with an input record and an output record as follows:

  # Glish example (Python is similar)
  mqs.init([output_col="PREDICT"],input=[...],output=[...]);

Each time an init() call is made with an input argument, the input channel is reinitialized with the given record, and begins shooting out a visibility stream -- provided, that is, the record is correct. If no input argument is supplied, the output channel is reconfigured if so specified, but no data is read. The usual mode of operation is to supply both input and output records, and watch the sparks fly.

NB: the output_col field in the first record is a temporary Ugly Kludge(TM) that tells the server what tile columns to initialize in the output. If MeqSinks are told to write to tile column(s) that do not exist in the input tiles, the column(s) must be specified here as a string or a vector of strings containing "DATA", "PREDICT" or "RESIDUALS".

Note that the most recent values of the input and output records are stored in forest state, as the stream sub-record.
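The init() behaviour described in this section can be modelled with a small Python toy. `ToyServer`, its attributes, and the `input`/`output` keys inside the `stream` sub-record are hypothetical names chosen for illustration; this is not the MeqServer API, only a sketch of the rules stated above.

```python
class ToyServer:
    """Toy model of the init() rules above; not the real MeqServer."""

    def __init__(self):
        # Most recent records are kept under a "stream" sub-record.
        self.forest_state = {"stream": {}}

    def init(self, input=None, output=None):
        """Return True if this call started an input stream."""
        if output is not None:
            # Output channel is reconfigured whenever an output record is given.
            self.forest_state["stream"]["output"] = dict(output)
        if input is None:
            return False  # nothing is read without an input record
        # Input channel is reinitialized and the stream begins.
        self.forest_state["stream"]["input"] = dict(input)
        return True


server = ToyServer()
print(server.init(output={"predict_column": "MODEL_DATA"}))              # False
print(server.init(input={"sink_type": "ms_in", "ms_name": "test.ms"}))   # True
print(sorted(server.forest_state["stream"]))  # ['input', 'output']
```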

Configuring Input Channels

There are three types of input channels, ms_in for reading measurement sets, boio for reading flat files (from the DMI BOIO class -- Block Object Input/Output), and octopussy for publish/subscribe streams. The channel type is selected via the sink_type field of the input record.

MS Input Channels

An MS input channel record has the following general structure:

  inputrec := [ 
      sink_type = 'ms_in',                  # sink type
      ms_name = 'test.ms',                  # MS filename
      data_column_name = 'DATA',            # which MS column to read
      tile_size = 5,                        # tile size, in time slots
      selection = [=],                      # selection sub-record
      python_init = 'read_msvis_header.py', # optional init script
      record_input = 'test.ms.boio'         # optional, boio file to record input stream to
  ];
  • The data_column_name field specifies which MS column is mapped to the DATA column of the tiles. Other tile columns cannot be populated at this time. Note that a typical tree will only be reading one column anyway.
  • The tile_size field determines the tile size (and therefore snippet domain size), in number of timeslots.
  • The selection record can be used to extract a subset of the MS. It may contain the following fields:

| Field | Description |
|---|---|
| channel_start_index | starting channel (default is 0) |
| channel_end_index | ending channel (default is -1, for last channel) |
| ddid_index | DATA_DESCRIPTION_ID (default is 0) |
| field_index | FIELD_ID (default is 0) |
| selection_string | any TaQL string for additional selection (default is none) |

Only one DATA_DESCRIPTION_ID and one FIELD_ID are read at a time at the moment; multiple ddids and fields have to be represented by separate streams. _Note: as is customary throughout the system, fields ending with "`_index`" are 1-based in Glish and 0-based everywhere else, with the adjustment done automatically._
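The selection defaults and the `_index` convention can be sketched in Python as follows. The helper names `default_selection` and `glish_to_zero_based` are hypothetical, and passing negative values (such as the -1 "last channel" default) through unchanged is an assumption; the real adjustment is done automatically by the system.

```python
def default_selection():
    """Selection defaults as listed above, in 0-based (non-Glish) form."""
    return {
        "channel_start_index": 0,    # first channel
        "channel_end_index": -1,     # -1 means "last channel"
        "ddid_index": 0,             # DATA_DESCRIPTION_ID
        "field_index": 0,            # FIELD_ID
        "selection_string": None,    # no extra TaQL selection
    }


def glish_to_zero_based(record):
    """Shift 1-based "_index" fields down by one; leave everything else alone.

    Assumption: non-positive values are sentinels and are passed through.
    """
    converted = {}
    for key, value in record.items():
        if key.endswith("_index") and isinstance(value, int) and value > 0:
            converted[key] = value - 1
        else:
            converted[key] = value
    return converted


selection = default_selection()
selection.update(glish_to_zero_based({"channel_start_index": 11,
                                      "channel_end_index": 20}))
print(selection["channel_start_index"], selection["channel_end_index"])  # 10 19
```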

* The `python_init` field is described in [[MeasurementSetHeaders|MeasurementSetHeaders]]. 
* The `record_input` field may be specified to capture a copy of the input visibility stream to the named file, as a flat (BOIO) file. Large MSs take a long time to read and tile properly; reading a BOIO file can be up to 4-5x faster. The captured file may be read in later by using a BOIO input sink. Which brings us to... 

#### BOIO Input Sinks

A BOIO input sink record looks like this: 


  input := [
      sink_type = 'boio',                   # sink type
      boio_file_name = 'test.ms.boio'       # input file
  ];

The input file must be a flat BOIO file containing a visibility stream. Such files are produced either by capturing the input of another sink via `record_input`, or by writing the output to a BOIO sink (see below). 


### Configuring Output Sinks

Likewise, there are three types of output sinks, `ms_out` for writing to a measurement set, `boio` for writing flat files, and `octopussy` for publish/subscribe streams. The sink type is selected via the `sink_type` field of the output record. 


#### MS Output Sinks

At this time, MS output sinks cannot create new MSs of their own. Instead, they dump their data to the same MS that the input stream came from (this information is contained in the [[VisHeader|VisHeader]]). If a BOIO sink is providing the input stream, the MS that was used to capture the original stream is used. If the MS does not exist (or does not match the layout of the output stream), an error is reported. 

An MS sink output record looks like this: 


  output := [
      sink_type = 'ms_out',                 # sink type
      data_column = 'DATA',                 # optional tile column mappings
      predict_column = 'MODEL_DATA',
      residuals_column = 'CORRECTED_DATA',
      flag_mask = 0                         # output flag mask, 0 for none
  ];

* The `_column` fields specify where to put the tiles' output columns. Typically, [[MeqSinks|MeqSinks]] will be configured to dump data into the PREDICT or RESIDUALS column of a tile; these columns are then mapped to MS columns according to the `_column` fields. If a `_column` field is missing, then that tile column is ignored. Note that if the named MS column does not exist, the sink will create it; however, many AIPS++ tools (imager, etc.) only support the three standard columns named here, so this is of limited use. 
* The `flag_mask` field controls the writing of data flags. Tile flags are 32-bit masks, while AIPS++ flags are boolean. Tile flags are bitwise-ANDed with the mask to yield a boolean. If the mask is 0, then tile flags are not written out. 

An MS output sink with no `_column`s specified and a `flag_mask` of 0 is the sink equivalent of `/dev/null`. 
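The two rules above (column mapping and flag masking) can be sketched in Python as follows. `map_tile_to_ms` is a hypothetical helper, not the sink's real code. It assumes that an absent `flag_mask` behaves like 0, and that any bit surviving the AND gives a true MS flag.

```python
def map_tile_to_ms(tile_columns, tile_flags, output_rec):
    """Return (ms_columns, ms_flags) for one tile, given an output record.

    Hypothetical helper illustrating the rules above.
    """
    # Tile column name -> output record field that names the MS column.
    field_for = {
        "DATA": "data_column",
        "PREDICT": "predict_column",
        "RESIDUALS": "residuals_column",
    }
    ms_columns = {}
    for tile_col, values in tile_columns.items():
        ms_col = output_rec.get(field_for[tile_col])
        if ms_col is not None:           # a missing _column field means "ignore"
            ms_columns[ms_col] = values

    mask = output_rec.get("flag_mask", 0)  # assumption: absent means 0
    if mask == 0:
        ms_flags = None                  # flags are not written at all
    else:
        # AND each 32-bit tile flag with the mask; nonzero becomes True.
        ms_flags = [(f & mask) != 0 for f in tile_flags]
    return ms_columns, ms_flags


tile = {"DATA": [1.0, 2.0], "PREDICT": [0.9, 2.1]}
flags = [0b0001, 0b0100]
cols, ms_flags = map_tile_to_ms(
    tile, flags, {"predict_column": "MODEL_DATA", "flag_mask": 0b0001})
print(cols)      # {'MODEL_DATA': [0.9, 2.1]}
print(ms_flags)  # [True, False]
```

With an empty output record the helper returns no columns and no flags, which is the `/dev/null` case.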


#### BOIO Output Sinks

BOIO output is an order of magnitude faster than writing to an MS, so it may be good to use it when you're just experimenting. A separate command-line tool exists (_NB: almost..._) to write a BOIO file into an MS. 

A BOIO output sink record looks like this: 


  output := [
      sink_type = 'boio',                   # sink type
      boio_file_name = 'test.ms.boio',      # output file
      boio_file_mode = 'W'                  # W to write new file, A to append
  ];

The sink simply dumps the stream to a flat BOIO file. Append mode may be used to concatenate multiple streams, which BOIO input sinks can automatically read in one by one. 
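The write/append behaviour can be illustrated with a small Python sketch. It uses `pickle` as a stand-in for the BOIO format, which it does not reproduce. The function names and the (kind, payload) event tuples are hypothetical; only the `W`/`A` modes and the idea of reading concatenated streams one by one come from the text above.

```python
import os
import pickle
import tempfile


def write_stream(path, events, mode):
    """Write events to path; mode 'W' starts a new file, 'A' appends."""
    file_mode = {"W": "wb", "A": "ab"}[mode]
    with open(path, file_mode) as f:
        for event in events:
            pickle.dump(event, f)


def read_streams(path):
    """Yield each stream in the file in turn, splitting on footer events."""
    current = []
    with open(path, "rb") as f:
        while True:
            try:
                event = pickle.load(f)
            except EOFError:
                break                      # end of file reached
            current.append(event)
            if event[0] == "footer":       # a footer closes one stream
                yield current
                current = []


path = os.path.join(tempfile.mkdtemp(), "test.ms.boio")
write_stream(path, [("header", 1), ("tile", "a"), ("footer", 1)], mode="W")
write_stream(path, [("header", 2), ("tile", "b"), ("footer", 2)], mode="A")
streams = list(read_streams(path))
print(len(streams))  # 2
```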


## An Example

Here's a Glish example of making event sinks, [[MeqSpigots|MeqSpigots]] and [[MeqSinks|MeqSinks]] work together: 


  # initialize meqserver
  mqs.init([output_col="PREDICT"], output=[predict_column='MODEL_DATA',flag_mask=0]);
  # no stream is read yet since no input record is specified
  ...
  # create spigot
  spigrec1 := meqnode('MeqSpigot','spigot1');
  spigrec1.input_col := 'DATA';
  spigrec1.station_1_index := 1;
  spigrec1.station_2_index := 2;
  mqs.meq('Create.Node',spigrec1);
  ...
  # create sink
  sinkrec := meqnode('MeqSink','sink1',children="compare");
  sinkrec.output_col := 'PREDICT';
  sinkrec.station_1_index := 1;
  sinkrec.station_2_index := 2;
  mqs.meq('Create.Node',sinkrec);
  ...
  inputrec := [ sink_type = 'ms_in', ms_name = 'test.ms',
                data_column_name = 'DATA', tile_size = 5, selection = [=] ];
  # this starts the input stream
  mqs.init(input=inputrec);

 

This example maps the MS "DATA" column via the tile DATA column to the input of a [[MeqSpigot|MeqSpigot]], and maps the output of a [[MeqSink|MeqSink]] via the tile PREDICT column to MS column "MODEL_DATA". 