# parsers

The Midgard **parsers** module is used for reading files. For example, the following file formats are supported by Midgard:

- ANTEX
- RINEX observation and navigation (version 2 and 3)
- RINEX clock (version 1.0)
- SINEX (e.g. IGS SINEX station information file igs.snx)
- ...

All available parsers can be listed with the `names` function:

In [None]:
# Import parsers package
from midgard import parsers

# List all available parsers
parsers.names()

## Use parsers module

The following example shows how to use the **parsers** module:

In [None]:
# Import parsers package
from midgard import parsers

# Read file by generating an instance of a Parser class
p = parsers.parse_file(parser_name="gnss_sinex_igs", file_path="./examples/parsers/gnss_sinex_igs")

The file data are stored in the `data` and `meta` attributes. The `data` attribute holds the observation data, and the `meta` attribute holds the meta information (e.g. header information) read from the file.

In [None]:
# Access observation data
p.data

In [None]:
# Access metainformation data
p.meta

The observation data can also be converted to other data structures:
- dataframe with method `as_dataframe()`
- dataset with method `as_dataset()`
- dict with method `as_dict()`

In [None]:
# Save data in dictionary data structure
p.as_dict()

## Implement parsers
This section describes the parser functionality that Midgard provides and how to use it.

Midgard provides the following classes for parsing files:

- `Parser`: An abstract base class with basic methods for parsing a data file.
- `ChainParser`: Provides functionality for parsing a file with chained groups of information.
- `LineParser`: Provides functionality for using numpy to parse a file line by line.
- `SinexParser`: Provides functionality for parsing files in SINEX format.

### ChainParser
This section describes how the chain parser can be used. A chain parser should be used if the data file contains sections with different formats, for example several different header sections:

<center><img src="figures/parsers/chain_parser.png" width="100"/></center>

A single parser reads and parses one group of data lines. The group is defined through a `ParserDef`, which specifies
how to parse each line (`parser_def`), how to identify each line (`label`), which lines to ignore (`skip_line`), how to
recognize the end of the group of lines (`end_marker`) and finally what, if anything, should be done after all lines
in a group have been read (`end_callback`).

The end_marker, label, skip_line and end_callback parameters should all be functions with the following signatures:

    end_marker   = func(line, line_num, next_line)
    label        = func(line, line_num)
    skip_line    = func(line)
    end_callback = func(cache)
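
As a minimal sketch of these four signatures, the functions below handle a hypothetical file in which header lines start with `#`, comment lines start with `*` and a line consisting of `END` closes a group. The function bodies, the markers, the layout of `cache` and the small driver loop are assumptions made for illustration; they are not taken from Midgard and do not show how `ChainParser` iterates over a file internally.

```python
# Hypothetical callbacks matching the four signatures above.
# The file layout ("#" header lines, "*" comments, "END" marker) is assumed.

def end_marker(line, line_num, next_line):
    """Return True if `line` closes the current group of lines."""
    return line.strip() == "END"

def label(line, line_num):
    """Return the label that selects how `line` should be parsed."""
    return "header" if line.startswith("#") else "data"

def skip_line(line):
    """Return True for lines that should be ignored (comments, blank lines)."""
    return line.startswith("*") or not line.strip()

def end_callback(cache):
    """Tidy up once all lines of a group are read (here: count the lines)."""
    cache["num_lines"] = len(cache.get("lines", []))

# Stand-in driver loop that only demonstrates when each function is called
lines = ["# station list", "* a comment", "ZIMM 46.9 7.5", "END"]
cache = {"lines": []}
for line_num, line in enumerate(lines, start=1):
    if skip_line(line):
        continue
    next_line = lines[line_num] if line_num < len(lines) else None
    if end_marker(line, line_num, next_line):
        end_callback(cache)
        break
    cache["lines"].append((label(line, line_num), line))
```

In this sketch the comment line is dropped, the remaining two lines are labelled `header` and `data`, and `end_callback` runs once when the `END` line is reached.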

The parser definition `parser_def` includes the `parser`, `fields`, `strip` and `delimiter` entries. The `parser`
entry points to the parser function and the `fields` entry defines how to separate the line into fields. The fields
are given either as a dictionary or as a list. In the latter case the line is split on whitespace by
default. This default can be overridden with the `delimiter` entry. Leading and trailing whitespace
characters are removed by default before a line is parsed. This default can be overridden by listing the
characters that should be removed in the `strip` entry. The `parser_def` dictionary is defined like:

    parser_def = { <label>: {'fields':    <dict or list of fields>,
                             'parser':    <parser function>,
                             'delimiter': <optional delimiter for splitting line>,
                             'strip':     <optional characters to be removed from beginning and end of line>
                 }}
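
To make the roles of `fields`, `delimiter` and `strip` more concrete, the sketch below fills in a `parser_def` for two hypothetical labels and applies it with a small stand-in function. The names `split_line` and `parse_default`, the labels, the field names and the convention that a `fields` dictionary maps each name to a `(start, end)` column range are all assumptions for illustration, not Midgard's implementation.

```python
def parse_default(line, cache):
    """Hypothetical parser function: collect the parsed fields of each line."""
    cache.setdefault("rows", []).append(line)

parser_def = {
    # Fixed-width line: fields as a dict of assumed (start, end) column ranges
    "header": {"parser": parse_default,
               "fields": {"key": (0, 8), "value": (8, 20)}},
    # Delimited line: fields as a list of names, split on ";" instead of whitespace
    "data": {"parser": parse_default,
             "fields": ["station", "lat", "lon"],
             "delimiter": ";",
             "strip": " \n"},
}

def split_line(line, definition):
    """Stand-in that splits one line according to a parser_def entry."""
    line = line.strip(definition.get("strip"))  # None strips whitespace
    fields = definition["fields"]
    if isinstance(fields, dict):
        return {name: line[start:end].strip() for name, (start, end) in fields.items()}
    values = line.split(definition.get("delimiter"))  # None splits on whitespace
    return dict(zip(fields, (value.strip() for value in values)))

cache = {}
for label, line in [("header", "VERSION 1.0"), ("data", " zimm;46.9;7.5\n")]:
    definition = parser_def[label]
    definition["parser"](split_line(line, definition), cache)
```

Each label selects one entry of `parser_def`, so lines of different formats within the same file can be split in different ways and passed to different parser functions.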