# State machine design pattern

## Requirements

In [96]:
import abc
import enum
import io
import re
import sys

## Problem specification

You want to parse the following data format.  It consists of potentially many experiments.  Each experiment has a name and defines a number of parameters.  Below is an example of such a file.

```
begin experiment basic
    n: int = 100
    # comment 1
    T: float = 317.0
    description = particle simulation, test
end experiment
# comment 2
begin experiment large
    n: int = 1000
    T: float = 347.0
    charge: bool = True
    description: str = particle simulation, production
end experiment
```

Note that this file format is not intended as an example of how such a specification should be defined.  It would be far better to use a standard file format such as JSON, YAML or XML.  This is only intended to illustrate how the state machine design pattern can help you implement a parser for such a file format.

## Implementation

### Exceptions

Since a lot of things can go wrong, it is useful to define a number of exceptions.  They are all derived from an abstract base class `ExperimentException`, so a caller can catch all of them with a single `except` clause.

In [112]:
class ExperimentException(abc.ABC, Exception):
    '''Base class for experiment parser exceptions'''
    pass

class ParameterNameError(ExperimentException):
    '''Error to be raised when the name of a parameter is invalid'''
    pass

class ParameterTypeError(ExperimentException):
    '''Error to be raised when the type of a parameter is invalid'''
    pass

class ExperimentFormatError(ExperimentException):
    '''Error to be raised when the experiment file's format is invalid'''
    pass
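
Because every exception shares the `ExperimentException` base class, calling code can handle all parser errors in one place. The sketch below redefines two of the exception classes so that it runs on its own; the `risky` function is a hypothetical stand-in for a call into the parser, not part of the notebook.

```python
import abc


class ExperimentException(abc.ABC, Exception):
    '''Base class, mirroring the definition above'''


class ParameterNameError(ExperimentException):
    '''Raised for an invalid parameter name'''


class ExperimentFormatError(ExperimentException):
    '''Raised for an invalid file format'''


def risky(kind):
    '''Hypothetical stand-in for a parser call that can fail in several ways'''
    if kind == 'name':
        raise ParameterNameError("invalid parameter name '_name'")
    raise ExperimentFormatError('invalid line 3: foo')


def describe_failure(kind):
    '''Catch any parser error through the shared base class'''
    try:
        risky(kind)
    except ExperimentException as error:
        return f'{type(error).__name__}: {error}'


print(describe_failure('name'))
print(describe_failure('format'))
```

Catching the base class keeps the handler short, while `type(error)` still tells you which specific error occurred.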

### Experiment class

Next, you need a class to represent experiments.

In [107]:
class Experiment:
    '''Class that represents an experiment.
    
    Experiments can have arbitrary attributes that can be accessed Pythonically.
    Note that objects of this class should not be constructed directly, but are
    constructed by the Parser class.
    
    >>> e = Experiment('test')
    >>> _ = e._add_attr('T', 273.15)
    >>> print(e.T)
    273.15
    >>> type(e.T)
    <class 'float'>
    '''
    
    def __init__(self, name):
        '''Initialize an experiment.
        
        Parameters
        ----------
        name: str
            name of the experiment
        '''
        self._name = name

    @property
    def experiment_name(self):
        '''Get the experiment's name

        Returns
        -------
        str
            name of the experiment
        '''
        return self._name

    def _add_attr(self, name, value):
        '''Add a new parameter to the experiment
        
        Parameters
        ----------
        name: str
            name of the parameter to add
        value: str | int | float | complex | bool
            value of the parameter to add
        
        Returns
        -------
        Experiment
            object for chaining calls to _add_attr (builder design pattern)

        Raises
        ------
        ParameterNameError
            if the name of the attribute is invalid
        ParameterTypeError
            if the type of the parameter is invalid
        '''
        if hasattr(self, name):
            raise ParameterNameError(f"invalid parameter name '{name}'")
        if not isinstance(value, (str, int, float, complex, bool)):
            raise ParameterTypeError(f"invalid parameter type '{type(value)}'")
        self.__dict__[name] = value
        return self

    def __str__(self):
        '''String representation of the experiment
        
        Returns
        -------
        str
            human readable string representation of the experiment
        '''
        # avoid shadowing the built-in repr function
        text = f'experiment {self.experiment_name}'
        for name, value in self.__dict__.items():
            if name in ('_name', ):
                continue
            text += f'\n  {name}: {type(value).__name__} = {value}'
        return text
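
The `hasattr` check in `_add_attr` is what protects names the class already uses, as well as parameters that were added earlier. A minimal sketch of that guard follows; the `Record` class, its `add` method and the use of `ValueError` are hypothetical simplifications, not the notebook's `Experiment` class.

```python
class Record:
    '''Hypothetical, pared-down stand-in for Experiment'''

    def __init__(self, name):
        self._name = name

    @property
    def record_name(self):
        return self._name

    def add(self, name, value):
        # hasattr is True for instance attributes, properties and methods alike
        if hasattr(self, name):
            raise ValueError(f"invalid parameter name '{name}'")
        self.__dict__[name] = value
        return self  # returning self allows calls to be chained


record = Record('test').add('n', 100).add('T', 273.15)
print(record.n, record.T)

# each of these names is already taken, so each call is rejected
for clashing_name in ('_name', 'record_name', 'add', 'n'):
    try:
        record.add(clashing_name, 0)
    except ValueError as error:
        print(error)
```

One consequence of this design is that a parameter cannot be redefined within the same experiment: the second definition is reported as an invalid name.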

### Parser

The actual parsing of the file is done by instances of the `ExperimentParser` class.
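
The parser keeps track of three states: `START`, `IN_BLOCK` and `OUT_OF_BLOCK`. As a rough mental model, the allowed transitions can be written out as a lookup table. This sketch is not how the class in the next cell is implemented (it uses `if`/`elif` branches); the `Event` names, the `TRANSITIONS` table and the `run` function are assumptions made for illustration.

```python
import enum


class State(enum.Enum):
    START = enum.auto()
    IN_BLOCK = enum.auto()
    OUT_OF_BLOCK = enum.auto()


class Event(enum.Enum):
    '''Hypothetical classification of a non-blank, non-comment line'''
    BEGIN = enum.auto()
    END = enum.auto()
    PARAMETER = enum.auto()


# (current state, event) -> next state; missing pairs are format errors
TRANSITIONS = {
    (State.START, Event.BEGIN): State.IN_BLOCK,
    (State.OUT_OF_BLOCK, Event.BEGIN): State.IN_BLOCK,
    (State.IN_BLOCK, Event.PARAMETER): State.IN_BLOCK,
    (State.IN_BLOCK, Event.END): State.OUT_OF_BLOCK,
}


def run(events):
    '''Follow the table from START and return the final state'''
    state = State.START
    for position, event in enumerate(events, 1):
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f'{event.name} not allowed in {state.name} '
                             f'(event {position})') from None
    return state


print(run([Event.BEGIN, Event.PARAMETER, Event.END]))
```

Any pair missing from the table, such as a second `BEGIN` while in `IN_BLOCK`, corresponds to a malformed file.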

In [211]:
class ExperimentParser:
    '''Parser for experiment description files.
    
    The following is an example of an experiment definition file:
    begin experiment basic
        n: int = 100
        # comment 1
        T: float = 317.0
        description = particle simulation, test
    end experiment
    # comment 2
    begin experiment large
        n: int = 1000
        T: float = 347.0
        charge: bool = True
        description: str = particle simulation, production
    end experiment
    '''

    class State(enum.Enum):
        '''Inner class to represent the parser state.
        '''
        START = enum.auto()
        IN_BLOCK = enum.auto()
        OUT_OF_BLOCK = enum.auto()
    
    def __init__(self, verbose=False):
        '''Initialize parser.
        
        Parameters
        ----------
        verbose: bool
            if True, provide verbose output to sys.stderr, default is False
        '''
        self.clear()
        self._verbose = verbose

    def clear(self):
        '''Clear list of experiments.
        '''
        self._experiments = []

    @property
    def is_verbose(self):
        '''Check whether parser is in verbose mode.
        
        Returns
        -------
        bool
            True if parser is in verbose mode, False otherwise
        '''
        return self._verbose
        
    @property
    def experiments(self):
        '''Return the list of experiments.
        
        Returns
        -------
        list[Experiment]
            list of experiments parsed so far
        '''
        return self._experiments

    def _parse_header(self, line, line_nr):
        '''Parse a begin experiment line.
        
        Parameters
        ----------
        line: str
            line to parse
        line_nr: int
            line number to provide feedback on errors

        Raises
        ------
        ExperimentFormatError
            if experiment begin line format is invalid
        '''
        regex_str = r'^begin\s+experiment\s+(?P<experiment_name>\w+)$'
        if match := re.match(regex_str, line):
            self._experiments.append(Experiment(match.group('experiment_name')))
            return match.group('experiment_name')
        else:
            raise ExperimentFormatError(f'invalid line {line_nr}: {line}')

    def _parse_parameter_definition(self, line, line_nr):
        '''Parse a parameter definition line.

        Because this method passes the type name to eval, the regular expression
        only accepts a fixed set of type names.  A line with any other type
        therefore fails to match and raises an ExperimentFormatError rather than
        a ParameterTypeError.
        
        Parameters
        ----------
        line: str
            line to parse
        line_nr: int
            line number to provide feedback on errors

        Raises
        ------
        ExperimentFormatError
            if parameter definition format is invalid
        ParameterNameError
            if a parameter has an invalid name
        '''
        regex_str = r'''
            ^
            (?P<var_name>\w+)                                       # variable name
            (?:\s*:\s*(?P<var_type>str|int|float|complex|bool))?    # optional data type
            \s*=\s*                                                 # assignment operator
            (?P<var_value>.+)                                       # variable value
            $'''
        if match := re.match(regex_str, line.strip(), re.VERBOSE):
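            var_value = match.group('var_value')
            if match.group('var_type'):
                # eval is safe here: the regex only admits five known type names
                var_type = eval(match.group('var_type'))
                if var_type is bool:
                    # bool('False') is True, so compare the text explicitly
                    var_value = var_value == 'True'
                else:
                    var_value = var_type(var_value)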
            self._experiments[-1]._add_attr(match.group('var_name'), var_value)
        else:
            raise ExperimentFormatError(f'invalid line {line_nr}: {line}')

    @staticmethod
    def _is_blank(line):
        '''Check whether a line is blank.
        
        Returns
        -------
        bool
            True if line is blank, False otherwise
        '''
        return re.match(r'^\s*$', line) is not None
        
    @staticmethod
    def _is_comment(line):
        '''Check whether a line is a comment.
        
        Returns
        -------
        bool:
            True if line is a comment, False otherwise
        '''
        return re.match(r'^\s*#', line) is not None

    @staticmethod
    def _is_block_begin(line):
        '''Check whether a line is a begin experiment.
        
        Returns
        -------
        bool:
            True if line is a begin experiment, False otherwise
        '''
        return re.match(r'^\s*begin\s+experiment', line) is not None

    @staticmethod
    def _is_block_end(line):
        '''Check whether a line is an end experiment.
        
        Returns
        -------
        bool:
            True if line is an end experiment, False otherwise
        '''
        return re.match(r'^\s*end\s+experiment', line) is not None

    @property
    def _is_in_block(self):
        '''Check whether the parser is in an experiment definition block
        
        Returns
        -------
        bool:
            True if the parser is in a definition block, False otherwise
        '''
        return self._state == self.State.IN_BLOCK

    def _check_block_begin_end_match(self, line, line_nr):
        '''Verify that the experiment names at the begin and end markers match.

        Parameters
        ----------
        line: str
            line to parse
        line_nr: int
            line number to provide feedback on errors

        Raises
        ------
        ExperimentFormatError
            if the end marker names a different experiment than the begin marker
        '''
        regex_str = r'^end\s+experiment\s+(?P<experiment_name>\w+)$'
        if match := re.match(regex_str, line):
            if match.group('experiment_name') != self._experiments[-1].experiment_name:
                raise ExperimentFormatError(f"begin and end experiment names do not match on line {line_nr}: '{line}'")            
        
    @property
    def _is_done(self):
        '''Check whether parsing is done.

        Parsing is done when the parser's state is either START or OUT_OF_BLOCK.

        Returns
        -------
        bool:
            True if the parser is done, False otherwise
        '''
        return self._state in (self.State.START, self.State.OUT_OF_BLOCK)
        
    def parse(self, file):
        '''Parse an experiment definition file.
        
        Parameters
        ----------
        file: typing.TextIO
            file-like object to parse
            
        Raises
        ------
        ExperimentFormatError
            if the experiment file is not correctly formatted
        ParameterNameError
            if a parameter has an invalid name
        '''
        self._state = self.State.START
        for line_nr, line in enumerate(file, 1):
            line = line.rstrip()
            if ExperimentParser._is_blank(line):
                continue
            elif ExperimentParser._is_comment(line):
                continue
            elif ExperimentParser._is_block_begin(line):
                if self._is_in_block:
                    raise ExperimentFormatError(f"missing end experiment for '{self._experiments[-1].experiment_name}'")
                name = self._parse_header(line, line_nr)
                if self.is_verbose:
                    print(f'start parsing {name}', file=sys.stderr)
                self._state = self.State.IN_BLOCK
            elif ExperimentParser._is_block_end(line):
                if not self._is_in_block:
                    raise ExperimentFormatError(f"end experiment without begin on line {line_nr}: '{line}'")
                self._check_block_begin_end_match(line, line_nr)
                if self.is_verbose:
                    print(f'end parsing {self._experiments[-1].experiment_name}', file=sys.stderr)
                self._state = self.State.OUT_OF_BLOCK
            elif self._is_in_block:
                self._parse_parameter_definition(line, line_nr)
            else:
                raise ExperimentFormatError(f"invalid line {line_nr}: '{line}'")
        if not self._is_done:
            raise ExperimentFormatError(f'invalid experiments file, no end for experiment {self._experiments[-1].experiment_name}')
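
The `eval` call in `_parse_parameter_definition` is only safe because the regular expression restricts the type name to a fixed set. An alternative is an explicit lookup table of converters, which also gives a natural place to handle `bool`: `bool('False')` is `True`, since any non-empty string is truthy. The following is a sketch under those assumptions; `to_bool`, `CONVERTERS` and `convert` are hypothetical names that do not appear in the parser.

```python
def to_bool(text):
    '''Convert the literals 'True' and 'False'; reject anything else'''
    if text == 'True':
        return True
    if text == 'False':
        return False
    raise ValueError(f"invalid bool literal '{text}'")


# hypothetical replacement for eval: type name -> conversion function
CONVERTERS = {
    'str': str,
    'int': int,
    'float': float,
    'complex': complex,
    'bool': to_bool,
}


def convert(type_name, text):
    '''Convert text to the named type; untyped values stay strings'''
    if type_name is None:
        return text
    return CONVERTERS[type_name](text)


print(convert('int', '100'), convert('bool', 'False'), convert(None, 'test'))
```

With a table like this, supporting a new type means adding one entry rather than widening what `eval` may see.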

## Testing

For proper testing you should use a unit testing framework such as pytest.  The cells below are only intended to illustrate the behavior of the parser.
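
As a sketch of what such unit tests could look like, the functions below follow pytest's `test_*` naming convention and use plain `assert` statements, so pytest would collect them as they are. To keep the example self-contained, they exercise a hypothetical `parse_header` helper that reuses the regular expression from `_parse_header` and raises `ValueError` instead of the parser's own exceptions.

```python
import re

HEADER_REGEX = r'^begin\s+experiment\s+(?P<experiment_name>\w+)$'


def parse_header(line):
    '''Hypothetical helper: return the experiment name of a begin line'''
    if match := re.match(HEADER_REGEX, line):
        return match.group('experiment_name')
    raise ValueError(f'invalid line: {line}')


def test_valid_header():
    assert parse_header('begin experiment basic') == 'basic'


def test_header_with_spaces_in_name_is_rejected():
    try:
        parse_header('begin experiment large scale run')
    except ValueError:
        return
    assert False, 'expected ValueError'
```

With pytest installed, `pytest.raises(ValueError)` would be the more idiomatic way to write the second test.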

### Valid data

First, start off with a valid experiment file.

In [212]:
valid_experiments = '''
begin experiment basic
    n: int = 100
    # comment 1
    T: float = 317.0
    description = particle simulation, test
end experiment
# comment 2
begin experiment large
    n: int = 1000
    T: float = 347.0
    charge: bool = True
    description: str = particle simulation, production
end experiment
'''

In [213]:
parser = ExperimentParser()

In [214]:
parser.parse(io.StringIO(valid_experiments))

In [215]:
for experiment in parser.experiments:
    print(experiment)

experiment basic
  n: int = 100
  T: float = 317.0
  description: str = particle simulation, test
experiment large
  n: int = 1000
  T: float = 347.0
  charge: bool = True
  description: str = particle simulation, production


### Invalid parameter name

Next, try a file with an invalid parameter name: `_name` clashes with an attribute that the `Experiment` class already uses.

In [216]:
invalid_parameter_name_experiments = '''
begin experiment basic
    n: int = 100
    # comment 1
    T: float = 317.0
    description = particle simulation, test
end experiment
# comment 2
begin experiment large
    n: int = 1000
    T: float = 347.0
    charge: bool = True
    _name: str = particle simulation, production
end experiment
'''

In [217]:
parser = ExperimentParser()

In [218]:
try:
    parser.parse(io.StringIO(invalid_parameter_name_experiments))
except ParameterNameError as e:
    print(e)

invalid parameter name '_name'


### Invalid parameter type

Try a parameter with an unsupported type, here `list`.

In [219]:
invalid_parameter_type_experiments = '''
begin experiment basic
    n: int = 100
    # comment 1
    T: float = 317.0
    description = particle simulation, test
    values: list = 2, 3, 5, 7
end experiment
# comment 2
begin experiment large
    n: int = 1000
    T: float = 347.0
    charge: bool = True
    description: str = particle simulation, production
end experiment
'''

In [220]:
parser = ExperimentParser()

In [221]:
try:
    parser.parse(io.StringIO(invalid_parameter_type_experiments))
except ExperimentFormatError as e:
    print(e)

invalid line 7:     values: list = 2, 3, 5, 7


### Missing end experiment

Try an experiment definition without an end marker.

In [222]:
missing_end_experiments = '''
begin experiment basic
    n: int = 100
    # comment 1
    T: float = 317.0
    description = particle simulation, test
# comment 2
begin experiment large
    n: int = 1000
    T: float = 347.0
    charge: bool = True
    description: str = particle simulation, production
end experiment
'''

In [223]:
parser = ExperimentParser()

In [224]:
try:
    parser.parse(io.StringIO(missing_end_experiments))
except ExperimentFormatError as e:
    print(e)

missing end experiment for 'basic'


### Non-matching begin and end

Try an experiment with a non-matching begin and end.

In [225]:
non_matching_begin_end_experiments = '''
begin experiment basic
    n: int = 100
    # comment 1
    T: float = 317.0
    description = particle simulation, test
end experiment nonbasic
# comment 2
begin experiment large
    n: int = 1000
    T: float = 347.0
    charge: bool = True
    description: str = particle simulation, production
end experiment
'''

In [226]:
parser = ExperimentParser()

In [227]:
try:
    parser.parse(io.StringIO(non_matching_begin_end_experiments))
except ExperimentFormatError as e:
    print(e)

begin and end experiment names do not match on line 7: 'end experiment nonbasic'


### Invalid experiment definition

Try an experiment with an invalid name, here one that contains spaces.

In [228]:
invalid_experiment_definition_experiments = '''
begin experiment basic
    n: int = 100
    # comment 1
    T: float = 317.0
    description = particle simulation, test
end experiment basic
# comment 2
begin experiment large scale run
    n: int = 1000
    T: float = 347.0
    charge: bool = True
    description: str = particle simulation, production
end experiment
'''

In [229]:
parser = ExperimentParser()

In [230]:
try:
    parser.parse(io.StringIO(invalid_experiment_definition_experiments))
except ExperimentFormatError as e:
    print(e)

invalid line 9: begin experiment large scale run
