Extending Types
===============

This tutorial builds on the :doc:`previous one <tutorial0>`. Once your base
transformation is up and running, it is time to extend the fields. Out of the
box the system delivers various extensions; the full list of
:ref:`Field types <field-types>` can be found in the reference.
Translating IDs
---------------

One of the patterns we see in the evolution towards microservices is a move
away from fully normalized database schemas. Modern systems rely more on
auto-completion, storage is cheaper, and there are plenty of other reasons to
store full strings instead of IDs. In *data-migrator* this is easily supported
for hardcoded values with a small Python function:
.. code-block:: python

    M = {
        0: "NOT SET",
        1: "Hallo",
        2: "Hello",
        3: "Bonjour"
    }

    def parse_b(v):
        return M.get(int(v), M[0])

    class Result(models.Model):
        id = models.IntField(pos=0)  # keep id
        ...
        # replace ID with value
        b = models.StringField(pos=2, parse=parse_b, default="How are you doing?")
Note that values are parsed as strings by the CSV reader. ``NULL`` is by
default translated to ``None``, which is replaced by the default value and
never reaches the parse function.
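Outside the framework, the parse function behaves like a plain mapping with a
fallback. A minimal standalone check of the ``M`` and ``parse_b`` from above:

.. code-block:: python

    # The same hardcoded map and parse function as above, runnable standalone.
    M = {
        0: "NOT SET",
        1: "Hallo",
        2: "Hello",
        3: "Bonjour"
    }

    def parse_b(v):
        # CSV readers deliver strings, so convert before the lookup;
        # unknown IDs fall back to the 0 entry.
        return M.get(int(v), M[0])

    print(parse_b("2"))   # Hello
    print(parse_b("99"))  # NOT SET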
Merging columns
---------------

Another migration pattern is to merge separate (boolean) columns back into a
single enumeration column. To support this, use a row parser instead of a
single-value parser: if no ``pos`` is given, the parser is row based instead
of being linked to a single column value:
.. code-block:: python

    def employment_type(row):
        if row[26] == "1":    # contractor
            return 'contractor'
        elif row[27] == "1":  # intern
            return 'intern'
        else:
            return 'perm'

    class Result(models.Model):
        ...
        b = models.StringField(parse=employment_type, default="perm")
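The row parser receives the whole row, so it can be exercised outside the
framework as well. A quick check with a few hypothetical 28-column rows, where
only positions 26 and 27 carry the flags:

.. code-block:: python

    # Row parser from above, checked against a few hypothetical rows.
    def employment_type(row):
        if row[26] == "1":    # contractor flag
            return 'contractor'
        elif row[27] == "1":  # intern flag
            return 'intern'
        else:
            return 'perm'

    # 28-column rows; only positions 26 and 27 matter here.
    contractor_row = [""] * 26 + ["1", "0"]
    intern_row     = [""] * 26 + ["0", "1"]
    perm_row       = [""] * 26 + ["0", "0"]

    print(employment_type(contractor_row))  # contractor
    print(employment_type(intern_row))      # intern
    print(employment_type(perm_row))        # perm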
Dynamic lookups
---------------

Sometimes one needs to look up values in the target database. Do not be shy
to generate dynamic lookups as SELECT statements that run during import into
the target database.
.. code-block:: python

    class Result(models.Model):
        recruiter_uuid = models.StringField(pos=38,
            replace=lambda x: '(SELECT uuid FROM `persons` WHERE `mail`=%s limit 0,1)' % x)
This can of course be combined with Python-based transformations to fix
deleted values:
.. code-block:: python

    def recruiter(v):
        if v is None or v in ['missing1@mail.com', 'missing2@mail.com']:
            return 'default_person@mail.com'
        else:
            return v

    class Result(models.Model):
        recruiter_uuid = models.StringField(pos=38, parse=recruiter,
            replace=lambda x: '(SELECT uuid FROM `persons` WHERE `mail`=%s limit 0,1)' % x)
The output is a value for the target database that, in turn, serves as the
input for a query on that same target database.
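To see what ends up in the generated SQL, the two steps can be chained by hand.
This is only a sketch of the order of operations; inside *data-migrator* the
field applies ``parse`` and ``replace`` for you:

.. code-block:: python

    def recruiter(v):
        # Map deleted or missing accounts to a known default person.
        if v is None or v in ['missing1@mail.com', 'missing2@mail.com']:
            return 'default_person@mail.com'
        return v

    replace = lambda x: '(SELECT uuid FROM `persons` WHERE `mail`=%s limit 0,1)' % x

    # A deleted mail address is first fixed, then wrapped in a subquery.
    print(replace(recruiter('missing1@mail.com')))
    # (SELECT uuid FROM `persons` WHERE `mail`=default_person@mail.com limit 0,1)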
Table lookups
-------------

For larger tables there is support for table-driven lookups from external CSV
files. The map (just key/value pairs) can also be generated ad hoc by other
means. *data-migrator* offers a helper function :func:`~.read_map_from_csv`
to read the CSV.
.. code-block:: python

    from data_migrator.contrib.read import read_map_from_csv

    class Result(models.Model):
        country = models.MappingField(pos=33, default='NLD',
            data_map=read_map_from_csv(f=open('data/country.csv'), delimiter=';',
                                       key='country_id', value='alpha3'))
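Conceptually, such a lookup map is just a dictionary built from two CSV
columns. The sketch below shows a plain-Python equivalent of that idea (it is
not the actual :func:`~.read_map_from_csv` implementation, and the
``data/country.csv`` content shown is hypothetical):

.. code-block:: python

    import csv
    import io

    # A plain-Python equivalent of a CSV-driven lookup map: one column
    # becomes the key, another the value.
    def read_map(f, key, value, delimiter=','):
        return {row[key]: row[value] for row in csv.DictReader(f, delimiter=delimiter)}

    # Hypothetical content standing in for data/country.csv:
    country_csv = io.StringIO("country_id;alpha3\n31;NLD\n33;FRA\n")
    data_map = read_map(country_csv, key='country_id', value='alpha3', delimiter=';')
    print(data_map['31'])  # NLD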
Combining table lookups
-----------------------

The table lookup and column reduction can also be combined. Consider a
multi-state entity with specific lookup values that need to be merged into a
single lookup value:
.. code-block:: python

    from data_migrator.contrib.read import read_map_from_csv

    LOOKUP1 = read_map_from_csv(f=open('data/state1.csv'), delimiter=';', key='id', value='name')
    LOOKUP2 = read_map_from_csv(f=open('data/state2.csv'), delimiter=';', key='id', value='name')

    def parse_lookup(row):
        return LOOKUP1.get(row[1], LOOKUP2.get(row[2], ''))

    class Result(models.Model):
        ...
        state = models.StringField(parse=parse_lookup)
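The fallback order of ``parse_lookup`` (first map wins, then the second, then
the empty string) can be verified with two stub dictionaries standing in for
the CSV-driven lookups; the data below is hypothetical:

.. code-block:: python

    # Stub maps standing in for the two CSV-driven lookups.
    LOOKUP1 = {'1': 'open'}
    LOOKUP2 = {'9': 'closed'}

    def parse_lookup(row):
        # Try the first state column; fall back to the second, then to ''.
        return LOOKUP1.get(row[1], LOOKUP2.get(row[2], ''))

    print(parse_lookup(['id', '1', '-']))  # open
    print(parse_lookup(['id', '-', '9']))  # closed
    print(parse_lookup(['id', '-', '-']))  # (empty string)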
Flatten multi values
--------------------

The most extensive flattening is of a many-to-many relation, for example the
tagging of a main entity with multiple values. This is mostly implemented in a
three-table structure, following the classic normalization approach:

* a table with the main entity (for example persons),
* a table with the attributes in a fixed id/value structure, and
* a many-to-many table linking the attributes to the main entities.

A simple approach to flatten this is to encode it as a JSON list. To transform
the data, use a four-step approach:

#. Extract the data from the old system, fully expanded.
#. Read the CSV and flatten it into a map of lists.
#. Link the values at read time, replacing the main ID with lists.
#. Emit the whole as a JSON list.
The first step relies on queries like:

.. code-block:: sql

    SELECT
        P.person_id,
        S.name AS skill
    FROM person_skill_m2m P
    INNER JOIN skills S
        ON S.id = P.skill_id;
After that, loading and emitting to JSON is simply a matter of using the
:class:`~.MappingField`:

.. code-block:: python

    from data_migrator.contrib.read import read_map_from_csv

    class Result(models.Model):
        skills = models.MappingField(pos=0, default=[], as_json=True,
            data_map=read_map_from_csv(f=open('results/skill.csv'), key='candidate_id',
                                       value='skill', as_list=True))
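Steps two through four can also be sketched in plain Python, which shows what
the combination of an ``as_list`` map and ``as_json=True`` boils down to. The
CSV content below is a hypothetical export from the SQL query above:

.. code-block:: python

    import csv
    import io
    import json

    # Hypothetical expanded export from the m2m query above:
    skill_csv = io.StringIO(
        "candidate_id,skill\n"
        "7,python\n"
        "7,sql\n"
        "8,golang\n"
    )

    # Step 2: flatten the expanded rows into a map of lists.
    skills = {}
    for row in csv.DictReader(skill_csv):
        skills.setdefault(row['candidate_id'], []).append(row['skill'])

    # Steps 3-4: replace the main ID with its list and emit it as JSON.
    print(json.dumps(skills.get('7', [])))  # ["python", "sql"]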
Now take these examples and mix your own. It is standard Python; we have no
doubt you can come up with all kinds of amazing transformations.