Validation
==========

This tutorial builds on the :doc:`previous <tutorial1>` tutorial.

Another aspect of a migration is control over the quality of the data received. Out of the box,
*data-migrator* offers several facilities for this, described below.


None column test
----------------

The current list of :ref:`Field types <field-types>` contains both nullable and non-nullable fields.
In some cases you might want to drop records in which such a field is null. This is a generic
function of the manager and can be set per model in the ``Meta`` block:


.. code-block:: python

    from data_migrator import models

    class Result(models.Model):
        id = models.IntField(pos=0)  # keep id
        a = models.StringField(pos=1)
        b = models.StringField(pos=2, nullable=None)

        class Meta:
            drop_if_none = ['b']  # drop records where b is None


When records are saved, they are validated, and a record is dropped if ``b`` turns out to be
``None``.

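
The dropping rule itself is simple. The following plain-Python sketch mimics the behaviour
described above on dict-based records. It does not use *data-migrator*; the helper
``drop_records_with_none`` is hypothetical and is not the library's implementation.

.. code-block:: python

    from typing import Any, Dict, Iterable, List

    Record = Dict[str, Any]


    def drop_records_with_none(records: Iterable[Record], drop_if_none: List[str]) -> List[Record]:
        """Keep only records whose listed fields are all non-None.

        Hypothetical sketch of what ``Meta.drop_if_none`` is described to do;
        not data-migrator's actual code.
        """
        kept = []
        for record in records:
            # A record is dropped as soon as one listed field is None
            # (a field missing from the dict also counts as None here).
            if any(record.get(field) is None for field in drop_if_none):
                continue
            kept.append(record)
        return kept


    records = [
        {"id": 1, "a": "x", "b": "y"},
        {"id": 2, "a": "x", "b": None},  # dropped: b is None
        {"id": 3, "a": None, "b": "z"},  # kept: only b is checked
    ]
    print([r["id"] for r in drop_records_with_none(records, ["b"])])  # → [1, 3]

Only the fields named in the list are checked, so a ``None`` in ``a`` does not cause a drop.
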

Unique columns
--------------

Another common pattern is to drop records with non-unique values (later duplicates are dropped),
for example when collecting unique email addresses. This functionality is also available out of
the box:

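
.. code-block:: python

    from data_migrator import models

    class Result(models.Model):
        id = models.IntField(pos=0)  # keep id
        a = models.StringField(pos=1, unique=True)  # values of a must be unique

        class Meta:
            drop_non_unique = True  # drop later duplicates instead of failing
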

Uniqueness is checked while the input is parsed, before any records are written. Use
``fail_non_unique`` instead if duplicates should not be dropped but should make the whole
transformation fail.


Complex Unique columns
----------------------

A more complex situation arises when a combination of (input) columns needs to be checked, for
example when de-duplicating membership records. This is solved with a parse function and a
hidden column:

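
.. code-block:: python

    from data_migrator import models

    def compound(row):
        # Build one key from the first three input columns; the separator
        # keeps e.g. ('ab', 'c') and ('a', 'bc') from producing the same key.
        return '|'.join(row[:3])

    class Result(models.Model):
        id = models.IntField(pos=0)  # keep id
        key = models.HiddenField(parse=compound, unique=True)  # checked, not output
        a = models.StringField(pos=1)

        class Meta:
            drop_non_unique = True
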

In this example ``key`` is used to check uniqueness but is not written to the end result.

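
The hidden-column idea can also be sketched without the library: compute a key per input row,
use it only for the uniqueness check, and emit the row without it. The function
``dedupe_by_compound_key`` and the sample rows are hypothetical. The key joins the columns with a
separator, since plain concatenation would treat ``('ab', 'c')`` and ``('a', 'bc')`` as the same
key.

.. code-block:: python

    from typing import List, Sequence


    def compound(row: Sequence[str]) -> str:
        # Join with a separator so ("ab", "c") and ("a", "bc") give different keys.
        return "|".join(row[:3])


    def dedupe_by_compound_key(rows: List[List[str]]) -> List[List[str]]:
        """Drop rows whose compound key was already seen; the key is never output.

        Hypothetical sketch of the hidden-column approach, not data-migrator's code.
        """
        seen = set()
        output = []
        for row in rows:
            key = compound(row)  # plays the role of the hidden column
            if key in seen:
                continue  # later duplicate: dropped
            seen.add(key)
            output.append(row)  # the row is emitted without the key
        return output


    rows = [
        ["1", "Jansen", "Amsterdam"],
        ["2", "Jansen", "Amsterdam"],
        ["1", "Jansen", "Amsterdam"],  # same first three columns as the first row
    ]
    print(dedupe_by_compound_key(rows))
    # → [['1', 'Jansen', 'Amsterdam'], ['2', 'Jansen', 'Amsterdam']]
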