-
Notifications
You must be signed in to change notification settings - Fork 4
/
example.txt
100 lines (70 loc) · 3.65 KB
/
example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
Example
=======
Core of *data-migrator* is the unix pipe and filter paradigm to build data
transformers. Source data is read from a database or some other source. It is
piped to a filter written with *data-migrator*, which emits for example SQL
INSERT statements. These can be piped to a target client and thereby loaded
into the target database.
Consider the most basic datapump in mysql, which is direct pipe between source
and target database:
.. code-block:: bash
$ mysqldump -u [uname] -p[pass] source_db table | mysql target_db
In this example ``mysqldump`` will export the table as SQL statements and the
new database will process them. Now if you want to do something extra and
repeatable with respect to the data, you could use all kinds of unix filtering
with sed, awk, or your favorite scripting. It is not very hard to imagine what
Pythonista's would do especially if extra transformations or simplifications
are needed. The basic packages are quite expressive and one would easily setup
something like:
.. code-block:: bash
$ mysql source_db -E 'select * from table' -B | python my_filter.py | mysql target_db
With `my_filter.py` a basic implementation of `csv`_:
.. code-block:: python
import sys, csv
reader = csv.DictReader(sys.stdin)
for row in reader:
print 'INSERT INTO `table` (a,b) VALUES ("%(a)s", %(b)s)' % row
To see the options for manipulation is left as an exercise to the reader, but
do accept that as soon things become just a little more complex (think:
splitting in two tables, column reverses, renaming of columns, mixing, joining,
filtering, transforming), a more declarative support is helpful. That is why
we came up with *data-migrator*. One could simply replace the transformation
with:
.. code-block:: python
from data_migrator import models, transform
from data_migrator.emitters import MySQLEmitter
def parse_b(v):
if v == 'B':
return 'transformed_B'
else:
return v.lower()
class Result(models.Model):
id = models.IntField(pos=0) # keep id
uuid = models.UUIDField() # generate new uuid4 field
# replace NULLs and trim
a = models.StringField(pos=1, default='NO_NULL', max_length=5, nullable='NULL', replacement=lambda x:x.upper())
# parse this field
b = models.StringField(pos=2, parse=parse_b, name='my_b')
class Meta:
table_name = 'new_table_name'
# django-esc like creating and saving of additional records (to a manager)
Result(a='my a', b='my b').object.save()
if __name__ == "__main__":
transform.Transformer(models=[Result], emitter=MySQLEmitter).process()
assert(len(Result.objects) > 1)
Hereby write a nice self explaining transformer, which will generate something
like:
.. code-block:: sql
-- transformation for Result to table new_table_name
-- input headers: id,a,b
-- stats: in=10,dropped=0,out=10
SET SQL_SAFE_UPDATES = 0; -- you need this to delete without WHERE clause
DELETE FROM `new_table_name`;
ALTER TABLE `new_table_name` AUTO_INCREMENT = 1;
INSERT INTO `new_table_name` (`id`, `uuid`, `a`, `my_b`) VALUES (0, "ac7100b9-c9ad-4069-8ca5-8db1ebd36fa3", "MY A", "my b");
INSERT INTO `new_table_name` (`id`, `uuid`, `a`, `my_b`) VALUES (1, "38211712-0eb2-4433-b28f-e3fe33492e7a", "NO_NULL", "some value");
INSERT INTO `new_table_name` (`id`, `uuid`, `a`, `my_b`) VALUES (2, "a3478903-aed9-462c-8f47-7a89013bc6ea", "CHOPP", "transformed_B");
...
If you can see the value of this package, continue with the :doc:`installation </install>` of *data-migrator*.
After which there are some more pages in the :doc:`Tutorial </tutorial/tutorial0>` for you to get a grasp of this library.
.. _csv: https://docs.python.org/2/library/csv.html