Tutorial walkthrough (https://medium.freecodecamp.org/sqlalchemy-makes-etl-magically-easy-ab2bd0df928)

#### Install SQLAlchemy

`$ pip install sqlalchemy`

### Defining Schema

A database schema defines the structure of a database system:
- tables
- columns
- fields
- the relationships between them

Schemas can be defined in raw SQL or through SQLAlchemy's ORM.
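As a rough illustration of the raw SQL route, the sketch below creates a `users` table with a plain `CREATE TABLE` statement. It assumes the column names of the `users` table defined later in this walkthrough, and it uses Python's built-in `sqlite3` module with an in-memory database purely so that it runs without a server. The tutorial itself targets MySQL, where the DDL may differ slightly.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")

# Raw SQL definition of a table with the same columns as the ORM `Users` class.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS users (
        UserId    INTEGER PRIMARY KEY,
        Title     VARCHAR(255),
        FirstName VARCHAR(255),
        LastName  VARCHAR(255),
        Email     VARCHAR(255),
        UserName  VARCHAR(255),
        DOB       DATETIME
    )
    """
)

# PRAGMA table_info returns one row per column; index 1 holds the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(column_names)
conn.close()
```

The ORM approach used in the rest of this walkthrough expresses the same structure as a Python class instead of a SQL string.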

* * * *

#### (1) Import SQLAlchemy

In [1]:
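import pickle

# Explicit imports of the names used in this walkthrough (instead of `import *`)
from sqlalchemy import Column, DateTime, Integer, VARCHAR, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker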

#### (2) Connect to database

For engine configuration for other databases, see the SQLAlchemy engine documentation:  
<http://docs.sqlalchemy.org/en/latest/core/engines.html>

In [2]:
with open("sqlalchemy_mysql_pw.pickle", "rb") as f:
    pw = pickle.load(f)

In [3]:
engine = create_engine("mysql+mysqldb://root:" + pw + "@52.78.44.120/test_alchemy")
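The connection string above is built by plain string concatenation, which can break if the password contains characters such as `@` or `/`. A minimal sketch of one way to guard against that is shown here. The helper name `build_mysql_url`, the environment variable `MYSQL_PW`, the dummy password, and the `localhost` host are all hypothetical and used only for illustration.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database):
    """Return a mysql+mysqldb connection URL with the password URL-escaped."""
    # quote_plus escapes characters such as "@" or "/" that would otherwise
    # be misread as URL delimiters.
    return "mysql+mysqldb://{}:{}@{}/{}".format(
        user, quote_plus(password), host, database
    )


# Hypothetical environment variable; falls back to a dummy value for the demo.
password = os.environ.get("MYSQL_PW", "p@ss/word")
url = build_mysql_url("root", password, "localhost", "test_alchemy")
```

The resulting `url` could then be passed to `create_engine(url)` in place of the concatenated string.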

#### (3) Define table classes

- format --> `<ColumnName> = Column(type)`

`type` is a data type such as `Integer`, `String`, `DateTime`, and so on.  
Use `primary_key=True` to mark the columns that serve as primary keys.

In [4]:
Base = declarative_base()

In [5]:
class Users(Base):
    __tablename__ = "users"
    UserId = Column(Integer, primary_key=True)
    Title = Column(VARCHAR(255))
    FirstName = Column(VARCHAR(255))
    LastName = Column(VARCHAR(255))
    Email = Column(VARCHAR(255))
    UserName = Column(VARCHAR(255))
    DOB = Column(DateTime)

In [6]:
class Uploads(Base):
    __tablename__ = "uploads"
    UploadId = Column(Integer, primary_key=True)
    UserId = Column(Integer)
    Title = Column(VARCHAR(255))
    Body = Column(VARCHAR(255))
    Timestamp = Column(DateTime)

#### (4) Create tables.

The `checkfirst=True` parameter ensures that a table is created only if it does not already exist in the database.

In [7]:
Users.__table__.create(bind=engine, checkfirst=True)
Uploads.__table__.create(bind=engine, checkfirst=True)
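To see the effect of `checkfirst=True` without a MySQL server, the following sketch uses an in-memory SQLite engine and a throwaway table named `demo`. Both are assumptions made for illustration and are not part of the tutorial. Calling `create` twice should not raise, because the second call finds the table already present.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, inspect

# Assumption: in-memory SQLite replaces the MySQL engine used in the tutorial.
demo_engine = create_engine("sqlite:///:memory:")
demo_metadata = MetaData()

# Hypothetical table used only to demonstrate checkfirst.
demo = Table(
    "demo",
    demo_metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

# First call creates the table; second call is skipped because it exists.
demo.create(bind=demo_engine, checkfirst=True)
demo.create(bind=demo_engine, checkfirst=True)

# List the tables that now exist in the database.
table_names = inspect(demo_engine).get_table_names()
print(table_names)
```

Without `checkfirst=True`, the second `create` call would attempt the `CREATE TABLE` again and the database would reject it.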

### Extract

Once the schema has been defined, the next task is to extract the raw data from its source.

In [8]:
import requests

The example data is fetched from two web APIs and held in two JSON objects.

In [9]:
url = 'https://randomuser.me/api/?results=10'
users_json = requests.get(url).json()

In [10]:
url2 = 'https://jsonplaceholder.typicode.com/posts/'
uploads_json = requests.get(url2).json()

In [11]:
users_json

{'info': {'page': 1,
  'results': 10,
  'seed': '12c245a14de87408',
  'version': '1.1'},
 'results': [{'cell': '0772-527-698',
   'dob': '1978-08-19 00:32:13',
   'email': 'joey.nelson@example.com',
   'gender': 'male',
   'id': {'name': 'NINO', 'value': 'YE 38 11 12 Y'},
   'location': {'city': 'chichester',
    'postcode': 'TO3A 3UD',
    'state': 'avon',
    'street': '6984 school lane'},
   'login': {'md5': 'e0f9d62b8f67424db55a3b8f2f8f4fa7',
    'password': 'pinkfloy',
    'salt': 'xcM6gb3K',
    'sha1': '1477cf806549e7c93c93ff41df2e4adab78cd5c3',
    'sha256': '81ac8582632e83233df4b6e286c4ecd044fc95ca8be443bab05e6b231c347ac6',
    'username': 'yellowladybug722'},
   'name': {'first': 'joey', 'last': 'nelson', 'title': 'mr'},
   'nat': 'GB',
   'phone': '01150 328833',
   'picture': {'large': 'https://randomuser.me/api/portraits/men/59.jpg',
    'medium': 'https://randomuser.me/api/portraits/med/men/59.jpg',
    'thumbnail': 'https://randomuser.me/api/portraits/thumb/men/59.jpg'},

In [12]:
uploads_json

[{'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto',
  'id': 1,
  'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit',
  'userId': 1},
 {'body': 'est rerum tempore vitae\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\nqui aperiam non debitis possimus qui neque nisi nulla',
  'id': 2,
  'title': 'qui est esse',
  'userId': 1},
 {'body': 'et iusto sed quo iure\nvoluptatem occaecati omnis eligendi aut ad\nvoluptatem doloribus vel accusantium quis pariatur\nmolestiae porro eius odio et labore et velit aut',
  'id': 3,
  'title': 'ea molestias quasi exercitationem repellat qui ipsa sit aut',
  'userId': 1},
 {'body': 'ullam et saepe reiciendis voluptatem adipisci\nsit amet autem assumenda provident rerum culpa\nquis hic commodi nesciunt rem tenetur dolore

### Transform

Before the data can be loaded, it must be in the correct format. The JSON objects created in the code above are nested and contain more data than is required for the tables defined.

An intermediary **data transformation** step is therefore needed to flatten the nested JSON so that it can be safely written to the database without error.

The code below creates two lists, `users` and `uploads`

In [13]:
from datetime import datetime, timedelta
from random import randint

In [14]:
users = []
uploads = []

In [15]:
for i, result in enumerate(users_json['results']):
    row = {}
    row['UserId'] = i
    row['Title'] = result['name']['title']
    row['FirstName'] = result['name']['first']
    row['LastName'] = result['name']['last']
    row['Email'] = result['email']
    row['UserName'] = result['login']['username']
    dob = datetime.strptime(result['dob'], '%Y-%m-%d %H:%M:%S')
    row['DOB'] = dob.date()
    users.append(row)

In [16]:
for result in uploads_json:
    row = {}
    row['UploadId'] = result['id']
    row['UserId'] = result['userId']
    row['Title'] = result['title']
    row['Body'] = result['body']
    delta = timedelta(seconds=randint(1, 86400))
    row['Timestamp'] = datetime.now() - delta
    uploads.append(row)

In [17]:
users[0]['UserId'] = 10

The main goal of the `for` loops is to iterate through the JSON objects and, for each result, create a new Python dictionary whose keys correspond to the columns defined for the relevant table in the schema. This ensures that the data is no longer nested and keeps only the data needed for the database tables.
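The flattening step can be sketched on a single record. The `sample` dictionary below is a trimmed copy of the first user shown in the `users_json` output, and `flatten_user` is a hypothetical helper name introduced only for this illustration.

```python
from datetime import datetime

# Trimmed copy of one nested record from the users_json output above.
sample = {
    "dob": "1978-08-19 00:32:13",
    "email": "joey.nelson@example.com",
    "gender": "male",  # extra field that the users table does not need
    "login": {"username": "yellowladybug722"},
    "name": {"first": "joey", "last": "nelson", "title": "mr"},
}


def flatten_user(result, user_id):
    """Map one nested API record onto the flat columns of the users table."""
    dob = datetime.strptime(result["dob"], "%Y-%m-%d %H:%M:%S")
    return {
        "UserId": user_id,
        "Title": result["name"]["title"],
        "FirstName": result["name"]["first"],
        "LastName": result["name"]["last"],
        "Email": result["email"],
        "UserName": result["login"]["username"],
        "DOB": dob.date(),
    }


flat = flatten_user(sample, 1)
print(flat)
```

Only the keys that match a column survive, so unused fields such as `gender` are dropped.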

The other step is to use Python's `datetime` module to parse dates and transform them into `datetime` objects that can be written to the `DateTime` columns of the database. For the sake of this example, random `Timestamp` values for the uploads are generated by subtracting a random `timedelta` from the current time.
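The random timestamp logic can be isolated as follows. This is a sketch, not the tutorial's code: the helper name `random_recent_timestamp`, the seeded generator, and the fixed reference time are assumptions added so that the example is repeatable.

```python
import random
from datetime import datetime, timedelta


def random_recent_timestamp(now, rng, max_seconds=86400):
    """Return a datetime between 1 and max_seconds seconds before `now`."""
    delta = timedelta(seconds=rng.randint(1, max_seconds))
    return now - delta


rng = random.Random(0)                  # seeded so repeated runs match
now = datetime(2018, 5, 18, 16, 0, 0)   # arbitrary fixed reference time
stamp = random_recent_timestamp(now, rng)
print(stamp)
```

With `max_seconds=86400`, every generated value falls within the 24 hours before the reference time, which matches the range used in the loop above.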

Each created dictionary is appended to a list.

### Load

Finally, the data is in a form that can be loaded into the database. SQLAlchemy makes this step straightforward through its Session API

The Session API acts a bit like a middleman, or "holding zone," for Python objects you have either loaded from or associated with the database. These objects can be manipulated within the session before being committed to the database:

In [18]:
Session = sessionmaker(bind=engine)
session = Session()

In [19]:
for user in users:
    row = Users(**user)
    session.add(row)

In [20]:
for upload in uploads:
    row = Uploads(**upload)
    session.add(row)

In [22]:
session.commit()

InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry '1' for key 'PRIMARY'") [SQL: 'INSERT INTO uploads (`UploadId`, `UserId`, `Title`, `Body`, `Timestamp`) VALUES (%s, %s, %s, %s, %s)'] [parameters: ((1, 1, 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit', 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto', datetime.datetime(2018, 5, 18, 14, 21, 24, 56422)), (2, 1, 'qui est esse', 'est rerum tempore vitae\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\nqui aperiam non debitis possimus qui neque nisi nulla', datetime.datetime(2018, 5, 18, 3, 43, 9, 56443)), (3, 1, 'ea molestias quasi exercitationem repellat qui ipsa sit aut', 'et iusto sed quo iure\nvoluptatem occaecati omnis eligendi aut ad\nvoluptatem doloribus vel accusantium quis pariatur\nmolestiae porro eius odio et labore et velit aut', datetime.datetime(2018, 5, 18, 15, 39, 35, 56449)), (4, 1, 'eum et est occaecati', 'ullam et saepe reiciendis voluptatem adipisci\nsit amet autem assumenda provident rerum culpa\nquis hic commodi nesciunt rem tenetur doloremque ipsam iure\nquis sunt voluptatem rerum illo velit', datetime.datetime(2018, 5, 18, 4, 12, 32, 56454)), (5, 1, 'nesciunt quas odio', 'repudiandae veniam quaerat sunt sed\nalias aut fugiat sit autem sed est\nvoluptatem omnis possimus esse voluptatibus quis\nest aut tenetur dolor neque', datetime.datetime(2018, 5, 18, 2, 38, 33, 56458)), (6, 1, 'dolorem eum magni eos aperiam quia', 'ut aspernatur corporis harum nihil quis provident sequi\nmollitia nobis aliquid molestiae\nperspiciatis et ea nemo ab reprehenderit 
accusantium quas\nvoluptate dolores velit et doloremque molestiae', datetime.datetime(2018, 5, 18, 9, 19, 58, 56462)), (7, 1, 'magnam facilis autem', 'dolore placeat quibusdam ea quo vitae\nmagni quis enim qui quis quo nemo aut saepe\nquidem repellat excepturi ut quia\nsunt ut sequi eos ea sed quas', datetime.datetime(2018, 5, 18, 0, 40, 51, 56467)), (8, 1, 'dolorem dolore est ipsam', 'dignissimos aperiam dolorem qui eum\nfacilis quibusdam animi sint suscipit qui sint possimus cum\nquaerat magni maiores excepturi\nipsam ut commodi dolor voluptatum modi aut vitae', datetime.datetime(2018, 5, 17, 23, 44, 1, 56473))  ... displaying 10 of 100 total bound parameter sets ...  (99, 10, 'temporibus sit alias delectus eligendi possimus magni', 'quo deleniti praesentium dicta non quod\naut est molestias\nmolestias et officia quis nihil\nitaque dolorem quia', datetime.datetime(2018, 5, 18, 4, 12, 53, 56872)), (100, 10, 'at nam consequatur ea labore ea harum', 'cupiditate quo est a modi nesciunt soluta\nipsa voluptas error itaque dicta in\nautem qui minus magnam et distinctio eum\naccusamus ratione error aut', datetime.datetime(2018, 5, 18, 7, 29, 19, 56876)))] (Background on this error at: http://sqlalche.me/e/gkpj)
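
The error above reports a duplicate primary key (MySQL error 1062) for `UploadId` 1, which suggests that rows with those keys were already present in `uploads`, for instance from an earlier run of the notebook. The message itself says that the session must be rolled back before it can be used again. A minimal sketch of that pattern follows, assuming an in-memory SQLite database and a hypothetical `DemoUpload` model so that the example runs without the MySQL server:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

DemoBase = declarative_base()


class DemoUpload(DemoBase):
    """Hypothetical, cut-down stand-in for the Uploads table."""
    __tablename__ = "demo_uploads"
    UploadId = Column(Integer, primary_key=True)
    Title = Column(String(255))


# Assumption: in-memory SQLite instead of the tutorial's MySQL database.
demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)
DemoSession = sessionmaker(bind=demo_engine)


def load_rows(rows):
    """Insert rows in one transaction; roll back and return False on a key clash."""
    session = DemoSession()
    try:
        for row in rows:
            session.add(DemoUpload(**row))
        session.commit()
        return True
    except IntegrityError:
        # Undo the failed transaction so the connection is usable again.
        session.rollback()
        return False
    finally:
        session.close()


rows = [{"UploadId": 1, "Title": "qui est esse"}]
first_run = load_rows(rows)   # inserts the row
second_run = load_rows(rows)  # same primary key again, so it is rolled back
print(first_run, second_run)
```

Rolling back discards every row added in the failed transaction, so a real pipeline would also need to decide whether to skip, update, or delete the existing rows before retrying.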