# 3. SQLAlchemy ORM

The main objective of SQLAlchemy's Object Relational Mapper (ORM) API is to associate user-defined Python classes with database tables, and instances of those classes with rows in the corresponding tables. Changes to the state of objects and rows are kept synchronized with each other. The ORM also lets you express database queries in terms of user-defined classes and the relationships defined between them.

## 3.1 Declare mapping

With the ORM, the configuration process starts by
- describing the database tables
- defining classes which will be mapped to those tables.

In SQLAlchemy, these two tasks are performed together using the Declarative system: each class includes directives that describe the database table it is mapped to.

In [1]:
from sqlalchemy import create_engine
base_path = "../../../data/orm_test.db"
db_url = f"sqlite:///{base_path}"
# echo (False by default) logs every SQL statement when set to True.
# create_engine() returns an Engine object; it does not connect yet.
# The Engine establishes a real DBAPI connection to the database (and SQLite
# creates the database file if it does not exist) when a method like
# Engine.connect() is called. Engine.execute() was removed in SQLAlchemy 2.0.
engine = create_engine(db_url, echo=True)

The code below creates a `base class`, which stores a catalog of the mapped classes and tables in the Declarative system. It is called the declarative base class. There is usually just one instance of this base, defined in a commonly imported module. The base class is created with the declarative_base() function, which is defined in the sqlalchemy.ext.declarative module (since SQLAlchemy 1.4 it can also be imported from sqlalchemy.orm).

In [2]:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

A mapped class in Declarative must have a **__tablename__** attribute and **at least one Column** that is part of a primary key. Declarative replaces all the Column objects with special Python accessors known as `descriptors`. The example below uses two types of descriptors:
- column
- relationship

The table definitions generated from these classes are stored in **Base.metadata**.
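To check what Declarative actually registers, the following minimal sketch inspects the metadata of a throwaway base. The names `DemoBase` and `Sample` are hypothetical and exist only for this illustration; they are not part of the tutorial's schema.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

# Hypothetical base and model, used only for this illustration.
DemoBase = declarative_base()


class Sample(DemoBase):
    __tablename__ = "sample"

    id = Column(Integer, primary_key=True)
    label = Column(String)


# The Table built from the class is registered under its table name.
table_names = list(DemoBase.metadata.tables.keys())
print(table_names)

# The mapped class keeps a reference to the same Table object.
same_table = Sample.__table__ is DemoBase.metadata.tables["sample"]
print(same_table)

# Column names as recorded in the table definition.
column_names = [column.name for column in Sample.__table__.columns]
print(column_names)
```

This is why a single call to `create_all()` on the metadata is enough to create every table declared on the same base.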

In [3]:
from sqlalchemy.orm import backref, relationship
from sqlalchemy import Column, Integer,String, SmallInteger, Text, DateTime, ForeignKey

# One-to-many relation
# The ForeignKey on Dataset.cohort_id defines the link between the cohort and
# dataset tables.
# The code below defines a parent-child collection. The plural name of the datasets
# attribute (a convention, not a requirement) indicates that it is a collection.
# The first parameter is the class name Dataset (not the table name dataset); it is the
# class to which the datasets attribute is related. The relationship informs SQLAlchemy that
# there is a relationship between the **Cohort and Dataset classes**. SQLAlchemy finds the
# link through the ForeignKey in the Dataset class definition (the cohort_id column).
# The backref parameter creates a cohort attribute for each Dataset instance. This attribute
# refers to the parent Cohort that the Dataset instance is related to.

class Cohort(Base):
    __tablename__='cohort'

    id=Column(Integer,primary_key=True)
    cname=Column(String)
    datasets=relationship("Dataset", backref=backref("cohort"))

In [4]:
class Dataset(Base):
    __tablename__="dataset"

    id=Column(Integer,primary_key=True)
    cohort_id=Column(Integer, ForeignKey("cohort.id"))
    year= Column(Integer)
    name = Column(String)
    location = Column(String)
    status = Column(SmallInteger)
    validation_tasks=relationship("ValidationTask",backref=backref("dataset"))
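
The effect of `relationship()` and `backref` can be sketched on a reduced, stand-alone model. `Parent`, `Child`, `DemoBase`, `demo_engine` and `demo_session` are hypothetical names chosen so the snippet does not interfere with the tutorial's classes, and an in-memory SQLite database is assumed.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import backref, declarative_base, relationship, sessionmaker

# Hypothetical stand-alone model mirroring the Cohort/Dataset pattern.
DemoBase = declarative_base()


class Parent(DemoBase):
    __tablename__ = "parent"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Collection side; backref adds a "parent" attribute to Child.
    children = relationship("Child", backref=backref("parent"))


class Child(DemoBase):
    __tablename__ = "child"

    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))
    name = Column(String)


demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)
demo_session = sessionmaker(bind=demo_engine)()

p = Parent(id=1, name="p1")
c = Child(id=1, name="c1")
p.children.append(c)   # also sets c.parent through the backref
demo_session.add(p)    # the child is added along with its parent
demo_session.commit()

loaded = demo_session.query(Child).filter(Child.id == 1).one()
parent_name = loaded.parent.name   # navigate from child to parent
child_fk = loaded.parent_id        # foreign key filled in by the ORM
print(parent_name, child_fk)
demo_session.close()
```

Note that the foreign key value is never set by hand here: appending to the collection is enough for the ORM to populate `parent_id` at flush time.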

In [5]:
class Descriptor(Base):
    __tablename__="descriptor"

    id=Column(Integer,primary_key=True)
    dataset_id=Column(Integer, ForeignKey("dataset.id"))
    name = Column(String)
    location = Column(String)

In [6]:
class ValidationRule(Base):
    __tablename__="validation_rule"

    id=Column(Integer,primary_key=True)
    name = Column(String)
    description=Column(Text)
    args= Column(String)
    kwargs= Column(String)
    validation_tasks=relationship("ValidationTask",backref=backref("validation_rule"))

In [7]:
class ValidationTask(Base):
    __tablename__="validation_task"

    id=Column(Integer,primary_key=True)
    start_date=Column(DateTime)
    end_date=Column(DateTime)
    dataset_id=Column(Integer, ForeignKey("dataset.id"))
    validation_rule_id=Column(Integer,ForeignKey("validation_rule.id"))
    task_status = Column(SmallInteger)
    output = Column(Text)


In [8]:
Base.metadata.create_all(engine)

2022-12-12 09:40:26,176 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-12 09:40:26,177 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("cohort")
2022-12-12 09:40:26,197 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-12-12 09:40:26,200 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("cohort")
2022-12-12 09:40:26,201 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-12-12 09:40:26,201 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("dataset")
2022-12-12 09:40:26,202 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-12-12 09:40:26,203 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("dataset")
2022-12-12 09:40:26,217 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-12-12 09:40:26,217 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("descriptor")
2022-12-12 09:40:26,228 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-12-12 09:40:26,229 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("descriptor")
2022-12-12 09:40:26,233 INFO sqlalchemy.engine.Engine [raw sql

## 3.2 Session

To interact with the database, we need a handle to it. A **session object** is that handle. The Session class is created with **sessionmaker()**, a configurable session factory that is bound to the `engine object`.

In [9]:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind = engine)

In [10]:
session = Session()

### 3.2.1 Insert rows by using session

The Session class provides two methods to add rows to tables:
- .add(): adds one entry at a time
- .add_all(): adds a list of entries


In [11]:
c1=Cohort(id=0,cname="breast_cancer")
session.add(c1)

In [12]:
# Note that the transaction above stays pending until it is flushed and committed with the commit() method.
session.commit()

2022-12-12 14:29:36,510 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-12 14:29:36,513 INFO sqlalchemy.engine.Engine INSERT INTO cohort (id, cname) VALUES (?, ?)
2022-12-12 14:29:36,514 INFO sqlalchemy.engine.Engine [generated in 0.00060s] (0, 'breast_cancer')
2022-12-12 14:29:36,515 INFO sqlalchemy.engine.Engine COMMIT


In [13]:
c2=Cohort(id=1,cname="colon_cancer")
d1=Dataset(id=1,cohort_id=0,year=2019,name="breast_clinical_meds.csv",location="s3://minio.casd.local/data/casd_cancer",status=0)

d2=Dataset(id=2,cohort_id=0,year=2018,name="breast_cancer_death_rate.csv",location="s3://minio.casd.local/data/casd_cancer",status=0)

session.add_all([c2,d1,d2])

In [14]:
session.commit()

2022-12-12 14:39:05,666 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-12 14:39:05,667 INFO sqlalchemy.engine.Engine INSERT INTO cohort (id, cname) VALUES (?, ?)
2022-12-12 14:39:05,668 INFO sqlalchemy.engine.Engine [cached since 569.2s ago] (1, 'colon_cancer')
2022-12-12 14:39:05,670 INFO sqlalchemy.engine.Engine INSERT INTO dataset (id, cohort_id, year, name, location, status) VALUES (?, ?, ?, ?, ?, ?)
2022-12-12 14:39:05,670 INFO sqlalchemy.engine.Engine [generated in 0.00082s] ((1, 0, 2019, 'breast_clinical_meds.csv', 's3://minio.casd.local/data/casd_cancer', 0), (2, 0, 2018, 'breast_cancer_death_rate.csv', 's3://minio.casd.local/data/casd_cancer', 0))
2022-12-12 14:39:05,671 INFO sqlalchemy.engine.Engine COMMIT


### 3.2.2 Querying tables

All SELECT statements generated by the SQLAlchemy ORM are constructed by a `Query` object. It provides a generative interface: successive calls return a new Query object, a copy of the former with additional criteria and options associated with it.

Query objects are initially generated using the query() method of the Session, as follows:

```python
from sqlalchemy.orm import Query

# MappedClass stands for any mapped class, e.g. Cohort
q = session.query(MappedClass)

# the statement below is equivalent to the one above
q = Query(MappedClass, session)
```
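
The generative behaviour described above can be sketched as follows. `DemoBase`, `DemoCohort`, `demo_engine` and `demo_session` are hypothetical stand-ins on an in-memory SQLite database, used so that the snippet runs on its own.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

# Hypothetical stand-alone model for this illustration.
DemoBase = declarative_base()


class DemoCohort(DemoBase):
    __tablename__ = "demo_cohort"

    id = Column(Integer, primary_key=True)
    cname = Column(String)


demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)
demo_session = sessionmaker(bind=demo_engine)()
demo_session.add_all([
    DemoCohort(id=1, cname="a"),
    DemoCohort(id=2, cname="b"),
    DemoCohort(id=3, cname="c"),
])
demo_session.commit()

base_query = demo_session.query(DemoCohort)
# Each call returns a new Query; base_query itself is left untouched.
filtered = base_query.filter(DemoCohort.id > 1)
ordered = filtered.order_by(DemoCohort.id.desc())

is_new_object = filtered is not base_query
all_count = base_query.count()                 # still counts every row
filtered_names = [row.cname for row in ordered]
print(is_new_object, all_count, filtered_names)
demo_session.close()
```

Because the original query is never mutated, a base query can be reused safely as the starting point for several different refinements.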

In [16]:
result = session.query(Cohort).all()

for row in result:
    print("Id: ", row.id, ", Name: ", row.cname)

2022-12-12 14:45:17,666 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort
2022-12-12 14:45:17,667 INFO sqlalchemy.engine.Engine [cached since 10.1s ago] ()
Id:  0 , Name:  breast_cancer
Id:  1 , Name:  colon_cancer


#### Other useful methods of the Query object

1. add_columns(): It adds one or more column expressions to the list of result columns to be returned.
2. add_entity(): It adds a mapped entity to the list of result columns to be returned.
3. count(): It returns a count of the rows this Query would return.
4. delete(): It performs a bulk delete query. It deletes the rows matched by this query from the database.
5. distinct(): It applies a DISTINCT clause to the query and returns the newly resulting Query.
6. filter(): It applies the given filtering criterion to a copy of this Query, using SQL expressions.
7. first(): It returns the first result of this Query, or None if the result doesn’t contain any row.
8. get(): It returns an instance based on the given primary key identifier, providing direct access to the identity map of the owning Session. In SQLAlchemy 1.4 and later, Session.get() is the preferred equivalent.
9. group_by(): It applies one or more GROUP BY criteria to the query and returns the newly resulting Query.
10. join(): It creates a SQL JOIN against this Query object’s criterion and applies it generatively, returning the newly resulting Query.
11. one(): It returns exactly one result or raises an exception.
12. order_by(): It applies one or more ORDER BY criteria to the query and returns the newly resulting Query.
13. update(): It performs a bulk update query and updates the rows matched by this query in the database.
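
A few of the methods listed above can be tried on a small stand-alone table. This is only a sketch: `DemoBase`, `DemoCohort`, `demo_engine` and `demo_session` are hypothetical names, and an in-memory SQLite database with SQLAlchemy 1.4 or later is assumed (for `Session.get()`).

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import MultipleResultsFound

# Hypothetical stand-alone model for this illustration.
DemoBase = declarative_base()


class DemoCohort(DemoBase):
    __tablename__ = "demo_cohort"

    id = Column(Integer, primary_key=True)
    cname = Column(String)


demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)
demo_session = sessionmaker(bind=demo_engine)()
demo_session.add_all([
    DemoCohort(id=0, cname="breast_cancer"),
    DemoCohort(id=1, cname="colon_cancer"),
    DemoCohort(id=2, cname="colon_cancer"),
])
demo_session.commit()

q = demo_session.query(DemoCohort)

total = q.count()                                       # count()
first_row = q.order_by(DemoCohort.id.desc()).first()    # order_by() + first()
no_row = q.filter(DemoCohort.id == 99).first()          # None when nothing matches

# distinct() on a single column
distinct_names = sorted(
    name for (name,) in demo_session.query(DemoCohort.cname).distinct()
)

# group_by() with an aggregate
per_name = {
    name: n
    for name, n in demo_session.query(DemoCohort.cname, func.count(DemoCohort.id))
    .group_by(DemoCohort.cname)
}

# one() raises when the query matches more than one row
try:
    q.one()
    one_failed = False
except MultipleResultsFound:
    one_failed = True

# Primary-key lookup through the Session (SQLAlchemy 1.4+)
by_pk = demo_session.get(DemoCohort, 0)
print(total, first_row.id, no_row, distinct_names, per_name, one_failed, by_pk.cname)
demo_session.close()
```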

In [17]:
# get the cohort with primary key=0
row= session.query(Cohort).get(0)
print("Id: ", row.id, ", Name: ", row.cname)

Id:  0 , Name:  breast_cancer


In [18]:
# count the rows of the cohort table
count=session.query(Cohort).count()
print(f"Cohort table has {count} rows")

2022-12-12 14:57:47,593 INFO sqlalchemy.engine.Engine SELECT count(*) AS count_1 
FROM (SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort) AS anon_1
2022-12-12 14:57:47,602 INFO sqlalchemy.engine.Engine [generated in 0.00842s] ()
Cohort table has 2 rows


### 3.2.3 Updating values

To update values, you have two options:

  - modify the attributes of the mapped object: single-row update
  - use Query.update(): bulk update

In [19]:
# Get the first row of cohort table
first=session.query(Cohort).first()

# value before the update
print("Id: ", first.id, ", Name: ", first.cname)

2022-12-12 15:19:53,249 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort
 LIMIT ? OFFSET ?
2022-12-12 15:19:53,255 INFO sqlalchemy.engine.Engine [generated in 0.00626s] (1, 0)
Id:  0 , Name:  breast_cancer


In [20]:
# Now we want to update the value cname
first.cname="breast_cancer_2022"

# after the update
print("Id: ", first.id, ", Name: ", first.cname)

Id:  0 , Name:  breast_cancer_2022


In [21]:
# The value has been updated in the session. To undo the change, call the rollback() method, which restores the session to its state at the last commit.

# Note: all changes made after the last commit are discarded.
session.rollback()

# after the rollback
print("Id: ", first.id, ", Name: ", first.cname)

2022-12-12 15:24:49,660 INFO sqlalchemy.engine.Engine ROLLBACK
2022-12-12 15:24:49,669 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-12 15:24:49,671 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id = ?
2022-12-12 15:24:49,681 INFO sqlalchemy.engine.Engine [generated in 0.00992s] (0,)
Id:  0 , Name:  breast_cancer


In [30]:
from sqlalchemy import update

# build a query for every cohort except the one with id 0 (nothing is executed until the Query is iterated)
session.query(Cohort).filter(Cohort.id!=0)


<sqlalchemy.orm.query.Query at 0x7fc15e2ea850>

The Query.update() method takes two parameters:

- A dictionary of key-value pairs, where each key is the attribute to be updated and each value is the new content of that attribute.

- synchronize_session, which selects the strategy used to update the attributes of objects already in the session. Valid values are False (do not synchronize the session), 'fetch' (perform a select query before the update to find the objects matched by the update query) and 'evaluate' (evaluate the criteria on the objects in the session).

In [31]:
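# add a postfix value
# NOTE: this call fails. update() imported from sqlalchemy is the Core construct, whose
# first argument must be a table or mapped class, not a dictionary of values; the keyword
# synchronize_session is also misspelled. The bulk update has to go through the Query:
# session.query(Cohort).filter(Cohort.id!=0).update({Cohort.cname:Cohort.cname+"_cohort"},synchronize_session=False)

update({Cohort.cname:Cohort.cname+"_cohort"},sychronize_session=False)

ArgumentError: subject table for an INSERT, UPDATE or DELETE expected, got {<sqlalchemy.orm.attributes.InstrumentedAttribute object at 0x7fc177075ea0>: <sqlalchemy.sql.elements.BinaryExpression object at 0x7fc15e27f910>}.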
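
The standalone call above fails with an ArgumentError. A bulk update that appends a suffix can instead be issued through Query.update(), as in the following minimal sketch. It runs against a hypothetical `DemoCohort` table in an in-memory SQLite database, so it does not modify the tutorial's data; `DemoBase`, `demo_engine` and `demo_session` are likewise illustrative names.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

# Hypothetical stand-alone model for this illustration.
DemoBase = declarative_base()


class DemoCohort(DemoBase):
    __tablename__ = "demo_cohort"

    id = Column(Integer, primary_key=True)
    cname = Column(String)


demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)
demo_session = sessionmaker(bind=demo_engine)()
demo_session.add_all([
    DemoCohort(id=0, cname="breast_cancer"),
    DemoCohort(id=1, cname="colon_cancer"),
    DemoCohort(id=2, cname="HIS"),
])
demo_session.commit()

# Bulk update: one UPDATE statement for every row matched by the filter.
updated_rows = (
    demo_session.query(DemoCohort)
    .filter(DemoCohort.id != 0)
    .update(
        {DemoCohort.cname: DemoCohort.cname + "_cohort"},
        synchronize_session=False,  # objects already loaded are not refreshed
    )
)
demo_session.commit()

names = [c.cname for c in demo_session.query(DemoCohort).order_by(DemoCohort.id)]
print(updated_rows, names)
demo_session.close()
```

With `synchronize_session=False` the objects already held by the session are not touched by the update, so committing (which expires them) or refreshing them is needed before reading the new values.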

### 3.2.4 Applying filter

The general form is

```text
session.query(map_class_name).filter(bool_condition)
```

Several conditions can be combined with `and_(cond1, cond2)` and `or_(cond1, cond2)`, both of which are imported from sqlalchemy.
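
As a minimal sketch of combining conditions, the snippet below uses `or_()` on its own and nested inside `and_()`. `DemoBase`, `DemoCohort`, `demo_engine` and `demo_session` are hypothetical names, and the rows are loaded into an in-memory SQLite database purely for illustration.

```python
from sqlalchemy import Column, Integer, String, and_, create_engine, or_
from sqlalchemy.orm import declarative_base, sessionmaker

# Hypothetical stand-alone model for this illustration.
DemoBase = declarative_base()


class DemoCohort(DemoBase):
    __tablename__ = "demo_cohort"

    id = Column(Integer, primary_key=True)
    cname = Column(String)


demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)
demo_session = sessionmaker(bind=demo_engine)()
demo_session.add_all([
    DemoCohort(id=0, cname="breast_cancer"),
    DemoCohort(id=1, cname="colon_cancer"),
    DemoCohort(id=2, cname="HIS"),
    DemoCohort(id=3, cname="Hyper_tension"),
])
demo_session.commit()

q = demo_session.query(DemoCohort).order_by(DemoCohort.id)

# OR: either condition is enough.
either_ids = [
    c.id for c in q.filter(or_(DemoCohort.id == 2, DemoCohort.cname.like("colon%")))
]

# AND with a nested OR.
nested_ids = [
    c.id
    for c in q.filter(
        and_(
            DemoCohort.id >= 1,
            or_(DemoCohort.cname == "HIS", DemoCohort.cname == "Hyper_tension"),
        )
    )
]

# Several criteria passed to filter() are joined with AND.
implicit_and_ids = [
    c.id for c in q.filter(DemoCohort.id != 0, DemoCohort.cname.like("%cancer"))
]
print(either_ids, nested_ids, implicit_and_ids)
demo_session.close()
```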


In [32]:
result = session.query(Cohort).filter(Cohort.id>=0)
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:50:30,677 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-12 15:50:30,687 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id >= ?
2022-12-12 15:50:30,701 INFO sqlalchemy.engine.Engine [generated in 0.01361s] (0,)
Id:  0 Name:  breast_cancer
Id:  1 Name:  colon_cancer
Id:  2 Name:  HIS
Id:  3 Name:  Hyper_tension


In [33]:
result = session.query(Cohort).filter(Cohort.id==3)
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:51:31,607 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id = ?
2022-12-12 15:51:31,608 INFO sqlalchemy.engine.Engine [generated in 0.00181s] (3,)
Id:  3 Name:  Hyper_tension


In [35]:
# LIKE operator on a string column ("%" matches any sequence of characters; "_" matches any single character)
result = session.query(Cohort).filter(Cohort.cname.like("%_cancer"))
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:53:32,501 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.cname LIKE ?
2022-12-12 15:53:32,504 INFO sqlalchemy.engine.Engine [cached since 34.35s ago] ('%_cancer',)
Id:  0 Name:  breast_cancer
Id:  1 Name:  colon_cancer


In [36]:
# use IN operator to match values in a list
result = session.query(Cohort).filter(Cohort.id.in_([1,2]))
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:55:05,411 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id IN (?, ?)
2022-12-12 15:55:05,412 INFO sqlalchemy.engine.Engine [generated in 0.00134s] (1, 2)
Id:  1 Name:  colon_cancer
Id:  2 Name:  HIS


In [38]:
from sqlalchemy import and_

# combine multiple conditions
result = session.query(Cohort).filter(and_(Cohort.id!=0,Cohort.cname.like("%_cancer")))
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 16:05:56,431 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id != ? AND cohort.cname LIKE ?
2022-12-12 16:05:56,441 INFO sqlalchemy.engine.Engine [generated in 0.00991s] (0, '%_cancer')
Id:  1 Name:  colon_cancer


### 3.2.5 Working with joins

A JOIN is expressed with the **Query.join()** method.

In [40]:
result=session.query(Cohort).join(Dataset).all()
for row in result:
    print(row)
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 16:12:21,327 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort JOIN dataset ON cohort.id = dataset.cohort_id
2022-12-12 16:12:21,336 INFO sqlalchemy.engine.Engine [cached since 154.9s ago] ()
<__main__.Cohort object at 0x7fc161f91190>
Id:  0 Name:  breast_cancer
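
To also retrieve values from the joined table, both sides can be listed in query(). The sketch below assumes trimmed, hypothetical copies of the two models (`DemoCohort`, `DemoDataset`) on an in-memory SQLite database; the file names and other identifiers are illustrative only.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

# Hypothetical trimmed copies of the two models, for illustration only.
DemoBase = declarative_base()


class DemoCohort(DemoBase):
    __tablename__ = "demo_cohort"

    id = Column(Integer, primary_key=True)
    cname = Column(String)


class DemoDataset(DemoBase):
    __tablename__ = "demo_dataset"

    id = Column(Integer, primary_key=True)
    cohort_id = Column(Integer, ForeignKey("demo_cohort.id"))
    name = Column(String)


demo_engine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demo_engine)
demo_session = sessionmaker(bind=demo_engine)()
demo_session.add_all([
    DemoCohort(id=0, cname="breast_cancer"),
    DemoCohort(id=1, cname="colon_cancer"),   # has no dataset
    DemoDataset(id=1, cohort_id=0, name="meds.csv"),
    DemoDataset(id=2, cohort_id=0, name="death_rate.csv"),
])
demo_session.commit()

# Select columns from both tables, with an explicit ON clause.
rows = (
    demo_session.query(DemoCohort.cname, DemoDataset.name)
    .join(DemoDataset, DemoCohort.id == DemoDataset.cohort_id)
    .order_by(DemoDataset.id)
    .all()
)
pairs = [(cname, dataset_name) for cname, dataset_name in rows]

# Joining on the foreign key alone: only cohorts that have a dataset match.
cohort_names = {c.cname for c in demo_session.query(DemoCohort).join(DemoDataset)}
print(pairs, cohort_names)
demo_session.close()
```

Since this is an inner join, the cohort without any dataset does not appear in either result.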


In the above code, we just