# 3. SQLAlchemy orm

The main objective of the Object Relational Mapper API of SQLAlchemy is to facilitate associating user-defined Python classes with database tables, and objects of those classes with rows in their corresponding tables. Changes in states of objects and rows are synchronously matched with each other. SQLAlchemy enables expressing database queries in terms of user defined classes and their defined relationships.

## 3.1 Declare mapping

In case of ORM, the configuration process starts by
- describing the database tables
- defining classes which will be mapped to those tables.

In SQLAlchemy, these two tasks are performed together. This is done by using Declarative system; the classes created include directives to describe the actual database table they are mapped to.

In [1]:
from sqlalchemy import create_engine
base_path="../../../data/orm_test.db"
db_url=f"sqlite:///{base_path}"
# echo(default is false) when set to True will generate the activity log
# Below command will create the sqlite db, if not existed
# create_engine() will return an engine object.
# The Engine establishes a real DBAPI connection to the database when
# a method like Engine.execute() or Engine.connect() is called.
engine = create_engine(db_url, echo = True)

Below code create a `base class`, which stores a catalog of classes and mapped tables in the Declarative system. This is called as the declarative base class. There will be usually just one instance of this base in a commonly imported module. The declarative_base() function is used to create base class. This function is defined in sqlalchemy.ext.declarative module.

In [2]:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

A table object mapper class in Declarative must have a **__tablename__** attribute, and **at least one Column** which is part of a primary key. Declarative replaces all the Column objects with special Python accessors known as `descriptors`. In below example, we have two types of descriptors:
- column
- relationship

All the descriptors will be stored in **Base.metadata**

In [3]:
from sqlalchemy.orm import backref, relationship
from sqlalchemy import Column, Integer,String, SmallInteger, Text, DateTime, ForeignKey

# One-to-many relation
# Having a ForeignKey defines the existence of the relationship between Cohort and
# Dataset.
# Below code defines a parent-child collection. The datasets attribute being plural
# (which is not a requirement, just a convention) is an indication that it’s a collection.
# The first parameter is the class name Dataset (which is not the table name dataset), is the
# class to which the datasets attribute is related. The relationship informs SQLAlchemy that
# there’s a relationship between the **Cohort and Dataset classes**. SQLAlchemy will find the
# relationship in the Dataset class definition (line 3 of Dataset class)
# The backref parameter creates an author attribute for each Book instance. This attribute refers
# to the parent Author that the Book instance is related to.

class Cohort(Base):
    __tablename__='cohort'

    id=Column(Integer,primary_key=True)
    cname=Column(String)
    datasets=relationship("Dataset", backref=backref("cohort"),cascade="delete, merge, save-update")

In [4]:
class Dataset(Base):
    __tablename__="dataset"

    id=Column(Integer,primary_key=True)
    cohort_id=Column(Integer, ForeignKey("cohort.id"))
    year= Column(Integer)
    name = Column(String)
    location = Column(String)
    status = Column(SmallInteger)
    validation_tasks=relationship("ValidationTask",backref=backref("dataset"))

In [5]:
class Descriptor(Base):
    __tablename__="descriptor"

    id=Column(Integer,primary_key=True)
    dataset_id=Column(Integer, ForeignKey("dataset.id"))
    name = Column(String)
    location = Column(String)

In [6]:
class ValidationRule(Base):
    __tablename__="validation_rule"

    id=Column(Integer,primary_key=True)
    name = Column(String)
    description=Column(Text)
    args= Column(String)
    kwargs= Column(String)
    validation_tasks=relationship("ValidationTask",backref=backref("validation_rule"))

In [7]:
class ValidationTask(Base):
    __tablename__="validation_task"

    id=Column(Integer,primary_key=True)
    start_date=Column(DateTime)
    end_date=Column(DateTime)
    dataset_id=Column(Integer, ForeignKey("dataset.id"))
    validation_rule_id=Column(Integer,ForeignKey("validation_rule.id"))
    task_status = Column(SmallInteger)
    output = Column(Text)


In [8]:
Base.metadata.create_all(engine)

2023-03-16 09:03:46,848 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 09:03:46,852 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("cohort")
2023-03-16 09:03:46,853 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 09:03:46,864 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("dataset")
2023-03-16 09:03:46,865 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 09:03:46,867 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("descriptor")
2023-03-16 09:03:46,867 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 09:03:46,868 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("validation_rule")
2023-03-16 09:03:46,869 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 09:03:46,870 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("validation_task")
2023-03-16 09:03:46,870 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 09:03:46,887 INFO sqlalchemy.engine.Engine COMMIT


## 3.2 Session

In order to interact with the database, we need to obtain its handle. A **session object** is the handle to database. Session class is defined using **sessionmaker()**  a configurable session factory method which is bound to the `engine object`.

In [9]:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind = engine)

In [10]:
session = Session()

### 3.2.1 Insert rows by using session

Session class provides two function to add rows into tables
- .add(): add one entry a time
- .add_all(): add a list of entry


In [11]:
c1=Cohort(id=0,cname="breast_cancer")
session.add(c1)

In [12]:
# Note that the above transaction is pending until the same is flushed using commit() method.
session.commit()

2023-03-16 09:03:55,814 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 09:03:55,816 INFO sqlalchemy.engine.Engine INSERT INTO cohort (id, cname) VALUES (?, ?)
2023-03-16 09:03:55,819 INFO sqlalchemy.engine.Engine [generated in 0.00243s] (0, 'breast_cancer')
2023-03-16 09:03:55,824 INFO sqlalchemy.engine.Engine ROLLBACK


IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: cohort.id
[SQL: INSERT INTO cohort (id, cname) VALUES (?, ?)]
[parameters: (0, 'breast_cancer')]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

In [13]:
c2=Cohort(id=1,cname="colon_cancer")
d1=Dataset(id=1,cohort_id=0,year=2019,name="breast_clinical_meds.csv",location="s3://minio.casd.local/data/casd_cancer",status=0)

d2=Dataset(id=2,cohort_id=0,year=2018,name="breast_cancer_death_rate.csv",location="s3://minio.casd.local/data/casd_cancer",status=0)

session.add_all([c2,d1,d2])

In [14]:
session.commit()

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.IntegrityError) UNIQUE constraint failed: cohort.id
[SQL: INSERT INTO cohort (id, cname) VALUES (?, ?)]
[parameters: (0, 'breast_cancer')]
(Background on this error at: https://sqlalche.me/e/14/gkpj) (Background on this error at: https://sqlalche.me/e/14/7s2a)

### 3.2.2 Other useful method

Session is the major way to interact with a db via `SQLAlchemy ORM`. It provides many methods to add, delete and update record/row of a table. Some frequently required methods of session class are listed below:

- begin(): begins a transaction on this session

- add(): places an object in the session. Its state is persisted in the database on next flush operation
- add_all(): adds a collection of objects to the session

- commit(): flushes all items and any transaction in progress

- delete(): marks a transaction as deleted

- execute(): executes a SQL expression

- expire(): marks attributes of an instance as out of date

- flush(): flushes all object changes to the database

- invalidate(): closes the session using connection invalidation

- rollback(): rolls back the current transaction in progress
- close(): Closes current session by clearing all items and ending any transaction in progress

### 3.2.2 Querying tables

All `SELECT statements generated by SQLAlchemy ORM are constructed by Query object`. It provides a generative interface, hence successive calls return a new Query object, a copy of the former with additional criteria and options associated with it.

Query objects are initially generated using the query() method of the Session as follows

```python
q = session.query(mapped class)

# below statement is also equivalent to the above given statement
q = Query(mappedClass, session)
```

In [12]:
result = session.query(Cohort).all()

for row in result:
    print("Id: ", row.id, "| Name: ", row.cname)

2023-01-02 09:03:12,168 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort
2023-01-02 09:03:12,171 INFO sqlalchemy.engine.Engine [cached since 15.59s ago] ()
Id:  0 | Name:  breast_cancer
Id:  1 | Name:  colon_cancer
Id:  2 | Name:  HIS
Id:  3 | Name:  Hyper_tension


#### Other usefully function of query object

1. add_columns(): It adds one or more column expressions to the list of result columns to be returned.

2. add_entity(): It adds a mapped entity to the list of result columns to be returned.

3. count(): It returns a count of rows this Query would return.

4. delete(): It performs a bulk delete query. Deletes rows matched by this query from the database.

5. distinct(): It applies a DISTINCT clause to the query and return the newly resulting Query.

6. filter(): It applies the given filtering criterion to a copy of this Query, using SQL expressions.
7. first(): It returns the first result of this Query or None if the result doesn’t contain any row.
8. get(): It returns an instance based on the given primary key identifier providing direct access to the identity map of the owning Session.
9. group_by() : It applies one or more GROUP BY criterion to the query and return the newly resulting Query
10. join(): It creates a SQL JOIN against this Query object’s criterion and apply generatively, returning the newly resulting Query.
11. one(): It returns exactly one result or raise an exception.

12. order_by(): It applies one or more ORDER BY criterion to the query and returns the newly resulting Query.

13. update(): It performs a bulk update query and updates rows matched by this query in the database.

In [17]:
# get the cohort with primary key=0
row= session.query(Cohort).get(0)
print("Id: ", row.id, ", Name: ", row.cname)

Id:  0 , Name:  breast_cancer


In [18]:
# get the row number of cohort table
count=session.query(Cohort).count()
print(f"Cohort table has {count} rows")

2022-12-12 14:57:47,593 INFO sqlalchemy.engine.Engine SELECT count(*) AS count_1 
FROM (SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort) AS anon_1
2022-12-12 14:57:47,602 INFO sqlalchemy.engine.Engine [generated in 0.00842s] ()
Cohort table has 2 rows


### 3.2.3 Updating values

To update the value, you have two options:

  - modify the attributes of the mapping object : Single row update
  - Use Update(): bulk updates

In [19]:
# Get the first row of cohort table
first=session.query(Cohort).first()

# before update value
print("Id: ", first.id, ", Name: ", first.cname)

2022-12-12 15:19:53,249 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort
 LIMIT ? OFFSET ?
2022-12-12 15:19:53,255 INFO sqlalchemy.engine.Engine [generated in 0.00626s] (1, 0)
Id:  0 , Name:  breast_cancer


In [20]:
# Now we want to update the value cname
first.cname="breast_cancer_2022"

# after the update
print("Id: ", first.id, ", Name: ", first.cname)

Id:  0 , Name:  breast_cancer_2022


In [21]:
# You can notice the value has been updated. What if you want to undo the changes. You can call rollback() method, which will restore the session to the state of last commit.

# note all change after the last commit will be deleted.
session.rollback()

# after the rollback
print("Id: ", first.id, ", Name: ", first.cname)

2022-12-12 15:24:49,660 INFO sqlalchemy.engine.Engine ROLLBACK
2022-12-12 15:24:49,669 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-12 15:24:49,671 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id = ?
2022-12-12 15:24:49,681 INFO sqlalchemy.engine.Engine [generated in 0.00992s] (0,)
Id:  0 , Name:  breast_cancer


In [30]:
from sqlalchemy import update

# get all rows of Cohort
session.query(Cohort).filter(Cohort.id!=0)


<sqlalchemy.orm.query.Query at 0x7fc15e2ea850>

The update() method requires two parameters as follows:

- A dictionary of key-values with key being the attribute to be updated, and value being the new contents of attribute.

- synchronize_session attribute mentioning the strategy to update attributes in the session. Valid values are false: for not synchronizing the session, fetch: performs a select query before the update to find objects that are matched by the update query; and evaluate: evaluate criteria on objects in the session.

In [31]:
# add a postfix value

update({Cohort.cname:Cohort.cname+"_cohort"},sychronize_session=False)

ArgumentError: subject table for an INSERT, UPDATE or DELETE expected, got {<sqlalchemy.orm.attributes.InstrumentedAttribute object at 0x7fc177075ea0>: <sqlalchemy.sql.elements.BinaryExpression object at 0x7fc15e27f910>}.

### 3.2.4 Applying filter

The general form is

```text
session.query(map_class_name).filter(bool_condition)
```

The bool_condition can be combined with `and_(cond1,cond2)`, `or_(cond1,cond2)`


In [32]:
result = session.query(Cohort).filter(Cohort.id>=0)
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:50:30,677 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-12 15:50:30,687 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id >= ?
2022-12-12 15:50:30,701 INFO sqlalchemy.engine.Engine [generated in 0.01361s] (0,)
Id:  0 , Name:  breast_cancer
Id:  1 , Name:  colon_cancer
Id:  2 , Name:  HIS
Id:  3 , Name:  Hyper_tension


In [33]:
result = session.query(Cohort).filter(Cohort.id==3)
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:51:31,607 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id = ?
2022-12-12 15:51:31,608 INFO sqlalchemy.engine.Engine [generated in 0.00181s] (3,)
Id:  3 Name:  Hyper_tension


In [35]:
# like operator on string column
result = session.query(Cohort).filter(Cohort.cname.like("%_cancer"))
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:53:32,501 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.cname LIKE ?
2022-12-12 15:53:32,504 INFO sqlalchemy.engine.Engine [cached since 34.35s ago] ('%_cancer',)
Id:  0 Name:  breast_cancer
Id:  1 Name:  colon_cancer


In [36]:
# use IN operator to match values in a list
result = session.query(Cohort).filter(Cohort.id.in_([1,2]))
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 15:55:05,411 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id IN (?, ?)
2022-12-12 15:55:05,412 INFO sqlalchemy.engine.Engine [generated in 0.00134s] (1, 2)
Id:  1 Name:  colon_cancer
Id:  2 Name:  HIS


In [38]:
from sqlalchemy import and_

# multiple condition
result = session.query(Cohort).filter(and_(Cohort.id!=0,Cohort.cname.like("%_cancer")))
for row in result:
    print("Id: ", row.id, "Name: ", row.cname)

2022-12-12 16:05:56,431 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id != ? AND cohort.cname LIKE ?
2022-12-12 16:05:56,441 INFO sqlalchemy.engine.Engine [generated in 0.00991s] (0, '%_cancer')
Id:  1 Name:  colon_cancer


### 3.2.5 Control the returned result

The returned result of a filter may contain thousands of rows. To load them all in memory may cause performance issues. We can control how the session return these value.

- all(): return all matching result as a list
- first(): It applies a limit of one on the query and returns the first result as a scalar. The bound parameters for LIMIT is 1 and for OFFSET is 0
- one(): It fully fetches all rows, and if there is not exactly one object identity or composite row present in the result, it raises an error. It is useful for systems that expect to handle “no items found” versus “multiple items found” differently.
- scalar(): It invokes the one() method, and upon success returns the first column of the row as follows

In [14]:
# all example
result=session.query(Cohort).all()
for row in result:
    print("Id: ", row.id, "| Name: ", row.cname)

2023-01-02 09:04:34,578 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort
2023-01-02 09:04:34,581 INFO sqlalchemy.engine.Engine [cached since 98s ago] ()
Id:  0 | Name:  breast_cancer
Id:  1 | Name:  colon_cancer
Id:  2 | Name:  HIS
Id:  3 | Name:  Hyper_tension


In [19]:
# first example
result=session.query(Cohort).first()
# You can notice the type of result is Cohort
print(type(result))
# so you can access its attribute
print("Id: ", result.id, "| Name: ", result.cname)

2023-01-02 09:13:00,221 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort
 LIMIT ? OFFSET ?
2023-01-02 09:13:00,225 INFO sqlalchemy.engine.Engine [cached since 234.7s ago] (1, 0)
<class '__main__.Cohort'>
Id:  0 | Name:  breast_cancer


In [20]:
# one example
# if multi rows are found, it throws MultipleResultsFound error
result=session.query(Cohort).filter(Cohort.id>0).one()


2023-01-02 09:15:33,087 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id > ?
2023-01-02 09:15:33,090 INFO sqlalchemy.engine.Engine [generated in 0.00249s] (0,)


MultipleResultsFound: Multiple rows were found when exactly one was required

In [21]:
# if no row found, it throws NoResultFound error
result=session.query(Cohort).filter(Cohort.id<0).one()

2023-01-02 09:16:59,906 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id < ?
2023-01-02 09:16:59,907 INFO sqlalchemy.engine.Engine [generated in 0.00100s] (0,)


NoResultFound: No row was found when one was required

In [22]:
# scalar example
result=session.query(Cohort).filter(Cohort.id>0).scalar()

2023-01-02 09:18:59,826 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id > ?
2023-01-02 09:18:59,828 INFO sqlalchemy.engine.Engine [cached since 206.7s ago] (0,)


MultipleResultsFound: Multiple rows were found when exactly one was required

In [27]:
# there is a bug, scalar does not return no result found error, but return a none object
result=session.query(Cohort).filter(Cohort.id<0).scalar()
print("Id: ", result.id, "| Name: ", result.cname)

2023-01-02 09:20:29,723 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort 
WHERE cohort.id < ?
2023-01-02 09:20:29,725 INFO sqlalchemy.engine.Engine [cached since 209.8s ago] (0,)


AttributeError: 'NoneType' object has no attribute 'id'

## 3.3 Building table relationship

In a relational database, table has relations with each other, we need to express these relation in the ORM mapping class.

We have four basic relationship patterns:

- **One To Many**: A single record from one table can be linked to zero or more rows in another table. For example, The customer table stores customer information, where customer.id is the primary key. The invoice table holds the invoices for customer, where invoice.id is the primary key and customer.id is the foreign key.

- **Many To One**: It's the reverse of one to many.

- **One To One**: In One-to-One relationship, one record of the first table will be linked to zero or one record of another table. For example, each employee in the `Employee` table will have a corresponding row in `EmployeeDetails` table that stores the current passport details for that particular employee. So, each employee will have zero or one record in the EmployeeDetails table. This is called zero or one-to-one relationship.

- **Many To Many**: It is established by adding an association table related to two classes by defining attributes with their foreign keys. It is indicated by the secondary argument to relationship(). Usually, the Table uses the MetaData object associated with the declarative base class, so that the ForeignKey directives can locate the remote tables with which to link. The relationship.back_populates parameter for each relationship() establishes a bidirectional relationship. Both sides of the relationship contain a collection.

### 3.3.1 One To Many
Below is an example of one to many, A customer (one class) may have many Invoice (many class). You can notice all the relationship declaration is located in the `many class (Invoice)`. The `relationship.back_populates` parameter is used to establish a bidirectional relationship in one-to-many, where the “reverse” side is a many to one.

In [23]:
class Customer(Base):
   __tablename__ = 'customers'
   __table_args__ = {'extend_existing': True}
   id = Column(Integer, primary_key = True)
   name = Column(String)
   address = Column(String)
   email = Column(String)
   invoices = relationship("Invoice", back_populates = "customer")

class Invoice(Base):
   __tablename__ = 'invoices'
   __table_args__ = {'extend_existing': True}
   id = Column(Integer, primary_key = True)
   custid = Column(Integer, ForeignKey('customers.id'))
   invno = Column(Integer)
   amount = Column(Integer)
   customer = relationship("Customer", back_populates = "invoices")

# this line is very important

# create the table in db by using the base metadata
# Base.metadata.create_all(engine)


In [25]:
# there is a bug with the notebook. so below code does not work
# the full code example can be found in customer_invoice.py
c1 = Customer(name = "Gopal Krishna", address = "Bank Street Hydarebad", email = "gk@gmail.com")
c1.invoices = [Invoice(invno = 10, amount = 15000), Invoice(invno = 14, amount = 3850)]
session.add(c1)
session.commit()


InvalidRequestError: One or more mappers failed to initialize - can't proceed with initialization of other mappers. Triggering mapper: 'mapped class Invoice->invoices'. Original exception was: Mapper 'mapped class Customer->customers' has no property 'invoices'

### 3.3.2 One To One

### 3.3.3 Many To Many

`Many to Many relationship` between two tables is achieved by adding an association table such that it has two foreign keys - one from each table’s primary key. Moreover, classes mapping to the two tables have an attribute with a collection of objects of other association tables assigned as secondary attribute of relationship() function.

For example, we assume that `an employee is a part of more than one department`, and `a department has more than one employee`. This constitutes many-to-many relationship.

Below mapping class will create three tables in the db, the corresponding sql scripts is shown below.

```sql
CREATE TABLE department (
   id INTEGER NOT NULL,
   name VARCHAR,
   PRIMARY KEY (id)
)

CREATE TABLE employee (
   id INTEGER NOT NULL,
   name VARCHAR,
   PRIMARY KEY (id)
)

CREATE TABLE dep_emp (
   department_id INTEGER NOT NULL,
   employee_id INTEGER NOT NULL,
   PRIMARY KEY (department_id, employee_id),
   FOREIGN KEY(department_id) REFERENCES department (id),
   FOREIGN KEY(employee_id) REFERENCES employee (id)
)

```


In [45]:
class Department(Base):
   __tablename__ = 'department'
   id = Column(Integer, primary_key = True)
   name = Column(String)
   employees = relationship('Employee', secondary = 'dep_emp')

class Employee(Base):
   __tablename__ = 'employee'
   id = Column(Integer, primary_key = True)
   name = Column(String)
   departments = relationship(Department,secondary='dep_emp')

class DepEmp(Base):
   __tablename__ = 'dep_emp'
   department_id = Column(
       Integer,
       ForeignKey('department.id'),
       primary_key = True)

   employee_id = Column(
       Integer,
       ForeignKey('employee.id'),
       primary_key = True)

# create table in the db
# Base.metadata.create_all(engine)

  class Department(Base):


InvalidRequestError: Table 'department' is already defined for this MetaData instance.  Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

In [41]:
# creating new rows
d1 = Department(name = "Accounts")
d2 = Department(name = "Sales")
d3 = Department(name = "Marketing")

e1 = Employee(name = "John")
e2 = Employee(name = "Tony")
e3 = Employee(name = "Graham")

Each table has a collection attribute having append() method. We can add Employee objects to Employees collection of Department object. Similarly, we can add Department objects to departments collection attribute of Employee objects.

> Even thought the Mapping class (e.g. Employee) does not have attribute departments defined inside it. But the link class DepEmp creates the relation between Employee and Department.



In [42]:
e1.departments.append(d1)
e2.departments.append(d3)
d1.employees.append(e3)
d2.employees.append(e2)
d3.employees.append(e1)
e3.departments.append(d2)

In [43]:
session.add(e1)
session.add(e2)
session.add(d1)
session.add(d2)
session.add(d3)
session.add(e3)
session.commit()

2023-01-02 13:31:28,595 INFO sqlalchemy.engine.Engine INSERT INTO department (name) VALUES (?)
2023-01-02 13:31:28,605 INFO sqlalchemy.engine.Engine [generated in 0.01069s] ('Accounts',)
2023-01-02 13:31:28,610 INFO sqlalchemy.engine.Engine INSERT INTO department (name) VALUES (?)
2023-01-02 13:31:28,611 INFO sqlalchemy.engine.Engine [cached since 0.01608s ago] ('Sales',)
2023-01-02 13:31:28,613 INFO sqlalchemy.engine.Engine INSERT INTO department (name) VALUES (?)
2023-01-02 13:31:28,614 INFO sqlalchemy.engine.Engine [cached since 0.01887s ago] ('Marketing',)
2023-01-02 13:31:28,615 INFO sqlalchemy.engine.Engine INSERT INTO employee (name) VALUES (?)
2023-01-02 13:31:28,616 INFO sqlalchemy.engine.Engine [generated in 0.00071s] ('John',)
2023-01-02 13:31:28,616 INFO sqlalchemy.engine.Engine INSERT INTO employee (name) VALUES (?)
2023-01-02 13:31:28,617 INFO sqlalchemy.engine.Engine [cached since 0.0022s ago] ('Graham',)
2023-01-02 13:31:28,618 INFO sqlalchemy.engine.Engine INSERT INTO 

In [46]:
for x in session.query( Department, Employee).filter(DepEmp.department_id == Department.id,
   DepEmp.employee_id == Employee.id).order_by(DepEmp.department_id).all():
   print ("Department: {} Name: {}".format(x.Department.name, x.Employee.name))

NameError: name 'DepEmp' is not defined

### 3.4 Working with joins

Now we have two tables that have relationship with each other. We need to use join operation if we want to combine the information of the two table together.
The JOIN operation is easily achieved using the **Query.join()** method

In [12]:
result=session.query(Cohort).join(Dataset).all()
for row in result:
    for dataset in row.datasets:
        print ("Cohort_ID: {} Cohort_Name: {} dataset_id: {} dataset_name: {}".format(row.id,row.cname, dataset.id, dataset.name))

2023-01-02 11:12:13,960 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort JOIN dataset ON cohort.id = dataset.cohort_id
2023-01-02 11:12:13,963 INFO sqlalchemy.engine.Engine [cached since 382.7s ago] ()
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 1 dataset_name: breast_clinical_meds.csv
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 2 dataset_name: breast_cancer_death_rate.csv


In the above code, we just used the join method. If you don't want to use join. Below

In [11]:
for c,d in session.query(Cohort,Dataset).filter(Cohort.id==Dataset.cohort_id).all():
     print ("Cohort_ID: {} Cohort_Name: {} dataset_id: {} dataset_name: {}".format(c.id,c.cname, d.id, d.name))

2023-01-02 11:09:02,793 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname, dataset.id AS dataset_id, dataset.cohort_id AS dataset_cohort_id, dataset.year AS dataset_year, dataset.name AS dataset_name, dataset.location AS dataset_location, dataset.status AS dataset_status 
FROM cohort, dataset 
WHERE cohort.id = dataset.cohort_id
2023-01-02 11:09:02,794 INFO sqlalchemy.engine.Engine [generated in 0.00137s] ()
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 1 dataset_name: breast_clinical_meds.csv
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 2 dataset_name: breast_cancer_death_rate.csv


### 3.4.1 Join with filter

We can also apply filter after the join operation. Note the filter operator must use the complete path (className.attributeName) below code is an example.

In [13]:
result=session.query(Cohort).join(Dataset).filter(Dataset.status==0)
for row in result:
    for dataset in row.datasets:
        print ("Cohort_ID: {} Cohort_Name: {} dataset_id: {} dataset_name: {} dataset_status: {}".format(row.id,row.cname, dataset.id, dataset.name, dataset.status))

2023-01-02 11:20:21,269 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort JOIN dataset ON cohort.id = dataset.cohort_id 
WHERE dataset.status = ?
2023-01-02 11:20:21,270 INFO sqlalchemy.engine.Engine [generated in 0.00078s] (0,)
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 1 dataset_name: breast_clinical_meds.csv dataset_status: 0
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 2 dataset_name: breast_cancer_death_rate.csv dataset_status: 0


### 3.4.2 Explicit Joins

In above example, Query.join() knows how to join between these tables because there’s only `one foreign key` between them. If there were `no foreign keys, or more foreign keys`, Query.join() works better when one of the following forms are used:
- query(Cohort).join(Dataset,Cohort.id==Dataset.cohort_id):	explicit join condition
- query(Cohort).join(Cohort.datasets):	specify relationship from left to right
- query(Cohort).join(Dataset, Cohort.datasets)	same, with explicit target
- query(Cohort).join('datasets') :	same, using a string

In [15]:
result=session.query(Cohort).join(Dataset,Cohort.id==Dataset.cohort_id).all()
for row in result:
    for dataset in row.datasets:
        print ("Cohort_ID: {} Cohort_Name: {} dataset_id: {} dataset_name: {}".format(row.id,row.cname, dataset.id, dataset.name))

2023-01-02 11:35:42,238 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort JOIN dataset ON cohort.id = dataset.cohort_id
2023-01-02 11:35:42,239 INFO sqlalchemy.engine.Engine [generated in 0.00128s] ()
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 1 dataset_name: breast_clinical_meds.csv
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 2 dataset_name: breast_cancer_death_rate.csv


In [18]:
result=session.query(Cohort).join(Cohort.datasets).all()
for row in result:
    for dataset in row.datasets:
        print ("Cohort_ID: {} Cohort_Name: {} dataset_id: {} dataset_name: {}".format(row.id,row.cname, dataset.id, dataset.name))

2023-01-02 11:38:14,511 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort JOIN dataset ON cohort.id = dataset.cohort_id
2023-01-02 11:38:14,513 INFO sqlalchemy.engine.Engine [generated in 0.00291s] ()
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 1 dataset_name: breast_clinical_meds.csv
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 2 dataset_name: breast_cancer_death_rate.csv


In [19]:
result=session.query(Cohort).join(Dataset, Cohort.datasets).all()
for row in result:
    for dataset in row.datasets:
        print ("Cohort_ID: {} Cohort_Name: {} dataset_id: {} dataset_name: {}".format(row.id,row.cname, dataset.id, dataset.name))

2023-01-02 11:39:09,354 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort JOIN dataset ON cohort.id = dataset.cohort_id
2023-01-02 11:39:09,355 INFO sqlalchemy.engine.Engine [generated in 0.00192s] ()
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 1 dataset_name: breast_clinical_meds.csv
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 2 dataset_name: breast_cancer_death_rate.csv


In [21]:
result=session.query(Cohort).join('datasets').all()
for row in result:
    for dataset in row.datasets:
        print ("Cohort_ID: {} Cohort_Name: {} dataset_id: {} dataset_name: {}".format(row.id,row.cname, dataset.id, dataset.name))

2023-01-02 11:39:59,561 INFO sqlalchemy.engine.Engine SELECT cohort.id AS cohort_id, cohort.cname AS cohort_cname 
FROM cohort JOIN dataset ON cohort.id = dataset.cohort_id
2023-01-02 11:39:59,562 INFO sqlalchemy.engine.Engine [generated in 0.00153s] ()
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 1 dataset_name: breast_clinical_meds.csv
Cohort_ID: 0 Cohort_Name: breast_cancer dataset_id: 2 dataset_name: breast_cancer_death_rate.csv


### 3.4.3 Left join

Sqlalchemy does not have explicit method for left join. It uses a method called `outerjoin()` (left outer join)

Below is an general form:

```python
# use outerjoin
session.query(table1).outerjoin(table2,table1.pri_id == table2.id)

# we can also use a join with isouter option to True
session.query(table1).join(table2, table1.pri_id == table2.id, isouter=True)
```

Below is an example of the outer join

In [24]:
from sqlalchemy.sql import func

stmt = session.query(
   Invoice.custid, func.count('*').label('invoice_count')
).group_by(Invoice.custid).subquery()

In [25]:
for u, count in session.query(Customer, stmt.c.invoice_count).outerjoin(stmt, Customer.id == stmt.c.custid).order_by(Customer.id):
   print(u.name, count)

2023-01-02 11:52:19,156 INFO sqlalchemy.engine.Engine SELECT customers.id AS customers_id, customers.name AS customers_name, customers.address AS customers_address, customers.email AS customers_email, anon_1.invoice_count AS anon_1_invoice_count 
FROM customers LEFT OUTER JOIN (SELECT invoices.custid AS custid, count(?) AS invoice_count 
FROM invoices GROUP BY invoices.custid) AS anon_1 ON customers.id = anon_1.custid ORDER BY customers.id
2023-01-02 11:52:19,157 INFO sqlalchemy.engine.Engine [generated in 0.00092s] ('*',)
Gopal Krishna 2
Toto titi 2
Govind Kala 2
Abdul Rahman 2


## 3.5 Cascade deletion

It is easy to perform delete operation on a single table. All you have to do is to delete an object of the mapped class from a session and commit the action. However, delete operation on multiple related tables is little tricky.

Let's try to delete a customer

In [26]:
result=session.query(Customer).all()
for row in result:
    print ("ID: {} Name: {} Address: {} email: {}".format(row.id,row.name, row.address, row.email))

2023-01-02 12:02:44,488 INFO sqlalchemy.engine.Engine SELECT customers.id AS customers_id, customers.name AS customers_name, customers.address AS customers_address, customers.email AS customers_email 
FROM customers
2023-01-02 12:02:44,489 INFO sqlalchemy.engine.Engine [generated in 0.00094s] ()
ID: 1 Name: Gopal Krishna Address: Bank Street Hydarebad email: gk@gmail.com
ID: 2 Name: Toto titi Address: the mother land email: titi@gmail.com
ID: 3 Name: Govind Kala Address: Gulmandi Aurangabad email: kala@gmail.com
ID: 4 Name: Abdul Rahman Address: Rohtak email: abdulr@gmail.com


In [27]:
# get customer with id=2
c=session.query(Customer).get(2)
print(c.name)


Toto titi


In [28]:
# delete this customer
session.delete(c)

In [31]:
# count the number of customer with name Toto titi, it should be 0.
session.query(Customer).filter(Customer.name == 'Toto titi').count()

2023-01-02 12:10:07,061 INFO sqlalchemy.engine.Engine SELECT count(*) AS count_1 
FROM (SELECT customers.id AS customers_id, customers.name AS customers_name, customers.address AS customers_address, customers.email AS customers_email 
FROM customers 
WHERE customers.name = ?) AS anon_1
2023-01-02 12:10:07,067 INFO sqlalchemy.engine.Engine [cached since 193.4s ago] ('Toto titi',)


0

In [38]:
result=session.query(Invoice).filter(Invoice.custid>0)
for row in result:
    print ("ID: {} Customer_id: {} invoice_number: {} amount: {}".format(row.id,row.custid, row.invno, row.amount))

2023-01-02 12:15:24,912 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-01-02 12:15:24,913 INFO sqlalchemy.engine.Engine SELECT invoices.id AS invoices_id, invoices.custid AS invoices_custid, invoices.invno AS invoices_invno, invoices.amount AS invoices_amount 
FROM invoices 
WHERE invoices.custid > ?
2023-01-02 12:15:24,913 INFO sqlalchemy.engine.Engine [cached since 142.9s ago] (0,)
ID: 1 Customer_id: 1 invoice_number: 10 amount: 15000
ID: 2 Customer_id: 1 invoice_number: 14 amount: 3850
ID: 3 Customer_id: 2 invoice_number: 3 amount: 10000
ID: 4 Customer_id: 2 invoice_number: 4 amount: 5000
ID: 5 Customer_id: 3 invoice_number: 7 amount: 12000
ID: 6 Customer_id: 3 invoice_number: 8 amount: 18500
ID: 7 Customer_id: 4 invoice_number: 9 amount: 15000
ID: 8 Customer_id: 4 invoice_number: 11 amount: 6000


The invoice for customer Toto is still in the DB. This is because SQLAlchemy doesn’t assume the deletion of cascade; we have to give a command to delete it.

To change the behavior, we configure cascade options on the Customer.invoices relationship. Below code is the new Customer class definition


```python
class Customer(Base):
   __tablename__ = 'customers'

   id = Column(Integer, primary_key = True)
   name = Column(String)
   address = Column(String)
   email = Column(String)
   invoices = relationship(
      "Invoice",
      order_by = Invoice.id,
      back_populates = "customer",
      cascade = "all,
      delete, delete-orphan"
   )
```

The cascade attribute in relationship function is a `comma-separated list of cascade rules` which determines how Session operations should be “cascaded” from parent to child. By default, it is False, which means that it is "save-update, merge".

The available cascades are as follows −

- save-update
- merge
- expunge
- delete
- delete-orphan
- refresh-expire

Often used option is "all, delete-orphan" to indicate that related objects should follow along with the parent object in all cases, and be deleted when de-associated.