# SQLAlchemy cascade delete

Because SQLAlchemy supports **cascade delete** in two overlapping ways, with the ORM handling some deletes and the database itself handling others, it can be hard to know the right way to set it up. Here are some examples to clarify how it all works. For more information, see the official SQLAlchemy [documentation](https://docs.sqlalchemy.org/en/13/orm/cascades.html).

```python
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

base_path = "../../../data/cascade_delete.db"
db_url = f"sqlite:///{base_path}"
# echo=True (the default is False) logs every SQL statement the engine emits.
# create_engine() returns an Engine object but does not connect yet: the
# Engine opens a real DBAPI connection only when a method such as
# Engine.connect() is called, and SQLite creates the database file at that
# point if it does not exist yet (the ../../../data directory must exist).
engine = create_engine(db_url, echo=True)
```

```python
Base = declarative_base()
```

```python
def createTables(engine):
    # create every table registered in Base.metadata (existing tables are skipped)
    Base.metadata.create_all(engine)


def dropAllTables(engine):
    # drop every table registered in Base.metadata
    Base.metadata.drop_all(engine)
```

## 1. A simple example

In this simple example, we have two tables: a `project` table and a `task` table, where each project can contain one or more tasks. The two mapping classes are shown below.

The `task` table has a foreign key `project_id`, which links the two tables in a one-to-many relationship.

> We don't configure cascade delete yet.

```python
class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship("Task", back_populates="project")


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    # nullable=False means every task must have a project_id
    project_id = Column(Integer, ForeignKey("project.id"), nullable=False)
    project = relationship("Project", back_populates="tasks")
```


```python
dropAllTables(engine)
createTables(engine)
```

```text
2023-03-16 15:26:11,744 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 15:26:11,745 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("project")
2023-03-16 15:26:11,746 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 15:26:11,747 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("task")
2023-03-16 15:26:11,754 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 15:26:11,756 INFO sqlalchemy.engine.Engine
DROP TABLE task
2023-03-16 15:26:11,756 INFO sqlalchemy.engine.Engine [no key 0.00076s] ()
2023-03-16 15:26:11,766 INFO sqlalchemy.engine.Engine
DROP TABLE project
2023-03-16 15:26:11,767 INFO sqlalchemy.engine.Engine [no key 0.00066s] ()
2023-03-16 15:26:11,773 INFO sqlalchemy.engine.Engine COMMIT
2023-03-16 15:26:11,774 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 15:26:11,779 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("project")
2023-03-16 15:26:11,780 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 15:26:11,782 INFO sqlalchemy.
```

Run the code below to insert some rows into the database.

```python
Session = sessionmaker(bind=engine)
session = Session()

task1 = Task(id=1, project_id=0)
task2 = Task(id=2, project_id=0)

project = Project(id=0)

session.add(project)
session.add(task1)
session.add(task2)
session.commit()
```


```text
2023-03-16 15:26:18,114 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 15:26:18,117 INFO sqlalchemy.engine.Engine INSERT INTO project (id) VALUES (?)
2023-03-16 15:26:18,117 INFO sqlalchemy.engine.Engine [generated in 0.00083s] (0,)
2023-03-16 15:26:18,119 INFO sqlalchemy.engine.Engine INSERT INTO task (id, project_id) VALUES (?, ?)
2023-03-16 15:26:18,119 INFO sqlalchemy.engine.Engine [generated in 0.00061s] ((1, 0), (2, 0))
2023-03-16 15:26:18,120 INFO sqlalchemy.engine.Engine COMMIT
```


If you now inspect the database, you should see one project with `id=0` and two tasks attached to that project. Next, let's try to delete the project. If `project_id` is declared with `nullable=True` (the default), the project is deleted and the ORM sets the `project_id` of its tasks to `NULL`:
```python
project_id = Column(Integer, ForeignKey("project.id"),nullable=True)
```
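To see this default behavior in isolation, here is a minimal sketch that uses a throwaway in-memory SQLite database (`sqlite://`) instead of the notebook's file, with `nullable=True` on `project_id` and no cascade configured. It assumes SQLAlchemy 1.4 or later (for `Session` as a context manager, `Session.get()` and `select()`).

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship("Task", back_populates="project")  # no cascade option


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    # nullable=True: a task is allowed to exist without a project
    project_id = Column(Integer, ForeignKey("project.id"), nullable=True)
    project = relationship("Project", back_populates="tasks")


engine = create_engine("sqlite://")  # in-memory database, just for this demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Project(id=0, tasks=[Task(id=1), Task(id=2)]))
    session.commit()

    # The ORM loads the project's tasks, emits UPDATE task SET project_id=NULL
    # for them, and then deletes the project row.
    session.delete(session.get(Project, 0))
    session.commit()

    remaining_projects = session.execute(select(Project)).scalars().all()
    task_project_ids = [
        task.project_id
        for task in session.execute(select(Task).order_by(Task.id)).scalars()
    ]

print(task_project_ids)  # [None, None]
```

The tasks survive as orphans with a `NULL` foreign key, which is exactly what a `NOT NULL` constraint forbids.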

With `nullable=False` on `project_id`, however, running the cell below raises an `IntegrityError`: the ORM still tries to set each task's `project_id` to `NULL`, which the NOT NULL constraint rejects. To delete the project without any cascade configured, you would have to delete all of its tasks first, which can get expensive if the project has many tasks.


```python
session.delete(project)
session.commit()
```

```text
2023-03-16 15:26:28,993 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 15:26:28,997 INFO sqlalchemy.engine.Engine SELECT project.id AS project_id
FROM project
WHERE project.id = ?
2023-03-16 15:26:28,998 INFO sqlalchemy.engine.Engine [generated in 0.00106s] (0,)
2023-03-16 15:26:29,001 INFO sqlalchemy.engine.Engine SELECT task.id AS task_id, task.project_id AS task_project_id
FROM task
WHERE ? = task.project_id
2023-03-16 15:26:29,001 INFO sqlalchemy.engine.Engine [generated in 0.00074s] (0,)
2023-03-16 15:26:29,005 INFO sqlalchemy.engine.Engine UPDATE task SET project_id=? WHERE task.id = ?
2023-03-16 15:26:29,006 INFO sqlalchemy.engine.Engine [generated in 0.00089s] ((None, 1), (None, 2))
2023-03-16 15:26:29,006 INFO sqlalchemy.engine.Engine ROLLBACK


IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: task.project_id
[SQL: UPDATE task SET project_id=? WHERE task.id = ?]
[parameters: ((None, 1), (None, 2))]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
```
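One way to get past this error without configuring any cascade is to delete the tasks yourself first, ideally with a single bulk `DELETE` statement rather than loading and deleting tasks one at a time. A minimal sketch, assuming SQLAlchemy 1.4+ (ORM-enabled `delete()` through `Session.execute()`) and an in-memory SQLite database, with the same models as above (`nullable=False`, no cascade):

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, delete, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship("Task", back_populates="project")  # still no cascade


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    project_id = Column(Integer, ForeignKey("project.id"), nullable=False)
    project = relationship("Project", back_populates="tasks")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Project(id=0, tasks=[Task(id=1), Task(id=2)]))
    session.commit()

# A fresh session, so no task objects are cached in the identity map.
with Session(engine) as session:
    # One DELETE statement removes every task of project 0 ...
    session.execute(delete(Task).where(Task.project_id == 0))
    # ... so when the ORM deletes the project it finds no tasks to set to NULL.
    session.delete(session.get(Project, 0))
    session.commit()

    task_count = session.execute(select(func.count()).select_from(Task)).scalar_one()
    project_count = session.execute(select(func.count()).select_from(Project)).scalar_one()

print(task_count, project_count)  # 0 0
```

This works, but it has to be repeated everywhere a project is deleted, which is what cascades are meant to automate.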

## 2. ORM cascades

To avoid the problem above, we can enable ORM-level cascade delete through the `cascade` option of the `Project.tasks` relationship.

Below is the new version of the `Project` mapping class:

```python
class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship(
        "Task", back_populates="project", cascade="delete, merge, save-update"
    )
```

**Now restart your Jupyter kernel and run the code below.** This time, when you delete a project, all of its tasks are deleted as well.

```python
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

base_path = "../../../data/cascade_delete.db"
db_url = f"sqlite:///{base_path}"
# echo=True (the default is False) logs every SQL statement the engine emits.
# create_engine() returns an Engine object but does not connect yet: the
# Engine opens a real DBAPI connection only when a method such as
# Engine.connect() is called, and SQLite creates the database file at that
# point if it does not exist yet (the ../../../data directory must exist).
engine = create_engine(db_url, echo=True)
```

```python
Base = declarative_base()
```

```python
def createTables(engine):
    # create every table registered in Base.metadata (existing tables are skipped)
    Base.metadata.create_all(engine)


def dropAllTables(engine):
    # drop every table registered in Base.metadata
    Base.metadata.drop_all(engine)
```

```python
class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship(
        "Task", back_populates="project", cascade="delete, merge, save-update"
    )


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    # nullable=False means every task must have a project_id
    project_id = Column(Integer, ForeignKey("project.id"), nullable=False)
    project = relationship("Project", back_populates="tasks")
```

```python
dropAllTables(engine)
createTables(engine)
```

```text
2023-03-16 16:04:05,144 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 16:04:05,145 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("project")
2023-03-16 16:04:05,156 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 16:04:05,157 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("task")
2023-03-16 16:04:05,158 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 16:04:05,166 INFO sqlalchemy.engine.Engine
DROP TABLE task
2023-03-16 16:04:05,168 INFO sqlalchemy.engine.Engine [no key 0.00283s] ()
2023-03-16 16:04:05,171 INFO sqlalchemy.engine.Engine
DROP TABLE project
2023-03-16 16:04:05,172 INFO sqlalchemy.engine.Engine [no key 0.00111s] ()
2023-03-16 16:04:05,181 INFO sqlalchemy.engine.Engine COMMIT
2023-03-16 16:04:05,183 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 16:04:05,183 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("project")
2023-03-16 16:04:05,184 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-03-16 16:04:05,185 INFO sqlalchemy.
```

```python
Session = sessionmaker(bind=engine)
session = Session()

task1 = Task(id=1, project_id=0)
task2 = Task(id=2, project_id=0)

project = Project(id=0)

session.add(project)
session.add(task1)
session.add(task2)
session.commit()
```

```text
2023-03-16 16:04:08,003 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 16:04:08,007 INFO sqlalchemy.engine.Engine INSERT INTO project (id) VALUES (?)
2023-03-16 16:04:08,007 INFO sqlalchemy.engine.Engine [generated in 0.00081s] (0,)
2023-03-16 16:04:08,014 INFO sqlalchemy.engine.Engine INSERT INTO task (id, project_id) VALUES (?, ?)
2023-03-16 16:04:08,015 INFO sqlalchemy.engine.Engine [generated in 0.00060s] ((1, 0), (2, 0))
2023-03-16 16:04:08,017 INFO sqlalchemy.engine.Engine COMMIT
```


```python
session.delete(project)
session.commit()
```

```text
2023-03-16 16:04:21,150 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-03-16 16:04:21,154 INFO sqlalchemy.engine.Engine SELECT project.id AS project_id
FROM project
WHERE project.id = ?
2023-03-16 16:04:21,155 INFO sqlalchemy.engine.Engine [generated in 0.00092s] (0,)
2023-03-16 16:04:21,157 INFO sqlalchemy.engine.Engine SELECT task.id AS task_id, task.project_id AS task_project_id
FROM task
WHERE ? = task.project_id
2023-03-16 16:04:21,158 INFO sqlalchemy.engine.Engine [generated in 0.00058s] (0,)
2023-03-16 16:04:21,161 INFO sqlalchemy.engine.Engine DELETE FROM task WHERE task.id = ?
2023-03-16 16:04:21,168 INFO sqlalchemy.engine.Engine [generated in 0.00716s] ((1,), (2,))
2023-03-16 16:04:21,170 INFO sqlalchemy.engine.Engine DELETE FROM project WHERE project.id = ?
2023-03-16 16:04:21,170 INFO sqlalchemy.engine.Engine [generated in 0.00073s] (0,)
2023-03-16 16:04:21,171 INFO sqlalchemy.engine.Engine COMMIT
```


### 2.1 Other cascade actions

You probably noticed that in the example above, `cascade` is set to `"delete, merge, save-update"` rather than just `"delete"`. This is because the ORM has other cascade behaviors besides `delete`, and `save-update` and `merge` are the ones enabled by default. Setting `cascade="delete"` alone would replace that default and turn those other behaviors off.

Take `save-update` as an example. With SQLAlchemy you don't normally have to call `session.add()` explicitly on every single object you want to persist. If you had to, it would look like this:

```python
task1 = Task()
session.add(task1)
task2 = Task()
session.add(task2)
project = Project(tasks=[task1, task2])
session.add(project)
session.commit()
```

Thanks to the `save-update` cascade on the relationship from a project to its tasks, we can simply write:

```python
project = Project(tasks=[Task(), Task()])
session.add(project)
session.commit()
```
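Conversely, if `cascade` were set to `"delete"` alone, the `save-update` behavior would be gone and the shortcut above would stop persisting the tasks without raising an error. A minimal sketch, assuming SQLAlchemy 1.4+ and an in-memory SQLite database; as far as we know, SQLAlchemy only emits an `SAWarning` (saying the add operation along `Project.tasks` will not proceed), which the sketch suppresses:

```python
import warnings

from sqlalchemy import Column, ForeignKey, Integer, create_engine, func, select
from sqlalchemy.exc import SAWarning
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    # "delete" alone replaces the default "save-update, merge"
    tasks = relationship("Task", back_populates="project", cascade="delete")


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    project_id = Column(Integer, ForeignKey("project.id"), nullable=False)
    project = relationship("Project", back_populates="tasks")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    project = Project(tasks=[Task(), Task()])
    session.add(project)  # the tasks are NOT added to the session along with it
    with warnings.catch_warnings():
        # hide the "not in session" SAWarning emitted during the flush
        warnings.simplefilter("ignore", SAWarning)
        session.commit()

    project_count = session.execute(select(func.count()).select_from(Project)).scalar_one()
    task_count = session.execute(select(func.count()).select_from(Task)).scalar_one()

print(project_count, task_count)  # 1 0
```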

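The official documentation commonly uses **cascade="all, delete-orphan"** for child objects that should only exist while they are attached to their parent. You can replace the cascade option in the code above with `"all, delete-orphan"` and you will see the same behavior when deleting a project. (`all` is shorthand for `save-update, merge, refresh-expire, expunge, delete`; `delete-orphan` additionally deletes a task when it is removed from `project.tasks`.)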
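The extra behavior of `delete-orphan` shows up when a task is removed from a project's collection, not only when the project itself is deleted. A minimal sketch, assuming SQLAlchemy 1.4+ and an in-memory SQLite database:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship("Task", back_populates="project", cascade="all, delete-orphan")


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    project_id = Column(Integer, ForeignKey("project.id"), nullable=False)
    project = relationship("Project", back_populates="tasks")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    project = Project(id=0, tasks=[Task(id=1), Task(id=2)])
    session.add(project)
    session.commit()

    # delete-orphan: removing a task from the collection deletes its row
    project.tasks.remove(session.get(Task, 1))
    session.commit()
    ids_after_remove = [t.id for t in session.execute(select(Task)).scalars()]

    # delete (included in "all"): deleting the project deletes the remaining tasks
    session.delete(project)
    session.commit()
    ids_after_delete = [t.id for t in session.execute(select(Task)).scalars()]

print(ids_after_remove, ids_after_delete)  # [2] []
```

Without `delete-orphan`, the `remove()` call would instead try to set task 1's `project_id` to `NULL`, which fails here because of `nullable=False`.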


## 3. Database level cascades

The database can also perform cascade deletes itself.
For example, in `postgres` we can declare the `Task.project_id` foreign key with **ON DELETE CASCADE** (in SQLAlchemy, `ondelete="CASCADE"`).
```postgresql
CREATE TABLE task
(
  id SERIAL PRIMARY KEY,
  project_id INT NOT NULL REFERENCES project(id) ON DELETE CASCADE
);
```
After creating the table, you can inspect it with `\d task`:

```text
cascade> \d task
+------------+---------+----------------------------------------------------+
| Column     | Type    | Modifiers                                          |
|------------+---------+----------------------------------------------------|
| id         | integer |  not null default nextval('task_id_seq'::regclass) |
| project_id | integer |  not null                                          |
+------------+---------+----------------------------------------------------+
Indexes:
    "task_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "task_project_id_fkey" FOREIGN KEY (project_id) REFERENCES project(id) ON DELETE CASCADE

Time: 0.020s
```


For `sqlite`, you can create the equivalent table with:

```sqlite
CREATE TABLE task
(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  project_id INTEGER NOT NULL,
  CONSTRAINT fk_projects
    FOREIGN KEY (project_id)
    REFERENCES project (id)
    ON DELETE CASCADE
);
```
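Note that SQLite only enforces foreign key constraints, including `ON DELETE CASCADE`, when `PRAGMA foreign_keys=ON` is set on the connection; it is off by default. A minimal sketch, assuming SQLAlchemy 1.4+, that turns the pragma on with a `connect` event listener (the pattern from SQLAlchemy's SQLite dialect documentation; the function name `enable_sqlite_foreign_keys` is our own) and then deletes a project with plain SQL:

```python
from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")  # in-memory database for the demo


@event.listens_for(engine, "connect")
def enable_sqlite_foreign_keys(dbapi_connection, connection_record):
    # Runs for every new DBAPI connection: without this pragma SQLite
    # parses FOREIGN KEY clauses but does not enforce them.
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()


with engine.begin() as conn:
    conn.execute(text("CREATE TABLE project (id INTEGER PRIMARY KEY)"))
    conn.execute(text(
        "CREATE TABLE task ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " project_id INTEGER NOT NULL,"
        " CONSTRAINT fk_projects FOREIGN KEY (project_id)"
        " REFERENCES project (id) ON DELETE CASCADE)"
    ))
    conn.execute(text("INSERT INTO project (id) VALUES (0)"))
    conn.execute(text("INSERT INTO task (project_id) VALUES (0), (0)"))

    # A plain SQL DELETE with no ORM involved: SQLite removes the tasks itself.
    conn.execute(text("DELETE FROM project WHERE id = 0"))
    task_count = conn.execute(text("SELECT COUNT(*) FROM task")).scalar_one()

print(task_count)  # 0
```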



### 3.1 SQLAlchemy ORM configuration

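If we add `ondelete="CASCADE"` at the database level but remove the ORM-level cascade, however, deleting a project through the session fails again, this time with a `NotNullViolation` (shown here from PostgreSQL via psycopg2):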

```text
sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value
in column "project_id" violates not-null constraint
```

One option is to turn ORM-level cascades back on, declare victory and move on. The `ondelete="CASCADE"` can stay in the schema, but it is never used, because the ORM deletes the tasks before the database gets a chance to.

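**The integrity error happens because the ORM still sets the `project_id` of each task to `None`.** To let `ondelete="CASCADE"` do the work, we additionally set `passive_deletes=True` on the `Project.tasks` relationship. This tells the ORM not to load the related tasks when a project is deleted, so it no longer sets their `project_id` to `None` and leaves the deletion to the database. (Tasks that are already loaded in the session are still processed by the ORM; the official docs combine `passive_deletes=True` with `cascade="all, delete"` so that those are deleted rather than nulled out.)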

```python
class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship("Task", back_populates="project", passive_deletes=True)


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    project_id = Column(
        Integer, ForeignKey("project.id", ondelete="CASCADE"), nullable=False
    )
    project = relationship("Project", back_populates="tasks")

```
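Putting the pieces together, here is a minimal end-to-end sketch of this configuration, assuming SQLAlchemy 1.4+ and an in-memory SQLite database with the foreign-key pragma turned on (the listener name is our own). The deletion runs in a fresh session so `Project.tasks` is not loaded, and the ORM emits only the `DELETE` for the project:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    tasks = relationship("Task", back_populates="project", passive_deletes=True)


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    project_id = Column(
        Integer, ForeignKey("project.id", ondelete="CASCADE"), nullable=False
    )
    project = relationship("Project", back_populates="tasks")


engine = create_engine("sqlite://")


@event.listens_for(engine, "connect")
def enable_sqlite_foreign_keys(dbapi_connection, connection_record):
    # SQLite only honours ON DELETE CASCADE when foreign keys are enabled
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()


Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Project(id=0, tasks=[Task(id=1), Task(id=2)]))
    session.commit()

# Fresh session: Project.tasks is never loaded, so passive_deletes skips it.
with Session(engine) as session:
    session.delete(session.get(Project, 0))
    session.commit()  # ORM deletes the project row; SQLite deletes the tasks

    project_count = session.execute(select(func.count()).select_from(Project)).scalar_one()
    task_count = session.execute(select(func.count()).select_from(Task)).scalar_one()

print(project_count, task_count)  # 0 0
```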

## 4. ORM level vs. database level

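Pros of database-level cascade delete:
1. Performance: the SQLAlchemy documentation notes that database-level ON DELETE CASCADE is generally much more efficient than relying on SQLAlchemy's delete cascade, which has to SELECT the related rows and emit its own DELETE statements for them (as the log in section 2 shows).
2. Coverage: the cascade also happens when rows are deleted without `session.delete()`, for example with raw SQL or a bulk `delete()` statement. ORM-level cascade does not apply in those cases.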
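To illustrate the coverage point, here is a minimal sketch (SQLAlchemy 1.4+, in-memory SQLite with foreign keys left unenforced, which is SQLite's default) in which `Project.tasks` only has an ORM-level cascade. A bulk `delete()` statement bypasses `session.delete()`, so the ORM cascade never runs and the tasks are left behind:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, delete, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    # ORM-level cascade only; the schema has no ON DELETE CASCADE
    tasks = relationship("Task", back_populates="project", cascade="all, delete-orphan")


class Task(Base):
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    project_id = Column(Integer, ForeignKey("project.id"), nullable=False)
    project = relationship("Project", back_populates="tasks")


engine = create_engine("sqlite://")  # no PRAGMA foreign_keys, so no FK enforcement
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Project(id=0, tasks=[Task(id=1), Task(id=2)]))
    session.commit()

with Session(engine) as session:
    # Bulk DELETE: the ORM does not load the project, so its cascade never fires.
    session.execute(delete(Project).where(Project.id == 0))
    session.commit()

    project_count = session.execute(select(func.count()).select_from(Project)).scalar_one()
    task_count = session.execute(select(func.count()).select_from(Task)).scalar_one()

print(project_count, task_count)  # 0 2  (two dangling tasks)
```

On a database that does enforce the foreign key but has no `ON DELETE CASCADE`, the same statement would fail with an integrity error instead of leaving dangling rows.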

When to use ORM-level cascade:
1. Maybe you have an existing schema that you're not free to change for some reason, but you still want automatic cascade deletes in your code.
2. One of the models you'd like to cascade uses joined table inheritance, and you want to avoid a "half-deleted" object.
3. Some databases don't support FOREIGN KEY constraints (or don't enforce them by default), and therefore don't support ON DELETE CASCADE either.
4. Maybe delete was added for completeness along with the other cascade behaviors that only make sense in the context of an ORM.


