# Databases and Asynchronous ORMs

The main goal of a REST API is, of course, to read and write data. So far, we've solely
worked with the tools given by Python and FastAPI, allowing us to build reliable
endpoints to process and answer requests. However, we haven't been able to effectively
retrieve and persist that information: we were missing a **database**.

In this notebook, we will look at how to interact with databases and related libraries inside FastAPI. Note that FastAPI is completely agnostic regarding databases and leaves the integration of any system to the developer. We will review three different approaches to integrating a database:
(1) using basic **SQL queries**, (2) using **Object-Relational Mapping** (**ORM**), and (3) using a **NoSQL database**.

## An overview of relational and NoSQL databases

The role of a database is to store data in a structured way, preserve the integrity of the
data, and offer a query language that enables you to retrieve this data when an application
needs it. Nowadays, when it comes to choosing a database for your web project, you have two main
choices: relational databases, with their associated SQL query language, and NoSQL
databases, named in opposition to the first category. In this section, we'll outline the main characteristics and
features of those two database families and try to give you some insights into choosing the
right one for your project.

### Relational databases

Relational databases implement the relational model: each kind of entity, or object, of the
application is stored in its own **table**. Each table has several **columns** containing the attributes of the entity, and each **row** holds one record. One of the key points of relational databases is, as their name suggests, relationships: each
table can be related to others, with rows referring to rows in other tables.

The main motivation behind this is to avoid duplication. Indeed, it wouldn't be very
efficient to repeat an object's attributes in each entity related to it. If the object needed to be modified
at some point, we would have to go through each related entity, which is error-prone and puts data
consistency at risk. This is why we prefer to reference entities using unique identifiers.

To do this, each row in a relational database has an identifier, called a **primary key**. This is
unique in the table and will allow you to uniquely identify this row. Therefore, it's possible
to use this key in another table to reference it. We call it a **foreign key**: the key is foreign in
the sense that it refers to another table. Relational databases are designed to perform **join queries** efficiently, which return all the relevant records
based on the foreign keys. However, those operations can become expensive as the schema grows more complex. This is why it's important to carefully design a relational schema and its queries.
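
To make primary keys, foreign keys, and joins more concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `users` and `hobbies` tables and their columns are hypothetical names chosen for illustration only.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,          -- primary key: unique per row
        first_name TEXT NOT NULL
    );
    CREATE TABLE hobbies (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),  -- foreign key
        name TEXT NOT NULL
    );
    INSERT INTO users (id, first_name) VALUES (1, 'Leslie');
    INSERT INTO hobbies (user_id, name) VALUES (1, 'scrapbooking');
    INSERT INTO hobbies (user_id, name) VALUES (1, 'eating waffles');
    """
)

# Join query: follow the foreign key to collect each user's hobbies.
rows = conn.execute(
    """
    SELECT users.first_name, hobbies.name
    FROM users
    JOIN hobbies ON hobbies.user_id = users.id
    ORDER BY hobbies.id
    """
).fetchall()
print(rows)

# A hobby pointing at a user that does not exist is rejected.
try:
    conn.execute("INSERT INTO hobbies (user_id, name) VALUES (99, 'working')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The user's name is stored only once, in `users`; the `hobbies` rows merely point to it through `user_id`, so renaming the user means updating a single row.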

### NoSQL databases

Most of the time when we talk about "NoSQL databases", we are implicitly referring to document-oriented databases. They are the ones that interest us in this notebook. Document-oriented databases move away from the relational architecture and try to store
all the information of a given object inside a single **document**. As such, performing a join
query is much rarer and usually more difficult.

Those documents are stored in **collections**. Contrary to relational databases, documents
in a collection might not all have the same attributes: while tables in relational
databases have a defined schema, collections accept any kind of document.

For example, to retrieve all of the information about a user and their hobbies, a single document can be fetched from the database. No joins are required, which can make this kind of read faster:

```
{
   "_id": 1,
   "first_name": "Leslie",
   "last_name": "Yepp",
   "cell": "8125552344",
   "city": "Pawnee",
   "hobbies": ["scrapbooking", "eating waffles", "working"]
}
```

This was the main motivation behind the development of document-oriented databases: to increase query performance by limiting the need to
look at several collections.
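
As a rough illustration of these ideas, and not of any particular database's API, the sketch below models a collection as a plain Python list of dictionaries. The second document and the `find_by_id` helper are hypothetical and exist only to show that documents in the same collection may carry different attributes.

```python
# A hypothetical "users" collection modeled as a list of documents (dicts).
users = [
    {
        "_id": 1,
        "first_name": "Leslie",
        "last_name": "Yepp",
        "cell": "8125552344",
        "city": "Pawnee",
        "hobbies": ["scrapbooking", "eating waffles", "working"],
    },
    # Made-up document: no "hobbies" or "city", but an extra "nickname".
    {"_id": 2, "first_name": "Ann", "nickname": "Annie"},
]


def find_by_id(collection, document_id):
    """Return the document with the given _id, or None if it is absent."""
    return next((doc for doc in collection if doc["_id"] == document_id), None)


# One lookup returns the user together with their hobbies: no join needed.
leslie = find_by_id(users, 1)
print(leslie["hobbies"])

# Nothing forces the second document to have the same attributes.
ann = find_by_id(users, 2)
print(ann.get("hobbies"))
```

Note that the application code has to cope with missing attributes itself (here with `dict.get`), since the collection does not enforce a schema.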

### Which one should you choose?

For small and medium-sized applications, the choice doesn't really matter: both relational
databases and document-oriented databases are very optimized and will deliver awesome
performance at such scales. But here are some
elements for you to think about:

* Relational databases are very good for storing structured data with a lot of relationships
between the entities. Besides, they maintain data consistency at all costs, even in the event
of errors or hardware failures. However, you'll have to precisely define your schema and
consider a migration system to update your schema if your needs evolve.

+++

* On the other hand, document-oriented databases don't require you to define a schema:
they accept any document structure, so it can be convenient if your data is highly variable
or if your project is not mature enough. The downside of this is that they are far less picky
in terms of data consistency, which could result in data loss or inconsistencies.

+++



## Communicating with a SQL database with SQLAlchemy

To begin, we will discuss how to work with a relational database using the SQLAlchemy
library. Note that we will only consider the core part of the library, which
only provides the tools to abstract communication with a SQL database. We won't
consider the ORM part, as, in the next section, we'll focus on another ORM: Tortoise. We will combine SQLAlchemy with the `databases` library by Encode, the same team
behind Starlette, which provides an asynchronous connection layer for SQLAlchemy:

```{figure} ../../img/sqlalch-encode.png
---
name: sqlalch-encode
---


```

### Creating the table schema

First, you need to define the SQL schema for your tables: the name, the columns, and their
associated types and properties. In the following example, you can view the definition of the
`posts` table:

```python
# sqlalchemy/models.py
import sqlalchemy

metadata = sqlalchemy.MetaData()

posts = sqlalchemy.Table(
    "posts",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True, autoincrement=True),
    sqlalchemy.Column("publication_date", sqlalchemy.DateTime(), nullable=False),
    sqlalchemy.Column("title", sqlalchemy.String(length=255), nullable=False),
    sqlalchemy.Column("content", sqlalchemy.Text(), nullable=False),
)
```

First, we created a `metadata` object. Its role is to keep all the information of a database
schema together. This is why you should create it only once in your whole project and
always use the same one throughout.

Next, we defined a table using the `Table` class. The first argument is the name of the
table, followed by the metadata object. Then, we list all of the columns that should be
defined in our table, thanks to the `Column` class. The first argument is the name of the
column, followed by its [type](https://docs.sqlalchemy.org/en/13/core/type_basics.html#generic-types) and [some options](https://docs.sqlalchemy.org/en/13/core/metadata.html#:~:text=sqlalchemy.schema.Column.__init__). For example, we define
our `id` column as a primary key with auto-increment, which is quite common in
a SQL database.
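
To check what such a definition contains, we can inspect the `Table` object directly. The following sketch repeats the `posts` definition so that it is self-contained, then lists its columns and prints the `CREATE TABLE` statement that SQLAlchemy would generate for it. The exact SQL text depends on the dialect and the SQLAlchemy version, so treat the printed statement as indicative.

```python
import sqlalchemy
from sqlalchemy.schema import CreateTable

metadata = sqlalchemy.MetaData()

posts = sqlalchemy.Table(
    "posts",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True, autoincrement=True),
    sqlalchemy.Column("publication_date", sqlalchemy.DateTime(), nullable=False),
    sqlalchemy.Column("title", sqlalchemy.String(length=255), nullable=False),
    sqlalchemy.Column("content", sqlalchemy.Text(), nullable=False),
)

# Columns are available in declaration order.
column_names = [column.name for column in posts.columns]
print(column_names)

# Individual columns can be reached through the `c` attribute.
print(posts.c.id.primary_key)

# The metadata object keeps track of every table attached to it.
print("posts" in metadata.tables)

# Render the DDL for the default dialect, without touching any database.
ddl = str(CreateTable(posts))
print(ddl)
```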

We will also define the
corresponding Pydantic models for our post entity in the same file. Since they will be used by FastAPI to
validate the request payload, they must match the SQL definition to avoid any errors from
the database when we try to insert a new row later.

```python
# sqlalchemy/models.py
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, Field

class PostBase(BaseModel):
    title: str
    content: str
    publication_date: datetime = Field(default_factory=datetime.now)

class PostPartialUpdate(BaseModel):
    title: Optional[str] = None
    content: Optional[str] = None

class PostCreate(PostBase):
    pass

class PostDB(PostBase):
    id: int
```
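
As a quick check of how these models protect the database, the sketch below redefines `PostBase` and `PostCreate` so that it is self-contained, and shows that a payload missing a required field is rejected before any query could be attempted. The sample titles and contents are made up for the example.

```python
from datetime import datetime

from pydantic import BaseModel, Field, ValidationError


class PostBase(BaseModel):
    title: str
    content: str
    publication_date: datetime = Field(default_factory=datetime.now)


class PostCreate(PostBase):
    pass


# A valid payload: publication_date falls back to the current date and time.
post = PostCreate(title="Hello", content="First post")
print(post.title, post.publication_date)

# An invalid payload: `content` is NOT NULL in the table and required here.
try:
    PostCreate(title="No content")
    rejected = False
except ValidationError as error:
    rejected = True
    print(error)
```

Inside FastAPI, this validation happens automatically when a model is used as the type of a request body; the explicit `try`/`except` here is only for demonstration.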

### Connecting to a database

#### Setting up connection

Now that our table is ready, we have to set up the connection between our FastAPI app
and the database engine.

```python
# sqlalchemy/database.py
import sqlalchemy
from databases import Database

DATABASE_URL = "sqlite:///chapter6_sqlalchemy.db"
database = Database(DATABASE_URL)
sqlalchemy_engine = sqlalchemy.create_engine(DATABASE_URL)

def get_database() -> Database:
    return database
```

Observe that we instantiate a `Database` instance using the database URL. This is the connection layer provided by `databases` that will allow us to perform asynchronous queries. Notice that the standard synchronous connection established in `sqlalchemy_engine` overlaps with `database`. This will be clarified later.

The function `get_database` will be used as a dependency to easily retrieve the database instance in our path operation functions. Setting up a dependency like this, instead of directly importing the object, will benefit us during automated testing.

#### Startup and shutdown

Now, we need to tell FastAPI to open the connection with the database when it starts
the application and then close it when exiting. FastAPI provides the `on_event`
decorator, which we can use with the `startup` and `shutdown` events to perform tasks
at those moments, as you can see in the following example:

```python
# sqlalchemy/app.py
from fastapi import FastAPI
# ...

app = FastAPI()

@app.on_event("startup")
async def startup():
    await database.connect()
    metadata.create_all(sqlalchemy_engine)

@app.on_event("shutdown")
async def shutdown():
    await database.disconnect()
```

Additionally, you can see that we call the `create_all` method on the `metadata` object. This is the same `metadata` object that we defined in the previous section and have
imported here. The goal of this method is to create the tables' schema inside our database.
If we didn't do that, our database would be empty and we wouldn't be able to save
or retrieve data. This method is designed to work with a standard SQLAlchemy engine;
this is why we instantiated `sqlalchemy_engine` earlier. It has no other use in the application; for everything else, we will be using `database`.
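
To see what `create_all` does in isolation, here is a small sketch that runs it against an in-memory SQLite database instead of the `chapter6_sqlalchemy.db` file. The shortened `posts` table is a simplification for this example; the calls themselves (`create_engine`, `create_all`, `inspect`) are standard SQLAlchemy.

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()

# Shortened version of the posts table, for illustration only.
posts = sqlalchemy.Table(
    "posts",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True, autoincrement=True),
    sqlalchemy.Column("title", sqlalchemy.String(length=255), nullable=False),
)

# "sqlite://" with no path creates an in-memory database.
engine = sqlalchemy.create_engine("sqlite://")

# Before create_all, the database contains no tables.
tables_before = sqlalchemy.inspect(engine).get_table_names()
print(tables_before)

# create_all emits CREATE TABLE for every table attached to the metadata.
metadata.create_all(engine)
# By default it skips tables that already exist, so calling it on every
# application startup is safe.
metadata.create_all(engine)

tables_after = sqlalchemy.inspect(engine).get_table_names()
print(tables_after)
```

Because existing tables are skipped, `create_all` will not alter a table whose definition has changed; that is the job of a migration tool.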