# Adding Relationships to SQLModel

<strong>Author(s):</strong> Jessica A. Nash, The Molecular Sciences Software Institute

<div class="alert alert-block alert-info"> 
<h2>Overview</h2>

<strong>Questions:</strong>

* How can I automatically link tables in a database using SQLModel?
* How do relationships simplify querying across tables?

<strong>Objectives:</strong>

* Add relationships to SQLModel models.

</div>

Our tables are now fully defined, as discussed in the previous notebooks.
However, ORMs offer more benefits than we have covered so far.
One convenient feature is the ability to define relationships between tables directly on the classes that represent them.

Although we did not discuss it in the previous notebook, querying our previous database for an article and its keywords would require either two separate queries or a SQL `join` operation through the `ArticleKeyword` link table.
With an ORM, we can instead define relationships on the models so that related rows are available as ordinary attributes, such as a list of keywords on each article.
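For comparison, here is a minimal sketch of the join-based approach. It uses Python's built-in `sqlite3` module against an in-memory database. The table and column names mirror the `article`, `keyword`, and `articlekeyword` tables that SQLModel generates in these notebooks (trimmed to a few columns), and the DOI and keywords are made-up sample values.

```python
import sqlite3

# In-memory database with simplified versions of the notebook's three tables
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (doi TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE NOT NULL);
    CREATE TABLE articlekeyword (
        article_doi TEXT REFERENCES article(doi),
        keyword_id INTEGER REFERENCES keyword(id),
        PRIMARY KEY (article_doi, keyword_id)
    );

    -- Made-up sample rows for illustration
    INSERT INTO article VALUES ('example-doi-1', 'An Example Article');
    INSERT INTO keyword VALUES (1, 'molecular dynamics'), (2, 'free energy');
    INSERT INTO articlekeyword VALUES ('example-doi-1', 1), ('example-doi-1', 2);
    """
)

# Without ORM relationships, the join through the link table is spelled out by hand
rows = conn.execute(
    """
    SELECT keyword.keyword
    FROM article
    JOIN articlekeyword ON articlekeyword.article_doi = article.doi
    JOIN keyword ON keyword.id = articlekeyword.keyword_id
    WHERE article.doi = ?
    ORDER BY keyword.keyword
    """,
    ("example-doi-1",),
).fetchall()

keywords = [row[0] for row in rows]
print(keywords)  # ['free energy', 'molecular dynamics']
conn.close()
```

Every query that crosses tables has to repeat this join logic, which is the bookkeeping that ORM relationships take over for us.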

In this section, we'll add relationships to our SQLModel models.
First, we'll remove any old copy of the database so that we can start fresh.

In [None]:
import os 

from typing import Optional, List

from sqlmodel import Field, SQLModel, Session, Relationship, create_engine

def remove_db():
    """Convenience function to remove database file for notebook."""
    if os.path.exists("sqlmodel_database_relationships.db"):
        os.remove("sqlmodel_database_relationships.db")

remove_db()

As discussed in our first notebook on databases, articles and authors have a many-to-many relationship:
one article can have many authors, and one author can write many articles.
Using SQLModel, we can define this many-to-many relationship on our models.
Essentially, we tell SQLModel to use our association table automatically whenever we move between articles and authors.

The cell below first redefines our association tables, then repeats our `Article` and `Author` definitions.
One final line is added to both `Article` and `Author` that defines the relationship between articles and authors.
The added lines are

(in `Article`)
```python
authors: List["Author"] = Relationship(back_populates="articles", link_model=ArticleAuthor)
```

and 

(in `Author`)
```python
articles: List["Article"] = Relationship(back_populates="authors", link_model=ArticleAuthor)
```

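These lines define a relationship on both the `Article` and `Author` models.
They tell SQLModel that an article's `authors` attribute should return a list of `Author` objects, and that an author's `articles` attribute should return a list of `Article` objects.
The `link_model` keyword tells the ORM that the two tables are linked through the `ArticleAuthor` table, and `back_populates` names the matching attribute on the other model so that both sides of the relationship stay in sync.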

After this definition, we can use `Article.authors` or `Author.articles` to reference across tables.


In [None]:
class ArticleKeyword(SQLModel, table=True):
    __table_args__ = {"extend_existing": True} # This lets us run the Jupyter notebook cell multiple times without error
    
    article_doi: str = Field(foreign_key="article.doi", primary_key=True)
    keyword_id: int = Field(foreign_key="keyword.id", primary_key=True)

class ArticleAuthor(SQLModel, table=True):
    __table_args__ = {"extend_existing": True}

    article_doi: str = Field(foreign_key="article.doi", primary_key=True)
    author_id: int = Field(foreign_key="author.id", primary_key=True)


class Article(SQLModel, table=True):
    __table_args__ = {"extend_existing": True}
    
    doi: str = Field(primary_key=True)
    title: str
    publication_year: int
    abstract: Optional[str] = Field(default=None)

    authors: List["Author"] = Relationship(back_populates="articles", link_model=ArticleAuthor)

class Author(SQLModel, table=True):
    __table_args__ = {"extend_existing": True}

    id: Optional[int] = Field(default=None, primary_key=True)
    first_name: str
    last_name: str
    affiliation: Optional[str] = Field(default=None)

    articles: List["Article"] = Relationship(back_populates="authors", link_model=ArticleAuthor)


<div class="alert alert-block alert-warning">

## Exercise
Redefine the Keyword table and add a relationship with `Article`.
</div>

In [None]:
class Article(SQLModel, table=True):
    __table_args__ = {"extend_existing": True}
    
    doi: str = Field(primary_key=True)
    title: str
    publication_year: int
    abstract: Optional[str] = Field(default=None)

    authors: List["Author"] = Relationship(back_populates="articles", link_model=ArticleAuthor)

    # add relationship to keywords
    keywords: List["Keyword"] = Relationship(back_populates="articles", link_model=ArticleKeyword)

## Add Keyword table with relationship here
class Keyword(SQLModel, table=True):
    __table_args__ = {"extend_existing": True}

    id: Optional[int] = Field(default=None, primary_key=True)
    keyword: str = Field(unique=True, index=True)

    articles: List["Article"] = Relationship(back_populates="keywords", link_model=ArticleKeyword)

In the cell below, we create a new database.
This time, we set `echo=False` so that we don't see the SQL commands being executed.

In [None]:
sqlite_file_name = "sqlmodel_database_relationships.db"
sqlite_url = f"sqlite:///{sqlite_file_name}"

engine = create_engine(sqlite_url, echo=False)

SQLModel.metadata.create_all(engine)

The cell below adds 50 papers pulled from ChemRxiv to the database.
Notice that we pass the keywords directly to the `Article` object instead of adding each keyword and its link-table row individually; SQLModel creates the `ArticleKeyword` rows for us.

In [None]:
import json

from sqlmodel import select

with open("data/fifty_papers.json") as f:
    data = json.load(f)["itemHits"]

with Session(engine) as session:

    for paper in data:
        paper_data = paper["item"]

        keywords = []
        # Lowercase the keywords and drop repeats within a paper (order preserved),
        # since the keyword column is unique
        for keyword in dict.fromkeys(k.lower() for k in paper_data["keywords"]):
            # Check if the keyword is already in the database
            stmt = select(Keyword).where(Keyword.keyword == keyword)
            result = session.exec(stmt).one_or_none()

            if result is None:
                # New keyword: it is inserted when the article is committed
                keyword_obj = Keyword(keyword=keyword)
            else:
                keyword_obj = result

            keywords.append(keyword_obj)

        # Passing keywords= lets SQLModel fill the ArticleKeyword link table for us
        article = Article(
            doi=paper_data["doi"],
            title=paper_data["title"],
            publication_year=2024,
            abstract=paper_data["abstract"],
            keywords=keywords,
        )
        session.add(article)
        session.commit()

The cell below queries the `article` table for articles with at least one keyword containing the phrase "molecular dynamics".
The Python code for `stmt` builds a SQLAlchemy query; printing it shows the SQL that will be executed, in which `.any()` becomes an `EXISTS` subquery over the link table.

In [None]:
stmt = select(Article).where(
    Article.keywords.any(Keyword.keyword.contains("molecular dynamics"))
)

print(stmt)

In [None]:
# perform the query inside a session
with Session(engine) as session:
    result = session.exec(stmt).all()

print(len(result))