# OntoWeaver Vignette
OntoWeaver is a tool for constructing Semantic Knowledge Graphs (SKGs) from tabular data, such as CSV files. It lets users define mappings between the data and an ontology, producing a graph that can be queried and analyzed. This notebook walks step by step through using OntoWeaver to build an SKG from synthetic clinical and genomic data, including single nucleotide variants (SNVs) and copy number alterations (CNAs).

## Semantic Knowledge Graphs


## Description of the data

### Single Nucleotide Variants (SNVs)

### Copy Number Alterations (CNAs)

### Treatments (OncoKB)


## Set-up

### Installing dependencies

We use *Poetry* to manage dependencies and virtual environments. With Poetry installed, install the project's dependencies with:

In [8]:
! poetry install

Installing dependencies from lock file

No dependencies to install or update


So far, the OntoWeaver package works under Python 3.12. If you have multiple Python versions installed, you can direct *Poetry* to use the correct one with the following command:
`! poetry env use $(which python3.12)`

### Starting the *poetry* environment

In [9]:
! eval $(poetry env activate) # new implementation of `poetry shell`

## SKG construction

### 1. Simple mapping using SNVs

We want to build a KG with a simple schema that covers patient IDs, the IDs of the samples they provided, the sequence variants they have, and the genes those variants are in. We start by defining the schema of the desired graph, which in our case looks like this:

After defining the schema, we need to define the mappings between the data and the ontology. The mappings will specify how each column in the CSV files corresponds to a node or edge in the graph. This is defined in the OntoWeaver mapping files, which are YAML files that describe the structure of the data and how it should be transformed into nodes and edges in the graph.

Below we display the mapping file used to build the first example graph from the SNV database.

In [23]:
import yaml
from IPython.display import display, JSON

# Read the file content.
with open("jobim/1_Simple_mapping/snv_1.yaml", "r") as file:
    content = yaml.safe_load(file)

# Display the content.
display(JSON(content))

<IPython.core.display.JSON object>

OntoWeaver maps a database row by row, so the mapping file first specifies how the subject node of each row is created. Here, the `columns` keyword states that the subject ID is taken from the `patient_id` column, and the `to_subject` keyword states that the node is of type `patient`.
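As a rough illustration of the two keywords above, the sketch below writes a mapping as the nested Python structure that `yaml.safe_load` would return. It is not the content of `snv_1.yaml`: only `columns`, `to_subject`, `transformers`, and `map` are taken from the text, while the `row`, `to_object`, and `via_relation` keys, the relation names, and the helper `mapped_columns` are assumptions made for this example.

```python
# Hypothetical mapping, shaped like the result of yaml.safe_load.
# ASSUMPTIONS: the `row`, `to_object`, and `via_relation` keys and all
# relation names are illustrative; they are not confirmed by this notebook.
mapping = {
    "row": {
        "map": {
            "columns": ["patient_id"],   # the subject ID comes from this column
            "to_subject": "patient",     # the subject node type
        }
    },
    "transformers": [
        {"map": {"columns": ["sample_id"],
                 "to_object": "sample",
                 "via_relation": "patient_has_sample"}},
        {"map": {"columns": ["alteration"],
                 "to_object": "sequence_variant",
                 "via_relation": "patient_has_variant"}},
        {"map": {"columns": ["hugoSymbol"],
                 "to_object": "gene",
                 "via_relation": "variant_in_gene"}},
    ],
}


def mapped_columns(mapping):
    """Return every column the mapping refers to, subject column first."""
    subject_columns = list(mapping["row"]["map"]["columns"])
    object_columns = [
        column
        for transformer in mapping["transformers"]
        for spec in transformer.values()
        for column in spec["columns"]
    ]
    return subject_columns + object_columns


print(mapped_columns(mapping))
```

Listing the referenced columns this way makes it easy to compare a mapping against the header of the CSV file before running a full import.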

For each column we want to map, we must define how values are extracted from its cells; the extracted value serves as the ID of the created node. This is the role of `transformers`.

For simplicity, this first section uses only the `map` transformer, which extracts the value from each cell of the given column as is.
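To make the row-by-row idea concrete, here is a minimal conceptual sketch in plain Python. It is not OntoWeaver's implementation: it only mimics what a `map`-style extraction does, using each cell value verbatim as a node ID. The relation names, the `OBJECTS` layout, and the function `map_rows` are hypothetical, and the two data rows are copied from the SNV table used in this notebook.

```python
import csv
import io

# Two rows copied from the SNV table in this notebook.
CSV_TEXT = """patient_id,sample_id,alteration,hugoSymbol
CC0010,CC5366_iAdnL1_DNA1,GPS2:chr17:7314304:G:A,TNXB
CC9341,CC8885_iAdnL1_DNA1,ZNF8:chr19:58295436:TC:T,KDM4E
"""

# Subject of every row: (column holding the ID, node type).
SUBJECT_COLUMN, SUBJECT_TYPE = "patient_id", "patient"

# Hypothetical object specs:
# (column, node type, relation name, column holding the edge source).
OBJECTS = [
    ("sample_id", "sample", "patient_has_sample", "patient_id"),
    ("alteration", "sequence_variant", "patient_has_variant", "patient_id"),
    ("hugoSymbol", "gene", "variant_in_gene", "alteration"),
]


def map_rows(csv_text):
    """Walk the table row by row and collect nodes and edges."""
    nodes = set()  # (node ID, node type)
    edges = set()  # (source ID, relation, target ID)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The subject node is created first, from the subject column.
        nodes.add((row[SUBJECT_COLUMN], SUBJECT_TYPE))
        for column, node_type, relation, source_column in OBJECTS:
            # "map"-style extraction: the cell value is the node ID, unchanged.
            nodes.add((row[column], node_type))
            edges.add((row[source_column], relation, row[column]))
    return nodes, edges


nodes, edges = map_rows(CSV_TEXT)
print(len(nodes), "nodes and", len(edges), "edges")
```

Storing nodes and edges in sets means that a patient or gene appearing in several rows yields a single node, which is the behavior one would expect when rows share identifiers.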






In [27]:
import pandas as pd

snv_database = pd.read_csv("./data/step_1/subset_1_anon_snv_annotated_external.csv")
snv_database.head(n=5)  # preview the first five rows

| Unnamed: 0 | patient_id | sample_id | alteration | hugoSymbol |
|---|---|---|---|---|
| 0 | CC0010 | CC5366_iAdnL1_DNA1 | GPS2:chr17:7314304:G:A | TNXB |
| 1 | CC2871 | CC1112_r1Oth1_DNA2 | ATG9B:chr7:151023728:AGCCTGGGCACAGAGGGGAGAGT:A | TNC |
| 2 | CC9768 | CC2862_pTubL1_TR | WFIKKN1:chr16:631371:AACCTGTGGGTGGACGCCCAGAGC:A | DZIP3 |
| 3 | CC9341 | CC8885_iAdnL1_DNA1 | ZNF8:chr19:58295436:TC:T | KDM4E |
| 4 | CC1344 | CC2821_pOme1_DNA1 | IL12RB1:chr19:18072287:G:T | SKIC3 |
