## Use Case: FaceBase Enhancers

This use case demonstrates evolving the schema of the FaceBase 'enhancer' table. The table holds "enhancer reporter assays", each of which records details of a test genomic sequence, the original genomic loci, the other (target) genomic loci, and the genomic loci used for visualization. Each assay also lists the nearest affected genes and the anatomical structures that exhibited phenotypic variance. The FaceBase use case is described in greater detail in earlier work [1].

1. Alejandro Bugacov, Karl Czajkowski, Carl Kesselman, Anoop Kumar, Robert E. Schuler, and Hongsuda Tangmunarunkit. 2017. Experiences with DERIVA: An asset management platform for accelerating eScience.

In [1]:
import chisel

In [2]:
# Connect to the source database whose schema we want to evolve. We connect
# using a 'versioned' catalog identifier ('1@2RM-3X92-ET06') so that this
# notebook remains runnable in the future even as the FaceBase database evolves.
catalog = chisel.connect('https://www.facebase.org/ermrest/catalog/1@2RM-3X92-ET06')
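
The versioned identifier in the URL above appears to combine a catalog number and a version tag, separated by `@`. The following is a minimal sketch of pulling the two parts apart with the standard library only. The helper name `split_catalog_version` is hypothetical and is not part of chisel, and the reading of the identifier as `<catalog>@<version>` is an assumption based on the comment in the cell above.

```python
from urllib.parse import urlparse


def split_catalog_version(url):
    """Split a catalog URL into (catalog_id, version); version is None if absent.

    Hypothetical helper for illustration; it is not part of chisel.
    """
    # The last path segment carries the identifier, e.g. '1@2RM-3X92-ET06'.
    identifier = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    catalog_id, separator, version = identifier.partition("@")
    return catalog_id, (version if separator else None)


print(split_catalog_version("https://www.facebase.org/ermrest/catalog/1@2RM-3X92-ET06"))
# prints ('1', '2RM-3X92-ET06')
```

Dropping the `@...` suffix would presumably address the current, unversioned catalog, at the cost of the reproducibility the comment describes.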

In [3]:
# For demonstration purposes, we will materialize the evolved relations in a
# "local" (i.e., file system) catalog here. The directory "./catalog" must
# exist for the connect function to return without error.
local = chisel.connect('./catalog')
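
Because the comment above states that the directory must already exist when `chisel.connect` is called, it may be convenient to create it programmatically first. This is a small sketch using only the standard library; the helper name `ensure_catalog_dir` is hypothetical.

```python
from pathlib import Path


def ensure_catalog_dir(path="./catalog"):
    """Create the local catalog directory if it is missing and return its path.

    Hypothetical helper for illustration; it is not part of chisel.
    """
    catalog_dir = Path(path)
    # parents=True creates intermediate directories as needed;
    # exist_ok=True makes repeated calls harmless.
    catalog_dir.mkdir(parents=True, exist_ok=True)
    return catalog_dir
```

Calling `ensure_catalog_dir('./catalog')` ahead of the `chisel.connect('./catalog')` call should satisfy the precondition described in the comment.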

In [4]:
# For convenience, assign these tables to variables in the script
enhancer = catalog['isa']['enhancer'] # the table we want to evolve
anatomy = catalog['vocab']['anatomy'] # a vocabulary table
gene_names = catalog['vocab']['gene'] # a vocabulary table

In [5]:
# Step 1: reify a sub-concept, the 'original species' loci, from 
#         the enhancer table into its own table structure.
original_species_assembly = enhancer.reify_sub(
    enhancer['original_species_assembly'],
    enhancer['original_species_chromosome'],
    enhancer['original_species_start'],
    enhancer['original_species_end']
)

In [6]:
# Although the new relation has not been materialized, we can already
# look at its definition.
original_species_assembly.describe()

### Table "None.enhancer"
| Column                      | Type    | Nullable | Default | Comment |
|-----------------------------|---------|----------|---------|---------|
| id                          | serial4 | False    | None    | None    |
| original\_species\_assembly | text    | True     | None    | None    |
| original\_species\_chromosome | text    | True     | None    | None    |
| original\_species\_start    | int4    | True     | None    | None    |
| original\_species\_end      | int4    | True     | None    | None    |
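
Judging by the definition printed above, the reified relation carries the source table's `id` key plus the selected columns. As a rough illustration of that idea only, here is a plain-Python sketch over rows represented as dictionaries. It is not chisel's implementation: `reify_sub_sketch` is a hypothetical name, and the sample row uses invented placeholder values rather than FaceBase data.

```python
def reify_sub_sketch(rows, key, columns):
    """Return a new relation holding the key plus the given columns of each row.

    Hypothetical helper for illustration; it is not chisel's reify_sub.
    """
    return [{name: row[name] for name in (key, *columns)} for row in rows]


# Invented placeholder row; these values are NOT taken from FaceBase.
enhancer_rows = [
    {
        "id": 1,
        "original_species_assembly": "assembly-A",
        "original_species_chromosome": "chrN",
        "original_species_start": 100,
        "original_species_end": 200,
        "other_assembly": "assembly-B",
    },
]

sub_relation = reify_sub_sketch(
    enhancer_rows,
    "id",
    [
        "original_species_assembly",
        "original_species_chromosome",
        "original_species_start",
        "original_species_end",
    ],
)
print(list(sub_relation[0]))
# prints ['id', 'original_species_assembly', 'original_species_chromosome', 'original_species_start', 'original_species_end']
```

The printed column order matches the column list in the table definition above, which is the only behavior this sketch tries to mirror.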


In [7]:
# Step 2: reify the 'visualization' loci sub-concept.
visualization_assembly = enhancer.reify_sub(
    enhancer['visualization_assembly'],
    enhancer['visualization_assembly_chromosome'],
    enhancer['visualization_assembly_start'],
    enhancer['visualization_assembly_end']
)

In [8]:
# Step 3: reify the 'other' organism loci sub-concept.
other_assembly = enhancer.reify_sub(
    enhancer['other_assembly'],
    enhancer['other_assembly_chromosome'],
    enhancer['other_assembly_start'],
    enhancer['other_assembly_end']
)

In [9]:
# Step 4: convert the nested (non-normal-form) list of gene names into a
#         normalized relation and align its values to the gene name vocabulary.
enhancer_closest_genes = enhancer['list_of_closest_genes'].to_tags(gene_names)

In [10]:
# Again, we can look at the new relation's definition.
enhancer_closest_genes.describe()

### Table "None.019acbd0830911e9816cac87a3187979_gene"
| Column                | Type    | Nullable | Default | Comment                                          |
|-----------------------|---------|----------|---------|--------------------------------------------------|
| id                    | serial4 | False    | None    | None                                             |
| list\_of\_closest\_genes | text    | False    | None    | The preferred human\-readable name for this term\. |


In [11]:
# Step 5: rename the column 'id' to 'enhancer_id' and 'list_of_closest_genes' to 'closest_genes'
enhancer_closest_genes = enhancer_closest_genes.select(
    enhancer_closest_genes['id'].alias('enhancer_id'),
    enhancer_closest_genes['list_of_closest_genes'].alias('closest_genes')
)

In [12]:
# See the corrected definition.
enhancer_closest_genes.describe()

### Table "None.07bf76dc830911e9816cac87a3187979_gene"
| Column        | Type    | Nullable | Default | Comment                                          |
|---------------|---------|----------|---------|--------------------------------------------------|
| enhancer\_id  | serial4 | False    | None    | None                                             |
| closest\_genes | text    | False    | None    | The preferred human\-readable name for this term\. |
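
The renaming step amounts to a projection in which each selected column receives a new name. The sketch below shows that idea in plain Python over dictionary rows. The helper `select_with_aliases` is hypothetical and is not chisel's `select` or `alias`; the two sample rows reuse values that appear in the notebook's preview output.

```python
def select_with_aliases(rows, aliases):
    """Project each row onto the keys of `aliases`, renaming them to the mapped names.

    Hypothetical helper for illustration; it is not part of chisel.
    """
    return [{new: row[old] for old, new in aliases.items()} for row in rows]


rows = [
    {"id": 3, "list_of_closest_genes": "Smad4"},
    {"id": 7, "list_of_closest_genes": "COL1A2"},
]
renamed = select_with_aliases(
    rows, {"id": "enhancer_id", "list_of_closest_genes": "closest_genes"}
)
print(renamed)
# prints [{'enhancer_id': 3, 'closest_genes': 'Smad4'}, {'enhancer_id': 7, 'closest_genes': 'COL1A2'}]
```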


In [13]:
# We can also fetch the relation's rows to preview the newly aligned data.
enhancer_closest_genes.fetch()

[{'enhancer_id': 3, 'closest_genes': 'Smad4'},
 {'enhancer_id': 7, 'closest_genes': 'COL1A2'},
 {'enhancer_id': 8, 'closest_genes': 'Fgfr1'},
 {'enhancer_id': 14, 'closest_genes': 'SOX5'},
 {'enhancer_id': 16, 'closest_genes': 'Sox9'},
 {'enhancer_id': 22, 'closest_genes': 'COL1A2'},
 {'enhancer_id': 23, 'closest_genes': 'COL1A2'},
 {'enhancer_id': 44, 'closest_genes': 'SOX5'},
 {'enhancer_id': 43, 'closest_genes': 'SOX5'},
 {'enhancer_id': 45, 'closest_genes': 'SOX5'},
 {'enhancer_id': 42, 'closest_genes': 'SOX5'}]
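
The preview above pairs each enhancer `id` with one vocabulary term. One way to picture the normalization is as an unnest step followed by an alignment step. The plain-Python sketch below illustrates that reading under several assumptions that the notebook does not confirm: that the nested list is a comma-delimited string, that alignment is a case-insensitive exact match, and that unmatched values are dropped. `to_tags_sketch` is a hypothetical name, not chisel's `to_tags`, and the input rows are invented.

```python
def to_tags_sketch(rows, key, column, vocabulary, delimiter=","):
    """Unnest a delimited list column and keep only values matching a vocabulary term.

    Hypothetical helper for illustration; chisel's actual matching may differ.
    """
    # Index vocabulary terms case-insensitively so 'sox5' aligns to 'SOX5'.
    terms = {term.lower(): term for term in vocabulary}
    tags = []
    for row in rows:
        # Treat a missing (None) list as empty.
        for raw in (row[column] or "").split(delimiter):
            term = terms.get(raw.strip().lower())
            if term is not None:
                tags.append({key: row[key], column: term})
    return tags


# Invented input rows for illustration only.
rows = [
    {"id": 3, "list_of_closest_genes": "Smad4"},
    {"id": 14, "list_of_closest_genes": "sox5, unknown-gene"},
    {"id": 15, "list_of_closest_genes": None},
]
print(to_tags_sketch(rows, "id", "list_of_closest_genes", ["Smad4", "SOX5"]))
# prints [{'id': 3, 'list_of_closest_genes': 'Smad4'}, {'id': 14, 'list_of_closest_genes': 'SOX5'}]
```

Emitting the vocabulary's own spelling rather than the raw value is what makes the result joinable against the vocabulary table.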

In [14]:
# Step 6: do the same for the nested (non-normal-form) list of anatomical structures.
enhancer_anotomical_structure = \
    enhancer['list_of_anatomical_structures'].to_tags(anatomy)

In [15]:
# Step 7: again, rename the columns of the now-normalized relation.
enhancer_anotomical_structure = enhancer_anotomical_structure.select(
    enhancer_anotomical_structure['id'].alias('enhancer_id'),
    enhancer_anotomical_structure['list_of_anatomical_structures'].alias(
        'anatomical_structures')
)

In [16]:
# And again, preview the aligned terms.
enhancer_anotomical_structure.fetch()

[{'enhancer_id': 4, 'anatomical_structures': 'facial mesenchyme'},
 {'enhancer_id': 4, 'anatomical_structures': 'somite'},
 {'enhancer_id': 4, 'anatomical_structures': 'nose'},
 {'enhancer_id': 1, 'anatomical_structures': 'ear'},
 {'enhancer_id': 13, 'anatomical_structures': 'nose'},
 {'enhancer_id': 2, 'anatomical_structures': 'nose'},
 {'enhancer_id': 2, 'anatomical_structures': 'secondary palate'},
 {'enhancer_id': 3, 'anatomical_structures': 'nose'},
 {'enhancer_id': 6, 'anatomical_structures': 'nose'},
 {'enhancer_id': 7, 'anatomical_structures': 'pharyngeal arch'},
 {'enhancer_id': 7, 'anatomical_structures': 'facial mesenchyme'},
 {'enhancer_id': 8, 'anatomical_structures': 'facial mesenchyme'},
 {'enhancer_id': 9, 'anatomical_structures': 'forebrain'},
 {'enhancer_id': 9, 'anatomical_structures': 'hindbrain'},
 {'enhancer_id': 9, 'anatomical_structures': 'midbrain'},
 {'enhancer_id': 9, 'anatomical_structures': 'neural tube'},
 {'enhancer_id': 9, 'anatomical_structures': 'nose'},
 ...]

In [17]:
# Finally, materialize the relations to the 'local' catalog. To evolve the source
# catalog's schema instead, we would simply replace 'local' with 'catalog' below.
with local.evolve():
    local['.']['original_species_assembly.csv'] = original_species_assembly
    local['.']['visualization_assembly.csv'] = visualization_assembly
    local['.']['other_assembly.csv'] = other_assembly
    local['.']['enhancer_closest_genes.csv'] = enhancer_closest_genes
    local['.']['enhancer_anotomical_structure.csv'] = enhancer_anotomical_structure
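
After materialization, the results can be spot-checked outside of chisel. The sketch below assumes, without confirmation from the notebook, that the local catalog writes each relation as an ordinary CSV file with a header row under `./catalog`. The helper `preview_csv` is a hypothetical name and relies only on the standard library.

```python
import csv
from itertools import islice


def preview_csv(path, limit=5):
    """Return up to `limit` rows of a CSV file as dictionaries keyed by the header row.

    Hypothetical helper for illustration; it is not part of chisel.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader reads lazily, so islice stops after `limit` rows.
        return list(islice(csv.DictReader(handle), limit))
```

If that assumption holds, `preview_csv('./catalog/enhancer_closest_genes.csv')` would return the first few rows. Note that the `csv` module yields every value as a string, so numeric columns such as `enhancer_id` would need converting before comparison.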