# DataPath Example 4

This notebook covers somewhat more advanced examples of using `DataPath`s. It assumes that you understand
the concepts presented in the previous example notebooks.

You should also read the ERMrest documentation and the deriva-py wiki. This notebook demonstrates some advanced concepts without fully (re)explaining them, because they are explained in that other documentation.

## Example Data Model
The examples require that you understand a little bit about the example catalog data model, which in this case manages data for biological experiments.

### Key tables
- `'dataset'` : represents a unit of data usually for a study or set of experiments;
- `'biosample'` : a biosample (describes biological details of a specimen);
- `'replicate'` : a replicate (describes both bio- and technical-replicates);
- `'experiment'` : a bioassay (any type of experiment or assay; e.g., imaging, RNA-seq, ChIP-seq, etc.).

### Relationships
- `dataset <- biosample`: A dataset may have one to many biosamples. I.e., there is a 
  foreign key reference from biosample to dataset.
- `dataset <- experiment`: A dataset may have one to many experiments. I.e., there 
  is a foreign key reference from experiment to dataset.
- `experiment <- replicate`: An experiment may have one to many replicates. I.e., there is a
  foreign key reference from replicate to experiment.
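
The relationships above can be sketched outside of ERMrest with an in-memory SQLite database. This is only an illustration of the one-to-many foreign key structure: the reduced column sets and the row values (`d1`, `e1`, and so on) are made up for the sketch and are not taken from the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical, heavily reduced versions of the four key tables.
conn.executescript("""
CREATE TABLE dataset    (RID TEXT PRIMARY KEY);
CREATE TABLE biosample  (RID TEXT PRIMARY KEY, dataset TEXT REFERENCES dataset(RID));
CREATE TABLE experiment (RID TEXT PRIMARY KEY, dataset TEXT REFERENCES dataset(RID));
CREATE TABLE replicate  (RID TEXT PRIMARY KEY, experiment TEXT REFERENCES experiment(RID));
INSERT INTO dataset    VALUES ('d1');
INSERT INTO biosample  VALUES ('b1', 'd1'), ('b2', 'd1');
INSERT INTO experiment VALUES ('e1', 'd1');
INSERT INTO replicate  VALUES ('r1', 'e1'), ('r2', 'e1'), ('r3', 'e1');
""")

# Follow dataset <- experiment <- replicate and count replicates per dataset.
replicate_counts = conn.execute("""
    SELECT d.RID, COUNT(r.RID)
    FROM dataset d
    JOIN experiment e ON e.dataset = d.RID
    JOIN replicate r ON r.experiment = e.RID
    GROUP BY d.RID
""").fetchall()
print(replicate_counts)

# A replicate that points at a non-existent experiment violates the foreign key.
try:
    conn.execute("INSERT INTO replicate VALUES ('r4', 'no-such-experiment')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The foreign key always lives on the "many" side (for example, `replicate.experiment`), which is the direction the arrows in the list above describe.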

In [1]:
# Import deriva modules and pandas DataFrame (for use in examples only)
from deriva.core import ErmrestCatalog, get_credential
from pandas import DataFrame

In [2]:
# Connect with the deriva catalog
protocol = 'https'
hostname = 'www.facebase.org'
catalog_number = 1
credential = None
# If you need to authenticate, use Deriva Auth agent and get the credential
# credential = get_credential(hostname)
catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)

In [3]:
# Get the path builder interface for this catalog
pb = catalog.getPathBuilder()

# Get some local variable handles to tables for convenience
dataset = pb.isa.dataset
experiment = pb.isa.experiment
biosample = pb.isa.biosample
replicate = pb.isa.replicate

## Implicit DataPaths
**Proceed with caution**

For compactness, `Table` objects (and `TableAlias` objects) provide `DataPath`-like methods, e.g., `link(...)`, `filter(...)`, and `entities(...)`, which implicitly create a `DataPath` rooted at the table and return the newly created path. These operations return the new `DataPath` rather than mutating the `Table` (or `TableAlias`) object.

In [4]:
entities = dataset.filter(dataset.released == True).entities()
len(entities)

815

### DataPath-like methods
The `DataPath`-like methods on `Table`s are essentially "wrapper" functions over the implicitly generated `DataPath` rooted at the `Table` instance. The wrappers include `link(...)`, `filter(...)`, `entities(...)`, `attributes(...)`, `aggregates(...)`, and `groupby(...)`.
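
The wrapper idea can be sketched with two toy classes. `ToyTable` and `ToyPath` are hypothetical names invented for this illustration and are not deriva-py classes; the sketch only shows the pattern in which each wrapper creates a fresh path and leaves the table untouched.

```python
class ToyPath:
    """Hypothetical path object that records the operations applied to it."""

    def __init__(self, root_name):
        self.steps = [root_name]

    def link(self, table_name):
        self.steps.append(f"link({table_name})")
        return self

    def filter(self, predicate):
        self.steps.append(f"filter({predicate})")
        return self


class ToyTable:
    """Hypothetical table object whose methods wrap an implicit path."""

    def __init__(self, name):
        self.name = name

    @property
    def path(self):
        # Every access creates a new path rooted at this table.
        return ToyPath(self.name)

    def link(self, other):
        return self.path.link(other.name)

    def filter(self, predicate):
        return self.path.filter(predicate)


dataset = ToyTable("dataset")
experiment = ToyTable("experiment")

p1 = dataset.filter("released=true")
p2 = dataset.link(experiment)

print(p1.steps)
print(p2.steps)
print(dataset.name)  # the table itself carries no path state
```

Because each call starts from a fresh path, `p1` and `p2` are independent of each other, which is the behaviour the "Proceed with caution" note above is pointing at: the result of a wrapper call must be kept in a variable, or it is lost.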

## Attribute Examples

### Example: selecting all columns of a table instance
Passing a table (or table instance) object to the `attributes(...)` method will project all (i.e., `*`) of its attributes.

In [5]:
path = dataset.alias('D').path
path.link(experiment).link(replicate)
results = path.attributes()
print(len(results))
print(path.uri)

3132
https://www.facebase.org/ermrest/catalog/1/entity/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate


In [6]:
results = path.attributes(path.D)
print(results.uri)

https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/D:*


It is important to remember that the `attributes(...)` method returns a result set based on the entity type of the last element of the path. In this example, that means the number of results is determined by the number of unique rows in the replicate table instance in the path created above, because the last `link(...)` call used the replicate table.

### Example: selecting from multiple table instances
More than one table instance may be selected in this manner, and whole table instances can be mixed and matched with individual columns from other table instances.

In [7]:
results = path.attributes(path.D,
                          path.experiment.experiment_type,
                          path.replicate)
print(len(results))
print(results.uri)

3132
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/D:*,experiment:experiment_type,replicate:*
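
The projection list at the end of the URI above follows a simple pattern, which the sketch below reproduces as plain string formatting. `render_projection` is a hypothetical helper written for this illustration, not part of deriva-py; the optional third tuple element models the `alias:=instance:column` form that appears in the faceting example later in this notebook.

```python
def render_projection(*items):
    """Render (instance, column[, output_alias]) tuples as a projection list.

    A column of None stands for "all columns" of that table instance.
    """
    rendered = []
    for instance, column, *output_alias in items:
        text = f"{instance}:{'*' if column is None else column}"
        if output_alias:
            text = f"{output_alias[0]}:={text}"
        rendered.append(text)
    return ",".join(rendered)


whole_and_single = render_projection(
    ("D", None),
    ("experiment", "experiment_type"),
    ("replicate", None),
)
print(whole_and_single)

renamed = render_projection(("species", "name", "species"))
print(renamed)
```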


If you want to base the results on a different entity, you can introduce a table instance alias at the end of the path before calling the `attributes(...)` method. In this case, even though we are asking for the same attributes, we are getting the set of datasets, not the set of replicates. Also, since we are including the attributes from dataset in our query, we know that we will not be seeing any duplicate rows.

In [8]:
results = path.D.attributes(path.D,
                            path.experiment.experiment_type,
                            path.replicate)
print(len(results))
print(results.uri)

120
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/$D/D:*,experiment:experiment_type,replicate:*


## Filtering Examples

### Example: filter on `null` attribute
To test for a `null` attribute value, do an equality comparison (`==`) against `None`.

In [9]:
path = dataset.link(experiment).filter(experiment.molecule_type == None)
print(path.uri)
print(len(path.entities()))

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/experiment:=isa:experiment/molecule_type::null::
415
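
The null test uses `==` rather than Python's usual `is None` idiom because only `==` can be intercepted by a class. The sketch below illustrates this with a hypothetical `ToyColumn` class (not deriva-py's column class) whose `__eq__` returns a filter string shaped like the one in the URI above.

```python
class ToyColumn:
    """Hypothetical stand-in for a column object."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Comparing with None is turned into a null test.
        if other is None:
            return f"{self.name}::null::"
        return f"{self.name}={other}"


molecule_type = ToyColumn("molecule_type")

as_equality = molecule_type == None  # dispatches to ToyColumn.__eq__
as_identity = molecule_type is None  # identity test, cannot be overloaded

print(as_equality)
print(as_identity)
```

Linters often flag `== None`, but in filter expressions of this kind the comparison is intentional.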


### Example: advanced text filters
Deriva supports advanced text filters for regular expressions (`regexp`), case-insensitive regular expressions (`ciregexp`), and text search (`ts`). You may have to review the text and full-text indexes in your ERMrest catalog before using these features.

In [10]:
path = dataset.filter(dataset.description.ciregexp('palate'))
print(path.uri)
print(len(path.entities()))

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/description::ciregexp::palate
32


### Example: negate a filter
Use the "inverse" ('`~`') operator to negate a filter. Negation works on simple comparison filters, as demonstrated above, as well as on the logical operators discussed next. You must wrap the comparison or logical operation in an extra pair of parentheses to negate it, e.g., "`~ (...)`".

In [11]:
path = dataset.filter( ~ (dataset.description.ciregexp('palate')) )
print(path.uri)
print(len(path.entities()))

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/!(description::ciregexp::palate)
780
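
The need for the extra parentheses comes from Python's operator precedence, not from anything specific to ERMrest. The sketch below uses the standard `ast` module to show which operation Python treats as the outermost one; the names `col`, `a`, and `b` are placeholders and are only parsed, never evaluated.

```python
import ast


def top_level(expr):
    """Return the name of the outermost AST node Python builds for expr."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses, `~` applies to `col` alone and `==` is outermost.
print(top_level("~col == 'x'"))    # Compare
print(top_level("~(col == 'x')"))  # UnaryOp

# The same holds for `&`: `~a & b` negates only `a`.
print(top_level("~a & b"))         # BinOp
print(top_level("~(a & b)"))       # UnaryOp

# `&` also binds more tightly than `==`, so unparenthesized comparisons
# joined by `&` are parsed as a single chained comparison.
chained = ast.parse("a == 1 & b == 2", mode="eval").body
print(type(chained).__name__, len(chained.ops))  # Compare 2
```

This is also why each comparison in the logical-operator examples is wrapped in its own parentheses.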


### Example: filters with logical operators
This example shows how to combine two comparisons with a conjunction (i.e., an `and` operator). Because Python's logical-and (`and`) keyword cannot be overloaded, we instead overload the bitwise-and (`&`) operator. This approach has become customary among many similar data access libraries.

In [12]:
path = dataset.link(biosample).filter(
    ((biosample.species == 'NCBITAXON:10090') & (biosample.anatomy == 'UBERON:0002490')))

print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/biosample:=isa:biosample/(species=NCBITAXON%3A10090)&(anatomy=UBERON%3A0002490)


In [13]:
DataFrame(path.entities())

Unnamed: 0,RCB,RCT,RID,RMB,RMT,_keywords,anatomy,cell_characterization,cell_source,collection_date,...,litter,local_identifier,mutation,origin,phenotype,species,specimen,stage,strain,theiler_stage
0,https://auth.globus.org/8ae274db-d033-47eb-bd3...,2018-12-10T19:42:13.870864-08:00,1-4TT8,https://auth.globus.org/8ae274db-d033-47eb-bd3...,2018-12-10T19:43:19.779894-08:00,,UBERON:0002490,,,,...,,scWFE18_S197,,,,NCBITAXON:10090,FACEBASE:1-4GNR,FACEBASE:1-4GJA,FACEBASE:1-4GYR,
1,,2018-03-12T18:12:27.599487-07:00,2XEJ,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.379287-08:00,,UBERON:0002490,,,2016-06-20,...,3/30/15AL1-5,A8IF4SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
2,,2018-03-12T18:12:27.599487-07:00,2XEP,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.068132-08:00,,UBERON:0002490,,,2016-06-20,...,9/29/14AL1-5,A8IF1SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
3,,2018-03-12T18:12:27.599487-07:00,2XW6,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.644948-08:00,,UBERON:0002490,,,2016-06-20,...,3/30/15AL1-5,A8IF4FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
4,,2018-03-12T18:12:27.599487-07:00,2YAT,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.57911-08:00,,UBERON:0002490,,,2016-06-20,...,3/30/15AL1-1,A8IF3FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
5,,2018-03-12T18:12:27.599487-07:00,2YWJ,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.368619-08:00,,UBERON:0002490,,,2016-06-20,...,12/17/15AL1-7,A6IF3SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJ8,FACEBASE:1-4GYR,FACEBASE:1-4GJT
6,,2018-03-12T18:12:27.599487-07:00,2Z2A,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.368619-08:00,,UBERON:0002490,,,2016-06-20,...,12/17/15AL1-4,A6IF1FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJ8,FACEBASE:1-4GYR,FACEBASE:1-4GJT
7,,2018-03-12T18:12:27.599487-07:00,2Z46,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.644948-08:00,,UBERON:0002490,,,2016-06-20,...,1/8/16AL1-4,A6IF4SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJ8,FACEBASE:1-4GYR,FACEBASE:1-4GJT
8,,2018-03-12T18:12:27.599487-07:00,2ZAE,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.368619-08:00,,UBERON:0002490,,,2016-06-20,...,3/16/15AL1-1,A8IF2FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
9,,2018-03-12T18:12:27.599487-07:00,2ZEJ,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.644948-08:00,,UBERON:0002490,,,2016-06-20,...,5/18/15AL1-4,A8IF5FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW


### Example: combine conjunction and disjunctions in filters
Similar to the prior example, filters may combine conjunctive and disjunctive operators. As with the bitwise-and operator, we overload the bitwise-or (` | `) operator because the logical-or (`or`) operator cannot be overloaded.

In [14]:
path = dataset.link(biosample).filter(
    ((biosample.species == 'NCBITAXON:10090') & (biosample.anatomy == 'UBERON:0002490')) |
    ((biosample.specimen == 'FACEBASE:1-4GNR') & (biosample.stage == 'FACEBASE:1-4GJA')))

print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/biosample:=isa:biosample/((species=NCBITAXON%3A10090)&(anatomy=UBERON%3A0002490));((specimen=FACEBASE%3A1-4GNR)&(stage=FACEBASE%3A1-4GJA))
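
The operator overloading described in this section can be sketched with two small classes. `ToyColumn` and `ToyFilter` are hypothetical names for this illustration and do not reflect how deriva-py is implemented; the output format simply imitates the filter fragments visible in the URIs printed in this notebook.

```python
from urllib.parse import quote


class ToyFilter:
    """Hypothetical filter expression that renders to a URI fragment."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):     # a & b
        return ToyFilter(f"({self.text})&({other.text})")

    def __or__(self, other):      # a | b
        return ToyFilter(f"({self.text});({other.text})")

    def __invert__(self):         # ~a
        return ToyFilter(f"!({self.text})")

    def __str__(self):
        return self.text


class ToyColumn:
    """Hypothetical column whose comparisons build ToyFilter objects."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # Percent-encode the value so characters such as ':' are URI-safe.
        return ToyFilter(f"{self.name}={quote(str(value), safe='')}")

    def ciregexp(self, pattern):
        return ToyFilter(f"{self.name}::ciregexp::{quote(pattern, safe='')}")


species, anatomy = ToyColumn("species"), ToyColumn("anatomy")
specimen, stage = ToyColumn("specimen"), ToyColumn("stage")
description = ToyColumn("description")

conjunction = (species == "NCBITAXON:10090") & (anatomy == "UBERON:0002490")
disjunction = conjunction | (
    (specimen == "FACEBASE:1-4GNR") & (stage == "FACEBASE:1-4GJA"))
negation = ~(description.ciregexp("palate"))

print(conjunction)
print(disjunction)
print(negation)
```

Each overloaded operator returns a new expression object, so arbitrarily nested combinations can be built before anything is sent to the server.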


In [15]:
DataFrame(path.entities())

Unnamed: 0,RCB,RCT,RID,RMB,RMT,_keywords,anatomy,cell_characterization,cell_source,collection_date,...,litter,local_identifier,mutation,origin,phenotype,species,specimen,stage,strain,theiler_stage
0,https://auth.globus.org/8ae274db-d033-47eb-bd3...,2018-12-10T19:42:13.870864-08:00,1-4TT8,https://auth.globus.org/8ae274db-d033-47eb-bd3...,2018-12-10T19:43:19.779894-08:00,,UBERON:0002490,,,,...,,scWFE18_S197,,,,NCBITAXON:10090,FACEBASE:1-4GNR,FACEBASE:1-4GJA,FACEBASE:1-4GYR,
1,,2018-03-12T18:12:27.599487-07:00,2XEJ,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.379287-08:00,,UBERON:0002490,,,2016-06-20,...,3/30/15AL1-5,A8IF4SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
2,,2018-03-12T18:12:27.599487-07:00,2XEP,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.068132-08:00,,UBERON:0002490,,,2016-06-20,...,9/29/14AL1-5,A8IF1SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
3,,2018-03-12T18:12:27.599487-07:00,2XW6,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.644948-08:00,,UBERON:0002490,,,2016-06-20,...,3/30/15AL1-5,A8IF4FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
4,,2018-03-12T18:12:27.599487-07:00,2YAT,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.57911-08:00,,UBERON:0002490,,,2016-06-20,...,3/30/15AL1-1,A8IF3FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
5,,2018-03-12T18:12:27.599487-07:00,2YWJ,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.368619-08:00,,UBERON:0002490,,,2016-06-20,...,12/17/15AL1-7,A6IF3SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJ8,FACEBASE:1-4GYR,FACEBASE:1-4GJT
6,,2018-03-12T18:12:27.599487-07:00,2Z2A,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.368619-08:00,,UBERON:0002490,,,2016-06-20,...,12/17/15AL1-4,A6IF1FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJ8,FACEBASE:1-4GYR,FACEBASE:1-4GJT
7,,2018-03-12T18:12:27.599487-07:00,2Z46,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.644948-08:00,,UBERON:0002490,,,2016-06-20,...,1/8/16AL1-4,A6IF4SM,,FACEBASE:1-4FV6,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJ8,FACEBASE:1-4GYR,FACEBASE:1-4GJT
8,,2018-03-12T18:12:27.599487-07:00,2ZAE,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:35.368619-08:00,,UBERON:0002490,,,2016-06-20,...,3/16/15AL1-1,A8IF2FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW
9,,2018-03-12T18:12:27.599487-07:00,2ZEJ,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:53:36.644948-08:00,,UBERON:0002490,,,2016-06-20,...,5/18/15AL1-4,A8IF5FR,,FACEBASE:1-4FTR,FACEBASE:1-4GB0,NCBITAXON:10090,FACEBASE:1-4GNG,FACEBASE:1-4GJA,FACEBASE:1-4GYR,FACEBASE:1-4GJW


### Example: filtering at different stages of the path
Filtering does not have to be done at the end of a path. In fact, the initial intention of the ERMrest URI was to mimic "RESTful" semantics where a RESTful "resource" is identified, then filtered, then a "sub-resource" is identified, and then filtered, and so on.

In [16]:
path = dataset.filter(dataset.release_date >= '2017-01-01') \
    .link(experiment).filter(experiment.experiment_type == 'OBI:0001271') \
    .link(replicate).filter(replicate.bioreplicate_number == 1)
    
print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/release_date::geq::2017-01-01/experiment:=isa:experiment/experiment_type=OBI%3A0001271/replicate:=isa:replicate/bioreplicate_number=1
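
The "resource, filter, sub-resource, filter" pattern is visible in the URI above as alternating path segments. As a rough sketch, the function below assembles such a URI from a list of stages. `entity_uri` and its tuple layout are hypothetical and written only for this illustration; in practice the `uri` property of the path does this work.

```python
from urllib.parse import quote


def entity_uri(base, stages):
    """Join (alias, table, filter) stages into an entity URI.

    Each filter is either None or a (column, operator, value) tuple, where
    the operator "=" is written inline and any other operator name is
    written as ::name::.
    """
    segments = []
    for alias, table, condition in stages:
        segments.append(f"{alias}:={table}")
        if condition is not None:
            column, operator, value = condition
            encoded = quote(str(value), safe="")
            if operator == "=":
                segments.append(f"{column}={encoded}")
            else:
                segments.append(f"{column}::{operator}::{encoded}")
    return f"{base}/entity/" + "/".join(segments)


uri = entity_uri(
    "https://www.facebase.org/ermrest/catalog/1",
    [
        ("dataset", "isa:dataset", ("release_date", "geq", "2017-01-01")),
        ("experiment", "isa:experiment", ("experiment_type", "=", "OBI:0001271")),
        ("replicate", "isa:replicate", ("bioreplicate_number", "=", 1)),
    ],
)
print(uri)
```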


In [17]:
DataFrame(path.entities())

Unnamed: 0,RCB,RCT,RID,RMB,RMT,bioreplicate_number,biosample,dataset,experiment,technical_replicate_number
0,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1-3T5A,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1,1-3T0A,1-3SWE,1-3SZA,1
1,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1-3T5E,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1,1-3T0E,1-3SWE,1-3SZA,1
2,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1-3T5J,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1,1-3T0J,1-3SWE,1-3SZA,1
3,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1-3T5P,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1,1-3T0P,1-3SWE,1-3SZA,1
4,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1-3T5T,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:41:16.162698-07:00,1,1-3T0T,1-3SWE,1-3SZA,1
5,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1-3T6T,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1,1-3T0Y,1-3SWE,1-3SZE,1
6,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1-3T6Y,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1,1-3T12,1-3SWE,1-3SZE,1
7,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1-3T72,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1,1-3T16,1-3SWE,1-3SZE,1
8,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1-3T76,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1,1-3T1A,1-3SWE,1-3SZE,1
9,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1-3T7A,https://auth.globus.org/f8ae714f-6015-48da-971...,2018-06-13T16:47:38.483851-07:00,1,1-3T1E,1-3SWE,1-3SZE,1


## Linking Examples

### Example: explicit column links
Up until now, the examples have shown how to link entities via _implicit_ join predicates. That is, we knew there existed a foreign key reference constraint between foreign keys of one entity and keys of another entity. We needed only to ask ERMrest to link the entities in order to get the linked set.

The problem with implicit links is that they become _ambiguous_ if there is more than one foreign key reference between the tables. To support these situations, ERMrest and the `DataPath`'s `link(...)` method allow you to specify explicitly which columns to use for the link condition.

The structure of the `on` clause is:
- an equality comparison operation where
- the _left_ operand is a column of the _left_ table instance which is also the path _context_ before the link method is called, and
- the _right_ operand is a column of the _right_ table instance which is the table _to be linked_ to the path.

In [18]:
path = dataset.link(experiment, on=(dataset.RID==experiment.dataset))
print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/experiment:=(RID)=(isa:experiment:dataset)


**IMPORTANT** Not all tables are related by foreign key references. ERMrest does not allow arbitrary relational joins. Tables must be related by a foreign key reference in order to link them in a data path.

In [19]:
DataFrame(path.entities().fetch(limit=3))

Unnamed: 0,RCB,RCT,RID,RMB,RMT,chromatin_modifier,control_assay,dataset,experiment_type,histone_modification,local_identifier,molecule_type,protocol,rnaseq_selection,strandedness,target_of_assay,transcription_factor
0,https://auth.globus.org/a1d30d14-b3b0-49de-854...,2018-06-08T12:40:48.98831-07:00,1-3SD2,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:54:17.199655-08:00,,,1-3SB2,OBI:0002083,,3XhET4_pHsp68-lacZ-tdTomato,,,,,,
1,https://auth.globus.org/a1d30d14-b3b0-49de-854...,2018-06-08T14:29:49.540747-07:00,1-3SHT,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:54:17.199655-08:00,,,1-3SGA,OBI:0002083,,3XcET4_pHsp68-lacZ-tdTomato,,,,,,
2,https://auth.globus.org/a1d30d14-b3b0-49de-854...,2018-06-08T14:47:53.103435-07:00,1-3SNY,https://auth.globus.org/b506963e-d274-11e5-99f...,2018-11-27T16:54:17.199655-08:00,,,1-3SM2,OBI:0002083,,3XhET7_pHsp68-lacZ-tdTomato,,,,,,


### Example: explicit column links combined with table aliasing
As usual, table instances are generated automatically unless we provide a table alias.

In [20]:
path = dataset.link(biosample.alias('S'), on=(dataset.RID==biosample.dataset))
print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/S:=(RID)=(isa:biosample:dataset)


Notice that we cannot use the alias right away in the `on` clause because it was not _bound_ to the path until _after_ the `link(...)` operation was performed.

### Example: links with "outer join" semantics
Up until now, the examples have shown "`link`s" with _inner join_ semantics. _Outer join_ semantics can be expressed as part of explicit column links, and _only_ when using explicit column links.

The `link(...)` method accepts a "`join_type`" parameter, i.e., "`.link(... join_type=TYPE)`", where _TYPE_ may be `'left'`, `'right'`, or `'full'`. It defaults to `''`, which indicates an inner join.
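
The effect of the `on` clause and of `join_type` on the generated URI can be sketched as simple string formatting. `link_segment` is a hypothetical helper for illustration only, not a deriva-py function; its output imitates the link segments in the URIs printed in this section.

```python
def link_segment(alias, left_column, right_table, right_column, join_type=""):
    """Render one explicit link as it appears in an ERMrest path.

    left_column belongs to the path context before the link; right_table and
    right_column identify the table being linked in.
    """
    if join_type not in ("", "left", "right", "full"):
        raise ValueError(f"unsupported join type: {join_type!r}")
    return f"{alias}:={join_type}({left_column})=({right_table}:{right_column})"


inner = link_segment("experiment", "RID", "isa:experiment", "dataset")
aliased = link_segment("S", "RID", "isa:biosample", "dataset")
left_outer = link_segment("E", "RID", "isa:experiment", "dataset", join_type="left")

print(inner)
print(aliased)
print(left_outer)
```

Note that the join type is written directly in front of the parenthesized left column, so an inner join (the empty string) leaves the segment unchanged.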

By '`left`' outer joining in the link from `'dataset'` to `'experiment'` and to `'biosample'`, and then resetting the context of the path to `'dataset'`, the following path gives us a reference to `'dataset'` entities _whether or not_ they have any experiments or biosamples.

In [21]:
# Notice in between `link`s that we have to reset the context back to `dataset` so that the
# second join is also left joined from the dataset table instance.
path = dataset.link(experiment.alias('E'), on=dataset.RID==experiment.dataset, join_type='left') \
              .dataset \
              .link(biosample.alias('S'), on=dataset.RID==biosample.dataset, join_type='left')

# Notice that we have to perform the attribute fetch from the context of the `path.dataset`
# table instance.
results = path.dataset.attributes(path.dataset.RID, 
                                  path.dataset.title, 
                                  path.E.experiment_type, 
                                  path.S.species)

print(results.uri)
len(results)

https://www.facebase.org/ermrest/catalog/1/attribute/dataset:=isa:dataset/E:=left(RID)=(isa:experiment:dataset)/$dataset/S:=left(RID)=(isa:biosample:dataset)/$dataset/dataset:RID,dataset:title,E:experiment_type,S:species


815

We can see above that we have a full set of datasets _whether or not_ they have any experiments or biosamples. For further evidence, we can convert the results to a DataFrame and look at a slice of its entries. Note that the experiment's 'experiment_type' and the biosample's 'species' attributes are missing for some results (i.e., `NaN`) because no matching row existed for the join condition.

In [22]:
DataFrame(results)[:10]

Unnamed: 0,RID,experiment_type,species,title
0,1-3SB2,OBI:0002083,NCBITAXON:10090,Activity of human neural crest enhancer near G...
1,1-3SGA,OBI:0002083,NCBITAXON:10090,Activity of chimp neural crest enhancer near G...
2,1-3SM2,OBI:0002083,NCBITAXON:10090,Activity of human neural crest enhancer near F...
3,1-3SQJ,OBI:0002083,NCBITAXON:10090,Activity of chimp neural crest enhancer near F...
4,1-3SVP,,,FB0023_18mo male with hypertelorism_Candidate ...
5,1-3SVT,,,FB0043_3 male cousins with natal teeth and ano...
6,1-3SVY,,,FB0051_6 year old male with hypertelorism (mar...
7,1-3SW2,,,FB0064_Male with Congenital craniosynostosis_C...
8,1-3SW6,,,FB0115_11yo male with dysmorphic facial featur...
9,1-3SWA,,,FB0122_29yo female with bilateral hearing loss...
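
The difference between inner and left outer join semantics can be reproduced on a tiny scale with SQLite. This is only a sketch of the general relational behaviour, not of how ERMrest executes the query; the two tables and their rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset    (RID TEXT PRIMARY KEY, title TEXT);
CREATE TABLE experiment (RID TEXT PRIMARY KEY, dataset TEXT, experiment_type TEXT);
INSERT INTO dataset    VALUES ('d1', 'has an experiment'), ('d2', 'has no experiment');
INSERT INTO experiment VALUES ('e1', 'd1', 'OBI:0002083');
""")

query = """
    SELECT d.RID, e.experiment_type
    FROM dataset d {join} experiment e ON e.dataset = d.RID
    ORDER BY d.RID
"""

# Inner join: datasets without a matching experiment are dropped.
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()

# Left outer join: every dataset is kept, with NULL (None) where nothing matched.
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

The `None` values in the left-joined rows correspond to the empty `experiment_type` and `species` cells in the DataFrame slice above.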


## Faceting Examples
You may have noticed that in the examples above, the 'species' and 'experiment_type' attributes are identifiers ('CURIE's to be precise). We may want to construct filters on our datasets based on these categories. This can be used for "faceted search" modes and can be useful even within the context of programmatic access to data in the catalog.

### Example: faceting on "related" tables
Let's say we want to find all of the biosamples in our catalog whose species is 'Mus musculus' and whose age stage is 'E10.5'.

We need to extend our understanding of the data model with the following tables that are related to '`biosample`'.
- `isa.biosample.species -> vocab.species`: the biosample table has a foreign key reference to the '`species`' table.
- `isa.biosample.stage -> vocab.stage`: the biosample table has a foreign key reference to the '`stage`' table.

We may say that `species` and `stage` are _related_ to the `biosample` table in the sense that `biosample` has a direct foreign key relationship from it to them.

For convenience, we will get local variables for the species and stage tables.

In [23]:
species = pb.vocab.species
stage = pb.vocab.stage

First, let's link samples with species and filter on the term "Mus musculus" (i.e., "mouse").

In [24]:
# Here we have to use the container `column_definitions` because `name` is reserved
path = biosample.alias('S').link(species).filter(species.column_definitions['name'] == 'Mus musculus')
print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/S:=isa:biosample/species:=vocab:species/name=Mus%20musculus


Now the _context_ of the path is the `species` table instance, but we need to link from the `biosample` to the age `stage` table.

To do so, we reference the `biosample` table instance, in this case using its alias `S`. Then we link off of that table instance which updates the `path` itself.

In [25]:
path.S.link(stage).filter(stage.column_definitions['name'] == 'E10.5')
print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/S:=isa:biosample/species:=vocab:species/name=Mus%20musculus/$S/stage:=vocab:stage/name=E10.5
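
The `$S` segment in the URI above is the context reset that happens when we link from a table instance other than the current context. A toy model of that bookkeeping is sketched below. `ToyPath` and its `from_alias` parameter are hypothetical and exist only for this illustration; in deriva-py the same effect is expressed by writing `path.S.link(...)`.

```python
from urllib.parse import quote


class ToyPath:
    """Hypothetical mutable path that tracks its current context alias."""

    def __init__(self, alias, table):
        self.segments = [f"{alias}:={table}"]
        self.context = alias

    def link(self, alias, table, from_alias=None):
        # Linking from an earlier table instance first resets the context.
        if from_alias is not None and from_alias != self.context:
            self.segments.append(f"${from_alias}")
        self.segments.append(f"{alias}:={table}")
        self.context = alias  # the newly linked table becomes the context
        return self

    def filter(self, column, value):
        self.segments.append(f"{column}={quote(value, safe='')}")
        return self

    @property
    def uri(self):
        return "/entity/" + "/".join(self.segments)


path = ToyPath("S", "isa:biosample")
path.link("species", "vocab:species").filter("name", "Mus musculus")

# The context is now `species`; link `stage` from the biosample instead.
path.link("stage", "vocab:stage", from_alias="S").filter("name", "E10.5")

print(path.uri)
print(path.context)
```

As in the notebook cell above, the second `link` call changes the existing path object in place, so there is no need to reassign `path`.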


Now, the path _context_ is the age `stage` table instance, but we want to get results for the `biosample` table. To do so, we again reference the `biosample` table instance by the alias `S` we used. From there, we call the `attributes(...)` method to get the samples along with the names and URIs of their species and stage.

In [26]:
results = path.S.attributes(path.S.RID,
                            path.S.collection_date,
                            path.species.column_definitions['name'].alias('species'),
                            path.species.column_definitions['uri'].alias('species_uri'),
                            path.stage.column_definitions['name'].alias('stage'),
                            path.stage.column_definitions['uri'].alias('stage_uri'))
print(results.uri)

https://www.facebase.org/ermrest/catalog/1/attribute/S:=isa:biosample/species:=vocab:species/name=Mus%20musculus/$S/stage:=vocab:stage/name=E10.5/$S/S:RID,S:collection_date,species:=species:name,species_uri:=species:uri,stage:=stage:name,stage_uri:=stage:uri


In [27]:
DataFrame(results)

Unnamed: 0,RID,collection_date,species,species_uri,stage,stage_uri
0,2XHT,2015-07-14,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
1,2XST,2015-07-14,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
2,2XXT,,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
3,2XXY,,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
4,2XYA,,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
5,2XYJ,,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
6,2Y1A,,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
7,2Y4E,,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
8,2YBY,2015-07-14,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
9,2YMA,2015-07-14,Mus musculus,https://www.facebase.org/id/1-4FZJ,E10.5,https://www.facebase.org/id/1-4GJ0
