# DataPath Example 2
This notebook gives a basic example of how to access data through a `DataPath`.
It assumes that you understand the concepts presented in the
example 1 notebook.

In [1]:
# Import deriva modules
from deriva.core import ErmrestCatalog, get_credential

In [2]:
# Connect with the deriva catalog
protocol = 'https'
hostname = 'www.facebase.org'
catalog_number = 1
credential = get_credential(hostname)
catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)

In [3]:
# Get the path builder interface for this catalog
pb = catalog.getPathBuilder()

## DataPaths
The `PathBuilder` object allows you to begin `DataPath`s from the base `Table`s. A `DataPath` begins with a `Table` (or a `TableAlias`, to be discussed later) as its "root", from which one can "`link`", "`filter`", and fetch its "`entities`".

### Start a path rooted at a table from the catalog
We will reference a table from the PathBuilder `pb` variable from above. Using the PathBuilder, we will reference the "isa" schema, then the "dataset" table, and from that table start a path.

In [4]:
path = pb.schemas['isa'].tables['dataset'].path

We could have used the more compact dot-notation to start the same path.

In [5]:
path = pb.isa.dataset.path

### Getting the URI of the current path
Every DataPath has a URI that identifies the resources it references in ERMrest. These resources are available through the "RESTful" Web protocols that ERMrest supports.

In [6]:
print(path.uri)

https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset
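
To make the structure of that URI easier to read, here is a minimal sketch that rebuilds the same string from the connection values used earlier. The helper `entity_uri` is hypothetical and not part of deriva-py; it only mirrors the pattern visible in the printed output and does not URL-encode its arguments, which a real client would need to do.

```python
# Illustrative only: rebuilds the URI string from its visible parts.
# `entity_uri` is a hypothetical helper, not a deriva-py function.
def entity_uri(protocol, hostname, catalog_number, schema, table):
    base = f"{protocol}://{hostname}/ermrest/catalog/{catalog_number}"
    # The last segment has the form "<alias>:=<schema>:<table>"; in the
    # output above the alias is the same as the table name.
    return f"{base}/entity/{table}:={schema}:{table}"


uri = entity_uri('https', 'www.facebase.org', 1, 'isa', 'dataset')
print(uri)
```

In practice, `path.uri` remains the way to obtain the URI; the sketch is only meant to show which part of the string comes from which setting.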


## ResultSets
The data from a DataPath are accessed through a pythonic container object, the `ResultSet`. The `ResultSet` is returned by the DataPath's `entities()` and other methods.

In [7]:
results = path.entities()

### Fetch entities from the catalog
Now we can get entities from the server using the ResultSet's `fetch()` method.

In [8]:
results.fetch()

<deriva.core.datapath.ResultSet at 0x113547fd0>

`ResultSet`s behave like Python containers. For example, we can check the number of rows in this ResultSet.

In [9]:
len(results)

815

**Note**: If we had not explicitly called the `fetch()` method, then it would have been called implicitly on the first container operation such as `len(...)`, `list(...)`, `iter(...)` or get item `[...]`.
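
The implicit fetch described in the note can be pictured with a small stand-in class. This is a sketch of the general lazy-loading pattern only: `LazyResults`, `fetch_fn`, and `fetch_count` are hypothetical names, and nothing here is taken from the actual `ResultSet` implementation.

```python
class LazyResults:
    """Hypothetical stand-in for a lazily fetched result container."""

    def __init__(self, fetch_fn):
        self._fetch_fn = fetch_fn  # callable that returns a list of dicts
        self._rows = None          # None means "not fetched yet"
        self.fetch_count = 0       # how many times the source was queried

    def fetch(self):
        self._rows = self._fetch_fn()
        self.fetch_count += 1
        return self

    def _ensure_fetched(self):
        # Fetch only if no earlier call has populated the rows.
        if self._rows is None:
            self.fetch()

    def __len__(self):
        self._ensure_fetched()
        return len(self._rows)

    def __getitem__(self, index):
        self._ensure_fetched()
        return self._rows[index]

    def __iter__(self):
        self._ensure_fetched()
        return iter(self._rows)


lazy = LazyResults(lambda: [{'accession': 'FB00000970'},
                            {'accession': 'FB00000998'}])
before = lazy.fetch_count  # 0: nothing has been fetched yet
size = len(lazy)           # first container operation triggers fetch()
first = lazy[0]            # reuses the rows that were already fetched
```

After these calls `lazy.fetch_count` is 1, because only the first container operation had to fetch.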

### Get an entity
To get one entity from the set, use the usual container operator to get an item.

In [10]:
results[9]

{'id': 14197,
 'accession': 'FB00000953',
 'title': 'FB0109_Male with Robin sequence, cleft palate, midface hypoplasia, round almond-shaped eyes, negative CMA, positive family Hx_Candidate Gene: HOXB2',
 'project': 309,
 'funding': None,
 'summary': None,
 'description': "The purpose of this study is to collect, process, and study samples from individuals with known or possible genetic disease, and their family members. The study’s broad goals are to better understand the genetic causes of disease in order to improve the ability to diagnose, treat, and even prevent illness. Our goal is to obtain a genetic diagnosis for health problem(s) the proband has, so the information can be used, when appropriate, to guide medical decisions made by the affected individuals doctor.\n\n**This is restricted-access human data.**  To gain access to this data, you must first go through the [process outlined here](/odocs/data-guidelines/).\n\nThis case was brought to the attention of FaceBase from Dr. Pe

### Get a specific attribute value from an entity
To get one attribute value from an entity, get the item using its `Column`'s `name` property.

In [11]:
dataset = pb.schemas['isa'].tables['dataset']
print(results[9][dataset.accession.name])

FB00000953


## Fetch a Limited Number of Results
To limit the number of results fetched from the catalog, call `fetch(limit=...)` explicitly with the desired upper limit.

In [12]:
results.fetch(limit=3)
len(results)

3

### Iterate over the ResultSet
`ResultSet`s are iterable like a typical container.

In [13]:
for entity in results:
    print(entity[dataset.accession.name])

FB00000970
FB00000998
FB00000936
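
Because each entity is a plain dictionary, the same loop can also be written as a list comprehension. The sketch below uses hand-written sample rows shaped like the output above rather than a live catalog, and assumes that `dataset.accession.name` evaluates to the string `'accession'`, as the keys in the earlier output suggest.

```python
# Sample rows shaped like the entities printed above (only one key kept).
entities = [
    {'accession': 'FB00000970'},
    {'accession': 'FB00000998'},
    {'accession': 'FB00000936'},
]

# Assumption: this is the value that dataset.accession.name would yield.
column_name = 'accession'

# Collect one attribute value from every entity.
accessions = [entity[column_name] for entity in entities]
print(accessions)
```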


## Convert to Pandas DataFrame
ResultSets can be transformed into the popular Pandas DataFrame.

In [14]:
results.dataframe

| | RCB | RCT | RID | RMB | RMT | _keywords | accession | description | funding | human_anatomic | id | mouse_genetic | project | release_date | released | show_in_jbrowse | study_design | summary | title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://auth.globus.org/f8ae714f-6015-48da-971... | 2018-05-02T10:59:18.781231-07:00 | 3TYP | https://auth.globus.org/b153e992-d274-11e5-8df... | 2018-09-05T10:44:44.083799-07:00 | | FB00000970 | Single Cell RNA-Seq libraries are from C57BL/6... | | | 14214 | | 302 | 2018-05-02 | True | False | | | Single Cell Sequencing - Coronal Suture, Wild ... |
| 1 | https://auth.globus.org/f8ae714f-6015-48da-971... | 2018-08-09T08:59:44.686625-07:00 | 1-3X0M | https://auth.globus.org/b506963e-d274-11e5-99f... | 2018-10-29T18:59:40.882334-07:00 | | FB00000998 | RNA-Seq libraries are from laser capture micro... | | | 14242 | | 302 | 2018-08-29 | True | True | | | Lambdoid Suture, WT and Fgfr2+/S252W (Apert sy... |
| 2 | https://auth.globus.org/93416595-2bc4-42a8-b5b... | 2018-02-21T15:30:25.884648-08:00 | 2AC6 | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-11-28T14:21:46.302023-08:00 | Integrated research of functional genomics and... | FB00000936 | | | | 14180 | | 300 | 2018-03-05 | True | True | | E10.5 mandible of mutants and controls were di... | RNAseq of Wnt1cre;Alk5fl/fl mutants and contro... |


## Selecting Attributes
It is also possible to fetch only a subset of attributes from the catalog. The `attributes(...)` method accepts a variable argument list followed by keyword arguments. Each argument must be a `Column` object from the table's `columns` container.

### Renaming selected attributes
To rename the selected attributes, use "named" (a.k.a. "keyword") arguments in the method. For example, `attributes(..., new_name=table.column)` will rename `table.column` to `new_name` in the entities returned from the server. (It will not change anything in the stored catalog data.) Note that in Python, named arguments _must come after_ positional arguments.

In [15]:
results = path.attributes(dataset.accession, dataset.title, is_released=dataset.released).fetch(limit=5)
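
As a rough local analogy for the selection and renaming performed by that call, the sketch below applies the same idea to a single dictionary. The helper `project` is hypothetical and works on string keys instead of `Column` objects; the real selection is carried out by the catalog, not in client code. The sample row reuses values shown earlier in this notebook.

```python
def project(entity, *names, **renamed):
    """Hypothetical helper: keep `names` unchanged and expose each
    `renamed` source key under its new key."""
    selected = {name: entity[name] for name in names}
    selected.update({new: entity[old] for new, old in renamed.items()})
    return selected


# A trimmed sample row using values from the outputs above.
row = {
    'accession': 'FB00000970',
    'title': 'Single Cell Sequencing - Coronal Suture, Wild Type, E18.5 and P10',
    'released': True,
    'project': 302,
}

# Positional names first, then keyword renames, as Python requires.
renamed_row = project(row, 'accession', 'title', is_released='released')
print(renamed_row)
```

The original `row` is left untouched, which parallels the point that renaming affects only the returned entities and not the stored data.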

### Convert to list
Now we can look at the results from the above fetch. To demonstrate a different access mode, we can convert the entities to a standard Python list and dump it to the console.

In [16]:
list(results)

[{'accession': 'FB00000970',
  'title': 'Single Cell Sequencing - Coronal Suture, Wild Type, E18.5 and P10',
  'is_released': True},
 {'accession': 'FB00000998',
  'title': 'Lambdoid Suture, WT and Fgfr2+/S252W (Apert syndrome mouse model), E16.5 and E18.5',
  'is_released': True},
 {'accession': 'FB00000936',
  'title': 'RNAseq of Wnt1cre;Alk5fl/fl mutants and controls at E10.5',
  'is_released': True},
 {'accession': 'FB00000982',
  'title': 'FB0064_Male with Congenital craniosynostosis_Candidate Gene: FREM2',
  'is_released': True},
 {'accession': 'FB00000858',
  'title': 'Wild Type E18.5 Frontal Suture Images',
  'is_released': True}]