In [1]:
from mdf_forge.forge import Forge  # This is the only required import for Forge.
import json
import os


# Authentication
Authentication is handled automatically by virtue of authenticating with Workbench using
Globus Auth. JSON keys for each scope are stored in the Workbench user's home directory. We retrieve
these as a dictionary to pass into Forge.


In [2]:
token_path = "%s/.globus/oauth2.json" % (os.environ["NDSLABS_HOME"])
with open(token_path, "r") as tf:
    tokens = json.load(tf)
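
The cell above assumes that `NDSLABS_HOME` is set and that the token file exists. A slightly more defensive variant might look like the sketch below. `load_tokens` is a hypothetical helper name used only for illustration; it is not part of Forge.

```python
import json
import os


def load_tokens(home_dir):
    """Load the OAuth2 tokens stored under <home_dir>/.globus/oauth2.json.

    Hypothetical helper for illustration; not part of mdf_forge.
    """
    token_path = os.path.join(home_dir, ".globus", "oauth2.json")
    # Fail early with a readable message if the token file is missing.
    if not os.path.isfile(token_path):
        raise FileNotFoundError("No token file found at %s" % token_path)
    with open(token_path, "r") as tf:
        return json.load(tf)
```

In this notebook it could be called as `tokens = load_tokens(os.environ["NDSLABS_HOME"])`.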

In [3]:
# Forge can normally be set up with no arguments; here we pass in the tokens loaded above
# so that Forge can authenticate and connect to MDF.
mdf = Forge(oauth_tokens=tokens)

# Basic Queries

### Basic full text search
Using the `search()` method, you can perform a basic text search of the data in MDF.
You will get back a list of matching entries (up to 10,000).

Let's say we want to find data on aluminum. We can just search for "Al" like so:

In [None]:
res = mdf.search("Al")
res[0]

### Advanced-mode searches
You can also query more precisely with the `advanced=True` argument. The basic form is `key.subkey:value`. The full documentation for the query syntax can be found here: http://globus-search-docs.s3-website-us-east-1.amazonaws.com/stable/api/search.html#_query_syntax

In this example, we can search for "Al" inside the "mdf.elements" key.

We're also going to limit the number of results to 10.

In [4]:
res = mdf.search("mdf.elements:Al", advanced=True, limit=10)
res[0]

{'mdf': {'collection': 'SLUSCHI',
  'composition': 'Al104',
  'elements': ['Al'],
  'ingest_date': '2017-08-04T19:56:23.221883Z',
  'links': {'landing_page': 'http://blogs.brown.edu/qhong/?page_id=102#330',
   'outcar': {'globus_endpoint': '82f1b5c6-6e9b-11e5-ba47-22000b92c6ec',
    'http_host': 'https://data.materialsdatafacility.org',
    'path': '/collections/sluschi/sluschi/Dir_CoexRun/1200/1200/1000/anal/OUTCAR'},
   'parent_id': '5984ce6bf2c004385fd54cd4'},
  'mdf_id': '5984d167f2c004385fd54e1e',
  'metadata_version': '0.3.2',
  'resource_type': 'record',
  'scroll_id': 330,
  'source_name': 'sluschi',
  'tags': ['outcar'],
  'title': 'SLUSCHI - Al104'}}
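
Each result is a nested dictionary, so individual fields can be read with ordinary key lookups. The sketch below works on a trimmed copy of the record shown above rather than on a live query result, so it runs without a connection to MDF.

```python
# Trimmed copy of the record printed above (only a few of its fields).
record = {
    "mdf": {
        "collection": "SLUSCHI",
        "composition": "Al104",
        "elements": ["Al"],
        "links": {
            "landing_page": "http://blogs.brown.edu/qhong/?page_id=102#330",
        },
        "title": "SLUSCHI - Al104",
    }
}

title = record["mdf"]["title"]
elements = record["mdf"]["elements"]
# .get() avoids a KeyError for fields that a given record may not have.
landing_page = record["mdf"].get("links", {}).get("landing_page")

print(title, elements, landing_page)
```

With a real query, `record` would simply be `res[0]`.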

If you want to search on a value with special characters, such as a colon or space, you must wrap the value in double quotes. Otherwise, you may get unexpected results.
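
As a minimal sketch of the quoting rule above, a small helper can add the double quotes only when they are needed. `field_query` is a hypothetical name, not a Forge function, and treating every non-alphanumeric character as "special" is an assumption made here for simplicity.

```python
def field_query(field, value):
    """Build a `field:value` string for an advanced-mode search.

    Hypothetical helper, not part of mdf_forge. Values containing anything
    other than letters and digits are wrapped in double quotes (an assumed,
    deliberately broad definition of "special characters").
    """
    if '"' in value:
        # Escaping rules for embedded quotes are not covered here.
        raise ValueError("value must not contain double quotes")
    if value.isalnum():
        return "%s:%s" % (field, value)
    return '%s:"%s"' % (field, value)


query = field_query("mdf.title", "ChEMBL Database")
print(query)  # mdf.title:"ChEMBL Database"
```

The resulting string could then be passed to `mdf.search(query, advanced=True)`.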

In [5]:
res = mdf.search('mdf.title:"ChEMBL Database"', advanced=True)
res[0]

{'mdf': {'citation': ["A.P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos and J.P. Overington (2014) 'The ChEMBL bioactivity database: an update.' Nucleic Acids Res., 42 1083-1090. DOI: 10.1093/nar/gkt1031 PMID: 24214965",
   "M. Davies, M. Nowotka, G. Papadatos, F. Atkinson, G.J.P. van Westen, N Dedman, R. Ochoa and J.P. Overington  (2014) 'myChEMBL: A Virtual Platform for Distributing Cheminformatics Tools and Open Data' Challenges 5 (334-337) DOI: 10.3390/challe5020334",
   'S. Jupp, J. Malone, J. Bolleman, M. Brandizi, M. Davies, L. Garcia, A. Gaulton, S. Gehant, C. Laibe, N. Redaschi, S.M Wimalaratne, M. Martin, N. Le Novère, H. Parkinson, E. Birney and A.M Jenkinson (2014) The EBI RDF Platform: Linked Open Data for the Life Sciences Bioinformatics 30 1338-1339 DOI: 10.1093/bioinformatics/btt765 PMID: 24413672'],
  'collection': 'ChEMBL db',
  'data_contact': {'email': 'jpo@ebi.a