In [1]:
from mdf_forge.forge import Forge

In [2]:
mdf = Forge()

# Data Retrieval

### globus_download
To access the raw data underlying entries in MDF, pass the results of `search()` or `aggregate()` to `globus_download()`. You can control how the files are delivered: set `dest` to a destination path (default: the current directory), and set `preserve_dir=True` to recreate the directory structure of the original data.

To download to your own computer with `globus_download()`, you must be running [Globus Connect Personal](https://www.globus.org/app/endpoints/create-gcp). To download to a different computer (which must be a Globus Endpoint), specify `dest_ep=ID_of_destination_endpoint`.

Please note that while _almost_ all data in MDF is accessible through a Globus Endpoint, some entries are not. A few datasets are hosted elsewhere and are only accessible through HTTP (see `http_download()`), or are hosted in a custom configuration that cannot be accessed programmatically. For these datasets, you may have to use the web address in `"mdf.links.landing_page"` to fetch the data.

In [3]:
# Running this example will save a file in the current directory.
res = mdf.search("mdf.tags:DFT AND mdf.resource_type:record", limit=1)
mdf.globus_download(res)

Multiple endpoints found:
1 :  jgaff_laptop 	 ce6d512a-b414-11e7-b0a7-22000a92523b
2 :  MDF AWS MRDP 	 b6cbf972-aded-11e7-afcb-22000a92523b
3 :  MDF Open Connect 	 1d14558e-aebc-11e7-b018-22000a92523b

Please choose the endpoint on this machine
Enter the number of the correct endpoint (-1 to cancel): 1


Processing records: 100%|██████████| 1/1 [00:00<00:00,  3.20it/s]
Submitting transfers: 100%|██████████| 1/1 [00:10<00:00, 10.86s/it]

All transfers submitted
Task IDs: 62508454-bf12-11e7-9473-22000a8cbd7d





['62508454-bf12-11e7-9473-22000a8cbd7d']
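
The call above relies on the defaults. As a minimal sketch of the `dest`, `preserve_dir`, and `dest_ep` options described earlier, the helper below forwards them to `globus_download()`. The helper name `download_preserving_layout` is hypothetical, and `forge` is assumed to be a `Forge` instance such as `mdf`; the keyword names are the ones used in the text above.

```python
def download_preserving_layout(forge, results, dest, dest_ep=None):
    """Download `results` via Globus, recreating the original directory layout.

    `forge` is assumed to be a Forge instance (for example, `mdf` above).
    This helper is illustrative only and is not part of mdf_forge.
    """
    kwargs = {"dest": dest, "preserve_dir": True}
    # Only pass dest_ep when downloading to a different Globus Endpoint;
    # otherwise the call targets the endpoint on the local machine.
    if dest_ep is not None:
        kwargs["dest_ep"] = dest_ep
    return forge.globus_download(results, **kwargs)
```

For example, `download_preserving_layout(mdf, res, "dft_data")` would place the files under a `dft_data` directory, assuming that path is valid on the destination endpoint.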

### http_download
For small data, Globus is not necessary; you can download over HTTP(S) instead. `http_download()` takes the same arguments as `globus_download()`, except that there is no endpoint ID (`dest_ep`).

In [4]:
# Running this example will save a file in the current directory.
res = mdf.search("mdf.source_name:janaf AND mdf.resource_type:record", limit=1)
mdf.http_download(res)

Fetching files: 100%|██████████| 1/1 [00:01<00:00,  1.67s/it]


### http_stream
If you want to use the data directly in your code, use `http_stream()`. It returns a generator that yields the data to you one entry at a time.

In [5]:
res = mdf.search("AlCu", limit=1)
raw_data = mdf.http_stream(res)
next(raw_data)

'This file contains the embrittling potenciesthat might not be suitable for quantitative analysis.,,,,,\rSolvent,Solute,Boundary Studied,Method,"XC Functional, if DFT","Embrittling Potency (kJ/mol, positive denotes a lowering of boundary cohesion)"\rNi,V,Sigma 5 (012)[100],DFT - Slab,Norm-Conserving Pseudopotentials with LDA,72.3\rFe,Cr,Sigma 5 (210),DFT - PBC\'S,GGA,-14.9\rFe,Mo,Sigma 5 (210),DFT - PBC\'S,GGA,10.1\rFe,Nb,Sigma 5 (210),DFT - PBC\'S,GGA,19.5\rFe,Mn,Sigma 11 [1-10]/(11-3),DFT DMOL,?,197.5\rFe,Cr,Sigma 11 [1-10]/(11-3),DFT DMOL,?,-106\rFe,N,N/A,DFT with -  bad for Fe,LDA,\rFe,P,Sigma 3 [1-10](111),DFT DMOL,?,95.4\rNi,H,Sigma 5 (210),DFT FLAPW - slab,GGA,0.3\rFe,H,Sigma 5 [001](310),MD - EAM,,144.5\rNi,He,Sigma 5 (210),DFT-Slab FLAPW,GGA,240.9\rFe,N,{111},MD - Finnis-Sinclair,,139\rFe,O,{111},MD - Finnis-Sinclair,,132\rFe,S,{111},MD - Finnis-Sinclair,,556\rFe,H,{111},MD - Finnis-Sinclair,,23\rFe,P,{111},MD - Finnis-Sinclair,,355\rFe,C,{111},MD - Finnis-Sinclair,,68\rFe,B,{
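
The string shown above appears to be CSV text with a one-line note before the header row and bare carriage returns (`\r`) as line endings. As a rough sketch, text like this can be turned into rows with the standard `csv` module. The helper name `parse_streamed_csv` is hypothetical, the sample is a short excerpt of the output above, and skipping exactly one leading line is an assumption that holds for this file only.

```python
import csv


def parse_streamed_csv(text, skip_lines=1):
    """Parse CSV text into a list of dicts keyed by the header row.

    `skip_lines` is the number of leading non-CSV lines to drop
    (an assumption based on the file shown above).
    """
    # str.splitlines() treats a bare "\r" as a line break.
    lines = text.splitlines()[skip_lines:]
    return list(csv.DictReader(lines))


# Short excerpt of the streamed output above, kept verbatim.
sample = (
    "This file contains the embrittling potenciesthat might not be suitable "
    "for quantitative analysis.,,,,,\r"
    'Solvent,Solute,Boundary Studied,Method,"XC Functional, if DFT",'
    '"Embrittling Potency (kJ/mol, positive denotes a lowering of boundary cohesion)"\r'
    "Ni,V,Sigma 5 (012)[100],DFT - Slab,Norm-Conserving Pseudopotentials with LDA,72.3\r"
    "Fe,Cr,Sigma 5 (210),DFT - PBC'S,GGA,-14.9\r"
)

rows = parse_streamed_csv(sample)
for row in rows:
    print(row["Solvent"], row["Solute"], row["Method"])
```

With a live stream, the same helper could be applied to each entry in turn, for example `parse_streamed_csv(next(raw_data))`, provided the entry is CSV text with the same layout.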