# Downloading files from MDF

Entries in MDF consist of metadata, and sometimes the metadata is all the information you need. If you also need the raw data (and the data is hosted publicly), the Forge methods `Forge.http_download()`, `Forge.globus_download()`, `Forge.http_stream()`, and `Forge.http_return()` provide easy access to it.

```python
from mdf_forge.forge import Forge
```

```python
mdf = Forge()
```

## Search for entries and save results

```python
results = (mdf.match_sources("fe_cr_al_oxidation")
              .match_field("mdf.resource_type", "record")
              .search())
print(len(results))
```

```
1246
```


## Download the raw data files

`Forge.http_download()` downloads up to 10 files directly to disk. `Forge.globus_download()` downloads any number of files directly to disk by submitting a Globus transfer.
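Because `http_download()` is limited to 10 files, a script that does not know its result count in advance has to choose between the two methods. The sketch below wraps that choice in a hypothetical helper, `download_records()`. It assumes each record corresponds to one file (as in the five-record HTTP example in this notebook, which fetches five files) and uses only the call signatures shown in this notebook; it is not part of Forge.

```python
from typing import Any, Dict, List

# Limit taken from the text above; assumes one file per record.
HTTP_DOWNLOAD_LIMIT = 10


def download_records(forge: Any, records: List[Dict[str, Any]], dest: str = ".",
                     preserve_dir: bool = False) -> str:
    """Hypothetical helper: download via HTTP for small result sets, via Globus otherwise.

    Returns the name of the Forge method that was called.
    """
    if len(records) <= HTTP_DOWNLOAD_LIMIT:
        forge.http_download(records, dest=dest, preserve_dir=preserve_dir)
        return "http_download"
    forge.globus_download(records, dest=dest, preserve_dir=preserve_dir)
    return "globus_download"
```

With the objects from this notebook, `download_records(mdf, results[:5], dest=".", preserve_dir=True)` would take the HTTP path, while passing all 1246 `results` would hand off to Globus.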

The `preserve_dir` flag preserves the directory structure of the source. If it is `False` (the default), all of the files are saved directly in the `dest` directory.
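To make the difference concrete, here is a toy path-mapping function. It is not Forge's code: the name `local_path()` and the example paths are hypothetical, and the function only models the behavior described above.

```python
import posixpath


def local_path(remote_path: str, dest: str = ".", preserve_dir: bool = False) -> str:
    """Illustrative only: where a source file would land under `dest`."""
    relative = remote_path.lstrip("/")
    if preserve_dir:
        # Keep the source's directory structure beneath dest
        return posixpath.join(dest, relative)
    # Flatten: keep only the file name, placed directly in dest
    return posixpath.join(dest, posixpath.basename(relative))


print(local_path("/data/run1/a.txt", "out"))                     # out/a.txt
print(local_path("/data/run1/a.txt", "out", preserve_dir=True))  # out/data/run1/a.txt
```

In this model, flattening maps `/data/run1/a.txt` and `/data/run2/a.txt` to the same destination path, which is one reason to set `preserve_dir=True` when a dataset reuses file names across directories.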

```python
# This line actually downloads the data to your hard drive. Uncomment only if you want the files.
#mdf.http_download(results[:5], dest=".", preserve_dir=True)
```

```
Fetching files: 100%|██████████| 5/5 [00:02<00:00,  1.98it/s]
```


```python
# This line actually downloads the data to your hard drive. Uncomment only if you want the files.
#mdf.globus_download(results, dest=".", preserve_dir=True)
```

```
Processing records: 100%|██████████| 1246/1246 [00:00<00:00, 5557.86it/s]
Submitting transfers:   0%|          | 0/1 [00:00<?, ?it/s]
Transferring...
Submitting transfers: 100%|██████████| 1/1 [01:33<00:00, 93.39s/it]
All transfers submitted
Task IDs: 040f9c42-868f-11e7-a92c-22000a92523b

['040f9c42-868f-11e7-a92c-22000a92523b']
```

## Use the files in a script

`Forge.http_stream()` streams the files into a script: it is a Python generator that `yield`s the data one file at a time. `Forge.http_return()` instead returns all of the files at once.
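The memory difference between the two methods comes from generator versus list semantics. The toy functions below (hypothetical names `stream_files()` and `return_files()`, with a caller-supplied `fetch` function standing in for the network download) are not Forge's implementation; they only illustrate the two patterns.

```python
from typing import Callable, Iterable, Iterator, List


def stream_files(paths: Iterable[str], fetch: Callable[[str], str]) -> Iterator[str]:
    """Generator: fetch and yield one file's contents at a time."""
    for path in paths:
        # Only the current file's contents need to be held in memory
        yield fetch(path)


def return_files(paths: Iterable[str], fetch: Callable[[str], str]) -> List[str]:
    """Eager: fetch every file first, then return all contents in one list."""
    return [fetch(path) for path in paths]
```

A generator's body does not run until it is iterated, and it can be consumed only once. If `mdf.http_stream()` behaves like `stream_files()` here, call it again to loop over the files a second time.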

```python
files = mdf.http_stream(results[:5])
# The two methods give the same result in this example, but http_return() holds every file in memory at once
#files = mdf.http_return(results[:5])
```

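```python
for f in files:
    # f holds the text of one file; average the second whitespace-separated column of its non-blank lines
    sum_vals = 0
    tot_vals = 0
    for line in f.splitlines():
        if line.strip():
            sum_vals += float(line.split()[1])
            tot_vals += 1
    if tot_vals:  # skip empty files instead of dividing by zero
        print("Average:", sum_vals / tot_vals)
```

```
Average: 6897.081023454158
Average: 6839.053304904051
Average: 7838.027718550106
Average: 6969.955223880597
Average: 8386.247334754797
```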
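The per-file calculation above can be factored into a reusable function. This is a sketch under the same assumption the loop makes, namely that each file is plain text with whitespace-separated numeric columns; the name `column_average()` is hypothetical.

```python
from typing import Optional


def column_average(text: str, column: int = 1) -> Optional[float]:
    """Average one whitespace-separated numeric column of a text file.

    Blank lines are skipped; returns None if the file has no data lines.
    """
    values = [float(line.split()[column]) for line in text.splitlines() if line.strip()]
    if not values:
        return None
    return sum(values) / len(values)
```

Applied to the streamed files, `for f in mdf.http_stream(results[:5]): print("Average:", column_average(f))` would reproduce the loop above; passing a different `column` averages another field.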
