# Aggregating data with MDF

A single `Forge.search()` call returns at most 10,000 results. Two methods get around this limit: `Forge.aggregate_source()` and `Forge.aggregate()`.
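
To see why a per-query cap does not have to limit the total number of results, here is a minimal sketch of one general approach: split the request into disjoint windows that each fit under the cap, then concatenate the pieces. This is an illustration only and is not necessarily how Forge implements aggregation. `capped_search`, `aggregate_all`, the integer `id` field, and the cap of 3 are all hypothetical stand-ins.

```python
def capped_search(records, lo, hi, limit=3):
    """Hypothetical capped search: at most `limit` hits with lo <= id < hi."""
    hits = [r for r in records if lo <= r["id"] < hi]
    return hits[:limit]


def aggregate_all(records, total, limit=3):
    """Collect every record by querying disjoint id windows no wider than the cap."""
    results = []
    for start in range(0, total, limit):
        # Each window holds at most `limit` ids, so the cap never truncates it.
        results.extend(capped_search(records, start, start + limit, limit))
    return results


# Ten made-up records, queried through a search capped at three hits per call.
records = [{"id": i} for i in range(10)]
everything = aggregate_all(records, total=len(records))
print(len(everything))
```

The sketch relies on every record having a unique, ordered key so the windows never overlap or leave gaps.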

In [None]:
import json
from mdf_forge.forge import Forge

In [None]:
mdf = Forge()

## aggregate_source - NIST XPS DB
Example: We want to collect all records from the NIST XPS Database and analyze the quality metrics. This database has almost 30,000 records, which exceeds the 10,000-result search limit, so we have to use `aggregate_source()`.

In [None]:
# First, let's aggregate all the nist_xps_db data.
all_entries = mdf.aggregate_source("nist_xps_db")
print(len(all_entries))

In [None]:
# Now, let's parse out the "Quality of Data" and print the results for analysis.
qualities = {}
for record in all_entries:
    # Only tally individual records, not dataset-level entries.
    if record["mdf"]["resource_type"] == "record":
        raw = json.loads(record["mdf"]["raw"])
        quality = raw["Quality of Data"]
        qualities[quality] = qualities.get(quality, 0) + 1
print(qualities)
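
The same tally can be tried without a network connection. The sketch below is a possible variant that uses `collections.Counter` and tolerates records missing the "Quality of Data" key. The sample entries only mimic the record structure used above; their quality labels ("A", "B") are made up and are not real NIST XPS values. `tally_quality` is a hypothetical helper.

```python
import json
from collections import Counter

# Hypothetical sample entries shaped like the records above; the values are made up.
sample_entries = [
    {"mdf": {"resource_type": "dataset"}},
    {"mdf": {"resource_type": "record", "raw": json.dumps({"Quality of Data": "A"})}},
    {"mdf": {"resource_type": "record", "raw": json.dumps({"Quality of Data": "B"})}},
    {"mdf": {"resource_type": "record", "raw": json.dumps({"Quality of Data": "A"})}},
    {"mdf": {"resource_type": "record", "raw": json.dumps({})}},
]


def tally_quality(entries):
    """Count "Quality of Data" values across record-type entries."""
    counts = Counter()
    for entry in entries:
        if entry["mdf"]["resource_type"] != "record":
            continue  # skip dataset-level entries
        raw = json.loads(entry["mdf"]["raw"])
        # Records without the key are grouped under "unknown" instead of raising.
        counts[raw.get("Quality of Data", "unknown")] += 1
    return counts


sample_counts = tally_quality(sample_entries)
print(sample_counts.most_common())
```

Passing `all_entries` to `tally_quality` should give the same counts as the loop above, as long as every record carries the "Quality of Data" key.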

## aggregate - Multiple Datasets
Example: We want to analyze how often other elements are studied together with gallium (Ga), and which elemental pairing is the most frequent. More than 10,000 records contain gallium data, so a plain search would be truncated.

In [None]:
# First, let's aggregate everything that has "Ga" in the list of elements.
all_results = mdf.aggregate("material.elements:Ga")
print(len(all_results))

In [None]:
# Now, let's tally every element listed in each record (Ga itself included) to print out.
elements = {}
for record in all_results:
    # Only tally individual records, not dataset-level entries.
    if record["mdf"]["resource_type"] == "record":
        for elem in record["material"]["elements"]:
            elements[elem] = elements.get(elem, 0) + 1
print(json.dumps(elements, sort_keys=True, indent=4, separators=(',', ': ')))
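
To answer the "most frequent pairing" question directly, the tally can leave out Ga itself and rank what remains. The following is a minimal sketch under stated assumptions: the sample results only mimic the record structure used above, their element lists are invented for illustration, and `partner_counts` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample results shaped like the records above; the data is made up.
sample_results = [
    {"mdf": {"resource_type": "record"}, "material": {"elements": ["Ga", "As"]}},
    {"mdf": {"resource_type": "record"}, "material": {"elements": ["Ga", "N"]}},
    {"mdf": {"resource_type": "record"}, "material": {"elements": ["As", "Ga", "In"]}},
    {"mdf": {"resource_type": "dataset"}},
]


def partner_counts(results, target="Ga"):
    """Count how often each element other than `target` appears in record entries."""
    counts = Counter()
    for result in results:
        if result["mdf"]["resource_type"] != "record":
            continue  # skip dataset-level entries
        counts.update(e for e in result["material"]["elements"] if e != target)
    return counts


partners = partner_counts(sample_results)
top_partner, top_count = partners.most_common(1)[0]
print(top_partner, top_count)
```

Calling `partner_counts(all_results)` on the aggregated data would rank gallium's real partners in the same way.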