In [None]:
from mdf_forge.forge import Forge

In [None]:
mdf = Forge()

# Generally Useful Help

### current_query
You can see the query you're currently building with `current_query()`.

In [None]:
mdf.match_field("mdf.source_name", "oqmd")
mdf.current_query()
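Here is a toy sketch of a chainable query builder that does not depend on Forge. `ToyQuery` is a hypothetical class written only for illustration. It is not Forge's implementation, and its query-string format will not match what `current_query()` actually prints.

```python
class ToyQuery:
    """Hypothetical chainable builder; not Forge's implementation or output format."""

    def __init__(self):
        self._terms = []

    def match_field(self, field, value):
        # Require field == value; returning self allows method chaining.
        self._terms.append(f"{field}:{value}")
        return self

    def current_query(self):
        # Show the query string built so far without running it.
        return " AND ".join(self._terms)


q = ToyQuery()
q.match_field("mdf.source_name", "oqmd")
print(q.current_query())  # mdf.source_name:oqmd
```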

### reset_query
If you have a query in memory that you don't want, you can use `reset_query()` to start a new query. This method will clear the current query entirely.

In [None]:
mdf.reset_query()

In [None]:
mdf.current_query()
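In a toy model like the one below, resetting just discards every accumulated term. After a reset, `current_query()` returns an empty string. `ToyQuery` is again a hypothetical class, not Forge's code.

```python
class ToyQuery:
    """Hypothetical builder showing what "reset" means; not Forge's code."""

    def __init__(self):
        self._terms = []

    def match_field(self, field, value):
        self._terms.append(f"{field}:{value}")
        return self

    def current_query(self):
        return " AND ".join(self._terms)

    def reset_query(self):
        # Drop every accumulated term so the next query starts from scratch.
        self._terms.clear()
        return self


q = ToyQuery().match_field("mdf.source_name", "oqmd")
q.reset_query()
print(repr(q.current_query()))  # ''
```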

### Query info
We can build a query using `exclude_field()` and `match_field()` and execute it with `search()`. If you want to know more about the query, including the exact query string that was sent, pass the `info=True` argument to `search()`.

In [None]:
# Records containing Al, excluding the SLUSCHI and OQMD datasets
(mdf.exclude_field("mdf.source_name", "sluschi")
    .match_field("material.elements", "Al")
    .exclude_field("mdf.source_name", "oqmd"))
res, info = mdf.search(limit=10, info=True)

When you use the `info=True` argument, `search()` will return a tuple instead of a list. The first element in the tuple will be the same list of results you're used to, but the second tuple element will be a dictionary of query info.

In [None]:
res[0]

In [None]:
info
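The change in return type is easy to trip over. Without `info=True` you get a list. With it you get a two-element tuple, which you normally unpack. The hypothetical `toy_search()` below only mimics that return shape. Its `query`/`q` and `total_query_matches` keys echo the ones used in this notebook, but the dictionary itself is made up.

```python
def toy_search(records, q, limit=10, info=False):
    """Hypothetical search that mimics the info=True return shape (not Forge's code)."""
    results = records[:limit]
    if not info:
        return results  # plain list of results
    # With info=True, return a (results, metadata) tuple instead of a bare list.
    meta = {"query": {"q": q}, "total_query_matches": len(records)}
    return results, meta


data = [{"id": i} for i in range(25)]
res = toy_search(data, "material.elements:Al")                   # a list
res, info = toy_search(data, "material.elements:Al", info=True)  # a tuple, unpacked
print(len(res), info["total_query_matches"], info["query"]["q"])
# 10 25 material.elements:Al
```

A common mistake is to write `res = mdf.search(..., info=True)` without unpacking. In that case `res[0]` is the whole results list rather than the first record.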

### Repeat a query
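By default, `search()` clears the query from memory after it runs. To keep the query so you can run it again, pass the `reset_query=False` argument. Below, the first search keeps the query. The second search then runs the same query and clears it afterward.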

In [None]:
mdf.match_field("mdf.source_name", "nist_xps_db")

In [None]:
res, info = mdf.search(limit=10, info=True, reset_query=False)
info["query"]["q"]

In [None]:
res, info = mdf.search(limit=10, info=True)
info["query"]["q"]
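The sequence of calls above can be modelled with a small hypothetical class. `ToySearcher` does not contact any service; it only returns the query string it would have sent. It shows why the second search still sees the query, while a third search would start empty.

```python
class ToySearcher:
    """Hypothetical model of reset_query behavior; not Forge's implementation."""

    def __init__(self):
        self._terms = []

    def match_field(self, field, value):
        self._terms.append(f"{field}:{value}")
        return self

    def search(self, reset_query=True):
        q = " AND ".join(self._terms)
        # A real search would send q to the service here.
        if reset_query:
            self._terms = []  # default: forget the query after running it
        return q


s = ToySearcher().match_field("mdf.source_name", "nist_xps_db")
first = s.search(reset_query=False)  # query kept in memory
second = s.search()                  # same query runs again, then is cleared
third = s.search()                   # nothing left to run
print(first, second, repr(third))
```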

### show_fields
How do you know which fields are available to search? Use `show_fields()` to find out. If you call `show_fields()` with no arguments, it shows all of the top-level blocks (such as "mdf").

In [None]:
mdf.show_fields()

If you give `show_fields()` a top-level block, it will show you the mapping for that block, including the expected datatypes.

In [None]:
mdf.show_fields("mdf")
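A field mapping is a nested structure. A field name such as `mdf.source_name` is the path through that structure, joined with dots. The sketch below uses a hand-written, hypothetical mapping to show how top-level blocks relate to dotted field names. Its datatypes are illustrative, not taken from MDF.

```python
# Hypothetical, hand-written mapping; the real one comes from show_fields().
mapping = {
    "mdf": {"source_name": "keyword", "resource_type": "keyword"},
    "material": {"elements": "keyword"},
    "dft": {"converged": "boolean"},
}


def flatten(block, prefix=""):
    """Yield (dotted_field_name, datatype) pairs from a nested mapping."""
    for key, value in block.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, name)  # descend into nested blocks
        else:
            yield name, value


print(sorted(mapping))                       # ['dft', 'material', 'mdf']
print(dict(flatten(mapping["mdf"], "mdf")))  # fields in the "mdf" block
```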

# Fetching Datasets

### fetch_datasets_from_results
This method automatically collects all the datasets that contain records returned by a search. For example, suppose you search for `material.elements:Al` and a _record_ from OQMD is returned. You can pass that record to `fetch_datasets_from_results()` and get the OQMD _dataset_ entry back.

In [None]:
records = mdf.search("dft.converged:true AND mdf.resource_type:record")

In [None]:
res = mdf.fetch_datasets_from_results(records)
res[0]

If you don't need the records themselves, you can call `fetch_datasets_from_results()` without passing any results. It will then execute the current query and use those results instead.

In [None]:
res = mdf.match_field("material.elements", "Al").fetch_datasets_from_results()
res[0]
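Both calls above reduce many records to the few distinct datasets they came from. The hypothetical helper below sketches only that first step. It assumes each record is a dictionary that stores its dataset name under `record["mdf"]["source_name"]`. Looking up the actual dataset entries is left to Forge.

```python
def dataset_names_from_records(records):
    """Hypothetical helper: unique dataset source names, in first-seen order.

    Assumes each record is a dict with record["mdf"]["source_name"].
    """
    seen = []
    for rec in records:
        name = rec.get("mdf", {}).get("source_name")
        if name is not None and name not in seen:  # skip missing names and repeats
            seen.append(name)
    return seen


records = [
    {"mdf": {"source_name": "oqmd"}},
    {"mdf": {"source_name": "oqmd"}},
    {"mdf": {"source_name": "nist_xps_db"}},
]
print(dataset_names_from_records(records))  # ['oqmd', 'nist_xps_db']
```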

# Aggregations

### aggregate
Queries submitted with `search()` are limited to returning 10,000 results. If this limit is too low, you can use `aggregate()` to retrieve _all_ results from a query, no matter how many. Be careful with this function, because it is easy to retrieve a very large number of results by accident. Consider running `search(your_query, limit=0, info=True)` (see above) first to find out how many results you will get.
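The advice above amounts to a simple decision rule, sketched here with a hypothetical `choose_method()` function. The 10,000 figure is the `search()` limit described above. `max_wanted` is an arbitrary threshold you would choose based on your own bandwidth and memory.

```python
SEARCH_LIMIT = 10_000  # per-query cap on search(), as stated above


def choose_method(total_matches, max_wanted=50_000):
    """Hypothetical decision rule returning 'search', 'aggregate', or 'refine'."""
    if total_matches <= SEARCH_LIMIT:
        return "search"     # one search() call can return everything
    if total_matches <= max_wanted:
        return "aggregate"  # too many for search(), but an acceptable amount
    return "refine"         # narrow the query before downloading


print(choose_method(1_200), choose_method(30_000), choose_method(2_000_000))
# search aggregate refine
```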

For this example, we will see how many results the query will retrieve before aggregating.

In [None]:
mdf.match_field("mdf.source_name", "oqmd").match_field("material.elements", "Pb").exclude_field("material.elements", "Al")
res, info = mdf.search(limit=0, info=True, reset_query=False)
print("Number of results:", info["total_query_matches"])

Assuming we want all of these results, we can use `aggregate()` on the same query.

In [None]:
res = mdf.aggregate()
print("Number of results:", len(res))
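To retrieve everything past a per-request cap, a client generally issues several smaller requests and concatenates the results. The sketch below shows generic offset-based paging with a hypothetical `fetch_page(offset, size)` callable. It illustrates the idea only and is not necessarily how `aggregate()` works internally.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect every result by requesting pages until one comes back short.

    fetch_page(offset, size) is a hypothetical callable returning a list.
    """
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we are done
            return results
        offset += page_size


data = list(range(2500))
all_items = fetch_all(lambda off, n: data[off:off + n], page_size=1000)
print(len(all_items))  # 2500
```

Stopping on a short page means the loop does not need the total count up front. If the total is an exact multiple of `page_size`, the loop makes one extra request, which returns an empty page.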