# NOMAD Repository Tutorial

This notebook demonstrates how to access and use the NOMAD (Novel Materials Discovery) repository, an open materials science database.

## What is NOMAD?

NOMAD is a large-scale repository and archive for materials science data, particularly computational results. It contains:
- DFT calculations
- Crystal structures
- Electronic properties
- Thermodynamic data
- And much more!

## Resources

- **Website**: https://nomad-lab.eu/
- **Documentation**: https://nomad-lab.eu/prod/v1/docs/
- **API Documentation**: https://nomad-lab.eu/prod/v1/api/v1/extensions/docs

## Installation

```bash
pip install nomad-lab
# OR just use requests for API access
pip install requests
```

In [41]:
import requests
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint

## NOMAD API Basics

NOMAD's API is RESTful and doesn't require authentication for public data access.

Base URL: `https://nomad-lab.eu/prod/v1/api/v1`

In [42]:
# NOMAD API base URL
BASE_URL = "https://nomad-lab.eu/prod/v1/api/v1"

# Common headers
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}

## Simple Search: Find Materials

Let's search for lithium-containing materials

In [43]:
# Search for Li-containing materials (fixed)

query = {
    "query": {
        "results.material.elements": "Li"
    },
    "pagination": {
        "page_size": 10
    },
    "required": {
        "include": [
            "entry_id",
            "results.material.chemical_formula_reduced",
            "results.material.structural_type"
            # removed invalid band gap field
        ]
    }
}

response = requests.post(
    f"{BASE_URL}/entries/query",
    headers=headers,
    json=query
)

data = response.json()

# Always sanity-check the response
if response.status_code != 200:
    print("Request failed:", response.status_code)
    print(data)
else:
    # Be defensive about where "total" lives
    total = (
        data.get("pagination", {}).get("total") or
        data.get("meta", {}).get("total") or
        data.get("total") or
        "unknown"
    )

    results = data.get("data", [])

    print(f"Found {total} Li-containing entries")
    print(f"\nShowing first {len(results)} results:")

    for entry in results:
        formula = entry.get("results", {}).get("material", {}).get(
            "chemical_formula_reduced", "N/A"
        )
        entry_id = entry.get("entry_id", "N/A")
        print(f"  {entry_id}: {formula}")


Found 864867 Li-containing entries

Showing first 10 results:
  ---OWGSfJGTXSTY71gcX0l_HFO9A: LiSr2Zr
  ---g0sxp4SCw_qXNyXQv8SacDS10: HgInLiPb2
  ---pRcX7NG_XDx_4ufUaeEnZnmrO: AsLiTi2
  ---sltur0gBkJQUobdR596MoEZZU: CrLi4Pd
  --0TWjKKNj9HLjh3f46XUqenSjm-: CsGe4Li12Na3O16
  --0ttXeSYjWlLRMm2OIrzujNOxCb: H4Li4
  --2bcdlfx7ymcDSlr2BF7l6PSBkm: Li2O6Rh2Te
  --2cfmhLo8lqgiB2sosnn9Z2R0Ha: Ac3LiO9Pt2
  --32_Acudfq_GmS30Xh7Hx6mG3_z: CuLiRe2
  --3_nOBXqml3RI6t_FLQc_m-nPTx: I3Li3


## Search for Specific Compound: LiFePO4

Let's find all entries for LiFePO4 (a common battery material)

In [44]:
import pandas as pd
import requests
from collections import Counter
import re

def parse_formula(formula: str):
    """
    Very small formula parser for strings like 'Fe2Li3O14P2' (no parentheses).
    Returns Counter of element -> count (int).
    """
    if not formula:
        return None
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if not tokens:
        return None
    c = Counter()
    for el, num in tokens:
        c[el] += int(num) if num else 1
    return c

TARGET = Counter({"Li": 1, "Fe": 1, "P": 1, "O": 4})
ALLOWED_ELEMENTS = set(TARGET.keys())

lifep_rows = []
page_size = 200
max_pages = 20  # safety cap

page = 1
for _ in range(max_pages):
    query = {
        "query": {
            "results.material.elements": ["Li", "Fe", "P", "O"]
        },
        "pagination": {
            "page_size": page_size,
            "page": page
        },
        "required": {
            "include": [
                "entry_id",
                "results.material.chemical_formula_reduced",
                "results.material.structural_type"
            ]
        }
    }

    response = requests.post(f"{BASE_URL}/entries/query", headers=headers, json=query)
    data = response.json()

    if response.status_code != 200:
        print("Request failed:", response.status_code)
        print(data)
        break

    rows = data.get("data", [])
    if not rows:
        break

    for entry in rows:
        results = entry.get("results", {})
        material = results.get("material", {})
        formula = material.get("chemical_formula_reduced")

        comp = parse_formula(formula)
        if not comp:
            continue

        # Must be exactly LiFePO4: only Li/Fe/P/O elements AND exact stoichiometry 1:1:1:4
        if set(comp.keys()) == ALLOWED_ELEMENTS and comp == TARGET:
            lifep_rows.append({
                "entry_id": entry.get("entry_id"),
                "formula": formula,
                "structural_type": material.get("structural_type"),
            })

    # stop early if you already found some
    if len(lifep_rows) >= 20:
        break

    page += 1

df_lifep = pd.DataFrame(lifep_rows)
print(f"Found {len(df_lifep)} entries matching exact LiFePO4 stoichiometry (Li1 Fe1 P1 O4).")
print(df_lifep.head(20))


Found 24 entries matching exact LiFePO4 stoichiometry (Li1 Fe1 P1 O4).
                        entry_id  formula structural_type
0   -0WLGMXIAzWMdptBwdg_oIm6mwIE  FeLiO4P            bulk
1   -8cUk0FONcmvFwnVeDP7LJknbdUg  FeLiO4P            bulk
2   -PRG0Pcfy7iJsN6pYRLTg2CvvpRY  FeLiO4P            bulk
3   -S5lLVd4mms2DNIbvDFTgkamoMQH  FeLiO4P            bulk
4   -_jD7njBGmMLhdNkr4KF32J7gcSP  FeLiO4P            bulk
5   0KhH9_CG7qDXqICK0ScZrRm1JuJy  FeLiO4P            bulk
6   0XnhQqEKMVWdWTq3tlQVEXGSzBZh  FeLiO4P            bulk
7   0j_ItjHHdveAusconu_h2ObRyT0_  FeLiO4P            bulk
8   14VtntFqJxJnGVZ1W0Adbf9WnS_H  FeLiO4P            bulk
9   1cZvUYAiTGZXVwzT_LTjIDNp0O6l  FeLiO4P            bulk
10  2VKkBrC9dRHx8MXzCn0CPbflF-Y5  FeLiO4P            bulk
11  3CDg2K-Bsq8Yab9XfJ5YAe2a7JP4  FeLiO4P            bulk
12  3QD0gLKaCET0K12lgj8st3z9YUun  FeLiO4P            bulk
13  3QJdstaLs7pC3VN51tYBKM5XBTnA  FeLiO4P     unavailable
14  3lxiQzu284zkfrjB6zxH9kvADL26  FeLiO4P            bulk
1

## Get Detailed Information for a Specific Entry

Let's download complete data for one entry

In [45]:
import json
import requests
from pathlib import Path

# Pick an entry_id from the matches we actually found
if "df_lifep" in globals() and len(df_lifep) > 0:
    entry_id = df_lifep.iloc[0]["entry_id"]
elif "lifep_rows" in globals() and len(lifep_rows) > 0:
    entry_id = lifep_rows[0]["entry_id"]
else:
    raise ValueError("No LiFePO4 matches available. Run the LiFePO4 filtering cell first.")

print(f"Fetching details for entry: {entry_id}")

# Fetch full entry
response = requests.get(f"{BASE_URL}/entries/{entry_id}", headers=headers)
entry_details = response.json()

if response.status_code != 200:
    print("Request failed:", response.status_code)
    print(json.dumps(entry_details, indent=2)[:2000])
else:
    # Save to file (local "download")
    out_path = Path(f"entry_{entry_id}.json")
    out_path.write_text(json.dumps(entry_details, indent=2))
    print(f"Saved full entry JSON to: {out_path.resolve()}")

    # Quick peek at structure
    print("\nTop-level keys:")
    print(list(entry_details.keys()))

    # Try to print material summary if present
    data_block = entry_details.get("data", entry_details)  # some APIs wrap in 'data', some don't
    results = data_block.get("results", {})
    material = results.get("material", {})

    if material:
        print("\nMaterial Information:")
        print(f"  Formula: {material.get('chemical_formula_reduced')}")
        print(f"  Elements: {material.get('elements')}")
        print(f"  Structural type: {material.get('structural_type')}")


Fetching details for entry: -0WLGMXIAzWMdptBwdg_oIm6mwIE
Saved full entry JSON to: G:\My Drive\teaching\5540-6640 Materials Informatics\MaterialsInformatics\worked_examples\NOMAD_example\entry_-0WLGMXIAzWMdptBwdg_oIm6mwIE.json

Top-level keys:
['entry_id', 'required', 'data']

Material Information:
  Formula: FeLiO4P
  Elements: ['Fe', 'Li', 'O', 'P']
  Structural type: bulk


## Search with Property Filters

Find materials with specific band gaps

In [None]:
#TBD

## Download Crystal Structure Data

NOMAD allows you to download crystal structure in various formats

In [57]:
#TBD

## Comparing NOMAD with Materials Project

Key Differences:

| Feature | NOMAD | Materials Project |
|---------|-------|------------------|
| Data Type | All computational data | Curated DFT results |
| Access | No API key required | Requires API key |
| Data Volume | Very large (~millions) | Large (~150k) |
| Upload | Yes (open repository) | No (curated only) |
| Quality Control | Minimal (raw data) | High (curated) |
| Standardization | Variable | Highly standardized |
| Best For | Raw data, uploads | Reliable properties |

### When to use NOMAD:
- You want access to raw calculation data
- You need to upload your own data
- You're looking for specific calculation types
- You want to compare different computational methods

### When to use Materials Project:
- You need reliable, curated data
- You want standardized properties
- You need thermodynamic data
- You're doing high-throughput screening

## Export Downloaded Data

In [49]:
# Save the LiFePO4 data to CSV
if len(lifep_data) > 0:
    df_lifep.to_csv('nomad_lifep_data.csv', index=False)
    print(f"Saved {len(df_lifep)} LiFePO4 entries to nomad_lifep_data.csv")
    
# Save band gap data
if len(band_gaps) > 0:
    df_bandgap = pd.DataFrame({
        'formula': formulas,
        'band_gap': band_gaps
    })
    df_bandgap.to_csv('nomad_bandgap_data.csv', index=False)
    print(f"Saved {len(df_bandgap)} band gap entries to nomad_bandgap_data.csv")

Saved 2 band gap entries to nomad_bandgap_data.csv


## Uploading Data to NOMAD

To upload data to NOMAD:

1. **Create an account** at https://nomad-lab.eu/
2. **Prepare your data** in a supported format (VASP, Quantum Espresso, etc.)
3. **Use the web interface** or API to upload

### Upload via Web Interface:
- Go to https://nomad-lab.eu/prod/v1/gui/uploads
- Click "New Upload"
- Drag and drop your calculation files
- Add metadata and publish

### Upload via API:
```python
# Requires authentication token
files = {'file': open('calculation.out', 'rb')}
response = requests.post(
    f"{BASE_URL}/uploads",
    headers={'Authorization': f'Bearer {token}'},
    files=files
)
```

**Note**: For HW1, you can use the staging server for testing uploads without affecting production data.

## Exercise: Explore NOMAD

Try these tasks:

1. Search for a different element or compound system
2. Download data and create visualizations
3. Compare properties from multiple entries
4. Export data in different formats
5. Explore the metadata available for entries

**Challenge**: Find overlapping materials between NOMAD and Materials Project and compare their properties!

In [50]:
# Your code here
# Try your own search!


## Additional Resources

- **NOMAD Website**: https://nomad-lab.eu/
- **Documentation**: https://nomad-lab.eu/prod/v1/docs/
- **API Docs**: https://nomad-lab.eu/prod/v1/api/v1/extensions/docs
- **Tutorials**: https://nomad-lab.eu/prod/v1/docs/tutorials.html
- **GitHub**: https://github.com/nomad-coe