# NOMAD Repository Tutorial

This notebook demonstrates how to access and use the NOMAD (Novel Materials Discovery) repository, an open materials science database.

## What is NOMAD?

NOMAD is a large-scale repository and archive for materials science data, particularly computational results. It contains:
- DFT calculations
- Crystal structures
- Electronic properties
- Thermodynamic data
- And much more!

## Resources

- **Website**: https://nomad-lab.eu/
- **Documentation**: https://nomad-lab.eu/prod/v1/docs/
- **API Documentation**: https://nomad-lab.eu/prod/v1/api/v1/extensions/docs

## Installation

```bash
pip install nomad-lab
# OR install only the packages this notebook imports for API access
pip install requests pandas numpy matplotlib
```

In [None]:
import requests
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint

## NOMAD API Basics

NOMAD's API is RESTful and doesn't require authentication for public data access.

Base URL: `https://nomad-lab.eu/prod/v1/api/v1`

In [None]:
# NOMAD API base URL
BASE_URL = "https://nomad-lab.eu/prod/v1/api/v1"

# Common headers
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}
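
Every search in this notebook posts a JSON body with the same three parts: a `query`, a `pagination` block, and a `required.include` list of fields to return. A minimal sketch of a helper that assembles that body follows. `build_payload` is a hypothetical name introduced for illustration; it is not part of NOMAD or `nomad-lab`.

```python
from typing import Any, Dict, List


def build_payload(query: Dict[str, Any], fields: List[str], page_size: int = 10) -> Dict[str, Any]:
    """Assemble the request body that the search cells in this notebook send."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return {
        "query": query,                          # filter conditions
        "pagination": {"page_size": page_size},  # number of entries per page
        "required": {"include": list(fields)},   # fields to return for each entry
    }


payload = build_payload(
    {"results.material.elements": "Li"},
    ["entry_id", "results.material.chemical_formula_reduced"],
    page_size=5,
)
print(payload)
```

The result can be passed directly as the `json=` argument of `requests.post(f"{BASE_URL}/entries/query", headers=headers, json=payload)`.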

## Simple Search: Find Materials

Let's search for lithium-containing materials, returning the first 10 entries with a few selected fields.

In [None]:
# Search for Li-containing materials
query = {
    "query": {
        "results.material.elements": "Li"
    },
    "pagination": {
        "page_size": 10  # Limit results
    },
    "required": {
        "include": [
            "entry_id",
            "results.material.chemical_formula_reduced",
            "results.material.structural_type",
            "results.properties.electronic.band_structure_electronic.band_gap"
        ]
    }
}

response = requests.post(
    f"{BASE_URL}/entries/query",
    headers=headers,
    json=query
)

response.raise_for_status()  # stop on an HTTP error instead of failing later on a missing key
data = response.json()
print(f"Found {data['pagination']['total']} Li-containing entries")
print(f"\nShowing first {len(data['data'])} results:")

for entry in data['data']:
    formula = entry.get('results', {}).get('material', {}).get('chemical_formula_reduced', 'N/A')
    entry_id = entry.get('entry_id', 'N/A')
    print(f"  {entry_id}: {formula}")
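
The search above returns only one page of results. NOMAD's search responses carry a `next_page_after_value` in their `pagination` block, which can be sent back as `page_after_value` to request the following page. The sketch below walks pages until a limit is reached. `iter_entries` and the `fetch_page` callable are hypothetical names, and the fake fetcher exists only so the example runs offline.

```python
from typing import Any, Callable, Dict, Iterator


def iter_entries(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    payload: Dict[str, Any],
    max_entries: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield entries page by page until max_entries or the last page is reached."""
    body = dict(payload)
    body["pagination"] = dict(payload.get("pagination", {}))
    yielded = 0
    while yielded < max_entries:
        page = fetch_page(body)
        for entry in page.get("data", []):
            if yielded >= max_entries:
                return
            yield entry
            yielded += 1
        next_value = page.get("pagination", {}).get("next_page_after_value")
        if not next_value or not page.get("data"):
            return  # no further pages
        body["pagination"]["page_after_value"] = next_value


# Offline stand-in for the API: serves 5 made-up entries in pages.
def fake_fetch(body: Dict[str, Any]) -> Dict[str, Any]:
    ids = [f"entry-{i}" for i in range(5)]
    start = int(body["pagination"].get("page_after_value", 0))
    size = body["pagination"]["page_size"]
    chunk = ids[start:start + size]
    end = start + len(chunk)
    pagination = {"total": len(ids)}
    if end < len(ids):
        pagination["next_page_after_value"] = str(end)
    return {"data": [{"entry_id": i} for i in chunk], "pagination": pagination}


collected = [e["entry_id"] for e in iter_entries(fake_fetch, {"pagination": {"page_size": 2}})]
print(collected)
```

With the real API, `fetch_page` could be something like `lambda body: requests.post(f"{BASE_URL}/entries/query", headers=headers, json=body).json()`.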

## Search for Specific Compound: LiFePO4

Let's find entries for LiFePO4, a common battery cathode material. The query below returns the first 20 matches.

In [None]:
# Search for LiFePO4
# NOMAD's reduced formulas list elements alphabetically, so LiFePO4 is indexed as "FeLiO4P"
query = {
    "query": {
        "results.material.chemical_formula_reduced": "FeLiO4P"
    },
    "pagination": {
        "page_size": 20
    },
    "required": {
        "include": [
            "entry_id",
            "results.material.chemical_formula_reduced",
            "results.material.structural_type",
            "results.properties.electronic.band_structure_electronic.band_gap",
            "results.properties.structures.structure_original.lattice_parameters",
            "results.properties.structures.structure_original.space_group_symbol"
        ]
    }
}

response = requests.post(
    f"{BASE_URL}/entries/query",
    headers=headers,
    json=query
)

response.raise_for_status()  # stop on an HTTP error instead of failing later on a missing key
data = response.json()
print(f"Found {data['pagination']['total']} LiFePO4 entries\n")

# Extract data into a list
lifep_data = []
for entry in data['data']:
    try:
        results = entry.get('results', {})
        properties = results.get('properties', {})
        
        # Try to get band gap
        band_gap = None
        if 'electronic' in properties:
            band_gap_list = properties['electronic'].get('band_structure_electronic', [])
            if band_gap_list and len(band_gap_list) > 0:
                band_gap = band_gap_list[0].get('band_gap', [{}])[0].get('value')
        
        # Try to get space group
        space_group = None
        if 'structures' in properties:
            struct_orig = properties['structures'].get('structure_original', {})
            # The API may return this section as a single object or as a list of objects
            if isinstance(struct_orig, list):
                struct_orig = struct_orig[0] if struct_orig else {}
            space_group = struct_orig.get('space_group_symbol')
        
        lifep_data.append({
            'entry_id': entry.get('entry_id'),
            'formula': results.get('material', {}).get('chemical_formula_reduced'),
            'space_group': space_group,
            'band_gap': band_gap
        })
    except Exception as e:
        print(f"Error processing entry: {e}")
        continue

# Create DataFrame
df_lifep = pd.DataFrame(lifep_data)
print(df_lifep.head(10))
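
The nested `.get()` chains above are easy to get wrong, because a section may be missing, may be a single object, or may be a list of objects. One way to tidy this up is a small path helper. `dig` is a hypothetical helper, and the sample entry is made up to mirror the nesting that the extraction code above expects.

```python
from typing import Any, Optional


def dig(obj: Any, path: str, default: Optional[Any] = None) -> Any:
    """Follow a dotted path through nested dicts; for a list, descend into its first item."""
    current = obj
    for key in path.split("."):
        if isinstance(current, list):
            if not current:
                return default
            current = current[0]
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up entry mimicking the nesting used in the cell above.
sample = {
    "entry_id": "abc123",
    "results": {
        "material": {"chemical_formula_reduced": "FeLiO4P"},
        "properties": {
            "electronic": {
                "band_structure_electronic": [{"band_gap": [{"value": 0.5}]}]
            }
        },
    },
}

gap = dig(sample, "results.properties.electronic.band_structure_electronic.band_gap.value")
missing = dig(sample, "results.properties.structures.structure_original.space_group_symbol")
print(gap, missing)
```

Taking only the first item of each list matches what the cell above does with `[0]`; if an entry holds several band structures, you would need to iterate instead.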

## Get Detailed Information for a Specific Entry

Let's fetch the metadata record for a single entry.

In [None]:
# Get the first entry ID
if len(lifep_data) > 0:
    entry_id = lifep_data[0]['entry_id']
    print(f"Fetching details for entry: {entry_id}")
    
    response = requests.get(
        f"{BASE_URL}/entries/{entry_id}",
        headers=headers
    )
    
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error body
    entry_details = response.json()
    
    # Print structure of the response
    print("\nAvailable data fields:")
    print(json.dumps(list(entry_details.keys()), indent=2))
    
    # Access specific data
    if 'data' in entry_details:
        results = entry_details['data'].get('results', {})
        material = results.get('material', {})
        
        print("\nMaterial Information:")
        print(f"  Formula: {material.get('chemical_formula_reduced')}")
        print(f"  Elements: {material.get('elements')}")
        print(f"  Structural type: {material.get('structural_type')}")
else:
    print("No entries found for LiFePO4")
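
Printing only the top-level keys hides most of the record. A short recursive walk can list every dotted path in a response, which helps when deciding what to put in `required.include`. This is a sketch with a hypothetical `list_paths` helper and a made-up record.

```python
from typing import Any, List


def list_paths(obj: Any, prefix: str = "", max_depth: int = 6) -> List[str]:
    """Return sorted dotted paths to every leaf value in nested dicts and lists."""
    if max_depth == 0 or not isinstance(obj, (dict, list)):
        return [prefix] if prefix else []
    paths: List[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            paths.extend(list_paths(value, child, max_depth - 1))
    else:
        # Lists add no path segment, so all items of a list share one path.
        for item in obj:
            paths.extend(list_paths(item, prefix, max_depth - 1))
    return sorted(set(paths))


# Made-up record for demonstration only.
record = {
    "entry_id": "abc123",
    "results": {
        "material": {"elements": ["Fe", "Li", "O", "P"], "structural_type": "bulk"},
    },
}
for path in list_paths(record):
    print(path)
```

On a real response you could call `list_paths(entry_details['data'])` and scan the output for the quantities you need.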

## Search with Property Filters

Combine an element filter with a numeric range on the band gap.

In [None]:
# Search for materials with band gap between 1 and 3 eV containing Li
query = {
    "query": {
        "and": [
            {"results.material.elements": "Li"},
            {"results.properties.electronic.band_structure_electronic.band_gap.value:gte": 1.0},
            {"results.properties.electronic.band_structure_electronic.band_gap.value:lte": 3.0}
        ]
    },
    "pagination": {
        "page_size": 50
    },
    "required": {
        "include": [
            "entry_id",
            "results.material.chemical_formula_reduced",
            "results.properties.electronic.band_structure_electronic.band_gap"
        ]
    }
}

response = requests.post(
    f"{BASE_URL}/entries/query",
    headers=headers,
    json=query
)

response.raise_for_status()  # stop on an HTTP error instead of failing later on a missing key
data = response.json()
print(f"Found {data['pagination']['total']} Li materials with band gap 1-3 eV")

# Extract and visualize
band_gaps = []
formulas = []

for entry in data['data']:
    try:
        results = entry.get('results', {})
        formula = results.get('material', {}).get('chemical_formula_reduced')
        
        properties = results.get('properties', {})
        if 'electronic' in properties:
            band_gap_list = properties['electronic'].get('band_structure_electronic', [])
            if band_gap_list and len(band_gap_list) > 0:
                bg_value = band_gap_list[0].get('band_gap', [{}])[0].get('value')
                if bg_value is not None:
                    band_gaps.append(bg_value)
                    formulas.append(formula)
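    except (KeyError, IndexError, TypeError, AttributeError):
        # Skip entries whose band gap data is missing or has an unexpected shape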
        continue

print(f"\nExtracted {len(band_gaps)} band gap values")
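
One thing to check when filtering on numeric properties is units. NOMAD's archive generally stores quantities in SI units, so it is worth confirming whether the band gap values you get back are in joules rather than eV; values around 1e-19 would suggest joules. Assuming that is the case, the sketch below builds the same `:gte`/`:lte` range conditions used above from bounds given in eV. The helper names are hypothetical.

```python
from typing import Any, Dict, List

EV_IN_JOULE = 1.602176634e-19  # exact by the 2019 SI definition of the elementary charge

BAND_GAP = "results.properties.electronic.band_structure_electronic.band_gap.value"


def ev_to_joule(value_ev: float) -> float:
    return value_ev * EV_IN_JOULE


def joule_to_ev(value_j: float) -> float:
    return value_j / EV_IN_JOULE


def band_gap_range(min_ev: float, max_ev: float) -> List[Dict[str, Any]]:
    """Build band gap range conditions, converting eV bounds to joules (assumed API unit)."""
    if min_ev > max_ev:
        raise ValueError("min_ev must not exceed max_ev")
    return [
        {f"{BAND_GAP}:gte": ev_to_joule(min_ev)},
        {f"{BAND_GAP}:lte": ev_to_joule(max_ev)},
    ]


conditions = [{"results.material.elements": "Li"}] + band_gap_range(1.0, 3.0)
query_body = {"and": conditions}
print(query_body)
print(f"{joule_to_ev(3.2e-19):.2f} eV")
```

If the returned values do turn out to be in joules, apply `joule_to_ev` to them before plotting on an axis labeled in eV.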

In [None]:
# Visualize band gap distribution
if len(band_gaps) > 0:
    plt.figure(figsize=(10, 6))
    plt.hist(band_gaps, bins=20, edgecolor='black', alpha=0.7, color='steelblue')
    plt.xlabel('Band Gap (eV)', fontsize=12)
    plt.ylabel('Count', fontsize=12)
    plt.title('Band Gap Distribution (NOMAD Li Materials)', fontsize=14)
    plt.grid(True, alpha=0.3)
    plt.show()
    
    # Statistics
    print(f"\nBand Gap Statistics:")
    print(f"  Mean: {np.mean(band_gaps):.2f} eV")
    print(f"  Median: {np.median(band_gaps):.2f} eV")
    print(f"  Std Dev: {np.std(band_gaps):.2f} eV")
    print(f"  Min: {np.min(band_gaps):.2f} eV")
    print(f"  Max: {np.max(band_gaps):.2f} eV")

## Download Crystal Structure Data

NOMAD lets you download the full archive for an entry as JSON. The archive includes the crystal structure along with the rest of the calculation data.

In [None]:
# Download structure as archive
if len(lifep_data) > 0:
    entry_id = lifep_data[0]['entry_id']
    
    # Get archive data (includes full calculation)
    archive_url = f"{BASE_URL}/entries/{entry_id}/archive"
    
    response = requests.get(archive_url, headers=headers)
    
    if response.status_code == 200:
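        archive_data = response.json()
        print(f"Downloaded archive for {entry_id}")
        # The archive is assumed to sit under data -> archive; fall back to the whole response
        archive = archive_data.get('data', {}).get('archive', archive_data)
        print(f"\nArchive top-level sections: {list(archive.keys())}")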
        
        # Save to file
        with open(f'{entry_id}_archive.json', 'w') as f:
            json.dump(archive_data, f, indent=2)
        print(f"Saved to {entry_id}_archive.json")
    else:
        print(f"Failed to download archive: {response.status_code}")
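
Archives are deeply nested, and the location of structural data can differ between entries. Rather than hard-coding a path, a recursive search by key name can locate it. The sketch below uses a hypothetical `find_key` generator; the key name `lattice_vectors` and the sample archive are illustrative assumptions, not a guaranteed layout.

```python
from typing import Any, Iterator, Tuple


def find_key(obj: Any, target: str, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every occurrence of `target` in nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else str(key)
            if key == target:
                yield child, value
            yield from find_key(value, target, child)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key(item, target, f"{path}[{index}]")


# Made-up archive fragment for demonstration only.
sample_archive = {
    "run": [
        {"system": [{"atoms": {"lattice_vectors": [[1, 0, 0], [0, 1, 0], [0, 0, 1]]}}]}
    ],
    "metadata": {"entry_id": "abc123"},
}

hits = list(find_key(sample_archive, "lattice_vectors"))
for where, value in hits:
    print(where, value)
```

On a downloaded archive, `find_key(archive_data, "lattice_vectors")` would report where such data lives, if it is present under that name.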

## Comparing NOMAD with Materials Project

Key Differences:

| Feature | NOMAD | Materials Project |
|---------|-------|------------------|
| Data Type | All computational data | Curated DFT results |
| Access | No API key required | Requires API key |
| Data Volume | Very large (~millions) | Large (~150k) |
| Upload | Yes (open repository) | No (curated only) |
| Quality Control | Minimal (raw data) | High (curated) |
| Standardization | Variable | Highly standardized |
| Best For | Raw data, uploads | Reliable properties |

### When to use NOMAD:
- You want access to raw calculation data
- You need to upload your own data
- You're looking for specific calculation types
- You want to compare different computational methods

### When to use Materials Project:
- You need reliable, curated data
- You want standardized properties
- You need thermodynamic data
- You're doing high-throughput screening

## Export Downloaded Data

In [None]:
# Save the LiFePO4 data to CSV
if len(lifep_data) > 0:
    df_lifep.to_csv('nomad_lifep_data.csv', index=False)
    print(f"Saved {len(df_lifep)} LiFePO4 entries to nomad_lifep_data.csv")
    
# Save band gap data
if len(band_gaps) > 0:
    df_bandgap = pd.DataFrame({
        'formula': formulas,
        'band_gap': band_gaps
    })
    df_bandgap.to_csv('nomad_bandgap_data.csv', index=False)
    print(f"Saved {len(df_bandgap)} band gap entries to nomad_bandgap_data.csv")
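
CSV flattens everything into text columns. If you want to keep types (for example `None` for a missing band gap) or add nested fields later, JSON is a simple alternative. A minimal sketch using only the standard library follows, with made-up records and a hypothetical `export_records` helper.

```python
import csv
import json
import os
import tempfile
from typing import Any, Dict, List


def export_records(records: List[Dict[str, Any]], stem: str) -> List[str]:
    """Write records to <stem>.json and <stem>.csv and return both file paths."""
    json_path, csv_path = f"{stem}.json", f"{stem}.csv"
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    # Use the union of keys so records with missing fields still fit the header.
    fieldnames = sorted({key for record in records for key in record})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return [json_path, csv_path]


# Made-up records in the same shape as lifep_data above.
records = [
    {"entry_id": "abc123", "formula": "FeLiO4P", "space_group": None, "band_gap": None},
    {"entry_id": "def456", "formula": "FeLiO4P", "space_group": "Pnma", "band_gap": 0.5},
]
out_dir = tempfile.mkdtemp()
paths = export_records(records, os.path.join(out_dir, "nomad_demo"))
with open(paths[0], encoding="utf-8") as f:
    restored = json.load(f)
print(restored == records)
```

In the notebook itself, `export_records(lifep_data, "nomad_lifep_data")` would write both formats next to the notebook.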

## Uploading Data to NOMAD

To upload data to NOMAD:

1. **Create an account** at https://nomad-lab.eu/
2. **Prepare your data** in a supported format (VASP, Quantum Espresso, etc.)
3. **Use the web interface** or API to upload

### Upload via Web Interface:
- Go to https://nomad-lab.eu/prod/v1/gui/uploads
- Click "New Upload"
- Drag and drop your calculation files
- Add metadata and publish

### Upload via API:
```python
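# Requires an authentication token; the value below is a placeholder
token = "YOUR_ACCESS_TOKEN"
# Open the file in a context manager so it is closed after the request
with open('calculation.out', 'rb') as f:
    response = requests.post(
        f"{BASE_URL}/uploads",
        headers={'Authorization': f'Bearer {token}'},
        files={'file': f}
    )
response.raise_for_status()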
```
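
Calculations usually consist of several files, and NOMAD accepts uploads as `.zip` or `.tar.gz` archives. A minimal sketch of bundling a directory before uploading follows, using only the standard library. `make_upload_zip` is a hypothetical helper and the file names are placeholders.

```python
import os
import tempfile
import zipfile


def make_upload_zip(source_dir: str, zip_path: str) -> int:
    """Zip every file under source_dir (keeping relative paths) and return the file count."""
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, source_dir))
                count += 1
    return count


# Demonstration with two placeholder files in a temporary directory.
work_dir = tempfile.mkdtemp()
calc_dir = os.path.join(work_dir, "calc")
os.makedirs(calc_dir)
for name in ("calculation.out", "input.in"):
    with open(os.path.join(calc_dir, name), "w", encoding="utf-8") as f:
        f.write("placeholder\n")

zip_path = os.path.join(work_dir, "upload.zip")
n_files = make_upload_zip(calc_dir, zip_path)
with zipfile.ZipFile(zip_path) as archive:
    names = sorted(archive.namelist())
print(n_files, names)
```

The resulting zip file can then be sent with the `requests.post(..., files=...)` call shown above in place of the single output file.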

**Note**: For HW1, you can use the staging server for testing uploads without affecting production data.

## Exercise: Explore NOMAD

Try these tasks:

1. Search for a different element or compound system
2. Download data and create visualizations
3. Compare properties from multiple entries
4. Export data in different formats
5. Explore the metadata available for entries

**Challenge**: Find overlapping materials between NOMAD and Materials Project and compare their properties!
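
One obstacle for the challenge is that the same compound can be written differently in each database, for example with elements in a different order. A possible approach is to reduce each formula to a canonical key before matching. The sketch below handles simple formulas without parentheses or fractional counts. `canonical_formula` is a hypothetical helper, and sorting elements alphabetically is a choice made here for matching purposes.

```python
import re
from collections import Counter
from functools import reduce
from math import gcd

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def canonical_formula(formula: str) -> str:
    """Return an order-independent, reduced formula such as 'FeLiO4P' for 'LiFePO4'."""
    if not formula or "(" in formula or ")" in formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Counter = Counter()
    consumed = 0
    for match in _TOKEN.finditer(formula):
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        consumed += len(match.group(0))
    # Reject anything the simple tokenizer could not fully parse, and zero counts.
    if consumed != len(formula) or 0 in counts.values():
        raise ValueError(f"could not parse formula: {formula!r}")
    divisor = reduce(gcd, counts.values())
    parts = []
    for element in sorted(counts):
        n = counts[element] // divisor
        parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)


# Made-up formula lists standing in for results from the two databases.
nomad_formulas = ["FeLiO4P", "LiO2Co"]
mp_formulas = ["LiFePO4", "LiCoO2", "Li2O"]
overlap = {canonical_formula(f) for f in nomad_formulas} & {canonical_formula(f) for f in mp_formulas}
print(sorted(overlap))
```

Matching on composition alone does not distinguish polymorphs, so a fuller comparison would also check the space group or structure.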

In [None]:
# Your code here
# Try your own search!


## Additional Resources

- **Tutorials**: https://nomad-lab.eu/prod/v1/docs/tutorials.html
- **GitHub**: https://github.com/nomad-coe