# PEM Electrolyzer Coating Materials: Data Collection & Analysis

**Objective:** Collect and analyze coating material candidates from computational databases (Materials Project) and experimental literature to identify promising candidates for PEM electrolyzer bipolar plate protection.

**Key Performance Targets:**
- Contact resistance: < 10 mΩ·cm²
- Corrosion current: < 1 μA/cm²
- Operational lifetime: 80,000+ hours
- Cost: < $10/m²

**Commercial Impact:**
- Stack failures currently add $0.30-0.45/kg H₂ to the levelized cost of hydrogen (LCOH)
- Market opportunity: $5-8B annually by 2030

## 1. Setup & Imports

In [None]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Add src directory to path
sys.path.insert(0, os.path.abspath('../src'))

from data_collection.materials_project_collector import MaterialsProjectCollector
from data_collection.literature_database import LiteratureDatabase

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 1000)

print("✓ Imports successful")
print(f"✓ Working directory: {os.getcwd()}")

## 2. Materials Project Data Collection

Collect coating candidates from the Materials Project database:
- Conductive oxides
- Nitrides
- Carbides

**Note:** This requires a Materials Project API key. Get one at: https://materialsproject.org/api

Set your API key as an environment variable:
```bash
export MP_API_KEY="your_api_key_here"
```

In [None]:
# Check if API key is set
api_key = os.getenv("MP_API_KEY")
if api_key:
    print("✓ Materials Project API key found")
else:
    print("⚠ WARNING: MP_API_KEY not set!")
    print("   Set it with: export MP_API_KEY='your_key'")
    print("   Get a key at: https://materialsproject.org/api")

In [None]:
# Initialize collector
# Uncomment when API key is available

# collector = MaterialsProjectCollector()
# print("✓ Materials Project collector initialized")

In [None]:
# Collect coating candidates
# This may take 5-10 minutes depending on API response time
# Uncomment when API key is available

# mp_data = collector.collect_coating_candidates(
#     stability_threshold=0.1,  # eV/atom above convex hull
#     include_oxides=True,
#     include_nitrides=True,
#     include_carbides=True,
#     verbose=True
# )

# print(f"\n✓ Collected {len(mp_data)} materials from Materials Project")

In [None]:
# Save to CSV
# Uncomment when data is collected

# output_file = '../data/materials_project/coating_candidates.csv'
# collector.save_to_csv(output_file, stable_only=False)

# stable_file = '../data/materials_project/coating_candidates_stable.csv'
# collector.save_to_csv(stable_file, stable_only=True)

### Visualize Materials Project Data

In [None]:
# Load and visualize (uncomment when data is available)

# # Load data
# mp_data = pd.read_csv('../data/materials_project/coating_candidates.csv')

# # Distribution by material class
# fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# # 1. Materials by class
# mp_data['material_class'].value_counts().plot(kind='bar', ax=axes[0, 0])
# axes[0, 0].set_title('Materials by Class')
# axes[0, 0].set_ylabel('Count')

# # 2. Stability distribution
# stable_data = mp_data[mp_data['is_stable']]
# stable_data['material_class'].value_counts().plot(kind='bar', ax=axes[0, 1], color='green')
# axes[0, 1].set_title('Stable Materials (E_hull ≤ 0.1 eV/atom)')
# axes[0, 1].set_ylabel('Count')

# # 3. Energy above hull distribution
# mp_data.boxplot(column='energy_above_hull', by='material_class', ax=axes[1, 0])
# axes[1, 0].set_title('Energy Above Hull by Material Class')
# axes[1, 0].set_ylabel('Energy Above Hull (eV/atom)')

# # 4. Band gap distribution
# mp_data.boxplot(column='band_gap', by='material_class', ax=axes[1, 1])
# axes[1, 1].set_title('Band Gap by Material Class')
# axes[1, 1].set_ylabel('Band Gap (eV)')

# plt.tight_layout()
# plt.show()

# print(f"Total materials: {len(mp_data)}")
# print(f"Stable materials: {mp_data['is_stable'].sum()}")
# print(f"Stability rate: {mp_data['is_stable'].sum() / len(mp_data) * 100:.1f}%")
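The `is_stable` column used in the plots above is presumably derived from `energy_above_hull` and the 0.1 eV/atom threshold passed to the collector. The following is a minimal sketch of that derivation in plain Python, so it runs without an API key. The records are made-up placeholders (not Materials Project data), and `stability_rate_by_class` is a hypothetical helper, not part of `MaterialsProjectCollector`.

```python
from collections import defaultdict

STABILITY_THRESHOLD = 0.1  # eV/atom above the convex hull, as in the collector call

# Made-up placeholder records shaped like rows of coating_candidates.csv
records = [
    {"formula": "oxide_A", "material_class": "oxide", "energy_above_hull": 0.00},
    {"formula": "oxide_B", "material_class": "oxide", "energy_above_hull": 0.25},
    {"formula": "nitride_A", "material_class": "nitride", "energy_above_hull": 0.05},
    {"formula": "carbide_A", "material_class": "carbide", "energy_above_hull": 0.10},
]


def stability_rate_by_class(rows, threshold=STABILITY_THRESHOLD):
    """Return {material_class: fraction of entries with E_hull <= threshold}."""
    counts = defaultdict(lambda: [0, 0])  # class -> [stable count, total count]
    for row in rows:
        is_stable = row["energy_above_hull"] <= threshold
        counts[row["material_class"]][0] += int(is_stable)
        counts[row["material_class"]][1] += 1
    return {cls: stable / total for cls, (stable, total) in counts.items()}


rates = stability_rate_by_class(records)
for cls, rate in rates.items():
    print(f"{cls}: {rate * 100:.1f}% stable")
```

Reporting the rate per class rather than one overall figure makes it easier to see whether a single material class dominates the stable set.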

## 3. Literature Database

Load and analyze experimental coating performance data from literature.

In [None]:
# Initialize literature database (pre-populated with 5 key papers)
lit_db = LiteratureDatabase()

print(f"✓ Literature database initialized with {len(lit_db.entries)} papers")
print("\nSummary Statistics:")
stats = lit_db.get_summary_statistics()
for key, value in stats.items():
    print(f"  {key}: {value}")

In [None]:
# View all entries
lit_data = lit_db.to_dataframe()

# Display key columns
display_cols = [
    'material', 'substrate', 'year', 'test_duration_hours',
    'corrosion_current_uA_cm2', 'contact_resistance_mOhm_cm2',
    'cost_estimate_dollar_m2', 'success_rating'
]

print("\nLiterature Database Entries:")
print("="*80)
lit_data[display_cols]

### Benchmark Against Performance Targets

In [None]:
# Benchmark all coatings against DOE/industry targets
benchmark = lit_db.benchmark_against_targets()

print("Performance Target Benchmarking:")
print("="*80)
print("\nTargets:")
print("  - Contact resistance: < 10 mΩ·cm²")
print("  - Corrosion current: < 1 μA/cm²")
print("  - Test duration: > 2000 hours")
print("  - Cost: < $10/m²")
print("\n")

benchmark_cols = [
    'material', 'year',
    'meets_resistance_target', 'meets_corrosion_target',
    'meets_duration_target', 'meets_cost_target',
    'meets_all_targets'
]

benchmark[benchmark_cols]
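For readers who want to see how the pass/fail flags could be computed, here is a minimal sketch in plain Python. It is not the implementation of `benchmark_against_targets()`; the `TARGETS` table and `check_targets` helper are hypothetical, the thresholds are copied from the printout above, and the example entry is a made-up placeholder. Treating a missing value as a failed target is an assumption of this sketch.

```python
# Thresholds copied from the target list printed above.
# "max" means the value must stay below the limit, "min" means above it.
TARGETS = {
    "contact_resistance_mOhm_cm2": ("max", 10.0),
    "corrosion_current_uA_cm2": ("max", 1.0),
    "test_duration_hours": ("min", 2000.0),
    "cost_estimate_dollar_m2": ("max", 10.0),
}


def check_targets(entry):
    """Return per-metric pass/fail flags plus an overall flag.

    Assumption: a missing (None) value counts as not meeting the target.
    """
    flags = {}
    for metric, (kind, limit) in TARGETS.items():
        value = entry.get(metric)
        if value is None:
            flags[metric] = False
        elif kind == "max":
            flags[metric] = value < limit
        else:
            flags[metric] = value > limit
    flags["meets_all_targets"] = all(flags.values())
    return flags


# Made-up placeholder entry, not a literature value
example = {
    "material": "coating_A",
    "contact_resistance_mOhm_cm2": 5.0,
    "corrosion_current_uA_cm2": 0.5,
    "test_duration_hours": 3000.0,
    "cost_estimate_dollar_m2": 50.0,
}
flags = check_targets(example)
print(flags)
```

In this example the coating passes the three performance targets but fails on cost, so `meets_all_targets` is `False`.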

In [None]:
# Visualize performance vs. targets
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Contact Resistance
materials = benchmark['material']
resistance = benchmark['contact_resistance_mOhm_cm2']
axes[0, 0].barh(materials, resistance)
axes[0, 0].axvline(x=10, color='r', linestyle='--', label='Target: 10 mΩ·cm²')
axes[0, 0].set_xlabel('Contact Resistance (mΩ·cm²)')
axes[0, 0].set_title('Contact Resistance vs. Target')
axes[0, 0].legend()

# 2. Corrosion Current
corrosion = benchmark['corrosion_current_uA_cm2']
axes[0, 1].barh(materials, corrosion)
axes[0, 1].axvline(x=1, color='r', linestyle='--', label='Target: 1 μA/cm²')
axes[0, 1].set_xlabel('Corrosion Current (μA/cm²)')
axes[0, 1].set_title('Corrosion Current vs. Target')
axes[0, 1].legend()

# 3. Test Duration
duration = benchmark['test_duration_hours']
axes[1, 0].barh(materials, duration)
axes[1, 0].axvline(x=2000, color='r', linestyle='--', label='Target: 2000 hrs')
axes[1, 0].axvline(x=80000, color='orange', linestyle='--', label='Ultimate: 80k hrs')
axes[1, 0].set_xlabel('Test Duration (hours)')
axes[1, 0].set_title('Test Duration vs. Targets')
axes[1, 0].legend()

# 4. Cost
cost = benchmark['cost_estimate_dollar_m2']
axes[1, 1].barh(materials, cost)
axes[1, 1].axvline(x=10, color='r', linestyle='--', label='Target: $10/m²')
axes[1, 1].set_xlabel('Cost Estimate ($/m²)')
axes[1, 1].set_title('Cost vs. Target')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

# Summary
n_passing = benchmark['meets_all_targets'].sum()
print(f"\nCoatings meeting ALL targets: {n_passing} / {len(benchmark)}")
if n_passing == 0:
    print("⚠ No coating in the database meets all performance and cost targets!")
    print("  → This supports the case for AI-driven materials discovery")

## 4. Gap Analysis

Identify missing data and research needs.

In [None]:
# Identify research gaps
gaps = lit_db.identify_research_gaps()

print("RESEARCH GAPS & MISSING DATA:")
print("="*80)
for key, value in gaps.items():
    print(f"\n{key}:")
    if isinstance(value, list):
        for item in value:
            print(f"  - {item}")
    else:
        print(f"  {value}")

In [None]:
# Visualize data completeness (share of non-missing values per metric)
completeness_cols = {
    'Corrosion Current': 'corrosion_current_uA_cm2',
    'Contact Resistance': 'contact_resistance_mOhm_cm2',
    'Test Duration': 'test_duration_hours',
    'Cost Data': 'cost_estimate_dollar_m2',
    'Ion Leaching': 'fe_release_ug_cm2_day',
    'Degradation Rate': 'voltage_increase_uV_hr'
}
completeness = {
    label: lit_data[col].notna().mean() * 100
    for label, col in completeness_cols.items()
}

fig, ax = plt.subplots(figsize=(10, 6))
metrics = list(completeness.keys())
values = list(completeness.values())

bars = ax.barh(metrics, values, color=['green' if v == 100 else 'orange' if v >= 50 else 'red' for v in values])
ax.axvline(x=80, color='gray', linestyle='--', alpha=0.5, label='80% target')
ax.set_xlabel('Data Completeness (%)')
ax.set_title('Literature Database Data Completeness')
ax.set_xlim(0, 110)  # headroom so labels next to 100% bars stay inside the axes
ax.legend()

# Add percentage labels
for i, (bar, value) in enumerate(zip(bars, values)):
    ax.text(value + 2, i, f"{value:.0f}%", va='center')

plt.tight_layout()
plt.show()

print("\n⚠ Priority: Expand literature database to 25-35 papers with complete metrics")

## 5. Performance vs. Cost Trade-off Analysis

In [None]:
# Plot performance vs cost
fig, ax = plt.subplots(figsize=(12, 8))

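# Create a composite performance score
# Lower is better for corrosion current and contact resistance, so take reciprocals;
# longer test duration is better, so it enters directly (in thousands of hours).
# Note: the terms are not normalized, and a zero current or resistance would give inf.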
benchmark['performance_score'] = (
    (1 / benchmark['corrosion_current_uA_cm2']) * 0.4 +  # 40% weight
    (1 / benchmark['contact_resistance_mOhm_cm2']) * 0.3 +  # 30% weight
    (benchmark['test_duration_hours'] / 1000) * 0.3  # 30% weight
)

# Scatter plot
scatter = ax.scatter(
    benchmark['cost_estimate_dollar_m2'],
    benchmark['performance_score'],
    s=benchmark['test_duration_hours'] / 20,  # Size by test duration
    c=benchmark['success_rating'],
    cmap='RdYlGn',
    alpha=0.7,
    edgecolors='black',
    linewidth=1
)

# Add labels for each point
for idx, row in benchmark.iterrows():
    ax.annotate(
        row['material'],
        (row['cost_estimate_dollar_m2'], row['performance_score']),
        xytext=(5, 5),
        textcoords='offset points',
        fontsize=9,
        alpha=0.8
    )

# Target zone (upper left = high performance, low cost)
ax.axvline(x=10, color='green', linestyle='--', alpha=0.5, label='Cost target: $10/m²')

ax.set_xlabel('Cost Estimate ($/m²)', fontsize=12)
ax.set_ylabel('Performance Score (composite)', fontsize=12)
ax.set_title('Coating Performance vs. Cost Trade-off\n(Size = test duration, Color = success rating)', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Success Rating (1-5)', rotation=270, labelpad=20)

plt.tight_layout()
plt.show()

print("\nKey Insight: Target zone = upper-left quadrant (high performance, low cost)")
print("Current coatings show performance-cost trade-off, not optimized solutions")
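The composite score above adds reciprocals of physical quantities to a duration in thousands of hours, so whichever term happens to have the largest numerical range can dominate the ranking. One possible alternative, sketched here in plain Python, is to min-max scale each metric to [0, 1] before applying the same 40/30/30 weights. The helper names and the input values are illustrative assumptions, not data from the literature database.

```python
def min_max(values, invert=False):
    """Scale values to [0, 1]; with invert=True the smallest value maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # no spread, so every entry scores the same
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def composite_scores(corrosion, resistance, duration, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of normalized metrics; higher is better, range [0, 1]."""
    c = min_max(corrosion, invert=True)   # lower corrosion current is better
    r = min_max(resistance, invert=True)  # lower contact resistance is better
    d = min_max(duration)                 # longer test duration is better
    w_c, w_r, w_d = weights
    return [w_c * ci + w_r * ri + w_d * di for ci, ri, di in zip(c, r, d)]


# Made-up illustrative values for three hypothetical coatings
corrosion = [0.5, 1.0, 2.0]          # uA/cm2
resistance = [5.0, 10.0, 20.0]       # mOhm*cm2
duration = [1000.0, 2000.0, 5000.0]  # hours

scores = composite_scores(corrosion, resistance, duration)
for s in scores:
    print(f"{s:.3f}")
```

Min-max scaling depends on the best and worst entries in the database, so scores shift as papers are added. That is acceptable for ranking within one snapshot, but not for comparing scores across snapshots.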

## 6. Save Processed Data

In [None]:
# Save literature database to CSV
lit_db.save_to_csv('../data/literature/coating_performance_database.csv')

# Save benchmark results
benchmark.to_csv('../data/literature/performance_benchmark.csv', index=False)

print("✓ Data saved successfully")
print("  - Literature database: data/literature/coating_performance_database.csv")
print("  - Benchmark results: data/literature/performance_benchmark.csv")

## 7. Next Steps

### Immediate Actions (Weeks 1-2):

1. **Expand Literature Database** (HIGH PRIORITY)
   - Target: Add 20-30 more papers with quantitative coating performance data
   - Focus areas:
     - Long-duration tests (>5,000 hours)
     - Ion leaching measurements (Fe, Cr, Ni release)
     - Degradation mechanisms and failure mode analysis
     - Cost data for different deposition methods
   - Use systematic search queries (see docs/RESEARCH_PROMPT.md)

2. **Materials Project Data Collection**
   - Get Materials Project API key: https://materialsproject.org/api
   - Run collector to get 100-300 coating candidates
   - Analyze property distributions by material class

3. **Property Correlation Analysis**
   - Identify which Materials Project properties correlate with coating performance
   - Build initial feature set for ML models
   - Document relationships for model training
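As a starting point for the property correlation analysis in item 3, a rank correlation is a reasonable first check because it does not assume a linear relationship. The sketch below implements Spearman's coefficient in plain Python as the Pearson correlation of average ranks. The helper names are hypothetical and the paired values are made up for illustration. With only a handful of papers, any coefficient should be read as a hint rather than evidence.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up paired values: a computed property vs. a measured metric
band_gap_eV = [0.0, 0.5, 1.2, 3.0]
contact_resistance = [4.0, 6.0, 15.0, 40.0]
rho = spearman(band_gap_eV, contact_resistance)
print(f"Spearman rho = {rho:.2f}")
```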

### Medium-term Goals (Weeks 3-8):

4. **Failure Mechanism Deep Dive**
   - Why do Ti/Nb coatings degrade before 80,000 hours?
   - What causes H₂ embrittlement in nitride coatings?
   - How do defects propagate in harsh electrochemical environments?

5. **Coating Design Space Mapping**
   - Single-layer vs. multilayer structures
   - Thickness optimization (too thin = pinholes, too thick = cost/stress)
   - Different requirements for anode vs. cathode sides

6. **Cost-Performance Pareto Frontier**
   - Map the frontier of achievable performance vs. cost
   - Identify gaps where new materials could provide breakthrough
   - Inform ML optimization objectives
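The frontier in item 6 can be extracted with a simple dominance check: a coating is on the frontier if no other coating is at least as cheap and at least as good, and strictly better on one of the two. A minimal sketch in plain Python follows. `pareto_frontier` is a hypothetical helper, and the coating names, costs, and scores are made-up placeholders.

```python
def pareto_frontier(points):
    """Return names of points not dominated by any other point.

    Each point is (name, cost, performance); lower cost and higher
    performance are better.
    """
    frontier = []
    for name, cost, perf in points:
        dominated = any(
            (c <= cost and p >= perf) and (c < cost or p > perf)
            for _, c, p in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Made-up placeholder points: (name, cost in $/m2, performance score)
coatings = [
    ("coating_A", 5.0, 0.30),
    ("coating_B", 20.0, 0.60),
    ("coating_C", 25.0, 0.55),  # costs more and scores lower than coating_B
    ("coating_D", 80.0, 0.90),
]

frontier = pareto_frontier(coatings)
print(frontier)  # ['coating_A', 'coating_B', 'coating_D']
```

The pairwise check is quadratic in the number of coatings, which is fine for a database of a few dozen entries. Gaps between neighbouring frontier points indicate cost-performance regions where no known coating exists.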

### Research Protocol:

See **docs/RESEARCH_PROMPT.md** for detailed task-by-task execution guidance.

### Success Criteria for Phase 1 (Month 3):

- ✅ 25-35 papers in literature database with complete metrics
- ✅ 100-300 Materials Project candidates collected and analyzed
- ✅ Property correlation analysis complete
- ✅ Failure mechanism understanding documented
- ✅ Database ready for ML model training (Phase 2)

---

**Remember:** The goal is not just to collect data, but to build a comprehensive understanding of what makes coatings succeed or fail. This knowledge will inform the AI/ML models in Phase 2.