# Crystal Structure Analysis & Materials Properties

This tutorial analyzes crystallographic data for 10 common materials and explores how lattice parameters relate to density and band gap.

## Dataset

10 common materials with:
- **Crystal System**: Cubic, Hexagonal, Tetragonal, Trigonal
- **Lattice Parameters**: a, b, c (Å), α, β, γ (degrees)
- **Properties**: Density (g/cm³), Band Gap (eV)

Materials: Silicon, Diamond, GaAs, NaCl, Iron, Graphite, Quartz, TiO₂, CaCO₃, AlN
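
Before running the analysis it can help to check that the input table has the expected shape. The sketch below is one possible check, not part of the original notebook. The column names are the ones the later cells read from `sample_crystal_data.csv`; the helper name `validate_crystal_table` and the example row (approximate literature values for silicon, not copied from the CSV) are assumptions made for illustration.

```python
import pandas as pd

# Columns that the notebook cells read from sample_crystal_data.csv.
REQUIRED_COLUMNS = [
    "material", "crystal_system",
    "a", "b", "c", "alpha", "beta", "gamma",
    "density", "band_gap",
]


def validate_crystal_table(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the table looks usable."""
    problems = []
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # Stop early: the remaining checks need these columns.
        problems.append(f"missing columns: {missing}")
        return problems
    if (df[["a", "b", "c"]] <= 0).any().any():
        problems.append("lattice lengths must be positive")
    angles = df[["alpha", "beta", "gamma"]]
    if ((angles <= 0) | (angles >= 180)).any().any():
        problems.append("angles must lie strictly between 0 and 180 degrees")
    if (df["density"] <= 0).any():
        problems.append("density must be positive")
    if (df["band_gap"] < 0).any():
        problems.append("band gap cannot be negative")
    return problems


# Illustrative row (silicon, conventional cubic cell); hypothetical, not taken from the CSV.
example = pd.DataFrame([{
    "material": "Silicon", "crystal_system": "Cubic",
    "a": 5.431, "b": 5.431, "c": 5.431,
    "alpha": 90.0, "beta": 90.0, "gamma": 90.0,
    "density": 2.33, "band_gap": 1.12,
}])
print(validate_crystal_table(example))
```

In the notebook, the same function could be called on the loaded DataFrame right after `pd.read_csv`.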

## Methods
- Crystal system classification
- Unit cell volume calculation
- Structure-property relationships
- Material clustering
- Density prediction from unit cell volume (linear regression)

In [None]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore")

plt.style.use("seaborn-v0_8-darkgrid")
sns.set_palette("Set2")
%matplotlib inline

print("✓ Setup complete")

## 1. Load and Explore Data

In [None]:
# Load crystal structure data
df = pd.read_csv("sample_crystal_data.csv")

print(f"Dataset shape: {df.shape}")
print(f"Materials: {len(df)}")
print(f"\nCrystal systems: {', '.join(df['crystal_system'].unique())}")

df

In [None]:
# Summary statistics
print("Summary Statistics:")
print(df[["a", "b", "c", "density", "band_gap"]].describe())

print("\nCrystal system distribution:")
print(df["crystal_system"].value_counts())

## 2. Calculate Unit Cell Volume

In [None]:
def calculate_volume(row):
    """
    Calculate unit cell volume from lattice parameters.
    V = a * b * c * sqrt(1 - cos²α - cos²β - cos²γ + 2cosα*cosβ*cosγ)
    """
    a, b, c = row["a"], row["b"], row["c"]
    alpha = np.radians(row["alpha"])
    beta = np.radians(row["beta"])
    gamma = np.radians(row["gamma"])

    cos_alpha = np.cos(alpha)
    cos_beta = np.cos(beta)
    cos_gamma = np.cos(gamma)

    volume = (
        a
        * b
        * c
        * np.sqrt(
            1 - cos_alpha**2 - cos_beta**2 - cos_gamma**2 + 2 * cos_alpha * cos_beta * cos_gamma
        )
    )

    return volume


# Calculate volume for each material
df["volume"] = df.apply(calculate_volume, axis=1)

print("Unit Cell Volumes (ų):")
for _, row in df.iterrows():
    print(f"  {row['material']}: {row['volume']:.2f} ų")
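
The general volume formula can be sanity-checked against the two special cases that cover most of this dataset: a cubic cell, where it reduces to a³, and a hexagonal cell, where it reduces to a²c·sin(120°). The following is a minimal sketch of a vectorized variant; the name `cell_volume` and the clipping of the radicand are choices made here, not part of the original function, and the hexagonal test cell uses made-up lengths.

```python
import numpy as np


def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """General (triclinic) unit cell volume in cubic angstroms; accepts scalars or arrays."""
    ca, cb, cg = (np.cos(np.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    radicand = 1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg
    # Rounding can push a nearly flat cell slightly below zero; clip instead of returning NaN.
    return a * b * c * np.sqrt(np.clip(radicand, 0.0, None))


# Cubic cell: the formula reduces to a**3.
v_cubic = cell_volume(5.431, 5.431, 5.431, 90, 90, 90)
# Hexagonal cell (made-up lengths): the formula reduces to a**2 * c * sin(120 deg).
v_hex = cell_volume(3.0, 3.0, 5.0, 90, 90, 120)
print(round(float(v_cubic), 2), round(float(v_hex), 2))
```

Because the function only uses NumPy ufuncs, it should also accept whole DataFrame columns, for example `cell_volume(df["a"], df["b"], df["c"], df["alpha"], df["beta"], df["gamma"])`, which avoids the row-wise `apply`.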

## 3. Crystal System Analysis

In [None]:
# Visualize crystal systems
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Count by crystal system
system_counts = df["crystal_system"].value_counts()
axes[0].bar(system_counts.index, system_counts.values, edgecolor="black", alpha=0.7)
axes[0].set_title("Materials by Crystal System", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Crystal System")
axes[0].set_ylabel("Count")
axes[0].tick_params(axis="x", rotation=45)
axes[0].grid(True, alpha=0.3, axis="y")

# Average properties by system
system_props = df.groupby("crystal_system")[["density", "band_gap"]].mean()
system_props.plot(kind="bar", ax=axes[1], edgecolor="black", alpha=0.7)
axes[1].set_title("Average Properties by Crystal System", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Crystal System")
axes[1].set_ylabel("Value")
axes[1].legend(["Density (g/cm³)", "Band Gap (eV)"])
axes[1].tick_params(axis="x", rotation=45)
axes[1].grid(True, alpha=0.3, axis="y")

plt.tight_layout()
plt.show()
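
The `crystal_system` column is taken from the CSV as given. As a cross-check, the lattice parameters themselves constrain which system is possible. The sketch below is a hypothetical helper (`infer_lattice_system` and its tolerance are assumptions) that guesses the lattice system from cell metrics alone. It cannot separate trigonal from hexagonal when the trigonal crystal is described on hexagonal axes, and true symmetry depends on atomic positions, so treat the result as a consistency check rather than a classification. The test cells use approximate literature values.

```python
import math


def infer_lattice_system(a, b, c, alpha, beta, gamma, tol=1e-3):
    """Guess the lattice system from cell lengths (angstroms) and angles (degrees)."""
    def eq(x, y):
        return math.isclose(x, y, rel_tol=tol, abs_tol=tol)

    all_right = eq(alpha, 90) and eq(beta, 90) and eq(gamma, 90)
    if all_right:
        if eq(a, b) and eq(b, c):
            return "Cubic"
        if eq(a, b) or eq(b, c) or eq(a, c):
            return "Tetragonal"
        return "Orthorhombic"
    if eq(a, b) and eq(alpha, 90) and eq(beta, 90) and eq(gamma, 120):
        # Trigonal crystals described on hexagonal axes land here too.
        return "Hexagonal"
    if eq(a, b) and eq(b, c) and eq(alpha, beta) and eq(beta, gamma):
        return "Rhombohedral"
    two_right = sum(eq(x, 90) for x in (alpha, beta, gamma)) == 2
    return "Monoclinic" if two_right else "Triclinic"


print(infer_lattice_system(5.431, 5.431, 5.431, 90, 90, 90))   # silicon
print(infer_lattice_system(4.594, 4.594, 2.959, 90, 90, 90))   # rutile TiO2
print(infer_lattice_system(4.913, 4.913, 5.405, 90, 90, 120))  # quartz, hexagonal axes
```

A row whose inferred system disagrees with its `crystal_system` label (other than the trigonal/hexagonal ambiguity) is worth inspecting for a data-entry error.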

## 4. Structure-Property Relationships

In [None]:
# Correlation analysis
numeric_cols = ["a", "b", "c", "volume", "density", "band_gap"]
correlation = df[numeric_cols].corr()

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(
    correlation,
    annot=True,
    fmt=".2f",
    cmap="coolwarm",
    center=0,
    square=True,
    ax=ax,
    cbar_kws={"label": "Correlation"},
)
ax.set_title("Crystal Structure & Property Correlations", fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()

print("\nKey Correlations:")
for i in range(len(correlation.columns)):
    for j in range(i + 1, len(correlation.columns)):
        corr_val = correlation.iloc[i, j]
        if abs(corr_val) > 0.5:
            print(f"  {correlation.columns[i]} vs {correlation.columns[j]}: {corr_val:.3f}")
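
With only 10 materials, a correlation above the 0.5 cutoff is not necessarily distinguishable from noise: for n = 10, |r| has to exceed roughly 0.63 to reach p < 0.05 in a two-sided test. The sketch below pairs each coefficient with its p-value using `scipy.stats.pearsonr`. The helper name `correlation_table` and the synthetic demo columns `u`, `v`, `w` are assumptions for illustration; they are not from the dataset.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def correlation_table(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Pearson r and two-sided p-value for every column pair, strongest first."""
    rows = []
    for x, y in combinations(columns, 2):
        r, p = pearsonr(df[x], df[y])
        rows.append({"x": x, "y": y, "r": r, "p_value": p, "n": len(df)})
    table = pd.DataFrame(rows)
    return table.sort_values("r", key=np.abs, ascending=False, ignore_index=True)


# Synthetic stand-in: v depends strongly on u, w is independent noise.
rng = np.random.default_rng(0)
u = rng.normal(size=10)
demo = pd.DataFrame({
    "u": u,
    "v": 2 * u + rng.normal(scale=0.1, size=10),
    "w": rng.normal(size=10),
})
table = correlation_table(demo, ["u", "v", "w"])
print(table)
```

In the notebook this could be called as `correlation_table(df, numeric_cols)`.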

In [None]:
# Scatter plots
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
axes = axes.flatten()

relationships = [
    ("volume", "density"),
    ("density", "band_gap"),
    ("a", "volume"),
    ("volume", "band_gap"),
]

# Color by crystal system
systems = df["crystal_system"].unique()
colors = plt.cm.Set2(np.linspace(0, 1, len(systems)))
system_colors = {sys: colors[i] for i, sys in enumerate(systems)}

for idx, (x_var, y_var) in enumerate(relationships):
    for system in systems:
        mask = df["crystal_system"] == system
        axes[idx].scatter(
            df[mask][x_var],
            df[mask][y_var],
            c=[system_colors[system]],
            s=100,
            alpha=0.7,
            label=system,
            edgecolors="black",
            linewidths=1,
        )

    # Add material labels
    for _, row in df.iterrows():
        axes[idx].annotate(
            row["material"],
            (row[x_var], row[y_var]),
            fontsize=8,
            alpha=0.7,
            xytext=(5, 5),
            textcoords="offset points",
        )

    # Correlation
    r = df[[x_var, y_var]].corr().iloc[0, 1]
    axes[idx].text(
        0.05,
        0.95,
        f"r = {r:.3f}",
        transform=axes[idx].transAxes,
        va="top",
        bbox={"boxstyle": "round", "facecolor": "wheat", "alpha": 0.5},
    )

    axes[idx].set_xlabel(x_var.replace("_", " ").title(), fontsize=11)
    axes[idx].set_ylabel(y_var.replace("_", " ").title(), fontsize=11)
    axes[idx].set_title(
        f"{x_var.replace('_', ' ').title()} vs {y_var.replace('_', ' ').title()}",
        fontweight="bold",
    )
    if idx == 0:
        axes[idx].legend(loc="best", fontsize=8)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 5. Material Classification

In [None]:
# Classify materials by band gap
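def classify_material(band_gap, semiconductor_max=2.0):
    """Label a material by band gap (eV); the 2.0 eV cutoff is a simplification."""
    # Compare with a tolerance rather than == so rounding in the data does not hide a zero gap.
    if np.isclose(band_gap, 0.0):
        return "Metal/Conductor"
    elif band_gap < semiconductor_max:
        return "Semiconductor"
    else:
        return "Insulator"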


df["classification"] = df["band_gap"].apply(classify_material)

print("Material Classification by Band Gap:")
for _, row in df.iterrows():
    print(f"  {row['material']}: {row['classification']} (Eg = {row['band_gap']} eV)")

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
class_counts = df["classification"].value_counts()
ax.pie(
    class_counts.values,
    labels=class_counts.index,
    autopct="%1.1f%%",
    startangle=90,
    colors=sns.color_palette("Set2"),
)
ax.set_title("Material Classification Distribution", fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()

## 6. Clustering Analysis

In [None]:
# Prepare features for clustering
feature_cols = ["a", "volume", "density", "band_gap"]
X = df[feature_cols].values

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# K-means clustering
n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
df["cluster"] = kmeans.fit_predict(X_scaled)

print(f"K-Means Clustering (k={n_clusters}):")
for cluster_id in range(n_clusters):
    materials = df[df["cluster"] == cluster_id]["material"].tolist()
    print(f"\nCluster {cluster_id}: {', '.join(materials)}")
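
The cell above fixes k = 3 in advance. One common way to check whether the data support that choice is the mean silhouette score, computed for several values of k. The following is a sketch under stated assumptions: `silhouette_by_k` is a hypothetical helper, and the demo runs on synthetic blobs from `make_blobs` with the same shape as the notebook's feature matrix (10 samples, 4 features), not on the crystal data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_by_k(X_scaled, k_values, random_state=42):
    """Mean silhouette score for each candidate k (range -1 to 1, higher is better)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X_scaled)
        scores[k] = silhouette_score(X_scaled, labels)
    return scores


# Synthetic stand-in: 10 points, 4 features, drawn around 3 centers.
X_demo, _ = make_blobs(n_samples=10, centers=3, n_features=4, cluster_std=0.5, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)
scores = silhouette_by_k(X_demo, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

In the notebook, `silhouette_by_k(X_scaled, range(2, 6))` would apply the same check to the real features. With 10 samples the scores are noisy, so they are better read as a rough guide than as a selection rule.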

In [None]:
# PCA for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

fig, ax = plt.subplots(figsize=(12, 8))

# Plot clusters
scatter = ax.scatter(
    X_pca[:, 0],
    X_pca[:, 1],
    c=df["cluster"],
    s=200,
    alpha=0.7,
    cmap="viridis",
    edgecolors="black",
    linewidths=2,
)

# Add material labels
for i, material in enumerate(df["material"]):
    ax.annotate(
        material,
        (X_pca[i, 0], X_pca[i, 1]),
        fontsize=10,
        ha="center",
        va="bottom",
        bbox={"boxstyle": "round,pad=0.3", "facecolor": "white", "alpha": 0.7},
    )

ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0] * 100:.1f}% variance)", fontsize=12)
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1] * 100:.1f}% variance)", fontsize=12)
ax.set_title("Material Clustering (K-Means + PCA)", fontsize=14, fontweight="bold")
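# Cluster IDs are discrete, so show one tick per cluster instead of a continuous scale
plt.colorbar(scatter, ax=ax, ticks=np.arange(n_clusters), label="Cluster ID")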
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nPCA Explained Variance: {pca.explained_variance_ratio_.sum() * 100:.1f}%")
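
The PCA plot shows where materials fall, but not which features drive each axis. The component weights (often loosely called loadings) answer that. The sketch below tabulates them; `pca_loadings` is a hypothetical helper, and the demo uses random data with the same shape as the notebook's feature matrix rather than the crystal data itself.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def pca_loadings(X_scaled, feature_names, n_components=2):
    """Table of component weights: rows are features, columns are principal components."""
    pca = PCA(n_components=n_components).fit(X_scaled)
    columns = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(pca.components_.T, index=feature_names, columns=columns)


# Random stand-in data with the same shape as the notebook's feature matrix (10 x 4).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 4))
loadings = pca_loadings(X_demo, ["a", "volume", "density", "band_gap"])
print(loadings.round(2))
```

In the notebook, `pca_loadings(X_scaled, feature_cols)` would give the weights for the real features. A large absolute weight means the feature contributes strongly to that axis; the sign only has meaning relative to the other weights in the same column.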

## 7. Property Prediction

In [None]:
# Simple linear regression: predict density from volume
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

X_vol = df[["volume"]].values
y_obs = df["density"].values

# Fit model
model = LinearRegression()
model.fit(X_vol, y_obs)

# In-sample predictions (the model is evaluated on the same data it was fit to)
y_pred_model = model.predict(X_vol)
r2 = r2_score(y_obs, y_pred_model)
mae = mean_absolute_error(y_obs, y_pred_model)

print("Density Prediction Model:")
print(f"  Coefficient: {model.coef_[0]:.4f}")
print(f"  Intercept: {model.intercept_:.4f}")
print(f"  R² Score: {r2:.3f}")
print(f"  MAE: {mae:.3f} g/cm³")

# Plot
fig, ax = plt.subplots(figsize=(10, 7))
ax.scatter(
    df["volume"],
    df["density"],
    s=150,
    alpha=0.7,
    edgecolors="black",
    linewidths=2,
    label="Observed",
)
ax.plot(df["volume"], y_pred_model, "r--", linewidth=2, label="Predicted")

for i, material in enumerate(df["material"]):
    ax.annotate(
        material,
        (df["volume"].iloc[i], df["density"].iloc[i]),
        fontsize=8,
        alpha=0.7,
        xytext=(5, 5),
        textcoords="offset points",
    )

ax.set_xlabel("Unit Cell Volume (ų)", fontsize=12)
ax.set_ylabel("Density (g/cm³)", fontsize=12)
ax.set_title(f"Density vs Volume (R² = {r2:.3f})", fontsize=14, fontweight="bold")
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
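
The R² reported above is computed on the same 10 points used to fit the line, so it overstates how well the model would do on an unseen material. Leave-one-out cross-validation is a simple way to get a less optimistic estimate on a dataset this small. The following is a minimal sketch: `loo_scores` is a hypothetical helper, and the demo uses synthetic numbers standing in for volume and density, not the notebook's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def loo_scores(X, y):
    """Leave-one-out R² and MAE for a linear model."""
    # Each prediction comes from a model fit on the other n - 1 samples.
    held_out = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
    return r2_score(y, held_out), mean_absolute_error(y, held_out)


# Synthetic stand-in for (volume, density): a noisy decreasing trend over 10 points.
rng = np.random.default_rng(0)
volume = np.linspace(40, 300, 10).reshape(-1, 1)
density = 8.0 - 0.02 * volume.ravel() + rng.normal(scale=0.8, size=10)

in_sample_r2 = LinearRegression().fit(volume, density).score(volume, density)
loo_r2, loo_mae = loo_scores(volume, density)
print(f"in-sample R² = {in_sample_r2:.3f}, leave-one-out R² = {loo_r2:.3f}, MAE = {loo_mae:.3f}")
```

For ordinary least squares the leave-one-out R² is never higher than the in-sample R², and the gap between the two gives a rough sense of how much the fit depends on individual materials.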

## 8. Summary Report

In [None]:
# Generate summary
summary = pd.DataFrame(
    {
        "Material": df["material"],
        "Crystal System": df["crystal_system"],
        "Volume (ų)": df["volume"].round(2),
        "Density (g/cm³)": df["density"],
        "Band Gap (eV)": df["band_gap"],
        "Classification": df["classification"],
        "Cluster": df["cluster"],
    }
)

print("=" * 90)
print("CRYSTAL STRUCTURE ANALYSIS SUMMARY")
print("=" * 90)
print(summary.to_string(index=False))
print("=" * 90)

# Save
summary.to_csv("crystal_structure_summary.csv", index=False)
print("\n✓ Summary saved to crystal_structure_summary.csv")

## Key Findings

### Crystal Systems
- **Cubic** systems most common (50% of materials)
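- **Hexagonal**, **Trigonal**, and **Tetragonal** systems are represented by only one or two materials each, so their per-system averages are descriptive rather than statistically meaningful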
- Crystal symmetry affects physical properties

### Structure-Property Relationships
- **Volume-Density**: Negative correlation (larger cells → lower density), though density also depends on the mass contained in each cell
- **Band Gap**: Wide variation (0-9 eV)
- **Classification**: 2 metals, 3 semiconductors, 5 insulators

### Material Clustering
- K-means with k=3 (chosen in advance, not selected from the data) yields groups that roughly correspond to:
  - **Metals**: Low band gap, high density
  - **Semiconductors**: Intermediate properties
  - **Insulators**: High band gap, variable density

### Predictive Models
- Volume alone predicts density only moderately (in-sample R² ~0.5-0.7 on 10 materials)
- Crystal system provides useful classification
- More features needed for precise predictions
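
One reason volume alone is a weak predictor is that density is fixed by both the cell volume and the mass inside it: ρ = Z·M / (N_A·V), where Z is the number of formula units per cell and M the molar mass. The sketch below evaluates that relation. Z and M are not columns in the dataset, so the values used here (approximate literature values for silicon) are assumptions supplied for illustration, as is the helper name `xray_density`.

```python
AVOGADRO = 6.02214076e23  # formula units per mole (exact SI value)
ANGSTROM3_TO_CM3 = 1e-24  # 1 cubic angstrom = 1e-24 cm^3


def xray_density(z, molar_mass, volume_A3):
    """Theoretical density in g/cm^3 from Z formula units of molar mass M (g/mol) per cell."""
    return z * molar_mass / (AVOGADRO * volume_A3 * ANGSTROM3_TO_CM3)


# Silicon: 8 atoms per conventional cubic cell, M = 28.0855 g/mol, a = 5.431 angstroms.
rho_si = xray_density(z=8, molar_mass=28.0855, volume_A3=5.431**3)
print(f"{rho_si:.2f} g/cm^3")  # prints 2.33 g/cm^3
```

Adding Z and molar mass as features would let the density be computed directly instead of fitted, which is one concrete reading of "more features needed".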

## Applications

### Materials Design
- **Semiconductors**: Band gap engineering for solar cells
- **Insulators**: Dielectric applications
- **Metals**: Electrical conductors

### Computational Screening
- Predict properties before synthesis
- Identify promising candidates
- Guide experimental efforts

## Next Steps

1. **Add more properties**: Elastic modulus, thermal conductivity
2. **Machine learning**: Random Forest, XGBoost for better predictions
3. **Crystal structure visualization**: Use ASE or pymatgen
4. **DFT calculations**: First-principles electronic structure
5. **Database integration**: Materials Project API
6. **Advanced analysis**: Symmetry operations, Wyckoff positions

## Resources

- [Materials Project](https://materialsproject.org/)
- [Pymatgen Documentation](https://pymatgen.org/)
- [ASE - Atomic Simulation Environment](https://wiki.fysik.dtu.dk/ase/)
- [IUCr Teaching Resources](https://www.iucr.org/education/teaching-resources)