<!-- First Slide -->

<center>
    <h1 style="font-size: 2.5em; color: #ef5636; font-weight: 800; margin-bottom: 0.2em;"> Intro to Machine Learning in Materials Science </h1>
    <h2 style="font-size: 1.8em; font-weight: 300; color: #5D6D7E;">A "Gentle" Introduction to Material Featurization & Machine Learning</h2>
    <p style="font-size: 1em; color: #34495E;text-align: center;"> Paolo De Angelis &mdash; Politecnico di Torino </p>
    <br>
</center>

<br>
<br>

<center>
    <p style="font-size: 1em; color: #34495E;">
        <strong>Course:</strong> 02QZSND - Applicazioni energetiche dei materiali 
        <br>
        <strong>Date:</strong> 11 April 2025
    </p>
</center>
<br>

---

<!-- Optional footer -->
<p style="font-size: 0.7em; color: #BDC3C7; text-align: center;">
    A.A. 2025/2026
</p>


# What we will see today:


<ul style="font-size: 1.5em;">
  <li><strong>Why Machine Learning?</strong></li>
  <li><strong>Can computers digest materials?</strong></li>
  <li><strong>What can Machine Learning do?</strong></li>
  <li><strong>Where are we going?</strong></li>
</ul>


# Evolution of science & toolkits

![img1-1](img/1-1.png)

# Evolution of science & toolkits

![img1-1](img/1-2.png)

# Evolution of science & toolkits

![img1-1](img/1-3.png)

# Evolution of science & toolkits

![img1-1](img/1-4.png)

# Overview

![Euler-Venn diagram](img/2.png)

# ML in Materials Science

Materials science is undergoing a revolution through machine learning applications.
At the core of this transformation lie a mathematical foundation and "empirical" evidence.

## Universal Function Approximation Theorem

The Universal Function Approximation Theorem states that a feed-forward neural network with:
- A single hidden layer (but also proved for deeper models)
- Sufficient number of neurons
- Appropriate activation functions

Can approximate **any** continuous function on compact subsets of $\mathbb{R}^n$ to arbitrary precision.

Formally:

For any continuous function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ and error $\epsilon > 0$, there exists a neural network $g$ with one hidden layer such that:

$$\sup_{x \in K} \|f(x) - g(x)\| < \epsilon$$

Where $K$ is a compact subset of $\mathbb{R}^n$.

> In other words, a neural network with the right architecture can in principle represent a function we don’t know; given enough data, training can then learn a good approximation of it.
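
As a rough numerical illustration of the statement above, the sketch below fits a network with a single hidden layer to $\sin(x)$ on the compact interval $[-\pi, \pi]$. The choices here are assumptions made for brevity and are not part of the theorem: 50 `tanh` neurons, random fixed hidden weights, and a least-squares fit of the output layer only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function sampled on the compact set K = [-pi, pi]
x = np.linspace(-np.pi, np.pi, 400)
f = np.sin(x)

# One hidden layer with tanh activations; hidden weights are random and kept fixed
n_hidden = 50
w = rng.normal(scale=2.0, size=n_hidden)
b = rng.normal(scale=2.0, size=n_hidden)
hidden = np.tanh(np.outer(x, w) + b)  # shape (400, n_hidden)

# Fit only the linear output layer (weights plus bias) by least squares
design = np.column_stack([hidden, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(design, f, rcond=None)

g = design @ coef
max_error = np.max(np.abs(f - g))
print(f"sup-norm error on the grid: {max_error:.2e}")
```

Increasing `n_hidden` typically drives the error down further, which is the practical reading of "sufficient number of neurons".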

# The "Hypermaterial Space" Ansatz

## Bridging the Discrete Nature of Materials

Traditional materials science faces a fundamental challenge: materials exist as discrete entities. Each chemical compound or element has distinct properties, making it difficult to find continuous functions that map across them.

How can we apply ML techniques to this inherently discrete domain?




# The "Hypermaterial Space" Ansatz

## The Hypermaterial Space Concept

The key insight is to map materials into a high-dimensional continuous space where:

$$\Phi: \text{Materials} \rightarrow \mathbb{R}^N$$

Where $\Phi$ is our mapping function between a material $M$ and the $N$-dimensional "hypermaterial space".

In this space, previously discrete materials become points in a continuum, where:

$$\text{Property}(M) \approx f(\Phi(M))$$

Where $f$ is a continuous function that ML methods can effectively learn.
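
To make the mapping $\Phi$ concrete, here is a deliberately tiny sketch that sends a chemical composition to a point in $\mathbb{R}^3$. The function `phi` and its three features are hypothetical choices for illustration only; practical descriptors are far richer.

```python
import numpy as np

# Atomic numbers for the few elements used in this toy example
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8, "Na": 11, "Si": 14, "Cl": 17, "Ti": 22}


def phi(composition):
    """Toy featurizer: map {element: count} to a point in R^3."""
    z = np.array([ATOMIC_NUMBER[el] for el in composition], dtype=float)
    counts = np.array(list(composition.values()), dtype=float)
    fractions = counts / counts.sum()
    return np.array([
        fractions @ z,       # composition-weighted mean atomic number
        z.max() - z.min(),   # spread of atomic numbers
        float(len(z)),       # number of distinct elements
    ])


materials = {
    "SiO2": {"Si": 1, "O": 2},
    "TiO2": {"Ti": 1, "O": 2},
    "NaCl": {"Na": 1, "Cl": 1},
}

# Each discrete material becomes one row, i.e. one point in the feature space
X = np.vstack([phi(c) for c in materials.values()])
print(X)
```

A regression model $f$ would then be trained on rows of `X` against a measured or computed property.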


# What *may* this unlock?

- Enables prediction of complex material properties without requiring explicit physical models
- Allows discovery of structure-property relationships that may be too complex for traditional methods
- Facilitates inverse design: identifying materials with desired properties
- Provides a framework for interpolating across known materials to discover new ones

# Representing Materials: Featurizing a material

To use machine learning on materials, we first need to represent material structures in numerical form (features, also called descriptors). Good features capture the relevant chemistry and structure and are amenable to ML models (chemintelligence.com).

![features](img/3.png)

# Representing Materials: Featurizing a material

Two common approaches are:

### Chemical formula or SMILES descriptors (fingerprints):
- We start from the chemical formula or a SMILES string, i.e. we represent a polymer or molecule by a text string. The SMILES notation is a compact text representation of a molecule’s structure (mdpi.com).
- We can convert SMILES into numerical fingerprints (vectors) using cheminformatics tools.

### Structure-based descriptors:
- If the 3D structure or crystal geometry is known, we can compute physics-inspired descriptors (features).
- For example, the Coulomb Matrix encodes inter-atomic Coulombic interactions in a symmetric matrix form.
- These descriptors are designed to be invariant to rotations, translations, and atom index permutations.

# Fingerprint

## SMILES — Simplified Molecular Input Line Entry System

- A text-based representation of molecular structures.

- Encodes atoms and bonds in a compact 1D string format.

- Example:
<div style="text-align: center;font-size: 1.5em;">
  <code>CCO</code><br>
  <span>(ethanol)</span>
</div>
- Easy to store, parse, and convert to graph structures.

# Morgan Fingerprints — Circular Substructure Encoding

Morgan fingerprints (aka Extended Connectivity Fingerprints, ECFP) are a way to convert molecular graphs into **fixed-length binary vectors**.

- Each bit in the fingerprint represents a **specific substructure** (e.g., rings, chains, functional groups).
- Computed by iteratively hashing atom neighborhoods up to a defined **radius**.

#### Example:

- For ethanol (`CCO`), the 1024-bit Morgan fingerprint might look like:

$$
[0, 0, 1, 0, 1, \dots, 0]
$$

Each "1" indicates presence of a certain chemical pattern.
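
The idea of "iteratively hashing atom neighborhoods" can be sketched with the standard library alone. This is a simplified toy and not RDKit's Morgan/ECFP algorithm: the function `toy_circular_fingerprint`, the initial atom identifiers (element plus heavy-atom degree), and the 64-bit length are all assumptions made for illustration.

```python
import hashlib


def stable_hash(text):
    """Deterministic integer hash (Python's built-in hash() is salted per run)."""
    return int(hashlib.sha256(text.encode()).hexdigest(), 16)


def toy_circular_fingerprint(symbols, bonds, radius=2, n_bits=64):
    """Fold hashed atom neighborhoods of growing radius into a bit vector."""
    neighbors = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Radius 0: each atom is identified by its element and heavy-atom degree
    ids = [stable_hash(f"{s}|{len(neighbors[i])}") for i, s in enumerate(symbols)]
    bits = [0] * n_bits
    for ident in ids:
        bits[ident % n_bits] = 1

    # Each iteration merges an atom's identifier with those of its neighbors
    for _ in range(radius):
        ids = [
            stable_hash(f"{ids[i]}|{sorted(ids[j] for j in neighbors[i])}")
            for i in range(len(symbols))
        ]
        for ident in ids:
            bits[ident % n_bits] = 1
    return bits


# Ethanol (CCO), heavy atoms only: C0-C1-O2
fp = toy_circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
# Same molecule written as OCC: the atom order changes, the fingerprint does not
fp_reordered = toy_circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(fp == fp_reordered, sum(fp))
```

Because neighbor identifiers are sorted before hashing, the result does not depend on how the atoms are numbered, which is the property a fingerprint needs.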


# Morgan Fingerprints — Circular Substructure Encoding


In [1]:
# --- Imports ---
import base64
from io import BytesIO

import numpy as np
import plotly.graph_objects as go
from ipywidgets import HTML, Dropdown, HBox, Layout, VBox, interactive_output
from rdkit import Chem
from rdkit.Chem import DataStructs, Draw, rdFingerprintGenerator

# from PIL import Image # Not directly needed

# --- Configuration ---
# Define the molecules to choose from
molecules_dict = {
    "Ethanol": "CCO",
    "Acetic Acid": "CC(=O)O",
    "Benzene": "c1ccccc1",
    "Aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "Caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
}
default_name = "Ethanol"

# Fingerprint parameters
fp_radius = 2
fp_size = 1024
matrix_dim = 32  # For reshaping 1024 bits into 32x32

# --- Fingerprint Generator (create once) ---
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=fp_radius, fpSize=fp_size)

# --- Create Widgets (Initialize empty or with default) ---

# Dropdown for selection
molecule_dropdown = Dropdown(
    options=list(molecules_dict.keys()),
    value=default_name,
    description="Select Molecule:",
    style={"description_width": "initial"},
)

# Left Panel Widgets
chem_name_widget = HTML()  # Will be populated by update function
image_widget = HTML()  # Will be populated by update function
left_panel = VBox(
    [chem_name_widget, image_widget],
    layout=Layout(width="40%", align_items="center", margin="10px"),
)

# Right Panel Widget (Plotly Heatmap)
fig_widget = go.FigureWidget()
# Add initial empty heatmap trace (will be populated)
heatmap_trace = fig_widget.add_heatmap(
    z=np.zeros((matrix_dim, matrix_dim)),  # Initial empty data
    colorscale="Blues",
    colorbar=dict(title="Bit Value", tickvals=[0, 1], ticktext=["0 (Off)", "1 (On)"]),
    showscale=True,
).data[0]
# Initial layout for the heatmap
fig_widget.update_layout(
    title="Morgan Fingerprint",
    xaxis_title=f"Feature Index Block (0-{matrix_dim-1})",
    yaxis_title=f"Feature Index Block (0-{matrix_dim-1})",
    height=500,
    width=600,
    margin=dict(l=50, r=50, t=50, b=50),
)
# fig_widget.layout.width = '95%'
right_panel = fig_widget

# --- Main Layout Structure ---
main_layout = HBox(
    [left_panel, right_panel],
    layout=Layout(
        width="100%", justify_content="space-around", align_items="flex-start"
    ),
)

# --- Update Function ---


def update_display(selected_name):
    """Updates molecule info, sketch, and fingerprint heatmap based on selection."""
    global mol  # Make mol accessible if needed elsewhere, or pass if preferred

    smiles = molecules_dict.get(selected_name, None)
    if smiles is None:
        chem_name_widget.value = f"<h2>Error</h2><p>Invalid selection.</p>"
        image_widget.value = ""
        with fig_widget.batch_update():
            heatmap_trace.z = np.zeros((matrix_dim, matrix_dim))
            fig_widget.layout.title = "Error"
        return

    # --- Process Molecule ---
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        chem_name_widget.value = f"<h2>{selected_name}</h2><p>SMILES: {smiles}</p><b>Error parsing SMILES!</b>"
        image_widget.value = ""
        with fig_widget.batch_update():
            heatmap_trace.z = np.zeros((matrix_dim, matrix_dim))
            fig_widget.layout.title = f"Error parsing SMILES for {selected_name}"
        return

    # --- Update Left Panel ---
    chem_name_widget.value = f"<h2>{selected_name}</h2><p>SMILES: {smiles}</p>"
    try:
        img = Draw.MolToImage(mol, size=(300, 300))
        buffer = BytesIO()
        img.save(buffer, "png")
        buffer.seek(0)
        b64_str = base64.b64encode(buffer.read()).decode("utf-8")
        image_widget.value = f'<img src="data:image/png;base64,{b64_str}" alt="{selected_name} structure">'
    except Exception as e:
        image_widget.value = f"<i>Error generating image: {e}</i>"

    # --- Update Right Panel (Fingerprint & Heatmap) ---
    try:
        fp = morgan_gen.GetFingerprint(mol)
        fp_array = np.zeros((fp_size,), dtype=int)
        DataStructs.ConvertToNumpyArray(fp, fp_array)
        fp_matrix = fp_array.reshape(matrix_dim, matrix_dim)

        # Update Plotly FigureWidget data and layout
        with fig_widget.batch_update():
            heatmap_trace.z = fp_matrix
            fig_widget.layout.title = f"Morgan FP ({selected_name})"
            # Optional: update print statements if they were part of the display
            # print(f"Fingerprint length: {fp.GetNumBits()}")
            # print(f"Number of bits set: {fp.GetNumOnBits()}")

    except Exception as e:
        with fig_widget.batch_update():
            heatmap_trace.z = np.zeros((matrix_dim, matrix_dim))  # Clear on error
            fig_widget.layout.title = f"Error generating FP for {selected_name}: {e}"


# --- Linking Dropdown to Update Function ---
# Use interactive_output to connect the dropdown value to the function's argument
out = interactive_output(update_display, {"selected_name": molecule_dropdown})

# --- Initial Display ---
# Call the update function once with the default value to populate widgets initially
update_display(molecule_dropdown.value)

# --- Final Layout ---
# Combine dropdown and main layout vertically
final_layout = VBox([molecule_dropdown, main_layout])

# --- Display Everything ---
display(final_layout)


# Structural-based Features

#### Structural Features = Encoded Geometry + Chemistry

Structural features capture information about:
- **Atomic types** (e.g., H, C, Si)
- **Atomic positions** (e.g., 3D coordinates, distances)
- **Local environments** (e.g., coordination, angles, symmetry)
- **Global structure** (e.g., lattice, bonding network)

These features are essential for predicting:
- Energy, forces (ML force fields)
- Elasticity, conductivity, stability
- Surface reactivity, band gap, diffusion

In [104]:
%matplotlib widget

# (Dummy example) Coordinates

Consider a basic molecular structure (e.g., water) with its atomic coordinates in 3D space. The simplest possible (dummy) descriptor is just the matrix of the 3D coordinates of the atoms in the system:

$$
    M_{ij} = r_{ij}
$$

where:
- $i$ is the atom index
- $j=x,y,z$ is the vector component
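
The weakness of this dummy descriptor can be checked numerically: the coordinate matrix changes when the molecule is rotated or when atoms are reordered, even though the molecule itself is unchanged. The sketch below uses approximate water coordinates (illustrative values, in angstrom) and compares the raw coordinates with the sorted interatomic distances, which do not change.

```python
import numpy as np

# Approximate water geometry in angstrom (O, H, H); illustrative values
positions = np.array([
    [0.000, 0.000, 0.119],
    [0.000, 0.763, -0.477],
    [0.000, -0.763, -0.477],
])


def rotation_z(angle_deg):
    """Rotation matrix about the z axis."""
    t = np.deg2rad(angle_deg)
    return np.array([
        [np.cos(t), -np.sin(t), 0.0],
        [np.sin(t), np.cos(t), 0.0],
        [0.0, 0.0, 1.0],
    ])


def pairwise_distances(pos):
    """Matrix of interatomic distances."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.linalg.norm(diff, axis=-1)


rotated = positions @ rotation_z(40.0).T
permuted = positions[[1, 0, 2]]  # swap O and the first H

# The raw coordinate descriptor M_ij changes under both operations
same_after_rotation = np.allclose(positions, rotated)
same_after_permutation = np.allclose(positions, permuted)

# Sorted interatomic distances stay the same in all three cases
d_ref = np.sort(pairwise_distances(positions), axis=None)
d_rot = np.sort(pairwise_distances(rotated), axis=None)
d_perm = np.sort(pairwise_distances(permuted), axis=None)

print(same_after_rotation, same_after_permutation)
print(np.allclose(d_ref, d_rot), np.allclose(d_ref, d_perm))
```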

# (Dummy example) Coordinates

In [3]:
import itertools  # For generating permutations

import ase
import ase.build
import nglview as nv

# --- Imports ---
import numpy as np
import plotly.graph_objects as go  # Import Plotly
from ipywidgets import (
    HTML,
    Dropdown,
    FloatSlider,
    HBox,
    Label,
    Layout,
    VBox,
    interactive_output,
)
from scipy.spatial.transform import Rotation as R

# --- Initial Molecule Setup ---
atoms = ase.build.molecule("H2O")
original_positions = atoms.get_positions().copy()
center = atoms.get_center_of_mass()
n_atoms = len(atoms)
atom_symbols = [a.symbol for a in atoms]
base_atom_labels = [f"{s}{i}" for i, s in enumerate(atom_symbols)]  # ['O0', 'H1', 'H2']

# --- Generate Permutations ---
original_indices = list(range(n_atoms))
permutations_indices = list(itertools.permutations(original_indices))
permutation_map = {}
for p_indices in permutations_indices:
    permuted_labels = [base_atom_labels[i] for i in p_indices]
    label_str = "-".join(permuted_labels)
    permutation_map[label_str] = list(p_indices)
permutation_labels = list(permutation_map.keys())

# --- Visualization and Plotting Widgets ---
# NGLView for 3D structure (Left Panel)
view = nv.show_ase(atoms)
view.add_ball_and_stick()
view.center()
view.control.zoom(0.1)
# Set a fixed width for the NGLView widget if needed for layout
view.layout.width = "60%"

# Plotly FigureWidget for Coordinates (Right Panel - Top)
# Create a FigureWidget instance - this allows efficient updates
fig_widget = go.FigureWidget()

# Add an initial heatmap trace. We will update its 'z' data and 'y' labels.
# Store a reference to the trace object using .data[0]
heatmap_trace = fig_widget.add_heatmap(
    z=np.zeros((n_atoms, 3)),  # Initial dummy data (n_atoms x 3 coordinates)
    x=["x", "y", "z"],  # Column labels
    y=base_atom_labels,  # Initial row labels (will be updated)
    colorscale="Viridis",  # Choose a colorscale
    #     colorbar={'title': 'Coordinate Value'},
    hoverongaps=False,
).data[0]
# Update layout
fig_widget.update_layout(
    title="Atomic Coordinates",
    #     yaxis={'title': 'j', 'tickvals': list(range(3)), 'ticktext': reshaped_y_labels, 'showticklabels': False, 'automargin': True},
    #     xaxis={'title': 'i', 'tickvals': list(range(3)), 'ticktext': reshaped_x_labels, 'showticklabels': False, 'automargin': True}
)


# --- Control Widgets (Right Panel - Bottom) ---
# Rotation Sliders
slider_rot_x = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate X (°)",
    continuous_update=False,
)
slider_rot_y = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate Y (°)",
    continuous_update=False,
)
slider_rot_z = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate Z (°)",
    continuous_update=False,
)
# Permutation Dropdown
permutation_dropdown = Dropdown(
    options=permutation_labels,
    value=permutation_labels[0],
    description="Atom Order:",
    style={"description_width": "initial"},
)

# --- Update Function ---


def update_molecule_and_plot(rot_x, rot_y, rot_z, permutation_label):
    # 1. Calculate Rotated Positions
    rot_mat = R.from_euler("xyz", [rot_x, rot_y, rot_z], degrees=True).as_matrix()
    rotated_positions = (
        original_positions - center + np.array([1, 0.5, 0.5])
    ) @ rot_mat.T + center

    # 2. Update the physical ASE Atoms object's positions
    atoms.set_positions(rotated_positions)

    # 3. Update NGLView display
    if hasattr(view, "_ngl_component_ids") and view._ngl_component_ids:
        view.set_coordinates({0: atoms.positions})
    else:
        print("Warning: NGLView component not found.")

    # 4. Get current positions and apply permutation for plot display
    try:
        current_positions = atoms.get_positions()
        p_indices = permutation_map[permutation_label]
        # Reorder the coordinates and labels according to the permutation
        displayed_positions = current_positions[p_indices]
        displayed_labels = [base_atom_labels[i] for i in p_indices]

        # --- Update Plotly Plot ---
        # Use batch_update for smoother rendering of multiple changes
        with fig_widget.batch_update():
            # Update the heatmap's z data (the coordinate values)
            heatmap_trace.z = displayed_positions
            # Update the heatmap's y labels (the permuted atom labels)
            heatmap_trace.y = displayed_labels
            # Update the layout's y-axis tick text to match the new labels
            # This ensures the axis labels correspond to the heatmap rows
            #             fig_widget.layout.yaxis.ticktext = displayed_labels
            #             fig_widget.layout.yaxis.tickvals = list(range(len(displayed_labels))) # Ensure tick values match indices

            # Update title
            fig_widget.layout.title = f"Descriptor"
            # Optional: Auto-adjust color axis range dynamically
            # min_val, max_val = np.min(displayed_positions), np.max(displayed_positions)
            # fig_widget.layout.coloraxis.cmin = min_val
            # fig_widget.layout.coloraxis.cmax = max_val

    except Exception as e:
        print(f"Error during update: {e}")
        # Optionally clear plot or show error in title
        with fig_widget.batch_update():
            heatmap_trace.z = np.zeros((n_atoms, 3))
            heatmap_trace.y = base_atom_labels
            fig_widget.layout.yaxis.ticktext = base_atom_labels
            fig_widget.layout.yaxis.tickvals = list(range(n_atoms))
            fig_widget.layout.title = f"Error during update: {e}"


# --- Linking Widgets to Update Function ---
out = interactive_output(
    update_molecule_and_plot,
    {
        "rot_x": slider_rot_x,
        "rot_y": slider_rot_y,
        "rot_z": slider_rot_z,
        "permutation_label": permutation_dropdown,
    },
)

# --- Initial Calculation & Display ---
update_molecule_and_plot(
    slider_rot_x.value,
    slider_rot_y.value,
    slider_rot_z.value,
    permutation_dropdown.value,
)

# --- Display Layout (Side-by-Side) ---
controls = VBox([permutation_dropdown, slider_rot_x, slider_rot_y, slider_rot_z])

# Right panel contains the Plotly FigureWidget and the controls below it
right_panel = VBox([fig_widget, controls])
right_panel.layout.width = "50%"

# Main layout: NGLView on left, Plot+Controls on right
layout = HBox([view, right_panel])

display(layout)


# Coulomb Matrix (CM)

The **Coulomb Matrix** is a straightforward global descriptor designed to capture the electrostatic interactions between atomic nuclei in a molecule.

Its elements are defined by the following formula:

$$
M_{ij}^{\text{Coulomb}} = \begin{cases}
0.5 Z_i^{2.4} & \text{if } i = j \\
\frac{Z_i Z_j}{R_{ij}} & \text{if } i \ne j
\end{cases}
$$

Here:
- $ Z_i $ and $ Z_j $ are the atomic numbers of atoms $ i $ and $ j $,
- $ R_{ij} $ is the distance between the two nuclei.

The **diagonal entries** represent a fitted approximation of atomic energy as a function of nuclear charge — effectively modeling an atom’s interaction with itself.  
In contrast, the **off-diagonal entries** reflect the classical Coulombic repulsion between distinct atomic pairs.
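
The formula above is short enough to write directly in NumPy. The sketch below is a minimal version under stated assumptions: the helper `coulomb_matrix` is hypothetical, the water coordinates are approximate, and distances are used in whatever unit the positions are given in. It also shows that reordering atoms shuffles the matrix entries while the eigenvalue spectrum stays the same.

```python
import numpy as np


def coulomb_matrix(numbers, positions):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / R_ij elsewhere."""
    z = np.asarray(numbers, dtype=float)
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid division by zero
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z**2.4)
    return cm


# Approximate water geometry (O, H, H); illustrative values
numbers = np.array([8, 1, 1])
positions = np.array([
    [0.000, 0.000, 0.119],
    [0.000, 0.763, -0.477],
    [0.000, -0.763, -0.477],
])

cm = coulomb_matrix(numbers, positions)

# Reorder the atoms (H, O, H): the matrix entries move to different positions
order = [1, 0, 2]
cm_perm = coulomb_matrix(numbers[order], positions[order])
same_matrix = np.allclose(cm, cm_perm)

# The sorted eigenvalues are unaffected by the atom ordering
eig = np.sort(np.linalg.eigvalsh(cm))
eig_perm = np.sort(np.linalg.eigvalsh(cm_perm))
same_spectrum = np.allclose(eig, eig_perm)

print(np.round(cm, 2))
print(same_matrix, same_spectrum)
```

Since only interatomic distances enter the matrix, rotating or translating the molecule leaves it unchanged; atom ordering is the one symmetry that needs extra care.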

# Coulomb Matrix (CM)

In [4]:
import itertools

import ase
import ase.build
import nglview as nv

# --- Imports ---
import numpy as np
import plotly.graph_objects as go
from dscribe.descriptors import CoulombMatrix  # <--- Import CoulombMatrix
from ipywidgets import (
    HTML,
    Dropdown,
    FloatSlider,
    HBox,
    Label,
    Layout,
    VBox,
    interactive_output,
)
from scipy.spatial.transform import Rotation as R


# --- Initial Molecule Setup ---
atoms = ase.build.molecule("H2O")
# An isolated molecule is treated as non-periodic here, so no cell is set.
# For a periodic calculation, set atoms.cell and atoms.pbc and configure
# the descriptor accordingly.
# atoms.pbc = True # If periodic calculation is intended
original_positions = atoms.get_positions().copy()
center = atoms.get_center_of_mass()
n_atoms = len(atoms)
atom_symbols = [a.symbol for a in atoms]
base_atom_labels = [f"{s}{i}" for i, s in enumerate(atom_symbols)]  # ['O0', 'H1', 'H2']

# --- Generate Permutations (Used for dropdown AND for CM calculation order) ---
original_indices = list(range(n_atoms))
permutations_indices = list(itertools.permutations(original_indices))
permutation_map = {}
for p_indices in permutations_indices:
    permuted_labels = [base_atom_labels[i] for i in p_indices]
    label_str = "-".join(permuted_labels)
    permutation_map[label_str] = list(p_indices)
permutation_labels = list(permutation_map.keys())

# --- Setup Coulomb Matrix Descriptor ---
# Ensure n_atoms_max is sufficient (at least n_atoms)
n_atoms_max = n_atoms
cm_desc = CoulombMatrix(
    n_atoms_max=n_atoms_max,
    permutation="none",  # <--- Use 'none' to respect input order
)
# print(f"Using CoulombMatrix with permutation='none', size=({n_atoms_max}x{n_atoms_max})")

# --- Visualization and Plotting Widgets ---
# NGLView (Left Panel)
view = nv.show_ase(atoms, camera="orthographic")  # Adjust camera as preferred
view.add_ball_and_stick()
view.center()
# view.control.zoom(0.1) # Adjust zoom if needed, this method might not exist, use camera params
view.layout.width = "60%"

# Plotly FigureWidget for Coulomb Matrix (Right Panel - Top)
fig_widget = go.FigureWidget()
# Initial heatmap trace with correct square dimensions (n_atoms x n_atoms)
heatmap_trace = fig_widget.add_heatmap(
    z=np.zeros((n_atoms, n_atoms)),  # Square matrix based on n_atoms
    x=base_atom_labels,  # Initial x-axis atom labels
    y=base_atom_labels,  # Initial y-axis atom labels
    colorscale="Viridis",  # Or another scale like 'RdBu'
    hoverongaps=False,
).data[0]
# Update layout
fig_widget.update_layout(
    yaxis={
        "title": "j",
        "tickvals": list(range(n_atoms)),
        "ticktext": base_atom_labels,
        "automargin": True,
    },
    xaxis={
        "title": "i",
        "tickvals": list(range(n_atoms)),
        "ticktext": base_atom_labels,
        "automargin": True,
    },  # , 'side': 'top'} # Optionally move x-axis ticks to top
)
fig_widget.layout.height = 350  # Adjust height

# --- Control Widgets (Right Panel - Bottom) ---
slider_rot_x = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate X (°)",
    continuous_update=False,
)
slider_rot_y = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate Y (°)",
    continuous_update=False,
)
slider_rot_z = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate Z (°)",
    continuous_update=False,
)
# Permutation Dropdown
permutation_dropdown = Dropdown(
    options=permutation_labels,
    value=permutation_labels[0],
    description="Atom Order:",
    style={"description_width": "initial"},
)
controls = VBox([permutation_dropdown, slider_rot_x, slider_rot_y, slider_rot_z])
controls.layout = Layout(margin="10px 0 0 0")

# --- Create the Right Panel VBox (Plot + Controls) ---
right_panel = VBox([fig_widget, controls])
right_panel.layout.width = "50%"

# --- Update Function ---
# The 'permutation_label' argument sets the atom order used to build the matrix


def update_molecule_and_cm_plot(rot_x, rot_y, rot_z, permutation_label):
    # 1. Calculate Rotated Positions & Update MAIN Atoms Object (for NGLView)
    rot_mat = R.from_euler("xyz", [rot_x, rot_y, rot_z], degrees=True).as_matrix()
    rotated_positions = (original_positions - center) @ rot_mat.T + center
    atoms.set_positions(rotated_positions)

    # 2. Update NGLView display
    if hasattr(view, "_ngl_component_ids") and view._ngl_component_ids:
        view.set_coordinates({0: atoms.positions})
    else:
        print("Warning: NGLView component not found.")

    # 3. Create Permuted Atoms, Calculate Coulomb Matrix, Update Plot
    try:
        # Get permutation indices and labels
        p_indices = permutation_map[permutation_label]
        displayed_labels = [base_atom_labels[i] for i in p_indices]

        # Create a temporary ASE Atoms object with the permuted order AND current positions
        # This is the object we feed to CoulombMatrix(permutation='none').create()
        permuted_atoms_for_cm = atoms[p_indices]
        # If using periodic=True, ensure cell is copied:
        # permuted_atoms_for_cm.set_cell(atoms.get_cell())
        # permuted_atoms_for_cm.set_pbc(atoms.get_pbc())

        # Calculate Coulomb Matrix for the permuted atoms object
        # The output may come back flattened (shape depends on the dscribe version)
        cm_batch = cm_desc.create(permuted_atoms_for_cm)
        cm_matrix = cm_batch.reshape(
            len(atoms), len(atoms)
        )  # Restore the square (n_atoms x n_atoms) matrix

        # --- Update Plotly Plot ---
        with fig_widget.batch_update():
            # Update heatmap data with the Coulomb Matrix
            heatmap_trace.z = cm_matrix
            # Update x and y axis labels to reflect the current permutation
            heatmap_trace.x = displayed_labels
            heatmap_trace.y = displayed_labels

            # Update layout axes ticks and labels
            fig_widget.layout.yaxis.ticktext = displayed_labels
            fig_widget.layout.yaxis.tickvals = list(range(len(displayed_labels)))
            fig_widget.layout.xaxis.ticktext = displayed_labels
            fig_widget.layout.xaxis.tickvals = list(range(len(displayed_labels)))

            fig_widget.layout.title = f"Coulomb Matrix (Order: {permutation_label})"
            fig_widget.layout.coloraxis.colorbar.title = "CM Value"

    except Exception as e:
        print(f"Error during CM calculation or plot update: {e}")
        with fig_widget.batch_update():
            heatmap_trace.z = np.zeros((n_atoms, n_atoms))  # Reset with correct shape
            heatmap_trace.x = base_atom_labels
            heatmap_trace.y = base_atom_labels
            fig_widget.layout.yaxis.ticktext = base_atom_labels
            fig_widget.layout.yaxis.tickvals = list(range(n_atoms))
            fig_widget.layout.xaxis.ticktext = base_atom_labels
            fig_widget.layout.xaxis.tickvals = list(range(n_atoms))
            fig_widget.layout.title = f"Error during update: {e}"


# --- Linking Widgets to Update Function ---
out = interactive_output(
    update_molecule_and_cm_plot,
    {
        "rot_x": slider_rot_x,
        "rot_y": slider_rot_y,
        "rot_z": slider_rot_z,
        "permutation_label": permutation_dropdown,
    },
)

# --- Initial Calculation & Display ---
update_molecule_and_cm_plot(
    slider_rot_x.value,
    slider_rot_y.value,
    slider_rot_z.value,
    permutation_dropdown.value,
)

# --- Display Layout (Using HBox or GridspecLayout) ---
layout = HBox([view, right_panel])
# grid = GridspecLayout(n_rows=1, n_columns=2, width='100%')
# grid[0, 0] = view
# grid[0, 1] = right_panel
# layout = grid # Use if HBox failed previously


display(layout)


# Smooth Overlap of Atomic Positions (SOAP)

SOAP creates a smooth 3D density around each atom by placing Gaussian blobs on nearby atoms.

Then it compares this density using:

- Spherical harmonics → captures angular information

- Radial basis functions → captures distance-based info

# Smooth Overlap of Atomic Positions (SOAP)

1. Build a **smooth density** around each atom by placing Gaussian blobs at neighboring atoms:
   $$
   \rho^Z(\mathbf{r}) = \sum_i e^{-\frac{1}{2\sigma^2} \lvert \mathbf{r} - \mathbf{R}_i \rvert^2}
   $$
   This gives a smooth "cloud" of nearby atoms (species $ Z $).

2. Decompose this atomic density using **radial basis functions** and **spherical harmonics**:
   $$
   c^Z_{nlm} = \iiint g_n(r) Y_{lm}(\theta, \phi)\, \rho^Z(\mathbf{r}) \, dV
   $$

   where
   - $g_n(r)$ are the radial basis functions
   - $Y_{lm}(\theta, \phi)$ are the spherical harmonics



# Smooth Overlap of Atomic Positions (SOAP)

To make the descriptors usable in ML, we build a **power spectrum vector**:

$$
p^{Z_1 Z_2}_{n n' l} = \pi \sqrt{\frac{8}{2l+1}} \sum_m {c^{Z_1}_{n l m}}^* c^{Z_2}_{n' l m}
$$

- Combines info about atom types $Z_1, Z_2$
- Aggregates over angular components $m$
- Output: A **fixed-length vector** (rotation-invariant, informative)
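
Why summing over $m$ removes the dependence on orientation can be seen in a heavily simplified sketch. The assumptions are significant and this is not the full SOAP descriptor: neighbors are treated as point densities (the $\sigma \to 0$ limit), only $l = 0$ and $l = 1$ are kept, the radial functions `g` are unnormalized Gaussians, a single species is used, and the prefactor is dropped. The function `toy_power_spectrum` is hypothetical.

```python
import numpy as np


def toy_power_spectrum(neighbors, centers=(0.5, 1.0, 1.5), width=0.5):
    """l = 0 and l = 1 power-spectrum terms for one atomic environment.

    Neighbors are given relative to the central atom and treated as points.
    """
    neighbors = np.asarray(neighbors, dtype=float)
    r = np.linalg.norm(neighbors, axis=1)
    unit = neighbors / r[:, None]

    # g[n, i] = g_n(r_i): Gaussian radial functions centered at `centers`
    g = np.exp(-((r[None, :] - np.asarray(centers)[:, None]) ** 2) / (2 * width**2))

    # Real spherical harmonics: Y_00 is constant, and the three Y_1m are
    # proportional to the components of the unit vector
    c0 = g.sum(axis=1) / np.sqrt(4 * np.pi)      # c_{n,0,0}, shape (n,)
    c1 = np.sqrt(3 / (4 * np.pi)) * g @ unit     # c_{n,1,m}, shape (n, 3)

    p0 = np.outer(c0, c0)  # l = 0 has a single m
    p1 = c1 @ c1.T         # sum over the three m values for l = 1
    return np.concatenate([p0.ravel(), p1.ravel()])


def rotation_x(angle_deg):
    """Rotation matrix about the x axis."""
    t = np.deg2rad(angle_deg)
    return np.array([
        [1.0, 0.0, 0.0],
        [0.0, np.cos(t), -np.sin(t)],
        [0.0, np.sin(t), np.cos(t)],
    ])


# Hydrogen positions relative to the oxygen atom of water (approximate values)
env = np.array([[0.0, 0.763, -0.596], [0.0, -0.763, -0.596]])

p = toy_power_spectrum(env)
p_rotated = toy_power_spectrum(env @ rotation_x(35.0).T)
p_permuted = toy_power_spectrum(env[::-1])

print(np.allclose(p, p_rotated), np.allclose(p, p_permuted))
```

The coefficients `c1` themselves rotate with the molecule; only their inner products over $m$ are invariant, which is why the power spectrum rather than the raw coefficients is used as the feature vector.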

# Smooth Overlap of Atomic Positions (SOAP)


In [5]:
import itertools
import math  # For ceiling function

import ase
import ase.build
import nglview as nv

# --- Imports ---
import numpy as np
import plotly.graph_objects as go
from dscribe.descriptors import SOAP
from ipywidgets import (
    HTML,
    Dropdown,
    FloatSlider,
    HBox,
    Label,
    Layout,
    VBox,
    interactive_output,
)
from scipy.spatial.transform import Rotation as R

# --- Initial Molecule Setup ---
atoms = ase.build.molecule("H2O")
atoms.cell = [4, 4, 4]
original_positions = atoms.get_positions().copy()
center = atoms.get_center_of_mass()
n_atoms = len(atoms)
atom_symbols = [a.symbol for a in atoms]
base_atom_labels = [
    f"{s}{i}" for i, s in enumerate(atom_symbols)
]  # Still needed for NGLView/logic consistency

# --- Generate Permutations (Keep for dropdown, though plot won't use it directly) ---
original_indices = list(range(n_atoms))
permutations_indices = list(itertools.permutations(original_indices))
permutation_map = {}
for p_indices in permutations_indices:
    permuted_labels = [base_atom_labels[i] for i in p_indices]
    label_str = "-".join(permuted_labels)
    permutation_map[label_str] = list(p_indices)
permutation_labels = list(permutation_map.keys())

# --- Setup SOAP Descriptor ---
species = ["H", "O"]
r_cut = 6.0
n_max = 5
l_max = 3
# --- Structure-wide descriptor: use average='outer' ---
soap_desc = SOAP(
    species=species,
    periodic=True,
    r_cut=r_cut,
    n_max=n_max,
    l_max=l_max,
    average="outer",  # <--- USE STRUCTURE-WIDE AVERAGE
    sparse=False,
)
n_features = soap_desc.get_number_of_features()
print(f"Using SOAP with average='outer'. Number of features: {n_features}")

# --- Calculate Reshaped Dimensions ---
d1 = int(math.ceil(math.sqrt(n_features)))
d2 = int(math.ceil(n_features / d1))
padded_size = d1 * d2
reshaped_x_labels = [f"idx_{i}" for i in range(d2)]
reshaped_y_labels = [f"idx_{i}" for i in range(d1)]
print(
    f"Reshaping average vector ({n_features},) to ({d1}, {d2}) with padding to {padded_size}."
)

# --- Visualization and Plotting Widgets ---
# NGLView (Left Panel)
view = nv.show_ase(atoms)
view.add_ball_and_stick()
view.center()
view.control.zoom(0.1)
# Set a fixed width for the NGLView widget if needed for layout
view.layout.width = "60%"


# Plotly FigureWidget for Reshaped Average SOAP (Right Panel - Top)
fig_widget = go.FigureWidget()
# Initial heatmap trace with reshaped dimensions (d1 x d2)
heatmap_trace = fig_widget.add_heatmap(
    z=np.zeros((d1, d2)),
    x=reshaped_x_labels,
    y=reshaped_y_labels,
    colorscale="Viridis",
    #     colorbar={'title': 'Avg SOAP Val'},
    hoverongaps=False,
).data[0]
# Update layout
fig_widget.update_layout(
    #     title="Reshaped Structure-Average SOAP", # Updated title
    yaxis={
        "title": "j",
        "tickvals": list(range(d1)),
        "ticktext": reshaped_y_labels,
        "showticklabels": False,
        "automargin": True,
    },
    xaxis={
        "title": "i",
        "tickvals": list(range(d2)),
        "ticktext": reshaped_x_labels,
        "showticklabels": False,
        "automargin": True,
    },
)
fig_widget.layout.height = 300  # Adjust height

# --- Control Widgets (Right Panel - Bottom) ---
slider_rot_x = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate X (°)",
    continuous_update=False,
)
slider_rot_y = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate Y (°)",
    continuous_update=False,
)
slider_rot_z = FloatSlider(
    min=-180,
    max=180,
    step=5,
    value=0,
    description="Rotate Z (°)",
    continuous_update=False,
)
# --- Keep Permutation Dropdown (even if it doesn't affect this plot) ---
permutation_dropdown = Dropdown(
    options=permutation_labels,
    value=permutation_labels[0],
    description="Atom Order:",
    style={"description_width": "initial"},
)
controls = VBox(
    [permutation_dropdown, slider_rot_x, slider_rot_y, slider_rot_z]
)  # Keep permutation dropdown
controls.layout = Layout(margin="10px 0 0 0")

# --- Create the Right Panel VBox (Plot + Controls) ---
right_panel = VBox([fig_widget, controls])
right_panel.layout.width = "40%"  # 60% (viewer) + 40% (plot and controls) fills the row

# --- Update Function ---
# Argument 'permutation_label' is kept for linkage but not used for plot data


def update_molecule_and_reshaped_avg_plot(rot_x, rot_y, rot_z, permutation_label):
    # 1. Calculate Rotated Positions & Update Atoms Object
    rot_mat = R.from_euler("xyz", [rot_x, rot_y, rot_z], degrees=True).as_matrix()
    rotated_positions = (original_positions - center) @ rot_mat.T + center
    atoms.set_positions(rotated_positions)

    # 2. Update NGLView display
    if hasattr(view, "_ngl_component_ids") and view._ngl_component_ids:
        view.set_coordinates({0: atoms.positions})
    else:
        print("Warning: NGLView component not found.")

    # 3. Calculate Structure-Average SOAP, Pad, Reshape, Update Plot
    try:
        # Calculate structure-wide average SOAP features
        # Result shape: (n_features,) OR (1, n_features) depending on dscribe version
        # Ensure it's 1D for padding/reshaping
        avg_soap_vector = soap_desc.create(atoms).flatten()  # Use flatten() to be safe

        # --- Permutation Dropdown has NO effect on avg_soap_vector ---

        # Pad the vector with zeros
        padded_vector = np.pad(
            avg_soap_vector,
            (0, padded_size - n_features),
            mode="constant",
            constant_values=0,
        )

        # Reshape the padded vector into a 2D matrix
        reshaped_avg_soap = padded_vector.reshape((d1, d2))

        # --- Update Plotly Plot ---
        with fig_widget.batch_update():
            # Update heatmap data with the reshaped 2D matrix
            heatmap_trace.z = reshaped_avg_soap
            # X and Y axes represent indices within the reshaped matrix
            heatmap_trace.x = reshaped_x_labels
            heatmap_trace.y = reshaped_y_labels

            # Update layout axes and title
            fig_widget.layout.yaxis.ticktext = reshaped_y_labels
            fig_widget.layout.yaxis.tickvals = list(range(d1))
            fig_widget.layout.xaxis.ticktext = reshaped_x_labels
            fig_widget.layout.xaxis.tickvals = list(range(d2))
    #             fig_widget.layout.title = "Reshaped Structure-Average SOAP" # Updated title
    #             fig_widget.layout.coloraxis.colorbar.title = 'Avg SOAP Val'

    except Exception as e:
        print(f"Error during SOAP calculation or plot update: {e}")
        with fig_widget.batch_update():
            heatmap_trace.z = np.zeros((d1, d2))  # Reset with correct shape
            heatmap_trace.x = reshaped_x_labels
            heatmap_trace.y = reshaped_y_labels
            fig_widget.layout.yaxis.ticktext = reshaped_y_labels
            fig_widget.layout.yaxis.tickvals = list(range(d1))
            fig_widget.layout.xaxis.ticktext = reshaped_x_labels
            fig_widget.layout.xaxis.tickvals = list(range(d2))
            fig_widget.layout.title = f"Error during update: {e}"


# --- Linking Widgets to Update Function ---
# Keep link to permutation_dropdown, even though function doesn't use its value for plot
out = interactive_output(
    update_molecule_and_reshaped_avg_plot,
    {
        "rot_x": slider_rot_x,
        "rot_y": slider_rot_y,
        "rot_z": slider_rot_z,
        "permutation_label": permutation_dropdown,
    },
)

# --- Initial Calculation & Display ---
update_molecule_and_reshaped_avg_plot(
    slider_rot_x.value,
    slider_rot_y.value,
    slider_rot_z.value,
    permutation_dropdown.value,
)

# --- Display Layout (Using HBox or GridspecLayout) ---
layout = HBox([view, right_panel])

display(layout)

Using SOAP with average='outer'. Number of features: 220
Reshaping average vector (220,) to (15, 15) with padding to 225.
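
The invariance explored with the sliders above can also be checked numerically without SOAP. The sketch below uses a much simpler stand-in descriptor, the sorted list of interatomic distances (this is not SOAP and it ignores chemical species), together with illustrative water-like coordinates. The function name and the geometry are assumptions made only for this example.

```python
import numpy as np


def sorted_distance_descriptor(positions):
    """Toy invariant descriptor: all interatomic distances, sorted."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(positions), k=1)  # each pair counted once
    return np.sort(dist[upper])


# Illustrative water-like geometry (O, H, H), coordinates in angstrom
positions = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])

# Rotate by 35 degrees around the z axis
theta = np.deg2rad(35.0)
rot_z = np.array(
    [
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ]
)
rotated = positions @ rot_z.T
permuted = positions[[2, 0, 1]]  # same atoms, different order

reference = sorted_distance_descriptor(positions)
same_after_rotation = np.allclose(reference, sorted_distance_descriptor(rotated))
same_after_permutation = np.allclose(reference, sorted_distance_descriptor(permuted))
raw_changes = not np.allclose(positions, rotated)

print("Raw coordinates change under rotation:", raw_changes)
print("Descriptor unchanged by rotation:", same_after_rotation)
print("Descriptor unchanged by permutation:", same_after_permutation)
```

Raw Cartesian coordinates change under rotation and reordering, while the descriptor does not. That is the property a model input should have if the predicted quantity does not depend on orientation or atom order.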



# Machine Learning tools

<figure style="text-align: center;">
  <img src="img/4.jpg"
       style="height: 600px;">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: https://vas3k.com/blog/machine_learning/</figcaption>
</figure>

# What Is Machine Learning?

**Machine Learning (ML)** is a way to make a computer learn patterns from data without being explicitly programmed.

- Instead of writing rules, we **train models on examples**.
- The goal: learn a function that maps **inputs (features)** to **outputs (labels)**.

$$
\text{Model:} \quad f(\mathbf{x}) \rightarrow y
$$

Where:
- $ \mathbf{x} $: features (input data)
- $ y $: target (label/output)
- $ f $: the model (e.g., a linear function, a neural network, etc.)
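
As a minimal sketch of "learning from examples instead of writing rules", the snippet below recovers the Celsius-to-Fahrenheit conversion purely from a handful of input-output pairs. The variable and function names are illustrative, and a straight line is assumed as the model $f$.

```python
import numpy as np

# Examples: inputs x (Celsius) and outputs y (Fahrenheit)
celsius = np.array([-10.0, 0.0, 15.0, 25.0, 40.0, 100.0])
fahrenheit = np.array([14.0, 32.0, 59.0, 77.0, 104.0, 212.0])

# "Training": find the line y = slope * x + intercept that best fits the examples
slope, intercept = np.polyfit(celsius, fahrenheit, deg=1)


def f(x):
    """The learned model: maps a feature x to a predicted target y."""
    return slope * x + intercept


print(f"learned slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"f(37.0) = {f(37.0):.1f}")
```

The conversion rule was never typed in. It was inferred from the data, which is the same idea used when the rule is unknown, for example when mapping a material descriptor to a property.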


# Supervised vs Unsupervised Learning

### **Supervised Learning**  
We have **input-output pairs** and the model learns to predict the output:
- 📈 Regression (predict numbers): e.g. Young’s modulus, melting point
- 🧪 Classification (predict categories): e.g. solid/liquid/gas, metal/insulator

$$
f(\mathbf{x}) \approx y
$$

### **Unsupervised Learning**  
We only have **inputs**, and the model finds structure or groups:
- 🔍 Clustering (group similar materials)
- 📉 Dimensionality reduction (simplify the data space)

$$
\text{No labels! Just } \mathbf{x}_1, \mathbf{x}_2, \dots
$$

# Model Types

- **Features ($\mathbf{x}$)**: what we know (e.g., atomic number, density, structure)
- **Labels ($y$)**: what we want to predict (e.g., band gap, failure strength)

## 🧠 Model (Main) Types:
| Type        | Task Example             | Output       |
|-------------|--------------------------|--------------|
| Regressor   | Predict melting point    | Real value   |
| Classifier  | Classify crystal system  | Category     |
| Clustering  | Group similar alloys     | Cluster ID   |


# Model Types

<figure style="text-align: center;">
  <img src="img/5.jpg"
       style="height: 600px;">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: https://vas3k.com/blog/machine_learning/</figcaption>
</figure>

# Overfitting vs Underfitting

### 📉 Underfitting
- Model is too simple → **misses patterns**
- High error on training and test data


### 📈 Overfitting
- Model is too complex → **memorizes training data**
- Low training error, but poor on new data
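
A small numerical sketch of both regimes, assuming synthetic data (a noisy sine curve) and polynomial models of increasing degree. The data, the degrees, and the helper names are choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)


def target(x):
    return np.sin(2.0 * np.pi * x)


# Small training set, larger test set, both with Gaussian noise
x_train = np.linspace(0.0, 1.0, 15)
y_train = target(x_train) + rng.normal(0.0, 0.1, size=x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = target(x_test) + rng.normal(0.0, 0.1, size=x_test.size)


def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)


train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(
        f"degree {degree}: train MSE = {train_mse[degree]:.4f}, "
        f"test MSE = {test_mse[degree]:.4f}"
    )
```

The degree-1 model underfits: its error is high on both sets. Raising the degree always lowers the training error, so the training error alone cannot reveal overfitting; only the comparison with held-out data can.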



# Overfitting vs Underfitting

<figure style="text-align: center;">
  <img src="img/6.png"
       style="width: 100%;">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: https://github.com/jermwatt/machine_learning_refined/</figcaption>
</figure>

# Typical Workflow

<figure style="text-align: center;">
  <img src="img/7.png"
       style="height: 500px;">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: https://aronwalsh.github.io/MLforMaterials</figcaption>
</figure>

# Clustering

Clustering is an **unsupervised learning** technique that groups similar data points **without labels**.

### Common Clustering Models:

| Algorithm         | Idea                                  |
|-------------------|----------------------------------------|
| **K-Means**        | Assign each point to the nearest of $k$ centroids |
| **DBSCAN**         | Groups based on density (no need to set $k$) |
| **Hierarchical**   | Builds a tree of nested clusters      |
| **Gaussian Mixture** | Probabilistic soft clustering         |
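
To make the K-Means row concrete, here is a minimal NumPy sketch of its two alternating steps (assign, then update). The farthest-first initialisation and the synthetic two-blob data are simplifications chosen for this example; library implementations such as scikit-learn use more careful initialisation.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Minimal K-Means: returns (centroids, labels)."""
    # Farthest-first initialisation: deterministic and adequate for a demo
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: each centroid moves to the mean of its members
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Two well-separated synthetic blobs in a 2D feature space
rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
blob_b = rng.normal([5.0, 5.0], 0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = kmeans(points, k=2)
print("centroids:\n", centroids)
```

No labels are passed to `kmeans`; the grouping emerges only from distances in feature space.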



# Gaussian Mixture

<figure style="text-align: center;">
  <img src="img/8.gif"
       style="height: 300px;">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: Wikipedia</figcaption>
</figure>

# Gaussian Mixture

# --- Imports ---
import numpy as np
import plotly.graph_objects as go
import scipy.stats
from ipywidgets import FloatSlider, HBox, HTMLMath, Layout, VBox, interactive_output

# --- Generate Sample Data (1D Cluster) ---
np.random.seed(42)  # for reproducibility
true_mean = 5.0
true_std = 1.2
n_samples = 30
sample_data = np.random.normal(loc=true_mean, scale=true_std, size=n_samples)
# Calculate data range for plotting
data_min, data_max = np.min(sample_data), np.max(sample_data)
plot_margin = 2.0 * true_std
plot_min = data_min - plot_margin
plot_max = data_max + plot_margin

# --- Widgets ---

# Sliders for model parameters
mu_slider = FloatSlider(
    min=plot_min,
    max=plot_max,
    step=0.1,
    value=1.0,
    description="Model μ (Mean)",
    continuous_update=False,
    readout_format=".1f",
)
sigma_slider = FloatSlider(
    min=0.1,
    max=5.0,
    step=0.1,
    value=3.0,
    description="Model σ (Std Dev)",
    continuous_update=False,
    readout_format=".1f",
)
controls_box = VBox([mu_slider, sigma_slider])  # Stack the two sliders vertically

# == Left Panel Widgets ==
# Equation Display (Log-Likelihood for N points from Gaussian N(μ, σ^2))
# Using HTMLMath for LaTeX rendering
logL_equation = HTMLMath(
    value=r"$$\mathcal{L}(\mu, \sigma | \mathbf{x}) = \sum_{i=1}^{N} \log P(x_i | \mu, \sigma^2) = \sum_{i=1}^{N} \left( -\log(\sigma) - \frac{1}{2}\log(2\pi) - \frac{(x_i - \mu)^2}{2\sigma^2} \right)$$"
)

# Plotly Bar chart for Log-Likelihood value
likelihood_fig = go.FigureWidget()
# Initialize with a single bar, value will be updated
likelihood_bar_trace = likelihood_fig.add_bar(x=["Log-Likelihood"], y=[0]).data[0]
likelihood_fig.update_layout(
    title="Model Log-Likelihood",
    yaxis_title="Log L",
    height=300,
    width=350,
    margin=dict(t=50, b=30, l=50, r=30),
    yaxis_range=[
        -200,
        0,
    ],  # Adjust range based on expected values, logL is often negative
)

left_panel = VBox([likelihood_fig, controls_box])  # Likelihood bar chart above the sliders
left_panel.layout.width = "50%"

# == Right Panel Widgets ==
# Plotly plot for PDF and data points
pdf_fig = go.FigureWidget()
# Trace for the model Gaussian PDF curve (initially empty)
pdf_curve_trace = pdf_fig.add_scatter(x=[], y=[], mode="lines", name="Model PDF").data[
    0
]
# Trace for the sample data points (plotted along y=0)
pdf_data_trace = pdf_fig.add_scatter(
    x=sample_data,
    y=np.zeros_like(sample_data),  # Place markers on x-axis
    mode="markers",
    name="Data Points",
    marker=dict(
        symbol="line-ns-open", size=10, color="red", line=dict(width=2)
    ),  # Use vertical lines as markers
).data[0]
# Layout for the PDF plot
pdf_fig.update_layout(
    title="1D Gaussian Mixture",
    xaxis_title="Data Value",
    yaxis_title="Probability Density",
    xaxis_range=[plot_min, plot_max],
    height=400,
    width=550,
    margin=dict(t=50, b=40, l=50, r=30),
    legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.01),
)

right_panel = pdf_fig

# --- Update Function ---


def update_plots(mu, sigma):
    """Calculates log-likelihood and updates plots based on slider values."""
    if sigma <= 0:  # Should not happen with slider min=0.1, but good practice
        return

    # Calculate Log-Likelihood
    # Use scipy.stats.norm.logpdf for numerical stability
    log_likelihoods = scipy.stats.norm.logpdf(sample_data, loc=mu, scale=sigma)
    current_log_likelihood = np.sum(log_likelihoods)

    # Update Likelihood Bar Plot
    with likelihood_fig.batch_update():
        likelihood_bar_trace.y = [current_log_likelihood]
        # Optional: adjust y-axis dynamically, but fixed can be better for comparison
        # current_range = likelihood_fig.layout.yaxis.range
        # likelihood_fig.layout.yaxis.range = [min(current_range[0], current_log_likelihood - 10), max(current_range[1], 0)]

    # Update PDF Plot
    # Generate x values for the PDF curve
    pdf_x = np.linspace(plot_min, plot_max, 300)
    # Calculate PDF y values for the current mu, sigma
    pdf_y = scipy.stats.norm.pdf(pdf_x, loc=mu, scale=sigma)

    with pdf_fig.batch_update():
        # Update the PDF curve data
        pdf_curve_trace.x = pdf_x
        pdf_curve_trace.y = pdf_y
        # Ensure y-axis adjusts to show the peak
        # pdf_fig.layout.yaxis.range = [0, max(pdf_y)*1.1] # Auto-adjust y-axis


# --- Linking Widgets ---
# Link sliders to the update function
out = interactive_output(update_plots, {"mu": mu_slider, "sigma": sigma_slider})

# --- Initial Plot Update ---
update_plots(mu_slider.value, sigma_slider.value)

# --- Assemble Final Layout ---
# HBox for the two main panels (the sliders already sit inside left_panel)
main_panels = HBox([left_panel, right_panel])

# --- Display ---

display(logL_equation)
display(main_panels)
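
Instead of searching for the best μ and σ by hand with the sliders, a single Gaussian has a closed-form maximum-likelihood solution: the sample mean and the (biased) sample standard deviation. The sketch below reuses the same seed and sample settings as the cell above and evaluates the log-likelihood formula shown there; the function name is an assumption for this example. A mixture of several Gaussians has no such closed form and is usually fitted iteratively with expectation-maximisation.

```python
import numpy as np


def gaussian_log_likelihood(x, mu, sigma):
    """Sum over data points of log N(x_i | mu, sigma^2)."""
    return np.sum(
        -np.log(sigma) - 0.5 * np.log(2.0 * np.pi) - (x - mu) ** 2 / (2.0 * sigma**2)
    )


# Same synthetic sample as in the interactive cell
np.random.seed(42)
sample_data = np.random.normal(loc=5.0, scale=1.2, size=30)

# Closed-form maximum-likelihood estimates for a single Gaussian
mu_mle = sample_data.mean()
sigma_mle = sample_data.std()  # ddof=0 gives the maximum-likelihood estimate

ll_initial = gaussian_log_likelihood(sample_data, mu=1.0, sigma=3.0)  # slider defaults
ll_mle = gaussian_log_likelihood(sample_data, mu=mu_mle, sigma=sigma_mle)

print(f"MLE: mu = {mu_mle:.3f}, sigma = {sigma_mle:.3f}")
print(f"log L at slider defaults = {ll_initial:.2f}")
print(f"log L at MLE             = {ll_mle:.2f}")
```

No slider position can give a higher bar than the value reached at `mu_mle` and `sigma_mle`.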


# Classification

Classification is a **supervised learning** task. The model learns from **labeled examples** to predict **categories**.

$$
f(\mathbf{x}) \rightarrow \text{label (class)}
$$


### Common Classification Models:

| Model                     | Notes                             |
|---------------------------|-----------------------------------|
| **Logistic Regression**   | Simple and interpretable          |
| **Decision Tree**         | Tree of yes/no decisions          |
| **Random Forest**         | Many trees voting together        |
| **Support Vector Machine**| Maximizes margin between classes  |
| **k-Nearest Neighbors**   | Looks at neighbors' labels        |
| **Neural Networks**       | Flexible, good for complex patterns |
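
As an illustration of one row of the table, here is a minimal k-Nearest Neighbors classifier in NumPy. The two features and the class meanings (0 and 1) are hypothetical placeholders, not real material data.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict the class of each query point by majority vote of its k nearest neighbours."""
    predictions = []
    for q in X_query:
        dist = np.linalg.norm(X_train - q, axis=1)  # distance to every training point
        nearest = np.argsort(dist)[:k]  # indices of the k closest points
        votes = np.bincount(y_train[nearest])  # count labels among the neighbours
        predictions.append(np.argmax(votes))
    return np.array(predictions)


# Hypothetical labelled examples: two features per sample, class 0 or class 1
X_train = np.array(
    [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [3.0, 3.1], [3.2, 2.9], [2.8, 3.0]]
)
y_train = np.array([0, 0, 0, 1, 1, 1])

X_query = np.array([[0.1, 0.1], [3.0, 3.0]])
predictions = knn_predict(X_train, y_train, X_query, k=3)
print(predictions)  # → [0 1]
```

There is no training step here: the "model" is the stored labelled data plus a distance measure, which is why the choice of features matters so much.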



# Decision Tree

<figure style="text-align: center;">
  <img src="img/9-1.png"
       style="height: 500px;">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: scikit-learn</figcaption>
</figure>

# Decision Tree

<figure style="text-align: center;">
  <img src="img/9-2.png"
       style="height: 500px;">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: scikit-learn</figcaption>
</figure>

#  Regression 

Regression is also **supervised learning**, but the target is a **real number** (not a category).

$$
f(\mathbf{x}) \rightarrow y \in \mathbb{R}
$$


### Common Regression Models:

| Model                   | Notes                             |
|-------------------------|-----------------------------------|
| **Linear Regression**    | Fits a straight line              |
| **Polynomial Regression**| Fits curves                      |
| **Random Forest Regressor** | Ensemble of trees                |
| **Support Vector Regressor** | Like SVM, but for numbers        |
| **Gradient Boosting**     | Boosted decision trees           |
| **Neural Network (NN)**  | Good for high-dimensional data   |
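
The first row of the table can be sketched in a few lines: linear regression with several features, solved by least squares. The data are synthetic, generated from known coefficients so that the fit can be checked; all names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two features, target built from known weights plus small noise
X = rng.uniform(0.0, 1.0, size=(100, 2))
true_w = np.array([3.0, -2.0])
true_b = 0.5
y = X @ true_w + true_b + rng.normal(0.0, 0.01, size=100)

# Append a column of ones so that the intercept is learned as an extra weight
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_hat, b_hat = coef[:2], coef[2]

print("estimated weights:", np.round(w_hat, 3))
print("estimated intercept:", round(float(b_hat), 3))
```

With more than one feature the fitted object is a plane (or hyperplane) rather than a line, but the procedure is unchanged.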


# Neural Network

<div style="display: flex; align-items: flex-start; gap: 2em;">

  <!-- Image with caption -->
  <figure style="margin: 0; text-align: center;">
    <img src="img/10-1.png" alt="Example" width="900px">
    <figcaption style="font-size: 0.65em; color: grey; margin-top: 0.5em;">
      Source: "Comparative study of the sensory areas of the human cortex" by Santiago Ramon y Cajal, published 1899, ISBN 9781458821898
    </figcaption>
  </figure>

  <!-- Text and list -->
  <div>
    <ul style="margin-top: 0;">
      <li>In the brain, biological neurons are connected through synapses that transmit signals, forming a Biological Neural Network (BNN).</li>
      <li>Similarly, the artificial neurons of a neural network are arranged in interconnected layers that pass information forward.
</li>
      <li>In both systems, a neuron triggers a response when its input signal reaches a threshold.
</li>
    </ul>
  </div>

</div>

# Neural Network

<div style="display: flex; align-items: flex-start; gap: 2em;">

  <!-- Image with caption -->
  <figure style="margin: 0; text-align: center;">
    <img src="img/10-2.png" alt="Example" width="1000px">
    <figcaption style="font-size: 0.65em; color: grey; margin-top: 0.5em;">
      Source: Image by Bruce Blaus (Creative Commons 3.0). Reproduced from https://en.wikipedia.org/wiki/Neuron. 
    </figcaption>
  </figure>

  <!-- Text and list -->
  <div>
    <ul style="margin-top: 0;">
      <li>In Artificial Neural Networks (ANN or NN), these artificial neurons process input data, learn patterns through training, and adjust their connections (weights) to make decisions or predictions.</li>
      <li>It mimics how biological networks adapt to stimuli over time.
</li>
    </ul>
  </div>

</div>

# Neural Network



<div style="display: flex; align-items: flex-start; gap: 2em;">

  <!-- Image with caption -->
  <figure style="margin: 0; text-align: center;">
    <img src="img/10-3.png" alt="Example" width="1200px">
  </figure>

  <!-- Text and list -->
  <div>
    <ul style="margin-top: 0;">
      <li><strong>Inputs</strong>: The perceptron takes several inputs, each representing features of the data (like pixel values in an image).
</li>
      <li><strong>Weights</strong>: Each input is multiplied by its weight, and the weighted inputs are summed together with a bias.</li>
      <li><strong>Activation Function</strong>: The sum is passed through an activation function (like a step function), which decides if the perceptron "fires" or not, producing an output (usually 0 or 1).</li>
    </ul>
  </div>

</div>

$$
\hat{y} = f^{(L)}\left( W^{(L)} f^{(L-1)} \left( \cdots f^{(1)}\left( W^{(1)} \mathbf{x} + \mathbf{b}^{(1)} \right) \cdots \right) + \mathbf{b}^{(L)} \right)
$$
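
The nested formula above is just a loop over layers. The sketch below implements it directly in NumPy, with hand-picked weights so that the result can be verified by hand; the helper names and the weight values are assumptions for this example. A single perceptron is the special case of one layer with a step activation.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def identity(z):
    return z


def step(z):
    return (z >= 0).astype(float)


def forward(x, layers):
    """Apply a = f(W a + b) for each (W, b, f) in order, starting from a = x."""
    a = np.asarray(x, dtype=float)
    for W, b, activation in layers:
        a = activation(W @ a + b)
    return a


# Two-layer network with hand-picked weights
two_layer = [
    (np.eye(2), np.zeros(2), relu),  # hidden layer: ReLU([1, -2]) = [1, 0]
    (np.array([[1.0, 1.0]]), np.array([0.5]), identity),  # output: 1 + 0 + 0.5
]
y_hat = forward([1.0, -2.0], two_layer)
print(y_hat)  # → [1.5]

# A single perceptron acting as a logical AND gate
and_gate = [(np.array([[1.0, 1.0]]), np.array([-1.5]), step)]
print(forward([1.0, 1.0], and_gate))  # → [1.]
print(forward([1.0, 0.0], and_gate))  # → [0.]
```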


# Neural Network

<div style="display: flex; align-items: flex-start; gap: 2em;">

  <figure style="margin: 0; text-align: center;">
    <img src="img/10-4.gif" alt="Neural Network Diagram" width="800px">
  <figcaption style="font-size: 0.85em; color: grey; margin-top: 0.5em;">Source: https://vas3k.com/blog/machine_learning/</figcaption>
  </figure>

  <!-- Text content -->
  <div>
    <h3 style="margin-top: 0;"> Training a Neural Network</h3>
    <ul style="margin-top: 0;">
      <li><strong>Goal:</strong> Minimize error between predictions \( \hat{y} \) and targets \( y \).</li>
      <li><strong>Loss:</strong> Measures prediction error  
        <br>e.g., MSE: \( \mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \)
      </li>
      <li><strong>Optimization:</strong>  
        <br>Gradient descent with backpropagation  
        <br>\( \theta \leftarrow \theta - \eta \cdot \nabla_\theta \mathcal{L} \)
      </li>
    </ul>
    <p style="margin-top: 0.5em;">Repeat over many epochs to reach convergence.</p>
  </div>

</div>
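
The update rule above can be tried on the smallest possible "network": one weight and one bias. In this sketch the gradients of the MSE are written out by hand, which is what backpropagation computes automatically for deeper models. The data come from a known line so that convergence can be checked; the learning rate and number of epochs are choices made for this example.

```python
import numpy as np

# Noise-free data from a known line, y = 2x + 1
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0  # parameters theta = (w, b), starting from zero
eta = 0.5  # learning rate
losses = []

for epoch in range(2000):
    y_hat = w * x + b  # forward pass
    error = y_hat - y
    losses.append(np.mean(error**2))  # MSE loss
    grad_w = 2.0 * np.mean(error * x)  # dL/dw
    grad_b = 2.0 * np.mean(error)  # dL/db
    w -= eta * grad_w  # theta <- theta - eta * gradient
    b -= eta * grad_b

print(f"w = {w:.4f}, b = {b:.4f}, final loss = {losses[-1]:.2e}")
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small needs many more epochs; in practice it is one of the first hyperparameters to tune.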


# Foundational Model for Materials

**MACE** : [*Message Passing Atomic Cluster Expansion*](https://github.com/ACEsuit/mace-foundations)

It is a **neural network potential** (a machine-learned interatomic potential) that predicts **energies** and **forces** with **near-DFT accuracy** at a **fraction of the computational cost**.

### How MACE Works:

- Combines ideas from:
  - **Graph Neural Networks (GNNs)** → atoms as nodes, neighbouring atoms (within a cutoff radius) connected by edges
  - **Atomic Cluster Expansion (ACE)** → includes high-order geometric terms
- Uses **equivariant message passing** → respects physical symmetries:
  - Rotation
  - Translation
  - Permutation of identical atoms
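
The symmetry requirements listed above can be illustrated with a deliberately tiny message-passing model. This is not MACE: it performs a single round of messages that depend only on interatomic distances, so its internal features are invariant, whereas MACE also carries equivariant (directional) features and higher-order terms. The function, the cutoff, the update rule, and the coordinates are all assumptions made for this sketch.

```python
import numpy as np


def toy_energy(positions, species, cutoff=3.0):
    """One round of distance-based message passing, then a sum over atoms."""
    n = len(positions)
    # Graph: atoms are nodes, pairs closer than the cutoff are edges
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adjacency = (dist < cutoff) & ~np.eye(n, dtype=bool)
    # Initial node feature: the atomic number (a minimal, hypothetical choice)
    h = species.astype(float)
    # Message from j to i: neighbour feature weighted by a smooth function of distance
    weights = np.where(adjacency, np.exp(-dist), 0.0)
    messages = weights @ h
    # Node update, then sum over atoms to obtain a single scalar
    h_new = np.tanh(0.1 * h + 0.1 * messages)
    return h_new.sum()


# Illustrative water-like geometry (O, H, H), coordinates in angstrom
positions = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
species = np.array([8, 1, 1])

theta = np.deg2rad(40.0)
rot_z = np.array(
    [
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ]
)
moved = positions @ rot_z.T + np.array([1.0, -2.0, 0.5])  # rotation + translation
perm = [2, 0, 1]  # reorder the atoms

e_ref = toy_energy(positions, species)
e_moved = toy_energy(moved, species)
e_perm = toy_energy(positions[perm], species[perm])
print(e_ref, e_moved, e_perm)
```

The three printed values agree because the model only sees distances and sums over neighbours and atoms, so the symmetries are built into the architecture rather than learned from data.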

# Foundational Model for Materials

### Why it is groundbreaking

- Trained on massive DFT datasets (e.g. Materials Project)
- Covers **~89 elements**, broad chemical diversity
- Learns a **general representation** of atomic interactions

> **Ambition:** Become a universal, reusable model for predicting atomic-scale material behavior – like a GPT for materials