# Navigating the Combinatorial Explosion with SMACT

In this notebook, we'll explore the scale of chemical space and learn how to navigate it efficiently using SMACT. We'll:

1. **Visualize** the combinatorial explosion
2. **Calculate** the number of possible materials
3. **Apply** chemical filters to find viable candidates
4. **Generate** practical composition lists for further study

Let's start by understanding just how vast chemical space really is!

## Part 1: The Mathematics of Combinatorial Explosion

The sheer scale of chemical space is mind-boggling. Let's start with pure mathematics to understand the problem before we apply chemistry.

This section contains **ZERO** chemistry - it's a mathematical exercise to demonstrate the scale we're dealing with. The numbers we'll calculate roughly match those in Table 1 of the publication ["Computational Screening of All Stoichiometric Inorganic Materials"](https://www.cell.com/chem/fulltext/S2451-9294(16)30155-3) by Davies et al. (Chem, 2016).

*Note: This calculation can take a minute or two for quaternary compounds on a typical computer.*

In [ ]:
# Import required libraries
import itertools
from math import gcd, comb
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict

# SMACT imports
import smact
from smact import Element, ordered_elements
from smact.screening import smact_filter

# For visualization
plt.style.use('default')
# Only switch style if this matplotlib version actually ships it
if 'seaborn-v0_8-darkgrid' in plt.style.available:
    plt.style.use('seaborn-v0_8-darkgrid')

print(f"SMACT version: {smact.__version__}")
print("Ready to explore chemical space!")

# First, let's see the scale of possible combinations
def count_combinations(n_elements, total_elements=103):
    """Calculate number of unique combinations of n elements."""
    return comb(total_elements, n_elements)

# Calculate combinations for different numbers of elements
n_elements_list = range(1, 8)
combinations_count = []

print("\nNumber of possible element combinations:")
print(f"{'Elements':>10} | {'Combinations':>15} | {'Scientific':>12}")
print("-" * 45)

for n in n_elements_list:
    count = count_combinations(n)
    combinations_count.append(count)
    print(f"{n:>10} | {count:>15,} | {count:>12.3e}")

# Visualize the explosion
plt.figure(figsize=(10, 6))
plt.semilogy(n_elements_list, combinations_count, 'o-', markersize=10, linewidth=2)
plt.xlabel('Number of Elements in Compound', fontsize=12)
plt.ylabel('Number of Possible Combinations (log scale)', fontsize=12)
plt.title('The Combinatorial Explosion in Materials Science', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)

# Add annotations for key points
for i, (n, count) in enumerate(zip([2, 3, 4], combinations_count[1:4])):
    name = ['Binary', 'Ternary', 'Quaternary'][i]
    plt.annotate(f'{name}\n{count:,.0f}',
                xy=(n, count),
                xytext=(n+0.5, count*3),
                fontsize=10,
                ha='center',
                arrowprops=dict(arrowstyle='->', color='gray', alpha=0.7))

plt.tight_layout()
plt.show()

## Part 2: Stoichiometry - It Gets Worse!

So far we've only counted *element combinations*. But each combination can form multiple compounds with different stoichiometric ratios:

- **AB**: one ratio
- **AB₂, A₂B**: two more ratios
- **AB₃, A₃B, A₂B₃, A₃B₂**: four more ratios
- And so on...

Let's see how this multiplies our already enormous numbers.
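The ratio counts in the list above can be checked by brute force. The following is a minimal sketch for a hypothetical binary compound AₐB_b; the helper names `binary_ratios` and `new_ratios` are illustrative and are not part of SMACT.

```python
from itertools import product
from math import gcd


def binary_ratios(max_coefficient):
    """Return irreducible (a, b) ratios for a hypothetical binary compound."""
    return [
        (a, b)
        for a, b in product(range(1, max_coefficient + 1), repeat=2)
        if gcd(a, b) == 1  # skip multiples such as A2B2, which is just AB
    ]


def new_ratios(max_coefficient):
    """Ratios that first appear once max_coefficient is allowed."""
    return [r for r in binary_ratios(max_coefficient) if max(r) == max_coefficient]


for m in (1, 2, 3):
    labels = [f"A{a if a > 1 else ''}B{b if b > 1 else ''}" for a, b in new_ratios(m)]
    print(f"max coefficient {m}: {len(labels)} new ratio(s): {', '.join(labels)}")
```

Each increase in the largest allowed coefficient adds only the ratios that cannot be reduced to one already counted, which is why the counting function in the next cell filters on irreducibility.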

In [ ]:
def count_stoichiometries(n_elements, max_coefficient=8):
    """Count irreducible stoichiometric ratios for n elements up to max_coefficient."""
    def is_irreducible(stoichs):
        """Return True if the coefficients share no common divisor greater than 1."""
        # Test the gcd of the whole tuple: checking only adjacent pairs would
        # wrongly reject ratios such as (2, 6, 3). Multi-argument gcd needs Python 3.9+.
        return gcd(*stoichs) == 1

    coefficients = range(1, max_coefficient + 1)
    count = sum(
        1
        for combo in itertools.product(coefficients, repeat=n_elements)
        if is_irreducible(combo)
    )

    return count

# Example usage
n_elements = 3  # ternary compounds
max_coeff = 8
count = count_stoichiometries(n_elements, max_coeff)
print(f"Number of possible stoichiometries: {count}")

Number of possible stoichiometries: 439
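To see how stoichiometry multiplies the element-combination counts from Part 1, the two numbers can simply be multiplied. The sketch below is self-contained and rests on two assumptions: 103 elements (as in Part 1) and an irreducibility test based on the greatest common divisor of the whole coefficient tuple. The function names are illustrative only.

```python
from itertools import product
from math import comb, gcd


def irreducible_count(n_elements, max_coefficient=8):
    """Count coefficient tuples whose overall greatest common divisor is 1."""
    coefficients = range(1, max_coefficient + 1)
    return sum(
        1
        for combo in product(coefficients, repeat=n_elements)
        if gcd(*combo) == 1  # multi-argument gcd needs Python 3.9+
    )


def search_space_size(n_elements, total_elements=103, max_coefficient=8):
    """Element combinations multiplied by irreducible stoichiometries."""
    return comb(total_elements, n_elements) * irreducible_count(n_elements, max_coefficient)


for n, name in [(2, "Binary"), (3, "Ternary"), (4, "Quaternary")]:
    print(f"{name:>10}: {search_space_size(n):,}")
```

This still ignores oxidation states, so it is a purely mathematical count of formulas rather than an estimate of real compounds.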


## Part 3: From Astronomical to Practical - Enter SMACT!

The numbers above are overwhelming, but here's the good news: **most combinations are chemically impossible**.

SMACT applies chemical rules to filter out:
- Impossible oxidation states (like Na⁻⁵)
- Combinations that violate charge neutrality
- Combinations whose charge assignment contradicts electronegativity ordering (the more electronegative element should carry the more negative charge)

Let's see this filtering in action with real elements:

In [ ]:
# Let's work with a manageable set: first-row transition metals
from smact import Element, ordered_elements
from smact.screening import smact_filter
import multiprocessing

def generate_viable_compositions(elements, n_elements=3):
    """Generate chemically viable compositions using SMACT filters."""
    # Create element combinations; smact_filter expects Element objects, not symbols
    element_combos = itertools.combinations(elements, n_elements)
    systems = [[Element(symbol) for symbol in combo] for combo in element_combos]

    print(f"Processing {len(systems)} {n_elements}-element combinations...")

    # Apply SMACT filtering in parallel for speed
    with multiprocessing.Pool(processes=min(4, multiprocessing.cpu_count())) as pool:
        results = pool.map(smact_filter, systems)

    # Flatten the per-system result lists into one list of compositions
    compositions = [item for sublist in results for item in sublist]
    return compositions

# Example: First-row transition metals (Sc to Zn)
transition_metals = ordered_elements(21, 30)
print(f"Working with transition metals: {', '.join(transition_metals)}")

# Raw combinations
raw_ternary = comb(len(transition_metals), 3)
print(f"\nRaw ternary combinations: {raw_ternary:,}")

# Apply SMACT filtering
viable_compositions = generate_viable_compositions(transition_metals, n_elements=3)
print(f"Viable ternary compositions: {len(viable_compositions):,}")

if viable_compositions:
    # raw_ternary counts element sets, while viable_compositions counts
    # (elements, oxidation states, stoichiometry) entries, so treat this as a rough guide
    print(f"Reduction factor: {raw_ternary/len(viable_compositions):.1f}x")
    print("\nSMACT has reduced our search space dramatically!")
else:
    print("No viable compositions found for this element set.")
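The charge-neutrality rule mentioned above can be illustrated without SMACT. The following is a minimal pure-Python sketch, not SMACT's own implementation: the function name `charge_neutral_ratios` and the example oxidation states are assumptions chosen for illustration.

```python
from itertools import product
from math import gcd


def charge_neutral_ratios(oxidation_states, max_coefficient=8):
    """Return irreducible coefficient tuples that make the total charge zero."""
    coefficients = range(1, max_coefficient + 1)
    return [
        combo
        for combo in product(coefficients, repeat=len(oxidation_states))
        if sum(q * n for q, n in zip(oxidation_states, combo)) == 0
        and gcd(*combo) == 1  # drop multiples of an already counted ratio
    ]


print(charge_neutral_ratios((1, -1)))    # e.g. Na+ with Cl-
print(charge_neutral_ratios((3, -2)))    # e.g. Al3+ with O2-
print(charge_neutral_ratios((2, 3, 4)))  # all cations: nothing can balance
```

A set of oxidation states that all share the same sign can never sum to zero, so such combinations are removed no matter which stoichiometries are tried.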

## Part 4: Practical Results - Real Material Formulas

Let's convert our filtered compositions into readable chemical formulas and see what we've found:

In [ ]:
import pandas as pd
from pymatgen.core import Composition

def format_compositions(compositions):
    """Convert SMACT compositions to readable formulas."""
    formulas = []
    for comp in compositions:
        elements, oxidations, stoichs = comp
        # Create formula string
        formula_parts = []
        for el, stoich in zip(elements, stoichs):
            if stoich == 1:
                formula_parts.append(el)
            else:
                formula_parts.append(f"{el}{stoich}")
        formula = ''.join(formula_parts)

        # Simplify using pymatgen if possible, otherwise keep the raw string
        try:
            simplified = Composition(formula).reduced_formula
            formulas.append(simplified)
        except Exception:
            formulas.append(formula)

    return formulas

# Convert to readable formulas
formulas = format_compositions(viable_compositions)
df_results = pd.DataFrame({"formula": formulas}).drop_duplicates()

print(f"\nGenerated {len(df_results)} unique transition metal compounds:")
print("\nSample formulas:")
for i, formula in enumerate(df_results['formula'].head(15)):
    print(f"  {i+1:2d}. {formula}")

if len(df_results) > 15:
    print(f"  ... and {len(df_results) - 15} more!")

# Some statistics
print("\n📊 Key Results:")
print(f"- Started with {raw_ternary:,} possible combinations")
print(f"- SMACT filtered to {len(viable_compositions):,} viable compositions")
print(f"- This gives us {len(df_results)} unique material formulas")
if viable_compositions:
    # Guard against division by zero when nothing passes the filter
    print(f"- Reduction factor: {raw_ternary/len(viable_compositions):.0f}x smaller search space")

print("\n🎯 The Power of Informatics:")
print(f"Instead of testing {raw_ternary:,} combinations randomly,")
print(f"we can focus on {len(df_results)} promising candidates!")

## Summary and Next Steps

🚀 **What We've Learned:**
- Chemical space contains **trillions** of possible materials
- Most combinations are chemically **impossible**
- SMACT filters reduce the space by **orders of magnitude**
- We can generate **focused lists** of viable candidates

📖 **Coming Up Next:**
- Apply more sophisticated chemical filters
- Screen for specific material properties
- Predict crystal structures for our candidates
- Use AI to explore even larger chemical spaces

💡 **Try This:**
- Change the element list to include oxygen or sulfur
- Try binary or quaternary compositions
- Filter for specific oxidation state patterns
- Export results for use in other tutorials
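For the export suggestion, one possible approach is to write the unique formulas to a CSV file using only the standard library. This is a sketch under stated assumptions: the function name `export_formulas`, the file name `viable_formulas.csv`, and the example formulas are all hypothetical placeholders rather than results from this notebook.

```python
import csv
from pathlib import Path


def export_formulas(formulas, path):
    """Write unique formulas to a one-column CSV file and return the row count."""
    unique = sorted(set(formulas))
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["formula"])  # header row
        writer.writerows([f] for f in unique)
    return len(unique)


# Hypothetical example data and file name
n_written = export_formulas(["Fe2O3", "TiO2", "Fe2O3"], "viable_formulas.csv")
print(f"Wrote {n_written} formulas")
```

In the notebook itself, the `formula` column of `df_results` could be passed in place of the example list.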