This notebook uses the five-fold time-series split provided by the `mp_time_split` subpackage to benchmark the progress of materials discovery over time. Materials discovered later are treated as if they were generated data: each fold partitions the dataset into distinct training, validation, and generated sets, simulating the incremental advancement of materials science. Because the generated set for a fold is taken from the test set of the following fold, the loop below evaluates only the first four folds.

<a href="https://colab.research.google.com/github/sparks-baird/matbench-genmetrics/blob/main/notebooks/core/2.0-matbench-genmetrics-materials_discovery_progress_benchmark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font color="red">**NOTE: If using Colab, "Restart Runtime" after installation.**</font>
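The fold pairing described at the top of this notebook can be sketched on toy data. This is only an illustration: the identifiers, the years, and the helper `time_folds` are hypothetical, and the expanding-window layout is an assumption rather than a description of how `mp_time_split` builds its folds.

```python
# Toy illustration only: made-up identifiers and discovery years.
records = [(f"mat-{i}", year) for i, year in enumerate(range(2000, 2012))]
records.sort(key=lambda r: r[1])  # oldest first


def time_folds(items, n_folds):
    """Expanding-window split (assumed layout): fold k trains on everything
    that precedes its test block."""
    block = len(items) // (n_folds + 1)
    folds = []
    for k in range(n_folds):
        cut = block * (k + 1)
        folds.append({"train_val": items[:cut], "test": items[cut:cut + block]})
    return folds


folds = time_folds(records, n_folds=5)

# Pair fold k with the test block of fold k + 1. That block holds strictly
# newer entries, so it plays the role of the "generated" structures.
pairs = []
for k in range(len(folds) - 1):  # the last fold has no successor
    pairs.append((k, folds[k]["test"], folds[k + 1]["test"]))

for k, test, generated in pairs:
    print(k, [y for _, y in test], "->", [y for _, y in generated])
```

With five folds this yields four pairs, which is why a loop over such pairs stops one fold short of the end.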

In [None]:
# Colab only: enable support for custom (third-party) Jupyter widgets.
from google.colab import output
output.enable_custom_widget_manager()

In [None]:
try:
    import google.colab  # type: ignore # noqa: F401
    %pip install git+https://github.com/sparks-baird/matbench-genmetrics.git
except ImportError:
    print("not in Colab")

In [1]:
from matbench_genmetrics.core.metrics import MPTSMetrics
from tqdm.notebook import tqdm
from pprint import pprint

In [4]:
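%%time
n = 5396  # number of structures treated as "generated" per fold
# `mptm` records the metrics; `mptm2` only supplies data from the next fold.
mptm = MPTSMetrics(dummy=False, verbose=True, num_gen=n)
mptm2 = MPTSMetrics(dummy=False, verbose=True, num_gen=n)
for fold in tqdm(mptm.folds[0:4]):  # first four folds, since fold + 1 must exist
    # Train/val inputs and held-out test structures for the current fold.
    train_val_inputs_1, test_structures = mptm.get_train_and_val_data(fold, return_test=True)
    # Test structures of the next fold stand in for generated structures.
    train_val_inputs_2, gen_structures = mptm2.get_train_and_val_data(fold + 1, return_test=True)

    # Score the later-discovered structures against the current fold.
    mptm.evaluate_and_record(fold, gen_structures[0:n])

pprint(mptm.recorded_metrics)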

Reading file c:\Users\hasan\miniconda3\envs\matbench-genmetrics\lib\site-packages\matbench_genmetrics\mp_time_split\utils\mp_time_summary.json.gz: 0it [00:53, ?it/s]#####9| 40446/40476 [00:52<00:00, 1037.34it/s]
Decoding objects from c:\Users\hasan\miniconda3\envs\matbench-genmetrics\lib\site-packages\matbench_genmetrics\mp_time_split\utils\mp_time_summary.json.gz: 100%|##########| 40476/40476 [00:52<00:00, 763.91it/s] 
Reading file c:\Users\hasan\miniconda3\envs\matbench-genmetrics\lib\site-packages\matbench_genmetrics\mp_time_split\utils\mp_time_summary.json.gz: 0it [00:51, ?it/s]#####9| 40379/40476 [00:50<00:00, 1039.70it/s]
Decoding objects from c:\Users\hasan\miniconda3\envs\matbench-genmetrics\lib\site-packages\matbench_genmetrics\mp_time_split\utils\mp_time_summary.json.gz: 100%|##########| 40476/40476 [00:51<00:00, 792.37it/s] 

{0: {'coverage': 0.03280207561156412,
     'novelty': 0.9571905114899926,
     'uniqueness': 0.9998874668429091,
     'validity': 0.9657295385947532},
 1: {'coverage': 0.032987398072646404,
     'novelty': 0.948295033358043,
     'uniqueness': 0.9999187947547732,
     'validity': 0.9689630506461338},
 2: {'coverage': 0.019644180874722018,
     'novelty': 0.9529280948851001,
     'uniqueness': 0.999984404745629,
     'validity': 0.942820170335924},
 3: {'coverage': 0.02575982209043736,
     'novelty': 0.939770200148258,
     'uniqueness': 0.9999801452488405,
     'validity': 0.8635418248589397}}
CPU times: total: 5min 5s
Wall time: 3h 58min 44s
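
To compare folds at a glance, the recorded metrics can be reduced to a per-metric mean plus the folds with the lowest and highest value. The sketch below hard-codes the values printed above, rounded to four decimals, so it runs without repeating the multi-hour evaluation; in a live session `mptm.recorded_metrics` could presumably be used in place of the literal dictionary.

```python
from statistics import mean

# Values copied from the output above, rounded to four decimals.
recorded_metrics = {
    0: {"coverage": 0.0328, "novelty": 0.9572, "uniqueness": 0.9999, "validity": 0.9657},
    1: {"coverage": 0.0330, "novelty": 0.9483, "uniqueness": 0.9999, "validity": 0.9690},
    2: {"coverage": 0.0196, "novelty": 0.9529, "uniqueness": 1.0000, "validity": 0.9428},
    3: {"coverage": 0.0258, "novelty": 0.9398, "uniqueness": 1.0000, "validity": 0.8635},
}

metric_names = sorted(next(iter(recorded_metrics.values())))

summary = {}
for name in metric_names:
    # Collect this metric across folds, keyed by fold index.
    per_fold = {fold: vals[name] for fold, vals in recorded_metrics.items()}
    summary[name] = {
        "mean": mean(per_fold.values()),
        "min_fold": min(per_fold, key=per_fold.get),
        "max_fold": max(per_fold, key=per_fold.get),
    }

for name, stats in summary.items():
    print(f"{name:<10} mean={stats['mean']:.4f} "
          f"lowest in fold {stats['min_fold']}, highest in fold {stats['max_fold']}")
```

In these results validity is lowest in fold 3 (about 0.86), while coverage stays at or below roughly 0.033 in every fold.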
