# MODNet (v0.1.10)

For now, this benchmark file simply loads our existing full benchmark results (265 MB) from [ml-evs/modnet-matbench](https://github.com/ml-evs/modnet-matbench) and exports them in the matbench format. The code for featurisation, hyperparameter optimisation and the final predictions can be found in that repository or in the illustrative `run.py` file.

In [2]:
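def download_and_extract(url, fname):
    """Download the zip archive at `url` to `fname` and extract it into the current directory."""
    import os
    # `import urllib` alone does not guarantee that the `request` submodule is loaded
    import urllib.request
    from zipfile import ZipFile

    # Skip both download and extraction if the archive is already present
    if os.path.exists(fname):
        print(f"File {fname} already found, will not redownload.")
        return

    urllib.request.urlretrieve(url, fname)
    with ZipFile(fname, "r") as _zip:
        _zip.extractall(".")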

# repo = "ml-evs/modnet-matbench"
repo = "ppdebreuck/modnet-matbench"

version = "main"
fname = f'modnet-matbench-{version.replace("#", "-").replace("/", "-")}'
if version.startswith("v"):
    url = f"https://github.com/{repo}/archive/refs/tags/{version}.zip"
else:
    url = f'https://github.com/{repo}/archive/refs/heads/{version.replace("#", "%23")}.zip'


download_and_extract(url, fname + ".zip")

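The tag/branch logic above can be sketched as two small helpers. This is only an illustration that mirrors the cell: the names `archive_url` and `archive_dirname` are hypothetical, and the tag `v1.0` used below is a made-up placeholder, not a known release of the repository.

```python
def archive_url(repo, version):
    """Build a GitHub archive URL, treating versions starting with 'v' as tags."""
    if version.startswith("v"):
        return f"https://github.com/{repo}/archive/refs/tags/{version}.zip"
    # '#' must be percent-encoded when it appears in a branch name
    branch = version.replace("#", "%23")
    return f"https://github.com/{repo}/archive/refs/heads/{branch}.zip"


def archive_dirname(version, prefix="modnet-matbench"):
    """Local name used for the archive, with '#' and '/' replaced by '-'."""
    return f"{prefix}-" + version.replace("#", "-").replace("/", "-")


print(archive_url("ppdebreuck/modnet-matbench", "main"))
print(archive_url("ppdebreuck/modnet-matbench", "v1.0"))  # hypothetical tag
print(archive_dirname("feature/x#1"))  # hypothetical branch name
```

One caveat of this convention: a branch whose name happens to start with `v` would be treated as a tag.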
In [3]:
import pickle
from matbench.bench import MatbenchBenchmark
from matbench.constants import CLF_KEY

mb = MatbenchBenchmark(
    autoload=False, 
    subset=[
        'matbench_dielectric', 
        'matbench_jdft2d', 
        'matbench_steels', 
        'matbench_expt_gap', 
        'matbench_phonons',
        'matbench_log_gvrh',
        'matbench_log_kvrh',
        'matbench_glass', 
        'matbench_expt_is_metal',
        'matbench_perovskites',
        'matbench_mp_gap',
        'matbench_mp_is_metal',
        'matbench_mp_e_form'
    ],
)

results_locs = {
    task.dataset_name: f"{fname}/{task.dataset_name}/results/{task.dataset_name}_results.pkl"
    for task in mb.tasks
}
# Remap the filename for the two elastic tasks, as they were learned jointly
elastic_results = f"{fname}/matbench_elastic/results/matbench_elastic_results.pkl"
results_locs["matbench_log_gvrh"] = results_locs["matbench_log_kvrh"] = elastic_results
# Key of each elastic target within the joint predictions
target_key_map = {"matbench_log_gvrh": "log10G_VRH", "matbench_log_kvrh": "log10K_VRH"}

2021-10-08 10:30:29 INFO     Initialized benchmark 'matbench_v0.1' with 13 tasks: 
['matbench_dielectric',
 'matbench_jdft2d',
 'matbench_steels',
 'matbench_expt_gap',
 'matbench_phonons',
 'matbench_log_gvrh',
 'matbench_log_kvrh',
 'matbench_glass',
 'matbench_expt_is_metal',
 'matbench_perovskites',
 'matbench_mp_gap',
 'matbench_mp_is_metal',
 'matbench_mp_e_form']

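The results-path mapping built in the cell above can be sketched without matbench, using plain dataset names. The helper name `build_results_locs` is hypothetical. Unlike the cell, this sketch only remaps the elastic tasks when they are actually requested.

```python
ELASTIC_TASKS = ("matbench_log_gvrh", "matbench_log_kvrh")


def build_results_locs(dataset_names, root):
    """Map each dataset name to the pickle holding its results under `root`."""
    locs = {
        name: f"{root}/{name}/results/{name}_results.pkl"
        for name in dataset_names
    }
    # The two elastic tasks were learned jointly and share one results file
    elastic = f"{root}/matbench_elastic/results/matbench_elastic_results.pkl"
    for name in ELASTIC_TASKS:
        if name in locs:
            locs[name] = elastic
    return locs


locs = build_results_locs(
    ["matbench_steels", "matbench_log_gvrh", "matbench_log_kvrh"],
    root="modnet-matbench-main",
)
for name, path in locs.items():
    print(name, "->", path)
```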

In [4]:
for task in mb.tasks:
    task.load()
    with open(results_locs[task.dataset_name], "rb") as f:
        results = pickle.load(f)
        
    for fold_ind, fold in enumerate(task.folds):

        # Handle predictions that were made with joint/multitarget learning
        if task.dataset_name in target_key_map:
            predictions = results["predictions"][fold_ind][target_key_map[task.dataset_name]].values
        else:
            predictions = results["predictions"][fold_ind].values
        
        # Classification tasks must be recorded as labels rather than class probabilities
        if task.metadata.task_type == CLF_KEY:
            predictions = predictions[:, 1] >= 0.5

        predictions = predictions.flatten()

        task.record(fold, predictions)
        
    if task.metadata.task_type == CLF_KEY:
        print(f"{task.dataset_name}: Accuracy score {task.scores['accuracy']['mean']}")
        print(f"{task.dataset_name}: ROC score {task.scores['rocauc']['mean']}")
    else:
        print(f"{task.dataset_name}: MAE {task.scores['mae']['mean']}")

    # Drop the reference to the loaded dataset so memory use stays bounded across tasks
    task.df = None

2021-10-08 10:32:02 INFO     Loading dataset 'matbench_dielectric'...


Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_dielectric.json.gz: 4764it [00:06, 752.56it/s] 

2021-10-08 10:32:09 INFO     Dataset 'matbench_dielectric loaded.
2021-10-08 10:32:09 INFO     Recorded fold matbench_dielectric-0 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_dielectric-1 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_dielectric-2 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_dielectric-3 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_dielectric-4 successfully.
matbench_dielectric: MAE 0.2969698688737498
2021-10-08 10:32:09 INFO     Loading dataset 'matbench_jdft2d'...


Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_jdft2d.json.gz: 636it [00:00, 1782.27it/s]

2021-10-08 10:32:09 INFO     Dataset 'matbench_jdft2d loaded.
2021-10-08 10:32:09 INFO     Recorded fold matbench_jdft2d-0 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_jdft2d-1 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_jdft2d-2 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_jdft2d-3 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_jdft2d-4 successfully.
matbench_jdft2d: MAE 34.53678641963336
2021-10-08 10:32:09 INFO     Loading dataset 'matbench_steels'...



Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_steels.json.gz: 0it [00:00, ?it/s]

2021-10-08 10:32:09 INFO     Dataset 'matbench_steels loaded.



Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_steels.json.gz: 0it [00:00, ?it/s]

2021-10-08 10:32:09 INFO     Recorded fold matbench_steels-0 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_steels-1 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_steels-2 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_steels-3 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_steels-4 successfully.
matbench_steels: MAE 96.21387590993324
2021-10-08 10:32:09 INFO     Loading dataset 'matbench_expt_gap'...



Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_expt_gap.json.gz: 0it [00:00, ?it/s]

2021-10-08 10:32:09 INFO     Dataset 'matbench_expt_gap loaded.
2021-10-08 10:32:09 INFO     Recorded fold matbench_expt_gap-0 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_expt_gap-1 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_expt_gap-2 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_expt_gap-3 successfully.
2021-10-08 10:32:09 INFO     Recorded fold matbench_expt_gap-4 successfully.
matbench_expt_gap: MAE 0.3470153653294551
2021-10-08 10:32:09 INFO     Loading dataset 'matbench_phonons'...


Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_dielectric.json.gz: 100%|##########| 4764/4764 [00:07<00:00, 641.42it/s] 
Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_jdft2d.json.gz: 100%|##########| 636/636 [00:00<00:00, 686.93it/s] 
Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_expt_gap.json.gz: 0it [00:00, ?it/s]
Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_phonons.json.gz: 1265it [00:00, 2213.85it/s]

2021-10-08 10:32:10 INFO     Dataset 'matbench_phonons loaded.
2021-10-08 10:32:10 INFO     Recorded fold matbench_phonons-0 successfully.
2021-10-08 10:32:10 INFO     Recorded fold matbench_phonons-1 successfully.
2021-10-08 10:32:10 INFO     Recorded fold matbench_phonons-2 successfully.
2021-10-08 10:32:10 INFO     Recorded fold matbench_phonons-3 successfully.
2021-10-08 10:32:10 INFO     Recorded fold matbench_phonons-4 successfully.
matbench_phonons: MAE 38.7524344203875
2021-10-08 10:32:10 INFO     Loading dataset 'matbench_log_gvrh'...



Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_phonons.json.gz: 100%|##########| 1265/1265 [00:00<00:00, 1462.01it/s]
Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_log_gvrh.json.gz: 10987it [00:05, 1852.14it/s]

2021-10-08 10:32:16 INFO     Dataset 'matbench_log_gvrh loaded.
2021-10-08 10:32:16 INFO     Recorded fold matbench_log_gvrh-0 successfully.
2021-10-08 10:32:16 INFO     Recorded fold matbench_log_gvrh-1 successfully.
2021-10-08 10:32:16 INFO     Recorded fold matbench_log_gvrh-2 successfully.
2021-10-08 10:32:16 INFO     Recorded fold matbench_log_gvrh-3 successfully.
2021-10-08 10:32:16 INFO     Recorded fold matbench_log_gvrh-4 successfully.
matbench_log_gvrh: MAE 0.07311620406947483
2021-10-08 10:32:16 INFO     Loading dataset 'matbench_log_kvrh'...


Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_log_gvrh.json.gz: 100%|##########| 10987/10987 [00:06<00:00, 1747.02it/s]
Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_log_kvrh.json.gz: 10987it [00:05, 1917.66it/s]

2021-10-08 10:32:22 INFO     Dataset 'matbench_log_kvrh loaded.
2021-10-08 10:32:22 INFO     Recorded fold matbench_log_kvrh-0 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_log_kvrh-1 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_log_kvrh-2 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_log_kvrh-3 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_log_kvrh-4 successfully.
matbench_log_kvrh: MAE 0.05477001646276852
2021-10-08 10:32:22 INFO     Loading dataset 'matbench_glass'...


Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_glass.json.gz: 0it [00:00, ?it/s]

2021-10-08 10:32:22 INFO     Dataset 'matbench_glass loaded.
2021-10-08 10:32:22 INFO     Recorded fold matbench_glass-0 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_glass-1 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_glass-2 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_glass-3 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_glass-4 successfully.
matbench_glass: Accuracy score 0.8676056338028169
matbench_glass: ROC score 0.8106763388737604
2021-10-08 10:32:22 INFO     Loading dataset 'matbench_expt_is_metal'...


Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_expt_is_metal.json.gz: 0it [00:00, ?it/s]

2021-10-08 10:32:22 INFO     Dataset 'matbench_expt_is_metal loaded.
2021-10-08 10:32:22 INFO     Recorded fold matbench_expt_is_metal-0 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_expt_is_metal-1 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_expt_is_metal-2 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_expt_is_metal-3 successfully.
2021-10-08 10:32:22 INFO     Recorded fold matbench_expt_is_metal-4 successfully.
matbench_expt_is_metal: Accuracy score 0.9160717675704676
matbench_expt_is_metal: ROC score 0.9160515032798082
2021-10-08 10:32:22 INFO     Loading dataset 'matbench_perovskites'...



Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_log_kvrh.json.gz: 100%|##########| 10987/10987 [00:06<00:00, 1679.38it/s]
Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_glass.json.gz: 0it [00:00, ?it/s]
Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_expt_is_metal.json.gz: 0it [00:00, ?it/s]
Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_perovskites.json.gz: 18928it [00:04, 4384.41it/s] 

2021-10-08 10:32:26 INFO     Dataset 'matbench_perovskites loaded.
2021-10-08 10:32:27 INFO     Recorded fold matbench_perovskites-0 successfully.
2021-10-08 10:32:27 INFO     Recorded fold matbench_perovskites-1 successfully.
2021-10-08 10:32:27 INFO     Recorded fold matbench_perovskites-2 successfully.
2021-10-08 10:32:27 INFO     Recorded fold matbench_perovskites-3 successfully.
2021-10-08 10:32:27 INFO     Recorded fold matbench_perovskites-4 successfully.
matbench_perovskites: MAE 0.09075423473752561
2021-10-08 10:32:27 INFO     Loading dataset 'matbench_mp_gap'...


Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_perovskites.json.gz: 100%|##########| 18928/18928 [00:08<00:00, 2118.95it/s]
Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_mp_gap.json.gz: 106113it [02:28, 713.75it/s] 

2021-10-08 10:34:56 INFO     Dataset 'matbench_mp_gap loaded.
2021-10-08 10:34:56 INFO     Recorded fold matbench_mp_gap-0 successfully.
2021-10-08 10:34:56 INFO     Recorded fold matbench_mp_gap-1 successfully.
2021-10-08 10:34:56 INFO     Recorded fold matbench_mp_gap-2 successfully.
2021-10-08 10:34:56 INFO     Recorded fold matbench_mp_gap-3 successfully.
2021-10-08 10:34:56 INFO     Recorded fold matbench_mp_gap-4 successfully.
matbench_mp_gap: MAE 0.21987236694632012
2021-10-08 10:34:58 INFO     Loading dataset 'matbench_mp_is_metal'...


Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_mp_gap.json.gz: 100%|##########| 106113/106113 [02:38<00:00, 669.50it/s] 
Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_mp_is_metal.json.gz: 106113it [02:25, 731.75it/s] 

2021-10-08 10:37:24 INFO     Dataset 'matbench_mp_is_metal loaded.
2021-10-08 10:37:24 INFO     Recorded fold matbench_mp_is_metal-0 successfully.
2021-10-08 10:37:24 INFO     Recorded fold matbench_mp_is_metal-1 successfully.
2021-10-08 10:37:24 INFO     Recorded fold matbench_mp_is_metal-2 successfully.
2021-10-08 10:37:24 INFO     Recorded fold matbench_mp_is_metal-3 successfully.
2021-10-08 10:37:24 INFO     Recorded fold matbench_mp_is_metal-4 successfully.
matbench_mp_is_metal: Accuracy score 0.8030506180286311
matbench_mp_is_metal: ROC score 0.7804643191398983
2021-10-08 10:37:26 INFO     Loading dataset 'matbench_mp_e_form'...


Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_mp_is_metal.json.gz: 100%|##########| 106113/106113 [02:33<00:00, 690.26it/s] 
Reading file /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_mp_e_form.json.gz: 132752it [02:59, 740.21it/s] 

2021-10-08 10:40:26 INFO     Dataset 'matbench_mp_e_form loaded.
2021-10-08 10:40:26 INFO     Recorded fold matbench_mp_e_form-0 successfully.
2021-10-08 10:40:26 INFO     Recorded fold matbench_mp_e_form-1 successfully.
2021-10-08 10:40:26 INFO     Recorded fold matbench_mp_e_form-2 successfully.
2021-10-08 10:40:26 INFO     Recorded fold matbench_mp_e_form-3 successfully.
2021-10-08 10:40:26 INFO     Recorded fold matbench_mp_e_form-4 successfully.
matbench_mp_e_form: MAE 0.044769163811452004


Decoding objects from /home/mevans/.local/conda/envs/modnet_matbench/lib/python3.8/site-packages/matminer/datasets/matbench_mp_e_form.json.gz: 100%|##########| 132752/132752 [03:09<00:00, 1081.19it/s]

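For the classification tasks, the loop above keeps the second column of the stored predictions and thresholds it at 0.5 before recording. A minimal sketch of that step follows, assuming each row is `[P(class 0), P(class 1)]`. It uses plain lists so that it runs without NumPy, whereas the notebook operates on arrays. The name `to_labels` is hypothetical.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert per-class probabilities to boolean labels.

    A sample is labelled True when its class-1 probability reaches the threshold.
    """
    return [row[1] >= threshold for row in probabilities]


# Made-up probabilities for three samples
example = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
print(to_labels(example))  # [False, True, True]
```

Note that a probability of exactly 0.5 is assigned to the positive class, matching the `>=` comparison in the loop.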
In [5]:
mb.to_file("results.json.gz")

2021-10-08 10:58:16 INFO     Successfully wrote MatbenchBenchmark to file 'results.json.gz'.

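To inspect the exported file afterwards, something like the following may help. It assumes that `results.json.gz` is gzip-compressed JSON, as its extension suggests. The helper `read_json_gz` is hypothetical, and the toy payload is made up purely to keep the example self-contained. It does not reflect the actual structure of the matbench output.

```python
import gzip
import json
import os
import tempfile


def read_json_gz(path):
    """Load a gzip-compressed JSON file into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Round-trip a toy payload; replace the path with "results.json.gz" in practice
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump({"example_key": [1, 2, 3]}, f)
    print(read_json_gz(path))
```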