Materials informatics (MI) research, which is the discovery of new materials through machine learning (ML) using large-scale material data, has attracted considerable attention in recent years. However, in general, the large-scale material data used in MI are biased owing to differences in the targeted material domains. Moreover, most studies on MI have not clearly demonstrated the influence of data bias on ML models. In this study, we clarify the influence of data bias on ML models by combining the concept of the applicability domain and clustering for large-scale experimental property data in the Starrydata2 material database previously developed by our group.
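As a rough illustration of the applicability-domain idea mentioned above, the following minimal sketch scores a query point by its mean distance to the k nearest training points and treats it as inside the domain when that score does not exceed a quantile of the training set's own scores. It is not taken from this repository or the paper: the function names (`knn_mean_distance`, `fit_ad_threshold`, `in_domain`), the values `k=3` and `quantile=0.95`, and the toy 2-D data are all assumptions made for illustration, and the clustering step is assumed to have been done elsewhere so that the list below stands in for a single cluster.

```python
import math

def knn_mean_distance(point, reference, k):
    """Mean Euclidean distance from `point` to its k nearest points in `reference`."""
    distances = sorted(math.dist(point, other) for other in reference)
    return sum(distances[:k]) / k

def fit_ad_threshold(train, k=3, quantile=0.95):
    """Return the chosen quantile of the training points' own kNN scores.

    Each training point is scored against the other training points
    (itself excluded), so the threshold reflects typical in-cluster spacing.
    """
    scores = []
    for i, point in enumerate(train):
        others = train[:i] + train[i + 1:]
        scores.append(knn_mean_distance(point, others, k))
    scores.sort()
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]

def in_domain(query, train, threshold, k=3):
    """True if the query's kNN score does not exceed the fitted threshold."""
    return knn_mean_distance(query, train, k) <= threshold

# Hypothetical 2-D descriptors standing in for one cluster of materials.
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (0.1, 0.1), (0.05, 0.05), (0.2, 0.1)]
threshold = fit_ad_threshold(cluster, k=3)

near = in_domain((0.05, 0.02), cluster, threshold)  # close to the cluster
far = in_domain((5.0, 5.0), cluster, threshold)     # far from the cluster
print(near, far)
```

In this sketch, a model trained on one cluster would be expected to be reliable only for queries flagged as inside that cluster's domain. Repeating the fit per cluster is one way to see how coverage differs between material domains.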
- Docker
- Docker Compose
Run the following commands in a terminal.
cd YOUR_WORKSPACE
git clone https://github.com/kumagallium/matCL-knnAD.git
cd matCL-knnAD
docker-compose build
docker-compose up
You can open JupyterLab by accessing the following URL.
Please cite the following work if you want to use matCL-knnAD.
@article{kumagai2022effects,
title={Effects of data bias on machine-learning--based material discovery using experimental property data},
author={Kumagai, Masaya and Ando, Yuki and Tanaka, Atsumi and Tsuda, Koji and Katsura, Yukari and Kurosaki, Ken},
journal={Science and Technology of Advanced Materials: Methods},
volume={2},
number={1},
pages={302--309},
year={2022},
publisher={Taylor \& Francis}
}
URL: https://www.tandfonline.com/doi/full/10.1080/27660400.2022.2109447
- Fork it (git clone https://github.com/kumagallium/matCL-knnAD.git)
- Create your feature branch (git checkout -b your-new-feature)
- Commit your changes (git commit -am 'feat: add some feature')
- Push to the branch (git push origin your-new-feature)
- Create a new Pull Request
This work was supported by JSPS KAKENHI Grant Number JP20K22466.
This software was primarily written by Assistant Professor Masaya Kumagai at Kyoto University.
This code is released under the MIT License.