MatKG #1368

Open
vinven7 opened this issue Feb 18, 2024 · 1 comment

vinven7 commented Feb 18, 2024

Describe the benchmark KG dataset:

This is a dataset of materials and their applications, properties, characterization methods, synthesis methods, descriptors, and symmetry labels, extracted from over 5 million scientific publications in materials science. It has over 6 million triples covering 70,000 entities.

E.g., what domain does this benchmark KG cover? Are there any special parts besides triples?

The domain is materials science. Each triple has an additional field called 'count': the number of documents in which the triple appears. It can be used as a numerical weight.

Dataset is pre-stratified into train/test/valid [ ]

NO

Publication(s) or website describing benchmark KG dataset:
https://www.nature.com/articles/s41597-024-03039-z.pdf

URL for downloading the benchmark KG dataset:
https://zenodo.org/records/10144972

mberr (Member) commented Feb 19, 2024

Hi @vinven7,

Triple weights are not currently supported, although there has been interest in adding them; see #1142.

As a temporary workaround, the KG dataset could accept a user-defined count threshold and discard every triple whose count falls below it. The resulting KG would then be in a format compatible with all existing structures.
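A minimal sketch of the thresholding step, using only the standard library. The file layout (tab-separated `head, relation, tail, count`, no header) is an assumption, the sample rows are made up for illustration, and `read_weighted_triples` and `filter_by_count` are hypothetical helper names, not part of any existing API.

```python
import csv
import io

# Made-up rows for illustration only; not taken from the MatKG files.
# Assumed layout: head, relation, tail, count (tab-separated, no header).
SAMPLE_TSV = (
    "TiO2\thas_application\tphotocatalysis\t12\n"
    "TiO2\thas_property\tband gap\t1\n"
    "GaN\thas_application\tLED\t3\n"
)


def read_weighted_triples(handle):
    """Parse tab-separated rows into (head, relation, tail, count) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    return [(h, r, t, int(c)) for h, r, t, c in reader]


def filter_by_count(rows, min_count):
    """Drop triples seen in fewer than `min_count` documents.

    The count column is removed from the output, so the result is a
    plain list of unweighted (head, relation, tail) triples.
    """
    return [(h, r, t) for h, r, t, c in rows if c >= min_count]


rows = read_weighted_triples(io.StringIO(SAMPLE_TSV))
triples = filter_by_count(rows, min_count=2)
```

With `min_count=1` every triple is kept, so the threshold could default to that and leave the decision about filtering to the user.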

We would be happy to accept a PR, especially since you seem to be involved in the creation of this KG. If you would like to participate in adding support for triple weights, we would also appreciate your contribution, but this seems to be a larger effort than the baseline I suggested above.
