
Official source codes for implementing "Quantitative Topic Analysis of Materials Science Literature Using Natural Language Processing"


jwchoi95/matsciexp


MaterialsScienceExplorer

Quantitative Topic Analysis of Materials Science Literature Using Natural Language Processing

This study is a successful example 🌟 of applying natural language processing and unsupervised learning to trend analysis in materials science. We applied document embedding and density-based clustering to materials science literature and obtained a comprehensive view of the scientific topics in the field without injecting any domain expertise. We defined the topic relevance of each paper and identified the main topics and academic interests of organisations in a quantitative and time-aware manner. This repository contains the source code and dataset for the publication listed under Citation.

Dataset

The dataset used in this study is available; unzip it into a directory of your choice.

  • temp_mat_abstract.txt : Sample of the abstracts of the materials science literature used in this study, published between 2017 and 2021.

  • temp_mat_bib.txt : Sample of the bibliographic information of the collected papers, with the columns doi, title, journal, year.
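As a rough illustration of how the bibliographic file might be read, the sketch below parses rows into dictionaries keyed by the four column names given above. The on-disk layout is not documented here, so the tab delimiter and the absence of quoting are assumptions, and `load_bib` is a hypothetical helper rather than part of this repository.

```python
import csv

# Column names as listed in the Dataset section.
BIB_COLUMNS = ["doi", "title", "journal", "year"]


def load_bib(lines, delimiter="\t"):
    """Parse bibliographic rows into dicts keyed by BIB_COLUMNS.

    `lines` is any iterable of text lines, e.g. an open file.
    The tab delimiter is an assumption about the file layout.
    """
    # QUOTE_NONE keeps quotation marks inside titles as literal characters.
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if len(row) != len(BIB_COLUMNS):
            continue  # skip blank or malformed rows
        records.append(dict(zip(BIB_COLUMNS, row)))
    return records


# Usage, assuming the file was unzipped into the working directory:
# with open("temp_mat_bib.txt", encoding="utf-8") as f:
#     records = load_bib(f)
```

Values are kept as strings, so `year` needs an explicit `int(...)` conversion before any numeric filtering.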

Requirements

Our experiment setting is as follows:

  • gensim : 4.1.2

  • spacy : 3.2.4

  • hdbscan : 0.8.28

pip install -r requirements.txt
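To confirm that an environment matches the versions listed above, a small check along these lines may help. It is only a sketch: `find_mismatches` and `installed_version` are hypothetical helper names, and the version numbers are copied from the list in this section.

```python
from importlib import metadata

# Versions copied from the Requirements list.
EXPECTED = {"gensim": "4.1.2", "spacy": "3.2.4", "hdbscan": "0.8.28"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, get_version=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = get_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

A different version does not necessarily break the code; the check only reports where the environment departs from the experiment setting.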

Run

python code/run.py -dataset <dataset>
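This README does not show what code/run.py does internally. As a rough illustration of the approach described in the overview (document embedding followed by density-based clustering), a minimal sketch using two of the libraries listed under Requirements might look like the following. It assumes Doc2Vec as the embedding model and uses illustrative hyperparameters; neither is confirmed by this README, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed_and_cluster(abstracts, vector_size=50, min_cluster_size=5):
    """Embed abstracts with Doc2Vec, then cluster the vectors with HDBSCAN.

    Returns one integer label per abstract; -1 marks noise points.
    Doc2Vec and all hyperparameters here are assumptions for illustration.
    """
    # Imported here so tokenize() stays usable without the heavy dependencies.
    import numpy as np
    import hdbscan
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Tag each abstract with its position so vectors can be looked up later.
    corpus = [TaggedDocument(tokenize(a), [i]) for i, a in enumerate(abstracts)]

    model = Doc2Vec(vector_size=vector_size, min_count=1, epochs=20)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

    # Cast to float64 before handing the vectors to HDBSCAN.
    vectors = np.array([model.dv[i] for i in range(len(corpus))], dtype=np.float64)

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    return clusterer.fit_predict(vectors).tolist()
```

Because HDBSCAN labels low-density points as noise (-1), not every abstract is forced into a topic, which suits a corpus where some papers do not belong to any dense cluster.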

Visualization

Topic Map of Materials Science

National Interests of Materials Science

Text-mined academic interests of Nature Materials, a randomly selected journal

Tutorial

Coming soon!

Citation

If you utilise our findings, methods, or results, please consider citing the following paper.

  • Choi, J., & Lee, B.* (2024). Quantitative Topic Analysis of Materials Science Literature Using Natural Language Processing.
