NameDisambiguation

Author Name Disambiguation by Clustering based on Deep Learned Pairwise Similarities

Course: 6CP project in Applied Deep Learning
Collaborator: Philipp Gnoyke

Abstract

Name disambiguation is a growing problem in scientific literature management. Different authors may share the same name, and telling them apart is a challenging task. This report describes an attempt to solve this problem with deep learning. We extracted a set of pairwise similarity measures from bibliographic metadata and used them to train neural networks and to make predictions with them. The trained model predicts the likelihood that two papers belong to the same person. We used these likelihoods to assign papers to existing clusters and to build author profiles from scratch with agglomerative clustering. For pairwise classification, we reached an F1 score of 51.5 %. For clustering from scratch and for assignment to existing clusters, we reached clustering F1 scores of 48 % and 84.3 %, respectively.
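
The abstract does not list which similarity measures were extracted. As a minimal sketch, assuming each paper record carries a title, a set of co-author names, a venue and a year, a pairwise feature vector could look like the block below. The `Paper` record, `jaccard`, `title_tokens` and `pair_features` are hypothetical names for illustration, not code taken from SimilarityFunctions.py.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record layout; the actual track files may store other fields.
    title: str
    coauthors: set = field(default_factory=set)  # excluding the ambiguous name
    venue: str = ""
    year: int = 0


def jaccard(a, b):
    """Jaccard similarity of two sets, defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def title_tokens(title):
    """Lower-cased, whitespace-separated words of a title."""
    return set(title.lower().split())


def pair_features(p, q):
    """Fixed-length similarity vector for one pair of papers (assumed features)."""
    return [
        jaccard(p.coauthors, q.coauthors),                      # shared co-authors
        jaccard(title_tokens(p.title), title_tokens(q.title)),  # title word overlap
        1.0 if p.venue and p.venue == q.venue else 0.0,         # same venue
        abs(p.year - q.year),                                   # publication-year gap
    ]


a = Paper("Deep learning for name disambiguation", {"A. Smith", "B. Lee"}, "KDD", 2018)
b = Paper("Name disambiguation with graph embeddings", {"B. Lee"}, "KDD", 2019)
print(pair_features(a, b))  # → [0.5, 0.25, 1.0, 1]
```

One such vector per labelled pair would form an input row for a pairwise classifier; in practice, unbounded features such as the year gap would usually be scaled before training.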
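
Once a model outputs a likelihood for every pair of papers that share a name, agglomerative clustering can build author profiles from scratch. The sketch below assumes average linkage and a merge threshold of 0.5. The abstract states neither the linkage criterion nor the stopping rule, and `agglomerative` is a hypothetical helper, not the code in Cluster.py.

```python
def agglomerative(prob, threshold=0.5):
    """Average-linkage agglomerative clustering of papers (assumed variant).

    prob[i][j] is the predicted likelihood that papers i and j belong to the
    same author (assumed symmetric). The two clusters with the highest
    average cross-likelihood are merged until that value drops below
    `threshold` or only one cluster remains.
    """
    clusters = [[i] for i in range(len(prob))]

    def linkage(a, b):
        # Mean likelihood over all pairs with one paper in each cluster.
        return sum(prob[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best:
                    best, best_pair = score, (x, y)
        if best < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]  # merge cluster y into cluster x
        del clusters[y]                          # y > x, so x's index is unchanged
    return [sorted(c) for c in clusters]


prob = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(agglomerative(prob))  # → [[0, 1], [2, 3]]
```

Average linkage is less sensitive than single linkage to one spurious high-likelihood pair, and the threshold trades precision (higher values) against recall (lower values).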
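
To assign a new paper to existing author profiles, one simple rule is to average the pairwise likelihoods between the paper and each profile's papers and pick the profile with the highest average. This decision rule is an assumption for illustration; `assign` and the `likelihood` callable are hypothetical stand-ins for the trained model, and `toy_likelihood` exists only to make the example runnable.

```python
def assign(new_paper, profiles, likelihood):
    """Return the id of the author profile that best matches new_paper.

    profiles maps an author id to a non-empty list of that author's papers;
    likelihood(p, q) is any pairwise scorer returning a value in [0, 1],
    e.g. a trained model's predicted probability.
    """
    def mean_likelihood(papers):
        # Average pairwise likelihood between the new paper and one profile.
        return sum(likelihood(new_paper, p) for p in papers) / len(papers)

    return max(profiles, key=lambda author: mean_likelihood(profiles[author]))


# Toy scorer for illustration only: paper ids sharing a first letter "match".
def toy_likelihood(p, q):
    return 0.9 if p[0] == q[0] else 0.1


profiles = {"author_A": ["a1", "a2"], "author_B": ["b1"]}
print(assign("a3", profiles, toy_likelihood))  # → author_A
```

Averaging rather than taking the maximum keeps a single coincidentally similar paper from pulling a new paper into the wrong profile; whether the report used the mean, the maximum or another aggregate is not stated.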
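
The abstract reports clustering F1 scores without defining the metric. One common choice is pairwise F1, which computes precision and recall over the pairs of papers placed in the same cluster. The sketch below assumes that definition; the metric actually used in the report may differ, and `pairwise_f1` is a hypothetical helper.

```python
from itertools import combinations


def same_cluster_pairs(clusters):
    """All unordered pairs of papers that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(pair) for pair in combinations(cluster, 2))
    return pairs


def pairwise_f1(predicted, truth):
    """Pairwise precision, recall and F1 of a predicted clustering."""
    pred, true = same_cluster_pairs(predicted), same_cluster_pairs(truth)
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


truth = [[1, 2, 3], [4, 5]]
predicted = [[1, 2], [3, 4, 5]]
print(pairwise_f1(predicted, truth))  # → (0.5, 0.5, 0.5)
```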

Folders and files

__pycache__
test_track1
test_track2
training_track1
ApplyNeuralNetwork.py
Cluster.py
CorrectAuthorNames.py
README.md
SimilarityFunctions.py
SimilarityFunctions.pyc
TextMining.py
Track1TestPreprocessingAllInOne.py
Track1TrainPreprocessing.py
Track1ValidPreprocessing.py
Track2TestPreprocessingAllInOne.py
TrainNeuronalNet.py
Unzip.py
Utils.py
Zip.py
bestmodel.h5
bestmodelNew.h5
training_track1_file12.txt.zip
training_track1_file2.txt.zip
training_track1_file3.txt.zip
training_track1_file4.txt.zip
training_track1_file5.txt.zip
training_track1_file6.txt.zip
training_track1_file7.txt.zip
training_track1_file8.txt.zip
training_track1_file9.txt.zip

kaveenkumar/NameDisambiguation
