klauslippert/person-name-normalisation

Person Name Normalisation

Unifying person names in different notations

Different sources write person names in different notations:

  • Firstname Secondname Lastname
  • Lastname, Firstname Secondname

Also extracted are:

  • academic degrees (e.g. 'Dr.', 'Ph.D.')
  • name prefixes (e.g. 'van ter', 'von', 'De')

Included: German, French, Italian, Dutch

Missing: Spanish, Portuguese

Missing: double last names in Spanish
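To make the two notations concrete, here is a minimal sketch of how they could be brought into one shape. It is not the package's implementation: `split_notation` is a hypothetical helper, it ignores academic degrees and name prefixes, and it simply assumes the last token is the last name when no comma is present.

```python
def split_notation(raw):
    """Return (given_names, last_name) for either notation.

    Hypothetical helper for illustration only; titles and prefixes
    are not handled here.
    """
    if "," in raw:
        # 'Lastname, Firstname Secondname'
        last, given = (part.strip() for part in raw.split(",", 1))
        return given.split(), last
    tokens = raw.split()
    if not tokens:
        return [], ""
    # 'Firstname Secondname Lastname': assume the last token is the last name
    return tokens[:-1], tokens[-1]


print(split_notation("Firstname Secondname Lastname"))
print(split_notation("Lastname, Firstname Secondname"))
```

Both calls yield the same pair, which is the point of normalising the notation before any further processing.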

Installation

pip install personnamenorm

Usage

import personnamenorm as pnn
nameobj = pnn.namenorm('Dr. Dipl. Firstname Secondname von und zu Lastname')
results in
nameobj.name <dict>
{
    'raw': 'Dr. Dipl. Firstname Secondname von und zu Lastname',
    'firstname': ['Firstname','Secondname'],
    'lastname': ['Lastname'],
    'title': ['Dr.','Dipl.'],
    'prefix': ['von und zu']
}

nameobj.fullname <str>
'von und zu Lastname, Firstname Secondname'

nameobj.fullname_abbrev <str>
'von und zu Lastname, F S'
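
The relationship between the dict and the two string forms can be sketched as follows. This is an assumption about how the strings relate to the dict, inferred only from the example output above; `build_fullname` is a hypothetical helper and not part of the package.

```python
def build_fullname(name, abbreviate=False):
    """Join prefix and last name, then append the given names.

    Hypothetical helper; `name` is a dict shaped like nameobj.name above.
    """
    given = list(name["firstname"])
    if abbreviate:
        # keep only the first letter of each given name
        given = [g[0] for g in given if g]
    family = " ".join(name["prefix"] + name["lastname"])
    if not given:
        return family
    return family + ", " + " ".join(given)


name = {
    "firstname": ["Firstname", "Secondname"],
    "lastname": ["Lastname"],
    "title": ["Dr.", "Dipl."],
    "prefix": ["von und zu"],
}
print(build_fullname(name))                   # von und zu Lastname, Firstname Secondname
print(build_fullname(name, abbreviate=True))  # von und zu Lastname, F S
```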

More examples can be found in this file on GitHub.

Debug-mode

Debug mode is off by default.

Passing True as the second argument activates it:

nameobj = pnn.namenorm(<str>, True)

In debug mode, the following additional information is emitted as a logging message:

  • used annotation dictionary
  • annotated input string as list of tuples
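
As a rough illustration of what an "annotated input string as list of tuples" might look like, the sketch below tags each token using a small lookup. The dictionary contents, the label names, and the `(token, label)` tuple layout are all assumptions made for this example; the package's actual annotation dictionary and output format may differ.

```python
# Hypothetical annotation dictionary, much smaller than a real one would be
TITLES = {"Dr.", "Dipl.", "Ph.D."}
PREFIXES = {"von", "und", "zu", "van", "ter", "De"}


def annotate(raw):
    """Tag each whitespace-separated token as title, prefix, or name."""
    annotated = []
    for token in raw.split():
        if token in TITLES:
            label = "title"
        elif token in PREFIXES:
            label = "prefix"
        else:
            label = "name"
        annotated.append((token, label))
    return annotated


for pair in annotate("Dr. Dipl. Firstname Secondname von und zu Lastname"):
    print(pair)
```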

Logging

Logging is implemented as follows:

  • writes to stdout if logging has NOT been enabled beforehand
  • writes to the existing logging handler if other logging HAS been enabled beforehand
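
The behaviour described above can be approximated with Python's standard logging module. This is a minimal sketch under the assumption that "logging enabled" means the root logger already has a handler; `get_logger` and the logger name are hypothetical and not taken from the package.

```python
import logging
import sys


def get_logger(name="personnamenorm_sketch"):
    """Return a logger, configuring stdout output only if nothing is set up."""
    root = logging.getLogger()
    if not root.handlers:
        # no logging configured by the caller: fall back to stdout
        logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    # otherwise messages propagate to the handlers that already exist
    return logging.getLogger(name)


log = get_logger()
log.info("normalising name")
```

An application that calls logging.basicConfig (or attaches its own handler to the root logger) before this point would see the message in its own log output instead.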

Test

See the 'tests' folder on GitHub.

python test_personnamenorm.py