# Adding ISO 639-3 language codes to MARC Records using OpenRefine Wikidata Reconciliation

Northwestern Libraries uses Ex Libris Alma and Primo.

The Melville J. Herskovits Library of African Studies contains 220,729 physical items, many of them in African languages. The MARC language codes (which correspond to ISO 639-2) describe many African languages only under a broad family name, so these family codes fail to capture the linguistic diversity of Africa.

This is a list of all the African language families that appear in the Herskovits Library:

| Code | Name |
| ----------- | ----------- |
| afa | Afroasiatic (Other) |
| ber | Berber (Other) |
| bnt | Bantu (Other) |
| cpe | Creoles and Pidgins, English-based (Other) |
| cpf | Creoles and Pidgins, French-based (Other) |
| cpp | Creoles and Pidgins, Portuguese-based (Other) |
| crp | Creoles and Pidgins (Other) |
| cus | Cushitic (Other) |
| khi | Khoisan (Other) |
| kro | Kru (Other) |
| nic | Niger-Kordofanian (Other) |
| ssa | Nilo-Saharan (Other) |
| tai | Tai (Other) |

Because the MARC language codes use ISO 639-2, the languages that appear in the Primo display and filters are the broad language families. In this example, Niger-Kordofanian is the displayed language:
![Primo display](img/primo-example.JPG)

The code `nic` (Niger-Kordofanian (Other)) is the language in the fixed field and in the 041 field (highlighted in yellow), which is the source of the language shown in Primo.

![MARC record](img/MARC-example.JPG)
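To make the fixed-field language concrete: in MARC 21 bibliographic records, the language code sits in character positions 35-37 of the 008 field. The snippet below is a minimal sketch of pulling it out. The function name and the sample 008 value are made up for illustration and do not come from the Herskovits records.

```python
def language_from_008(field_008: str) -> str:
    """Return the three-character language code from 008 positions 35-37."""
    if len(field_008) < 38:
        raise ValueError("008 field is too short to contain a language code")
    return field_008[35:38]


# Hypothetical 008 value, padded so the language code lands at positions 35-37.
sample_008 = "850101s1985    nr".ljust(35) + "nic" + " d"

print(language_from_008(sample_008))
```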

The recent [Guidelines for the use of ISO 639-3 language codes in MARC records](https://www.loc.gov/aba/pcc/scs/documents/ISO-639-3-guidelines.pdf) make it possible to add ISO 639-3 codes to MARC records. The 639-3 codes identify individual languages at a much finer granularity: there are 7,916 entries in the 639-3 list, compared to 487 in the 639-2 list.
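As a rough illustration of what such a field could look like, the sketch below formats a 041 line in MarcEdit's mnemonic style. It assumes the general MARC 21 convention that a second indicator of 7 means the code source is named in subfield $2, with `iso639-3` as the source code. The helper name is hypothetical, and the exact indicators and subfields should be checked against the guidelines before use.

```python
import re


def format_041_iso639_3(codes):
    """Build a MarcEdit-style 041 line carrying ISO 639-3 codes (illustrative)."""
    for code in codes:
        # ISO 639-3 codes are three lowercase letters.
        if not re.fullmatch(r"[a-z]{3}", code):
            raise ValueError(f"not a three-letter code: {code!r}")
    subfields = "".join(f"$a{code}" for code in codes)
    # "\" stands for a blank first indicator in MarcEdit's mnemonic format;
    # the second indicator 7 points to the source given in $2.
    return f"=041  \\7{subfields}$2iso639-3"


# "dag" is the ISO 639-3 code for Dagbani, used here only as an example.
print(format_041_iso639_3(["dag"]))
```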

Returning to the previous MARC example, the information about the specific language is in the record's title, notes, or subject headings (highlighted in blue). We wanted to convert this language information to a 639-3 code so that it can be used in the Primo search display.

![MARC record](img/MARC-example.JPG)

To turn the language information in a note into a 639-3 code, I used Wikidata reconciliation, a feature built into OpenRefine.

First, the MARC records need to be converted to a delimited text file such as CSV. To do this, I used the Export Tab Delimited feature in MarcEdit.
![MarcEdit](img/marcedit-export.JPG)

The exported fields contain possible information about the specific language, along with other important record information such as the unique identifier.
Export fields: 001, 008, 041, 245, 246, 260, 264, 500, 650.
![MarcEdit](img/marcedit-export-fields.JPG)
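Before loading the export into OpenRefine, it can help to check which records actually carry one of the family codes. The following is a minimal sketch, assuming the tab-delimited file has a header row whose column names are the field tags (`001`, `008`, and so on). The real MarcEdit export may label columns differently, and the sample rows here are invented.

```python
import csv
import io

# Family codes from the table at the top of this document.
FAMILY_CODES = {
    "afa", "ber", "bnt", "cpe", "cpf", "cpp", "crp",
    "cus", "khi", "kro", "nic", "ssa", "tai",
}


def flag_family_records(handle):
    """Return (record id, code) pairs for rows whose 008 language is a family code."""
    reader = csv.DictReader(handle, delimiter="\t")
    flagged = []
    for row in reader:
        field_008 = row.get("008") or ""
        code = field_008[35:38]  # language code lives at 008 positions 35-37
        if code in FAMILY_CODES:
            flagged.append((row.get("001"), code))
    return flagged


# Invented sample data; "0" * 35 is placeholder padding for 008 positions 0-34.
sample = io.StringIO(
    "001\t008\t245\n"
    + "rec1\t" + "0" * 35 + "nic d" + "\tHypothetical title one\n"
    + "rec2\t" + "0" * 35 + "eng d" + "\tHypothetical title two\n"
)

print(flag_family_records(sample))
```

With a real file, the same function could be called on `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.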

This file can be uploaded into OpenRefine, cleaned, and reconciled with Wikidata in order to get ISO 639-3 language codes.
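For context on where the codes come from: Wikidata stores the ISO 639-3 code of a language item in property P220. The sketch below is not what OpenRefine runs internally. It only builds a SPARQL query string that asks Wikidata for the P220 value of an item with a given label, which could be pasted into the Wikidata Query Service to spot-check a reconciliation result. The function name and the example label are illustrative assumptions.

```python
def build_iso639_3_query(language_label, lang="en"):
    """Build a SPARQL query for the ISO 639-3 code (P220) of a labelled item."""
    # Escape backslashes and double quotes so the label is a valid SPARQL literal.
    escaped = language_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?code WHERE {\n"
        f'  ?item rdfs:label "{escaped}"@{lang} ;\n'
        "        wdt:P220 ?code .\n"
        "}"
    )


# Example label only; the label in Wikidata may differ from the form in a MARC note.
print(build_iso639_3_query("Dagbani"))
```

An exact label match like this is much stricter than reconciliation, which tolerates spelling variants, so it is best treated as a verification aid rather than a replacement.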