# Manipulating MARC Records: Examples

## Introduction & Resources

Many cultural heritage institutions hold inventory and bibliographic records in the *Machine-Readable Cataloging* format, or [MARC](https://en.wikipedia.org/wiki/MARC_standards). In this notebook, we will explore some basic uses of Python code to parse MARC records, count and filter them, and then transform them.

The examples assume that the ``pymarc`` Python library is installed; it is available on GitHub at https://github.com/edsu/pymarc. This library was developed by Gabriel Farrell, Mark Matienzo, Geoffrey Spear, and Ed Summers.

The activities below use a file of MARC records retrieved from the Project MUSE database representing all book titles received during the month of February 2018, as of February 28, 2018 (see [Project MUSE's Download MARC Records](https://muse.jhu.edu/about/librarians/marc_records.html) for information). When this file was added to this exercise, it contained 706 records.


## Basic Demo

Let's start with this Python code. This block shows how to open the MARC file using pymarc, then read each record's title using the library's built-in ``title()`` method. Note that if you are familiar with the MARC format, you can ask for specific MARC fields by field tag and subfield code to extract more granular information.


In [1]:
from pymarc import MARCReader

# NB: this path must point to the location of the MARC file on your system
with open('/marc-python-tests/Project_MUSE_2018_Complete_20180228.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    Titlecount = 0
    for record in reader:
        print(record.title())
        Titlecount = Titlecount + 1
        if Titlecount >= 10:
            break
    print('\nCounted', Titlecount, 'Titles')


Race, Place, and Memory Deep Currents in Wilmington, North Carolina /
Race, Place, and Memory Deep Currents in Wilmington, North Carolina /
Taming the Tide of Capital Flows A Policy Guide /
Ancient Psychoactive Substances
La Patria Nueva Economía, sociedad y cultura en el Perú, 1919-1930 /
La Patria Nueva Economía, sociedad y cultura en el Perú, 1919-1930 /
The Bible and Early Trinitarian Theology
The Spirit of God Short Writings on the Holy Spirit /
Conflicted Memory Military Cultural Interventions and the Human Rights Era in Peru /
Conflicted Memory Military Cultural Interventions and the Human Rights Era in Peru /

Counted 10 Titles


The above block adapts the basic example presented in the pymarc documentation.

First, it imports the ``MARCReader`` class from the pymarc library.

Next, it opens the file of MARC records in binary mode and wraps it in a ``MARCReader``, stored in the ``reader`` variable, which yields one record at a time.

A counter variable named ``Titlecount`` is set to 0.

Then, using a basic ``for`` loop, the code iterates through the records, prints each record's title as a text string, and stops when the count reaches 10.

When you run the code block, you should see the following output: 
1. text strings of the 10 titles, 
2. a blank line, and 
3. a response that lists the Title count. 
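
The start of this demo mentioned that specific fields can be requested by tag and subfield code. As a minimal sketch of that idea, the helper below pulls subfields `a` (title), `b` (remainder of title), and `c` (statement of responsibility) out of a record's first 245 field. The function name `title_parts` is hypothetical, and the sketch assumes the record object offers pymarc's `get_fields()` method and that each field offers `get_subfields()`; check the behavior against your installed pymarc version.

```python
def title_parts(record):
    """Return subfields a, b, and c of the first 245 field, or None if there is no 245.

    `title_parts` is a hypothetical helper. It assumes `record` behaves like a
    pymarc Record (has get_fields) and its fields like pymarc Fields
    (have get_subfields).
    """
    fields = record.get_fields('245')  # list of matching fields, possibly empty
    if not fields:
        return None
    field = fields[0]

    def first(code):
        # get_subfields returns a list of values for the requested subfield code
        values = field.get_subfields(code)
        return values[0] if values else None

    return {'a': first('a'), 'b': first('b'), 'c': first('c')}
```

Inside the loop from the basic demo, `print(title_parts(record))` could stand in for `print(record.title())` to show the title broken into its parts.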

## Filtering Duplicate Titles

You might notice that there are a lot of titles that appear to be duplicates. There may be a variety of reasons for this (different editions, multiple copies in holdings, etc.). But let's say that we want to print what we think is a list of unique titles in order to establish a good idea of how many works are represented. How could we filter out the duplicates?

In this case we can use two variables, one storing the current title and one storing the previous title, and compare the two. We use a ``continue`` statement to skip to the next record if the title matches the previous one. Note that this only catches duplicates that sit next to each other in the file. If we try this directly, the code raises an error (below). Do you see what the error is? How can we get around this?

In [2]:
from pymarc import MARCReader

with open('/marc-python-tests/Project_MUSE_2018_Complete_20180228.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    Titlecount = 0
    for record in reader:
        titleCur = record.title()
        if titleCur == titlePrev:
            continue
        else:
            titlePrev = titleCur
            print(record.title())
            Titlecount = Titlecount + 1
            if Titlecount >= 10:
                break
    print('\nCounted', Titlecount, 'Titles')

NameError: name 'titlePrev' is not defined

One way to fix this would be to use a try/except block to establish the variable on the first pass. In other words, we try the comparison, but if the ``titlePrev`` variable has not been defined yet, Python raises a ``NameError`` and the ``except`` clause runs to establish it:

In [3]:
from pymarc import MARCReader

with open('/marc-python-tests/Project_MUSE_2018_Complete_20180228.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    Titlecount = 0
    for record in reader:
        titleCur = record.title()
        try:
            # On the first pass titlePrev does not exist yet, so this raises NameError
            if titleCur == titlePrev:
                continue
            else:
                titlePrev = titleCur
                print(record.title())
                Titlecount = Titlecount + 1
                if Titlecount >= 10:
                    break
        except NameError:
            # First record only: establish titlePrev, then print and count the title
            titlePrev = titleCur
            print(record.title())
            Titlecount = Titlecount + 1
            if Titlecount >= 10:
                break
    print('\nCounted', Titlecount, 'Titles')


Race, Place, and Memory Deep Currents in Wilmington, North Carolina /
Taming the Tide of Capital Flows A Policy Guide /
Ancient Psychoactive Substances
La Patria Nueva Economía, sociedad y cultura en el Perú, 1919-1930 /
The Bible and Early Trinitarian Theology
The Spirit of God Short Writings on the Holy Spirit /
Conflicted Memory Military Cultural Interventions and the Human Rights Era in Peru /
Unaffordable American Healthcare from Johnson to Trump /
Tragic Rites Narrative and Ritual in Sophoclean Drama /
Globalizing Innovation State Institutions and Foreign Direct Investment in Emerging Economies /

Counted 10 Titles
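
An alternative to the try/except approach, offered here as a sketch rather than as the notebook's method, is to give the "previous title" variable a starting value before the loop begins, so the comparison never touches an undefined name. The function name `first_unique_adjacent` and the `_UNSET` sentinel are hypothetical; the function works on any iterable of title strings, such as the values returned by `record.title()`.

```python
_UNSET = object()  # hypothetical sentinel that never equals a real title


def first_unique_adjacent(titles, limit=10):
    """Collect up to `limit` titles, skipping any title equal to the one just before it."""
    unique = []
    title_prev = _UNSET  # defined before the loop, so no NameError can occur
    for title_cur in titles:
        if title_cur == title_prev:
            continue  # same as the previous title: skip it
        title_prev = title_cur
        unique.append(title_cur)
        if len(unique) >= limit:
            break
    return unique


sample = ['A /', 'A /', 'B', 'C /', 'C /', 'A /']
print(first_unique_adjacent(sample, limit=3))  # prints ['A /', 'B', 'C /']
```

Like the try/except version, this only removes duplicates that appear on consecutive records; with a larger `limit`, the final `'A /'` in the sample would be kept because it does not directly follow another `'A /'`.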


Now we have a list of the first 10 unique title strings. Of course, titles that are "the same" may still sneak through if they were keyed in according to different conventions. For example, some strings end with a trailing `/`, the punctuation that AACR2 prescribes at the end of the title in the MARC 245 field (just before the statement of responsibility), but not all of the records in the set include this in the text string that we see in the output.
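
One way to reduce the effect of such punctuation differences is to compare a normalized key rather than the raw title string, and to remember every key seen so far so that non-adjacent duplicates are caught as well. The sketch below is only an illustration: `normalize_title` and `unique_titles` are hypothetical names, and the normalization rules (drop a trailing `/`, collapse whitespace, ignore case) are assumptions that could also merge records a cataloger would consider distinct, such as different editions.

```python
def normalize_title(title):
    """Build a comparison key: drop a trailing '/', collapse whitespace, ignore case."""
    key = (title or '').strip().rstrip('/').strip()
    return ' '.join(key.split()).casefold()


def unique_titles(titles):
    """Return the titles whose normalized key has not been seen earlier in the input."""
    seen = set()
    unique = []
    for title in titles:
        key = normalize_title(title)
        if key in seen:
            continue  # already recorded under some spelling or punctuation
        seen.add(key)
        unique.append(title)  # keep the first spelling encountered
    return unique
```

For example, `unique_titles(['A /', 'B', 'a', 'B /'])` keeps only `'A /'` and `'B'`, because the later entries differ from them only in case or trailing punctuation.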

### Resources

* [pymarc](https://github.com/edsu/pymarc)
* [Project MUSE MARC Records](https://muse.jhu.edu/about/librarians/marc_records.html)