# Pymarc Patterns

[Pymarc Documentation](https://pymarc.readthedocs.io/en/latest/)

This notebook covers common patterns for working with MARC records in Python. It starts with the basics, like getting fields, and moves on to more complex examples.

The example records come from [Harvard's bibliographic records](https://archive.org/download/harvard_bibliographic_metadata) on the Internet Archive.

## Reading Records from Files

Use the `MARCReader` class to read records from a file. It takes an open file handle and returns an iterator of `Record` objects. Check that each record is not `None` while iterating: the reader yields `None` for a record it cannot parse instead of raising an error.

```python
from pymarc import MARCReader

with open('assets/example.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        if record:
            print("Got record:", record.title.rstrip(' /'))
            # variables assigned at the top level of a notebook cell are global,
            # so later cells can keep using `venetian`
            venetian = record
        else:
            # the reader stores the parsing error for the record it skipped
            print("Could not parse record:", reader.current_exception)
```

```
Got record: Photographs of Venetian villas
```


Note that the file is opened in read-binary mode (`rb`). Read mode is sufficient because we are not modifying the file. We use binary mode because Pymarc handles decoding the strings in records, and we don't want Python to do it. Try deleting the `b` in `rb`. What happens? What would happen if we didn't have an `if record` condition?

There are several encoding gotchas you can run into. The [`MARCReader`](https://pymarc.readthedocs.io/en/latest/#pymarc.reader.MARCReader) class has a `to_unicode` parameter (on by default) that decodes field data into Unicode strings, converting from MARC-8 where needed. It also has a `force_utf8` parameter that decodes the data as UTF-8 regardless of what the record's leader declares, which is useful if you have records with inaccurate encodings. These parameters seem to have been needed more often under Python 2, where managing string encodings took more work.



## Simple Ways to View Record Data

Pymarc comes with convenience properties for accessing common MARC fields on a record:

- `record.title`
- `record.author`
- `record.isbn` and `record.issn`
- `record.publisher`
- `record.pubyear`

Also, if you print a record, its string representation is the "mnemonic marc" format, which is a human-readable version of the MARC data where each field is printed on a new line with its tag, indicators, and subfields visible.

```python
print('Title:', venetian.title)
print('Author:', venetian.author)
print('ISBN:', venetian.isbn)
print('Publisher:', venetian.publisher)
print('Year:', venetian.pubyear)
print(venetian)
```

```
Title: Photographs of Venetian villas /
Author: None
ISBN: None
Publisher: The Institute,
Year: 1954.
=LDR  00774nam a22002057u 4500
=001  000000010-8
=005  20020606090541.3
=008  821202|1954\\\\|||||||\\||||\|0||||eng|d
=035  0\$aocm78684367
=245  00$aPhotographs of Venetian villas /$cRoyal Institute of British Architects ; detailed information compiled by Giuseppe Mazzotti.
=246  3\$aVenetian villas.
=260  0\$aLondon, England :$bThe Institute,$c1954.
=300  \\$a39 p. ;$c21 cm.
=500  \\$aCover title : Venetian villas.
=500  \\$aCatalogue of an exhibition held at Royal Institute of British Architects, Feb. 25-Mar. 27, 1954.
=650  \0$aArchitecture, Domestic$zItaly$zVenice.
=700  1\$aMazzotti, Giuseppe.
=710  2\$aRoyal Institute of British Architects.
=988  \\$a20020608
=906  \\$0MH
```



As you can see, these properties return `None` when the underlying field doesn't exist, like the ISBN in our example.

For serious work with records, **you should not use these convenience properties**. They only look at the first instance of a field and return the text of certain subfields. They are useful for quick peeks at data but are not sufficient for most purposes. Instead, we will typically use the `get_fields()` method and iterate over all matching fields. There are also `record.series`, `record.subjects`, `record.physicaldescription` (all 300 fields), and `record.notes` (all 5XX fields) properties. Because these return lists of actual `Field` objects, they are more useful, though we should check that they use the MARC fields we care about.

## Writing Records

The basic steps to modify MARC records with Pymarc are:

- Read the records in with `MARCReader`
- Modify them in place: you can assign values to fields and subfields directly
- Write the records out with `MARCWriter`

Below, we prefix the example record's title field with "Great".

```python
from pymarc import MARCReader, MARCWriter

with open('assets/example.mrc', 'rb') as fh:
    reader = MARCReader(fh)
    with open('assets/great.mrc', 'wb') as out:
        writer = MARCWriter(out)
        for record in reader:
            if record:
                record["245"]["a"] = f'Great {record["245"]["a"]}'
                writer.write(record)
                print(record.title.rstrip(' /'))
```

```
Great Photographs of Venetian villas
```


## Viewing Fields

There are three better ways to retrieve fields from a record than the convenience properties:

- `get` which is like `dict`'s `get` method in that it lets you define a default value if the field doesn't exist
- `get_fields` which returns a list of fields with a given tag
- bracket notation, which returns the first field with a given tag

In general, `get_fields` is probably the most foolproof method.

For all methods, field tags are strings, not numbers. Tags that begin with 0, like 020, would be awkward otherwise (Python 3 doesn't even accept `020` as an integer literal). We will talk more about `Field` objects later, but notice below that we must use the `value()` method to get a string representation of each field.

```python
# record.get(field, default)
# if we can't find a uniform title in 130, fall back to 245
print('Title with a fallback')
title = venetian.get('130', venetian['245'])
print(title.value())

# record.get_fields(field) -> list[Field]
print('\n500 fields:')
for field in venetian.get_fields('500'):
    print(field.value())

# you can pass multiple tags to get a list of all matching fields
print('\nAll 2XX fields:')
for field in venetian.get_fields('245', '246', '260'):
    print(field.value())
# you could also unpack a list of tags with *
# venetian.get_fields(*['245', '246', '260'])

# though there are two 500 fields, bracket notation only returns the first one
print('\nFirst 500:', venetian['500'].value())

# accessing a field that doesn't exist raises a KeyError
print('\nKeyError from accessing a non-existent field:')
print(venetian['999'])
```

```
Title with a fallback
Photographs of Venetian villas / Royal Institute of British Architects ; detailed information compiled by Giuseppe Mazzotti.

500 fields:
Cover title : Venetian villas.
Catalogue of an exhibition held at Royal Institute of British Architects, Feb. 25-Mar. 27, 1954.

All 2XX fields:
Photographs of Venetian villas / Royal Institute of British Architects ; detailed information compiled by Giuseppe Mazzotti.
Venetian villas.
London, England : The Institute, 1954.

First 500: Cover title : Venetian villas.

KeyError from accessing a non-existent field:

KeyError:
```

Because bracket notation raises a `KeyError` if the field doesn't exist and only returns the first instance of a field, it's not very useful. It is also rare to have a sensible default field on hand to pass to `get`. In general, using `get_fields` with for-in loops (which simply don't execute if there are no matching fields) seems the best method.

## Field Objects

https://pymarc.readthedocs.io/en/latest/#module-pymarc.field

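```python
for field in venetian.get_fields('500'):
    # format_field() is like a more readable value()
    print(field.format_field())
    # subfields_as_dict() maps each subfield code to a list of its values
    print(field.subfields_as_dict())

# add_subfield()
```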

## Leader Field

https://pymarc.readthedocs.io/en/latest/#pymarc.leader.Leader

## Modifying Fields

...

```python
# copy field from record
# create field object from scratch
```