Reading MARC record files

haschart edited this page Jan 8, 2017 · 2 revisions

The MARC Multiplex Reader

After all of the command line options and their respective arguments, the remainder of the command line specifies one or more MARC record files to be read and processed. These files can be:

  • Binary MARC records using the MARC8 character encoding
  • Binary MARC records using the UTF-8 character encoding
  • Binary MARC records using some other character encoding
  • MARCXML record files
  • MarcInJSON record files
  • MarcBreaker (.mrk) ASCII-encoded MARC record files as produced by MarcEdit
  • MarcBreaker (.mrk8) UTF-8-encoded plain-text MARC record files as produced by MarcEdit
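These formats can often be told apart from the first few bytes of a file. The sketch below is only an illustration of that idea and is not taken from the SolrMarc or Marc4j source: the class `MarcFormatSniffer`, its `Format` enum, and the `guess` method are hypothetical names. It assumes that binary MARC starts with a five-digit record length, that leader position 9 is `a` for Unicode and blank for MARC-8, that MARCXML starts with `<`, that MarcInJSON starts with `{` or `[`, and that MarcBreaker files start with `=LDR`.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: guesses a MARC file format from its leading bytes. */
class MarcFormatSniffer {
    enum Format { BINARY_MARC8, BINARY_UTF8, MARCXML, MARC_IN_JSON, MARC_BREAKER, UNKNOWN }

    static Format guess(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so nothing is lost while inspecting.
        String s = new String(head, StandardCharsets.ISO_8859_1);
        // Drop a UTF-8 byte order mark (EF BB BF) and leading whitespace, if present.
        if (s.startsWith("\u00EF\u00BB\u00BF")) {
            s = s.substring(3);
        }
        s = s.replaceFirst("^\\s+", "");

        if (s.startsWith("<")) return Format.MARCXML;
        if (s.startsWith("{") || s.startsWith("[")) return Format.MARC_IN_JSON;
        if (s.startsWith("=LDR")) return Format.MARC_BREAKER;

        // Binary MARC: the leader begins with a five-digit record length.
        if (s.length() >= 10 && s.substring(0, 5).chars().allMatch(Character::isDigit)) {
            // Leader position 9 is the character coding scheme: 'a' means Unicode.
            // Any other value is treated as MARC-8 here; a different encoding cannot
            // be detected from the leader and would have to be supplied by the user.
            return s.charAt(9) == 'a' ? Format.BINARY_UTF8 : Format.BINARY_MARC8;
        }
        return Format.UNKNOWN;
    }
}
```

A caller would read the first few dozen bytes of each file, pass them to `guess`, and fall back to a user-specified format when the result is `UNKNOWN`.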

The MarcReaderFactory is called for each filename to determine which specific type of MarcReader should be created. All of the MarcReaders are then placed in a MarcMultiplexReader object, which returns records from the first reader until it is exhausted, then switches to the next reader, and so on, until all records from all files have been read.
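The sequential hand-off from one reader to the next can be sketched as a chained iterator. This is a minimal illustration of the technique, not the actual MarcMultiplexReader code: `ChainedReader` and `readAll` are hypothetical names, and plain `Iterator` objects stand in for MarcReader instances, which expose the same `hasNext()` / `next()` pair.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a multiplexing reader: drains each source in order. */
class ChainedReader<T> implements Iterator<T> {
    private final Iterator<Iterator<T>> sources;              // readers not yet started
    private Iterator<T> current = Collections.emptyIterator(); // reader in use

    ChainedReader(List<Iterator<T>> readers) {
        this.sources = readers.iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip exhausted (or empty) readers until one has a record or none remain.
        while (!current.hasNext() && sources.hasNext()) {
            current = sources.next();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** Convenience helper for the example: collect every record into one list. */
    static <T> List<T> readAll(List<Iterator<T>> readers) {
        List<T> out = new ArrayList<>();
        ChainedReader<T> chained = new ChainedReader<>(readers);
        while (chained.hasNext()) {
            out.add(chained.next());
        }
        return out;
    }
}
```

Because the caller sees a single `hasNext()` / `next()` interface, the rest of the pipeline does not need to know how many input files were named on the command line.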

It might be possible to run multiple readers in parallel, each sending records to the readQ, but the Marc4j classes are likely not thread-safe, and reading records does not seem to be the bottleneck in any case.