`marc-matcher` - a macro for working with MARC data #4

hzafar · 2021-07-08T01:54:13Z

Macro

This is a very domain-specific macro, developed for a particular bibliographic metadata use-case. The macro definition itself is given below, and the required files containing helper definitions have been attached to this issue.

#lang racket

(require syntax/parse/define
         "marc-matcher-syntax-classes.rkt"
         "marc-matcher-helpers.rkt")

(define-syntax (marc-matcher stx)
  (syntax-parse stx
    [(_ (var:marc-var-defn ...) body:expr ...)
     (define params #'(var.name ...))
     (define regexps #'(var.re ...))
     #`(λ (input [sep "$"])
         (define args (get-subfield-data '#,regexps input sep))
         (apply (λ #,params (begin body ...)) (map simplify-groups args)))]))

This macro aims to make it easier to do regex-like matching over a structured bibliographic data format known as MARC 21. MARC records contain a sequence of fields whose data are string values that look like this:

$aCarroll, Lewis,$d1832-1898,$eauthor.

In each field, individual subfields are separated using a separator character (in this case $); the character immediately following the separator is called the subtag; and the substring up to the next separator or end-of-string is the subfield data. So in the example above, there are three subfields, $a, $d, and $e, whose data are, respectively: "Carroll, Lewis," for $a; "1832-1898," for $d; and "author." for $e.
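To make the field layout concrete, here is a minimal sketch of splitting a field string into subtag/data pairs. The `subfield` struct and the `split-subfields` function are assumptions made for illustration; the helper file attached to the issue may define things differently.

```racket
#lang racket

;; Hypothetical struct for illustration; accessor names mirror the
;; subfield-subtag / subfield-data calls used later in the thread.
(struct subfield (subtag data) #:transparent)

;; split-subfields : string? [string?] -> (listof subfield?)
;; Split a field on the separator; the first character of each chunk is
;; the subtag and the remainder is the subfield data.
(define (split-subfields field [sep "$"])
  (for/list ([chunk (string-split field sep)]
             #:unless (string=? chunk ""))
    (subfield (substring chunk 0 1) (substring chunk 1))))

;; Print each subtag with its data.
(for ([sf (split-subfields "$aCarroll, Lewis,$d1832-1898,$eauthor.")])
  (printf "~a => ~s\n" (subfield-subtag sf) (subfield-data sf)))
```

Note that the trailing punctuation stays part of the data, which is why the examples later in the thread trim it separately.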

Parsing subfields out of this is often done using regular expressions, but it gets really difficult when trying to deal with subfield repetitions. I'll use field 264 to illustrate. This field mainly contains the following pieces of publication information: the $a subfield contains place of publication; the $b contains the entity responsible for publication; and the $c contains the date of publication. There are several possible repetition patterns for these subfields which require different semantic interpretations. To give a few examples:

a+bc: multiple places of publication with the same publisher
- $aLondon ;$aNew York :$bRoutledge,$c2017.[1]
ab+c: multiple publishers with the same place of publication
- $aNew York, NY :$bBarnes & Noble :$bSterling Publishing Co., Inc.,$c2012.[2]
(ab)+c: multiple publications, each with different places and publishers
- $aBoston :$bLee and Shepard, publishers ;$aNew York :$bLee, Shepard, and Dillingham,$c1872.[3]
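One way to see these patterns is to reduce a field to the sequence of its subtags (for example "aabc") and match a regular expression against that sequence. The sketch below does only that classification step. The names `subtag-sequence` and `classify-264` and the result symbols are made up for illustration, and this is not necessarily how marc-matcher works internally.

```racket
#lang racket

;; subtag-sequence : string? [string?] -> string?
;; Collapse a field to the string of its subtags, e.g. "aabc".
(define (subtag-sequence field [sep "$"])
  (apply string-append
         (for/list ([chunk (string-split field sep)]
                    #:unless (string=? chunk ""))
           (substring chunk 0 1))))

;; classify-264 : string? -> symbol?
;; Name the repetition pattern of a 264 field (illustrative labels only).
(define (classify-264 field)
  (define tags (subtag-sequence field))
  (cond
    [(regexp-match-exact? #px"a{2,}bc" tags) 'many-places-one-publisher]
    [(regexp-match-exact? #px"ab{2,}c" tags) 'one-place-many-publishers]
    [(regexp-match-exact? #px"(ab){2,}c" tags) 'many-publications]
    [(regexp-match-exact? #px"abc" tags) 'single-publication]
    [else 'other]))

(classify-264 "$aLondon ;$aNew York :$bRoutledge,$c2017.")
```

Classifying is the easy part. Pulling out the right data for each group, and keeping the groups paired correctly, is where hand-written regexes over the raw string become painful.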

Writing a regex to intelligently parse this information out of the string is a pain, but regexes are an already popular and well understood tool in the metadata community. Thus, marc-matcher lets users specify regular expressions that match subgroups within the field they want to parse, and define variables they can use in their code containing the results of those matches, which allows more complex kinds of processing to be done with simpler code.

Example

This example defines a lambda called parse-264 using marc-matcher:

(define parse-264
  (marc-matcher ([#px"ab" #:as place-entity-groups]
                 [#px"c" #:as date])
                (for/list ([group place-entity-groups])
                  (cons (subfield-data date) (map subfield-data group)))))

The first clause of the marc-matcher expression is a list of variable definitions, similar to a parameter list for a lambda. For example, [#px"ab" #:as place-entity-groups] defines a variable called place-entity-groups, which will be a list of all the groups (which are themselves lists of structs) consisting of a single subfield $a followed by a single subfield $b. The second clause is the computation the user wishes to do with the values extracted from the field, and can refer to the variables defined in the first clause.
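The helper files attached to the issue are not reproduced in this thread, so the following is only a guess at how the variable-definition clause could be recognized. It is a minimal sketch of a syntax class that accepts `[regexp #:as identifier]` and exposes `re` and `name` attributes, which is what the macro's `var.re` and `var.name` references require. The `show-defns` macro is hypothetical and exists only to make the sketch runnable.

```racket
#lang racket

(require (for-syntax racket/base syntax/parse))

(begin-for-syntax
  ;; Assumed clause shape: [<regexp literal> #:as <identifier>].
  ;; The real marc-var-defn in the attached helper file may differ.
  (define-syntax-class marc-var-defn
    #:attributes (name re)
    (pattern [re #:as name:id]
             #:fail-unless (regexp? (syntax-e #'re))
             "expected a regexp literal such as #px\"ab\"")))

;; Hypothetical demo macro (not part of marc-matcher): expands to a list
;; pairing each variable name with its regexp.
(define-syntax (show-defns stx)
  (syntax-parse stx
    [(_ var:marc-var-defn ...)
     #'(list (cons 'var.name var.re) ...)]))

(show-defns [#px"ab" #:as place-entity-groups]
            [#px"c" #:as date])
```

Checking the clause shape in a syntax class means a malformed clause, such as one missing `#:as`, is rejected at expansion time rather than failing inside the generated lambda.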

The parse-264 function above can then be used as follows:

> (parse-264 "$aBoston :$bLee and Shepard, publishers ;$aNew York :$bLee, Shepard, and Dillingham,$c1872.")
'(("1872." "Boston :" "Lee and Shepard, publishers ;") ("1872." "New York :" "Lee, Shepard, and Dillingham,"))

Here is another example, using table of contents data[4]:

> ((marc-matcher ([#px"tr?" #:as title-info-groups])
               (for ([group title-info-groups])
                 (define title (first (map subfield-data
                                           (filter (λ (sf) (equal? "t" (subfield-subtag sf))) group))))
                 (define authors (map subfield-data
                                      (filter (λ (sf) (equal? "r" (subfield-subtag sf))) group)))
                 (printf "Title: ~a~a~n~n" (string-trim title #px"( /\\s*)|( --\\s*)|\\.")
                         (if (empty? authors) "" (string-append "\nAuthor: "
                                                                (string-trim (first authors)
                                                                             #px"( /\\s*)|( --\\s*)|\\."))))))               
 (string-join '("$tCaveat Lector; or how I ransacked Wikipedias across the Multiverse soley "
                "to amuse and edify readers -- $tMystery of the missing mothers / $rKristin King -- "
                "$tSecrets of Flatland / $rAnne Toole -- $tSanyo TM-300 Home-Use Time Machine / "
                "$rJeremy Sim -- $tElizabeth Burgoyne Corbett / $rL. Timmel Duchamp -- "
                "$tBiographies.") ""))
Title: Caveat Lector; or how I ransacked Wikipedias across the Multiverse soley to amuse and edify readers

Title: Mystery of the missing mothers
Author: Kristin King

Title: Secrets of Flatland
Author: Anne Toole

Title: Sanyo TM-300 Home-Use Time Machine
Author: Jeremy Sim

Title: Elizabeth Burgoyne Corbett
Author: L. Timmel Duchamp

Title: Biographies

Before and After

If you designed your macro to improve some existing code, please explain the improvements.

Use the following categories if applicable:

Code Cleaning : Please share the code that you used to write before creating your macro. Briefly explain how the code works.

Macro Engineering : Please share the old macro that you revised. Briefly explain the changes.

This would probably count as a code cleaning macro, though the before code doesn't exist (because I've not previously done this kind of metadata work in Racket).

Licence

Please confirm that you are submitting this code under the same MIT License that the Racket language uses. https://github.com/racket/racket/blob/master/racket/src/LICENSE-MIT.txt
Please confirm that the associated text is licensed under the Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/

I confirm that the code is under the same MIT license as the Racket language, and associated text is under Creative Commons Attribution 4.0 International License

Contact

To receive prizes and/or provide feedback please complete
the form at https://forms.gle/Z5CN2xzK13dfkBnF7 (google account not required / email optional).

spdegabrielle · 2021-07-25T20:59:37Z

Awesome! Now I need a z39.50 client!

hzafar · 2021-07-26T14:03:59Z

A Racket one would be nice! 😆

spdegabrielle · 2021-07-26T14:32:03Z

if only I had time - and I switched from libraries to health about 12 years ago so I'm into HL7 instead of MARC21 now.

There is an ASN.1 Library if it is of interest
https://docs.racket-lang.org/asn1

spdegabrielle · 2021-07-30T07:54:07Z

Thank you for your contribution!

If you haven’t already please take the time to fill in the form https://forms.gle/Z5CN2xzK13dfkBnF7

Bw
Stephen

bennn added a commit to syntax-objects/syntax-parse-example that referenced this issue Sep 28, 2021

add marc-matcher

8ae310d

from syntax-objects/Summer2021#4 cc @hzafar

bennn mentioned this issue Sep 28, 2021

add marc-matcher syntax-objects/syntax-parse-example#33

Merged

bennn added a commit to syntax-objects/syntax-parse-example that referenced this issue Oct 27, 2021

add marc-matcher

4bdb683

from syntax-objects/Summer2021#4 cc @hzafar

bennn added a commit to syntax-objects/syntax-parse-example that referenced this issue Oct 27, 2021

add marc-matcher

069aa56

from syntax-objects/Summer2021#4 cc @hzafar
