Description
Please enter the bee by submitting code (or links to code) for:
- your macro
- an example use of your macro
- (optional) "before" code that your macro helps to improve
Thank you for your submission!
If your entry is a PR to the syntax parse examples repository, please include a link to the PR.
Macro
This is a very domain-specific macro, developed for a particular bibliographic metadata use-case. The macro definition itself is given below, and the required files containing helper definitions have been attached to this issue.
#lang racket
(require syntax/parse/define
         "marc-matcher-syntax-classes.rkt"
         "marc-matcher-helpers.rkt")
(define-syntax (marc-matcher stx)
  (syntax-parse stx
    [(_ (var:marc-var-defn ...) body:expr ...)
     (define params #'(var.name ...))
     (define regexps #'(var.re ...))
     #`(λ (input [sep "$"])
         (define args (get-subfield-data '#,regexps input sep))
         (apply (λ #,params (begin body ...)) (map simplify-groups args)))]))
This macro aims to make it easier to do regex-like matching over a structured bibliographic data format known as MARC 21. MARC records contain a sequence of fields whose data are string values that look like this:
$aCarroll, Lewis,$d1832-1898,$eauthor.
In each field, individual subfields are separated by a separator character (in this case $); the character immediately following the separator is called the subtag; and the substring up to the next separator or end-of-string is the subfield data. So in the example above, there are three subfields, $a, $d, and $e, whose data are "Carroll, Lewis,", "1832-1898,", and "author.", respectively.
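To make the subfield structure concrete, here is a minimal sketch of splitting a field string into subtag/data pairs. The subfield struct and the split-subfields function are hypothetical names chosen for illustration (the struct merely mirrors the subfield-subtag and subfield-data accessors used later in this issue); they are not the definitions from the attached helper files. The sketch also assumes single-character subtags and that the separator never occurs inside subfield data.

```racket
#lang racket
;; Hypothetical stand-in for the struct in the attached helpers.
(struct subfield (subtag data) #:transparent)

;; split-subfields : string? [string?] -> (listof subfield?)
;; Splits on the literal separator; the first character of each chunk is
;; the subtag and the remainder is the subfield data.
(define (split-subfields field [sep "$"])
  (for/list ([chunk (string-split field sep)]
             #:unless (string=? chunk "")) ; skip empty chunks, e.g. from "$$"
    (subfield (substring chunk 0 1) (substring chunk 1))))

(define author-field "$aCarroll, Lewis,$d1832-1898,$eauthor.")

(map subfield-subtag (split-subfields author-field))
;; => '("a" "d" "e")
(map subfield-data (split-subfields author-field))
;; => '("Carroll, Lewis," "1832-1898," "author.")
```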
Parsing subfields out of this is often done using regular expressions, but it becomes difficult when dealing with subfield repetitions. I'll use field 264 to illustrate. This field mainly contains the following pieces of publication information: the $a subfield contains the place of publication; the $b subfield contains the entity responsible for publication; and the $c subfield contains the date of publication. There are several possible repetition patterns for these subfields, which require different semantic interpretations. To give a few examples:
- a+bc: multiple places of publication with the same publisher, e.g. $aLondon ;$aNew York :$bRoutledge,$c2017. [1]
- ab+c: multiple publishers with the same place of publication, e.g. $aNew York, NY :$bBarnes & Noble :$bSterling Publishing Co., Inc.,$c2012. [2]
- (ab)+c: multiple publications, each with different places and publishers, e.g. $aBoston :$bLee and Shepard, publishers ;$aNew York :$bLee, Shepard, and Dillingham,$c1872. [3]
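One way to read these patterns is as regular expressions over the sequence of subtags in a field. The sketch below assumes that interpretation; subtag-signature is a hypothetical helper written for this illustration, and the attached helper files may work differently. It again assumes single-character subtags and no separator characters inside subfield data.

```racket
#lang racket
;; Hypothetical helper: concatenate the subtag of every subfield,
;; so that "$aLondon ;$aNew York :$bRoutledge,$c2017." becomes "aabc".
(define (subtag-signature field [sep "$"])
  (apply string-append
         (for/list ([chunk (string-split field sep)]
                    #:unless (string=? chunk ""))
           (substring chunk 0 1))))

(define field-1 "$aLondon ;$aNew York :$bRoutledge,$c2017.")
(define field-3
  (string-append "$aBoston :$bLee and Shepard, publishers ;"
                 "$aNew York :$bLee, Shepard, and Dillingham,$c1872."))

(subtag-signature field-1) ; => "aabc"
(subtag-signature field-3) ; => "ababc"

;; Each match position delimits one group of consecutive subfields.
(regexp-match-positions* #px"a+b" (subtag-signature field-1))
;; => '((0 . 3))          one group: two places and a publisher
(regexp-match-positions* #px"ab" (subtag-signature field-3))
;; => '((0 . 2) (2 . 4))  two groups: one place and one publisher each
```

Matching against the subtag sequence rather than the raw field keeps the patterns short, since the pattern never has to describe the subfield data itself.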
Writing a regex to intelligently parse this information out of the string is a pain, but regexes are already a popular and well-understood tool in the metadata community. Thus, marc-matcher lets users specify regular expressions that match subgroups within the field they want to parse, and define variables, usable in their code, that hold the results of those matches. This allows more complex kinds of processing to be done with simpler code.
Example
Illustrate one or more ways of using your macro.
Please show code and briefly describe what it does.
This example defines a lambda called parse-264 using marc-matcher:
(define parse-264
  (marc-matcher ([#px"ab" #:as place-entity-groups]
                 [#px"c" #:as date])
    (for/list ([group place-entity-groups])
      (cons (subfield-data date) (map subfield-data group)))))
The first clause of the marc-matcher expression is a list of variable definitions, similar to a parameter list for a lambda. For example, [#px"ab" #:as place-entity-groups] defines a variable called place-entity-groups, which will be a list of all the groups (which are themselves lists of structs) consisting of a single subfield $a followed by a single subfield $b. The second clause is the computation the user wishes to do with the values extracted from the field, and can refer to the variables defined in the first clause.
The parse-264 function above can then be used as follows:
> (parse-264 "$aBoston :$bLee and Shepard, publishers ;$aNew York :$bLee, Shepard, and Dillingham,$c1872.")
'(("1872." "Boston :" "Lee and Shepard, publishers ;") ("1872." "New York :" "Lee, Shepard, and Dillingham,"))
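For comparison, the following is a rough hand-written equivalent of parse-264 that does not use the macro. It is only a sketch under stated assumptions: the subfield struct, split-subfields, and match-groups are hypothetical stand-ins for the attached helpers, regexps are assumed to be matched against the sequence of single-character subtags, and the explicit unwrapping of the single $c match stands in for whatever simplify-groups actually does.

```racket
#lang racket
;; Hypothetical stand-ins for the attached helper definitions.
(struct subfield (subtag data) #:transparent)

(define (split-subfields field [sep "$"])
  (for/list ([chunk (string-split field sep)]
             #:unless (string=? chunk ""))
    (subfield (substring chunk 0 1) (substring chunk 1))))

;; match-groups : regexp? (listof subfield?) -> (listof (listof subfield?))
;; Matches re against the subtag sequence and returns, for every match,
;; the subfields whose positions fall inside that match.
(define (match-groups re subfields)
  (define signature (apply string-append (map subfield-subtag subfields)))
  (for/list ([pos (regexp-match-positions* re signature)])
    (for/list ([i (in-range (car pos) (cdr pos))])
      (list-ref subfields i))))

(define (parse-264/by-hand field)
  (define subfields (split-subfields field))
  (define place-entity-groups (match-groups #px"ab" subfields))
  ;; Assumption: a lone $c match is unwrapped to a single subfield.
  (define date (first (first (match-groups #px"c" subfields))))
  (for/list ([group place-entity-groups])
    (cons (subfield-data date) (map subfield-data group))))

(parse-264/by-hand
 (string-append "$aBoston :$bLee and Shepard, publishers ;"
                "$aNew York :$bLee, Shepard, and Dillingham,$c1872."))
;; => '(("1872." "Boston :" "Lee and Shepard, publishers ;")
;;      ("1872." "New York :" "Lee, Shepard, and Dillingham,"))
```

The macro version removes the splitting, matching, and unwrapping boilerplate, leaving only the variable definitions and the body that uses them.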
Here is another example, using table of contents data[4]:
> ((marc-matcher ([#px"tr?" #:as title-info-groups])
     (for ([group title-info-groups])
       (define title (first (map subfield-data
                                 (filter (λ (sf) (equal? "t" (subfield-subtag sf))) group))))
       (define authors (map subfield-data
                            (filter (λ (sf) (equal? "r" (subfield-subtag sf))) group)))
       (printf "Title: ~a~a~n~n" (string-trim title #px"( /\\s*)|( --\\s*)|\\.")
               (if (empty? authors) "" (string-append "\nAuthor: "
                                                      (string-trim (first authors)
                                                                   #px"( /\\s*)|( --\\s*)|\\."))))))
   (string-join '("$tCaveat Lector; or how I ransacked Wikipedias across the Multiverse soley "
                  "to amuse and edify readers -- $tMystery of the missing mothers / $rKristin King -- "
                  "$tSecrets of Flatland / $rAnne Toole -- $tSanyo TM-300 Home-Use Time Machine / "
                  "$rJeremy Sim -- $tElizabeth Burgoyne Corbett / $rL. Timmel Duchamp -- "
                  "$tBiographies.") ""))
Title: Caveat Lector; or how I ransacked Wikipedias across the Multiverse soley to amuse and edify readers
Title: Mystery of the missing mothers
Author: Kristin King
Title: Secrets of Flatland
Author: Anne Toole
Title: Sanyo TM-300 Home-Use Time Machine
Author: Jeremy Sim
Title: Elizabeth Burgoyne Corbett
Author: L. Timmel Duchamp
Title: Biographies
Before and After
If you designed your macro to improve some existing code, please explain the improvements.
Use the following categories if applicable:
- Code Cleaning : Please share the code that you used to write before creating your macro. Briefly explain how the code works.
- Macro Engineering : Please share the old macro that you revised. Briefly explain the changes.
This would probably count as a code cleaning macro, though the before code doesn't exist (because I've not previously done this kind of metadata work in Racket).
Licence
Please confirm that you are submitting this code under the same MIT License that the Racket language uses. https://github.com/racket/racket/blob/master/racket/src/LICENSE-MIT.txt
Please confirm that the associated text is licensed under the Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/
I confirm that the code is under the same MIT license as the Racket language, and associated text is under Creative Commons Attribution 4.0 International License
Contact
To receive prizes and/or provide feedback please complete
the form at https://forms.gle/Z5CN2xzK13dfkBnF7 (google account not required / email optional).