Huma Zafar edited this page May 11, 2020 · 6 revisions

Getting Started

Installation

See the README for installation instructions for Maven and Gradle.

Basic parsing

Currently only one parser is available, TitleStatementParser, which parses ISBD title statement punctuation. It can be used as follows:

val t = TitleStatementParser()
t.parse(title) // title: an ISBD-punctuated String; returns a list of TitleStatement

The input is an ISBD-punctuated string, and the output is a list of the title statements contained in that string. For example, given the input string "Die Klaviersonaten [sound recording] = The piano sonatas ; Tänze = Dances : complete recording / Franz Schubert.", the result would be the following:

listOf(
    TitleStatement(
        titles = listOf(
            Title(
                titleProper = TitleProper("Die Klaviersonaten [sound recording]"),
                parallelTitles = listOf(ParallelTitle("The piano sonatas"))
            ),
            Title(
                titleProper = TitleProper("Tänze"),
                parallelTitles = listOf(ParallelTitle("Dances")),
                otherInfo = listOf(OtherInfo("complete recording"))
            )
        ),
        sors = listOf(SOR("Franz Schubert"))
    )
)
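The following minimal, hypothetical sketch shows how the ISBD separators in the example map onto the output structure. It is not the library's implementation: naiveSplit, SketchTitle and SketchStatement are illustrative names, and the sketch assumes a single title statement with no ambiguous periods, where " ; " separates titles, " = " introduces a parallel title, " : " introduces other title information, and the first " / " starts the statement of responsibility.

```kotlin
// Hypothetical result types for this sketch only; NOT the library's classes.
data class SketchTitle(
    val titleProper: String,
    val parallelTitles: List<String> = emptyList(),
    val otherInfo: List<String> = emptyList()
)

data class SketchStatement(val titles: List<SketchTitle>, val sors: List<String>)

// Naive splitter for one unambiguous title statement (an assumption for
// illustration; the real parser handles far more of the ISBD grammar).
fun naiveSplit(input: String): SketchStatement {
    // Drop the final ISBD full stop.
    val body = input.trim().removeSuffix(".")

    // The first " / " separates the titles from the statement(s) of responsibility.
    val slash = body.indexOf(" / ")
    val titlePart = if (slash >= 0) body.substring(0, slash) else body
    val sors = if (slash >= 0) body.substring(slash + 3).split(" ; ") else emptyList()

    val titles = titlePart.split(" ; ").map { chunk ->
        // " : " introduces other title information.
        val colonParts = chunk.split(" : ")
        // " = " introduces parallel titles after the title proper.
        val equalsParts = colonParts.first().split(" = ")
        SketchTitle(
            titleProper = equalsParts.first(),
            parallelTitles = equalsParts.drop(1),
            otherInfo = colonParts.drop(1)
        )
    }
    return SketchStatement(titles, sors)
}
```

Applied to the Schubert string, this sketch yields two titles ("Die Klaviersonaten [sound recording]" with parallel title "The piano sonatas", and "Tänze" with parallel title "Dances" and other information "complete recording") and one statement of responsibility, "Franz Schubert", mirroring the structure shown above.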

Parsing ambiguous strings

Titles that contain periods within the string (rather than only as ending punctuation) can be ambiguous to parse, because the ISBD grammar also uses periods to structure title information. For example, consider the following string:

Trio for violin, cello, and piano, in D minor, op. 11 (posth.) / Fanny Mendelssohn. Tarantella, op. 6 / Saint-Saens. Paganiniana / Arr. [by] Elayakim Taussig [sound recording].

It is probably impossible to determine reliably which occurrences of ". " belong to the title data (as in "op. 11") and which separate individual title statements. The parse method returns the first complete parse it finds, which may be wrong when the string contains such periods. For these cases, two other parse methods are provided: parseHeuristically and parseAll. The first, parseHeuristically, considers multiple interpretations of the periods in the title and returns the first one that passes a "good parse" heuristic. The heuristics are currently very simplistic, and improving them is a goal of future releases. The second, parseAll, returns every successful parse of the given string.

To illustrate:

val title = "Trio for violin, cello, and piano, in D minor, op. 11 (posth.)" +
                " / Fanny Mendelssohn. Tarantella, op. 6 / Saint-Saens. Paganiniana" +
                " / Arr. [by] Elayakim Taussig [sound recording]."

val t = TitleStatementParser()
t.parseHeuristically(title)

would produce:

listOf(
    TitleStatement(
        titles = listOf(
            Title(
                titleProper = TitleProper("Trio for violin, cello, and piano, in D minor, op 11 (posth)")
            )
        ),
        sors = listOf(SOR("Fanny Mendelssohn"))
    ),
    TitleStatement(
        titles = listOf(
            Title(
                titleProper = TitleProper("Tarantella, op 6")
            )
        ),
        sors = listOf(SOR("Saint-Saens"))
    ),
    TitleStatement(
        titles = listOf(
            Title(
                titleProper = TitleProper("Paganiniana")
            )
        ),
        sors = listOf(SOR("Arr [by] Elayakim Taussig [sound recording]"))
    )
)

Note that periods inside the titles are stripped from the output (for example, "op 11" instead of "op. 11"). This is a known consequence of the current approach and will hopefully be improved in future releases.
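The ambiguity that parseAll and parseHeuristically deal with can be sketched in a few lines. In this hypothetical example (candidateSplits and plausibleSplits are illustrative names, not library functions), each ". " is treated either as title data or as a boundary between title statements, which gives 2^n candidate splits for n periods. A toy filter then keeps only the splits in which every statement contains a " / " introducing a statement of responsibility. That filter is an assumption for illustration, not the library's heuristic.

```kotlin
// Hypothetical sketch: enumerate every way of treating each ". " as either
// part of the title data or a boundary between title statements.
fun candidateSplits(input: String): List<List<String>> {
    // Start offsets of every ". " occurrence in the input.
    val periods = Regex("""\. """).findAll(input).map { it.range.first }.toList()
    val results = mutableListOf<List<String>>()

    // Each bitmask selects which periods act as statement boundaries.
    for (mask in 0 until (1 shl periods.size)) {
        val segments = mutableListOf<String>()
        var start = 0
        periods.forEachIndexed { i, pos ->
            if ((mask and (1 shl i)) != 0) {
                segments.add(input.substring(start, pos))
                start = pos + 2 // skip the ". " separator itself
            }
        }
        segments.add(input.substring(start))
        results.add(segments)
    }
    return results
}

// Toy "good parse" filter (an assumption, NOT the library's heuristic):
// every candidate statement must contain " / " introducing a SOR.
fun plausibleSplits(input: String): List<List<String>> =
    candidateSplits(input).filter { segments -> segments.all { " / " in it } }
```

For the example string above, which contains five ". " occurrences, candidateSplits returns 32 candidates. The toy filter still leaves six of them, including the intended three-statement split, which illustrates why a simple rule alone cannot settle which periods are separators.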
