pulibrary/lcsort

Normalized sort key for sorting Library of Congress call numbers.

Usage

Sorting Library of Congress call numbers is tricky. This library generates a sort key for an LC call number such that, for any list of call numbers, sorting their keys in natural byte order puts the call numbers in the order they should sort in.
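To see why a dedicated key is needed at all, here is a deliberately simplified sketch. toy_sort_key is a hypothetical helper, not part of Lcsort and not its algorithm: it handles only the class letters and class number (no cutters, dates, or suffixes) and assumes class numbers have at most four digits. Padding each part to a fixed width is what makes plain string comparison agree with call number order.

```ruby
# Hypothetical, simplified key: class letters + class number only.
# NOT Lcsort's algorithm; it only illustrates why padding helps.
def toy_sort_key(callnum)
  m = callnum.strip.upcase.match(/\A([A-Z]{1,3})\s*(\d+)(?:\.(\d+))?/)
  return nil unless m

  letters = m[1].ljust(3)          # "Q" -> "Q  " so Q sorts before QA
  whole   = m[2].rjust(4, "0")     # "9" -> "0009" so 9 sorts before 76
  decimal = m[3] ? ".#{m[3]}" : "" # decimal digits already compare left to right
  "#{letters}#{whole}#{decimal}"
end

call_nums = ["QA 76", "QA 9", "Q 11", "QA 76.9"]

p call_nums.sort
# => ["Q 11", "QA 76", "QA 76.9", "QA 9"]   (plain string order: wrong)
p call_nums.sort_by { |c| toy_sort_key(c) }
# => ["Q 11", "QA 9", "QA 76", "QA 76.9"]
```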

```ruby
require "lcsort"

# It's often useful to store the sort_key in a db
sort_key = Lcsort.normalize(callnum)
```

If the input can't be recognized as an LC call number, nil will be returned.

This code is intended for ASCII-only input. If your call numbers contain non-ASCII (UTF-8) characters, we don't know what will happen.

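```ruby
# Or if you have a list of call numbers in memory, easy
# enough to just sort them in memory. Lcsort.normalize returns nil for
# unrecognized input and sort_by can't compare nil with a String, so
# .to_s turns nil into "" (unrecognized call numbers sort first).
call_num_array.sort_by { |callnum| Lcsort.normalize(callnum).to_s }
```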
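If you would rather group unrecognized call numbers at the end of the list, one option is to sort by a two-element array. This is a minimal sketch; sort_call_numbers and the "unrecognized last" policy are hypothetical, not part of Lcsort. The key function is passed in, so with the gem you would pass ->(c) { Lcsort.normalize(c) }; the example below uses a stand-in lambda so it runs without the gem.

```ruby
# Hypothetical helper (not part of Lcsort): sort call numbers by a key,
# putting call numbers whose key is nil (unrecognized input) at the end.
def sort_call_numbers(call_nums, key_fn)
  call_nums.sort_by do |callnum|
    key = key_fn.call(callnum)
    # Recognized input gets [0, key]; unrecognized gets [1, raw text].
    # Arrays compare element by element, so the 0/1 flag is compared
    # first and nil is never compared against a String.
    key.nil? ? [1, callnum.to_s] : [0, key]
  end
end

# Stand-in key function for demonstration only: treats anything
# starting with "X" as unrecognized and uses the text itself as its key.
stub_key = ->(c) { c.start_with?("X") ? nil : c }
p sort_call_numbers(["B 2", "XYZ", "A 1"], stub_key)
# => ["A 1", "B 2", "XYZ"]
```

Unrecognized call numbers are still ordered among themselves by their raw text, which keeps the output deterministic.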

We can handle all sorts of call numbers

Call numbers are diverse, both in standard LC and local practice. We wouldn't have the hubris to say we can properly recognize and sort EVERY possible LC call number including local practices. But we sure can handle a lot, including:

  • Typical call numbers like: R 169.1 .B59 1990
  • Can handle variations in spacing/punctuation, such as: R 169.B59.C39, R169 B59C39 1990
  • Can handle properly sorting the dreaded 'date or other number': KF 4558 15th .G6 sorts after KF 4558 1st .G6
  • Will generally sort volume/number info in call number suffix properly: Q11 .P6 vol. 4 no. 4 sorts before Q11 .P6 vol. 12 no. 1.
  • Can handle 1-2 letter suffixes on the end of cutters: R 179 .C79ab. Common local practice, and also used in NLM call numbers. (No guarantee that every NLM call number can be handled by this library for LC call numbers, but it seems to work okay for NLM.)
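The ordinal and volume/number cases above fail under plain string sorting because "15th" and "vol. 12" compare character by character. A common general technique, shown here as a hypothetical sketch and not as Lcsort's actual algorithm, is to left-pad every run of digits to a fixed width (assumed here to be 6) before comparing.

```ruby
# Hypothetical helper, NOT Lcsort's algorithm: left-pad every run of
# digits so numbers compare by value inside a plain string comparison.
# Assumes no digit run is longer than `width` characters.
def pad_numbers(text, width = 6)
  text.gsub(/\d+/) { |digits| digits.rjust(width, "0") }
end

suffixes = ["vol. 12 no. 1", "vol. 4 no. 4"]
p suffixes.sort                           # => ["vol. 12 no. 1", "vol. 4 no. 4"]
p suffixes.sort_by { |s| pad_numbers(s) } # => ["vol. 4 no. 4", "vol. 12 no. 1"]

dates = ["15th", "1st"]
p dates.sort                              # => ["15th", "1st"]
p dates.sort_by { |d| pad_numbers(d) }    # => ["1st", "15th"]
```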

OCLC's documentation on MARC field 050 includes some information on possible LC call number components.

Range and truncation limiting

Once you have a bunch of Lcsort keys in your database, you may want to find all call numbers beginning with, say, EG 101. That would include EG 101.5, EG 101 .C23 1990, etc.

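The truncated_range_end method gives you the upper bound for such a range:

```ruby
# Build the WHERE clause; the values must be quoted as SQL strings
# (or, better, passed as bind parameters by your database library).
"sort_key >= '#{Lcsort.normalize('EG 101')}' AND sort_key <= '#{Lcsort.truncated_range_end('EG 101')}'"
```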

This can also be used to find a range of call numbers. Say you want all call numbers from those beginning with AB 101 through those beginning with AB 500:

```ruby
"sort_key >= '#{Lcsort.normalize('AB 101')}' AND sort_key <= '#{Lcsort.truncated_range_end('AB 500')}'"
```
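The same range condition works on an in-memory array of keys, and if the array is already sorted you can use binary search instead of scanning every key. This is a minimal sketch; keys_in_range is a hypothetical helper, and the keys in the example are made up in a simplified format, not real Lcsort output.

```ruby
# Hypothetical helper (not part of Lcsort): return every key k with
# lo <= k <= hi from an array of sort keys that is already sorted.
def keys_in_range(sorted_keys, lo, hi)
  first = sorted_keys.bsearch_index { |k| k >= lo } # first key >= lo
  return [] if first.nil?

  last = sorted_keys.bsearch_index { |k| k > hi }   # first key > hi
  sorted_keys[first...(last || sorted_keys.length)]
end

# Made-up keys in a simplified format.
keys = ["AB 0050", "AB 0101", "AB 0250", "AB 0500", "AB 0600"]
p keys_in_range(keys, "AB 0101", "AB 0500~")
# => ["AB 0101", "AB 0250", "AB 0500"]
```

With the gem, lo and hi would come from Lcsort.normalize and Lcsort.truncated_range_end; check for nil first, since normalize returns nil for unrecognized input.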

truncated_range_end works with as many or as few call number components as you want. A range ending at Lcsort.truncated_range_end('AB 101.1') will also find AB 101.123 or AB 101.1 .A5, and one ending at Lcsort.truncated_range_end("AB 101 .C45") will find AB 101 .C456, AB 101 .C45 .B5, etc.

At the moment, truncated_range_end pretty much just appends a ~ to the end of the normalized sort key. It did more complicated things in past versions of the normalization algorithm, and we do have tests ensuring it finds what is expected.
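Why appending ~ works: in ASCII, ~ (0x7E) sorts after letters, digits, space, and common punctuation. Assuming keys use only such characters, any key that starts with the prefix continues with a character below ~, so it falls at or below prefix + "~"; any key that does not start with the prefix but is greater than it differs at an earlier position by a larger character, so it is also greater than prefix + "~". A hypothetical illustration (not Lcsort's code), with made-up keys in a simplified format:

```ruby
# Hypothetical illustration of the "append ~" idea; NOT Lcsort's code.
# Assumption: keys contain only printable ASCII characters that sort
# below "~" (0x7E): letters, digits, space, ".", etc.
def prefix_range_end(prefix)
  prefix + "~"
end

# Made-up keys in a simplified format (not real Lcsort output).
keys = ["EG 0100", "EG 0101", "EG 0101.5", "EG 0101 C23 1990", "EG 0102"]

lo = "EG 0101"
hi = prefix_range_end(lo)
p keys.select { |k| k >= lo && k <= hi }
# => ["EG 0101", "EG 0101.5", "EG 0101 C23 1990"]
```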

append_suffix

Sometimes you want to add something on to the end of a normalized call number, as a payload, or to ensure normalized sort key uniqueness.

You can pass an :append_suffix to have it appended in a way that won't otherwise change the sort order of the original call number.

I use this to add the bib ID onto the end of the normalized sort key. If two bibs have identical call numbers, this avoids a sort key collision, and my functions work better when all sort keys are unique.

```ruby
sortkey = Lcsort.normalize(callnumber, :append_suffix => bib_id)
```
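The reason a suffix needs care: naive concatenation can reorder keys, because the suffix may compare against a character of a longer key. A general technique, sketched here under stated assumptions and not as how Lcsort implements :append_suffix, is to join key and suffix with a separator that sorts below every character a key can contain. The separator "\x01" and the helper key_with_suffix are hypothetical, and the keys are made up in a simplified format.

```ruby
# Hypothetical sketch, NOT how Lcsort implements :append_suffix.
# Assumption: keys contain only printable ASCII (0x20 and above), so
# "\x01" sorts below every character a key can contain.
SEPARATOR = "\x01"

def key_with_suffix(key, suffix)
  "#{key}#{SEPARATOR}#{suffix}"
end

# Naive concatenation can reorder keys: "9" (0x39) sorts above "." (0x2E).
p ["QA 0076.5" + "1", "QA 0076" + "9"].sort
# => ["QA 0076.51", "QA 00769"]   (QA 0076.5 now sorts first: wrong)

# With the low separator, the original key order is preserved.
safe = [key_with_suffix("QA 0076.5", "1"), key_with_suffix("QA 0076", "9")].sort
p safe.map { |k| k.split(SEPARATOR).first }
# => ["QA 0076", "QA 0076.5"]
```

When two keys are identical, the suffix breaks the tie, which is exactly the uniqueness the bib ID is meant to provide.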

Acknowledgement

Original regex and code by Bill Dueber. Original port to Ruby by Nikitas Tampakis. LC handling advice from Naomi Dushay and her code.