Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
MARC to RDF converter through yaml mapping
Ruby
Branch: master
Pull request Compare This branch is 454 commits behind digibib:master.

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
config
README
hamsun_fikset.mrc
mapping-marc-rdf.ods
normarc2rdf.rb
output.rdf

README

### MARC bibliographic record to RDF converter
### Ruby script using YAML mapping

# Purpose: Convert binary marc to semantic markup using yaml mapping file
# Creator: Benjamin Rokseth
# Date: 10.07.2011

## FILES ##
normarc2rdf.rb                            -- main ruby script to convert NORMARC file to RDF
config/config-dist.yml                    -- config file
config/mapping-normarc2rdf.yml            -- example mapping file: NORMARC tags to rdf mapping
config/mapping-normarc2rdf_bildebaser.yml -- example mapping file: image base in NORMARC
hamsun_fikset.mrc                         -- test NORMARC file
output.rdf                                -- test output RDF with -r 50 (50 records)
output.nt                                 -- test output NTRIPLES with -r 50 (50 records)

## MAPPING ##
uses yaml hashes mapping
excerpt:
-----
tag:
  "(650|690|693|699)": 
    subfield: 
      ? # initiates array
        - a
        - d
        - j
      : 
        predicate: http://purl.org/dc/terms/subject
        object: 
          regex: !ruby/regexp /[^\w\-ÆØÅæøå]/     # remove non-ascii characters
          split: !ruby/regexp /, */               # split objects by comma
          prefix: http://4store.deichman.no/subject/
          datatype: uri
        relation: 
          class: "http://www.w3.org/2004/02/skos/core#Concept"
          subfield:
            a: 
              predicate: "http://www.w3.org/2004/02/skos/core#prefLabel"
              object: 
                datatype: literal
                lang: :no
            x: 
              predicate: http://purl.org/dc/terms/subject
              object: 
                regex: !ruby/regexp /[^\w\-ÆØÅæøå]/
                prefix: http://4store.deichman.no/classification/
                datatype: uri
              relation: 
                class: http://bibpode.no/terms/DDKCode
 
------
## FEATURES ##
tag numbers can be regex (e.g. "^5(?!71)" for 500-599 minus 571)
all uris are exploded, and objects need prefix 
subfields can be arrays
relations can have subfields
objects can have language tags given as symbols (:se :en_UK etc)

## TODO ##
relation subfield relations should accept different classes
conditionals in yaml file? if/else
camelcase or underscored objects (requires extra string replace, will affect performance)

## REQUIREMENTS ##
ruby >= 1.8.7
ruby-marc (thanks to Ross Singer et.al.)
rdf.rb (thanks to Arto Bendiken et.al. for the brilliant RDF library for ruby)
rdf-rdfxml.rb (requires development libraries libxml2 and libxslt1)

## UBUNTU INSTALL ##
(for rdf-xml support)
sudo apt-get install libxml2-dev libxslt1-dev
gem install marc rdf rdf-rdfxml

## USAGE ##
normarc2rdf.rb -i input_file -o output_file [-r recordlimit]
  -i input_file must be marc binary
  -o output_file extension can be either .rdf (slooow) or .nt (very fast)
  -r [number] stops processing after given number of records
Something went wrong with that request. Please try again.