A high performance, parallel parser of CrossRef OAI-PMH / unixref metadata

scraping-xx/cayenne

 
 

cayenne

Cayenne aims to be a high performance, parallel parser of CrossRef OAI-PMH / unixref metadata.

Currently, cayenne can parse all CrossRef metadata, just over 50 million work records (with 100 million citation entries), in just under 4 hours on a modest 6-core machine.

Usage

Install Leiningen, then run lein repl and try a few commands:

$ lein repl
> (use 'cayenne.core)
> (use 'cayenne.tasks)
> (process-dir (file "/some/dir/with/xml") :task (doi-record-json-writer "out.txt"))

This writes DOI records as JSON to out.txt. Alternatively, :task can be any function that takes parsed DOI record data structures (Clojure maps) and processes them in some way.
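As a rough illustration of a custom :task function, the sketch below builds a task that counts records and collects their DOIs. It assumes the task is called once per parsed record and that each record map carries a :doi key; neither detail is stated above, and make-doi-collector and sample-records are hypothetical names. The stand-in records let it run without cayenne.

```clojure
;; Hypothetical custom task: counts records and remembers the DOIs it has seen.
;; Assumptions (not confirmed by this README): the task is called once per
;; parsed record, and each record map has a :doi key.
(defn make-doi-collector
  "Returns [task state]. task is a one-argument function that could be passed
  as :task; state is an atom holding {:count n :dois [...]}."
  []
  (let [state (atom {:count 0 :dois []})
        task  (fn [record]
                ;; swap! applies the update atomically, so concurrent calls
                ;; from several parser threads do not lose increments.
                (swap! state
                       (fn [s]
                         (-> s
                             (update :count inc)
                             (update :dois conj (:doi record))))))]
    [task state]))

;; Stand-in records so the sketch runs without cayenne or any XML input.
(def sample-records
  [{:doi "10.5555/example-1"}
   {:doi "10.5555/example-2"}])

(let [[task state] (make-doi-collector)]
  (doseq [r sample-records]
    (task r))
  (println (:count @state) (:dois @state)))
```

With cayenne loaded, the task half of the returned pair would presumably be passed as :task to process-dir in place of doi-record-json-writer. Keeping the state in an atom is a deliberate choice, since a parallel parser may call the task from several threads at once.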
