arxivsync 0.2.0

Ruby OAI interface for harvesting arXiv metadata. It can store and incrementally update a local XML mirror of paper metadata, and parse that XML into Ruby objects for conversion into a friendlier format.

Installation

  gem install arxivsync

Usage

Creating or updating an archive

Use the included shell command:

  arxivsync ARCHIVE_DIR

This stores each XML response as a separate file containing up to 1000 records. After the initial harvest, rerun the command to add new files containing all records since the last harvest.
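To show the idea of one file per response, here is a minimal sketch of paging through an OAI-PMH ListRecords harvest by following resumption tokens. It is not arxivsync's implementation: harvest_to_dir is hypothetical, and fetch_page is an assumed callable that takes a resumption token (nil for the first request) and returns the raw XML plus the next token.

```ruby
require "fileutils"

# Hypothetical sketch, not arxivsync's implementation: page through an
# OAI-PMH ListRecords harvest by following resumption tokens, saving each
# raw XML response as its own numbered file.
# `fetch_page` is an assumed callable: it takes a resumption token (nil for
# the first request) and returns [xml_string, next_token].
def harvest_to_dir(dir, fetch_page)
  FileUtils.mkdir_p(dir)
  # Continue numbering after files from earlier harvests, so a rerun
  # adds new files instead of overwriting old ones.
  index = Dir.glob(File.join(dir, "*.xml")).size
  written = 0
  token = nil
  loop do
    xml, token = fetch_page.call(token)
    index += 1
    written += 1
    File.write(File.join(dir, format("%04d.xml", index)), xml)
    # OAI-PMH marks the last page with an empty or absent resumption token.
    break if token.nil? || token.empty?
  end
  written
end
```

Numbering continues from the files already present, which mirrors how a rerun adds files rather than replacing the earlier harvest.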

Leave at least a day between syncs; the harvest's temporal granularity is one day, so nothing finer is possible.
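If you schedule syncs yourself, a guard like the following can enforce the one-day spacing. This is a hypothetical helper, not part of arxivsync: next_from_date returns the YYYY-MM-DD value to use as the next harvest's start date, or nil when no full day has passed.

```ruby
require "date"

# Hypothetical helper, not part of arxivsync. Harvest dates have day
# granularity, so this returns the YYYY-MM-DD start date for the next
# sync, or nil when no full day has passed since the last one.
def next_from_date(last_sync, today = Date.today)
  return nil unless today > last_sync
  # Reuse the last sync date as the (inclusive) start, so records stamped
  # later that same day are not skipped; some may be fetched twice.
  last_sync.strftime("%Y-%m-%d")
end
```

Reusing the last sync date rather than the following day trades possible duplicates for not missing records added after the previous run on the same day.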

Reading from an archive

  archive = ArxivSync::XMLArchive.new("/home/foo/savedir")
  archive.read_metadata do |papers|
    # Papers come in blocks of at most 1000 at a time
    papers.each do |paper|
      # Do stuff with papers
    end
  end

This parses the XML files with a SAX parser and yields Structs representing the metadata as it goes. The returned structures closely match the arxivRaw metadata format.
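To illustrate the streaming approach, here is a minimal SAX-style sketch using REXML, which ships with Ruby. It is not arxivsync's parser: Paper, each_paper, and the three fields are hypothetical stand-ins, and real arxivRaw records carry more fields (submitter, authors, abstract, versions and others).

```ruby
require "rexml/parsers/sax2parser"

# Hypothetical stand-in for the Structs arxivsync yields; the real field
# set follows arxivRaw and is larger than this.
Paper = Struct.new(:id, :title, :categories)

# SAX-style sketch: collect text per field and yield one Paper per
# <arXivRaw> record as soon as its closing tag is seen, without building
# a document tree for the whole file.
def each_paper(xml)
  parser = REXML::Parsers::SAX2Parser.new(xml)
  fields = Paper.members.map(&:to_s)
  paper = nil
  field = nil
  parser.listen(:start_element) do |_uri, localname, _qname, _attrs|
    if localname == "arXivRaw"
      paper = Paper.new
    elsif paper && fields.include?(localname)
      field = localname
      paper[field] = +""
    end
  end
  # Text may arrive in several chunks, so append rather than assign.
  parser.listen(:characters) do |text|
    paper[field] << text if paper && field
  end
  parser.listen(:end_element) do |_uri, localname, _qname|
    if localname == "arXivRaw"
      yield paper
      paper = nil
    elsif localname == field
      field = nil
    end
  end
  parser.parse
end
```

Usage follows the same block style as read_metadata: each_paper(xml) { |paper| puts paper.title }.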

Download and parse immediately

If you want arxivsync to handle the request cycle and parsing but prefer to handle storage yourself:

  ArxivSync.get_metadata(oai_params) do |resp, papers|
    papers.each do |paper|
      # Do stuff with paper
    end
  end
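One way to handle storage yourself is to append each paper to a JSON Lines file. This is a sketch under assumptions, not part of arxivsync: write_papers_jsonl and plain are hypothetical helpers that accept any Ruby Structs, converting nested Structs (for example, if a paper's versions are Structs) into plain hashes before encoding.

```ruby
require "json"

# Hypothetical helper: recursively turn Structs (and arrays of them) into
# plain hashes and arrays so JSON encodes their fields, not their #inspect.
def plain(value)
  case value
  when Struct then value.to_h.transform_values { |v| plain(v) }
  when Array then value.map { |v| plain(v) }
  else value
  end
end

# Hypothetical helper: write one JSON object per paper, one per line.
# Example use inside the get_metadata block, with a file opened in
# append mode:
#   write_papers_jsonl(papers, file)
def write_papers_jsonl(papers, io)
  papers.each do |paper|
    io.puts(JSON.generate(plain(paper)))
  end
end
```

JSON Lines suits block-at-a-time delivery: each batch can be appended without rereading or rewriting earlier output.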

Contributing

1. Fork it
2. Create your feature branch (git checkout -b my-new-feature)
3. Commit your changes (git commit -am 'Add some feature')
4. Push to the branch (git push origin my-new-feature)
5. Create a new pull request
