Feature index data #15
Merged
jkeck
merged 23 commits into
sul-dlss-deprecated:master
from
anusharanganathan:feature_index_data
Jan 7, 2016
Changes from 20 commits
Commits
31b0b0b
Merge pull request #3 from anusharanganathan/master
mejackreed fd6177c
Merge pull request #5 from anusharanganathan/feature_capistrano
mejackreed 4d49633
Merge pull request #6 from anusharanganathan/feature_mods
mejackreed f557286
Merge pull request #7 from anusharanganathan/feature_annotation_data
mejackreed 2c5dabf
Merge pull request #8 from anusharanganathan/feature_iiif_manifest_data
mejackreed cfd6577
adds travis support
mejackreed e053c31
add coveralls support
mejackreed f4ba203
Merge pull request #13 from sul-dlss/add-ci
13d4169
Added new method in annotations to extract all resources - annotation…
c9ee61c
Using webmock stubs for all http requests in spec
f6a8eb2
Refactored the rake task to index records into a class and added rspe…
2a88df7
Added indexing fixture objects to rake task
ca3cf20
Allow localhost in webmock and change port for solr test env
6592a11
Merge pull request #9 from anusharanganathan/feature_solr_config
e1c9ed0
Fixed .gitignore conflict
cd178c8
Fixed .gitignore conflict
2fa90e2
Travis rake task needs access to github
38cd691
Enable webmock only in rsepc for test env
992d708
Fixed pending tests - test before execute for solr stub. Thanks @jkeck!
e2dccfa
Removed github from spec helper - undoing prev commit
0503afc
Retain test port at 8888
81f45d7
Changed conditional statements to read better
51b9637
Added spec tests to test for nil
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,3 +33,7 @@ config/environments/*.local.yml | |
|
||
# Ignore coverage directory | ||
coverage | ||
|
||
#Ignore all data files | ||
data/* | ||
!data/.keep |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
http://dms-data.stanford.edu/data/manifests/Stanford/kq131cs7229/manifest.json | ||
http://dms-data.stanford.edu/data/manifests/BnF/jr903ng8662/manifest.json | ||
http://dms-data.stanford.edu/data/manifests/Parker/fh878gz0315/manifest.json | ||
http://dms-data.stanford.edu/data/manifests/Parker/ft757ht3699/manifest.json |
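
The four-line file above matches the input format that the `DataIndexer` comments describe: one manifest URL per line and no header row. As a minimal sketch of how such a file can be read, the snippet below collects the URLs with `CSV.foreach`. The helper name `manifest_urls` and the temporary sample file are assumptions made for illustration; they are not part of this pull request.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper (not in the pull request): collect manifest URLs from a
# headerless CSV file that holds one URL per line.
def manifest_urls(csv_path)
  urls = []
  # Mirror the guard used in the PR: do nothing for a missing path or file.
  return urls if csv_path.nil? || csv_path.empty? || !File.exist?(csv_path)

  CSV.foreach(csv_path) do |row|
    url = row[0]
    next if url.nil? || url.strip.empty? # skip blank lines

    urls << url.strip
  end
  urls
end

# Write a small sample file so the sketch runs without the real data directory.
sample = Tempfile.new(['manifests', '.csv'])
sample.write("http://dms-data.stanford.edu/data/manifests/Stanford/kq131cs7229/manifest.json\n" \
             "http://dms-data.stanford.edu/data/manifests/BnF/jr903ng8662/manifest.json\n")
sample.close

sample_urls = manifest_urls(sample.path)
puts sample_urls
```

Returning an empty list for a missing file keeps the caller simple, which is the same reason `index_csv` returns early when the path is blank or the file does not exist.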
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
require 'csv' | ||
class DataIndexer | ||
|
||
# Index mods and annotations data from a list of manifest urls or from a single url | ||
# Params: | ||
# +collection+:: Name of collection the manifest(s) belong to | ||
# +csv_file+:: string containing the path to the csv file. | ||
# File to have one url per line and no header | ||
# +manifest_url+:: url to the manifest file | ||
# Usage: | ||
# DataIndexer.new('collection_name', 'file_path').run | ||
# to index csv file | ||
# DataIndexer.new('collection_name', nil, 'url').run | ||
# to index one manifest at url | ||
def initialize(collection = nil, csv_file = nil, manifest_url = nil) | ||
@collection = collection | ||
@csv_file = csv_file | ||
@url = manifest_url | ||
@manifest = nil | ||
@title = nil | ||
@doc = SolrDocument.new | ||
@solr = Blacklight.default_index.connection | ||
end | ||
|
||
# Index and commit mods and annotations data either | ||
# from a list of manifest urls or from a single url, | ||
# depending on which option was given | ||
def run | ||
if !@csv_file.blank? && File.exist?(@csv_file) | ||
Perhaps more of a stylistic thing but |
||
index_csv | ||
commit | ||
elsif @url | ||
index | ||
commit | ||
end | ||
end | ||
|
||
# Index mods and annotations data from a list of manifest urls | ||
# csv file to contain one url per line and no header | ||
def index_csv | ||
return if @csv_file.blank? || !File.exist?(@csv_file) | ||
CSV.foreach(@csv_file) do |row| | ||
@url = row[0] | ||
index | ||
end | ||
end | ||
|
||
# Index MODS and annotation lists fetched from the IIIF manifest url | ||
def index | ||
fetch_manifest | ||
if define_doc | ||
index_mods | ||
index_annotations | ||
end | ||
end | ||
|
||
# Commit the data indexed in solr | ||
def commit | ||
@solr.commit | ||
end | ||
|
||
protected | ||
|
||
# Get the manifest data | ||
def fetch_manifest | ||
@manifest = IiifManifest.new(@url) | ||
@manifest.read_manifest | ||
end | ||
|
||
def define_doc | ||
unless @manifest.title.blank? || @manifest.druid.blank? | ||
@doc[:collection] = @collection | ||
@doc[:druid] = @manifest.druid | ||
@doc[:iiif_manifest] = @url | ||
@doc[:mods_url] = @manifest.mods_url | ||
@doc[:modsxml] = @manifest.fetch_modsxml | ||
return true | ||
end | ||
false | ||
end | ||
|
||
# index mods data in solr | ||
def index_mods | ||
solr_doc = @doc.mods_to_solr | ||
unless solr_doc.blank? | ||
@title = solr_doc['title_search'] if solr_doc.key?('title_search') | ||
@solr.add solr_doc | ||
end | ||
end | ||
|
||
# index all of the annotations data in solr | ||
def index_annotations | ||
list_count = 0 | ||
doc_count = 0 | ||
add_count = 0 | ||
@manifest.annotation_lists.each do |al| | ||
annotation_list = @doc.read_annotation(al['@id']) | ||
@doc.resources(annotation_list).each do |a| | ||
data = { "annotation" => a, "manuscript" => @title, "folio" => al['label'], "url" => al['@id'] } | ||
solr_doc = @doc.annotation_to_solr(data) | ||
unless solr_doc.blank? | ||
@solr.add solr_doc | ||
add_count += 1 | ||
end | ||
doc_count += 1 | ||
end | ||
list_count += 1 | ||
end | ||
[list_count, doc_count, add_count] | ||
end | ||
|
||
end |
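
The counting loop in `index_annotations` can be exercised without a running Solr instance. The sketch below reproduces that loop against an in-memory stand-in and returns the same `[list_count, doc_count, add_count]` triple. `FakeSolr`, `count_and_add`, and the sample data are hypothetical names introduced for this illustration; the pull request itself uses `Blacklight.default_index.connection` and `SolrDocument`.

```ruby
# In-memory stand-in for the Solr connection (hypothetical, for illustration).
class FakeSolr
  attr_reader :added

  def initialize
    @added = []
  end

  # Record the document instead of sending it to Solr.
  def add(doc)
    @added << doc
  end
end

# Same counting pattern as index_annotations: every list and every resource is
# counted, but only non-empty Solr documents are added. The block turns a
# resource into a Solr document, or returns nil when there is nothing to index.
def count_and_add(solr, lists)
  list_count = 0
  doc_count = 0
  add_count = 0
  lists.each do |list|
    list[:resources].each do |resource|
      solr_doc = yield(resource, list)
      unless solr_doc.nil? || solr_doc.empty?
        solr.add(solr_doc)
        add_count += 1
      end
      doc_count += 1
    end
    list_count += 1
  end
  [list_count, doc_count, add_count]
end

solr = FakeSolr.new
lists = [
  { label: 'f. 1r', resources: ['first note', ''] },
  { label: 'f. 1v', resources: ['second note'] }
]
counts = count_and_add(solr, lists) do |resource, list|
  resource.empty? ? nil : { 'folio' => list[:label], 'text' => resource }
end
p counts # prints [2, 3, 2]
```

A gap between `doc_count` and `add_count` shows how many annotations were read but produced no Solr document, which is useful to check in a spec.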
File renamed without changes.
Any particular reason we're changing to port 8984 for the test env here?