Merge modsulator gem into the rails app
This reduces the overhead of having to maintain two projects
independently.  Since this app is the only consumer of the modsulator
gem, this approach makes sense.
jcoyne committed Oct 29, 2018
1 parent 738d66d commit 7699821
Showing 70 changed files with 116,556 additions and 16 deletions.
9 changes: 7 additions & 2 deletions .gitignore
@@ -1,8 +1,6 @@
*.rbc
capybara-*.html
.rspec
/log
/tmp
/db/*.sqlite3
/db/*.sqlite3-journal
/public/system
@@ -15,6 +13,13 @@ pickle-email-*.html
# TODO Comment out these rules if you are OK with secrets being uploaded to the repo
config/initializers/secret_token.rb

# Ignore all logfiles and tempfiles.
/log/*
/tmp/*
.tmp
!/log/.keep
!/tmp/.keep

# dotenv
# TODO Comment out this rule if environment variables can be committed
.env
10 changes: 6 additions & 4 deletions Gemfile
@@ -5,16 +5,18 @@ git_source(:github) do |repo_name|
  "https://github.com/#{repo_name}.git"
end

gem 'modsulator', '~> 1.2'
gem 'stanford-mods-normalizer'
gem 'honeybadger'

# Bundle edge Rails instead: gem 'rails', github: 'rails/rails'
gem 'rails', '~> 5.0.2'

# Use Puma as the app server
gem 'puma', '~> 3.0'

gem 'stanford-mods-normalizer'
gem 'roo', '>= 2.7.1'
gem 'honeybadger'
gem 'deprecation'


group :development, :test do
  # Call 'byebug' anywhere in the code to stop execution and get a debugger console
  gem 'byebug', platform: :mri
12 changes: 3 additions & 9 deletions Gemfile.lock
@@ -73,7 +73,7 @@ GEM
thor (~> 0.19.4)
tins (~> 1.6)
crass (1.0.4)
- deprecation (0.99.0)
+ deprecation (1.0.0)
activesupport
diff-lcs (1.3)
dlss-capistrano (3.4.1)
@@ -104,13 +104,6 @@ GEM
mini_mime (1.0.1)
mini_portile2 (2.3.0)
minitest (5.11.3)
modsulator (1.3.0)
activesupport
deprecation (~> 0)
equivalent-xml (>= 0.6.0)
nokogiri
roo (>= 2.7.1)
stanford-mods-normalizer (~> 0.1)
net-scp (1.2.1)
net-ssh (>= 2.6.5)
net-ssh (5.0.2)
@@ -210,13 +203,14 @@ DEPENDENCIES
capistrano-passenger
capistrano-rails
coveralls
deprecation
dlss-capistrano
equivalent-xml (>= 0.6.0)
honeybadger
listen (~> 3.0.5)
modsulator (~> 1.2)
puma (~> 3.0)
rails (~> 5.0.2)
roo (>= 2.7.1)
rspec-rails (~> 3.5)
spring
spring-watcher-listen (~> 2.0.0)
8 changes: 8 additions & 0 deletions Rakefile
@@ -4,3 +4,11 @@
require_relative 'config/application'

Rails.application.load_tasks

Rake::Task['spec'].clear
RSpec::Core::RakeTask.new(:spec) do |t|
  t.pattern = 'spec/**/*_spec.rb'

  # The modsulator integration_tests are very slow
  t.exclude_pattern = 'spec/integration_tests/*_spec.rb'
end
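
Reviewer note: the redefined `spec` task above selects files with one glob and drops the slow integration specs with another. The following is a rough sketch of that selection using `File.fnmatch` from the standard library. It only approximates how RSpec expands these globs, and the helper name and the example paths are hypothetical, not part of this commit.

```ruby
# Sketch only: approximates the pattern / exclude_pattern selection with
# File.fnmatch. RSpec itself expands the globs against the file system.
SPEC_PATTERN    = 'spec/**/*_spec.rb'
EXCLUDE_PATTERN = 'spec/integration_tests/*_spec.rb'

# Hypothetical helper: true when a path would be picked up by the default
# spec task, i.e. it matches the pattern and is not excluded.
def selected_by_default_spec_task?(path)
  File.fnmatch(SPEC_PATTERN, path, File::FNM_PATHNAME) &&
    !File.fnmatch(EXCLUDE_PATTERN, path, File::FNM_PATHNAME)
end

# Hypothetical paths used purely for illustration.
puts selected_by_default_spec_task?('spec/models/modsulator_sheet_spec.rb')
puts selected_by_default_spec_task?('spec/integration_tests/roundtrip_spec.rb')
```

Note that the exclude glob covers only files directly inside `spec/integration_tests`; specs in nested subdirectories of that folder would presumably still run unless the pattern is widened.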
2 changes: 1 addition & 1 deletion app/controllers/modsulator_controller.rb
@@ -5,6 +5,6 @@ def create
end

def version
- render plain: Gem.loaded_specs['modsulator'].version.version
+ render plain: '2.0.0'
end
end
61 changes: 61 additions & 0 deletions app/models/modsulator_sheet.rb
@@ -0,0 +1,61 @@
# File "modsulator_sheet.rb" - a class to load and validate metadata spreadsheets (.csv, .xls or .xlsx) for input
# to Modsulator.

require 'json'
require 'roo'

# This class provides methods to parse Stanford's MODS spreadsheets into either an array of hashes, or a JSON string.
class ModsulatorSheet
  attr_reader :file, :filename

  # Creates a new ModsulatorSheet. When called with temporary files, the filename must be specified separately, hence the
  # second argument.
  # @param [File] file The input spreadsheet
  # @param [String] filename The filename of the input spreadsheet.
  def initialize(file, filename)
    @file = file
    @filename = filename
  end


  # Loads the input spreadsheet into an array of hashes. This spreadsheet should conform to the Stanford MODS template format,
  # which has three header rows. The first row is a kind of "super header", the second row is an intermediate header and the
  # third row is the header row that names the fields. The data rows are in the fourth row onwards.
  #
  # @return [Array<Hash>] An array with one entry per data row in the spreadsheet. Each entry is a hash, indexed by
  # the spreadsheet headers.
  def rows
    # Parse the spreadsheet, automatically finding the header row by looking for "druid" and "sourceId" and leave the
    # header row itself out of the resulting array. Everything preceding the header row is discarded.
    @rows ||= spreadsheet.parse(header_search: ['druid', 'sourceId'], clean: true)
  end


  # Opens a spreadsheet based on its filename extension.
  #
  # @return [Roo::CSV, Roo::Excel, Roo::Excelx] A Roo object, whose type depends on the extension of the given filename.
  def spreadsheet
    @spreadsheet ||= case File.extname(@filename)
                     when '.csv' then Roo::Spreadsheet.open(@file, extension: :csv)
                     when '.xls' then Roo::Spreadsheet.open(@file, extension: :xls)
                     when '.xlsx' then Roo::Spreadsheet.open(@file, extension: :xlsx)
                     else fail "Unknown file type: #{@filename}"
                     end
  end


  # Get the headers used in the spreadsheet
  def headers
    rows.first.keys
  end


  # Convert the loaded spreadsheet to a JSON string.
  # @return [String] A JSON string.
  def to_json
    json_hash = {}
    json_hash['filename'] = File.basename(filename)
    json_hash['rows'] = rows
    json_hash.to_json
  end
end
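
Reviewer note: `ModsulatorSheet#spreadsheet` dispatches on `File.extname(@filename)`, which returns the extension exactly as written, so an upload named `MANIFEST.XLSX` would fall through to the "Unknown file type" branch. A minimal sketch of a case-insensitive lookup follows. The constant and helper names are hypothetical and are not part of this commit; the sketch only computes the extension symbol and does not call Roo.

```ruby
# Hypothetical helper, not part of the commit: maps a filename to the
# extension symbol that would be passed to Roo::Spreadsheet.open,
# ignoring the case of the file extension.
SUPPORTED_EXTENSIONS = {
  '.csv'  => :csv,
  '.xls'  => :xls,
  '.xlsx' => :xlsx
}.freeze

def roo_extension_for(filename)
  SUPPORTED_EXTENSIONS.fetch(File.extname(filename).downcase) do
    raise ArgumentError, "Unknown file type: #{filename}"
  end
end

# Example filenames are made up for illustration.
puts roo_extension_for('manifest.xlsx')
puts roo_extension_for('MANIFEST.CSV')
```

If adopted, the `case` statement could be reduced to a single `Roo::Spreadsheet.open(@file, extension: roo_extension_for(@filename))` call, assuming the stricter `ArgumentError` is acceptable to callers that currently see a `RuntimeError` from `fail`.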