New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingest CLI (rake task) and Logging #150
Conversation
This is common in the case of sample data, which omits TIFF for purposes of making larger samples available online.
…o Rails.logger broadcast
spec/newspaper_works_rake_spec.rb
Outdated
include_context 'ndnp fixture setup' | ||
|
||
before(:all) do | ||
# task_path = File.join(NewspaperWorks::GEM_PATH, 'lib/tasks/newspaper_works_tasks') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented line (unless needed for future reference?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in f40bf67
).exactly(1).times | ||
expect_file_assignment_logging(adapter) | ||
adapter.ingest | ||
# expect { adapter.ingest }.not_to raise_error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove commented line (unless needed for future reference?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch — this was accidental, that test (however superfluous it could be to fail on error) passes (originally commented out because it looked like logging tests were not playing well with running ingest method in a block, but they do). I have re-enabled this line and removed the line before it. Fixed in f40bf67
|
||
module NewspaperWorks | ||
module Ingest | ||
module NDNP | ||
class BatchIngester | ||
attr_accessor :path, :batch | ||
|
||
# alternate constructor from ARGV | ||
# @param options [Array<String> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Array<String>
intentional here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, missing closing bracket. Fixed in f40bf67
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is really good, nice work.
Would it be possible to add a few lines that output text to the console that indicate:
- batch is starting/running
- batch has finished, see
ingest.log
for details
I'm thinking something along the lines of what Hyrax does for the default_admin_set
create task:
https://github.com/samvera/hyrax/blob/master/lib/tasks/default_admin_set.rake#L7
def initialize(path) | ||
@path = xml_path(path) | ||
@path = self.class.xml_path(path) | ||
raise IOError, 'No batch file found: #{path}' if @path.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to use double-quotes here for #{path}
to be interpolated correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed in f40bf67
rely on other classes to do this incidentally to make output to ingest.log).
available, when (human readable) page number is not explicitly provided.
…OCR of ocr_gray.tiff
…g ALTO to JSON coordinates.
# return layout for page | ||
def alto_layout(xml, pxwidth, pxheight, &block) | ||
xml.Layout do | ||
xml.Page(ID: 'ID1', | ||
PHYSICAL_IMG_NR: '1', | ||
HEIGHT: pxwidth.to_i, | ||
WIDTH: pxwidth.to_i) do | ||
WIDTH: pxheight.to_i) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking, this is correct, yes? HEIGHT: pxwidth
, WIDTH: pxheight
(It worked fine in the UI, just looks a bit weird.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, I think I accidentally got these identifiers switched, even though the array order in the end outcome JSON is correct. This needs correcting, or the code remains confusing. I will correct ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, strike last comment ^^^. This is still wrong, but I was confusing with ALTO reading, not ALTO generation. This smells like a bug, but scope would be limited to downloaded ALTO, not JSON word coordinates (which are created from OCR, not from this generated ALTO).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected in 9ad9308
…while derivs are still processing plus minor tweaks to BatchIngester logging
This PR addresses the last two remaining issues in #80 (NDNP Ingest user story):