-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Date accessors #26
Date accessors #26
Conversation
lib/mods/date.rb
Outdated
|
||
text.gsub(/^[\[]+/, '').gsub(/[\.\]]+$/, '') | ||
Date.cleanup(text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call super
here?
lib/mods/date.rb
Outdated
@@ -95,11 +102,11 @@ def self.supports?(text) | |||
|
|||
# Full text extractor for MM/DD/YYYY-formatted dates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this comment need to change? Are we accepting 3 digit years now?
lib/mods/date.rb
Outdated
|
||
def self.cleanup(text) | ||
matches = text.match(REGEX) | ||
roman_to_int(matches[:year].upcase).to_s | ||
end | ||
|
||
def self.roman_to_int(value) | ||
value = value.dup | ||
value = value.dup.tr('.', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to dup if we're running a tr
won't that create a copy for you?
@@ -145,9 +152,10 @@ def self.roman_to_int(value) | |||
end | |||
end | |||
|
|||
# Full-text extractor for centuries encoded as Roman numerals | |||
# Full-text extractor for centuries encoded as Roman numerals; sometimes centuries | |||
# are given as e.g. xvith, hence the funny negative look-ahead assertion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
lib/mods/date.rb
Outdated
@@ -181,16 +189,77 @@ def self.cleanup(text) | |||
end | |||
end | |||
|
|||
# Full-text extractor for data formatted as YYYY-YYYY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or YYY-YYY
?
lib/mods/date.rb
Outdated
class YearRangeFormat < ExtractorDateFormat | ||
REGEX = /(?<start>\d{3,4})-(?<end>\d{3,4})/ | ||
|
||
def self.cleanup(text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that this normalizes to YYYY/YYYY
lib/mods/date.rb
Outdated
|
||
def self.cleanup(text) | ||
matches = text.match(REGEX) | ||
"#{matches[:year]}X" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? Can you leave a comment?
lib/mods/date.rb
Outdated
class EmbeddedBCYearFormat < ExtractorDateFormat | ||
REGEX = /(?<year>\d{3,4})\s?B\.?C\.?/i | ||
|
||
def self.cleanup(text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment that this normalizes to -YYYY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these methods normalize to EDTF. I'm not sure a comment on each of them is particularly compelling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will rename cleanup
to normalize_to_edtf
for clarity.
# Full-text extractor that tries hard to pick any year present in the data | ||
class EmbeddedYearFormat < ExtractorDateFormat | ||
REGEX = /(?<prefix>-)?(?<year>\d{3,4})/ | ||
REGEX = /(?<prefix>-)?(?<!\d)(?<year>\d{4})(?!\d)/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of what a prefix might be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex is matching just a literal -
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
# Full-text extractor that tries hard to pick any year present in the data | ||
class EmbeddedYearWithBracketsFormat < ExtractorDateFormat | ||
# [YYY]Y Y[YYY] [YY]YY Y[YY]Y YY[YY] YYY[Y] YY[Y]Y Y[Y]YY [Y]YYY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah. 😱 Can you add a comment about what this stuff means? Is there a part of a spec I can look at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I have no idea where it comes from, I just know we have lots of it.
7d6c472
to
fcb7620
Compare
No description provided.