[patch:lib] Split version number from paper identifier #39

Merged 2 commits on Apr 6, 2019
11 changes: 9 additions & 2 deletions README.md
@@ -272,9 +272,18 @@ paper = Arx('1809.09415')
 #=> #<Arx::Paper:0x00007fb657b59bd0>

 paper.id
 #=> "1809.09415"
+paper.id(version: true)
+#=> "1809.09415v1"
 paper.url
 #=> "http://arxiv.org/abs/1809.09415"
+paper.url(version: true)
+#=> "http://arxiv.org/abs/1809.09415v1"
+paper.version
+#=> 1
+paper.revision?
+#=> false
+
 paper.title
 #=> "On finitely ambiguous Büchi automata"
 paper.summary
@@ -293,8 +302,6 @@ paper.published_at
 #=> #<DateTime: 2018-09-25T11:40:39+00:00 ((2458387j,42039s,0n),+0s,2299161j)>
 paper.updated_at
 #=> #<DateTime: 2018-09-25T11:40:39+00:00 ((2458387j,42039s,0n),+0s,2299161j)>
-paper.revision?
-#=> false

 # Paper's comment
 paper.comment?
4 changes: 2 additions & 2 deletions lib/arx.rb
@@ -32,15 +32,15 @@ module Arx
   # 1705.01662v1
   # 1412.0135
   # 0706.0001v2
-  NEW_IDENTIFIER_FORMAT = %r"^\d{4}\.\d{4,5}(v\d+)?$"
+  NEW_IDENTIFIER_FORMAT = /^\d{4}\.\d{4,5}(v\d+)?$/

   # The legacy arXiv paper identifier scheme (before 1 April 2007).
   #
   # @see https://arxiv.org/help/arxiv_identifier#old arXiv identifier (old)
   # @example
   # math/0309136v1
   # cond-mat/0211034
-  OLD_IDENTIFIER_FORMAT = %r"^[a-z]+(\-[a-z]+)?\/\d{7}(v\d+)?$"
+  OLD_IDENTIFIER_FORMAT = /^[a-z]+(\-[a-z]+)?\/\d{7}(v\d+)?$/

   class << self

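A quick way to see what the two patterns accept is to run them against the examples from the comments above. The sketch below copies both literals into a hypothetical `IdentifierSketch` module; the module name, `identifier?`, and `STRICT_NEW_FORMAT` are assumptions made for illustration and are not part of the gem. It also illustrates one property of the patterns as written: in Ruby, `^` and `$` anchor at line boundaries, so a multi-line string can match, whereas `\A` and `\z` anchor the whole string.

```ruby
# Hypothetical module for illustration; the two *_IDENTIFIER_FORMAT literals
# are copied from lib/arx.rb in this diff.
module IdentifierSketch
  NEW_IDENTIFIER_FORMAT = /^\d{4}\.\d{4,5}(v\d+)?$/
  OLD_IDENTIFIER_FORMAT = /^[a-z]+(\-[a-z]+)?\/\d{7}(v\d+)?$/

  # Assumption, not in the PR: the same new-scheme pattern with
  # whole-string anchors instead of line anchors.
  STRICT_NEW_FORMAT = /\A\d{4}\.\d{4,5}(v\d+)?\z/

  module_function

  # True if the string matches either identifier scheme.
  def identifier?(string)
    NEW_IDENTIFIER_FORMAT.match?(string) || OLD_IDENTIFIER_FORMAT.match?(string)
  end
end

puts IdentifierSketch.identifier?('1705.01662v1')     # prints true
puts IdentifierSketch.identifier?('cond-mat/0211034') # prints true
puts IdentifierSketch.identifier?('arxiv:1234')       # prints false

# Line anchors accept an identifier on any line of the input.
multiline = "junk\n1705.01662"
puts IdentifierSketch::NEW_IDENTIFIER_FORMAT.match?(multiline) # prints true
puts IdentifierSketch::STRICT_NEW_FORMAT.match?(multiline)     # prints false
```

Whether the stricter anchors are wanted depends on how identifiers reach the library; for single-line input the two variants behave the same.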
43 changes: 38 additions & 5 deletions lib/arx/cleaner.rb
@@ -4,11 +4,44 @@ module Arx
   # @private
   class Cleaner

-    # Cleans strings.
-    # @param [String] string Removes newline/return characters and multiple spaces from a string.
-    # @return [String] The cleaned string.
-    def self.clean(string)
-      string.gsub(/\r\n|\r|\n/, ' ').strip.squeeze ' '
+    # arXiv paper URL prefix format
+    URL_PREFIX = /^(https?\:\/\/)?(www.)?arxiv\.org\/abs\//
+
+    class << self
+
+      # Cleans strings.
+      # @param [String] string Removes newline/return characters and multiple spaces from a string.
+      # @return [String] The cleaned string.
+      def clean(string)
+        string.gsub(/\r\n|\r|\n/, ' ').strip.squeeze ' '
+      end
+
+      # Attempt to extract an arXiv identifier from a string such as a URL.
+      #
+      # @param string [String] The string to extract the ID from.
+      # @param version [Boolean] Whether or not to include the paper's version.
+      # @return [String] The extracted ID.
+      def extract_id(string, version: false)
+        raise TypeError.new("Expected `version` to be boolean (TrueClass or FalseClass), got: #{version.class}") unless version == !!version
+        raise TypeError.new("Expected `string` to be a String, got: #{string.class}") unless string.is_a? String
+        string.gsub!(/(#{URL_PREFIX})|(\/$)/, '') if /#{URL_PREFIX}.+\/?$/.match? string
+        raise ArgumentError.new("Couldn't extract arXiv identifier from: #{string}") unless Validate.id? string
+        version ? string : string.sub(/v[0-9]+$/, '')
+      end
+
+      # Attempt to extract a version number from an arXiv identifier.
+      #
+      # @param string [String] The arXiv identifier to extract the version number from.
+      # @return [String] The extracted version number.
+      def extract_version(string)
+        reversed = extract_id(string, version: true).reverse
+
+        if /^[0-9]+v/.match? reversed
+          reversed.partition('v').first.to_i
+        else
+          raise ArgumentError.new("Couldn't extract version number from identifier: #{string}")
+        end
+      end
     end
   end
 end
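The two new helpers can be exercised outside the gem with a small standalone sketch. `CleanerSketch` and its `valid_id?` helper are hypothetical: `valid_id?` stands in for `Validate.id?`, whose implementation is not shown in this diff, and is assumed here to check the two identifier patterns from `lib/arx.rb`. The sketch deliberately differs from the diff in three places that look worth a second look when reading the code as written:

- `extract_id` in the diff calls `gsub!`, which modifies the caller's string in place and would be expected to raise on a frozen string; the sketch uses non-mutating `sub`.
- `extract_version` in the diff reads the digits from the reversed string without reversing them back, so an identifier ending in `v12` would yield `21`; the sketch captures the digits directly.
- The `@return [String]` tag on `extract_version` does not match the Integer produced by `to_i`.

```ruby
# Hypothetical stand-in for Arx::Cleaner, for illustration only.
module CleanerSketch
  # Copied from lib/arx/cleaner.rb and lib/arx.rb in this diff.
  URL_PREFIX = /^(https?\:\/\/)?(www.)?arxiv\.org\/abs\//
  NEW_FORMAT = /^\d{4}\.\d{4,5}(v\d+)?$/
  OLD_FORMAT = /^[a-z]+(\-[a-z]+)?\/\d{7}(v\d+)?$/

  module_function

  # Assumption: approximates Validate.id?, which this diff does not show.
  def valid_id?(string)
    NEW_FORMAT.match?(string) || OLD_FORMAT.match?(string)
  end

  # Returns the identifier, with or without its version suffix.
  def extract_id(string, version: false)
    raise TypeError, "Expected a String, got: #{string.class}" unless string.is_a?(String)

    # sub returns a new string, so the caller's argument is left untouched.
    id = URL_PREFIX.match?(string) ? string.sub(URL_PREFIX, '').sub(/\/$/, '') : string
    raise ArgumentError, "Couldn't extract arXiv identifier from: #{string}" unless valid_id?(id)

    version ? id : id.sub(/v[0-9]+$/, '')
  end

  # Returns the version number as an Integer.
  def extract_version(string)
    match = extract_id(string, version: true).match(/v([0-9]+)$/)
    raise ArgumentError, "Couldn't extract version number from identifier: #{string}" unless match

    match[1].to_i
  end
end

puts CleanerSketch.extract_id('https://arxiv.org/abs/1809.09415v1')                # prints 1809.09415
puts CleanerSketch.extract_id('https://arxiv.org/abs/1809.09415v1', version: true) # prints 1809.09415v1
puts CleanerSketch.extract_version('1809.09415v12')                                # prints 12
```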
35 changes: 22 additions & 13 deletions lib/arx/entities/paper.rb
@@ -13,18 +13,33 @@ class Paper
     # @example
     # 1705.01662v1
     # cond-mat/0211034
+    # @param version [Boolean] Whether or not to include the paper's version.
     # @return [String] The paper's identifier.
-    def id
-      @id.sub /https?\:\/\/arxiv\.org\/abs\//, ''
+    def id(version: false)
+      Cleaner.extract_id @id, version: version
     end

     # The URL of the paper on the arXiv website.
     # @example
     # http://arxiv.org/abs/1705.01662v1
     # http://arxiv.org/abs/cond-mat/0211034
+    # @param version [Boolean] Whether or not to include the paper's version.
     # @return [String] The paper's arXiv URL.
-    def url
-      @id
+    def url(version: false)
+      "http://arxiv.org/abs/#{id version: version}"
+    end
+
+    # The version of the paper.
+    # @return [Integer] The paper's version.
+    def version
+      Cleaner.extract_version @id
+    end
+
+    # Whether the paper is a revision or not.
+    # @note A paper is a revision if its {version} is greater than 1.
+    # @return [Boolean]
+    def revision?
+      version > 1
     end

     # @!method updated_at
@@ -58,13 +73,6 @@ def url
     # @return [Array<Category>]
     has_many :categories, Category, tag: 'category'

-    # Whether the paper is a revision or not.
-    # @note A paper is a revision if {updated_at} differs from {published_at}.
-    # @return [Boolean]
-    def revision?
-      @published_at != @updated_at
-    end
-
     # @!method summary
     # The summary (or abstract) of the paper.
     # @return [String]
@@ -152,9 +160,10 @@ def revision?
     end

     inspector *%i[
-      id url title summary authors
+      id url version revision?
+      title summary authors
       primary_category categories
-      published_at updated_at revision?
+      published_at updated_at
       comment? comment
       journal? journal
       pdf? pdf_url
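To make the new accessor behaviour concrete, here is a minimal sketch of how `id`, `url`, `version` and `revision?` relate after this change. It assumes the raw `@id` is the versioned abs URL from the arXiv API, which the removed `url` method (it returned `@id` unchanged) suggests. `PaperSketch` is a hypothetical class that inlines the extraction instead of calling `Cleaner`; it is not the gem's `Arx::Paper`.

```ruby
# Hypothetical stand-in for Arx::Paper, for illustration only.
class PaperSketch
  # Assumption: raw_id is a versioned abs URL such as
  # "http://arxiv.org/abs/1809.09415v1".
  def initialize(raw_id)
    @id = raw_id
  end

  # Identifier without the URL prefix; the version suffix is optional.
  def id(version: false)
    bare = @id.sub(%r{\Ahttps?://arxiv\.org/abs/}, '')
    version ? bare : bare.sub(/v\d+\z/, '')
  end

  # The URL is rebuilt from the identifier rather than returned verbatim.
  def url(version: false)
    "http://arxiv.org/abs/#{id(version: version)}"
  end

  # Version number taken from the trailing "vN" suffix of the raw id.
  def version
    number = @id[/v(\d+)\z/, 1]
    raise ArgumentError, "No version suffix in: #{@id}" if number.nil?

    number.to_i
  end

  # A paper counts as a revision once its version exceeds 1.
  def revision?
    version > 1
  end
end

paper = PaperSketch.new('http://arxiv.org/abs/1809.09415v1')
puts paper.id                 # prints 1809.09415
puts paper.url(version: true) # prints http://arxiv.org/abs/1809.09415v1
puts paper.version            # prints 1
puts paper.revision?          # prints false
```

Compared with the removed timestamp comparison, deriving `revision?` from the version number depends only on the identifier, not on `published_at` and `updated_at`.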
16 changes: 1 addition & 15 deletions lib/arx/query/query.rb
@@ -66,8 +66,7 @@ def initialize(*ids, sort_by: :relevance, sort_order: :descending)

       ids.flatten!
       unless ids.empty?
-        ids.map! {|id| extract_id id}
-        Validate.ids ids
+        ids.map! &Cleaner.method(:extract_id)
         @query << "&#{PARAMS[:id_list]}=#{ids * ','}"
       end

@@ -241,18 +240,5 @@ def parenthesize(string)
     def enquote(string)
       CGI.escape("\"") + string + CGI.escape("\"")
     end
-
-    # Attempt to extract an ID from an arXiv URL.
-    #
-    # @param url [String] The URL to extract the ID from.
-    # @return [String] The extracted ID if successful, otherwise the original string.
-    def extract_id(url)
-      prefix = %r"^(https?\:\/\/)?(www.)?arxiv\.org\/abs\/"
-      if %r"#{prefix}.*$".match? url
-        url.sub(prefix, '').sub(%r"\/$", '')
-      else
-        url
-      end
-    end
   end
 end
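The `ids.map! &Cleaner.method(:extract_id)` line passes a method object as the block, so each ID goes through `extract_id` with its default `version: false`. Because `extract_id` raises on anything that is not a valid identifier, the separate `Validate.ids` call is no longer needed. One consequence, as far as the diff shows, is that version suffixes are stripped from the IDs before they reach the query string. The sketch below illustrates the pattern under stated assumptions: `IdListSketch`, its simplified `extract_id` (no validation), and the literal `id_list` parameter name (standing in for `PARAMS[:id_list]`, whose value is not shown here) are all hypothetical.

```ruby
# Hypothetical module for illustration; not the gem's Arx::Query.
module IdListSketch
  module_function

  # Simplified stand-in for Cleaner.extract_id: strips the URL prefix,
  # a trailing slash, and the version suffix. It does not validate.
  def extract_id(string)
    string.sub(%r{\A(https?://)?(www\.)?arxiv\.org/abs/}, '')
          .sub(%r{/\z}, '')
          .sub(/v\d+\z/, '')
  end

  # Builds the ID fragment of a query string from IDs, URLs, or nested arrays.
  # Assumption: "id_list" stands in for PARAMS[:id_list].
  def id_list_param(*ids)
    # A Method object converts to a block via &, one call per element.
    cleaned = ids.flatten.map(&IdListSketch.method(:extract_id))
    cleaned.empty? ? '' : "&id_list=#{cleaned * ','}"
  end
end

puts IdListSketch.id_list_param('https://arxiv.org/abs/1809.09415v1', ['cond-mat/0211034'])
# prints &id_list=1809.09415,cond-mat/0211034
```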