Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reset uri with every call to find #240

Merged
merged 2 commits into from
May 8, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 30 additions & 6 deletions lib/qa/authorities/linked_data/find_term.rb
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@ def initialize(term_config)
@term_config = term_config
end

attr_reader :term_config, :full_graph, :filtered_graph, :language, :id, :access_time_s, :normalize_time_s
private :full_graph, :filtered_graph, :language, :id, :access_time_s, :normalize_time_s
attr_reader :term_config, :full_graph, :filtered_graph, :language, :id, :uri, :access_time_s, :normalize_time_s
private :full_graph, :filtered_graph, :language, :id, :uri, :access_time_s, :normalize_time_s

delegate :term_subauthority?, :prefixes, :authority_name, to: :term_config

Expand Down Expand Up @@ -80,6 +80,7 @@ def perform_normalization
return full_graph.dump(:jsonld, standard_prefixes: true) if jsonld?

filter_graph
extract_uri
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting the uri here forces the uri to be updated for every call to find based on the current graph represented in filter_graph.

results = map_results
convert_results_to_json(results)
end
Expand Down Expand Up @@ -107,19 +108,42 @@ def map_results
ldpath_map: ldpaths_for_term, predicate_map: preds_for_term)
end

# special processing for loc ids for backward compatibility
def normalize_id
return id if expects_uri?
authority_name.to_s.casecmp('loc').zero? ? id.delete(' ') : id
loc? ? id.delete(' ') : id
end

# special processing for loc ids for backward compatibility
def loc_id
loc_id = URI.unescape(id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking (but not always), loc ids have alpha characters followed by a blank followed by digits (e.g. n 123, sh 432). Prior to this version, ids could be passed in without the blank and work fine. Now that we are using the id to extract the uri, the ids must be an exact match to the id in the triples. So this method adds in a blank if one isn't there.

digit_idx = loc_id.index(/\d/)
loc_id.insert(digit_idx, ' ') if loc? && loc_id.index(' ').blank? && digit_idx > 0
loc_id
end

# determine if the current authority is LOC which may require special processing of its ids for backward compatibility
def loc?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes it clearer that all this special processing is for LOC backward compatibility only.

authority_name.to_s.casecmp('loc').zero?
end

def expects_uri?
term_config.term_id_expects_uri?
end

def uri
return @uri if @uri.present?
def extract_uri
return @uri = RDF::URI.new(id) if expects_uri?
@uri = graph_service.subjects_for_object_value(graph: @filtered_graph, predicate: RDF::URI.new(term_config.term_results_id_predicate), object_value: id.gsub('%20', ' ')).first
@uri = graph_service.subjects_for_object_value(graph: @filtered_graph, predicate: RDF::URI.new(term_config.term_results_id_predicate), object_value: URI.unescape(id)).first
Copy link
Contributor Author

@elrayle elrayle May 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since LOC doesn't always include a blank in their ids, try using the id as passed in before doing any special processing.

In all cases, the id is unescaped (e.g. '%20' is changed to ' ', and '%2E' is changed to '.', etc.) The encodings are needed when the value is part of a URL (like when they are passed in as part of the QA request), but are not part of the value in the graph.

return @uri unless loc? && @uri.blank?
# for backward compatibility, if an loc id as passed in fails to extract the URI, try to adding a blank to the id
@uri = graph_service.subjects_for_object_value(graph: @filtered_graph, predicate: RDF::URI.new(term_config.term_results_id_predicate), object_value: loc_id).first
if @uri.present?
Qa.deprecation_warning(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only showing the deprecation warning if it was able to successfully extract the uri using the special processing. If it still can't find it, then something else may be wrong and it should flow through the normal processing when there is a failure.

in_msg: 'Qa::Authorities::LinkedData::FindTerm',
msg: 'Special processing of LOC ids is deprecated; id should be an exact match of the id in the graph'
)
end
@uri
end

def ldpaths_for_term
Expand Down
68 changes: 68 additions & 0 deletions spec/fixtures/lod_loc_second_term_found.rdf.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<madsrdf:Topic rdf:about="http://id.loc.gov/authorities/subjects/sh1234" xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#">
<rdf:type rdf:resource="http://www.loc.gov/mads/rdf/v1#Authority"/>
<madsrdf:authoritativeLabel xml:lang="en">More Science</madsrdf:authoritativeLabel>
<madsrdf:elementList rdf:parseType="Collection">
<madsrdf:TopicElement>
<madsrdf:elementValue xml:lang="en">More Science</madsrdf:elementValue>
</madsrdf:TopicElement>
</madsrdf:elementList>
<madsrdf:hasVariant>
<madsrdf:Topic>
<rdf:type rdf:resource="http://www.loc.gov/mads/rdf/v1#Variant"/>
<madsrdf:variantLabel xml:lang="en">More Natural science</madsrdf:variantLabel>
<madsrdf:elementList rdf:parseType="Collection">
<madsrdf:TopicElement>
<madsrdf:elementValue xml:lang="en">More Natural science</madsrdf:elementValue>
</madsrdf:TopicElement>
</madsrdf:elementList>
</madsrdf:Topic>
</madsrdf:hasVariant>
<madsrdf:hasVariant>
<madsrdf:Topic>
<rdf:type rdf:resource="http://www.loc.gov/mads/rdf/v1#Variant"/>
<madsrdf:variantLabel xml:lang="en">More Science of science</madsrdf:variantLabel>
<madsrdf:elementList rdf:parseType="Collection">
<madsrdf:TopicElement>
<madsrdf:elementValue xml:lang="en">More Science of science</madsrdf:elementValue>
</madsrdf:TopicElement>
</madsrdf:elementList>
</madsrdf:Topic>
</madsrdf:hasVariant>
<madsrdf:hasVariant>
<madsrdf:Topic>
<rdf:type rdf:resource="http://www.loc.gov/mads/rdf/v1#Variant"/>
<madsrdf:variantLabel xml:lang="en">More Sciences</madsrdf:variantLabel>
<madsrdf:elementList rdf:parseType="Collection">
<madsrdf:TopicElement>
<madsrdf:elementValue xml:lang="en">More Sciences</madsrdf:elementValue>
</madsrdf:TopicElement>
</madsrdf:elementList>
</madsrdf:Topic>
</madsrdf:hasVariant>
<identifiers:lccn xmlns:identifiers="http://id.loc.gov/vocabulary/identifiers/">sh 1234</identifiers:lccn>
<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
<skos:prefLabel xml:lang="en" xmlns:skos="http://www.w3.org/2004/02/skos/core#">More Science</skos:prefLabel>
<skosxl:altLabel xmlns:skosxl="http://www.w3.org/2008/05/skos-xl#">
<rdf:Description>
<rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
<skosxl:literalForm xml:lang="en">More Natural science</skosxl:literalForm>
</rdf:Description>
</skosxl:altLabel>
<skosxl:altLabel xmlns:skosxl="http://www.w3.org/2008/05/skos-xl#">
<rdf:Description>
<rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
<skosxl:literalForm xml:lang="en">More Science of science</skosxl:literalForm>
</rdf:Description>
</skosxl:altLabel>
<skosxl:altLabel xmlns:skosxl="http://www.w3.org/2008/05/skos-xl#">
<rdf:Description>
<rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
<skosxl:literalForm xml:lang="en">More Sciences</skosxl:literalForm>
</rdf:Description>
</skosxl:altLabel>
<skos:altLabel xml:lang="en" xmlns:skos="http://www.w3.org/2004/02/skos/core#">More Natural science</skos:altLabel>
<skos:altLabel xml:lang="en" xmlns:skos="http://www.w3.org/2004/02/skos/core#">More Science of science</skos:altLabel>
<skos:altLabel xml:lang="en" xmlns:skos="http://www.w3.org/2004/02/skos/core#">More Sciences</skos:altLabel>
</madsrdf:Topic>
</rdf:RDF>
23 changes: 21 additions & 2 deletions spec/lib/authorities/linked_data/find_term_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -95,11 +95,17 @@

context 'in LOC authority' do
context 'term found' do
let :results do
before do
stub_request(:get, 'http://id.loc.gov/authorities/subjects/sh85118553')
.to_return(status: 200, body: webmock_fixture('lod_loc_term_found.rdf.xml'), headers: { 'Content-Type' => 'application/rdf+xml' })
lod_loc.find('sh 85118553', subauth: 'subjects')
stub_request(:get, 'http://id.loc.gov/authorities/subjects/sh1234')
.to_return(status: 200, body: webmock_fixture('lod_loc_second_term_found.rdf.xml'), headers: { 'Content-Type' => 'application/rdf+xml' })
end

let(:results) { lod_loc.find('sh 85118553', subauth: 'subjects') }
let(:second_results) { lod_loc.find('sh 1234', subauth: 'subjects') }
let(:results_without_blank) { lod_loc.find('sh85118553', subauth: 'subjects') }

it 'has correct primary predicate values' do
expect(results[:uri]).to eq 'http://id.loc.gov/authorities/subjects/sh85118553'
expect(results[:uri]).to be_kind_of String
Expand Down Expand Up @@ -156,6 +162,19 @@
.to eq ['headings beginning with the word [Scientific;] and subdivision [Science] under ethnic groups and individual wars, e.g. [World War, 1939-1945--Science]']
expect(results['predicates']['http://www.w3.org/2004/02/skos/core#inScheme']).to eq ['http://id.loc.gov/authorities/subjects']
end

it 'has correct primary predicate values for second request' do
expect(results[:uri]).to eq 'http://id.loc.gov/authorities/subjects/sh85118553'
expect(second_results[:uri]).to eq 'http://id.loc.gov/authorities/subjects/sh1234'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails before the fix. The failure is only seen on the second find by id which is why this test checks results before checking second_results.

expect(second_results[:uri]).to be_kind_of String
expect(second_results[:id]).to eq 'sh 1234'
expect(second_results[:label]).to eq ['More Science']
expect(second_results[:altlabel]).to include('More Natural science', 'More Science of science', 'More Sciences')
end

it 'extracts correct uri when loc id does not have blank' do
Copy link
Contributor Author

@elrayle elrayle May 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests that when the data has an LOC id with a blank, but it is passed in without a blank, the URI can still be extracted. It failed before the special loc id processing was added and passes with the special processing.

expect(results_without_blank[:uri]).to eq 'http://id.loc.gov/authorities/subjects/sh85118553'
end
end
end

Expand Down