Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web of Science queries #223

Merged
merged 10 commits into from
Oct 5, 2017
Merged

Web of Science queries #223

merged 10 commits into from
Oct 5, 2017

Conversation

dazza-codes
Copy link
Contributor

@dazza-codes dazza-codes commented Sep 27, 2017

Fix #197

Provides query functionality for all the new WoS SOAP API endpoints

TODO:

Related issues:

Integration test

Use bundle exec rails c and, when you have the WoS credentials in place, run

wos_queries = WosQueries.new(WosClient.new(Settings.WOS.AUTH_CODE, Settings.WOS.LOG_LEVEL.to_sym))
records = wos_queries.search_by_name('Altman, Russ')
records.count #=> 464 (matched the WoS total records available)
records.uids # an array of the UID values
records.print # long listing of all the records XML

class WosQueries

# Default database 'WOK' is an umbrella for everything
DATABASE = 'WOK'.freeze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need a constant here, just put the string as the default in initialize. That makes it part of the YARD.

attr_reader :wos_client

# @param wos_client [WosClient] a Web Of Science client
# @param database [String] a WOS database identifier (default 'WOS')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default now WOK

@dazza-codes dazza-codes force-pushed the wos-queries branch 16 times, most recently from e3bd992 to d8876a1 Compare October 3, 2017 00:39
@dazza-codes
Copy link
Contributor Author

dazza-codes commented Oct 3, 2017

@dazza-codes dazza-codes force-pushed the wos-queries branch 2 times, most recently from 3678a3b to 2398dbd Compare October 3, 2017 17:18
@dazza-codes
Copy link
Contributor Author

I've worked around the Nokogiri issues related to immutability, by using the Document.dup deep copy method.

@dazza-codes dazza-codes changed the title [WIP] WoS queries Web of Science queries Oct 3, 2017
# @return [Hash] citingArticles parameters
def citing_articles_params(uid)
{
databaseId: @database,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should have an attr_reader :database and use it in these methods instead of referencing the instance variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# @param uids [Array<String>] a list of WOS UIDs
# @return [Hash] retrieveById parameters
def retrieve_by_id_params(uids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ton of repetition here in the params methods. Would it make sense to just have a base_params method (that looks exactly like the body of this one) and then invoke:

base_params.merge(different_key: different_val)

in the places where individual key/val pairs need to be added or overridden? This would put the focus on what is unique to each set of params rather than repeating all the things that are in common.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look closely at it, there are slight differences in each set of params. Given that we know the request XML is order-sensitive (right? why? weird, if so, but yea ...), doesn't this make it difficult to use a Hash.merge strategy with any confidence? I think we should be glad that savon allows us to use Hash params. These methods have already been refactored a bit to allow for some existing use-case flexibility; we might discover more as we go. I'm open to exploring whether or not a Hash.merge strategy might work and will try it out soon. If it's not a blocker right now, maybe we can move on and adapt as we go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I acknowledged those slight differences in my comment. My intent is to have a common starting point, then merge/reject from there. This would make it obvious what each of those differences is, I.E., what is special about this method.

The ordered XML thing does not apply since we are still giving the Savon gem a Hash. It figures out the order now, and it would figure out the order the same way, since we would be passing it exactly the same param. This is just about how we build the hash and how much we repeat ourselves. In general, this codebase has suffered from lack of facility manipulating ruby Hash, so I would like to start repairing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another look at this.

Copy link
Contributor

@atz atz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments and a couple requests.

describe '#to_xml' do
it 'works with encoded records' do
result = wos_records_encoded.to_xml
expect(result).to be_an String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a stronger expectation here than just "gimme a String"? Since we are dealing with encoding, it is sorta important that not all strings are equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an XSD validation for the record data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at least expect it not to contain &lt; or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think full XSD is required. Just look for something that you expect the encoding to adjust (or not adjust) and target it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

describe '#decode_records' do
it 'works' do
result = wos_records_encoded.send(:decode_records)
expect(result).to be_an String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

unless uid_intersection.empty?
# create a new set of records, use a philosophy of immutability
# Nokogiri::XML::NodeSet enumerable methods do not return new objects
# Nokogiri::XML::Document.dup is a deep copy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the hangup with Nokogiri is unsurprising and doesn't require a bunch of commentary. If you want to build a nodeset from existing nodes without retaining any connection back to the parent document (which all nodes have), then you can round-trip the node through serialization .to_xml and back into a new Nokogiri XML node/fragment. This would reduce the memory utilization by not copying the whole document.

Copy link
Contributor

@atz atz Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it is not a bug (in Nokogiri).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss this in person, if necessary, as I like it as it is. I hope it's not a blocker. The round trip via .to_xml will not gain anything, as the doc only contains records anyway. The explanation fills a gap in the Nokogiri documentation and I believe will serve as a note to prevent later refactoring that could easily mistakenly introduce violations of the immutability desired (which should be caught by specs, but might not be understood without this commentary).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The round trip is what divorces it from the parent document (since it becomes just a String value, with no ability to point to location of the originating document in Ruby memory). The separation is the basis of your motivation to .dup. I agree this may benefit from better docs, but it is not unexpected behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get how the .to_xml works. What I'm saying is that the doc just contains REC elements, so there's nothing else duped by the doc.dup anyway, so there's not much gained from all the extra serialization transforms involved in the .to_xml; it will just slow things down and that might be worse that a faster doc.dup when collecting 100's of records.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there may be a computational tradeoff for the benefit of smaller memory footprint. We can defer on this one for later.


# @param name [String] a CSV name pattern: {last name}, {first_name} [{middle_name} | {middle initial}]
# @return [WosRecords]
def search_by_name(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have these 5 public methods that are effectively aliases for private ones? Put the body of the private methods here and remove the "collator" methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to move some stuff where this is done in #242 already, into this PR. This pattern of using a collator emerged from doing one request first and then another and another and then noticing that the collators themselves could be refactored, until they became so small that they, well, don't need to exist any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

atz
atz previously requested changes Oct 5, 2017
# @param record_setB [WosRecords]
# @return [Set] duplicate WoS UID values
def duplicate_uids(record_setB)
uids.to_set.intersection(record_setB.uids.to_set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array already has an intersection operator in Ruby, the &. So without the need to convert to_set, this should be uids & record_setB.uids. Intersection also deduplicates multiples and preserves order!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll try that, good tip.

Copy link
Contributor Author

@dazza-codes dazza-codes Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this works for me:

> x = [*1..10]
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
> y = [*1..10].reject {|i| i % 2 == 0 }.reverse
=> [9, 7, 5, 3, 1]
> x & y
=> [1, 3, 5, 7, 9]
> x.reverse & y
=> [9, 7, 5, 3, 1]
> x.reverse & y.reverse
=> [9, 7, 5, 3, 1]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@dazza-codes
Copy link
Contributor Author

Review changes pushed in separate commits to clearly identify how each commit addressed the review concerns.

@dazza-codes dazza-codes dismissed atz’s stale review October 5, 2017 04:41

Most of the existing review concerns are fixed, let's review again.

atz
atz previously requested changes Oct 5, 2017
describe 'Enumerable mixin' do
it 'responds to Enumerable.instance_methods' do
methods = Enumerable.instance_methods
methods.each { |method| expect(wos_records_encoded).to respond_to(method) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to enumerate methods individually, because that is how Ruby inheritance works. Please replace this spec with described_class.is_a?(Enumerable) or consider removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@atz
Copy link
Contributor

atz commented Oct 5, 2017

Thanks for the responses.

The .dup and Hash composition comments are not blockers, as long as you are open to their revision in the future. In that regard, I would as: are you comfortable enough with the tests coverage to permit that?

Would appreciate attention to one outstanding item re: Enumerable testing.

@atz atz dismissed their stale review October 5, 2017 22:24

Accommodated review

@atz atz merged commit febbefb into master Oct 5, 2017
@atz atz removed the in progress label Oct 5, 2017
@atz atz deleted the wos-queries branch October 5, 2017 22:57
@dazza-codes dazza-codes self-assigned this Oct 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants