Added handling for institutions that have ampersands when querying pu… #1060

Merged
merged 1 commit into from
May 17, 2019

justinlittman
Contributor

…bmed.

closes #1059

end

it 'generates the correct term string' do
  expect(query_author.send(:term)).to eq('((Altman Russ[Author]) OR (Altman R[Author])) AND (Stanford University[Affiliation] OR William Mary[Affiliation] OR William and Mary[Affiliation])')
end
Contributor

Are we confident the extra "and" isn't going to confuse the query parser? Is there a way to designate "William and Mary" as one token rather than three?

Contributor Author

I determined this approach by testing, as described in #1059. That isn't to say there aren't other ways of formulating the query.

Contributor

I see we're doing \"oregon\"[MeSH Terms]; would it be possible to do \"William & Mary\"[Affiliation]?

Contributor Author

I tried several variations (escaped quoting, escaping the ampersand, and using parentheses), all to no avail.
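
For reference, the ampersand handling that the spec above exercises might be sketched as follows. This is a minimal sketch, not the code merged in this PR: `affiliation_variants` and `affiliation_clause` are hypothetical helper names, and the only behavior assumed is what the expected term string shows, namely that an institution containing "&" is searched twice, once with the ampersand dropped and once with it spelled out as "and".

```ruby
# Hypothetical helper (not from the PR): expand an institution name into
# the variants to search. Names without an ampersand pass through unchanged.
def affiliation_variants(name)
  return [name] unless name.include?('&')

  # Variant 1: drop the ampersand; variant 2: spell it out as "and".
  dropped = name.gsub('&', ' ').squeeze(' ').strip
  spelled = name.gsub('&', ' and ').squeeze(' ').strip
  [dropped, spelled]
end

# Hypothetical helper: join every variant into one parenthesized OR clause,
# tagging each with the [Affiliation] field.
def affiliation_clause(institutions)
  terms = institutions.flat_map { |name| affiliation_variants(name) }
  "(#{terms.map { |term| "#{term}[Affiliation]" }.join(' OR ')})"
end

puts affiliation_clause(['Stanford University', 'William & Mary'])
```

With those two inputs the sketch prints the same affiliation clause the spec expects: `(Stanford University[Affiliation] OR William Mary[Affiliation] OR William and Mary[Affiliation])`.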

Member

@peetucket peetucket left a comment

It appears we already have some code doing something similar for the WoS search, in this class: https://github.com/sul-dlss/sul_pub/blob/master/lib/agent/author_institution.rb

It strips words like "and" and "university", and it is used here to construct the list of institutions added to the query:

https://github.com/sul-dlss/sul_pub/blob/master/lib/web_of_science/query_author.rb#L40-L42

Thoughts on re-using this logic? I believe we ended up stripping "University", "Institution", and "College" in WoS queries for a similar reason (the query was picking up extra stuff), though that may not be a problem for PubMed. I wanted to acknowledge the bit of duplication here for consideration.

@peetucket
Member

author = Author.find(37959)
WebOfScience::QueryAuthor.new(author).send(:institutions)
=> ["stanford", "oregon health & science", "washington"]
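
The kind of normalization described in the comment above could look roughly like this. It is an illustrative sketch only, not the actual code in lib/agent/author_institution.rb: `normalize_institution` and `STRIPPED_WORDS` are hypothetical names, the word list is assumed from the words mentioned in this thread, and the input names are assumed examples rather than the author's real records.

```ruby
# Hypothetical word list, assumed from the review comment; the real class
# may strip a different set of words.
STRIPPED_WORDS = %w[university institution college].freeze

# Hypothetical helper: lowercase an institution name and drop generic words.
# An ampersand is left in place, as in the console output above.
def normalize_institution(name)
  words = name.downcase.split
  words.reject { |word| STRIPPED_WORDS.include?(word) }.join(' ')
end

puts normalize_institution('Stanford University')
puts normalize_institution('Oregon Health & Science University')
```

Under these assumptions the two calls print `stanford` and `oregon health & science`. Because the ampersand survives this kind of normalization, a PubMed query that re-used it would still need separate ampersand handling.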

Successfully merging this pull request may close these issues.

Determine why pubmed harvester is creating lots of publications
3 participants