
Support multiple names for gnr_resolve() #12

Closed
lyttonhao opened this issue Mar 9, 2015 · 17 comments · Fixed by #19

Comments

@lyttonhao
Contributor

Since the return line in gnr.py only returns the first result, the current gnr_resolve doesn't support returning results for multiple names. I changed this line to return all results. It works well when the query contains about 100 names, but fails with "No JSON object could be decoded" when the number is larger. I haven't fixed it. Can anyone help?

@sckott
Owner

sckott commented Mar 9, 2015

hi @lyttonhao I'll take a look later today...

@lyttonhao
Contributor Author

Thanks. @sckott

@sckott
Owner

sckott commented Mar 11, 2015

@lyttonhao I used the fix from your fork for parsing more than one result, and fixed it so that it works with more than one name passed in.

Can you share the example that was failing for you?

@lyttonhao
Contributor Author

Okay. I will test the new code soon. Thanks, @sckott.

@lyttonhao
Contributor Author

Hi @sckott, I think there is still a problem like the one I faced before. When I test 300 names it works well, but it fails when querying 500 or more names. It seems the query parameters should not be too long.

@sckott
Owner

sckott commented Mar 15, 2015

See the documentation for the API at http://resolver.globalnames.org/api; they allow GET and POST requests. I don't think the docs say so, but I found in https://github.com/ropensci/taxize/blob/master/R/gnr_resolve.R#L23-L25 that 300 names work okay with GET, but beyond that POST is better.

It should be a simple thing to add POST if you are interested.
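The rule of thumb in the comment above (GET up to roughly 300 names, POST beyond that) can be sketched as a small dispatcher. This is illustrative only: http_method is a hypothetical helper name, and the 300-name cutoff comes from the taxize source linked above rather than anything the GNR docs guarantee.

```python
def http_method(names, get_limit=300):
    """Pick GET for short queries, POST for long ones.

    The 300-name cutoff follows the heuristic used in taxize's
    gnr_resolve; it is not a documented API limit.
    """
    return 'get' if len(names) <= get_limit else 'post'
```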

@lyttonhao
Contributor Author

Hi @sckott, I've added some code to use POST, following https://github.com/ropensci/taxize/blob/master/R/gnr_resolve.R#L86-L97. Below is my corresponding code:

    elif http == 'post':
        with open('__gnr_names.txt', 'wb') as f:
            for name in names:
                f.write("1|%s\n" % name)
        payload = {'data_source_ids': source, 'format': format,
                   'resolve_once': resolve_once, 'with_context': with_context,
                   'best_match_only': best_match_only, 'header_only': header_only,
                   'preferred_data_sources': preferred_data_sources}
        out = requests.post(url, params=payload,
                            files={'file': open('__gnr_names.txt', 'rb')})
        out.raise_for_status()
        result_json = out.json()
        newurl = result_json['url']
        while result_json['status'] == 'working':
            # print result_json['message']
            out = requests.get(url=newurl)
            result_json = out.json()

However, it seems that the loop while result_json['status'] == 'working': never terminates. Can you give some advice? Thank you very much.

@sckott
Owner

sckott commented Mar 23, 2015

@lyttonhao I'll have a look soon; trying to get testing and CI set up first, so we can have checks on all changes/PRs, etc.

@sckott
Owner

sckott commented Mar 24, 2015

@lyttonhao That while loop is used because a POST request returns a URL for a job that is processing; you then need to send a new GET request to retrieve the data. So the while loop keeps pinging the server until it retrieves the data itself, not just a message saying the job is still working. Does that make sense? Send a PR when you think you've got it solved, or even if you don't, and I can take a look and see if I can help.
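The polling pattern described in the comment above can be sketched as below. This is a hedged illustration, not the library's actual code: poll_until_done is a hypothetical helper, the 'status'/'data' keys follow the JSON shape shown earlier in the thread, and the fetch callable stands in for requests.get(newurl).json() so the loop can be shown without network access. A sleep between polls and a retry cap keep it from spinning forever, which is the failure mode discussed here.

```python
import time

def poll_until_done(fetch, delay=2.0, max_tries=30):
    """Repeatedly call fetch() until the job reports a result.

    fetch: a zero-argument callable returning the parsed JSON dict,
           e.g. lambda: requests.get(newurl).json()
    Raises RuntimeError if the job never leaves 'working' status.
    """
    for _ in range(max_tries):
        result = fetch()
        # Finished jobs carry a 'data' key; unfinished ones report 'working'.
        if result.get('status') != 'working' and 'data' in result:
            return result
        time.sleep(delay)
    raise RuntimeError("GNR job still 'working' after %d tries" % max_tries)
```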

@lyttonhao
Contributor Author

Hi @sckott, I'm very sorry I missed your message these last few days. Do you mean I should change the while condition? I've changed it to while not 'data' in result_json:, but it still doesn't seem to work.
Hi @panks, I tried adding time.sleep(10) as in your code. I'm afraid it still loops forever on my machine. Does it work on your machine when the number of queried names is larger than 1000?

@panks
Contributor

panks commented Mar 29, 2015

@lyttonhao I'm not sure. I think when the GNR API starts operating in a queue it's not working as it's supposed to. Here is the URL response from a job I submitted more than 6 hours ago, for query size = 1,010:
http://resolver.globalnames.org/name_resolvers/5jyg8wkhbvoa.json

It still shows the status as 'working'. Maybe they need to fix things on their end. But at least we got it working for query sizes > 300 but < 1000 by adding POST.

@sckott Any ideas?

@lyttonhao
Contributor Author

@panks I also suspect the back-end code has some bugs. Since I haven't used R before, I haven't tested taxize. @sckott Does it work well in taxize?

sckott added a commit that referenced this issue Mar 30, 2015
Support multiple names for gnr_resolve(), Add POST. Fixes #12. Resolution from TNRS Fixes #8.
@sckott
Owner

sckott commented Mar 31, 2015

@lyttonhao @panks I'll take a look at this

@panks
Contributor

panks commented Mar 31, 2015

If there isn't any hope of it working, one thing we can do is split lists of size > 1000 into smaller chunks and concatenate their results.

@sckott
Owner

sckott commented Mar 31, 2015

@panks @lyttonhao I just played with this in R, and it seems that when the number of names is > 1000 the job never finishes. I'm asking about this now; we should probably not pass more than 1000 names, so just break the list into chunks of < 1000 and pass those.

@sckott
Owner

sckott commented Mar 31, 2015

see GlobalNamesArchitecture/gni#37

@panks
Contributor

panks commented Mar 31, 2015

Yeah, I guess splitting the list is the best way to go for now. I will do that and send a PR. Thanks!
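The chunking approach agreed on above can be sketched as follows. This is illustrative, not the actual PR: chunk_names and resolve_in_chunks are hypothetical names, the 999-name default reflects the "< 1000" limit discussed in the thread, and the resolve callable stands in for the real per-chunk gnr_resolve request logic.

```python
def chunk_names(names, size=999):
    """Split a list of names into chunks below the ~1000-name limit."""
    return [names[i:i + size] for i in range(0, len(names), size)]

def resolve_in_chunks(names, resolve, size=999):
    """Resolve names chunk by chunk and concatenate the results.

    resolve: a callable taking a list of names and returning a list
    of results, one request per chunk.
    """
    results = []
    for chunk in chunk_names(names, size):
        results.extend(resolve(chunk))
    return results
```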
