Collections key not found with library #105

Closed
stav opened this issue Dec 20, 2018 · 5 comments

@stav
Contributor

stav commented Dec 20, 2018

I'm curious about the difference between Collection.get() and Collection.iter(key=[KEY])

>>> key = '456/789'
>>> store = project.collections.get_store('trump')
>>> store.set({'_key': key, 'value': 'abc'})
>>> print(store.list(key=[key]))

[{'value': 'abc', '_key': '456/789'}]  # https://storage.scrapinghub.com/collections/9328/s/trump?key=456%2F789&meta=_key

>>> try:
...     print(store.get(key))
... except scrapinghub.client.exceptions.NotFound as e:
...     print(getattr(e, 'http_error', e))

404 Client Error: Not Found for url: https://storage.scrapinghub.com/collections/9328/s/trump/456/789

I assume that Collection.get() is a handy shortcut for the key-filtered .iter() function, so I guess the point of my issue is that .get() will raise an exception if given bad input, for example a key containing slashes.

@vshlapakov
Contributor

As you can see by the URLs, the functions work a bit differently, i.e. using different API endpoints.

In the first case, .list() runs the scan logic with the given parameters (an item key), and all works fine. In the second case, .get() tries to fetch a single item from the collection through a different endpoint, and it fails because there's a slash character in the middle of the key; keys without a slash work fine. The endpoint route excludes the slash character from the key name to avoid clashing with other endpoints' routes.
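To make the difference concrete, here is a minimal sketch that rebuilds the two URL shapes quoted in the report. The helper names `path_style_url` and `query_style_url` are made up for illustration and are not part of python-scrapinghub; the base URL and the `key`/`meta` parameters are copied from the URLs shown above.

```python
from urllib.parse import urlencode

# Base URL taken from the report above (project 9328, store "trump").
BASE = "https://storage.scrapinghub.com/collections/9328/s/trump"


def path_style_url(key):
    """Hypothetical helper: key appended as a path segment, as in the failing .get() request."""
    return f"{BASE}/{key}"


def query_style_url(key):
    """Hypothetical helper: key sent as a percent-encoded query parameter, as in the .list() request."""
    return f"{BASE}?{urlencode({'key': key, 'meta': '_key'})}"


key = "456/789"
print(path_style_url(key))   # .../s/trump/456/789 (the slash splits the key into two path segments)
print(query_style_url(key))  # .../s/trump?key=456%2F789&meta=_key
```

In the path form the slash inside the key is indistinguishable from a path separator, while in the query form it travels as `%2F` and the key stays in one piece.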

However, I don't see any restrictions on the collection item key format as of now, so the input is actually valid, and I'd say it's the endpoint that doesn't handle it properly. Another option is to enforce some rules on collection item keys, but that might affect existing collections, so it would have to be done very carefully. I'll create an internal ticket to discuss and handle that in the future. Thanks for the report!

@stav
Contributor Author

stav commented Dec 27, 2018

Good deal, thanks for the explanation

@stav stav closed this as completed Dec 27, 2018
@ejulio
Contributor

ejulio commented Jan 21, 2019

I think we could add some docs here, or here https://doc.scrapinghub.com/api/collections.html#access-a-record about this subject. Last week it took me a few minutes to discover that everything was ok, but the endpoint did not allow for a / in the key name.

@stav
Contributor Author

stav commented Feb 6, 2019

@ejulio
Contributor

ejulio commented Feb 7, 2019

Given @stav's PR above, I think we could work on a fix for .get() so that it always queries using the query string instead of pretty URLs. What do you think?
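Until something like that lands, a caller-side workaround could look roughly like the sketch below. It relies only on `store.list(key=[...])`, which the original report shows working for keys with slashes. `get_by_key` and `FakeStore` are hypothetical names: the first is an illustrative wrapper, the second an in-memory stand-in so the snippet runs without network access. Returning the whole item (including `_key`) and raising `KeyError` on a miss are assumptions of this sketch, not the library's behavior.

```python
def get_by_key(store, key):
    """Hypothetical wrapper: look up one item via the key-filtered scan
    instead of the per-item route, so keys containing '/' still resolve."""
    items = store.list(key=[key])
    if not items:
        # Assumption: signal a miss with KeyError rather than the client's NotFound.
        raise KeyError(key)
    return items[0]


class FakeStore:
    """In-memory stand-in (not the real client) mimicking set() and list(key=[...])."""

    def __init__(self):
        self._items = {}

    def set(self, item):
        self._items[item["_key"]] = item

    def list(self, key=None):
        keys = key if key is not None else list(self._items)
        return [self._items[k] for k in keys if k in self._items]


store = FakeStore()
store.set({"_key": "456/789", "value": "abc"})
print(get_by_key(store, "456/789"))  # {'_key': '456/789', 'value': 'abc'}
```

With a real store object, the same `get_by_key(store, key)` call should go through the query-string endpoint, at the cost of a scan request instead of a direct lookup.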
