Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote ref parsing #61

Closed
gazpachoking opened this issue Feb 15, 2013 · 19 comments
Closed

remote ref parsing #61

gazpachoking opened this issue Feb 15, 2013 · 19 comments
Labels
Bug Something doesn't work the way it should.

Comments

@gazpachoking
Copy link
Contributor

There is currently a problem with the remote ref parsing (at least on python 3.) urlopen returns bytes, which json.load doesn't like; he wants a string.

@gazpachoking
Copy link
Contributor Author

The unit test doesn't have a problem, because the mock urlopen returns a string.

@gazpachoking
Copy link
Contributor Author

I think the correct method would be to check headers for an encoding first. If it is not specified there, the json spec says that the encoding will either be utf-8 16 or 32. It's tempting to just require requests for remote ref support, here are the relevant parts from their project. getting encoding from headers, guessing utf version

@Julian
Copy link
Member

Julian commented Feb 17, 2013

Ouch :/. OK. I'm fine with adding a dependency on requests.

We will need to check that requests supports all the reasonable URI schemes that urlopen does.

@gazpachoking
Copy link
Contributor Author

requests does not support file or ftp schemes. :/ Is that something we want to work as well?

@Julian
Copy link
Member

Julian commented Feb 17, 2013

Um. I guess having a way to add handlers for schemes like I originally envisioned would be nice if it doesn't..

@gazpachoking
Copy link
Contributor Author

What were you envisioning? Requests does have support for adding handlers for different schemas, maybe we could implement your idea using that.

@gazpachoking
Copy link
Contributor Author

In fact, in FlexGet, we handle transparent file: and ftp: schema support by just farming it out to urllib from requests.
EDIT: relevant code https://github.com/Flexget/Flexget/blob/master/flexget/utils/requests.py#L76 (FileAdapter is just above)
We aren't actually doing ftp, but adding that as a schema is just one line extra if urllib does that as well. We'd end up adding extra code anyway for that stuff, so it's also not optimal, it just becomes a question of which extra code is more desirable to include.

@Julian
Copy link
Member

Julian commented Feb 17, 2013

All I meant was that it'd be nice to do things like:

resolver = RefResolver(handlers={"ftp" : ftp_retriever})

and have that resolver's resolve_remote now call ftp_retriever to retrieve the bytes it needs. I don't know if I like the way that looks (having everything routed through requests). So maybe we should just borrow / write the encoding-guessing stuff.

@gazpachoking
Copy link
Contributor Author

Sounds good, I'll put something together. Do you think handlers should be defined on the instance or for the whole class?

@gazpachoking
Copy link
Contributor Author

Also, what should happen when a refresolver encounters a scheme it cannot handle? Raise a SchemaError?

@Julian
Copy link
Member

Julian commented Feb 17, 2013

No preference. Probably will be easier on the instance though.

@Julian
Copy link
Member

Julian commented Feb 17, 2013

A RefResolutionError

@gazpachoking
Copy link
Contributor Author

Oh shit, didn't even see that one, much better.

@gazpachoking
Copy link
Contributor Author

Right now I have it in the same style as FormatChecker, i.e.

try:
    import requests
    # We need requests >= 1.0.0
    if requests.__version__.split(".") < ["1"]:
        raise ImportError
except ImportError:
    pass
else:
    @RefResolver.handles("http")
    @RefResolver.handles("https")
    def handle_http_ref(uri):
        """
        Retrieves and parses json from ``uri``.
        Only handles http and https uri scheme.

        :argument str uri: the URI to resolve
        :returns: the retrieved document

        """

        return requests.get(uri).json()

That sound okay?

@Julian
Copy link
Member

Julian commented Feb 17, 2013

I feel uncomfortable depending on requests for this and doing absolutely nothing without it.

Can we optionally depend on requests to automagically detect encoding, otherwise fall back to urllib and just assume UTF-8, and of course allow someone to replace urllib? So essentially this but rather than doing nothing without requests, try as best we can.

Also I'm not sure we need such a heavy handed interface for this one. Creating new formats seemed like a common enough thing to want, so I went with the decorators, but here it's probably enough to just go with passing a dict to __init__ dontcha think?

@Julian
Copy link
Member

Julian commented Feb 17, 2013

Oh and btw I forget if I've mentioned that we have some docs now :) So it'd be great if we doc'ed this when we settle on something.

@gazpachoking
Copy link
Contributor Author

Sounds good on both counts, I'll fix that up then send a PR for you to check out.

@Julian
Copy link
Member

Julian commented Feb 17, 2013

Oh and -- for the ones requests / urllib support, it's probably unnecessary to specify them explicitly for each scheme I guess. Just fall back to requests/urllib if nothing matches the scheme, and allow people to add additional ones or superceed them by being explicit. Sound reasonable?

@gazpachoking
Copy link
Contributor Author

Sounds good to me.

gazpachoking added a commit to gazpachoking/jsonschema that referenced this issue Feb 17, 2013
Add ability to specify handlers for different uri schemes in RefResolver
Add unit tests for new features
@Julian Julian closed this as completed Feb 18, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something doesn't work the way it should.
Projects
None yet
Development

No branches or pull requests

2 participants