
Switch to depend on requests library #129

Closed
peterbe opened this issue Jun 30, 2015 · 19 comments

Comments

@peterbe
Owner

peterbe commented Jun 30, 2015

I have had more than one problem with the built-in urlopen() when downloading certain URLs.
This one, for example, fails with SSL trouble in urlopen but opens fine with requests from my OS X laptop.

I thought Python 3 was supposed to have requests built in, so maybe this is only necessary for Python 2.
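
For context, the swap being discussed is roughly the following. A minimal sketch only, with a placeholder URL rather than the failing one from the report:

from urllib.request import urlopen  # urllib2.urlopen on Python 2

import requests

url = 'https://example.com/theme.css'  # placeholder

# Built-in opener: the call that trips over SSL for some hosts
css_via_urlopen = urlopen(url).read().decode('utf-8')

# requests: handles SSL, redirects and gzip, and decodes to unicode
css_via_requests = requests.get(url).text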

@peterbe
Owner Author

peterbe commented Jun 30, 2015

What do you think @graingert @elidickinson @ionelmc ?

@graingert
Contributor

python3 doesn't have requests built in.

@peterbe
Owner Author

peterbe commented Jun 30, 2015

python3 doesn't have requests built in.

So that was just a rumor I heard, then. Well, that makes it easier: we just depend on it as a new external dependency.

@graingert
Contributor

I think you should go for something like http://weasyprint.org/docs/tutorial/#url-fetchers

@peterbe
Owner Author

peterbe commented Jun 30, 2015

Interesting, but I'm not sure I follow. How is that better or different from requests?

@graingert
Contributor

Their default implementation looks pretty robust:

https://github.com/Kozea/WeasyPrint/blob/2d7f68e8a17821374d3974ea987710eecae586ad/weasyprint/urls.py#L241

And users of this library can override it as they see fit. E.g. for my site I check for local static URLs:

from urllib.parse import urlparse

import weasyprint

def my_fetcher(url):
    # Serve /static/ URLs via the site's own helper, everything else via WeasyPrint's default fetcher.
    if url.startswith('/static/'):
        return django_static_file_finder(urlparse(url).path)
    else:
        return weasyprint.default_url_fetcher(url)

weasyprint.HTML(doc, url_fetcher=my_fetcher)

@graingert
Contributor

You could also create a requests implementation:

try:
    import requests
except ImportError:
    # urllib-based fallback when requests isn't installed
    def _default_url_fetcher(url):
        ...
else:
    session = requests.Session()

    def _default_url_fetcher(url):
        ...
        fileobj = session.get(url, stream=True).raw
        ...


# For autocomplete and static analysis etc.
def default_url_fetcher(url):
    return _default_url_fetcher(url)


def premailer(doc, url_fetcher=default_url_fetcher):
    ...

@graingert
Contributor

I'd also add an argument for an HTTP 'Accept' header, so that for:

 <link rel="stylesheet" type="text/css" href="https://example.com/theme.css">

You'd call:

default_url_fetcher(url='https://example.com/theme.css', mimetype='text/css')
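
One possible shape for that, as a hedged sketch built on requests; the fetcher name and the mimetype-to-Accept mapping are illustrative, not an agreed premailer API:

import requests

def default_url_fetcher(url, mimetype=None):
    # Ask the server for the document type the referencing tag declared,
    # falling back to "anything" when no mimetype is known.
    headers = {'Accept': mimetype or '*/*'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.text

# For the <link> tag above you would call:
css = default_url_fetcher('https://example.com/theme.css', mimetype='text/css')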

@ionelmc
Contributor

ionelmc commented Jul 1, 2015

I think it's fine to have some default_url_fetcher abstraction in case someone wants to use something other than requests.

@peterbe
Owner Author

peterbe commented Jul 1, 2015

But why do you need all of that? What's wrong with simply doing requests.get('https://example.com/theme.css').text? That'll get the CSS payload as a unicode string. And if the response is gzipped, requests will take care of that.

@graingert
Contributor

The Accept header has nothing to do with gzip. Servers can return different documents depending on the Accept header.

@peterbe
Owner Author

peterbe commented Jul 1, 2015

But is that likely? Have you ever done urlopen('some.com/url.css') and gotten the wrong content back because of headers? Note, in premailer we only use urlopen to download external stylesheets.

(Sorry if I sound like I'm "fighting". I'm just trying to learn, and to keep things simple :) at the same time.)

@graingert
Contributor

It's just part of the HTTP spec. I've seen some sites render their entire page off of '/' using the Accept header.

@peterbe
Owner Author

peterbe commented Jul 1, 2015

Haha! Well, they're nuts! :)

@graingert
Contributor

What was the consensus on this?

@peterbe
Owner Author

peterbe commented Jul 22, 2015

My view is that anything overly complex is going to mean more code to maintain. A switch to relying on requests is straightforward, and it's an established, well-regarded package. It just swaps out urllib.urlopen for a different opener, and that's about it. It doesn't try to solve other problems as well.
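
In code terms, the change described here is roughly the following. A sketch only; the helper name is illustrative and not premailer's actual internals:

import requests

def _download_stylesheet(url):
    # Replaces the old urllib.urlopen(url).read() call.
    # requests takes care of SSL, redirects, gzip and charset decoding.
    response = requests.get(url)
    response.raise_for_status()
    return response.text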

@alexphelps

I just ran into this bug today. This is the error I get when using an SSL (https) URL for the CSS file. The only fix I could find was to not use SSL for that URL, which makes configuring my app harder because all other pages, files, etc. need SSL.

URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error>

@peterbe
Owner Author

peterbe commented Sep 29, 2015

@alexphelps I bet if you take that same URL (the https one) and try to do requests.get(that_url) you'll find that it works.
@graingert is working on a fancy wrapper API on top of requests.get but if he doesn't have time, which is fine, we'll just replace urllib with requests.

@p12tic
Contributor

p12tic commented Oct 4, 2018

@peterbe: I think this issue can be closed.

@peterbe peterbe closed this as completed Oct 4, 2018