Constructor is potentially dangerous #203

peterbe · 2019-05-15T15:53:25Z

We use PyQuery internally in our system to parse rendered HTML strings that we store in the database to be displayed in a Django web response. Essentially the code looks a bit like this:

def render_document(document):
    parsed = PyQuery(document.rendered_html_string)
    ...

But if untrusted users can cause that document.rendered_html_string to become a string that looks something like this:

http://internaldomain/api/get_users/dangerous

Then, it becomes...

def render_document(document):
    parsed = PyQuery('http://internaldomain/api/get_users/dangerous')
    ...

This causes PyQuery to call requests.get('http://internaldomain/api/get_users/dangerous'), which could be a big security risk.

It happens because of the constructor being too "naive".
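To make the hazard concrete, here is a minimal sketch of a guard that rejects URL-looking input before it ever reaches a parser. It is not PyQuery's actual detection logic: `looks_like_url` and `require_markup` are hypothetical names, and the prefix check simply assumes that strings starting with http:// or https:// are the risky ones.

```python
def looks_like_url(text):
    """Return True if text starts with an http(s) scheme (assumed risky prefixes)."""
    return text.lstrip().lower().startswith(("http://", "https://"))


def require_markup(text):
    """Hypothetical guard: refuse input that could be mistaken for a URL."""
    if looks_like_url(text):
        raise ValueError("expected HTML markup, got something that looks like a URL")
    return text
```

A caller would run `require_markup(document.rendered_html_string)` before handing the string to the parser, so a stored value such as the internal URL above fails loudly instead of triggering a request.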

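In our app, we worked around this by wrapping the PyQuery constructor in a helper that does something like this:

from pyquery import PyQuery

def safer_pyquery(html, *args, **kwargs):
    # SIMPLIFIED FOR EXAMPLE
    # args is a tuple, so the first argument can't be reassigned in place;
    # prepend the space to a named parameter instead.
    return PyQuery(' ' + html, *args, **kwargs)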

Ideally, we'd love to be able to always invoke PyQuery like this:

def render_document(document):
    parsed = PyQuery(html=document.rendered_html_string)
    ...

That would be similar in style to pq(url='http://google.com/'), as mentioned in the Quickstart documentation.


gawel · 2019-05-15T16:18:52Z

I have no problem with that. Feel free to provide a PR if you want.

Something like this should be enough:

if 'html' in kwargs:
    kwargs['data'] = kwargs.pop('html')
# elif ...: fall through to the current mess (the existing argument handling)

jcushman · 2021-07-20T19:37:16Z

@gawel, what would you think of changing the default behavior of PyQuery to not look for a url, and only fetch url contents if the user explicitly provides pq(url='http://google.com/')?

This would require a new version number, because it's a breaking change, but I think it would be worth it for the security benefits. Most users calling pq(text) probably don't want it to make a web request if the text happens to start with 'http', and if users do want that it's easy enough to add url=.
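As a rough illustration of the behavior being proposed (not the code from any PR), the sketch below treats positional input as markup unconditionally and only fetches when `url=` is passed. `load` and `fake_fetch` are hypothetical names, and the fetcher is injected so the example makes no real network request.

```python
def load(markup=None, *, url=None, fetch=None):
    """Hypothetical loader: positional input is always markup; only url= fetches."""
    if url is not None:
        if markup is not None:
            raise TypeError("pass either markup or url=, not both")
        if fetch is None:
            raise TypeError("url= requires a fetch callable")
        return fetch(url)
    if markup is None:
        raise TypeError("nothing to load")
    # No sniffing: the text is returned as-is, even if it starts with 'http'.
    return markup


requested = []

def fake_fetch(url):
    # Stand-in for a real HTTP client; records the URL instead of requesting it.
    requested.append(url)
    return "<html>fetched</html>"

as_text = load("http://internaldomain/api/get_users/dangerous")
as_page = load(url="http://google.com/", fetch=fake_fetch)
```

With this shape, `as_text` is just the original string and nothing is requested for it; only the explicit `url=` call reaches `fake_fetch`.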

If this sounds good to you I'm happy to send a PR!

gawel · 2021-08-05T15:07:43Z

@jcushman seems fine. go for it

jcushman mentioned this issue Aug 5, 2021

Require url inputs to be explicit #222

Merged

gawel closed this as completed in #222 Aug 6, 2021
