Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs #1387

Closed
simonw opened this issue Jul 2, 2021 · 8 comments
Closed
Labels

Comments

@simonw
Copy link
Owner

simonw commented Jul 2, 2021

Reported in the wild on https://ilsweb.cincinnatilibrary.org/collection-analysis/current_collection-3d4a4b7/bib?_facet=bib_level_callnumber - the "next page" link links to https://127.0.0.1:8010/collection-analysis/current_collection-3d4a4b7/bib?_facet=bib_level_callnumber&_next=100

That installation uses "base_url": "/collection-analysis/"

Weirdly all of the other links on that page - to facet results, sort orders, row permalinks etc - work fine. It's JUST the next_url one that is broken.

Also broken in their JSON: https://ilsweb.cincinnatilibrary.org/collection-analysis/current_collection-3d4a4b7/bib.json?_size=1 returns

  "suggested_facets": [],
  "next": "1",
  "next_url": "https://127.0.0.1:8010/collection-analysis/current_collection-3d4a4b7/bib.json?_size=1&_next=1",
  "private": false,
@simonw simonw added the bug label Jul 2, 2021
@simonw
Copy link
Owner Author

simonw commented Jul 2, 2021

What's weird here is that the URL itself is correct - it starts with /collection-analysis/ as expected - but the hostname is wrong.

@simonw
Copy link
Owner Author

simonw commented Jul 2, 2021

Here's what's happening:

next_url = self.ds.absolute_url(
request, path_with_replaced_args(request, added_args)
)

This is being run through absolute_url() - defined here:

datasette/datasette/app.py

Lines 633 to 637 in d23a267

def absolute_url(self, request, path):
url = urllib.parse.urljoin(request.url, path)
if url.startswith("http://") and self.setting("force_https_urls"):
url = "https://" + url[len("http://") :]
return url

That's because the next_url in the JSON needs to be a full URL that a client can retrieve - as opposed to the other links on that page which are all relative links that start with /:

{% if column.name == sort %}
<a href="{{ path_with_replaced_args(request, {'_sort_desc': column.name, '_sort': None, '_next': None}) }}" rel="nofollow">{{ column.name }}&nbsp;▼</a>
{% else %}
<a href="{{ path_with_replaced_args(request, {'_sort': column.name, '_sort_desc': None, '_next': None}) }}" rel="nofollow">{{ column.name }}{% if column.name == sort_desc %}&nbsp;▲{% endif %}</a>
{% endif %}

@simonw
Copy link
Owner Author

simonw commented Jul 2, 2021

And the links to apply a facet value are broken too! https://ilsweb.cincinnatilibrary.org/collection-analysis/current_collection-3d4a4b7/bib?_facet=bib_level_callnumber

        {
          "value": "g l fiction",
          "label": "g l fiction",
          "count": 212,
          "toggle_url": "https://127.0.0.1:8010/collection-analysis/current_collection-3d4a4b7/bib.json?_facet=bib_level_callnumber&bib_level_callnumber=g+l+fiction",
          "selected": false
        }

Same problem:

facet_results_values.append(
{
"value": row["value"],
"label": expanded.get((column, row["value"]), row["value"]),
"count": row["count"],
"toggle_url": self.ds.absolute_url(
self.request, toggle_path
),
"selected": selected,
}
)

@simonw simonw changed the title Bug with combination of next_url and base_url absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs Jul 2, 2021
@simonw
Copy link
Owner Author

simonw commented Jul 2, 2021

In this case the proxy is Apache. So there are a couple of potential fixes:

  • Configure Apache to pass the original HTTP request Host: header through to the proxied application. This should then be documented.
  • Add a new optional feature to Datasette called something like base_host which, if set, is always used in place of the host in request.url when constructing new URLs.

@simonw
Copy link
Owner Author

simonw commented Jul 2, 2021

ProxyPreserveHost On is the Apache setting - it defaults to Off: https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypreservehost

@simonw
Copy link
Owner Author

simonw commented Jul 2, 2021

I'm going to add this to the suggested Apache configuration at https://docs.datasette.io/en/stable/deploying.html#apache-proxy-configuration

@simonw simonw closed this as completed in dbc61a1 Jul 2, 2021
@simonw
Copy link
Owner Author

simonw commented Jul 2, 2021

Updated documentation is here: https://docs.datasette.io/en/latest/deploying.html#apache-proxy-configuration

@rayvoelker
Copy link

Thanks Simon for nailing that one down! It does seem a little confusing that the ProxyPreservehost option is set to Off By default, but this config totally did the trick and fixed the issue

<Location /collection-analysis/>
   ProxyPass http://127.0.0.1:8010/collection-analysis/
   ProxyPreservehost On
</Location>

sthagen added a commit to sthagen/simonw-datasette that referenced this issue Jul 9, 2021
Documented ProxyPreserveHost On for Apache, closes simonw#1387
simonw added a commit that referenced this issue Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants