Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let OJS honor HTTP_X_FORWARDED_PROTO instead of trying to discover it #6851

Open
marcbria opened this issue Mar 10, 2021 · 13 comments
Open

Let OJS honor HTTP_X_FORWARDED_PROTO instead of trying to discover it #6851

marcbria opened this issue Mar 10, 2021 · 13 comments
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.

Comments

@marcbria
Copy link
Collaborator

marcbria commented Mar 10, 2021

Describe the problem you would like to solve

Over httpS, when you are behind a reverse proxy/loadbalancer or on complex networks architectures OJS have some trouble to discover the right protocol and it could generate some different issues like:

  • Infinite redirection loops (ERR_TOO_MANY_REDIRECTS)
  • Mixed content (http & httpS).

If httpS in forced in function getProtcolol() at "lib/pkp/classes/core/PKPRequest.inc.php" all problems are gone...

Describe the solution you'd like

This is supossed to be fixed with protocol relative URLs (allowProtocolRelative directive) but this is not yet full implemented in OJS... and (according to Alec) "We can't just enable protocol-relative URLs globally because of URLs that are delivered to outside mechanisms, e.g. API interactions, emails, etc."

So an alternative (and also to make OJS more flexible) the easiest solution is letting OJS honor the protocol specified at HTTP_X_FORWARDED_PROTO (if exists) instead of keep asking about the protocol to SCRIPT_URI.

It could be done here:

$this->_protocol = (!isset($_SERVER['HTTPS']) || strtolower_codesafe($_SERVER['HTTPS']) != 'on') ? 'http' : 'https';

Adding a condition to check if "HTTP_X_FORWARDED_PROTO" exists, and if yes, set the protocol to be whatever you find in the variable.

What I tried?

In my specific case, the problem was a reverse proxy (traefik) converting http petitions to httpS, so I tried with:

  • base_url with http.
  • base_url with httpS.
  • base_url with relative protocol.
  • setting base_url[index] and/or [myJournal].
  • setting base_url to relative protocol.
  • setting force_ssl to On.

All cases except the last one, results in "Mixed content" due JS references.

Last one (force_ssl) falls in an "infinite redirection loop" because "OJS sees an HTTP request (not knowing it’s coming from the reverse-proxy) and redirects the browser to the HTTPS location. So the browser makes a new request (using HTTPS as before) to the reverse-proxy resulting in a new HTTP request to the OJS backend which redirects to HTTPS again. And so on and so forth."

When protcolo is forced to httpS in getProtocol() ("lib/pkp/classes/core/PKPRequest.inc.php") problems are gone...

Who is asking for this feature?

  • System administrators with a complex architecture.
  • People installing their OJS in some hosting companies.
  • Docker based architectures...

Additional information

Variations of this same issue are described here:

And also in the following github issues:

Example of mixed content:
imagen

@asmecher
Copy link
Member

Note that any solution that relies on request headers to determine the protocol will need to consider URL generation from command-line tools e.g. when a CRON job sends off review reminder emails containing URLs.

@marcbria
Copy link
Collaborator Author

marcbria commented Mar 10, 2021

Hummm... didn't though in this. :_-(

I'm unsure if getProtocol() will be able to get the right one based on SCRIPT_URI, but in worst of cases cron will send a mail with an http petition that will be redirected to httpS when it arrives to the server.

But I understand cron is just an example and it could be a problem in other situations.

Jumping to a different approach... why "force_ssl" is not forcing the protocol in getProtocol()?

@asmecher
Copy link
Member

Jumping to a different approach... why "force_ssl" is not forcing the protocol in getProtocol()?

The force_ssl setting is checked in an authorization policy written in PKPHandler:

https://github.com/pkp/pkp-lib/blob/stable-3_3_0/classes/handler/PKPHandler.inc.php#L275..L279

If force_ssl is set, and PKPRequest::getProtocol() doesn't report https protocol (based on $_SERVER['HTTPS']), the authorization policy recommends redirecting to https.

@marcbria
Copy link
Collaborator Author

I really thought this was going to be an easy one. :-(

Tomorrow I will take a look but because I suspect command-line tools will take the protocol properly even without headers, or we won't mind much if they are going through http...

What other features of ojs are not reading headers apart of command-line tools?

Trying to evaluate if a a non perfect implementation would be better than nothing.

@marcbria
Copy link
Collaborator Author

marcbria commented Mar 11, 2021

Checking what other CMS are doing:

In WP they are aware of the issue and offer a couple of workarounds:

image

  • Their "FORCE_SSL_ADMIN" is like our "force_ssl"... to check authorization, not protocol.
  • They don't implement "HTTP_X_FORWARDED_PROTO".

First workaround is adding a condition to their wpconfig.php to check the X_FORWARDED_PROTO.
Second solution is use relative protocol (that is full implemented in WP).

References:

In the other hand, in drupal they full honor X-FORWARDED* tags since long time ago:

https://www.drupal.org/project/drupal/issues/313145

They use config variables to full cover those tags.

Said that...

We can not apply WP solutions directly, because we can't add a condition to our config.inc.php, but (even is not nice because it means branching your code) we can force it at the top of index.php.

From Drupal, I know we are trying to reduce variables in config.inc.php but we crossed that river when we added "trust_x_forwarded_for", isn't it?

I mean, why implementing "X-Forwarded-For" but not implementing the other three X_FORWARDED main tags too?

Or why not a new "force_protocol" config variable that will work with command-line tools (and any piece of code that is not reading headers)?

Not sure if I'm biased, but as far as infrastructures are becoming more complex, I think this is not a minor issue.
I mean, loadbalancers and reverse proxies are out there, in the middle, even final users don't usually notice:
https://blog.matrixpost.net/deploy-wordpress-in-azure-app-service-with-staging-slots-for-the-production-and-development-environment

@marcbria
Copy link
Collaborator Author

This post let me understand part of the problem:
#4919 (comment)

@NateWr NateWr added the Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day. label Mar 18, 2021
@nunesvictor
Copy link

nunesvictor commented Feb 23, 2023

I'm struggling with the exact same problem. My installation is behind a proxy who handles SSL and then passes the request to a Traefik server that points to a correct docker container. Traefik was always retuning a 404 even when ojs service was up and running, later I figured it had something to do with trust_x_forwarded_for (I guess because traefik uses the forwarded host from the proxy to route the request to it's destination).

So basically the thing are like this: if I set trust_x_forwarded_for = Off nothing works at all because I'm behind a proxy. With trust_x_forwarded_for = On I keep getting a blocked: mixed-content error. And finally when I do set force_ssl = On I get too many redirects (which I guess it's happening because the proxy is the one who handles SSL)

I don't get why the request to static assets are being made over HTTP and not HTTPS. I tought base_url was the issue but it also points to a HTTPS url.

Tomorrow I'll try again with SSL disabled both on the proxy and ojs server and see if it works, regardless that this isn't the ideal thing to do.

@marcbria
Copy link
Collaborator Author

marcbria commented Feb 24, 2023

Just in case something is useful for your case... I will publish my full configuration.

Let me say that I'm not proud (because it's a dirty solution) but it works.

The trick (for me), apart of the proper variables in config.inc.php is overwriting index.php to force the contexts.
This technique is stolen from wordpress, that offers this option "out-of-the-box" with a config variable.

Notice I use single-tenant OJS, so it won't probably work in multi-tenant journal.

This is my index.php I use when is a directory. Example: https://foo.org/journalContext
Notice this combination is tricky, because OJS is thought to be multi-tenant and home will be under https://foo.org/journalContext/index...

$journalPath='journalContext';

// $_SERVER['REQUEST_URI'] = '/'. $journalPath . $_SERVER['REQUEST_URI'];
$_SERVER['SCRIPT_NAME'] = '/'. $journalPath . $_SERVER['SCRIPT_NAME'];
$_SERVER['PHP_SELF'] = '/'. $journalPath . $_SERVER['PHP_SELF'];

// Initialize global environment
define('INDEX_FILE_LOCATION', __FILE__);
$application = require('./lib/pkp/includes/bootstrap.inc.php');

// Serve the request
$application->execute();

Overwriting "REQUEST_URI" caused more trouble than help so this is why it's commented.

Said that, relevant variables in config.inc.php are:

base_url[index] = https://revistes.uab.cat/journalContext/index
base_url[journalContext] = https://revistes.uab.cat/journalContext
trust_x_forwarded_for = On
force_ssl = Off
restful_urls = On

Not proud either about my apache virtual host (could be simplified and fails with some site-admin pages), but does the job:

LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule expires_module modules/mod_expires.so

<VirtualHost *:80>
        ServerName foo.org

        DocumentRoot /var/www/html
        DirectoryIndex index.php index.html

        RewriteEngine On
        AcceptPathInfo On

        SetEnv HTTPS On

        <Directory /var/www/html>

                # Redirect root to index
                RewriteRule ^/?$ index.php/journalContext/index [L=307]
                RewriteRule ^journalContext$ index.php/journalContext/index [L=307]
                # Bloc admin pages with DB info:
                ## RewriteRule ^/?journalContext/index/admin/systemInfo/?$ /journalContext/index [L]

                ## Force httpS (basically to reach the API):
                RewriteCond %{HTTPS}  !=on
                RewriteCond %{REQUEST_URI} ^/journalContext\/api\/v1\/
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteRule ^/?(.*) https://%{SERVER_NAME}/journalContext/index.php/$1 [L,R=307]

                # OJS expects to be at root, so removing journal's subdomain:
                RewriteCond %{REQUEST_URI} ^/journalContext
                RewriteCond %{REQUEST_FILENAME} !-d
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteRule ^journalContext/(.*)$ $1 [QSA,L]

                ## Rewrite API calls if not including subdomain journals:
                RewriteCond %{REQUEST_URI} api\/v1\/
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteRule ^(.*)$ journalContext/index.php/journalContext/$1 [L,R=307]

                # Renames the short journal's rule:
                RewriteCond %{REQUEST_URI} !^index.php\/journalContext
                RewriteCond %{REQUEST_FILENAME} !-d
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteRule ^(.*)$ index.php/journalContext/$1 [QSA,L]

                # This removes index.php from the url (apache 2.2 and new ones)
                RewriteCond %{REQUEST_URI} !^/journalContext/index.php
                RewriteCond %{REQUEST_FILENAME} !-d
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteRule ^(.*)$ index.php/$1 [QSA,L]

        </Directory>

        # LogLevel alert rewrite:trace2

        ErrorLog  /var/log/apache2/error.log
        CustomLog  /var/log/apache2/access.log combined

@nunesvictor
Copy link

@marcbria I’ll try your workaround in a different branch and build a new docker imagem from it but I’ll say I totally agree with you that this doesn’t seems like a minor issue. In my specific case, our infrastructure doesn’t support alternatives to it. All of our systems are behind a proxy and docker based.

We already have a journal system and we’re trying to migrate to OJS, but I don’t know if without I’m being able to handle this problem the project manager will let this fly.

I’ll try this solution and will keep you posted. Thanks in advance

@marcbria
Copy link
Collaborator Author

I started the tests to move to OJS 3.3 and I'm hit again with this issue.
Does anybody knows if have been any change in API calls since OJS 3.2?

To recall (so long since I deal with it) this happens when OJS ask for an API resource with httpS, because behind the proxy OJS is still running with http and this generates a "mixed content" issue in frontend.

Right now, my error is this

/workflow/index/59/5#publication:1 Mixed Content: The page at 'https://foo.org/workflow/index/59/5#publication' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://foo.org/var/www/html/index.php/myjournal/api/v1/submissions/59/publications/59'. This request has been blocked; the content must be served over HTTPS.

Wired thing is now, when I force "https" in getProtocol() funcion (or even in getBaseUrl), OJS keep asking for "http".

@nunesvictor did you manage to find a solution?
I think @AndrewGearhart was also trying to deal with this.

I'm quite sure it could be adressed with extra mod_rewrite black magic, a little bit of ojs tunning (like the trick in index.php to force the journalContext) combined with ojs settings in config.inc.php... but at same time I feel we are overcomplicating the system configuration and I feel that would be nice to make OJS a little bit more friendly with proxies when not all runs over httpS (that is not an inusual scenario).

@asmecher any idea about how could we simplify the this to force OJS to use "https" protocol all the time?

Thanks you all for your time,
m.

@nunesvictor
Copy link

hey @marcbria, I found a workaround for this, it's a little messy but for now is getting the job done.

I found this at the PKP forums, but I lost the link for the original post. Here we go, the protocol selection is made here:

$this->_protocol = (!isset($_SERVER['HTTPS']) || strtolower_codesafe($_SERVER['HTTPS']) != 'on') ? 'http' : 'https';

All I did was, while the apps was running, use docker cp to make a copy of this file to a local folder, outside the container. Then, I just enforce 'https' on both sides of the ternary expression, leaving no chance but to use https, regardless of the 'autodiscovered' protocol. Doing that, I was able to mount this file on the mapping volumes of my docker-compose.yml file and that's it.

Like I said, that's a messy way to do that but, for now, is working fine. I'll keep you posted if I found some new issues related to that. It would be nice see this done by simply setting an env variable or by respecting the forwarded protocol, until there I think I'll have to do that.

Hope that helps.

PS.

I'm building a Docker image from scratch and I'm using OJS 3.3.0-X tarball as source, not git. So, the related file was found under lib/pkp/classes/core/PKPRequest.inc.php

@marcbria
Copy link
Collaborator Author

Thank you for taking the time to write with your feedback Victor.
Did you confirmed this patch works equally well when calling the API or when running from the command line?

I'm also patching my dockerized OJS with volumes from docker-compose. It doesn't seem elegant to me but it works and at the moment I haven't thought of anything better. Maybe adding an entrypoint that overwrites (if the folders are maintained, a cp would do) everything that the "patch" directory finds?

Actually the problem is that the philosophy of containers clashes with the idea of patches, since containers are immutable... so if you have to patch, you have to generate a new image.

Keep in touch!

@marcbria marcbria changed the title Let OJS Read HTTP_X_FORWARDED_PROTO instead of trying to discover it Let OJS honor HTTP_X_FORWARDED_PROTO instead of trying to discover it Mar 8, 2024
@marcbria
Copy link
Collaborator Author

marcbria commented Mar 8, 2024

Not sure when this was implemented or if this was working since the very begining but adding a X-Forwarded-Proto directive in virtual host seams to do the work in some secenarios. (See #348)

SetEnvIf X-Forwarded-Proto "https" HTTPS=on

Does it means the other X-Forward tags are also implemented?

Funny part is...

  • When restful=On this seams to work fine in web's frontend and backend, but need to be deeply tested with API and cli-tool calls.
  • With restful=Off issue is still there and some urls are built with http instead of https.

I have been dealing with this bug for years, and let me say this is really frustrating because it's extremely difficult to degug.

You got caches everywhere (browser, proxy, OJS...) and, in general, apache rules are a nightmare.
At same time, if a redirection is set as permanent, nevermind if you modify it... the browser will keep redirecting.
Then you have the protocol (in my case ojs is running as http but proxy serves in httpS), the urls (inside or outside the containers) and the config.inc.php variables.
Finally, you have the code, that uses different libraries in frontend and backend or the API... and will get different referer if you run it from the web or as commandline.

I spend a lot of hours in a trial and error process and I still don't have a rock-solid set of mod_rewrite rules for each url (domain, subdomain or folder installations), for dirty and clean urls, or for both protocols.

@asmecher how could I help PKP to improve this? What do you need?
Actually it's the only blocker for anybody trying to run OJS in containers behind a proxy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.
Projects
None yet
Development

No branches or pull requests

4 participants