
CGI Variable "PATH_INFO" incorrectly removing runs of slash characters causing errors in CGI scripts #69

Closed
acidus99 opened this issue Nov 30, 2022 · 4 comments

@acidus99

I found this issue while passing Base64 encoded data to a CGI script via the PATH_INFO variable.

The value of PATH_INFO seems to collapse multiple adjacent / characters into a single / character. You can see the problem in the screenshot below:

[screenshot]

This is just a simple CGI script that echoes the GEMINI_URL and PATH_INFO environment variables passed to it:

```bash
#!/bin/bash
# Gemini response header: status 20 (success) plus MIME type, terminated by CRLF
printf '20 text/plain\r\n'
echo "GEMINI_URL: \"$GEMINI_URL\""
echo "PATH_INFO: \"$PATH_INFO\""
```

You can see the URL contains three / characters between each word of the message "Hi///There///Mozz". However, the value of PATH_INFO has collapsed those slashes, giving Hi/There/Mozz, which is incorrect.

This is causing my CGI script to fail, since I'm passing Base64 data via PATH_INFO. / is a valid Base64 character, and two or more can appear next to each other. Because runs of slashes are collapsed into a single slash, the Base64 data gets altered 🙀

I believe this is an issue with Jetforce and not my Gemini client, since the GEMINI_URL variable shows that the three slashes are sent to Jetforce and are even passed through to the CGI script correctly. The bug appears to be in how PATH_INFO is constructed from the URL.
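To make the failure concrete, here is a minimal Python sketch. It uses `posixpath.normpath` as a stand-in for the server's path normalization (an assumption; this is not necessarily the function Jetforce calls), and a made-up payload whose standard Base64 encoding happens to contain a run of slashes.

```python
import base64
import posixpath

# Hypothetical payload chosen so that its standard Base64 form contains "///".
payload = b"Hi\xff\xff\xffyo"
encoded = base64.b64encode(payload).decode("ascii")  # "SGn///95bw=="

# Path as the client sent it, after %2F has been URL-decoded back to "/".
original = "/cgi-bin/debug.py/" + encoded

# Stand-in for server-side normalization: collapses runs of "/" and resolves "." / "..".
normalized = posixpath.normpath(original)

print(original)    # /cgi-bin/debug.py/SGn///95bw==
print(normalized)  # /cgi-bin/debug.py/SGn/95bw==
```

Two of the three slashes are lost, so whatever the script decodes from PATH_INFO is no longer the original payload.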

@michael-lazar
Owner

michael-lazar commented Nov 30, 2022

Thanks for the report, I can also reproduce on my server:

gemini://mozz.us/cgi-bin/debug.py/Hi%2F%2F%2FThere%2F%2F%2FMozz

I'm open to changing behavior if there is a clear and correct solution, but I think this one is complicated...

The slashes are being merged because jetforce normalizes the path, i.e. it collapses repeated slashes and resolves . and .. segments. This needs to happen so that the server can split the SCRIPT_NAME from the PATH_INFO by traversing the file system.

It looks like the jury is still out on whether this is the correct behavior or not.

This explanation from lighttpd describes exactly the same procedure that jetforce uses:

To determine PATH_INFO, lighttpd url-decodes the URI, and then normalizes a path to the filesystem. "." and ".." URL path segments are resolved in the virtual path, and multiple consecutive slashes (e.g. "////") are reduced to a single slash. This normalized virtual URL path is used in config conditional matching so that conditions are applied consistently. This path is then tested against the filesystem and the longest existing path is used as the request target, with the remainder of the path treated as PATH_INFO. Since lighttpd has no way in advance to know what is part of the PATH_INFO, the entire path is normalized. (The patch you suggested above is not recommended since it might allow a malicious URL crafted with %2F earlier in the URL path to potentially cause config conditional match to fail to match a condition when it should match that condition (false-negative).)
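A rough Python sketch of the procedure quoted above: decode, normalize, then take the longest prefix that exists as a file as the script and treat the rest as PATH_INFO. This is an illustration under assumptions, not jetforce's or lighttpd's actual code: `posixpath.normpath` approximates the normalization step, and `is_file` is a hypothetical callback standing in for a real filesystem check.

```python
import posixpath
from urllib.parse import unquote


def split_script_and_path_info(raw_path, is_file):
    """Split a request path into (SCRIPT_NAME, PATH_INFO), lighttpd-style.

    raw_path: percent-encoded URL path.
    is_file:  hypothetical callback returning True if a path is an existing file.
    """
    # 1. URL-decode, then normalize: resolve "." / ".." and collapse runs of "/".
    normalized = posixpath.normpath(unquote(raw_path))
    segments = normalized.split("/")
    # 2. Test the longest candidate first, dropping one trailing segment at a time.
    for end in range(len(segments), 1, -1):
        candidate = "/".join(segments[:end])
        if is_file(candidate):
            # 3. Whatever follows the script becomes PATH_INFO (already normalized).
            return candidate, normalized[len(candidate):]
    return None, normalized


scripts = {"/cgi-bin/debug.py"}
print(split_script_and_path_info(
    "/cgi-bin/debug.py/Hi%2F%2F%2FThere%2F%2F%2FMozz", scripts.__contains__))
# ('/cgi-bin/debug.py', '/Hi/There/Mozz')
```

Because the whole path is normalized before the filesystem lookup, the PATH_INFO remainder inherits the collapsed slashes.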

Others have suggested adding non-standard RAW_PATH_INFO variable: https://bulknews.typepad.com/blog/2009/09/path_info-decoding-horrors.html

And finally (this is a cop-out), the easiest solution for your use case is probably switching to a URL-safe variant of Base64, which uses - and _ in place of + and /. E.g. https://docs.python.org/3/library/base64.html#base64.urlsafe_b64encode
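A short illustration of the difference, using a made-up payload whose standard encoding contains "///". `base64.urlsafe_b64encode` and `base64.urlsafe_b64decode` are the standard-library functions linked above; the payload is only an example.

```python
import base64

payload = b"Hi\xff\xff\xffyo"  # hypothetical data that encodes to a run of "/"

standard = base64.b64encode(payload)         # b"SGn///95bw=="
urlsafe = base64.urlsafe_b64encode(payload)  # b"SGn___95bw=="

# The URL-safe alphabet contains no "/" (or "+"), so slash-collapsing
# path normalization has nothing to alter, and the round trip is exact.
print(standard, urlsafe)
print(base64.urlsafe_b64decode(urlsafe) == payload)  # True
```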

@acidus99
Author

acidus99 commented Dec 1, 2022

That makes sense. Thanks for pointing out the URL-safe Base64 functions. I will move to those, since that's probably a more robust solution.

However, I still think you could support this, because presumably you already have special logic for CGI that separates the SCRIPT_NAME from the PATH_INFO. For example:

Incoming URL: 
gemini://mozz.us/cgi-bin/debug.py/Hi%2F%2F%2FThere%2F%2F%2FMozz?foo=bar

Normalized path used to check the file system:
/cgi-bin/debug.py/Hi/There/Mozz

That normalized path isn't going to exactly match anything in the file system, because debug.py isn't a directory and there are no subdirectories Hi/There/Mozz under it. Jetforce has to have logic that figures out that /cgi-bin/debug.py is the CGI script that needs to be run, and that it's also the value of the SCRIPT_NAME environment variable.

But once you have determined the CGI file to execute, nothing requires you to also use the normalized path to construct the PATH_INFO variable. You could instead go back to the original URL, take its absolute path, and use everything after the SCRIPT_NAME as the PATH_INFO (URL-decoded, per the CGI spec).

Incoming URL: 
gemini://mozz.us/cgi-bin/debug.py/Hi%2F%2F%2FThere%2F%2F%2FMozz?foo=bar

Absolute Path of the URL:
/cgi-bin/debug.py/Hi%2F%2F%2FThere%2F%2F%2FMozz

SCRIPT_NAME:
/cgi-bin/debug.py

PATH_INFO:
/Hi///There///Mozz

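The proposal above can be sketched in Python as follows, assuming the script name has already been resolved by the normal lookup. `path_info_from_raw_url` is a hypothetical helper written for illustration, not part of Jetforce; it works on the still-percent-encoded path from `urllib.parse.urlsplit` and decodes only the remainder.

```python
from urllib.parse import unquote, urlsplit


def path_info_from_raw_url(url, script_name):
    """Derive PATH_INFO from the raw (un-normalized) URL path.

    Returns None when the raw path does not begin with script_name.
    """
    raw_path = urlsplit(url).path  # still percent-encoded, query string removed
    if not raw_path.startswith(script_name):
        return None
    rest = raw_path[len(script_name):]
    # Require a segment boundary so "/cgi-bin/debug" does not match "/cgi-bin/debug.py".
    if rest and not rest.startswith("/"):
        return None
    return unquote(rest)


url = "gemini://mozz.us/cgi-bin/debug.py/Hi%2F%2F%2FThere%2F%2F%2FMozz?foo=bar"
print(path_info_from_raw_url(url, "/cgi-bin/debug.py"))  # /Hi///There///Mozz
```

The `None` return matters: it marks the case where the raw path and the resolved script name no longer line up.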
I have a workaround in my code that is very similar to this:

```csharp
// Simplified for readability. The real version checks that the script name appears
// at the start of the URL path and removes it only there.
var path_Info = UrlDecode(cgi.RequestUrl.AbsolutePath.Replace(cgi.ScriptName, ""));
```

I'm going to adopt the URL-safe Base64 functions, but it's still something to consider: once you know you are dealing with a CGI path and have determined the script file to run, there's no reason to normalize the rest of the path that becomes the PATH_INFO value.

(Also, thanks for Jetforce. ❤️ I've been using it for a year and it's been rock solid.)

@michael-lazar
Owner

The trouble is that normalizing the path can also affect the script name. Take this example:

gemini://mozz.us/cgi-bin/debug.py/%2E%2E/ah%20ha%2Epy%2F%2Fextra%20stuff

Decoded, it turns into

gemini://mozz.us/cgi-bin/debug.py/../ah ha.py//extra stuff

and then, normalized, becomes

gemini://mozz.us/cgi-bin/ah ha.py/extra stuff

which points to an entirely different CGI script. You would get:

SCRIPT_NAME=/cgi-bin/ah ha.py
PATH_INFO=/extra stuff

But now the non-normalized path doesn't start with the script name, so you can't strip the script name from the original absolute path to get //extra stuff.
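The counterexample can be checked with a few lines of Python. `posixpath.normpath` is used here as an approximation of the server's normalization (an assumption), and `script_name` is simply the result the normalized lookup would produce for this path.

```python
import posixpath
from urllib.parse import unquote

raw_path = "/cgi-bin/debug.py/%2E%2E/ah%20ha%2Epy%2F%2Fextra%20stuff"

decoded = unquote(raw_path)               # "/cgi-bin/debug.py/../ah ha.py//extra stuff"
normalized = posixpath.normpath(decoded)  # "/cgi-bin/ah ha.py/extra stuff"

# Assume the normalized lookup resolves the script to this (different) file.
script_name = "/cgi-bin/ah ha.py"

# Neither the raw nor the decoded path starts with it, so stripping the
# script name off the un-normalized path cannot recover "//extra stuff".
print(raw_path.startswith(script_name))  # False
print(decoded.startswith(script_name))   # False
```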

@acidus99
Copy link
Author

acidus99 commented Dec 1, 2022

Ahh, good point. I've moved to the URL-safe Base64 functions anyway. Thanks!

@acidus99 acidus99 closed this as completed Dec 1, 2022