New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update render_markdown.py #2

Merged
merged 3 commits into from Sep 6, 2016

Conversation

Projects
None yet
2 participants
@Lisias
Contributor

Lisias commented Aug 30, 2016

Fixing the file parameter on the UCSGI parameters (allowing to fetch files from redirects on NGINX)

Update render_markdown.py
Fixing the file parameter on the UCSGI parameters (allowing to fetch files from redirects on NGINX)
@skurfer

This comment has been minimized.

Show comment
Hide comment
@skurfer

skurfer Aug 31, 2016

Owner

I haven’t tested under Nginx yet, but this change breaks the script under Apache.

Maybe it should look for PATH_INFO and, if not found, fall back to REQUEST_URI?

Owner

skurfer commented Aug 31, 2016

I haven’t tested under Nginx yet, but this change breaks the script under Apache.

Maybe it should look for PATH_INFO and, if not found, fall back to REQUEST_URI?

@Lisias

This comment has been minimized.

Show comment
Hide comment
@Lisias

Lisias Aug 31, 2016

Contributor

It's some time since I used Apache, I should be aware of this. Sorry.

Let me research the issue a bit more. If there's a standard followed by both services, it's better to use it.

Contributor

Lisias commented Aug 31, 2016

It's some time since I used Apache, I should be aware of this. Sorry.

Let me research the issue a bit more. If there's a standard followed by both services, it's better to use it.

@Lisias

This comment has been minimized.

Show comment
Hide comment
@Lisias

Lisias Aug 31, 2016

Contributor

INFO.

The code below is from the uwsgi_params that we include in the NGINX host configuration file when proxying a request by UWSGI,

uwsgi_param  QUERY_STRING       $query_string;
uwsgi_param  REQUEST_METHOD     $request_method;
uwsgi_param  CONTENT_TYPE       $content_type;
uwsgi_param  CONTENT_LENGTH     $content_length;

uwsgi_param  REQUEST_URI        $request_uri;
uwsgi_param  PATH_INFO          $document_uri;
uwsgi_param  DOCUMENT_ROOT      $document_root;
uwsgi_param  SERVER_PROTOCOL    $server_protocol;
uwsgi_param  REQUEST_SCHEME     $scheme;
uwsgi_param  HTTPS              $https if_not_empty;

uwsgi_param  REMOTE_ADDR        $remote_addr;
uwsgi_param  REMOTE_PORT        $remote_port;
uwsgi_param  SERVER_PORT        $server_port;
uwsgi_param  SERVER_NAME        $server_name;

From the NGINX documentation, I found:

$document_uri
same as $uri

and

$uri
current URI in request, normalized
The value of $uri may change during request processing, e.g. when doing internal redirects, or when using index files.

From the WSGI documentation, I found:

PATH_INFO
The remainder of the request URL’s “path”, designating the virtual “location” of the request’s target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash.

What's makes sense. NGINX feeds PATH_INFO with the URI after all processing is applied. Curiously, it's exactly my environment: I made a rewrite on the URI before throwing it into render_markdown.py and was this that triggered my problem (untouched URIs works fine).

At a first glance, it appears to be a problem on mod_wsgi . I will pursue this.

Contributor

Lisias commented Aug 31, 2016

INFO.

The code below is from the uwsgi_params that we include in the NGINX host configuration file when proxying a request by UWSGI,

uwsgi_param  QUERY_STRING       $query_string;
uwsgi_param  REQUEST_METHOD     $request_method;
uwsgi_param  CONTENT_TYPE       $content_type;
uwsgi_param  CONTENT_LENGTH     $content_length;

uwsgi_param  REQUEST_URI        $request_uri;
uwsgi_param  PATH_INFO          $document_uri;
uwsgi_param  DOCUMENT_ROOT      $document_root;
uwsgi_param  SERVER_PROTOCOL    $server_protocol;
uwsgi_param  REQUEST_SCHEME     $scheme;
uwsgi_param  HTTPS              $https if_not_empty;

uwsgi_param  REMOTE_ADDR        $remote_addr;
uwsgi_param  REMOTE_PORT        $remote_port;
uwsgi_param  SERVER_PORT        $server_port;
uwsgi_param  SERVER_NAME        $server_name;

From the NGINX documentation, I found:

$document_uri
same as $uri

and

$uri
current URI in request, normalized
The value of $uri may change during request processing, e.g. when doing internal redirects, or when using index files.

From the WSGI documentation, I found:

PATH_INFO
The remainder of the request URL’s “path”, designating the virtual “location” of the request’s target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash.

What's makes sense. NGINX feeds PATH_INFO with the URI after all processing is applied. Curiously, it's exactly my environment: I made a rewrite on the URI before throwing it into render_markdown.py and was this that triggered my problem (untouched URIs works fine).

At a first glance, it appears to be a problem on mod_wsgi . I will pursue this.

@Lisias

This comment has been minimized.

Show comment
Hide comment
@Lisias

Lisias Aug 31, 2016

Contributor

Oh, wait... Are you using mod_wsgi or mod_python?

Contributor

Lisias commented Aug 31, 2016

Oh, wait... Are you using mod_wsgi or mod_python?

@skurfer

This comment has been minimized.

Show comment
Hide comment
@skurfer

skurfer Sep 1, 2016

Owner

One server where I’m using it in production has mod_wsgi 4.3.0.

Owner

skurfer commented Sep 1, 2016

One server where I’m using it in production has mod_wsgi 4.3.0.

@Lisias

This comment has been minimized.

Show comment
Hide comment
@Lisias

Lisias Sep 3, 2016

Contributor

More on this.

On the mod_wsgi, I found that it fetches the PATH_INFO correctly:

    path_info = apr_table_get(r->subprocess_env, "PATH_INFO");

    if (*path_info == '/') {
        while (*path_info && (*(path_info+1) == '/'))
            path_info++;
        path_info = apr_pstrdup(r->pool, path_info);
        ap_no2slash((char*)path_info);
        apr_table_setn(r->subprocess_env, "PATH_INFO", path_info);
    }

source: https://github.com/GrahamDumpleton/mod_wsgi/blob/develop/src/server/mod_wsgi.c

mod_wsgi is reported to support PATH_INFO since circa 2010 (around version 3.1 or 3.2, I think - the Chagne History for the product omits dates, and it's cumbersome to check the clsoed issues dates). So I don't think that your server using mod_wsgi should be presenting this problem!

For the mod_python, curiously, I found handling for the PATH_INFO too:

    if (!r->path_info || !*r->path_info) {
        apr_table_setn(e, "SCRIPT_NAME", r->uri);
    }
    else {
        int path_info_start = ap_find_path_info(r->uri, r->path_info);

        apr_table_setn(e, "SCRIPT_NAME",
                      apr_pstrndup(r->pool, r->uri, path_info_start));

        apr_table_setn(e, "PATH_INFO", r->path_info);
    }

source: https://github.com/grisha/mod_python/blob/master/src/requestobject.c

This piece of code was commited in 2013 (search for it on the git's blame).

What's pinpoints the source of the problem to be the Apache2 itself! (or so it appears to me, I didn't give these code a full code review).


On the bottom line, your approach of checking both variables will work (obviously!). But I'm unrest about not knowing why it didn't worked for you.

Contributor

Lisias commented Sep 3, 2016

More on this.

On the mod_wsgi, I found that it fetches the PATH_INFO correctly:

    path_info = apr_table_get(r->subprocess_env, "PATH_INFO");

    if (*path_info == '/') {
        while (*path_info && (*(path_info+1) == '/'))
            path_info++;
        path_info = apr_pstrdup(r->pool, path_info);
        ap_no2slash((char*)path_info);
        apr_table_setn(r->subprocess_env, "PATH_INFO", path_info);
    }

source: https://github.com/GrahamDumpleton/mod_wsgi/blob/develop/src/server/mod_wsgi.c

mod_wsgi is reported to support PATH_INFO since circa 2010 (around version 3.1 or 3.2, I think - the Chagne History for the product omits dates, and it's cumbersome to check the clsoed issues dates). So I don't think that your server using mod_wsgi should be presenting this problem!

For the mod_python, curiously, I found handling for the PATH_INFO too:

    if (!r->path_info || !*r->path_info) {
        apr_table_setn(e, "SCRIPT_NAME", r->uri);
    }
    else {
        int path_info_start = ap_find_path_info(r->uri, r->path_info);

        apr_table_setn(e, "SCRIPT_NAME",
                      apr_pstrndup(r->pool, r->uri, path_info_start));

        apr_table_setn(e, "PATH_INFO", r->path_info);
    }

source: https://github.com/grisha/mod_python/blob/master/src/requestobject.c

This piece of code was commited in 2013 (search for it on the git's blame).

What's pinpoints the source of the problem to be the Apache2 itself! (or so it appears to me, I didn't give these code a full code review).


On the bottom line, your approach of checking both variables will work (obviously!). But I'm unrest about not knowing why it didn't worked for you.

Update render_markdown.py
Fixing (again) to cope with the original author's environment.

For a full discussion, see #2
@Lisias

This comment has been minimized.

Show comment
Hide comment
@Lisias

Lisias Sep 3, 2016

Contributor

The last commit reflects what I'm using right know on my server.

Contributor

Lisias commented Sep 3, 2016

The last commit reflects what I'm using right know on my server.

Update render_markdown.py
Improperly tested code is worst than untested code. Sorry.

Proper error handling now.
@skurfer

This comment has been minimized.

Show comment
Hide comment
@skurfer

skurfer Sep 6, 2016

Owner

I’d have gone with environ.get() for the case where the key might not exist, but that will work. I’ll try to add something similar to the PHP version.

Can you share your Nginx rewrite rules so I can add that to the documentation as well?

Thanks.

Owner

skurfer commented Sep 6, 2016

I’d have gone with environ.get() for the case where the key might not exist, but that will work. I’ll try to add something similar to the PHP version.

Can you share your Nginx rewrite rules so I can add that to the documentation as well?

Thanks.

@skurfer skurfer closed this Sep 6, 2016

@skurfer skurfer reopened this Sep 6, 2016

@skurfer skurfer merged commit 208b327 into skurfer:master Sep 6, 2016

@Lisias

This comment has been minimized.

Show comment
Hide comment
@Lisias

Lisias Sep 6, 2016

Contributor

Java habits are hard to get rid of. :-)

Any internal redirects (rewrite with break or last) would trigger the problem. A simplified set of rules that triggered the problem follows below.

The intent is to create MarkDowns for different languages, and letting the client's browser select the desired language using Accept Language. The MarkDown links are always the same, the target file is selected by internal redirects.

server {
    listen xx.xx.xxx.xxx:80;
    server_name my.own.domain.net;

    # nginx_accept_language_module
    set_from_accept_language $lang pt en es ;

    location ~ .*/(.+)\.(pt|en|es)\.md$ {
        if (!-f $request_filename) {
            return 404 ;
        }

        uwsgi_pass      unix:/var/run/www/rendermarkdown.wsgi.socket;
        include         uwsgi_params;
        uwsgi_param     SCRIPT_FILENAME /var/www/support/render_markdown.py;
        uwsgi_param     UWSGI_SCRIPT render_markdown;
    }

    location ~ .*/(.+)\.md$ {
        rewrite     ^(.*)\.(.*)$    $1.$lang.$2 last ;
    }

    location ~ .*/(.+)\.md-text$ {
        if ( $lang != 'pt') {
            rewrite     ^(.*)\.(.*)$    $1.$lang.$2 break ;
        }

        add_header  Content-Type text/plain;
        rewrite     ^(.*)(\-text)$ $1 break ;
    }
}
Contributor

Lisias commented Sep 6, 2016

Java habits are hard to get rid of. :-)

Any internal redirects (rewrite with break or last) would trigger the problem. A simplified set of rules that triggered the problem follows below.

The intent is to create MarkDowns for different languages, and letting the client's browser select the desired language using Accept Language. The MarkDown links are always the same, the target file is selected by internal redirects.

server {
    listen xx.xx.xxx.xxx:80;
    server_name my.own.domain.net;

    # nginx_accept_language_module
    set_from_accept_language $lang pt en es ;

    location ~ .*/(.+)\.(pt|en|es)\.md$ {
        if (!-f $request_filename) {
            return 404 ;
        }

        uwsgi_pass      unix:/var/run/www/rendermarkdown.wsgi.socket;
        include         uwsgi_params;
        uwsgi_param     SCRIPT_FILENAME /var/www/support/render_markdown.py;
        uwsgi_param     UWSGI_SCRIPT render_markdown;
    }

    location ~ .*/(.+)\.md$ {
        rewrite     ^(.*)\.(.*)$    $1.$lang.$2 last ;
    }

    location ~ .*/(.+)\.md-text$ {
        if ( $lang != 'pt') {
            rewrite     ^(.*)\.(.*)$    $1.$lang.$2 break ;
        }

        add_header  Content-Type text/plain;
        rewrite     ^(.*)(\-text)$ $1 break ;
    }
}
@skurfer

This comment has been minimized.

Show comment
Hide comment
@skurfer

skurfer Sep 7, 2016

Owner

Thanks. I’ll get the docs updated.

Owner

skurfer commented Sep 7, 2016

Thanks. I’ll get the docs updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment