Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Fix issue #255, create function url_join #304

Closed
wants to merge 6 commits into from

5 participants

@k21
k21 commented

No description provided.

r2/r2/lib/utils/utils.py
@@ -1372,3 +1372,31 @@ def constant_time_compare(actual, expected):
result |= ord(actual[i]) ^ ord(expected[i % expected_len])
return result == 0
+def url_join(*parts):
+ """
+ Join one or more URL components. Slashes are added between each two
+ components. Extra slashes are removed.
+ """
+ path = []
+ for p in parts:
+ if p == '': continue
+
+ if p[0] == '/':
@kemitche Collaborator
kemitche added a note

1384 - 1398 can be simplified to:
p.strip("/")

@k21
k21 added a note

Thanks, I will change it. I feel stupid :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@kemitche
Collaborator

Looks great!

@spladug
Owner

Very nice. There are a ton of other places that use os.path.join incorrectly, would you be interested in throwing a few more in here? I'm happy to merge as-is if not.

@k21
k21 commented

I personally expected to find more of them. When I grepped the sources for 'os.path.join' or 'path.join' I have found only 29 occurrences so I checked them all manually and it looked like all the others are used for files. Are you sure there are a ton of places? I still cannot find them.

@spladug
Owner

Wow, you're right. It's not as bad as I thought. It does seem that template_helpers.py:93 and js.py:78 could be url_join'd though.

@k21
k21 commented

I have noticed a problem, if the url starts with a slash, this slash is removed, which changes absolute path to relative. I am going to fix it and change the two locations.

@k21
k21 commented

I have noticed that the function in template_helpers.py does sometimes call os.path.join with parameters like ['/static/', '/static/js', 'utils.js']. It relies on the fact that if there is a absolute path, all paths before it are ignored. Should this be fixed in url_join or in the caller?

@k21
k21 commented

For now, I changed url_join to behave more like os.path.join.

@bboe

Just curious, what's wrong with urlparse.urljoin?

No solution is going to handle everything, for instance if you forget the trailing slash for a directory then it will be stripped off in the case of:

>>> urlparse.urljoin('foo', 'bar')
'bar'

Likewise having leading slashes on the latter part will truncate the remainder, however, I think that's to be expected.

>>> urlparse.urljoin('foo/', '/bar')
'/bar'

I see that one condition is you want to join more than two parameters. So maybe use urlparse.url join inside the iterator?

@k21
k21 commented

@bboe: I did not check it thoroughly, but it only supports 2 parameters, while some invocations need to pass more, and the first parameter has to be a complete valid url, so it probably should contain things like schema, hostname etc., which is also not what we want.

What I want to say that there are two solutions to this problem: change uses of os.path.join to something with different behavior and make sure that the usage is correct in all the sites OR create a new function that behaves similarly to the old function, but it is platform-independent and semantically correct, and then we only have one function to check.

@k21
k21 commented

I created another alternative which uses string concatenation instead of join, so it is more readable. It does possibly have worse performance, but it should not matter for the few strings for which it is used now. Please look here and tell me if it is acceptable, thanks :)

@chromakode

Nice! The new function and its invocations look good to me. I think it's good to include the absolute URL handing behavior, since it is useful in URL building functions like static. I like the new version, though it looks like it'd be simple enough to turn it into an array based string builder again.

An aside: while taking a look around template_helpers.py, I noticed a function called join_urls. It looks like it might be trivial to remove this duplicate function and replace it with your new util.

@spladug
Owner

Unfortunately, due to our latency in merging this, it's become unmergeable and out of date. I'm going to close this. Thanks a tonne for your work and we should definitely still do something like this.

@spladug spladug closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jan 6, 2012
  1. @k21
  2. @k21
Commits on Jan 7, 2012
  1. @k21

    Change behavior of url_join

    k21 authored
  2. @k21
Commits on Jan 9, 2012
  1. @k21

    Simplify url_join

    k21 authored
  2. @k21
This page is out of date. Refresh to see the latest.
View
3  r2/r2/lib/js.py
@@ -6,6 +6,7 @@
import json
from r2.lib.translation import iter_langs
+from r2.lib.utils import url_join
if __name__ != "__main__":
from pylons import g, c
@@ -75,7 +76,7 @@ def use(self):
path = [g.static_path, self.name]
if g.uncompressedJS:
path.insert(1, "js")
- return script_tag.format(src=static(os.path.join(*path)))
+ return script_tag.format(src=static(url_join(*path)))
@property
def dependencies(self):
View
18 r2/r2/lib/template_helpers.py
@@ -21,7 +21,7 @@
################################################################################
from r2.models import *
from filters import unsafe, websafe, _force_unicode
-from r2.lib.utils import vote_hash, UrlParser, timesince, is_subdomain
+from r2.lib.utils import vote_hash, UrlParser, timesince, is_subdomain, url_join
from r2.lib.media import s3_direct_url
@@ -90,7 +90,7 @@ def static(path, allow_gzip=True):
actual_filename = g.static_names.get(filename, filename)
path_components.append(actual_filename + suffix)
- actual_path = os.path.join(*path_components)
+ actual_path = url_join(*path_components)
return urlparse.urlunsplit((
scheme,
domain,
@@ -391,20 +391,6 @@ def add_sr(path, sr_path = True, nocname=False, force_hostname = False, retain_e
return u.unparse()
-def join_urls(*urls):
- """joins a series of urls together without doubles slashes"""
- if not urls:
- return
-
- url = urls[0]
- for u in urls[1:]:
- if not url.endswith('/'):
- url += '/'
- while u.startswith('/'):
- u = utils.lstrips(u, '/')
- url += u
- return url
-
def style_line(button_width = None, bgcolor = "", bordercolor = ""):
style_line = ''
bordercolor = c.bordercolor or bordercolor
View
4 r2/r2/lib/traffic.py
@@ -22,7 +22,7 @@
from httplib import HTTPConnection
from urlparse import urlparse
from cPickle import loads
-from utils import query_string
+from utils import query_string, url_join
import os, socket, time, datetime
from pylons import g
from r2.lib.memoize import memoize
@@ -43,7 +43,7 @@ def format_date(d):
d = d.astimezone(g.tz)
return ":".join(map(str, d.timetuple()[:6]))
- traffic_url = os.path.join(g.traffic_url, interval, what, iden)
+ traffic_url = url_join(g.traffic_url, interval, what, iden)
args = {}
if what == 'thing' and interval == 'hour':
if start_time:
View
22 r2/r2/lib/utils/utils.py
@@ -1372,3 +1372,25 @@ def constant_time_compare(actual, expected):
result |= ord(actual[i]) ^ ord(expected[i % expected_len])
return result == 0
+def url_join(*parts):
+ """
+ Join one or more URL components. Slashes are added between each two
+ components. Extra slashes are removed. If any of the components is
+ an absolute path, ignore all components in front of it.
+ """
+
+ path = []
+
+ for p in parts:
+ if p == '': continue
+ if p.startswith('/'):
+ path = ['/']
+ if len(path) != 0 and not path[-1].endswith('/'):
+ path.append('/')
+ stripped = p.strip('/')
+ if stripped:
+ path.append(stripped)
+ if p.endswith('/'):
+ path.append('/')
+
+ return ''.join(path)
View
4 r2/r2/templates/cnameframe.html
@@ -21,7 +21,7 @@
################################################################################
<%!
- from r2.lib.template_helpers import join_urls
+ from r2.lib.utils import url_join
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
@@ -33,7 +33,7 @@
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
%if thing.original_path:
<link rel="alternate" type="application/rss+xml" title="RSS"
- href="${join_urls(thing.original_path,'.rss')}" />
+ href="${url_join(thing.original_path,'.rss')}" />
%endif
</head>
View
5 r2/r2/templates/reddit.html
@@ -21,12 +21,13 @@
################################################################################
<%!
- from r2.lib.template_helpers import add_sr, static, join_urls, class_dict, get_domain
+ from r2.lib.template_helpers import add_sr, static, class_dict, get_domain
from r2.lib.filters import unsafe
from r2.lib.pages import SearchForm, ClickGadget, SideContentBox
from r2.lib import tracking
from pylons import request
from r2.lib.strings import strings
+ from r2.lib.utils import url_join
from r2.models import make_feedurl, Sub
%>
<%namespace file="login.html" import="login_panel, login_form"/>
@@ -91,7 +92,7 @@
type="image/x-icon" />
%if thing.extension_handling:
<%
- rss= add_sr(join_urls(request.path,'.rss'))
+ rss= add_sr(url_join(request.path,'.rss'))
if thing.extension_handling == "private":
rss = make_feedurl(c.user, rss)
%>
View
3  r2/r2/templates/subredditstylesheet.html
@@ -23,6 +23,7 @@
from r2.lib.filters import keep_space
from r2.lib.template_helpers import add_sr, static
from r2.lib.media import s3_direct_url
+ from r2.lib.utils import url_join
import os
%>
@@ -143,7 +144,7 @@
function showDefaultStylesheet(btn) {
var default_stylesheet = $('#default_stylesheet');
/* make sure to load the default stylesheet contents from the local domain so it doesn't get caught by cross-domain rules. */
- var stylesheet_url = '${add_sr(os.path.join(c.site.static_path, g.static_names.get("reddit.css", "reddit.css")), nocname=True, sr_path=False)}';
+ var stylesheet_url = '${add_sr(url_join(c.site.static_path, g.static_names.get("reddit.css", "reddit.css")), nocname=True, sr_path=False)}';
if (!default_stylesheet.data('loaded')) {
default_stylesheet.text('loading...');
Something went wrong with that request. Please try again.