Shorter urls #42

Merged
merged 2 commits into from May 4, 2013

Conversation

Projects
None yet
3 participants
@mk-fg
Contributor

mk-fg commented Apr 29, 2013

Instead of this:

http://some.0bin.site/paste/9611846d1c86e549fabe205837d40b780f5efce3#bLilFEQok1qTOzifX2gdvZO1ZoHki1getSt/YnAdAFQ=

Produce this:

http://some.0bin.site/paste/FxsUsOGB#aW8DhwMy

6eec295 changes how paste-ids are generated - instead of base16 hexdigest, base64-like (with "/" replaced with "-" to produce less confusing urls and require no special fs-path conversion) ids of configurable length are used.

Adds new PASTE_ID_LENGTH option:

# length of base64-like paste-id string in the url, max: 27 (length of sha1 digest)
# total number of unique pastes can be calculated as 2^(6*PASTE_ID_LENGTH)
# for PASTE_ID_LENGTH=8, for example, it's 2^(6*8) = 281 474 976 710 656
PASTE_ID_LENGTH = 8

It's currently not mentioned in the doc, but if you see no problem with these commits, I can easily fix this (or you can, if you intend to update fr doc alongside anyway - just one option).

As mentioned in the default_settings.py, with 8 base64 (6-bit) chars, number of possible pastes should be 2^(6*8) = 281 474 976 710 656, which I think should suffice to keep ids unique.

All old pastes should still be accessible under long old-style ids, while new ones will be shorter.

@sametmax

This comment has been minimized.

Show comment
Hide comment
@sametmax

sametmax Apr 30, 2013

Owner

Hello man, thanks again for this PL. It deserves I spend some time on it. I can do that today as I'm starting a new mission, but I'll have a look this WE.

Owner

sametmax commented Apr 30, 2013

Hello man, thanks again for this PL. It deserves I spend some time on it. I can do that today as I'm starting a new mission, but I'll have a look this WE.

@sametmax

This comment has been minimized.

Show comment
Hide comment
@sametmax

sametmax Apr 30, 2013

Owner

Just a quick though: now that key length is, we probably want it to be checked so it's not under a certain size (at least 4 characters since we us it to create a path) and zerobin.py should return a non zero error code with an error message if it detects a too short key length.

Owner

sametmax commented Apr 30, 2013

Just a quick though: now that key length is, we probably want it to be checked so it's not under a certain size (at least 4 characters since we us it to create a path) and zerobin.py should return a non zero error code with an error message if it detects a too short key length.

@mk-fg

This comment has been minimized.

Show comment
Hide comment
@mk-fg

mk-fg Apr 30, 2013

Contributor

Good point indeed, squashed that into 82e2dad.

Contributor

mk-fg commented Apr 30, 2013

Good point indeed, squashed that into 82e2dad.

@sametmax

This comment has been minimized.

Show comment
Hide comment
@sametmax

sametmax May 1, 2013

Owner

We must somewhere else to put the check, the runserver command is only called if you don't use a personable WSGI wrapper. get_app() feels more appropriate, with should raise an excecption, then catched by runserver which display the massage and exit. This way we separate programmatic running (and allow a custom WSGI setup) and automaticly setup scripted running.

Owner

sametmax commented May 1, 2013

We must somewhere else to put the check, the runserver command is only called if you don't use a personable WSGI wrapper. get_app() feels more appropriate, with should raise an excecption, then catched by runserver which display the massage and exit. This way we separate programmatic running (and allow a custom WSGI setup) and automaticly setup scripted running.

@mk-fg

This comment has been minimized.

Show comment
Hide comment
@mk-fg

mk-fg May 1, 2013

Contributor

Ah, true, moved the thing to get_app(), also adding "settings" keyword there to be able to pass custom settings object, while leaving existing settings_file interface intact as well. I actually run 0bin from uwsgi as well.

Currently it doesn't mean much, as both runserver() and get_app() use module-global "settings" var, but if they'll ever get separated, passing pre-populated (using cli) settings like that might be useful, and I thought it might also be useful for a general wsgi apps.
Not sure if having second keyword there for a somewhat similar and maybe too liberal interface there might be too much though, but two functions communicating via global var just doesn't look right ;)

Contributor

mk-fg commented May 1, 2013

Ah, true, moved the thing to get_app(), also adding "settings" keyword there to be able to pass custom settings object, while leaving existing settings_file interface intact as well. I actually run 0bin from uwsgi as well.

Currently it doesn't mean much, as both runserver() and get_app() use module-global "settings" var, but if they'll ever get separated, passing pre-populated (using cli) settings like that might be useful, and I thought it might also be useful for a general wsgi apps.
Not sure if having second keyword there for a somewhat similar and maybe too liberal interface there might be too much though, but two functions communicating via global var just doesn't look right ;)

@xyb

This comment has been minimized.

Show comment
Hide comment
@xyb

xyb May 2, 2013

This is a great idea, but I perform Crockford's Base32 encoding which is more human readable in case of you write it down.

xyb commented May 2, 2013

This is a great idea, but I perform Crockford's Base32 encoding which is more human readable in case of you write it down.

@mk-fg

This comment has been minimized.

Show comment
Hide comment
@mk-fg

mk-fg May 2, 2013

Contributor

I didn't really had the "write down on paper" use-case in mind, picking base64 because it includes most of the url-allowed ascii chars and is readily available (e.g. via str.encode('base64')).

For 8 chars, using 5-bit base32 still shouldn't make a difference, I'd question whether it's a relevant use-case to optimize for though - imho not, especially since you can still write down base64 with just a bit more effort to make upper/lower-case distinguishable.
Otherwise, I'd at least tweak base32 to be all-lowercase, which is more readable and less annoying than all-caps ;)

Actually, there's an interesting side-effect to both non-base16 encodings - there's a one-in-a-zillion chance your paste url will be "F-CKY0U#M0RON" ;)

Contributor

mk-fg commented May 2, 2013

I didn't really had the "write down on paper" use-case in mind, picking base64 because it includes most of the url-allowed ascii chars and is readily available (e.g. via str.encode('base64')).

For 8 chars, using 5-bit base32 still shouldn't make a difference, I'd question whether it's a relevant use-case to optimize for though - imho not, especially since you can still write down base64 with just a bit more effort to make upper/lower-case distinguishable.
Otherwise, I'd at least tweak base32 to be all-lowercase, which is more readable and less annoying than all-caps ;)

Actually, there's an interesting side-effect to both non-base16 encodings - there's a one-in-a-zillion chance your paste url will be "F-CKY0U#M0RON" ;)

@sametmax

This comment has been minimized.

Show comment
Hide comment
@sametmax

sametmax May 4, 2013

Owner

A friend of mine pointed out that we should not use the hash to generate the id, but rather a uuid to avoid attacks base on hash prediction.

Owner

sametmax commented May 4, 2013

A friend of mine pointed out that we should not use the hash to generate the id, but rather a uuid to avoid attacks base on hash prediction.

@mk-fg

This comment has been minimized.

Show comment
Hide comment
@mk-fg

mk-fg May 4, 2013

Contributor

More details on what do you mean by "hash prediction attacks" would be nice, I assume attack here is to guess paste-id without having the URL?

0bin doesn't just use "hash of the plaintext" - it generates (a) random key which is added to (b) random nonces (pbkdf2, ccm), (c) hashed, (d) passed through aes along with the (e) contents and then all that is hashed to produce paste-id.

So:

  • Even having the key and the plaintext attacker still can't get the paste-id.
  • Having paste-id, to match some plaintext against it, attacker would still have to bruteforce the key and all the nonces.
  • With the complete ciphertext, attacker can get the paste-id, but then it'd be also enough to look at the browser URL, headers of the request or html, not sure why that would be bad.

Using e.g. os.urandom() to add server's randomness to that won't hurt, I guess, but find it hard to see why it'd be necessary.

Contributor

mk-fg commented May 4, 2013

More details on what do you mean by "hash prediction attacks" would be nice, I assume attack here is to guess paste-id without having the URL?

0bin doesn't just use "hash of the plaintext" - it generates (a) random key which is added to (b) random nonces (pbkdf2, ccm), (c) hashed, (d) passed through aes along with the (e) contents and then all that is hashed to produce paste-id.

So:

  • Even having the key and the plaintext attacker still can't get the paste-id.
  • Having paste-id, to match some plaintext against it, attacker would still have to bruteforce the key and all the nonces.
  • With the complete ciphertext, attacker can get the paste-id, but then it'd be also enough to look at the browser URL, headers of the request or html, not sure why that would be bad.

Using e.g. os.urandom() to add server's randomness to that won't hurt, I guess, but find it hard to see why it'd be necessary.

@sametmax

This comment has been minimized.

Show comment
Hide comment
@sametmax

sametmax May 4, 2013

Owner

Currently we don't check if a paste exist, and somebody could override a paste with that. So first we should check for collision (which I didn't do because I though about randomness not security). Secondly, having a non predictable name is always nice.

Anyway, it's not the purpose of this PL so let's not worry about it.

I may open a ticket for that or just fix it by myself.

@xyb : can you open a ticket for this as well, so we can debate about base32 ?

In the meantime, will accept the PL.

Owner

sametmax commented May 4, 2013

Currently we don't check if a paste exist, and somebody could override a paste with that. So first we should check for collision (which I didn't do because I though about randomness not security). Secondly, having a non predictable name is always nice.

Anyway, it's not the purpose of this PL so let's not worry about it.

I may open a ticket for that or just fix it by myself.

@xyb : can you open a ticket for this as well, so we can debate about base32 ?

In the meantime, will accept the PL.

sametmax added a commit that referenced this pull request May 4, 2013

Merge pull request #42 from mk-fg/shorter_urls
Shorter urls. Thanks again for this piece of work to the author.

@sametmax sametmax merged commit 759bb82 into sametmax:master May 4, 2013

@mk-fg mk-fg deleted the mk-fg:shorter_urls branch May 4, 2013

@mk-fg

This comment has been minimized.

Show comment
Hide comment
@mk-fg

mk-fg May 4, 2013

Contributor

My point is that "somebody could override a paste with that" after several trillon paste submissions - sure, it's bad that "somebody could", but given the odds, I'd rather bet on lighting striking "somebody" before that happens ;)

Contributor

mk-fg commented May 4, 2013

My point is that "somebody could override a paste with that" after several trillon paste submissions - sure, it's bad that "somebody could", but given the odds, I'd rather bet on lighting striking "somebody" before that happens ;)

@xyb xyb referenced this pull request May 5, 2013

Closed

Base32 for shorter urls #43

@xyb

This comment has been minimized.

Show comment
Hide comment

xyb commented May 5, 2013

@sametmax done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment