base64url #4

Closed
baturinsky opened this issue Oct 3, 2014 · 8 comments

@baturinsky

Something to consider: there is more than one base64 standard. base64url, for example, is increasingly popular because it is easier to embed in URLs.

You may want to add support for base64url to your lib, unless you consider it too much bloat.
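To make the difference between the two alphabets concrete, here is a minimal sketch using the JDK's `java.util.Base64` rather than this library. The object name `AlphabetDemo` is only illustrative. The input bytes are chosen so that the output uses indices 62 and 63 of the alphabet, which are the only two positions where the standards differ.

```scala
import java.util.Base64

object AlphabetDemo {
  // 0xfb 0xff splits into the 6-bit groups 62, 63, 60.
  val bytes: Array[Byte] = Array(0xfb, 0xff).map(_.toByte)

  // Standard alphabet: index 62 is '+', index 63 is '/'.
  val standard: String = Base64.getEncoder.encodeToString(bytes)

  // URL-safe alphabet: index 62 is '-', index 63 is '_'.
  val urlSafe: String = Base64.getUrlEncoder.encodeToString(bytes)
}
```

Here `standard` is `+/8=` and `urlSafe` is `-_8=`. Both keep the trailing `=` by default.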

@marklister self-assigned this Oct 6, 2014
@marklister
Owner

RFC 4648

I might be able to use an approach similar to the one I used in the CSV parser of product-collections to simplify the usage.

@baturinsky
Author

I'm not sure about the use cases. Do people need one lib that does both, or two slightly smaller libs that each do one? I think the second option is more likely. People prefer either "classic" base64 or base64url, and only need one of them.

@marklister
Owner

Commit 3e17910 should suffice.

@baturinsky
Author

I think there is also a need for an option to omit padding, or even to turn it off by default in base64url.
I'd add the padding char as the last one in B64Scheme, and if it is omitted, no padding is emitted.
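One way to picture this suggestion is a scheme whose pad character is optional. The following is a hypothetical sketch only: `SchemeSketch`, `padCount`, and `urlAlphabet` are made-up names and do not reflect the library's actual `B64Scheme`.

```scala
// Hypothetical sketch, not the library's B64Scheme.
final case class SchemeSketch(alphabet: IndexedSeq[Char], pad: Option[Char]) {
  require(alphabet.length == 64, "a base64 alphabet has exactly 64 characters")

  // Number of pad characters to append after encoding `byteCount` input bytes.
  // With no pad character configured, nothing is appended.
  def padCount(byteCount: Int): Int =
    if (pad.isEmpty) 0 else (3 - byteCount % 3) % 3
}

object SchemeSketch {
  // URL-safe alphabet from RFC 4648: '-' and '_' replace '+' and '/'.
  val urlAlphabet: IndexedSeq[Char] =
    (('A' to 'Z') ++ ('a' to 'z') ++ ('0' to '9') ++ Vector('-', '_')).toVector

  val padded   = SchemeSketch(urlAlphabet, Some('='))
  val unpadded = SchemeSketch(urlAlphabet, None)
}
```

For a 4-byte input such as `"abcd"`, `SchemeSketch.padded.padCount(4)` is 2 and `SchemeSketch.unpadded.padCount(4)` is 0.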

@marklister
Owner

I reread the RFC and I'm not convinced:

3.2.  Padding of Encoded Data

   In some circumstances, the use of padding ("=") in base-encoded data
   is not required or used.  In the general case, when assumptions about
   the size of transported data cannot be made, padding is required to
   yield correct decoded data.

   Implementations MUST include appropriate pad characters at the end of
   encoded data unless the specification referring to this document
   explicitly states otherwise.

   The base64 and base32 alphabets use padding, as described below in
   sections 4 and 6, but the base16 alphabet does not need it; see
   section 8.

My current implementation recomputes the decode table on every decode. I'll fix this shortly by encapsulating B64Scheme in its own class. It would be possible to put padding into this class, but I don't think there's much use for unpadded base64...
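As a small illustration of the padding rule quoted from the RFC, the sketch below encodes inputs of one to four bytes with the JDK's `java.util.Base64` (not this library). The object name `PaddingDemo` is illustrative.

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets.UTF_8

object PaddingDemo {
  def encode(s: String): String =
    Base64.getEncoder.encodeToString(s.getBytes(UTF_8))

  // Input lengths 1, 2, 3 and 4 bytes need 2, 1, 0 and 2 pad characters.
  val samples: Seq[String] = Seq("a", "ab", "abc", "abcd").map(encode)
}
```

`PaddingDemo.samples` is `Seq("YQ==", "YWI=", "YWJj", "YWJjZA==")`: the padding brings every output up to a multiple of four characters.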

@baturinsky
Author

base64url is meant to be used in URLs, hence the omission of the usual reserved characters.

http://tools.ietf.org/html/rfc3986#section-3.4

URI producing applications often use the reserved characters allowed in a segment to delimit
scheme-specific or dereference-handler-specific subcomponents. For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. The comma (",") reserved character is often used for
similar purposes. For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same.

@marklister
Owner

The problem is that if padding is omitted, the encoding is non-canonical.

The RFC suggests percent encoding the "=" character.

The pad character "=" is typically percent-encoded when used in an
   URI [9], but if the data length is known implicitly, this can be
   avoided by skipping the padding; see section 3.2.
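The percent-encoding route mentioned in the quote can be sketched with the JDK's `java.net.URLEncoder`. Note that `URLEncoder` implements form-style escaping rather than strict RFC 3986 escaping; for this input the two agree. `PercentEncodeDemo` is an illustrative name.

```scala
import java.net.URLEncoder

object PercentEncodeDemo {
  // Padded standard base64 encoding of "abcd".
  val padded = "YWJjZA=="

  // Each '=' is escaped as %3D; letters and digits pass through unchanged.
  val escaped: String = URLEncoder.encode(padded, "UTF-8")
}
```

`PercentEncodeDemo.escaped` is `YWJjZA%3D%3D`, four characters longer than the padded form and six longer than the unpadded one.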

I guess the padding could be stripped:

scala> "abcd".getBytes.toBase64
res1: String = YWJjZA==

scala> res1.reverse.dropWhile(_=='=').reverse
res3: String = YWJjZA
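If the padding is stripped this way, a receiver can restore it from the string length before decoding. A minimal sketch, assuming the JDK's `java.util.Base64` as the decoder; `PaddingRoundTrip`, `strip`, `restore`, and `decode` are illustrative names, not part of this library.

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets.UTF_8

object PaddingRoundTrip {
  // Same idea as the REPL line above: drop trailing '=' characters.
  def strip(encoded: String): String =
    encoded.reverse.dropWhile(_ == '=').reverse

  // Pad back up to a multiple of four characters.
  // A length of 1 mod 4 cannot come from valid base64 and is not handled here.
  def restore(unpadded: String): String =
    unpadded + "=" * ((4 - unpadded.length % 4) % 4)

  def decode(unpadded: String): String =
    new String(Base64.getDecoder.decode(restore(unpadded)), UTF_8)
}
```

With these definitions, `PaddingRoundTrip.strip("YWJjZA==")` gives `YWJjZA`, and `PaddingRoundTrip.decode("YWJjZA")` gives back `abcd`.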

And if I wanted to disable it, I'd do something like this:

// Padding is on by default; the decode table is built lazily, once per scheme.
class B64Scheme(val encodeTable: IndexedSeq[Char], padding: Boolean = true) {
  lazy val decodeTable = collection.immutable.TreeMap(encodeTable.zipWithIndex: _*)
  lazy val pad = if (padding) 3 else 1
}

Then substitute 'pad' for 3 in the rest of the code.

@marklister
Owner

I'll push branch "nopadding" shortly:

scala> implicit val scheme = new B64Scheme(base64.encodeTable,false)
scheme: io.github.marklister.base64.Base64.B64Scheme = io.github.marklister.base64.Base64$B64Scheme@14bef04

scala> "abcd".getBytes.toBase64
res1: String = 9hYmNk

scala> res1.toByteArray.map(_.toChar)
res2: Array[Char] = Array(�, a, b, c, d)

You'll see the issue with non-canonical encoding.
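For comparison only, and not as a description of the "nopadding" branch, here is what the JDK's `java.util.Base64` produces when padding is disabled. `JdkNoPadding` is an illustrative name.

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets.UTF_8

object JdkNoPadding {
  // URL-safe encoder with the trailing '=' characters suppressed.
  val encoded: String =
    Base64.getUrlEncoder.withoutPadding.encodeToString("abcd".getBytes(UTF_8))

  // The JDK decoders accept input with the final padding left off.
  val decoded: String =
    new String(Base64.getUrlDecoder.decode(encoded), UTF_8)
}
```

Here `JdkNoPadding.encoded` is `YWJjZA`, the same string obtained earlier by stripping `=` from `YWJjZA==`, and `JdkNoPadding.decoded` is `abcd`.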
