feat: some improvements on glossary Base64 (#33423)
* feat: some improvements

* fix: remove new line

* feat: added some glossary macros

* fix: link reordered

* Apply suggestions from code review

Co-authored-by: Brian Thomas Smith <brian@smith.berlin>

---------

Co-authored-by: Brian Thomas Smith <brian@smith.berlin>
PassionPenguin and bsmth committed May 6, 2024
1 parent 94ef07a commit 76d86fa
Showing 1 changed file with 16 additions and 6 deletions.
22 changes: 16 additions & 6 deletions files/en-us/glossary/base64/index.md
@@ -6,21 +6,22 @@ page-type: glossary-definition

{{GlossarySidebar}}

-**Base64** is a group of similar [binary-to-text encoding](https://en.wikipedia.org/wiki/Binary-to-text_encoding) schemes that represent binary data in an {{glossary("ASCII")}} string format by translating it into a radix-64 representation. The term _Base64_ originates from a specific [MIME content transfer encoding](https://en.wikipedia.org/wiki/MIME#Content-Transfer-Encoding).
+**Base64** is a group of similar [binary-to-text encoding](https://en.wikipedia.org/wiki/Binary-to-text_encoding) schemes that represent binary data in an {{glossary("ASCII")}} string format by transforming it into a radix-64 representation. The term _Base64_ originates from a specific [MIME content transfer encoding](https://en.wikipedia.org/wiki/MIME#Content-Transfer-Encoding).

-When the term "Base64" is used on its own to refer to a specific algorithm, it typically refers to the version of Base64 outlined in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648), section 4, which uses the following alphabet to represent the radix-64 digits, alongside `=` as a padding character:
+When the term "Base64" is used on its own to refer to a specific {{glossary("algorithm")}}, it typically refers to the version of Base64 outlined in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648), section 4, which uses the following alphabet to represent the radix-64 digits, alongside `=` as a padding character:

```plain
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
```

-A common variant is "Base64 URL safe", which omits the padding and replaces `+/` with `-_` to avoid characters that might cause problems in URL path segments or query parameters.
+A common variant is "Base64 URL safe", which omits the padding and replaces `+/` with `-_` to avoid characters that might cause problems in
+{{glossary("URL")}} path segments or query parameters.
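
The relationship between the standard alphabet and the URL-safe variant can be sketched as below. This is only an illustration: `toBase64Url` and `fromBase64Url` are hypothetical helper names that do not appear in the glossary page, and the sketch assumes its input is already valid Base64.

```js
// Hypothetical helper: standard RFC 4648 Base64 -> URL-safe variant.
// Swaps "+" and "/" for "-" and "_", then drops the "=" padding.
function toBase64Url(base64) {
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Hypothetical helper: URL-safe variant -> standard Base64.
// Restores the alphabet, then pads to a multiple of four characters.
function fromBase64Url(base64Url) {
  const base64 = base64Url.replace(/-/g, "+").replace(/_/g, "/");
  const missing = (4 - (base64.length % 4)) % 4;
  return base64 + "=".repeat(missing);
}

console.log(toBase64Url("+/8=")); // "-_8"
console.log(fromBase64Url("-_8")); // "+/8="
```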

Base64 encoding schemes are commonly used to encode binary data for storage or transfer over media that can only deal with ASCII text (or some superset of ASCII that still falls short of accepting arbitrary binary data). This ensures that the data remains intact without modification during transport. Common applications of Base64 include:

- Email via [MIME](https://en.wikipedia.org/wiki/MIME)
- Storing complex data in [XML](/en-US/docs/Web/XML)
-- Encoding binary data so it can be included in a [`data:` URL](/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs)
+- Encoding binary data so that it can be included in a [`data:` URL](/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs)

## Encoded size increase

@@ -35,13 +36,13 @@ Browsers natively provide two JavaScript functions for decoding and encoding Bas
- [`btoa`](/en-US/docs/Web/API/btoa): creates a Base64-encoded ASCII string from a string of binary data ("btoa" should be read as "binary to ASCII").
- [`atob`](/en-US/docs/Web/API/atob): decodes a Base64-encoded string ("atob" should be read as "ASCII to binary").
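
As a minimal sketch of the two functions listed above (the variable names are illustrative only), a round trip over plain ASCII input might look like this:

```js
// Encode an ASCII-only string, then decode it again.
const encoded = btoa("hello"); // "aGVsbG8="
const decoded = atob(encoded); // "hello"

// Each character stands for one byte, so arbitrary bytes can be passed
// as a "binary string" built from their numeric values.
const binary = String.fromCharCode(0, 1, 2);
const encodedBytes = btoa(binary); // "AAEC"

console.log(encoded, decoded, encodedBytes);
```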

-> **Note:** Base64 is a binary encoding rather than a text encoding, but `btoa` and `atob` were added to the web platform before it supported binary data types. As a result, the two functions use strings to represent binary data, with the code point of each character representing the value of each byte. This has led to a common misconception that `btoa` can be used to encode arbitrary text data — for example, creating a Base64 `data:` URL of a text or HTML document.
+> **Note:** Base64 is a binary encoding rather than a text encoding, but `btoa` and `atob` were added to the web platform before it supported binary data types. As a result, the two functions use strings to represent binary data, with the {{glossary("code point")}} of each character representing the value of each byte. This has led to a common misconception that `btoa` can be used to encode arbitrary text data — for example, creating a Base64 `data:` URL of a text or HTML document.
>
> However, the byte-to-code-point correspondence only reliably holds true for code points up to `0x7f`. Furthermore, code points over `0xff` will cause `btoa` to throw an error due to exceeding the maximum value for 1 byte. The next section details how to work around this limitation when encoding arbitrary Unicode text.

## The "Unicode Problem"

-Since `btoa` interprets the code points of its input string as byte values, calling `btoa` on a string will cause a "Character Out Of Range" exception if a character's code point exceeds `0xff`. For use cases where you need to encode arbitrary Unicode text, it is necessary to first convert the string to its constituent bytes in UTF-8, and then encode the bytes.
+Since `btoa` interprets the code points of its input string as byte values, calling `btoa` on a string will cause a "Character Out Of Range" exception if a character's code point exceeds `0xff`. For use cases where you need to encode arbitrary Unicode text, it is necessary to first convert the string to its constituent bytes in {{glossary("UTF-8")}}, and then encode the bytes.
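
The behavior described above can be observed with a short sketch. The variable names are illustrative, and the exact exception type and message may differ between environments, so the sketch only checks that an error is thrown.

```js
// Code points up to 0xff are accepted, but each is treated as one byte.
const latin1 = btoa("é"); // U+00E9 is encoded as the single byte 0xE9

// The UTF-8 form of "é" is two bytes (0xC3 0xA9), so the output differs.
const utf8Bytes = new TextEncoder().encode("é");
const utf8 = btoa(String.fromCharCode(...utf8Bytes));

// Code points above 0xff are rejected outright.
let threw = false;
try {
  btoa("€"); // U+20AC
} catch (error) {
  threw = true;
}

console.log(latin1, utf8, threw); // "6Q==" "w6k=" true
```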

The simplest solution is to use `TextEncoder` and `TextDecoder` to convert between UTF-8 and single-byte representations of the string:

@@ -89,3 +90,12 @@ async function dataUrlToBytes(dataUrl) {
await bytesToBase64DataUrl(new Uint8Array([0, 1, 2])); // "data:application/octet-stream;base64,AAEC"
await dataUrlToBytes("data:application/octet-stream;base64,AAEC"); // Uint8Array [0, 1, 2]
```
+
+## See Also
+
+- JavaScript APIs:
+  - [btoa() global function](/en-US/docs/Web/API/btoa)
+  - [atob() global function](/en-US/docs/Web/API/atob)
+- [Data URLs](/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs)
+- [Base64](https://en.wikipedia.org/wiki/Base64) on Wikipedia
+- Base64 Algorithm described in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648)
