diff --git a/README.md b/README.md
index 3c8ca84..a03ec06 100644
--- a/README.md
+++ b/README.md
@@ -8,159 +8,18 @@
 [![Coverage Status][coveralls_badge]][coveralls]
 [![Dependency Status][dependency_badge]][dependency]
 
-This repository provides a Rust [library] and a [binary] providing efficient
-common and custom data-encodings.
-
-## Common use-cases
-
-The [library] provides the following common encodings:
-
-- `HEXLOWER`: lowercase hexadecimal
-- `HEXLOWER_PERMISSIVE`: lowercase hexadecimal with case-insensitive decoding
-- `HEXUPPER`: uppercase hexadecimal
-- `HEXUPPER_PERMISSIVE`: uppercase hexadecimal with case-insensitive decoding
-- `BASE32`: RFC4648 base32
-- `BASE32_NOPAD`: RFC4648 base32 without padding
-- `BASE32_DNSSEC`: RFC5155 base32
-- `BASE32_DNSCURVE`: DNSCurve base32
-- `BASE32HEX`: RFC4648 base32hex
-- `BASE32HEX_NOPAD`: RFC4648 base32hex without padding
-- `BASE64`: RFC4648 base64
-- `BASE64_NOPAD`: RFC4648 base64 without padding
-- `BASE64_MIME`: RFC2045-like base64
-- `BASE64URL`: RFC4648 base64url
-- `BASE64URL_NOPAD`: RFC4648 base64url without padding
-
-Typical usage looks like:
-
-```rust
-// allocating functions
-BASE64.encode(&input_to_encode)
-HEXLOWER.decode(&input_to_decode)
-// in-place functions
-BASE32.encode_mut(&input_to_encode, &mut encoded_output)
-BASE64_URL.decode_mut(&input_to_decode, &mut decoded_output)
-```
-
-See the [documentation] or the [changelog] for more details.
-
-## Custom use-cases
-
-The [library] also provides the possibility to define custom little-endian ASCII
-base-conversion encodings for bases of size 2, 4, 8, 16, 32, and 64 (for which
-all above use-cases are particular instances). It supports:
-
-- padded and unpadded encodings
-- canonical encodings (e.g. trailing bits are checked)
-- in-place encoding and decoding functions
-- partial decoding functions (e.g. for error recovery)
-- character translation (e.g. for case-insensitivity)
-- most and least significant bit-order
-- ignoring characters when decoding (e.g. for skipping newlines)
-- wrapping the output when encoding
-
-The typical definition of a custom encoding looks like:
-
-```rust
-lazy_static! {
-    static ref HEX: Encoding = {
-        let mut spec = Specification::new();
-        spec.symbols.push_str("0123456789abcdef");
-        spec.translate.from.push_str("ABCDEF");
-        spec.translate.to.push_str("abcdef");
-        spec.encoding().unwrap()
-    };
-    static ref BASE64: Encoding = {
-        let mut spec = Specification::new();
-        spec.symbols.push_str(
-            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");
-        spec.padding = Some('=');
-        spec.encoding().unwrap()
-    };
-}
-```
-
-You may also use the [macro] library to define a compile-time custom encoding:
-
-```rust
-const HEX: Encoding = new_encoding!{
-    symbols: "0123456789abcdef",
-    translate_from: "ABCDEF",
-    translate_to: "abcdef",
-};
-const BASE64: Encoding = new_encoding!{
-    symbols: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
-    padding: '=',
-};
-```
-
-See the [documentation] or the [changelog] for more details.
-
-## Performance
-
-The performance of the encoding and decoding functions (for both common and
-custom encodings) are similar to existing implementations in C, Rust, and other
-high-performance languages. You may run the benchmarks with `make bench`.
-
-## Swiss-knife binary
-
-The [binary] is mostly a wrapper around the library. You can run `make install`
-to install it from the repository. By default, it will be installed as
-`~/.cargo/bin/data-encoding`. You can also run `cargo install data-encoding-bin`
-to install the latest version published on `crates.io`. This second alternative
-does not require to clone the repository.
-
-Once installed, you can run `data-encoding --help` (assuming `~/.cargo/bin` is
-in your `PATH` environment variable) to see the usage:
-
-```
-Usage: data-encoding --mode= --base= []
-Usage: data-encoding --mode= --symbols= []
-
-Options:
-    -m, --mode {encode|decode|describe}
-    -b, --base {16|hex|32|32hex|64|64url}
-    -i, --input read from instead of standard input
-    -o, --output write to instead of standard output
-        --block read blocks of about bytes
-    -p, --padding
-                        pad with
-    -g, --ignore
-                        when decoding, ignore characters in
-    -w, --width when encoding, wrap every characters
-    -s, --separator
-                        when encoding, wrap with
-        --symbols
-                        define a custom base using
-        --translate
-                        when decoding, translate as
-        --ignore_trailing_bits
-                        when decoding, ignore non-zero trailing bits
-        --least_significant_bit_first
-                        use least significant bit first bit-order
-
-Examples:
-    # Encode using the RFC4648 base64 encoding
-    data-encoding -mencode -b64 # without padding
-    data-encoding -mencode -b64 -p= # with padding
-
-    # Encode using the MIME base64 encoding
-    data-encoding -mencode -b64 -p= -w76 -s$'\r\n'
-
-    # Show base information for the permissive hexadecimal encoding
-    data-encoding --mode=describe --base=hex
-
-    # Decode using the DNSCurve base32 encoding
-    data-encoding -mdecode \
-        --symbols=0123456789bcdfghjklmnpqrstuvwxyz \
-        --translate=BCDFGHJKLMNPQRSTUVWXYZbcdfghjklmnpqrstuvwxyz \
-        --least_significant_bit_first
-```
+This repository provides the following Rust crates for data-encoding:
+- The `data-encoding` library provides common and custom encodings, like
+  hexadecimal, base32, and base64. See the [documentation] for more information.
+- The `data-encoding-macro` library provides compile-time facilities. See the
+  [documentation][macro] for more information.
+- The `data-encoding-bin` binary is a command-line tool to define and use
+  encodings. See the [binary] for more information.
+- The [website] provides a playground to define and use encodings.
 
 [appveyor]: https://ci.appveyor.com/project/ia0/data-encoding
 [appveyor_badge]:https://ci.appveyor.com/api/projects/status/wm4ga69xnlriukhl/branch/master?svg=true
 [binary]: https://crates.io/crates/data-encoding-bin
-[changelog]: https://github.com/ia0/data-encoding/blob/master/lib/CHANGELOG.md
 [coveralls]: https://coveralls.io/github/ia0/data-encoding?branch=master
 [coveralls_badge]: https://coveralls.io/repos/github/ia0/data-encoding/badge.svg?branch=master
 [dependency]: https://deps.rs/crate/data-encoding/2.3.0
@@ -170,7 +29,8 @@ Examples:
 [library]: https://crates.io/crates/data-encoding
 [license]: https://github.com/ia0/data-encoding/blob/master/LICENSE
 [license_badge]: https://img.shields.io/crates/l/data-encoding.svg
-[macro]: https://crates.io/crates/data-encoding-macro
+[macro]: https://docs.rs/data-encoding-macro
 [travis]: https://travis-ci.org/ia0/data-encoding
 [travis_badge]: https://travis-ci.org/ia0/data-encoding.svg?branch=master
 [version_badge]: https://img.shields.io/crates/v/data-encoding.svg
+[website]: https://data-encoding.rs
diff --git a/bin/Cargo.toml b/bin/Cargo.toml
index 4b3772f..9bf35bf 100644
--- a/bin/Cargo.toml
+++ b/bin/Cargo.toml
@@ -18,7 +18,3 @@ path = "src/main.rs"
 [dependencies]
 data-encoding = { version = "2", path = "../lib" }
 getopts = "0.2"
-
-[badges]
-appveyor = { repository = "ia0/data-encoding" }
-travis-ci = { repository = "ia0/data-encoding" }
diff --git a/bin/README.md b/bin/README.md
index a25c25f..46f7e32 100644
--- a/bin/README.md
+++ b/bin/README.md
@@ -1,65 +1,21 @@
-This binary is a wrapper around the `data-encoding` [library].
-
-## Installation
-
-You can run `make install` to install the binary from the [github] repository.
-By default, it will be installed as `~/.cargo/bin/data-encoding`. You can also
-run `cargo install data-encoding-bin` to install the latest version published on
-`crates.io`. This second alternative does not require to clone the repository.
-
-## Usage
-
-You can run `data-encoding --help` (assuming `~/.cargo/bin` is in your `PATH`
-environment variable) to see the usage:
+To install the binary from the [github] repository:
 
 ```
-Usage: data-encoding --mode= --base= []
-Usage: data-encoding --mode= --symbols= []
-
-Options:
-    -m, --mode {encode|decode|describe}
-    -b, --base {16|hex|32|32hex|64|64url}
-    -i, --input read from instead of standard input
-    -o, --output write to instead of standard output
-        --block read blocks of about bytes
-    -p, --padding
-                        pad with
-    -g, --ignore
-                        when decoding, ignore characters in
-    -w, --width when encoding, wrap every characters
-    -s, --separator
-                        when encoding, wrap with
-        --symbols
-                        define a custom base using
-        --translate
-                        when decoding, translate as
-        --ignore_trailing_bits
-                        when decoding, ignore non-zero trailing bits
-        --least_significant_bit_first
-                        use least significant bit first bit-order
-
-Examples:
-    # Encode using the RFC4648 base64 encoding
-    data-encoding -mencode -b64 # without padding
-    data-encoding -mencode -b64 -p= # with padding
-
-    # Encode using the MIME base64 encoding
-    data-encoding -mencode -b64 -p= -w76 -s$'\r\n'
+make install
+```
 
-    # Show base information for the permissive hexadecimal encoding
-    data-encoding --mode=describe --base=hex
+To install the latest version published on `crates.io` (does not require cloning the repository):
 
-    # Decode using the DNSCurve base32 encoding
-    data-encoding -mdecode \
-        --symbols=0123456789bcdfghjklmnpqrstuvwxyz \
-        --translate=BCDFGHJKLMNPQRSTUVWXYZbcdfghjklmnpqrstuvwxyz \
-        --least_significant_bit_first
+```
+cargo install data-encoding-bin
 ```
 
-## Performance
+By default, the binary will be installed as `~/.cargo/bin/data-encoding`.
+Assuming `~/.cargo/bin` is in your `PATH` environment variable, you can see the
+usage by running:
 
-The performance of this binary is similar or faster than the GNU `base64`
-program (see how to run the benchmarks on [github]).
+```
+data-encoding --help
+```
 
-[library]: https://crates.io/crates/data-encoding
 [github]: https://github.com/ia0/data-encoding
diff --git a/cmp/.gitignore b/cmp/.gitignore
new file mode 100644
index 0000000..042776a
--- /dev/null
+++ b/cmp/.gitignore
@@ -0,0 +1,2 @@
+/Cargo.lock
+/target/
diff --git a/cmp/Cargo.toml b/cmp/Cargo.toml
index 7f7243d..3b0c86f 100644
--- a/cmp/Cargo.toml
+++ b/cmp/Cargo.toml
@@ -11,8 +11,6 @@ publish = false
 base64 = { git = "https://github.com/alicemaz/rust-base64" }
 data-encoding = { path = "../lib" }
 libc = "0.2"
-rustc-serialize = "0.3"
-lazy_static = "1"
 
 [build-dependencies]
 cc = "1"
diff --git a/cmp/tests/lib.rs b/cmp/tests/lib.rs
index a7f80d8..22b8290 100644
--- a/cmp/tests/lib.rs
+++ b/cmp/tests/lib.rs
@@ -1,13 +1,6 @@
-extern crate base64;
-extern crate cmp;
-extern crate data_encoding;
-#[macro_use]
-extern crate lazy_static;
-extern crate rustc_serialize;
-
+use base64::DecodeError::*;
 use data_encoding::DecodeKind::*;
-use data_encoding::{DecodeError, Encoding, Specification, BASE64};
-use rustc_serialize::base64::{FromBase64, ToBase64, STANDARD};
+use data_encoding::{DecodeError, BASE64};
 
 #[test]
 fn encode_exact() {
@@ -31,56 +24,29 @@ fn encode_exact() {
         BASE64.encode_mut(i, &mut r);
         assert_eq!(&r, o);
     }
-    for &(ref i, ref o) in tests {
-        assert_eq!(&i.to_base64(STANDARD).as_bytes(), o);
-    }
 }
 
 #[test]
 fn difference() {
     let x = b"AAB=";
     assert_eq!(BASE64.decode(x).err().unwrap(), DecodeError { position: 2, kind: Trailing });
-    assert_eq!(x.from_base64().unwrap(), vec![0, 0]);
-    assert!(base64::decode(x).is_err());
+    assert_eq!(base64::decode(x).err().unwrap(), InvalidLastSymbol(2, b'B'));
     let x = b"AA\nB=";
     assert_eq!(BASE64.decode(x).err().unwrap(), DecodeError { position: 4, kind: Length });
-    assert_eq!(x.from_base64().unwrap(), vec![0, 0]);
-    assert_eq!(base64::decode(x).err().unwrap(), base64::DecodeError::InvalidLength);
+    assert_eq!(base64::decode(x).err().unwrap(), InvalidLength);
     let x = b"AAB";
     assert_eq!(BASE64.decode(x).err().unwrap(), DecodeError { position: 0, kind: Length });
-    assert_eq!(x.from_base64().unwrap(), vec![0, 0]);
-    assert!(base64::decode(x).is_err());
+    assert_eq!(base64::decode(x).err().unwrap(), InvalidLastSymbol(2, b'B'));
     let x = b"AAA";
     assert_eq!(BASE64.decode(x).err().unwrap(), DecodeError { position: 0, kind: Length });
-    assert_eq!(x.from_base64().unwrap(), vec![0, 0]);
     assert_eq!(base64::decode(x).unwrap(), vec![0, 0]);
     let x = b"A\rA\nB=";
     assert_eq!(BASE64.decode(x).err().unwrap(), DecodeError { position: 4, kind: Length });
-    assert_eq!(x.from_base64().unwrap(), vec![0, 0]);
-    assert_eq!(base64::decode(x).err().unwrap(), base64::DecodeError::InvalidByte(1, b'\r'));
+    assert_eq!(base64::decode(x).err().unwrap(), InvalidByte(1, b'\r'));
     let x = b"-_\r\n";
     assert_eq!(BASE64.decode(x).err().unwrap(), DecodeError { position: 0, kind: Symbol });
-    assert_eq!(x.from_base64().unwrap(), vec![251]);
-    assert_eq!(base64::decode(x).err().unwrap(), base64::DecodeError::InvalidByte(0, b'-'));
+    assert_eq!(base64::decode(x).err().unwrap(), InvalidByte(0, b'-'));
     let x = b"AA==AA==";
     assert_eq!(BASE64.decode(x).unwrap(), vec![0, 0]);
-    assert!(x.from_base64().is_err());
-    assert_eq!(base64::decode(x).err().unwrap(), base64::DecodeError::InvalidByte(2, b'='));
-}
-
-lazy_static! {
-    static ref HEX: Encoding = {
-        let mut spec = Specification::new();
-        spec.symbols.push_str("0123456789abcdef");
-        spec.translate.from.push_str("ABCDEF");
-        spec.translate.to.push_str("abcdef");
-        spec.encoding().unwrap()
-    };
-}
-
-#[test]
-fn lazy_static_hex() {
-    assert_eq!(HEX.encode(b"Hello"), "48656c6c6f");
-    assert_eq!(HEX.decode(b"48656c6c6f").unwrap(), b"Hello");
-    assert_eq!(*HEX, data_encoding::HEXLOWER_PERMISSIVE);
+    assert_eq!(base64::decode(x).err().unwrap(), InvalidByte(2, b'='));
 }
diff --git a/lib/README.md b/lib/README.md
index 93b5a3c..b9dea39 100644
--- a/lib/README.md
+++ b/lib/README.md
@@ -2,110 +2,36 @@
 [![Build Status][appveyor_badge]][appveyor]
 [![Coverage Status][coveralls_badge]][coveralls]
 
-## Common use-cases
-
 This library provides the following common encodings:
 
-- `HEXLOWER`: lowercase hexadecimal
-- `HEXLOWER_PERMISSIVE`: lowercase hexadecimal with case-insensitive decoding
-- `HEXUPPER`: uppercase hexadecimal
-- `HEXUPPER_PERMISSIVE`: uppercase hexadecimal with case-insensitive decoding
-- `BASE32`: RFC4648 base32
-- `BASE32_NOPAD`: RFC4648 base32 without padding
-- `BASE32_DNSSEC`: RFC5155 base32
-- `BASE32_DNSCURVE`: DNSCurve base32
-- `BASE32HEX`: RFC4648 base32hex
-- `BASE32HEX_NOPAD`: RFC4648 base32hex without padding
-- `BASE64`: RFC4648 base64
-- `BASE64_NOPAD`: RFC4648 base64 without padding
-- `BASE64_MIME`: RFC2045-like base64
-- `BASE64URL`: RFC4648 base64url
-- `BASE64URL_NOPAD`: RFC4648 base64url without padding
-
-Typical usage looks like:
-
-```rust
-// allocating functions
-BASE64.encode(&input_to_encode)
-HEXLOWER.decode(&input_to_decode)
-// in-place functions
-BASE32.encode_mut(&input_to_encode, &mut encoded_output)
-BASE64_URL.decode_mut(&input_to_decode, &mut decoded_output)
-```
-
-See the [documentation] or the [changelog] for more details.
-
-## Custom use-cases
-
-This library also provides the possibility to define custom little-endian ASCII
+| Name | Description |
+| --- | --- |
+| `HEXLOWER` | lowercase hexadecimal |
+| `HEXLOWER_PERMISSIVE` | lowercase hexadecimal (case-insensitive decoding) |
+| `HEXUPPER` | uppercase hexadecimal |
+| `HEXUPPER_PERMISSIVE` | uppercase hexadecimal (case-insensitive decoding) |
+| `BASE32` | RFC4648 base32 |
+| `BASE32_NOPAD` | RFC4648 base32 (no padding) |
+| `BASE32_DNSSEC` | RFC5155 base32 |
+| `BASE32_DNSCURVE` | DNSCurve base32 |
+| `BASE32HEX` | RFC4648 base32hex |
+| `BASE32HEX_NOPAD` | RFC4648 base32hex (no padding) |
+| `BASE64` | RFC4648 base64 |
+| `BASE64_NOPAD` | RFC4648 base64 (no padding) |
+| `BASE64_MIME` | RFC2045-like base64 |
+| `BASE64URL` | RFC4648 base64url |
+| `BASE64URL_NOPAD` | RFC4648 base64url (no padding) |
+
+It also provides the possibility to define custom little-endian ASCII
 base-conversion encodings for bases of size 2, 4, 8, 16, 32, and 64 (for which
-all above use-cases are particular instances). It supports:
-
-- padded and unpadded encodings
-- canonical encodings (e.g. trailing bits are checked)
-- in-place encoding and decoding functions
-- partial decoding functions (e.g. for error recovery)
-- character translation (e.g. for case-insensitivity)
-- most and least significant bit-order
-- ignoring characters when decoding (e.g. for skipping newlines)
-- wrapping the output when encoding
-
-The typical definition of a custom encoding looks like:
-
-```rust
-lazy_static! {
-    static ref HEX: Encoding = {
-        let mut spec = Specification::new();
-        spec.symbols.push_str("0123456789abcdef");
-        spec.translate.from.push_str("ABCDEF");
-        spec.translate.to.push_str("abcdef");
-        spec.encoding().unwrap()
-    };
-    static ref BASE64: Encoding = {
-        let mut spec = Specification::new();
-        spec.symbols.push_str(
-            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");
-        spec.padding = Some('=');
-        spec.encoding().unwrap()
-    };
-}
-```
-
-You may also use the [macro] library to define a compile-time custom encoding:
-
-```rust
-const HEX: Encoding = new_encoding!{
-    symbols: "0123456789abcdef",
-    translate_from: "ABCDEF",
-    translate_to: "abcdef",
-};
-const BASE64: Encoding = new_encoding!{
-    symbols: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
-    padding: '=',
-};
-```
-
-See the [documentation] or the [changelog] for more details.
-
-## Performance
-
-The performance of the encoding and decoding functions (for both common and
-custom encodings) are similar to existing implementations in C, Rust, and other
-high-performance languages (see how to run the benchmarks on [github]).
-
-## Swiss-knife binary
+all above use-cases are particular instances).
 
-This crate is a library. If you are looking for the [binary] using this library,
-see the installation instructions on [github].
+See the [documentation] for more details.
 
 [appveyor]: https://ci.appveyor.com/project/ia0/data-encoding
 [appveyor_badge]:https://ci.appveyor.com/api/projects/status/wm4ga69xnlriukhl/branch/master?svg=true
-[binary]: https://crates.io/crates/data-encoding-bin
-[changelog]: https://github.com/ia0/data-encoding/blob/master/lib/CHANGELOG.md
 [coveralls]: https://coveralls.io/github/ia0/data-encoding?branch=master
 [coveralls_badge]: https://coveralls.io/repos/github/ia0/data-encoding/badge.svg?branch=master
 [documentation]: https://docs.rs/data-encoding
-[github]: https://github.com/ia0/data-encoding
-[macro]: https://crates.io/crates/data-encoding-macro
 [travis]: https://travis-ci.org/ia0/data-encoding
 [travis_badge]: https://travis-ci.org/ia0/data-encoding.svg?branch=master
diff --git a/lib/macro/README.md b/lib/macro/README.md
index 6e39d0e..1f15bd6 100644
--- a/lib/macro/README.md
+++ b/lib/macro/README.md
@@ -3,39 +3,7 @@ encoded strings (using common bases like base64, base32, or hexadecimal, and
 also custom bases). It also provides a macro to define compile-time custom
 encodings to be used with the [data-encoding] crate.
 
-If you were familiar with the [binary_macros] crate, this library is actually
-[inspired][binary_macros_issue] from it.
-
-### Examples
-
-You can define a compile-time byte slice or array (using the `hexlower` or
-`base64` macros for example):
-
-```rust
-const HELLO: &'static [u8] = &hexlower!("68656c6c6f");
-const FOOBAR: &'static [u8] = &base64!("Zm9vYmFy");
-// It is possible to define an array instead of a slice:
-hexlower_array!("const HELLO" = "68656c6c6f");
-base64_array!("const FOOBAR" = "Zm9vYmFy");
-```
-
-You can define a compile-time custom encoding using the `new_encoding` macro:
-
-```rust
-const HEX: Encoding = new_encoding!{
-    symbols: "0123456789abcdef",
-    translate_from: "ABCDEF",
-    translate_to: "abcdef",
-};
-const BASE64: Encoding = new_encoding!{
-    symbols: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
-    padding: '=',
-};
-```
-
 See the [documentation] for more details.
 
-[binary_macros]: https://crates.io/crates/binary_macros
-[binary_macros_issue]: https://github.com/ia0/data-encoding/issues/7
 [data-encoding]: https://crates.io/crates/data-encoding
 [documentation]: https://docs.rs/data-encoding-macro
diff --git a/lib/src/lib.rs b/lib/src/lib.rs
index 5d6e5c9..dd8a860 100644
--- a/lib/src/lib.rs
+++ b/lib/src/lib.rs
@@ -3,35 +3,31 @@
 //! This [crate] provides little-endian ASCII base-conversion encodings for
 //! bases of size 2, 4, 8, 16, 32, and 64. It supports:
 //!
-//! - padded and unpadded encodings
-//! - canonical encodings (e.g. trailing bits are checked)
-//! - in-place encoding and decoding functions
-//! - partial decoding functions (e.g. for error recovery)
-//! - character translation (e.g. for case-insensitivity)
-//! - most and least significant bit-order
-//! - ignoring characters when decoding (e.g. for skipping newlines)
-//! - wrapping the output when encoding
-//! - no-std with `std` and `alloc` features
+//! - [padding] for streaming
+//! - canonical encodings (e.g. [trailing bits] are checked)
+//! - in-place [encoding] and [decoding] functions
+//! - partial [decoding] functions (e.g. for error recovery)
+//! - character [translation] (e.g. for case-insensitivity)
+//! - most and least significant [bit-order]
+//! - [ignoring] characters when decoding (e.g. for skipping newlines)
+//! - [wrapping] the output when encoding
+//! - no-std environments with `default-features = false, features = ["alloc"]`
+//! - no-alloc environments with `default-features = false`
 //!
-//! The performance of the encoding and decoding functions are similar to
-//! existing implementations (see how to run the benchmarks on [github]).
-//!
-//! This is the library documentation. If you are looking for the [binary], see
-//! the installation instructions on [github].
+//! You may use the [binary] or the [website] to play around.
 //!
 //! # Examples
 //!
-//! This crate provides predefined encodings as [constants]. These constants are
-//! of type [`Encoding`]. This type provides encoding and decoding functions
-//! with in-place or allocating variants. Here is an example using the
-//! allocating encoding function of [base64]:
+//! This crate provides predefined encodings as [constants]. These constants are of type
+//! [`Encoding`]. This type provides encoding and decoding functions with in-place or allocating
+//! variants. Here is an example using the allocating encoding function of [`BASE64`]:
 //!
 //! ```rust
 //! use data_encoding::BASE64;
 //! assert_eq!(BASE64.encode(b"Hello world"), "SGVsbG8gd29ybGQ=");
 //! ```
 //!
-//! Here is an example using the in-place decoding function of [base32]:
+//! Here is an example using the in-place decoding function of [`BASE32`]:
 //!
 //! ```rust
 //! use data_encoding::BASE32;
@@ -41,9 +37,9 @@
 //! assert_eq!(&output[0 .. len], b"Hello world");
 //! ```
 //!
-//! You are not limited to the predefined encodings. You may define your own
-//! encodings (with the same correctness and performance properties as the
-//! predefined ones) using the [`Specification`] type:
+//! You are not limited to the predefined encodings. You may define your own encodings (with the
+//! same correctness and performance properties as the predefined ones) using the [`Specification`]
+//! type:
 //!
 //! ```rust
 //! use data_encoding::Specification;
@@ -55,23 +51,11 @@
 //! assert_eq!(hex.encode(b"hello"), "68656c6c6f");
 //! ```
 //!
-//! If you use the [`lazy_static`] crate, you can define a global encoding:
-//!
-//! ```rust,ignore
-//! lazy_static! {
-//!     static ref HEX: Encoding = {
-//!         let mut spec = Specification::new();
-//!         spec.symbols.push_str("0123456789abcdef");
-//!         spec.translate.from.push_str("ABCDEF");
-//!         spec.translate.to.push_str("abcdef");
-//!         spec.encoding().unwrap()
-//!     };
-//! }
-//! ```
-//!
-//! You may also use the [macro] library to define a compile-time custom encoding:
+//! You may use the [macro] library to define a compile-time custom encoding:
 //!
 //! ```rust,ignore
+//! use data_encoding::Encoding;
+//! use data_encoding_macro::new_encoding;
 //! const HEX: Encoding = new_encoding!{
 //!     symbols: "0123456789abcdef",
 //!     translate_from: "ABCDEF",
@@ -85,89 +69,78 @@
 //!
 //! # Properties
 //!
-//! The [base16], [base32], [base32hex], [base64], and [base64url] predefined
-//! encodings are conform to [RFC4648].
+//! The [`HEXUPPER`], [`BASE32`], [`BASE32HEX`], [`BASE64`], and [`BASE64URL`] predefined encodings
+//! conform to [RFC4648].
 //!
-//! In general, the encoding and decoding functions satisfy the following
-//! properties:
+//! In general, the encoding and decoding functions satisfy the following properties:
 //!
 //! - They are deterministic: their output only depends on their input
 //! - They have no side-effects: they do not modify a hidden mutable state
 //! - They are correct: encoding then decoding gives the initial data
-//! - They are canonical (unless [`is_canonical`] returns false): decoding then
-//!   encoding gives the initial data
+//! - They are canonical (unless [`is_canonical`] returns false): decoding then encoding gives the
+//!   initial data
 //!
-//! This last property is usually not satisfied by common base64 implementations
-//! (like the `rustc-serialize` crate, the `base64` crate, or the `base64` GNU
-//! program). This is a matter of choice and this crate has made the choice to
-//! let the user choose. Support for canonical encoding as described by the
-//! [RFC][canonical] is provided. But it is also possible to disable checking
-//! trailing bits, to add characters translation, to decode concatenated padded
-//! inputs, and to ignore some characters.
+//! This last property is usually not satisfied by base64 implementations. This is a matter of
+//! choice and this crate has made the choice to let the user choose. Support for canonical encoding
+//! as described by the [RFC][canonical] is provided. But it is also possible to disable checking
+//! trailing bits, to add character translation, to decode concatenated padded inputs, and to
+//! ignore some characters.
 //!
-//! Since the RFC specifies the encoding function on all inputs and the decoding
-//! function on all possible encoded outputs, the differences between
-//! implementations come from the decoding function which may be more or less
-//! permissive. In this crate, the decoding function of canonical encodings
-//! rejects all inputs that are not a possible output of the encoding function.
-//! Here are some concrete examples of decoding differences between this crate,
-//! the `rustc-serialize` crate, the `base64` crate, and the `base64` GNU
-//! program:
+//! Since the RFC specifies the encoding function on all inputs and the decoding function on all
+//! possible encoded outputs, the differences between implementations come from the decoding
+//! function which may be more or less permissive. In this crate, the decoding function of canonical
+//! encodings rejects all inputs that are not a possible output of the encoding function. Here are
+//! some concrete examples of decoding differences between this crate, the `base64` crate, and the
+//! `base64` GNU program:
 //!
-//! | Input      | `data-encoding` | `rustc`  | `base64` | GNU `base64`  |
-//! | ---------- | --------------- | -------- | -------- | ------------- |
-//! | `AAB=`     | `Trailing(2)`   | `[0, 0]` | `Err(2)` | `\x00\x00`    |
-//! | `AA\nB=`   | `Length(4)`     | `[0, 0]` | `Length` | `\x00\x00`    |
-//! | `AAB`      | `Length(0)`     | `[0, 0]` | `Err(2)` | Invalid input |
-//! | `A\rA\nB=` | `Length(4)`     | `[0, 0]` | `Err(1)` | Invalid input |
-//! | `-_\r\n`   | `Symbol(0)`     | `[251]`  | `Err(0)` | Invalid input |
-//! | `AA==AA==` | `[0, 0]`        | `Err`    | `Err(2)` | `\x00\x00`    |
+//! | Input      | `data-encoding` | `base64`  | GNU `base64`  |
+//! | ---------- | --------------- | --------- | ------------- |
+//! | `AAB=`     | `Trailing(2)`   | `Last(2)` | `\x00\x00`    |
+//! | `AA\nB=`   | `Length(4)`     | `Length`  | `\x00\x00`    |
+//! | `AAB`      | `Length(0)`     | `Last(2)` | Invalid input |
+//! | `AAA`      | `Length(0)`     | `[0, 0]`  | Invalid input |
+//! | `A\rA\nB=` | `Length(4)`     | `Byte(1)` | Invalid input |
+//! | `-_\r\n`   | `Symbol(0)`     | `Byte(0)` | Invalid input |
+//! | `AA==AA==` | `[0, 0]`        | `Byte(2)` | `\x00\x00`    |
 //!
 //! We can summarize these discrepancies as follows:
 //!
-//! | Discrepancy | `data-encoding` | `rustc` | `base64` | GNU `base64` |
-//! | ----------- | --------------- | ------- | -------- | ------------ |
-//! | Check trailing bits | Yes | No | No | No |
-//! | Ignored characters | None | `\r` and `\n` | None | `\n` |
-//! | Translated characters | None | `-_` mapped to `+/` | None | None |
-//! | Check padding | Yes | No | No | Yes |
-//! | Support concatenated input | Yes | No | No | Yes |
-//!
-//! This crate permits to disable checking trailing bits. It permits to ignore
-//! some characters. It permits to translate characters. It permits to use
-//! unpadded encodings. However, for padded encodings, support for concatenated
-//! inputs cannot be disabled. This is simply because it doesn't make sense to
-//! use padding if it is not to support concatenated inputs.
-//!
-//! # Migration
-//!
-//! The [changelog] describes the changes between v1 and v2. Here are the
-//! migration steps for common usage:
+//! | Discrepancy                | `data-encoding` | `base64` | GNU `base64` |
+//! | -------------------------- | --------------- | -------- | ------------ |
+//! | Check trailing bits        | Yes             | Yes      | No           |
+//! | Ignored characters         | None            | None     | `\n`         |
+//! | Translated characters      | None            | None     | None         |
+//! | Check padding              | Yes             | No       | Yes          |
+//! | Support concatenated input | Yes             | No       | Yes          |
 //!
-//! | v1                          | v2                          |
-//! | --------------------------- | --------------------------- |
-//! | `use data_encoding::baseNN` | `use data_encoding::BASENN` |
-//! | `baseNN::function`          | `BASENN.method`             |
-//! | `baseNN::function_nopad`    | `BASENN_NOPAD.method`       |
+//! This crate makes it possible to disable checking trailing bits, to ignore some characters, to
+//! translate characters, and to use unpadded encodings. However, for padded encodings, support
+//! for concatenated inputs cannot be disabled, simply because padding is only useful for
+//! supporting concatenated inputs.
 //!
+//! [RFC4648]: https://tools.ietf.org/html/rfc4648
+//! [`BASE32HEX`]: constant.BASE32HEX.html
+//! [`BASE32`]: constant.BASE32.html
+//! [`BASE64URL`]: constant.BASE64URL.html
+//! [`BASE64`]: constant.BASE64.html
 //! [`Encoding`]: struct.Encoding.html
+//! [`HEXUPPER`]: constant.HEXUPPER.html
 //! [`Specification`]: struct.Specification.html
 //! [`is_canonical`]: struct.Encoding.html#method.is_canonical
-//! [`lazy_static`]: https://crates.io/crates/lazy_static
-//! [RFC4648]: https://tools.ietf.org/html/rfc4648
-//! [base16]: constant.HEXUPPER.html
-//! [base32]: constant.BASE32.html
-//! [base32hex]: constant.BASE32HEX.html
-//! [base64]: constant.BASE64.html
-//! [base64url]: constant.BASE64URL.html
 //! [binary]: https://crates.io/crates/data-encoding-bin
+//! [bit-order]: struct.Specification.html#structfield.bit_order
 //! [canonical]: https://tools.ietf.org/html/rfc4648#section-3.5
-//! [changelog]:
-//! https://github.com/ia0/data-encoding/blob/master/lib/CHANGELOG.md
 //! [constants]: index.html#constants
 //! [crate]: https://crates.io/crates/data-encoding
-//! [github]: https://github.com/ia0/data-encoding
+//! [decoding]: struct.Encoding.html#method.decode_mut
+//! [encoding]: struct.Encoding.html#method.encode_mut
+//! [ignoring]: struct.Specification.html#structfield.ignore
 //! [macro]: https://crates.io/crates/data-encoding-macro
+//! [padding]: struct.Specification.html#structfield.padding
+//! [trailing bits]: struct.Specification.html#structfield.check_trailing_bits
+//! [translation]: struct.Specification.html#structfield.translate
+//! [website]: https://data-encoding.rs
+//! [wrapping]: struct.Specification.html#structfield.wrap
 
 #![no_std]
 #![warn(unused_results, missing_docs)]
@@ -221,6 +194,7 @@ define!(N6: usize = 6);
 
 #[derive(Copy, Clone)]
 struct On;
+
 impl Static> for On {
     fn val(self) -> Option {
         None
@@ -229,6 +203,7 @@ impl Static> for On {
 
 #[derive(Copy, Clone)]
 struct Os(T);
+
 impl Static> for Os {
     fn val(self) -> Option {
         Some(self.0)
@@ -268,18 +243,22 @@ unsafe fn chunk_unchecked(x: &[u8], n: usize, i: usize) -> &[u8] {
     let ptr = x.as_ptr().add(n * i);
     core::slice::from_raw_parts(ptr, n)
 }
+
 unsafe fn chunk_mut_unchecked(x: &mut [u8], n: usize, i: usize) -> &mut [u8] {
     debug_assert!((i + 1) * n <= x.len());
     let ptr = x.as_mut_ptr().add(n * i);
     core::slice::from_raw_parts_mut(ptr, n)
 }
+
 unsafe fn as_array(x: &[u8]) -> &[u8; 256] {
     debug_assert_eq!(x.len(), 256);
     &*(x.as_ptr() as *const [u8; 256])
 }
+
 fn div_ceil(x: usize, m: usize) -> usize {
     (x + m - 1) / m
 }
+
 fn floor(x: usize, m: usize) -> usize {
     x / m * m
 }
@@ -300,13 +279,17 @@ fn vectorize(n: usize, bs: usize, mut f: F) {
 pub enum DecodeKind {
     /// Invalid length
     Length,
+
     /// Invalid symbol
     Symbol,
+
     /// Non-zero trailing bits
     Trailing,
+
     /// Invalid padding length
     Padding,
 }
+
 #[cfg(feature = "std")]
 impl std::fmt::Display for DecodeKind {
     fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
@@ -325,14 +308,16 @@ impl std::fmt::Display for DecodeKind {
 pub struct DecodeError {
     /// Error position
     ///
-    /// This position is always a valid input position and represents the first
-    /// encountered error.
+    /// This position is always a valid input position and represents the first encountered error.
     pub position: usize,
+
     /// Error kind
     pub kind: DecodeKind,
 }
+
 #[cfg(feature = "std")]
 impl std::error::Error for DecodeError {}
+
 #[cfg(feature = "std")]
 impl std::fmt::Display for DecodeError {
     fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
@@ -345,14 +330,12 @@ impl std::fmt::Display for DecodeError {
 pub struct DecodePartial {
     /// Number of bytes read from input
     ///
-    /// This number does not exceed the error position: `read <=
-    /// error.position`.
+    /// This number does not exceed the error position: `read <= error.position`.
     pub read: usize,
 
     /// Number of bytes written to output
     ///
-    /// This number does not exceed the decoded length: `written <=
-    /// decode_len(read)`.
+    /// This number does not exceed the decoded length: `written <= decode_len(read)`.
pub written: usize, /// Decoding error @@ -370,6 +353,7 @@ fn order(msb: bool, n: usize, i: usize) -> usize { i } } + fn enc(bit: usize) -> usize { debug_assert!(1 <= bit && bit <= 6); match bit { @@ -379,6 +363,7 @@ fn enc(bit: usize) -> usize { _ => unreachable!(), } } + fn dec(bit: usize) -> usize { enc(bit) * 8 / bit } @@ -386,6 +371,7 @@ fn dec(bit: usize) -> usize { fn encode_len>(bit: B, len: usize) -> usize { div_ceil(8 * len, bit.val()) } + fn encode_block, M: Static>( bit: B, msb: M, symbols: &[u8; 256], input: &[u8], output: &mut [u8], ) { @@ -402,6 +388,7 @@ fn encode_block, M: Static>( *output = symbols[y as usize % 256]; } } + fn encode_mut, M: Static>( bit: B, msb: M, symbols: &[u8; 256], input: &[u8], output: &mut [u8], ) { @@ -442,6 +429,7 @@ fn decode_block, M: Static>( } Ok(()) } + // Fails if an input character does not translate to a symbol. The error `pos` // is the lowest index of such character. The output is valid up to `pos / dec * // enc` excluded. @@ -460,6 +448,7 @@ fn decode_mut, M: Static>( decode_block(bit, msb, values, &input[dec * n ..], &mut output[enc * n ..]) .map_err(|e| dec * n + e) } + // Fails if there are non-zero trailing bits. fn check_trail, M: Static>( bit: B, msb: M, ctb: bool, values: &[u8; 256], input: &[u8], @@ -478,6 +467,7 @@ fn check_trail, M: Static>( check!((), values[input[input.len() - 1] as usize] & mask == 0); Ok(()) } + // Fails if the padding length is invalid. The error is the index of the first // padding character. 
fn check_pad>(bit: B, values: &[u8; 256], input: &[u8]) -> Result { @@ -493,6 +483,7 @@ fn check_pad>(bit: B, values: &[u8; 256], input: &[u8]) -> Resu fn encode_base_len>(bit: B, len: usize) -> usize { encode_len(bit, len) } + fn encode_base, M: Static>( bit: B, msb: M, symbols: &[u8; 256], input: &[u8], output: &mut [u8], ) { @@ -506,6 +497,7 @@ fn encode_pad_len, P: Static>>(bit: B, pad: P, len: Some(_) => div_ceil(len, enc(bit.val())) * dec(bit.val()), } } + fn encode_pad, M: Static, P: Static>>( bit: B, msb: M, symbols: &[u8; 256], spad: P, input: &[u8], output: &mut [u8], ) { @@ -535,6 +527,7 @@ fn encode_wrap_len< Some((col, end)) => olen + end.len() * div_ceil(olen, col), } } + fn encode_wrap_mut< 'a, B: Static, @@ -596,6 +589,7 @@ fn decode_pad_len, P: Static>( fn decode_base_len>(bit: B, len: usize) -> Result { decode_pad_len(bit, Bf, len) } + // Fails with Symbol if an input character does not translate to a symbol. The // error is the lowest index of such character. // Fails with Trailing if there are non-zero trailing bits. @@ -695,6 +689,7 @@ fn skip_ignore(values: &[u8; 256], input: &[u8], mut inpos: usize) -> usize { } inpos } + // Returns next input and output position. // Fails with Symbol if an input character does not translate to a symbol. The // error is the lowest index of such character. @@ -731,6 +726,7 @@ fn decode_wrap_block, M: Static, P: Static>( })?; Ok((inpos, written)) } + // Fails with Symbol if an input character does not translate to a symbol. The // error is the lowest index of such character. // Fails with Padding if some padding length is invalid. The error is the index @@ -792,41 +788,42 @@ fn decode_wrap_mut, M: Static, P: Static, I: Static /// Order in which bits are read from a byte /// -/// The base-conversion encoding is always little-endian. This means that the -/// least significant *byte* is always first. 
However, we can still choose -/// whether, within a byte, this is the most significant or the least -/// significant *bit* that is first. If the terminology is confusing, testing on -/// an asymmetrical example should be enough to choose the correct value. +/// The base-conversion encoding is always little-endian. This means that the least significant +/// **byte** is always first. However, we can still choose whether, within a byte, this is the most +/// significant or the least significant **bit** that is first. If the terminology is confusing, +/// testing on an asymmetrical example should be enough to choose the correct value. /// /// # Examples /// -/// In the following example, we can see that a base with the -/// `MostSignificantFirst` bit-order has the most significant bit first in the -/// encoded output. In particular, the output is in the same order as the bits -/// in the byte. The opposite happens with the `LeastSignificantFirst` -/// bit-order. The least significant bit is first and the output is in the -/// reverse order. +/// In the following example, we can see that a base with the `MostSignificantFirst` bit-order has +/// the most significant bit first in the encoded output. In particular, the output is in the same +/// order as the bits in the byte. The opposite happens with the `LeastSignificantFirst` bit-order. +/// The least significant bit is first and the output is in the reverse order. 
/// /// ```rust /// use data_encoding::{BitOrder, Specification}; /// let mut spec = Specification::new(); /// spec.symbols.push_str("01"); -/// // spec.bit_order = BitOrder::MostSignificantFirst; // default +/// spec.bit_order = BitOrder::MostSignificantFirst; // default /// let msb = spec.encoding().unwrap(); /// spec.bit_order = BitOrder::LeastSignificantFirst; /// let lsb = spec.encoding().unwrap(); /// assert_eq!(msb.encode(&[0b01010011]), "01010011"); /// assert_eq!(lsb.encode(&[0b01010011]), "11001010"); /// ``` +/// +/// # Features +/// +/// Requires the `alloc` feature. #[derive(Debug, Copy, Clone, PartialEq, Eq)] #[cfg(feature = "alloc")] pub enum BitOrder { /// Most significant bit first /// - /// This is the most common and most intuitive bit-order. In particular, - /// this is the bit-order used by [RFC4648] and thus the usual hexadecimal, - /// base64, base32, base64url, and base32hex encodings. This is the default - /// bit-order when [specifying](struct.Specification.html) a base. + /// This is the most common and most intuitive bit-order. In particular, this is the bit-order + /// used by [RFC4648] and thus the usual hexadecimal, base64, base32, base64url, and base32hex + /// encodings. This is the default bit-order when [specifying](struct.Specification.html) a + /// base. /// /// [RFC4648]: https://tools.ietf.org/html/rfc4648 MostSignificantFirst, @@ -859,8 +856,7 @@ pub type InternalEncoding = &'static [u8]; /// Base-conversion encoding /// -/// See [Specification](struct.Specification.html) for technical details or how -/// to define a new one. +/// See [Specification](struct.Specification.html) for technical details or how to define a new one. // Required fields: // 0 - 256 (256) symbols // 256 - 512 (256) values @@ -885,15 +881,20 @@ pub struct Encoding(pub InternalEncoding); /// How to translate characters when decoding /// -/// The order matters. 
The first character of the `from` field is translated to -/// the first character of the `to` field. The second to the second. Etc. +/// The order matters. The first character of the `from` field is translated to the first character +/// of the `to` field. The second to the second. Etc. /// /// See [Specification](struct.Specification.html) for more information. +/// +/// # Features +/// +/// Requires the `alloc` feature. #[derive(Debug, Clone)] #[cfg(feature = "alloc")] pub struct Translate { /// Characters to translate from pub from: String, + /// Characters to translate to pub to: String, } @@ -901,6 +902,10 @@ pub struct Translate { /// How to wrap the output when encoding /// /// See [Specification](struct.Specification.html) for more information. +/// +/// # Features +/// +/// Requires the `alloc` feature. #[derive(Debug, Clone)] #[cfg(feature = "alloc")] pub struct Wrap { @@ -923,43 +928,38 @@ pub struct Wrap { /// Base-conversion specification /// -/// It is possible to define custom encodings given a specification. To do so, -/// it is important to understand the theory first. +/// It is possible to define custom encodings given a specification. To do so, it is important to +/// understand the theory first. /// /// # Theory /// -/// Each subsection has an equivalent subsection in the [Practice](#practice) -/// section. +/// Each subsection has an equivalent subsection in the [Practice](#practice) section. /// /// ## Basics /// -/// The main idea of a [base-conversion] encoding is to see `[u8]` as numbers -/// written in little-endian base256 and convert them in another little-endian -/// base. For performance reasons, this crate restricts this other base to be of -/// size 2 (binary), 4 (base4), 8 (octal), 16 (hexadecimal), 32 (base32), or 64 -/// (base64). The converted number is written as `[u8]` although it doesn't use -/// all the 256 possible values of `u8`. This crate encodes to ASCII, so only -/// values smaller than 128 are allowed. 
+/// The main idea of a [base-conversion] encoding is to see `[u8]` as numbers written in +/// little-endian base256 and convert them into another little-endian base. For performance reasons, +/// this crate restricts this other base to be of size 2 (binary), 4 (base4), 8 (octal), 16 +/// (hexadecimal), 32 (base32), or 64 (base64). The converted number is written as `[u8]` although +/// it doesn't use all the 256 possible values of `u8`. This crate encodes to ASCII, so only values +/// smaller than 128 are allowed. /// /// More precisely, we need the following elements: /// -/// - The bit-width N: 1 for binary, 2 for base4, 3 for octal, 4 for -/// hexadecimal, 5 for base32, and 6 for base64 +/// - The bit-width N: 1 for binary, 2 for base4, 3 for octal, 4 for hexadecimal, 5 for base32, and +/// 6 for base64 /// - The [bit-order](enum.BitOrder.html): most or least significant bit first -/// - The symbols function S from [0, 2<sup>N</sup>) (called values and written -/// `uN`) to symbols (represented as `u8` although only ASCII symbols are -/// allowed, i.e. smaller than 128) -/// - The values partial function V from ASCII to [0, 2<sup>N</sup>), i.e. from -/// `u8` to `uN` -/// - Whether trailing bits are checked: trailing bits are leading zeros in -/// theory, but since numbers are little-endian they come last -/// -/// For the encoding to be correct (i.e. encoding then decoding gives back the -/// initial input), V(S(i)) must be defined and equal to i for all i in [0, -/// 2<sup>N</sup>). For the encoding to be [canonical][canonical] (i.e. -/// different inputs decode to different outputs, or equivalently, decoding then -/// encoding gives back the initial input), trailing bits must be checked and if -/// V(i) is defined then S(V(i)) is equal to i for all i. +/// - The symbols function S from [0, 2<sup>N</sup>) (called values and written `uN`) to symbols +/// (represented as `u8` although only ASCII symbols are allowed, i.e. smaller than 128) +/// - The values partial function V from ASCII to [0, 2<sup>N</sup>), i.e.
from `u8` to `uN` +/// - Whether trailing bits are checked: trailing bits are leading zeros in theory, but since +/// numbers are little-endian they come last +/// +/// For the encoding to be correct (i.e. encoding then decoding gives back the initial input), +/// V(S(i)) must be defined and equal to i for all i in [0, 2<sup>N</sup>). For the encoding to be +/// [canonical][canonical] (i.e. different inputs decode to different outputs, or equivalently, +/// decoding then encoding gives back the initial input), trailing bits must be checked and if V(i) +/// is defined then S(V(i)) is equal to i for all i. /// /// Encoding and decoding are given by the following pipeline: /// @@ -981,46 +981,43 @@ pub struct Wrap { /// /// - the bit-width is 3 (octal), 5 (base32), or 6 (base64) /// - the length of the data to encode is not known in advance +/// - the data must be sent without buffering /// -/// Bases for which the bit-width N does not divide 8 may not concatenate -/// encoded data. This comes from the fact that it is not possible to make the -/// difference between trailing bits and encoding bits. Padding solves this -/// issue by adding a new character (which is not a symbol) to discriminate -/// between trailing bits and encoding bits. The idea is to work by blocks of -/// lcm(8, N) bits, where lcm(8, N) is the least common multiple of 8 and N. -/// When such block is not complete, it is padded. +/// Bases for which the bit-width N does not divide 8 may not concatenate encoded data. This comes +/// from the fact that it is not possible to distinguish trailing bits from encoding bits. Padding +/// solves this issue by adding a new character to discriminate between trailing bits and encoding +/// bits. The idea is to work by blocks of lcm(8, N) bits, where lcm(8, N) is the least common +/// multiple of 8 and N. When such a block is not complete, it is padded. /// /// To preserve correctness, the padding character must not be a symbol.
/// /// ### Ignore characters when decoding /// -/// Ignoring characters when decoding is useful if after encoding some -/// characters are added for convenience or any other reason (like wrapping). In -/// that case we want to first ignore thoses characters before decoding. +/// Ignoring characters when decoding is useful if after encoding some characters are added for +/// convenience or any other reason (like wrapping). In that case we want to first ignore those +/// characters before decoding. /// -/// To preserve correctness, ignored characters must not contain symbols or the -/// padding character. +/// To preserve correctness, ignored characters must not contain symbols or the padding character. /// /// ### Wrap output when encoding /// -/// Wrapping output when encoding is useful if the output is meant to be printed -/// in a document where width is limited (typically 80-columns documents). In -/// that case, the wrapping width and the wrapping separator have to be defined. +/// Wrapping output when encoding is useful if the output is meant to be printed in a document where +/// width is limited (typically 80-column documents). In that case, the wrapping width and the +/// wrapping separator have to be defined. /// -/// To preserve correctness, the wrapping separator characters must be ignored -/// (see previous subsection). As such, wrapping separator characters must also -/// not contain symbols or the padding character. +/// To preserve correctness, the wrapping separator characters must be ignored (see previous +/// subsection). As such, wrapping separator characters must also not contain symbols or the padding +/// character. /// /// ### Translate characters when decoding /// -/// Translating characters when decoding is useful when encoded data may be -/// copied by a humain instead of a machine. Humans tend to confuse some -/// characters for others. 
In that case we want to translate those characters -/// before decoding.
+/// Translating characters when decoding is useful when encoded data may be copied by a human +/// instead of a machine. Humans tend to confuse some characters for others. In that case we want to +/// translate those characters before decoding. /// -/// To preserve correctness, the characters we translate from must not contain -/// symbols or the padding character, and the characters we translate to must -/// only contain symbols or the padding character. +/// To preserve correctness, the characters we translate _from_ must not contain symbols or the +/// padding character, and the characters we translate _to_ must only contain symbols or the padding +/// character. /// /// # Practice /// @@ -1041,12 +1038,11 @@ pub struct Wrap { /// assert_eq!(hexadecimal.encode(b"Bit"), "426974"); /// ``` /// -/// The `binary` base has 2 symbols `0` and `1` with value 0 and 1 respectively. -/// The `octal` base has 8 symbols `0` to `7` with value 0 to 7. The -/// `hexadecimal` base has 16 symbols `0` to `9` and `a` to `f` with value 0 to -/// 15. The following diagram gives the idea of how encoding works in the -/// previous example (note that we can actually write such diagram only because -/// the bit-order is most significant first): +/// The `binary` base has 2 symbols `0` and `1` with values 0 and 1 respectively. The `octal` base +/// has 8 symbols `0` to `7` with values 0 to 7. The `hexadecimal` base has 16 symbols `0` to `9` and +/// `a` to `f` with values 0 to 15. The following diagram gives the idea of how encoding works in the +/// previous example (note that we can actually write such a diagram only because the bit-order is +/// most significant first): /// /// ```text /// [ octal] | 2 : 0 : 4 : 6 : 4 : 5 : 6 : 4 | @@ -1055,13 +1051,12 @@ pub struct Wrap { /// ^-- LSB ^-- MSB /// ``` /// -/// Note that in theory, these little-endian numbers are read from right to left -/// (the most significant bit is at the right).
Since leading zeros are -/// meaningless (in our usual decimal notation 0123 is the same as 123), it -/// explains why trailing bits must be zero. Trailing bits may occur when the -/// bit-width of a base does not divide 8. Only binary, base4, and hexadecimal -/// don't have trailing bits issues. So let's consider octal and base64, which -/// have trailing bits in similar circumstances: +/// Note that in theory, these little-endian numbers are read from right to left (the most +/// significant bit is at the right). Since leading zeros are meaningless (in our usual decimal +/// notation 0123 is the same as 123), it explains why trailing bits must be zero. Trailing bits may +/// occur when the bit-width of a base does not divide 8. Only binary, base4, and hexadecimal don't +/// have trailing bits issues. So let's consider octal and base64, which have trailing bits in +/// similar circumstances: /// /// ```rust /// use data_encoding::{Specification, BASE64_NOPAD}; @@ -1074,8 +1069,7 @@ pub struct Wrap { /// assert_eq!(octal.encode(b"B"), "204"); /// ``` /// -/// We have the following diagram, where the base64 values are written between -/// parentheses: +/// We have the following diagram, where the base64 values are written between parentheses: /// /// ```text /// [base64] | Q(16) : g(32) : [has 4 zero trailing bits] @@ -1089,8 +1083,8 @@ pub struct Wrap { /// /// ### Padding /// -/// For octal and base64, lcm(8, 3) == lcm(8, 6) == 24 bits or 3 bytes. For -/// base32, lcm(8, 5) is 40 bits or 5 bytes. Let's consider octal and base64: +/// For octal and base64, lcm(8, 3) == lcm(8, 6) == 24 bits or 3 bytes. For base32, lcm(8, 5) is 40 +/// bits or 5 bytes. Let's consider octal and base64: /// /// ```rust /// use data_encoding::{Specification, BASE64}; @@ -1128,8 +1122,8 @@ pub struct Wrap { /// /// ### Ignore characters when decoding /// -/// The typical use-case is to ignore newlines (`\r` and `\n`). But to keep the -/// example small, we will ignore spaces. 
+/// The typical use-case is to ignore newlines (`\r` and `\n`). But to keep the example small, we +/// will ignore spaces. /// /// ```rust /// let mut spec = data_encoding::HEXLOWER.specification(); @@ -1140,9 +1134,8 @@ pub struct Wrap { /// /// ### Wrap output when encoding /// -/// The typical use-case is to wrap after 64 or 76 characters with a newline -/// (`\r\n` or `\n`). But to keep the example small, we will wrap after 8 -/// characters with a space. +/// The typical use-case is to wrap after 64 or 76 characters with a newline (`\r\n` or `\n`). But +/// to keep the example small, we will wrap after 8 characters with a space. /// /// ```rust /// let mut spec = data_encoding::BASE64.specification(); @@ -1156,9 +1149,8 @@ pub struct Wrap { /// /// ### Translate characters when decoding /// -/// The typical use-case is to translate lowercase to uppercase or reciprocally, -/// but it is also used for letters that look alike, like `O0` or `Il1`. Let's -/// illustrate both examples. +/// The typical use-case is to translate lowercase to uppercase or reciprocally, but it is also used +/// for letters that look alike, like `O0` or `Il1`. Let's illustrate both examples. /// /// ```rust /// let mut spec = data_encoding::HEXLOWER.specification(); @@ -1168,56 +1160,56 @@ pub struct Wrap { /// assert_eq!(base.decode(b"BOIl"), base.decode(b"b011")); /// ``` /// -/// [base-conversion]: -/// https://en.wikipedia.org/wiki/Positional_notation#Base_conversion +/// # Features +/// +/// Requires the `alloc` feature. +/// +/// [base-conversion]: https://en.wikipedia.org/wiki/Positional_notation#Base_conversion /// [canonical]: https://tools.ietf.org/html/rfc4648#section-3.5 #[derive(Debug, Clone)] #[cfg(feature = "alloc")] pub struct Specification { /// Symbols /// - /// The number of symbols must be 2, 4, 8, 16, 32, or 64. Symbols must be - /// ASCII characters (smaller than 128) and they must be unique. + /// The number of symbols must be 2, 4, 8, 16, 32, or 64. 
Symbols must be ASCII characters + /// (smaller than 128) and they must be unique. pub symbols: String, /// Bit-order /// - /// The default is to use most significant bit first since it is the most - /// common. + /// The default is to use most significant bit first since it is the most common. pub bit_order: BitOrder, /// Check trailing bits /// - /// The default is to check trailing bits. This field is ignored when - /// unnecessary (i.e. for base2, base4, and base16). + /// The default is to check trailing bits. This field is ignored when unnecessary (i.e. for + /// base2, base4, and base16). pub check_trailing_bits: bool, /// Padding /// - /// The default is to not use padding. The padding character must be ASCII - /// and must not be a symbol. + /// The default is to not use padding. The padding character must be ASCII and must not be a + /// symbol. pub padding: Option, /// Characters to ignore when decoding /// - /// The default is to not ignore characters when decoding. The characters to - /// ignore must be ASCII and must not be symbols or the padding character. + /// The default is to not ignore characters when decoding. The characters to ignore must be + /// ASCII and must not be symbols or the padding character. pub ignore: String, /// How to wrap the output when encoding /// - /// The default is to not wrap the output when encoding. The wrapping - /// characters must be ASCII and must not be symbols or the padding - /// character. + /// The default is to not wrap the output when encoding. The wrapping characters must be ASCII + /// and must not be symbols or the padding character. pub wrap: Wrap, /// How to translate characters when decoding /// - /// The default is to not translate characters when decoding. The characters - /// to translate from must be ASCII and must not have already been assigned - /// a semantics. 
The characters to translate to must be ASCII and must have - /// been assigned a semantics (symbol, padding character, or ignored - /// character). + /// The default is to not translate characters when decoding. The characters to translate from + /// must be ASCII and must not have already been assigned a semantics. The characters to + /// translate to must be ASCII and must have been assigned a semantics (symbol, padding + /// character, or ignored character). pub translate: Translate, } @@ -1232,9 +1224,11 @@ impl Encoding { fn sym(&self) -> &[u8; 256] { unsafe { as_array(&self.0[0 .. 256]) } } + fn val(&self) -> &[u8; 256] { unsafe { as_array(&self.0[256 .. 512]) } } + fn pad(&self) -> Option { if self.0[512] < 128 { Some(self.0[512]) @@ -1242,21 +1236,26 @@ impl Encoding { None } } + fn ctb(&self) -> bool { self.0[513] & 0x10 != 0 } + fn msb(&self) -> bool { self.0[513] & 0x8 != 0 } + fn bit(&self) -> usize { (self.0[513] & 0x7) as usize } + fn wrap(&self) -> Option<(usize, &[u8])> { if self.0.len() <= 515 { return None; } Some((self.0[514] as usize, &self.0[515 ..])) } + fn has_ignore(&self) -> bool { self.0.len() >= 515 } @@ -1279,8 +1278,8 @@ impl Encoding { /// /// # Panics /// - /// Panics if the `output` length does not match the result of - /// [`encode_len`] for the `input` length. + /// Panics if the `output` length does not match the result of [`encode_len`] for the `input` + /// length. /// /// # Examples /// @@ -1318,6 +1317,10 @@ impl Encoding { /// BASE64.encode_append(input, &mut output); /// assert_eq!(output, "Result: SGVsbG8gd29ybGQ="); /// ``` + /// + /// # Features + /// + /// Requires the `alloc` feature. 
#[cfg(feature = "alloc")] pub fn encode_append(&self, input: &[u8], output: &mut String) { let output = unsafe { output.as_mut_vec() }; @@ -1334,6 +1337,10 @@ impl Encoding { /// use data_encoding::BASE64; /// assert_eq!(BASE64.encode(b"Hello world"), "SGVsbG8gd29ybGQ="); /// ``` + /// + /// # Features + /// + /// Requires the `alloc` feature. #[cfg(feature = "alloc")] pub fn encode(&self, input: &[u8]) -> String { let mut output = vec![0u8; self.encode_len(input.len())]; @@ -1347,8 +1354,8 @@ impl Encoding { /// /// # Errors /// - /// Returns an error if `len` is invalid. The error kind is [`Length`] and - /// the [position] is the greatest valid input length. + /// Returns an error if `len` is invalid. The error kind is [`Length`] and the [position] is the + /// greatest valid input length. /// /// [`decode_mut`]: struct.Encoding.html#method.decode_mut /// [`Length`]: enum.DecodeKind.html#variant.Length @@ -1368,26 +1375,24 @@ impl Encoding { /// Decodes `input` in `output` /// - /// Returns the length of the decoded output. This length may be smaller - /// than the output length if the input contained padding or ignored - /// characters. The output bytes after the returned length are not - /// initialized and should not be read. + /// Returns the length of the decoded output. This length may be smaller than the output length + /// if the input contained padding or ignored characters. The output bytes after the returned + /// length are not initialized and should not be read. /// /// # Panics /// - /// Panics if the `output` length does not match the result of - /// [`decode_len`] for the `input` length. Also panics if `decode_len` fails - /// for the `input` length. + /// Panics if the `output` length does not match the result of [`decode_len`] for the `input` + /// length. Also panics if `decode_len` fails for the `input` length. /// /// # Errors /// - /// Returns an error if `input` is invalid. See [`decode`] for more details. 
- /// The are two differences though: + /// Returns an error if `input` is invalid. See [`decode`] for more details. There are two + /// differences though: /// - /// - [`Length`] may be returned only if the encoding allows ignored - /// characters, because otherwise this is already checked by [`decode_len`]. - /// - The [`read`] first bytes of the input have been successfully decoded - /// to the [`written`] first bytes of the output. + /// - [`Length`] may be returned only if the encoding allows ignored characters, because + /// otherwise this is already checked by [`decode_len`]. + /// - The first [`read`] bytes of the input have been successfully decoded to the first + /// [`written`] bytes of the output. /// /// # Examples /// @@ -1424,16 +1429,16 @@ impl Encoding { /// /// Returns an error if `input` is invalid. The error kind can be: /// - /// - [`Length`] if the input length is invalid. The [position] is the - /// greatest valid input length. - /// - [`Symbol`] if the input contains an invalid character. The [position] - /// is the first invalid character. - /// - [`Trailing`] if the input has non-zero trailing bits. This is only - /// possible if the encoding checks trailing bits. The [position] is the - /// first character containing non-zero trailing bits. - /// - [`Padding`] if the input has an invalid padding length. This is only - /// possible if the encoding uses padding. The [position] is the first - /// padding character of the first padding of invalid length. + /// - [`Length`] if the input length is invalid. The [position] is the greatest valid input + /// length. + /// - [`Symbol`] if the input contains an invalid character. The [position] is the first invalid + /// character. + /// - [`Trailing`] if the input has non-zero trailing bits. This is only possible if the + /// encoding checks trailing bits. The [position] is the first character containing non-zero + /// trailing bits. + /// - [`Padding`] if the input has an invalid padding length. This is only possible if the
This is only possible if the + /// encoding uses padding. The [position] is the first padding character of the first padding + /// of invalid length. /// /// # Examples /// @@ -1442,6 +1447,10 @@ impl Encoding { /// assert_eq!(BASE64.decode(b"SGVsbA==byB3b3JsZA==").unwrap(), b"Hello world"); /// ``` /// + /// # Features + /// + /// Requires the `alloc` feature. + /// /// [`Length`]: enum.DecodeKind.html#variant.Length /// [`Symbol`]: enum.DecodeKind.html#variant.Symbol /// [`Trailing`]: enum.DecodeKind.html#variant.Trailing @@ -1490,6 +1499,10 @@ impl Encoding { } /// Returns the encoding specification + /// + /// # Features + /// + /// Requires the `alloc` feature. #[cfg(feature = "alloc")] pub fn specification(&self) -> Specification { let mut specification = Specification::new(); @@ -1560,6 +1573,10 @@ enum SpecificationErrorImpl { use crate::SpecificationErrorImpl::*; /// Specification error +/// +/// # Features +/// +/// Requires the `alloc` feature. #[derive(Debug, Copy, Clone)] #[cfg(feature = "alloc")] pub struct SpecificationError(SpecificationErrorImpl);