doc: use HTTPS in links
PR #726
atouchet committed Jan 12, 2021
1 parent 2bab987 commit 259863d
Showing 6 changed files with 26 additions and 26 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -245,12 +245,12 @@ supported version of Rust.
This project is licensed under either of

* Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
http://www.apache.org/licenses/LICENSE-2.0)
https://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or
http://opensource.org/licenses/MIT)
https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](http://www.unicode.org/copyright.html#License)).
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).
34 changes: 17 additions & 17 deletions UNICODE.md
@@ -1,7 +1,7 @@
# Unicode conformance

This document describes the regex crate's conformance to Unicode's
[UTS#18](http://unicode.org/reports/tr18/)
[UTS#18](https://unicode.org/reports/tr18/)
report, which lays out 3 levels of support: Basic, Extended and Tailored.

Full support for Level 1 ("Basic Unicode Support") is provided with two
@@ -10,7 +10,7 @@ exceptions:
1. Line boundaries are not Unicode aware. Namely, only the `\n`
(`END OF LINE`) character is recognized as a line boundary.
2. The compatibility properties specified by
[RL1.2a](http://unicode.org/reports/tr18/#RL1.2a)
[RL1.2a](https://unicode.org/reports/tr18/#RL1.2a)
are ASCII-only definitions.

Little to no support is provided for either Level 2 or Level 3. For the most
@@ -61,18 +61,18 @@ provide a convenient way to construct character classes of groups of code
points specified by Unicode. The regex crate does not provide exhaustive
support, but covers a useful subset. In particular:

* [General categories](http://unicode.org/reports/tr18/#General_Category_Property)
* [Scripts and Script Extensions](http://unicode.org/reports/tr18/#Script_Property)
* [Age](http://unicode.org/reports/tr18/#Age)
* [General categories](https://unicode.org/reports/tr18/#General_Category_Property)
* [Scripts and Script Extensions](https://unicode.org/reports/tr18/#Script_Property)
* [Age](https://unicode.org/reports/tr18/#Age)
* A smattering of boolean properties, including all of those specified by
[RL1.2](http://unicode.org/reports/tr18/#RL1.2) explicitly.
[RL1.2](https://unicode.org/reports/tr18/#RL1.2) explicitly.

In all cases, property name and value abbreviations are supported, and all
names/values are matched loosely without regard for case, whitespace or
underscores. Property name aliases can be found in Unicode's
[`PropertyAliases.txt`](http://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt)
[`PropertyAliases.txt`](https://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt)
file, while property value aliases can be found in Unicode's
[`PropertyValueAliases.txt`](http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt)
[`PropertyValueAliases.txt`](https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt)
file.

The syntax supported is also consistent with the UTS#18 recommendation:
@@ -149,10 +149,10 @@ properties correspond to properties required by RL1.2):

## RL1.2a Compatibility Properties

[UTS#18 RL1.2a](http://unicode.org/reports/tr18/#RL1.2a)
[UTS#18 RL1.2a](https://unicode.org/reports/tr18/#RL1.2a)

The regex crate only provides ASCII definitions of the
[compatibility properties documented in UTS#18 Annex C](http://unicode.org/reports/tr18/#Compatibility_Properties)
[compatibility properties documented in UTS#18 Annex C](https://unicode.org/reports/tr18/#Compatibility_Properties)
(sans the `\X` class, for matching grapheme clusters, which isn't provided
at all). This is because it seems to be consistent with most other regular
expression engines, and in particular, because these are often referred to as
@@ -165,7 +165,7 @@ Their traditional ASCII definition can be used by disabling Unicode. That is,

## RL1.3 Subtraction and Intersection

[UTS#18 RL1.3](http://unicode.org/reports/tr18/#Subtraction_and_Intersection)
[UTS#18 RL1.3](https://unicode.org/reports/tr18/#Subtraction_and_Intersection)

The regex crate provides full support for nested character classes, along with
union, intersection (`&&`), difference (`--`) and symmetric difference (`~~`)
@@ -178,7 +178,7 @@ For example, to match all non-ASCII letters, you could use either

## RL1.4 Simple Word Boundaries

[UTS#18 RL1.4](http://unicode.org/reports/tr18/#Simple_Word_Boundaries)
[UTS#18 RL1.4](https://unicode.org/reports/tr18/#Simple_Word_Boundaries)

The regex crate provides basic Unicode aware word boundary assertions. A word
boundary assertion can be written as `\b`, or `\B` as its negation. A word
@@ -196,9 +196,9 @@ the following classes:
* `\p{gc:Connector_Punctuation}`

In particular, this differs slightly from the
[prescription given in RL1.4](http://unicode.org/reports/tr18/#Simple_Word_Boundaries)
[prescription given in RL1.4](https://unicode.org/reports/tr18/#Simple_Word_Boundaries)
but is permissible according to
[UTS#18 Annex C](http://unicode.org/reports/tr18/#Compatibility_Properties).
[UTS#18 Annex C](https://unicode.org/reports/tr18/#Compatibility_Properties).
Namely, it is convenient and simpler to have `\w` and `\b` be in sync with
one another.

@@ -211,7 +211,7 @@ boundaries is currently sub-optimal on non-ASCII text.

## RL1.5 Simple Loose Matches

[UTS#18 RL1.5](http://unicode.org/reports/tr18/#Simple_Loose_Matches)
[UTS#18 RL1.5](https://unicode.org/reports/tr18/#Simple_Loose_Matches)

The regex crate provides full support for case insensitive matching in
accordance with RL1.5. That is, it uses the "simple" case folding mapping. The
@@ -226,7 +226,7 @@ then all characters classes are case folded as well.

## RL1.6 Line Boundaries

[UTS#18 RL1.6](http://unicode.org/reports/tr18/#Line_Boundaries)
[UTS#18 RL1.6](https://unicode.org/reports/tr18/#Line_Boundaries)

The regex crate only provides support for recognizing the `\n` (`END OF LINE`)
character as a line boundary. This choice was made mostly for implementation
@@ -239,7 +239,7 @@ well, and in theory, this could be done efficiently.

## RL1.7 Code Points

[UTS#18 RL1.7](http://unicode.org/reports/tr18/#Supplementary_Characters)
[UTS#18 RL1.7](https://unicode.org/reports/tr18/#Supplementary_Characters)

The regex crate provides full support for Unicode code point matching. Namely,
the fundamental atom of any match is always a single code point.
4 changes: 2 additions & 2 deletions regex-syntax/src/lib.rs
@@ -216,7 +216,7 @@ pub fn is_meta_character(c: char) -> bool {
/// character.
///
/// A Unicode word character is defined by
/// [UTS#18 Annex C](http://unicode.org/reports/tr18/#Compatibility_Properties).
/// [UTS#18 Annex C](https://unicode.org/reports/tr18/#Compatibility_Properties).
/// In particular, a character
/// is considered a word character if it is in either of the `Alphabetic` or
/// `Join_Control` properties, or is in one of the `Decimal_Number`, `Mark`
@@ -236,7 +236,7 @@ pub fn is_word_character(c: char) -> bool {
/// character.
///
/// A Unicode word character is defined by
/// [UTS#18 Annex C](http://unicode.org/reports/tr18/#Compatibility_Properties).
/// [UTS#18 Annex C](https://unicode.org/reports/tr18/#Compatibility_Properties).
/// In particular, a character
/// is considered a word character if it is in either of the `Alphabetic` or
/// `Join_Control` properties, or is in one of the `Decimal_Number`, `Mark`
2 changes: 1 addition & 1 deletion regex-syntax/src/unicode.rs
@@ -823,7 +823,7 @@ fn symbolic_name_normalize(x: &str) -> String {
/// The slice returned is guaranteed to be valid UTF-8 for all possible values
/// of `slice`.
///
/// See: http://unicode.org/reports/tr44/#UAX44-LM3
/// See: https://unicode.org/reports/tr44/#UAX44-LM3
fn symbolic_name_normalize_bytes(slice: &mut [u8]) -> &mut [u8] {
// I couldn't find a place in the standard that specified that property
// names/aliases had a particular structure (unlike character names), but
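
Reviewer note on the loose matching touched by this hunk: UNICODE.md says that property names and values are matched "without regard for case, whitespace or underscores", and the comment above links the rule as UAX44-LM3. The sketch below illustrates only that stated behavior. `loose_normalize` and `loosely_equal` are hypothetical names, not the crate's `symbolic_name_normalize`, and the sketch assumes ASCII property names and leaves out the rule's remaining details (for example its treatment of hyphens and of an initial `is` prefix).

```rust
/// Hypothetical helper, not the crate's implementation: lowercases ASCII
/// letters and drops whitespace and underscores, so that differently
/// spelled property names compare equal.
pub fn loose_normalize(name: &str) -> String {
    name.chars()
        // Whitespace and underscores carry no meaning under loose matching.
        .filter(|c| !c.is_whitespace() && *c != '_')
        // Case is ignored; property names are assumed to be ASCII here.
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Returns true if two names are equal after loose normalization.
pub fn loosely_equal(a: &str, b: &str) -> bool {
    loose_normalize(a) == loose_normalize(b)
}
```

Under these assumptions, `loosely_equal("Script Extensions", "script_extensions")` holds, while `loosely_equal("Greek", "Latin")` does not.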
4 changes: 2 additions & 2 deletions src/lib.rs
@@ -253,7 +253,7 @@ assert_eq!((mat.start(), mat.end()), (3, 23));
```
For a more detailed breakdown of Unicode support with respect to
[UTS#18](http://unicode.org/reports/tr18/),
[UTS#18](https://unicode.org/reports/tr18/),
please see the
[UNICODE](https://github.com/rust-lang/regex/blob/master/UNICODE.md)
document in the root of the regex repository.
@@ -455,7 +455,7 @@ assert_eq!(&cap[0], "abc");
## Perl character classes (Unicode friendly)
These classes are based on the definitions provided in
[UTS#18](http://www.unicode.org/reports/tr18/#Compatibility_Properties):
[UTS#18](https://www.unicode.org/reports/tr18/#Compatibility_Properties):
<pre class="rust">
\d digit (\p{Nd})
2 changes: 1 addition & 1 deletion src/sparse.rs
@@ -8,7 +8,7 @@ use std::slice;
/// entire set can also be done in constant time. Iteration yields elements
/// in the order in which they were inserted.
///
/// The data structure is based on: http://research.swtch.com/sparse
/// The data structure is based on: https://research.swtch.com/sparse
/// Note though that we don't actually use uninitialized memory. We generally
/// reuse allocations, so the initial allocation cost is bearable. However,
/// its other properties listed above are extremely useful.
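
Reviewer note on the data structure described in this doc comment: the properties it lists (constant-time clearing, iteration in insertion order, no reliance on uninitialized memory) can be illustrated with a minimal sparse set over the integers `0..capacity`. This is a sketch under assumptions, not the contents of `src/sparse.rs`: the type name, field names and method signatures are hypothetical, and the `sparse` array is zero-initialized up front instead of being left uninitialized.

```rust
/// Hypothetical sparse set over the integers `0..capacity`.
pub struct SparseSet {
    /// Members of the set, in the order they were inserted.
    dense: Vec<usize>,
    /// For a member `v`, `sparse[v]` is the index of `v` in `dense`.
    /// Entries for non-members may hold stale values.
    sparse: Vec<usize>,
}

impl SparseSet {
    /// Allocates both arrays once; `sparse` is zero-filled, not uninitialized.
    pub fn new(capacity: usize) -> SparseSet {
        SparseSet {
            dense: Vec::with_capacity(capacity),
            sparse: vec![0; capacity],
        }
    }

    /// Constant-time membership test: a stale `sparse` entry is rejected
    /// because it either points past `dense` or at a different value.
    pub fn contains(&self, value: usize) -> bool {
        match self.sparse.get(value) {
            Some(&i) => i < self.dense.len() && self.dense[i] == value,
            None => false,
        }
    }

    /// Constant-time insert. Panics if `value >= capacity`.
    pub fn insert(&mut self, value: usize) {
        if !self.contains(value) {
            self.sparse[value] = self.dense.len();
            self.dense.push(value);
        }
    }

    /// Constant-time clear: only the length of `dense` is reset, and the
    /// allocations are kept for reuse.
    pub fn clear(&mut self) {
        self.dense.clear();
    }

    /// Iterates over members in insertion order.
    pub fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        self.dense.iter().copied()
    }
}
```

The design choice worth noting is that `clear` never touches `sparse`; stale entries are harmless because `contains` cross-checks them against `dense`.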
