- Title: String in Rust
- Slug: rust-str
- Date: 2020-04-20 08:57:29
- Category: Computer Science
- Tags: programming, Rust, string, str, character, bytes
- Author: Ben Du
- Modified: 2021-06-16 09:07:09


## Tips and Traps 

1. Rust has 2 different string types: `String` and `str`. 
    `String` is a MUTABLE (unlike strings in Java and Python), growable, heap-allocated data structure 
    while `str` is an immutable sequence of UTF-8 bytes somewhere in memory 
    (static storage, heap or stack).
    `String` owns its memory while `&str` only borrows it.

2. Since the size of `str` is unknown, 
    one can only handle it behind a pointer. 
    This means that `str` most commonly appears as `&str`: 
    a reference to some UTF-8 data, normally called a "string slice" or just a "slice".

3. `&String` is a reference to a `String`
    and is also called a borrowed type.
    It is nothing more than a pointer 
    which you can pass around without giving up ownership. 
    `&String` can be coerced to a `&str` implicitly.

4. If you want a read-only view of a string, 
    `&str` is preferred.
    If you want to own and mutate a string,
    `String` should be used.
    For example,
    `String` should be used for returning strings created within a function
    or (usually) when storing strings in a struct or enum.
    
    
5. Indexing into a string with an integer (e.g., `s[0]`) is not available in Rust. 
    The reason for this is that Rust strings are encoded in UTF-8 internally, 
    so the concept of indexing itself would be ambiguous and people would misuse it. 
    Byte indexing is fast 
    but often incorrect 
    (when your text contains non-ASCII symbols, 
    byte indexing may leave you inside a character, 
    which is really bad if you need text processing), 
    while char indexing is not free because UTF-8 is a variable-length encoding, 
    so you have to traverse the string from the start to find the required code point.
    
    There are 2 ways to get chars out of a string.
    First, 
    you can call the `chars` method which returns an iterator. 
    This way is not efficient if you want random access.
    Second, 
    you can get the underlying byte representation of a string 
    by calling the `as_bytes` method 
    (which returns a byte slice `&[u8]`). 
    You can then index the byte slice and convert a `u8` value to `char` 
    using the `as` keyword.
    This conversion is only correct for ASCII text.

6. `let my_str = "Hello World";` defines a `&str` (not `String`). 

7. If you have a `&str` and want a new `String`,
    you can create one with either `to_owned()` or `to_string()` 
    (they are effectively the same).
    Both methods copy the bytes into a newly allocated `String`.
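
The tips above can be illustrated with a minimal sketch. The helper names `count_words` and `greet` are hypothetical and exist only for this example. It shows a function that takes `&str` (so both `&str` and `&String` can be passed), a function that returns an owned `String`, and the two ways of copying a `&str` into a `String`.

```rust
// Hypothetical helper: accepts any string slice, so both `&str` and `&String` work.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

// Hypothetical helper: returns an owned `String` built inside the function.
fn greet(name: &str) -> String {
    let mut greeting = String::from("Hello, ");
    greeting.push_str(name); // a `String` is growable and mutable
    greeting
}

fn main() {
    let literal: &str = "how are you"; // string literal: a `&str` in static storage
    let owned: String = literal.to_owned(); // copies the bytes into a new heap allocation
    let also_owned: String = literal.to_string(); // same result as `to_owned()` here

    // `&String` is coerced to `&str` at the call site.
    assert_eq!(count_words(literal), 3);
    assert_eq!(count_words(&owned), 3);
    assert_eq!(owned, also_owned);

    println!("{}", greet("Rust"));
}
```

Taking `&str` as the parameter type is usually the more flexible choice, since callers holding either a literal or a `String` can use the function without copying.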

## &str

`str` is a primitive type. A `&str` is an immutable, fixed-length view of UTF-8 data.

In [2]:
let mut s: &str = "how are you";
s

"how are you"

In [4]:
let s2 = String::from("abc");
s2[0]

Error: the type `String` cannot be indexed by `{integer}`

In [3]:
s[0]

Error: the type `str` cannot be indexed by `{integer}`

In [11]:
s + 'a'

Error: cannot add `char` to `&str`

In [8]:
s.chars()

Chars(['h', 'o', 'w', ' ', 'a', 'r', 'e', ' ', 'y', 'o', 'u'])

In [10]:
s.chars().nth(4)

Some('a')

In [20]:
s.push('c')

Error: no method named `push` found for type `&str` in the current scope

In [21]:
s.is_empty()

false

In [3]:
s.len()

11

## String

In [4]:
let s1: String = "Hello World!";
s1

Error: mismatched types

In [5]:
let mut s2: String = String::from("Hello World!");
s2

"Hello World!"

In [12]:
s2 + 'a'

Error: mismatched types

In [13]:
s2.push('a')

()

In [14]:
s2

"Hello World!a"

## Construct Strings

### String::new

`String::new` creates a new empty string.

In [8]:
String::new()

""

`String::with_capacity` creates a new empty string with at least the given capacity (in bytes).

In [10]:
let my_str = String::with_capacity(2);
my_str

""

In [12]:
my_str.capacity()

2
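
As a rough sketch of when `String::with_capacity` is useful, the example below preallocates the buffer before appending a known number of characters. The helper name `repeat_x` is hypothetical. The length stays at the number of bytes actually written, while the capacity is only guaranteed to be at least what was requested.

```rust
// Hypothetical helper: builds a string of `n` copies of 'x' with preallocated space.
fn repeat_x(n: usize) -> String {
    let mut buf = String::with_capacity(n); // reserve room for at least `n` bytes
    for _ in 0..n {
        buf.push('x'); // stays within the reserved capacity
    }
    buf
}

fn main() {
    let empty = String::with_capacity(16);
    assert_eq!(empty.len(), 0); // no content yet
    assert!(empty.capacity() >= 16); // but space is already reserved

    let s = repeat_x(4);
    assert_eq!(s, "xxxx");
    assert_eq!(s.len(), 4);
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```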

## Cases of String

1. The `to_*case` methods return a new `String` object
    (mainly because changing the case of a non-ASCII character might change the length of the string).
    The `make_ascii_*case` methods change cases in place 
    (as changing the case of ASCII characters won't change the length of the string).
    
2. `to_*case` methods change the case of all characters 
    while `to_ascii_*case` methods only change the case of ASCII characters 
    and leave non-ASCII characters unchanged.
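
The differences between the three families of methods can be sketched as follows. The sample text and the helper name `ascii_lowered` are assumptions made for this illustration. The character 'ß' is used because its uppercase form is the two-character string "SS", which shows why `to_uppercase` has to return a new `String`.

```rust
// Hypothetical helper: lowercases only the ASCII letters, reusing the same buffer.
fn ascii_lowered(mut s: String) -> String {
    s.make_ascii_lowercase(); // in place, length unchanged
    s
}

fn main() {
    let s = "Stra\u{df}e ABC"; // "Straße ABC"

    // `to_*case` is Unicode-aware and returns a new `String`.
    assert_eq!(s.to_uppercase(), "STRASSE ABC"); // one byte longer than the input
    assert_eq!(s.to_lowercase(), "stra\u{df}e abc");

    // `to_ascii_*case` also returns a new `String` but leaves non-ASCII characters alone.
    assert_eq!(s.to_ascii_uppercase(), "STRA\u{df}E ABC");

    // `make_ascii_*case` modifies an existing `String` in place.
    assert_eq!(ascii_lowered(String::from(s)), "stra\u{df}e abc");
}
```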

### to_lowercase and to_uppercase

### to_ascii_lowercase and to_ascii_uppercase

### make_ascii_lowercase and make_ascii_uppercase

## chars

## contains
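
A brief sketch of `contains`, which accepts a substring, a single `char`, or a predicate on `char` as the pattern. The helper `contains_ignore_ascii_case` is hypothetical and only shows one possible way to do a case-insensitive check for ASCII text.

```rust
// Hypothetical helper: case-insensitive substring check for ASCII letters only.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    haystack
        .to_ascii_lowercase()
        .contains(&needle.to_ascii_lowercase())
}

fn main() {
    let s = "how are you";
    assert!(s.contains("are")); // substring pattern
    assert!(s.contains('w')); // single character pattern
    assert!(s.contains(char::is_whitespace)); // predicate pattern
    assert!(!s.contains("Are")); // matching is case-sensitive

    assert!(contains_ignore_ascii_case(s, "ARE"));
}
```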

## get

In [2]:
let s: String = String::from("Hello World!");
s.get(0..3)

Some("Hel")
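
`get` returns an `Option` because the requested byte range may be invalid. The sketch below, using a hypothetical helper `byte_prefix`, shows the two cases in which it returns `None`: the range is out of bounds, or it does not fall on a character boundary.

```rust
// Hypothetical helper: returns the first `n` bytes of `s` as a slice,
// or `None` if `n` is out of range or falls inside a multi-byte character.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(0..n)
}

fn main() {
    assert_eq!(byte_prefix("Hello World!", 3), Some("Hel"));
    assert_eq!(byte_prefix("Hello", 10), None); // out of bounds

    // In "héllo", 'é' occupies bytes 1 and 2 in UTF-8.
    let s = "h\u{e9}llo";
    assert_eq!(byte_prefix(s, 2), None); // would split 'é' in half
    assert_eq!(byte_prefix(s, 3), Some("h\u{e9}"));
}
```

By contrast, slicing with `&s[0..n]` panics in both of these cases, so `get` is the safer choice when the range comes from untrusted input.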

In [5]:
let s: String = String::from("Hello World!");
let ss = s.get(0..3).unwrap().to_string();
ss

"Hel"

## join

In [6]:
["a", "b"].join("")

"ab"

In [7]:
['a', 'b'].join("")

Error: no method named `join` found for array `[char; 2]` in the current scope

In [6]:
vec!["a", "b"].join("")

"ab"

In [7]:
vec![String::from("a"), String::from("b")].join("")

"ab"

## len

## replace
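
A short sketch of `replace` and `replacen`. Both return a new `String` and leave the original untouched. The helper name `slugify` is hypothetical and used only for illustration.

```rust
// Hypothetical helper: lowercases the text and replaces spaces with hyphens.
fn slugify(title: &str) -> String {
    title.to_lowercase().replace(' ', "-")
}

fn main() {
    let s = "a-b-c";
    assert_eq!(s.replace("-", "+"), "a+b+c"); // replaces every occurrence
    assert_eq!(s.replacen("-", "+", 1), "a+b-c"); // replaces at most 1 occurrence
    assert_eq!(s, "a-b-c"); // the original is unchanged

    assert_eq!(slugify("Hello World"), "hello-world");
}
```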

## parse (Convert String to Other Types)

[String Conversions](https://cheats.rs/#string-conversions)

Convert an integer to string.

In [2]:
let s = 123.to_string();
s

"123"

In [3]:
1.to_string()

"1"

Convert a string to bytes.

In [8]:
"1".as_bytes()

[49]

In [4]:
1.to_string().as_bytes()

[49]

In [5]:
1i32.to_be_bytes()

[0, 0, 0, 1]

Convert the string back to integer.

In [6]:
s.parse::<i32>()

Ok(123)

In [7]:
s.parse::<i32>().unwrap()

123
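
`parse` returns a `Result`, and `unwrap` panics when the text is not a valid number. The following sketch shows two non-panicking alternatives. The helper `parse_or_default` is hypothetical.

```rust
// Hypothetical helper: parses an `i32`, falling back to `default` on invalid input.
fn parse_or_default(text: &str, default: i32) -> i32 {
    text.trim().parse::<i32>().unwrap_or(default)
}

fn main() {
    assert_eq!(parse_or_default("123", 0), 123);
    assert_eq!(parse_or_default(" 42 ", 0), 42); // surrounding whitespace is trimmed first
    assert_eq!(parse_or_default("abc", -1), -1);

    // Handling both outcomes explicitly.
    match "12x".parse::<i32>() {
        Ok(n) => println!("parsed {}", n),
        Err(e) => println!("not an integer: {}", e),
    }
}
```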

## push

You cannot concatenate a `char` to a string using the `+` operator
(the right-hand side of `+` must be a `&str`).
However,
you can use the `String::push` method to add a `char` to the end of a `String`.
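
As a minimal sketch of the options, the example below appends with `push`, `push_str` and `+`. The helper name `with_suffix` is hypothetical.

```rust
// Hypothetical helper: returns a copy of `base` with the character `c` appended.
fn with_suffix(base: &str, c: char) -> String {
    let mut out = String::with_capacity(base.len() + c.len_utf8());
    out.push_str(base); // append a `&str`
    out.push(c); // append a single `char`
    out
}

fn main() {
    let mut s = String::from("Hello");
    s.push(',');
    s.push_str(" World");
    assert_eq!(s, "Hello, World");

    // `+` takes ownership of the left-hand `String` and needs a `&str` on the right.
    let t = s + "!";
    assert_eq!(t, "Hello, World!");

    // A `char` can be used with `+` only after converting it to a string.
    let u = t + &'?'.to_string();
    assert_eq!(u, "Hello, World!?");

    assert_eq!(with_suffix("abc", 'd'), "abcd");
}
```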

## push_str

## is_empty

## split

In [4]:
"".split(",").collect::<Vec<&str>>()

[""]

In [5]:
"".split(" ").collect::<Vec<&str>>()

[""]

In [2]:
"1,2,3".split(",")

Split(SplitInternal { start: 0, end: 5, matcher: StrSearcher { haystack: "1,2,3", needle: ",", searcher: TwoWay(TwoWaySearcher { crit_pos: 0, crit_pos_back: 1, period: 1, byteset: 17592186044416, position: 0, end: 5, memory: 0, memory_back: 1 }) }, allow_trailing_empty: true, finished: false })

In [20]:
let mut it = "1,2,3".split(",");
it

Split(SplitInternal { start: 0, end: 5, matcher: StrSearcher { haystack: "1,2,3", needle: ",", searcher: TwoWay(TwoWaySearcher { crit_pos: 0, crit_pos_back: 1, period: 1, byteset: 17592186044416, position: 0, end: 5, memory: 0, memory_back: 1 }) }, allow_trailing_empty: true, finished: false })

In [21]:
it.next()

Some("1")

In [22]:
it.next()

Some("2")

In [23]:
it.next()

Some("3")

In [24]:
it.next()

None

In [5]:
let v: Vec<&str> = "1,2,3".split(",").collect();
v

["1", "2", "3"]

In [17]:
let v: Vec<i8> = "1,2,3".split(",").map(|x| x.parse::<i8>().unwrap()).collect();
v

[1, 2, 3]

## split_whitespace

In [23]:
"how are you".split_whitespace()

SplitWhitespace { inner: Filter { iter: Split(SplitInternal { start: 0, end: 11, matcher: CharPredicateSearcher { haystack: "how are you", char_indices: CharIndices { front_offset: 0, iter: Chars { iter: Iter([104, 111, 119, 32, 97, 114, 101, 32, 121, 111, 117]) } } }, allow_trailing_empty: true, finished: false }) } }

In [27]:
for word in "how are you".split_whitespace() {
    println!("{}", word);
}

how
are
you


()

## with_capacity

In [25]:
let ss = String::with_capacity(3);
ss

""

## Print Strings

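1. You cannot pass an integer (or any other non-literal value) to `println!` directly
    because the first argument of `println!` must be a format string literal.
    Instead,
    use a placeholder such as `{}` and pass the value as an additional argument.
    
2. It is suggested that you use `println!("{}", var);`
    to print a variable to the terminal.
    This works for every type that implements the `Display` trait.
    Use `{:?}` for types that only implement `Debug`.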

In [2]:
println!(5)

Error: format argument must be a string literal

In [3]:
println!("{}", 5)

5


()

In [5]:
println!("My name is {} and I'm {}", "Ben", 34);

My name is Ben and I'm 34


In [6]:
println!("{0} * {0} = {1}", 3, 9);

3 * 3 = 9


In [7]:
println!("{x} * {x} = {y}", x=3, y=9);

3 * 3 = 9


## Placeholder Traits

In [12]:
println!("Binary: {v:b}, Hex: {v:x}, Octal: {v:o}", v = 64);

Binary: 1000000, Hex: 40, Octal: 100


## Print an Iterable

In [13]:
println!("{:?}", ("Hello", "World"));

("Hello", "World")
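
Tuples and vectors do not implement `Display`, which is why the cell above uses the `{:?}` (`Debug`) placeholder. The sketch below uses `format!`, which accepts the same syntax as `println!` but returns a `String`. The helper name `debug_string` is hypothetical.

```rust
use std::fmt::Debug;

// Hypothetical helper: renders any `Debug` value into a `String`.
fn debug_string<T: Debug>(value: &T) -> String {
    format!("{:?}", value)
}

fn main() {
    // `{}` works for types implementing `Display`, such as integers and strings.
    assert_eq!(format!("{}", 5), "5");
    assert_eq!(format!("{}", "Ben"), "Ben");

    // `{:?}` works for types implementing `Debug`; note the quotes around strings.
    assert_eq!(debug_string(&("Hello", "World")), "(\"Hello\", \"World\")");
    assert_eq!(debug_string(&vec![1, 2, 3]), "[1, 2, 3]");
}
```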


## Concatenate a String and a Char

In [2]:
let mut my_str = String::from("Hello World");
my_str.push('!');
my_str

"Hello World!"

## Concatenate Several Strings Together

The GitHub repo
[dclong/concatenation_benchmarks-rs](https://github.com/dclong/concatenation_benchmarks-rs)
has a summary of different ways of joining strings 
and their corresponding performance.
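
For orientation, here is a minimal sketch of several common ways to join strings using only the standard library. It is not taken from the repo above and makes no performance claims. The helper name `join_words` is hypothetical.

```rust
// Hypothetical helper: joins words with single spaces using `push` and `push_str`.
fn join_words(words: &[&str]) -> String {
    let mut out = String::new();
    for (i, w) in words.iter().enumerate() {
        if i > 0 {
            out.push(' ');
        }
        out.push_str(w);
    }
    out
}

fn main() {
    let a = String::from("how");
    let b = "are";
    let c = "you";

    // `format!` borrows its arguments and allocates a new `String`.
    let s1 = format!("{} {} {}", a, b, c);

    // `concat` and `join` work on slices of string slices.
    let s2 = [a.as_str(), " ", b, " ", c].concat();
    let s3 = [a.as_str(), b, c].join(" ");

    // Appending piece by piece to a buffer.
    let s4 = join_words(&[a.as_str(), b, c]);

    // `+` consumes the left operand, so `a` cannot be used afterwards.
    let s5 = a + " " + b + " " + c;

    for s in [&s1, &s2, &s3, &s4, &s5] {
        assert_eq!(s, "how are you");
    }
}
```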

### Concatenate Strings in an Array/Vector

In [7]:
["how", "are", "you"].join(" ")

"how are you"

In [8]:
vec!["how", "are", "you"].join(" ")

"how are you"

### Concatenate Strings in an Iterator

In [15]:
let v = vec!["how", "are", "you"];
v.into_iter().collect::<String>()

"howareyou"

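In [9]:
let arr = ["how", "are", "you"];
arr.into_iter().collect::<String>()

Error: a value of type `String` cannot be built from an iterator over elements of type `&&str`

The error above occurs in Rust editions before 2021,
where calling `into_iter` on an array yields references to its elements.
Since the 2021 edition it yields the elements by value and this call compiles.
Using `iter().copied()` instead works in all editions.

In [11]:
let arr = ["how", "are", "you"];
arr.iter().copied().collect::<String>()

"howareyou"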

In [7]:
let v = vec!["how", "are", "you"];
v.into_iter().intersperse(" ")

Error: use of unstable library feature 'iter_intersperse': recently added

## Indexing a String

  
Indexing into a string with an integer (e.g., `s[0]`) is not available in Rust. 
The reason for this is that Rust strings are encoded in UTF-8 internally, 
so the concept of indexing itself would be ambiguous and people would misuse it. 
Byte indexing is fast 
but often incorrect 
(when your text contains non-ASCII symbols, 
byte indexing may leave you inside a character, 
which is really bad if you need text processing), 
while char indexing is not free because UTF-8 is a variable-length encoding, 
so you have to traverse the string from the start to find the required code point.

There are 2 ways to get chars out of a string.
First, 
you can call the `chars` method which returns an iterator. 
This way is not efficient if you want random access.
Second, 
you can get the underlying byte representation of a string 
by calling the `as_bytes` method 
(which returns a byte slice `&[u8]`). 
You can then index the byte slice and convert a `u8` value to `char` 
using the `as` keyword.
This conversion is only correct for ASCII text.


In [6]:
let s = String::from("how are you");
s[0]

Error: the type `String` cannot be indexed by `{integer}`

In [7]:
let s = String::from("how are you");
s.chars().next()

Some('h')

In [8]:
let s = String::from("how are you");
s.as_bytes()[2] as char

'w'
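
The cells above use ASCII-only text, where bytes and characters coincide. The following sketch uses a string with one non-ASCII character to show where the two approaches diverge. The helper name `nth_char` is hypothetical.

```rust
// Hypothetical helper: returns the `n`-th character (not byte) of `s`.
// It walks the string from the start, so the cost grows with `n`.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let s = "h\u{e9}llo"; // "héllo"

    assert_eq!(s.len(), 6); // length in bytes: 'é' takes 2 bytes
    assert_eq!(s.chars().count(), 5); // length in characters

    assert_eq!(nth_char(s, 1), Some('\u{e9}'));
    assert_eq!(nth_char(s, 9), None);

    // Casting a single byte to `char` is only meaningful for ASCII.
    // Byte 1 is the first half of 'é' (0xC3), which `as char` turns into 'Ã'.
    assert_eq!(s.as_bytes()[0] as char, 'h');
    assert_eq!(s.as_bytes()[1] as char, '\u{c3}');

    // `char_indices` pairs each character with its starting byte offset.
    let offsets: Vec<(usize, char)> = s.char_indices().collect();
    assert_eq!(offsets[2], (3, 'l'));
}
```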

## References

- [char in Rust](http://www.legendu.net/misc/blog/rust-char)

- [Official Doc on String](https://doc.rust-lang.org/std/string/struct.String.html)

- [What’s the difference between &String and &str?](https://users.rust-lang.org/t/whats-the-difference-between-string-and-str/10177/2?from=singlemessage&isappinstalled=0)

- [Rust: str vs String](https://www.ameyalokare.com/rust/2017/10/12/rust-str-vs-String.html)

- [How to index a String in Rust](https://stackoverflow.com/questions/24542115/how-to-index-a-string-in-rust)

- [String Conversions](https://cheats.rs/#string-conversions)