# Strings

- Rust provides two main string types, `&str` and `String`, for the following reasons:
    - memory safety
    - no accidental copying
    - performance control
    - zero-cost abstractions

- Languages like Python and Java hide these details; Rust makes them explicit
- Rust strings are not complicated; they're just explicit about ownership!

## &str - String Reference (Borrowed Fixed Size)

- the most common way to handle string literals
- a reference to UTF-8 text stored somewhere in memory (usually the program's binary)
- the variable doesn't own the text; it just points to it
- it is immutable (you cannot change its contents through it)
- stored in read-only memory when it's a literal
- very fast and lightweight
- zero allocation
- function parameters that only read text almost always use `&str`

```rust
let text: &str = "Hello World!";
println!("{text}");
```

```text
Hello World!
```

```rust
fn greet(name: &str) {
    println!("Hello, {name}!");
}
```

```rust
greet("John Smith");
```

```text
Hello, John Smith!
```

```rust
let name: &str = "Michael Jordan";
greet(name);
```

```text
Hello, Michael Jordan!
```

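Because `greet` takes `&str`, it also accepts an owned `String`: passing a `&String` where a `&str` is expected is converted automatically (deref coercion). A minimal sketch; `greeting` is a hypothetical variant of `greet` that returns the text instead of printing it, so the result can be checked.

```rust
// Hypothetical variant of `greet` that returns the greeting instead of printing it
fn greeting(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    let literal: &str = "John Smith";                   // borrowed literal
    let owned: String = String::from("Michael Jordan"); // owned, on the heap

    println!("{}", greeting(literal));        // &str passed as-is
    println!("{}", greeting(&owned));         // &String coerces to &str
    println!("{}", greeting(owned.as_str())); // explicit conversion, same result
}
```

This is why `&str` is the usual parameter type: one signature serves literals and `String`s alike, and the caller keeps ownership.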
## String - Owned, growable, heap-allocated

- the full, editable (mutable) string type
- an owned, resizable UTF-8 buffer stored on the heap
- "I own the text and can modify it."
- Properties:
    - mutable (when bound with `let mut`)
    - dynamically sized
    - can grow and shrink
    - more expensive than `&str` (creating one usually means a heap allocation)

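The "can grow and shrink" property can be seen with a handful of standard `String` methods. A small sketch: exact capacity values are up to the implementation, so the code only relies on `with_capacity` giving *at least* the requested space.

```rust
fn main() {
    // Reserve space for at least 16 bytes up front
    let mut s = String::with_capacity(16);
    assert!(s.capacity() >= 16);

    s.push_str("Hello");   // grow
    s.push_str(", world"); // grow again
    println!("{s} (len {})", s.len()); // Hello, world (len 12)

    s.truncate(5);         // shrink the contents to the first 5 bytes
    println!("{s} (len {})", s.len()); // Hello (len 5)

    let last = s.pop();    // remove and return the last char
    println!("{s}, popped {last:?}"); // Hell, popped Some('o')

    s.clear();             // now empty, but the buffer is kept
    println!("empty: {}, capacity >= 16: {}", s.is_empty(), s.capacity() >= 16);

    s.shrink_to_fit();     // ask to release unused capacity
}
```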
```rust
// Create
let mut s = String::from("Hello");
let s2 = " world"; // &str

// Append
s.push('!');    // append one char
s.push_str(s2); // append a &str

// Concatenate / format
let combined = format!("{}{}", s, " 🙂"); // format! returns a new String
```

```rust
println!("{combined}");
```

```text
Hello! world 🙂
```

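```rust
// `combined` was not declared with `mut`, so this does not compile
combined.push_str("Good bye!");
```

```text
Error: cannot borrow `combined` as mutable, as it is not declared as mutable
```

```rust
// `s` was declared with `let mut`, so this works
s.push_str("Good bye!");
```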

## Relationship between the Two

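- A `String` can be borrowed as a `&str` cheaply (`&s` or `s.as_str()`); no text is copied
- A `&str` can only become a `String` by allocating a new buffer and copying the text (`to_string()`, `String::from`, `to_owned()`)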

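One way to see the difference: a `&str` borrowed from a `String` points into the same heap buffer, while `to_string()` creates a second buffer. This sketch compares the raw pointers returned by `as_ptr()`, purely for illustration.

```rust
fn main() {
    let owned = String::from("Hello");

    // Borrow: the &str points into the String's own heap buffer, nothing is copied
    let borrowed: &str = owned.as_str();
    assert_eq!(borrowed.as_ptr(), owned.as_ptr());

    // Allocate: to_string() copies the bytes into a new buffer
    let copy: String = borrowed.to_string();
    assert_ne!(copy.as_ptr(), owned.as_ptr());

    // Same text, different memory
    println!("equal text: {}", copy == owned); // equal text: true
}
```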
```rust
let slice = "Hello";
let mut text = slice.to_string(); // allocates a new String
text.push_str(" World!");

println!("{text}");
```

```text
Hello World!
```


```rust
// Takes borrowed text, returns a new owned String
fn shout(text: &str) -> String {
    let mut result = String::from(text); // copy into an owned buffer
    result.push(' ');
    result.push_str("world");
    result.push('!');
    result
}
```

```rust
fn main() {
    let msg = "hello";     // &str
    let loud = shout(msg); // String
    println!("{loud}");
}

main(); // the notebook does not run `main` on its own, so call it
```

```text
hello world!
```


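## Unicode, not an array of ASCII chars

- in many languages (C++, Java, Python), `s[i]` returns one fixed-size unit: a byte in C++, a UTF-16 code unit in Java, a code point in Python
- Rust strings are UTF-8 encoded bytes, and each character takes 1 to 4 bytes
- so you can't use `s[i]` to get a character: `str` does not support integer indexing, because a byte index could land in the middle of a character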

| Character | UTF-8 size |
| --------- | ---------- |
| `a`       | 1 byte     |
| `é`       | 2 bytes    |
| `न`       | 3 bytes    |
| `🦀`      | 4 bytes    |

```text
"🦀aé"

bytes:
[F0 9F A6 80] [61] [C3 A9]
   crab        a     é
```

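The byte sizes in the table can be checked with `char::len_utf8`, and the gap between bytes and characters shows up as `len()` (bytes) versus `chars().count()` (characters). A short sketch:

```rust
fn main() {
    for c in ['a', 'é', 'न', '🦀'] {
        // len_utf8() = number of bytes this char needs in UTF-8
        println!("{c}: {} byte(s)", c.len_utf8());
    }

    let s = "🦀aé";
    println!("len() = {}", s.len());                     // 7 bytes (4 + 1 + 2)
    println!("chars().count() = {}", s.chars().count()); // 3 chars
}
```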
- text can be viewed at 3 different levels:

### Bytes (u8)

- raw memory representation
- fastest but not human readable

```rust
// UTF-8 text
let s = "🦀aé";

for b in s.bytes() {
    println!("{b}");
}
```

```text
240
159
166
128
97
195
169
```

```rust
// F0 in hex is 240 in decimal
let val: u32 = 0xF0;
println!("{val}");
```

```text
240
```

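To print the bytes in the same hex form as the byte diagram earlier in this section, format each `u8` with `{:02X}`. `hex_bytes` is a hypothetical helper name used only for this sketch.

```rust
// Hypothetical helper: each byte as two uppercase hex digits, space-separated
fn hex_bytes(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{b:02X}"))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    println!("{}", hex_bytes("🦀aé")); // F0 9F A6 80 61 C3 A9
}
```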
### Unicode scalar values (char)

- what most people mean by "characters"

```rust
// 🦀 counts as 1 character even though it's 4 bytes
let s = "🦀aé";

for c in s.chars() {
    println!("{c}");
}
```

```text
🦀
a
é
```

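`.chars()` hides where each character starts in memory. `.char_indices()` yields `(byte offset, char)` pairs, which is the information you need to slice safely. A minimal sketch:

```rust
fn main() {
    let s = "🦀aé";

    // char_indices() yields (byte offset where the char starts, char)
    for (offset, c) in s.char_indices() {
        println!("byte {offset}: {c}");
    }
    // byte 0: 🦀
    // byte 4: a
    // byte 5: é
}
```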
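### Grapheme clusters (user-perceived characters)

- what humans actually see

```text
"🇺🇸"        -> 2 Unicode scalars (two regional indicator symbols)
"e\u{301}"  -> 'e' + combining acute accent (U+0301), displayed as é
```

- the Rust standard library does not handle grapheme clusters (too expensive and complex to include); crates such as `unicode-segmentation` provide them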

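With only the standard library, both examples above count as two `char`s. This sketch uses escape sequences so the exact code points are unambiguous, and also shows that `==` compares bytes: the decomposed `e` + accent is not equal to the precomposed `é` (U+00E9), because no Unicode normalization happens.

```rust
fn main() {
    let flag = "\u{1F1FA}\u{1F1F8}"; // 🇺🇸 = regional indicators U + S
    let e_accent = "e\u{301}";       // 'e' + combining acute accent, shows as é
    let e_single = "\u{E9}";         // precomposed é, a single char

    // std counts chars (Unicode scalar values), not what the reader sees
    println!("{flag}: {} char(s), {} byte(s)", flag.chars().count(), flag.len()); // 2, 8
    println!("{e_accent}: {} char(s), {} byte(s)", e_accent.chars().count(), e_accent.len()); // 2, 3
    println!("{e_single}: {} char(s), {} byte(s)", e_single.chars().count(), e_single.len()); // 1, 2

    // == compares bytes; no normalization, so these differ
    println!("same string? {}", e_accent == e_single); // false
}
```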
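## Proper Way to "Index" a String

- you must explicitly choose your intent
- to get the nth character, use `.chars().nth(index)`
    - returns `Option<char>`, because that character may not exist
    - walks from the start of the string, so each call is O(n)
- handling `Option<T>` takes a bit more code, but it forces you to deal with the "no such character" case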

```rust
let s = "🦀Rust";

let third = s.chars().nth(2);
if let Some(ch) = third {
    println!("{ch}");
}
```

```text
u
```

```rust
println!("{}", third.unwrap_or('?')); // '?' if there is no third char
```

```text
u
```


```rust
// Closure style: Option::iter yields zero or one item,
// so nothing prints if `third` is None
third.iter().for_each(|ch| println!("{ch}"));
```

```text
u
```

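Since `chars().nth(i)` rescans from the start on every call, code that needs many lookups may prefer decoding once into a `Vec<char>`. A sketch of that trade-off (O(1) indexing in exchange for 4 bytes of memory per character); this is one option, not the only one.

```rust
fn main() {
    let s = "🦀Rust";

    // chars().nth(i) walks from the start every time: O(i) per lookup
    assert_eq!(s.chars().nth(2), Some('u'));

    // Decode once (O(n)), then index in O(1)
    let chars: Vec<char> = s.chars().collect();
    println!("{} {} {}", chars[0], chars[2], chars.len()); // 🦀 u 5

    // .get() keeps the Option-based safety of nth()
    assert_eq!(chars.get(10), None);
}
```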

## Extract nth character (ASCII only)

- gives wrong results for non-ASCII text: `as_bytes()[i]` is the i-th byte, not the i-th character

```rust
let s = "Rust";
let c = s.as_bytes()[2] as char; // byte 2 is 's'

println!("{c}");
```

```text
s
```

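To see why this is ASCII-only, run the same trick on a string containing `é` (bytes `C3 A9`). Casting the lone byte `0xC3` to `char` yields U+00C3 (`Ã`), a different character, with no error:

```rust
fn main() {
    let s = "aé"; // 'é' is 2 bytes: C3 A9

    // Byte 1 is only the first half of 'é'; as char it becomes U+00C3
    let wrong = s.as_bytes()[1] as char;
    let right = s.chars().nth(1);

    println!("as_bytes()[1] as char: {wrong:?}"); // 'Ã'
    println!("chars().nth(1):        {right:?}"); // Some('é')
}
```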
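```rust
// Loop over one-byte slices: fine for ASCII,
// but panics if a slice boundary falls inside a multi-byte character
let s = "abcdef";

for i in 0..s.len() {
    println!("{}", &s[i..i + 1]);
}
```

```text
a
b
c
d
e
f
```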

### How to GUARANTEE it's ASCII

- If input may be Unicode, check first:

```rust
let text = "Mississippi";

if text.is_ascii() {
    let first = &text[0..1];
    println!("{first}");
}
```

```text
M
```

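The check can be wrapped in a helper with a fallback for non-ASCII input. A sketch, assuming a hypothetical `nth_char` function: ASCII text is indexed byte-wise, everything else goes through `.chars()`.

```rust
// Hypothetical helper: the nth char, with a fast path for ASCII text
fn nth_char(text: &str, n: usize) -> Option<char> {
    if text.is_ascii() {
        // every char is exactly 1 byte, so byte n is char n
        text.as_bytes().get(n).map(|&b| b as char)
    } else {
        // variable-width chars: walk from the start
        text.chars().nth(n)
    }
}

fn main() {
    println!("{:?}", nth_char("Mississippi", 0)); // Some('M')
    println!("{:?}", nth_char("🦀Rust", 2));      // Some('u')
    println!("{:?}", nth_char("abc", 5));         // None
}
```

As written, each call scans the whole string with `is_ascii()`, so it is mainly illustrative; the fast path pays off when you check once and reuse the result for many lookups.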
### Slicing part of a string

- you can only slice at valid UTF-8 character boundaries
- slice syntax:

```rust
&s[start..end]
```
- `start` is inclusive
- `end` is exclusive
- indices are 0-based **byte** offsets, not character positions
- slicing through the middle of a character panics at runtime

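Slicing with `&s[a..b]` panics on a bad boundary. The standard library also offers non-panicking tools: `str::get` returns `None` for an invalid range, and `is_char_boundary` reports whether a byte offset is a safe cut point. A short sketch with a Hindi greeting, where every character is 3 bytes:

```rust
fn main() {
    let s = "नमस्ते"; // every char here is 3 bytes

    // get() returns None instead of panicking on a bad boundary
    println!("{:?}", s.get(0..1)); // None
    println!("{:?}", s.get(0..3)); // Some("न")

    // is_char_boundary(i) is true only where a char starts (or at the end)
    for i in 0..=4 {
        println!("{i}: {}", s.is_char_boundary(i)); // true at 0 and 3
    }
}
```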
```rust
// All ASCII text
let text = "Mississippi";
let miss = &text[0..4];
println!("{miss}");
```

```text
Miss
```


```rust
// Non-ASCII UTF-8: be careful, you must know where the character boundaries are
let s = "नमस्ते";

// let bad = &s[0..1];   ❌ panics: byte 1 is inside 'न'
let ok = &s[0..3];      // 'न' is 3 bytes, so this is the first character
println!("{ok}");
```

```text
न
```


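If you think in character positions rather than byte offsets, you can translate one to the other with `char_indices()`. A sketch of a hypothetical `char_slice` helper (not a standard API); it is O(n), and it counts `char`s, not grapheme clusters, so it can still split a cluster such as `ते` (`त` + vowel sign).

```rust
// Hypothetical helper: slice by *character* positions [start, end)
// instead of byte offsets. Returns None if the range is out of bounds.
fn char_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    if start > end {
        return None;
    }
    // Map a char position to its byte offset; position == char count maps to s.len()
    let byte_at = |pos: usize| {
        s.char_indices()
            .map(|(i, _)| i)
            .chain(std::iter::once(s.len()))
            .nth(pos)
    };
    let (a, b) = (byte_at(start)?, byte_at(end)?);
    Some(&s[a..b]) // a and b are always char boundaries
}

fn main() {
    println!("{:?}", char_slice("🦀aé", 1, 3));   // Some("aé")
    println!("{:?}", char_slice("नमस्ते", 0, 2)); // Some("नम")
    println!("{:?}", char_slice("abc", 0, 4));    // None
}
```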
### Performance Tip

- For heavy parsing (protocols, file formats, tokens), prefer bytes:

```rust
let text = "Hello!";
let bytes = text.as_bytes(); // &[u8], no UTF-8 decoding

if bytes[0] == b'H' {
    println!("Starts with H");
}
```

```text
Starts with H
```

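As a slightly larger sketch of byte-level parsing, the hypothetical helper below reads a run of ASCII digits at the start of the input (for example an HTTP-style status code) without decoding UTF-8. It uses checked arithmetic so an overly long number returns `None` instead of overflowing.

```rust
// Hypothetical helper: parse leading ASCII digits, returning (value, bytes consumed)
fn parse_leading_u32(input: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    let mut used = 0;
    for &b in input {
        if !b.is_ascii_digit() {
            break;
        }
        // checked_* returns None on overflow instead of wrapping
        value = value.checked_mul(10)?.checked_add(u32::from(b - b'0'))?;
        used += 1;
    }
    if used == 0 { None } else { Some((value, used)) }
}

fn main() {
    let line = "200 OK";
    if let Some((code, len)) = parse_leading_u32(line.as_bytes()) {
        // digits are ASCII, so `len` is on a char boundary and slicing is safe
        println!("status {code}, rest = {:?}", &line[len..]); // status 200, rest = " OK"
    }
}
```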
### Rule of Thumb

| Text type       | Safe way to slice          |
| --------------- | -------------------------- |
| ASCII           | `&s[a..b]` ✔               |
| Non-ASCII UTF-8 | `.chars()` needed          |
| Unknown input   | check `is_ascii()` first   |
