<style> ul,p { max-width: 500px; } </style>

# [Common Collections](https://doc.rust-lang.org/book/ch08-00-common-collections.html)

It's been a while since my last learning session so I've reread the previous notes. 🤔

Looks like it's time to learn about collections.

Everyone's gotta have 'em. 

I've already learnt the basics of tuples, arrays, and string slicing.

![collections](data/000_collections.png)

But it looks like those are the fixed-size ones, whereas the collections in this chapter store their data on the heap and can grow.

Vectors? Hash maps? 




So they're created with:
```rust
let v: Vec<i32> = Vec::new();
```

Huh... well the latter half seems redundant. Obviously it's a new vector.

I suppose you'd usually use a constant or existing vector.

Anyway, it's a familiar ♦️diamond♦️ syntax when defining the type contained. 

Vectors can only hold one type. No sushi belts.

We can usually avoid this type annotation when vectors are created with initial values:

```rust
let Δtimes = vec![10.4, 10.3, 10.9];
```

In this case the type is inferred, though apparently you have to use a macro.

So that's great, but how are these different from arrays?

They both seem to be a group of the same type in contiguous memory.




## [Updating a Vector](https://doc.rust-lang.org/book/ch08-01-vectors.html#updating-a-vector)

Ah, here we go:
```rust
    let mut v = Vec::new();

    v.push(5);
    v.push(6);
    v.push(7);
    v.push(8);
```

So arrays are a fixed length known at compile-time whereas vectors can be extended.

They also use the heap rather than the stack.

<small>It might then pay to minimize the use of these in embedded development...</small>

So if you can `push()` can you also `pop()`? Or push multiple items?

How is this working internally? 
 * Is memory being reallocated and copied to when you add items?
 * They can't be a linked list because it said they're in contiguous memory.
 * Maybe they grow in chunks.
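
Here's a minimal sketch to poke at those questions. It assumes `pop()`, `extend()`, and `capacity()` behave as the standard library docs describe; `capacities_while_pushing` is my own hypothetical helper, not something from the book.

```rust
// Hypothetical helper (not from the book): push `n` items and record every
// distinct capacity the vector reports along the way.
fn capacities_while_pushing(n: usize) -> Vec<usize> {
    let mut v: Vec<i32> = Vec::new();
    let mut seen = vec![v.capacity()];
    for i in 0..n {
        v.push(i as i32);
        if v.capacity() != *seen.last().unwrap() {
            seen.push(v.capacity());
        }
    }
    seen
}

fn main() {
    let mut v = vec![5, 6, 7];

    // pop() removes the last item and returns it wrapped in an Option.
    let last = v.pop();
    println!("popped {last:?}, left with {v:?}");

    // extend() pushes several items in one call.
    v.extend([8, 9, 10]);
    println!("after extend: {v:?}");

    // The capacity only changes occasionally, which suggests the buffer is
    // reallocated in chunks rather than on every push. The exact numbers are
    // an implementation detail, so none are listed here.
    println!("capacities seen: {:?}", capacities_while_pushing(100));
}
```

So it looks like the "grow in chunks" guess is on the right track: far fewer capacity changes than pushes.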

Remember they need to be declared as mutable or the values are immutable.

You also can't `push` new values onto immutable vectors.

Which seems to make them exactly like immutable arrays... but I suppose they're on the heap so it might make sense if they're huge. Or you just wanted to borrow one.





## [Reading Elements of Vectors](https://doc.rust-lang.org/book/ch08-01-vectors.html#reading-elements-of-vectors)

```rust
    let v = vec![1, 2, 3, 4, 5];

    let third: &i32 = &v[2];
    println!("The third element is {third}");

    let third: Option<&i32> = v.get(2);
    match third {
        Some(third) => println!("The third element is {third}"),
        None => println!("There is no third element."),
    }
```

They index just like arrays. 

One thing to remember is that in these lines:
```rust
    let third: &i32 = &v[2];
    let third: Option<&i32> = v.get(2);
```
The result of getting a single item is a reference to the <em>base type</em> (or an `Option` of one) and not a vector of one!

So they show two possible options here. 

Using `[n]` or `get(n)`.

The first one is dangerous because the index could potentially be out of bounds. 

Fortunately, (as we learnt) Rust never lets you read an out of range value.

So it's not UB, but it will cause a panic.

The second one returns an `Option`: `Some` holding a reference if the index exists, or `None` if it doesn't.

In the interests of safety, it's probably best to use the second one unless you know the size for sure. You know what happens when you assume: you make an arse out of you...rself and crash the program. 🧨
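
A small sketch of the safer option, assuming nothing beyond `get()` returning an `Option`. The `describe` helper is a hypothetical name of mine.

```rust
// Hypothetical helper: look up an index without risking a panic.
fn describe(v: &[i32], index: usize) -> String {
    match v.get(index) {
        Some(value) => format!("element {index} is {value}"),
        None => format!("there is no element {index}"),
    }
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    println!("{}", describe(&v, 2));
    println!("{}", describe(&v, 100));
    // By contrast, `&v[100]` would compile but panic at runtime.
}
```

The out-of-range case becomes an ordinary branch of the `match` instead of a crash.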

Make sure your vector references are mutable if you intend to change or add to them!

They have a reference to *The Rustonomicon* to learn more, but I'm not ready for those kinds of dark arts just yet.


## [Iterating Over the Values in a Vector](https://doc.rust-lang.org/book/ch08-01-vectors.html#iterating-over-the-values-in-a-vector)

You can use the `for i in &vector {...}` syntax to iterate through a vector.



```rust
fn main() {
    let v = vec![100, 32, 57];
    for i in &v {
        println!("{i}");
    }
}
```


Hmm, you iterate over a reference to it. Without the `&`, the loop would take ownership of the vector and it couldn't be used afterwards.

Was that the case with arrays as well? I must have forgotten, but probably.

Other than that it's a pretty standard `for` loop. 

You are allowed to change the values during a loop:


```rust
fn main() {
    let mut v = vec![100, 32, 57];
    for i in &mut v {
        *i += 50;
    }
}
```



But again, **your reference must be mutable.**

It's interesting that you have to use a dereference operator `*` here.

In other cases Rust "figures it out for you".

You may not add or remove values during a `for` loop.
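
My understanding of why the `*` is needed: each loop item is a `&mut i32`, and `+=` has to act on the value behind it. The sketch below also shows one way to remove items once the loop has finished, using `retain()`. The `add_fifty` helper is a hypothetical name, and `retain()` is just one option I happen to know of, not what the book uses here.

```rust
// Hypothetical helper: add 50 to every element in place.
fn add_fifty(values: &mut [i32]) {
    for value in values.iter_mut() {
        // `value` is a `&mut i32`, so it must be dereferenced to change the number.
        *value += 50;
    }
}

fn main() {
    let mut v = vec![100, 32, 57];
    add_fifty(&mut v);
    println!("{v:?}");

    // Removing has to happen outside the `for` loop. retain() keeps only
    // the elements for which the closure returns true.
    v.retain(|value| *value > 100);
    println!("{v:?}");
}
```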



## [Using an Enum to Store Multiple Types](https://doc.rust-lang.org/book/ch08-01-vectors.html#using-an-enum-to-store-multiple-types)

You can use an enum if you need multiple types in a vector:

```rust
    enum SpreadsheetCell {
        Int(i32),
        Float(f64),
        Text(String),
    }

    let row = vec![
        SpreadsheetCell::Int(3),
        SpreadsheetCell::Text(String::from("blue")),
        SpreadsheetCell::Float(10.12),
    ];
```

I'm not sure if this really counts as multiple types... this is just an object vector with all the same type.

But... an object vector is useful all the same!🔧

I'm not sure about the memory efficiency of this.
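
One way to get a feel for it is to print some sizes with `std::mem::size_of`. This is only a sketch: the exact numbers depend on the platform and compiler, so I'm not quoting any. The one thing I'm sure of is that an enum is at least as big as its largest variant, so every cell pays for the `String` case. The `buffer_bytes` helper is a hypothetical name.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

// Hypothetical helper: bytes used by the vector's element buffer for `n` cells
// (not counting any heap data owned by the cells themselves, such as text).
fn buffer_bytes(n: usize) -> usize {
    n * size_of::<SpreadsheetCell>()
}

fn main() {
    println!("i32:             {} bytes", size_of::<i32>());
    println!("f64:             {} bytes", size_of::<f64>());
    println!("String:          {} bytes", size_of::<String>());
    println!("SpreadsheetCell: {} bytes", size_of::<SpreadsheetCell>());
    println!("1000 cells:      {} bytes", buffer_bytes(1000));
}
```

So a row made mostly of `Int` cells still uses a `String`-sized slot per cell.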

Vectors are dropped from memory like other values when they go out of scope!

I added program #12 to try this out.




## [Storing UTF-8 Encoded Text with Strings](https://doc.rust-lang.org/book/ch08-02-strings.html#storing-utf-8-encoded-text-with-strings)

Looks like we're about to get a lot more information about what a string really... is.

    New Rustaceans commonly get stuck on strings for a combination of three reasons: Rust’s propensity for exposing possible errors, strings being a more complicated data structure than many programmers give them credit for, and UTF-8.

I feel like that last word is going to be the most important: **UTF-8**

When you look under the hood of a nice string class, there's encodings. 

They're messy, especially *variable length encodings*.

So I wonder how string indexes are actually implemented in Rust.

Is it:
 * (a) A nice Python-style interface where indexes line up with characters and you don't have to worry about the boundaries.
 * (b) A lower-level implementation where indexes line up with bytes and you have to worry about splitting code points.

(a) is really nice to work with, but most languages implement (b) for performance and design simplicity reasons.

It sounds like it might be (b) in this case.

![Strings](data/001_string.png)



## [What Is a String?](https://doc.rust-lang.org/book/ch08-02-strings.html#what-is-a-string)

    Rust has only one string type in the core language, which is the string slice str that is usually seen in its borrowed form &str. In Chapter 4, we talked about string slices, which are references to some UTF-8 encoded string data stored elsewhere. String literals, for example, are stored in the program’s binary and are therefore string slices.

So these ones are purely static data in the binary right?

Or somewhere else.

And they aren't mutable, you have to convert them into a `String`.


    The String type, which is provided by Rust’s standard library rather than coded into the core language, is a growable, mutable, owned, UTF-8 encoded string type. When Rustaceans refer to “strings” in Rust, they might be referring to either the String or the string slice &str types, not just one of those types. Although this section is largely about String, both types are used heavily in Rust’s standard library, and both String and string slices are UTF-8 encoded.

Another reminder `str`≠`String`.

So the latter is basically a subtype of collections?

A byte vector with lots of extra bells and whistles?
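
A quick sketch to test that hunch, using `into_bytes()` and `String::from_utf8()` from the standard library. The `round_trip` helper is a hypothetical name of mine.

```rust
// Hypothetical helper: turn a string into its raw bytes and back again.
fn round_trip(s: &str) -> String {
    let bytes: Vec<u8> = String::from(s).into_bytes();
    // from_utf8 checks that the bytes are valid UTF-8 before handing back a String.
    String::from_utf8(bytes).expect("bytes came from a String, so they are valid UTF-8")
}

fn main() {
    let s = String::from("h\u{e9}llo"); // "héllo"
    println!("{:?}", s.as_bytes()); // the underlying UTF-8 bytes
    println!("{}", round_trip(&s));

    // Arbitrary bytes are rejected, which a plain Vec<u8> would never do.
    let invalid = vec![0xff, 0xfe];
    println!("rejected: {}", String::from_utf8(invalid).is_err());
}
```

So it does look like a byte vector underneath, with the extra rule that the contents must always be valid UTF-8.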


## [Creating a New String](https://doc.rust-lang.org/book/ch08-02-strings.html#creating-a-new-string)

Okay, so this creates a `String`:
```rust
    let mut s = String::new();
```

But using a literal creates a `&str`, which <u>must be converted into a `String` to modify</u>.

```rust
    let data = "initial contents";

    let s = data.to_string();

    // the method also works on a literal directly:
    let s = "initial contents".to_string();
```

This is going to be difficult to remember, though I'm sure the type checker will make it impossible to forget.

Or another alternative:
```rust
    let s = String::from("initial contents");
```

I like this version a lot better. 

Using an object method on a literal seems kinda... weird?

Well, there's a test of the operating system's Unicode support.
```rust
    let hello = String::from("السلام عليكم");
    let hello = String::from("Dobrý den");
    let hello = String::from("Hello");
    let hello = String::from("שלום");
    let hello = String::from("नमस्ते");
    let hello = String::from("こんにちは");
    let hello = String::from("안녕하세요");
    let hello = String::from("你好");
    let hello = String::from("Olá");
    let hello = String::from("Здравствуйте");
    let hello = String::from("Hola");
```

I don't know why I keep expecting half of these to be boxes or do something weird.

It isn't a problem for modern operating systems.

It's just the flashbacks to older operating systems where it was an unmitigated shitshow. There'd be square boxes everywhere, you'd be prompted to download character packs, code page mismatches, or sometimes the browser or entire OS would just crash trying to render a foreign language.

Anyway.


## [Updating a String](https://doc.rust-lang.org/book/ch08-02-strings.html#updating-a-string)

So more hints that a string is essentially just a fancy vector class.

```rust
    let mut s = String::from("foo");
    s.push_str("bar");
```

Okay... so now I understand where this `push_str()` name came from.

We're pushing (appending) values like you would in a vector.

This is using an `&str` to add to a string, but what if we wanted to add two `String`s? The ownership system might complicate that.

There's also a separate `push()` method that accepts a single character. Rust doesn't have function overloading, so it gets its own name.
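
A sketch of both append methods as I understand them: `push_str()` takes a `&str` and `push()` takes a single `char`. It also shows that appending one `String` to another works by borrowing the second. The `shout` helper is a hypothetical name.

```rust
// Hypothetical helper: append `extra` and an exclamation mark to `base`.
fn shout(mut base: String, extra: &str) -> String {
    base.push_str(extra); // takes a &str
    base.push('!'); // takes a single char
    base
}

fn main() {
    let first = String::from("foo");
    let second = String::from("bar");

    // Passing `&second` works because a &String coerces to a &str.
    let joined = shout(first, &second);

    println!("{joined}");
    println!("{second} is still usable because it was only borrowed");
}
```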



## [Concatenation with the + Operator or the format! Macro](https://doc.rust-lang.org/book/ch08-02-strings.html#concatenation-with-the--operator-or-the-format-macro)

```rust
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");
    let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used
```
Ah, so that's how.

Instead of mutating either of them you just create a whole new string.

It feels like adding strings should require `mut`, but I can understand why they don't have it. 

The use of a reference on only one of them, however, is both unexpected **and** unexplained.

Apparently the class has an add method.

```
fn add(self, s: &str) -> String {
```

But that just raises more questions. Is `String` *implicitly* converted to `&str`?

So the answer seems to be: yes

    The reason we’re able to use &s2 in the call to add is that the compiler can coerce the &String argument into a &str ... 
    Because add does not take ownership of the s parameter, s2 will still be a valid String after this operation.

There is implicit coercion, but it still has to be a `&String`, not a `String`.

So what happens if we add a third string? Does that also have to be an `&String`? It seems like the signature wouldn't work either way.

    Second, we can see in the signature that add takes ownership of self because self does not have an &. This means s1 in Listing 8-18 will be moved into the add call and will no longer be valid after that. So, although let s3 = s1 + &s2; looks like it will copy both strings and create a new one, this statement actually takes ownership of s1, appends a copy of the contents of s2, and then returns ownership of the result. In other words, it looks like it’s making a lot of copies, but it isn’t; the implementation is more efficient than copying.

Oh I see. So it is reusing the buffer of the first string for some efficiency gains, which is why it takes ownership of the first one but not the second.
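
A sketch to check the buffer reuse. It compares the heap pointer before and after the `+`. This leans on my assumption that `add` appends to the left-hand string in place, and it pre-allocates enough capacity so that no reallocation is needed. The `reuses_buffer` helper is a hypothetical name.

```rust
// Hypothetical helper: returns true if `left + right` kept using the heap
// buffer that the left-hand String already owned.
fn reuses_buffer(left: &str, right: &str) -> bool {
    // Reserve enough room up front so appending does not need to reallocate.
    let mut s1 = String::with_capacity(left.len() + right.len());
    s1.push_str(left);
    let before = s1.as_ptr();
    let s3 = s1 + right; // s1 is moved here
    before == s3.as_ptr()
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");

    // Clone s1 first if it is still needed, since `+` consumes its left operand.
    let s3 = s1.clone() + &s2;

    println!("{s1}{s2} -> {s3}");
    println!("buffer reused: {}", reuses_buffer("Hello, ", "world!"));
}
```

If the original string doesn't have enough spare capacity, I'd expect it to reallocate like any other growing vector.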

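```rust
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    // `+` moves its left operand, so clone s1 to keep it for the format! line
    let s = s1.clone() + "-" + &s2 + "-" + &s3;
    // format! only borrows its arguments
    let s = format!("{s1}-{s2}-{s3}");
```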

So it does seem you can just keep adding more `&str` forever.

Seems like you can use this to emulate the print syntax anywhere.

I like this `format!()` macro; the curly-braced variable style is very easy to read and efficient.

Though if I recall correctly, this is a very limited version of it, the moment you want to do anything complicated you have to drop down into some printf-style syntax.
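
A sketch of what I mean, as far as I can tell from the `std::fmt` docs: the inline `{name}` form only takes plain identifiers, but it can still carry width and precision after a colon. Expressions have to be passed as separate arguments. The `power_cell` helper is a hypothetical name.

```rust
// Hypothetical helper: format a figure to one decimal place, right-aligned
// in a column 8 characters wide.
fn power_cell(kw_per_tonne: f64) -> String {
    format!("{kw_per_tonne:>8.1}")
}

fn main() {
    let width = 3.0_f64;
    let height = 4.5_f64;

    // `{width * height}` is not allowed inline, so the expression goes in
    // the argument list after the format string instead.
    let area = format!("{} x {} = {:.2}", width, height, width * height);

    println!("{area}");
    println!("[{}]", power_cell(135.0));
}
```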




```rust
let hpv = 135;
let uhpv = 276;
let mut output = String::new();
let category = "P-platers under 25";
let power_unit = "kW/tonne";

output += &format!("{category} are prohibited from using vehicles ");
output += &format!("above the output power of {hpv} {power_unit}.\n");
output += &format!("Fully licenced drivers must register to use a vehicle ");
output += &format!("with an output power above {uhpv} {power_unit}.\n");

// Rust will not accept a String as the format argument.
// Which is fair enough, it's bad practice to pass a regular string argument as the format string anyway.
print!("{}", &output);
```

```
P-platers under 25 are prohibited from using vehicles above the output power of 135 kW/tonne.
Fully licenced drivers must register to use a vehicle with an output power above 276 kW/tonne.
```



## [Indexing into Strings](https://doc.rust-lang.org/book/ch08-02-strings.html#indexing-into-strings)

Apparently this code should give an error:
```rust
    let s1 = String::from("hello");
    let h = s1[0];
```
As: <q>Rust strings don’t support indexing.</q>

What? Why?

Ah, so they explained this is because vectors use a fixed-size unit but UTF-8 does not. Indexing would likely result in errors when naive programmers do something like `s[10]` and expect the eleventh character: it would cause bugs or garbled text the moment multibyte characters are involved.




```rust
let s = "いいお天気ですか？";
// It will however let you do len() on a str.
// This returns the wrong answer (27 instead of 9).
// However the byte length may be useful for copying the buffer.
println!("Length of {s} is {} bytes", s.len());
// But how do you get the actual character length?
```

```
Length of いいお天気ですか？ is 27 bytes
```
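
To answer my own question, counting the items from `.chars()` seems to be the way. A minimal sketch, with the caveat that this counts Unicode scalar values rather than what a reader would call letters; `char_len` is a hypothetical helper name.

```rust
// Hypothetical helper: count Unicode scalar values rather than bytes.
fn char_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let s = "いいお天気ですか？";
    println!("{} bytes, {} chars", s.len(), char_len(s));
}
```

Note that this has to walk the whole string, so it isn't a constant-time lookup like `len()`.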



So they also point out the distinction between scalar values (Unicode code points) and grapheme clusters (letters).

Scalar values are not the same since some code points are modifiers!
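
A small sketch of the modifier problem, using "é" written two ways: as one code point, and as a plain "e" followed by a combining accent. Both should look the same on screen but hold a different number of scalar values. `scalar_count` is a hypothetical helper name.

```rust
// Hypothetical helper: count Unicode scalar values (what `chars()` yields).
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let precomposed = "\u{e9}"; // "é" stored as one code point
    let combined = "e\u{301}"; // "e" followed by a combining acute accent

    for s in [precomposed, combined] {
        // escape_unicode() shows each scalar value in \u{...} form.
        let scalars: Vec<String> = s.chars().map(|c| c.escape_unicode().to_string()).collect();
        println!("{s} -> {} scalar(s): {}", scalar_count(s), scalars.join(" "));
    }
}
```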

You can use slicing, but this will panic if it splits a character so it's not a great method.
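
A sketch of a less explosive alternative, assuming `str::get()` with a byte range behaves as documented and returns `None` instead of panicking. `safe_slice` is a hypothetical helper name.

```rust
// Hypothetical helper: slice by byte range, returning None instead of
// panicking when the range would cut a character in half.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let hello = "Здравствуйте"; // each of these letters takes 2 bytes

    println!("{:?}", safe_slice(hello, 0, 4)); // lands on character boundaries
    println!("{:?}", safe_slice(hello, 0, 1)); // would split the first letter
    println!("byte 1 is a boundary: {}", hello.is_char_boundary(1));
    // `&hello[0..1]` would compile but panic at runtime.
}
```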








## [Methods for Iterating Over Strings](https://doc.rust-lang.org/book/ch08-02-strings.html#methods-for-iterating-over-strings)

So how do you iterate over a string then?

You can either use the `.bytes()` or `.chars()` method.

The former gives raw bytes, and the latter gives Unicode scalar values.

Grapheme clusters are not provided by the standard library since that is more complicated, but they're available in external crates.

```rust
let s = "いいお天気ですか？";

for c in s.chars() {
    println!("Char: {c}");
}
```

```
Char: い
Char: い
Char: お
Char: 天
Char: 気
Char: で
Char: す
Char: か
Char: ？
```


This should work fine.

Most of the time the scalars are all you need.

Well, I'm very happy they've put so much thought into proper Unicode support, and eliminating common sources of error.

