# Collection Data Types

In addition to the primitive types, C++ also offers collection types, some built into the language and others provided by the standard library. A **collection** data type is a grouping of some number of items (possibly 1 or 0) that have some shared significance or need to be operated upon together.

**Arrays**, **vectors**, **strings**, **sets**, and **hash tables** are among these useful C++ collection types.

## Arrays

An **array** is an ordered arrangement of values located at equally spaced addresses in **contiguous computer memory**. Because the elements sit in contiguous memory locations, the address of any element can be computed directly from its index, which makes look-up by index very fast. In computing, a **word** is the unit of data used by a particular processor design, such as 32 or 64 bits. For example, with 4-byte (32-bit) words, an array of 100 integer variables, with indices 0 through 99, might be stored as 100 words at memory addresses `20000`, `20004`, `20008`, … `20396`. The element with index `i` would be located at the address `20000 + 4 × i`.

C++ arrays can be allocated in two different ways:

| Allocation | Description | Use case |
| :-: | :-- | :-- |
| **Static** | The array size is fixed at compile-time and cannot change | Speed is essential or where hardware constraints exist (real-time or low-level processing) |
| **Dynamic** | The array is allocated through a pointer, so its size can be chosen at run-time (and changed by reallocating) | Typically used when more flexibility is required |


**Remark.** As a Python programmer, you can think of the array as the ancestor of the Python list, and you might remember that Python lists are actually implemented via an underlying array consisting of references.

**Static arrays.** A static array is declared by giving its element type and its size, either explicitly or implicitly through an initializer list:

```c++
#include <string>
using namespace std;

double a[10];    // explicit size: 10 elements
string s[] = {"this", "is", "an", "array", "of", "strings"};    // implicit size: 6 elements
```

Note that, as a tradeoff for efficiency, C++ arrays don't offer the same protections as Python lists: indices are not checked against the bounds of the array.

```python
from utils import runcpp
```

The following program reads one element past the end of the array:

```c++
%%runcpp --exitcode=true
#include <iostream>
using namespace std;

int main() {
    int A[] = {1, 2, 3, 4, 5};
    cout << A[4] << " " << &A[4] << endl;
    cout << A[5] << " " << &A[5] << endl;      // out of bounds access
    return 0;
}
```

```
g++ -std=c++23 ./code/tmp.cpp -o ./code/tmp
./code/tmp

    7 |     cout << A[5] << " " << &A[5] << endl;      // out of bounds access
      |             ^ ~
./code/tmp.cpp:5:5: note: array 'A' declared here
    5 |     int A[] = {1, 2, 3, 4, 5};
      |     ^

5 0x16b7ae930
1 0x16b7ae934
0
```


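It's nice that we get a warning from our compiler, but the program still compiles and runs, and the exit code is still `0`. Reading `A[5]` is undefined behavior: here it simply returned whatever value happened to sit in the memory just past the array.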

**Remark.** Observe that the two addresses are 4 apart. Addresses count **bytes**, which are the basic addressable unit of memory in most modern computer architectures. A byte consists of 8 bits (each 0 or 1). Primitive data types occupy one or more bytes; typical sizes on modern platforms are:

- `char`: 1 byte = 8 bits
- `int` / `float`: 4 bytes = 32 bits
- `double`: 8 bytes = 64 bits