Here's a quick comparison between these 4 container data types:

| Feature          | List                                  | Dictionary                           | Set                                | Tuple                             |
|------------------|---------------------------------------|--------------------------------------|------------------------------------|-----------------------------------|
| Syntax           | `[item1, item2, ...]`                 | `{'key1': value1, 'key2': value2}`   | `{item1, item2, ...}`              | `(item1, item2, ...)` or `item,`  |
| Type of Data     | Sequence                              | Mapping                              | Set                                | Sequence                          |
| Order            | Ordered                               | Ordered by insertion (Python 3.7+)   | Unordered                          | Ordered                           |
| Indexing         | Yes (by index)                        | Yes (by key)                         | No                                 | Yes (by index)                    |
| Duplicate Values | Allowed                               | Values can be duplicated, keys cannot| Not allowed                        | Allowed                           |
| Mutability       | Mutable                               | Mutable                              | Mutable                            | Immutable                         |
| Usage            | For a collection of ordered items     | For key-value pairs                  | For unique items                   | For fixed data                    |


# List

- Used to store multiple ordered items in a single variable.
- Created using `[` and `]`.
- We won't be going into everything that we can do in a list.
- Common data types in lists: Integer, Float, String, Boolean, List, Dictionary, Tuple, Set, Object.
- You can include lists within lists.
- Easy to store information.  
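
As a minimal sketch of lists within lists, the example below uses a made-up `skill_groups` list (a hypothetical name and data, not part of this notebook's running example). It relies on indexing, which is covered in the Indexing section.

```python
# Hypothetical example data: each inner list groups related skills
skill_groups = [['sql', 'postgresql'], ['tableau', 'looker'], ['python', 'r']]

# The first index picks an inner list; a second index picks an item inside it
first_group = skill_groups[0]
second_tool = skill_groups[1][1]

print(first_group)   # ['sql', 'postgresql']
print(second_tool)   # looker
```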

## Importance

Versatile for storing sequences of data. Pandas can convert lists into Series or DataFrame objects for analysis.


### Examples

Create a list of job skills that are common to data science roles.



In [6]:
# Define a list of data science skills
job_skills = ['sql','tableau','excel']
job_skills

['sql', 'tableau', 'excel']

### Indexing

What if we want to get a specific item in a list? We'd use list indexing.

Lists are indexed, which means each item has a numerical position, so you can access it by referring to the index number.

**Note: The first item has index 0.**

[Visual Example](https://drive.google.com/file/d/1VGn2YJVhhcnJFPz98QUcFgU8OeZA1zKK/view?usp=drive_link)

For this example, if we want to get 'tableau' from the list, we use index `1`.

In [None]:
# Get a specific item in the list
job_skills[1]

'tableau'
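
Python lists also accept negative index numbers, which count from the end of the list. The short sketch below uses a separate, hypothetical `example_skills` list so it doesn't change `job_skills`.

```python
# Hypothetical copy of the starting list, kept separate from job_skills
example_skills = ['sql', 'tableau', 'excel']

first_item = example_skills[0]    # index 0 is the first item
last_item = example_skills[-1]    # index -1 is the last item

print(first_item)  # sql
print(last_item)   # excel
```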

### Change Value

To change the value of a specific item, refer to its index number. Below we will change the 'tableau' skill (at index `1`) to 'bigquery'.

In [4]:
# Change the value of an item
job_skills[1] ='bigquery'

job_skills

['sql', 'bigquery', 'excel']

### Methods

* **Methods** are functions that belong to an object.
    * As a reminder, a function is a block of code designed to do a specific task.
    * We'll create our own functions soon. For now we'll only use functions that are already provided.
* All you need to know right now is that **methods** use this notation: `object.method()`
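
To make the `object.method()` notation concrete, here is a small sketch comparing a string method with a standalone built-in function. The variable names are made up for illustration.

```python
skill = 'sql'

# Method: called on the object with a dot
upper_skill = skill.upper()

# Function: the object is passed in as an argument
skill_length = len(skill)

print(upper_skill)   # SQL
print(skill_length)  # 3
```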


### Append()

If you need to add an item at the end of a list you can use the `append()` method. Below we'll add 'looker' as a skill at the end of our list.

We'll get into methods more later.

In [None]:
# Add a skill to the list
job_skills.append('looker')

job_skills

['sql', 'bigquery', 'excel', 'looker']

### len()

If you want to see how many items are in a list use the `len()` function.

In [None]:
len(job_skills)

4

### Insert

To insert a list item at a specified place (index), use `insert`. Here we are inserting the 'python' skill after 'bigquery'. So we will insert it at the index of `2`.

🪲 **Debugging**

**This is an intentional mistake**

This is used to demonstrate debugging.

Error: Incorrect syntax for `insert`. Forgot the comma.

```python
job_skills.insert(2'python')
```


This is the correct one ✅

In [None]:
# Insert an item into the list
job_skills.insert(2,'python')

job_skills

['sql', 'bigquery', 'python', 'excel', 'looker']

### Remove()

To remove a specific item use the `remove()` method. Let's remove the 'looker' skill.

In [None]:
# Remove an item from the list
job_skills.remove('looker')

job_skills

['sql', 'bigquery', 'python', 'excel']

### Join Lists

There are a few ways to join (or concatenate) two or more lists.

1. Concatenate using the `+` operator.


In [None]:
# Concatenate
skills1 = ['SQL', 'Tableau']
skills2 = ['Excel']

skills3 = skills1 + skills2
skills3

['SQL', 'Tableau', 'Excel']

2. Append each item from one list to another with `.append()` inside a loop. Here we append every item from `skills3` to the `skills4` list.


In [None]:
# Append
skills4 = ['Python', 'Power BI']

# Append all items from skills3 to skills4 list
for x in skills3:
    skills4.append(x)

skills4

['Python', 'Power BI', 'SQL', 'Tableau', 'Excel']

3. Use the `extend()` method to add the elements of one list to another list. Here we add the elements of the `skills4` list to the `skills5` list.



In [None]:
# Extend
skills5 = ['Statistics', 'Machine Learning']

skills5.extend(skills4)

skills5

['Statistics',
 'Machine Learning',
 'Python',
 'Power BI',
 'SQL',
 'Tableau',
 'Excel']
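
One detail worth seeing side by side: `append()` adds its argument as a single item, while `extend()` adds each element separately. The sketch below uses hypothetical lists (`base`, `appended`, `extended`) rather than the `skills` lists above.

```python
base = ['SQL', 'Tableau']

# append() adds the whole list as ONE nested item
appended = base.copy()
appended.append(['Excel', 'Python'])

# extend() adds each element of the list individually
extended = base.copy()
extended.extend(['Excel', 'Python'])

print(appended)  # ['SQL', 'Tableau', ['Excel', 'Python']]
print(extended)  # ['SQL', 'Tableau', 'Excel', 'Python']
```

This is why method 2 above appends inside a loop instead of calling `append()` once with the whole list.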

### Join()

The `.join` method is used for concatenating a sequence of strings together with a specified separator. You need to add a specific separator (e.g. `', '`) between each element during the concatenation process. This is used to format output for readability.

You can use this for lists, tuples and more. But in our case we'll mostly be using it for either lists or tuples. For lists specifically we are combining multiple strings from a list.

In [2]:
skills = ['Python', 'SQL', 'Excel']

In [10]:
# Use concatenate print formatting
print('I have these skills: ' + ', '.join(skills))

I have these skills: Python, SQL, Excel
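
Note that `.join` only works when every element is a string. As a hedged sketch with a made-up `years` list, numbers can be converted with `str()` first:

```python
# Hypothetical list of numbers (not strings)
years = [2021, 2022, 2023]

# ', '.join(years) would raise a TypeError because the items are integers,
# so convert each item to a string before joining
joined_years = ', '.join(str(year) for year in years)

print('Years covered: ' + joined_years)  # Years covered: 2021, 2022, 2023
```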


### Slicing Lists

Slicing Syntax:
* Syntax: `list[start:end:step]`  
    * `start`: The starting index (inclusive)
    * `end`: The ending index (exclusive)
    * `step`: Steps to take between items

In [None]:
skills = ['Python', 'SQL', 'Excel']

# Extract the first two items
first_two = skills[0:2]
first_two

['Python', 'SQL']

`start` defaults to `0` and `end` defaults to the length of the list, so the slice runs through the last item.

You can therefore omit either value when you want its default.

In [None]:
full_list = skills[:]
full_list

['Python', 'SQL', 'Excel']

In [None]:
also_first_two = skills[:2]
also_first_two

['Python', 'SQL']

In [None]:
last_two = skills[1:]
last_two

['SQL', 'Excel']

In [None]:
# A negative start counts from the end: -3 is the third-to-last item
last_three = skills[-3:]
last_three

['Python', 'SQL', 'Excel']

For `step` the default value is `1`, but if we want to change it up:

In [None]:
skills = ['Python', 'SQL', 'Excel', 'R', 'Java']

# Extract every second item starting from the first
every_second = skills[0::2]
every_second

['Python', 'Excel', 'Java']
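
A couple more `step` variations, as a minimal sketch on a hypothetical `langs` list with the same items. A negative `step` walks the list backwards.

```python
langs = ['Python', 'SQL', 'Excel', 'R', 'Java']

# Every second item, starting from index 1
every_second_from_one = langs[1::2]

# A step of -1 returns a reversed copy of the list
reversed_langs = langs[::-1]

print(every_second_from_one)  # ['SQL', 'R']
print(reversed_langs)         # ['Java', 'R', 'Excel', 'SQL', 'Python']
```

Slicing returns a new list, so `langs` itself is unchanged.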

### Unpack List

Unpacking assigns each value in a list to its own variable in a single statement. The number of variables must match the number of items in the list.

In [None]:
# Unpacking the list
skill1, skill2, skill3, skill4 = job_skills

# Printing the unpacked variables
print(skill1)
print(skill2)
print(skill3)
print(skill4)

sql
bigquery
python
excel


### Extended Unpacking

Extended unpacking uses `*` to collect a subset of the elements into a single variable as a list.

In [None]:
# Collect the first two (SQL-related) skills into a list, then unpack the rest individually
*sql_skills, skill3, skill4 = job_skills

print(sql_skills)  # List of the SQL skills
print(skill3)
print(skill4)

['sql', 'bigquery']
python
excel
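
The starred variable doesn't have to come first. As a sketch using a hypothetical `example_skills` list with the same four items, it can also sit at the end or in the middle:

```python
example_skills = ['sql', 'bigquery', 'python', 'excel']

# Star at the end: first item on its own, everything else in a list
first, *rest = example_skills

# Star in the middle: first and last on their own, the middle in a list
head, *middle, tail = example_skills

print(first, rest)         # sql ['bigquery', 'python', 'excel']
print(head, middle, tail)  # sql ['bigquery', 'python'] excel
```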




---



# Dictionary

* Used to store data values in *key:value* pairs.
* It is:
  * Ordered (insertion order, as of Python version 3.7)
  * Changeable
  * Keys must be unique (values can repeat)
* Created using `{` and `}`, with each key separated from its value by `:`.
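
To illustrate the point about duplicates, here is a small sketch with a hypothetical `example` dictionary: when the same key is written twice, Python keeps only the last value, while repeated values under different keys are fine.

```python
# The key 'language' appears twice; the later value replaces the earlier one
example = {'language': 'python', 'language': 'r'}
print(example)  # {'language': 'r'}

# Repeated VALUES are allowed as long as the keys differ
same_values = {'primary': 'python', 'backup': 'python'}
print(same_values)  # {'primary': 'python', 'backup': 'python'}
```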

## Importance

Useful for labeling data and for creating pandas DataFrames (tables of data, often loaded from a .csv file) with named columns, which makes the data easier to read.

## Create a Dictionary

We'll create a dictionary of the type of data science skills. For example 'postgresql' is considered a 'database' skill. We will have the type of skill as the key, and then the actual skill as the value.

🪲 **Debugging**

**This is an intentional mistake**

This is used to demonstrate debugging.

Error: Forgot the comma `,` at the end of the second dictionary item.

```
'language': 'python'
```

Steps to Debug:

1. Look at the actual error, can you tell what the problem is?
2. If not, then look it up:
  1. Use a chatbot like ChatGPT or Claude
  2. Look it up using Google

In [None]:
# Define a dictionary of data science skills
job_type_skills = {
    'database': 'postgresql',
    'language': 'python'
    'library': 'pandas'
}

job_type_skills

SyntaxError: invalid syntax (2774884211.py, line 5)

This is the correct code ✅.

In [None]:
# Define a dictionary of data science skills
job_type_skills = {
    'database': 'postgresql',
    'language': 'python',
    'library': 'pandas'
}

job_type_skills

{'database': 'postgresql', 'language': 'python', 'library': 'pandas'}

## Get Items

There are two ways to get an item from a dictionary: square brackets with the key name, or the `get()` method.

### Keys

#### Get Key

Refer to its key name inside square brackets.

In [None]:
# Define a dictionary of data science skills
job_type_skills = {
    'database': 'postgresql',
    'language': 'python',
    'library': 'pandas'
}

job_type_skills['language']

'python'

#### Get()

Or you can also use `get()` to retrieve the value for a given key.

In [None]:
job_type_skills.get('language')

'python'
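
The two approaches differ when the key is missing. The sketch below uses a hypothetical `example` dictionary: square brackets raise a `KeyError`, while `get()` quietly returns `None`.

```python
example = {'database': 'postgresql', 'language': 'python'}

# Square brackets raise a KeyError for a missing key
try:
    example['analytics']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'analytics'

# get() returns None instead of raising an error
missing = example.get('analytics')
print(missing)  # None
```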

If the key doesn't exist, `get()` returns `None` by default. You can pass a second argument to return a different default value instead.

In [None]:
job_type_skills.get('analytics', 'not found')

'not found'

### Keys()

If you want to see all the keys in the dictionary use `keys()`.

In [None]:
job_type_skills.keys()

dict_keys(['database', 'language', 'library'])

### Values()

If you want to see all the values in the dictionary use `values()`.

In [None]:
job_type_skills.values()


dict_values(['postgresql', 'python', 'pandas'])

### Items()

To return the dictionary's key-value pairs, use `items()`.

In [None]:
job_type_skills.items()

dict_items([('database', 'postgresql'), ('language', 'python'), ('library', 'pandas')])
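
A common use of `items()` is looping over keys and values together. This is a minimal sketch on a hypothetical `example` dictionary; the loop variable names are made up.

```python
example = {'database': 'postgresql', 'language': 'python', 'library': 'pandas'}

# Each pair from items() unpacks into a key and a value
lines = []
for skill_type, skill in example.items():
    lines.append(f'{skill_type}: {skill}')

print('\n'.join(lines))
# database: postgresql
# language: python
# library: pandas
```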

## Pop()

To remove a key from the dictionary and return its value use `.pop()`.

In [None]:
job_type_skills.pop('library')
job_type_skills

{'database': 'postgresql', 'language': 'python'}
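
Because `.pop()` returns the removed value, you can store it in a variable. It also accepts an optional default for keys that aren't there. A sketch with a hypothetical `example` dictionary:

```python
example = {'database': 'postgresql', 'library': 'pandas'}

# pop() removes the key and hands back its value
removed = example.pop('library')
print(removed)  # pandas
print(example)  # {'database': 'postgresql'}

# With a default, popping a missing key returns the default instead of raising KeyError
fallback = example.pop('library', 'not found')
print(fallback)  # not found
```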

## Add Items

You can add items by either:

### Direct Assignment

Direct assignment sets the value for a new or existing key.

In [None]:
# Direct assignment
job_type_skills['framework'] = 'flask'

# Return dictionary
job_type_skills

{'database': 'postgresql', 'language': 'python', 'framework': 'flask'}

### Update()

Use the `update()` method which can add multiple items at once.

🪲 **Debugging**

**This is an intentional mistake**

This is used to demonstrate debugging.

Error: Forgot the closing brace `}` of the dictionary passed to `update()`.

```python
job_type_skills.update({'cloud': 'google cloud', 'version_control': 'git')
```

Steps to Debug:

1. Look at the actual error, can you tell what the problem is?
2. If not, then look it up:
  1. Use a chatbot like ChatGPT or Claude
  2. Look it up using Google

In [None]:
# Update
job_type_skills.update({'cloud': 'google cloud', 'version_control': 'git')

# Return dictionary
job_type_skills

SyntaxError: invalid syntax (564925376.py, line 2)

This is the correct code ✅.

In [None]:
# Update
job_type_skills.update({'cloud': 'google cloud', 'version_control': 'git'})

# Return dictionary
job_type_skills

{'database': 'postgresql',
 'language': 'python',
 'framework': 'flask',
 'cloud': 'google cloud',
 'version_control': 'git'}

## Any object can be stored in a dictionary

You can also store lists in dictionaries. We'll replace the value of the `language` key with a list of programming languages using the `update()` method.

In [None]:
# Replace the 'language' value with a list of languages

job_type_skills.update({'language': ['python', 'r']})

job_type_skills

{'database': 'postgresql',
 'language': ['python', 'r'],
 'framework': 'flask',
 'cloud': 'google cloud',
 'version_control': 'git'}
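
Once a list is stored as a value, you can chain a list index after the key, or call list methods on it. A minimal sketch on a hypothetical `example` dictionary:

```python
example = {'database': 'postgresql', 'language': ['python', 'r']}

# The key returns the list; the index then picks an item from it
first_language = example['language'][0]
print(first_language)  # python

# List methods work on the stored list too
example['language'].append('sql')
print(example['language'])  # ['python', 'r', 'sql']
```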



---



# Set

* For unique items
* Used to store multiple items in a single variable
* It is:
  * Unordered
  * Mutable (you can add and remove items)
  * Unindexed
* Individual set *items* can't be changed in place, but you can still remove items and add new items

## Importance

Employed for removing duplicates and for set operations, aiding in data cleaning and preparation.
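
The set operations mentioned above can be sketched as follows. The two skill sets are hypothetical, and `sorted()` is used only so the printed order is predictable.

```python
analyst_skills = {'sql', 'excel', 'tableau'}
scientist_skills = {'sql', 'python', 'statistics'}

shared = analyst_skills & scientist_skills        # intersection: in both sets
combined = analyst_skills | scientist_skills      # union: in either set
only_analyst = analyst_skills - scientist_skills  # difference: in the first set only

print(sorted(shared))        # ['sql']
print(sorted(combined))      # ['excel', 'python', 'sql', 'statistics', 'tableau']
print(sorted(only_analyst))  # ['excel', 'tableau']
```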

## Examples

First let's create a set of data science skills.

In [None]:
# Define a set of data science skills
job_skills = {'tableau', 'sql', 'python', 'statistics'}

job_skills

{'python', 'sql', 'statistics', 'tableau'}

### Unordered and No Index

Sets are unordered and have no index, so they don't behave exactly like lists.

Therefore, we can't use an index to access items.

In [None]:
job_skills[1]

TypeError: 'set' object is not subscriptable

### Add()

Add an item using `add()`. Now we're going to add a skill to the set, 'looker'.

In [None]:
# Adding a skill
job_skills.add('looker')

job_skills

{'looker', 'python', 'sql', 'statistics', 'tableau'}

What if we add a skill that's already in the set, like `sql`? Sets don't store duplicates, so the skill won't be added again. The set is exactly the same as before.

In [None]:
# Adding a skill that already exists doesn't duplicate
job_skills.add('sql')

job_skills

{'looker', 'python', 'sql', 'statistics', 'tableau'}

### Remove()

Remove an item in the set using `remove()`. Let's remove the 'statistics' item from this set.

🪲 **Debugging**

**This is an intentional mistake**

This is used to demonstrate debugging.

Error: Tried to remove a skill that does not exist in this set.

```python
job_skills.remove('r')
```

Steps to Debug:

1. Look at the actual error, can you tell what the problem is?
2. If not, then look it up:
  1. Use a chatbot like ChatGPT or Claude
  2. Look it up using Google

In [None]:
# Removing a skill
job_skills.remove('r')

job_skills

KeyError: 'r'

This is the correct code ✅.

In [None]:
# Removing a skill
job_skills.remove('statistics')

job_skills

{'looker', 'python', 'sql', 'tableau'}
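
If you aren't sure whether an item is in the set, the built-in `discard()` method is an alternative to `remove()` that doesn't raise a `KeyError` when the item is missing. A short sketch on a hypothetical `example_set`:

```python
example_set = {'python', 'sql'}

# discard() does nothing (and raises no error) if the item isn't in the set
example_set.discard('r')
print(example_set == {'python', 'sql'})  # True

# discard() removes the item when it is present
example_set.discard('sql')
print(example_set)  # {'python'}
```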

### Set()

Sets are great for removing repeated values. Let's create a list called `skill_list` that contains repeated skills. Calling `set()` on `skill_list` returns its items as a set, which removes the repeated values.

In [None]:
# make a list of skills but some repeated values
skill_list = ['python', 'sql', 'statistics', 'tableau', 'python', 'sql', 'statistics', 'tableau']

set(skill_list)

{'python', 'sql', 'statistics', 'tableau'}

### List()

We can also convert this set back to a list. The duplicates stay removed, so the list holds only the unique skills. Because sets are unordered, the order of the resulting list can differ from the original.

In [None]:
skill_list = list(set(skill_list))

skill_list

['python', 'statistics', 'tableau', 'sql']
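
If the original order matters, one alternative (not the approach used above) is `dict.fromkeys()`, which keeps the first occurrence of each item in insertion order on Python 3.7+. The list name here is hypothetical.

```python
skills_with_repeats = ['python', 'sql', 'statistics', 'tableau', 'python', 'sql']

# Dictionary keys are unique and keep insertion order, so this removes
# duplicates while preserving the order of first appearance
unique_in_order = list(dict.fromkeys(skills_with_repeats))

print(unique_in_order)  # ['python', 'sql', 'statistics', 'tableau']
```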



---



# Tuples

* Used to store multiple items in a single variable
* It is:
  * Ordered
  * Unchangeable
* Written with `(` and `)`
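
One gotcha with the parentheses: a tuple with a single item needs a trailing comma, otherwise Python treats the parentheses as ordinary grouping. A quick sketch with made-up variable names:

```python
# Without the comma this is just a string in parentheses
not_a_tuple = ('python')

# The trailing comma is what makes it a tuple
single_item_tuple = ('python',)

print(type(not_a_tuple).__name__)        # str
print(type(single_item_tuple).__name__)  # tuple
```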

## Importance

* Can be slightly faster and lighter than lists because they're immutable
* Often used for fixed data, like column names or coordinates in Matplotlib plots

## Examples

We're going to create a tuple of data science skills.

In [None]:
# Define a tuple of data science skills
job_skills = ('python', 'sql', 'statistics', 'tableau')

job_skills

('python', 'sql', 'statistics', 'tableau')

### Get item

Getting an element from a tuple using indexing. To get the first item from a tuple we'll get it at the index of `0`.

In [None]:
# Accessing an element
job_skills[0]

'python'
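
Reading by index works, but assigning by index does not, because tuples are unchangeable. A minimal sketch on a hypothetical `example_tuple`:

```python
example_tuple = ('python', 'sql', 'statistics')

# Trying to change an item raises a TypeError
try:
    example_tuple[0] = 'r'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# The tuple is unchanged
print(example_tuple)  # ('python', 'sql', 'statistics')
```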

### Slicing

To get only part of the tuple we'll use slicing. To get only the 'python' and 'sql' values we'll take everything before index `2`.

In [None]:
# Define a tuple of data science skills
job_skills = ('python', 'sql', 'statistics', 'tableau')

job_skills[:2]

('python', 'sql')

### Add Items

Since tuples are immutable (not changeable), they don't have a built-in `append()` method, so calling it raises an error. There are two workarounds, covered after the error below.

In [None]:
job_skills.append('excel')

AttributeError: 'tuple' object has no attribute 'append'

**Method 1: Turn tuple into a list.**

* First turn it into a list
* Then use the `append()` method
* Convert the list to a tuple

In [None]:
# Turn tuple into a list
job_skills_list = list(job_skills)

# Add skill to the list
job_skills_list.append('excel')

# Convert list to a tuple
job_skills_tuple = tuple(job_skills_list)

job_skills_tuple

('python', 'sql', 'statistics', 'tableau', 'excel')

**Method 2: Add tuple to a tuple.**

You can add tuples to tuples, which is good if you want to add one or more items.
* Create a new tuple with the items
* Add it to the existing tuple

🪲 **Debugging**

**This is an intentional mistake**

This is used to demonstrate debugging.

This won't raise an error, but it doesn't return what we intended. Remember to double-check which variable you are returning.

Mistake: We accidentally return `job_skills_new_tuple` instead of the updated tuple `job_skills`.

```python
job_skills_new_tuple = ('r', 'postgresql',)
job_skills += job_skills_new_tuple

job_skills_new_tuple
```

Steps to Debug:

1. Look at the actual error, can you tell what the problem is?
2. If not, then look it up:
  1. Use a chatbot like ChatGPT or Claude
  2. Look it up using Google

In [None]:
job_skills_new_tuple = ('r', 'postgresql',)
job_skills += job_skills_new_tuple

job_skills_new_tuple

('r', 'postgresql')

Note: To fix this just call the correct variable `job_skills`.

In [None]:
job_skills

('python', 'sql', 'statistics', 'tableau', 'r', 'postgresql')

If you rerun the entire cell (below) with the correct variable called at the end you will accidentally add a tuple to your tuple, and create duplicates. This is because you are adding the tuple to it again with these lines:

```python
job_skills_new_tuple = ('r', 'postgresql',)
job_skills += job_skills_new_tuple
```

In [None]:
job_skills_new_tuple = ('r', 'postgresql',)
job_skills += job_skills_new_tuple

job_skills

('python',
 'sql',
 'statistics',
 'tableau',
 'r',
 'postgresql',
 'r',
 'postgresql')

### Join()

We already went over this for lists but remember the `.join` method is used for concatenating a sequence of strings together with a specified separator. You need to add a specific separator (e.g. `', '`) between each element during the concatenation process.

For tuples specifically we are combining multiple strings from a tuple.

In [None]:
skills = ('Python', 'SQL', 'Excel')

In [None]:
# f-string formatting
print(f'I have these skills: {", ".join(skills)}')

I have these skills: Python, SQL, Excel


### Remove Items

Since tuples are **unchangeable** you can't remove items from them. But you can use a workaround similar to the one we used for adding tuple items.

In [None]:
skills.remove('Excel')

AttributeError: 'tuple' object has no attribute 'remove'

#### Remove()

Convert the tuple into a list, remove the item using `remove()` method, and convert it back to a tuple.

In [None]:
# Turn tuple into a list
job_skills_remove = list(job_skills)

# Remove skill from the list
job_skills_remove.remove('tableau')

# Convert list to a tuple
job_skills_tuple = tuple(job_skills_remove)

job_skills_tuple


('python', 'sql', 'statistics', 'r', 'postgresql', 'r', 'postgresql')

#### Del

Or you can use the `del` keyword to delete the tuple completely. If we try to show `job_skills` after we delete it, Python raises an error.

In [None]:
del job_skills
job_skills

NameError: name 'job_skills' is not defined

## range()

The 4 container data types we covered over the past few notebooks are not the only container data types.

The `range()` function generates a sequence of numbers. It is commonly used for looping a specific number of times in for loops.

In [None]:
range(5)

range(0, 5)

Note: `range()` isn't a tuple, it's its own datatype; but it's dang close and short, so we'll cover it here.

In [None]:
type(range(5))

range

But let's actually see inside of this object by converting it to a tuple.

In [None]:
tuple(range(5))

(0, 1, 2, 3, 4)

Syntax: The `range()` function can be called with one, two, or three arguments:  
* `range(stop)`: Generates numbers from 0 to stop-1 (read: stop minus 1)
* `range(start, stop)`: Generates numbers from start to stop-1
* `range(start, stop, step)`: Generates numbers from start up to (but not including) stop, moving by step each time
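
As a hedged sketch of the for-loop use mentioned earlier and of the `step` argument, the example below loops three times and then counts down with a negative step. The variable names are made up for illustration.

```python
# Loop a fixed number of times: i takes the values 0, 1, 2
messages = []
for i in range(3):
    messages.append(f'Run number {i}')
print(messages)  # ['Run number 0', 'Run number 1', 'Run number 2']

# A negative step counts down; the stop value (0) is still excluded
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]
```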

In [None]:
list(range(2,5))

[2, 3, 4]

In [None]:
list(range(0, 100, 2))

[0,
 2,
 4,
 6,
 8,
 10,
 12,
 14,
 16,
 18,
 20,
 22,
 24,
 26,
 28,
 30,
 32,
 34,
 36,
 38,
 40,
 42,
 44,
 46,
 48,
 50,
 52,
 54,
 56,
 58,
 60,
 62,
 64,
 66,
 68,
 70,
 72,
 74,
 76,
 78,
 80,
 82,
 84,
 86,
 88,
 90,
 92,
 94,
 96,
 98]