# String

Manipulating strings is one of the most common tasks in data science and machine learning. In fact, there's a whole field of machine learning called **Natural Language Processing** that uses strings as input.

Throughout the course, you have seen plenty of `string` variables. In Python, strings are surrounded by quotes. The quotes can be either single quotes (`'`) or double quotes (`"`).

For example, the string `'Hello World'` and `"Hello World"` are the same.

Personally, I prefer to use double quotes (`"`) because they're easier to read, and there are often occasions where the string contains a single quote (`'`). In this case, if you wrap the string in single quotes (`'`), you need to put an extra backslash (`\`) before each single quote inside it.

Let's look at an example. The 2 strings below are the same.

```python
first_string = 'Welcome to \'Python for Data Science course\'!'
second_string = "Welcome to 'Python for Data Science course'!"
print(first_string == second_string)
```

**Exercise**

What would happen if you forget a backslash (`\`) before the single quote (`'`) in the first string?

```python
print('Welcome to 'Python for Data Science course'!')
```

In [None]:
# [TODO]


**Exercise**

If we define a `third_string` as follows, is it the same as the `first_string` and `second_string`?

```python
third_string = 'Welcome to \"Python for Data Science course\"!'
```

In [None]:
third_string = 'Welcome to \"Python for Data Science course\"!'

# [TODO]


**Exercise**

Fill in the summary table below.

|`example_str`|Output of `print(f"{example_str}")`|
|:-:|:-:|
|`"This course is awesome!"`|**WRITE YOUR ANSWER HERE!**|
|`"How\'re you?"`|**WRITE YOUR ANSWER HERE!**|
|`"Does this /\\ look like a mountain to you?"`|**WRITE YOUR ANSWER HERE!**|

<font size="5">[TODO] 📖</font>


`\n` is a newline character. It's used to create a new line.

The following example will demonstrate the difference between using and not using `\n`:

```python
print("Welcome to Python for Data Science course!")
print("I am your instructor")

print("------------------------------------------")

print("Welcome to Python for Data Science course!\n")
print("I am your instructor")
```

Python also allows **multiline strings**. You can assign a multiline string to a variable by using triple quotes (either `"""` or `'''`).

For example, below is a multiline string.

```python
multiline_str = """
I am
learning
Python for Data Science
with Leo
a Data Scientist
"""

print(multiline_str)
```

Let's define another variable called `another_str`.

```python
another_str = "I am\nlearning\nPython for Data Science\nwith Leo\na Data Scientist"

print(another_str)
```

**Exercise**

Is `another_str` the same as `multiline_str`? In other words, what would be the result of the following code?

```python
another_str == multiline_str
```

In [None]:
# [TODO]


**Strings** can be thought of as a sequence of characters, much like a list. Let's see if we can apply what we know about **lists** to **strings**.

The following exercises will all use this `example_str` variable.

```python
example_str = "I am old enough to drive a car!" 
```

🏎️🏎️🏎️🏎️

**Exercise**

What's the length of `example_str`?

In [None]:
# [TODO]


**Exercise**

What is the index of the first `o` character in `example_str`?

In [None]:
# [TODO]


**Exercise**

- Can you print the characters from index `2` to index `10`?
- **BONUS**: Can you print the last 4 characters of `example_str`?

In [None]:
# [TODO]


**Exercise**

Try looping through the first 5 characters of `example_str` and print each character.

In [None]:
# [TODO]


**Exercise**

Try modifying the 1st character of `example_str` to `"X"`.

In [None]:
# [TODO]


**Exercise**

Can you `append()` a character to `example_str`? **YES** or **NO**?

In [None]:
# [TODO]


From the 2 exercises above, you can see that we **CANNOT** modify a string. In other words, **strings are immutable**.

What if we still want to modify the value of our string variable `example_str`? 🤔

In this case, we can simply reassign our string variable `example_str` to the desired value.

```python
example_str = "I am old enough to drive a car!"
print(example_str)

example_str = "We are old enough to drive a car!"
print(example_str)
```
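
Reassignment does not change the old string in place; the variable name simply points at a new string. The sketch below illustrates this and shows one possible way to build a modified copy from slices. The variable names `original` and `modified` are hypothetical and not part of the lesson.

```python
original = "I am old enough to drive a car!"

# Strings are immutable, so assigning to an index raises a TypeError.
try:
    original[0] = "X"
except TypeError as error:
    print(f"TypeError: {error}")

# Instead, build a NEW string from pieces of the old one.
modified = "X" + original[1:]
print(modified)  # X am old enough to drive a car!
print(original)  # I am old enough to drive a car!
```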

**String methods**

In `lesson 2`, you learned to:
- concatenate multiple strings together using the `+` operator.
- duplicate a string using the `*` operator.

In this lesson, you'll learn about **other string methods and operators**.
- Capitalise the first letter of a string using `.capitalize()`.
- Check if a string starts with a certain character or substring using `.startswith()`.
- Check if a string ends with a certain character or substring using `.endswith()`.
- Check if a string contains a certain character or substring using the `in` operator (note that `in` is an operator, not a method).
- Check if a string consists only of numeric characters using `.isnumeric()`. This is only a rough test for whether the string can be converted to an `int`; for example, it returns `False` for `"-5"`.
- Convert a string to **UPPERCASE** using `.upper()`.
- Convert a string to **LOWERCASE** using `.lower()`.
- Replace every occurrence of a substring with another string using `.replace()`.

Let's perform all of the above-mentioned transformations on our string variable `example_str`.

```python
print(f"Original:                   {example_str}")
print("-------------------------------------------------------------")
print(f"Capitalise 1st letter:      {example_str.capitalize()}")
print("-------------------------------------------------------------")
print(f"Check if starts with 'I':   {example_str.startswith('I')}")
print("-------------------------------------------------------------")
print(f"Check if ends with '!':     {example_str.endswith('!')}")
print("-------------------------------------------------------------")
print(f"Check if contains 'old':    {'old' in example_str}") 
print("-------------------------------------------------------------")
print(f"Check if is numeric:        {example_str.isnumeric()}")
print("-------------------------------------------------------------")
print(f"Convert to UPPERCASE:       {example_str.upper()}")
print("-------------------------------------------------------------")
print(f"Convert to LOWERCASE:       {example_str.lower()}")
print("-------------------------------------------------------------")
print(f"Replace 'o' with '0':       {example_str.replace('o', '0')}")
```

In [None]:
example_str = "i am old enough to drive a car!"
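
One caveat about `.isnumeric()`: it only checks whether every character is a numeric character, which is not quite the same as "can be converted to an `int`". The sketch below compares the two on a few sample strings. The `can_convert_to_int` helper and the `samples` list are hypothetical, written only for this illustration.

```python
def can_convert_to_int(text):
    """Hypothetical helper: return True if int(text) succeeds."""
    try:
        int(text)
        return True
    except ValueError:
        return False


samples = ["42", "-5", "3.14", "²"]

for sample in samples:
    # Print the string, the isnumeric() result, and whether int() accepts it.
    print(repr(sample), sample.isnumeric(), can_convert_to_int(sample))
```

The minus sign makes `"-5".isnumeric()` return `False` even though `int("-5")` works, while the superscript `"²"` counts as numeric but cannot be passed to `int()`.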



**Exercise**

What's the output of the following code?

```python
print(f"Check if contains 'Old': {'Old' in example_str}")
```

**Hint**: Python is **CASE-SENSITIVE**.

In [None]:
# [TODO]


Thus, to make sure we match every capitalisation of the word `Old`, it is common practice to convert both strings with the `upper()` or `lower()` method before comparing them.

Our code should be:
```python
print(f"Check if contains 'Old': {'Old'.lower() in example_str.lower()}")
```
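
If you need this check in several places, one option is to wrap it in a small function. This is only a sketch: the function name `contains_ignoring_case` and the sample `sentence` are made up for illustration.

```python
def contains_ignoring_case(text, word):
    """Hypothetical helper: case-insensitive 'word in text' check."""
    # Lowercase BOTH sides so the comparison ignores capitalisation.
    return word.lower() in text.lower()


sentence = "I am OLD enough to drive a car!"
print(contains_ignoring_case(sentence, "Old"))    # True
print(contains_ignoring_case(sentence, "young"))  # False
```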

**Exercise**

**HARD** 🤯

Write a function that takes in 2 strings and returns the number of times the second string appears in the first string.
- The function should be **case-insensitive**.
- If the second string is not in the first string, return 0.
- If the second string is empty, return the length of the first string.

In [None]:
# [TODO]


In [None]:
# [TODO]


It's very common in **Natural Language Processing** to break a large sentence into component words. 

We'll do that using the `split()` method.

Let's first use the `help()` function to view the documentation for the `split()` method.

```python
help(str.split)
```

From the above output, we can see that `split()` takes an argument `sep`, which is the character or string that we want to split the string on, plus an optional `maxsplit` argument that limits the number of splits. The result of running the function is a list of smaller strings.

We know that our `example_str` is a sentence having words separated by a space ` ` character. Let's split `example_str` based on the space character.

```python
example_str.split(" ")
```
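
The choice of separator matters when a string contains repeated spaces. The sketch below, using a made-up `spaced` string, compares `split(" ")` with `split()` called without arguments, and also shows the optional `maxsplit` argument.

```python
spaced = "data   science is  fun"

# Splitting on a single space keeps empty strings between repeated spaces.
on_space = spaced.split(" ")
print(on_space)  # ['data', '', '', 'science', 'is', '', 'fun']

# With no argument, split() treats any run of whitespace as one separator.
on_whitespace = spaced.split()
print(on_whitespace)  # ['data', 'science', 'is', 'fun']

# maxsplit limits how many splits are made (here: at most 1).
limited = "a,b,c,d".split(",", maxsplit=1)
print(limited)  # ['a', 'b,c,d']
```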

In [None]:
example_str = "I am old enough to drive a car!"


**Exercise**

Without running the code below, can you tell me what the output of it is?

```python
print(len("2022-07-25".split("-")))
```

In [None]:
# [TODO]


The opposite of `split()` is the `join()` method, which can be used to concatenate multiple strings together using a certain character or string as a separator.

Let's look at the documentation for the `join()` method.

```python
help(str.join)
```

From the example in the above output, `'.'.join(["ab", "pq", "rs"])` returns `"ab.pq.rs"`. The 3 strings are joined together using the `'.'` character.
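
Here is a short sketch of how `join()` and `split()` undo each other, using a made-up `colours` list. It also shows that `join()` only accepts strings, so numbers need to be converted with `str()` first.

```python
colours = ["red", "green", "blue"]

# The separator string goes BEFORE .join(), the list goes inside.
joined = ", ".join(colours)
print(joined)  # red, green, blue

# Splitting on the same separator gives the original list back.
print(joined.split(", ") == colours)  # True

# join() needs strings, so convert numbers with str() first.
numbers = [1, 2, 3]
joined_numbers = "-".join(str(n) for n in numbers)
print(joined_numbers)  # 1-2-3
```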

**Exercise**

Let's reassemble the following words into a sentence!

```python
words = ["I", "love", "Python", ". It's my", "favourite", "language"]
```

In [None]:
# [TODO]


**Exercise**

You know that strings can be concatenated using the `+`. What if we want to concatenate a `str` and an `int`?

Will the following code run without error?

```python
age = 21
print("I am " + age)
```

In [None]:
# [TODO]


There are multiple ways to solve this problem. 
- We can convert the `int` to a `str` using the `str()` function.
- We can also use the `format()` method.
- OR we can use the `f-string` syntax, as you have seen on countless occasions.

I prefer using `f-string` syntax since it is **more readable** and **easy to understand**.

Nonetheless, it's worth noting that the `format()` method is still a useful tool for formatting strings.
The `format()` method takes any number of arguments, and you can use **index numbers** (starting from `0`) inside the curly braces to specify which argument you want to use.

```python
age = 21
characteristic = "rich"
assets = "BTCs and ETHs"

print("I am {1} years old. I am super {0} and I have lots of {2}".format(characteristic, age, assets))
```
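
To compare the three approaches side by side, here is a minimal sketch that builds the same sentence with `str()`, `format()` and an `f-string`. The variable names `with_str`, `with_format` and `with_fstring` are chosen only for this illustration.

```python
age = 21

# 1. Convert the int to a str, then concatenate with +.
with_str = "I am " + str(age) + " years old."

# 2. Let format() fill in the {} placeholder.
with_format = "I am {} years old.".format(age)

# 3. Put the variable directly inside an f-string.
with_fstring = f"I am {age} years old."

print(with_str)  # I am 21 years old.
print(with_str == with_format == with_fstring)  # True
```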

# Dictionary

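A **Python dictionary** is a collection of items. Each item is a `key-value` pair. Since Python 3.7, a dictionary keeps its items in insertion order, but items are looked up by key rather than by position.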
- A dictionary is denoted by the curly braces `{}`. 
- Items inside a dictionary are separated by a comma `,`.
- Keys and values are separated by a colon `:`.
- While values can be of any data type, **keys must be of an immutable** data type (more precisely, a hashable one, such as `str` or `int`) and must be **unique**.

Let's look at a few examples of dictionaries.
1. Empty dictionary:

    ```python
    empty_dictionary = {}
    ```

1. Dictionary with string keys and mixed type values:

    ```python
    employee_details = {
        "name": "Leo",
        "age": 21,
        "is_data_scientist": True,
        "is_programmer": False,
    }
    ```
    
1. Dictionary with mixed type keys:

    ```python
    random_dictionary = {
        1: "a",
        "b": 2,
    }
    ```
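
The two rules about keys can be checked with a small experiment. In the sketch below, the `scores` dictionary and the `list_key_failed` flag are hypothetical names used only for illustration.

```python
# Keys are unique: if a key is repeated, the last value wins.
scores = {"Leo": 1, "Leo": 2}
print(scores)  # {'Leo': 2}

# Keys must be immutable: a list cannot be used as a key.
list_key_failed = False
try:
    bad_dictionary = {["a", "b"]: 1}
except TypeError as error:
    list_key_failed = True
    print(f"TypeError: {error}")
```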

**Exercise**

Define an empty dictionary and print it to the console! What is the output of your `print()`?

```python
empty_dictionary = {}
print(empty_dictionary)
```

In [None]:
# [TODO]


Let's define a dictionary named `movie_info` and see how we can access its keys and values.

```python
movie_info = {
    "title": "Em Va Trinh",
    "year": 2021,
    "cast": [
        "Avin Lu",
        "Hong Ha",
        "Lan Thy",
    ],
    "ost": "Ballad to the dead",
}
```

We can access **the value** of the dictionary by wrapping a pair of square brackets `[]` around the **corresponding key**.

For example, if we want to retrieve the value of the `"title"` key, we can use the following code:

```python
print(movie_info["title"])
```

**Exercise**

Who is in the cast of the movie `Em Va Trinh`? Can you print each cast member on a separate line in the console?

**Hint**: What's the data type of `movie_info["cast"]`?

In [None]:
# [TODO]


**Exercise**

We know that `Tran Luc` plays the older version of `Trinh Cong Son`. How do we add `Tran Luc` to the `cast` list?

**Hint**: How do you add an element to a list?

In [None]:
# [TODO]


What if we want to add a new `key-value` pair to our existing dictionary?

Let's see how to add `director: "Phan Gia Nhat Linh"` to the `movie_info` dictionary.

```python
movie_info["director"] = "Phan Gia Nhat Linh"
```

Another way to add a new `key-value` pair to an existing dictionary is to use the `update()` method.

```python
movie_info.update({
    "director": "Phan Gia Nhat Linh"
})
```

**Exercise**

We know that the movie `Em Va Trinh` was released in `2022`. Nonetheless, the value of key `year` is `2021`. How do we update the value of key `year` to the correct value of `2022`?
- Access the value of the key `year` and print it to the screen.
- Assign the value of `2022` to the key `year`.
- Print the new value of the key `year` to the screen.

In [None]:
# [TODO]


The `.update()` method can also be used to update the value of a key.

```python
movie_info.update({"year": 2022})
```
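
Because `update()` takes a whole dictionary, it can add new keys and overwrite existing ones in a single call. A minimal sketch, using a made-up `book_info` dictionary:

```python
book_info = {"title": "Hypothetical Book", "year": 1999}

# "year" already exists, so its value is overwritten.
# "pages" does not exist yet, so it is added as a new key.
book_info.update({"year": 2000, "pages": 320})

print(book_info)
# {'title': 'Hypothetical Book', 'year': 2000, 'pages': 320}
```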

If we want to remove a `key-value` pair from a dictionary, we can use the following methods:
- `pop()` method: provide a `key`, and that `key-value` pair will be removed from the dictionary. `pop()` also returns the removed `value`.
- `del` keyword: `del movie_info["year"]` will remove the `year` `key-value` pair from the dictionary.
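
The difference between the two is easiest to see in code. The sketch below uses a made-up `book_info` dictionary; the `"publisher"` key is deliberately missing to show the optional default value of `pop()`.

```python
book_info = {"title": "Hypothetical Book", "year": 2000, "pages": 320}

# pop() removes the pair AND hands back the removed value.
removed_year = book_info.pop("year")
print(removed_year)  # 2000

# A second argument is returned if the key is missing (no KeyError).
missing = book_info.pop("publisher", None)
print(missing)  # None

# del removes the pair without returning anything.
del book_info["pages"]
print(book_info)  # {'title': 'Hypothetical Book'}
```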

**Exercise**

Remove the `ost` `key-value` pair from the `movie_info` dictionary.

In [None]:
# [TODO]


What if we want to access a non-existent key in the dictionary?

```python
movie_info["producer"]
```

In order to access all the keys in the dictionary, we can use the `keys()` method.

```python
movie_info.keys()
```

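To avoid the `KeyError` that is raised when accessing a non-existent key, it's recommended to check if the `key` exists before accessing it.

Similar to checking if an element exists in a list, you can do this by using the `in` keyword. Writing `"producer" in movie_info` gives the same result, since `in` checks a dictionary's keys by default.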

```python
print("producer" in movie_info.keys())
```

Another way to check if a `key` exists in a dictionary is to use the `get()` method.

```python
movie_info.get("producer")
```

If the `key` does not exist in the dictionary, the `get()` method returns `None` instead of raising a `KeyError`. Otherwise, the `get()` method returns the corresponding `value` of the `key`.
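
`get()` also accepts an optional second argument, which is returned in place of `None` when the key is missing. A short sketch with a made-up `book_info` dictionary:

```python
book_info = {"title": "Hypothetical Book", "year": 2000}

# Membership test: is the key there at all?
print("publisher" in book_info)  # False

# get() without a default returns None for a missing key.
print(book_info.get("publisher"))  # None

# get() with a default returns the default for a missing key...
publisher = book_info.get("publisher", "unknown")
print(publisher)  # unknown

# ...and the stored value for an existing key.
year = book_info.get("year", "unknown")
print(year)  # 2000
```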

In order to access all the values in the dictionary, we can use the `values()` method.

```python
movie_info.values()
```

**Exercise**

Given the following dictionary, find the key with the largest value. If there are multiple keys with the same largest value, print all of them.

```python
final_scores = {
    "Alice": 90,
    "Bob": 80,
    "Homelander": 50,
    "Jayden": 20,
    "Leo": 10,
    "Starlight": 50,
    "Tony": 90,
}
```

**Hint**: `max()` function can be used to find the largest value in a list.

In [None]:
# [TODO]


**Exercise**

Can you solve the above exercise using list comprehension?

In [None]:
# [TODO]


In order to access all the items in the dictionary, we can use the `items()` method.

```python
movie_info.items()
```

As you can see, the `items()` method returns a `dict_items` view object that behaves much like a list of tuples. Each tuple contains a key-value pair.

We won't be learning about the `tuples` type in this course, but you can just remember that tuples are **immutable** and are wrapped in parentheses `()`.
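
To see what `items()` gives back, the sketch below inspects the result for a made-up `book_info` dictionary and converts it to a regular list with `list()`.

```python
book_info = {"title": "Hypothetical Book", "year": 2000}

pairs = book_info.items()
print(type(pairs).__name__)  # dict_items

# Convert the view to a regular list of (key, value) tuples.
pairs_as_list = list(pairs)
print(pairs_as_list)  # [('title', 'Hypothetical Book'), ('year', 2000)]

# The view follows the dictionary: a newly added key shows up in it too.
book_info["pages"] = 320
print(len(pairs))  # 3
```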

**Exercise**

Write a function that takes in a dictionary and returns a list of all the keys in the dictionary.

In [None]:
# [TODO]


Let's loop over the `items` in the dictionary and print each `key-value` pair on a separate line to the screen.

```python
for k, v in movie_info.items():
    print(k, v)
```

In addition to **list comprehension**, Python also has **dictionary comprehension**.

Let's use **dictionary comprehension** to create a dictionary named `initials` having:
- keys as the cast members of the movie `Em Va Trinh`
- values as the first letter of each cast member.

```python
initials = {
    k: k[0] for k in movie_info["cast"] if len(k) > 0
}

print(initials)
```
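
In general, a dictionary comprehension has the shape `{key: value for item in iterable if condition}`, where the `if` part is optional. The sketch below uses a made-up `words` list and compares the comprehension with the equivalent `for` loop.

```python
words = ["data", "science", "ml", "python"]

# Comprehension: keep only words longer than 2 characters.
word_lengths = {word: len(word) for word in words if len(word) > 2}
print(word_lengths)  # {'data': 4, 'science': 7, 'python': 6}

# The same dictionary built with a regular for loop.
loop_version = {}
for word in words:
    if len(word) > 2:
        loop_version[word] = len(word)

print(word_lengths == loop_version)  # True
```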

**Exercise**

Use **dictionary comprehension** to create the following dictionary:

```python
cubes = {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
```

**Hint**: `value = key ** 3`

In [None]:
# [TODO]


**Exercise**

Given the list `numbers` below:

```python
numbers = list(range(1, 10))
```

Create a dictionary named `even_numbers_doubled` that has:
- keys as the even numbers in `numbers`
- values as the doubled value of the even numbers.

```python
even_numbers_doubled == {2: 4, 4: 8, 6: 12, 8: 16}
```

In [None]:
# [TODO]
