# Built-in Data Structures I: Strings and Lists

## Contents
- [Strings](#section1)
    - [Create strings](#subsection1.1)
    - [Sequence operations](#subsection1.2)
    - [Methods of strings](#subsection1.3)
- [Lists](#section2)
    - [Create lists](#subsection2.1)
    - [Comparison between lists and strings](#subsection2.2)
    - [List methods](#subsection2.3)
    - [Lists as iterables](#subsection2.4)

## Strings <a id="section1"></a>

### Create strings <a id="subsection1.1"></a>

A `str` type object can be created by enclosing characters in either single or double quotation marks. 

In [1]:
this = 'Hello'      # Create a string by single quotes
that = "World"      # Create a string by double quotes

print(this)
print(that)

Hello
World


It is also possible to create a multi-line string by enclosing the text in three single or three double quotes, as shown by the sample code below.

In [2]:
shining = """
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
         All work and no play 
         makes Jack a dull boy
         All work and no play
         makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
"""

print(shining)


All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
         All work and no play 
         makes Jack a dull boy
         All work and no play
         makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy



In previous lectures, we also learned the following ways to create strings:
1. The output of the `input()` function is always a `str` type object;
2. Convert objects of other types to strings by the `str()` function;
3. Expressions of strings involving `+` (for concatenating strings) or `*` (for duplicating strings).
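
As a quick recap, the sketch below exercises the last two approaches. The variable names (`year`, `label`, `divider`) are made up for illustration, and `input()` is left out so that the cell runs without user interaction.

```python
year = 2024                     # An int type object
year_text = str(year)           # Convert the int into a string

label = 'Year ' + year_text     # Concatenate two strings with +
divider = '-' * 10              # Duplicate a string ten times with *

print(label)
print(divider)
```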

### Sequence operations <a id="subsection1.2"></a>

A string can be viewed as a sequence of characters, so it supports operations that assume a positional ordering among items.

#### Length of a string
The length of a string gives the total number of characters, which can be returned by the built-in function `len()`.

In [3]:
greetings = "Hello World"

print(len(greetings))

11


#### Indexing and slicing of strings 
Each character in a string corresponds to an index number, and the index numbers start from **0**. For the string `"Hello World"`, for example, the indexes are assigned as shown in the table below. 

`H` |   `e` |  `l` | `l` | `o` | ` `  | `W` | `o` | `r` | `l` | `d`
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 

All characters can be accessed via the associated indexes, as shown below.

In [4]:
greetings = "Hello World"

letter_e = greetings[1]         # Access the second item via index 1
print(letter_e)

letter_r = greetings[8]         # Access the ninth item via index 8
print(letter_r)

e
r


In Python, we can also index backward from the end: positive indexes count from the left, and negative indexes count back from the right.

`H` |   `e`  |  `l` | `l` | `o` | ` `  | `W` | `o` | `r` | `l` | `d`
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 
-11| -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 

Characters in a string can likewise be accessed by the backward indexes. 

In [5]:
greetings = "Hello World"

letter_d = greetings[-1]        # Access the last item 
print(letter_d)

letter_l = greetings[-2]        # Access the second to last item
print(letter_l)

d
l
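
One way to read the negative indexes is that index `-k` refers to the same character as index `len(string) - k`. The short check below illustrates this relation and, as an aside, shows that an index outside the table raises an `IndexError`.

```python
greetings = "Hello World"
n = len(greetings)                          # n is 11

print(greetings[-1] == greetings[n - 1])    # Both refer to "d"
print(greetings[-11] == greetings[0])       # Both refer to "H"

try:
    greetings[11]                           # There is no index 11
except IndexError:
    print("Index 11 is out of range")
```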


Besides accessing individual characters of a string, we can also extract a range of characters from the string by creating a **slice** in the form of <code>[<i>start</i>:<i>stop</i>:<i>step</i>]</code>, where *`start`*, *`stop`*, and *`step`* are three input arguments that define the values of the indexes.

Arguments | Remarks | Default Values
:--------|:-------|:--------------
*`start`*   | The first index of the slice | 0 
*`stop`*    | The index before which the slice stops | length of the string
*`step`*    | The step length of the slice | 1

Some examples are provided below to illustrate how the slice works.

In [6]:
greetings = "Hello World"

print(greetings[0:5:1])     # Print the first five characters

print(greetings[6:11:1])    # Print the last five characters

print(greetings[0:11:2])    # Print the 1st, 3rd, 5th, ... characters

Hello
World
HloWrd


If a particular argument is not specified, its default value is used in the slice expression. The positions of the arguments are indicated by the `:` symbols, so by placing the `:` symbols properly, a slice can be created with some arguments omitted, such as:
- *`[:]`*: *`start`*`= 0`, *`stop`* is the length of the string, and *`step`*`= 1`, by default, so the slicing expression takes all characters of the string;
- *`[start:]`*: *`stop`* is the length of the string and *`step`*`= 1`, by default, so the slicing expression takes all characters from *`start`* to the end of the string;
- *`[:stop]`*: *`start`*`= 0` and *`step`*`= 1`, by default, so the slicing expression takes all characters from the first one until *`stop`*`- 1`;
- *`[start:stop]`*: *`step`*`= 1`, by default, so the slicing expression takes all characters from *`start`* until *`stop`*`- 1`;
- *`[::step]`*: *`start`*`= 0` and *`stop`* is the length of the string, by default, so for a positive *`step`* the slicing expression takes the first character and then every *`step`*-th character after it, until the end of the string. 

As a result, slices in the previous code cell can be rewritten as follows.


In [7]:
greetings = "Hello World"

print(greetings[:5])        # Print the first five characters

print(greetings[6:])        # Print the last five characters

print(greetings[::2])       # Print the 1st, 3rd, 5th, ... characters

Hello
World
HloWrd


In the case that *`step`*`= -1`, the slice takes characters in the opposite direction, so there is a special slicing expression `[::-1]`, which takes all characters of the string in reverse order. 

In [8]:
greetings = "Hello World"

print(greetings[::-1])

dlroW olleH
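
Negative indexes and negative steps can also be combined with explicit *`start`* and *`stop`* values. The sketch below shows a few such slices. Note that when *`step`* is negative, an omitted *`start`* means the end of the string and an omitted *`stop`* means the slice runs back to the beginning. The string `word` is an extra example that is not part of the lecture data.

```python
greetings = "Hello World"

print(greetings[-5:])       # The last five characters: World
print(greetings[4::-1])     # From index 4 back to the start: olleH
print(greetings[:-6:-1])    # From the end back to index 6: dlroW

word = "level"
print(word == word[::-1])   # A palindrome equals its reverse: True
```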


### Methods of strings <a id="subsection1.3"></a>
A method is a special function associated with an object. All methods are called via the syntax <code><i>object</i>.<i>method</i>()</code>. You may check [Python string methods](https://www.programiz.com/python-programming/methods/string) for the full list of methods for strings. In this lecture, you are supposed to understand the following frequently used methods.

#### Case conversion methods

In [9]:
line = "all work and no play makes Jack a dull boy"

line_upper = line.upper()       # Convert all letters to upper case
line_lower = line.lower()       # Convert all letters to lower case
line_cap = line.capitalize()    # Convert the first letter to upper case
line_swap = line.swapcase()     # Swap upper and lower case
line_title = line.title()       # Capitalize the 1st letter of each word

print(line_upper)
print(line_lower)
print(line_cap)
print(line_swap)
print(line_title)

ALL WORK AND NO PLAY MAKES JACK A DULL BOY
all work and no play makes jack a dull boy
All work and no play makes jack a dull boy
ALL WORK AND NO PLAY MAKES jACK A DULL BOY
All Work And No Play Makes Jack A Dull Boy


<div class="alert alert-block alert-success">
<b>Example 1:</b>  
Write a program to count the number of letter "a"s (either upper case or lower case) in a given string.
</div>

In [10]:
string = """
Many years later, as he faced the firing squad, Colonel 
Aureliano Buendía was to remember that distant afternoon 
when his father took him to discover ice. At that time 
Macondo was a village of twenty adobe houses, built on 
the bank of a river of clear water that ran along a bed 
of polished stones, which were white and enormous, like 
prehistoric eggs. 
"""

count = 0                       # Initialize the count to be 0
for char in string:
    if char.lower() == 'a':     # If the character is "a" or "A"
        count += 1              # The count is increased by 1

print(count)

30


The program above iterates over each character in the string, and the value of `count` is increased by one if the character is an "a" or "A". Here the character `char` is converted to lower case by calling the `lower()` method, so we only need to check whether it is the same as the lower case letter "a". This is more concise than the direct comparison `char == 'a' or char == 'A'`. 

#### The `count()` method

As the name suggests, the `count()` method of strings returns the number of appearances of a specific value in the string object. By using the `count()` method, **Example 1** can be easily solved by the code segment below without using a loop. 

In [11]:
string = """
Many years later, as he faced the firing squad, Colonel 
Aureliano Buendía was to remember that distant afternoon 
when his father took him to discover ice. At that time 
Macondo was a village of twenty adobe houses, built on 
the bank of a river of clear water that ran along a bed 
of polished stones, which were white and enormous, like 
prehistoric eggs. 
"""
    
count = string.lower().count('a')
print(count)

30


#### The `format()` method
The method `format()` of `str` type objects provides a convenient way to control the format of a string. Take a look at the example below.

In [12]:
exam = 85                   # Final exam marks of a course
grade = 'A+'                # Final grade of the course

text = 'Your exam marks: {}, your grade: {}'.format(exam, grade)
print(text)

Your exam marks: 85, your grade: A+


The curly brackets within a string (called format fields) are replaced with the objects passed into the `format()` method. In the example above, the first pair of curly brackets is replaced by the first object `exam = 85`, and the second pair is replaced by the second object `grade = 'A+'`. The `format()` method also allows users to index the format fields in a string, as demonstrated by the following example. 

In [13]:
print('{0}, {1}, and {2}'.format('apple', 'orange', 'banana'))
print('{1}, {0}, and {2}'.format('apple', 'orange', 'banana'))
print('{0}, {2}, and {1}'.format('apple', 'orange', 'banana'))

apple, orange, and banana
orange, apple, and banana
apple, banana, and orange


You may notice that the curly brackets indexed by `0` are replaced by the first object given in the method `format()`, the brackets indexed by `1` are replaced by the second object, the ones indexed by `2` are replaced by the third object, and so on. 

The `format()` method is also able to control the specific displayed format of numbers and objects of other types. Interested readers may refer to [format](https://www.programiz.com/python-programming/methods/string/format) for more details.

Besides using the `format()` method, we could also use the `f`-string, indicated by the letter `f` in front of the string, to insert values into the string. The `f`-string enables users to place any valid variables or expressions inside the curly brackets of the string, as shown by the sample code below.

In [14]:
name = 'John'
balance = 25678.95

print(f'Hello {name}, you have ${balance} in your account.')   # Note that there is an 'f' in front of the string

Hello John, you have $25678.95 in your account.
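
Both the `format()` method and the `f`-string accept a format specification after a colon inside the curly brackets, which controls how a number is displayed. The sketch below is only a small illustration: `,.2f` asks for a thousands separator and two decimal places, and `.1%` displays a fraction as a percentage with one decimal place. The variable `rate` is an assumed value added for this example.

```python
name = 'John'
balance = 25678.95
rate = 0.035                    # Assumed interest rate, for illustration only

text_f = f'Hello {name}, you have ${balance:,.2f} in your account.'
text_format = 'Interest rate: {:.1%}'.format(rate)

print(text_f)
print(text_format)
```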


#### The `replace()` method
The `replace()` method creates a new string where a part of the given string is replaced with a user-specified value. Here are a few examples.

In [15]:
string_us = 'Coffee enhances my modeling skills.'         # A sentence in American English
print(string_us)

string_uk = string_us.replace('modeling', 'modelling')    # A sentence in British English
print(string_uk)

string_sg = string_uk.replace('Coffee', 'Kopi')           # A sentence in Singlish
print(string_sg)

Coffee enhances my modeling skills.
Coffee enhances my modelling skills.
Kopi enhances my modelling skills.


Note that by default, if the target value appears multiple times in the string, the `replace()` method replaces every occurrence with the user-specified value.
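
The `replace()` method also accepts an optional third argument that limits how many occurrences are replaced, counting from the left. The sketch below uses a made-up string `text` to contrast the default behavior with a limit of one. It also shows that the original string is left unchanged, because `replace()` returns a new string.

```python
text = 'spam spam spam'

all_replaced = text.replace('spam', 'eggs')         # Replace every occurrence
first_replaced = text.replace('spam', 'eggs', 1)    # Replace the first one only

print(all_replaced)
print(first_replaced)
print(text)                                         # The original is unchanged
```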

## Lists <a id="section2"></a>

### Create lists <a id="subsection2.1"></a>
The Python list is a collection of data items (of the same type or of mixed types), created by placing all the items (elements) inside square brackets `[]`, separated by commas.

Here is a list of strings.

In [16]:
furious_five = ['Tigress', 'Crane', 'Mantis', 'Monkey', 'Viper']    
print(furious_five)
print(type(furious_five))

['Tigress', 'Crane', 'Mantis', 'Monkey', 'Viper']
<class 'list'>


A list of numbers, which are a mixture of `int` and `float` type items. 

In [17]:
numbers = [1, 2.0, 3.0, 4, 5, 6.0]                                       
print(numbers)

[1, 2.0, 3.0, 4, 5, 6.0]


A list of items that are strings and floating point numbers.

In [18]:
condo = ['SEASCAPE',      # Name of the condo project (str)
         'CCR',           # Region segment of the condo (str)
         'Resale',        # Type of the condo (str)
         4388000.0]       # Price of the condo (float)
print(condo)

['SEASCAPE', 'CCR', 'Resale', 4388000.0]


Note that in creating the list `condo`, we break a long line of code into aligned shorter lines. This is allowed when the expression is enclosed in parentheses, brackets, or braces. Check the [PEP 8 Style Guide](https://www.python.org/dev/peps/pep-0008) for more details.


<div class="alert alert-block alert-warning">
<b>Coding Style: </b> Limit all lines of code to a maximum of 79 characters.
</div>

> *The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses.* -[PEP 8 Style Guide](https://www.python.org/dev/peps/pep-0008/#maximum-line-length)

A list can also be empty, in the sense that there is no item in it. It can be created simply by a pair of empty brackets.

In [19]:
feel_empty = []
print(feel_empty)

[]


Lists can also be created by converting objects of other types to lists.

In [20]:
print(list('abcd'))    # Convert a string into a list
print(list(range(5)))  # Convert the range type object into a list

['a', 'b', 'c', 'd']
[0, 1, 2, 3, 4]


### Comparison between lists and strings <a id="subsection2.2"></a>

#### Similarities

- The arrangement of data is similar: data items in a list are organized as an ordered sequence, just like how characters are organized in a string, so the `len()` function can also be used to return the number of items in a list.

In [21]:
print(len(furious_five))
print(len(numbers))
print(len(condo))
print(len(feel_empty))

5
6
4
0


-  Items in a list can be accessed via the same indexing and slicing system as strings. 

In [22]:
last_warrior = furious_five[-1]         # The last item from the list
first_two_warriors = furious_five[:2]   # The first two items

print(last_warrior)
print(first_two_warriors)

Viper
['Tigress', 'Crane']


- The `+` and `*` operators can be used to concatenate or duplicate items in a list.

In [23]:
letters = ['A', 'B', 'C']       # A list of letters
numbers = [2, 2.5]              # A list of numbers

mixed = letters + numbers*3     # A mixed list

print(mixed)                    # Print the new mixed list
print(len(mixed))               # Print the length of the mixed list

['A', 'B', 'C', 2, 2.5, 2, 2.5, 2, 2.5]
9


From the sample code above, you may notice that a list can also be created as 1) a slice of another list; and 2) the result of concatenation or duplication operations. 

#### Differences

- Lists are **mutable**, meaning that you can modify some of the list elements in place. A string, on the other hand, is **immutable**, as its characters cannot be changed individually. If you try to assign a new value to part of a string, Python raises a `TypeError`. 

In [24]:
my_answers = ['B', 'C', False, True, 0.256, 2]
print(my_answers)                   # Print the original list

my_answers[1] = 'D'                 # Modify the 2nd item in the list
print(my_answers)                   # Print the modified list

my_answers[2:4] = [True, False]     # Modify the 3rd and the 4th item
print(my_answers)                   # Print the modified list

['B', 'C', False, True, 0.256, 2]
['B', 'D', False, True, 0.256, 2]
['B', 'D', True, False, 0.256, 2]
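
For contrast, the sketch below attempts the same kind of item assignment on a string. The variable `word` is a made-up example, and the `try`/`except` statement is used only so that the cell keeps running after the error. To "modify" a string, we build a new string instead, for example by slicing and concatenation.

```python
word = 'modeling'

try:
    word[0] = 'M'                   # Item assignment is not allowed for strings
except TypeError as error:
    print('TypeError:', error)

new_word = 'M' + word[1:]           # Build a new string instead
print(new_word)
print(word)                         # The original string is unchanged
```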


### List methods <a id="subsection2.3"></a>

Similar to strings, the `list` type objects can also be processed by a number of methods. The full details of list methods can be found on [Python list methods](https://www.programiz.com/python-programming/methods/list). In this lecture, we will focus on a few frequently used list methods. 

#### Adding items by `append()`, `extend()`, and `insert()`

Python list methods <code>append()</code> and <code>extend()</code> are used to modify a list by adding a single item or multiple items to the end of it, as shown by the following code. 

In [25]:
games = ['Halo infinite', 'Deathloop', 'Cyberpunk 2077']
print(games)

games.append('AOE IV')          # "AOE IV" is added to the list
print(games)

other_games = ['Battlefield 2042', 'Fifa 22']
games.extend(other_games)       # Items in other_games are added to the list
print(games)

['Halo infinite', 'Deathloop', 'Cyberpunk 2077']
['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV', 'Battlefield 2042', 'Fifa 22']


Besides adding elements at the end of the list, we can also insert an item in an arbitrary position of the list, by using the method <code>insert()</code>, as demonstrated by the following code.

In [26]:
games = ['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
print(games)

games.insert(2, 'Dota 2')       # Insert "Dota 2" at position 2
print(games)

['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
['Halo infinite', 'Deathloop', 'Dota 2', 'Cyberpunk 2077', 'AOE IV']


It can be seen that for the method <code>insert()</code>, the first argument is the index of the position at which the new item is inserted, and the second argument is the data item to be inserted. 
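
A common source of confusion is the difference between `append()` and `extend()` when the argument is itself a list. The sketch below, which uses the made-up names `nested` and `flat`, shows that `append()` adds the whole list as one item, while `extend()` adds its items one by one. It also shows `insert()` with index `0`, which places the new item at the front.

```python
nested = ['Halo infinite', 'Deathloop']
nested.append(['AOE IV', 'Fifa 22'])    # The whole list is added as ONE item
print(nested)
print(len(nested))                      # The length becomes 3

flat = ['Halo infinite', 'Deathloop']
flat.extend(['AOE IV', 'Fifa 22'])      # The two items are added separately
print(flat)
print(len(flat))                        # The length becomes 4

flat.insert(0, 'Dota 2')                # Insert "Dota 2" at the front
print(flat)
```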

#### Deleting items by `remove()` and `pop()`
List methods `remove()` and `pop()` are used to delete an item from a list, given the value or index of the item, respectively. Examples are provided below to illustrate these two methods.

In [27]:
games = ['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
print(games)

games.remove('Deathloop')       # Remove the item "Deathloop"
print(games)

['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
['Halo infinite', 'Cyberpunk 2077', 'AOE IV']


It can be seen that the item `'Deathloop'` is deleted from the list by the `remove()` method. Please note that
- The `remove()` method only removes the first occurrence of the input argument. 
- A `ValueError` is raised if the given value does not appear in the list. 
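
The two notes above can be checked with the sketch below. The list contents and the variable `target` are made up for illustration. One possible way to avoid the error is to test membership with the `in` operator before calling `remove()`.

```python
games = ['Deathloop', 'Halo infinite', 'Deathloop']

games.remove('Deathloop')       # Only the first "Deathloop" is removed
print(games)

target = 'Dota 2'               # A value that is not in the list
if target in games:             # Check membership before removing
    games.remove(target)
else:
    print(target, 'is not in the list')

try:
    games.remove(target)        # Removing a missing value raises an error
except ValueError:
    print('remove() raised a ValueError')
```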

Besides specifying the value to be removed, we could also use the `pop()` method to remove items according to their position indexes. 

In [28]:
games = ['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
print(games)

item = games.pop(2)             # Remove and return the item at position 2
print(item)
print(games)

['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
Cyberpunk 2077
['Halo infinite', 'Deathloop', 'AOE IV']


If the position index is not specified, then the `pop()` method removes the last item of the list. 

In [29]:
games = ['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
print(games)

item = games.pop()              # Remove and return the last item
print(item)
print(games)

['Halo infinite', 'Deathloop', 'Cyberpunk 2077', 'AOE IV']
AOE IV
['Halo infinite', 'Deathloop', 'Cyberpunk 2077']


#### Searching for an item by `index()`

The `index()` method returns the position index of a given item in the list, as demonstrated by the following example.

In [30]:
num_list = [3.5, 2.6, 0.2, 3.30, 1.8, 2.9, 5, 3.3]

print(num_list.index(3.3))

3


Please note that
- The `index()` method returns only the index of the first appearance of the given value.
- It raises a `ValueError` if the given value does not appear in the list.
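
If all positions of a value are needed, one option is to loop over the position indexes, as in the sketch below. The names `positions` and `target` are made up for illustration, and the membership test with `in` is one possible way to avoid the error for a missing value.

```python
num_list = [3.5, 2.6, 0.2, 3.30, 1.8, 2.9, 5, 3.3]

positions = []                          # Indexes of all appearances of 3.3
for i in range(len(num_list)):
    if num_list[i] == 3.3:
        positions.append(i)
print(positions)

target = 4.2                            # A value that is not in the list
if target in num_list:                  # Check membership before searching
    print(num_list.index(target))
else:
    print(target, 'is not in the list')
```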

### Lists as iterables <a id="subsection2.4"></a>
In Python programming, lists and strings are called **iterables**. An iterable is a compound data object whose elements are returned one at a time in the iterations of a `for` loop. 

<div class="alert alert-block alert-success">
<b>Example 2:</b>  
    The list <span style='font-family:Courier'><b>usd</b></span> contains four money transactions in US dollars. Create another list named <span style='font-family:Courier'><b>sgd</b></span> that converts each transaction into Singapore dollars. 
</div>

In the following code cell, data items in the list `usd` are printed one by one using a `for` loop. It can be seen that iterating over items in a list is very similar to iterating over characters in a string. 

In [31]:
usd = [2, 3.60, 2.05, 13.50]

for trans in usd:
    print(trans)

2
3.6
2.05
13.5


Hence we have the program below to create a new list with all money transactions in Singapore dollars.

In [32]:
exchange_rate = 1.37
usd = [2, 3.60, 2.05, 13.50]

sgd = []                                # Create an empty list named sgd
for trans in usd:                       # Iterate over each item in usd
    sgd.append(trans*exchange_rate)     # Append the converted amount to sgd

print(sgd)

[2.74, 4.932, 2.8085, 18.495]


In the code above, we first create an empty list `sgd`, then use a `for` loop to iterate over each transaction (`trans`) in the list `usd`. In each iteration, the transaction in Singapore dollars is calculated as `trans * exchange_rate` and is appended to the list `sgd`. After all iterations, the list `sgd` has all four transactions in Singapore dollars.

A more Pythonic way of creating such a list is using **list comprehension**, which enables you to replace the loop statements with a single-line expression in the format of <code>[<i>expression</i> for <i>item</i> in <i>iterable</i>]</code>. 

In [33]:
exchange_rate = 1.37
usd = [2, 3.60, 2.05, 13.50]

sgd = [trans*exchange_rate for trans in usd]

print(sgd)

[2.74, 4.932, 2.8085, 18.495]


In creating the list `sgd`, the expression of each item is written as `trans*exchange_rate`, where the variable `trans` is defined by the statement `for trans in usd`, meaning that it takes the value of each item in the list `usd`. 

In fact, we may also add an `if` clause to a list comprehension to process only a subset of items in a given list. 

<div class="alert alert-block alert-success">
<b>Example 3:</b> 
    Given a list of words, create a new list that includes all words starting with a vowel letter (A, E, I, O, or U).
</div>

In [34]:
words = ['AI', 'machine learning', 'analytics', 'prediction', 
         'inference', 'regression', 'optimization']

We first create the new list with a `for` loop. 

In [35]:
new = []
for word in words:
    if word[0].lower() in 'aeiou':
        new.append(word)

print(new)

['AI', 'analytics', 'inference', 'optimization']


Equivalently, we can use comprehensions to create the new list. 

In [36]:
new = [word for word in words if word[0].lower() in 'aeiou']

print(new)

['AI', 'analytics', 'inference', 'optimization']
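
One caveat about the condition used above: `word[0]` raises an `IndexError` if `word` is an empty string. The sketch below assumes that the input list may contain empty strings, which is not the case in the original `words` list, and guards against them with `word and ...`, relying on the fact that an empty string is treated as `False`.

```python
# A shortened list with an empty string added, for illustration only
words = ['AI', '', 'machine learning', 'analytics', 'optimization']

# "word and ..." skips empty strings before word[0] is evaluated
new = [word for word in words if word and word[0].lower() in 'aeiou']

print(new)
```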


<div class="alert alert-block alert-warning">
<b>Coding Style: </b>   
List comprehension is preferred to a loop in creating new lists.
</div>