# Recap
---

## List & Tuples

**Properties in general:**

* Both can store different datatypes.
* Both use indexes to access elements.
* Lists use square brackets `[]` & Tuples use round brackets `()`.
* Lists are variable length (the length can change) & Tuples are fixed length (once created, the length cannot be changed).
* Lists are mutable & Tuples are immutable.
* Lists have more functionality than Tuples.
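
As a quick illustration of the properties above, the following sketch contrasts a list with a tuple. The variable names and values are made up for this example only.

```python
# Hypothetical example values; the names are illustrative only.
shopping_list = ['milk', 42, 3.5]      # list: square brackets, mixed datatypes
coordinates = ('x', 10, 20.5)          # tuple: round brackets, mixed datatypes

# Both use indexes to access elements
first_item = shopping_list[0]
last_coord = coordinates[-1]

# Lists are mutable and can change length
shopping_list[1] = 'eggs'
shopping_list.append('bread')

# Tuples are immutable: item assignment raises a TypeError
try:
    coordinates[0] = 'y'
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(shopping_list)
print(tuple_changed)
```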


**Indexing**

| ![list_indexes.png](attachment:list_indexes.png) |
|:---:|
| **Figure 1:** Reading indexes of Iterable Objects. |

**Slicing properties:**

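* uses the square brackets `[]`.
* done using the colon (`:`) operator.
* 3 parts: start, end and step (`[start:end:step]`).
* the position of each number determines where the slice starts and ends, while the sign and size of the step determine the slice direction and the distance between the selected elements.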
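
The slicing properties above can be sketched as follows. The list `nums` is an arbitrary example chosen so that each value equals its own index.

```python
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # illustrative list: value == index

print(nums[2:6])     # start=2, end=6 (the end index is excluded)
print(nums[:4])      # start omitted: begin from index 0
print(nums[6:])      # end omitted: continue to the last element
print(nums[1:8:2])   # step=2: take every second element
print(nums[::-1])    # negative step: walk from right to left
```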

## Exercise

What is the output of each piece of code in the table, given the list and tuple below:
```python
a = [5,6,9,1,8,61,91,33,7,0,56,74,30,36,15,14,4,3]  # length is 18
b = ('boy', 'girl', 'dog', 'cat', 'horse', 'pig', 'chicken', 'goat', 'lamb', 'duck')
```
| No. | Code | Output |
|:---:|:---|:---|
| 1. | `print(a[-2])` | 4 |
| 2. | `b[5] = 'elephant'` | `TypeError`: `b` is a tuple, and tuples are immutable |
| 3. | `a[-5:]` |[36,15,14,4,3] |
| 4. | `b[6] + b[-3]` | 'chickengoat' |
| 5. | `91 in a` | True |
| 6. | `b[4] * 2` | 'horsehorse' |
| 7. | `sorted(a[:5])` |[1,5,6,8,9]  |
| 8. | `b[3] * a[-1]` | 'catcatcat' |
| 9. | `[b[idx] for idx in range(2,10,3)]` |['dog','pig','lamb']  |
| 10. | `del a[2::2]; print(a)` |[5,6,1,61,33,0,74,36,14,3]  |

---
# Strings

Strings are the text that you see in physical and digital media. They can range from human readable to machine readable forms, and their main purpose is to convey information to the reader.

**Problem Statement:** How do we construct and utilize Strings in Python?

## Topics Covered

* Strings Basics
* Indexing & Slicing
* Escape Characters
* Special Operators
* String Formatting
* String Functions
* String with Conditionals and Loops

---
### String Basics

Strings are any number of characters enclosed within a matching pair of single (`'`) or double (`"`) quotation marks, or within triple quotation marks (`'''` or `"""`), and they are **immutable**. Python allows the other kind of quotation mark to be used inside the string and treats it as a literal character.

**Example 1: Strings in Python**

In [None]:
str_var_01 = "a"
str_var_02 = 'a'
str_var_03 = '''blue'''
# different quotation marks are used here, the inner ones are treated as literals
# (the sentence below is only an illustrative value)
str_var_04 = "The sign says 'Keep out'"

print(str_var_04)


multiline_str= '\nThis is a ' + \
               'long string enclosed with ' + \
               'single quotation marks.'
print(multiline_str)

In addition, Python has several different types of strings each denoted differently:

* the letter **f** preceding a string means that it has formatting data in it.    
 *Example:* `f'There are {num} students in class today'`
* the letter **r** preceding a string means that it is a Raw String, in which backslashes are kept as they are. This is used in Regular Expressions which we will study later.    
 *Example:* `r'[\w.-]+@[a-z-]+(\.[a-z]{2,3}){1,2}$'`
* the letters **fr** preceding a string mean that it is a raw f-string.    
 *Example:* `fr'C:\Users\{user}\Downloads'`
* the letter **u** preceding a string means that it is a Unicode string. In Python 3 every `str` is already Unicode, so this prefix is kept only for compatibility with Python 2 code.    
 *Example:* `u'El Niño'`
* the letter **b** preceding a string means that it is a byte string (type `bytes`).    
 *Example:* `b'r\xc3\xa9sum\xc3\xa9'`
 
**Example 2: Different types of Strings in Python**

In [None]:
# string with formatting
num = 23
str_var_01 = f'There are {num} students in class today'
print(type(str_var_01))
print(str_var_01)

In [None]:
# raw string
str_var_01 = r'[\w.-]+@[a-z-]+(\.[a-z]{2,3}){1,2}$'
print(type(str_var_01))
print(str_var_01)

In [None]:
# raw formatted string
user = 'Tom'
str_var_01 = fr'C:\Users\{user}\Downloads'
print(type(str_var_01))
print(str_var_01)

In [None]:
# unicode string
str_var_01 = u'El Niño'
print(type(str_var_01))
print(str_var_01)

In [None]:
# byte string
str_var_01 = b'r\xc3\xa9sum\xc3\xa9'
print(type(str_var_01))
print(str_var_01.decode('utf-8'))

On top of the different string types, we also need to know how Strings are being translated into sequences of bits for the computer to read and this is done using **Character Encoding**. There are 2 main types of character encoding: ASCII and Unicode.

**ASCII**    
ASCII is a common encoding that consists of the lower and upper case English letters, digits, some punctuation, symbols, whitespace and some non-printable characters. Its most famous form is the [ASCII Table](http://www.asciitable.com/). However, the ASCII character encoding is too small to accommodate all of the world’s languages, dialects, symbols, and glyphs.

**Unicode**    
Unicode is a standard that comes with several character encoding schemes such as UTF-8, UTF-16 and UTF-32. The standard itself acts as a map of characters to code points (which are distinct non-negative integers), so it is able to contain virtually every character from every language, including non-printable ones. The most common scheme is `utf-8`. The difference between the schemes is the amount of memory used to store each character.

| ![utf.png](attachment:utf.png) |
|:---:|
| **Figure 2:** Differences between `utf-8`, `utf-16` and `utf-32` encoding schemes from [Javarevisited](https://javarevisited.blogspot.com/2015/02/difference-between-utf-8-utf-16-and-utf.html). |
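
To make the link between characters, code points and encoded sizes more concrete, here is a small sketch. The sample characters are arbitrary choices, and the `-le` (little-endian) variants of `utf-16` and `utf-32` are used only so that the byte counts exclude the byte order mark that the plain codec names prepend.

```python
samples = ['A', 'ñ', '€']                          # arbitrary sample characters
schemes = ('utf-8', 'utf-16-le', 'utf-32-le')      # '-le' avoids counting a byte order mark

sizes = {}
for ch in samples:
    code_point = ord(ch)                           # character -> code point (an integer)
    sizes[ch] = [len(ch.encode(scheme)) for scheme in schemes]
    print(f'{ch!r}: code point {code_point} (U+{code_point:04X}), '
          f'bytes per scheme {sizes[ch]}')

print(chr(241))                                    # code point -> character
```

`ord()` and `chr()` convert between a character and its code point, while `encode()` shows how many bytes each scheme needs for that same code point.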

A Python `str` holds Unicode characters. To turn it into bytes for storage or transmission we use the function `encode()`, and to turn bytes back into readable text we use `decode()`.

**Example 3: Encoding and decoding unicode characters**

In [None]:
str_var_01 = u'El Niño'.encode('utf-8')
print("Encode unicode characters for storage")
print(str_var_01)

print()
str_var_02 = b'El Ni\xc3\xb1o'.decode('utf-8')
print("Decode unicode characters for reading")
print(str_var_02)

The good thing is that Python 3 uses `utf-8` by default, but care needs to be taken when converting between the different encoding schemes, as the same character is encoded to different bytes in each of them.

**Example 4: Difference between `utf-8` and `utf-16`**

In [None]:
letters = "αβγδ"
rawdata = letters.encode("utf-8")

print('utf-8 encoding and decoding -------------')
print(rawdata.decode("utf-8"))

print('\nutf-8 encoding but utf-16 decoding ------')
print(rawdata.decode("utf-16"))

---
### Indexing & Slicing

From the previous day and the recap, we should have an understanding of indexing and slicing for `list`. Guess what? This concept also works in strings! The operators are also the same!

| ![strings.png](attachment:strings.png) |
|:---:|
| **Figure 3:**  Strings indexing from [webucator](https://www.webucator.com/how-to/how-index-strings-python.cfm). |

Slicing works exactly the same way as it does in lists. Slicing a string returns a new string that is a substring of the original string. In essence, Python treats a `string` object as a sequence of characters, much like a `list` without brackets, but bear in mind that a `string` is not a `list`.
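
A minimal sketch of the point above, using a made-up string `word`: indexing and slicing behave as they do for lists, but a string cannot be changed in place, so a new string has to be built instead.

```python
word = 'Python'              # illustrative string

first_char = word[0]         # indexing works like a list
middle = word[1:4]           # slicing returns a new string
backwards = word[::-1]       # negative step reverses the string

# Unlike a list, a string is immutable: item assignment raises a TypeError
try:
    word[0] = 'J'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead of modifying the original
new_word = 'J' + word[1:]

print(first_char, middle, backwards)
print(assignment_failed, word, new_word)
```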

**Recall**    
There is a `list` function called `extend()` that extends the content of a current `list` with the content of another `list` like so:

In [None]:
lst_01 = ['a', 'b', 'c']
lst_02 = [5, 8, 9]

lst_01.extend(lst_02)
print(lst_01)

What do you think would happen if, instead of using a 2nd `list`, we use a `string` object?

In [None]:
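lst_02 = [5, 8, 9]

# 'xyz' is only an illustrative value; each character is added as a separate element
lst_02.extend('xyz')
print(lst_02)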


### Exercise
Given the following text string, what is the output of each piece of code?

```python
str_var = 'Peter Piper picked a pack of pickled peppers'
#          01234567890123456789012345678901234567890123   <- ones digit of the index
#                    1         2         3         4      <- tens digit of the index
```

| Code | Output |
|:---|:---|
| `str_var[12:18]` | 'picked' |
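| `str_var[0:20:3]` | 'PePepk ' (every third character from index 0 up to, but not including, index 20) |
| `str_var[50]` | `IndexError`: string index out of range |
| `str_var[::-1]` | 'sreppep delkcip fo kcap a dekcip repiP reteP' (the string in reverse) |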
| `str_var[6]` |'P'  |
| `str_var[-7:]` | 'peppers' |
| `str_var[-15:-8]` |'pickled'  |

In [None]:
str_var = 'Peter Piper picked a pack of pickled peppers'


---
### Escape Characters

There are times when we need to display or use characters that have a special meaning inside a string, such as backslashes (`\`) in file paths or double quotation marks in quoted text. The solution is to use an escape sequence, which starts with the backslash character (`\`). 

The table below shows some of the commonly used escape characters for Python 3.

| Escape Sequence | Meaning |
|:---:|:---|
| `\\` | Backslash (`\`) |
| `\'` | Single quote (`'`) |
| `\"` | Double quote (`"`) |
| `\b` | ASCII Backspace |
| `\n` | ASCII Newline |
| `\t` | ASCII Horizontal Tab |
| `\r` | ASCII Carriage Return |

**Example 5: Usage of some escape characters**
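
A minimal sketch of how some of the escape sequences in the table behave. All of the strings below are made-up values for illustration.

```python
# Illustrative strings only; the values are invented for this sketch.
win_path = 'C:\\Users\\Tom\\Downloads'      # \\ gives one literal backslash
single = 'It\'s a dog'                      # \' inside single quotes
double = "She said \"hello\""               # \" inside double quotes
two_lines = 'first line\nsecond line'       # \n starts a new line
columns = 'name\tage'                       # \t inserts a horizontal tab

for text in (win_path, single, double, two_lines, columns):
    print(text)
```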

---
### Special Operators

Remember the operators that we learnt in chapter 2 (PY-L02)? Some of those operators can be used on strings.

| Operators | Description | Usage |
|:---:|:---|:---|
| `+` | Concatenate 2 strings. | `'Tom' + ' Jones'` |
| `*` | Repetition: repeats the string *n* times. | `'blah ' * 3` |
| `[n]` | Square brackets for the **index** of character in a string. | `str_var[9]` |
| `[m:n]` | Square brackets and colon for **slicing** of 1 or more characters from a string. | `str_var[9:13]` |
| `in` | Membership, checks if a certain character/string is in a string. Returns `True` or `False`. | `'one' in 'Hello everyone'` |
| `not in` | Opposite of `in`. Returns `True` or `False`. | `'world' not in 'Hello everyone'`|
| `%` | Legacy string formatting, which we will learn in the next section. | `'%d students' % 23` |
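
The operators in the table can be tried out with a short sketch like the one below. It reuses the usage snippets from the table, and `str_var` is assumed here to hold `'Hello everyone'`.

```python
str_var = 'Hello everyone'            # assumed value for this sketch

full_name = 'Tom' + ' Jones'          # concatenation
chant = 'blah ' * 3                   # repetition
one_char = str_var[9]                 # indexing a single character
part = str_var[9:13]                  # slicing several characters
has_one = 'one' in str_var            # membership
no_world = 'world' not in str_var     # negated membership

print(full_name)
print(chant)
print(one_char, part)
print(has_one, no_world)
```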


---
### String formatting

Think of string formatting as similar to working on one of those old newspaper printing presses, but with much less hassle.

| ![printing_press.jpg](attachment:printing_press.jpg) | ![printing_plates.jpg](attachment:printing_plates.jpg) |
|:---:|:---:|
| **Figure 4:** Printing press machine. | **Figure 5:** Printing plates. |

In programming, string formatting uses placeholders to define where to place substitutable data. The results are the messages (such as `input()` function prompts, error, warning and status messages, etc.) displayed on screen for the users. String formatting is also an important part of saving text to files.

There are 3 forms of string formatting in Python 3.
* using the `%` operator (legacy)
* using the `str.format()` function
* using `f-strings`

<br>

**String formatting using the `%` operator**     
This is the legacy style of string formatting from the C language where the `%` operator is used. As per the Python documentation, it is best **not to use** this form of string formatting.
> The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly).

Although this version of string formatting is discouraged, we do need to know some of the legacy conversion types, shown in the table below.

| Conversion | Description | Usage |
|:---:|:---|:---|
| `d` or `i` | Signed integer decimal. | `%d` or `%i` |
| `o` | Signed octal value. | `%o` |
| `x` | Signed hexadecimal (lowercase). | `%x` |
| `X` | Signed hexadecimal (uppercase). | `%X` |
| `e` or `E` | Floating point exponential format (lowercase and uppercase). Defaults to 6 decimal places. | `%.2e` |
| `f` or `F` | Floating point decimal format. Defaults to 6 decimal places. | `%.2f` |
| `s` | String. | `%s` |

**Example 6: Legacy string formatting**

In [2]:
print('%.2e' %1e4)
print('%.2f' %5.658594923)

1.00e+04
5.66


**String formatting using the `str.format()` function**     

This is the 2nd way strings can be formatted. It uses the curly brackets `{}`, either empty or with a variable name or a number (index) inside, as a placeholder, followed by `.format()` with its arguments at the end of the string. This style is **available from Python 2.6 onwards**.

The drawback of this method is that for strings with a lot of placeholders, arguments in the `.format()` portion can get long and cumbersome to keep track of.

**Example 7: String formatting with `str.format()`**

In [9]:
name = 'John'
age = 54
person = {'name':'Tom', 'age':85}

# method 1: first come, first served replacement
print("{},{}".format(5,4))

# method 2: replacement via index referencing
# 85 is index 0 and 'Tom' is index 1
print('name {1}, age {0}'.format(85, 'Tom'))

# method 3: replacement via variable names
print('name {name}, age {age}'.format(name=name, age=age))

# method 4: replacement via dictionary (2 ways)
print('name {name}'.format(name=person['name']))
print('name {name}, age {age}'.format(**person))

5,4
name Tom, age 85
name John, age 54
name Tom
name Tom, age 85


**String formatting using `f-strings`**      

This style was mentioned briefly at the start of the chapter where the letter `f` is placed at the start of a string. This style is also the preferred way of string formatting for **Python 3.6 onwards**. It uses a similar style compared to the previous `str.format()` function (it also uses the curly brackets `{}`) but it also extends the functionality by allowing expressions to be evaluated within its curly brackets.

The general syntax is dependent on what is being displayed:

1. `{<expression>}` - evaluates the expression before displaying the returned value. The expression can be a single variable, a function call or an arithmetic expression.
2. `{<expression>:010.4f}` - this has 2 parts, namely the evaluation of the expression and the style of formatting the result. The 2 parts are separated by the colon (`:`). In this case, the result is shown as a floating point number with 4 decimal places, padded with leading zeroes to a total width of 10 characters. The dot (`.`) of the floating point number is included in the 10 character count.

**Example 8: String formatting with `f-strings`**

In [16]:
# variables
var_01 = 5
var_02 = 10000000
var_03 = 7
dash = '-'
var_05 = 86.156749214

print(f'{dash*(var_01*var_03)}')   # for printing dividers
print(f'Scientific notation: {var_02:.3e}')
print(f'{dash*(var_01*var_03)}')   # for printing dividers

print(f'Zero padded number: {var_01:04}')
print(f'{dash*(var_01*var_03)}')   # for printing dividers

print(f'4 Decimal places number: {var_05*var_01:010.4f}')
print(f'{dash*(var_01*var_03)}')   # for printing dividers

print(f'This could be a very very very ' + \
      f'long string')

-----------------------------------
Scientific notation: 1.000e+07
-----------------------------------
Zero padded number: 0005
-----------------------------------
4 Decimal places number: 00430.7837
-----------------------------------
This could be a very very very long string


### Exercise

Given the following variables below:

```python
int_var = 30
str_var_01 = 'I scream, you scream, we all scream for ice cream'
str_var_02 = 'Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. Fuzzy Wuzzy wasn’t fuzzy, was he?'
dash = '-'
```

Produce the following output:

<pre>
------------------------------
Fuzzy Wuzzy screams for ice cream
Fuzzy Wuzzy had no hair cream
Was there 00030 ice creams or 045.0 bears?
------------------------------
</pre>

In [41]:
int_var = 30
str_var_01 = 'I scream, you scream, we all scream for ice cream'
str_var_02 = 'Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. Fuzzy Wuzzy wasn’t fuzzy, was he?'
dash = '-'

print(f"{dash*int_var}")
print(f"{str_var_02[:11]+' '+str_var_01[29:35]+'s'+str_var_01[35:]}")
print(f"{str_var_02[:11]}{str_var_02[35:47]}{str_var_01[43:]}")
# the extra 's' and the final '?' of str_var_02 complete the word "bears?"
print(f'Was there {int_var:05} {str_var_01[-9:]}s or {int_var*1.5:05.1f} {str_var_02[18:22]}s{str_var_02[-1]}')
print(f"{dash*int_var}")

------------------------------
Fuzzy Wuzzy screams for ice cream
Fuzzy Wuzzy had no hair cream
Was there 00030 ice creams or 045.0 bears?
------------------------------


---
### String Functions

As we have learnt from the section on *Standard Datatypes* in Python chapter 2, `Strings` are objects in Python, therefore they have their own attributes and behaviours. The Python [documentation](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) has a long list of functions available for strings, but we will be looking at some of the most commonly used ones.

1. **`len()`** - this is a built-in function that is not part of the string object, but it is widely used to get the length of iterable objects including strings. Returns an `int` for the length of the string.

 **Example of `len()` usage**

In [None]:
str_var = 'hello foo bar'
print(len(str_var))

2. **`split()`** - this is used for splitting a string based on a given delimiter and, optionally, a maximum number of splits. A delimiter is a sequence of one or more characters used for specifying the boundary between separate, independent regions in plain text. CSV (Comma Separated Values) files are the most common file type that uses delimiters to separate independent data. Returns a `list` of separated strings.

 **Example of `split()` usage**

In [None]:
sentence = "The, quick, brown, fox, jumps, over, the, fence"
# delimiter is a comma followed by a whitespace character
print(sentence.split(", "))  
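
The maximum number of splits mentioned above is not shown in the example, so here is a small sketch of it. The second string, `spaced`, is a made-up value used to contrast calling `split()` with and without an explicit delimiter.

```python
sentence = "The, quick, brown, fox, jumps, over, the, fence"

# Stop after 2 splits: the remainder of the text stays in the last element
first_two = sentence.split(", ", 2)
print(first_two)

spaced = "a  b   c"                   # illustrative string with uneven spacing
no_delim = spaced.split()             # no delimiter: splits on runs of whitespace
one_space = spaced.split(" ")         # explicit delimiter: empty strings appear
print(no_delim)
print(one_space)
```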

3. **`join()`** - this function concatenates a `List` of strings with the given separator. This separator must also be of type `String`. Returns a `string`. **Note** that the `.join()` function is called on the separator and not on the list of strings.

 **Example of `join()` usage**

In [None]:
str_list = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'fence']
# separator is a whitespace character
print(" ".join(str_list))

4. **`replace()`** - this function replaces *n* number of occurrences of an old string with a new string. By default all occurrences are replaced unless a maximum count is defined. Returns a `string`. This function works exactly like the "Find and Replace" function in any Word Processing software.

 **Example of `replace()` usage**

In [None]:
sentence = "The quick brown fox jumps over the fence"

# replace the colour of the fox
print(sentence.replace("brown", "red"))

# replace the first 3 whitespace in the string with dashes
print(sentence.replace(" ", "-", 3))

5. **`strip()`** or **`rstrip()`** or **`lstrip()`** - this family of functions strips the unwanted characters from both ends, from the right end or from the left end of a string, respectively. These functions are used mainly to remove characters used for "padding" a string. Returns a `string`.

 **Example of `strip()`, `rstrip()` and `lstrip()` usage**

In [None]:
str_val = '!!!!!!boo!!!!!!!'
print(str_val.strip('!'))
print(str_val.rstrip('!'))
print(str_val.lstrip('!'))
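
Two further behaviours of this family are worth a quick sketch, using made-up strings: called without an argument, `strip()` removes leading and trailing whitespace, and when given several characters it removes any combination of them from the ends while leaving the middle of the string untouched.

```python
padded = '   hello world \n'          # illustrative string padded with whitespace
clean = padded.strip()                # no argument: strips whitespace from both ends

mixed = 'xxyhelloyxx'                 # illustrative string padded with 'x' and 'y'
core = mixed.strip('xy')              # strips any mix of 'x' and 'y' from both ends

inner = '!!a!b!!'.strip('!')          # characters inside the string are kept

print(repr(clean))
print(core)
print(inner)
```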

6. **`find()`** or **`index()`** - these 2 functions have the same functionalities in that they are used to find the starting index of a given character or substring within the whole search string or a defined part of the search string but their return result differs when the given character or substring is not found within the search string.

 `find()` will return a `-1` when the string is not found whereas `index()` will raise a `ValueError` exception.

 Each of the `find()` and `index()` functions has a related `rfind()` or `rindex()` function that searches from the right and returns the index of the last occurrence. Note that there is **no function for searching in the left direction** as the normal `find()` and `index()` already start from the left.
 
 **Important Note**: Only use these methods if you need the index of the substring, not to check whether the substring is part of the text. Use the `in` operator for that purpose.
 
 **Example of `find()` and `index()` usage**

In [None]:
quote = "Now cracks a noble heart. Good-night, sweet prince; " + \
        "And flights of angels sing thee to thy rest."

print(quote.find("noble", 0, 30))
print(quote.rindex("angels"))
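
The difference in behaviour when the substring is not found can be sketched as below. The string `text` and the search terms are arbitrary choices for illustration.

```python
text = 'Good-night, sweet prince'   # illustrative search string

found_at = text.find('sweet')       # index of the first match
missing = text.find('king')         # find() signals "not found" with -1
first_e = text.find('e')            # searches from the left
last_e = text.rfind('e')            # rfind() reports the last occurrence

try:
    text.index('king')              # index() raises instead of returning -1
    index_raised = False
except ValueError:
    index_raised = True

print(found_at, missing, first_e, last_e, index_raised)
print('prince' in text)             # use 'in' when only a yes/no answer is needed
```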

7. **`lower()`** and **`upper()`** - these functions are used to change a string to fully lower case or fully upper case characters. They are especially useful for string comparisons between externally sourced string data and the string data used internally for processing. 

 **Example of `lower()` and `upper()` usage**

In [None]:
str_val = 'BanKing InDustrY'

print(str_val.lower())
print(str_val.upper())
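
A minimal sketch of the comparison use case described above. Both `user_input` and `accepted` are hypothetical values standing in for externally sourced data and an internal reference string.

```python
user_input = 'YeS'                    # hypothetical externally sourced value
accepted = 'yes'                      # hypothetical internal reference value

direct_match = user_input == accepted             # case-sensitive comparison
normalised_match = user_input.lower() == accepted # compare after lowering the case

print(direct_match)
print(normalised_match)
```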

### Exercise

Given the following variable below:

```python
sentence = "The quick brown fox jumps over the fence"
```

Produce the following output:
<pre>
The *-* slow *-* brown *-* FOX *-* climbs *-* under *-* the *-* fence
</pre>

In [44]:
sentence = "The quick brown fox jumps over the fence"
# strings are immutable, so replace() returns a new string that must be assigned
sentence = sentence.replace('quick','slow')
sentence = sentence.replace('jumps over', 'climbs under')
split_sen = sentence.split()
split_sen[3] = split_sen[3].upper()
print(' *-* '.join(split_sen))


The *-* slow *-* brown *-* FOX *-* climbs *-* under *-* the *-* fence


---
### String with Conditionals and Loops

Now that we have gained an understanding of strings and what can be done with them, we are going to combine them with conditionals and looping statements.

Strings are often used in `if` statements to validate input from the user, the terminal or from externally sourced data. The operators most often used are the comparison equality (`==`) and the membership `in` & `not in` operators.

**Example 9: Strings with the `if` statements**

In [45]:
def get_user_input():
    '''
    Function to check for user input
    '''
    list_of_words = ['actor','courage','revenue','platform','childhood','promotion','drama','effort','homework',
                'patience','storage','discussion','pollution','session','requirement','lab','client','definition',
                'organization','region','instance','confusion','exam','quantity','association','argument','opportunity',
                'ability','virus','mood']
    
    print(f'List of words:\n{", ".join(list_of_words)}\n')
    user_word = input('Enter a word from the list of words: ')
    
    if user_word.lower() in list_of_words:
        print('Open Sesame')
    else:
        print("The genie will not like you.")

get_user_input()

List of words:
actor, courage, revenue, platform, childhood, promotion, drama, effort, homework, patience, storage, discussion, pollution, session, requirement, lab, client, definition, organization, region, instance, confusion, exam, quantity, association, argument, opportunity, ability, virus, mood

Enter a word from the list of words: happy
The genie will not like you.


**Example 10: Strings with the looping statements**

In [46]:
def count_str(substr, text, delim=' '):
    '''
    Function to count the number of occurrences of a particular substring
    Input:
        substr - substring to count
        text - text in which to search for the substring
        delim - delimiter used in the text
    Return:
        integer count of the substring
    '''
    cnt = 0
    s_text = text.split(delim)
    for word in s_text:
        if substr in word: # Use 'in' and not '==' because sea is a substring
            cnt+=1
            
    return cnt
    

text = '''She sells seashells on the seashore.
The shells she sells are seashells, I’m sure.
And if she sells seashells on the seashore,
Then I’m sure she sells seashore shells.'''

print(text)
print()
print(f'The number of occurrences of the word "sea" is {count_str("sea", text)}')
print(f'The number of occurrences of the word "sells" is {count_str("sells", text)}')


She sells seashells on the seashore.
The shells she sells are seashells, I’m sure.
And if she sells seashells on the seashore,
Then I’m sure she sells seashore shells.

The number of occurrences of the word "sea" is 6
The number of occurrences of the word "sells" is 4
