# Module 2, Part 2: Python Strings and Lists

This module consists of 3 parts:

- **Part 1** - Introduction to Python.
- **Part 2** - Python Strings and Lists.
- **Part 3** - Python Tuples, Dictionaries, Reading data from a file, Formatting print output.

This is **Part 2** of the module's notebooks.

## Strings

A __string__ is an ordered sequence of characters; its type is called __str__. In Python, a string is enclosed in single (') or double (") quotes. Triple quotes (''' or """) can be used to enclose strings that span multiple lines. Strings can contain letters, numbers or special characters. They can also be empty, e.g. `""`, or contain only a space character, `" "`. Individual characters in a string can be accessed by index, with the first character at index 0.

Characters can also be accessed with a negative index. Negative indices count from the right-hand side, with the last character at index -1, the second-last at index -2, and so on.

In [1]:
s1 = "Python is the Number 1 choice for social media hacking"
print(s1[0])
print(s1[1])
s1[-1]


P
y


'g'
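
As a quick check of the indexing rules above, the sketch below uses a short sample string (the name `word` is chosen here for illustration and is not part of the notebook). It shows that the negative index `-1` refers to the same character as `len(word) - 1`, and that an index past the end raises an `IndexError`.

```python
# Hypothetical sample string, used only for illustration.
word = "Python"

first = word[0]                   # index 0 is the first character
last = word[-1]                   # index -1 is the last character
same_last = word[len(word) - 1]   # the equivalent non-negative index

print(first, last, same_last)

# An index outside the valid range raises IndexError.
try:
    word[len(word)]
except IndexError as err:
    print("IndexError:", err)
```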

Python provides many built-in functions and methods for working with strings. One of them is the built-in function `len()`, which returns the number of characters in a string.

In [2]:
len(s1)

54

### Slicing

A slice is a segment of a string from the start index up to, but not including, the end index.

In [3]:
s1[2:6]

'thon'

The end index can be omitted, in which case it defaults to the length of the string. For example, `s1[2:]` is equivalent to `s1[2:len(s1)]`: the substring from index 2 to the end of the string. Similarly, if the start index is omitted, the slice starts at index 0.

In [None]:
s1[:2]
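
The slicing defaults described above can be tried on a short sample string. This is a minimal sketch; `text` is a hypothetical name used only here. It also shows that a slice, unlike a single index, quietly clips an end index that is too large.

```python
text = "Python"   # hypothetical sample string

head = text[:2]      # start omitted: slice begins at index 0
tail = text[2:]      # end omitted: slice runs to the end of the string
middle = text[2:6]   # explicit start and end (end index is excluded)
print(head, tail, middle)

# Splitting a string at any index and joining the halves restores the original.
rejoined = text[:2] + text[2:]
print(rejoined == text)

# An end index beyond the string length is clipped rather than raising an error.
clipped = text[2:100]
print(clipped)
```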

In Python, strings are __immutable__: once created, a string cannot be changed. Instead, a new string must be created, which may contain a slice or slices of the existing string. For example, the string `'get'` cannot be changed to `'got'` in place, and `"I learn Python"` cannot be changed to `"I learnt Python"` in place.

In [4]:
s2 = "get"
s2[1] = "o"

TypeError: 'str' object does not support item assignment

In [5]:
s2.replace('e', 'o')

'''Note that this does not change the s2 variable.
Instead, a new object is returned,
which can be assigned to a new variable.''' 

'Note that this does not change the s2 variable.\nInstead, a new object is returned,\nwhich can be assigned to a new variable.'

In [6]:
s3 = s2[0] + "o" + s2[2]
print(s3)

'''Note that s2 was not modified'''

print(s2)

'''Similarly'''
learn_str = "I learn Python"
learn_str[:7] + "t" + " " + learn_str[8:]

got
get


'I learnt Python'

### String operators

In the last example, the `+` operator was used. This operator, when used between two strings, __concatenates__ them: `"abc123" + "def456"` returns `"abc123def456"`. A string can also be multiplied by an integer: `str * int` concatenates `int` copies of `str`. A string cannot be multiplied by another string, however; that raises a `TypeError`, as do other arithmetic operators such as `-` and `/`. (The `%` operator is an exception: on strings it performs formatting rather than arithmetic.)

Strings can be __compared__ using the equality (`==`) and inequality (`!=`) operators. The operators `>` and `<` compare two strings character by character in dictionary (lexicographic) order, not by length: `'a' < 'b'` returns `True`, and `'aaaaa' < 'bbb'` is also `True`. The ordering is based on character codes, so all capital letters are "less" than lowercase letters.

The operator `in` returns `True` if a substring appears anywhere inside a string.


In [7]:
print('a'<'A')
print('b' > 'aaaaa')
'on' in 'Python'

False
True


True
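
The operator rules above can be checked with a few small expressions. The sketch below uses made-up strings; the built-in `ord()` function, which is not covered in this notebook, is used only to show the character codes behind the comparison.

```python
joined = "abc123" + "def456"   # concatenation
repeated = "ab" * 3            # repetition by an integer
print(joined, repeated)

# Multiplying a string by another string is not defined.
try:
    "ab" * "cd"
except TypeError as err:
    print("TypeError:", err)

# Comparison uses character codes, so capital letters sort before lowercase ones.
print(ord("A"), ord("a"))
capital_first = "Zebra" < "apple"   # 'Z' has a smaller code than 'a'
print(capital_first)
```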

We have seen the function `len()` above. To list the methods available for strings, call `dir(str)`. It returns a list of the names of all attributes and methods of the `str` type:

In [8]:
'''print() function is used for a better printout formatting'''

print(dir(str))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


To learn more about a particular function, use the `help()` function or, in Jupyter/IPython, type `?` before the name of the function.  

For example, to find information about the `find` method, one can call the `help()` function: `help(str.find)` or type `?str.find`. Here, `str` indicates that we are looking for the method applicable to string objects.

In [9]:
help(str.find)

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.



__NOTE:__ Square brackets in a function signature indicate optional arguments. In the signature of the `find` method, the start and end indices are optional. The nesting of the brackets shows that the method can be called with the start index only, omitting the end index, but not with an end index alone.
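
To make the optional arguments concrete, here is a minimal sketch that calls `find` with zero, one and two optional arguments. The sample string `sentence` is an assumption made for this example only.

```python
sentence = "social media"   # hypothetical sample string

first_hit = sentence.find("ia")          # search the whole string
later_hit = sentence.find("ia", 4)       # search from index 4 onwards
no_hit = sentence.find("ia", 4, 8)       # search only within sentence[4:8]
missing = sentence.find("xyz")           # substring not present

print(first_hit, later_hit, no_hit, missing)
```

A result of `-1` means "not found"; it is a signal value, not a valid position to index with.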

### Traversing a string

Often we need to access each item in a string exactly once.
The `for` loop provides a compact way to do this. In the example below, the `for` loop selects each character in the string `s1` in turn and executes the body of the loop for that character.

In [10]:
'''Traversing a string'''

for item in s1:
    if item in 'aeiouAEIOU':
        print(item)

o
i
e
u
e
o
i
e
o
o
i
a
e
i
a
a
i
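
A traversal can also accumulate a result instead of printing each character. The following sketch, using a hypothetical sample string `phrase`, counts the vowels and collects them into a new string.

```python
phrase = "social media"   # hypothetical sample string

vowel_count = 0
vowels_found = ""
for ch in phrase:
    if ch in "aeiouAEIOU":
        vowel_count += 1
        # Strings are immutable, so this builds a new string each time.
        vowels_found = vowels_found + ch

print(vowel_count, vowels_found)
```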


In [None]:
help(str.split)

### **EXERCISE 2:** Index of the second occurrence

The method `find` can be used in an expression like `s1.find(s2)` to find the index of the first occurrence of the substring `s2` in the string `s1`.  

This method also takes an optional start argument, and the expression `s1.find(s2, start)` will return the index of the first occurrence of the `s2` substring at or after the start index.

Write an expression to produce the index of the second occurrence of the `s2` substring in the `s1` string. 

In [None]:
# Type your code here

In [11]:
s1

'Python is the Number 1 choice for social media hacking'

In [14]:
s2 = 'ia'

In [15]:
s1.find(s2)

37

In [17]:
s1.find(s2,s1.find(s2)+1)

44
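
The same idea can be repeated to reach the third, fourth or any later occurrence. Below is one possible sketch; the helper name `find_nth` is hypothetical and is not a built-in string method.

```python
def find_nth(text, sub, n):
    """Return the index of the n-th occurrence of sub in text, or -1 if there is none.

    Hypothetical helper written for illustration; n counts from 1.
    """
    index = -1
    for _ in range(n):
        # Continue the search just past the previous match.
        index = text.find(sub, index + 1)
        if index == -1:
            break
    return index


s1 = "Python is the Number 1 choice for social media hacking"
print(find_nth(s1, "ia", 1))
print(find_nth(s1, "ia", 2))
print(find_nth(s1, "ia", 3))
```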

## Lists

The `split` method, looked up with `help()` at the end of the previous section, returns a list. A **list** is a sequence of values, similar to a string. While a string is a sequence of characters, a list is a collection of values of any type. The values in a list are called __elements__ or __items__. Lists are versatile, and are sometimes called the "workhorses" of Python. 

Lists can even contain other lists as elements, which is called nesting. The easiest way to create a list is to enclose a comma-separated sequence of values in square brackets `[  ]`. Just like strings, lists can be empty. Working with lists is similar to working with strings. For example, the function `len()` returns the length of the list. Indexing and slicing a list also work much as they do for strings.

In [21]:
'''List examples'''

empty_list = []
list_of_numbers = [1, 2.3, 4.5, 6]
list_of_strings = [""]
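
The points about mixed types, nesting, indexing and slicing can be seen together in one small example. This is a sketch only; the list `mixed` is made up for illustration.

```python
# Hypothetical list holding values of different types, including another list.
mixed = [1, "two", 3.0, [4, 5]]

length = len(mixed)          # the nested list counts as one element
nested = mixed[-1]           # negative indices work as they do for strings
inner_first = mixed[-1][0]   # index again to reach inside the nested list
middle = mixed[1:3]          # slicing returns a new list

print(length, nested, inner_first, middle)
```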

In [18]:
'''A string can be split into a list with `split` method.
For the string 's1' a space (" ") can be used as a separator.'''

s1.split(" ")

['Python',
 'is',
 'the',
 'Number',
 '1',
 'choice',
 'for',
 'social',
 'media',
 'hacking']

In [20]:
'''Execute this cell to see a list
of all methods available for a list object.'''

dir(list)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Unlike strings, lists are __mutable__. Any item in a list can be changed; see example below: 

In [22]:
list_of_numbers[2] = 100
list_of_numbers

[1, 2.3, 100, 6]

Python provides several methods that modify lists:    

| __Method__ | __Description__ |
|:--- |:---|
| list.append(item) | Append _item_ to the end of the list. |
| list.extend(list_1) | Append each of the items in `list_1` to the list. |
| list.pop([index]) | Remove and return the item at _index_, or the last item if no index is given. |
| list.remove(item) | Remove the first occurrence of _item_. |
| list.reverse() | Reverse the list in place. |
| list.insert(index, item) | Insert _item_ at the given index; the following items are shifted to the right. |
| list.sort() | Sort the list in place; `reverse=True` sorts in descending order. |

In [23]:
list1 = [1,2,3,4]
list1.append(5)
list1.sort(reverse=True)
list1


[5, 4, 3, 2, 1]

In [24]:
list1.remove(3)

In [25]:
list1

[5, 4, 2, 1]

In [26]:
list2 = [11, 12]
list1.extend(list2)
list1

[5, 4, 2, 1, 11, 12]
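
The cells above do not exercise `insert`, `pop` or `reverse`, and the difference between `append` and `extend` is easy to miss. The sketch below walks through them on a made-up list named `letters`; the comments show the list after each step.

```python
letters = ["a", "b", "c"]   # hypothetical sample list

letters.insert(1, "x")      # ['a', 'x', 'b', 'c']
last = letters.pop()        # removes and returns 'c' -> ['a', 'x', 'b']
second = letters.pop(1)     # removes and returns 'x' -> ['a', 'b']

# In-place methods such as reverse() modify the list and return None.
returned = letters.reverse()   # ['b', 'a']

letters.append(["y", "z"])  # adds the whole list as ONE nested element
letters.extend(["y", "z"])  # adds each item separately

print(last, second, returned)
print(letters)
```

Because the in-place methods return `None`, writing `letters = letters.reverse()` would lose the list.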

### Traversing a list

The `for` loop can be used to access each item in a list, one at a time.  

The structure of the code:

    for elem in list_A:     
        ...

Also, one can traverse over a list using indices. This can be done with two built-in functions: `len()` and `range()`.  

The documentation for `range()` states that the syntax for the function is as follows (Python Software Foundation, 2018):

    range([start,] stop[, step])
    
`range()` returns a lazily evaluated sequence of numbers from `start` up to, but not including, `stop`, in increments of `step`.

Thus, the `range()` function is useful for iterating over a sequence of numbers. The `start` and `step` arguments are optional; a call with only one argument returns a sequence from 0 up to, but not including, the stop value.
* `range(5)` returns the sequence 0, 1, 2, 3, and 4;
* `range(1,4)` returns 1, 2, 3;
* `range(1, 10, 3)` returns 1, 4, 7.
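
Because `range()` does not build its numbers up front, wrapping it in `list()` is a convenient way to inspect them. The following sketch checks the three calls listed above and adds one assumed extra case with a negative step.

```python
up_to_five = list(range(5))          # only stop given
one_to_three = list(range(1, 4))     # start and stop
every_third = list(range(1, 10, 3))  # start, stop and step
countdown = list(range(5, 0, -1))    # extra case: a negative step counts down

print(up_to_five)
print(one_to_three)
print(every_third)
print(countdown)
```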

Combining the `range()` and `len()` functions, we can loop over a list by index:

    for i in range(len(list_A)):     
        ...
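
Looping by index matters mainly when the list itself has to be updated. The sketch below, using a hypothetical list `prices`, contrasts the two traversal styles: reassigning the loop variable leaves the list untouched, while assigning through an index changes it.

```python
prices = [10, 20, 30]   # hypothetical sample list

# Traversing by element: rebinding `price` does not modify the list.
for price in prices:
    price = price * 2
after_element_loop = prices[:]   # a copy of the list at this point

# Traversing by index: assigning to prices[i] updates the list in place.
for i in range(len(prices)):
    prices[i] = prices[i] * 2

print(after_element_loop)
print(prices)
```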

### __EXERCISE 3:__  Lists

1). Generate a list of integers from 0 to 5, reverse the list and print it out. Explain how you reversed the list.     
**Hint:** the `append()` method will help you to generate the list by adding one integer to the list at a time.

In [None]:
# Type your code here

In [63]:
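int_list = []   # avoid the name `list`, which would shadow the built-in type
for i in range(6):
    int_list.append(i)

int_list


[0, 1, 2, 3, 4, 5]

In [64]:
# Append the items in reverse order after the originals,
# then slice off the reversed half.
for i in range(len(int_list)):
    int_list.append(5 - i)

int_list[6:]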

[5, 4, 3, 2, 1, 0]

2). Generate a list of strings. Add the items from the list of strings to the list of integers from part 1.
Explain your choice of list method. 

In [None]:
# Type your code here

### List Comprehension

**List comprehension** in Python provides an easy and elegant way of creating a new list. A common application of list comprehension is to create a new list based on an existing sequence(s), e.g. one or more lists. While creating a new list with list comprehension, we might want to include only certain elements that satisfy a certain condition, or transform elements of an original list using an operation, or a calculation, applied to these elements.

For example, we might have a list of numbers, `nums`. We need to create a new list, let's call it `squares`, where each element is a square of the corresponding element from the list `nums`. We can write a `for` loop which will look as follows:

In [None]:
nums = [0, 1, 2, 3, 4, 5, 6, 7]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares) 

However, a simpler and more elegant way is to write the same loop in one line of code using a list comprehension:

In [None]:
squares = [x ** 2 for x in nums]
print(squares)

In [None]:
'''Another example: create a new list from 2 lists,
keeping only the numbers that are common to both lists, listA and listB'''

listA = [15, 35, 76, 83, 910, 1234]
listB = [1234, 234, 83, 3, 4, 5]

new_list = []

for a in listA:
    for b in listB:
        if a == b: 
            new_list.append(a)

print(new_list) 

In [None]:
'''The same result achieved using list comprehension:'''

[a for a in listA for b in listB if a == b]
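
One possible alternative, sketched here under the assumption that each common value should appear only as often as it does in the first list, is to test membership with the `in` operator instead of a second `for` clause. The lists `a_vals` and `b_vals` are made up to show where the two versions differ.

```python
a_vals = [1, 2, 3]   # hypothetical lists; note the duplicate 2 in b_vals
b_vals = [2, 2, 3]

# Nested version: one output item for every matching (a, b) pair.
nested = [a for a in a_vals for b in b_vals if a == b]

# Membership version: each item of a_vals is kept at most once.
membership = [a for a in a_vals if a in b_vals]

print(nested)
print(membership)
```

When neither list contains duplicates, as with `listA` and `listB` above, both versions give the same result.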

In [None]:
'''Another example:
return a list of doubled numbers only if the number is an odd number'''

[n * 2 for n in listA if n % 2 == 1]

The general structure of a list comprehension can be written as follows:

    [<output expression> <loop expression <input expression>> <optional predicate expression>]
    
A list comprehension is always enclosed in square brackets. It starts with an expression followed by a `for` clause, then zero or more additional `for` or `if` clauses. 

A list comprehension can also return a list of pairs, or tuples. We will learn about tuples in the next part of this module. For now, to wrap up the discussion of list comprehensions, here is how to create a list of numbers paired with their squares:
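
To see how the clauses map onto ordinary loops, the sketch below writes one comprehension with two `for` clauses and an `if` clause, followed by the equivalent nested loops. The values are arbitrary and chosen only for illustration; the clauses are read left to right, outermost loop first.

```python
# Comprehension: output expression, two for clauses, one if clause.
products = [x * y for x in [1, 2, 3] for y in [10, 20] if x != 2]

# Equivalent nested loops, in the same left-to-right order.
products_loop = []
for x in [1, 2, 3]:
    for y in [10, 20]:
        if x != 2:
            products_loop.append(x * y)

print(products)
print(products == products_loop)
```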

In [None]:
[(x, x**2) for x in range(6)]

---

__End of Part 2.__


This notebook makes up one part of this module. Now that you have completed this part, please proceed to the last notebook in this module.

---


__References__


McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (pp. 15-50). O'Reilly Media.

Python Software Foundation. (2018). Built-in Functions. Retrieved from https://docs.python.org/3/library/functions.html#func-range