# Lesson 3: Errors, strings and readable code

## Recall from last week

### Arithmetic
We can do basic mathematical operations in Python with `+`, `-`, `*`, `/`, `**`, `//` and `%`.

We can call other mathematical functions and operations from the `math` module, e.g. `math.sqrt(x)`.
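
As a quick refresher, here is a small sketch that uses each operator once. The variable names and numbers are only illustrative.

```python
import math

a = 7
b = 2

total = a + b              # addition: 9
difference = a - b         # subtraction: 5
product = a * b            # multiplication: 14
quotient = a / b           # division: 3.5
power = a ** b             # exponentiation: 49
floor_quotient = a // b    # floor division: 3
remainder = a % b          # modulo (remainder): 1
root = math.sqrt(49)       # square root from the math module: 7.0

print(total, difference, product, quotient, power, floor_quotient, remainder, root)
```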

### Flow control: _if_-statements
We can control the flow of a program by making it do certain things only when a condition is met.

These conditions are written as boolean expressions with a truth value.

Such expressions can be combined into more complex expressions with `and`, `or` and `not`.
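
A minimal sketch of combined conditions in an _if_-statement. The variables `temperature` and `is_raining` are made-up example values, not something from last week's material.

```python
temperature = 18    # hypothetical example value
is_raining = False  # hypothetical example value

if temperature > 15 and not is_raining:
    # both parts must be True: it is warm AND it is not raining
    advice = 'go for a walk'
elif temperature > 15 or is_raining:
    # at least one part must be True
    advice = 'bring a jacket or an umbrella'
else:
    advice = 'stay inside'

print(advice)
```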

# Errors and how to handle them

```
  File "<ipython-input-1-7a8a49ad5eea>", line 11
    for item in my_list
                      ^
SyntaxError: invalid syntax
```

## Syntax errors

Syntax errors are things in the code which Python does not understand.

They are thrown before the code is executed.

They are often caused by, for example:

- wrong use of operators
- missing or superfluous brackets, e.g. `print('a string'` or `print('a string'))`
- wrong indentation
- stray symbols from deleted code
- missing symbols

In [1]:
for i in range(10)
    print(i)

SyntaxError: invalid syntax (<ipython-input-1-7a8a49ad5eea>, line 1)
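
One way to see that syntax errors are found before anything runs is sketched below. The broken code is kept in a string so that the example itself stays runnable, and the built-in `compile()` is used to check it without executing it. Note that `try`/`except` has not been introduced in this lesson; it is only used here to keep the example from stopping. The names `source` and `outcome` are illustrative.

```python
# the first line of the source is fine, the second is missing a colon
source = "print('this line is fine')\nfor i in range(10)\n    print(i)\n"

try:
    # compile() only checks and translates the code; it does not run it
    compile(source, '<example>', 'exec')
    outcome = 'no syntax error'
except SyntaxError as error:
    # nothing from the source has been printed at this point
    outcome = 'SyntaxError on line ' + str(error.lineno)

print(outcome)
```

Even though the first line of the source is valid, it is never executed: the whole piece of code is rejected before it starts.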

## Runtime errors

Runtime errors occur when Python tries to do something which it cannot do or is not allowed to do.

They are thrown during execution of the code.

There are numerous kinds of runtime errors, depending on the thing that goes wrong.

An example is `TypeError` which occurs when the type of a variable's value does not allow for a certain operation.

In [2]:
first_number = '23'
second_number = 54
print(first_number + second_number)

TypeError: must be str, not int

In [3]:
for i in second_number:
    print(i)

TypeError: 'int' object is not iterable

## What to do?

Look at the error message and see if you can work out what is going wrong.

Python gives three hints: line number, error type and error message.

The internet often has answers, but it can also provide overly technical answers which can be confusing.
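
As an illustration, the `TypeError` from adding `'23'` and `54` tells us that the two types do not fit together. A possible fix, depending on what we actually want, is to convert one of the values first. This is only a sketch; the names `number_sum` and `glued` are made up for the example.

```python
first_number = '23'
second_number = 54

# option 1: turn the string into a number, then add
number_sum = int(first_number) + second_number

# option 2: turn the number into a string, then concatenate
glued = first_number + str(second_number)

print(number_sum)  # 77
print(glued)       # 2354
```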

# Strings

## The basics

Strings are created as
- `'a string'`
- `"another string"`
- ```"""a string
which spans multiple lines"""```

They can be **concatenated** (= glued together) with `+`, e.g. `'monty' + ' ' + 'python'` -> `'monty python'`

They can also be **repeated** (= multiplied) with `*`, e.g. `'-' * 5` -> `'-----'`
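
Both operators can be combined, for instance to underline a piece of text. A small sketch with illustrative variable names:

```python
first_name = 'monty'
last_name = 'python'

# glue the parts together with a space in between
full_name = first_name + ' ' + last_name

# repeat the dash once per character in the name
underline = '-' * len(full_name)

print(full_name)
print(underline)
```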

## What is a string ***really***?

A string is basically a **list** of characters (though there are some differences)

Therefore, we can do a lot of things with strings which we can also do with lists:

### The characters of a string have indices

This means we can point to certain characters:

In [4]:
# indices start at 0!
# Why? Because that's how computer scientists like it
a_string = 'hello'
print('the character with index [0]:', a_string[0])
print('the character with index [2]:', a_string[2])

the character with index [0]: h
the character with index [2]: l


In [5]:
# we can also index from the end of the string
# here it begins with -1 (can someone say why?)
another_string = 'how are you?'
print('the character with index [-1]:', another_string[-1])
print('the character with index [-6]:', another_string[-6])

the character with index [-1]: ?
the character with index [-6]: e


### Getting **slices** of strings

We can point to the endpoints of a certain section of a string:

In [6]:
a_new_string = 'we are doing fine'
print('the slice with indices [3:8]:', a_new_string[3:8])

the slice with indices [3:8]: are d


In [7]:
# we can leave out an endpoint
print('the slice with indices [:9]:', a_new_string[:9])
print('the slice with indices [5:]:', a_new_string[5:])

the slice with indices [:9]: we are do
the slice with indices [5:]: e doing fine


Notice which characters are included and which are not. What is going on?

Does this remind you of something?
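
One way to explore these questions is to compare a slice with `range()`, which treats its endpoints in the same way. The following is a sketch for experimenting, with illustrative names `indices` and `collected`:

```python
a_new_string = 'we are doing fine'

# range(3, 8) produces 3, 4, 5, 6, 7: the start is included, the stop is not
indices = list(range(3, 8))

# collect the characters at exactly those indices
collected = ''
for i in indices:
    collected += a_new_string[i]

print(indices)
print(collected)
print(collected == a_new_string[3:8])
```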

### The _in_ operator

This was mentioned last time, but we didn't do much with it. Let's see how it works!

In [8]:
# if the character occurs in the string, we get True
'a' in 'a string'

True

In [9]:
# it can also be sequences of characters
'es' in 'test'

True

In [10]:
# the actual sequence matters
'tt' in 'two unvoiced alveolar stops'

False
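
Since `in` gives a truth value, it can be used directly as the condition of an _if_-statement. A small sketch, where `message` and `has_no_tt` are illustrative names:

```python
sentence = 'two unvoiced alveolar stops'

# the expression after "if" is either True or False
if 'stop' in sentence:
    message = 'found it'
else:
    message = 'not there'

# "not in" asks the opposite question
has_no_tt = 'tt' not in sentence

print(message)
print(has_no_tt)
```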

### Iterating over a string

Just as with lists, we can iterate over a string:

In [11]:
for char in a_string:
    print(char)

h
e
l
l
o


In [12]:
# what's going on here?
for i in range(3):
    print(a_string[i])

h
e
l
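
Iteration becomes useful when we do something with each character. As a sketch, the loop below counts the vowels in a string by combining a `for` loop with the `in` operator. The counter name `n_vowels` is illustrative, and only the five letters `aeiou` are treated as vowels here.

```python
a_string = 'hello'

n_vowels = 0
for char in a_string:
    # check each character against a string of vowels
    if char in 'aeiou':
        n_vowels += 1

print(n_vowels)  # 2
```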


## The **methods** in a string

A string is a so-called **object** with internal **methods**.

These methods can be accessed with the **dot-operator** `.`:

In [13]:
a_capitalized_string = 'I AM YELLING!'
a_capitalized_string.lower()

'i am yelling!'

What happens is that, from within the string itself, a **new** string is created and **returned**. The string is not changed internally.

What does this mean?

In [14]:
# the value of the previous variable is still the original value
a_capitalized_string

'I AM YELLING!'

In [15]:
# we can save the returned string in another variable
a_decapitalized_string = a_capitalized_string.lower()
a_decapitalized_string

'i am yelling!'

In [16]:
# still, this does not change the internal state of the original string
a_capitalized_string

'I AM YELLING!'
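
If we do want the variable itself to hold the lowercase version, one option is to assign the returned string back to the same name. The original string object is still not changed; the variable simply points to the new string afterwards.

```python
a_capitalized_string = 'I AM YELLING!'

# overwrite the variable with the string returned by the method
a_capitalized_string = a_capitalized_string.lower()

print(a_capitalized_string)  # i am yelling!
```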

The methods are many! They can return different kinds of values. Some even take an input.

In [17]:
# tells whether the string is made up solely of alphabetic characters
a_string.isalpha()

True

In [18]:
# find the first occurrence of a character
# the return value is the index of the character
a_string.find('e')

1

In [19]:
a_string.endswith('o')

True
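
A few more built-in string methods, as a small sampler rather than a complete list. The variable names on the left are illustrative.

```python
a_string = 'hello'

shouted = a_string.upper()                # returns a new string in all caps
n_l = a_string.count('l')                 # how many times 'l' occurs
swapped = a_string.replace('l', 'L')      # takes two inputs: old and new
starts = a_string.startswith('he')        # True or False
missing = a_string.find('z')              # find() returns -1 if nothing is found
words = 'we are doing fine'.split()       # returns a list of strings

print(shouted, n_l, swapped, starts, missing, words)
```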

# Readable code

Any good guesses on what's going on here?

In [20]:
from string import punctuation

var = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."

print(
    sum(
        len(w) for w in ''.join(
            c for c in var if c not in punctuation
        ).split()
    )
    / len(var.split())
)

5.21978021978022


What about now?

In [21]:
from string import punctuation

# this short program prints the mean number of characters per word

# load some text
text = var # we'll use the text from before

# clean the text from punctuation
clean_text = ''
for char in text:
    # add the char only if it's not punctuation
    if char not in punctuation:
        clean_text += char

# make the text into a list
tokens = clean_text.split()

# calculate total number of characters
n_chars = 0
for token in tokens:
    n_chars += len(token)
    
# the number of words
n_words = len(tokens)

# report mean number of characters per word
print('Mean length of words in the text:', round(n_chars / n_words, 2))

Mean length of words in the text: 5.22


## Exercise 1, ch. 6

Compare your code from this exercise.

First discuss how it can be done.

Then, discuss how it can be made clear what is going on.

In [None]:
user_string = input('Please, provide a string: ')

# we use len() to retrieve the number of characters
print('The total number of characters in the string:', len(user_string))

# we can repeat strings with *
print('The string repeated 10 times:')
print(user_string * 10)

# retrieve the char with index 0
print('The first character of the string:', user_string[0])

# use slicing to get the first three chars of the string
print('The first three characters of the string:', user_string[:3])

# and the last three
print('The last three characters of the string:', user_string[-3:])

# loop over the string backwards by backwards indexing with a decreasing range
print('The string backwards: ', end='')
for i in range(-1, -(len(user_string) + 1), -1):
    print(user_string[i], end='')
print()  # just for newline

# or we can cheat
print('The string backwards (by cheating):', ''.join(reversed(user_string)))

# check if there are at least seven characters. If so, print the seventh; otherwise say so
if len(user_string) >= 7:
    print('The seventh character of the string: ', user_string[6])
else:
    print('The string is not long enough for me to print the seventh character.')

# the string with first and last character removed - using slicing
print('The string with its first and last characters removed: ', user_string[1:-1])

# we use methods to make the string into all caps
print('The string in all caps:', user_string.upper())

# use a for loop to build the new string
e_instead_of_a = ''
for char in user_string:
    if char == 'a':  # swap any a for an e
        e_instead_of_a += 'e'
    else:  # if not an a, just put the original char on the string
        e_instead_of_a += char
print('The string with all a\'s replaced with e\'s:', e_instead_of_a)
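
For comparison, two of the steps above can also be written more compactly. This is a sketch of an alternative, not the intended solution to the exercise: it uses a slice with a negative step (`[::-1]`), which has not been covered in this lesson, and the string method `replace()`. A fixed example value is used instead of `input()` so the result is predictable.

```python
user_string = 'banana bread'  # fixed example value instead of input()

# a slice with step -1 walks through the string from the end to the start
backwards = user_string[::-1]

# replace() returns a new string with every 'a' swapped for an 'e'
with_e = user_string.replace('a', 'e')

print('The string backwards:', backwards)
print('The string with all a\'s replaced with e\'s:', with_e)
```

Whether the compact or the step-by-step version is more readable depends on the reader, which is exactly the kind of trade-off to discuss.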