## 5.2 String Methods

Strings and numbers can be thought of as **objects**: "things you can do stuff with". In Python, each object has a set of **methods** attached to it, and there are also general-purpose **functions** that you can pass objects to. If objects are the **nouns**, then methods and functions are the **verbs**: they are the tools that operate on (do something with) these objects.

In general, functions and methods are called in one of these two forms:
- `function(object,argument)`
- `object.method(arguments)`
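To make the two forms concrete, here is a small sketch using only built-in string tools (the variable name `word` is just for illustration). The last two lines show that a method call can also be written in the first form, by calling the method through the object's type `str`.

```python
word = 'HELLO'

# Form 1: function(object, argument) -- the object is passed in as an argument
print(len(word))                 # 5

# Form 2: object.method(arguments) -- the method is called on the object
print(word.lower())              # hello
print('1,2,3'.split(','))        # ['1', '2', '3']

# A method call is shorthand for calling the method through the object's type,
# which looks like form 1 again
print(str.lower(word))           # hello
print(str.split('1,2,3', ','))   # ['1', '2', '3']
```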
    
In the example below, we apply the `len()` function to count the number of characters in a string, and the `.lower()` method to convert all characters to lowercase.

Both are called **fruitful** functions because they return a value (here, a number and a string, respectively).

In [None]:
print('HELLO'.lower())
print(len('HELLO'))
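Because fruitful functions return a value, you can store that value in a variable and use it later. As a contrast, the sketch below also stores the return value of `print()`: `print()` displays text on the screen, but what it hands back is just `None`. The variable names are illustrative only.

```python
lowered = 'HELLO'.lower()   # .lower() returns a new string
length = len('HELLO')       # len() returns an integer
print(lowered, length)      # hello 5

# print() shows text on the screen, but its return value is None
result = print('HELLO')     # displays HELLO
print(result)               # None
```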

Of course, methods and functions can also be applied to variables:

In [None]:
word = 'HELLOOOOOO'
print(word.lower())
print(len(word))

Python comes with many useful **tools for text processing**. You can list and inspect them with the `dir()` and `help()` functions.

In [None]:
book = 'Pride and Prejudice' # Let's pretend we stored a whole book in this variable

`dir()` lists all the methods you can apply to the string variable `book`. Scroll through the output; you can ignore the names that start with a double underscore (`__`).

In [None]:
dir(book)
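If the long output of `dir()` is distracting, you can filter out the double-underscore names yourself. The one-liner below is a sketch that uses a list comprehension (a construct you do not need to understand yet) together with the string method `.startswith()`; the name `public_methods` is just an illustrative choice.

```python
book = 'Pride and Prejudice'

# Keep only the names that do not start with a double underscore
public_methods = [name for name in dir(book) if not name.startswith('__')]
print(public_methods)
```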

All these methods allow you to do things with strings. Some of the most useful ones are:
- `split()`
- `lower()`
- `len()` (strictly speaking a built-in function rather than a string method)
- `find()`

## .split()

#### --Exercise--

Inspect the example below, and figure out how the `split()` method works.

In [None]:
print('1,2,3,4,5'.split(','))
print('Hello how are you today?'.split(' '))

`.split()` converts a string into a **list of words** (roughly speaking; we will come back to lists later in this course): it **returns** a list of the pieces of the string that were separated by the delimiter (the character you split on). The delimiter itself does not appear in the result.
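One detail worth knowing: splitting on a single space `' '` and calling `.split()` with no argument behave differently when the text contains repeated spaces. The sample string below is made up for illustration.

```python
text = 'Hello  how are   you'   # note the repeated spaces

# Splitting on a single space: every space is a delimiter, so runs of
# spaces produce empty strings in the result
print(text.split(' '))   # ['Hello', '', 'how', 'are', '', '', 'you']

# Without an argument, split() splits on any run of whitespace
# (spaces, tabs, newlines) and leaves out the empty strings
print(text.split())      # ['Hello', 'how', 'are', 'you']
```

For ordinary prose, `.split()` without an argument is usually what you want when you are after words.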

Let's have a closer look at the output of this method. Just as we stored strings in variables above, we can save the output of `.split()` in a new variable.

In [None]:
csv = '1,2,3,4,5'
numbers = csv.split(',')
print(numbers)

#### --Exercise--

- `split()` the `sentence` variable on white space;
- assign the output to a new variable `words`;
- get the last item of the `words` list using index notation.

In [None]:
sentence = "Alice was beginning to get very tired of sitting by her sister on the bank."
# Insert your code here
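If index notation is new to you, here is a hint on a different list, the `numbers` list from the `.split(',')` example above. Positions start at 0, and negative indices count from the end of the list. Note that the items are strings, because `.split()` always returns pieces of the original string.

```python
numbers = '1,2,3,4,5'.split(',')

print(numbers[0])    # prints 1: the first item (counting starts at 0)
print(numbers[-1])   # prints 5: negative indices count from the end
print(numbers[-2])   # prints 4: the second-to-last item
```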

#### --Exercise--

For more information, print the Python **documentation** on the `.split()` method using the `help` function.

In [None]:
# search for help here
name = "Kaspar"
help(name.split)
# or
help(str.split)
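The documentation also describes a second parameter, `maxsplit`, which limits how many splits are made; the rest of the string is kept as the last item. A short sketch, reusing the `csv` string from above:

```python
csv = '1,2,3,4,5'

print(csv.split(','))               # ['1', '2', '3', '4', '5']
print(csv.split(',', maxsplit=2))   # ['1', '2', '3,4,5']
```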

## .lower()

#### --Exercise--

Experiment with the `.lower()` method.
- Create a string variable;
- Assign the lowercased string to another variable;
- Print both the original and the lowercased variable.

In [None]:
# Experiment with lower
# Declare a string variable with capitals



# Look for documentation on `lower`


# Apply lower to the variable AND assign the lowercased string to a new variable



# print the variables before and after applying the lower method



#### --Exercise--

- Lowercase the sentence
- Split the result on the character `a`

In [None]:
sentence = "Alice was beginning to get very tired of sitting by her sister on the bank."
# Insert code here
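Because `.lower()` returns a new string, you can call another method directly on its result. This is called method chaining. The example below uses a different sample string so it does not give the exercise away; the two versions are equivalent, and the second simply keeps the intermediate result in a variable.

```python
greeting = 'Hello World'

# .lower() returns a new string, and .split() is then called on that string
print(greeting.lower().split('o'))   # ['hell', ' w', 'rld']

# The same thing in two steps, with an intermediate variable
lowered = greeting.lower()
print(lowered.split('o'))            # ['hell', ' w', 'rld']
```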

## .find()

Run the cell below to understand what the `.find()` method does.

In [1]:
help(str.find)

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.
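The documentation above mentions three things worth trying out: `.find()` returns the position of the *first* match, the optional `start` argument tells it where to begin searching, and it returns `-1` when nothing is found. The search is also case-sensitive. A short sketch on a made-up word:

```python
word = 'banana'

print(word.find('a'))              # 1: index of the first 'a'
print(word.find('a', 2))           # 3: the search starts at index 2
print(word.find('x'))              # -1: 'x' does not occur
print('Banana'.find('b'))          # -1: find() is case-sensitive
print('Banana'.lower().find('b'))  # 0
```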



#### --Exercise--

Find the position of the first 'e' in the title "Naturkatastrophenkonzert".

In [None]:
title = 'Naturkatastrophenkonzert'
# use the find() method here
title.find('e')

#### --Exercise--

The code cell below downloads [Romeo and Juliet](http://www.gutenberg.org/cache/epub/1777/pg1777.txt) from Project Gutenberg.

In [2]:
import requests # run but ignore these lines
randj = requests.get('http://www.gutenberg.org/cache/epub/1777/pg1777.txt').text

`randj` now contains the full text of Romeo and Juliet; you can inspect the variable by printing its first hundred characters.

#### --Exercise--

Print the first hundred characters of Romeo and Juliet.

In [None]:
# Insert code here

Find the **first** occurrence of the word **`love`** in Shakespeare's Romeo and Juliet.

**HINT**: Do not forget to lowercase the text first: `find()` is case-sensitive, so `'love'` does not match `'Love'`.

In [5]:
# Insert code here
first_love = randj.lower().find('love') # lowercase first, so that 'Love' also matches

You can print the context around `first_love` using [slice](https://www.oreilly.com/learning/how-do-i-use-the-slice-notation-in-python) notation (follow the link for more information).

In [6]:
context_size = 50 # the number of characters around the word
start_at = first_love-context_size # indicate the starting position
stop_at = first_love+context_size+len('love') # indicate where to stop
print('Start printing at character with position=',start_at)
print('Stop printing at character with position=',stop_at)
print('\n')
print(randj[start_at:stop_at]) # print with context

Start printing at character with position= 10705
Stop printing at character with position= 10809


ins of these two foes
    A pair of star-cross'd lovers take their life;
    Whose misadventur'd piteo
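One caveat with the cell above: if the word occurs within the first `context_size` characters, `start_at` becomes negative, and a negative slice index counts from the *end* of the string, so the slice would usually come back empty. The hypothetical helper `show_context` below (not part of the original notebook) is a sketch that clamps the start at 0 and returns an empty string when `.find()` reported `-1`. The sample line is only for illustration.

```python
def show_context(text, position, word, context_size=50):
    """Return `word` at `position` in `text`, with context_size characters on each side."""
    if position == -1:                               # find() returned "not found"
        return ''
    start_at = max(0, position - context_size)       # never go below index 0
    stop_at = position + len(word) + context_size    # slicing past the end is safe
    return text[start_at:stop_at]

sample = 'But soft, what light through yonder window breaks?'
position = sample.find('light')
print(show_context(sample, position, 'light', context_size=10))  # oft, what light through y
```

With the play loaded, a call like `show_context(randj, first_love, 'love')` should print the same context as the cell above.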


#### \*\*\*--Exercise--

- Can you find the **second** occurrence of **"love"** in this play?
- Can you print it with 50 characters of context on each side?

HINT: Inspect the output of `help(str.find)`, and reuse information from the code cells above (`first_love`).
HINT II: Use slicing to print the local context of a word.

In [8]:
# add and copy-paste your code here
context_size = 50 # the number of characters around the word
randj_cut = randj[first_love+len('love'):] # the text that follows the first 'love'
second_love = randj_cut.find('love') # position relative to randj_cut, not to randj
start_at = second_love-context_size # indicate the starting position
stop_at = second_love+context_size+len('love') # indicate where to stop
print('Start printing at character with position=',start_at)
print('Stop printing at character with position=',stop_at)
print('\n')
print(randj_cut[start_at:stop_at]) # print with context

Start printing at character with position= 116
Stop printing at character with position= 220


e.
    The fearful passage of their death-mark'd love,
    And the continuance of their parents' rage,


In [16]:
randj.find('love',first_love+4)

10925
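The last cell uses the `start` argument of `.find()` to continue searching just after the first match, and it returns a position in the full text rather than in a cut-off copy. Repeating that step until `.find()` returns `-1` gives every occurrence. The hypothetical helper `find_all` below (not part of the original notebook) is a minimal sketch of that loop; the sample line is from Romeo and Juliet and is lowercased first, following the hint above.

```python
def find_all(text, word):
    """Return the start position of every occurrence of `word` in `text`."""
    if not word:                  # an empty search string would loop forever
        return []
    positions = []
    position = text.find(word)    # first occurrence (or -1)
    while position != -1:
        positions.append(position)
        # continue searching just after the end of the current match
        position = text.find(word, position + len(word))
    return positions

line = 'Love goes toward love, as schoolboys from their books'
print(find_all(line.lower(), 'love'))   # [0, 17]
print(find_all(line, 'love'))           # [17]: without lowercasing, 'Love' is missed
```

The second occurrence is then simply `find_all(text, 'love')[1]`.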

## Recap

- Variables are boxes in which you can store information.
- Variables can hold different types of values, such as text (strings) or numbers (e.g. integers).
- Methods and functions let you work with the content of these boxes (e.g. `.lower()` and `len()`).

In [None]:
# Experiment a bit here

## len()

`len()` counts the number of elements its argument contains. If you pass a string as the argument, it counts the number of characters.

Note: the syntax is slightly different here. `len()` is a built-in function, so you write `len(word)` rather than `word.len()` (the reasons fall outside the scope of this course).

In [None]:
word = 'supercalifragilisticexpialidocious'
print(len(word))
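Because `len()` counts elements, it works on lists as well as strings. Combined with `.split()`, it gives a rough word count. The sample sentence below is illustrative; note that spaces count as characters.

```python
opening = 'Alice was beginning to get very tired'

print(len(opening))   # 37: number of characters, spaces included
words = opening.split()
print(len(words))     # 7: number of items in the list
```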

In [None]:
# How many characters does your full name contain?

#### --Exercise--

How many characters and words does Romeo and Juliet contain (approximately)?
> HINT: Use `split()` and `len()` in combination.

In [None]:
# download Romeo and Juliet from Gutenberg
import requests
randj = requests.get('http://www.gutenberg.org/cache/epub/1777/pg1777.txt').text
# add your code here

#### --Exercise--

Can you find other useful string methods?

In [None]:
# if yes, play with them here!