# Lab 3: Strings

## Overview

Welcome to your third lab! The goal for today is to familiarize you with strings (or, more precisely, Python's `str` type). Manipulating textual data is a frequent operation in day-to-day programming, and even more so for us in NLP.

As usual, you will have to submit two exercises. You will find them and the submission instructions at the end of this notebook.

## What's in a string?

Strings are ordered sequences of characters, which is why you can iterate over a string with a for-loop, or retrieve specific characters by indexing or slicing.

Before executing the next cell, try to guess what it will output!

In [27]:
str_a = "I am a chimp. I love peanuts. I have a neat, though slightly old, typewriter."
for char in str_a:
    print(char)
    
print(str_a[2:-2:2])

I
 
a
m
 
a
 
c
h
i
m
p
.
 
I
 
l
o
v
e
 
p
e
a
n
u
t
s
.
 
I
 
h
a
v
e
 
a
 
n
e
a
t
,
 
t
h
o
u
g
h
 
s
l
i
g
h
t
l
y
 
o
l
d
,
 
t
y
p
e
w
r
i
t
e
r
.
a  hm.Ilv ent.Ihv  et huhsihl l,tpwie
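To see why the last line prints what it does, recall that `s[start:stop:step]` begins at index `start`, stops *before* index `stop` (negative indices count from the end), and moves `step` characters at a time. Here is a minimal sketch on a shorter string; the variable names are only illustrative:

```python
s = "chimp"

first = s[0]          # indexing from the start: 'c'
last = s[-1]          # negative indices count from the end: 'p'
middle = s[1:3]       # slice [start:stop), stop is excluded: 'hi'
every_other = s[::2]  # step of 2 over the whole string: 'cip'
backwards = s[::-1]   # a negative step walks backwards: 'pmihc'

print(first, last, middle, every_other, backwards)
```
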


String objects in Python support a number of operators you might not expect:

 - `str_a + str_b` denotes the concatenation of the two strings `str_a` and `str_b`.
 - `str_a * i` corresponds to the concatenation of `i` copies of the string `str_a`.
 - `str_a % value` is the basic syntax for string formatting, which we've briefly covered in the first lab.
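A minimal sketch of these three operators in action (the variable names are arbitrary):

```python
fruit = "peanut"

plural = fruit + "s"                   # concatenation: 'peanuts'
chant = "Go! " * 3                     # repetition: 'Go! Go! Go! ' (3 * "Go! " works too)
report = "I ate %d %s." % (3, plural)  # %-formatting; several values go in a tuple

print(plural)
print(chant)
print(report)
```
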

### Exercise #1: Again and again

Implement two functions, `self_mul(s, n=3)` and `self_add(s, n=3)`, that both take a `str` as argument and return `n` copies of it concatenated together. The first may only use the multiplication operator `*`, whereas the second may only use the addition operator `+`.

For instance:

```
>>> self_add("Figaro! ", n=3)
'Figaro! Figaro! Figaro! '
>>> self_mul("One! Two! ", n=2)
'One! Two! One! Two! '
>>> self_add("whatever", n=42) == self_mul("whatever", n=42)
True
```

In [28]:
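def self_add(s, n=3):
    """Concatenate n copies of s using only the + operator."""
    t = ""
    # append one copy of s per iteration until n copies have been added
    while n >= 1:
        t = t + s
        n -= 1
    return t

def self_mul(s, n=3):
    """Concatenate n copies of s using only the * operator."""
    return s * n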

In [29]:
print(self_add("Figaro! ", n=3))
print(self_mul("One! Two! ", n=2))
print(self_add("whatever", n=42) == self_mul("whatever", n=42))

Figaro! Figaro! Figaro! 
One! Two! One! Two! 
True


## Formatting strings

One of the neatest features of `str` values in Python is formatting: embedding the value of some other variable within the string itself.

There is a very complete mini-language for string formatting, which you can read about [here](https://docs.python.org/3/library/string.html#formatstrings).

In short, there are three main ways of formatting strings:

- Using the modulo operator `%`:

In [30]:
name = 'James'
'Hello %s, welcome to my evil lair!' % name

'Hello James, welcome to my evil lair!'

 - Using the `str.format()` method:

In [31]:
'My name is {1}, {0} {1}'.format('James', 'Bond')

'My name is Bond, James Bond'

- Referring to existing variables directly inside a formatted string literal (an "f-string", written `f"..."`):

In [32]:
age = 21

# compare this:
print('I am {age} years old')
# with that:
print(f'I am {age} years old')

I am {age} years old
I am 21 years old


Note that this barely scratches the surface! For instance, the `str.format()` method also accepts keyword arguments:

In [33]:
print("My plan, {hero}?".format(hero=name))
print("I shall destroy {target} using {weapon}!".format(target="Paris", weapon="nut-deprived chimps"))

My plan, James?
I shall destroy Paris using nut-deprived chimps!
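To give a taste of the mini-language linked above, here is a short sketch of a few format specifications (the part after the `:` inside the braces); the values are arbitrary:

```python
pi = 3.14159
animal = "chimp"

two_decimals = f"{pi:.2f}"           # fixed-point with 2 decimals: '3.14'
padded = f"{42:05d}"                 # integer zero-padded to width 5: '00042'
right = "{:>8}".format(animal)       # right-aligned in 8 columns: '   chimp'
centred = "{:^9}".format(animal)     # centred in 9 columns: '  chimp  '
expr = f"{len(animal) * 2} letters"  # f-strings accept arbitrary expressions: '10 letters'

print(two_decimals, padded, right, centred, expr, sep="|")
```

The same specifications work with `str.format()` and with f-strings.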


### Exercise #2: Tell me what you've got

Write a function called `dict_contents(d)` that takes as argument a dictionary `d` with `str` keys and `int` values, and describes its contents in a silly message. "Which silly message?", you ask. Make sure you return the following:

```
>>> dict_contents({"chimps": 42, "peanuts": 0})
"As for chimps, I've got 42, sadly. As for peanuts, I've got 0, sadly."
```

In [34]:
def dict_contents(d):
    # build one sentence per (key, value) pair, then join them with spaces
    sentences = ["As for {}, I've got {}, sadly.".format(key, value) for key, value in d.items()]
    return " ".join(sentences)

print(dict_contents({"chimps": 42, "peanuts": 0}))

As for chimps, I've got 42, sadly. As for peanuts, I've got 0, sadly.
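When the placeholder names are known in advance, a dictionary can also feed `str.format()` directly, either by unpacking it into keyword arguments with `**` or by passing it to `str.format_map()`. A small sketch (the `stock` dictionary is just an example); note that this only works for a fixed set of keys, which is why a solution for arbitrary dictionaries has to iterate over `d.items()`:

```python
stock = {"chimps": 42, "peanuts": 0}
template = "{chimps} chimps, {peanuts} peanuts"

msg = template.format(**stock)     # ** turns the dict into keyword arguments
same = template.format_map(stock)  # format_map reads the dict directly

print(msg)
```
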


## Split and join

Two crucial methods you should know about are `str.join()` and `str.split()`.

 - `str.join()` concatenates a sequence of strings, inserting the string it is called on between consecutive items:

In [35]:
print("A list of bare necessities: %s." % ", ".join(["peanut", "typewriter", "peanut (important!)", "evil plans"]))

print(" love peanuts! ".join(["Monkeys", "Heroes like James Bond", "But evil blokes..."]))

A list of bare necessities: peanut, typewriter, peanut (important!), evil plans.
Monkeys love peanuts! Heroes like James Bond love peanuts! But evil blokes...


 - `str.split()` breaks a single string down into a list of strings:

In [36]:
the_obvious_truth = "Chimps are born rulers and masters at eating peanuts!"

# with an argument, split() cuts the string at every occurrence of that separator (the separator is dropped)
strings = the_obvious_truth.split("r")
for s in strings:
    print(s)
    
# this prints an empty line
print() 

# without an argument, split() cuts on runs of whitespace (spaces, tabs, newlines)
strings = the_obvious_truth.split()
for s in strings:
    print(s)
    

Chimps a
e bo
n 
ule
s and maste
s at eating peanuts!

Chimps
are
born
rulers
and
masters
at
eating
peanuts!
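A few behaviours of both methods are worth knowing. The sketch below (with made-up example values) shows that `join()` only accepts strings, that `split(" ")` and `split()` differ when spaces are repeated, and that `maxsplit` limits the number of cuts:

```python
counts = [3, 1, 4]

# join() only accepts strings, so convert each item first
joined = ", ".join(str(n) for n in counts)  # '3, 1, 4'

try:
    ", ".join(counts)  # ints, not strings
except TypeError as err:
    print("TypeError:", err)

text = "peanuts  and   bananas"     # note the repeated spaces
by_space = text.split(" ")          # every single space is a cut, so empty strings appear
by_whitespace = text.split()        # a run of whitespace counts as one separator
limited = text.split(maxsplit=1)    # stop after the first cut
tidy = " ".join(text.split())       # split + join normalizes the spacing

print(joined)
print(by_space)       # ['peanuts', '', 'and', '', '', 'bananas']
print(by_whitespace)  # ['peanuts', 'and', 'bananas']
print(limited)        # ['peanuts', 'and   bananas']
print(tidy)           # peanuts and bananas
```
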


### Exercise #3: Sifting through many words

Implement a function `every_other_word(s)` that splits its argument string on spaces, joins every other item (the first, third, fifth, ...) with an underscore, and returns the resulting string. For instance:

```
>>> every_other_word("Figaro, that's a man who loves peanuts. But what about Bond? James Bond?")
'Figaro,_a_who_peanuts._what_Bond?_Bond?'
```

In [37]:
def every_other_word(s):
    # split on whitespace, keep the words at indices 0, 2, 4, ... and glue them with "_"
    words = s.split()
    return "_".join(words[::2])

print(every_other_word("Figaro, that's a man who loves peanuts. But what about Bond? James Bond?"))

Figaro,_a_who_peanuts._what_Bond?_Bond?


## Exercise #4: Special Words

For each of the following problems, we describe a criterion that makes a word (or phrase!) special.

If you are using macOS or Linux, you should have a dictionary file available at `/usr/share/dict/words`, a 2.5 MB text file containing over 200,000 English words, one per line. If you don't have it, we also mirrored this file [on Arche](https://arche.univ-lorraine.fr/), so you can download the dictionary from there.

Write a function `load_english` that loads the English words from this file. How many English words does it contain? Using the Arche file, we got 72165 words after lowercasing them, removing duplicates and keeping only entries made of ASCII letters (i.e. we exclude entries that contain apostrophes or accented letters).
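One possible sketch, under our own reading of the criteria above: `clean_words` is a hypothetical helper (not part of the assignment) that does the filtering, so it can be tried on a few lines without any file, and returning a `set` is our choice for removing duplicates. This is not necessarily the reference solution.

```python
import string

def clean_words(lines):
    """Hypothetical helper: lowercase each line and keep ASCII-letter-only words."""
    allowed = set(string.ascii_lowercase)
    words = set()  # a set silently drops duplicates
    for line in lines:
        word = line.strip().lower()
        # skip blank lines and entries with apostrophes, hyphens, accents, digits...
        if word and set(word) <= allowed:
            words.add(word)
    return words

def load_english(path="/usr/share/dict/words"):
    """Return the set of cleaned English words found in the file at `path`."""
    # errors="replace" keeps odd bytes from crashing the read; the replacement
    # character is not an ASCII letter, so such entries are filtered out
    with open(path, encoding="utf-8", errors="replace") as f:
        return clean_words(f)  # iterating over a file yields its lines
```

Calling `len(load_english())` (or `len(load_english("words.txt"))` with the path of the file you downloaded) then gives the word count. The exact number depends on which file you use, and we have not checked that this sketch reproduces the 72165 figure.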

### Submission instructions

Alright, you did it! Enough beers and peanuts for today.

You will need to submit the last two exercises (#7 and #8) on Arche before 9:59am on Friday, 7th October (just before our next lab). Submit either a `.py` or an `.ipynb` file containing the functions you wrote for the two exercises, and name it `td3_firstname_lastname.py` or `td3_firstname_lastname.ipynb` accordingly, where `firstname` is your first name and `lastname` is your last name. For example, Jane Doe's submission should be called `td3_jane_doe.py` or `td3_jane_doe.ipynb`, depending on whether Jane submitted a Python script or a Jupyter notebook.

To evaluate your submission, we will be looking at the following criteria:

- Does your code run? (So **run** your program at least once before submitting!)
- Does it run correctly? (So **test** your solution with a few different inputs!)
- Is your code well-commented?
- Is your code well-structured?

## Done Early?

Have a look at the [`string` module in Python](https://docs.python.org/3/library/string.html). It contains a lot of very useful things, such as predefined strings of ASCII characters (e.g. `string.ascii_lowercase`). Another thing you should look into is [how Python handles Unicode](https://docs.python.org/3/howto/unicode.html).
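As a first taste of both, a small sketch: the constants below come from the `string` module, while the `encode` call shows that a `str` counts characters (code points), not bytes. `str.isascii()` requires Python 3.7 or later.

```python
import string

print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.punctuation)      # ASCII punctuation characters

word = "café"
encoded = word.encode("utf-8")  # 'é' takes two bytes in UTF-8
print(len(word), len(encoded))  # 4 5
print(encoded)                  # b'caf\xc3\xa9'
print(word.isascii())           # False
```
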

### Other Phrases

On Puzzling.StackExchange, the user [JLee](https://puzzling.stackexchange.com/users/463/jlee) has come up with a ton of interesting puzzles of this form ("I call words that follow a certain rule 'adjective' words"). If you like puzzles, you can optionally read through [these JLee puzzles](https://puzzling.stackexchange.com/search?q=%22I+call+it%22+title%3A%22what+is%22+is%3Aquestion+user%3A463) or [these other puzzles inspired by JLee](https://puzzling.stackexchange.com/search?tab=votes&q=%22what%20is%20a%22%20word%20is%3aquestion).

> With <3, by @sredmond

> With peanuts, monkeys and spies, by tmickus