# Some string methods.

## We have already talked a lot about strings; let's look at some particularly useful functions/methods (even though we have not yet defined what a function is)

In [11]:
# the join method
word = ['apple', 'pear', 'orange']
sep = '::'

In [12]:
sep.join(word)

'apple::pear::orange'

In [13]:
s = 'apple||pear||orange'
sep = '||'

s.split(sep)

['apple', 'pear', 'orange']

In [17]:
s = 'apple||pear||orange'
s.find('z')

-1

In [18]:
s

'apple||pear||orange'

In [19]:
s.replace('apple', 'banana')

'banana||pear||orange'
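Two details of the cells above are easy to miss: `find` returns the index of the first match (and `-1` only when there is none), and `replace` returns a new string without touching `s`. A minimal sketch reusing the same `s`; the names `pos`, `missing`, `replaced` and `round_trip` are only illustrative:

```python
s = 'apple||pear||orange'

# find returns the index of the first occurrence, or -1 if absent
pos = s.find('pear')
missing = s.find('z')

# strings are immutable: replace builds a new string, s is unchanged
replaced = s.replace('apple', 'banana')

# split followed by join with the same separator gives back the original
round_trip = '||'.join(s.split('||'))

print(pos, missing)
print(replaced)
print(s)
print(round_trip == s)
```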

# Regular Expression

RegEx is a *language* (a sequence of symbols and characters) used to express a **pattern** to be searched for within a string.

```
PN12W_bio1_tech1
PN12W_bio1_tech2
PN24NW_bio2_tech1
PN24NWW_bio2_tech2
PN48W_bio3_tech1
```

RegEx is a very versatile and powerful tool that can be used any time the normal `string` functions/methods are not enough. RegEx is a language independent of Python and is available in pretty much every programming language.

A handy online tool for testing patterns is [RegEx101](https://regex101.com/).

To use RegEx in Python you will need to import the package `re`.

In [21]:
import re

pattern = r'_bio\d_'  # raw string, so that \d reaches the regex engine unchanged
my_strings = 'PN12W_bio1_tech1 PN12W_bio1_tech2 PN24NW_bio2_tech1 PN24NW_bio2_tech2 PN48W_bio3_tech1'


In [22]:
re.findall(pattern, my_strings)

['_bio1_', '_bio1_', '_bio2_', '_bio2_', '_bio3_']

In [24]:
pattern = r'_bio(\d)_'  # the parentheses capture only the digit
my_strings = 'PN12W_bio1_tech1 PN12W_bio1_tech2 PN24NW_bio2_tech1 PN24NW_bio2_tech2 PN48W_bio3_tech1'

In [25]:
re.findall(pattern, my_strings)

['1', '1', '2', '2', '3']

In [27]:
pattern = r'_bio\d_'
my_strings = 'PN12W_bio1_tech1 PN12W_bio1_tech2 PN24NW_bio2_tech1 PN24NW_bio2_tech2 PN48W_bio3_tech1'

re.sub(pattern, '_', my_strings)

'PN12W_tech1 PN12W_tech2 PN24NW_tech1 PN24NW_tech2 PN48W_tech1'

In [29]:
pattern = r'_bio\d_'
my_strings = 'PN12W_bio1_tech1 PN12W_bio1_tech2 PN24NW_bio2_tech1 PN24NW_bio2_tech2 PN48W_bio3_tech1'

re.split(pattern, my_strings)

['PN12W',
 'tech1 PN12W',
 'tech2 PN24NW',
 'tech1 PN24NW',
 'tech2 PN48W',
 'tech1']
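The calls above work on one long string. When the sample names are kept in a list instead, capture groups can pull several fields out of each name at once. A minimal sketch, assuming that `bio` and `tech` are replicate numbers and that the prefix consists only of capital letters and digits (both are assumptions about the naming scheme, and the names `samples` and `parsed` are illustrative):

```python
import re

samples = ['PN12W_bio1_tech1', 'PN12W_bio1_tech2', 'PN24NW_bio2_tech1',
           'PN24NW_bio2_tech2', 'PN48W_bio3_tech1']

# three capture groups: prefix, number after "bio", number after "tech"
pattern = re.compile(r'([A-Z0-9]+)_bio(\d)_tech(\d)')

parsed = []
for sample in samples:
    match = pattern.fullmatch(sample)
    if match:
        prefix, bio, tech = match.groups()
        parsed.append((prefix, int(bio), int(tech)))

for row in parsed:
    print(row)
```

`fullmatch` only succeeds when the whole name fits the pattern, so a malformed name is skipped rather than partially matched.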

# Control Structures

With variable assignment and control structures you have essentially everything you need to build a real program (that is, to implement algorithms).

Let's start with the `if` statement.

The syntax is the following:

```
if <condition>:
  <True block of code>
else:
  <False block of code>
```

The `condition` is **always** interpreted as a `boolean` value. If it is `True`, the **True block** gets executed; otherwise (`False`), the **False block** gets executed.

In [40]:
a = 5
b = 1

if a > b:
    print('a is bigger than b')
else:
    print('b is bigger than or equal to a')

a is bigger than b


In [41]:
a > b

True

`a > b` returns a `boolean` value, but the expression in the *guard* (*condition*) is not required to return a `boolean`: **anything** placed there will be **cast** (evaluated) to a `boolean`.

In [56]:
a = 'hello world'

if a:
    print('the string a is non empty')
    print('here is another message')

the string a is non empty
here is another message


In [57]:
a = []
if a:
    print('a is not empty')
else:
    print('a is empty')

a is empty


In [58]:
bool(a)

False
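To make the casting rule concrete, here is a small sketch that applies `bool` to a handful of common values; the list `values` is just an illustrative selection. Zero, empty containers, empty strings and `None` evaluate to `False`, while everything else in the list evaluates to `True`:

```python
values = [0, 1, '', 'hello', [], [0], {}, None]

truthiness = []
for v in values:
    # bool(v) is exactly what an if statement computes for its condition
    truthiness.append(bool(v))
    print(repr(v), '->', bool(v))
```

Note that `[0]` is truthy: the list is not empty, even though its only element is falsy.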

In [61]:
season = 'winter'

if season == 'spring':
    print('blooming flowers')
elif season == 'summer':
    print('hot and sunny')
elif season == 'autumn':
    print('falling leaves')
elif season == 'winter':
    print('cold and snow')

cold and snow


In [62]:
if season == 'spring':
    print('blooming flowers')
elif season == 'summer':
    print('hot and sunny')
else:
    print('cold')

cold


In [63]:
if season == 'spring' or season == 'summer':
    print('hot')
else:
    print('cold')

cold
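When the same variable is compared against several values with `or`, the membership operator `in` is a common alternative. A minimal sketch of the same test; the variable `message` is introduced only for illustration:

```python
season = 'winter'

# equivalent to: season == 'spring' or season == 'summer'
if season in ('spring', 'summer'):
    message = 'hot'
else:
    message = 'cold'

print(message)
```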


In [65]:
# Python spells its logical operators as words; in C-like languages they are symbols:
# and -> &&
# or -> ||
# not -> !

In [66]:
a = True

In [68]:
not False

True

In [73]:
False or False

False

In [74]:
True and True

True

In [77]:
season = 'spring'
if season == 'spring':
    print('blooming flowers')
else:
    print('else branch')

print('hello')

blooming flowers
hello


# The FOR statement is, together with the IF statement, by far the control structure you are going to use the most. Let's see the syntax:

```
for <cycle variable> in <iterable (e.g. list, range, generator)>:
  <block of code>
```

In [80]:
lst_a = ['a', 1, 2, 3, 4.0, True, 'hello']

for x in lst_a:
    print('hello', x)

hello a
hello 1
hello 2
hello 3
hello 4.0
hello True
hello hello


In [81]:
print('hello', 'a')
print('hello', 1)
print('hello', 2)
print('hello', 3)
print('hello', 4.0)
print('hello', True)
print('hello', 'hello')

hello a
hello 1
hello 2
hello 3
hello 4.0
hello True
hello hello


In [79]:
# for comparison, a C-style loop needs an explicit index:
# for (int i=0; i < n; i++) {
#  lst_a[i]
# }

In [82]:
a = [0,0,0,0,0,0,0,0]
for x in a:
    print('hello')

hello
hello
hello
hello
hello
hello
hello
hello


In [84]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [88]:
range(10)

range(0, 10)

In [85]:
for x in range(8):
    print('hello')

hello
hello
hello
hello
hello
hello
hello
hello


In [89]:
# print the first ten numbers and say whether each one is odd or even
for i in range(10):
    if i % 2 == 0:
        print(i, 'is even')
    else:
        print(i, 'is odd')

0 is even
1 is odd
2 is even
3 is odd
4 is even
5 is odd
6 is even
7 is odd
8 is even
9 is odd


In [92]:
# given the following list
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']

# print all names that start with the letter 'd'
for name in names:
    if "d" == name[0]:
        print(name)

# the string method capitalize() returns a copy of the string with its first letter capitalized, e.g. 'ciao'.capitalize() returns 'Ciao'
# create a new list where all names are capitalized

# how many times is the letter 'i' present in the names list (use a loop statement this time)? Can you think of a way to count all unique letters?
# --> your code here <--

# create a list of strings that represent the even numbers from 100 to 150 inclusive
# --> your code here <--

daniel
denise


'Ciao'

In [97]:
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']
names2 = names.copy()
for x in range(len(names)):
    names2[x] = names[x].capitalize()

In [98]:
names2

['Anne', 'Chris', 'Daniel', 'Denise', 'Jacob', 'Lisa', 'Maria']

In [99]:
names

['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']

In [100]:
for i in names:
    new_names = i.capitalize() 
    print(new_names) 

Anne
Chris
Daniel
Denise
Jacob
Lisa
Maria


In [102]:
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']
Names = []
for name in names:
    Names.append(name.capitalize()) 

In [103]:
Names

['Anne', 'Chris', 'Daniel', 'Denise', 'Jacob', 'Lisa', 'Maria']

In [104]:
names

['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']

In [105]:
count = 0
for name in names:
    count += name.count('i') # count = count + name.count('i')
print(count) 

5


In [106]:
'daniel'.count('i')

1

In [107]:
names

['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']

In [114]:
letters = []
for name in names:
    for letter in list(name):
        letters.append(letter)

In [116]:
set(letters)

{'a', 'b', 'c', 'd', 'e', 'h', 'i', 'j', 'l', 'm', 'n', 'o', 'r', 's'}

In [118]:
set(''.join(names))

{'a', 'b', 'c', 'd', 'e', 'h', 'i', 'j', 'l', 'm', 'n', 'o', 'r', 's'}
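The `set` gives the unique letters but not how often each one occurs. One possible way to get both at once is `collections.Counter` from the standard library, sketched here on the same `names` list (the name `letter_counts` is illustrative):

```python
from collections import Counter

names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']

# Counter maps each distinct letter to its number of occurrences
letter_counts = Counter(''.join(names))

print(letter_counts['i'])   # occurrences of the letter 'i'
print(len(letter_counts))   # number of distinct letters
```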

In [120]:
list(range(10, 20))

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [122]:
numbers = [] 
for number in range(100,151): 
    if number % 2 == 0: 
        numbers.append(str(number)) 
print(numbers) 

['100', '102', '104', '106', '108', '110', '112', '114', '116', '118', '120', '122', '124', '126', '128', '130', '132', '134', '136', '138', '140', '142', '144', '146', '148', '150']


In [123]:
even_num = list(range(100, 151, 2))

In [126]:
print(even_num)

[100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150]
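Note that `range(100, 151, 2)` yields integers, while the exercise asks for strings. A short sketch that combines the step argument with `str`, using a list comprehension (a construct covered later in this notebook); `even_strings` is an illustrative name:

```python
# step of 2 starting from an even number gives only even numbers;
# the stop value 151 is excluded, so 150 is the last element
even_strings = [str(n) for n in range(100, 151, 2)]

print(even_strings)
```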


In [170]:
import re
name_string = "\n".join(names)
pattern = r"^d"
re.findall(pattern, name_string) 

[]
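The empty result above is expected: by default `^` matches only at the very beginning of the whole string, and the string starts with `anne`. A minimal sketch of how the `re.MULTILINE` flag changes this, so that `^` also matches right after every newline (the names `no_flag` and `multiline` are illustrative):

```python
import re

names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']
name_string = "\n".join(names)

# without a flag, ^ only matches at the start of the whole string
no_flag = re.findall(r"^d", name_string)

# with re.MULTILINE, ^ matches at the start of every line;
# \w* then extends the match to the rest of the name
multiline = re.findall(r"^d\w*", name_string, flags=re.MULTILINE)

print(no_flag)
print(multiline)
```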

In [169]:
import re
name_string = " ".join(names)
pattern = r"\s(d)"  # raw string; matches a 'd' that follows a whitespace character
re.findall(pattern, name_string) 

['d', 'd']

In [166]:
name_string

'marco\nchris\ndaniel\njacob\nlisa\nanne\ndenise'

In [130]:
# let's introduce the enumerate function: it takes an iterable and returns an iterator of tuples, where the first element of each tuple is the index of the item and the second is the item itself
names = ['marco', 'chris', 'daniel', 'jacob', 'lisa', 'anne', 'denise']

print(enumerate(names))

<enumerate object at 0x7fa4e5602100>


In [131]:
list(enumerate(names))

[(0, 'marco'),
 (1, 'chris'),
 (2, 'daniel'),
 (3, 'jacob'),
 (4, 'lisa'),
 (5, 'anne'),
 (6, 'denise')]

In [132]:
i = 0
for name in names:
    print(name, i)
    i += 1

marco 0
chris 1
daniel 2
jacob 3
lisa 4
anne 5
denise 6


In [135]:
for i in enumerate(names):
    print('index', i[0], 'value', i[1])

index 0 value marco
index 1 value chris
index 2 value daniel
index 3 value jacob
index 4 value lisa
index 5 value anne
index 6 value denise


In [136]:
for i, j in enumerate(names):
    print('index', i, 'value', j)

index 0 value marco
index 1 value chris
index 2 value daniel
index 3 value jacob
index 4 value lisa
index 5 value anne
index 6 value denise
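`enumerate` also accepts an optional `start` argument, which is convenient when a human-friendly numbering from 1 is wanted. A small sketch on the same list; `numbered` is an illustrative name:

```python
names = ['marco', 'chris', 'daniel', 'jacob', 'lisa', 'anne', 'denise']

numbered = []
# start=1 shifts only the counter, the items themselves are unchanged
for position, name in enumerate(names, start=1):
    numbered.append(str(position) + '. ' + name)

for line in numbered:
    print(line)
```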


In [137]:
names = ['marco', 'chris', 'daniel', 'jacob', 'lisa', 'anne', 'denise']
numbers = ['333123456', '344561233', '33367409390', '3339386722', '344896725', '3339386345', '3449344766']



In [139]:
list(zip(names, numbers))

[('marco', '333123456'),
 ('chris', '344561233'),
 ('daniel', '33367409390'),
 ('jacob', '3339386722'),
 ('lisa', '344896725'),
 ('anne', '3339386345'),
 ('denise', '3449344766')]

In [140]:
for name, number in zip(names, numbers):
    print('Name', name, 'Number', number)

Name marco Number 333123456
Name chris Number 344561233
Name daniel Number 33367409390
Name jacob Number 3339386722
Name lisa Number 344896725
Name anne Number 3339386345
Name denise Number 3449344766
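One behaviour of `zip` worth knowing: when the inputs have different lengths, it silently stops at the shortest one. A minimal sketch with deliberately mismatched, shortened lists (the values are taken from the lists above, the name `pairs` is illustrative):

```python
names = ['marco', 'chris', 'daniel']
numbers = ['333123456', '344561233']   # one number is missing

# zip stops as soon as the shortest input is exhausted
pairs = list(zip(names, numbers))

print(pairs)
```

Here `'daniel'` is dropped without any error, so it can be worth checking the lengths beforehand.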


In [141]:
d = dict(zip(names, numbers))
d

{'marco': '333123456',
 'chris': '344561233',
 'daniel': '33367409390',
 'jacob': '3339386722',
 'lisa': '344896725',
 'anne': '3339386345',
 'denise': '3449344766'}

In [149]:
for i in d:
    print(i)

marco
chris
daniel
jacob
lisa
anne
denise


In [150]:
d.items()

dict_items([('marco', '333123456'), ('chris', '344561233'), ('daniel', '33367409390'), ('jacob', '3339386722'), ('lisa', '344896725'), ('anne', '3339386345'), ('denise', '3449344766')])

In [146]:
for k, v in d.items():
    print('key', k, 'value', v)

key marco value 333123456
key chris value 344561233
key daniel value 33367409390
key jacob value 3339386722
key lisa value 344896725
key anne value 3339386345
key denise value 3449344766


# The `WHILE` statement is the original and most powerful iteration statement, but I think of it as a combination of `FOR` and `IF` and rarely use it (mostly for infinite loops).

```
while <condition>:
    <block of code>
```

In [176]:
a = 0
while a < 5:
    print(a)
    a += 1 # <- this one is really important

0
1
2
3
4


In [174]:
5 < 5

False

In [177]:
letters = [chr(i) for i in range(97, 123)] 

In [179]:
print(letters)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


In [181]:
for l in letters:
    if l == 'f':
        break
    print(l, 'inside the loop')
print('outside the for loop')

a inside the loop
b inside the loop
c inside the loop
d inside the loop
e inside the loop
outside the for loop


In [182]:
for l in letters:
    if l == 'f':
        continue # instead of break
    print(l, 'inside the loop')
print('outside the for loop')

a inside the loop
b inside the loop
c inside the loop
d inside the loop
e inside the loop
g inside the loop
h inside the loop
i inside the loop
j inside the loop
k inside the loop
l inside the loop
m inside the loop
n inside the loop
o inside the loop
p inside the loop
q inside the loop
r inside the loop
s inside the loop
t inside the loop
u inside the loop
v inside the loop
w inside the loop
x inside the loop
y inside the loop
z inside the loop
outside the for loop
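`break` is also what makes an intentionally infinite `while True:` loop usable. A minimal sketch of that pattern, assuming we want the smallest `n` whose running sum 1 + 2 + ... + n exceeds a hypothetical threshold of 20:

```python
total = 0
n = 0

while True:          # would run forever without the break below
    n += 1
    total += n
    if total > 20:
        break        # leave the loop as soon as the threshold is passed

print(n, total)
```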


In [184]:
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria', 'jack', 'rose', 'adele', 'max', 'sue']

In [199]:
# create a list of all the possible couples
couples = []
for name1 in names:
    for name2 in names:
        if name1 != name2:
            couple = sorted([name1, name2])
            if couple not in couples:
                couples.append(couple)

In [201]:
len(names)

12

In [202]:
len(couples)

66

In [213]:
couples = []
for name1 in names:
    for name2 in names:
        if name1 == name2:
            continue
        couple = tuple(sorted([name1, name2]))
        couples.append(couple)

In [214]:
len(set(couples))

66

In [None]:
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria','jack', 'rose', 'adele', 'max', 'sue']
# create all possible couples

couples = []
for i in names[:-1]:
    for j in names[names.index(i)+1:]:
        couple = [i,j]
        couples.append(couple)
        
print(couples)  

In [232]:
couples = []
for i, e1 in enumerate(names):
    for e2 in names[i + 1:]:
        couples.append((e1, e2))

In [233]:
len(couples)

66

In [236]:
import itertools

len(list(itertools.combinations(names, 2)))

66
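To see what `itertools.combinations` actually produces, it helps to run it on a list small enough to read. A sketch on a three-name subset, next to `itertools.permutations`, which treats the order within a pair as significant (the names `small`, `pairs` and `ordered` are illustrative):

```python
import itertools

small = ['anne', 'chris', 'daniel']

# unordered pairs: ('anne', 'chris') appears, ('chris', 'anne') does not
pairs = list(itertools.combinations(small, 2))

# ordered pairs: both directions are listed
ordered = list(itertools.permutations(small, 2))

print(pairs)
print(len(ordered))
```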

In [186]:
# given the two following lists
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria', 'jack', 'rose', 'adele', 'max', 'sue']
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# create a dictionary where the keys are letters and values are lists of names starting with that letter


In [237]:
d = {}
for letter in letters:
    d[letter] = []

In [239]:
for name in names:
    d[name[0]].append(name)

In [243]:
didid = {}
for letter in letters:
    for name in names:
        if letter == name[0]:
            # plain assignment overwrites the previous value,
            # so only the last matching name per letter is kept
            didid[letter] = name
print(didid)

{'a': 'adele', 'c': 'chris', 'd': 'denise', 'j': 'jack', 'l': 'lisa', 'm': 'max', 'r': 'rose', 's': 'sue'}


In [245]:
letters = [chr(i) for i in range(97,123)]
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria','jack', 'rose', 'adele', 'max', 'sue'] 
name_dict = {}  # letters as keys, lists of names starting with that letter as values
for name in names:
    first_letter = name[0]
    if first_letter not in name_dict:
        name_dict[first_letter] = [name]
    else:
        name_dict[first_letter].append(name)
name_dict

{'a': ['anne', 'adele'],
 'c': ['chris'],
 'd': ['daniel', 'denise'],
 'j': ['jacob', 'jack'],
 'l': ['lisa'],
 'm': ['maria', 'max'],
 'r': ['rose'],
 's': ['sue']}
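The "create the list if the key is missing, otherwise append" pattern is common enough that the standard library offers `collections.defaultdict` for it. A possible alternative sketch of the same grouping; `by_letter` is an illustrative name:

```python
from collections import defaultdict

names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria',
         'jack', 'rose', 'adele', 'max', 'sue']

# a missing key is created on first access with an empty list as its value
by_letter = defaultdict(list)
for name in names:
    by_letter[name[0]].append(name)

print(dict(by_letter))
```

Unlike the version that pre-fills every letter, only the letters that actually occur end up as keys.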

# List and Dictionary Comprehension

In [246]:
a = []
for i in range(10):
    a.append(i**2)

In [247]:
a

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [248]:
a = [i**2 for i in range(10)]

In [249]:
a

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [253]:
ord('a'), ord('z'), ord('A')

(97, 122, 65)

In [251]:
chr(97)

'a'

In [255]:
letters = [chr(i) for i in range(97, 123)]

In [257]:
print(letters)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


In [261]:
print([i**2 for i in range(20) if i % 2])

[1, 9, 25, 49, 81, 121, 169, 225, 289, 361]


In [262]:
print([i**2 if i % 2 else -1 * i for i in range(20)])

[0, 1, -2, 9, -4, 25, -6, 49, -8, 81, -10, 121, -12, 169, -14, 225, -16, 289, -18, 361]
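The two comprehensions above place `if` in different positions, with different meanings. Written out as ordinary loops (a sketch, with the illustrative names `filtered` and `mapped`), the difference is easier to see: an `if` after the `for` clause filters elements out, while `x if cond else y` before the `for` clause keeps every element and only chooses its value.

```python
# [i**2 for i in range(20) if i % 2]  -> the if drops the even numbers
filtered = []
for i in range(20):
    if i % 2:
        filtered.append(i**2)

# [i**2 if i % 2 else -1 * i for i in range(20)]  -> nothing is dropped
mapped = []
for i in range(20):
    mapped.append(i**2 if i % 2 else -1 * i)

print(len(filtered), len(mapped))
```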


In [264]:
print(names)

['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria', 'jack', 'rose', 'adele', 'max', 'sue']


In [265]:
{x[0]:x for x in names}

{'a': 'adele',
 'c': 'chris',
 'd': 'denise',
 'j': 'jack',
 'l': 'lisa',
 'm': 'max',
 'r': 'rose',
 's': 'sue'}

In [267]:
a, type(a)

([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], list)

In [268]:
'a'

'a'

In [272]:
ord('a') # ASCII code

97

In [273]:
# given the two following lists
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria', 'jack', 'rose', 'adele', 'max', 'sue']
#letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# create a dictionary where each key is a name and each value is the length of that name
# --> your code here <--

# consider the following list
idx = [2, 1, 11, 7, 0, 3, 4, 8, 5, 10, 9, 6]

# reorder names using idx as the index order, that is, anne should be moved to the third position (index 2) instead of the first position (index 0)
# --> your code here <--

# create a dictionary with two keys, 'even' and 'odd', and put as values the first 25 even and the first 50 odd numbers respectively
# --> your code here <--

In [274]:
{n:len(n) for n in names} 

{'anne': 4,
 'chris': 5,
 'daniel': 6,
 'denise': 6,
 'jacob': 5,
 'lisa': 4,
 'maria': 5,
 'jack': 4,
 'rose': 4,
 'adele': 5,
 'max': 3,
 'sue': 3}

In [280]:
[x[1] for x in sorted(list(zip(idx, names)))]

['jacob',
 'chris',
 'anne',
 'lisa',
 'maria',
 'rose',
 'sue',
 'denise',
 'jack',
 'max',
 'adele',
 'daniel']
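Sorting the `(index, name)` pairs works because every target position appears exactly once in `idx`. The same reordering can also be sketched as direct assignment into a pre-sized list, which mirrors the wording of the exercise more literally (`reordered` is an illustrative name):

```python
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria',
         'jack', 'rose', 'adele', 'max', 'sue']
idx = [2, 1, 11, 7, 0, 3, 4, 8, 5, 10, 9, 6]

# placeholder list with one slot per name
reordered = [None] * len(names)
for name, position in zip(names, idx):
    reordered[position] = name   # e.g. 'anne' goes to index 2

print(reordered)
```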

In [283]:
# partial attempt: the even numbers below 25 (not yet the first 25 even numbers)
{'even': [i for i in range(25) if i % 2 == 0], 'odd': []}

{'even': [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24], 'odd': []}

In [284]:
{   'odd': [],
    'even': []
}

{'odd': [], 'even': []}
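One possible way to fill in the template for the last exercise is sketched below. It assumes that counting starts from 0, so 0 is taken as the first even number and 1 as the first odd number; `parity` is an illustrative name.

```python
parity = {
    # 2*i for i = 0..24 gives the first 25 even numbers: 0, 2, ..., 48
    'even': [2 * i for i in range(25)],
    # 2*i + 1 for i = 0..49 gives the first 50 odd numbers: 1, 3, ..., 99
    'odd': [2 * i + 1 for i in range(50)],
}

print(len(parity['even']), parity['even'][-1])
print(len(parity['odd']), parity['odd'][-1])
```

Generating the numbers with `2 * i` and `2 * i + 1` fixes how many values are produced, whereas filtering `range(25)` with `i % 2 == 0` only fixes the upper bound.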