<a href="https://colab.research.google.com/github/marcomoretto/physalia_python_2022/blob/main/Lesson_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Some string methods.

## We already talked a lot about strings; let's look at some useful functions/methods (even though we have not yet defined what a function is)

In [None]:
# join is a string method that takes an iterable of strings (here a list) and glues its items together, using the string itself as the separator
separator = '::'
words = ['apple', 'pear', 'orange']
print(separator.join(words))

In [None]:
# the split method works the other way around: it takes one string and returns a list
s = "apple||pear||orange"
l = s.split('||')
print(l)

In [None]:
# find gives the index of the first occurrence of the argument (or -1 if it is not found)
s = "apple||pear||orange"
f = s.find('p')
print(f)
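
One detail worth seeing in action: when the substring is absent, `find` returns `-1` instead of raising an error, while the related `index` method raises a `ValueError`. A minimal sketch (the variable names `first_p`, `missing` and `raised` are only illustrative):

```python
s = "apple||pear||orange"

# find returns the index of the first match, or -1 if there is no match
first_p = s.find('p')
missing = s.find('z')
print(first_p, missing)

# index behaves like find, but raises ValueError when nothing is found
try:
  s.index('z')
  raised = False
except ValueError:
  raised = True
print(raised)
```

Checking the result against `-1` is therefore the usual way to test whether `find` found anything.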

In [None]:
# count gives the total number of occurrences
s = "apple||pear||orange"
c = s.count('p')
print(c)

In [None]:
# replace returns a new string with an old term substituted with a new term
old = 'pear'
new = 'banana'
s = "apple||pear||orange"
r = s.replace(old, new)
print(r)
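
These methods combine naturally. As a small sketch (the names `parts` and `rejoined` are illustrative), splitting on one separator and joining with another changes the separator, which here gives the same result as a single `replace`:

```python
s = "apple||pear||orange"

# split on the old separator, then join with the new one
parts = s.split('||')
rejoined = ', '.join(parts)
print(rejoined)

# replace reaches the same result in one step
print(s.replace('||', ', '))
```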

# Regular Expression

Regular expressions (RegEx) are a *language* (a sequence of symbols and characters) used to express a **pattern** to be searched for within a string.

```
PN12W_bio1_tech1
PN12W_bio1_tech2
PN24NW_bio2_tech1
PN24NWW_bio2_tech2
PN48W_bio3_tech1
```

RegEx is a very versatile and powerful tool that can be used whenever the normal `string` functions/methods are not enough. RegEx is a language independent of Python and is available in pretty much every programming language.

Online tool [RegEx101](https://regex101.com/)

To use RegEx in Python you need to import the `re` module.

In [None]:
import re

my_strings = 'PN12W_bio1_tech1 PN12W_bio1_tech2 PN24NW_bio2_tech1 PN24NW_bio2_tech2 PN48W_bio3_tech1'

# match the _bio<number>_ pattern (a raw string, r'...', passes \d to re unchanged)
pattern = r'_bio\d+_'
re.findall(pattern, my_strings)

In [None]:
# match only the <number> in the _bio<number>_ pattern
pattern = r'_bio(\d+)_'
re.findall(pattern, my_strings)

In [None]:
# match the PN<number>NW pattern, where the N is not always present
pattern = r'PN\d+N?W'
re.findall(pattern, my_strings)

In [None]:
# we can use RegEx to substitute a pattern instead of using str.replace()

print(re.sub(r'_bio\d+_', '_', my_strings))
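
Capture groups can also pull several fields out of each sample name at once. The sketch below assumes the names follow the `PN<number><letters>_bio<number>_tech<number>` layout of the sample names listed earlier; the list `samples` and the variable `parsed` are illustrative and not part of the lesson.

```python
import re

samples = ['PN12W_bio1_tech1', 'PN24NW_bio2_tech1', 'PN48W_bio3_tech1']

# raw string (r'...') so that \d reaches the regex engine untouched
pattern = re.compile(r'PN(\d+)(N?W)_bio(\d+)_tech(\d+)')

parsed = []
for name in samples:
  m = pattern.fullmatch(name)
  if m:  # fullmatch returns None when the name does not fit the pattern
    parsed.append(m.groups())

print(parsed)
```

Each tuple holds the text captured by the four pairs of parentheses, in order.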

# Control Structures

With variable assignment and control structures you have essentially everything you need to build a real program (that is, to implement algorithms).

Let's start with the `IF` statement.

The syntax is the following:

```
if <condition>:
  <True block of code>
else:
  <False block of code>
```

The `condition` is **always** evaluated as a `boolean` value. If it is `True`, the **True block** gets executed; otherwise (`False`), the **False block** gets executed.

In [None]:
a = 5
b = 1

if a > b:
  print('a is bigger than b')
else:
  print('a is not bigger than b')  # this branch also covers a == b

`a > b` returns a `boolean` value, but the expression does not have to return a `boolean`: **anything** in the *guard* (*condition*) will be **cast** (evaluated) to a `boolean`.
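
The built-in `bool()` function shows how a value would be evaluated in a condition. As a quick sketch (the tuple names are illustrative), zero, empty containers and `None` count as `False`, while most other values count as `True`:

```python
# values that evaluate to False in a condition
empty_values = (bool(0), bool(''), bool([]), bool({}), bool(None))
print(empty_values)

# values that evaluate to True in a condition
non_empty_values = (bool(1), bool('hello'), bool([0]), bool({'a': 1}))
print(non_empty_values)
```

Note that `[0]` is `True`: the list is not empty, even though its only element is `0`.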

In [None]:
a = 'hello world'
if a:
  print(a)

In [None]:
a = []
if a: # can you think of another way to do that?
  print('a is not empty')
else:
  print('a is empty')

a.append('1')
if a:
  print('a is not empty', a)
else:
  print('a is empty')


# A small but important detour: code blocks and indentation in Python

Many programming languages, such as **C**, **C++**, and **Java**, use braces `{ }` to define a block of code. Python, however, uses **indentation**: all the lines of a block must share the same indentation. Because indentation is enforced, Python programs tend to look neat and consistent.
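
To make this concrete, here is a small sketch (the variable names `x` and `lines` are illustrative) in which only the indented lines belong to the `if` block:

```python
x = 5
lines = []

if x > 10:
  lines.append('inside the if block')        # skipped: the condition is False
  lines.append('still inside the if block')  # skipped as well
lines.append('outside the if block')         # not indented, so it always runs

print(lines)
```

Changing the indentation of the last `append` would change which block it belongs to, and therefore the behaviour of the program.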

Now let's go back to control structures. 

We can also chain more than one condition, mimicking what is known as a **switch** statement in other programming languages.

In [None]:
season = 'winter'

if season == 'spring':
  print('blooming flowers')
elif season == 'summer':
  print('hot and sunny')
elif season == 'autumn':
  print('falling leaves')
elif season == 'winter':
  print('cold and snow')


In [None]:
if season == 'spring':
  print('blooming flowers')
elif season == 'summer':
  print('hot and sunny')
else:
  print('cold')

In [None]:
if season == 'spring' or season == 'summer':
  print('hot')
else:
  print('cold')
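
When every branch only selects a value, a dictionary lookup is another common way to mimic a switch. This is a sketch of an alternative, not a replacement for `if`/`elif`; the dictionary `descriptions` and the fallback text `'unknown season'` are made up for illustration.

```python
season = 'winter'

descriptions = {
  'spring': 'blooming flowers',
  'summer': 'hot and sunny',
  'autumn': 'falling leaves',
  'winter': 'cold and snow',
}

# get returns its second argument when the key is missing (the "else" case)
message = descriptions.get(season, 'unknown season')
print(message)
```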

# The FOR statement is, together with the IF statement, by far the control structure you are going to use the most. Let's see the syntax:

```
for <cycle variable> in <iterable>:
  <block of code>
```

In [None]:
lst_a = ['a', 1, 2, 3.14, True, 'hello']

for e in lst_a: 
  print(e)    # <- within the block the variable e represents the current element of the iterable, and it changes value at every iteration

In [None]:
# let's introduce the range function: given an integer x, it returns a range object (a lazy sequence) of length x holding the integers from 0 to x - 1
print(range(10))

In [None]:
list(range(10))
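
`range` also accepts a start value and a step. A short sketch of the three forms (the variable names are illustrative):

```python
first_ten = list(range(10))        # stop only: 0, 1, ..., 9
from_five = list(range(5, 10))     # start at 5, stop before 10
by_three = list(range(0, 10, 3))   # every third number
countdown = list(range(5, 0, -1))  # a negative step counts down

print(first_ten)
print(from_five)
print(by_three)
print(countdown)
```

In every form the stop value itself is excluded.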

In [None]:
for i in range(10):
  if i % 2:
    print(i, 'is odd')
  else:
    print(i, 'is even')

# Exercises

Let's understand the control flow with a debugger

In [None]:
# given the following list
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria']

# print all names that start with the letter 'd'
# --> your code here <--

# the string method capitalize() returns a copy of the string with its first letter capitalized, e.g. 'ciao'.capitalize() returns 'Ciao'
# create a new list where all names are capitalized
# --> your code here <--

# how many times is the letter 'i' present in the names list (use a loop statement this time)? Can you think of a way to count all unique letters?
# --> your code here <--

# create a list of strings that represent the even numbers from 100 to 150 inclusive
# --> your code here <--

In [None]:
# let's introduce the enumerate function: it takes an iterable and returns an iterator of tuples, where the first element of each tuple is the index of the element in the iterable and the second is the element itself
names = ['marco', 'chris', 'daniel', 'jacob', 'lisa', 'anne', 'denise']

print(enumerate(names))

In [None]:
for e in enumerate(names):
  print(e)

In [None]:
# instead of using one cycle variable we can use two, one for the first and one for the second value of each tuple

for i, e in enumerate(names):
  print(e, 'has index (position)', i)
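
`enumerate` also takes an optional `start` argument, which is handy when positions should be counted from 1 rather than 0. A minimal sketch (the list `short_names` and the output format are illustrative):

```python
short_names = ['marco', 'chris', 'daniel']

numbered = []
for position, name in enumerate(short_names, start=1):
  numbered.append(str(position) + '. ' + name)

print(numbered)
```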

In [None]:
names = ['marco', 'chris', 'daniel', 'jacob', 'lisa', 'anne', 'denise']
numbers = ['333123456', '344561233', '33367409390', '3339386722', '344896725', '3339386345', '3449344766']

zip(names, numbers)

In [None]:
list(zip(names, numbers))

In [None]:
for i,j in zip(names, numbers):
  print(i, 'phone number is', j)
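
Keep in mind that `zip` stops as soon as the shortest input runs out, without raising an error. A small sketch with deliberately mismatched lists (the names `short_names` and `short_numbers` are illustrative):

```python
short_names = ['marco', 'chris', 'daniel']
short_numbers = ['333123456', '344561233']  # one number is missing on purpose

pairs = list(zip(short_names, short_numbers))
print(pairs)  # 'daniel' has no partner and is left out
```

If silently dropping elements would be a problem, comparing the lengths of the lists before zipping is a simple safeguard.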

In [None]:
# the for statement is versatile and can be used for dictionaries as well, using two cycle variables: one for the key and one for the value

dict(zip(names, numbers))

In [None]:
d = dict(zip(names, numbers))
for k, v in d.items():
  print(k, 'phone number is', v)
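
Besides `items()`, a dictionary can be looped over directly (which yields its keys) or through `values()`. A short sketch with a made-up two-entry dictionary `phone_book`:

```python
phone_book = {'marco': '333123456', 'chris': '344561233'}

keys_seen = []
for k in phone_book:           # iterating a dictionary directly yields its keys
  keys_seen.append(k)

values_seen = []
for v in phone_book.values():  # values() yields only the values
  values_seen.append(v)

print(keys_seen)
print(values_seen)
```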

# Despite the `WHILE` statement being the first and most powerful iteration statement, I think of it as a combination of `FOR` and `IF` and rarely use it (only for infinite loops).

```
while <condition>:
    <block of code>
```

In [None]:
a = 0
while a < 5:
  print(a)
  a += 1 # without this line, the loop will never end

In [None]:
n = 10
a, b = 0, 1
while n > 0:
  print(b)
  n -= 1
  a, b = b, a + b

In [None]:
n = 0
a, b = 0, 1
while True:
  print(n, b)
  n += 1
  a, b = b, a + b
  if n >= 10:
    break
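
A `while` loop fits naturally when the number of iterations is not known in advance. As a sketch, the loop below looks for the first Fibonacci number greater than 100 (the threshold `100` and the counter `steps` are chosen only for illustration):

```python
a, b = 0, 1
steps = 0

# keep going until b exceeds the threshold; we do not know how many steps that takes
while b <= 100:
  a, b = b, a + b
  steps += 1

print(b, 'reached after', steps, 'steps')
```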

In [None]:
l = [chr(i) for i in range(97, 123)] # ignore this line of code
l

In [None]:
for i in l:
  if i == 'f':
    break
  print(i)


In [None]:
for i in l:
  if i == 'f':
    continue
  print(i)
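
The difference between `break` and `continue` is easier to compare when the results are collected in lists. A minimal sketch on a shorter list (the names `few_letters`, `until_c` and `without_c` are illustrative):

```python
few_letters = ['a', 'b', 'c', 'd', 'e']

until_c = []
for ch in few_letters:
  if ch == 'c':
    break          # leave the loop entirely
  until_c.append(ch)

without_c = []
for ch in few_letters:
  if ch == 'c':
    continue       # skip this element and move on to the next one
  without_c.append(ch)

print(until_c)
print(without_c)
```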

In [None]:
# do you remember the zip function? We can loop on 2 variables
idx = range(26)

for i, j in zip(idx, l):
  print(i, j) 

In [None]:
# we can obtain the same using enumerate
for i, j in enumerate(l):
  print(i, j)

In [None]:
list(enumerate(l)) == list(zip(range(len(l)), l))

# Exercises

In [None]:
# write the code for the Fibonacci series using a for loop
# --> your code here <--

# given the following list
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria', 'jack', 'rose', 'adele', 'max', 'sue']

# create a list of all the possible couples
# --> your code here <--

# given the two following lists
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria', 'jack', 'rose', 'adele', 'max', 'sue']
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# create a dictionary where the keys are letters and values are lists of names starting with that letter
# --> your code here <--


# List and Dictionary Comprehension

In [None]:
# list comprehension is a more concise (and often faster) way to create lists in Python

a = []
for i in range(10):
  a.append(i**2)

print(a)

In [None]:
a = [i**2 for i in range(10)]
print(a)

In [None]:
# chr and ord are two functions that convert, respectively, a numeric character code (ASCII/Unicode) into the corresponding character and vice versa

chr(97), chr(98)

In [None]:
ord('a'), ord('b')

In [None]:
letters = []
for i in range(97, 97 + 26):
  letters.append(chr(i))

print(letters)

In [None]:
letters = [chr(i) for i in range(97, 97 + 26)]
print(letters)

In [None]:
odd_numbers = [i for i in range(10) if i % 2]
print(odd_numbers)

In [None]:
{chr(i): i for i in range(97, 123)} # what is it?

In [None]:
# you can use if and else within a list comprehension, but with a slightly different syntax
[x for x in range(10) if x % 2 == 1]

In [None]:
[x if x % 2 == 1 else '*' for x in range(10)]
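
The position of the `if` changes its meaning: at the end it filters elements out, while before the `for` (together with `else`) it picks a value for every element. The sketch below places both forms next to the loop equivalent of the second one (the variable names are illustrative):

```python
# filter: the if at the end decides whether an element is kept
odd_only = [x for x in range(10) if x % 2 == 1]

# conditional expression: every element is kept, the if/else picks its value
odd_or_star = [x if x % 2 == 1 else '*' for x in range(10)]

# loop equivalent of the second comprehension
loop_version = []
for x in range(10):
  if x % 2 == 1:
    loop_version.append(x)
  else:
    loop_version.append('*')

print(odd_only)
print(odd_or_star)
```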

# Exercises

In [None]:
# given the two following lists
names = ['anne', 'chris', 'daniel', 'denise', 'jacob', 'lisa', 'maria', 'jack', 'rose', 'adele', 'max', 'sue']
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

# create a dictionary where the keys are the names and each value is the length of that name
# --> your code here <--

# consider the following list
idx = [2, 1, 11, 7, 0, 3, 4, 8, 5, 10, 9, 6]

# reorder names using idx as the index order, that is, anne should be moved to the third position (index 2) instead of the first position (index 0)
# --> your code here <--

# create a dictionary with two keys, 'even' and 'odd', and put as values the first 25 even and the first 50 odd numbers respectively
# --> your code here <--
