# Discovery of notebooks

This is some Markdown text with *italics* and __bold__ with underscores.

### Title

You can also write math: $$\exp(i \pi) + 1 = 0$$

You can also write code with backticks:
```python
x = 1
```

In [1]:
import math  # built-in module
# check your python version
import sys
from math import *  # avoid
from math import pi, sqrt

import matplotlib.pyplot as plt
# we use aliases:
import numpy
import numpy as np
import pandas as pd
from numpy import *  # avoid
from numpy import array  # import from top level module
from numpy import pi  # beware that names can be duplicated over packages!
from numpy.linalg import norm  # import from a submodule
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

x = 2  # comment: notice the space around the equal (PEP8)
print(x)

2


In [2]:
print(sys.version)

3.12.5 | packaged by conda-forge | (main, Aug  8 2024, 18:32:50) [Clang 16.0.6 ]


In [3]:
!python --version

Python 3.12.5


In [4]:
!pwd  # can run bash commands inside python cell using `!`

/Users/sylvaincom/Documents/GitHub/pyds_lectures/cs_exed/ms_ia_confiance/2024_2025/lectures


In [5]:
!pip install numpy



# Declaring variables

In [6]:
y = 3
y  # will print the variable if it is at the last line

3

In [7]:
1
2  # only the value gets displayed

2

In [8]:
print(1)
print(2)

1
2


In [9]:
# to suppress the displayed value, end the line with ; (avoid if possible)
1 + 2;

In [10]:
# you can also assign the result to avoid it being printed:
a = 1 + 2

In [11]:
# python convention: use _ as name for useless variable
_ = 1 + 2

In [12]:
print(3 + 2)

5


In [13]:
x = 3 + 2
print(x)

5


In [14]:
print(type(x))

<class 'int'>


In [15]:
help(type)

Help on class type in module builtins:

class type(object)
 |  type(object) -> the object's type
 |  type(name, bases, dict, **kwds) -> a new type
 |
 |  Methods defined here:
 |
 |  __call__(self, /, *args, **kwargs)
 |      Call self as a function.
 |
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |
 |  __dir__(self, /)
 |      Specialized __dir__ implementation for types.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __instancecheck__(self, instance, /)
 |      Check if an object is an instance.
 |
 |  __or__(self, value, /)
 |      Return self|value.
 |
 |  __repr__(self, /)
 |      Return repr(self).
 |
 |  __ror__(self, value, /)
 |      Return value|self.
 |
 |  __setattr__(self, name, value, /)
 |      Implement setattr(self, name, value).
 |
 |  __sizeof__(self, /)
 |      Return memory consumption of the t

In [16]:
my_variable = 1  # snake_case: everything in lower case, separated by underscores
MyClass = 0  # CamelCase: avoid, only for classes
_my_variable = 0  # this means "private", use at your own risk
my_variable2 = 0

In [17]:
2_variable  # cannot start with a digit

SyntaxError: invalid decimal literal (3026941591.py, line 1)

In [18]:
age = 30  # use meaningful variable names

In [19]:
my_very_long_variable_name = 31

In [20]:
my_very_long_variable_name  # use autocomplete (Tab)

31

In [21]:
type(age)

int

In [22]:
n = 1000000  # not nice
print(n, type(n))
n = 10**6  # nice
print(n, type(n))
n = 1e6  # nice, but float
print(n, type(n))
n = 1_000_000  # very nice (for readability)
print(n, type(n))

1000000 <class 'int'>
1000000 <class 'int'>
1000000.0 <class 'float'>
1000000 <class 'int'>


In [23]:
n = 236617  # not nice
print(n)
n = 236_617  # very nice
print(n)

236617
236617


Declare a variable if the same value is used several times, so that you can change it once and have the change apply everywhere (there is no need to declare a variable if the value is only used once in the whole notebook):

In [24]:
# avoid
print(2 * 100)
print(3 * 100)
print(4 * 100)

# recommended
n = 100
print(2 * n)
print(3 * n)
print(4 * n)

200
300
400
200
300
400


In [25]:
x = 1
print(x)

x += 1  # x = x + 1
print(x)

x -= 3  # x = x - 3
print(x)

x *= 12  # x = x * 12
print(x)

x /= 3  # x = x / 3
print(x)

1
2
-1
-12
-4.0


# For-loops

In [26]:
n = 5
for i in range(n):
    print(i)

0
1
2
3
4


In [27]:
i  # last value of the for loop

4

In [28]:
n = 5
for i in range(n + 1):
    print(i)

0
1
2
3
4
5


In [29]:
n = 5
for i in range(1, n + 1):
    print(i)

1
2
3
4
5


In [30]:
n = 10
for i in range(0, n + 1, 2):
    print(i)

0
2
4
6
8
10


In [31]:
for _ in range(5):  # _ means that it will be discarded, not useful
    print(2)

2
2
2
2
2


In [32]:
my_sum = 0
for i in range(5):  # use `i` if it is reused, a true variable
    my_sum += i

# Manipulating strings

In [33]:
first_name = "John"
last_name = "Doe"  # single quotes also work

In [34]:
type(first_name)

str

In [35]:
print(first_name)
print(first_name, last_name)

John
John Doe


In [36]:
# you should use f-strings (format strings)
print(f"My first name is {first_name} and my last name is {last_name}")
# instead of:
print("My first name is", first_name, "and my last name is", last_name)
# f-strings can be very concise:
print(f"{first_name = }")

My first name is John and my last name is Doe
My first name is John and my last name is Doe
first_name = 'John'
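
As a small complement to the f-string cell above, here is a sketch of format specifiers, which go after a colon inside the braces. The variables `price` and `ratio` are made up for illustration and do not appear elsewhere in this notebook.

```python
# Hypothetical values, chosen only to illustrate format specifiers
price = 1234.5678
ratio = 0.256

two_decimals = f"{price:.2f}"     # fixed-point, 2 digits after the decimal point
with_separator = f"{price:,.1f}"  # thousands separator, 1 decimal
as_percent = f"{ratio:.1%}"       # multiplied by 100, followed by %
padded = f"{'John':>8}"           # right-aligned in a field of width 8

print(two_decimals)    # 1234.57
print(with_separator)  # 1,234.6
print(as_percent)      # 25.6%
print(repr(padded))    # '    John'
```

The specifier only changes how the value is displayed; the underlying variable is left untouched.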


In [37]:
s = "line1\nline2"
print(s)

line1
line2


In [38]:
s = "word1\tword2"
print(s)

word1	word2


In [39]:
s = "word1\n\tword2"
print(s)

word1
	word2


In [40]:
len(first_name)

4

In [41]:
first_name[0]  # python indexes always start at 0

'J'

In [42]:
first_name[1]

'o'

In [43]:
first_name[len(first_name)]

IndexError: string index out of range

Get used to reading Python errors because they are very informative. Always start from the bottom of the error message, and copy-paste the last line into Google to understand what is going on.
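
To illustrate why the bottom of the message matters, the sketch below reproduces the error above and extracts the last line of the traceback with the standard `traceback` module. Catching the error this way is only for demonstration; in a notebook you would simply read the message.

```python
import traceback

first_name = "John"

try:
    first_name[len(first_name)]  # index 4 does not exist: valid indexes are 0..3
except IndexError:
    message = traceback.format_exc()  # full error message as a string

# The last non-empty line gives the error type and a short explanation
last_line = message.strip().splitlines()[-1]
print(last_line)  # IndexError: string index out of range
```

The lines above the last one tell you *where* the error happened; the last line tells you *what* happened.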

In [44]:
first_name[len(first_name) - 1]

'n'

In [45]:
first_name[-1]  # most commonly used to get the last element of a sequence

'n'

In [46]:
first_name[-2]  # - 2 is interpreted as len(first_name) - 2

'h'

In [47]:
first_name[2:4]  # this is called slicing

'hn'

In [48]:
print(first_name)
print(
    first_name[1:5:2]
)  # first:last:step means from first incl. to last excl. by steps of `step`

John
on


In [49]:
print(first_name[:5:2])  # nothing means 0 for first
print(first_name[2::2])  # nothing means len(first_name) for last
print(first_name[::2])

Jh
h
Jh
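
To see how omitted or out-of-range bounds are resolved, one option is the built-in `slice` object and its `indices` method, which returns the concrete `(start, stop, step)` for a sequence of a given length. This is a minimal sketch reusing `first_name` from above.

```python
first_name = "John"
n = len(first_name)

# slice(a, b, c) is the object behind the a:b:c notation
s1 = slice(None, 5, 2).indices(n)      # first_name[:5:2]
s2 = slice(2, None, 2).indices(n)      # first_name[2::2]
s3 = slice(None, None, -1).indices(n)  # first_name[::-1]

print(s1)  # (0, 4, 2): missing first is 0, last is clipped to len(first_name)
print(s2)  # (2, 4, 2): missing last is len(first_name)
print(s3)  # (3, -1, -1): with a negative step, the defaults are reversed
```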


In [50]:
print(first_name[::-1])  # this is the most commonly used to reverse a string
print(
    first_name[-3:1:1]
)  # -3 resolves to index 1: if first is not smaller than last and step is >0, we get an empty string
print(len(first_name[-3:1:1]))
print(first_name[-3:1:-1])

nhoJ

0



In [51]:
for char in first_name:
    print(char)

J
o
h
n


In [52]:
for i in range(len(first_name)):
    print(first_name[i])

J
o
h
n


In [53]:
first_name.lower()
# this "lower" function is called a method and it's
# "attached" to the variable `first_name`

'john'

In [54]:
lower(first_name)  # does not work, not defined

NameError: name 'lower' is not defined

# Manipulating the types

In [55]:
my_num = 123
my_num[0]

TypeError: 'int' object is not subscriptable

In [56]:
x = 1.0
y = 0.0
print(x, y)

# swap x and y
x, y = y, x  # those are called tuples, you can also write (x, y)
print(x, y)

1.0 0.0
0.0 1.0


In [57]:
my_int = 1
print(type(my_int))
my_float = 1.5
print(type(my_float))
my_second_float = 1.0
print(type(my_second_float))

<class 'int'>
<class 'float'>
<class 'float'>


In [58]:
type(float(my_int))

float

In [59]:
type(int(my_float))

int

In [60]:
int(1.8)  # int() truncates toward zero (it is neither ceil nor round)

1
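
The difference between truncating, flooring, taking the ceiling and rounding only becomes visible when you also try a negative number. A short sketch using the standard `math` module:

```python
import math

rows = []
for value in (1.8, -1.8):
    # int() drops the fractional part, floor goes down, ceil goes up
    row = (int(value), math.floor(value), math.ceil(value), round(value))
    rows.append(row)
    print(value, row)
```

For `1.8` this gives `(1, 1, 2, 2)` and for `-1.8` it gives `(-1, -2, -1, -2)`: `int()` matches `floor` for positive numbers and `ceil` for negative ones.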

In [61]:
"123" + 1

TypeError: can only concatenate str (not "int") to str

In [62]:
int("123") + 1

124

In [63]:
int("12a")

ValueError: invalid literal for int() with base 10: '12a'

In [64]:
int("1.1")

ValueError: invalid literal for int() with base 10: '1.1'

In [65]:
"123" + "345"  # this does concatenation

'123345'

In [66]:
[0, 1, 3] + [3, 4, -1]  # also concatenation for lists, and tuples

[0, 1, 3, 3, 4, -1]

The same `+` operator adapts to the types of its operands and does something different for each.

In [67]:
"123" * 3

'123123123'

In [68]:
[0] * 3

[0, 0, 0]
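
The cells above can be summarized in one loop. This sketch applies `+` to pairs of operands of the same type and shows that mixing incompatible types raises an error rather than guessing what you meant.

```python
examples = [(1, 2), ("12", "3"), ([0, 1], [2]), ((0, 1), (2,))]

results = []
for left, right in examples:
    result = left + right  # addition for numbers, concatenation for sequences
    results.append(result)
    print(type(left).__name__, result)

# Mixing str and int is refused
try:
    "123" + 1
except TypeError as error:
    error_message = str(error)
    print(error_message)
```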

In [69]:
print("#" * 80)  # do *not* write `print("#################")`

################################################################################


In [70]:
len("123")

3

In [71]:
len(123)

TypeError: object of type 'int' has no len()

In [72]:
2.5 + 1

3.5

In [73]:
1 / 2

0.5

In [74]:
print(3 // 2)  # this is integer div
# same result here, but not for negative numbers: `//` floors, `int()` truncates toward zero
print(int(3 / 2))

1
1
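
The two expressions above agree for positive numbers only. A minimal sketch with a negative dividend shows the difference, and uses the built-in `divmod` to get the quotient and the remainder together.

```python
a, b = -3, 2

floor_div = a // b      # floors toward negative infinity
truncated = int(a / b)  # true division, then truncation toward zero
print(floor_div, truncated)  # -2 -1

# divmod returns (a // b, a % b), and a == quotient * b + remainder
quotient, remainder = divmod(a, b)
print(quotient, remainder)  # -2 1
```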


In [75]:
2 / 3

0.6666666666666666

Python is the best calculator, do not use the calculator of your smartphone when you have python in front of you...

In [76]:
round(4.5)

4

In [77]:
round(3.5139879, 3)  # pass a second argument

3.514

In [78]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.

    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.
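
The result `round(4.5) == 4` may look surprising: when a value is exactly halfway between two integers, Python's built-in `round` picks the even one. The sketch below shows this, and one possible alternative with the standard `decimal` module if you need "round half up" (that choice is only an illustration, not something the notebook relies on).

```python
from decimal import ROUND_HALF_UP, Decimal

halves = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded = [round(x) for x in halves]
print(rounded)  # [0, 2, 2, 4, 4]: ties go to the nearest even integer

# "Round half up" on a decimal value written as a string
half_up = int(Decimal("4.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))
print(half_up)  # 5
```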



In [79]:
# how to check if a number is even or odd?

x = 10
print(x % 2)  # remainder of the Euclidean division

0


In [80]:
x = 11
if x % 2 == 0:
    print(f"{x = } is even")
else:
    print(f"{x = } is odd")

x = 11 is odd


# Manipulating booleans and if-else conditions

In [81]:
# comparing objects
1 == 2

False

In [82]:
1 != 2

True

In [83]:
type(1)

int

In [84]:
type(1.0)

float

In [85]:
1 == 1.0  # this compares the object values

True

In [86]:
if True:
    print("1st answer")

1st answer


In [87]:
if False:
    print("1st answer")
print("without indent, it is not in the condition")

without indent, it is not in the condition


In [88]:
if False:
    print("1st answer")

In [89]:
x = 2
if x == 2:
    print("1st answer")

1st answer


In [90]:
y = 3
if y == 2:
    print("1st answer")
else:
    print("2nd answer")

2nd answer


In [91]:
1 < 2

True

In [92]:
"a" < "z"

True

In [93]:
"abc" < "abe"

True

# Manipulating lists

In [94]:
my_list = ["a", "b", "c", "d"]
type(my_list)

list

In [95]:
my_list

['a', 'b', 'c', 'd']

In [96]:
len(my_list)

4

In [97]:
for i in range(len(my_list)):
    print(i)

0
1
2
3


In [98]:
min([0, 1, -1])

-1

In [99]:
max(["a", "z", "A"])  # upper case comes first in lexicographic order

'z'

In [100]:
sum([0, 1, -1])

0

In [101]:
list("John")  # from string to list

['J', 'o', 'h', 'n']

In [102]:
# from list to string
"".join(["J", "o", "h", "n"])  # this is the "joining" operation

'John'

In [103]:
"_".join(["J", "o", "h", "n"])

'J_o_h_n'

In [104]:
sorted(["a", "z", "A"])  # using the builtin `sorted` function

['A', 'a', 'z']
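
Strings are compared character by character using each character's Unicode code point, which is why upper case letters come before lower case ones. A small sketch with the built-in `ord`, plus a case-insensitive sort using the `key` argument of `sorted`:

```python
words = ["a", "z", "A"]

code_points = [ord(c) for c in words]
print(code_points)  # [97, 122, 65]

default_order = sorted(words)                     # code-point order
ignoring_case = sorted(words, key=str.lower)      # compare lower-cased versions
print(default_order)  # ['A', 'a', 'z']
print(ignoring_case)  # ['a', 'A', 'z']
```

`sorted` is stable, so `"a"` stays before `"A"` in the second result because it came first in the input.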

In [105]:
for i in range(len(my_list)):
    print(my_list[i])

a
b
c
d


In [106]:
for elem in my_list:
    print(elem)

a
b
c
d


In [107]:
my_list = [1, "a", 2.1]  # each element can be of different type

In [108]:
my_list_1 = [1, "a", 2.1]
my_list_2 = [-1, "b"]
my_list_1 + my_list_2  # concatenate lists

[1, 'a', 2.1, -1, 'b']

In [109]:
my_list = [18, 17, 20]

In [110]:
my_list[-1]

20

In [111]:
my_list.append(10)  # add an element at the end
my_list

[18, 17, 20, 10]

In [112]:
my_list[-1]

10

In [207]:
# standard way to build lists:
values = []  # empty list
# values = list()  # also works
print(len(values))

for i in range(5):
    values.append(i)
print(values)

0
[0, 1, 2, 3, 4]


In [113]:
# list comprehension syntax
my_list = [i for i in range(5)]
my_list

[0, 1, 2, 3, 4]

In [114]:
my_list = [i + 1 for i in range(5)]
my_list

[1, 2, 3, 4, 5]

In [115]:
my_list = [i**2 for i in range(5)]
my_list

[0, 1, 4, 9, 16]

In [116]:
my_list = [i for i in range(10) if True]
my_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [117]:
my_list = [i for i in range(10) if False]
my_list

[]

In [118]:
my_list = [i for i in range(10) if i % 2 == 0]
my_list

[0, 2, 4, 6, 8]

In [119]:
my_list = [i**2 for i in range(10) if i % 2 == 0]
my_list

[0, 4, 16, 36, 64]

In [209]:
nested_list = [[1, 2, 3], [4, 5]]
flattened = [elem for sublist in nested_list for elem in sublist]
print(flattened)
# `for` clauses can be chained (in the same order as nested for loops)

[1, 2, 3, 4, 5]
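
For comparison, here is the same flattening written with explicit nested loops. The `for` clauses of the comprehension appear in the same order as the loops, from the outermost to the innermost.

```python
nested_list = [[1, 2, 3], [4, 5]]

flattened_loop = []
for sublist in nested_list:   # outer loop = first `for` of the comprehension
    for elem in sublist:      # inner loop = second `for` of the comprehension
        flattened_loop.append(elem)

flattened_comp = [elem for sublist in nested_list for elem in sublist]
print(flattened_loop == flattened_comp)  # True
```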


# Manipulating dictionaries

In [120]:
my_dict = dict()  # or {}
my_dict["matthieu"] = 18
my_dict["valentine"] = 17
my_dict["yanis"] = 20
print(my_dict)
print(my_dict["yanis"])

{'matthieu': 18, 'valentine': 17, 'yanis': 20}
20


In [121]:
my_dict["elise"] = 10  # sets a new key/value pair (or changes an existing value)
print(my_dict)

{'matthieu': 18, 'valentine': 17, 'yanis': 20, 'elise': 10}


In [219]:
my_dict["elise"] = 12
print(my_dict)

{'matthieu': 18, 'valentine': 17, 'yanis': 20, 'elise': 12}


Contrary to lists, in dictionaries you access a value using its key and not an index.

In [122]:
for key, value in my_dict.items():
    print(key, value)

matthieu 18
valentine 17
yanis 20
elise 12


In [221]:
# system of key/value pairs
my_dict = {
    "a": "value 1",
    "b": 3,
    1: 10,
    "1": 45,
}
# keys can be of (nearly) any data type (they must be hashable)
# and values can be of (exactly) any data type

In [222]:
my_dict[1]

10

In [223]:
my_dict.get("b")

3

In [225]:
del my_dict["1"]
print(my_dict)

{'a': 'value 1', 'b': 3, 1: 10}
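
Because values are accessed by key, it helps to know what happens when a key is missing. A minimal sketch, reusing the dictionary above: square brackets raise a `KeyError`, whereas `.get` returns `None` or a default of your choice.

```python
my_dict = {"a": "value 1", "b": 3, 1: 10}

has_b = "b" in my_dict             # membership test looks at the keys
missing = my_dict.get("z")         # missing key: returns None
with_default = my_dict.get("z", 0)  # missing key: returns the given default
print(has_b, missing, with_default)  # True None 0

try:
    my_dict["z"]  # square brackets raise an error for a missing key
except KeyError as error:
    key_error_text = f"KeyError: {error}"
    print(key_error_text)  # KeyError: 'z'
```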


# Manipulating numpy arrays

In [176]:
M = np.array([[1, 2, 3], [4, 5, 6]])
M  # use a capital `M` instead of `m` because it is a matrix (2D array) and not a scalar or a vector

array([[1, 2, 3],
       [4, 5, 6]])

A 2D array can be seen as a list of lists:

In [177]:
type(M)

numpy.ndarray

In [178]:
# always know the shapes of your arrays
print(np.shape(M))
print(M.shape)

(2, 3)
(2, 3)


In [179]:
my_list_of_lists = [[1, 2, 3], [4, 5, 6]]
my_list_of_lists

[[1, 2, 3], [4, 5, 6]]

In [180]:
np.array(my_list_of_lists)

array([[1, 2, 3],
       [4, 5, 6]])

In [181]:
N = np.array([[1, 2, 3], [4, "b", 6]])
N  # all elements must have the same type, contrary to lists

array([['1', '2', '3'],
       ['4', 'b', '6']], dtype='<U21')

In [182]:
my_irregular_list = [[1, 2, 3], [4, 5]]
print(my_irregular_list)

[[1, 2, 3], [4, 5]]


In arrays, each row must have the same number of columns

In [183]:
print(np.array(my_irregular_list))

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

In [213]:
nested_array = np.array([[1, 2, 3], [4, 5, 6]])
nested_array.flatten()

array([1, 2, 3, 4, 5, 6])

In [196]:
M = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(M)
print(M[0, 0])
print(M[1, 2])
print(M[1, 0:2])
print(M[:, -1])
print(M[:, ::2])

[[1 2 3 4]
 [5 6 7 8]]
1
7
[5 6]
[4 8]
[[1 3]
 [5 7]]


In [198]:
print(np.mean(M))
print(np.mean(M, axis=0))
print(np.mean(M, axis=1))

4.5
[3. 4. 5. 6.]
[2.5 6.5]
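
The `axis` argument names the axis that gets collapsed by the operation. The sketch below makes this visible through the shapes of the results, and adds `keepdims=True`, an optional argument of `np.mean` that keeps the collapsed axis with length 1 (used here to center each row).

```python
import numpy as np

M = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # shape (2, 4)

col_means = np.mean(M, axis=0)  # collapses axis 0 (rows): one value per column
row_means = np.mean(M, axis=1)  # collapses axis 1 (columns): one value per row
print(col_means.shape, row_means.shape)  # (4,) (2,)

# keepdims=True gives shape (2, 1), which can be subtracted from M directly
centered = M - np.mean(M, axis=1, keepdims=True)
print(centered)
```

Each row of `centered` now has a mean of zero.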


In [131]:
M = np.ones(5)
M

array([1., 1., 1., 1., 1.])

In [132]:
M = np.ones((3, 5))
M

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [133]:
M = np.zeros((5, 3))
M

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [134]:
M = np.eye(3)
M

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [135]:
np.diag(M)

array([1., 1., 1.])

In [199]:
np.diag([3, 4, 5])

array([[3, 0, 0],
       [0, 4, 0],
       [0, 0, 5]])

In [136]:
M = np.array([[1, 0], [0, 1]])
N = np.array([[4, 1], [2, 2]])
print(M)
print(N)
print(M + N)

[[1 0]
 [0 1]]
[[4 1]
 [2 2]]
[[5 1]
 [2 3]]


In [137]:
print(np.dot(M, N))

[[4 1]
 [2 2]]


NumPy vectorizes computations and thus makes them faster:

In [173]:
long_list = list(range(10_000))
%timeit sum(long_list)

220 μs ± 2.94 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [174]:
long_array = np.arange(10_000)
%timeit np.sum(long_array)

1.85 μs ± 17.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
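
The `%timeit` magic only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run in a plain Python script with the standard `timeit` module. The number of repetitions (`number=200`) is an arbitrary choice, and the timings depend on your machine, so none are shown here.

```python
import timeit

import numpy as np

long_list = list(range(10_000))
long_array = np.arange(10_000)

# Both approaches compute the same sum
list_total = sum(long_list)
array_total = int(np.sum(long_array))

# Time 200 calls of each version (total time in seconds)
list_time = timeit.timeit(lambda: sum(long_list), number=200)
array_time = timeit.timeit(lambda: np.sum(long_array), number=200)
print(f"list: {list_time:.4f} s, array: {array_time:.4f} s")
```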


In [200]:
init_arr = np.arange(12)
print(init_arr)
print(init_arr.shape)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
(12,)


In [201]:
init_arr.reshape(3, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [202]:
init_arr.reshape(3, 3)

ValueError: cannot reshape array of size 12 into shape (3,3)

In [203]:
arr = np.array([[1, 2, 3]])
print(arr)
print(arr.shape)

[[1 2 3]]
(1, 3)


In [204]:
arr.T  # transposing only changes arrays with 2 or more dimensions (a 1D array is left unchanged)

array([[1],
       [2],
       [3]])
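
By contrast, transposing a 1D array does nothing, because there is only one axis to permute. A minimal sketch showing this, and two common ways to turn a 1D array into a column (`reshape(-1, 1)` and `np.newaxis`):

```python
import numpy as np

v = np.array([1, 2, 3])  # 1D array, shape (3,)
print(v.T.shape)         # (3,): unchanged

column = v.reshape(-1, 1)       # -1 lets NumPy infer the number of rows
also_column = v[:, np.newaxis]  # same result with an explicit new axis
print(column.shape, column.T.shape)  # (3, 1) (1, 3)
```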

# Functions

In [138]:
# a function always returns something
output = print(1)

1


In [139]:
print(output)
print(type(output))

None
<class 'NoneType'>


In [140]:
# None is the python object to mean: nothing
output is None  # notice the "is" keyword for (identity) object comparison

True
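
To make the difference between `is` (same object) and `==` (same value) concrete, here is a short sketch with lists. The names `a`, `b` and `c` are only for illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # a different list with the same content
c = a          # another name for the very same list

same_value = a == b      # compares the contents
same_object = a is b     # compares the identities
alias = a is c
print(same_value, same_object, alias)  # True False True

# `is` is the recommended way to test for None
result = None
if result is None:
    print("nothing was returned")
```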

In [141]:
def my_func(x):
    output = 2 * x
    output = output + 1
    return output


my_func(1)

3

In [142]:
# more concise


def my_func(x):
    return 2 * x + 1


my_func(1)

3

In [143]:
# a function transforms an input and returns an output
input_value = 1  # not `input`, which is the name of a built-in function
output = my_func(input_value)
output

3

In [144]:
def dummy_function(a):
    print(f"{a} squared is {a ** 2}")

In [145]:
dummy_function(5)

5 squared is 25


In [146]:
result = dummy_function(5)
print(result)  # returns None when return is not specified in the function

5 squared is 25
None


## Several inputs (arguments) or several outputs

In [147]:
# several outputs


def my_func(x):
    y = x + 1
    z = x + 2
    return y, z


y, z = my_func(1)
print(y, z)

2 3
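
A function with "several outputs" actually returns a single tuple, which is then unpacked on the left-hand side. A minimal sketch with the same `my_func`:

```python
def my_func(x):
    return x + 1, x + 2  # the comma builds a tuple


result = my_func(1)
print(type(result).__name__, result)  # tuple (2, 3)

y, z = result            # unpacking into two variables
_, z_only = my_func(1)   # `_` discards the output you do not need
print(y, z, z_only)      # 2 3 3
```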


In [148]:
# can choose to return a dictionary with the outputs


def my_func(x):
    d = dict()
    d["first_out"] = x + 1
    d["second_out"] = x + 2
    return d


d = my_func(2)
print(d)

{'first_out': 3, 'second_out': 4}


In [149]:
# several inputs


def my_func(first_arg, second_arg):
    print(f"{first_arg = }\n{second_arg = }")


my_func(1, "CS")

first_arg = 1
second_arg = 'CS'


## About the arguments of a function

In [150]:
def my_func(first_arg, second_arg):
    print(f"{first_arg = }\n{second_arg = }")

In [151]:
my_func(1, "CS")

first_arg = 1
second_arg = 'CS'


In [152]:
my_func(second_arg=1, first_arg="CS")

first_arg = 'CS'
second_arg = 1


In [153]:
my_func(second_arg=2)

TypeError: my_func() missing 1 required positional argument: 'first_arg'

In [154]:
def my_func_optional(first_arg, second_arg="default value"):
    print(f"{first_arg = }\n{second_arg = }")

In [155]:
my_func_optional(first_arg=1)

first_arg = 1
second_arg = 'default value'


In [156]:
my_func_optional(1, "something else")

first_arg = 1
second_arg = 'something else'


# Imports

In [157]:
math.pi

3.141592653589793

`math.cos` means using the `cos` function of the `math` library:

In [158]:
math.cos(math.pi)

-1.0

Importing functions from libraries allows you to reuse existing code. Do not re-invent the wheel!

In [159]:
math.sqrt(math.cos(math.pi / 2) + math.sqrt(5))  # can be long

1.4953487812212205

In [160]:
sqrt(pi)  # instead of math.sqrt(math.pi)

1.7724538509055159

In [161]:
numpy.linalg.norm(numpy.array([0, 1, 2]))

np.float64(2.23606797749979)

Respect conventions; do *not* do:
```python
import numpy as yolo
import pandas as np
```

In [162]:
# more concise
np.linalg.norm(np.array([0, 1, 2]))

np.float64(2.23606797749979)

In [163]:
archived_dir = dir()

Avoid star imports such as `from math import *` (used in the first cell): they import everything from the `math` module, so you no longer know where names come from, and they saturate your auto-complete.
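
Importing names directly can also silently replace a name you already had, since the same name can exist in several packages. The sketch below illustrates this with `sqrt` from the standard `math` and `cmath` modules (`cmath` is not used elsewhere in this notebook; it is chosen only because it needs no installation).

```python
import cmath
import math

from math import sqrt

first_result = sqrt(4)
print(first_result)  # 2.0

from cmath import sqrt  # silently replaces the previous `sqrt`

second_result = sqrt(4)
print(second_result)  # (2+0j)

# Keeping the module prefix makes the origin explicit
print(math.sqrt(4), cmath.sqrt(-1))  # 2.0 1j
```

With a star import the same replacement happens for every public name of the module at once, which is much harder to notice.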

In [165]:
cos(3.14)

-0.9999987317275395

In [166]:
cos  # did not need to import it by name

<function math.cos(x, /)>

In [168]:
dir()

['False_',
 'In',
 'M',
 'MyClass',
 'N',
 'Out',
 'ScalarType',
 'True_',
 '_',
 '_100',
 '_101',
 '_102',
 '_103',
 '_104',
 '_108',
 '_110',
 '_111',
 '_112',
 '_113',
 '_114',
 '_115',
 '_116',
 '_117',
 '_118',
 '_119',
 '_123',
 '_124',
 '_126',
 '_127',
 '_128',
 '_131',
 '_132',
 '_133',
 '_134',
 '_135',
 '_140',
 '_141',
 '_142',
 '_143',
 '_157',
 '_158',
 '_159',
 '_160',
 '_161',
 '_162',
 '_165',
 '_166',
 '_20',
 '_21',
 '_27',
 '_34',
 '_40',
 '_41',
 '_42',
 '_44',
 '_45',
 '_46',
 '_47',
 '_53',
 '_58',
 '_59',
 '_6',
 '_60',
 '_62',
 '_65',
 '_66',
 '_67',
 '_68',
 '_7',
 '_70',
 '_72',
 '_73',
 '_75',
 '_76',
 '_77',
 '_81',
 '_82',
 '_83',
 '_84',
 '_85',
 '_91',
 '_92',
 '_93',
 '_94',
 '_95',
 '_96',
 '_98',
 '_99',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__version__',
 '__vsc_ipynb_file__',
 '_dh',
 '_exit_code',
 '_get_promotion_state',
 '_i',
 '_i1',
 '_i10',
 '_i100',
 '_i101',
 '_

In [169]:
len(dir())  # many things

827

In [170]:
len(archived_dir())  # bug: `archived_dir` is a list, not a function; write `len(archived_dir)` instead

TypeError: 'list' object is not callable

In [172]:
StandardScaler()  # this works because of `from sklearn.preprocessing import StandardScaler`
preprocessing.StandardScaler()  # this works because of `from sklearn import preprocessing`

Beware, the import name and the package name can differ. The library is imported as `sklearn`:
```python
import sklearn 
sklearn.show_versions()
```
but it is installed as `scikit-learn`:
```python
!pip install scikit-learn
!pip show scikit-learn
```