## [Modules](https://docs.python.org/3/tutorial/modules.html) and [packages](https://docs.python.org/3/tutorial/modules.html#packages)

**Module**: A file with the extension `.py`.
- It contains definitions and statements.
- If the module is implemented in the file `xyz.py`, then the module can be referenced as `xyz`.
- Modules can be imported from other Python programs.
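
To illustrate the naming rule, the following minimal sketch writes a throwaway `xyz.py` into a temporary directory and imports it by name. The function `double` and the use of a temporary directory are assumptions made only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny module file named xyz.py (hypothetical content).
    Path(tmp, "xyz.py").write_text("def double(n):\n    return 2 * n\n")

    # Make the directory visible to the import system.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        xyz = importlib.import_module("xyz")  # same effect as `import xyz`
        result = xyz.double(21)
        module_name = xyz.__name__
    finally:
        # Clean up so that the sketch leaves no trace behind.
        sys.path.remove(tmp)
        sys.modules.pop("xyz", None)

print(module_name, result)
```

The module is referenced by the file name without the `.py` extension, and its definitions are accessed with dot notation.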

**Package**: Collection of modules.
- A package can contain subpackages/submodules. The hierarchy is defined by the directory structure of the package.
- Standard packages and modules form the standard library and do not require installation.
- The official repository of external packages is PyPI (https://pypi.org).

In [1]:
# Importing a module/package.
import random

In [2]:
random.randint(1, 100)

26

In [3]:
# Importing a function from a module/package.
from random import randint

In [4]:
randint(1, 6)

5

In [5]:
# Import the full content of a module/package.
# (WARNING: This practice should be avoided in most cases!)
from random import *

In [7]:
normalvariate(0, 1)

-1.2620370117894801

In [9]:
# Importing a function from a submodule/subpackage.
from os.path import dirname
dirname('/tmp/a/b/c.txt')

'/tmp/a/b'

In [11]:
# Importing a function under a shorter name.
from os.path import dirname as dn
dn('/tmp/a/b/c.txt')

'/tmp/a/b'

## [NumPy](http://www.numpy.org/)

NumPy is a low-level package for numerical computations.

- Its fundamental data structure is the [n-dimensional array](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html).
- Its core is implemented in C, so the usual array operations are efficient.
- Among others, it contains submodules for linear algebra and random number generation.
- Several higher level packages (e.g. scipy, matplotlib, pandas, scikit-learn) are based on it.
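
As a rough illustration of what "implemented efficiently" means in practice, the sketch below computes the same result with a pure-Python loop and with a single array expression. The variable names are made up for this example, and the actual speed difference depends on the machine and the array size, so it is best measured with `timeit` rather than assumed.

```python
import numpy as np

values = list(range(1_000))

# Pure-Python version: the loop runs in the interpreter.
squares_loop = [v * v for v in values]

# NumPy version: one array expression, the loop runs in compiled code.
arr = np.array(values)
squares_vec = arr * arr

# Both approaches give the same numbers.
same = squares_vec.tolist() == squares_loop
print(same)
```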

NumPy is an external package. There are multiple ways to install it, for example:
- `pip install numpy --user`
- `sudo apt install python3-numpy`
- `conda install numpy`

In [4]:
# Importing NumPy under the name np.
import numpy as np

In [13]:
# Querying the version number.
np.__version__

'1.23.5'

#### Creating arrays

In [14]:
# Create a 1-dimensional array of integers.
a = np.array([2, 3, 4])

In [15]:
a

array([2, 3, 4])

In [16]:
# Type of the array object.
type(a)

numpy.ndarray

In [17]:
# Number of dimensions.
a.ndim

1

In [18]:
# Size of dimensions.
a.shape

(3,)

In [19]:
# The data type of the array elements.
# Arrays are homogeneous in NumPy (except for object arrays).
a.dtype

dtype('int64')

In [21]:
# Create a 2-dimensional array of floats.
b = np.array([[2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
b

array([[2., 3., 4.],
       [5., 6., 7.]])

In [22]:
# Number and size of dimensions, data type.
b.ndim

2

In [23]:
b.shape

(2, 3)

In [24]:
b.dtype

dtype('float64')

In [25]:
# Specifying the data type of elements, example 1.
np.array([2, 3, 4], dtype='uint8')

array([2, 3, 4], dtype=uint8)

In [26]:
# Specifying the data type of elements, example 2.
np.array([2, 3, 4], dtype='float32')

array([2., 3., 4.], dtype=float32)

In [29]:
# Loading an array from text file.
np.loadtxt('matrix.txt', dtype='int8')

array([[0, 1, 1, 0, 1, 0, 1, 1, 0, 1],
       [0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
       [0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
       [0, 1, 0, 0, 1, 0, 1, 1, 0, 0],
       [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
       [1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
       [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
       [0, 0, 0, 0, 0, 1, 0, 1, 0, 1],
       [1, 1, 0, 1, 0, 1, 1, 1, 0, 0],
       [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]], dtype=int8)

In [30]:
# Create an array of zeros, example 1.
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [31]:
# Create an array of zeros, example 2.
np.zeros((5, 5), dtype='int32')

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int32)

In [32]:
# Create an array of ones, example 1.
np.ones((2, 7))

array([[1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.]])

In [33]:
# Create an array of ones, example 2.
np.ones((2, 2, 2))

array([[[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]]])

In [34]:
np.ones((2, 2, 2, 2))

array([[[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]],


       [[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]]])

In [35]:
# Create an identity matrix.
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [36]:
# Create a range by specifying the step size.
np.arange(0, 10, 0.5)

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

In [37]:
type(np.arange(0, 10, 0.5))

numpy.ndarray

In [39]:
# Create a range by specifying the number of elements.
np.linspace(-2, 2, 13)

array([-2.        , -1.66666667, -1.33333333, -1.        , -0.66666667,
       -0.33333333,  0.        ,  0.33333333,  0.66666667,  1.        ,
        1.33333333,  1.66666667,  2.        ])

In [42]:
# Concatenating vectors.
a = np.array([2, 3, 4])
b = np.array([10, 11])
np.concatenate([a, b])

array([ 2,  3,  4, 10, 11])

In [44]:
# Stacking matrices horizontally.
a = np.array([
    [2, 3, 4],
    [5, 6, 7]
])
np.hstack([a, a])

array([[2, 3, 4, 2, 3, 4],
       [5, 6, 7, 5, 6, 7]])

In [45]:
# Stacking matrices vertically.
np.vstack([a, a, a])

array([[2, 3, 4],
       [5, 6, 7],
       [2, 3, 4],
       [5, 6, 7],
       [2, 3, 4],
       [5, 6, 7]])

#### Elements and subarrays

In [46]:
# Let's create an example matrix!
a = np.array([
    [2, 3, 4],
    [5, 6, 7]
])

In [47]:
# Select an element (indexing starts from 0)
a[1, 2] # row 1, column 2

7

In [48]:
a[(1, 2)]

7

In [50]:
# ...under the hood, the following happens:
np.ndarray.__getitem__(a, (1, 2))

7

In [52]:
# Selecting a full row.
a[0, :]

array([2, 3, 4])

In [53]:
# We could also do it using a simple index.
a[0]

array([2, 3, 4])

In [54]:
# The selected row is a 1-dimensional array.
a[0].shape

(3,)

In [56]:
# Selecting a column.
a[:, 1]

array([3, 6])

In [58]:
# Selecting a subarray.
a[:, :2]

array([[2, 3],
       [5, 6]])

In [59]:
# Selecting columns with the given indices.
a[:, [0, 2]]

array([[2, 4],
       [5, 7]])

In [62]:
# Selecting elements based on a logical condition.
a[a > 3]

array([4, 5, 6, 7])
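
Selection by a logical condition works in two steps: the comparison produces a boolean array, and indexing with that array keeps the elements where it is `True`. A small sketch that makes the intermediate mask visible (the names `mask` and `selected` are chosen only for this example):

```python
import numpy as np

a = np.array([[2, 3, 4],
              [5, 6, 7]])

mask = a > 3          # boolean array with the same shape as a
selected = a[mask]    # 1-dimensional array of the elements where mask is True

print(mask)
print(selected)
```

Note that the result is always 1-dimensional, regardless of the shape of the original array.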

In [65]:
# The elements of the array can be modified.
a[0, 0] = 100
a

array([[100,   3,   4],
       [  5,   6,   7]])

In [66]:
# Modifying a column.
a[:, 1] = 10, 20
a

array([[100,  10,   4],
       [  5,  20,   7]])

#### Array operations

In [67]:
# Let's create 2 example arrays of the same shape!
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[2, 2, 2], [3, 3, 3]])

In [69]:
a

array([[1, 2, 3],
       [4, 5, 6]])

In [70]:
b

array([[2, 2, 2],
       [3, 3, 3]])

In [71]:
# Elementwise addition.
a + b

array([[3, 4, 5],
       [7, 8, 9]])

In [72]:
# Elementwise subtraction.
a - b

array([[-1,  0,  1],
       [ 1,  2,  3]])

In [73]:
# Elementwise multiplication.
a * b

array([[ 2,  4,  6],
       [12, 15, 18]])

In [74]:
# Elementwise division.
a / b

array([[0.5       , 1.        , 1.5       ],
       [1.33333333, 1.66666667, 2.        ]])

In [75]:
# Elementwise integer division.
a // b

array([[0, 1, 1],
       [1, 1, 2]])

In [76]:
# Elementwise exponentiation.
a**b

array([[  1,   4,   9],
       [ 64, 125, 216]])

In [77]:
# The operation fails if the shapes of the operands are incompatible.
np.array([2, 3, 4]) + np.array([4, 5])

ValueError: operands could not be broadcast together with shapes (3,) (2,) 

In [78]:
# Display again the array "a"!
a

array([[1, 2, 3],
       [4, 5, 6]])

In [79]:
# Elementwise functions (exp, log, sin, cos, ...).
np.exp(a)

array([[  2.71828183,   7.3890561 ,  20.08553692],
       [ 54.59815003, 148.4131591 , 403.42879349]])

In [80]:
# Statistical operations (min, max, sum, mean, std).
a.sum()

21

In [82]:
# Columnwise statistics.
# We aggregate along axis 0 (over the rows), therefore axis 0 disappears from the result.
a.sum(axis=0)

array([5, 7, 9])

In [84]:
# Rowwise statistics.
# We aggregate along axis 1 (over the columns), therefore axis 1 disappears from the result.
a.mean(axis=1)

array([2., 5.])
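
If the aggregated axis should not disappear, the `keepdims=True` argument keeps it with size 1. The sketch below compares the resulting shapes; the variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)                       # shape (3,)
row_means = a.mean(axis=1)                     # shape (2,)
row_means_2d = a.mean(axis=1, keepdims=True)   # shape (2, 1)

# The (2, 1) result can be subtracted from a directly,
# which centers every row around its own mean.
centered = a - row_means_2d

print(col_sums.shape, row_means.shape, row_means_2d.shape)
print(centered)
```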

In [98]:
# Exercise: Create a 3×3 NumPy array of logical True values.
np.ones((3, 3), dtype='bool')

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [89]:
# Solution 2: build the array from nested lists.
np.array([[True] * 3] * 3)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [99]:
# Exercise: Print the odd values in the following 2-dimensional NumPy array!
b = np.array([[2, 3, 4],
              [5, 6, 7]])
b[b % 2 == 1]

array([3, 5, 7])

In [101]:
# Exercise: Print the values greater than 3 in the following 2-dimensional NumPy array!
b = np.array([[2, 3, 4],
              [5, 6, 7]])
b[b > 3]

array([4, 5, 6, 7])

In [104]:
# Exercise: Print the values greater than the average value in the following NumPy array!
b = np.array([1, 10, 2, 8, 3, 5, 9])
b[b > b.mean()]

array([10,  8,  9])

In [106]:
# Type conversion.
a = np.array([2, 3, 4])
b = a.astype('uint8')
a, b

(array([2, 3, 4]), array([2, 3, 4], dtype=uint8))

In [107]:
# Matrix transpose.
a = np.array([[2, 3, 4], [5, 6, 7]])
a

array([[2, 3, 4],
       [5, 6, 7]])

In [108]:
a.T

array([[2, 5],
       [3, 6],
       [4, 7]])

In [109]:
# Transposition does not copy. It only creates a new view on the original data.
b = a.T
b[0, 1] = 100
a, b

(array([[  2,   3,   4],
        [100,   6,   7]]),
 array([[  2, 100],
        [  3,   6],
        [  4,   7]]))

In [110]:
a = np.array([[2, 3, 4], [5, 6, 7]])
b = a.T.copy() # if I want b to be independent of a, I make a copy
b[0, 1] = 100
a, b

(array([[2, 3, 4],
        [5, 6, 7]]),
 array([[  2, 100],
        [  3,   6],
        [  4,   7]]))

In [112]:
# Create an example array of size 12!
a = np.arange(12)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [113]:
# Reshaping to size 2 × 6.
a.reshape((2, 6))

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [114]:
# It is enough to specify the size of one dimension; we can write -1 for the other.
a.reshape((2, -1))

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [115]:
# Reshaping to size 4 × 3.
a.reshape((-1, 3))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [116]:
# If the 12 elements cannot be divided evenly into the given shape, we get an error.
a.reshape((5, -1))

ValueError: cannot reshape array of size 12 into shape (5,newaxis)

In [123]:
# Reshaping to size 2 × 2 × 3.
b = a.reshape((2, 2, -1))
b

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [119]:
# The reshaped array is NOT independent of the original: reshape returns a view whenever possible.
b[0, 0, 0] = 100
a, b

(array([100,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11]),
 array([[[100,   1,   2],
         [  3,   4,   5]],
 
        [[  6,   7,   8],
         [  9,  10,  11]]]))
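
Whether two arrays share the same underlying data can be checked with `np.shares_memory`. The following sketch contrasts a reshaped array with an explicit copy; the names `view` and `copy` are chosen only for this example.

```python
import numpy as np

a = np.arange(12)

view = a.reshape((2, 2, 3))          # for a contiguous array this is a view
copy = a.reshape((2, 2, 3)).copy()   # an independent array

print(np.shares_memory(a, view))
print(np.shares_memory(a, copy))

# Writing to the copy leaves the original untouched.
copy[0, 0, 0] = 100
print(a[0])
```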

In [124]:
# Assignment does not copy in NumPy.
a = np.array([2, 3, 4])
b = a
b[0] = 100
a, b

(array([100,   3,   4]), array([100,   3,   4]))

In [125]:
# We can copy using the copy() method.
a = np.array([2, 3, 4])
b = a.copy()
b[0] = 100
a, b

(array([2, 3, 4]), array([100,   3,   4]))

In [26]:
# Searching.
# Example: What are the indices of the elements less than 5?
a = np.array([3, 10, 11, 4, 7, 8])
idxs = np.where(a < 5)[0]
idxs

array([0, 3])

In [132]:
a[idxs]

array([3, 4])

In [135]:
# Sorting in place.
a = np.array([3, 10, 11, 4, 7, 8])
a.sort()
a

array([ 3,  4,  7,  8, 10, 11])

In [136]:
# Sorting into a new array.
a = np.array([3, 10, 11, 4, 7, 8])
np.sort(a)

array([ 3,  4,  7,  8, 10, 11])

In [139]:
# Sorting into descending order.
np.sort(a)[::-1]

array([11, 10,  8,  7,  4,  3])

In [143]:
# The indices of the sorted elements in the original array.
a = np.array([3, 10, 11, 4, 7, 8])
idxs = a.argsort()
idxs

array([0, 3, 4, 5, 1, 2])

In [144]:
# Sorting the original array using the index array.
a[idxs]

array([ 3,  4,  7,  8, 10, 11])

In [145]:
# If only the extreme elements are needed, do not use sort()!
# Use min() or max() instead!
a.max()

11

In [146]:
np.sort(a)[-1]

11

In [148]:
# There are also argmin() and argmax() methods.
a.argmax() # index of largest element

2

In [151]:
# The sorting operations can be used in a rowwise or columnwise fashion.
b = np.array([[4, 8], [1, 2], [7, 3]])
np.sort(b, axis=0)

array([[1, 2],
       [4, 3],
       [7, 8]])

In [152]:
# Scalar product of two vectors.
a = np.array([2, 3, 4])
b = np.array([0, 1, 2])
a @ b

11

In [154]:
# Matrix-vector multiplication.
a = np.array([[2, 3, 4], [5, 6, 7]])
b = np.array([0, 1, 2])
a @ b

array([11, 20])

```
        0
        1
        2
2 3 4  11
5 6 7  20
```

#### [Broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
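- Broadcasting is a mechanism for handling operands with different shapes.
- The shapes are compared starting from the last dimension. Two dimensions are compatible if they are equal or if one of them is 1. Missing leading dimensions are treated as 1.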
- Example: 
```
A (4d array):      8 x 1 x 6 x 5
B (3d array):          7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
```
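
The shape in the example above can be reproduced by applying the broadcasting rule by hand. The helper `broadcast_shape` below is a hypothetical function written for this illustration, not part of NumPy; its result is compared with the shape NumPy actually produces.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule by hand."""
    # Pad the shorter shape with leading 1s so that both have the same length.
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for m, n in zip(padded_a, padded_b):
        if m == n or n == 1:
            result.append(m)
        elif m == 1:
            result.append(n)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
    return tuple(result)

manual = broadcast_shape((8, 1, 6, 5), (7, 1, 5))

A = np.ones((8, 1, 6, 5))
B = np.ones((7, 1, 5))
actual = (A + B).shape

print(manual, actual)
```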

In [155]:
# Multiplying a vector by a scalar.
a = np.array([2, 3, 4]) # 3
b = 10                  # -
a * b

array([20, 30, 40])

In [157]:
a + b

array([12, 13, 14])

In [158]:
# Example for non-broadcastable arrays.
a = np.array([2, 3, 4]) # 3
b = np.array([10, 20])  # 2
a * b

ValueError: operands could not be broadcast together with shapes (3,) (2,) 

In [5]:
# Multiplying a matrix by a vector elementwise (this is not a matrix product).
a = np.array([
    [2, 3, 4],
    [5, 6, 7]
])
b = np.array([1, 2, 3])

# a: 2 3
# b: - 3
# NumPy broadcasts the 1D array b across each row of a, then multiplies elementwise.
a * b

array([[ 2,  6, 12],
       [ 5, 12, 21]])

In [6]:
a - b

array([[1, 1, 1],
       [4, 4, 4]])

In [164]:
# Rowwise multiplication.
a = np.array([
    [2, 3, 4],
    [5, 6, 7]
])
b = np.array([1, 2])
(a.T * b).T

array([[ 2,  3,  4],
       [10, 12, 14]])
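
The double transpose can be avoided by giving `b` an explicit second axis of size 1, so that it broadcasts across the columns. This is a sketch of one alternative, not the only way to do it; the variable names are illustrative.

```python
import numpy as np

a = np.array([[2, 3, 4],
              [5, 6, 7]])
b = np.array([1, 2])

via_transpose = (a.T * b).T
via_newaxis = a * b[:, np.newaxis]   # b[:, np.newaxis] has shape (2, 1)

print(via_newaxis)
```

Here the shapes `(2, 3)` and `(2, 1)` are compatible, so each row of `a` is multiplied by the corresponding element of `b`.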

## Exercise: Univariate linear regression

The text file [baseball.txt](baseball.txt) contains data about the height and weight of professional baseball players. Write a program that gives a linear model for predicting the weight from the height! Subtasks:

- Determine the model parameter that gives the lowest RMSE (root mean squared error)!
- Compute the model's RMSE and MAE (mean absolute error) on the training data set!

<img src="../_img/linreg_1d.jpg" width="600"></img>

In [180]:
# load data
A = np.loadtxt('baseball.txt', delimiter=',')
x = A[:, 0] # input vector (height)
y = A[:, 1] # target vector (weight)

In [181]:
x

array([188., 188., 183., ..., 190., 190., 185.])

In [182]:
y

array([82., 98., 95., ..., 93., 86., 88.])

In [183]:
print(x.dtype, y.dtype)

float64 float64


In [184]:
# subtract the mean from x and y (in place, so x and y are centered from here on)
xm = x.mean()
ym = y.mean()
x -= xm
y -= ym

In [185]:
x

array([ 0.92836399,  0.92836399, -4.07163601, ...,  2.92836399,
        2.92836399, -2.07163601])

In [186]:
y

array([-9.50338819,  6.49661181,  3.49661181, ...,  1.49661181,
       -5.50338819, -3.50338819])

In [191]:
# compute optimal model parameter
w = (x @ y) / (x @ x)
w

0.8561919786085516
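
The formula for `w` can be derived as follows, assuming that the model on the centered data has the form $\hat{y}_i = w x_i$ (which is what the code suggests). Minimizing the RMSE is equivalent to minimizing the sum of squared errors:

$$E(w) = \sum_i (w x_i - y_i)^2$$

Setting the derivative to zero gives the optimal parameter:

$$\frac{dE}{dw} = 2 \sum_i x_i (w x_i - y_i) = 0 \quad\Rightarrow\quad w = \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{x^\top y}{x^\top x}$$
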

In [195]:
# predict the weight of a player who is 175 cm tall
(175 - xm) * w + ym

81.16775026791032

In [203]:
# root mean squared error (RMSE) of the model
yhat = x * w
((yhat - y)**2).mean()**0.5

8.071205900903676

In [209]:
# mean absolute error (MAE) of the model
(np.abs(yhat - y)).mean()

6.398400208620017
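
As a sanity check, the centering approach can be compared with the least-squares line fitted by `np.polyfit`. The sketch below uses synthetic data generated for this illustration (it is not the baseball data set, and the coefficients 0.9 and -80 are arbitrary assumptions), so the numbers will differ from the results above.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the baseball data set).
rng = np.random.default_rng(0)
x = rng.uniform(170, 200, size=200)             # "heights"
y = 0.9 * x - 80 + rng.normal(0, 5, size=200)   # "weights" with noise

# Centering approach from this section (without modifying x and y in place).
xm, ym = x.mean(), y.mean()
xc, yc = x - xm, y - ym
w = (xc @ yc) / (xc @ xc)

# Reference: straight line fitted by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Error metrics on the training data.
yhat = w * xc
rmse = np.sqrt(np.mean((yhat - yc) ** 2))
mae = np.mean(np.abs(yhat - yc))

print(w, slope)
print(ym - w * xm, intercept)
print(rmse, mae)
```

The slope of the fitted line should agree with `w`, and the intercept with `ym - w * xm`, up to floating-point error.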

## Even Fibonacci Numbers

<p>Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with $1$ and $2$, the first $10$ terms will be:
$$1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \dots$$</p>
<p>By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.</p>

In [31]:
# 1st solution
s = 0
a, b = 1, 2
while a <= 4_000_000:
    if a % 2 == 0:
        s += a
    a, b = b, a + b # move forward in the sequence
s

4613732

In [30]:
# 2nd solution
s = 0
f = [1, 2]
while f[-2] <= 4_000_000:
    if f[-2] % 2 == 0:
        s += f[-2]
    f.append(f[-2] + f[-1])
s

4613732