# Laboratory 1

Hello and welcome to your first laboratory!

This is an introductory session to Python and a few libraries that we'll frequently use in this course (numpy, matplotlib, opencv, pytorch).

After completing this session, you will:
* gain some basic image manipulation skills
* be able to vectorize code (avoid using for loops)
* get familiar with the concept of broadcasting in numpy

For some of the exercises you'll have the expected output displayed just below the cell, so that you can check if your output is correct.


In [None]:
# library imports
import cv2
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
# The line above is necessary to show Matplotlib's plots inside a Jupyter Notebook

In [None]:
!wget "http://drive.google.com/uc?export=download&id=1fRvlErNtIV-HX9Vat0I6FINBXEafeDC7" -O dance_moves.png
!wget "https://docs.google.com/uc?export=download&id=1zjltpYscUqnDSP6eUlU-gecadGXvQtTz" -O cute_cat.jpg
!wget "http://drive.google.com/uc?export=download&id=1y46KaIsyhgh030Zi9eoAYfO_ezkE4CY3" -O axes.jpg
!wget "http://drive.google.com/uc?export=download&id=11Jzu1t1RVXMWxp0OK3KJaUgKv9exqv2O" -O sum0.jpg
!wget "http://drive.google.com/uc?export=download&id=1LUYh0HtP6Vd2eq7rPXOhXjBKXlvU--L5" -O sum1.jpg
!wget "http://drive.google.com/uc?export=download&id=1q_BsBdLZXxA2fkrY1WXWn8B5RGOFBLUd" -O concat0.jpg
!wget "http://drive.google.com/uc?export=download&id=1491c1NZQMOnHlvVp6eiiIl2l1wYO6o56" -O concat1.jpg

# *numpy*

We'll frequently use the *numpy* library in this lecture; *numpy* is perhaps the most popular library for scientific computing in Python. The library works with multidimensional arrays and provides many operations to manipulate these arrays efficiently.

An array contains elements of the same type, arranged in a grid of values. An array can be indexed with integers, with a tuple of integers, with slices, with another array of integers, or with an array of booleans, as we'll see later in this laboratory.

An array is described by its:
- rank - the number of dimensions of the array
- shape - a tuple that specifies the size of the array along each dimension
- type - the library provides several numeric datatypes (uint8, float32, int32 etc.)

There are several ways you can create an array in numpy:

In [None]:
# create an array with rank 1
a = np.array([1, 2, 3])
print('a is a numpy array ', type(a))
print('the shape of a is ', a.shape, ' and its rank is ', len(a.shape))
print('the type of the elements stored in a is ', a.dtype)
print('---')
b = np.array([[1.0, 2, 3],[4, 5, 6]])
print('the shape of b is ', b.shape, ' and its rank is ', len(b.shape))
print('the type of the elements stored in b is ', b.dtype)
print('---')
# numpy automatically determines the type of the elements that will be stored in the array
# but you can also specify the type in the constructor
c = np.array([[[0]]], dtype=np.uint8)
print('the shape of c is ', c.shape, ' and its rank is ', len(c.shape))
print('the type of the elements stored in c is ', c.dtype)

In [None]:
# there are also other array constructors that you might find useful
a = np.zeros(shape=(1, 2)) # creates an array filled with 0s of shape (1, 2) - 1 row, two columns
print('zeros array {} of shape {}'.format(a, a.shape))

b = np.ones((224, 224, 3), dtype=np.uint8) # creates an array filled with ones of shape (224, 224, 3) and type uint8

c = np.full((4, 4, 3), 255) # creates an array filled with 255 of shape (4, 4, 3)

d = np.eye(3) # creates an identity matrix of size (3x3)
print('identity matrix of size 3x3 ', d)

r = np.random.random(10) # creates an array of 10 elements, filled with random values
print('array of 10 random elements ', r)

zl = np.zeros_like(r) # create a new array with the same shape as r, but filled with 0 values

## Indexing

In [None]:
a = np.array([1, 2, 3])
print('a is ', a)
a[2] = 4
print('Set the element at index 2 (the third element) to 4')
print('Modified a is:')
print(a)

print('---')
b = np.eye(3)
b[1, 1] = 2
print('Modified b is')
print(b)

# you can use normal integer indexing
print('The first row of b is: ', b[0])
print('The second element from the second row of b is ', b[1, 1])

Similar to python lists, *numpy* allows you to <i>slice</i> the array; this is just a flexible way to access subarrays.

In [None]:
# slicing
a = np.arange(1, 10, 1) # returns an array with evenly spaced values in the interval [1, 10), with a step of 1:
# [1, 2, 3, 4, ..., 9]
print('a is \n', a)
print('a[3:6] is \n', a[3:6])   # get a slice from index 3 to 6 (exclusive)
print('a[3:] is \n', a[3:])    # get a slice from index 3 to the end
print('a[:3] is \n', a[:3])    # get a slice from the start to index 3 (exclusive)
print('a[4:-1] is \n', a[4:-1])  # get a slice from index 4 to the last element of the array (exclusive)

In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print('a: ', a)

## ATTENTION! a slice is just a view on an array, so it points to the same data
# modifying the slice will also modify the original array
r = a[:1,:]
print('row1:',  r)
print('a[0][0]: ', a[0, 0])
print('r[0][0]: ', r[0, 0])
print('r[0, 0] = 100')
r[0, 0] = 100
print('row1:',  r)
print('a[0][0]: ', a[0, 0])
print('r[0][0]: ', r[0, 0])

*numpy* also supports integer array indexing.

**Attention**, there is a slight (and important) difference when using integer array indexing: when using slicing, the result is always a subarray of the existing array (a view on the existing array), while integer array indexing creates a new array from the data in the original array.

In [None]:
a = np.array(np.arange(0, 6, 1))
indices = [0, 2, 4]
b = a[indices] # this will get the elements from the indices 0, 2, 4 from the array a
print('original array: \n', a)
print('The elements in a from the indices', indices, 'are: \n', b)

In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
indices = [[0, 1, 2], [2, 1, 0]] # this will get the elements from the indices (0, 2), (1, 1) and (2, 0) from the array a
b = a[tuple(indices)]  # b will be [3, 6, 9]
print('original array: \n', a)
print('The elements in the array at indices', indices, 'are: \n', b)

print('---')
# modifying this array, won't modify the original array
print(b)
b[0] = 100
print('b[0] = 100')
print('a = ', a)
print('b = ', b)
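
The difference between a view and a copy can also be checked directly. The sketch below uses `np.shares_memory` for this; the array and the variable names are only illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

row_view = a[0:1, :]    # basic slicing: a view on the data of a
picked = a[[0, 2], :]   # integer array indexing: a new array (a copy)

print(np.shares_memory(a, row_view))  # True
print(np.shares_memory(a, picked))    # False

# .copy() detaches a slice from the original when an independent array is needed
row_copy = a[0:1, :].copy()
row_copy[0, 0] = 100
print(a[0, 0])  # the original array is unchanged: 0
```
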

More indexing examples:


In [None]:
print('More array indexing examples: ')
# with array indexing you can reuse the same index from the original array
b = a[[0, 0], [1, 1]]
print('b = a[[0, 0], [1, 1]] =', b)
#  equivalent to
b = [a[0, 1], a[0, 1]]
print('b = [a[0, 1], a[0, 1]] =', b)

print('---')
print('Using array indexing to modify an element from each row in a matrix: ')
# modifying an element from each row in a matrix
ind = np.array([1, 0, 1])
a[np.arange(3), ind] = -100
print(a)

Integer indexing can be mixed with slicing. A slice keeps the dimension it is applied to, so the resulting array has the same rank as the original array, while indexing a dimension with a single integer removes it, so you get an array with a lower rank than the original array.

In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
last_row_indexing = a[-1, :]
last_row_slicing = a[2:3, :]
print('Last row obtained with integer indexing:\n',
      last_row_indexing, 'has shape:', last_row_indexing.shape)
print('Last row obtained with array slicing:\n',
      last_row_slicing, 'has shape:', last_row_slicing.shape)

*numpy* also allows you to use boolean array indexing, in which an array of booleans is used as a mask to select arbitrary elements in the array.


In [None]:
a = np.array([11, 12, 13, 14])
indices = (a > 12)
# a > 12 returns an array of boolean of the same size as a;
# an element in this array is True if the element stored in the same position in a is larger than 12

print('a>12:\n', indices)
print('The numbers larger than 12 in a are:\n', a[indices])
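
A boolean mask can also be used on the left-hand side of an assignment, and several conditions can be combined. A minimal sketch (the threshold values are chosen only for illustration):

```python
import numpy as np

a = np.array([11, 12, 13, 14])
mask = a > 12

clipped = a.copy()
clipped[mask] = 12   # assign to every position where the mask is True

# masks are combined with & (and), | (or), ~ (not); the parentheses are required
in_range = a[(a > 11) & (a < 14)]

print(clipped)   # [11 12 12 12]
print(in_range)  # [12 13]
```
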

## array operations
*numpy* provides functions and operator overloads for various arithmetic operations on arrays, such as addition, subtraction, multiplication, dot products etc.


**Attention!** **np.multiply** (like the * operator) performs elementwise multiplication. If you want to perform matrix multiplication, you should use the **np.dot** function!

In [None]:
a = np.array([[1, 2],
             [3, 4]])
b = np.array([[11, 12],
             [13, 14]])

print('a is: ', a)
print('b is: ', b)
# elementwise operations
print('a+b is: \n', a + b)
print('a+b is: \n', np.add(a, b))

print('a-b is: \n', a - b)
print('a-b is: \n', np.subtract(a, b))

print('The maximum element in a is :', np.amax(a))
print('The position of this element in a is :', np.argmax(a)) # by default, argmax returns the index of the maximum in the flattened array

print('|a-b| is: \n', np.abs(a - b))

print('a*b (element wise) is: \n', a*b)
print('a*b (element wise) is: \n', np.multiply(a, b))

v = np.array([10, 20])
w = np.array([11, 11])
print('Dot product v x w is:\n', v.dot(w))
print('Dot product v x w is:\n', np.dot(v, w))

print('Dot product a x v (matrix x vector) is:\n', a.dot(v))

print('Dot product a x b (matrix x matrix) is:\n', a.dot(b))
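
For 1D and 2D arrays, the `@` operator gives the same result as **np.dot** and is often easier to read. A short sketch reusing the values from the cell above:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[11, 12],
              [13, 14]])
v = np.array([10, 20])

print(a @ b)   # matrix x matrix, same as a.dot(b)
print(a @ v)   # matrix x vector, same as a.dot(v)
print(a * b)   # elementwise product, a different operation
```
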

## numpy axes

Another concept that is perhaps confusing for beginners in numpy is the concept of axes. As you'll see, several mathematical functions (**np.sum**, **np.mean**, **np.min** etc.) let you specify the axis along which the operation should be applied.

Just like the Cartesian coordinate system, numpy arrays have axes. For example, in a 2D array, the first axis (axis 0) runs down the rows, and the second axis (axis 1) runs across the columns.



In [None]:
img_axes = cv2.cvtColor(cv2.imread('axes.jpg'), cv2.COLOR_BGR2RGB)  # opencv loads BGR, matplotlib expects RGB
dpi = plt.rcParams['figure.dpi']

height, width, depth = img_axes.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(img_axes)

It is important to understand, for each operation, what the axis parameter controls.

For the common mathematical operations that aggregate your data, the axis parameter controls which axis will be collapsed.
So, for example, if you have an array a, and you perform the operation np.sum(a, axis = 0), the rows will be collapsed and this will sum down the columns. (It will not sum the rows).


In [None]:
img_sum0 = cv2.cvtColor(cv2.imread('sum0.jpg'), cv2.COLOR_BGR2RGB)  # opencv loads BGR, matplotlib expects RGB
dpi = plt.rcParams['figure.dpi']

height, width, depth = img_sum0.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(img_sum0)

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
col_sums = np.sum(a, axis=0)  # avoid naming the variable `sum`, which would shadow Python's built-in sum()
print(col_sums, col_sums.shape)

Similarly, if you have an array a, and you perform the operation np.sum(a, axis = 1), the columns will be collapsed and this will sum across each row. (It will not sum the columns).


In [None]:
img_sum1 = cv2.cvtColor(cv2.imread('sum1.jpg'), cv2.COLOR_BGR2RGB)  # opencv loads BGR, matplotlib expects RGB
dpi = plt.rcParams['figure.dpi']

height, width, depth = img_sum1.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(img_sum1)


a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
row_sums = np.sum(a, axis=1)  # avoid naming the variable `sum`, which would shadow Python's built-in sum()
print(row_sums, row_sums.shape)

As another example, for the concatenation operation, the axis parameter specifies the axis along which to stack the arrays.

If we specify the axis = 0 for concatenation, the arrays will be stacked along the rows (they will be concatenated vertically).


In [None]:
img_concat0 = cv2.cvtColor(cv2.imread('concat0.jpg'), cv2.COLOR_BGR2RGB)  # opencv loads BGR, matplotlib expects RGB
dpi = plt.rcParams['figure.dpi']

height, width, depth = img_concat0.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(img_concat0)

If we specify the axis = 1 for concatenation, the arrays will be stacked along the columns (they will be concatenated horizontally).



In [None]:
img_concat1 = cv2.cvtColor(cv2.imread('concat1.jpg'), cv2.COLOR_BGR2RGB)  # opencv loads BGR, matplotlib expects RGB
dpi = plt.rcParams['figure.dpi']

height, width, depth = img_concat1.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(img_concat1)

In [None]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print('The sum of each column is: \n ', np.sum(a, axis = 0))

print('The sum of each row is: \n ', np.sum(a, axis = 1))

b = np.array([[1, 1, 1, 1], [2, 2, 2, 2]])

print('Concatenate vertically: \n', np.concatenate([a, b], axis = 0))

print('Concatenate horizontally: \n', np.concatenate([a, b], axis = 1))
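
Two details about the axis parameter are often useful in practice: `keepdims=True` keeps the collapsed axis with size 1, and a tuple of axes collapses several axes at once. The sketch below illustrates both; the small synthetic "image" is an assumption made only for the example.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

row_sums = np.sum(a, axis=1)                      # shape (2,)
row_sums_kept = np.sum(a, axis=1, keepdims=True)  # shape (2, 1)

# because the collapsed axis is kept, the result lines up with the rows of a
row_fractions = a / row_sums_kept

# synthetic "image" with shape (height, width, channels)
img = np.zeros((4, 5, 3), dtype=np.uint8)
img[:, :, 0] = 200
channel_means = img.mean(axis=(0, 1))  # collapse height and width, shape (3,)

print(row_sums.shape, row_sums_kept.shape)
print(channel_means)
```
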

In [None]:
# example with 1D arrays
a = np.array([0, 0, 0])
b = np.array([1, 1, 1])

print(np.concatenate([a, b], axis=0)) # attention! 1D arrays have only one axis

a = a.reshape((1, 3))
b = b[np.newaxis, :]

print(np.concatenate([a, b], axis=0))

# Good practice: reshape your arrays to (1, dim): a = a.reshape((1, 3))

## Vectorization

"<i>Vectorization is the art of getting rid of for loops in your code.</i>" (Andrew Ng)

*numpy* provides a series of functions that allow the programmer to perform mathematical computations on the elements of the array without having to explicitly loop over the array elements; these functions are much more efficient as python delegates these tasks to compiled and optimized C code.

A formal definition of vectorization is:
"In the context of high-level languages like Python, Matlab, and R, the term vectorization describes the use of optimized, pre-compiled code written in a low-level language (e.g. C) to perform mathematical operations over a sequence of data. This is done in place of an explicit iteration written in the native language code." (check this tutorial for details: https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html)

Using for loops to access array elements (when dealing with large data) is highly inefficient, as demonstrated in the examples below.
Therefore, especially for this course, when we'll deal with a lot of training data and large neural network architectures, you should always use vectorization when writing your code. Otherwise, it will take a very very very :) long time to get your model to perform a single iteration over your training data.

In [None]:
import numpy as np
import time
a1 = np.random.rand(1000000)
a2 = np.random.rand(1000000)

t1 = time.time()
dp_vectorized = a1.dot(a2)
time_vectorized = time.time() - t1

t1 = time.time()
dp_loops = 0
for i in range(0, a1.shape[0]):
    dp_loops += a1[i]*a2[i]
time_loops = time.time() - t1

print(dp_vectorized)
print(dp_loops)

print('Time to compute dot product using loops: ', time_loops, 'seconds')
print('Time to compute dot product using vectorization: ', time_vectorized, 'seconds')
print('Speedup ', time_loops/time_vectorized)


arr_size = []
arr_time_vectorized = []
arr_time_loops = []



for sz in range(100, 1000000, 10000):
    a1 = np.random.rand(sz)
    a2 = np.random.rand(sz)

    # time only the dot product, not the creation of the arrays
    t1 = time.time()
    dp_vectorized = a1.dot(a2)
    time_vectorized = time.time() - t1

    t1 = time.time()
    dp_loops = 0
    for i in range(0, a1.shape[0]):
        dp_loops += a1[i]*a2[i]
    time_loops = time.time() - t1
    arr_size.append(sz)

    arr_time_vectorized.append(time_vectorized)
    arr_time_loops.append(time_loops)


plt.plot(arr_size, arr_time_vectorized, label='vectorized')
plt.plot(arr_size, arr_time_loops, label='using loops')
plt.legend()
plt.xlabel('array size')
plt.ylabel('execution time (s)')
plt.show()
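
The same idea applies to most loops over array elements. As a further sketch, here is a mean squared error computed first with a loop and then with vectorized operations; the data are random and the names `pred` and `target` are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.random(1000)
target = rng.random(1000)

# explicit loop over the elements
mse_loop = 0.0
for i in range(pred.shape[0]):
    mse_loop += (pred[i] - target[i]) ** 2
mse_loop /= pred.shape[0]

# vectorized: subtraction, squaring and averaging all run in compiled code
mse_vectorized = np.mean((pred - target) ** 2)

print(np.isclose(mse_loop, mse_vectorized))
```
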

## Broadcasting

Broadcasting is a *numpy* feature that allows us to perform operations on arrays with different shapes. With broadcasting, if the arrays don't have the same shape, the smaller array is "broadcast" to match the shape of the larger one. This also helps with vectorizing array operations.

To be able to broadcast, **the sizes of the arrays along each trailing axis must be equal, or one of them must be 1**.
If the arrays don't have the same rank, we add dimensions of size 1 to the left, i.e. **prepend ones to the shape of the lower-rank array until the arrays have the same rank**.

The shape of the result is the maximum size along each dimension of the input arrays.

You can check this tutorial for further information: http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc
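
If you are unsure whether two shapes are compatible, you can ask numpy directly. The sketch below assumes a numpy version that provides `np.broadcast_shapes` (1.20 or newer); the shapes are chosen only for illustration.

```python
import numpy as np

same_rank = np.broadcast_shapes((1, 3), (3, 1))    # sizes of 1 are stretched
lower_rank = np.broadcast_shapes((3, 3), (3,))     # (3,) is treated as (1, 3)
after_reshape = np.broadcast_shapes((4, 1), (3,))

print(same_rank, lower_rank, after_reshape)

# trailing sizes 4 and 3 differ and neither is 1, so broadcasting fails
try:
    np.broadcast_shapes((4,), (3,))
    compatible = True
except ValueError:
    compatible = False
print('shapes (4,) and (3,) compatible:', compatible)
```
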

In [None]:
x1 = np.array([[1, 2, 3]]) # shape: (1, 3)
x2 = np.array([
    [1],
    [2],
    [3]]) # shape: (3, 1)

print(x1+x2)

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # shape: (3, 3)
b = np.array([0, 1, 2]) # (3,)

# add the vector b to each row of the matrix a using broadcasting
s = a + b
print(s)


# the code snippet above is equivalent to the code below (but without making unnecessary copies)
print(b.shape)
b_expanded = np.tile(b, (3, 1)) # this stacks 3 copies of b -> (3, 3)
print(b_expanded.shape)
print(b_expanded)
s = a + b_expanded

In [None]:
# add a vector to each column of a matrix
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([0, 2])

# transpose the matrix a (a.T) such that it has shape (3, 2); the array b has shape (2,)
# they can be broadcast together and then we can transpose the result
print('a.T=\n', a.T)
print('b.T=\n', b.T)
print('(a.T+b).T=\n',(a.T+b).T)

In [None]:
a = np.array([0, 1, 2, 3])
b = np.array([4, 5, 6])
print(a.shape)
print(b.shape)

try:
    print(a+b)
except ValueError:
    print('Unable to broadcast arrays with shapes ', a.shape, b.shape)

a = a.reshape((4, 1))
# or you might see this syntax: a = a[:, np.newaxis]
print(a.shape)
print(b.shape)
print(a+b)
print((a+b).shape)

## Simple *numpy* exercise


I am sure that you were already familiar with all the concepts presented above, but a short recap is always welcome.


Now let's do a very simple exercise with *numpy* arrays.


In a (4, 7) *numpy* array we store data about the calorie intake and expenditure of a person throughout the week, as follows:
- The rows specify the number of calories from protein sources, the number of calories from fat sources, the number of calories from carbohydrate sources and the calorie expenditure through physical activity, respectively.
- The columns specify the day of the week for which the data were recorded.
   
Compute the following (without using any explicit for loop):
- the total number of calories consumed each day;
- the percentage of calories from protein sources and the percentage of calories from carbohydrates (as an array with 2 rows and 7 columns), rounded to two decimal places (see [numpy.around](https://numpy.org/doc/stable/reference/generated/numpy.around.html));
- the day in which the maximum number of calories were burned through physical activity;
- the sum of calorie expenditure and calories from protein sources for each working day (the result will be an array with 5 elements);
- the number of calories from protein sources, number of calories from fat sources, number of calories from carbohydrate sources and the calorie expenditure for the week as a (1, 4) array.


In [None]:
import numpy as np
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
statistics = np.array([
                      [90, 100, 170, 200, 98, 167, 150], # calories from protein
                      [300, 200, 150, 174, 200, 270, 240], # calories from fat
                      [340, 255, 400, 500, 301, 200, 300],  # calories from carbs
                      [250, 90, 170, 200, 87, 160, 200]]) # calorie expenditure

## YOUR CODE HERE
total_calories = np.sum(statistics[:-1], axis=0)
print("a)", total_calories)
print("b)", np.around(statistics[[0, 2], :] / total_calories * 100, decimals=2))
print("c)", days[np.argmax(statistics[-1])])
print("d)", statistics[0][:-2] + statistics[-1][:-2])
print("e)", np.sum(statistics, axis=1)[np.newaxis, :])

**Expected output:**

The total number of calories consumed each day:
[730 555 720 874 599 637 690]

The percentage of calories from protein sources and the percentage of calories from carbohydrates:
[[12.33 18.02 23.61 22.88 16.36 26.22 21.74]
 [46.58 45.95 55.56 57.21 50.25 31.4  43.48]]

The day in which the maximum number of calories were burned through physical activity is:
Monday

The sum of calorie expenditure and calories from protein sources for each working day (the result will be an array with 5 elements):
[[340 190 340 400 185]]

The number of calories from protein sources, number of calories from fat sources, number of calories from carbohydrate sources and the calorie expenditure for the week:
[[ 975 1534 2296 1157]]

You can use [assert_equal](https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_equal.html) to check your results.
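
As a sketch of how such a check could look, the block below recomputes the first two results and compares them with the expected values listed above. `assert_allclose` is used for the floating point percentages; this is a choice made for the example, not a requirement of the exercise.

```python
import numpy as np

statistics = np.array([
    [90, 100, 170, 200, 98, 167, 150],     # calories from protein
    [300, 200, 150, 174, 200, 270, 240],   # calories from fat
    [340, 255, 400, 500, 301, 200, 300],   # calories from carbs
    [250, 90, 170, 200, 87, 160, 200]])    # calorie expenditure

total_calories = np.sum(statistics[:-1], axis=0)
expected_total = np.array([730, 555, 720, 874, 599, 637, 690])

# raises an AssertionError on a mismatch and stays silent otherwise
np.testing.assert_equal(total_calories, expected_total)

percentages = np.around(statistics[[0, 2]] / total_calories * 100, decimals=2)
# check Monday's protein and carbohydrate percentages
np.testing.assert_allclose(percentages[:, 0], [12.33, 46.58])
print('all checks passed')
```
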

# Plotting

During this course we'll frequently create plots to show the distribution of some data, to show the performance of the developed models etc. We'll use the *matplotlib* library for this.

Using this library is straightforward, and the function that we'll use the most is plot(). You can read more about this library in the documentation: https://matplotlib.org/3.3.1/contents.html

For example, to display a sine wave we could do the following:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# compute the x range
x = np.arange(0, 5 * np.pi, 0.1)
y = np.sin(x)

plt.plot(x, y)

# set the title and the names of the x and y axes
plt.title('Sine function')
plt.xlabel('x')
plt.ylabel('sine value')

# show the figure
plt.show()

We can display several plots in the same figure using subplot. Below is an example:

In [None]:
# compute the x range
x = np.arange(0, 2 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# subplot with 1 row and 2 columns
# the first subplot is the active one
plt.subplot(1, 2, 1)

# Make the first plot
plt.plot(x, y_sin)
plt.grid(True)
plt.title('Sine')

# activate the second plot
plt.subplot(1, 2, 2)
plt.plot(x, y_cos)
plt.title('Cosine')

# adjust the spacing between the plots
plt.subplots_adjust(wspace=0.5)
plt.show()

You might be familiar, from the Artificial Intelligence class, with some of the activation functions used in neural networks: ReLU, tanh, sigmoid and their friends. In the image below, the common activation functions are depicted as dance moves.




Pick your favourite three "dance moves" and plot them with matplotlib using subplots.

In [None]:
dance_moves_img = cv2.cvtColor(cv2.imread('dance_moves.png'), cv2.COLOR_BGR2RGB)  # opencv loads BGR, matplotlib expects RGB
dpi = plt.rcParams['figure.dpi']

height, width, depth = dance_moves_img.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(dance_moves_img)

In [None]:
## YOUR CODE HERE
plt.subplot(1, 3, 1)
plt.imshow(dance_moves_img[:height // 3, :width // 4, :])

plt.subplot(1, 3, 2)
plt.imshow(dance_moves_img[height // 3:2 * height // 3, :width // 4, :])

plt.subplot(1, 3, 3)
plt.imshow(dance_moves_img[2 * height // 3:, 2 * width // 4:3 * width // 4, :])

plt.subplots_adjust(wspace=0.5)
plt.show()
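
Another reading of the exercise is to plot the functions themselves rather than crop them from the image. A minimal sketch, assuming ReLU, tanh and sigmoid as the three chosen "dance moves" and an x range of [-5, 5] picked only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)

activations = {
    'ReLU': np.maximum(0, x),
    'tanh': np.tanh(x),
    'sigmoid': 1 / (1 + np.exp(-x)),
}

plt.figure(figsize=(12, 3))
for i, (name, y) in enumerate(activations.items(), start=1):
    plt.subplot(1, 3, i)   # 1 row, 3 columns, i-th subplot is active
    plt.plot(x, y)
    plt.grid(True)
    plt.title(name)

plt.subplots_adjust(wspace=0.5)
plt.show()
```
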

# Image manipulation

Computer vision is about images (or image sequences), so you'll definitely need some image manipulation skills.
For now, we'll just need some functions to read and write images.

We'll use the *opencv* library to work with images; opencv is an open-source, cross-platform computer vision library that supports a variety of programming languages (C++, Python, Java).

The python version of opencv is very simple and it allows you to express your brilliant ideas in fewer lines of code, while keeping the code highly readable.

### Reading, writing and displaying an image

To read an image you'll use the <i>imread</i> function from the opencv library. To display an image you can use the <i>imshow</i> function from the matplotlib library.
Pretty simple, isn't it?

In [None]:
img = cv2.imread('cute_cat.jpg') # opencv loads the image in BGR channel order
# BGR -> RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # opencv uses BGR channel ordering, while matplotlib uses RGB channel ordering

plt.imshow(img)

An image is essentially just a numpy array. The type of the elements stored in this array is np.uint8, so each element ranges from 0 (corresponding to black in grayscale images) to 255 (corresponding to white in grayscale images).
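
One consequence of the uint8 type is that arithmetic wraps around instead of saturating: a result above 255 restarts from 0. The sketch below shows the effect on three made-up pixel values and one way to clamp the result instead; the names are only illustrative.

```python
import numpy as np

pixels = np.array([10, 200, 250], dtype=np.uint8)
offset = np.uint8(50)

wrapped = pixels + offset   # stays uint8: 250 + 50 = 300 wraps around to 44

# widen the type first, clamp to [0, 255], then convert back to uint8
clamped = np.clip(pixels.astype(np.int16) + 50, 0, 255).astype(np.uint8)

print(wrapped)  # [ 60 250  44]
print(clamped)  # [ 60 250 255]
```
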

To get the size of the image, we can use the <i>shape</i> attribute, which stores (height, width, channels).

In [None]:
img_height, img_width, img_channels = img.shape[0], img.shape[1], img.shape[2]
print('The image resolution is ', img_width, 'x', img_height)
print('The number of channels is ', img_channels)

You can use the function cv2.resize to change the resolution of an image.

In [None]:
# resize the image to width 120 and height 400 (cv2.resize takes (width, height)) - breaks the aspect ratio
img_resize_fixed = cv2.resize(img, (120, 400))
print('The shape of the resized image is:', img_resize_fixed.shape)  # (400, 120, 3)
plt.imshow(img_resize_fixed)

In [None]:
# resize the image to w/4 x h/4 (keeps the aspect ratio)
img_resize_prop = cv2.resize(img, (0, 0), fx=0.25, fy=0.25)
print('The shape of the resized image is:', img_resize_prop.shape)
plt.imshow(img_resize_prop)

A color image consists of 3 image channels (the red, green and blue channels).

A grayscale image has a single channel. One way of converting a color image to grayscale is using the equation:

Gray = 0.2126 R + 0.7152 G + 0.0722 B

where R, G and B are the red, green and blue channels of the input image.
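
The equation is a weighted sum over the channel axis, so it can also be written as a product with a vector of weights. A small sketch on a random stand-in image (the array `rgb` is synthetic and assumed to be in RGB order):

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # stand-in for an RGB image

weights = np.array([0.2126, 0.7152, 0.0722])  # R, G, B coefficients from the equation
gray = rgb @ weights                           # weighted sum over the last (channel) axis

# the same computation written channel by channel
gray_explicit = 0.2126 * rgb[:, :, 0] + 0.7152 * rgb[:, :, 1] + 0.0722 * rgb[:, :, 2]

print(gray.shape)
print(np.allclose(gray, gray_explicit))
```
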

In [None]:
img_gray = 0.2126*img_resize_prop[:,:,0] + 0.7152*img_resize_prop[:,:,1] + 0.0722*img_resize_prop[:,:,2]
img_gray = img_gray.astype(np.uint8)
plt.imshow(img_gray, cmap='gray', vmin=0, vmax=255) # use cmap='gray' (colormap) to display a grayscale image

A histogram is a graphical representation of the distribution of the grayscale values (or color tones) in the input image. From a histogram we can determine statistical properties of the image, such as its average brightness and its contrast.


In [None]:
# one bin per gray level: 256 bins covering the values 0..255
hist, bins = np.histogram(img_gray, bins=256, range=(0, 256))

print('The histogram is:\n', hist)

plt.bar(np.arange(256), hist, color='cornflowerblue')
plt.title('histogram of the grayscale image')
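
As a sketch of how statistical properties can be read from a histogram, the block below derives the average brightness and the standard deviation of the gray levels (one common measure of contrast, chosen here as an assumption) from the histogram alone, and compares them with the values computed directly on the pixels. The image is random and only serves as a stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(50, 60), dtype=np.uint8)  # stand-in grayscale image

hist, _ = np.histogram(gray, bins=256, range=(0, 256))  # one bin per gray level
levels = np.arange(256)

prob = hist / hist.sum()                  # fraction of pixels at each gray level
mean_brightness = np.sum(levels * prob)
contrast = np.sqrt(np.sum((levels - mean_brightness) ** 2 * prob))

print(np.isclose(mean_brightness, gray.mean()))
print(np.isclose(contrast, gray.std()))
```
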

Plot the histograms of the red, blue and green channel of an image on the same plot. The histogram of the red channel should be displayed in red bars, the histogram of the blue channel should be displayed in blue bars and the histogram of the green channel should be displayed with green bars.

In [None]:
## YOUR CODE HERE
values = np.arange(256)
channels = ['red', 'green', 'blue']
for i in range(3):
  hist, _ = np.histogram(img[:, :, i], bins=256, range=(0, 256))  # one bin per intensity level
  plt.bar(values, hist, color=channels[i], alpha=0.5, label=channels[i])  # alpha keeps overlapping bars visible
plt.legend()
plt.show()

Add a positive number to each element of the grayscale image and store the result in img_l1.
What do you think is the effect of this operation? Display the image img_l1. Make sure that the result is in the range [0, 255].

In [None]:
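## YOUR CODE HERE
# add in a wider type to avoid uint8 overflow, then clamp to [0, 255]
img_l1 = np.clip(img_gray.astype(np.int16) + 50, 0, 255).astype(np.uint8)
# fix vmin/vmax; otherwise imshow rescales the gray levels and hides the change in brightness
plt.imshow(img_l1, cmap='gray', vmin=0, vmax=255)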

Compute and display the histogram of img_l1.  What do you notice? How is this histogram different than the previous one?

In [None]:
## YOUR CODE HERE
hist, _ = np.histogram(img_l1, bins=256, range=(0, 256))  # one bin per gray level

plt.bar(np.arange(256), hist, color='cornflowerblue')
plt.title('histogram of the l1 image')

Now add a negative number to the grayscale image and store the result in img_l2.
If the resulting value is less than 0, clamp it to 0.

What do you think is the effect of this operation?

In [None]:
## YOUR CODE HERE
img_l2 = np.clip(img_gray.astype(np.int16) - 50, 0, 255).astype(np.uint8)
# fix vmin/vmax so that imshow does not rescale the gray levels
plt.imshow(img_l2, cmap='gray', vmin=0, vmax=255)

Compute and display the histogram of img_l2. What do you notice? How is this histogram different than the previous ones?

In [None]:
## YOUR CODE HERE
hist, _ = np.histogram(img_l2, bins=256, range=(0, 256))  # one bin per gray level

plt.bar(np.arange(256), hist, color='cornflowerblue')
plt.title('histogram of the l2 image')

Add a positive number (for example 40) to the red channel of the color image and store the result in imgg. If the result of the addition exceeds 255, clamp it to 255.
What do you think is the effect of this operation?
Display the resulting image imgg.

In [None]:
## YOUR CODE HERE
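# add 40 to the red channel in a wider type, then clamp to [0, 255] and convert back to uint8
imgg = np.clip(img.astype(np.int16) + np.array([40, 0, 0]), 0, 255).astype(np.uint8)
plt.imshow(imgg)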

Display a region of interest from the input image defined by the rectangle (x=350, y=400, sz=(500x400)).

Hint: an image is just a numpy array, so you can easily achieve this with array slicing.

In [None]:
## YOUR CODE HERE
x = 350
y = 400
sz = (500, 400)  # assumed to be (width, height)
# rows are indexed by y and columns by x
plt.imshow(img[y:y + sz[1], x:x + sz[0], :])

# Hello convolutional neural networks!

In the last part of this introductory laboratory, you'll "meet" a convolutional network for object classification. For now, consider it just a black box that takes an image as input and outputs its top-3 predictions. However, this network requires the input data to have the following properties: the size of the input image must be 224x224, the channels of the image must be stored in RGB order, the type of the data (of the numpy array) must be float32, and the pixel values must be normalized.

More specifically:
- resize the image by preserving the aspect ratio, such that its smallest dimension has 232 pixels;
- perform a central crop of the image of crop size equal to 224;
- rescale the pixels in the image to the interval [0.0, 1.0];
- normalize the pixel values using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225] (subtract mean and then divide by the standard deviation).
- prior to feeding the image to the model, a batch dimension should be added such that the shape of the image is (1, 3, 224, 224) (channels first).

Your task is to pre-process the input images such that they are in the format requested by the network.
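
The last three steps of the list (rescaling, normalization, and the move to a channels-first batch) need only numpy. The sketch below runs them on a random array that stands in for an image that was already resized and cropped to 224x224; the names `crop` and `batch` are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
crop = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)  # stand-in for the cropped RGB image

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

scaled = crop.astype(np.float32) / 255.0   # pixel values in [0.0, 1.0]
normalized = (scaled - mean) / std         # mean and std broadcast over the channel axis

# (height, width, channels) -> (channels, height, width), then add the batch axis
batch = np.transpose(normalized, (2, 0, 1))[np.newaxis, ...]

print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float32
```
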

In [None]:
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).float()
model.eval()  # inference mode: batch normalization uses its stored statistics

In [None]:
!wget "https://docs.google.com/uc?export=download&id=1X9au_JCNv4fg2Wgsr4DFT-N0OZht6Zmp" -O elephant.jpg

In [None]:
img_path = './elephant.jpg'

## TODO YOUR CODE HERE
img_raw = cv2.imread(img_path)
img_rgb = cv2.cvtColor(img_raw, cv2.COLOR_BGR2RGB)

min_size = min(img_rgb.shape[0], img_rgb.shape[1])
resized_size = 232
ratio = resized_size / min_size
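# cv2.resize expects the target size as (width, height); round() keeps the smallest side at exactly 232
img_resized = cv2.resize(img_rgb, (round(ratio * img_rgb.shape[1]), round(ratio * img_rgb.shape[0])))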

height, width, _ = img_resized.shape
cropped_size = 224
img_cropped = img_resized[
  (height - cropped_size) // 2:(height + cropped_size) // 2,
  (width - cropped_size) // 2:(width + cropped_size) // 2,
  :,
]
img_scaled = img_cropped / 255

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
img_norm = (img_scaled - mean) / std
img_c1 = np.transpose(img_norm, (2, 0, 1))[np.newaxis, :]
img = torch.from_numpy(img_c1).float()
## END TODO YOUR CODE HERE

with torch.no_grad():  # inference only; the result can then be converted with .numpy()
  prediction = model(img).squeeze(0).softmax(0)

class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"The predicted class is {category_name} with score {100 * score:.2f}%")


# get the top k(=3) predictions
k = 3
with torch.no_grad():
  predictions_np = prediction.numpy()

  topk = np.argpartition(predictions_np, -k)[-k:]

  topk_categories = [weights.meta["categories"][class_id].lower() for class_id in topk]
  topk_scores = [predictions_np[idx]*100 for idx in topk]
  plt.barh(topk_categories, topk_scores)
  plt.show()
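
Note that `np.argpartition` returns the k largest entries in no particular order. If you want them sorted by score, one option is to sort just those k indices, as in the sketch below; the scores are made up for the example.

```python
import numpy as np

scores = np.array([0.05, 0.60, 0.10, 0.20, 0.05])  # made-up class probabilities
k = 3

topk_unordered = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores, any order
order = np.argsort(scores[topk_unordered])[::-1]    # sort those k scores in descending order
topk_sorted = topk_unordered[order]

print(topk_sorted)          # [1 3 2]
print(scores[topk_sorted])  # [0.6 0.2 0.1]
```
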

Apply different effects (crop it, lower the contrast, change the brightness) to the input image and see if you can "fool" the network.

Also, upload other images from your computer and see what the network predicts.

In [None]:
## TODO your experiments here
img_cpy = np.copy(img_c1)
height = img_cpy.shape[2]  # axis 2 is the height in (batch, channels, height, width)
img_cpy[:, :, :height // 2, :], img_cpy[:, :, height // 2:, :] = img_c1[:, :, height // 2:, :], img_c1[:, :, :height // 2, :]
plt.subplot(2, 1, 1)
plt.imshow(np.transpose(img_cpy[0], (1, 2, 0)))

img = torch.from_numpy(img_cpy).float()
## END TODO YOUR CODE HERE

with torch.no_grad():  # inference only; the result can then be converted with .numpy()
  prediction = model(img).squeeze(0).softmax(0)

class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"The predicted class is {category_name} with score {100 * score:.2f}%")


plt.subplot(2, 1, 2)

# get the top k(=3) predictions
k = 3
with torch.no_grad():
  predictions_np = prediction.numpy()

  topk = np.argpartition(predictions_np, -k)[-k:]

  topk_categories = [weights.meta["categories"][class_id].lower() for class_id in topk]
  topk_scores = [predictions_np[idx]*100 for idx in topk]
  plt.barh(topk_categories, topk_scores)
  plt.show()

Congratulations on reaching this point! This is the end of the first laboratory.
Next time we'll build (from scratch) a simple linear classifier to recognize different objects from images.