# Practicals for lecture 1.0

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec-2025/blob/main/practicals/Practicals_1.0.ipynb)

## Introduction to `numpy`

#### 1.0.0 Creating numpy arrays

In [22]:
import numpy as np
import sys

In [3]:
# Create a numpy array from this list:

my_list = [3,2,4,5,6,1]


In [7]:
# What shape do you expect for the array obtained by converting this list of lists
# of lists to numpy? Make your prediction, then convert the list to an array
# and check the result!
my_list_of_lists = [[[1, 2, 3, 4], [5,6,7,8], [9,10,11,12]], 
                    [[13,14,15,16], [17,18,19,20], [21,22,23,24]]]
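
As a smaller illustration of how nesting maps to dimensions, here is a minimal sketch on a toy list (`toy_list` is a made-up name, not the list from the exercise). Each level of nesting becomes one axis, ordered from the outermost list to the innermost one.

```python
import numpy as np

# Toy example: 2 outer items, each a list of 3 numbers.
toy_list = [[1, 2, 3], [4, 5, 6]]
toy_array = np.array(toy_list)

print(toy_array.ndim)   # number of axes
print(toy_array.shape)  # length along each axis, outermost first
```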



In [None]:
# Initialize a 3D numpy array full of zeros of shape (3, 2, 10). 
# Check its `ndim` and `shape` attributes to make sure it is correct!

my_array = ...

In [6]:
# Initialize a numpy array full of ones of shape (3, 2, 10). Make its data type np.uint16!
# Bonus: check its size in memory and compare it with the size of the zeros array we defined above!
# You can use the sys.getsizeof() function to check the size in memory of the array.

my_array = ...
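
Besides `sys.getsizeof()`, numpy arrays expose the `itemsize` and `nbytes` attributes, which can help interpret the numbers you get. The sketch below uses two small toy arrays (hypothetical names and shapes, not the ones from the exercise) to compare the three quantities. The extra bytes reported by `sys.getsizeof()` are the overhead of the array object itself.

```python
import sys
import numpy as np

small_float = np.zeros((2, 3))                 # default dtype is float64
small_uint = np.ones((2, 3), dtype=np.uint16)  # 2 bytes per element

for name, arr in [("float64", small_float), ("uint16", small_uint)]:
    # itemsize: bytes per element; nbytes: bytes used by the data buffer only
    print(name, arr.itemsize, arr.nbytes, sys.getsizeof(arr))
```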

In [9]:
# Initialize an array of shape (3, 2, 10) full of nans:

my_array = ...

In [None]:
# Initialize a 1D array containing all even numbers from 0 to 100:
my_array = ...
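
One possible tool for regularly spaced values is `np.arange(start, stop, step)`. A minimal sketch on a different sequence (multiples of 5, chosen only for illustration) shows that `stop` is excluded, so it has to be set just past the last value you want to keep.

```python
import numpy as np

# Multiples of 5 from 0 to 50 inclusive: stop must be larger than 50.
multiples_of_five = np.arange(0, 51, 5)

print(multiples_of_five)
print(len(multiples_of_five))
```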

In [None]:
# Google (or ask ChatGPT) how to use np.random to generate normally distributed values. 
# Then, create an array of normally distributed values with shape (4, 5, 2) called random_array.

random_array = ...
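
There is more than one way to draw normally distributed values. The legacy `np.random.normal` function is used later in this notebook, while recent numpy versions also offer the `Generator` interface created with `np.random.default_rng`. The sketch below uses the latter on a small toy shape. The seed and the `samples` name are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# loc is the mean, scale the standard deviation, size the output shape
samples = rng.normal(loc=0.0, scale=1.0, size=(2, 3))

print(samples.shape)
print(samples.mean(), samples.std())
```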

In [26]:
# [Advanced]: 
# 1) Another useful np.random function is shuffle, which changes the order of elements in a list/array. 
# Try it out! Does it return a new array or does it work in place?

# 2) Try initializing random arrays of different dtypes and look at their size in memory.

# 3) From the exercise above, you can imagine that using u/int8 or u/int16 can sometimes save a lot of space
# when working with arrays. However, those types can only store integers. 
# Can you imagine what you could do to convert float values in the range 0-1 to integers that you can store
# using uint8 or uint16? 
# Write a function that takes as input an array of floats between 0 and 1 and converts it to uint16 format, 
# maintaining as much information as possible. Also write a function to transform the array back to the original float form.
to_be_compressed = np.random.rand(1000, 1000)

# 4) Can you estimate the resolution limit of the uint8 and uint16 encoded arrays? Resolution 
# could be defined as the minimum difference between two numbers that makes them map to different values
# in the uint8 or uint16 encoded arrays.

# 5) Given a dtype - let's say float - try to figure out how much memory is required to store each element of the array, 
# and how much memory is allocated as an overhead just to initialize the array.
# You can do this by creating arrays of different sizes (you can keep it 1D for simplicity) in a loop,
# creating a list or an array with the array sizes you measure with sys.getsizeof() and then plot this curve
# using matplotlib plt.plot() (I assume you can do this if you're reading this)
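
To give a flavor of point 3 without solving it, here is a minimal sketch of one possible approach for the uint8 case only: scale the floats to the integer range, round, and cast. The function names `quantize_uint8` and `dequantize_uint8` are hypothetical, and other mappings are possible. Adapting the idea to uint16 is left as the exercise.

```python
import numpy as np

def quantize_uint8(values):
    """Map floats in [0, 1] onto the 256 integer levels of uint8."""
    scaled = np.clip(values, 0.0, 1.0) * 255  # clip guards against out-of-range input
    return np.round(scaled).astype(np.uint8)

def dequantize_uint8(codes):
    """Map uint8 codes back to floats in [0, 1]."""
    return codes.astype(np.float64) / 255

rng = np.random.default_rng(0)
original = rng.random(1000)
codes = quantize_uint8(original)
restored = dequantize_uint8(codes)

# Rounding moves each value by at most half a quantization step.
max_error = np.abs(original - restored).max()
print(codes.nbytes, original.nbytes, max_error)
```

The round trip is lossy: `restored` is close to `original` but not identical, which is the trade-off for the smaller memory footprint.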


#### 1.0.1 Indexing and plotting

In [None]:
# You are given this 2D array:
np.random.seed(42)
random_array = np.random.normal(0, 1, (4, 5))

# use numpy indexing to address the element (0, 1) (first row, second column) from random_array:

In [None]:
# use numpy indexing to select all values in the second row from random_array above:
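
As a reminder of the indexing syntax, here is a short sketch on a toy array (`toy` is a made-up name, and the positions are deliberately different from those in the exercises). Indices are 0-based, one index per axis, and `:` selects everything along an axis.

```python
import numpy as np

toy = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns, values 0..11

element = toy[2, 3]      # third row, fourth column
third_row = toy[2, :]    # all columns of the third row (toy[2] is equivalent)
first_column = toy[:, 0] # all rows of the first column

print(element, third_row, first_column)
```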


In [None]:
# Replace all the negative values in the matrix below with np.nan:
np.random.seed(42)
random_matrix = np.random.normal(0, 1, (3,2))
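
Conditional replacement is usually done with a boolean mask. The sketch below shows the pattern on a toy array with a different condition and a different replacement value (both chosen only for illustration). Keep in mind that `np.nan` can only be stored in arrays with a float dtype.

```python
import numpy as np

toy = np.array([[1.0, 7.5], [9.0, 2.0]])

mask = toy > 5    # boolean array with the same shape as toy
toy[mask] = -1.0  # assignment through the mask modifies toy in place

print(mask)
print(toy)
```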



In [None]:
# fMRI data

# The code snippet below loads a single fmri scan into an array (fmri_array).
# This scan consists of 50x59 voxels in each slice, 50 axial slices, 168 timepoints.

# First, run this cell to load the data:
!pip install nilearn
from nilearn import datasets, image, plotting
data = datasets.fetch_development_fmri(n_subjects=1)
fmri_img = image.load_img(data.func[0])
fmri_array = image.get_data(fmri_img)

In [None]:
# Print out the shape of the array to check the dimensions, and find the time axis (remember, we have 168 timepoints).

# Select the volume corresponding to the 100th timepoint, then print the shape of the volume to check the dimensions
# and if they make sense to you given the specs of the scan given above:
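
When an array has many axes, the `...` (Ellipsis) shorthand stands for "all the axes not written explicitly". A minimal sketch on a toy 4D array follows, assuming for illustration that time is the last axis. The `toy_scan` name and its shape are made up. Remember that the n-th item sits at index n - 1.

```python
import numpy as np

toy_scan = np.zeros((2, 3, 4, 5))  # pretend axes: x, y, z, time

# Third timepoint (index 2); same as toy_scan[:, :, :, 2]
volume = toy_scan[..., 2]

print(toy_scan.shape, volume.shape)
```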

In [None]:
# Now slice the volume on the 3rd axis to get the 20th slice, and plot it using plt.imshow:
# Bonus: check the documentation of plt.imshow and play with the vmin and vmax arguments to control the contrast range:

In [None]:
# Let's say that you know that timepoints 30-40 and 99-114 are corrupted by high levels of motion, 
# so you want to remove them (an approach known as motion censoring).
# Create a boolean array that indicates which timepoints need to be censored (essentially, a censoring timeseries). 
# (Hint: you can do this by creating a boolean array of False values and using slicing to set the correct values to True)
# 
# The number of timepoints in your array should match the number of timepoints in your scan.


# Now apply the censoring timeseries to your scan to remove the corrupted timepoints. 
# Check that the number of remaining timepoints is as you expect.
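
The censoring pattern can be sketched on a toy dataset with 10 timepoints on the last axis. All names and the censored range are hypothetical. Note that slice ends are exclusive, so whether a range like "30-40" includes its last timepoint is something you need to decide when writing the slice.

```python
import numpy as np

n_timepoints = 10
toy_scan = np.arange(2 * 2 * n_timepoints).reshape(2, 2, n_timepoints)

censor = np.zeros(n_timepoints, dtype=bool)  # all False: nothing censored yet
censor[3:6] = True                           # marks indices 3, 4 and 5

# ~ inverts the mask, so we keep only the timepoints that are NOT censored
clean = toy_scan[..., ~censor]

print(censor.sum(), clean.shape)
```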

In [None]:
# Images are just arrays! This is one of the reasons working with arrays is so important!
# (Usually images are H x W x 3 arrays, with the third dimension storing the values
# for each of the RGB channels. Here the image will be grayscale, so we only have 2D - no colors!).

# Use the function below to download an image, and print the shape of the array to know the number of pixels. 
# Then, use plt.matshow to visualize it.

def fetch_image():
    """Fetch exercise data from github repo. 
    
    Returns:
        np.ndarray
            Array with the exercise data.
    
    """
    
    # You should never import stuff in a function! I'm doing it here
    # just to keep together all the code that you don't really need to read now.
    import numpy as np
    import requests
    from io import BytesIO

    # URL of the .npy file on GitHub:
    URL = "https://github.com/vigji/python-cimec/raw/main/practicals/data/corrupted_img.npy"

    response = requests.get(URL)
    response.raise_for_status()  # stop with a clear error if the download failed
    
    return np.load(BytesIO(response.content))


# Tip 1: remember to import matplotlib.pyplot first - and give it an alias! ("import ... as ...")
# Tip 2: to make the image grayscale, you can pass the cmap="gray" argument to the matshow() function!

img = fetch_image()  # print its shape to know the number of pixels

In [None]:
# It looks like the image got corrupted with some noise! 
# To understand the noise pattern, you can try to look at it more closely.
# Zoom in on the image: plot it again, but selecting a small region using indexing 
# (e.g., img[10:80, 70:130])


In [None]:
# Can you understand what is going on? Can you think of an indexing strategy 
# that would filter out the noise?
# Try to retrieve the uncorrupted image with an indexing operation, and plot it!

# (See cell below if you are really stuck!)


In [None]:
# Hint: it looks like every other column of pixels has weird values! You could try to keep only one column out of every two...
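
Slices accept a third number, the step, written as `start:stop:step`. The sketch below applies it to the columns of a toy array (`toy` is a made-up name). Which of the two offsets selects the good columns in the real image is for you to find out.

```python
import numpy as np

toy = np.arange(12).reshape(2, 6)  # 2 rows, 6 columns

even_columns = toy[:, ::2]   # columns 0, 2, 4
odd_columns = toy[:, 1::2]   # columns 1, 3, 5

print(even_columns)
print(odd_columns)
```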

In [None]:
# [Advanced]
# 1) check the type of the image array. Then, check its maximum. Is this data represented efficiently?
#   If not, convert it to the format that would preserve information with the maximum memory efficiency

# 2) As we mentioned, color images are just (h, w, 3) arrays where the third dimension corresponds to color.
# Color is represented by triplets of numbers indicating the intensity of the Red, Green, and Blue channels.
# Let's make a colored version of the image, where the left side of the image appears red, the center blue,
# and the right green!
# To do so, initialize an empty (h, w, 3) array and use indexing to fill with the image values the correct
# channels in different parts of the image. Bonus points: do it with a loop.
# After you have done it, consider this question: were you working in place or on copies?
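
For the in-place versus copy question, `np.shares_memory` can be a useful check. The following sketch uses a toy array (hypothetical names) to contrast basic slicing, which returns a view on the same data, with indexing by a list of integers, which returns a copy.

```python
import numpy as np

base = np.zeros((2, 4))

view = base[:, ::2]       # basic slicing: a view on base
picked = base[:, [0, 2]]  # indexing with a list of integers: an independent copy

view[0, 0] = 1.0    # also changes base[0, 0]
picked[1, 1] = 5.0  # base[1, 2] is left untouched

print(np.shares_memory(base, view), np.shares_memory(base, picked))
print(base)
```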
