# Introduction to Python

If you're here, you likely already know what Python is, how to install it, and why it's so widely loved.

This notebook includes some basic `Python` commands that will be useful for the Degree on Applied Artificial Intelligence and its Mathematical Foundations. 

Find more resources in this tutorial: [docs.python.org/3/tutorial/](https://docs.python.org/3/tutorial/). 


- **Author**: Julen Alvarez Aramberri

- **Contact**: julen.alvarez.@ehu.eus

- Finding out how something works in Python:

- How does `print` work?  
    - In Jupyter or IPython, use `print?` to get information about it.  


In [None]:
#print?
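
The `?` suffix is an IPython/Jupyter feature and is not available in a plain Python interpreter. As a minimal sketch of the standard alternatives, the built-in `help()` function and the `__doc__` attribute expose the same documentation (the variable name `doc` is only illustrative):

```python
# help() prints the documentation of an object in any Python interpreter.
help(len)

# The raw docstring is also available as a string through __doc__.
doc = print.__doc__
print(doc)
```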

- Adding two numbers in Python is straightforward:  


In [None]:
3+5

- In Python, text data can be represented using strings.  
    - Both `"hello"` and `'hello'` are valid strings.  
    - Strings can be joined using the `+` operator.  


In [None]:
"hello" + " " + "world"


## Variables and Data Types  

- In Python, we declare a variable by assigning a value with the `=` sign.  
    - Variables are names that refer to objects, rather than containers that hold the data themselves.  
- Python offers a range of data types and structures, including:  
    - Booleans (`True` or `False`)  
    - Numbers (integers, floats, etc.)  
    - Lists
    - Strings
    - Tuples
    - Dictionaries
    - Classes
    - ... 
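- There is no need to declare a variable's type when it is assigned: Python is dynamically typed, so the type is determined at runtime from the assigned value. Python code also relies on [duck typing](https://en.wikipedia.org/wiki/Duck_typing), where what matters is what an object can do rather than its declared type.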

("If it walks like a duck and it quacks like a duck, then it must be a duck")
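
The points above about names, dynamic types, and duck typing can be illustrated with a short sketch. All variable names and values here are hypothetical and chosen only for illustration:

```python
# Two names can refer to the same object: a change made through one name
# is visible through the other.
a = [1, 2, 3]
b = a              # b refers to the same list, no copy is made
b.append(4)
print(a)           # [1, 2, 3, 4]
print(a is b)      # True

# Dynamic typing: the same name can later refer to an object of another type.
value = 39
print(type(value).__name__)   # int
value = "thirty-nine"
print(type(value).__name__)   # str

# Duck typing: len() accepts any object that knows its own length.
lengths = [len(obj) for obj in ("abc", [1, 2], {"k": 1})]
print(lengths)     # [3, 2, 1]
```

To get an independent copy of a list instead of a second name for the same object, one option is `b = a.copy()`.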

---


### Data Structures  

In [None]:
# An integer. Notice the variable naming convention.
age_in_years = 39

In [None]:
# A float
almost_pi = 3.1415

In [None]:
# A string
proton = "P is for proton"

In [None]:
# A boolean takes on only the values True or False
enjoying_tutorial = True

- Most programs need more than just basic data types; they require structured data.  
- Python has built-in support for various essential data structures.  
    - Additional, more advanced structures can be found in the [collections](https://docs.python.org/3/library/collections.html) module, which provides specialized container datatypes as alternatives to Python’s general-purpose built-in containers.

#### Lists  
- An ordered, heterogeneous collection of elements that can store multiple values in a single variable.  
- Elements in a list can be accessed by their position (index).  
- Lists are **mutable** (they can be changed after creation).  
- Lists are created using square brackets `[]`:  
  - `my_list = []`  
  - Alternatively, a list can be declared using `list()` (though square brackets are more commonly used).  
- To access elements:  
  - First element: `my_list[0]`  
  - Last element: `my_list[-1]`


In [None]:
x = [3, 4, 5]
y = [4, 9, 7]
x + y

In [None]:
# Creating a list
my_list = [10, 20, 30, 40]

# Accessing the first and last elements
first_element = my_list[0]
last_element = my_list[-1]

first_element, last_element


Lists have a dynamic size and elements can be added (appended) to them

In [None]:
my_list.append(4)
my_list

We can access "slices" of a list using `my_list[i:j]` where `i` is the start of the slice (again, indexing starts from 0) and `j` the end of the slice. For instance:

In [None]:
my_list[1:3]

Omitting the second index means that the slice should run until the end of the list.

In [None]:
my_list[1:]

We can check if an element is in the list using `in`.

In [None]:
5 in my_list

The length of a list can be obtained using the `len` function

In [None]:
len(my_list)
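
Because lists are mutable, they can also be changed in place after creation. The following is a small sketch of a few common list methods; the list `scores` is hypothetical example data:

```python
scores = [10, 20, 30, 40]   # hypothetical example data

scores[0] = 15              # replace an element in place
scores.remove(30)           # remove the first occurrence of a value
scores.insert(1, 99)        # insert 99 at index 1
print(scores)               # [15, 99, 20, 40]

scores.sort(reverse=True)   # sort in place, largest first
print(scores)               # [99, 40, 20, 15]
```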

#### Strings
- A **string** is a sequence of characters enclosed in quotes.  
- Strings can be created using **single (' ')** or **double (" ")** quotes.  
- Strings are **immutable** (they cannot be changed after creation).  
- Strings support **indexing** (to access individual characters) and **slicing** (to extract substrings).  
- Common string operations include:  
  - Concatenation (`+`)  
  - Repetition (`*`)  
  - Length (`len()`)  
  - Methods like `.upper()`, `.lower()`, `.strip()`, `.split()`, etc. 

In [None]:
# Different ways to create strings
str1 = 'Hello'  
str2 = "World,"  
str3 = "Python is fun!"  

print(str1, str2, str3)
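
The string operations listed above can be tried out as follows. This is a minimal sketch; the string `text` is a hypothetical example:

```python
text = "  Python is fun!  "   # hypothetical example string

clean = text.strip()          # remove leading and trailing whitespace
print(clean)                  # Python is fun!
print(len(clean))             # 14
print(clean[0], clean[-1])    # P !
print(clean[:6])              # Python
print(clean.upper())          # PYTHON IS FUN!
print(clean.split())          # ['Python', 'is', 'fun!']
print("ab" * 3)               # ababab

# Strings are immutable: item assignment raises a TypeError.
try:
    clean[0] = "J"
except TypeError:
    print("strings cannot be modified in place")
```

Methods such as `.upper()` and `.strip()` return new strings and leave the original unchanged.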



#### Tuples  

- Very similar to lists.  
- The key difference: **tuples are immutable** (they cannot be changed after creation).  
- Syntax for declaring a tuple:  
  - `my_tuple = ()` (using parentheses).  
- Elements in a tuple can be accessed by index, just like lists:  
  - First element: `my_tuple[0]`  
  - Last element: `my_tuple[-1]`  


In [None]:
# Creating a tuple
my_tuple = (5, "car", 15, "duck")

# Accessing the first and last elements
first_element = my_tuple[0]
last_element = my_tuple[-1]

first_element, last_element


We cannot modify a tuple; trying to do so raises an *exception* (an error).

In [None]:
#my_tuple = (3, 4, 5)
#my_tuple[0] = 2
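
Uncommenting the lines above stops the cell with a `TypeError`. As a minimal sketch, the same error can be observed without interrupting the notebook by catching it with `try`/`except` (the names `point` and `message` are illustrative):

```python
point = (3, 4, 5)   # hypothetical example tuple

try:
    point[0] = 2    # tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

print(point)        # (3, 4, 5), unchanged
```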

#### Dictionaries

- Collection of key-value pairs (insertion-ordered since Python 3.7; unordered in older versions).  
- Dictionary elements are accessed using **keys**, not positions.
- Dictionaries are **mutable** (they can be changed after creation).  
- Syntax for creating a dictionary:  
  - `my_dictionary = {}` (using curly brackets).  
  - Alternatively: `my_dictionary = dict()`, but curly brackets are more commonly used.  
- Accessing values:  
  - `my_dictionary["key_name"]` retrieves the value associated with `"key_name"`.  


In [None]:
# Creating a dictionary
my_dictionary = {
    "name": "Alice",
    "age": 25,
    "city": "New York"
}

# Accessing values by key
name = my_dictionary["name"]
age = my_dictionary["age"]

name, age
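
Since dictionaries are mutable, entries can be added, updated, and removed after creation. The sketch below uses a hypothetical dictionary `profile` to show these operations, together with `.get()` and iteration over key-value pairs:

```python
profile = {"name": "Alice", "age": 25}   # hypothetical example data

profile["city"] = "New York"   # add a new key-value pair
profile["age"] = 26            # update an existing value
del profile["name"]            # remove a key

# .get() returns a default instead of raising KeyError for a missing key
country = profile.get("country", "unknown")
print(country)                 # unknown

print("age" in profile)        # True (membership tests look at keys)
for key, value in profile.items():
    print(key, "->", value)
```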



#### Classes  

- A **class** is a blueprint for creating **objects**.  
- Classes define attributes (variables) and **methods** (functions) that belong to an object.  
- The `__init__` method initializes the attributes of the class when an object is created.  
- `self` refers to the instance of the class itself and allows access to its attributes and methods.  
- Syntax:  

```python
class ClassName:
    def __init__(self, param1, param2):
        self.attribute1 = param1
        self.attribute2 = param2

    def method(self):
        return self.attribute1
```


In [None]:
# Defining a simple class
class Person:
    def __init__(self, name, age):
        self.name = name  # Attribute: name
        self.age = age    # Attribute: age

    def greet(self):
        return f"Hello, my name is {self.name} and I am {self.age} years old."


# Creating an instance of the Person class
person1 = Person("Julen", 40)

# Calling the method
person1.greet()



In [None]:
class Calculator:
    def __init__(self, value):
        self.value = value  # Instance attribute

    def add(self, num):
        self.value += num  # Modifies the instance attribute

    def multiply(self, num):
        self.value *= num  # Another instance method

    @staticmethod
    def reset():
        return Calculator(0)  # Static method: no 'self', creates a new instance

# Creating an instance
calc = Calculator(10)
calc.add(5)         # 10 + 5 = 15
calc.multiply(2)    # 15 * 2 = 30

# Calling the method without an instance
new_calc = Calculator.reset()  # Creates a new Calculator instance with value 0

# Printing results
print(calc.value)  # Output: 30
print(new_calc.value)  # Output: 0


### Exercises

---

- **Exercise 1**: List Operations  

Create a list containing the numbers `[3, 7, 2, 9, 1]`.  
Perform the following operations:  
1. Append the number `5` to the list.  
2. Remove the smallest number.  
3. Sort the list in descending order.  
4. Retrieve and print the last element in the list.

```python
# Expected Output: [9, 7, 5, 3, 2]
```


In [None]:
# Type here your code

- **Exercise 2**: Tuples

Given the tuple `coordinates = (10, 20, 30)`:  
1. Extract the values into three variables: `x, y, z`.  
2. Print each variable separately.  
3. Try modifying `x` to `15` and observe what happens.  
```python
# Expected Output: 
10
20
30
```

In [None]:
# Type here your code

- **Exercise 3**: Dictionaries  

Create a dictionary representing a student's grades with the following data:  
- `"Math"`: `85`  
- `"Science"`: `90`  
- `"History"`: `78`  

Then, perform the following:  
1. Add a new subject `"English"` with a grade of `88`.  
2. Update the `"Math"` grade to `92`.  
3. Remove `"History"` from the dictionary.  
4. Print all subjects and their corresponding grades.  
```python
# Expected Output:
Math: 92
Science: 90
English: 88
```

In [None]:
# Type here your code

- **Exercise 4**: Classes

1. Define a class `NeuralNetwork` with the following properties:  
   - `architecture` (string)  
   - `layers` (integer)  
   - `activation_function` (string)  

2. Create an `__init__` method to initialize these attributes.  

3. Add a method `get_info()` that returns a formatted string describing the neural network.  

4. Instantiate a `NeuralNetwork` object with `"Feedforward"`, `5`, and `"ReLU"`.  

5. Call the `get_info()` method and print the result.

```python
# Expected Output:
```

In [None]:
# Type here your code

## Loops, Conditional Statements, Functions

---

### Loops 

Loops allow us to execute a block of code multiple times. Python has two main types of loops:  

- **`for` loops** → Used for iterating over sequences (like lists, tuples, dictionaries, or strings).  
- **`while` loops** → Run as long as a given condition is `True`.  

---

**`for`** loop 

In [None]:
# Looping through a list
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)


**`for`** loop with range

In [None]:
# Using range to loop through numbers 0-4
for i in range(5):
    print(i)


**`while`** loop


In [None]:
# Looping while a condition is True
count = 0
while count < 3:
    print("Count is:", count)
    count += 1
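
`for` loops also work on dictionaries, and they can be combined with `enumerate()` and `break`. The following sketch uses hypothetical data (`layer_sizes`) to illustrate these patterns:

```python
layer_sizes = {"input": 784, "hidden": 128, "output": 10}  # hypothetical data

# Iterating over a dictionary with .items() yields (key, value) pairs.
for name, size in layer_sizes.items():
    print(name, size)

# enumerate() provides the index together with each element.
for index, fruit in enumerate(["apple", "banana", "cherry"]):
    print(index, fruit)

# break leaves a loop early; here we stop at the first value above 100.
first_large = None
for size in layer_sizes.values():
    if size > 100:
        first_large = size
        break
print(first_large)  # 784
```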


### Vectorization
- Vectorization is the process of performing operations on entire arrays or sequences of data **at once**, instead of using explicit loops. This technique takes advantage of **optimized low-level implementations** (such as those written in C or Fortran) to enhance **performance and efficiency**.

- Important in Machine learning: 
  - **Performance Boost**: Significantly speeds up computations by eliminating slow Python loops.  
  - **Cleaner Code**: Reduces code complexity and makes it more readable.  
  - **Optimized Memory Usage**: Uses efficient memory management for handling large datasets.  

- **Vectorization in Keras**  
  - In **Keras**, deep learning models rely on **vectorized tensor operations** to speed up computations using **NumPy** and **TensorFlow** as backends. Operations like matrix multiplications, activations, and optimizations are heavily vectorized to improve performance.  

- **Vectorization in JAX**  
  - **JAX** is a high-performance computing library that excels in automatic differentiation and **just-in-time (JIT) compilation**. It uses **vectorized operations** for numerical computations, making deep learning models run much faster on CPUs and GPUs. JAX provides functions like `vmap()` for automatic vectorization, improving efficiency further.
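
To make the idea concrete, here is a minimal sketch comparing an explicit loop with its vectorized NumPy equivalent. The array `values` is hypothetical example data, and both approaches are expected to give the same total:

```python
import numpy as np

values = np.arange(1000, dtype=np.float64)   # 0.0, 1.0, ..., 999.0

# Explicit Python loop: one multiplication and addition per iteration.
loop_total = 0.0
for v in values:
    loop_total += v * v

# Vectorized: the squaring and the sum run inside NumPy's compiled code.
vectorized_total = np.sum(values ** 2)

print(loop_total, vectorized_total)   # both are 332833500.0
```

On large arrays the vectorized version is usually much faster; in a notebook this can be checked with the `%timeit` magic.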


---

Using `for` loops

In [None]:
numbers = [10, 20, 30, 40, 50, 60, 70, 80]

selected = []
for i in range(1, 6, 2):  # Start at index 1, stop before 6, step 2
    selected.append(numbers[i])

print(selected)  # Output: [20, 40, 60]

Using List Slicing (Vectorized Approach)

In [None]:
selected_sliced = numbers[1:6:2]  # Start at index 1, stop before 6, step 2

print(selected_sliced)  # Output: [20, 40, 60]


Using List Comprehension (Pythonic Way)

In [None]:
selected_list_comp = [numbers[i] for i in range(1, 6, 2)]

print(selected_list_comp)  # Output: [20, 40, 60]

Using a Lambda Function (explained afterwards)

In [None]:
extract_elements = lambda lst: lst[1:6:2]

print(extract_elements(numbers))  # Output: [20, 40, 60]

### Conditional Statements

(`if`, `if-else`, and `if-elif-else` Clauses)

Conditional statements in Python allow the execution of different blocks of code depending on certain conditions.

- **`if`**: Executes a block of code if the condition is `True`.  
- **`elif` (else if)**: Checks another condition if the previous `if` condition was `False`. You can have multiple `elif` statements.  
- **`else`**: Executes a block of code if none of the `if` or `elif` conditions are met. It acts as the default case.


---

In [None]:
x = 7
if x > 10:
    print("x is greater than 10")
elif x > 5:
    print("x is greater than 5 but not more than 10")
else:
    print("x is 5 or less")  # This will execute


Conditions can be combined with the logical operators `and`, `or`, and `not`.

In [None]:
score = 85
attendance = 90

if score >= 90 and attendance >= 80:
    print("Excellent performance!")  
elif (score >= 70 and attendance >= 75) or (score >= 80 and attendance >= 60):
    print("Good performance, but there’s room for improvement.")
elif score < 70 and (attendance < 75 or score < 50):
    print("Needs significant improvement.")
else:
    print("Performance is average.")

### Functions

- A **function** is a reusable block of code that executes only when explicitly called.  
- Functions allow code modularity, improving readability and maintainability.  
- Functions can:  
  - Accept **arguments** (or **parameters**) to modify their behavior.  
  - Accept **any number** and **any type** of inputs.  
  - Always **return a single object** (even if that object is a tuple, which may appear as multiple values).  
- Syntax for defining a function:  

```python
def function_name(parameter1, parameter2):
    # Function body (code to execute)
    return some_value
```


In [None]:
# Function that takes two numbers and returns their sum
def add_numbers(a, b):
    return a + b

# Using the function
result = add_numbers(3, 5)
print(result)  # Output: 8



In [None]:
# Function that returns multiple values (as a tuple)
def min_max(values):
    return min(values), max(values)

# Calling the function
numbers = [4, 7, 1, 9, 3]
min_val, max_val = min_max(numbers)

print(f"Minimum: {min_val}, Maximum: {max_val}")
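
The remaining points from the list above (default arguments, an arbitrary number of inputs, and the single returned object) can be sketched as follows. The function names `scale`, `total`, and `log_message` are hypothetical and only serve as illustration:

```python
def scale(value, factor=2):
    """Multiply value by factor; factor defaults to 2."""
    return value * factor

def total(*numbers):
    """Accept any number of positional arguments and return their sum."""
    return sum(numbers)

def log_message(text):
    """No return statement, so the function returns None."""
    print("LOG:", text)

print(scale(3))             # 6
print(scale(3, factor=10))  # 30
print(total(1, 2, 3))       # 6
result = log_message("done")
print(result)               # None
```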


### Lambda Functions in Python  

Lambda functions, also known as **anonymous functions**, are **small, single-expression functions** that do not require a name. They are useful for **quick, simple operations** without formally defining a function using `def`.  


In [None]:
# Example: Squaring a list of numbers using lambda
numbers = [1, 2, 3, 4, 5]
squared_numbers = list(map(lambda x: x**2, numbers))

print(squared_numbers)  # Output: [1, 4, 9, 16, 25]
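
Lambda functions are often passed as arguments to other functions, for example as a sorting key or a filter condition. A small sketch, assuming a hypothetical list `layers` of `(name, size)` tuples:

```python
layers = [("dense", 64), ("conv", 32), ("output", 10)]  # hypothetical data

# Sort by the second item of each tuple (the layer size).
by_size = sorted(layers, key=lambda layer: layer[1])
print(by_size)   # [('output', 10), ('conv', 32), ('dense', 64)]

# Keep only the even numbers from 0 to 9.
evens = list(filter(lambda n: n % 2 == 0, range(10)))
print(evens)     # [0, 2, 4, 6, 8]
```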


### Exercises

---

**Exercise 1**: Compute the Sum of Even Layer Sizes  
In a neural network, the number of neurons in each layer is often defined as a list.  
Write a function `sum_even_layers(layers)` that takes a list of integers representing layer sizes and returns the sum of the sizes of layers that contain an even number of neurons.

Example:  
```python
sum_even_layers([32, 64, 128, 45, 23, 256])  # Output: 480 (32 + 64 + 128 + 256)
```


In [None]:
# Type here your code

In [None]:
def sum_even_layers(layers):
    return sum(layer for layer in layers if layer % 2 == 0)

# Example usage
result = sum_even_layers([32, 64, 128, 45, 23, 256])
print(result)  # Output: 480 (32 + 64 + 128 + 256)


**Exercise 2**: Reverse an Activation Function Name  
Activation functions such as `"relu"`, `"sigmoid"`, and `"tanh"` are widely used in neural networks.  
Write a function `reverse_activation(name)` that takes an activation function name as a string and returns the reversed string.

Example:  
```python
reverse_activation("relu")  # Output: "uler"
```

In [None]:
# Type here your code

**Exercise 3**: Count the Number of ReLU Activations.
Given a list of activation functions used in a neural network, write a function `count_relu(activations)` that counts how many times `"relu"` appears.

Example:  
```python
count_relu(["relu", "sigmoid", "relu", "tanh", "relu", "softmax"])  # Output: 3
```


In [None]:
# Type here your code

## Modules and External Libraries

- In programming, a **module** is a piece of software that has a specific functionality.  
- For example, when building a **neural network**, one module may handle the **data preprocessing**, while another module is responsible for **model training** or **performance evaluation**.
- Think of **Modules, Packages, and Libraries** as **programs written by other developers**. Instead of coding everything from scratch, you can use them.
- Before using a module, you **must import** it into your script.
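
An import can be written in several ways. The sketch below shows three common forms using modules from Python's standard library; the alias `stats` is an arbitrary choice:

```python
import math                  # import the whole module
from math import sqrt        # import a single name from a module
import statistics as stats   # import a module under a shorter alias

print(math.pi)                    # 3.141592653589793
print(sqrt(16))                   # 4.0
print(stats.mean([1, 2, 3, 4]))   # 2.5
```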

---

In [None]:
# NumPy arrays are a great alternative to Python lists. Their key advantages are that they are fast, easy to work with, and let users perform calculations across entire arrays.

import numpy as np  # Importing the NumPy library

# Creating a simple NumPy array
data = np.array([1, 2, 3, 4, 5])

# Performing a basic operation (element-wise multiplication)
scaled_data = data * 2  

print(scaled_data)  # Output: [ 2  4  6  8 10]


In [None]:
# Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package and its key data structure is called the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.



# Creating a dict to store data on brics
bricsdata = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
       "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
       "area": [8.516, 17.10, 3.286, 9.597, 1.221],
       "population": [200.4, 143.5, 1252, 1357, 52.98] }

# Import the pandas package as pd
import pandas as pd

# Create pandas dataframe from brics data
brics = pd.DataFrame(bricsdata)
# print(brics)

# Set the index for brics
brics.index = ["BR", "RU", "IN", "CH", "SA"]

# Print out brics with new index values
print(brics)
     


## Numerics: NumPy

- **NumPy** (Numerical Python) is a fundamental library for numerical computations in Python. It provides support for **multi-dimensional arrays** and **mathematical functions** to operate on them efficiently.  

- <u>**Why Use NumPy**</u>

    - **Fast and efficient**: Operations on NumPy arrays are much faster than Python lists.  
    - **Support for multi-dimensional arrays**: Essential for machine learning and data processing.  
    - **Built-in mathematical functions**: Enables complex computations with ease.  


---

In [None]:
# Importing NumPy

# Before using NumPy, it must be imported: 

import numpy as np


Entering `fun?` in Jupyter or IPython will display the documentation for the function `fun`, if available.  
Let's test this with `np.array()`.  


In [None]:
#np.array?


### Array Creation
---

In `numpy`, an *array* refers to a multidimensional collection of numerical values.  
We can create one-dimensional arrays, also known as vectors, using the `np.array()` function.  

Below, `x` and `y` are examples of such arrays.


In [None]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print("Array x:", x)
print("Array y:", y)


Because `x` and `y` were created with `np.array()`, adding them together produces the element-wise sum.  
This differs from the behavior seen earlier, where adding two standard Python lists concatenated them.  


In [None]:
x + y
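
The difference between the two behaviors of `+` can be seen side by side in the following sketch (the variable names are illustrative):

```python
import numpy as np

list_result = [1, 2, 3] + [4, 5, 6]                       # concatenation
array_result = np.array([1, 2, 3]) + np.array([4, 5, 6])  # element-wise sum

print(list_result)    # [1, 2, 3, 4, 5, 6]
print(array_result)   # [5 7 9]
```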

In `numpy`, matrices are usually represented as two-dimensional arrays, while vectors are one-dimensional arrays.  
(Although `np.matrix()` can also be used to define matrices, we will rely on `np.array()` throughout these labs.)  
A two-dimensional array can be created as shown below.  


In [None]:
x = np.array([[1, 2], [3, 4]])
x

Another useful routine is `linspace` for creating linearly spaced values in an interval. For instance, to create 10 values in `[0, 1]`, we can use

In [None]:
np.linspace(0, 1, 10)
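
Note that `linspace` includes both endpoints, so 10 values in `[0, 1]` are spaced by 1/9 rather than 0.1. A short sketch comparing it with `np.arange`, which takes a step size and excludes the stop value:

```python
import numpy as np

points = np.linspace(0, 1, 10)       # 10 values, both endpoints included
print(points[0], points[-1])         # 0.0 1.0
print(points[1] - points[0])         # spacing is 1/9, not 0.1

grid = np.linspace(0, 1, 11)         # 11 values give a spacing of 0.1
print(grid[1])                       # 0.1

steps = np.arange(0, 1, 0.25)        # fixed step, stop value excluded
print(steps)                         # [0.   0.25 0.5  0.75]
```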

The object `x` comes with multiple *attributes*—associated properties that provide useful information.  
To access an attribute of `x`, we use the syntax `x.attribute`, replacing `attribute` with the specific name.  
For example, we can retrieve the `ndim` attribute of `x` as shown below.  


In [None]:
x.ndim

The result tells us that `x` is a two-dimensional array.  
Likewise, the `x.dtype` attribute reveals the *data type* of `x`, showing that it consists of 64-bit integers (the default integer type on most platforms):  


In [None]:
x.dtype

Why does `x` contain integers? This happens because we constructed `x` using only integer values with the `np.array()` function.  
If we had included any decimal numbers, the resulting array would consist of *floating point numbers* (i.e., real numbers).  


In [None]:
np.array([[1, 2], [3.0, 4]]).dtype


The array `x` is a two-dimensional structure. To determine the number of rows and columns it contains, we can check its `shape` attribute.  


In [None]:
x.shape


### Basic Operations
---

In NumPy, functions that perform element-wise operations on arrays are known as [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html).  


In [None]:
np.sin(x)

Arrays support direct arithmetic operations.  
For example, if two arrays have compatible shapes, we can add them together like this:  


In [None]:
array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])
array_a + array_b
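
"Compatible shapes" does not have to mean identical shapes: NumPy can *broadcast* a smaller array across a larger one when their dimensions match or one of them is 1. A minimal sketch with hypothetical arrays:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
column = np.array([[100], [200]])           # shape (2, 1)

print(matrix + row)      # row is added to every row of matrix
print(matrix + column)   # column is added to every column of matrix

# Shapes (2, 3) and (2,) are not compatible, so NumPy raises a ValueError.
try:
    matrix + np.array([1, 2])
    compatible = True
except ValueError:
    compatible = False
print(compatible)        # False
```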

A *method* is a function that belongs to a specific object.  
For example, if we have an array `x`, we can use the `sum()` method to compute the sum of all its elements by calling `x.sum()`.  
This method call implicitly passes `x` as the first argument to its `sum()` method.

Example: Sum elements of `x` by passing in `x` as an argument to the `np.sum()` function. 

In [None]:
x = np.array([1, 2, 3, 4])
np.sum(x)
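
The method and function forms are interchangeable, and both accept an `axis` argument to sum along rows or columns only. A small sketch with a hypothetical 2×2 array `m`:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

print(m.sum())        # 10, method call
print(np.sum(m))      # 10, equivalent function call
print(m.sum(axis=0))  # [4 6], sum down each column
print(m.sum(axis=1))  # [3 7], sum across each row
```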

Another example is the `reshape()` method, which generates a new array with the same elements as `x` but a different shape.  
To achieve this, we pass a `tuple` as an argument to `reshape()`. For instance, `(2, 3)` specifies that we want a two-dimensional array with 2 rows and 3 columns.  

(Tuples, like lists, represent a sequence of objects. The key difference is that tuples are *immutable*, meaning their elements cannot be changed after creation, whereas lists allow modifications.)  

In the following example, the `\n` character is used to create a *new line* in the output.  


In [None]:
x = np.array([1, 2, 3, 4, 5, 6])
print('beginning x:\n', x)
x_reshape = x.reshape((2, 3))
print('reshaped x:\n', x_reshape)


The output above shows that `numpy` arrays are structured as a sequence of *rows*.  
This arrangement is known as **row-major ordering**, in contrast to *column-major ordering*.  
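
The difference between the two orderings can be seen by passing the `order` argument to `reshape()`. This is a sketch for illustration; row-major (`order="C"`) is the default:

```python
import numpy as np

values = np.arange(6)                            # [0 1 2 3 4 5]

row_major = values.reshape((2, 3))               # default order="C"
col_major = values.reshape((2, 3), order="F")    # Fortran-style order

print(row_major)   # [[0 1 2]
                   #  [3 4 5]]
print(col_major)   # [[0 2 4]
                   #  [1 3 5]]

# One dimension may be given as -1 and NumPy infers it from the size.
print(values.reshape((3, -1)).shape)   # (3, 2)
```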


In `Python` (and therefore `numpy`), indexing starts at 0.  
This means that to retrieve the top-left element of `x_reshape`, we use `x_reshape[0,0]`.  


In [None]:
x_reshape[0, 0] 

We can create and reshape together

In [None]:
A = np.arange(16).reshape((4, 4))
A

Let's take a quick look at some useful attributes of arrays.  
The `shape` attribute returns the array's dimensions as a tuple.  
The `ndim` attribute indicates the number of dimensions, while `T` gives the array's transpose.  


In [None]:
print('x_reshape     :\n', x_reshape)
print('shape         :', x_reshape.shape)
print('nr. dimensions:', x_reshape.ndim)
print('transpose     :\n', x_reshape.T)

Like Python lists, NumPy arrays support slicing


In [None]:
np.arange(10)[5:]

We can also select only the elements that satisfy a condition by indexing with a boolean mask.

In [None]:
x = np.arange(10)
mask = x >= 5
x[mask]
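
Slicing and masks work on two-dimensional arrays as well, with one index or slice per dimension. A minimal sketch on a hypothetical 3×4 array `grid`:

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[1, :])        # second row: [4 5 6 7]
print(grid[:, 2])        # third column: [ 2  6 10]
print(grid[:2, 1:3])     # first two rows, columns 1 and 2
print(grid[grid % 2 == 0])   # even entries: [ 0  2  4  6  8 10]

# A mask can also be used for assignment: set all entries above 8 to 0.
grid[grid > 8] = 0
print(grid[2])           # [8 0 0 0]
```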

### Exercises

---

**Exercise 1**: Simpler

Create a 3x3 NumPy array filled with numbers from 1 to 9.  
1. Access and print the element in the second row, third column.  
2. Modify this element to be 99 and print the updated array.  

```python
#Expected Output:
Updated array:
 [[ 1  2  3]
 [ 4  5 99]
 [ 7  8  9]]
```


In [None]:
# Type here your code

**Exercise 2**: More advanced

1. Manually create a **4×5 NumPy array** containing integer values.
2. Print the **shape**, **data type**, and **number of dimensions** of the array.
3. Convert the array’s data type to `float64` and print the updated data type.
4. **Reshape** the array into a **2×10** array and print the new shape.
5. **Flatten** the array into a one-dimensional array and print its shape (use `flatten` method).
```python
#Expected Output:
Original Array:
 [[12 23 34 45 56]
 [67 78 89 90 12]
 [21 32 43 54 65]
 [76 87 98 11 22]]
Shape: (4, 5)
Data Type: int64
Number of Dimensions: 2
Updated Data Type: float64
New Shape (2x10): (2, 10)
Flattened Shape: (20,)
```

In [None]:
# Type here your code

---

### Numerical Python
---

#### Random Module

The `np.random` module in NumPy provides functions for generating random numbers, which are useful in simulations, machine learning, and statistical modeling.  

**Key Functionalities:**  
- **Generating random integers:** `np.random.randint(low, high, size, dtype)`  
- **Generating random floats:** `np.random.random(size)` or `np.random.rand(dim1, dim2, ...)`  
- **Sampling from distributions:** `np.random.normal(mean, std, size)`, `np.random.uniform(low, high, size)`  
- **Shuffling arrays:** `np.random.shuffle(arr)`  
- **Setting a random seed:** `np.random.seed(seed_value)` (for reproducibility)  

---

We generate 50 independent random variables from a $N(0,1)$ distribution. 

In [None]:
x = np.random.normal(size=50)
x

We get a **different set of results** each time we execute the cell, because **random number generation** in NumPy starts from a different state on every run unless the seed is fixed.

To ensure **reproducibility**, we can set a **random seed** using `np.random.seed()`.

This ensures that the same sequence of random numbers is generated every time the script runs.

**Why Use a Random Seed?**
- Guarantees **consistent results** across multiple runs.
- Essential for **debugging** and comparing results.
- Useful when **training machine learning models** to ensure reproducibility.

**How to Set a Random Seed**
```python
# Set the seed for reproducibility
np.random.seed(42)
```

In [None]:
np.random.seed(42)
x = np.random.normal(size=50)
x
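
The effect of the seed can be checked directly by generating the same values twice. The sketch below also shows `np.random.default_rng()`, NumPy's newer generator interface, as an alternative that keeps the seed local to one generator object instead of changing global state (the variable names are illustrative):

```python
import numpy as np

# Legacy global seed: reseeding reproduces the same sequence.
np.random.seed(42)
first = np.random.normal(size=5)
np.random.seed(42)
second = np.random.normal(size=5)
print(np.array_equal(first, second))   # True

# Generator API: each generator carries its own seed and state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.normal(size=5), rng_b.normal(size=5)))   # True
```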

#### Statistics  

NumPy provides powerful tools for statistical analysis, making it a fundamental library for data science and AI. It offers various functions to compute descriptive statistics, such as mean, median, standard deviation, and correlation.  

**Commonly Used Statistical Functions in NumPy**:
- `np.mean()`: Computes the mean (average) of an array.
- `np.median()`: Finds the median value.
- `np.std()`: Calculates the standard deviation.
- `np.var()`: Computes the variance.
- `np.min()` / `np.max()`: Returns the minimum and maximum values.
- `np.percentile()`: Finds specific percentiles in the dataset.
- `np.corrcoef()`: Computes the correlation matrix.
- `np.histogram()`: Creates a histogram representation of data.

Example:

---

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate random dataset (100 values from a normal distribution)
# np.random.normal(mean, std, size)
data = np.random.normal(loc=50, scale=15, size=100)

# Compute basic statistics
mean_value = np.mean(data)
median_value = np.median(data)
std_dev = np.std(data)
variance = np.var(data)
min_value = np.min(data)
max_value = np.max(data)

# Compute percentiles
q25 = np.percentile(data, 25)  # 25th percentile
q50 = np.percentile(data, 50)  # Median (50th percentile)
q75 = np.percentile(data, 75)  # 75th percentile

print("Mean:", mean_value)
print("Median:", median_value)
print("Standard Deviation:", std_dev)
print("Variance:", variance)
print("Variance by hand:", np.mean((data - data.mean())**2))
print("Min:", min_value, "Max:", max_value)
print("25th Percentile:", q25)
print("50th Percentile (Median):", q50)
print("75th Percentile:", q75)
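
By default, `np.var()` and `np.std()` divide by the number of values `n` (the population formulas). Passing `ddof=1` divides by `n - 1` instead, which gives the sample variance. A small sketch on a hypothetical array `sample` whose statistics are easy to verify by hand:

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical data

population_var = np.var(sample)           # divides by n (ddof=0, the default)
sample_var = np.var(sample, ddof=1)       # divides by n - 1
by_hand = np.mean((sample - sample.mean()) ** 2)

print(population_var)   # 4.0
print(by_hand)          # 4.0
print(sample_var)       # 32/7, about 4.571
print(np.std(sample))   # 2.0
```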


#### Exercises

---

**Exercise 1**: Vectorizing a Loop.
You are given a list of numbers representing the **weights of different neurons** in a neural network:

weights = [0.5, 1.2, 0.8, 1.5, 2.0, 1.1]

Write a Python program that **doubles the weights of neurons 2 to 4** (indices **1 to 3**) using a **for loop**.  

Then, optimize your solution by **vectorizing the operation** using NumPy.  


EXTRA:
- Compute the same with a Pythonic concise approach.
- Compute the same using a lambda function.

```python
#Expected Output:
Original Weights:        [0.5, 1.2, 0.8, 1.5, 2.0, 1.1]
Modified Weights:        [0.5, 2.4, 1.6, 3.0, 2.0, 1.1]
```

In [None]:
# Type here your code

**Exercise 2** 

Generate two **random datasets** of **250 values** each:  

1. One using `np.random.uniform()` (floating-point values between **10 and 100**).  
2. Another using `np.random.normal()` (normally distributed data with a **mean of 55** and **standard deviation of 15**).  

Perform the following tasks:  
- Compute the **correlation coefficient matrix** between the two datasets (use the `np.corrcoef` function).  

EXTRA:
- Compute the **skewness** and **kurtosis** of each dataset.  
- Find the **5th, 25th, 50th (median), 75th, and 95th percentiles** of each dataset.  
- Normalize both datasets to have **zero mean and unit variance** (standardization).  
- After standardization, compute the **new mean and standard deviation** of both datasets to verify correctness.
```python
#Expected Output:
Correlation Matrix:
 [[1.         0.15091028]
 [0.15091028 1.        ]]

Dataset 1 - Skewness: -0.021627525735899496 Kurtosis: -1.1870764128816396
Dataset 2 - Skewness: 0.18293757051883225 Kurtosis: -0.24073860946709047

Dataset 1 Percentiles (5th, 25th, 50th, 75th, 95th): [13.33850554 31.8284978  56.2777033  74.25925726 94.8033551 ]
Dataset 2 Percentiles (5th, 25th, 50th, 75th, 95th): [31.49070116 43.6842296  54.76930068 64.43962013 78.90635409]

After Standardization - Dataset 1 Mean: -1.5987211554602254e-16 Std Dev: 1.0
After Standardization - Dataset 2 Mean: -2.0605739337042906e-16 Std Dev: 1.0
```

In [None]:
# Type here your code

## Graphics: Matplotlib

- Matplotlib provides tools to plot data in various formats and to create **static, animated, and interactive visualizations**.

- In the context of **Neural Networks**, visualization is crucial for:
  - Understanding **data distributions** before training.
  - Monitoring **loss and accuracy curves** during training.
  - Visualizing **activation functions and decision boundaries**.
  - Inspecting **model predictions** and errors.

---

In [None]:
from matplotlib import pyplot as plt

### Plot
We can generate data and plot it as a smooth line.


In [None]:
x_values = np.linspace(-3, 3, 100)

plt.figure()
plt.plot(x_values, np.sin(x_values), label="Sinusoid")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Matplotlib example")
plt.legend(loc="upper left")
plt.show()

### Scatter Plot
We can visualize the relationship between two sets of randomly generated data points. This is useful for understanding data distribution before applying other techniques.

In [None]:
# Generate random data
np.random.seed(42)
x = np.random.rand(100)
y = x * 2 + np.random.normal(0, 0.1, 100)  # Linear relation with noise

# Scatter plot
plt.figure(figsize=(7, 5))
plt.scatter(x, y, c=y, cmap="viridis", alpha=0.7)
plt.colorbar(label="Value Intensity")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Scatter Plot of Random Data")
plt.show()


### Histogram
We can plot histograms of data to obtain information about the distribution of the data.

In [None]:
# Generate normally distributed data
data = np.random.normal(loc=0, scale=1, size=1000)

# Histogram
plt.figure(figsize=(7, 5))
plt.hist(data, bins=30, color="blue", alpha=0.7, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of Normally Distributed Data")
plt.show()


### Boxplots
Boxplots show the distributions of several datasets side by side. This is useful for comparing different categories in a dataset.

In [None]:
# Generate random data for different groups
group_1 = np.random.normal(0, 1, 100)
group_2 = np.random.normal(1, 1, 100)
group_3 = np.random.normal(2, 1, 100)

# Create boxplot
plt.figure(figsize=(7, 5))
# Note: `tick_labels` requires Matplotlib 3.9 or later; older versions use `labels` instead
plt.boxplot([group_1, group_2, group_3], tick_labels=["Group 1", "Group 2", "Group 3"])
plt.xlabel("Groups")
plt.ylabel("Values")
plt.title("Boxplot of Random Data Groups")
plt.show()


### Cost Function Evolution
A typical plot in deep learning problems: the loss on the training and validation sets as a function of the epoch.

In [None]:
# Simulating loss function values for training and validation
epochs = np.arange(1, 101)  # 100 epochs
train_loss = np.exp(-epochs / 20) + np.random.normal(0, 0.02, len(epochs))  # Simulated decay with noise
test_loss = np.exp(-epochs / 25) + np.random.normal(0, 0.02, len(epochs)) + 0.1  # Slightly different decay with noise

# Plotting the cost function evolution
plt.figure(figsize=(8, 5))
plt.plot(epochs, train_loss, label="Training Loss", color='blue', linewidth=2)
plt.plot(epochs, test_loss, label="Validation Loss", color='red', linestyle='dashed', linewidth=2)

# Highlighting the final loss value of each curve
plt.scatter([epochs[-1]], [train_loss[-1]], color='blue', edgecolor='black', zorder=3)
plt.scatter([epochs[-1]], [test_loss[-1]], color='red', edgecolor='black', zorder=3)

# Adding labels and title
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Training vs Validation Loss Evolution")
plt.legend()
plt.grid(True)

# Show the plot
plt.show()


### Exercises
Generate a dataset of **random values following a normal distribution** (Gaussian) with **mean = 50 and standard deviation = 10**.  
- **Plot a histogram** of the dataset with appropriate bins.  
- **Overlay the probability density function (PDF)** of the corresponding normal distribution.
- Label the axes and add a title to the plot.

To compute the PDF, use the SciPy library as follows:
```python
from scipy.stats import norm

(...)
pdf = norm.pdf(x, mean, std_dev)  # Compute PDF values

# Expected Output: A histogram with a red line.
```

In [None]:
# Type your code here

## Data Management: Pandas

- Datasets often contain various data types and may include labeled rows or columns.  

- Pandas is a powerful **Python library** for **data manipulation and analysis**. It provides two primary data structures:

  - **Series** (1D labeled array)  
  - **DataFrame** (2D table similar to an Excel spreadsheet)  


- A **DataFrame** handles such structured data efficiently. It can be thought of as a collection of **columns**, where each column is an array of the same length.  
Rows are formed by combining corresponding entries from each column.  
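
The two structures described above can be sketched as follows. The names and values are illustrative assumptions.

```python
import pandas as pd

# A Series is a 1D array with an index of labels
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")
print(ages["Bob"])  # Access a value by its label

# A DataFrame is a collection of equally long columns sharing one index
people = pd.DataFrame(
    {
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    },
    index=["Alice", "Bob", "Charlie"],
)

age_column = people["Age"]   # Selecting one column returns a Series
bob_row = people.loc["Bob"]  # A row combines one entry from each column

print(type(age_column).__name__)
print(bob_row)
```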

---

In [None]:
import pandas as pd


### Create
We can create a DataFrame from a dictionary whose keys are the column names and whose values are lists of equal length.

In [None]:
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)


### Load from csv
We can load data from a `.csv` file with `pd.read_csv`.
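
If no CSV file is at hand, the same function can be tried on in-memory text. This is a minimal sketch: the CSV content and its column names are hypothetical, and `io.StringIO` simply makes a string behave like a file.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would live in a file on disk
csv_text = """Duration,Pulse,Calories
60,110,409.1
45,117,282.4
30,103,
"""

# read_csv accepts a file path or any file-like object
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head())  # First rows (at most 5)
print(df_demo.dtypes)  # Column types inferred by pandas
print(df_demo.shape)   # (number of rows, number of columns)
```

Note that the empty last field in the final row is read as a missing value (`NaN`).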

In [None]:
df = pd.read_csv("data.csv")  # Load data from a CSV file
print(df.head())  # Display the first 5 rows


### Access
We can access columns and rows, and filter rows by a condition.
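
As a self-contained sketch of these access patterns, the example below uses a small DataFrame built in memory, so it does not depend on any file. The variable names and the thresholds are illustrative assumptions.

```python
import pandas as pd

df_people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

ages = df_people["Age"]                       # One column (a Series)
subset = df_people[["Name", "Salary"]]        # Several columns (a DataFrame)
first_row = df_people.iloc[0]                 # Row by integer position
older = df_people.loc[df_people["Age"] > 28]  # Rows where the condition holds

# Conditions are combined with & (and) or | (or); each needs parentheses
selected = df_people.loc[(df_people["Age"] > 28) & (df_people["Salary"] < 70000)]

print(older)
print(selected)
```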


In [None]:
print(df["Pulse"])  # Access a specific column
print(df.iloc[0])  # Access the first row
print(df.loc[df["Pulse"] > 110])  # Filter rows where Pulse > 110


### Save to csv

In [None]:
# Create a sample DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)

print(df)  

# Save the DataFrame to a CSV file
df.to_csv("sample_data.csv", index=False)

print("Data saved successfully to sample_data.csv")


### Other Operations

If a value is missing, we can replace it with a default value or drop the row that contains it.
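
Before replacing or dropping anything, it usually helps to see where the gaps are. The following is a minimal sketch of that inspection step; the DataFrame name `df_check` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df_check = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, np.nan, 35, 40],
    "City": ["New York", "Los Angeles", np.nan, "Chicago"],
})

# isna() marks each missing cell with True; summing counts them per column
missing_per_column = df_check.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = df_check[df_check.isna().any(axis=1)]
print(rows_with_missing)
```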


In [None]:
# A DataFrame with missing values
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, np.nan, 35, 40],  # Missing value in Age
    "City": ["New York", "Los Angeles", np.nan, "Chicago"]  # Missing value in City
}

df_missing = pd.DataFrame(data)

# Display the DataFrame with missing values
print("Original DataFrame with Missing Values:\n", df_missing)

# Handling missing values: fill Age with the column mean and City with a placeholder
df_filled = df_missing.fillna({"Age": df_missing["Age"].mean(), "City": "Unknown"})

print("\nDataFrame after Filling Missing Values:\n", df_filled)

# Alternatively, drop rows with missing values
df_dropped = df_missing.dropna()

print("\nDataFrame after Dropping Missing Values:\n", df_dropped)



We can sort the data

In [None]:
# Sorting the DataFrame by Age in ascending order
df_sorted = df_filled.sort_values(by="Age")
print("\nDataFrame Sorted by Age (Ascending):\n", df_sorted)

# Sorting the DataFrame by Age in descending order
df_sorted_desc = df_filled.sort_values(by="Age", ascending=False)
print("\nDataFrame Sorted by Age (Descending):\n", df_sorted_desc)



We can group and aggregate data
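
Several aggregations can also be computed in one call with `agg`. The sketch below uses a small hypothetical DataFrame (`df_staff`) in which each city appears twice, so the group statistics are easy to check by hand.

```python
import pandas as pd

df_staff = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["Chicago", "New York", "Chicago", "New York"],
})

# agg applies several aggregation functions to each group at once
summary = df_staff.groupby("City")["Age"].agg(["mean", "min", "max", "count"])
print(summary)
```

The result is a DataFrame with one row per city and one column per aggregation.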

In [None]:

# Grouping by City and calculating the average Age for each group
df_grouped = df_filled.groupby("City")["Age"].mean()
print("\nAverage Age by City:\n", df_grouped)

# Grouping by City and counting the number of occurrences
df_grouped_count = df_filled.groupby("City")["Name"].count()
print("\nCount of People per City:\n", df_grouped_count)


### Exercises
Handling and Analyzing Employee Data with Pandas

You will work with a CSV file containing employee data. Your task is to load the data, handle missing values, sort the data, group it by specific attributes, and analyze key metrics.

1. **Load the Data:** Read the `exercise_pandas.csv` file into a Pandas DataFrame.
2. **Handle Missing Values:** Replace any missing values in the `Salary` column with the average salary.
3. **Sort the Data:** Sort the employees by `Salary` in descending order.
4. **Group Data:**
   - Compute the **average salary** per `Department`.
   - Count the **number of employees** per `City`.
5. **Save the Processed Data:** Write the sorted data to a new CSV file named `exercise_pandas_sorted.csv`.
6. **Bonus Task:** Identify and print the employee with the highest salary.

```python

#Expected Output:

Loaded DataFrame:
      Name Department   Salary           City
0    Alice         HR  50000.0       New York
1      Bob         IT  70000.0  San Francisco
2  Charlie         IT      NaN       New York
3    David    Finance  65000.0        Chicago
4      Eve         HR  48000.0        Chicago
DataFrame after handling missing values:
      Name Department   Salary           City
0    Alice         HR  50000.0       New York
1      Bob         IT  70000.0  San Francisco
2  Charlie         IT  58250.0       New York
3    David    Finance  65000.0        Chicago
4      Eve         HR  48000.0        Chicago

Average Salary per Department:
Department
Finance    65000.0
HR         49000.0
IT         64125.0
Name: Salary, dtype: float64

Employee Count per City:
City
Chicago          2
New York         2
San Francisco    1
Name: Name, dtype: int64

Highest Paid Employee:
Name                    Bob
Department               IT
Salary              70000.0
City          San Francisco
Name: 1, dtype: object

# An output file named exercise_pandas_sorted.csv
```

In [None]:
# Type your code here