# Introduction to Python
#### Diploma in Banking Supervision (CEMFI) - 2024

Alba Miñano-Mañero (alba.minano@cemfi.edu.es)

Welcome everyone to this short brush-up course in Python programming. Our aim over these two classes is to provide you with a solid foundation, ensuring you're well-prepared to make the most of the upcoming Data Science course taught by [Joël Marbet](https://www.joelmarbet.com).

For some of you, this may mark your initial foray into programming. These first two classes are designed to acquaint you with the fundamental concepts you'll need moving forward. If certain concepts seem unfamiliar or daunting, don't panic. Programming presents a steep learning curve, but you'll be amazed at how much you can achieve in a short time.

The topics we aim to cover in these lessons are:
1. Overview of the installation guidelines you've been provided (S1)
2. Jupyter Notebook foundations (S1)
3. Basic programming syntax (S1)
4. `NumPy` (S1)
5. Data management with `pandas` (S2)
6. Data visualization with `matplotlib` and `seaborn` (S2)

### 1. Overview of setup and installation. 

**What is Python?** 

Python is a high-level, general-purpose programming language designed by Guido van Rossum in the late 1980s. It acts as a translator, abstracting away the  details of computers and enabling us to communicate with them using a language that's easier for humans to understand. 

Python emphasizes readability, making it *straightforward* to write and understand code. It supports various programming paradigms, including procedural, where we provide a sequence of instructions; object-oriented, where we create objects with data and perform actions on them; and functional programming, where code is organized around functions. 

With a vast ecosystem of libraries and frameworks, Python is widely used across diverse applications, from web development and data analysis to artificial intelligence and scientific computing. Over the years, Python has transitioned from a specialized scientific computing language to a key player in data science, machine learning, and software development, thanks to its extensive open-source libraries and its adaptability to construct sophisticated data applications.

**Why Python?**
- Python is widely used, increasing the availability of resources that we, as end-users, can leverage.
- Python is free and open-source.
- Python is connected to many software platforms. This kind of integration with other software allows us to expand the functionality of software packages and customize solutions for specific data analysis needs.

However, setting up a Python environment for data science is not straightforward. We need an environment that contains all the packages/libraries in a compatible way. Achieving this isn't always easy.

**Installation** 

We can just install Python from the [official website](https://www.python.org/downloads/). This installation would have us equipped to write and run code directly from our terminal. However, this approach is rather unintuitive, especially for beginners who are just getting started with programming. That's where text editors come into play. Text editors provide a more user-friendly interface for writing and organizing code, offering features like syntax highlighting, auto-completion, and project management tools, making the coding process more intuitive and efficient for programmers of any level. 

We'll use [VSCode](https://code.visualstudio.com) as our editor (often described as an **I**ntegrated **D**evelopment **E**nvironment, or IDE). VSCode is a very popular editor because it supports almost any language and, by downloading the right extensions, we can adapt it to all levels of programming. In this course we will use the Python and Jupyter Notebook extensions (*python, pylance, jupyter, jupyter cell tags, jupyter keymap, jupyter notebook renderers, jupyter slide show, python indent, python debugger, Python extended*).

With the native Python installation we get a subset of basic packages, but we need to add to this environment the packages/libraries required for our particular tasks. Packages are extensions to the basic Python environment developed by the community. Put simply, they are collections of scripts that define new functions, data types and methods that allow further manipulation of our data. Because all of these packages are highly interlinked, it is easy to run into incompatibilities when installing them.

That's why Python distributions like [Anaconda](https://www.anaconda.com/download) are useful, particularly because they allow us to install [Conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html), which is a very powerful package and environment manager. In particular, when we use Conda to install a package, it downloads a version of the package that has already been compiled for our specific operating system and architecture. Furthermore, Conda resolves package dependencies automatically, which helps avoid conflicts and installation failures. It also enables users to create isolated environments for projects, ensuring clean and reproducible development environments.

To make sure we all have the same working environment, we will follow the installation guideline provided by Joël. After you have downloaded Anaconda and VSCode, download the environment configuration file (https://datascience.joelmarbet.com/environment.yml). Environment files (with the .yml extension) contain all the dependencies and settings for a Conda environment. Such a file allows users to recreate the exact environment with all necessary dependencies by simply running a command in the Conda terminal, making it easier to share and reproduce environments across different systems.
This means that if we all install the same configuration file, we will be sharing the same conda computational environment (i.e., same packages, versions, environment name). Just to review the instructions:

1. After downloading Anaconda and the `environment.yml` file, we open a terminal (Mac) or the Anaconda Prompt (Windows).

2. We change the directory to the folder where we have stored the downloaded file by typing `cd path/to/environment`, where `path/to/environment` is the folder containing the environment configuration file.

3. To create the environment we type: `conda env create -f environment.yml` 

4. We check the installation by activating the environment: `conda activate datascience_course_cemfi`.

5. We test the Python version: `python --version`. 

In VSCode, we have to make sure we have installed all the relevant extensions. Then, when we open a new notebook, we should be able to select the new environment to run our code.
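Once the environment is selected, a quick sanity check from a notebook cell can confirm which interpreter is running. The sketch below is only illustrative: the helper `is_installed` and the list of package names are assumptions made for this example, not part of the installation guideline.

```python
import importlib.util
import sys

# Which interpreter is this notebook using?
# The path should point inside the conda environment we activated.
print(sys.executable)
print(sys.version)

def is_installed(package_name):
    """Return True if `package_name` can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None

# Hypothetical list: adjust it to the packages listed in your environment.yml
for name in ["numpy", "pandas", "matplotlib", "seaborn"]:
    print(name, "installed:", is_installed(name))
```

If one of the packages shows `False`, the most likely explanation is that the notebook is running on a different kernel than the course environment.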

### 2. Jupyter Notebooks

The way in which we will interact with Python code will be through Jupyter Notebooks in VSCode. Jupyter Notebooks, as well as the popular JupyterLab, are part of the [**Jupyter Project**](https://jupyter.org/try), which revolves around the provision of tools and standards for interactive computing across different programming languages (**Ju**lia, **Py**thon, **R**) through computational notebooks.

A Jupyter Notebook is an example of a **computational notebook**. Notebooks merge code, plain language explanations, data, and visualizations into a shareable document. A notebook offers a rapid interactive platform for code typing, data exploration, visualization, and idea sharing, stemming from the belief that the most effective way to learn and program is through hands-on interaction with code. That is, we can pool together in a single file (**a notebook**) the entire data workflow behind our analysis.

At the end of the day, notebooks capture the inputs and outputs of an interactive session along with explanatory text, providing a comprehensive computational record. Notebook files, saved with the .ipynb extension, are internally JSON files, which allows for version control and easy sharing. Additionally, notebooks can be exported to various static formats.
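To make the JSON point concrete, here is a deliberately simplified sketch of what a notebook with one markdown cell and one code cell might look like. It assumes the version 4 notebook layout; real files carry more metadata (for example kernel information and cell identifiers), so treat this as an illustration rather than a complete file.

```python
import json

# A deliberately minimal notebook: one markdown cell and one code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# Serialise to the kind of text that is stored inside an .ipynb file
notebook_text = json.dumps(notebook, indent=1)

# Reading it back gives an ordinary Python dictionary
loaded = json.loads(notebook_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain text, tools such as version control systems can track changes to it like any other text file.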

Notebooks replace the traditional console-based interactive computing by introducing a web-based application that captures the entire computation process: from code development and documentation to execution and result communication. The Jupyter notebook combines two elements:

1. **A web application** (although we will use VSCode rather than the browser-based application): This browser-based editing program enables interactive authoring of computational notebooks, offering a swift environment for code prototyping, data exploration, visualization, and idea sharing.

2. **Computational Notebook documents**: These shareable documents combine computer code, plain language explanations, data, visualizations and interactive controls.

The Jupyter notebook interacts with kernels, which are implementations of the Jupyter interactive computing protocol specific to different programming languages. In plain words, a *kernel* is the separate process that interprets and executes our code in a given programming language.

From Jupyter on VSCode we can just open a new Jupyter Notebook on **File** &rarr; **New File**  &rarr; **Jupyter Notebook**. When we open a new notebook, we will find the following elements:

![Notebooks on VSCode](./img1.png)

1. Toolbar of VSCode. 
2. Name of our new notebook: a white dot next to the name indicates that there are unsaved changes. Notice that notebooks have the **.ipynb** extension. 
3. Run all: will run all cells of the notebook sequentially. 
4. Kernel: allows us to specify the kernel in which we want to run our notebook. 
5. Notebook toolbar: allows us to run and edit cells. 
6. Indicator of type of cell (Python or markdown)
7. Cell
8. Add new cells 

The notebook is composed of a series of cells, each serving as a multiline text input field. These cells can be executed by pressing Shift-Enter, by clicking the "Play" button in the toolbar, or by selecting "Cell" then "Run" from the menu bar. The behavior of cell execution depends on its type, with three main types: code cells, markdown cells, and raw cells. While every cell begins as a code cell, its type can be changed using a drop-down menu, which opens when you click on the cell type shown in the bottom-right corner of the cell (i.e., where it says Python in the image below).

![Notebooks on VSCode](./img3.png)

**Main types of cells**:
- Code

    Allow editing and creating new code, complete with syntax highlighting and tab completion. The choice of programming language depends on the kernel, with the default kernel (IPython) executing Python code.

    Upon execution of a code cell, the resulting computation outputs are then showcased in the notebook as the cell's output. This output isn't limited to text; various forms such as matplotlib figures and HTML tables (as employed in packages like pandas for data analysis) are also supported. 

- Markdown: 

    Allow you to document the computational process in a literate manner, combining descriptive text with code using rich text. In IPython, this is achieved by using Markdown cells, where text is marked up with the Markdown language. Markdown provides a simple way to emphasize text, create lists, and structure the document using headings indicated by hash signs. Markdown headings become clickable links in the notebook and provide structure hints when exporting to other formats like PDF. When a Markdown cell is executed, the Markdown code is converted into formatted rich text, allowing for arbitrary HTML code for further formatting.

- Raw cells: 

    Raw cells are written to the output as they are, meaning they are not evaluated by the kernel when running the notebook.

**Workflow:**

A workflow in a Jupyter Notebook is run sequentially. As in a script, there is line hierarchy within cells, and there is also sequential hierarchy between cells. However, notebooks have the advantage of allowing us to edit cells separately multiple times until achieving the desired outcome, rather than rerunning separate scripts.

When working on a computational problem, you can organize related ideas into cells and progress incrementally, moving forward once previous parts function correctly, which is very useful when running code with snippets that take a long time. If at any point we want to stop a running cell, we can do so by pressing the interrupt button. We can also restart the kernel to shut down the computational process.

When we run a cell, the notebook sends that process to the kernel and directly prints the output of the execution. The execution order appears in square brackets to the left of the printed output. Notice that when we run a notebook, all cells are run in the same kernel. This means that whatever we run in a given cell will remain active unless we restart the kernel.

Also, whenever we shut down a notebook (either by restarting the kernel or by saving and closing the notebook), everything held in memory, such as variables and results that have not been saved to disk, will be erased.

> My tip: while notebooks are highly useful for writing and explaining work processes, particularly for others, they may not be as practical for large research projects. Instead, I often work concurrently on a script that I run in a notebook, executing it step by step until all lines of code run smoothly. Once I have finished the code, I only save the script. 

### 3. Basics of Python syntax 

Python syntax is shaped by two characteristics of the language: 

- Python is an interpreted language. By this we mean that the Python interpreter runs a program by executing and evaluating one statement at a time, in sequential order. That is, the source code is executed by the interpreter line by line, without us having to compile it into machine code beforehand.

- Python is an object-oriented programming language. Everything we define in our code exists within the interpreter as a Python object, meaning it has its associated attributes and methods. 

    Attributes are values associated with an object, which are defined by the class of the object. Typically, we access attributes with the syntax `object.attribute_name`.

    Methods are functions associated with objects, which can access the attributes of the object. Typically, we use methods with the syntax `object.method_name()`.

In [None]:
x = 5.2
print('What part of X is real?', x.real) # This is an attribute of the variable x 
print('Is x an integer?', x.is_integer()) # this is a method
print('What type of variable is x?',type(x)) # this is a function

#### 3.1 Creating variables. 

> Nothing prevents us from overwriting variables in Python. So if we first do `x=5` and then `x=6`, the variable `x` will automatically be reassigned to 6. 

Variables allow us to define names for the objects we store in memory and use in our program. Because Python works with references to objects, once we assign a name to an object we can access the object through that name. For instance, `x` above refers to the object in memory that stores `5.2`. We can create a new variable `x2` that refers to the same object as `x`: 

In [None]:
x2 = x
x2

Both variables are the same because they both refer to the same object in memory. However, because numbers are immutable objects, an operation such as `x2 + 5` creates a new object instead of modifying the existing one, so reassigning `x2` won't affect `x`. 

In [None]:
x2 = x2 + 5
print("This is the 'x2' variable after we add 5 to it:", x2)
print("This is our original 'x', which hasn't changed:", x)
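One way to check whether two names refer to the same object is the `is` operator. The short sketch below repeats the steps above and records what `is` reports before and after the reassignment; the variable names `same_before` and `same_after` are introduced only for this illustration.

```python
x = 5.2
x2 = x                 # both names point to the same float object
same_before = x2 is x  # True: one object, two names

x2 = x2 + 5            # builds a new float object and rebinds the name x2
same_after = x2 is x   # False: x2 now points to a different object

print(same_before, same_after)
print(x)               # x still refers to the original 5.2
```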

> Arithmetic operators: addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), exponentiation (`**`), remainder of a division, aka modulo (`%`)

Strings and tuples are also immutable objects (more on this later), but there are also mutable objects, like lists. In the case of mutable objects, if we create two variables that reference the same object, changing one **will affect** the other because both variables refer to the same object in memory, which has been modified. 

In [None]:
y = ['abc','def']
y2 = y
y2.append('ghi')
print("This is 'y2' variable after we add a new string:", y2)
print("This is our original 'y', but...", y)

In [None]:
print(y)
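If we want an independent list rather than a second name for the same one, a common approach is to make a copy with the list method `copy()`. The sketch below contrasts the two cases; the variable names are chosen only for this example.

```python
letters = ['abc', 'def']
letters_alias = letters        # a second name for the same list
letters_copy = letters.copy()  # a new list with the same elements

letters_alias.append('ghi')    # also visible through `letters`
letters_copy.append('xyz')     # only changes the copy

print(letters)
print(letters_copy)
```

Note that `copy()` makes a shallow copy: nested mutable elements (such as lists inside the list) are still shared between the original and the copy.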

We can also assign variable names to different objects in the same code line separating them through commas. 

In [None]:
first_letters,second_letters, third_letters = 'abc','def','ghi'
print(first_letters)
print(second_letters)
print(third_letters)


#### 3.2 Naming variables: good practices. 

Just as in writing regular text, coding also involves an idiosyncratic style that often results in differences between two individuals' code, even if they are accomplishing the same task or using the same functions. This divergence is especially noticeable in variable naming. However, whether we are more of the concise or the descriptive type, there are some rules and good practices for everyone: 
1. Names can only contain upper and lower case letters, underscores (_) and numbers. 

    > Important: Python is case sensitive, which means that variables with different capitalization are considered distinct (i.e., `Abc`, `abc` and `ABC` are all different)
2. Names can never start with a digit, only with a letter or an underscore. 
3. They cannot be **keywords** (words reserved by the Python language); `help('keywords')` lists them.
4. We typically use only lower case letters for variables, and reserve upper case letters for parameters or constants. 
5. Names should be self-explanatory and balance brevity and clarity. Typically, we reserve nouns for variables, verbs for functions and methods, and adjectives for booleans.
6. We can use names in any language, but in general English is preferred so that anyone could follow the code.
7. Avoid using built-in function names as variable names because that will overwrite the function (i.e., if we assign a value to `type`, we will no longer be able to use `type()` to access the type of variables).
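Rules 1 to 3 can be checked programmatically. The sketch below uses the standard-library `keyword` module together with the string method `isidentifier()`; the candidate names are made up for the example.

```python
import keyword

# Hypothetical candidate names to test against rules 1 to 3
candidates = ['interest_rate', '_tmp', '2nd_rate', 'class', 'Rate']

validity = {}
for name in candidates:
    # A usable name must follow the character rules AND not be a reserved word
    validity[name] = name.isidentifier() and not keyword.iskeyword(name)

for name, is_valid in validity.items():
    print(name, '->', is_valid)

# The full list of reserved words is available as keyword.kwlist
print(len(keyword.kwlist), 'keywords in this Python version')
```

Here `2nd_rate` fails because it starts with a digit, and `class` fails because it is a keyword.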

#### 3.3 Data types. 

Unlike some other programming languages, Python dynamically infers the type of a variable based on the assigned value. We can access the type of variables using the `type()` function. 

Primitive data types are: strings, integers, floats, booleans, None. While I am hard-coding some examples for booleans,  it's more common for them to be generated as the outcome of assessing a condition within a function (i.e., Am I Alba? True). In fact, booleans are typically obtained by comparisons and can be manipulated with logical operations (`and`, `not`, `or`)

In [None]:
this_is_int = 7
type(this_is_int)

In [None]:
this_is_float = 5.2
type(this_is_float)

In [None]:
this_is_a_string= 'abc'
type(this_is_a_string)

In [None]:
this_is_bool =True
type(this_is_bool)

In [None]:
# More booleans examples
print('5 > 3', 5 > 3)      
print('5 > 5\t', 5 > 5)       
print('5 != 4.9', 5 != 4.9)
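The logical operators `and`, `or` and `not` mentioned above combine or negate booleans. A small sketch with made-up variables:

```python
is_weekend = False
has_class = True

both = is_weekend and has_class    # True only if both are True
either = is_weekend or has_class   # True if at least one is True
negated = not is_weekend           # flips the boolean

print(both, either, negated)

# Comparisons and logical operators can be combined
age = 30
is_working_age = (age >= 16) and (age < 65)
print(is_working_age)
```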

In [None]:
this_is_None = None
type(this_is_None)

Besides the **primitive data types**, there are also **container data types**, which are collections of objects arranged in a particular structure. They can be tuples, lists or dictionaries. 

\- **Tuples**: fixed-length, immutable sequences of Python objects. They are defined as a sequence of elements separated by commas, usually enclosed in parentheses (`()`), and can mix data types. 

We can access its elements using indexing, written as `name_of_tuple[i]`, where `i` indicates the position of the element we want. From left to right, we start counting from 0 at the first element. From right to left, we start counting from -1 at the last element. 

> Notice that for Python, the `=` is not the mathematical equality sign. It simply establishes an assignment saying left-hand side is right-hand side. When we want to make comparisons to check equality, we use `==`. 

In [None]:
this_is_tuple = (2,3,4)
this_is_tuple_str = ('a','b','c')
this_is_tuple_mix = (2,'b',True)
this_is_nested = ('a',(2,3,4),'b')

print(type(this_is_tuple), type(this_is_tuple_str), type(this_is_tuple_mix), type(this_is_nested))
print(this_is_tuple[0])
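The two counting directions described above can be seen side by side in a short sketch (the tuple `letters` is made up for the example):

```python
letters = ('a', 'b', 'c', 'd')

first = letters[0]            # counting from the left starts at 0
last = letters[-1]            # counting from the right starts at -1
second_to_last = letters[-2]
middle = letters[1:3]         # slice: starts at index 1, stops before index 3

print(first, last, second_to_last, middle)
```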

In [None]:
this_is_tuple[4]=8  # raises a TypeError: tuples do not support item assignment

While tuples are immutable (*don't support item assignment*), we can operate with them in different ways. We can concatenate them using the `+` sign, we can repeat them `n` times using `* n`, and we can unpack them. 

In [None]:
print(this_is_tuple + this_is_tuple_str + this_is_tuple_mix) # concatenating
print(this_is_tuple_mix*3) # repeating 
unpack1, unpack2, unpack3 = this_is_tuple_str # unpack 
print(unpack1)

# If we use `*_` when unpacking, the elements we don't need are collected in a throwaway variable

*_, only_third_e = this_is_tuple # unpack: keep only the last (third) element
print(only_third_e)


In [None]:
this_is_tuple

In [None]:
element1, element2, *_= this_is_tuple

In [None]:
unpack1, unpack2,*_  = this_is_tuple
print(unpack2)


In [None]:
only_first_e, *_,a = this_is_tuple # unpack 
print(a)

\- **Lists**: mutable sequences of Python objects (i.e., like a tuple that we can change). They are also defined as a comma-separated sequence of elements but, unlike tuples, they use square brackets (`[]`).

In [None]:
this_is_list = ['a',4,['a',True]]
print(this_is_list)

We can index and slice them in the same way as tuples. 

In [None]:
this_is_list[2]

In [None]:
this_is_list

In [None]:
print(this_is_list[2]) # 3rd element
print(this_is_list[0:2]) # starts at 0, ends at 2-1 (i.e., 1 which is the second element)

We can extend them using `append()`, and we can also modify them by inserting elements at specific positions using `insert(position, element)`.

In [None]:
this_is_list.insert(2,'Hola')
print(this_is_list)

In [None]:
this_is_list.append(['a','b','c'])
print(this_is_list)
this_is_list.append('hello')
print(this_is_list)

# let's insert 'Hola' at index 2 (i.e., as the third element)
this_is_list.insert(2,'Hola')
print(this_is_list)


We can also remove the element at a given position using `pop(index)`, and the first occurrence of a given value using `remove(value)`.

In [None]:
this_is_list.remove('a')

In [None]:
this_is_list

In [None]:
this_is_list.pop(2)
print(this_is_list)
this_is_list.append('a')
this_is_list.remove('a')
print(this_is_list)

\- **Dictionaries**: collections of items, where each item is a pair of Python objects called a key and a value (since Python 3.7, dictionaries preserve the order in which items were inserted). Each key is unique and is associated with one value, like in a dictionary entry, which allows us to access, retrieve and modify values through their keys. Dictionaries are typically created using curly braces as `{key1:value1,key2:value2}`.

In [None]:
weekday_activities = {
    "Monday": ["Go to work", "Attend meetings", "Exercise", "Cook dinner"],
    "Tuesday": ["Work on projects", "Run errands", "Visit the gym"],
    "Wednesday": ["Work remotely", "Attend class", "Grocery shopping"],
    "Thursday": ["Client meetings", "Volunteer", "Cook dinner"],
    "Friday": ["Finish work tasks", "Meet friends for dinner", "Movie night"],
}

In [None]:
weekday_activities['Monday']

To access one of the elements, we can look for the corresponding key. 
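Looking up a key that does not exist with square brackets raises a `KeyError`. When a key may be missing, the dictionary method `get()` returns a default value instead. The dictionary `opening_hours` below is made up for the illustration.

```python
opening_hours = {'Monday': '9-17', 'Saturday': '10-14'}

monday = opening_hours['Monday']                # key exists: returns its value
sunday = opening_hours.get('Sunday', 'closed')  # key missing: returns the default
has_sunday = 'Sunday' in opening_hours          # membership test on the keys

print(monday, sunday, has_sunday)
```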

We can also obtain all keys, all values, or all key-value pairs through the methods `keys()`, `values()` and `items()`. Although the results of these methods might look like lists, they cannot be subscripted. 

In [None]:
print(weekday_activities.keys())
print(weekday_activities.values())
print(weekday_activities.items())
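If we do need to pick elements by position, one option is to convert the result to a list first. A minimal sketch, using a made-up dictionary `capitals`:

```python
capitals = {'Spain': 'Madrid', 'France': 'Paris'}

keys_view = capitals.keys()
# keys_view[0] would raise a TypeError because the view cannot be subscripted

keys_list = list(keys_view)  # converting to a list makes indexing possible
first_key = keys_list[0]
print(first_key)
```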


We can add new entries by specifying a new key and value pair. 

In [None]:
weekday_activities['Saturday']='Fondue night'
weekday_activities

It is also possible to use `.update()` to add new entries and modify existing ones.

In [None]:
weekday_activities.update({'Friday':'Birthday Party','Sunday':'Hike'})
weekday_activities

Notice, however, that if we already have an entry with a given key and we perform a new assignment or update, we will overwrite it. 

In [None]:
weekday_activities['Friday']='Cinema'
weekday_activities

To add extra elements to an existing entry, we can use `append`. However, for this to work, the value of the entry must be a list (i.e., written in square brackets **[]**).

In [None]:
weekday_activities

In [None]:
weekday_activities['Friday']=['Cinema']
weekday_activities['Friday'].append('Dinner out')

In [None]:
weekday_activities

We can again use `pop` to remove an entry by specifying its key. Similarly, we can use `del` to remove an entire entry. 

In [None]:
weekday_activities.pop('Monday')
weekday_activities

In [None]:
del weekday_activities['Tuesday']
weekday_activities

Using `items()` we can iterate through each entry in the dictionary. To do this, for each entry, we unpack the two elements that appear in the tuple generated by `.items()`

In [None]:
weekday_activities.items()

In [None]:
for keys, values in weekday_activities.items():
    print('My schedule for', keys, 'is', values)

Also, if we have a collection of keys and a collection of values, we can combine them into a dictionary using `zip(k,v)`. This pairs keys and values in order (i.e., the first element of the key list corresponds to the first element of the value list).

In [None]:
bday_dict = {} # initialize empty 
key_list = ('November','December','January','June')
value_list=("Joël's birthday","Utso's birthday", "Alba's birthday","Yang's Birthday")
for key, value in zip(key_list, value_list):
    bday_dict[key] = value
bday_dict
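The same pairing can be written more compactly by passing the result of `zip()` directly to `dict()`. The sketch below uses made-up keys and values:

```python
months = ('November', 'December')
events = ('Conference', 'Holidays')

# zip pairs the elements position by position; dict turns the pairs into entries
calendar = dict(zip(months, events))
print(calendar)
```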

#### 3.4 Functions. 

Sometimes we find that our code is repeating the same few lines multiple times, or we are sharing the same lines in different scripts. **Functions** in Python allow us to encapsulate a set of instructions and reuse them throughout our code. This helps us avoid repeating the same code multiple times and makes our code more organized and easier to maintain, while the complexity of the function is abstracted away. 

In essence, functions contain three elements: the arguments, the commands we execute on them (i.e., the body of the function), and the return values. They have the following structure: 
```python
def function_name(arg1, arg2):
    # do something with argument 1 and argument 2 to get the result r3
    return r3
```

A function does not have to return anything: we can write `return None`, omit the `return` statement altogether (in which case the function returns `None`), or simply display results with `print()`. 

In [None]:
def function_with_return(x,y):
    return x+y

def function_with_print(x,y):
    print(x+y)
    print(x+2)


In [None]:
function_with_return(4,6)

In [None]:
r1 = function_with_return(4,6)

In [None]:
r1

In [None]:
function_with_print(4,6)

In [None]:
r2 = function_with_print(4,6)

In [None]:
print(r2)

In [None]:
x=5
def function_with_print(x,y):
    print(x+y)

    print(x+2)
    return x+2

In [None]:
function_with_print(4, 6)

Although a function can return multiple values, only one `return` statement is ever executed per call (so several `return` lines only make sense inside different branches of a condition), because the function stops evaluating the code as soon as it reaches the first `return`. 

In [None]:
def returns_two_vars(x,y):
    return x+y, x*2

returns_two_vars(2,3)

And we could access them with our unpacking strategy:

In [None]:
r1,r2=returns_two_vars(2,3)
print('r1:',  r1)
print('r2:', r2)

In [None]:
def returns_two_vars(x,y):
    return x+y
    return x*2  # never reached: the function stops at the first return

In [None]:
returns_two_vars(2,3)
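Several `return` lines do make sense when each one sits in a different branch of a condition, since only one branch runs per call. The sketch below uses `if` and `elif`, which are covered in the control flow section; the function `describe_balance` is a hypothetical example.

```python
def describe_balance(balance):
    """Return a short label for an account balance."""
    if balance < 0:
        return 'overdrawn'   # the function stops here for negative balances
    elif balance == 0:
        return 'empty'       # ... or here when the balance is exactly zero
    return 'positive'        # reached only if no earlier return was executed

print(describe_balance(-50))
print(describe_balance(0))
print(describe_balance(120))
```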

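Notice that a function can read a variable that has been defined outside of it (i.e., a *global* variable). However, the arguments of a function, and any variable assigned inside it, are *local*: they only exist within the function, even if they share their name with a global variable. In the example below, the argument is called `global_var`, so modifying it inside the function does not change the global variable of the same name.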

In [None]:
global_var = 10

def example_function(global_var):
    # The argument shadows the global variable: inside the function this name is local
    print("Input you gave me", global_var) 
    global_var = global_var + 5  # Rebinds only the local name
    print("Inside the function - modified local global_var:", global_var)  
    
# Call the function with the global variable as input
example_function(global_var)
print("Outside the function - global_var:", global_var)  # still 10

# Call the function with a different input
example_function(7)
print("Outside the function - global_var:", global_var)  # still 10
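A complementary sketch shows the other case: a function that reads a variable defined outside of it without receiving it as an argument. The names `bonus`, `read_global` and `shadow_global` are hypothetical and chosen only for this illustration.

```python
bonus = 5  # defined outside any function: a global variable

def read_global(amount):
    # `bonus` is not defined inside the function, so Python looks it up globally
    return amount + bonus

def shadow_global(amount):
    bonus = 10  # assignment creates a local variable that hides the global one
    return amount + bonus

print(read_global(100))
print(shadow_global(100))
print(bonus)  # the global variable was never modified
```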


#### 3.5 Control flow: branching and looping. 
Every script, and computer programs in general, consist of instructions that are executed sequentially from "top" to "bottom". This sequence is what we know as flow. It's possible to modify this sequential flow to include branching or repeating certain instructions. The statements that allow us to make these modifications are grouped under flow control. 

##### 3.5.1  Branching: `if, elif, else`

Branching refers to the capacity to make decisions and execute distinct sets of statements depending on whether one or more conditions are met. In Python, branching starts with the `if` statement. 

```python
if condition:
    do line1
    do line2
```

Whenever the condition is found to be true, the code will perform line1 and line2. Python understands that line1 and line2 are associated with the `if` statement because of *indentation* (i.e., the white space to the left of the beginning of a line). The 4 spaces from the left margin to where `do line1` is written constitute one Python indent. We can use the `tab` key to add an indent and `shift+tab` to remove one. Indents are cumulative, so if we had two nested blocks, the lines within the second would need two indents. 

For instance, the code below would perform line1 only if the condition is true, and afterwards it would perform line2 in all cases. 

```python
if condition:
    do line1
    
do line2
```
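As a runnable illustration of cumulative indentation, the sketch below nests one `if` inside another and collects messages in a list. The variables are made up for the example.

```python
temperature = 3
is_raining = True
messages = []

if temperature < 10:
    messages.append('It is cold')            # one indent: runs if it is cold
    if is_raining:
        messages.append('Take an umbrella')  # two indents: cold AND raining
messages.append('Have a nice day')           # no indent: always runs

print(messages)
```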

In [None]:
month = 'January'
if (month=='January') or (month=='February'):
    print('Hey, you are in the indented block')
    print('The entire month of', month, 'is considered to be in the winter')

In [None]:
month = 'June'
if (month=='January') or (month=='February'):
    print('Hey, you are in the indented block')
    print('The entire month of', month, 'is considered to be in the winter')
    

But what if we still want our branch to do something when the first condition is not met? We can achieve this by specifying a new action with `else`:
```python
if condition1:
    do line1
else:
    do line2
```
Notice that, whenever condition1 is not true, the `else` condition will imply we perform line2. 

In [None]:
month = 'May'
if (month=='January') or (month=='February'):
    print('Hey, you are in the indented block')
    print('The entire month of', month, 'is considered to be in the winter')
else: 
    print('The entire month of', month, 'is NOT considered to be in the winter')

We can also be more specific and chain conditions using `elif`. Python starts by evaluating the first condition; if it is not true, it tries the second, and so on. Once it finds a condition that is `True`, it executes the lines under it and stops evaluating the remaining conditions. This means that at most one block, the one under the first `True` condition encountered, will be executed. 

In [None]:
month = 'May'
if (month=='January') or (month=='February'):
    print('Hey, you are in the indented block 1')
    print('The entire month of', month, 'is considered to be in the winter')
elif (month=='April') or (month=='May'): 
    print('Hey, you are in indented block 2')
    print('The entire month of', month, 'is considered to be in the spring')
elif (month=='July') or (month=='August'): 
    print('Hey, you are in indented block 3')
    print('The entire month of', month, 'is considered to be in the summer')
elif (month=='October') or (month=='November'): 
    print('Hey, you are in indented block 4')
    print('The entire month of', month, 'is considered to be in the autumn')
else:
    print('Hey, you reached the else')
    print('The month', month, 'is in between two different seasons')    

In [None]:
month = 'January'
if (month=='January') or (month=='February'):
    print('Hey, you are in the indented block 1')
    print('The entire month of', month, 'is considered to be in the winter')
    if month=='January':
        print('We are in the nested if')
        print("It's cold")
elif (month=='April') or (month=='May'): 
    print('Hey, you are in indented block 2')
    print('The entire month of', month, 'is considered to be in the spring')
elif (month=='July') or (month=='August'): 
    print('Hey, you are in indented block 3')
    print('The entire month of', month, 'is considered to be in the summer')
elif (month=='October') or (month=='November'): 
    print('Hey, you are in indented block 4')
    print('The entire month of', month, 'is considered to be in the autumn')
else:
    print('Hey, you reached the else')
    print('The month', month, 'is in between two different seasons')    

If we want a statement to do nothing in Python, we use the `pass` keyword. This is needed because Python does not allow an empty block within a conditional statement. It's akin to being at a crossroads where we have to choose between turning left or right, but instead, we decide to wait and take no action.

In [None]:
month = 'April'
if (month=='January') or (month=='February'):
    print('Hey, you are in indented block 1')
    print('The entire month of', month, 'is considered to be in the winter')
elif (month=='April') or (month=='May'):
    pass
elif (month=='July') or (month=='August'):
    print('Hey, you are in indented block 3')
    print('The entire month of', month, 'is considered to be in the summer')
elif (month=='October') or (month=='November'):
    print('Hey, you are in indented block 4')
    print('The entire month of', month, 'is considered to be in the autumn')
else:
    print('Hey, you reached the else')
    print('The month', month, 'is in between two different seasons')    

##### 3.5.2  Looping: `while`, `for`

Sometimes, we also want to repeat the same statement multiple times. This is what we mean when we refer to `iteration` or `looping`, which we achieve in Python through `while` and `for`.

- **While** loops execute a block repeatedly for as long as the condition is true. We typically use this kind of loop when each pass performs some operation on, or update to, a variable that the condition depends on.
```python
while condition:
    do this
```

In [None]:
good_morning='Y'
while good_morning == 'Y':
    print('Good morning, Joël!')
    print('Good morning, Alba!')
    good_morning = input('Do we keep saying good morning? [Y/N]')
print('Awesome! Enjoy your class')

Notice that the previous loop keeps going until we type anything other than `Y` (for example, `N`). Also, if we removed the line `good_morning = input('Do we keep saying good morning? [Y/N]')`, the condition would never change and the loop would run forever. That's why, in `while` loops, we typically update a variable inside the loop, for example a counter that lets us `break` out once we reach a given number of iterations.

In [None]:
good_morning='Y'
iteration=0
while good_morning == 'Y':
    print('Good morning, Joël!')
    print('Good morning, Alba!')
    iteration += 1
    if iteration==100:
        break
    good_morning = input('Do we keep saying good morning? [Y/N]')
print('Awesome! Enjoy your class')

- **For**: we use `for` loops when we want to iterate over elements of a sequence. 

In [None]:
for i in "Cemfi":
    print(i.upper())

In [None]:
months_of_year = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

# Loop through the months and add some summer vibes
for month in months_of_year:
    if month == "June":
        print(f"Get ready to enjoy the summer break, it's {month}!")
    elif month == "July" or month =='August':
        print(month,"is perfect to find reasons to escape from Madrid")          
    else:
        print("Winter is coming")

We can also use the function `range` to get a sequence of numbers to loop over. It follows the syntax `range(start, stop, step)`:
- Start: optional, defaults to 0.
- Stop: always required. It is never included in the sequence, so with the default step the last number is `stop-1`.
- Step: optional, defaults to 1.

In [None]:
for i in range(10):
    print(i+2)
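
The cell above passes only `stop`. As a small sketch of the full `range(start, stop, step)` syntax (the variable names are just illustrative), we can convert each range to a list to inspect its contents:

```python
# range(start, stop, step): stop itself is never included
evens = list(range(2, 11, 2))        # start at 2, stop before 11, step 2
countdown = list(range(5, 0, -1))    # a negative step counts down
first_three = list(range(3))         # only stop: starts at 0, step 1

print(evens)        # [2, 4, 6, 8, 10]
print(countdown)    # [5, 4, 3, 2, 1]
print(first_three)  # [0, 1, 2]
```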

### 4. NumPy. 

A module, in Python, is a program that can be imported into interactive mode or other programs for use. A Python package typically comprises multiple modules. Physically, a package is a directory containing modules and possibly subdirectories, each potentially containing further modules. Conceptually, a package links all modules together using the package name for reference. 

NumPy (**Num**erical **Py**thon) is one of the most common packages used in Python. In fact, numerous computational packages that offer scientific capabilities use NumPy's array objects as a standard interface for data exchange. That's why, although NumPy doesn't inherently have scientific capabilities, understanding NumPy arrays and array-based computing principles can save you time in the future.

NumPy offers a vast array of efficient methods for creating and manipulating numerical data arrays. Unlike Python lists, which can accommodate various data types within a single list, NumPy arrays require homogeneity among their elements for efficient mathematical operations. Utilizing NumPy arrays provides advantages such as faster execution and reduced memory consumption compared to Python lists. With NumPy, data storage is optimized through the specification of data types, enhancing code optimization.

In [None]:
import numpy as np 

The array is the fundamental data structure in NumPy. It represents a grid of values and contains information about the raw data, how to locate an element, and how to interpret it. All elements share a common data type, known as the array's `dtype`.

One method of initializing NumPy arrays involves using Python lists, with nested lists employed for two- or higher-dimensional data structures.

In [None]:
a = np.array([1, 2, 3, 4, 5, 6])

We can access the elements through indexing. 

In [None]:
a[0]

Arrays are N-dimensional (that's why we sometimes refer to them as `ndarray`). That means that NumPy arrays encompass vectors (1D), matrices (2D) and tensors (3D and higher). We can get this information about an array by checking its attributes.

In [None]:
a = np.array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]])

In [None]:
a

In [None]:
print('Dimensions/axes:', a.ndim)
print('Shape (size of array in each dimension):', a.shape)
print('Size (total number of elements):', a.size)
print('Data type:', a.dtype)


We can initialize arrays using different commands depending on what we are aiming at. 

For instance, the most straightforward case would be to pass a list to `np.array()` to create one: 

In [None]:
arr1 = np.array([5,6,7])
arr1

However, sometimes we do not yet know what our array will contain. We just need to initialize an array so that our code can fill it in later. For this, we typically create arrays of the desired dimensions and fill them with zeros (`np.zeros()`), ones (`np.ones()`), a given value (`np.full()`), or leave them uninitialized (`np.empty()`).

> Note: when working with large data, `np.empty()` can be faster because it skips filling in values. Also, large arrays can take up most of your memory and, in those cases, carefully choosing the `dtype` can help to manage memory more efficiently (e.g., choosing 8 bits over 64 bits per element).

In [None]:
from numpy import zeros as nanazeros

In [None]:
nanazeros(4)

In [None]:
np.ones((2,2))

Here you have an example of a 3D array of ones with shape `(3, 2, 1)`: it has 3 rows, 2 columns and a height (depth) of 1.

In [None]:
np.ones((3,2,1))

We can use `np.full()` to create an array of constant values that we specify in the `fill_value` argument.

In [None]:
np.full((2, 2), fill_value=4)

When we use `np.empty()` we create an uninitialized array: NumPy reserves the requested space in memory and returns an array holding whatever 'garbage' values were already there.

In [None]:
np.empty(2)
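
Returning to the earlier note on memory, the following is a minimal sketch of how the `dtype` changes the footprint of an array. The array names and the size of one million elements are arbitrary choices for illustration; `nbytes` reports the bytes used by the array's data.

```python
import numpy as np

# Same shape, different dtypes: only the bytes per element change
small = np.zeros(1_000_000, dtype=np.int8)    # 1 byte per element
large = np.zeros(1_000_000, dtype=np.int64)   # 8 bytes per element

print(small.nbytes)   # 1000000
print(large.nbytes)   # 8000000
```

The trade-off is range: an `int8` can only hold values from -128 to 127, so a smaller `dtype` is only appropriate when the data fits in it.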

With `np.linspace()` we create an array with values that are equally spaced between the start and endpoint. For instance, in the code below we create an array with 5 equally spaced values from 0 to 20.

In [None]:
np.linspace(0,20,num=5)

#### 4.1 Managing array elements. 

Arrays accept common operations like sorting, concatenating and finding unique elements. 

For instance, using `np.sort()` we can get a sorted copy of an array.

In [None]:
arr1 = np.array((10,2,5,3,50,0))
np.sort(arr1)

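In multidimensional arrays, we can sort the elements along a given dimension by specifying the axis (`axis=0` sorts down each column, `axis=1` sorts within each row). Note that the `.sort()` method used below sorts the array in place, whereas `np.sort()` returns a sorted copy.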

In [None]:
mat1 = np.array([[1,2,3],[8,1,5]])
mat1

In [None]:
mat1.sort(axis=0)
mat1
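
To see both axes side by side, here is a small sketch using `np.sort()`, which leaves the original untouched. The matrix `mat` and the result names are made up for this illustration.

```python
import numpy as np

mat = np.array([[3, 1, 2],
                [0, 5, 4]])

# axis=0 sorts down each column; axis=1 sorts within each row
by_column = np.sort(mat, axis=0)
by_row = np.sort(mat, axis=1)

print(by_column)  # [[0 1 2], [3 5 4]]
print(by_row)     # [[1 2 3], [0 4 5]]
print(mat)        # unchanged: np.sort returns a copy, unlike mat.sort()
```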

Using `concatenate` we can join the elements of two arrays along an existing axis. 

In [None]:
arr1 = np.array((1,2,3))
arr2 = np.array((6,7,8))
np.concatenate((arr1,arr2))

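If we instead want to stack arrays vertically (one on top of the other) or horizontally (side by side), we use `vstack()` and `hstack()`. With 1D inputs, `vstack()` treats each array as a row and returns a 2D array.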

In [None]:
np.vstack((arr1,arr2))

In [None]:
np.hstack((arr1,arr2)) # equivalent in this case to concatenate 
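
The difference between the two is easier to see with 2D inputs. The sketch below uses two hypothetical 2-by-2 matrices, `m1` and `m2`, and checks that for 2D arrays the results match `np.concatenate` along axis 0 and axis 1 respectively.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack((m1, m2))   # shape (4, 2): m2 placed below m1
stacked_cols = np.hstack((m1, m2))   # shape (2, 4): m2 placed to the right of m1

# For 2D inputs these match concatenate along axis 0 and axis 1
same_v = np.array_equal(stacked_rows, np.concatenate((m1, m2), axis=0))
same_h = np.array_equal(stacked_cols, np.concatenate((m1, m2), axis=1))
print(stacked_rows.shape, stacked_cols.shape, same_v, same_h)
```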

It is also possible to reshape arrays. For instance, let's reshape the concatenation of `arr1` and `arr2` to 3 rows and 2 columns

In [None]:
arr_c= np.concatenate((arr1,arr2))
arr_c.reshape((3,2))

We can also perform aggregation functions over all elements, like finding the minimum, maximum, means, sum of elements and much more. 

In [None]:
print(arr1.min())
print(arr1.sum())
print(arr1.max())
print(arr1.mean())

It is also possible to get only the unique elements of an array, or to count how many times each of them appears.

In [None]:
arr1 =np.array((1,2,3,3,1,1,5,6,7,8,11,11))
print(np.unique(arr1))
unq, count = np.unique(arr1, return_counts=True)

In [None]:
count # First element appears 3 times, second 1... 

Comparing NumPy arrays is also possible using operators such as `==`, `!=`, and the like. Comparisons result in an array of booleans indicating whether the condition is met in each cell.

In [None]:
arr1 = np.array(((1,2,3),(4,5,6)))
arr2 = np.array(((1,5,3),(7,2,6)))
arr1==arr2
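
A boolean array like this one can also be used to count or select elements. The following is a minimal sketch with the same values as above, stored under the illustrative names `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[1, 5, 3], [7, 2, 6]])

equal_mask = x == y              # boolean array, same shape as x and y
n_equal = equal_mask.sum()       # True counts as 1, so this counts the matches
matching_values = x[equal_mask]  # keep only the cells where the mask is True

print(equal_mask)
print(n_equal)           # 3
print(matching_values)   # [1 3 6]
```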

#### 4.2 Arithmetic operations with arrays. 

In general, we can perform arithmetic operations between an array and a scalar, or between two arrays of the same shape. NumPy arrays support common operations such as addition, subtraction and multiplication as long as one of those two conditions is met, and these operations are applied element by element.

**Broadcasting** is how NumPy relaxes that rule: it describes how NumPy operates on two arrays with different shapes (or different numbers of dimensions), as long as those shapes are compatible. Multiplying an array by a scalar is the simplest case.

In [None]:
arr1 = np.array(((1,2,3),
                (4,5,6)))
arr2 = np.array(((10,20,30),
                 (40,50,60)))

Element-wise addition, subtraction and multiplication can be performed with `+`, `-` and `*`.

In [None]:
arr1+arr2

In [None]:
arr1-arr2

In [None]:
arr1*arr2

To multiply (`*`) or divide (`/`) all elements by a scalar, we just specify the scalar.

In [None]:
arr1*10

In [None]:
arr2/10
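
Broadcasting goes beyond scalars. As a rough sketch (the arrays `grid`, `row` and `col` are made up for this example), a 2-by-3 array can be combined with a 1D array of length 3 or with a 2-by-1 column, because NumPy can stretch the smaller array along the missing or length-1 dimension:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

# row is stretched across both rows of grid; col across all three columns
plus_row = grid + row
plus_col = grid + col

print(plus_row)   # [[11 22 33], [14 25 36]]
print(plus_col)   # [[101 102 103], [204 205 206]]
```

If the shapes are not compatible, for instance adding a 1D array of length 2 to `grid`, NumPy raises a `ValueError` instead of guessing.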

Matrix multiplication is achieved with `matmul()`. Because both of our matrices are 2-by-3, I will transpose one of them so that the inner dimensions match and we can perform matrix multiplication.

In [None]:
np.matmul(arr1,arr2.T) 
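
Python also provides the `@` operator as a shorthand for matrix multiplication. The sketch below repeats the product with the same values, stored under the illustrative names `left` and `right`, and checks that both spellings agree:

```python
import numpy as np

left = np.array([[1, 2, 3],
                 [4, 5, 6]])          # shape (2, 3)
right = np.array([[10, 20, 30],
                  [40, 50, 60]])      # shape (2, 3)

# (2, 3) @ (3, 2) -> (2, 2); the inner dimensions must match
product = left @ right.T
print(product)   # [[140 320], [320 770]]
print(np.array_equal(product, np.matmul(left, right.T)))
```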

#### Practice exercises. 
1.  Create a 1D array with all integer elements from 1 to 10 (both included). No hard-coding allowed!
2.  From the array you created in 1, create one that contains all odd elements and one with all even elements. 
3.  Create a new array that replaces all odd elements of the array in 1 with -1. 
4.  Create a 3-by-3 matrix filled with 'True' values (i.e., booleans).
5.  Suppose you have array `a=np.array(['a','b','c','d','e','f','g'])` and `b = np.array(['g','h','c','a','e','w','g'])`. Find all elements that are equal. Can you get the position where the elements of both arrays match?
6.  Write a function that takes an array and prints the elements that are divisible by a given number. Try it by creating an array from 1 to 20 and printing the elements divisible by 3.  
7. Consider two matrices, A and B, both of size 100x100, filled with random integer values between 1 and 10. Implement a function to perform element-wise multiplication of these matrices using nested loops. Implement the same operation using NumPy's vectorized multiplication. Repeat with matrices of size 1000x1000 and 10000x10000 and compare the execution time. Which one is faster? (A 1000000x1000000 matrix would need terabytes of memory, so stop at sizes your machine can hold.)


In [None]:
#1. 
array_ex = np.array(range(1,11))
print(array_ex)

In [None]:
#2. 
odds = array_ex[array_ex % 2 == 1]  # a boolean mask keeps the result a NumPy array
print(odds)
even = array_ex[array_ex % 2 == 0]
print(even)


In [None]:
#3. 
# np.where(condition, value_if_true, value_if_false): odd elements become -1
arr_r = np.where(array_ex % 2 == 1, -1, array_ex)
arr_r

In [None]:
#4. 
mat3 = np.full((3, 3), fill_value=True, dtype=bool)
print(mat3)
print('Congrats, you just created your first mask!')

In [None]:
#5. 
a=np.array(['a','b','c','d','e','f','g'])
b = np.array(['g','h','c','a','e','w','g'])
inter=np.intersect1d(a,b)
print(inter)

inter, apos, bpos = np.intersect1d(a,b,return_indices=True)
print(apos)
print(bpos)

In [None]:
#6. 
def print_divisible(array,div):
    for i in array: 
        if i%div == 0:
            print(i,'is divisible by', div)
    return None

arr =np.array(range(1,21))
print_divisible(arr,3)

In [None]:
#7
mat_a =np.random.randint(1, 11, size=(1000, 1000))
mat_b =np.random.randint(1, 11, size=(1000, 1000))

def multiply_in_loop(m1,m2):
    if m1.shape != m2.shape:
        raise ValueError("Matrices A and B must have the same shape.")

    m, n = m1.shape
    result = np.zeros((m, n))

    for i in range(m):
        for j in range(n):
            result[i, j] = m1[i, j] * m2[i, j]

    return result


In [None]:
out= multiply_in_loop(mat_a,mat_b)


In [None]:
out_built = np.multiply(mat_a,mat_b)

In [None]:
(out_built == out).sum()  # number of matching cells; should equal 1000*1000
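
The exercise also asks to compare execution times. One possible way to do it, sketched below, is to wrap each approach with `time.perf_counter()` from the standard library. The 200x200 size, the seed, and the names `loop_multiply`, `a` and `b` are choices made for this sketch so that it runs quickly; in a notebook, the `%timeit` magic is a common alternative.

```python
import time
import numpy as np

rng = np.random.default_rng(0)               # seeded only for reproducibility
a = rng.integers(1, 11, size=(200, 200))     # integers from 1 to 10
b = rng.integers(1, 11, size=(200, 200))

def loop_multiply(m1, m2):
    """Element-wise product computed cell by cell."""
    rows, cols = m1.shape
    result = np.zeros((rows, cols), dtype=m1.dtype)
    for i in range(rows):
        for j in range(cols):
            result[i, j] = m1[i, j] * m2[i, j]
    return result

# Time the nested-loop version
start = time.perf_counter()
loop_result = loop_multiply(a, b)
loop_seconds = time.perf_counter() - start

# Time the vectorized version
start = time.perf_counter()
vector_result = a * b
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.6f}s")
print(np.array_equal(loop_result, vector_result))
```

Increasing the `size` argument step by step shows how the gap between the two approaches grows with the number of elements.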