**My First Class Demo Notebook – A Simple Tutorial**

It is important to mention that we have just made a new selection for this course. You don't have to do everything, but it's highly recommended that you go through these fundamental concepts and learn how to master them. This course is not a programming course, but we will give you the essential content so that you can draw on it for your future engineering projects.

# Create or open a Jupyter Notebook

# Coding capabilities

## A first computation and plot

In [31]:
import matplotlib.pyplot as plt

## Python basics: today's checklist

Basics:
- [ ] Integer
- [ ] Float 
- [ ] Boolean
- [ ] String
- [ ] Basic operations

Lists:
- [ ] Indexing and slicing
- [ ] List concatenation and mutation
- [ ] Nested list

Dictionary
- [ ] Data access
- [ ] Dictionary concatenation and mutation
- [ ] Nested dictionary

Loop and conditions
- [ ] `if` statements
- [ ] `while` statements
- [ ] `for` statements
- [ ] `range()` function

Packages
- [ ] NumPy
- [ ] Pandas
- [ ] Pandapower

### 3.1 `if` statements
Perhaps the most well-known statement type is the [`if`](https://docs.python.org/3.10/reference/compound_stmts.html#if) statement. For example:\
**Pay attention to the syntax: an `if` statement has no closing keyword. Every indented line below the `if` line belongs to the `if` block.**

There can be zero or more [`elif`](https://docs.python.org/3.10/reference/compound_stmts.html#elif) parts, and the [else](https://docs.python.org/3.10/reference/compound_stmts.html#else) part is optional. The keyword '`elif`' is short for 'else if', and is useful to avoid excessive indentation. An `if` ... `elif` ... `elif` ... sequence is a substitute for the `switch` or `case` statements found in other languages.
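As a minimal sketch of an `if` ... `elif` ... `else` sequence (the variable names `x` and `sign` are only illustrative):

```python
x = 42  # hypothetical value; try a negative number or zero

if x < 0:
    sign = "negative"
elif x == 0:
    sign = "zero"
else:
    sign = "positive"

# This line is not indented, so it is outside the if statement
print(sign)
```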


### 3.2 `while` statements

With the while statement, we can execute a set of statements as long as a condition is true.
For instance, we can write an initial sub-sequence of the [Fibonacci series](https://en.wikipedia.org/wiki/Fibonacci_number) as follows
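A possible version of this loop, assuming we stop once the values reach 10 (the list `fib` is only there to keep the values):

```python
# Fibonacci series: each new term is the sum of the two previous ones
a, b = 0, 1
fib = []
while a < 10:
    print(a)
    fib.append(a)
    a, b = b, a + b
```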

The keyword argument _end_ can be used to avoid the newline after the output, or end the output with a different string:
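For illustration, the sketch below prints Fibonacci numbers below 1000 on a single line, separated by commas (the bound 1000 is an arbitrary choice):

```python
a, b = 0, 1
while a < 1000:
    print(a, end=",")  # replace the default newline by a comma
    a, b = b, a + b
print()  # finish the line
```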

### 3.3 `for` statements

The [`for`](https://docs.python.org/3.10/reference/compound_stmts.html#for) statement in Python iterates over the items of any sequence (list, dictionary, string, ...), in the order that they appear in the sequence. This differs from what you may be used to in C or Pascal: rather than always iterating over an arithmetic progression of numbers (as in Pascal), or letting the user define both the iteration step and the halting condition (as in C), Python iterates directly over the items.
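A short sketch of a `for` loop over a list of strings (the list content is arbitrary):

```python
words = ["cat", "window", "defenestrate"]
lengths = []
for w in words:
    # w takes the value of each item, in order
    print(w, len(w))
    lengths.append(len(w))
```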

You can get the position of each element of a collection, along with the element itself, using `enumerate()`:
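For example, with a hypothetical list of bus names:

```python
buses = ["Bus 1", "Bus 2", "Bus 3"]  # illustrative data
pairs = []
for i, name in enumerate(buses):
    # i is the position (starting at 0), name is the item
    print(i, name)
    pairs.append((i, name))
```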

Be careful with code that modifies a collection while iterating over that same collection. It is safer to create a new collection:
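One way to do this, sketched with a hypothetical dictionary of users, is to fill a new dictionary instead of deleting entries from the one being iterated:

```python
users = {"Hans": "active", "Eleonore": "inactive", "Keitaro": "active"}

# Build a new collection and leave the original untouched
active_users = {}
for user, status in users.items():
    if status == "active":
        active_users[user] = status

print(active_users)
```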

To go further, you can find other [Looping Techniques](https://docs.python.org/3.10/tutorial/datastructures.html#tut-loopidioms) in the Python tutorial.

### 3.4 `range()` function

If you need to iterate over a sequence of numbers, the built-in function `range()` will come in handy. It generates arithmetic progressions:

The given end point is never part of the generated sequence; `range(10)` generates 10 values, the legal indices for items of a sequence of length 10. It is possible to let the range start at another number, or to specify a different increment (even negative; sometimes this is called the 'step'):
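The following sketch shows `range()` with one, two and three arguments; `list()` is only used here to display the generated values:

```python
for i in range(5):
    print(i)  # 0, 1, 2, 3, 4

first = list(range(5))                # end point excluded
shifted = list(range(5, 10))          # custom start
stepped = list(range(0, 10, 3))       # custom step
negative = list(range(-10, -100, -30))  # negative step
print(first, shifted, stepped, negative)
```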

To iterate over the indices of a sequence, you can combine [`range()`](https://docs.python.org/3.10/library/stdtypes.html#range) and [`len()`](https://docs.python.org/3.10/library/functions.html#len) as follows:
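A minimal sketch with an arbitrary list:

```python
a = ["Mary", "had", "a", "little", "lamb"]
visited = []
for i in range(len(a)):
    print(i, a[i])
    visited.append(a[i])
```

In most cases, `enumerate()` is more convenient for this purpose.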

In many ways the object returned by `range()` behaves as if it is a list, but in fact it isn’t. It is an object which returns the successive items of the desired sequence when you iterate over it, but it doesn’t really make the list, thus saving space.

We say that such an object is [iterable](https://docs.python.org/3.10/glossary.html#term-iterable), meaning that it is suitable as a target for functions and constructs that expect something from which they can obtain successive elements until the supply is exhausted. We've seen that the [`for`](https://docs.python.org/3.10/reference/compound_stmts.html#for) statement is such a construct, while an example of a function that takes an iterable object is [`sum()`](https://docs.python.org/3.10/library/functions.html#sum):
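For instance, `sum()` can consume a `range` object directly:

```python
r = range(4)        # no list is built here
total = sum(r)      # 0 + 1 + 2 + 3
print(r, total)
```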

Later we will see more functions that return iterables and take iterables as arguments. In chapter [Data Structures](https://docs.python.org/3.10/tutorial/datastructures.html#tut-structures), we will discuss [`list()`](https://docs.python.org/3.10/library/stdtypes.html#list) in more detail.

Coming soon ...

## References for Python's basics

If you want to get more insights and learn the basics in Python, please visit the following two very good references:

- [Learn X in Y minutes](https://learnxinyminutes.com/docs/python/)
- [The Python Tutorial](https://docs.python.org/3/tutorial/)

# **Panda**power

This tutorial explains the basics of the **panda**power library used in the RHT laboratories. **panda**power is an easy-to-use tool for load flow and short-circuit calculations in power systems. To go further, we recommend you take a look at the following resources:
- [**panda**power's documentation](https://pandapower.readthedocs.io/en/v2.13.1/elements/trafo.html)
- [**panda**power's tutorials](https://www.pandapower.org/start/#interactive-tutorials-)
- [**panda**power_heig_ui's documentation](https://heig-vd-iese.github.io/pandapower-heig-ui/)

## Create a small power network

We consider the following simple 3-bus example network from [**panda**power's tutorial](https://github.com/e2nIEE/pandapower/blob/master/tutorials/minimal_example.ipynb).

<img alt="pandapower with 3-bus minimal example" width ="600" caption="Figure 1 – A minimal example with a 3-bus power system including PQ-load" src="https://github.com/e2nIEE/pandapower/raw/develop/tutorials/pics/3bus-system.png" id="pandapower_simple"/>

The above network can be created in pandapower as follows:

In [32]:
import pandapower as pp

### Create a **panda**power empty power network object

In order to create an empty network object, we can run the following **panda**power command:

In [33]:
# Create an empty network
net = pp.create_empty_network()
net

This pandapower network is empty

The empty network object is composed of a dictionary of pandas DataFrames.

### Create three buses with different voltage levels

We need three buses with different voltage levels and names. Buses are the elements which connect pieces of equipment together:

In [34]:
# Create buses
bus1 = pp.create_bus(net, vn_kv=20., name="Bus 1")
bus2 = pp.create_bus(net, vn_kv=0.4, name="Bus 2")
bus3 = pp.create_bus(net, vn_kv=0.4, name="Bus 3")

**Remark**: Note that voltage levels are expressed in kV and correspond to line voltages.

### Creating a transformer and connecting it to the network

For a full understanding of the parameters to be applied, please read the [documentation](https://pandapower.readthedocs.io/en/v2.13.1/elements/trafo.html) carefully. 

In order to create the transformer object that will be connected to the previously created network, we can proceed as follows:

In [35]:
# Create transformer
trafo = pp.create_transformer_from_parameters(
    net, hv_bus=bus1, lv_bus=bus2, sn_mva=0.4, vn_hv_kv=20.0, vn_lv_kv=0.4, vk_percent=6.0, vkr_percent=1.425, pfe_kw=1.35, i0_percent=0.3375, name="Trafo")

# trafo = pp.create_transformer(net, hv_bus=bus1, lv_bus=bus2, std_type="0.4 MVA 20/0.4 kV", name="Trafo")

**Remark**: pay attention to the units of the parameters, and make sure the voltage levels match those of the connected buses.

### Create a transmission line and connect it to the network

In [36]:
line = pp.create_line_from_parameters(net, from_bus=bus2, to_bus=bus3, length_km=0.1, r_ohm_per_km=0.642, x_ohm_per_km=0.083, c_nf_per_km=210, max_i_ka=0.142, name="Line")

### Create a load and connect it to the network

In [37]:
# Create a load connected to bus 3
load = pp.create_load(net, bus=bus3, p_mw=0.100, q_mvar=0.05, name="Load")

### Create an external grid connection

This element is mandatory to run power flow simulations. It ensures that power stays balanced within the power network:

In [38]:
ext_grid = pp.create_ext_grid(net, bus=bus1, vm_pu=1.20, name="Grid Connection")

## Data structure and data access

A **panda**power network object is structured as a dictionary:

- Keys are the names of the power network equipment types, such as line, load, trafo, etc. (string).
- Values are tables which contain all the information about the corresponding equipment (pandas DataFrame).
 
Calling the network object gives a quick overview of it and the number of elements of each equipment type.

In [39]:
net

This pandapower network includes the following parameter tables:
   - bus (3 element)
   - load (1 elements)
   - ext_grid (1 elements)
   - line (1 elements)
   - trafo (1 elements)

There are two ways to get the table of one equipment type:

- By using the dictionary syntax, with the equipment type as key.
- By using the corresponding attribute of the **panda**power object.

In [40]:
net["bus"]

Unnamed: 0,name,vn_kv,type,zone,in_service
0,Bus 1,20.0,b,,True
1,Bus 2,0.4,b,,True
2,Bus 3,0.4,b,,True


In [41]:
type(net["bus"])

pandas.core.frame.DataFrame

In [42]:
net.bus

Unnamed: 0,name,vn_kv,type,zone,in_service
0,Bus 1,20.0,b,,True
1,Bus 2,0.4,b,,True
2,Bus 3,0.4,b,,True


In [43]:
net.trafo

Unnamed: 0,name,std_type,hv_bus,lv_bus,sn_mva,vn_hv_kv,vn_lv_kv,vk_percent,vkr_percent,pfe_kw,i0_percent,shift_degree,tap_side,tap_neutral,tap_min,tap_max,tap_step_percent,tap_step_degree,tap_pos,tap_phase_shifter,parallel,df,in_service
0,Trafo,,0,1,0.4,20.0,0.4,6.0,1.425,1.35,0.3375,0.0,,,,,,,,False,1,1.0,True


In [44]:
net.line

Unnamed: 0,name,std_type,from_bus,to_bus,length_km,r_ohm_per_km,x_ohm_per_km,c_nf_per_km,g_us_per_km,max_i_ka,df,parallel,type,in_service
0,Line,,1,2,0.1,0.642,0.083,210.0,0.0,0.142,1.0,1,,True


In [45]:
net.load

Unnamed: 0,name,bus,p_mw,q_mvar,const_z_percent,const_i_percent,sn_mva,scaling,in_service,type
0,Load,2,0.1,0.05,0.0,0.0,,1.0,True,wye


To access one specific element or value of a table, use the pandas indexing functions:

In [46]:
net.bus.loc[0,:]

name          Bus 1
vn_kv          20.0
type              b
zone           None
in_service     True
Name: 0, dtype: object

In [47]:
type(net.bus.loc[0, :])

pandas.core.series.Series

In [48]:
net.bus.at[0, "name"]

'Bus 1'

We can also modify the data using pandas functions:

In [49]:
net.bus.loc[0, "name"] = "hv_bus"
net.bus

Unnamed: 0,name,vn_kv,type,zone,in_service
0,hv_bus,20.0,b,,True
1,Bus 2,0.4,b,,True
2,Bus 3,0.4,b,,True


## Run power flow

Now we can run a balanced power flow calculation using the following command:

In [50]:
pp.runpp(net)
net

This pandapower network includes the following parameter tables:
   - bus (3 element)
   - load (1 elements)
   - ext_grid (1 elements)
   - line (1 elements)
   - trafo (1 elements)
 and the following results tables:
   - res_bus (3 element)
   - res_line (1 elements)
   - res_trafo (1 elements)
   - res_ext_grid (1 elements)
   - res_load (1 elements)

If you check your **panda**power object, you will see that the power flow result tables have been added.

It may also be interesting to consult the results for buses, lines and transformers:

In [51]:
net.res_bus

Unnamed: 0,vm_pu,va_degree,p_mw,q_mvar
0,1.2,0.0,-0.106038,-0.051875
1,1.190635,-0.539841,0.0,0.0
2,1.153532,0.080701,0.1,0.05


In [52]:
net.res_line

Unnamed: 0,p_from_mw,q_from_mvar,p_to_mw,q_to_mvar,pl_mw,ql_mvar,i_from_ka,i_to_ka,i_ka,vm_from_pu,va_from_degree,vm_to_pu,va_to_degree,loading_percent
0,0.103769,0.050486,-0.1,-0.05,0.003769,0.000486,0.139895,0.139896,0.139896,1.190635,-0.539841,1.153532,0.080701,98.518141


In [53]:
net.res_trafo

Unnamed: 0,p_hv_mw,q_hv_mvar,p_lv_mw,q_lv_mvar,pl_mw,ql_mvar,i_hv_ka,i_lv_ka,vm_hv_pu,va_hv_degree,vm_lv_pu,va_lv_degree,loading_percent
0,0.106038,0.051875,-0.103769,-0.050486,0.002268,0.001389,0.00284,0.139895,1.2,0.0,1.190635,-0.539841,24.593091


All other pandapower elements and power grid analysis functionality (e.g. optimal power flow, state estimation or short-circuit calculation) are also fully integrated into pandapower's tabular data structure. This concludes a short walkthrough of some pandapower features. More in-depth tutorials can be found under this [link](https://www.pandapower.org/start/#interactive-tutorials-).

## Create a small power network using Excel files

A package has been created in order to simplify network generation, time series simulation and data visualisation. We can generate a **panda**power object from data stored in Excel files using the following function:

In [54]:
import pp_heig_plot as pp_plot
import pp_heig_simulation as pp_sim
from datetime import time

In [55]:
!pwd

/home/agiraldi/git/rht


In [56]:
net_file_path = "./tutorials/data/3_bus_example.xlsx"
net = pp_sim.load_net_from_xlsx(file_path=net_file_path)
net

This pandapower network includes the following parameter tables:
   - bus (3 element)
   - load (1 elements)
   - ext_grid (1 elements)
   - line (1 elements)
   - trafo (1 elements)

We can plot a simplified diagram of our network using the following function:

- By adding a filename, the plot will be saved in a png format in the default folder _plot_.
- We can change the folder name using the folder parameter.
- We can view the equipment parameters in the plot by moving the mouse over them.
- The network is well traced when it is tree-like. In the case of a mesh grid, a coordinate parameter must be added to the buses.

In [57]:
pp_plot.plot_power_network(net=net, plot_title="3-bus example", filename="3_bus_example")

We can run a simple power flow and visualise the results using the following functions:

In [58]:
pp.runpp(net)
pp_plot.plot_powerflow_result(net=net, plot_title="3-bus powerflow results", filename="3_bus_pp_result")
net.res_bus

Unnamed: 0,vm_pu,va_degree,p_mw,q_mvar
0,1.2,0.0,-0.106038,-0.051875
1,1.190635,-0.539841,0.0,0.0
2,1.153532,0.080701,0.1,0.05


### Timeseries powerflow simulation


We can create power profiles from Excel files to perform time series power flow simulations. Once loaded, the resulting object is a dictionary of DataFrames:

- Keys are the names of the equipment types to which the profiles relate.
- Values hold the active and reactive power profile tables (`p_mw` and `q_mvar`).

In [59]:
profile_file_path = "./tutorials/data/3_bus_power_profile.xlsx"
time_series = pp_sim.load_power_profile_form_xlsx(file_path=profile_file_path)
print(time_series.keys())
print(time_series["load"].keys())
time_series["load"]["p_mw"]

dict_keys(['load'])
dict_keys(['p_mw', 'q_mvar'])


profile,0,1
00:00:00,0.02301,0.08414
01:00:00,0.01743,0.08866
02:00:00,0.01592,0.0895
03:00:00,0.02022,0.08509
04:00:00,0.03131,0.07463
05:00:00,0.03377,0.07337
06:00:00,0.03829,0.06886
07:00:00,0.05299,0.05567
08:00:00,0.07359,0.04019
09:00:00,0.08708,0.03242


In this example, the loaded file contains two different profiles for loads. If we take a look at the load table of the **panda**power network, we can see that the `profile_mapping` parameter of the load is set to 0. This means that profile 0 will be applied to this load.

In [60]:
net.load

Unnamed: 0,name,bus,p_mw,q_mvar,scaling,const_z_percent,const_i_percent,sn_mva,in_service,type,profile_mapping
0,load_0,2,0.1,0.05,1.0,0.0,0.0,,True,wye,0


In [61]:
pp_sim.apply_power_profile(net=net, equipment="load", power_profiles=time_series["load"])

Then we need to create an output writer which will store simulation results:

- Default results stored are `res_bus.vm_pu`, `res_line.loading_percent`, `res_trafo.loading_percent`.
- We can add other results using `add_results` parameters.

In [62]:
pp_sim.create_output_writer(net=net, add_results=["res_line.p_from_mw"])

Finally, we can run the time series simulation and plot the results as follows:

In [63]:
result_df = pp_sim.run_time_simulation(net=net)
print()
pp_plot.plot_timeseries_result(data_df=result_df["res_bus.vm_pu"], ylabel="V [pu]",
                               plot_title="Bus voltage", filename="voltage_result")
print()
pp_plot.plot_timeseries_result(data_df=result_df["res_line.p_from_mw"], ylabel="P [MW]",
                               plot_title="line power", filename="line_result")
print()
pp_plot.plot_timestamps_powerflow_result(net=net, filename="net_result_12h", plot_time=time(hour=12))










We can use the second power profile loaded from the Excel file. To do this, we just need to modify the `profile_mapping` parameter before applying the power profile once again:

In [64]:
net.load.loc[0, "profile_mapping"] = 1
pp_sim.apply_power_profile(net=net, equipment="load", power_profiles=time_series["load"])
result_df = pp_sim.run_time_simulation(net=net)
print()
pp_plot.plot_timeseries_result(data_df=result_df["res_bus.vm_pu"], ylabel="V [pu]",
                       plot_title="Bus voltage", filename= "voltage_result")
print()
pp_plot.plot_timeseries_result(data_df=result_df["res_line.p_from_mw"], ylabel="P [MW]",
                       plot_title="line power", filename= "line_result")







We can also scale our power profiles by modifying the `scaling` parameter:

In [65]:
net.load.loc[0, "scaling"] = 5
result_df = pp_sim.run_time_simulation(net=net)
pp_plot.plot_timeseries_result(data_df=result_df["res_bus.vm_pu"], ylabel="V [pu]",
                       plot_title="Bus voltage", filename= "voltage_result")
print()
pp_plot.plot_timeseries_result(data_df=result_df["res_line.p_from_mw"], ylabel="P [MW]",
                       plot_title="line power", filename= "line_result")




## References

- [Pandapower 'Getting started'](http://www.pandapower.org/start/)
- [Pandapower's documentation](https://pandapower.readthedocs.io/en/v2.13.1/index.html)
- [Pandapower's tutorials on GitHub](https://github.com/e2nIEE/pandapower/tree/v2.13.1/tutorials)

### Citing pandapower

 &copy; Copyright 2016-2023 by Fraunhofer IEE and University of Kassel. Revision 2feba868.

```latex
@article{pandapower.2018,
author={L. Thurner and A. Scheidler and F. Schafer and J. H. Menke and J. Dollichon and F. Meier and S. Meinecke and M. Braun},
journal={IEEE Transactions on Power Systems},
title={pandapower - an Open Source Python Tool for Convenient Modeling, Analysis and Optimization of Electric Power Systems},
year={2018},
doi={10.1109/TPWRS.2018.2829021},
url={https://arxiv.org/abs/1709.06743},
ISSN={0885-8950}
}
```

# NumPy

**Note**: This tutorial was greatly inspired by [Aurélien Géron's NumPy notebook](https://github.com/ageron/handson-ml3/blob/main/tools_numpy.ipynb). All his interactive notebooks can also be found on [Google Colab](https://colab.research.google.com/github/ageron/handson-ml3/blob/main/index.ipynb).

A comparison with MATLAB is given [at the end of this NumPy tutorial](#key-differences-with-matlab).

NumPy is the fundamental library for scientific computing with Python. NumPy is centered around a powerful N-dimensional array object, and it also contains useful linear algebra, Fourier transform, and random number functions.

**Every element of a NumPy array must have the same type.**

## Import package

As `numpy` is not part of the Python standard library, it first needs to be imported. Most people import it as `np`:
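The usual import looks as follows (printing the version is optional and only shown as a quick check):

```python
import numpy as np

print(np.__version__)
```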

## Creating arrays


### `np.zeros()`

The `zeros` function creates an array containing any number of zeros:

It's just as easy to create a 2D array (i.e. a matrix) by providing a tuple with the desired number of rows and columns. For example, here's a 3x4 matrix:
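A short sketch of both cases (the names `z1` and `z2` are illustrative):

```python
import numpy as np

z1 = np.zeros(5)        # 1D array with 5 zeros
z2 = np.zeros((3, 4))   # 2D array: 3 rows, 4 columns
print(z1)
print(z2)
```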

### `np.ones()`

The `ones` function creates an array containing any number of ones:
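For example, a 3x4 matrix full of ones:

```python
import numpy as np

o = np.ones((3, 4))
print(o)
```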

### `np.arange()`

The `arange` function returns an array containing every point between two values at a given step (note that the maximum value is _not included_).
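A minimal sketch with an integer and a floating-point step:

```python
import numpy as np

a_int = np.arange(1, 5)              # 1, 2, 3, 4 (5 is excluded)
a_float = np.arange(1.0, 5.0, 0.5)   # step of 0.5
print(a_int)
print(a_float)
```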

### `np.linspace()`

The `linspace` function returns an array containing a specific number of points evenly distributed between two values (note that the maximum value is _included_).
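For instance, five points evenly spread between 0 and 1:

```python
import numpy as np

pts = np.linspace(0, 1, 5)  # both end points are included
print(pts)
```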

### `np.random.rand()`

A number of functions are available in NumPy's `random` module to create `ndarray`s initialized with random values. For example, here is a 3x4 matrix initialized with random floats between 0 and 1 (uniform distribution):
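A sketch using `np.random.rand()`; the values differ at each run since no seed is set:

```python
import numpy as np

r = np.random.rand(3, 4)  # uniform random floats in [0, 1)
print(r)
```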

### `np.eye()`

The `eye` function returns an identity matrix of shape NxN:
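For example, with N = 3:

```python
import numpy as np

identity = np.eye(3)
print(identity)
```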

### `np.reshape()`

The `reshape` function returns a new `ndarray` object that, whenever possible, points at the _same_ data. This means that modifying one array will also modify the other.
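The sketch below illustrates this shared-data behaviour (array names are illustrative):

```python
import numpy as np

g = np.arange(24)
g2 = g.reshape(4, 6)  # same data, seen as 4 rows of 6 columns
g2[1, 2] = 999        # modify through the reshaped array...
print(g[8])           # ...the original array is modified too
```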

### Some vocabulary

In NumPy, each dimension is called an **axis**.

The number of axes is called the **rank**:
* The above 3x4 matrix is an array of rank 2 (it is 2-dimensional).

An array's list of axis lengths is called the **shape** of the array:
* The above matrix's shape is `(3, 4)`.

The **size** of an array is the total number of elements, which is the product of all axis lengths
*  The above matrix's size is $3\times4=12$.
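These quantities can be read from the `ndim`, `shape` and `size` attributes, as sketched here on a 3x4 matrix:

```python
import numpy as np

a = np.zeros((3, 4))
print(a.ndim)   # rank (number of axes)
print(a.shape)  # axis lengths
print(a.size)   # total number of elements
```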

## Indexing and slicing

Data access is pretty similar to data access in lists:
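A few illustrative cases, assuming a small 1D array `a` and a 2D array `b` (for 2D arrays, indices for each axis are separated by a comma):

```python
import numpy as np

a = np.array([1, 5, 3, 19, 13, 7, 3])
print(a[3])      # single element
print(a[2:5])    # slice, end index excluded
print(a[:2])     # first two elements
print(a[::-1])   # reversed

b = np.arange(12).reshape(3, 4)
print(b[1, 2])   # row 1, column 2
print(b[:, 1])   # all rows, column 1
```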

## Arrays concatenation and setting

You can use `np.concatenate()` to merge arrays together. Unlike with lists, you can perform it along any axis you want.

**Note**: The arrays must have the same shape, except in the dimension corresponding to the axis along which the concatenation occurs.
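A sketch of concatenation along both axes of 2D arrays (the arrays `q1`, `q2`, `q3` are illustrative):

```python
import numpy as np

q1 = np.full((3, 4), 1.0)
q2 = np.full((4, 4), 2.0)  # same number of columns as q1
q3 = np.full((3, 2), 3.0)  # same number of rows as q1

rows = np.concatenate((q1, q2), axis=0)  # stack vertically
cols = np.concatenate((q1, q3), axis=1)  # stack horizontally
print(rows.shape, cols.shape)
```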

Modifying elements in arrays is quite similar to modifying elements in lists:
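For example (note that, unlike with lists, a single value assigned to a slice is repeated over the whole slice):

```python
import numpy as np

a = np.array([1, 5, 3, 19])
a[0] = 10     # modify one element
a[1:3] = -1   # modify a slice with a single value
print(a)
```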

## `dtype`

NumPy's `ndarray`s are also efficient in part because all their elements must have the same type (usually numbers). We can check what the data type is by looking at the `dtype` attribute:
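A short sketch; the exact integer type chosen by default depends on the platform, so it is only printed here:

```python
import numpy as np

c = np.arange(1, 5)
d = np.arange(1.0, 5.0)
print(c.dtype)  # an integer type, e.g. int64 on most systems
print(d.dtype)  # float64
```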

Available data types include signed `int8`, `int16`, `int32`, `int64`, unsigned `uint8|16|32|64` and `complex64|128`. Check out the documentation for the [basic types](https://numpy.org/doc/stable/user/basics.types.html) and [sized aliases](https://numpy.org/doc/stable/reference/arrays.scalars.html#sized-aliases) for the full list.

The number included in the type name is the number of bits used to store each element.

Instead of letting NumPy guess what data type to use, we can set it explicitly when creating an array by setting the `dtype` parameter
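For instance, requesting 32-bit floats explicitly:

```python
import numpy as np

d = np.arange(1, 5, dtype=np.float32)
print(d, d.dtype)
print(d.itemsize)  # bytes per element: 32 bits = 4 bytes
```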

### `complex`

NumPy also handles complex numbers. You can create complex arrays by setting the `dtype` parameter:

You can also create complex arrays using Python's `complex` notation (i.e. `a+bj`):
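Both approaches are sketched below (array names are illustrative):

```python
import numpy as np

# Using the dtype parameter
z1 = np.zeros(3, dtype=np.complex128)

# Using Python's complex literals
z2 = np.array([1 + 2j, 3 - 1j])

print(z1, z1.dtype)
print(z2.real, z2.imag)
```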

Note that if you want to create a `complex` array using the `arange` function, it is safer to create an array of real numbers and then add it to, or multiply it by, the needed `complex` number:
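A possible sketch, building the values 2, 2+1j, 2+2j from a real `arange`:

```python
import numpy as np

t = np.arange(0, 3)   # real values 0, 1, 2
z = 1j * t + 2        # imaginary part from t, real part 2
print(z)
```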

## Arithmetic operations

All the usual arithmetic operators (`+`, `-`, `*`, `/`, `//`, `**`, etc.) can be used with `ndarray`s. They apply `elementwise`:
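A quick illustration on two arrays of the same shape (values are arbitrary):

```python
import numpy as np

a = np.array([14, 23, 32, 41])
b = np.array([5, 4, 3, 2])
print("a + b  =", a + b)
print("a - b  =", a - b)
print("a * b  =", a * b)
print("a / b  =", a / b)
print("a // b =", a // b)
print("a % b  =", a % b)
print("a ** b =", a ** b)
```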

In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called _broadcasting_ rules:

### First rule broadcasting

1. If the arrays do not have the same rank, then a 1 will be prepended to the smaller ranking arrays until their ranks match.

Now let's try to add a 1D array of shape `(5,)` to this 3D array of shape `(1,1,5)`. Applying the first rule of broadcasting!
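A sketch of this first rule, assuming the 3D array `h` is built with `arange`:

```python
import numpy as np

h = np.arange(5).reshape(1, 1, 5)
v = np.array([10, 20, 30, 40, 50])  # shape (5,), treated as (1, 1, 5)
result = h + v
print(result)
```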

### Second rule broadcasting

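2. Arrays with a 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is repeated along that dimension.

Let's try to add a 2D array of shape `(2,1)` to this 2D `ndarray` of shape `(2, 3)`. NumPy will apply the second rule of broadcasting: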
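A sketch of this case, assuming the `(2, 3)` array `k` is built with `arange`:

```python
import numpy as np

k = np.arange(6).reshape(2, 3)
col = np.array([[100], [200]])  # shape (2, 1), repeated along the columns
result = k + col
print(result)
```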

Combining rules 1 & 2, we can do this:

And also, very simply:
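Both situations are sketched below: a 1D array added to a 2D array (rules 1 and 2 combined), then a plain scalar:

```python
import numpy as np

k = np.arange(6).reshape(2, 3)

# Shape (3,) becomes (1, 3) by rule 1, then acts as (2, 3) by rule 2
combined = k + np.array([100, 200, 300])

# A scalar is broadcast to every element
shifted = k + 1000

print(combined)
print(shifted)
```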

### Third rule broadcasting

3. After rules 1 & 2, the sizes of all arrays must match.
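When the sizes still do not match, NumPy raises a `ValueError`, as this sketch shows (the flag `broadcast_ok` is only there for illustration):

```python
import numpy as np

k = np.arange(6).reshape(2, 3)
try:
    k + np.array([33, 44])  # shapes (2, 3) and (2,) are incompatible
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print(err)
```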

Broadcasting rules are used in many NumPy operations, not just arithmetic operations, as we will see below. For more details about broadcasting, check out [the documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html).

### Upcasting rule

When trying to combine arrays with different `dtype`s, NumPy will `upcast` to a type capable of handling all possible values (regardless of what the _actual_ values are).
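A sketch of upcasting, assuming a `uint8` array combined first with an `int8` array and then with a float:

```python
import numpy as np

k1 = np.arange(0, 5, dtype=np.uint8)
k2 = k1 + np.array([5, 6, 7, 8, 9], dtype=np.int8)
print(k2.dtype, k2)

k3 = k1 + 1.5
print(k3.dtype, k3)
```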

**Note**: `int16` is required to represent all _possible_ `int8` and `uint8` values (from -128 to 255), even though in this case a `uint8` would have sufficed.

## Conditional operators

The conditional operators also apply elementwise:

And using broadcasting:
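Both cases are sketched below with an arbitrary array `m`:

```python
import numpy as np

m = np.array([20, -5, 30, 40])

# Elementwise comparison with another sequence of the same length
elementwise = m < [15, 16, 35, 36]

# Comparison with a scalar, which is broadcast to every element
broadcasted = m < 25

print(elementwise)
print(broadcasted)
```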

### Indexing using conditional operators

Conditional operators are most useful in conjunction with boolean indexing, where a boolean array selects the elements to keep.
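For instance, to keep only the values below 25 (array values are arbitrary):

```python
import numpy as np

m = np.array([20, -5, 30, 40])
mask = m < 25        # boolean array
selected = m[mask]   # keeps elements where the mask is True
print(selected)
```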

## Commonly used functions

Many mathematical and statistical functions are available for `ndarray`s.

### Mathematical operators
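A sketch applying a few elementwise functions to an array that contains a negative value; the selection of functions is arbitrary, and `results` is only used to keep the outputs:

```python
import numpy as np

a = np.array([[-2.5, 3.1, 7.0], [10.0, 11.0, 12.0]])
results = {}
for func in (np.abs, np.sqrt, np.exp, np.log, np.sign, np.ceil):
    # sqrt and log emit a RuntimeWarning for the negative value
    results[func.__name__] = func(a)
    print(func.__name__)
    print(results[func.__name__])
```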

The two warnings are due to the fact that `sqrt()` and `log()` are undefined for negative numbers, which is why there is a `np.nan` value in the first cell of the output of these two functions.

### Statistical operators
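For example, some statistics available as `ndarray` methods, starting with the mean (array values are arbitrary):

```python
import numpy as np

a = np.array([[-2.5, 3.1, 7.0], [10.0, 11.0, 12.0]])
mean_value = a.mean()
print("mean =", mean_value)
print("min  =", a.min())
print("max  =", a.max())
print("sum  =", a.sum())
print("std  =", a.std())
```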

**Note**: This computes the mean of all elements in the `ndarray`, regardless of its shape.

These functions accept an optional argument `axis` which lets you ask for the operation to be performed on elements along the given axis. For example:

Sum the elements across matrices:

Sum the rows in each matrix:

Sum the columns in each matrix:

We can also sum over multiple axes:
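The sketch below uses a 3D array seen as 2 matrices of 3 rows and 4 columns. The mapping between axis numbers and "matrices", "rows" and "columns" is this example's own reading: the axis given to `sum()` is the one that disappears from the result.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

s0 = c.sum(axis=0)        # collapse the matrix axis -> shape (3, 4)
s1 = c.sum(axis=1)        # collapse the row axis -> shape (2, 4)
s2 = c.sum(axis=2)        # collapse the column axis -> shape (2, 3)
s02 = c.sum(axis=(0, 2))  # collapse matrices and columns -> shape (3,)

print(s0, s1, s2, s02, sep="\n")
```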

### Binary functions

There are also many binary ufuncs, that apply elementwise on two `ndarray`s. Broadcasting rules are applied if the arrays do not have the same shape:
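A few binary ufuncs, sketched on two arbitrary arrays:

```python
import numpy as np

a = np.array([1, -2, 3, 4])
b = np.array([2, 8, -1, 7])

added = np.add(a, b)          # same as a + b
greater = np.greater(a, b)    # same as a > b
maximum = np.maximum(a, b)    # elementwise maximum
signed = np.copysign(a, b)    # magnitude of a, sign of b

print(added, greater, maximum, signed, sep="\n")
```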

## Interpolation

Starting with a small example:
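A minimal sketch of linear interpolation with `np.interp()`, assuming three known points `(xp, fp)`:

```python
import numpy as np

xp = [1, 2, 3]   # known x-coordinates (must be increasing)
fp = [3, 2, 0]   # known values at these coordinates

single = np.interp(2.5, xp, fp)
several = np.interp([0, 1, 1.5, 2.72, 3.14], xp, fp)

print(single)
print(several)  # outside [1, 3] the end values are used
```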

We can plot an interpolant to the sine function:
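The sketch below only computes the interpolant from 10 samples of the sine function; the commented lines suggest how it could be plotted with Matplotlib, which is assumed to be installed:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 10)      # coarse sampling
y = np.sin(x)
xvals = np.linspace(0, 2 * np.pi, 50)  # finer grid
yinterp = np.interp(xvals, x, y)

# import matplotlib.pyplot as plt
# plt.plot(x, y, "o")
# plt.plot(xvals, yinterp, "-x")
# plt.show()
```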

Interpolation with periodic x-coordinates:

Complex interpolation:
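Both cases are sketched here: the `period` argument of `np.interp()` handles periodic x-coordinates (angles in degrees in this example), and `fp` may hold complex values.

```python
import numpy as np

# Periodic x-coordinates with a period of 360 degrees
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
periodic = np.interp(x, xp, fp, period=360)
print(periodic)

# Complex values
xc = [1.5, 4.0]
xpc = [2, 3, 5]
fpc = [1.0j, 0, 2 + 3j]
complex_result = np.interp(xc, xpc, fpc)
print(complex_result)
```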

## Key differences with MATLAB

NumPy and MATLAB have a lot in common. However, NumPy was created to work with Python, not to be a pure MATLAB clone. This guide will help MATLAB users get started with NumPy.

It is important to mention some key differences:

| MATLAB                                                                      | NumPy                                                                      |
| :-------------------------------------------------------------------------- | :------------------------------------------------------------------------- |
| <ul><li>It uses multidimensional arrays as the basic type.</li><li>Scalars are also multidimensional arrays with one element.</li><li>Array assignments are 2D double precision floats by default.</li><li>You can specify the number of dimensions and type of an array.</li><li>2D array operations follow matrix operations in linear algebra.</li></ul> | <ul><li>NumPy's basic type: multidimensional `array`.</li><li>Specify dimensions and type: optional.</li><li>Element-by-element operations: multiplying 2D arrays with `*` is not a matrix multiplication -- it's an element-by-element multiplication.</li><li>Matrix multiplication uses the `@` operator.</li></ul> | <!-- Line 1 end -->
| <ul><li>MATLAB numbers indices from 1; `a(1)` is the first element.</li></ul> | <ul><li>NumPy, like Python, numbers indices from 0; `a[0]` is the first element.</li></ul> | <!-- Line 2 end -->
| <ul><li>MATLAB's scripting language was created for linear algebra so the syntax for some array manipulations is more compact than NumPy's.</li><li>The API for adding GUIs and creating full-fledged applications is more or less an afterthought.</li></ul> | <ul><li>NumPy is based on Python, a general-purpose language. The advantage to NumPy is access to Python libraries including: SciPy, Matplotlib, Pandas, OpenCV, and more.</li><li>Python is often embedded as a scripting language in other software, allowing NumPy to be used there too.</li></ul> | <!-- Line 3 end -->
| <ul><li>MATLAB array slicing uses pass-by-value semantics, with a lazy copy-on-write scheme to prevent creating copies until they are needed.</li></ul> | <ul><li>NumPy array slicing uses pass-by-reference, that does not copy the arguments.</li><li>Slicing operations are views into an array.</li></ul> |
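Two of these differences can be checked quickly on the NumPy side, as in this sketch (matrix values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

elementwise = A * B  # element-by-element product
matmul = A @ B       # matrix product
print(elementwise)
print(matmul)

# Slices are views: modifying the slice modifies the original array
row = A[0, :]
row[0] = 99
print(A)

# Use copy() to get independent data
independent = A[1, :].copy()
independent[0] = -1
print(A)
```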

## References to go further with NumPy

- [NumPy for MATLAB users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)
- [NumPy's documentation](https://numpy.org/doc/stable/reference/index.html#reference)
- [SciPy's documentation](https://scipy.org/)

# Pandas

The `pandas` library provides high-performance, easy-to-use data structures and data analysis tools. The main data structure is the `DataFrame`, which we can think of as an in-memory 2D table (like a spreadsheet, with column names and row labels). Many features available in Excel are available programmatically, such as creating pivot tables, computing columns based on other columns, plotting graphs, etc. We can also group rows by column value, or join tables much like in SQL. Pandas is also great at handling time series.

This tutorial explains some differences from, and key features shared with, spreadsheet programs like Excel, Google Sheets, LibreOffice Calc, Apple Numbers and other Excel-compatible spreadsheet software. In this tutorial we will go through the main concepts of the pandas library. If needed, the different references are available at the end of this section.

To get started, we need the following important packages:
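The usual imports, with their conventional aliases:

```python
import numpy as np
import pandas as pd

print(np.__version__, pd.__version__)
```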

## Object creation

Creating a `Series` by passing a list of values, letting pandas create a default integer index:
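For example (values are arbitrary, `np.nan` marks a missing value):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
```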

Creating a `DataFrame` by passing a NumPy array, with datetime index and labeled columns:
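A sketch with six consecutive days as index and random values (the dates and column names are arbitrary):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
print(df)
```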

Creating a `DataFrame` by passing a dictionary of objects that can be converted into a series-like structure:
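A sketch mixing several kinds of objects; scalars are repeated to match the length of the other columns (all values are illustrative):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
print(df2)
```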

df2.dtypes

If you're using IPython, tab completion for column names (as well as public attributes) is automatically enabled. Just type `df2.<TAB>` to see them.

## Data viewing

See the [Basics section](https://pandas.pydata.org/docs/user_guide/basics.html#basics).

The columns of the resulting `DataFrame` have different `dtypes`:

`info()` prints a concise summary of your data (index, column names, non-null counts and dtypes):

`describe()` shows a quick statistical summary of your data:
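Both methods are sketched below on a small illustrative DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=["A", "B", "C"])

df.info()              # prints index, columns, non-null counts, dtypes
stats = df.describe()  # count, mean, std, min, quartiles, max
print(stats)
```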

Here is how to view the top and bottom rows of the frame:

Display the index, columns:
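A sketch of these viewing tools, assuming a DataFrame with a datetime index:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

top = df.head()       # first 5 rows by default
bottom = df.tail(3)   # last 3 rows
print(top)
print(bottom)
print(df.index)
print(df.columns)
```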

## Data selection

pandas provides optimized data access methods: `.at`, `.iat`, `.loc` and `.iloc`. To go further, see the indexing documentation [Indexing and Selecting Data](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing) and [MultiIndex / Advanced Indexing](https://pandas.pydata.org/docs/user_guide/advanced.html#advanced).

### Selecting columns

Selecting a single column or a group of columns using `[]`:
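For example (a single label returns a `Series`, a list of labels returns a `DataFrame`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

col_a = df["A"]          # Series
cols_ab = df[["A", "B"]]  # DataFrame
print(col_a)
print(cols_ab)
```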

### Selection by label using `.loc`  and `.at` methods

See more in [Selection by Label](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-label).

For getting a row using its label:

Selecting on a multi-axis by label:

You can slice over data using labels; both endpoints are _included_:

For getting fast access to a scalar (equivalent to the prior method):
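The label-based selections of this subsection can be sketched as follows, assuming a DataFrame indexed by dates:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

row = df.loc[dates[0]]                               # one row, by label
sub = df.loc[:, ["A", "B"]]                          # all rows, two columns
sliced = df.loc["20130102":"20130104", ["A", "B"]]   # both endpoints included
value = df.at[dates[0], "A"]                         # fast scalar access

print(row, sub, sliced, value, sep="\n")
```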

### Selection by position using `.iloc`  and `.iat` methods

See more in [Selection by Position](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-integer).

Select via the position of the passed integers:

By lists of integer position locations, similar to the NumPy/Python style:

By integer slices, acting similar to NumPy/Python:

For getting fast access to a scalar (equivalent to the prior method):
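The position-based selections of this subsection can be sketched as follows (positions start at 0 and slice end points are excluded):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

row = df.iloc[3]                        # fourth row
picked = df.iloc[[1, 2, 4], [0, 2]]     # lists of positions
block = df.iloc[3:5, 0:2]               # integer slices
value = df.iat[1, 1]                    # fast scalar access

print(row, picked, block, value, sep="\n")
```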

### Conditional indexing

Selecting values from a DataFrame where a boolean condition is met:
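A sketch with arbitrary values: a condition on one column filters rows, while a condition on the whole DataFrame replaces the non-matching values by `NaN`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(-6, 6).reshape(4, 3), columns=list("ABC"))

rows = df[df["A"] > 0]   # keep rows where column A is positive
masked = df[df > 0]      # keep positive values, NaN elsewhere
print(rows)
print(masked)
```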

Using the `isin()` method for filtering:
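For instance, keeping the rows whose column `E` holds one of two given labels (data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "E": ["one", "two", "three", "four"]})
filtered = df[df["E"].isin(["two", "four"])]
print(filtered)
```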

## Data setting

You can modify the values of elements using the `.at`, `.iat`, `.loc` and `.iloc` methods in the same way as presented for data selection.

You can modify an entire column with one of the following:

- A single value, which will be assigned to the whole column.
- A list, to assign different values along the column. **The size must be the same.**
- A Series, which will only assign values at matching index labels (`NaN` is assigned where an index label of the DataFrame is not included in the Series).

If you assign a column which is not included in the DataFrame, it will be added.
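These cases are sketched below on a small illustrative DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])

df["a"] = 0                                         # single value for the whole column
df["a"] = [10, 20, 30]                              # list of the same size
df["a"] = pd.Series([1.5, 2.5], index=["x", "z"])   # Series: "y" gets NaN
df["b"] = ["u", "v", "w"]                           # new column is added
df.at["x", "b"] = "changed"                         # single element
print(df)
```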

### Set and reset index
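A possible illustration with `set_index()` and `reset_index()`, using a hypothetical table of buses:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bus 1", "Bus 2", "Bus 3"], "vn_kv": [20.0, 0.4, 0.4]})

indexed = df.set_index("name")    # the "name" column becomes the index
restored = indexed.reset_index()  # back to a default integer index
print(indexed)
print(restored)
```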

## Data processing

### Missing data

pandas primarily uses the value `NaN` to represent missing data. It is by default not included in computations.

See the [Missing Data section](https://pandas.pydata.org/docs/user_guide/missing_data.html#missing-data).

Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data:

To drop any rows that have missing data:

To drop any columns that have missing data:

To drop any rows that have only missing data:

Filling missing data:

To get the boolean mask where values are `nan`:
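The missing-data operations listed above can be sketched as follows. The DataFrame and its labels are hypothetical; `reindex()` is used here to add a row that contains only `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column "a" has one missing value.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# Reindexing returns a copy; the label "new" did not exist, so its row is all NaN.
df2 = df.reindex(["x", "y", "z", "new"])

rows_any = df2.dropna()            # drop rows with at least one NaN
cols_any = df.dropna(axis=1)       # drop columns with at least one NaN
rows_all = df2.dropna(how="all")   # drop rows where every value is NaN
filled = df2.fillna(0)             # replace NaN with a chosen value
mask = df2.isna()                  # boolean mask, True where the value is NaN
```

None of these calls modify `df` or `df2`; each one returns a new object.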

### Duplicate data

You can find duplicate data using [`duplicated()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html).

This method detects every duplicated row. By default, the first occurrence is marked `False` and all the others `True`.

By setting `keep='last'`, the last occurrence of each set of duplicated values is marked `False` and all the others `True`.

By setting `keep=False`, all duplicates are marked `True`.

To find duplicates on a specific subset of columns, use `subset`, for example `df.duplicated(subset=['brand'])`.

To drop duplicate rows, you can use [`drop_duplicates()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html?highlight=drop_duplicat#pandas.DataFrame.drop_duplicates).

This method behaves much like `duplicated()` and accepts the same `keep` and `subset` parameters.
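A short sketch of `duplicated()` and `drop_duplicates()`, assuming a hypothetical DataFrame with a `brand` and a `model` column:

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical.
df = pd.DataFrame(
    {"brand": ["A", "B", "A", "A"], "model": ["m1", "m2", "m1", "m3"]}
)

first = df.duplicated()                      # first occurrence is False
last = df.duplicated(keep="last")            # last occurrence is False
all_dup = df.duplicated(keep=False)          # every duplicated row is True
by_brand = df.duplicated(subset=["brand"])   # compare the "brand" column only

deduped = df.drop_duplicates()               # keeps rows 0, 1 and 3
deduped_brand = df.drop_duplicates(subset=["brand"], keep="last")  # keeps rows 1 and 3
```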

### Concatenate dataframe

[`pd.concat()`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) combines several DataFrames using their common column or index labels.

Unlike NumPy, pandas can concatenate DataFrames whose shapes do not match: `NaN` values are added to keep the resulting DataFrame consistent.

Combine two DataFrames using their common columns:

The last two columns of the resulting DataFrame contain `NaN` values.

The resulting DataFrame keeps the indexes of the source DataFrames. If you want to create a new index, set `ignore_index=True`.

Concatenate two DataFrames using their common rows by setting `axis=1`:

Finally, if you only want to keep the common rows or columns (depending on which axis is set), you can set `join="inner"`.
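The `pd.concat()` options discussed in this section can be sketched as follows. The two DataFrames are hypothetical and share only the column `b`:

```python
import pandas as pd

# Hypothetical sample data: only column "b" is common to both DataFrames.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# Stack rows; columns missing from one DataFrame are filled with NaN.
stacked = pd.concat([df1, df2])

# Same, but with a fresh 0..n-1 index instead of the source indexes.
renumbered = pd.concat([df1, df2], ignore_index=True)

# Place the DataFrames side by side, aligned on their row index.
side_by_side = pd.concat([df1, df2], axis=1)

# Keep only the labels common to both DataFrames (here, column "b").
inner = pd.concat([df1, df2], join="inner")
```

In `stacked`, the source indexes are kept, so the labels `0` and `1` each appear twice.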

It is also possible to combine DataFrames through the common values of one of their columns using the [`merge()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html).

**Pay attention: if the key columns of both DataFrames contain duplicate values, the merge creates duplicated rows (one for each matching pair).**
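A minimal sketch of `merge()` and of the duplicate-key behaviour mentioned above. The DataFrames and the column name `key` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: the key "k2" appears twice on each side.
left = pd.DataFrame({"key": ["k1", "k2", "k2"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["k2", "k2", "k3"], "right_val": [10, 20, 30]})

# Default inner join on "key": only keys present on both sides are kept.
merged = left.merge(right, on="key")

# "k2" matches 2 rows on the left and 2 on the right, giving 2 x 2 = 4 rows.
n_rows = len(merged)
```

If this multiplication of rows is not wanted, one option is to call `drop_duplicates()` on the key column of one DataFrame before merging.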

## Statistics operators

The statistical operators available in NumPy are also available in pandas. Operations in general _exclude_ missing data.
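A short sketch showing how missing data is skipped by default, using a hypothetical DataFrame with one `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

col_means = df.mean()                  # per column; the NaN in "a" is ignored
row_sums = df.sum(axis=1)              # per row; the NaN is ignored as well
strict = df["a"].mean(skipna=False)    # NaN, because missing data is not skipped
```

The `skipna` parameter controls this behaviour; it defaults to `True`.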

## [`apply()` function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html)

The pandas `apply()` function applies a function along an axis of a DataFrame. It returns a new Series or DataFrame holding the results, and leaves the original object unchanged.

The `for` loop iterates through the `numbers` list and calls the `square()` function on each element. The `apply()` function is not defined for lists: it is a method of pandas DataFrames and Series.

`apply()` is usually more concise than a `for` loop and can be more efficient in some cases, since you do not have to build the result list yourself. However, a `for` loop can be more readable and easier to understand.


Using a `for` loop:

Using the `apply()` function:
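The two approaches can be sketched side by side. The `square()` function and the `numbers` list follow the names used in the text; their exact definitions here are an assumption:

```python
import pandas as pd


def square(x):
    """Return x squared (assumed definition of the square() function)."""
    return x ** 2


numbers = [1, 2, 3, 4]

# With a for loop: call square() on each element and collect the results.
squared_loop = []
for n in numbers:
    squared_loop.append(square(n))

# With apply(): lists have no apply(), so the data is first put in a Series.
squared_apply = pd.Series(numbers).apply(square)
```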

The `apply()` function can be a powerful tool for applying functions to data structures in Python. It is important to choose the right tool for the application, and to understand the trade-offs between performance and readability.

First we define the function we need to make it work:

Then we create a DataFrame to compare the different methods and apply the functions:
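As an illustration of these two steps, here is a sketch that applies a row-wise function to a DataFrame in three ways. The function `apparent_power()` and the column names `p_mw` and `q_mvar` are hypothetical, chosen only for this example:

```python
import pandas as pd


def apparent_power(row):
    """Hypothetical helper: magnitude computed from two columns of a row."""
    return (row["p_mw"] ** 2 + row["q_mvar"] ** 2) ** 0.5


# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"p_mw": [3.0, 6.0], "q_mvar": [4.0, 8.0]})

# 1) Explicit loop over the rows.
df["s_loop"] = [apparent_power(row) for _, row in df.iterrows()]

# 2) apply() with axis=1 calls the function once per row.
df["s_apply"] = df.apply(apparent_power, axis=1)

# 3) Vectorized column arithmetic, usually the fastest option when it is possible.
df["s_vector"] = (df["p_mw"] ** 2 + df["q_mvar"] ** 2) ** 0.5
```

All three columns hold the same values; the methods differ only in readability and speed.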

### `lambda` function

A `lambda` function is a small, anonymous function that can be passed as an argument to other functions. Lambda functions are often used in Python for simple tasks that do not justify defining a full named function.

A lambda function is written `lambda arguments: expression`. The `arguments` are the input values of the function, and the `expression` is the code that is evaluated and returned when the function is called.
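A minimal sketch of the `lambda arguments: expression` form, on its own and combined with `apply()`. The data is hypothetical:

```python
import pandas as pd

# A lambda bound to a name behaves like a one-line function.
square = lambda x: x ** 2

# More commonly, the lambda is passed directly as an argument.
s = pd.Series([1, 2, 3])
squared = s.apply(lambda x: x ** 2)

# With axis=1, the lambda receives one row at a time.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
row_total = df.apply(lambda row: row["a"] + row["b"], axis=1)
```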

TODO:

- [ ] Build a short tutorial (coming soon...)

## References for Pandas

If you want to get more insights and learn more about pandas, please visit the following very good references:

- [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html)
- [Comparison with spreadsheets](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_spreadsheets.html)
- [Pandas cheatsheet](https://github.com/pandas-dev/pandas/blob/main/doc/cheatsheet/Pandas_Cheat_Sheet.pdf)

# Acknowledgments for this tutorial preparation

**Note**: These tutorials were greatly inspired by [Aurélien Géron's NumPy notebook](https://colab.research.google.com/github/ageron/handson-ml3/blob/main/index.ipynb) and by all his interactive notebooks. Do not hesitate to go through them when you have time during the semester.

## References

- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron. Copyright 2023 Aurélien Géron, 978-1-098-12597-4.
```latex
@book{geron2022hands,
  title={Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow},
  author={G{\'e}ron, Aur{\'e}lien},
  year={2022},
  publisher={O'Reilly Media, Inc.}
}
```
- [Code examples](https://github.com/ageron/handson-ml3)

## Notebook tutorials to go further

- [Power System Notebooks](./index.ipynb)