# Python Exam (Masters/DU, 2024): Bees Dynamics Analysis

> + **Allocated time:** 1h30
> + **Drop your final notebook** *renamed with your family name* at the end of the exam on this [page](https://cernbox.cern.ch/s/CJr5MMZg1Jz223G)
> + **Allowed documents:** documentation of [python](https://docs.python.org/3/), [numpy](https://numpy.org/devdocs/user/index.html), [matplotlib](https://matplotlib.org/contents.html), [pandas](https://pandas.pydata.org/pandas-docs/stable/) and [scipy](https://docs.scipy.org/doc/scipy/reference/) as well as the documentation from the notebook interface using `Shift+tab` keyboard shortcut, `help(module)` or `help(function)`.
> 
> The final mark will be a number between 0 (very bad) and 20 (very good). The evaluation of this exam takes into account mainly the correctness of the answers, but also the clarity of the explanations and the quality of the code.

## General information

### Data description

This exam proposes to analyze data describing the 2D motion of a system of two bees in different situations. These data were generated by a numerical simulation with tunable parameters, such as an attraction force between the two bees, the presence of a field of flowers (which also attracts the bees), or some random noise in the trajectory of each bee. More precisely, the following parameters will be considered:
 + `BOX`       : if enabled, there is a box (*i.e.* a limited area) in which the bees can live (bees are simply reflected when hitting a wall of the box)
 + `LBOX`      : length of the box, which is chosen to be a square
 + `ATTRACTION`: if enabled, bees attract each other (spring force)
 + `g_ATTRA`   : strength of the mutual attraction
 + `FIELD`     : if enabled, there is a field of flowers organised on a regular lattice
 + `g_FIELD`   : strength of individual flower attraction
 + `NOISE`     : if enabled, some random noise is added in individual bee trajectories
 
*N.B.* For this exam, there is no need to know the exact mathematical definition of these coefficients, or the exact equations describing the various forces.

The data contains the following variables:
 + time (in ms) : `tms`
 + x-position of bee 1 and bee 2: `Rx1` and `Rx2`
 + y-position of bee 1 and bee 2: `Ry1` and `Ry2`
 + x- and y- velocities of bee 1 and bee 2 : `Vx1`, `Vx2`, `Vy1`, `Vy2`


### Exam organisation

The first two sections of the exam study data that were already generated, while the third section focuses on data generation using an external module, in order to see how the model parameters affect the system.

### Import packages and plot cosmetics

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
import matplotlib as mpl
mpl.rcParams['legend.frameon'] = False
mpl.rcParams['legend.fontsize'] = 'xx-large'
mpl.rcParams['xtick.labelsize'] = 16
mpl.rcParams['ytick.labelsize'] = 16
mpl.rcParams['axes.titlesize'] = 18
mpl.rcParams['axes.labelsize'] = 18
mpl.rcParams['lines.linewidth'] = 2.5
mpl.rcParams['figure.figsize'] = (10, 7)

## 1. Analyzing existing data [7 + 1 bonus points]

**1.1 [1 pts]** Load the following file as a `pandas` dataframe: `beesMotion_BOX-1_LBOX-300_ATTRACTION-1_gATTRA-0.10_FIELD-0_gFIELD-2.80_NOISE-0_RUN-0.csv`, and print it. How many rows does it contain, and what is the time interval between rows?
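
As a minimal sketch of the loading step, assuming the file is a plain comma-separated table with a header row (its exact layout is not shown in this document), the example below reads a small in-memory stand-in rather than the exam file. The column names follow the data description above, but the values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the exam file: made-up values, reduced set of columns
csv_text = """tms,Rx1,Ry1,Rx2,Ry2
0.0,1.0,2.0,3.0,4.0
0.5,1.1,2.1,2.9,3.9
1.0,1.2,2.2,2.8,3.8
"""

# For a real file, pass its name to pd.read_csv instead of the StringIO object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

n_rows = len(df)                   # number of rows
dt = df["tms"].diff().dropna()     # time step between consecutive rows
print(n_rows, dt.unique())
```

Checking `dt.unique()` rather than a single difference shows whether the time step is really constant over the whole file.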

**1.2 [1 pts]** The names of the datasets follow this structure: `prefix_PARAMETER1-val1_PARAMETER2-val2..._RUN-runNumber.csv`. 
Write a function `param_values()` that takes the name of the csv file as argument and returns a dictionary storing the simulation parameter values `{param:val}`. *Hint*: use the `split()` function. Don't forget to remove the prefix "beesMotion_" and the suffix ".csv".
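
One possible sketch of such a function is given below. It assumes that every token between underscores has the form `NAME-value`, that values are non-negative numbers (so the first `-` separates name from value), and that returning floats is acceptable; the optional `prefix` and `suffix` arguments are an illustrative choice, not a requirement of the question.

```python
def param_values(filename, prefix="beesMotion_", suffix=".csv"):
    """Return {param: value} parsed from a simulation file name."""
    core = filename
    # Strip the prefix and the suffix when they are present
    if core.startswith(prefix):
        core = core[len(prefix):]
    if core.endswith(suffix):
        core = core[:-len(suffix)]

    params = {}
    for token in core.split("_"):
        # Split on the first '-' only: "gATTRA-0.10" -> ("gATTRA", "0.10")
        name, value = token.split("-", 1)
        params[name] = float(value)
    return params


fname = ("beesMotion_BOX-1_LBOX-300_ATTRACTION-1_gATTRA-0.10_"
         "FIELD-0_gFIELD-2.80_NOISE-0_RUN-0.csv")
params = param_values(fname)
print(params)
```

Note that the keys are taken from the file name as written (`gATTRA`, `gFIELD`), which differ slightly from the parameter names `g_ATTRA` and `g_FIELD` used in the description.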

**1.3 [1 pts]** Plot the trajectories (`Rx` against `Ry`) of the two bees on the same plot. Don't forget to add a legend!

**1.4 [1 pts]** Add a new column to the dataframe, containing the distance between the two bees. 
*Reminder*: the distance between two points is $\sqrt{(r_{x, 1} - r_{x,2})^2 + (r_{y,1}-r_{y,2})^2}$. *Hint: $r_{x, 1}$ corresponds to `Rx1` in the dataframe, etc.*
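
The column arithmetic behind the formula above can be sketched on a tiny made-up dataframe. The column name `R12` matches the one used in the next question; the position values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up positions, only to illustrate the column arithmetic
df = pd.DataFrame({
    "Rx1": [0.0, 3.0], "Ry1": [0.0, 0.0],
    "Rx2": [3.0, 3.0], "Ry2": [4.0, 0.0],
})

# np.hypot(a, b) evaluates sqrt(a**2 + b**2) element-wise
df["R12"] = np.hypot(df["Rx1"] - df["Rx2"], df["Ry1"] - df["Ry2"])
print(df)
```

Operating on whole columns avoids an explicit loop over rows.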

**1.5 [1 pts]** Plot the distance `R12` between the two bees as a function of time. Don't forget to add axis labels, a legend, etc.

**1.6 [2pts]** We would like to understand how the positions and velocities of bee 1 and bee 2 are correlated. Compute the norm of the velocity $v = \sqrt{v_x^2 + v_y^2}$ and the norm of the position $r = \sqrt{r_x^2 + r_y^2}$ for each bee. Produce a plot of $r_1$ versus $r_2$ and another plot of $v_1$ versus $v_2$. What do you conclude about the correlation between bee 1 and bee 2?

**BONUS QUESTION [1pts]**. Find the total duration (in milliseconds) and the fraction of time for which the bees are close to each other, *i.e.* with a distance lower than 5. *Hint: use a mask*
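
The mask technique mentioned in the hint can be sketched as follows, on made-up numbers. The sketch assumes a constant time step between rows, so that each selected row counts for one step.

```python
import pandas as pd

# Made-up time stamps (ms) and distances, for illustration only
df = pd.DataFrame({
    "tms": [0.0, 1.0, 2.0, 3.0, 4.0],
    "R12": [10.0, 4.0, 3.0, 6.0, 2.0],
})

close = df["R12"] < 5            # boolean mask: True where the bees are close
dt = df["tms"].diff().iloc[1]    # time step, assumed constant

duration_ms = close.sum() * dt   # number of selected rows times the time step
fraction = close.mean()          # mean of a boolean mask = fraction of True
print(duration_ms, fraction)
```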

## 2. Adding an extra dataset with noise [8 points]

We will now work with an additional simulation with noise added to the bees' motion (while the previous one was without noise):
 + no noise (previous dataset): `beesMotion_BOX-1_LBOX-300_ATTRACTION-1_gATTRA-0.10_FIELD-0_gFIELD-2.80_NOISE-0_RUN-0.csv`
 + noise (new dataset): `beesMotion_BOX-1_LBOX-300_ATTRACTION-1_gATTRA-0.10_FIELD-0_gFIELD-2.80_NOISE-1_RUN-2.csv`
 
**2.1 [2pts]** After loading the new dataset, plot the trajectory of bee 1 (`Rx1` vs `Ry1`) with and without noise on the same plot. Don't forget the legend, axis labels, etc.

**2.2 [2pts]** Use `df.columns` to loop over the observables $\mathcal{O}$ in the dataset, *i.e.* $\mathcal{O} \in \{x_1, x_2, y_1, y_2, v_{x,1}, v_{x,2}, v_{y,1}, v_{y,2}\}$. For each one, calculate the difference between the observable with and without noise, $\mathcal{O}_{noise} - \mathcal{O}$. You can, for example, store this difference as a new column in your dataframe.
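
A minimal sketch of such a loop is shown below on two made-up dataframes. It assumes both simulations share the same time grid and row order, so that rows can be subtracted one to one; the `d` prefix of the new column names is an arbitrary illustrative choice.

```python
import pandas as pd

# Made-up data without and with noise, on the same time grid
df = pd.DataFrame({"tms": [0, 1], "Rx1": [1.0, 2.0], "Vx1": [0.5, 0.5]})
df_noise = pd.DataFrame({"tms": [0, 1], "Rx1": [1.5, 1.0], "Vx1": [0.5, 1.0]})

# Build the list first, so that adding columns does not alter the loop
observables = [col for col in df.columns if col != "tms"]

for col in observables:
    # O_noise - O, stored next to the original observable
    df["d" + col] = df_noise[col] - df[col]

print(df)
```

The list `observables` can be reused later to loop over the same quantities when plotting.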

**2.3 [2pts]** Make a plot of the difference as a function of time for each observable (either on separate plots or on the same plot is fine, so long as you make a nice legend). Making use of a loop is highly recommended, doing a large number of copy-pastes will be penalised.

**2.4 [2pts]** Plot the histogram of the distance introduced by the noise (defined as $\sqrt{(r_{x, 1} - r_{x,1,noise})^2 + (r_{y,1}-r_{y,1, noise})^2}$) for bee 1. Do the same for bee 2 and overlay the two histograms. Is the noise similar for the two bees? Make sure the two histograms use the same binning, and that you add a clear legend and label your axes.
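
Sharing the binning between two histograms amounts to building the bin edges once and passing the same array to both calls. The sketch below uses random numbers as a stand-in for the two distance samples; the sample sizes, widths, number of bins and axis labels are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random stand-ins for the noise-induced distances of bee 1 and bee 2
rng = np.random.default_rng(0)
d1 = np.abs(rng.normal(0.0, 1.0, 500))
d2 = np.abs(rng.normal(0.0, 1.5, 500))

# One set of 30 bins (31 edges) spanning both samples
bins = np.linspace(0.0, max(d1.max(), d2.max()), 31)

fig, ax = plt.subplots()
n1, edges1, _ = ax.hist(d1, bins=bins, alpha=0.5, label="bee 1")
n2, edges2, _ = ax.hist(d2, bins=bins, alpha=0.5, label="bee 2")
ax.set_xlabel("distance introduced by the noise")
ax.set_ylabel("entries")
ax.legend()
```

Letting each call choose its own bins would give different bin widths, which makes the bin heights of the two histograms hard to compare.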

## 3. Simulating data using an external package [5 + 2 bonus pts]

**3.1 [2pts]** Import the external package `bees_simulation.py` as `bs` and run a simulation with the function `run_bees_simulation`, using the exact same setup as in question **1.2** (*i.e.* `noise=0, field=0`), but *with 10000 steps*. Make sure the two dataframes are identical by plotting, for example, the difference of the two x-positions of bee 1 as a function of time.

*Hint:* the documentation of a function can be obtained using `Shift+Tab`. You can use the argument `run` to make sure you can identify your newly generated sample.

**3.2 [2pts]** Generate simulations (without noise) with a flower field enabled, for 3 different values of the interaction strength `g_FIELD`, namely `0.0`, `2.8` and `5`.

**3.3 [1 pts]** Overlay the bee 1 trajectory of each simulation on the same plot, with a proper legend specifying which trajectory corresponds to which `g_FIELD` value.

**BONUS QUESTION 3.4 [2 pts]** Write a function which takes two datasets (*i.e.* two simulation setups) and plots, as a function of time, the difference between the two datasets of the distance between the bees, $r_{12} = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}$.
  
Test it on the data sets with `g_FIELD=0` and `g_FIELD=5`.
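
A possible shape for such a function is sketched below. The function names, the optional `label` and `ax` arguments and the axis labels are illustrative assumptions, and the sketch supposes that both dataframes share the same time grid; it is tried here on made-up data rather than on the simulated samples.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def bee_distance(df):
    """Distance r12 between the two bees, row by row."""
    return np.hypot(df["Rx1"] - df["Rx2"], df["Ry1"] - df["Ry2"])


def plot_distance_difference(df_a, df_b, label=None, ax=None):
    """Plot r12(df_a) - r12(df_b) versus time and return the difference."""
    if ax is None:
        _, ax = plt.subplots()
    diff = bee_distance(df_a) - bee_distance(df_b)
    ax.plot(df_a["tms"], diff, label=label)
    ax.set_xlabel("time [ms]")
    ax.set_ylabel("difference of $r_{12}$")
    if label is not None:
        ax.legend()
    return diff


# Made-up datasets sharing the same time grid
df_a = pd.DataFrame({"tms": [0.0, 1.0], "Rx1": [0.0, 0.0], "Ry1": [0.0, 0.0],
                     "Rx2": [3.0, 6.0], "Ry2": [4.0, 8.0]})
df_b = pd.DataFrame({"tms": [0.0, 1.0], "Rx1": [0.0, 0.0], "Ry1": [0.0, 0.0],
                     "Rx2": [0.0, 3.0], "Ry2": [0.0, 4.0]})

diff = plot_distance_difference(df_a, df_b, label="a - b")
```

Returning the difference in addition to drawing it makes the function easier to check numerically.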