# Data manipulation challenges <a name="0."></a>

Contents:
- [Challenges - Easy](#1.)
  - [Challenge E1 - Area of a triangle](#1.1)
  - [Challenge E2 - Parametric plots](#1.2)
- [Challenges - Medium](#2.)
  - [Challenge M1 - Double plot](#2.1)
  - [Challenge M2 - Basic Higgs decay data](#2.2)
  - [Challenge M3 - Profit margin](#2.3)
  - [Challenge M4 - Fitting a potential](#2.4)
  - [Challenge M5 - Möbius strip](#2.5)
- [Challenges - Hard](#3.)
  - [Challenge H1 - Fourier function](#3.1)
  - [Challenge H2 - Matrix multiplication](#3.2)

If you need a refresher, a link to the notebook is [here](Data.ipynb).

# 1. Challenges - Easy <a name="1."></a>

## 1.1 Challenge E1 - Area of a triangle <a name="1.1"></a>

<hr style="border:2px solid gray">

An arbitrary triangle can be described by the coordinates of its three vertices $(x_1,y_1), (x_2,y_2), (x_3,y_3)$.  The area of the triangle is given by $A = \frac{1}{2} \left| x_2y_3 - x_3y_2 - x_1y_3 + x_3y_1 + x_1y_2 - x_2y_1 \right|$.
- Write a function that returns the area of a triangle whose vertices are specified by the argument `vertices`, which is a nested list of vertex coordinates. For example, `vertices` can be `[[0,0],[1,0],[0,2]]` if the three corners of the triangle have coordinates $(0, 0), (1, 0), (0, 2)$.
- Test the area function on a triangle with known area
- Add a docstring to your function to explain to the user what the arguments of the function need to be and what it is going to return

<details>
    <summary>Hint: </summary>
    You'll need to index into the nested list to pick out each coordinate (for example, <code>v[1][0]</code> is $x_2$), and each coordinate should be a number such as a float.  Since the resulting expression will be quite long, it would be best to give the list as short a name as possible.
</details>
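
A minimal sketch of one way to lay out such a function, assuming the vertices arrive as a nested list. It takes the absolute value of the signed expression so that the order of the vertices does not matter. The name `triangle_area` is an illustrative choice, not something the notebook prescribes.

```python
def triangle_area(vertices):
    """Return the area of a triangle.

    Parameters
    ----------
    vertices : list of [x, y] pairs
        The three vertex coordinates, e.g. [[0, 0], [1, 0], [0, 2]].

    Returns
    -------
    float
        The (non-negative) area of the triangle.
    """
    # Unpack the nested list into short names to keep the formula readable.
    (x1, y1), (x2, y2), (x3, y3) = vertices
    signed = x2*y3 - x3*y2 - x1*y3 + x3*y1 + x1*y2 - x2*y1
    return 0.5 * abs(signed)


# A right-angled triangle with base 1 and height 2 has area 1.
print(triangle_area([[0, 0], [1, 0], [0, 2]]))
```

Testing against a right-angled triangle is convenient because its area is simply half the base times the height.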

<hr style="border:2px solid gray">

## 1.2 Challenge E2 - Parametric plots <a name="1.2"></a>

<hr style="border:2px solid gray">

The Butterfly Curve and Fermat Spiral are said to have beautiful graphical representations.  Using their parametric equations, use `matplotlib` to demonstrate this on separate plots.

Butterfly curve: $x = \sin(t) \left[ e^{\cos(t)} - 2 \cos (4t) - \sin^5 \left( \frac{t}{12} \right) \right] \; , \;
y = \cos(t) \left[ e^{\cos(t)} - 2 \cos (4t) - \sin^5 \left( \frac{t}{12} \right) \right]$

Fermat Spiral: $x = \frac{t}{2} \cos(t) \; , \; y = -\frac{t}{2} \sin(t)$

<details>
    <summary>Hint: </summary>
    For trigonometric functions, use <code>np.sin()</code> and <code>np.cos()</code>.  For exponentials, use <code>np.exp()</code>, which also works element-wise on arrays.
</details>
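
As a rough sketch of the general pattern, the block below evaluates the spiral equations quoted above on an array of $t$ values. The function names, the range of $t$ and the number of samples are all illustrative assumptions. The plotting helper imports `matplotlib` only when it is called, so the numerical part runs on its own.

```python
import numpy as np


def spiral_xy(t):
    """Return (x, y) for the spiral equations quoted above."""
    x = 0.5 * t * np.cos(t)
    y = -0.5 * t * np.sin(t)
    return x, y


def plot_parametric(x, y, title):
    """Draw a parametric curve on its own figure and return (fig, ax)."""
    import matplotlib.pyplot as plt  # imported here so the maths runs without it

    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_aspect("equal")  # equal scaling keeps the curve undistorted
    ax.set_title(title)
    return fig, ax


# The range of t and the number of samples are arbitrary choices.
t = np.linspace(0, 8 * np.pi, 2000)
x, y = spiral_xy(t)
```

Calling `plot_parametric(x, y, "Spiral")` and then `plt.show()` would display the curve. The same pattern applies to any other pair of parametric equations.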

[Return to contents](#0.)

<hr style="border:2px solid gray">

# 2. Challenges - Medium <a name="2."></a>

## 2.1 Challenge M1 - Double plot <a name="2.1"></a>

<hr style="border:2px solid gray">

Replicate the plots shown in the figure below.  The two plots should be produced within the same figure.

<CENTER><img src="twoplot_ex.png" style="width:80%"></CENTER>

<details>
    <summary>Hint 1: </summary>
    The *x* data can be generated using linspace().  Arrows can be rendered with the annotate() function.  In the legends, mathematical text can be generated using LaTeX syntax.
</details>

<details>
    <summary>Hint 2: </summary>
    The axis lines of a plot in matplotlib are referred to as <em>spines</em>; these can be adjusted like so:
    
    ax1.spines['left'].set_position('center')
    ax1.spines['right'].set_color('none')
</details>

<details>
    <summary>Hint 3: </summary>
    The intervals along an axis are referred to as <em>ticks</em>; their positions can be adjusted and labelled:
    
    ax1.set_yticks(['pos1','pos2','pos3'])
    ax1.set_yticklabels([r'pos1',r'pos2',r'pos3'])
</details>

<details>
    <summary>Hint 4: </summary>
    An example of the annotate() function is below:
    
    ax1.annotate('text', xy=('x','y'), arrowprops=dict(arrowstyle='-',connectionstyle='angle3'), xytext=('x','y'))
</details>

<hr style="border:2px solid gray">

## 2.2 Challenge M2 - Basic Higgs decay data <a name="2.2"></a>

<hr style="border:2px solid gray">

A Higgs boson can decay into two photons.  Given that the mass of the Higgs is $125 \; \mathrm{GeV}$, write a program to determine which pair of the photons below could have been produced by a Higgs decay.

|            | $E$ (GeV) | $p_x$ (GeV) | $p_y$ (GeV) | $p_z$ (GeV) |
|:----------:|:---------:|:-----------:|:-----------:|:-----------:|
| $\gamma_1$ | 8.705     | 1.114       | 1.092       | -3.410      |
| $\gamma_2$ | 6.808     | -5.209      | -14.834     | 7.561       |
| $\gamma_3$ | 4.282     | 1.750       | 3.788       | 6.835       |

<details>
    <summary>Hint: </summary>
    The invariant mass of a pair of particles $i$ and $j$ (in natural units, $c = 1$) is given by $M^2 = \left( \sum\limits_{k} E_k \right)^2 - \left| \sum\limits_{k} \vec{p}_k \right|^2 = (E_i + E_j)^2 - (p_{x i} + p_{x j})^2 - (p_{y i} + p_{y j})^2 - (p_{z i} + p_{z j})^2$.
    You'll need to use the <code>math.isclose()</code> function, which checks whether two values are close to each other within a tolerance.  Its first two arguments are the calculated value and the value you are comparing it with; the optional <code>rel_tol</code> and <code>abs_tol</code> arguments set the tolerance.
</details>
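
The sketch below shows how the invariant-mass calculation and `math.isclose()` might fit together. The four-momenta are hypothetical values invented for illustration, not the values from the table. The 1% tolerance and the names `invariant_mass` and `HIGGS_MASS` are likewise assumptions.

```python
import math
from itertools import combinations

HIGGS_MASS = 125.0  # GeV


def invariant_mass(p1, p2):
    """Invariant mass of two particles given as (E, px, py, pz) tuples."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m_squared = e**2 - px**2 - py**2 - pz**2
    # Clamp at zero so rounding errors cannot give a negative square root.
    return math.sqrt(max(m_squared, 0.0))


# Hypothetical four-momenta in GeV, NOT the values from the table.
photons = {
    "a": (62.5, 0.0, 0.0, 62.5),
    "b": (62.5, 0.0, 0.0, -62.5),
    "c": (10.0, 10.0, 0.0, 0.0),
}

# Test every distinct pair once.
for (name1, p1), (name2, p2) in combinations(photons.items(), 2):
    mass = invariant_mass(p1, p2)
    match = math.isclose(mass, HIGGS_MASS, rel_tol=0.01)  # 1% tolerance is an assumption
    print(f"{name1}+{name2}: M = {mass:.2f} GeV, Higgs candidate: {match}")
```

Using `itertools.combinations` avoids testing the same pair twice or pairing a photon with itself.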

<hr style="border:2px solid gray">

## 2.3 Challenge M3 - Profit margin <a name="2.3"></a>

<hr style="border:2px solid gray">

Using the file ***company_sales_data.csv*** we imported earlier, produce the following graphs:
- Display the sales data for each individual product as line graphs on a single plot
- Display the facewash and facecream sales data as a bar chart on a single plot
- Calculate the total sales data for the final year for each product and show it using a pie chart, indicating which product generated the most sales income

```python
import pandas as pd

df = pd.read_csv("company_sales_data.csv")
print(df)
```

For this particular `.csv` file, the unnamed (first) column gives the index of each row, the column `month_number` gives the number of the month (presumably starting from January) and the following columns for individual products give the units of each product sold for each month.  The column `total_units` gives the total units of products sold for each month, and the column `total_profit` gives the total profit gained from product sales for each month.

<details>
    <summary>Hint: </summary>
    You may need to adjust the axes for plot 1, and you'll need to 'shift' the data for both products in opposite 'directions' for plot 2.
</details>

<hr style="border:2px solid gray">

## 2.4 Challenge M4 - Fitting a potential <a name="2.4"></a>

<hr style="border:2px solid gray">

The following data were obtained for a helium dimer: the interaction energies at several internuclear separations (in ångström).

```python
distances = [2.875, 3.0, 3.125, 3.25, 3.375, 3.5, 3.75, 4.0, 4.5, 5.0, 6.0]

energies = [0.35334378061169025, -2.7260131253801405, -4.102738968283382, -4.557042640311599, -4.537519193684069, -4.296388508321034, -3.6304745046204117, -3.0205368595885536, -2.1929538006724814, -1.7245616790238782, -1.2500789753171557]
```

Fit this data to a Lennard-Jones potential, which is given below.

$$V = 4 \varepsilon \left( \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^6 \right)$$

Obtain estimates for the values of $\varepsilon$ and $\sigma$ and plot the data against a fitted Lennard-Jones potential plot.

<details>
    <summary>Hint: </summary>
    The fit parameters returned by <code>curve_fit()</code> are in the same order as the parameters of your fit function, after its first argument (the independent variable).
</details>
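
To illustrate the parameter ordering, here is a small sketch that fits synthetic, noise-free data generated from known parameters, rather than the helium data. The function name `lennard_jones`, the "true" values and the initial guess `p0` are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit


def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones potential; r is the independent variable."""
    ratio6 = (sigma / r) ** 6
    return 4 * epsilon * (ratio6**2 - ratio6)


# Synthetic, noise-free data from known parameters (illustrative only,
# not the helium dimer data).
r = np.linspace(2.9, 6.0, 30)
v = lennard_jones(r, 2.0, 3.0)

# p0 is an initial guess; these values are assumptions chosen near the truth.
popt, pcov = curve_fit(lennard_jones, r, v, p0=(1.0, 2.8))

# popt follows the order of the function's arguments after r.
epsilon_fit, sigma_fit = popt
print(epsilon_fit, sigma_fit)
```

Because $r$ is raised to high powers, a sensible initial guess for $\sigma$ (somewhere near where the data cross zero) tends to help the fit converge.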

[Return to contents](#0.)

<hr style="border:2px solid gray">

## 2.5 Challenge M5 - Möbius strip <a name="2.5"></a>

<hr style="border:2px solid gray">

Generate a 3D plot of a Möbius strip, the parametric equations of which are below (for a strip of radius $R$ at a height $z = 0$).

$$x = \left[ R \; + \; s \; \cos \left( \frac{1}{2} \; t \right) \right] \; \cos(t)$$

$$y = \left[ R \; + \; s \; \cos \left( \frac{1}{2} \; t \right) \right] \; \sin(t)$$

$$z = s \; \sin \left( \frac{1}{2} \; t \right)$$

Here, $-w \leq s \leq w$ and $0 \leq t \leq 2 \pi$ where $w$ is the half-width of the strip.

<details>
    <summary>Hint 1: </summary>
    You'll want to use the <code>meshgrid()</code> function from numpy on $s$ and $t$, otherwise your Möbius strip will be too thin.  It creates 2D coordinate grids from two 1D arrays and is used like <code>x, y = np.meshgrid(x, y)</code>.  However, you'll need to return $s$ and $t$ to a 1D form to be able to give $x$, $y$ and $z$ in the 1D form they require.  This can be done using the <code>flatten()</code> method of numpy arrays; use it like <code>x = x.flatten()</code>.
</details>
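
The following sketch shows what `meshgrid()` and `flatten()` do to the array shapes, using the parametric equations above. The radius, half-width and grid sizes are arbitrary illustrative values, and the plotting step is left out.

```python
import numpy as np

R, w = 1.0, 0.3  # illustrative radius and half-width, not prescribed values

s = np.linspace(-w, w, 5)
t = np.linspace(0, 2 * np.pi, 50)

# meshgrid pairs every s with every t; both outputs have shape (len(t), len(s)).
s_grid, t_grid = np.meshgrid(s, t)

# flatten() returns 1D copies, so each index is one (s, t) combination.
s_flat = s_grid.flatten()
t_flat = t_grid.flatten()

x = (R + s_flat * np.cos(t_flat / 2)) * np.cos(t_flat)
y = (R + s_flat * np.cos(t_flat / 2)) * np.sin(t_flat)
z = s_flat * np.sin(t_flat / 2)

print(s_grid.shape, x.shape)
```

Without the grid, `s` and `t` would have to be the same length and would trace only a single line across the strip.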

<details>
    <summary>Hint 2: </summary>
    If you want to make the 'kink' in your Möbius strip more visible, try the following.
    Opacity: change the opacity of a line by passing <code>alpha</code> after the x, y, z arguments; lower values make the line more transparent.  Line effects: after <code>import matplotlib.patheffects as path_effects</code>, you can give a line a 'shadow' like so:
    
    plt.plot(x,y,z, other_arguments,
            path_effects=[path_effects.SimpleLineShadow(),
                          path_effects.Normal()])
</details>

[Return to contents](#0.)

<hr style="border:2px solid gray">

# 3. Challenges - Hard <a name="3."></a>

## 3.1 Challenge H1 - Fourier function <a name="3.1"></a>

<hr style="border:2px solid gray">

A Fourier series in sine and cosine form takes the expression $f(x)=\frac{a_0}{2}+\sum\limits_{n=1}^{\infty}\left[ a_n \cos \left( \frac{n \pi x}{L} \right) + b_n \sin \left( \frac{n \pi x}{L} \right) \right]$.

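For the square wave (period $2\pi$, so $L = \pi$), $a_0$ and $a_n$ are found to be *0* whilst $b_n$ is found to be $\frac{2}{n\pi}\left[1-(-1)^{n}\right]$, i.e. $\frac{4}{n\pi}$ for odd $n$ and *0* for even $n$.

For the sawtooth wave (period $\pi$, so $L = \frac{\pi}{2}$), $a_0$ is found to be $\pi$ (so the constant term $\frac{a_0}{2}$ is $\frac{\pi}{2}$), $a_n$ is found to be *0*, and $b_n$ is found to be $-\frac{1}{n}$.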
- Write a user-defined function to sum the values up to the $n^{th}$ term of a Fourier series to approximate a square wave of the form
$$
f(x)=\begin{cases}
1, & 0 < x < \pi \\ 
-1, & \pi < x < 2\pi
\end{cases}
$$
- Modify your program to include a function to sum the values up to the $n^{th}$ term of a Fourier series to approximate a sawtooth wave of the form $f(x)=x$ for $0 \leq x \leq \pi$
- Plot both functions for the sum of $n$ up to $9$, $99$ and $999$ as two subplots on the same figure.

<details>
    <summary>Hint: </summary>
    In both cases, you will need a function to determine $b_n$ that will need a user input for $n$ and a square wave/sawtooth wave function involving your function for $b_n$.  Plots can be generated using linspace() to obtain $x$-data and the square wave/sawtooth wave functions for $y$-data.
</details>
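
A minimal sketch of the structure the hint describes, shown for the sawtooth case only. It assumes the wave repeats with period $\pi$ (so $L = \frac{\pi}{2}$), a constant term of $\frac{\pi}{2}$ and $b_n = -\frac{1}{n}$. The function names are illustrative.

```python
import math


def b_n(n):
    """Sine coefficient of the sawtooth series (b_n = -1/n)."""
    return -1.0 / n


def sawtooth_partial_sum(x, n_max, half_period=math.pi / 2):
    """Sum the series up to and including the n_max-th term."""
    total = math.pi / 2  # constant term a_0 / 2
    for n in range(1, n_max + 1):
        total += b_n(n) * math.sin(n * math.pi * x / half_period)
    return total


# The partial sums should approach f(x) = x as more terms are included.
for n_max in (9, 99, 999):
    print(n_max, sawtooth_partial_sum(1.0, n_max))
```

For plotting, the same function can be evaluated at each point of an array produced by `linspace()`, for example with a list comprehension.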

<hr style="border:2px solid gray">

## 3.2 Challenge H2 - Matrix multiplication <a name="3.2"></a>

<hr style="border:2px solid gray">

Write a Python program to perform matrix multiplication on two matrices that are created by user input.

The multiplication process of two $2 \times 2$ matrices $A$ and $B$ is below.

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \; \times \; \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \; = \; \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}$$

The multiplication process of two $3 \times 3$ matrices $C$ and $D$ is below.

$$\begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{bmatrix} \; \times \; \begin{bmatrix} D_{11} & D_{12} & D_{13} \\ D_{21} & D_{22} & D_{23} \\ D_{31} & D_{32} & D_{33} \end{bmatrix} \; = \; \begin{bmatrix} C_{11}D_{11} + C_{12}D_{21} + C_{13}D_{31} & C_{11}D_{12} + C_{12}D_{22} + C_{13}D_{32} & C_{11}D_{13} + C_{12}D_{23} + C_{13}D_{33} \\ C_{21}D_{11} + C_{22}D_{21} + C_{23}D_{31} & C_{21}D_{12} + C_{22}D_{22} + C_{23}D_{32} & C_{21}D_{13} + C_{22}D_{23} + C_{23}D_{33} \\ C_{31}D_{11} + C_{32}D_{21} + C_{33}D_{31} & C_{31}D_{12} + C_{32}D_{22} + C_{33}D_{32} & C_{31}D_{13} + C_{32}D_{23} + C_{33}D_{33} \end{bmatrix} $$

To make things easier, each matrix will be square - it will have the same number of rows and columns.

<details>
    <summary>Hint: </summary>
    What do you notice about the indices of each element in the multiplication matrices?  This will be vital for any attempts at multiplication.  You are advised to use control structures, particularly <code>for</code> loops, and will likely need to append to an empty sequence such as a list.
</details>
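
One possible shape for the multiplication step is sketched below, assuming both matrices are square nested lists of the same size. Reading the matrices from user input is left out, and the name `multiply_square` is an illustrative choice.

```python
def multiply_square(a, b):
    """Multiply two square matrices given as nested lists."""
    size = len(a)
    result = []
    for i in range(size):          # row of a
        row = []
        for j in range(size):      # column of b
            element = 0
            for k in range(size):  # index shared by a's column and b's row
                element += a[i][k] * b[k][j]
            row.append(element)
        result.append(row)
    return result


print(multiply_square([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# prints [[19, 22], [43, 50]]
```

The innermost loop runs over the index that appears twice in each product, which is the pattern visible in the written-out $2 \times 2$ and $3 \times 3$ results.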

[Return to contents](#0.)

<hr style="border:2px solid gray">

There we go!  You can see the worked solutions [here](DataSolutions.ipynb), though there are many ways in which each exercise can be completed.

<hr style="border:2px solid gray">