<a href="https://colab.research.google.com/github/melailem/Bioinformatics/blob/master/rayane2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Arrays and graphs


## Arrays

The module [numpy](https://docs.scipy.org/doc/numpy/reference/) provides everything that's needed for arrays and maths on arrays.

Usually we will **not** use the mathematical functions from the `math` package that we used during the previous TP.  
NumPy supersedes them with equivalent functions that work not only on numbers but also on arrays.  

In [0]:
import numpy as np
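As a small sketch of that difference (the array `values` is made up for illustration), `math.sqrt` accepts a single number, while `np.sqrt` is applied to every element of an array:

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0])  # illustrative data, not part of the TP

# math.sqrt only accepts a single number
print(math.sqrt(4.0))

# np.sqrt works element by element on the whole array
roots = np.sqrt(values)
print(roots)

# Passing the whole array to math.sqrt raises a TypeError
try:
    math.sqrt(values)
except TypeError as error:
    print("math.sqrt failed:", error)
```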

The `arange(begin, end, step)` function creates an array of evenly spaced points, **excluding** the end.

In [0]:
t = np.arange(0., 2.1, 0.1)
print(t)

Similarly, the `linspace(begin, end, nb_points)` function lets you give the number of points instead of the step.  
Be careful: in this case the end is **included**.
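The contrast between the two functions can be checked on a short interval. This is only a sketch with arbitrary bounds; the names `a`, `b` and `c` are illustrative:

```python
import numpy as np

a = np.arange(0., 1., 0.25)   # end excluded: 4 points, the last one is 0.75
b = np.linspace(0., 1., 5)    # end included: 5 points, the last one is 1.0

print(a, len(a))
print(b, len(b))

# linspace can also leave out the end point when asked to
c = np.linspace(0., 1., 4, endpoint=False)
print(np.allclose(a, c))
```

With a floating-point step, `arange` can occasionally produce one point more or less than expected because of rounding, which is one reason to prefer `linspace` when the number of points matters.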

In [0]:
t = np.linspace(0., 2., 21)
print(t)

print("Numpy arrays are capable of some basic computation:")
print(("{:6} = {:5.2f}\n" * 3).format("t min", t.min(),
                                      "t mean", t.mean(),
                                      "t max", t.max()))

print("Or we can use numpy's functions on those arrays (see doc):")
print(("{:6} = {:5.2f}\n" * 3).format("t min", np.min(t),
                                      "t mean", np.mean(t),
                                      "t max", np.max(t)))

Applying the sine function to an array applies it to each element of the array:

In [0]:
s = np.sin(2 * np.pi * t) 
print(s)

The online NumPy documentation is very well written.

Do not hesitate to use it!

### External data and simple statistics

#### Saving

To write arrays in a simple format, you first have to pack them into a single variable, say `DataOut`:
```python
DataOut = np.column_stack((x, y1, y2, y3))
```

The data are then saved to the file `myFile.txt` simply through:
```python
np.savetxt("myFile.txt", DataOut) 
```

Since we are on Google Colab, you may want to download this file:
```python
from google.colab import files
files.download('myFile.txt')
```

#### Loading
Since we are on Google Colab, we first have to upload the file:

```python
from google.colab import files
uploaded = files.upload()
```

If the data are stored in a similar way, you can use 
```python
DataIn = np.loadtxt("myFile.txt") 
```

And then "unpack" the result:
```python
x, y1, y2, y3 = DataIn.T
```
Notice the transposition here?  
That's because Python only knows how to unpack rows, not columns.
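The whole round trip can be sketched as follows. The file name `example.txt`, the temporary directory and the arrays are assumptions made for this illustration:

```python
import os
import tempfile
import numpy as np

x = np.linspace(0., 1., 5)
y1 = x ** 2
y2 = x ** 3

# Hypothetical file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.txt")
np.savetxt(path, np.column_stack((x, y1, y2)))

data_in = np.loadtxt(path)
print(data_in.shape)   # (number of rows, number of columns)

# Unpacking iterates over rows, so we transpose to get one array per column
x_in, y1_in, y2_in = data_in.T

# loadtxt can also do the transposition itself
x_alt, y1_alt, y2_alt = np.loadtxt(path, unpack=True)
```

The `unpack=True` argument of `np.loadtxt` is simply a shortcut for the `.T` shown above.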


#### Your turn

Load the content of the file `populations.txt`

In [2]:
import numpy as np 
from google.colab import files
uploaded = files.upload()

data = np.loadtxt('populations.txt')
year, hares, lynxes, carrots = data.T

OSError: ignored


Compute for each species:
 * the mean population
 * the standard deviation
 * the year when each species had its maximum population
 * the two years when each species had its lowest populations
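The NumPy tools involved can be tried on a tiny made-up series first. The numbers below are invented for illustration and are not the content of `populations.txt`:

```python
import numpy as np

# Made-up numbers for illustration only
year = np.array([2000, 2001, 2002, 2003, 2004])
counts = np.array([30., 47., 70., 21., 12.])

print("mean:", counts.mean())
print("std :", counts.std())

# argmax gives the index of the maximum, which we use to look up the year
year_of_max = year[np.argmax(counts)]
print("year of maximum:", year_of_max)

# argsort gives the indices that would sort the array;
# the first two correspond to the two smallest values
two_lowest_years = year[np.argsort(counts)[:2]]
print("two lowest years:", two_lowest_years)
```

For a 2D array with one column per species, the same functions accept an `axis=0` argument to work column by column.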

### Your first algorithm

Here we will implement a classic algorithm to sort data: **bubble sort**

![Alternative text…](https://upload.wikimedia.org/wikipedia/commons/3/37/Bubble_sort_animation.gif)

The idea here is to go through the list comparing each consecutive pair of elements. If two elements are not in the right order, they are swapped.

![Alternative text…](https://upload.wikimedia.org/wikipedia/commons/c/c8/Bubble-sort-example-300px.gif)

Once the whole list has been traversed, we start again from the beginning until no more swaps are necessary.

The algorithm can be described as follows:

![Alternative text…](https://www.researchgate.net/publication/303337342/figure/fig7/AS:372605338046474@1465847443705/Sample-flowchart-for-a-sorting-algorithm-This-flowchart-illustrates-the-conditional.png =400x)
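For reference, the "repeat until no swap happens" loop described above can be sketched in a single function. This is only one possible reading of the description: the name `bubble_sort_early_exit` is hypothetical, and it is written in one block rather than split into the small functions this TP asks for below.

```python
def bubble_sort_early_exit(values):
    """Sort a list in place and return the number of passes performed."""
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        # Compare each consecutive pair and swap it if it is out of order
        for i in range(len(values) - 1):
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
                swapped = True
    return passes


data = [5, 1, 4, 2, 8]
n_passes = bubble_sort_early_exit(data)
print(data, "sorted in", n_passes, "passes")
```

The last pass performs no swap: it is the one that confirms the list is sorted.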

Write a function `swap(mylist, i, j)` that swaps the elements at indices `i` and `j`.

You can test your function here:

In [0]:
test = [2, 1, 3, 4]
swap(test, 0, 1)
print(test)

test = [3, 2, 1, 4]
swap(test, 0, 2)
print(test)

test = [2, 1, 3, 4]
swap(test, 1, 0)
print(test)

Now write a function `compare(mylist, i, j)` that compares the elements at indices `i` and `j` and, if `mylist[i]` is bigger than `mylist[j]`, calls the previous function to swap them.

You can test your function here:

In [0]:
test = [2, 1, 3, 4]
compare(test, 2, 3)
print(test)
compare(test, 0, 1)
print(test)
compare(test, 2, 0)
print(test)

You can now write a function `one_pass(mylist)` that goes through the list once, making all the consecutive pairwise comparisons.

You can test your function here:

In [0]:
test = [2, 1, 3, 4]
one_pass(test)
print(test)

test = [4, 1, 2, 3]
one_pass(test)
print(test)

test = [2, 1, 4, 3]
one_pass(test)
print(test)

test = [1, 1, 2, 3]
one_pass(test)
print(test)

test = [2, 3, 4, 1]
one_pass(test)
print(test)

Finally, you can write the `bubble_sort(mylist)` function that calls the previous function repeatedly until the list is sorted.

To keep things simple, note that at most `n = len(mylist)` passes are needed.  
So that is what you will do: `n` passes, without checking whether the list is already sorted.

Execute the following cell multiple times to check that your sorting implementation works:

In [0]:
n = np.random.randint(50)
random_list = np.random.randint(50, size=50)
print("start: ",random_list)

sorted_list = np.sort(random_list)
print("solution:", sorted_list)

bubble_sort(random_list)
print("your result: ", random_list)

Have you noticed how we split the problem into very small pieces, each of them being rather simple?

Divide-and-conquer strategies are very well suited to programming.


## Graphs

The module [matplotlib](https://matplotlib.org/gallery/index.html) provides everything that's needed for plotting any kind of graph.


### Demo


In [0]:
t = np.linspace(0., 2., 21)
s = np.sin(2 * np.pi * t) 

import matplotlib.pyplot as plt

plt.figure("My first Figure")
plt.plot(t, s) 
plt.xlabel('time (s)') 
plt.ylabel('voltage (mV)') 
plt.title('About as simple as it gets, folks') 
plt.grid(True) 
plt.show() 

As you can see, matplotlib does things behind the scenes.  
We did not have to create a variable to store the current figure and act on it.  
When we call `plt.plot`, it understands directly that we want to plot on the last opened figure.  
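For comparison, here is a sketch of the same kind of plot with explicit handles on the figure and its axes, using `plt.subplots()`. The variable names `fig` and `ax` are a common convention, not something this TP requires:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0., 2., 21)
s = np.sin(2 * np.pi * t)

# Keep explicit handles on the figure and its axes
fig, ax = plt.subplots()
ax.plot(t, s, label='sine')
ax.set_xlabel('time (s)')
ax.set_ylabel('voltage (mV)')
ax.set_title('Same plot, explicit interface')
ax.grid(True)
ax.legend()

# plt.gca() returns the axes that plt.plot would implicitly draw on
print(plt.gca() is ax)
```

Both styles produce the same figure. In a script, finish with `plt.show()` as in the demo cell.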

Warning: the order of the points matters when drawing a line:

In [0]:
t2 = np.random.permutation(t)
s2 = np.sin(2 * np.pi * t2) 

plt.figure("My second Figure")
plt.plot(t, s, 'g', label='points are in order') 
plt.plot(t2, s2, label='randomly shuffled points') 
plt.xlabel('time (s)') 
plt.ylabel('voltage (mV)') 
plt.title('Everything looks garbage now') 
plt.grid(True) 
plt.legend()
plt.show() 
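If the points arrive in a shuffled order, one way to recover a clean line is to sort them before plotting. A minimal sketch using `np.argsort` (the names `order`, `t_sorted` and `s_sorted` are illustrative):

```python
import numpy as np

t = np.linspace(0., 2., 21)
t2 = np.random.permutation(t)
s2 = np.sin(2 * np.pi * t2)

# Indices that put t2 back in increasing order
order = np.argsort(t2)
t_sorted = t2[order]
s_sorted = s2[order]   # reorder y with the same indices to keep the pairs together

print(np.allclose(t_sorted, t))
```

Plotting `t_sorted` against `s_sorted` then gives the same curve as the ordered points.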

There is too much to say about matplotlib to cover it all here.  
The documentation is not very clear, but the [examples](https://matplotlib.org/gallery/index.html) are.  
Moreover, the internet is full of help on this topic.

#### All your plots must contain all of the following information
- Title of graph  
- Titles on axis  
- Grid  
- Legend (if multiple plots)  
<span style="font-size:xx-large;">
    <center>
        I will give you penalties if you don't!
    </center>
</span>
</div>



### Your turn

Create a function `f(x, λ)` that computes (use the `return` statement):
$$ f(x, \lambda) = x^2 \sin\left(\frac{2 \pi x}{\lambda}\right) $$

Create an array `x` with `100` points evenly distributed from $-1$ to $1$ (both included)  

Compute
- $ y_1 = f(x, \lambda) $ with $ \lambda = 0.3$
- $ y_2 = f(x, \lambda) $ with $ \lambda = 0.4$
- $ y_3 = f(x, \lambda) $ with $ \lambda = 0.5$

Plot them on the same graph.  
$y_2$ should be drawn with a red solid line.  
$y_3$ should be drawn with a green dashed line.
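Colour and line style can be given to `plt.plot` as a short format string. The sketch below uses two unrelated curves (`x` and `x ** 2`, chosen only for illustration) rather than the function of this exercise:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0., 1., 50)

plt.figure("Line styles")
line_a, = plt.plot(x, x, 'r-', label='x')           # red solid line
line_b, = plt.plot(x, x ** 2, 'g--', label='x^2')   # green dashed line
plt.xlabel('x')
plt.ylabel('y')
plt.title('Format strings for colour and line style')
plt.grid(True)
plt.legend()
```

In a script, finish with `plt.show()` as in the demo cells.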


To save an image from a script, you can use:
```python
plt.savefig("monImage.png")
```
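A slightly fuller sketch of saving a figure is shown below. The output path and the `dpi` and `bbox_inches` values are assumptions chosen for illustration:

```python
import os
import tempfile
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0., 1., 50)

plt.figure("Saved figure")
plt.plot(x, x ** 2)
plt.xlabel('x')
plt.ylabel('x^2')
plt.title('Figure written to disk')
plt.grid(True)

# Hypothetical output path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.png")
plt.savefig(path, dpi=150, bbox_inches="tight")
print(os.path.exists(path))
```

It is usually safer to call `plt.savefig` before `plt.show()`, since the figure may no longer be available once it has been shown.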

But again, since we are on Google Colab, if you want to download this file:
```python
from google.colab import files
files.download('monImage.png')
```

# Do not forget

Before sending this notebook to me, **restart the kernel, and reexecute all the cells in order.**

Once all cells have been filled and executed,
 * **save this document as a pdf file** (by printing it)  
 * **Check** that everything is present on the produced pdf  
 * **Send the pdf AND the notebook** to me via Universitice  

This will be used to evaluate you.

# That's it for today!

Write here a rough estimate of the time you spent on this TP.  
This will **not** be used to evaluate you,  
but it will allow me to better adjust the difficulty of the TPs.

Time spent: *h