This notebook contains a few exercises on NumPy, Pandas and SciPy.

Assigned readings:
* [A Visual Intro to NumPy and Data Representation](http://jalammar.github.io/visual-numpy) by Jay Alammar, **up to "Transposing and Reshaping"**.
* [Pandas DataFrame introduction](https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html)
* [Pandas read-write tutorial](https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html)
* [SciPy introduction](https://docs.scipy.org/doc/scipy/tutorial/general.html)
* [SciPy IO tutorial](https://docs.scipy.org/doc/scipy/tutorial/io.html)

Exercises marked with **!** require information not found in the assigned readings. To solve them, you will have to explore the online documentation:
* [NumPy](https://numpy.org/doc/stable/user/index.html)
* [Pandas](https://pandas.pydata.org/docs/user_guide/index.html)
* [SciPy](https://docs.scipy.org/doc/scipy/tutorial/index.html)

# NumPy

## Operations on 1D arrays

To practice with operations on 1D NumPy arrays, we will illustrate the law of large numbers. Before we start, we will import the NumPy module and fix the random seed used in the random number generator:

In [None]:
import numpy as np

np.random.seed(0)

**Exercise 1.1.1**

Create a 1D array of 50 random numbers drawn from the uniform distribution in [0,1]. Determine the minimum, maximum and mean value in the array.
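As a hint, array reductions such as `min`, `max` and `mean` are available as methods on any NumPy array. The following is a minimal sketch on a smaller array; it assumes a local seeded generator created with `np.random.default_rng` rather than the global seed set above, and the size of 10 is arbitrary.

```python
import numpy as np

# Assumption: a local Generator with a fixed seed, instead of np.random.seed.
rng = np.random.default_rng(0)

# Ten draws from the uniform distribution on [0, 1).
sample = rng.uniform(0.0, 1.0, size=10)

# Reductions over the whole array.
print(sample.min(), sample.max(), sample.mean())
```

Note that `uniform` samples from the half-open interval [0, 1), so the value 1 itself is never returned.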

**Exercise 1.1.2**

Create a Python list with 100 elements where element $i$ is the mean of an array of $i$ elements drawn from the uniform distribution in [0,1].

Which one of the 5th, 50th and 100th elements is closest to 0.5?

Assuming that the previous Python list is stored in a variable called `means`, its content can be plotted as follows:



In [None]:
from matplotlib import pyplot as plt

plt.plot(means)

If all went well, the values in the list should converge toward 0.5 as the sample size grows!
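The convergence can also be checked numerically. The mean of $n$ uniform draws has standard deviation $1/\sqrt{12n}$, so its distance to 0.5 should shrink as $n$ grows. The sketch below illustrates this under assumptions: the sample sizes are hypothetical and chosen only to make the trend visible, and a local seeded generator is used.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: local seeded generator

# Hypothetical sample sizes, chosen only to make the trend visible.
sizes = [10, 1_000, 100_000]

# Distance between each sample mean and the expected value 0.5.
errors = [abs(rng.uniform(0.0, 1.0, size=n).mean() - 0.5) for n in sizes]

for n, err in zip(sizes, errors):
    print(f"n={n:>6}: |mean - 0.5| = {err:.4f}")
```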

## Operations on 2D arrays

We will practice operations on 2D NumPy arrays by manipulating 2D images. The Python Imaging Library (PIL) provides an easy way to load 2D images of various types into NumPy arrays. Here, we will practice with a PNG image representing the NumPy logo:

In [None]:
from PIL import Image
import os

image = np.array(Image.open(os.path.join("data", "numpy.png")))

NumPy arrays representing images can easily be shown with Matplotlib:

In [None]:
from matplotlib import pyplot as plt

plt.imshow(image)

**Exercise 1.2.1**

Determine the size of the image (number of pixels in the x and y dimensions).

**Exercise 1.2.2**

Plot the bottom half of the image, i.e., the rows from index 250 onward.
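As a reminder of how image arrays are laid out, the first axis of the array indexes rows (height) and the second indexes columns (width); colour images carry a third axis for the channels. The sketch below uses a hypothetical all-black toy array, not the logo, to show how `shape` and slicing behave.

```python
import numpy as np

# Hypothetical 4x6 RGB image: 4 rows (height), 6 columns (width), 3 channels.
toy = np.zeros((4, 6, 3), dtype=np.uint8)

height, width = toy.shape[:2]

# Slicing the first axis selects rows; this keeps the bottom half.
bottom = toy[height // 2:]

print(toy.shape, bottom.shape)
```

The resulting `bottom` array can be passed to `plt.imshow` like any other image.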

**! Exercise 1.2.3**

Write a program to remove the whitespace around the image.
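One possible approach, sketched here on a hypothetical grayscale toy image, is to build a boolean mask of non-white pixels and crop to the bounding box of that mask. The threshold (anything below 255 counts as content) is an assumption and may need adjusting for the actual image.

```python
import numpy as np

# Hypothetical grayscale image: white (255) background with a dark 2x3 block.
toy = np.full((6, 8), 255, dtype=np.uint8)
toy[2:4, 3:6] = 0

# True wherever a pixel is not pure white (assumed definition of "content").
mask = toy < 255

# Indices of the rows and columns that contain at least one non-white pixel.
rows = np.where(mask.any(axis=1))[0]
cols = np.where(mask.any(axis=0))[0]

# Crop to the bounding box of the content.
cropped = toy[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
print(cropped.shape)
```

For a colour image with a channel axis, the mask would first have to be reduced over that axis, and what counts as "white" may depend on whether the image has a transparency channel.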

**! Exercise 1.2.4**

Using NumPy's `linalg` module, solve the equation **Ax** = **b**, where:

$$
\textbf{A}=
\begin{bmatrix}
8 & -6 & 2\\
-4 & 11 & -7\\
4 & -7 & 6
\end{bmatrix}
\quad
\mathrm{and} \quad \textbf{b} = \begin{bmatrix}
28\\
-40\\
33
\end{bmatrix}
$$

Determine the inverse of **A**.
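The relevant functions are `np.linalg.solve` and `np.linalg.inv`. The sketch below applies them to a hypothetical 2x2 system, not the one in the exercise, and checks the results by multiplying back.

```python
import numpy as np

# Toy system (hypothetical values, different from the exercise).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solves A @ x = b directly, without forming the inverse.
x = np.linalg.solve(A, b)

# Explicit inverse of A.
A_inv = np.linalg.inv(A)

print(x)
print(np.allclose(A @ x, b), np.allclose(A @ A_inv, np.eye(2)))
```

When only the solution **x** is needed, `solve` is generally preferable to multiplying by the inverse, since it is faster and numerically more stable.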

# Pandas

We will explore the file `airbnb.csv`, a dataset of Airbnb prices in New York City. The dataset was exported from [Kaggle](https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data).

**Exercise 2.1**

Load the dataset in a Pandas data frame and show a sample of the data frame:

Each row holds information for a given listing and columns represent the attributes of a listing.
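As a reminder of the reading API, `pd.read_csv` accepts a file path or any file-like object. The sketch below reads a hypothetical two-row CSV from memory so that it runs without the dataset; the column names are illustrative assumptions.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for a file on disk.
csv_text = "id,price,room_type\n1,120,Private room\n2,80,Entire home/apt\n"

# read_csv accepts a path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows; sample(n) would show random rows instead.
print(df.head())
```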

**Exercise 2.2**

What is the highest price listed in the dataset?

What is the total number of reviews contained in the dataset?

What are the min, max, and mean of the following features?
* Price
* Number of reviews
* Minimum nights
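Several summary statistics can be computed at once with `DataFrame.agg`. The sketch below uses a hypothetical toy data frame; the column names `price` and `minimum_nights` are assumptions about how the dataset labels these features.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions.
toy = pd.DataFrame({"price": [120, 80, 200], "minimum_nights": [1, 3, 2]})

# One row per statistic, one column per feature.
summary = toy.agg(["min", "max", "mean"])
print(summary)
```

`DataFrame.describe` gives a similar overview with additional statistics such as quartiles.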

**Exercise 2.3**

How many listings have a price lower than $100?

**Exercise 2.4**

What is the cheapest private room in Manhattan?
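Questions like the two above can be answered with boolean masks. The sketch below uses a hypothetical toy data frame; the column names (`neighbourhood_group`, `room_type`, `price`) and category labels are assumptions and should be checked against the actual columns of the dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and labels are assumptions.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Manhattan"],
    "room_type": ["Private room", "Private room", "Entire home/apt", "Private room"],
    "price": [90, 60, 250, 70],
})

# Summing a boolean mask counts the rows that satisfy the condition.
cheap_count = (toy["price"] < 100).sum()

# Combine conditions with & (each condition in its own parentheses).
mask = (toy["neighbourhood_group"] == "Manhattan") & (toy["room_type"] == "Private room")

# idxmin returns the index label of the lowest price among the masked rows.
cheapest = toy.loc[toy.loc[mask, "price"].idxmin()]

print(cheap_count, cheapest["price"])
```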

**! Exercise 2.5**

Among the numerical features (latitude, longitude, minimum nights, etc.), which one is the most correlated with the listing price?
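One way to approach this, sketched on hypothetical toy data, is to compute the pairwise Pearson correlations with `DataFrame.corr` and inspect the `price` column of the result. Ranking by absolute value is a design choice made here, so that strong negative correlations also count.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions.
toy = pd.DataFrame({
    "price": [100, 150, 200, 250],
    "minimum_nights": [1, 2, 3, 4],
    "number_of_reviews": [9, 3, 7, 1],
    "room_type": ["a", "b", "a", "b"],
})

# Keep numeric columns only, then read off the correlations with price.
corr = toy.select_dtypes("number").corr()["price"].drop("price")

# Feature with the largest correlation in absolute value.
strongest = corr.abs().idxmax()

print(corr)
print(strongest)
```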

# SciPy


**Exercise 3.1**

A colleague of yours who uses MATLAB sent you data in the MAT-file `points.mat`. Load this file and retrieve the x and y arrays in it. Using Matplotlib, plot the (x, y) points.
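MAT-files can be read with `scipy.io.loadmat`, which returns a dictionary mapping variable names to arrays. To stay self-contained, the sketch below first writes a hypothetical file `toy_points.mat` to a temporary directory; the variable names `x` and `y` are assumptions about what the real file contains.

```python
import os
import tempfile

import numpy as np
from scipy import io as sio

# Write a hypothetical MAT-file so the sketch runs without points.mat.
path = os.path.join(tempfile.mkdtemp(), "toy_points.mat")
sio.savemat(path, {"x": np.arange(5.0), "y": np.arange(5.0) ** 2})

# loadmat returns a dict: variable names map to arrays, plus metadata keys.
data = sio.loadmat(path)

# MATLAB vectors come back as 2D arrays; ravel() flattens them to 1D.
x = data["x"].ravel()
y = data["y"].ravel()

print(x.shape, y.shape)
```

Printing `data.keys()` is a quick way to discover which variables a MAT-file actually holds.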

**! Exercise 3.2**

Using SciPy's `interpolate` module, interpolate the data points using (1) nearest neighbors, and (2) cubic splines. Plot the interpolants.
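Both kinds of interpolant can be built from `scipy.interpolate`. The sketch below uses hypothetical data (samples of a sine curve) and two specific classes, `interp1d` with `kind="nearest"` and `CubicSpline`; other choices from the module would work too, and recent SciPy versions describe `interp1d` as legacy.

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Hypothetical data points: samples of a sine curve at x = 1, 2, ..., 10.
x = np.arange(1.0, 11.0)
y = np.sin(x)

# Each interpolant is a callable that can be evaluated at new positions.
nearest = interp1d(x, y, kind="nearest")
spline = CubicSpline(x, y)

# Evaluate on a fine grid inside the data range, ready for plotting.
x_fine = np.linspace(x.min(), x.max(), 200)
y_nearest = nearest(x_fine)
y_spline = spline(x_fine)
```

The arrays `y_nearest` and `y_spline` can then be plotted against `x_fine` with `plt.plot`.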

**! Exercise 3.3**

Using SciPy's `integrate` module, determine the integral (area under the curve) of the interpolants between 1 and 10.
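Since an interpolant is a callable, it can be passed directly to `scipy.integrate.quad`. The sketch below does this for a cubic spline through hypothetical samples of $x^2$, for which the exact integral between 1 and 10 is $(10^3 - 1^3)/3 = 333$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import CubicSpline

# Hypothetical data: samples of x**2 at x = 1, 2, ..., 10.
x = np.arange(1.0, 11.0)
y = x ** 2
spline = CubicSpline(x, y)

# quad returns the integral estimate and an estimate of the absolute error.
area, abs_err = quad(spline, 1.0, 10.0)

# CubicSpline also offers an exact integral, useful as a cross-check.
print(area, spline.integrate(1.0, 10.0))
```

A nearest-neighbor interpolant is piecewise constant, so `quad` may warn about its discontinuities; increasing the `limit` argument or passing the breakpoints through `points` can help in that case.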