# Week 02 Coding: NumPy, pandas, and Matplotlib (Lab)

## BEENOS Inc. Machine Learning Study Group

In the second week, you'll practice using `numpy`, `pandas` and `matplotlib`.

## Part 0: Numpy

NumPy is short for Numerical Python, and it is the foundational package for numerical work with data in Python.

Other popular packages, like `pandas`, are built on the functionality that `numpy` provides.

The core benefit of using `numpy` is access to its `ndarray` objects.

`ndarray` is short for **n-dimensional array**, a data structure that lets us perform vectorized mathematical operations and data manipulation on whole collections of data at once.

For example, a plain Python `list` does not support element-wise arithmetic, so trying to add 2 to every element with `+` raises an error:

In [1]:
# add 2 to every element of a `list`

mylist = [1, 2, 3, 4, 5]

mylist + 2

TypeError: can only concatenate list (not "int") to list
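
With a plain list, the element-wise version has to be written out explicitly. A minimal sketch using a list comprehension (the name `shifted` is only illustrative):

```python
# element-wise addition on a plain `list` needs an explicit loop
mylist = [1, 2, 3, 4, 5]

# build a new list by adding 2 to each element in turn
shifted = [x + 2 for x in mylist]

print(shifted)  # [3, 4, 5, 6, 7]
```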

But such an operation is trivial for `numpy`:

In [2]:
# add 2 to every element of an `ndarray`

import numpy as np

myarray = np.array([1,2,3,4,5])

myarray + 2

array([3, 4, 5, 6, 7])

We can also use `numpy` to create matrices from lists of lists:

In [3]:
# create a 3x3 matrix from a list of lists

morelists = [[1,2,3], [4,5,6], [7,8,9]]

mymatrix = np.array(morelists)

mymatrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
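
Once the matrix exists, its attributes describe its layout, and arithmetic still applies element by element. A small sketch, rebuilding `mymatrix` so that it runs on its own (the name `doubled` is only illustrative):

```python
import numpy as np

mymatrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(mymatrix.shape)  # (3, 3): 3 rows, 3 columns
print(mymatrix.ndim)   # 2: number of dimensions

# arithmetic is applied to every element, just like the 1D case
doubled = mymatrix * 2
print(doubled[0])      # first row: [2 4 6]
```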

### `numpy` Practice

For the first part of this lab, you'll be completing short exercises to get you used to working with `numpy` arrays.

The exercises are taken from [Machine Learning Plus](https://www.machinelearningplus.com/python/101-numpy-exercises-python/) and increase in difficulty from Level 1 to Level 4.

Feel free to skip around to exercises that challenge you.

Good luck!

**Q.0** Import numpy as `np` and print the version.

*Sample Output*:

> `1.15.4`

In [None]:
# INSERT CODE HERE

**Q.1** Use `np.array` to create a 1D array of numbers from 0 to 9.

*Output:*

> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
# INSERT CODE HERE

**Q.2** Extract all odd numbers from `arr`.

*Output:*

> array([1, 3, 5, 7, 9])

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# INSERT CODE HERE

**Q.3** Use `reshape` to convert a 1D array to a 2D array with two rows.

*Output:*

>     array([[0, 1, 2, 3, 4],
>            [5, 6, 7, 8, 9]])

In [None]:
arr = np.arange(10)

# INSERT CODE HERE

**Q.4** Use `np.full` to create an nD boolean array. Choose your size and `True` or `False`.

*Sample Output:*

>     array([[ True,  True],
>            [ True,  True],
>            [ True,  True]])

In [None]:
# INSERT CODE HERE

**Q.5** Use `np.where` to replace all odd numbers in `arr` with `-1`, but **DO NOT** change `arr`.

*Sample Output*:

> array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1]) `# newarray`

> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) `# arr`

In [None]:
arr = np.arange(10)

# INSERT CODE HERE

print(arr, newarray)

**Q.6** Create the following pattern. **Use numpy functions.**

*Output:*

> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

In [None]:
a = np.array([1,2,3])

# INSERT CODE HERE

**Q.7** Get all common items in both `a` and `b`.

*Output:*

> array([2, 4])

In [None]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

# INSERT CODE HERE

**Q.8** Remove from `a` all items in `b`.

*Output:*

> array([0, 1, 2, 4])

In [None]:
a = np.array([0,1,2,3,4,5])
b = np.array([3,5,6,7,8,9])

# INSERT CODE HERE

**Q.9** Create a 2D array of size 5x3 containing random numbers between 5 and 10.

In [None]:
# INSERT CODE HERE

**Q.10** Use `np.genfromtxt` to import the Iris flowers dataset from the given url.

*Hint: If your last column shows `nan` values, try specifying an `object` data type.*

*Output:*

> `# Print the first row`

> array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa']], dtype=object)

In [None]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

# INSERT CODE HERE

## Part 2: Numpy

[intro to what numpy is, and why we use it]

[what they'll be doing in this section]

- arrays
- creating
- indexing
- datatypes
- math
- broadcasting
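
The topics listed above could be previewed with something like the following. This is only a sketch with toy values, not the planned section content, and all variable names are illustrative:

```python
import numpy as np

# creating: from a list, or with helper functions
a = np.array([1, 2, 3])
zeros = np.zeros((2, 3))           # 2x3 array filled with 0.0
grid = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]

# indexing: single element, slice, and boolean mask
corner = grid[1, 2]                # row 1, column 2 -> 5
first_col = grid[:, 0]             # every row, column 0 -> [0 3]
evens = grid[grid % 2 == 0]        # -> [0 2 4]

# datatypes: every element shares one dtype, which can be converted
as_float = a.astype(np.float64)    # [1. 2. 3.]

# math: operators and functions work element by element
squares = a ** 2                   # [1 4 9]
total = grid.sum()                 # 15

# broadcasting: the 1D array `a` is stretched across each row of `grid`
shifted = grid + a                 # [[1 3 5], [4 6 8]]
```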

http://cs231n.github.io/python-numpy-tutorial/

EXERCISES: https://www.machinelearningplus.com/python/101-numpy-exercises-python/

## Part 3: Pandas

[intro to what pandas is, and why we use it]

[what they'll be doing in this section]

- create/read df
- describe
- summary
- indexing (loc, iloc)
- sorting
- groupby/agg
- transposing

From MLM:
```python
import numpy as np
import pandas as pd

# Build a small 2x3 DataFrame with labeled rows and columns
myarray = np.array([[1, 2, 3], [4, 5, 6]])
rownames = ['a', 'b']
colnames = ['one', 'two', 'three']
mydataframe = pd.DataFrame(myarray, index=rownames, columns=colnames)
print(mydataframe)
```
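
Most of the topics in the list above can be tried on a tiny DataFrame. The following is a rough sketch: the data is invented purely for illustration and the variable names are hypothetical.

```python
import pandas as pd

# create: a DataFrame from a dict of columns (toy data, invented for illustration)
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog"],
        "weight": [4.0, 20.0, 5.0, 30.0],
        "age": [3, 5, 2, 7],
    },
    index=["a", "b", "c", "d"],
)

# describe / summary: quick statistics for the numeric columns
summary = df.describe()

# indexing: `loc` uses labels, `iloc` uses integer positions
by_label = df.loc["b", "weight"]   # 20.0
by_position = df.iloc[0, 1]        # 4.0 (row 0, column 1)

# sorting: order rows by a column, largest first
heaviest_first = df.sort_values("weight", ascending=False)

# groupby/agg: mean weight and max age per species
per_species = df.groupby("species").agg({"weight": "mean", "age": "max"})

# transposing: swap rows and columns
flipped = df.T

print(per_species)
```

Note that `loc` and `iloc` return the same kind of result here; they differ only in whether rows and columns are addressed by label or by position.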

EXERCISES: https://www.machinelearningplus.com/python/101-pandas-exercises-python/

## Part 4: Matplotlib

[intro to what matplotlib is, and why we use it]

[what they'll be doing in this section]

- draw simple plot
- create multiple subplots
- piechart
- histogram
- bar plot
- saving a plot
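
The plot types listed above can all be drawn on one figure. This is a sketch under a few assumptions: the data is made up, the `Agg` backend is selected so the code runs without a display, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# toy data, invented for illustration
x = np.linspace(0, 2 * np.pi, 50)
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])
labels = ["a", "b", "c"]
counts = [3, 5, 2]

# multiple subplots: a 2x2 grid of axes on one figure
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, np.sin(x))          # simple line plot
axes[0, 0].set_title("line")

axes[0, 1].hist(values, bins=5)        # histogram
axes[0, 1].set_title("histogram")

axes[1, 0].bar(labels, counts)         # bar plot
axes[1, 0].set_title("bar")

axes[1, 1].pie(counts, labels=labels)  # pie chart
axes[1, 1].set_title("pie")

fig.tight_layout()

# saving: the file name here is hypothetical
fig.savefig("week02_plots.png")
```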

https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html

https://heartbeat.fritz.ai/introduction-to-matplotlib-data-visualization-in-python-d9143287ae39

https://www.tutorialdocs.com/article/python-matplotlib-tutorial.html

EXERCISES: https://www.w3resource.com/graphics/matplotlib/