# Basics of Python

One of the Python libraries data scientists often use is `numpy`, a library for array computations such as matrix algebra. With `numpy`, we can manipulate vectors, matrices, or any n-dimensional array mostly without writing loops, so the code is cleaner and more succinct. In this exercise, we learn the basics of `numpy` and `pandas`.
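As a minimal sketch of what "without writing loops" means, the snippet below doubles every element of a small collection twice: once with a plain Python loop and once with a single `numpy` expression. The variable names and the choice of doubling are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# Illustrative data; the names below are hypothetical.
values = [3, 7, 1, 3, 5]

# Plain Python: a list comprehension visits each element explicitly.
doubled_loop = [2 * v for v in values]

# numpy: the operation is written once and applied element-wise.
values_array = np.array(values)
doubled_vectorized = 2 * values_array

# Both approaches hold the same numbers.
print(doubled_loop)
print(doubled_vectorized.tolist())
```

The `numpy` version states the intent (multiply the whole array by 2) without any explicit iteration.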

1. Create a Python list whose elements are the numbers 3, 7, 1, 3, 5. 

In [2]:
a1_list = [3,7,1,3,5]

a1_list

[3, 7, 1, 3, 5]

2.  Write a function that computes the average of a list of numbers. Run your function on the above list so that it returns its average. Your function should only make use of Python **built-ins** (no libraries). 

In [3]:
def meanVal(some_list):
    avg = sum(some_list)/len(some_list)
    return avg

mean_a1_list = meanVal(a1_list)

mean_a1_list
    

3.8

3. Use the `%%timeit` magic to compute the average runtime of your function. Use the `-n 100` switch to execute the function 100 times per timing run (the more loops you run, the more stable the average runtime is). 

In [4]:
res = %timeit -n 100 -o meanVal(a1_list)

res

280 ns ± 6.46 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)


<TimeitResult : 280 ns ± 6.46 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)>

4. Load the `numpy` library and use it to turn the above list into a `numpy` 1-D array. HINT: Use `numpy.array`. 

In [5]:
import numpy as np

a1_array = np.array(a1_list)
a1_array

array([3, 7, 1, 3, 5])

Because getting the average of an array is a common operation, with `numpy` we don't have to "re-invent the wheel": we can just call the `mean` function. There are two ways of doing this: (1) you can call the `numpy.mean` function and pass it the array, or (2) you can call the `mean` method of the array. 

5. Print the average of the above array. Get the average using `numpy` in **both** of the ways described above. 

Both approaches come from `numpy`: the `numpy.mean` function and the array's own `mean` method.

In [6]:
# (1) Module-level function: pass the array to numpy.mean
a1_array_npmean = np.mean(a1_array)
print(" The average of the array using numpy.mean is ", a1_array_npmean)

# (2) Method of the array object itself
a1_array_meanMethod = a1_array.mean()
print(" The average of the array using the mean method is ", a1_array_meanMethod)

 The average of the array using numpy.mean is  3.8
 The average of the array using the mean method is  3.8


6. Compare the runtime of the average computation using `numpy` with the runtime of the function you wrote earlier.

In [7]:
res2 = %timeit -n 100 -o np.mean(a1_array)


7.12 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


For this five-element input, `numpy.mean` is much slower than the built-in version (about 7.12 µs versus 280 ns per call). `numpy` functions carry a fixed per-call overhead, which dominates the runtime for small arrays. For larger arrays, however, they typically outperform an equivalent function written with Python built-ins.
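One way to check this claim outside of the `%timeit` magic is with the standard-library `timeit` module. The sketch below times both approaches on a small and a large input; the sizes, the number of runs, and the helper name `mean_builtin` are assumptions chosen for illustration, and the measured times will vary by machine.

```python
import timeit

import numpy as np


def mean_builtin(values):
    """Average using only Python built-ins, as in step 2."""
    return sum(values) / len(values)


runs = 20  # hypothetical number of repetitions per measurement

# Hypothetical input sizes: one tiny, one large.
for size in (5, 100_000):
    data_list = list(range(size))
    data_array = np.array(data_list)

    # timeit.timeit returns the total seconds for `number` calls,
    # so dividing by `runs` gives the average time per call.
    t_builtin = timeit.timeit(lambda: mean_builtin(data_list), number=runs) / runs
    t_numpy = timeit.timeit(lambda: np.mean(data_array), number=runs) / runs

    print(f"n={size:>9,}: built-ins {t_builtin:.2e} s, numpy {t_numpy:.2e} s")
```

On most machines the built-in version should be faster for the tiny input and `numpy` faster for the large one, but the crossover point depends on the hardware and library versions.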

Of course, most data scientists don't write their own machine learning algorithms. Instead we use existing algorithms and apply them to real-world problems. So `numpy` is to some extent too "low level" and we need a higher level library like `pandas` to work with data. So what do `numpy` and `pandas` have in common? First let's see what a 1-D array looks like in `pandas`:

7. Load the `pandas` library and use `pandas.Series` to create a pandas `Series` object, which is the equivalent of a `numpy` 1-D array. 

In [8]:
import pandas as pd

a1_series = pd.Series(a1_list)
a1_series

0    3
1    7
2    1
3    3
4    5
dtype: int64

8. Pass the `Series` to the `numpy.mean` function to confirm it returns its average. 

In [9]:
a1_series_npmean = np.mean(a1_series)
a1_series_npmean

3.8

9. Call the `mean` method of the `Series` and confirm it returns its average. 

In [10]:
a1_series_pdmean = a1_series.mean()
a1_series_pdmean

3.8

So you can think of a `Series` in `pandas` as almost the same thing as a 1-D array in `numpy`. In fact, accessing the `values` attribute of the `Series` returns its data as a `numpy` array.

10. Show that accessing the `values` attribute of a `Series` object gives you a `numpy` array. HINT: You can use the `type` built-in to check its type. 

In [11]:
a1_series_values = a1_series.values
type(a1_series_values)

numpy.ndarray
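To illustrate why a `Series` is "almost" rather than exactly a 1-D array, the sketch below builds a `Series` with string labels and converts it to an array. The labels and variable names are hypothetical; `Series.to_numpy()` is used here as an alternative to the `values` attribute and returns the same data in this case.

```python
import numpy as np
import pandas as pd

# Hypothetical index labels added for illustration.
series = pd.Series([3, 7, 1, 3, 5], index=["a", "b", "c", "d", "e"])

# The Series pairs each value with a label; the array holds only the values.
as_array = series.to_numpy()

print(type(as_array))                            # the numpy array type
print(series["b"])                               # label-based lookup on the Series
print(as_array[1])                               # position-based lookup on the array
print(np.array_equal(as_array, series.values))   # same underlying data
```

The main difference is the index: the `Series` supports lookups by label, while the `numpy` array is addressed by position only.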