# Numpy Intro
Now we can finally begin our discussion of scientific computing in Python.

In [1]:
%load_ext memory_profiler
import numpy as np

## The n-dimensional array.

Let's first look at a 1-dimensional array.
A 1-dimensional array is similar to the list we are used to.

## We will cover the n-dimensional array in the next lecture
### Think of Calc 1&2 vs Calc 3

In [None]:
#Here is an example. Heights of my friends in inches.
heights = [71,73,74,69]
print(heights)

## Lists in Python

Without going too far into the C code underlying Python, here is what you need to know.

**Lists in Python are flexible**

Let's see what this means below.

### 1: Type flexibility
Here is the C code for making an array called `numbers`:

Note: C is generally much faster than Python

`int numbers[10];`

Notice how we must declare that the array holds integers.
Because C knows every element is an integer, it does not have to check each element's type every time it uses it.


In [None]:
#Let's look at a list in Python
some_list = [1,2.0,'Lucas']

#If we tried this in C we would get an error
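To see the flexibility Python pays for, we can ask each element of a mixed list for its type. A minimal sketch (the names `mixed`, `element_types`, and `nums` are just for illustration):

```python
import numpy as np

# A Python list stores a separate object for each element, and each object
# carries its own type that Python checks at runtime.
mixed = [1, 2.0, 'Lucas']
element_types = [type(x).__name__ for x in mixed]
print(element_types)  # ['int', 'float', 'str']

# A numpy array has exactly one dtype shared by every element.
nums = np.array([1, 2.0])
print(nums.dtype)  # float64
```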

### 2: Memory flexibility

Let's look back at the C code
`int numbers[10];`

See the 10 after `numbers`? That tells C that our array holds 10 elements.
C then reserves 10 blocks of memory next to each other, which can be processed quickly together.
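NumPy arrays mirror this C layout. A small sketch, assuming we ask for 32-bit integers explicitly (`dtype=np.int32`) so the sizes are the same on every platform:

```python
import numpy as np

# Like `int numbers[10];` in C: 10 integers of one fixed type.
# dtype=np.int32 is chosen explicitly so each element is exactly 4 bytes.
numbers = np.zeros(10, dtype=np.int32)

print(numbers.itemsize)               # 4 bytes per element
print(numbers.nbytes)                 # 40 bytes in total
print(numbers.strides)                # (4,): step 4 bytes to reach the next element
print(numbers.flags['C_CONTIGUOUS'])  # True: the elements sit next to each other
```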

In [None]:
#In Python we can readily add to our lists. To allow this, a list stores pointers to objects that can live anywhere in memory, instead of one fixed, reserved block of values.
#This also slows down our computations
print(some_list)
some_list.append(True)
print(some_list)

#A fixed-size C array like `numbers` cannot grow like this.

### Both of these make our lists flexible but slow for doing operations.

#### What if we knew how big our lists had to be and what data types would be in them (as we often do in data analysis)?

## Enter numpy

In [None]:
#literally enter numpy
import numpy as np

`import numpy as np` is the standard way to import numpy. I usually include it at the top of my notebooks even if I'm not sure I'll use numpy.

Now we can access any function in numpy by calling `np.<FUNC>`

In [None]:
#For example
np.add(1,2)

#Obviously this isn't very cool yet

# What does numpy do?
### In essence, numpy brings C-style arrays into Python. It does a whole lot more, but that's the gist.

### We lose the type flexibility and size flexibility but gain a lot of speed.

### Creating the numpy array

There are a few ways to create a numpy array.
I typically use only 7 of them.

In [None]:
#1 From a list of a single type (always works the way you want)
print(heights)

In [None]:
#A list of numbers
heights_arr = np.array(heights)
#calling np.array(<LIST>) creates an array form of our list

In [None]:
heights_arr

In [None]:
#1b From a list of different types (sometimes works the way you want)
#Remember, numpy needs each array to have one datatype
print(some_list)

In [None]:
new_list = some_list[:2] + some_list[-1:]

In [None]:
some_arr = np.array(some_list)
some_arr
#We got upcast to a string array
#In general the order goes boolean < int < float < string
#i.e. if you have 1 million booleans and 1 float, your whole array will become floats
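We can watch that upcasting order directly through each array's `dtype.kind` code. A minimal sketch (the names `kinds` and `mostly_bools` are illustrative):

```python
import numpy as np

# dtype.kind codes: 'b' = boolean, 'i' = signed integer, 'f' = float, 'U' = unicode string
kinds = [
    np.array([True, False]).dtype.kind,
    np.array([True, 2]).dtype.kind,
    np.array([True, 2, 3.5]).dtype.kind,
    np.array([True, 2, 3.5, 'Lucas']).dtype.kind,
]
print(kinds)  # ['b', 'i', 'f', 'U']

# A single float is enough to turn a large boolean array into floats.
mostly_bools = np.array([True] * 1000 + [0.5])
print(mostly_bools.dtype)  # float64
```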

In [None]:
np.array(new_list)

In [None]:
#1c What if we wanted to make heights a float array because we'll want to use fractions later?
#There are two ways

In [None]:
np.array(heights,dtype=str)

In [None]:
#Starting from the list
heights_float_arr = np.array(heights,dtype=float)
print(heights_float_arr)

In [None]:
heights_arr = np.array(heights)
heights_arr

In [None]:
heights_arr.astype(bool)

In [None]:
#Starting from the int array
print(heights_arr)
heights_float_arr = heights_arr.astype(float)
#<ARR>.astype(<TYPE>) in general

In [None]:
#Be careful with .astype: it won't downcast if it doesn't understand how
#Ex: Try to make some_arr a boolean array

print(some_arr)

In [None]:
some_bool = some_arr.astype(bool)

In [None]:
np.array(['73.2','1']).astype(bool)

In [None]:
#You may get an error
#Sometimes it will work but won't do what you want
#Just be careful
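A safer pattern, sketched here, is to convert numeric strings with `.astype(float)` and to remember that casting floats to ints truncates toward zero rather than rounding. The variable names are illustrative:

```python
import numpy as np

# Numeric strings convert cleanly to floats.
str_nums = np.array(['73.2', '1'])
as_float = str_nums.astype(float)
print(as_float)

# Float -> int drops the fractional part (truncates toward zero); it does not round.
truncated = np.array([1.9, -1.9]).astype(int)
print(truncated)

# Round first if rounding is what you want.
rounded = np.round(np.array([1.9, -1.9])).astype(int)
print(rounded)

# Non-numeric strings cannot be converted to float and raise a ValueError.
raised = False
try:
    np.array(['Lucas']).astype(float)
except ValueError as err:
    raised = True
    print('ValueError:', err)
```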

In [None]:
#2 Using np.empty()
#Ex: Say we want an array of 10 elements that we'll individually fill later
#This is most similar to the C-style call
empt_arr = np.empty(10)
print(empt_arr)

In [None]:
#By default, the dtype is float
#We can use the dtype argument to choose any other type
empt_str_arr = np.empty(10,dtype=str)
print(empt_str_arr)

What is going on in the float call of np.empty?

We are filling our array with whatever happens to be in memory where it is created. This is the fastest way to create an array (we'll test this soon) because numpy does not have to initialize anything.
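Because the starting values are leftover memory, an `np.empty` array is only safe once every element has been written. A minimal sketch (the name `squares` is illustrative):

```python
import numpy as np

# np.empty skips initialization, so every slot must be written before it is read.
squares = np.empty(10)
for i in range(squares.size):
    squares[i] = i ** 2  # overwrite whatever bytes were there

print(squares)
```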

In [None]:
#3 using np.zeros()
#Floats by default
#Generally use this if you're going to add something to or subtract something from some elements
#Another use is creating a boolean array that is False by default
#Ex:
zeros_arr = np.zeros(5)
zeros_arr

In [None]:
#We'll soon see how to efficiently select some elements to add to them
#Here is an example
zeros_arr[::2]+=3
print(zeros_arr)

In [None]:
#4 using np.ones()
#Floats by default
#Use this if you're going to multiply or divide some elements by something
#Another use is creating a boolean array that is True by default
#Ex:
true_arr = np.ones(5,dtype=bool)
true_arr
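The uses described for `np.zeros` and `np.ones` can be sketched together: zeros as a starting point for adding (here, tallying a small made-up sample), ones as a starting point for multiplying, and both as boolean defaults. All names and data are illustrative:

```python
import numpy as np

# np.zeros as a counter: tally how many times each value 0-4 appears
# in a small, made-up sample.
sample = [0, 2, 2, 4, 1, 2]
counts = np.zeros(5, dtype=int)
for value in sample:
    counts[value] += 1
print(counts)  # [1 1 3 0 1]

# Boolean defaults: zeros -> all False, ones -> all True.
all_false = np.zeros(3, dtype=bool)
all_true = np.ones(3, dtype=bool)
print(all_false, all_true)

# np.ones as a starting point for repeated multiplication.
growth = np.ones(3)
growth *= 1.5
growth *= 2
print(growth)  # [3. 3. 3.]
```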

In [None]:
#5 using np.arange(<START>,<END>,<STEP>)
#creates values from <START> up to, but not including, <END>
#default <STEP> is 1
#useful if we need to iterate over something
print(np.arange(1,10))
#Integers here, because every argument is an integer

In [None]:
#There is also a 3rd argument which is step size
#ex:
print(np.arange(1,10,2))

In [None]:
#Can do decimal steps
print(np.arange(1,10,.5))

In [None]:
#default <START> is 0
np.arange(10)

`np.arange` will create `ints` if it can and `floats` if it cannot.
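A quick check of that rule, and of the fact that `<END>` is never included. A minimal sketch with illustrative names:

```python
import numpy as np

ints = np.arange(1, 10)         # all-integer arguments -> integer dtype
halves = np.arange(1, 10, 0.5)  # a float step -> float dtype

print(ints.dtype.kind, halves.dtype.kind)  # i f
print(len(ints), len(halves))              # 9 18: <END> itself is never included
print(halves[-1])                          # 9.5
```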

In [None]:
#6 np.linspace(<START>,<END>,<NUM>)
#similar to arange except you select the number of points between <START> and <END>
#Includes <END>
#Create 100 equally spaced points from 0 to 10
np.linspace(0,10,100)


In [None]:
#perhaps not what you expected
np.linspace(0,10,101)
#probably what you expected
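The surprise comes from counting points rather than gaps: `num` points that include both ends leave `num - 1` gaps, so the spacing is `(END - START) / (num - 1)`. `np.linspace` can report that spacing through its `retstep` argument; a minimal sketch:

```python
import numpy as np

# retstep=True returns (points, spacing)
pts100, step100 = np.linspace(0, 10, 100, retstep=True)
pts101, step101 = np.linspace(0, 10, 101, retstep=True)

print(step100)  # 10/99, a little over 0.1
print(step101)  # 10/100 = 0.1
print(pts101[:4])
```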

In general, we use `np.linspace` when we need to graph a function. We evaluate the function at many evenly spaced points in a range and then plot the results. Here is an example you'll be comfortable with soon enough

In [None]:
import matplotlib.pyplot as plt

#Plot the x^2 function from -1 to 1
X = np.linspace(-1,1,1000)
plt.plot(X,X**2)
plt.show()

In [None]:
#7 (From a dataset)
#We'll get to this soon

In [None]:
#8 (random arrays)
# https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html

In [None]:
np.arange(1,8,2)

## Speed test
We'll create very large arrays (1 million elements) using np.zeros, np.ones, and np.empty. We skip linspace and arange: they compute a value for every element, so we expect them to be slower and not worth testing.

In [None]:
%timeit np.empty(1000000)

In [None]:
%timeit np.zeros(1000000)

In [None]:
%timeit np.ones(1000000)

### If you truly need to create an empty array, use np.empty

# Mathematical Array Operations

Let's create an array going from 1 to 10 and then multiply every element by 2

In [None]:
#In standard python
lst = list(range(1,11))
print(lst)

In [None]:
#Multiply all by 2
for i in range(len(lst)):
    lst[i]*=2
print(lst)

In [None]:
#The numpy way
arr = np.arange(1,11)
print(arr)

In [None]:
arr*=2
print(arr)

We eliminate the loop by just multiplying the whole array by 2. This is known as vectorization.

There is a loop going on under the hood but it is written in fast C-code.
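To see the payoff, we can time a Python-level loop against the vectorized version on a larger input. A minimal sketch: the list comprehension stands in for the `for` loop above, the helper names `loop_double` and `vector_double` are hypothetical, and the timings will vary by machine:

```python
import timeit
import numpy as np

n = 100_000
lst = list(range(1, n + 1))
arr = np.arange(1, n + 1)

def loop_double(values):
    # Pure-Python loop: one interpreted multiplication per element
    return [v * 2 for v in values]

def vector_double(values):
    # Vectorized: one call; the loop runs in compiled C code
    return values * 2

# Both approaches produce the same numbers.
assert loop_double(lst) == vector_double(arr).tolist()

loop_time = timeit.timeit(lambda: loop_double(lst), number=10)
vec_time = timeit.timeit(lambda: vector_double(arr), number=10)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```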

## Other operations with a scalar.



In [None]:
arr = np.arange(1,11)
arr = arr + 1
print(arr)

In [None]:
# Most math operations you can think of will work
#Another example
arr = np.arange(1,11)

In [None]:
print(arr**3)
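A few more scalar operations, sketched on the same array. Note how the dtype follows the operation: true division always produces floats, while floor division and remainder stay integer:

```python
import numpy as np

arr = np.arange(1, 11)

print(arr - 1)    # subtract
print(arr / 2)    # true division always gives floats
print(arr // 3)   # floor division stays integer
print(arr % 3)    # remainder
print(1 / arr)    # the scalar can also come first
```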

## Mathematical Functions

Here is a list:
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html

In [None]:
arr = np.arange(-10,11)
arr

In [None]:
np.abs(arr)

In [None]:
np.cos(arr)

In [None]:
np.exp(arr)

Note: Even though our array has dtype int, numpy will automatically upcast to float

In [None]:
print(arr.dtype)
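Not every function upcasts: `np.abs` of an integer array stays integer, while functions like `np.exp` and `np.cos` need floats. A minimal check using `dtype.kind`:

```python
import numpy as np

arr = np.arange(-10, 11)

print(arr.dtype.kind)           # i: the input is integer
print(np.abs(arr).dtype.kind)   # i: abs of an integer is still an integer
print(np.exp(arr).dtype.kind)   # f: exp needs floats, so numpy upcasts
print(np.cos(arr).dtype.kind)   # f
```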

# Operations with two arrays

# Must be the same size*
*Not always true: broadcasting lets some arrays of different shapes combine. We will talk about it later with n-d arrays.

In [None]:
arr1 = np.ones(5)
arr2 = np.zeros(5)
print(arr1)
print(arr2)

In [None]:
print(arr1 + arr2)

In [None]:
print(arr1 * arr2)

What if the arrays aren't the same size?

In [None]:
arr1 = np.ones(4)
arr2 = np.zeros(5)

In [None]:
arr1 + arr2

What if the size of one is a factor of the other? (This works in R)

In [None]:
arr1 = np.ones(4)
arr2 = np.zeros(8)

In [None]:
arr1 + arr2
#What do you expect will happen?
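Numpy does not recycle the shorter array the way R does: for these 1-d arrays, element-wise operations need equal lengths, so `arr1 + arr2` raises a `ValueError`. If recycling is really what you want, one explicit option (our choice of technique, not something numpy does for you here) is to repeat the shorter array with `np.tile` first:

```python
import numpy as np

arr1 = np.ones(4)
arr2 = np.zeros(8)

# Lengths 4 and 8 cannot be combined element by element, so this fails.
failed = False
try:
    arr1 + arr2
except ValueError as err:
    failed = True
    print('ValueError:', err)

# Repeat arr1 explicitly so the lengths match, then add.
recycled = np.tile(arr1, len(arr2) // len(arr1))
print(recycled + arr2)  # eight 1.0 values
```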

## Aggregation: Reduction

Say we want to "reduce" an array to one value via an operation. For example, we may want to add all the values together. How would we do this?

In [3]:
arr = np.arange(1,5)
arr

array([1, 2, 3, 4])

In [None]:
np.add.reduce(arr)

In [None]:
def mystery(num):
    #What is this function doing?
    arr = np.arange(1,num+1)
    return np.multiply.reduce(arr)

In [None]:
mystery(6)

## Aggregation: Accumulation
What if we want to "store" the reduction at each step?

In [4]:
arr

array([1, 2, 3, 4])

In [6]:
np.multiply.accumulate(arr)

#In practice use np.cumprod(arr) (and np.cumsum(arr) for add.accumulate)

array([ 1,  2,  6, 24])
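Each of these ufunc aggregations has a shorter named equivalent. A quick sketch that checks them against each other on the same array:

```python
import numpy as np

arr = np.arange(1, 5)  # [1 2 3 4]

# Reductions and their named shortcuts
assert np.add.reduce(arr) == np.sum(arr) == 10
assert np.multiply.reduce(arr) == np.prod(arr) == 24

# Accumulations and their named shortcuts
assert np.array_equal(np.add.accumulate(arr), np.cumsum(arr))        # [ 1  3  6 10]
assert np.array_equal(np.multiply.accumulate(arr), np.cumprod(arr))  # [ 1  2  6 24]

print(np.cumsum(arr), np.cumprod(arr))
```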

## Aggregation: Outer Products
## Do all pairwise operations

In [7]:
x = np.arange(1,4)
y = np.arange(2,5)
print(x)
print(y)

[1 2 3]
[2 3 4]


In [8]:
outed = np.add.outer(x,y)
outed

array([[3, 4, 5],
       [4, 5, 6],
       [5, 6, 7]])

In [9]:
#First element of x + second element of y
outed[0,1]

4
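In general `np.add.outer(x, y)[i, j]` is `x[i] + y[j]`. A small sketch that checks this against two explicit loops (the name `by_hand` is illustrative):

```python
import numpy as np

x = np.arange(1, 4)  # [1 2 3]
y = np.arange(2, 5)  # [2 3 4]
outed = np.add.outer(x, y)

# Build the same table by hand: row i pairs x[i] with every element of y.
by_hand = np.empty((len(x), len(y)), dtype=x.dtype)
for i in range(len(x)):
    for j in range(len(y)):
        by_hand[i, j] = x[i] + y[j]

print(np.array_equal(outed, by_hand))  # True
```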

In [None]:
#We can do the same with multiply
np.multiply.outer(x,y)

## How often do I use these?
- Virtually never
- Because we have summary stats

## Summary Statistics:


In [10]:
## A better way to add.reduce
arr

array([1, 2, 3, 4])

In [11]:
np.sum(arr)

10

In [12]:
arr.sum()

10

In [13]:
arr.max()

4

In [14]:
arr.min()

1

In [15]:
## A better way to multiply.reduce
arr.prod()

24

In [16]:
#mean
arr.mean()

2.5

In [17]:
#standard deviation
arr.std()

1.118033988749895

In [18]:
#variance
arr.var()

1.25
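By default `std` and `var` divide by N, the population formula, which is where 1.25 comes from. For a sample estimate that divides by N - 1, pass `ddof=1`; a minimal sketch:

```python
import numpy as np

arr = np.arange(1, 5)  # [1 2 3 4], mean 2.5

# Squared deviations from the mean: 2.25 + 0.25 + 0.25 + 2.25 = 5.0
pop_var = arr.var()            # 5.0 / 4 = 1.25
sample_var = arr.var(ddof=1)   # 5.0 / 3, about 1.667
sample_std = arr.std(ddof=1)   # square root of sample_var

print(pop_var, sample_var, sample_std)
```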

In [19]:
np.median(arr)
#arr.median() isn't a thing for some reason

2.5

### What if we wanted to know the location (index) of the min and max?
* This actually comes up quite often

In [20]:
arr

array([1, 2, 3, 4])

In [27]:
#The min is at index 0 and the max is at index 3
arr.argmin()

0

In [22]:
arr.argmax()

3

In [23]:
#Another way to get the min
arr[arr.argmin()]

1

In [24]:
#Same as
arr[0]

1
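One detail worth knowing: when the maximum appears more than once, `argmax` returns the first occurrence. A minimal sketch on made-up values with a tie, using `np.flatnonzero` to find every position of the max:

```python
import numpy as np

scores = np.array([3, 9, 2, 9, 1])  # made-up values with a tie for the max

best = scores.argmax()
print(best)                          # 1: the first occurrence wins on ties
print(scores[best] == scores.max())  # True

# np.flatnonzero finds every position that holds the max
print(np.flatnonzero(scores == scores.max()))  # [1 3]
```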

## Sorting

In [28]:
#Create a random integer 1-d array of size 14 where values range from 1 to 9
rand_arr = np.random.randint(1,10,14)
rand_arr

array([8, 7, 1, 4, 9, 7, 1, 4, 6, 5, 2, 8, 6, 1])

In [None]:
#Something interesting. Normally np.<FUNC>(arr) and arr.<FUNC>() do the same thing, but not here
#Let's see

In [29]:
#Returns a sorted copy
np.sort(rand_arr)

array([1, 1, 1, 2, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9])

In [30]:
#Doesn't change the array
rand_arr

array([8, 7, 1, 4, 9, 7, 1, 4, 6, 5, 2, 8, 6, 1])

In [31]:
rand_arr.sort()

In [32]:
#Actually modifies it
rand_arr

array([1, 1, 1, 2, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9])
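Two follow-ups, sketched on a short made-up array: numpy has no descending flag for `sort`, but reversing the sorted copy with `[::-1]` does the job, and the in-place `.sort()` method returns `None`, so don't assign its result:

```python
import numpy as np

rand_arr = np.array([8, 7, 1, 4, 9])

descending = np.sort(rand_arr)[::-1]  # sorted copy, then reversed
print(descending)  # [9 8 7 4 1]
print(rand_arr)    # unchanged: [8 7 1 4 9]

result = rand_arr.sort()  # sorts in place
print(result)      # None: the method returns nothing
print(rand_arr)    # [1 4 7 8 9]
```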

## Note: Never use Python's built-in `sorted` function on numpy arrays; it's much slower.

In [33]:
rand_arr = np.random.randint(1,10,1000000)

In [34]:
%%timeit 
sorted(rand_arr)

242 ms ± 5.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [35]:
%%timeit 
np.sort(rand_arr)

16.7 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [None]:
%memit sorted(rand_arr)

In [None]:
%memit np.sort(rand_arr)

#### argsort

In [36]:
rand_arr = np.random.randint(1,10,14)
rand_arr

array([1, 5, 4, 9, 8, 2, 9, 8, 7, 2, 2, 5, 4, 1])

In [37]:
np.argsort(rand_arr)

array([ 0, 13,  5,  9, 10,  2, 12,  1, 11,  8,  4,  7,  3,  6])

In [None]:
#We'll talk more about indexing soon
rand_arr[np.argsort(rand_arr)]

In [38]:
#The method form also returns a new array (of indices); unlike .sort(), it leaves rand_arr unchanged
rand_arr.argsort()

array([ 0, 13,  5,  9, 10,  2, 12,  1, 11,  8,  4,  7,  3,  6])

In [39]:
rand_arr

array([1, 5, 4, 9, 8, 2, 9, 8, 7, 2, 2, 5, 4, 1])
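A common use of `argsort` is ordering one array by the values of another. A minimal sketch using the heights from earlier; the `names` array is hypothetical, one name per height:

```python
import numpy as np

heights = np.array([71, 73, 74, 69])
names = np.array(['Ana', 'Ben', 'Cal', 'Dee'])  # hypothetical names, one per height

order = np.argsort(heights)  # positions from shortest to tallest
print(order)           # [3 0 1 2]
print(names[order])    # ['Dee' 'Ana' 'Ben' 'Cal']
print(heights[order])  # [69 71 73 74]
```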

## Comparison Operations

## With a Scalar

In [41]:
arr = np.arange(1,10)

arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [42]:
#Returns a bool array
arr == 3

array([False, False,  True, False, False, False, False, False, False])

In [43]:
arr != 3

array([ True,  True, False,  True,  True,  True,  True,  True,  True])

In [44]:
arr >= 3

array([False, False,  True,  True,  True,  True,  True,  True,  True])

## With another array (of the same size) 

In [45]:
arr = np.arange(1,10)
arr2 = np.random.randint(1,10,9)

In [46]:
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [47]:
arr2

array([5, 3, 6, 3, 9, 5, 9, 6, 7])

In [48]:
arr > arr2

array([False, False, False,  True, False,  True, False,  True,  True])

## Special Comparisons of Boolean Arrays

In [49]:
bool_arr1 = np.random.randint(0,1+1,10).astype(bool)
bool_arr2 = np.random.randint(0,1+1,10).astype(bool)
#Why 1+1?
#what is .astype(bool) doing?

In [50]:
bool_arr1

array([False, False,  True, False, False, False,  True, False,  True,
       False])

In [51]:
bool_arr2

array([False, False,  True,  True,  True, False,  True, False, False,
        True])

In [52]:
# & asks if both are true
bool_arr1 & bool_arr2

array([False, False,  True, False, False, False,  True, False, False,
       False])

In [None]:
# | asks if either is true
bool_arr1 | bool_arr2

In [56]:
range_arr = np.arange(0,11)
range_arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [60]:
range_arr[(range_arr < 2) | (range_arr > 7)]

array([ 0,  1,  8,  9, 10])

#### These end up being very useful when we talk about indexing, which we will cover soon.
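The parentheses around each comparison are required: `&` and `|` bind more tightly than `<` and `>`. A minimal sketch, also showing `~` to flip a boolean array:

```python
import numpy as np

range_arr = np.arange(0, 11)

# Each comparison needs its own parentheses.
middle = range_arr[(range_arr > 2) & (range_arr < 7)]
print(middle)   # [3 4 5 6]

# ~ flips a boolean array (logical "not").
outside = range_arr[~((range_arr > 2) & (range_arr < 7))]
print(outside)  # [ 0  1  2  7  8  9 10]

# Without parentheses Python reads a chained comparison and raises a ValueError.
raised = False
try:
    range_arr > 2 & range_arr < 7
except ValueError as err:
    raised = True
    print('ValueError:', err)
```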

### Any and All Functions on Boolean Arrays

In [61]:
bool_arr1 = np.random.randint(0,1+1,10).astype(bool)

In [62]:
bool_arr1

array([ True, False,  True,  True,  True,  True, False, False, False,
       False])

In [63]:
#Are any of these true?
bool_arr1.any()

True

In [64]:
#Are all of these true?
bool_arr1.all()

False

In [66]:
bool_arr1

array([ True, False,  True,  True,  True,  True, False, False, False,
       False])

In [67]:
#Sum of a boolean array?
bool_arr1.sum()
#Counts the trues

5

In [68]:
#Mean of a boolean array?
bool_arr1.mean()
#Finds proportion of trues

0.5
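Combining a comparison with `.sum()` and `.mean()` is a quick way to count or measure how many elements meet a condition. A minimal sketch on the heights from the start of the lecture:

```python
import numpy as np

heights_arr = np.array([71, 73, 74, 69])

taller_than_70 = heights_arr > 70  # [ True  True  True False]
print(taller_than_70.sum())        # 3 friends are taller than 70 inches
print(taller_than_70.mean())       # 0.75 of them
print(taller_than_70.any(), taller_than_70.all())  # True False
```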