# What? 
- Numpy is the fundamental package for scientific computing in Python.
- The numpy array is an n-dimensional array.
- Numpy methods allow for fast and simple linear algebra and data processing tasks.

## So What? 
- Numpy is one of the main reasons why Python is so powerful and popular for scientific computing
- Super fast. Numpy's array operations are implemented in C, so they run as compiled code instead of interpreted Python.
- Numpy is the most popular linear algebra library for Python
- Provides loop-like behavior w/o the overhead of loops or list comprehensions (vectorized operations)
- Provides list + loop + conditional behavior for filtering arrays

## Now What?
- Start working with numpy arrays! Use `np.array([1, 2, 3])` to create one.
- We'll start using built-in numpy functions all the time:
    - min, max, mean, sum, std (available as array methods)
    - np.median (a function in the numpy module)
- Learn to use some vectorized operations
- Learn how to create arrays of booleans to filter results

In [1]:
import numpy as np

In [2]:
%%timeit
[x ** 2 for x in range(1, 1_000_000)]

224 ms ± 4.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [3]:
%%timeit
np.arange(1, 1_000_000) ** 2

1.97 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [4]:
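# Rough speedup ratio: list comprehension time / numpy time, both in ms
# (these figures differ from the timings printed above, presumably from another run)
370 / 3.04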

121.71052631578947

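C is "closer to the metal" than Python.

Assembly is closer to the metal than C.

And processor instruction sets are the metal!

That's why the numpy version above is roughly 100x faster: the loop runs in compiled C instead of in the Python interpreter.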
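
The `%%timeit` cells above only work inside Jupyter/IPython. Here's a minimal sketch of the same comparison using the standard library's `timeit` module, so it runs as a plain script. The size `n` and the variable names are assumptions for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000  # smaller than the notebook's 1,000,000 so the script finishes quickly

# Time 5 runs of each approach
python_seconds = timeit.timeit(lambda: [x ** 2 for x in range(1, n)], number=5)
numpy_seconds = timeit.timeit(lambda: np.arange(1, n, dtype=np.int64) ** 2, number=5)

# Sanity check: both approaches compute the same values
same_values = np.array_equal(
    np.array([x ** 2 for x in range(1, n)]),
    np.arange(1, n, dtype=np.int64) ** 2,
)

print(f"list comprehension: {python_seconds:.4f} s")
print(f"numpy:              {numpy_seconds:.4f} s")
print(f"same values:        {same_values}")
```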

## Getting Started
- The `np.array` function converts Python sequences (lists, tuples, ranges) into numpy arrays
- The numpy array is n-dimensional: the same type holds vectors (1-D), matrices (2-D), and higher-dimensional data

In [5]:
# Let's make our first numpy array!
x = [1, 2, 3, 4, 5]
x = np.array(x)
x, type(x)

(array([1, 2, 3, 4, 5]), numpy.ndarray)

In [6]:
# shape tells us the shape of our n-dimensional array
x.shape

(5,)

The `np.array()` function converts nested Python collections into numpy arrays of the appropriate dimension

In [7]:
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 0, 9]
])
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 0, 9]])

In [8]:
type(matrix)

numpy.ndarray
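
To make "n-dimensional" concrete, here's a small sketch that builds a 1-D, a 2-D, and a 3-D array and checks `.ndim`, `.shape`, and `.size` on each. The names `vector` and `cube` are made up for this example.

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5])                     # 1-D
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 0, 9]])   # 2-D
cube = np.zeros((2, 3, 4))                             # 3-D: 2 blocks of 3 rows x 4 columns

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim = number of axes, shape = length of each axis, size = total element count
    print(name, arr.ndim, arr.shape, arr.size)
```

All three are the same `numpy.ndarray` type; only the shape changes.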

`.dtype` is an attribute on numpy arrays that tells us the type of the elements

In [9]:
matrix.dtype

dtype('int64')
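
Numpy picks the dtype from the data unless we ask for one. A quick sketch, with variable names that are just for illustration:

```python
import numpy as np

whole = np.array([1, 2, 3])                          # integers in -> integer dtype
asked_for_floats = np.array([1, 2, 3], dtype=float)  # explicitly request floats
upcast = np.array([1, 2.5, 3])                       # one float upcasts every element to float

print(whole.dtype, asked_for_floats.dtype, upcast.dtype)
```

The exact integer dtype (for example `int64`) can depend on the platform and numpy version, so the sketch only prints it.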

Indexing a numpy array uses the same square-bracket syntax as Python lists (because it's Python), plus a comma-separated form for multiple dimensions

In [10]:
# zero indexed
matrix[0]

array([1, 2, 3])

In [11]:
# first element of the first row (index the row, then the element)
matrix[0][0]

1

In [12]:
# Same element, using numpy's comma-separated [row, column] indexing
matrix[0, 0]

1

In [13]:
first_row = matrix[0]
first_row

array([1, 2, 3])

In [14]:
first_element = first_row[0]
first_element

1

Slicing syntax works the same as in Python

In [15]:
x[0:3]

array([1, 2, 3])
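
Slicing also works across more than one dimension by putting a slice in each comma-separated position. A short sketch using the same `matrix` values as above (the result names are hypothetical):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 0, 9]])

first_two_rows = matrix[0:2]   # rows 0 and 1, all columns
first_column = matrix[:, 0]    # every row, column 0
lower_right = matrix[1:, 1:]   # rows 1 onward, columns 1 onward

print(first_two_rows)
print(first_column)
print(lower_right)
```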

In [16]:
a = np.array(range(1, 100))
a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
       69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
       86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [17]:
# Numpy has many descriptive statistics as methods
a.sum(), a.mean(), a.min(), a.max(), a.std()

(4950, 50.0, 1, 99, 28.577380332470412)

In [18]:
# median is a function in the numpy module, not an array method, so we call np.median(a)
np.median(a)

50.0
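
On a 2-D array, these same statistics accept an `axis` argument to summarize down the columns or across the rows. A minimal sketch with the `matrix` values from earlier (variable names are assumptions for illustration):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 0, 9]])

total = matrix.sum()               # every element
column_sums = matrix.sum(axis=0)   # collapse the rows: one sum per column
row_sums = matrix.sum(axis=1)      # collapse the columns: one sum per row
row_means = matrix.mean(axis=1)    # one mean per row
overall_median = np.median(matrix) # median of all nine values

print(total, column_sums, row_sums, row_means, overall_median)
```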

In [19]:
b = np.array([2, 3, 4, 5])
should_include_elements = np.array([False, True, False, True])
b[should_include_elements]

array([3, 5])

In [20]:
# On a 2-D array, a boolean array with one entry per row selects whole rows
should_include_elements = np.array([False, True, True])
matrix[should_include_elements]

array([[4, 5, 6],
       [7, 0, 9]])

## Arrays of Booleans == Beating Heart of Filtering/Transforming Arrays
- This is how we can do loop-like stuff w/o loops
- This "spell" is called "Boolean Masking"; folks may also call it "array filtering" or "boolean indexing"

In [21]:
x = np.array([1, 2, 3, 4, 5])

In [22]:
x == 3

array([False, False,  True, False, False])

In [23]:
x < -9

array([False, False, False, False, False])

In [24]:
x > 3

array([False, False, False,  True,  True])

In [25]:
x % 2 == 0

array([False,  True, False,  True, False])

In [26]:
only_threes = x == 3
x[only_threes]

array([3])

In [27]:
# We don't need the extra variable; we can put the condition right inside the brackets.
# I read this code almost like SQL in my head:
# Select X where X is equal to 3
x[x == 3]

array([3])

In [28]:
# Select X where X is less than zero
x[x < 0]

array([], dtype=int64)

In [29]:
# Select x where x is greater than 3
x[x > 3]

array([4, 5])

In [30]:
# Select x where x divided by two leaves no remainder
evens_from_x = x[x % 2 == 0]
evens_from_x

array([2, 4])
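
Masks can be combined, too. Here's a sketch using `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The result names are made up for the example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Select x where x is greater than 1 AND x is odd
odd_and_big = x[(x > 1) & (x % 2 == 1)]

# Select x where x is less than 2 OR x is greater than 4
ends = x[(x < 2) | (x > 4)]

# Select x where x is NOT equal to 3
not_three = x[~(x == 3)]

print(odd_and_big, ends, not_three)
```

Python's `and`, `or`, and `not` keywords don't work elementwise on arrays, which is why the symbol operators are used here.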

In [31]:
# In the Python admissions test, there was a question called "remove_evens" where you write a function that removes evens
# In base Python, this is a loop w/ a conditional and another operation to append to a list, or a list comprehension
def remove_evens(x):
    x = np.array(x)
    # Keep only the odd numbers, i.e. remove the evens
    return x[x % 2 != 0]

y = remove_evens([2, 3, 34, 5, 6, 24, 442, 24, 12, 3, 24, 3, 3, 23, 23, 23, 10])
y

array([ 3,  5,  3,  3,  3, 23, 23, 23])

In [32]:
# A numpy array holds a single dtype, so mixing ints and a string converts everything to strings
x = np.array([1, 2, "3", 4])

In [33]:
x

array(['1', '2', '3', '4'], dtype='<U21')
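
If the strings really are numbers, `.astype` can convert them back. A minimal sketch, assuming every element parses as an integer (a value like `"three"` would raise an error); the names `strings` and `numbers` are just for illustration.

```python
import numpy as np

strings = np.array([1, 2, "3", 4])   # everything becomes a string
numbers = strings.astype(int)        # returns a new integer array

print(strings.dtype.kind)  # "U" means unicode string
print(numbers, numbers.sum())
```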

## Intro to Vectorization
- Loop-like behavior on an array w/o writing the loop

In [34]:
x = np.zeros(10)
x

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [35]:
x + 1

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [36]:
# Let's generate a random integer
# start is inclusive, end is exclusive
# So the following line is like rolling a 20 sided die
x = np.random.randint(1, 21)
x

12

In [37]:
# 3rd argument is the size (or the number of random numbers)
x = np.random.randint(1, 21, 10)
x

array([13, 18, 12,  7,  6, 17, 13, 16, 13, 18])
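
Random output changes on every run, which makes a notebook hard to reproduce. One option is a seeded generator from `np.random.default_rng`. This is a sketch and not what the cells above use; the seed value 42 and the variable names are arbitrary choices.

```python
import numpy as np

# Two generators built from the same seed produce the same sequence
rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 21, size=10)   # low is inclusive, high is exclusive, like randint

rng_again = np.random.default_rng(seed=42)
rolls_again = rng_again.integers(1, 21, size=10)

print(rolls)
print(np.array_equal(rolls, rolls_again))
```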

In [38]:
# Let's make Python fall on its face
a = [1, 2, 3]

# a + 1 raises a TypeError for a plain list, so in base Python
# we need a loop or list comprehension to add one to every item:
[n + 1 for n in a]

[2, 3, 4]

In [39]:
x + 1

array([14, 19, 13,  8,  7, 18, 14, 17, 14, 19])

In [40]:
# elementwise division
x / 10

array([1.3, 1.8, 1.2, 0.7, 0.6, 1.7, 1.3, 1.6, 1.3, 1.8])

In [41]:
# Vector addition
x + x

array([26, 36, 24, 14, 12, 34, 26, 32, 26, 36])

In [42]:
# scalar-vector multiplication along w/ vector subtraction
x - 2*x

array([-13, -18, -12,  -7,  -6, -17, -13, -16, -13, -18])

In [43]:
# There are many more linear algebra features, like the dot product
np.dot(x, x)

1929

In [44]:
np.linalg.norm(x)

43.9203825119955
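
The two results above are related: the norm of a vector is the square root of its dot product with itself. A quick check, hard-coding the random `x` printed earlier so the sketch is reproducible:

```python
import numpy as np

# The same values as the random x shown in the notebook output
x = np.array([13, 18, 12, 7, 6, 17, 13, 16, 13, 18])

dot = np.dot(x, x)          # sum of the squared elements
norm = np.linalg.norm(x)    # Euclidean length of the vector

print(dot, norm, np.isclose(norm, np.sqrt(dot)))
```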

In [45]:
original_array = [1, 2, 3, 4, 5]
array_with_one_added = []
for n in original_array:
    array_with_one_added.append(n + 1)
array_with_one_added

[2, 3, 4, 5, 6]

In [46]:
np.array(original_array) + 1

array([2, 3, 4, 5, 6])

In [47]:
beatles = np.array(["Ringo", "George", "Paul", "John"])

In [48]:
beatles == "Ringo"

array([ True, False, False, False])

In [49]:
beatles[beatles != "Ringo"]

array(['George', 'Paul', 'John'], dtype='<U6')

In [50]:
beatles[beatles == "Ringo"]

array(['Ringo'], dtype='<U6')
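
To filter on several values at once, `np.isin` builds the boolean mask for us. A short sketch; the name `selected` is made up for the example.

```python
import numpy as np

beatles = np.array(["Ringo", "George", "Paul", "John"])

# True wherever the element is one of the listed names
mask = np.isin(beatles, ["Paul", "John"])
selected = beatles[mask]

print(mask)
print(selected)
```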

In [51]:
matrix * 2

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14,  0, 18]])
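
One thing to watch for: `*` between two arrays is elementwise too. Matrix multiplication uses the `@` operator. A small sketch with the same `matrix` values (result names are hypothetical):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 0, 9]])

elementwise = matrix * matrix   # each element times itself
product = matrix @ matrix       # true matrix multiplication (rows times columns)

print(elementwise)
print(product)
```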

In [52]:
# Inner product
np.dot(x, x)

1929

In [53]:
# 10 samples from the standard normal distribution (mean 0, standard deviation 1)
np.random.randn(10)

array([ 0.45982871,  0.35720387, -3.0301247 ,  0.45522815,  2.63510564,
       -3.16151473,  0.09190138, -0.22853473,  1.26207173, -1.85474689])

In [54]:
# Six rolls of a six-sided die
np.random.randint(1, 7, 6)

array([5, 4, 4, 3, 1, 6])

## NumPy Official Documentation
- [NumPy Homepage](https://numpy.org/)
- [NumPy Quickstart](https://numpy.org/devdocs/user/quickstart.html)


## Other Works by [Ryan Orsinger](https://github.com/ryanorsinger)
- [101 Exercises](https://101exercises.com/) for Python or JavaScript basics
- [90 Minutes to Machine Learning](https://www.youtube.com/watch?v=VMx_3yM6G9s)
