#### ARRAY PROPERTIES
NumPy arrays have these key properties:
* `ndim` - the number of dimensions (axes) in the array
* `shape` - the size of the array for each dimension 
* `size` - the total number of elements in the array
* `dtype` - the data type of the elements in the array
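As a quick illustration (a small sketch, not one of the original notebook cells), all four properties can be read off any array. Note that `size` always equals the product of the entries in `shape`; the `grid` array here is a throwaway example.

```python
import numpy as np

# A 3 x 4 array of zeros (float by default)
grid = np.zeros((3, 4))

print(grid.ndim)   # 2 axes: rows and columns
print(grid.shape)  # 3 rows, 4 columns
print(grid.size)   # 12 elements, i.e. 3 * 4
print(grid.dtype)  # float64
```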

### Package Imports

In [1]:
import numpy as np

### Practice

In [2]:
array = np.array(range(5))

array

array([0, 1, 2, 3, 4])

In [3]:
array_2d = np.array([range(5), range(5, 10)])

array_2d

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [5]:
array_2d + 1

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [4]:
array.shape # 1-D array: there is no second dimension

(5,)

In [6]:
array_2d.shape  # 2 rows, 5 columns

(2, 5)

In [7]:
array.size

5

In [8]:
array_2d.size

10

In [9]:
array_2d.T.shape # transpose - swap axes

(5, 2)

In [10]:
array_2d.T # .T is the transpose attribute, so no parentheses are needed

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

In [11]:
print(array.ndim, array_2d.ndim)

1 2


In [12]:
array.dtype

dtype('int64')

In [13]:
array_2d.dtype

dtype('int64')

In [14]:
np.array(['I', 'love', 'Python'])

array(['I', 'love', 'Python'], dtype='<U6')

# Assignment 1: Array Basics

Hi there,

Can you import NumPy and convert the following list comprehension (I just learned about comprehensions in an awesome course by Maven) into an array?

Once you've done that, report the following about the array:
* The number of dimensions 
* The shape
* The number of elements in the array
* The type of data contained inside

In [15]:
my_list = [x * 10 for x in range(1, 11)]

In [16]:
my_array = np.array(my_list)

In [17]:
my_array

array([ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

In [18]:
print(f"ndim: {my_array.ndim}, shape: {my_array.shape}, size: {my_array.size}, dtype: {my_array.dtype}")

ndim: 1, shape: (10,), size: 10, dtype: int64


In [19]:
print(f"ndim: {my_array.ndim}")
print(f"shape: {my_array.shape}")
print(f"size: {my_array.size}")
print(f"dtype: {my_array.dtype}")

ndim: 1
shape: (10,)
size: 10
dtype: int64


In [20]:
my_array.reshape(2, 5)

array([[ 10,  20,  30,  40,  50],
       [ 60,  70,  80,  90, 100]])

# Assignment 2: Array Creation

Thanks for your help with the first piece - I'm starting to understand some of the key differences between base Python data types and NumPy arrays. 

Does NumPy have anything like the range() function from base Python?

If so: 
* create the same array from assignment 1 using a NumPy function. 
* Make it 5 rows and 2 columns. 
* It's ok if the datatype is float or int.

Looking good so far! One of our data scientists asked about random number generation in NumPy.

Can you create a 3x3 array of random numbers between 0 and 1? Use a random state of 2022.

Store the random array in a variable called `random_array`.

### Notes

**Create arrays** using functions

* ones - `np.ones((rows, cols), dtype)` - create an array of ones of a given size, as float by default
* zeros - `np.zeros((rows, cols), dtype)` - create an array of zeros of a given size, as float by default
* arange - `np.arange(start, stop, step)` - create an array of evenly spaced values from start up to, but not including, stop, with a given step size (only stop is required; integer inputs produce integers)
* linspace - `np.linspace(start, stop, n)` - create an array of floats with given start & stop values with n elements, separated by a consistent step size (stop is inclusive)
* reshape - `array.reshape(rows, cols)` - changes an array into the specified dimensions, if compatible (the total number of elements must stay the same)
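A short sketch contrasting `arange` and `linspace` on the same interval. It also shows `reshape` with `-1`, which asks NumPy to infer that one dimension from the element count (an extra not listed in the notes above).

```python
import numpy as np

# arange: stop is excluded; a float step produces a float array
quarters = np.arange(0, 1, 0.25)            # [0.  , 0.25, 0.5 , 0.75]

# linspace: stop is included; you choose the number of elements, not the step
quarters_inclusive = np.linspace(0, 1, 5)   # [0.  , 0.25, 0.5 , 0.75, 1.  ]

# reshape can infer one dimension when you pass -1 (12 elements / 3 rows = 4 columns)
grid = np.arange(12).reshape(3, -1)
```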

In [21]:
np.ones(4,)

array([1., 1., 1., 1.])

In [36]:
# we want an array with 5 ones 
np.ones(5)

array([1., 1., 1., 1., 1.])

In [37]:
np.ones(2, 'int') # same as np.ones(2, dtype='int'), which names the dtype argument explicitly

array([1, 1])

In [38]:
# NumPy zeros
np.zeros((2, 5), dtype=int)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [39]:
np.zeros(100, 'int')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [40]:
np.zeros(100, 'int').reshape(10, 10) # 10 by 10 matrix

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [41]:
np.zeros(100, 'int').reshape(5, 20) # 5 by 20 matrix

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [42]:
# Identity matrix (square matrix with ones on the main diagonal)
np.identity(10) # an identity matrix with 10 rows and 10 columns

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [43]:
np.identity(10).reshape(5, 20)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
        0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
        0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 1.]])

In [44]:
np.identity(10).reshape(5, 20).T

array([[1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1.]])

In [45]:
np.arange(10) # start is 0 and step is 1 by default, stop is not inclusive

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [47]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [48]:
np.arange(5, 10) 

array([5, 6, 7, 8, 9])

In [49]:
np.arange(1, 9, 2)

array([1, 3, 5, 7])

In [50]:
np.arange(1, 9, 2).reshape(2, 2)

array([[1, 3],
       [5, 7]])

In [33]:
# Create an array with equally spaced elements based on a start, stop and number of elements
# This can be super helpful for creating chart axes or custom labels elsewhere

np.linspace(0, 100, 5) # start, stop (inclusive), number of elements

array([  0.,  25.,  50.,  75., 100.])

In [34]:
np.linspace(0, 100, 11)  # stop is inclusive

array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

In [35]:
np.linspace(8, 64, 8)  

array([ 8., 16., 24., 32., 40., 48., 56., 64.])

# Assignment 3: Accessing Array Data


Slice and index the `random_array` we created in the previous exercise. Perform the following:

* Grab the first two 'rows' of the array
* Grab the entire first column
* Finally, grab the second element of the third row.

Thanks!


In [88]:
rng = np.random.default_rng(2022)

random_array = rng.random(9).reshape(3, 3)

random_array

array([[0.24742606, 0.09299006, 0.61176337],
       [0.06066207, 0.66103343, 0.75515778],
       [0.1108689 , 0.04305584, 0.41441747]])

In [89]:
# 1 - grab the first 2 rows

random_array[:2, :]

array([[0.24742606, 0.09299006, 0.61176337],
       [0.06066207, 0.66103343, 0.75515778]])

In [90]:
# 2 - grab the entire first column 

random_array[:, 0]

array([0.24742606, 0.06066207, 0.1108689 ])

In [91]:
# 3 - grab the 2nd element of the 3rd row 

random_array[2, 1]

0.04305584439252108

### Random Number Generation / Random Number Arrays

You can create **random number arrays** from a variety of distributions using NumPy functions and methods (great for sampling and simulation!)
* `default_rng` - creates a random number generator (the seed is for reproducibility) - `np.random.default_rng(seed)`
* `random` - returns random numbers from a uniform distribution between 0 (inclusive) and 1 (exclusive) - `rng.random(n)`
* `normal` - returns random numbers from a normal distribution with a given mean and standard deviation - `rng.normal(mean, stdev, n)`

**PRO TIP:** Even though it's optional, make sure to **set a seed** when generating random numbers so that you and others can recreate your work (the value of the seed itself is less important)
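To make the seeding advice concrete, here is a small sketch: two generators built from the same seed return identical numbers. It also shows that `random` accepts a shape tuple and that the upper bound of `integers` is excluded. The variable names are illustrative only.

```python
import numpy as np

# Two generators created from the same seed produce identical streams
rng_a = np.random.default_rng(2022)
rng_b = np.random.default_rng(2022)
same = np.array_equal(rng_a.random(5), rng_b.random(5))

# random() accepts a shape tuple, so no reshape is needed
rng = np.random.default_rng(2022)
grid = rng.random((3, 3))

# integers(low, high, size): high is excluded, so these values fall in 0..9
draws = rng.integers(0, 10, size=1000)
```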

In [51]:
from numpy.random import default_rng

rng = default_rng(12345)

random_array = rng.random(10)
random_array

array([0.22733602, 0.31675834, 0.79736546, 0.67625467, 0.39110955,
       0.33281393, 0.59830875, 0.18673419, 0.67275604, 0.94180287])

In [52]:
rng = default_rng(12345)
mean, stdev = 5, 1
random_normal = rng.normal(mean, stdev, size=10)
random_normal

array([3.57617496, 6.26372846, 4.12933826, 4.74082677, 4.92465669,
       4.25911535, 3.6322073 , 5.6488928 , 5.36105811, 3.04713694])

In [56]:
rng = np.random.default_rng(616)

rng.random(10) # 10 random numbers between 0 and 1

array([0.39682145, 0.86568572, 0.46040359, 0.30599848, 0.57381588,
       0.08888468, 0.88194347, 0.73228387, 0.73215182, 0.56233394])

In [57]:
rng.integers(0, 10, 100) # 100 random integers from 0 to 9 (the high value, 10, is excluded)

array([8, 3, 6, 0, 3, 3, 1, 2, 2, 4, 9, 1, 2, 5, 1, 8, 9, 0, 3, 1, 5, 8,
       0, 6, 1, 7, 0, 7, 6, 0, 3, 7, 1, 9, 4, 1, 6, 1, 4, 4, 9, 5, 3, 4,
       7, 8, 3, 3, 5, 0, 4, 9, 9, 5, 6, 5, 8, 2, 3, 0, 0, 8, 3, 8, 4, 8,
       8, 9, 7, 3, 8, 4, 9, 2, 6, 3, 0, 3, 2, 0, 0, 4, 9, 5, 4, 1, 8, 2,
       7, 7, 1, 2, 2, 4, 1, 6, 7, 3, 7, 5])

In [58]:
rng.normal(50, 5, 10) # 10 random numbers from a normal distribution with mean 50 and standard deviation 5

array([53.53115044, 46.35468178, 46.1580894 , 52.89177118, 45.90693639,
       44.51748741, 57.41467712, 51.82624671, 53.9032125 , 53.21454641])

### ASSIGNMENT: ARRAY CREATION

In [59]:
my_array = np.linspace(10, 100, 10).reshape(5,2)

my_array

array([[ 10.,  20.],
       [ 30.,  40.],
       [ 50.,  60.],
       [ 70.,  80.],
       [ 90., 100.]])

In [62]:
my_array = np.arange(10, 101, 10).reshape(5,2)

my_array

array([[ 10,  20],
       [ 30,  40],
       [ 50,  60],
       [ 70,  80],
       [ 90, 100]])

In [64]:
my_array = np.arange(1, 11) * 10  # still 1-D; reshaped in the next cell

my_array

array([ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

In [65]:
my_array = (np.arange(1, 11) * 10).reshape(5,2)

my_array

array([[ 10,  20],
       [ 30,  40],
       [ 50,  60],
       [ 70,  80],
       [ 90, 100]])

**Create a 3x3 array of random numbers between 0 and 1. Use a random state of 2022**

In [60]:
from numpy.random import default_rng

rng = default_rng(2022)

random_array = rng.random(9).reshape(3, 3) # 9 random numbers

random_array

array([[0.24742606, 0.09299006, 0.61176337],
       [0.06066207, 0.66103343, 0.75515778],
       [0.1108689 , 0.04305584, 0.41441747]])

In [67]:
from numpy.random import default_rng

rng = default_rng(2022)

random_array = rng.random((3, 3))

random_array

array([[0.24742606, 0.09299006, 0.61176337],
       [0.06066207, 0.66103343, 0.75515778],
       [0.1108689 , 0.04305584, 0.41441747]])

# Assignment 4: Arithmetic Operations

The creativity of our marketing team knows no bounds!

They've asked us to come up with a simple algorithm to provide a random discount to our list of prices below. 

Before we do that, 

* Add a 5 dollar shipping fee to each price. Call this array `total`.

Once we have that, we want to use the random_array created in assignment 2 and apply them to the 6 prices.

* Grab the first 6 numbers from `random_array`, reshape it to one dimension. Call this `discount_pct`.
* Subtract `discount_pct` FROM 1, store this in `pct_owed`.
* Multiply `pct_owed` by `total` to get the final amount owed.

In [110]:
prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])

total = prices + 5 

total 

array([ 10.99,  11.99,  27.49, 104.99,   9.99,  54.99])

In [112]:
# recreate the random array

from numpy.random import default_rng

rng = default_rng(2022)

random_array = rng.random(9).reshape(3, 3) # 9 random numbers

random_array

array([[0.24742606, 0.09299006, 0.61176337],
       [0.06066207, 0.66103343, 0.75515778],
       [0.1108689 , 0.04305584, 0.41441747]])

In [115]:
discount_pct = random_array[:2, :].reshape(6)

pct_owed = 1 - discount_pct

final_owed = total * pct_owed

final_owed.round(2)

array([ 8.27, 10.88, 10.67, 98.62,  3.39, 13.46])

In [116]:
# we can also combine this into a single expression (not necessary, and arguably less readable)

((1 - (random_array[:2, :].reshape(6))) * total).round(2)

array([ 8.27, 10.88, 10.67, 98.62,  3.39, 13.46])

In [117]:
print(discount_pct)
print(pct_owed)
print(final_owed.round(2))

[0.24742606 0.09299006 0.61176337 0.06066207 0.66103343 0.75515778]
[0.75257394 0.90700994 0.38823663 0.93933793 0.33896657 0.24484222]
[ 8.27 10.88 10.67 98.62  3.39 13.46]


### Indexing & Slicing Arrays

**Indexing & slicing** one-dimensional arrays is the same as base Python
* `array[index]` - indexing to access a single element (0-indexed)
* `array[start:stop:step size]` - slicing to access a series of elements (stop is not inclusive)
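A few more one-dimensional slicing patterns, as a sketch on a throwaway array (not data from the notebook): negative indices count from the end, and a negative step walks backwards.

```python
import numpy as np

values = np.arange(10)      # [0 1 2 ... 9]

last_three = values[-3:]    # [7 8 9]: negative indices count from the end
reversed_ = values[::-1]    # [9 8 ... 0]: a negative step walks backwards
every_third = values[1::3]  # [1 4 7]: start at index 1, take every 3rd element
```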

In [68]:
import numpy as np

product_array = np.array(['fruits', 'vegetables', 'cereal', 'dairy', 'eggs', 'snacks', 'beverages', 'coffee', 'tea', 'spices'])

product_array

array(['fruits', 'vegetables', 'cereal', 'dairy', 'eggs', 'snacks',
       'beverages', 'coffee', 'tea', 'spices'], dtype='<U10')

In [69]:
print(product_array[1])
print(product_array[-1])

vegetables
spices


In [70]:
print(product_array[:5])

['fruits' 'vegetables' 'cereal' 'dairy' 'eggs']


In [71]:
print(product_array[5::2])

['snacks' 'coffee' 'spices']


**Indexing & slicing** two-dimensional arrays requires an extra index or slice
* `array[row index, column index]` - indexing to access a single element (0-indexed)
* `array[start:stop:step size, start:stop:step size]` - slicing to access a series of elements
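One detail worth seeing in isolation: an integer index drops that axis, while a slice keeps it. This is why `product_array2D[1:, :]` returns a two-dimensional result. A minimal sketch on an illustrative `grid` array:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

row_as_1d = grid[1]      # integer index drops the row axis -> shape (4,)
row_as_2d = grid[1:2]    # a slice keeps the row axis      -> shape (1, 4)
col_as_1d = grid[:, 0]   # -> shape (3,)
col_as_2d = grid[:, :1]  # -> shape (3, 1)
```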

In [72]:
product_array2D = product_array.reshape(2, 5)
product_array2D

array([['fruits', 'vegetables', 'cereal', 'dairy', 'eggs'],
       ['snacks', 'beverages', 'coffee', 'tea', 'spices']], dtype='<U10')

In [73]:
# to grab coffee

product_array2D[1, 2] # row 1, column 2

'coffee'

In [74]:
# grab all rows
# slicing from the third column to the end of our list of columns

product_array2D[:, 2:]

array([['cereal', 'dairy', 'eggs'],
       ['coffee', 'tea', 'spices']], dtype='<U10')

In [75]:
# start at the second row (index 1) and grab all of its columns; slicing keeps the result 2-D

product_array2D[1:, :]

array([['snacks', 'beverages', 'coffee', 'tea', 'spices']], dtype='<U10')

### Practice

In [77]:
# integers from 0 to 11 (12 total values)

integer_array = np.arange(12)

integer_array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [78]:
integer_array[-2]

10

In [79]:
# slice 

integer_array[:5]

array([0, 1, 2, 3, 4])

In [80]:
# put the index before the colon to start at the sixth element (index 5) and go to the end

integer_array[5:]

array([ 5,  6,  7,  8,  9, 10, 11])

In [81]:
# step size of two (grab every other element)

integer_array[::2]

array([ 0,  2,  4,  6,  8, 10])

In [82]:
# create a new 3 rows and 4 columns array

new_array = integer_array.reshape(3, 4)

new_array

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [83]:
# second row but all columns

new_array[1, :]

array([4, 5, 6, 7])

In [84]:
# just grab the last column

new_array[:, -1] # this equals to new_array[:, 3]

array([ 3,  7, 11])

In [85]:
# grab the second column through the last column

new_array[:, 1:]

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

In [87]:
# grab a single coordinate
# row 2, column 1 holds the number 9

new_array[2, 1]

9

### ARRAY OPERATIONS

Array operations are applied element-wise via `vectorization` and `broadcasting`, which eliminate the need to loop through the array's elements
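A small sketch of element-wise arithmetic and how it affects the dtype. True division always returns floats (as `(inventory / 2).dtype` shows below), while floor division keeps integers. The five inventory values are copied from the cell below for illustration.

```python
import numpy as np

inventory = np.array([39, 39, 93, 86, 48])

# Each operator is applied to every element, with no loop required
restocked = inventory + 10     # integer dtype is preserved
halved = inventory / 2         # true division returns floats
halved_floor = inventory // 2  # floor division keeps an integer dtype
```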

In [92]:
import numpy as np 

rng = np.random.default_rng(616)

inventory = rng.integers(0, 100, 10)

inventory

array([39, 39, 93, 86, 48, 46, 48, 30, 11, 57])

In [93]:
inventory - 24

array([ 15,  15,  69,  62,  24,  22,  24,   6, -13,  33])

In [94]:
inventory / 2

array([19.5, 19.5, 46.5, 43. , 24. , 23. , 24. , 15. ,  5.5, 28.5])

In [95]:
(inventory / 2).dtype

dtype('float64')

In [97]:
# a random array with length of 11

price = (rng.random(11) * 10).round(2)

price

array([5.24, 8.02, 0.75, 1.51, 8.97, 6.63, 7.26, 7.66, 0.3 , 7.78, 9.62])

In [98]:
price * inventory

ValueError: operands could not be broadcast together with shapes (11,) (10,) 

In [99]:
price = (rng.random(10) * 10).round(2)

price * inventory

array([ 74.1 ,  65.91, 440.82, 509.12, 204.  , 378.12, 182.4 ,   6.6 ,
       108.02, 313.5 ])

In [100]:
(price * inventory).sum()

2282.59

In [103]:
# convert the inventory array to a list

inventory_list = list(inventory) # or inventory.tolist(), which returns plain Python ints

In [104]:
inventory_list

[39, 39, 93, 86, 48, 46, 48, 30, 11, 57]

In [105]:
new_inventory = []

for x in inventory_list:
    new_inventory.append(x + 2)

new_inventory

[41, 41, 95, 88, 50, 48, 50, 32, 13, 59]

In [106]:
[ x + 2 for x in inventory_list ]

[41, 41, 95, 88, 50, 48, 50, 32, 13, 59]

In [107]:
[x * y for x, y in zip(inventory_list, price)]

[74.1,
 65.91,
 440.82,
 509.12,
 204.0,
 378.12,
 182.39999999999998,
 6.6,
 108.02000000000001,
 313.5]

# Assignment 5: Filtering Arrays

Filter the product array to only include those with prices greater than 25.

Modify your logic to include cola, despite it not having a price greater than 25. 
Store the elements returned in an array called `fancy_feast_special`.

Next, create a shipping cost array where the cost is 0 if price is greater than 20, and 5 if not. 

In [141]:
products = np.array(
    ["salad", "bread", "mustard", "rare tomato", "cola", "gourmet ice cream"]
)

products

array(['salad', 'bread', 'mustard', 'rare tomato', 'cola',
       'gourmet ice cream'], dtype='<U17')

In [142]:
products[prices > 25]

array(['rare tomato', 'gourmet ice cream'], dtype='<U17')

In [143]:
mask = (prices > 25) | (products == "cola")

fancy_feast_special = products[mask]

fancy_feast_special

array(['rare tomato', 'cola', 'gourmet ice cream'], dtype='<U17')

In [144]:
shipping = np.where(prices > 20, 0, 5)

shipping

array([5, 5, 0, 0, 5, 0])

## Filtering Arrays

You can `filter arrays` by indexing them with a logical test
* Only the array elements in positions where the logical test returns True are returned

In [118]:
sales_array = np.array([[0, 5, 155, 0, 518], [0, 1827, 616, 317, 325]])

sales_array

array([[   0,    5,  155,    0,  518],
       [   0, 1827,  616,  317,  325]])

In [119]:
sales_array != 0

array([[False,  True,  True, False,  True],
       [False,  True,  True,  True,  True]])

You can filter arrays with `multiple logical tests` 
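* Use `|` for `or` conditions and `&` for `and` conditions, and wrap each condition in parentheses, since `|` and `&` bind more tightly than comparisons like `<` and `==`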
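To see why the parentheses matter, here is a sketch using the same sales values flattened to 1-D. It also uses `~`, which negates a Boolean array. Without parentheses, Python reads the test as a chained comparison and NumPy refuses to reduce the array to a single True/False.

```python
import numpy as np

sales = np.array([0, 5, 155, 0, 518, 0, 1827, 616, 317, 325])

# Parenthesized conditions combine element by element
mid_range = sales[(sales > 100) & (sales < 500)]

# ~ acts as "not" on a Boolean array
nonzero = sales[~(sales == 0)]

# Without parentheses this parses as sales > (100 & sales) < 500,
# a chained comparison that needs one True/False and raises ValueError
try:
    sales[sales > 100 & sales < 500]
except ValueError as err:
    error_message = str(err)
```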

In [120]:
sales_array[(sales_array == 616) | (sales_array < 100)]

array([  0,   5,   0,   0, 616])

In [122]:
sales_array[(sales_array > 100) & (sales_array < 500)]

array([155, 317, 325])

**PRO TIP:** Store complex filtering criteria in a variable (known as a Boolean mask)

In [123]:
mask = (sales_array > 100) & (sales_array < 500)

sales_array[mask]

array([155, 317, 325])

### Practice

In [124]:
my_array = np.arange(20)

my_array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [126]:
my_array % 2 == 0

array([ True, False,  True, False,  True, False,  True, False,  True,
       False,  True, False,  True, False,  True, False,  True, False,
        True, False])

In [127]:
my_array[my_array % 2 == 0]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [128]:
even_odd = np.array(["even", "odd"] * 10)

even_odd

array(['even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even',
       'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd',
       'even', 'odd'], dtype='<U4')

In [129]:
even_odd != 'odd'

array([ True, False,  True, False,  True, False,  True, False,  True,
       False,  True, False,  True, False,  True, False,  True, False,
        True, False])

In [130]:
even_odd[even_odd != 'odd']

array(['even', 'even', 'even', 'even', 'even', 'even', 'even', 'even',
       'even', 'even'], dtype='<U4')

In [131]:
# replace all even numbers in my_array with 0

my_array[my_array % 2 == 0] = 0

my_array

array([ 0,  1,  0,  3,  0,  5,  0,  7,  0,  9,  0, 11,  0, 13,  0, 15,  0,
       17,  0, 19])

In [132]:
my_array[2] = 2

my_array

array([ 0,  1,  2,  3,  0,  5,  0,  7,  0,  9,  0, 11,  0, 13,  0, 15,  0,
       17,  0, 19])

## The WHERE Function

The `where()` NumPy function performs a logical test and returns a given value if the test is True, or another if the test is False

```python
np.where(logical test, value if True, value if False)
```
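The values returned by `np.where` can be scalars or whole arrays of the same shape. A minimal sketch with hypothetical inventory counts (including a negative one, to show clean-up):

```python
import numpy as np

inventory = np.array([12, -3, 18, 0, 7])  # hypothetical counts

# Replace negative counts with 0 and keep everything else unchanged
cleaned = np.where(inventory < 0, 0, inventory)

# The "if True" value can itself be an array computed from the input
restock = np.where(inventory < 10, inventory + 20, inventory)
```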

In [133]:
inventory_array = np.array([12, 102, 18, 0,  0])

inventory_array

array([ 12, 102,  18,   0,   0])

In [134]:
product_array = np.array(["fruits", "vegetables", "cereal", "dairy", "eggs"])

product_array

array(['fruits', 'vegetables', 'cereal', 'dairy', 'eggs'], dtype='<U10')

In [135]:
# If inventory is zero or negative, assign "Out of Stock", otherwise assign "In Stock"

np.where(inventory_array <= 0, "Out of Stock", "In Stock")

array(['In Stock', 'In Stock', 'In Stock', 'Out of Stock', 'Out of Stock'],
      dtype='<U12')

In [136]:
# If inventory is zero or negative, assign "Out of Stock", otherwise assign the product_array value

np.where(inventory_array <= 0, "Out of Stock", product_array)

array(['fruits', 'vegetables', 'cereal', 'Out of Stock', 'Out of Stock'],
      dtype='<U12')

In [138]:
my_array = np.arange(20)

my_array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [139]:
np.where(my_array % 2 == 0, 'even', 'odd')

array(['even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even',
       'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd',
       'even', 'odd'], dtype='<U4')

In [140]:
np.where(my_array % 2 == 0, 'even', np.where(my_array == 9, my_array, 'odd'))

array(['even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even',
       '9', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd',
       'even', 'odd'], dtype='<U21')

# Assignment 6: Aggregating and Sorting Arrays

First, grab the top 3 highest priced items in our list. 

Then, calculate the mean, min, max, and median of the top three prices.

Finally, calculate the number of unique price tiers in our `price_tiers` array.

In [175]:
prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])

prices

array([ 5.99,  6.99, 22.49, 99.99,  4.99, 49.99])

In [176]:
prices.sort()

prices

array([ 4.99,  5.99,  6.99, 22.49, 49.99, 99.99])

In [177]:
top_3 = prices[-3:]

In [178]:
print(f"Mean: {top_3.mean()}")
print(f"Min: {top_3.min()}")
print(f"Max: {top_3.max()}")
print(f"Median: {np.median(top_3)}")

Mean: 57.49
Min: 22.49
Max: 99.99
Median: 49.99


In [180]:
price_tiers = np.array(["budget", "budget", "mid-tier", "luxury", "mid-tier", "luxury"])

In [181]:
np.unique(price_tiers)

array(['budget', 'luxury', 'mid-tier'], dtype='<U8')

In [182]:
np.unique(price_tiers).size

3

## Array Aggregation Methods

`Array aggregation methods` let you calculate metrics like sum, mean, max

* `array.sum()`
* `array.mean()`
* `array.max()`
* `array.min()`

You can also aggregate across `rows` or `columns` 
* `array.sum(axis=0)` - aggregates down the rows, returning one value per column
* `array.sum(axis=1)` - aggregates across the columns, returning one value per row

`Array functions` let you perform other aggregations like median and percentiles
* `np.median(array)`
* `np.percentile(array, n)` -- returns the value at the nth percentile of an array

You can also return a `unique` list of values or the `square root` for each number
* `np.unique(array)` -- returns the unique values in an array, sorted
* `np.sqrt(array)` -- returns the square root of each value in an array
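A compact sketch of these aggregations on the `sales_array` values used below. The `return_counts=True` option of `np.unique` goes slightly beyond the list above; it also reports how often each value appears.

```python
import numpy as np

sales = np.array([[0, 5, 155, 0, 518],
                  [0, 1827, 616, 317, 325]])

col_totals = sales.sum(axis=0)   # one value per column
row_totals = sales.sum(axis=1)   # one value per row

median = np.median(sales)        # midpoint of all 10 values
p50 = np.percentile(sales, 50)   # the 50th percentile is the median

# counts[i] is how many times values[i] appears
values, counts = np.unique(sales, return_counts=True)
```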


## Array Aggregation

In [145]:
sales_array = np.array([[0, 5, 155, 0, 518], [0, 1827, 616, 317, 325]])

sales_array

array([[   0,    5,  155,    0,  518],
       [   0, 1827,  616,  317,  325]])

In [146]:
# It's going to collapse our data down into a single row

sales_array.sum(axis=0)

array([   0, 1832,  771,  317,  843])

In [147]:
# sum across the columns of each row, keeping one total per row

sales_array.sum(axis=1)

array([ 678, 3085])

### Practice 

In [148]:
import numpy as np 

rng = np.random.default_rng(616)

price = (rng.random(10) * 10).round(2)

price

array([3.97, 8.66, 4.6 , 3.06, 5.74, 0.89, 8.82, 7.32, 7.32, 5.62])

In [149]:
inventory = rng.integers(0, 100, 10)

inventory

array([80, 33, 60,  6, 31, 35, 16, 20, 29, 43])

In [150]:
# grab the mean of price 

price.mean()

5.6

In [151]:
inventory.mean()

35.3

In [152]:
inventory.sum() 

353

In [155]:
# grab the standard deviation of inventory

inventory.std()

20.48438429633656

In [157]:
# returns the total value of our products

(price * inventory).sum().round(2)

1848.29

In [158]:
# return the index of the product with the highest total value (price * inventory)

(price * inventory).argmax()

0

In [159]:
price_2d = price.reshape(5, 2)

price_2d

array([[3.97, 8.66],
       [4.6 , 3.06],
       [5.74, 0.89],
       [8.82, 7.32],
       [7.32, 5.62]])

In [160]:
# sum down the rows (axis=0): one total per column

price_2d.sum(axis=0)

array([30.45, 25.55])

In [162]:
# mean across the columns (axis=1): one average per row

price_2d.mean(axis=1)

array([6.315, 3.83 , 3.315, 8.07 , 6.47 ])

## Array Functions

In [163]:
np.median(sales_array)

236.0

In [167]:
np.percentile(sales_array, 50) # equivalent to np.median(sales_array)

236.0

In [168]:
np.percentile(sales_array, 90)

737.0999999999996

In [169]:
np.unique(sales_array)

array([   0,    5,  155,  317,  325,  518,  616, 1827])

In [170]:
np.sqrt(sales_array)

array([[ 0.        ,  2.23606798, 12.4498996 ,  0.        , 22.75961335],
       [ 0.        , 42.74342055, 24.81934729, 17.80449381, 18.02775638]])

In [171]:
np.unique(np.ones(100)) # there's only one unique value in this array

array([1.])

## Sorting Arrays

The `sort()` method will `sort arrays` in place 
* Use the `axis` argument to specify the dimension to sort along
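The method sorts in place. The `np.sort` function instead returns a sorted copy, and `np.argsort` returns the indices that would sort the array. Here is a sketch using the prices from Assignment 6; descending order is obtained by reversing the copy.

```python
import numpy as np

prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])

ascending = np.sort(prices)         # sorted copy; prices itself is untouched
descending = np.sort(prices)[::-1]  # reverse the copy for descending order
order = np.argsort(prices)          # positions of the values in sorted order
first_before_inplace = prices[0]    # still 5.99 at this point

prices.sort()                       # the method sorts in place and returns None
```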

In [172]:
sales_array

array([[   0,    5,  155,    0,  518],
       [   0, 1827,  616,  317,  325]])

In [173]:
# by default the last axis (axis=-1) is sorted, which for a 2-D array sorts each row

sales_array.sort()

sales_array

array([[   0,    0,    5,  155,  518],
       [   0,  317,  325,  616, 1827]])

In [174]:
# axis=0 sorts each column independently (nothing changes here, since the columns are already in order)

sales_array.sort(axis=0)

sales_array

array([[   0,    0,    5,  155,  518],
       [   0,  317,  325,  616, 1827]])

## Vectorization

`Vectorization` is the process of pushing array operations into optimized C code, which is easier and more efficient than writing for loops

**PRO TIP:** Use vectorized operations whenever possible when manipulating data, and avoid writing loops
* _Below example: multiplying two arrays element-wise -- about 69 times faster than the list loop in this run (49.2 µs vs. 717 ns; timings vary by machine)_
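`%%timeit` is an IPython/Jupyter magic, so it won't run in a plain Python script. Here is a sketch of the same comparison with the standard-library `timeit` module. The 1,000 repetitions are an arbitrary choice, and the printed numbers will depend on your machine.

```python
import timeit

import numpy as np


def for_loop_multiply_lists(list1, list2):
    # Pure-Python loop: one multiplication and one append per element
    product_list = []
    for element1, element2 in zip(list1, list2):
        product_list.append(element1 * element2)
    return product_list


list1 = list(range(1000))
list2 = list(range(1000))
array1 = np.array(list1)
array2 = np.array(list2)

# Both approaches compute the same products
same_result = for_loop_multiply_lists(list1, list2) == (array1 * array2).tolist()

# Total seconds for 1,000 calls of each approach
loop_seconds = timeit.timeit(lambda: for_loop_multiply_lists(list1, list2), number=1000)
vector_seconds = timeit.timeit(lambda: array1 * array2, number=1000)
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s, "
      f"ratio: {loop_seconds / vector_seconds:.0f}x")
```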

In [183]:
def for_loop_multiply_lists(list1, list2):
    product_list = []
    for element1, element2 in zip(list1, list2):
        product_list.append(element1 * element2)
    return product_list

def multiply_arrays(array1, array2):
    return array1 * array2

In [184]:
list1 = list(range(1000))
list2 = list(range(1000))


In [185]:
%%timeit -r 5 -n 10000 # call this function 10000 times and repeat it 5 times to get a standard deviation
for_loop_multiply_lists(list1, list2)

49.2 µs ± 109 ns per loop (mean ± std. dev. of 5 runs, 10,000 loops each)


In [186]:
array1 = np.array(list1)
array2 = np.array(list2)

In [187]:
%%timeit -r 5 -n 10000
multiply_arrays(array1, array2)

717 ns ± 35.5 ns per loop (mean ± std. dev. of 5 runs, 10,000 loops each)


## Broadcasting

`Broadcasting` lets you perform vectorized operations with arrays of different sizes, where NumPy will expand the smaller array to 'fit' the larger one
* Single values (scalars) can be broadcast into arrays of any dimensions
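* Otherwise, shapes are compared from the last dimension backwards, and each pair of dimensions must either match or have one of them equal to 1 (missing leading dimensions count as 1)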
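The shape rule can be spelled out by hand. `broadcast_shape` below is a hypothetical helper written only for illustration. The sketch cross-checks it against NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later) on the shape pairs used in the cells that follow.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule by hand.

    Missing leading dimensions are padded with 1, then each pair of
    dimensions must be equal or contain a 1.
    """
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)


pairs = [((3, 3), (3,)), ((3,), (3, 1)), ((2, 3, 3), (3,))]
matches_numpy = all(broadcast_shape(a, b) == np.broadcast_shapes(a, b) for a, b in pairs)

# (3, 3) and (2,) clash in the last dimension, just like the ValueError below
try:
    broadcast_shape((3, 3), (2,))
except ValueError as err:
    mismatch = str(err)
```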

In [188]:
test_array = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

test_array

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

In [189]:
test_array + 1

array([[2, 3, 4],
       [2, 3, 4],
       [2, 3, 4]])

In [190]:
# the 1-D array [3, 2, 1] (shape (3,)) is added to each row

test_array + np.array([3, 2, 1])

array([[4, 4, 4],
       [4, 4, 4],
       [4, 4, 4]])

In [191]:
test_array + np.array([3, 2, 1]).reshape(3, 1)

array([[4, 5, 6],
       [3, 4, 5],
       [2, 3, 4]])

In [192]:
# Broadcasting relies on compatible shapes/dimensions

test_array + np.array([2, 1])

ValueError: operands could not be broadcast together with shapes (3,3) (2,) 

In [193]:
test_array = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

test_array

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

In [196]:
# shapes (3,) and (3, 1) are compatible: the row vector is stretched down the rows
# and the column vector across the columns, giving a (3, 3) result

test_array[0, :] + test_array[:, 1].reshape(3, 1)

array([[3, 4, 5],
       [3, 4, 5],
       [3, 4, 5]])

In [197]:
# broadcasting also works in three dimensions
# shape (2, 3, 3) is 2 deep by 3 rows by 3 columns; the (3,) array of ones
# is added to every row of both layers, giving an array of twos

np.ones((2, 3, 3), dtype=int) + np.ones(3, dtype=int)

array([[[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]]])

# Assignment 7: Bringing it All Together

Ok, final NumPy task - let's read in some data with the help of Pandas.

Our data scientist provided the code to read in a csv as a Pandas dataframe, and has converted the two columns of interest to arrays.

* Filter `sales_array` down to only sales where the product family was produce. 

* Then, randomly sample roughly half (random number < .5) of the produce sales and report the mean and median sales. Use a random seed of 2022.

* Finally, create a new array that has the values 'above_both', 'above_median', and 'below_both' based on whether the sales were above the median and mean of the sample, just above the median of the sample, or below both the median and mean of the sample. 

In [205]:
import pandas as pd
import numpy as np

retail_df = pd.read_csv(
    "../retail/retail_2016_2017.csv", skiprows=range(1, 11000), nrows=1000
)

family_array = np.array(retail_df["family"])
sales_array = np.array(retail_df["sales"])

In [208]:
produce_array = sales_array[family_array == "PRODUCE"]

In [221]:
produce_array

array([1662.394,  447.064, 2423.944,  962.866, 1236.404,  298.441,
       1077.44 , 3404.531,  962.96 ,  279.505, 1852.786, 1089.319,
        726.516, 7860.031,  446.038, 1155.385,  120.202,  862.092,
        473.952,  254.263, 1272.755, 2775.771, 2030.762, 1657.432,
       2339.906,  722.333, 1567.843, 2458.456,  673.885, 8834.15 ])

In [222]:
produce_array.size

30

In [223]:
rng = np.random.default_rng(2022)

random_array = rng.random(30)

sampled_array = produce_array[random_array < 0.5]

In [224]:
sampled_array

array([1662.394,  447.064,  962.866, 1077.44 , 3404.531,  962.96 ,
       1089.319, 7860.031,  446.038, 1272.755, 2775.771, 2339.906,
        722.333, 1567.843, 2458.456,  673.885, 8834.15 ])

In [225]:
sampled_array.size

17

In [226]:
mean = sampled_array.mean()

mean

2268.102470588235

In [227]:
median = np.median(sampled_array)

median

1272.755

In [228]:
np.where(
    sampled_array < median,
    "below_both",
    np.where(sampled_array > mean, "above_both", "above_median"),
)

array(['above_median', 'below_both', 'below_both', 'below_both',
       'above_both', 'below_both', 'below_both', 'above_both',
       'below_both', 'above_median', 'above_both', 'above_both',
       'below_both', 'above_median', 'above_both', 'below_both',
       'above_both'], dtype='<U12')
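When there are more than two outcomes, `np.select` is an alternative to nesting `np.where`. It checks a list of conditions in order, takes the first one that is True, and falls back to a default. The sketch below uses a handful of values taken from `sampled_array` above, so its mean and median differ from the full sample's.

```python
import numpy as np

# Five values taken from sampled_array, for illustration only
sample = np.array([1662.394, 447.064, 3404.531, 1272.755, 8834.15])
mean = sample.mean()
median = np.median(sample)

# Conditions are checked in order; the first True one wins
conditions = [sample < median, sample > mean]
labels = ["below_both", "above_both"]
tiers = np.select(conditions, labels, default="above_median")

# Same labels as the nested np.where approach
same_as_nested_where = np.array_equal(
    tiers,
    np.where(sample < median, "below_both",
             np.where(sample > mean, "above_both", "above_median")),
)
```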
