Working with numerical data

Let's say we want to use climate data like the temperature, rainfall and humidity in a region to determine if the region is well suited for growing apples. A really simple approach would be to formulate the relationship between the annual yield of apples (tons per hectare) and the climatic conditions, namely the average temperature (in degrees Fahrenheit), rainfall (in millimeters) and average relative humidity (in percent), as a linear equation:

yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity

In [1]:
kanto = [73, 67, 43]
johto = [91, 88, 64]
hownn = [87, 134, 58]
sinnoh = [102, 43, 37]
unova = [69, 96, 70]

# the three numbers in each vector represent the temperature, rainfall and humidity, respectively
w1 = 0.3
w2 = 0.2
w3 = 0.5

weights = [w1, w2, w3]

# we can now write a function crop_yield to calculate the yield of apples given the climate data and the respective weights

def crop_yield(region, weights):
    result = 0
    for x,y in zip(region, weights):
        result += x*y
    return result



In [2]:
print(crop_yield(kanto, weights))
print(crop_yield(johto, weights))
print(crop_yield(hownn, weights))
print(crop_yield(sinnoh, weights))
print(crop_yield(unova, weights))

56.8
76.9
81.9
57.699999999999996
74.9


Going from Python lists to NumPy arrays

The NumPy library provides a built-in function to compute the dot product of two vectors. To use it together with NumPy's other array operations, we first convert the lists to NumPy arrays. To begin, let's import the numpy module. It is common practice to import numpy with the alias np.

In [3]:
import numpy as np

In [4]:
# NumPy arrays can be created using the np.array function

kanto = np.array([73, 67, 43])

weights = np.array([w1, w2, w3])

type(kanto)
type(weights)

numpy.ndarray

In [5]:
# operations on NumPy arrays
# this line of code does exactly the same thing as the crop_yield function
np.dot(kanto, weights)


56.8
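
To make explicit what the dot product computes, here is a minimal sketch comparing np.dot with an element-wise multiplication followed by a sum. It repeats the Kanto data and the weights from above so that it runs on its own; the name `manual` is introduced here purely for illustration.

```python
import numpy as np

kanto = np.array([73, 67, 43])
weights = np.array([0.3, 0.2, 0.5])

# Multiply element-wise, then add up the three products
manual = (kanto * weights).sum()

# np.dot performs the same multiply-and-sum in a single call;
# np.isclose compares the two floats with a small tolerance
print(np.isclose(manual, np.dot(kanto, weights)))
```

The comparison uses np.isclose rather than == because floating-point results can differ in the last digits depending on the order of the additions.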

Multi-dimensional Numpy arrays

We can now go one step further and represent the climate data for all the regions together using a single 2-dimensional NumPy array.

In [6]:
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

climate_data

array([[ 73,  67,  43],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

In [7]:
# NumPy arrays can have any number of dimensions, and different lengths along each dimension.
# we can inspect the length along each dimension using the .shape property of an array

# 2D array (matrix)
print(climate_data.shape)

# 1D array (vector)
print(weights.shape)

(5, 3)
(3,)
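
The same inspection works for arrays with more than two dimensions. The sketch below builds a hypothetical 3-dimensional array, `arr3d`, which is not part of the climate example, and also reads `.ndim`, the attribute that holds the number of dimensions.

```python
import numpy as np

# A hypothetical 3D array: 2 blocks, each with 3 rows and 4 columns
arr3d = np.zeros((2, 3, 4))

print(arr3d.shape)  # length along each dimension
print(arr3d.ndim)   # number of dimensions
```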


In [8]:
# we can use the np.matmul function from numpy, or simply use the @ operator to perform matrix multiplication


matmul_function = np.matmul(climate_data, weights)

print(matmul_function)

operator = climate_data @ weights

print(operator)

[56.8 76.9 81.9 57.7 74.9]
[56.8 76.9 81.9 57.7 74.9]
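
One way to read this result: multiplying the (5, 3) matrix by the (3,) vector amounts to taking one dot product per row, that is, one yield per region. A small sketch under that reading follows. It repeats the `climate_data` and `weights` values from above so it is self-contained, and `row_by_row` is a name introduced only for this illustration.

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

# One dot product per region (row), collected into a 1D array
row_by_row = np.array([np.dot(region, weights) for region in climate_data])

# The matrix-vector product yields the same five values
print(np.allclose(row_by_row, climate_data @ weights))
```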


Arithmetic operations and broadcasting

NumPy arrays support arithmetic operators like +, -, * etc. You can perform an arithmetic operation with a single number (also called a scalar), or with another array of the same shape. This makes it really easy to write mathematical expressions with multi-dimensional arrays.

In [15]:
arr2 = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 1, 2, 3]])

arr3 = np.array([[11, 12, 13, 14],
                [15, 16, 17, 18],
                [19, 11, 12, 13]])


# adding a scalar
print(f'''Operation scalar example: 
{arr2 + 3}''')

# element-wise subtraction
print(f'''Operation with another array of the same shape: 
{arr2 - arr3}''')

# NumPy arrays also support broadcasting, which allows arithmetic operations
# between two arrays having a different number of dimensions, but compatible
# shapes

print(arr2.shape)

arr4 = np.array([4, 5, 6, 7])
print(arr4.shape)

print(f'''Broadcasting: 
{arr2 + arr4}''')

# this works because NumPy conceptually replicates arr4 along the missing
# dimension to match the shape of arr2, without actually creating 3 copies
# of the smaller array in memory. The addition behaves as if arr4 were:
# arr4 = np.array([[4, 5, 6, 7],
#                  [4, 5, 6, 7],
#                  [4, 5, 6, 7]])

Operation scalar example: 
[[ 4  5  6  7]
 [ 8  9 10 11]
 [12  4  5  6]]
Operation with another array of the same shape: 
[[-10 -10 -10 -10]
 [-10 -10 -10 -10]
 [-10 -10 -10 -10]]
(3, 4)
(4,)
Broadcasting: 
[[ 5  7  9 11]
 [ 9 11 13 15]
 [13  6  8 10]]
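
Broadcasting only works when the shapes are compatible. The sketch below uses np.broadcast_to to show the shape arr4 is stretched to, then tries an array whose length does not match the number of columns. `expanded` and `arr5` are names introduced here for illustration, and `arr5` is a hypothetical incompatible array.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr4 = np.array([4, 5, 6, 7])

# np.broadcast_to returns a read-only view of arr4 with the target shape,
# without copying the data three times
expanded = np.broadcast_to(arr4, arr2.shape)
print(expanded.shape)

# A length-2 array cannot be stretched to match 4 columns
arr5 = np.array([7, 8])  # hypothetical incompatible array
try:
    arr2 + arr5
except ValueError as err:
    print("Broadcasting failed:", err)
```

Here the last dimension of arr2 (4) matches the length of arr4, so the addition succeeds, whereas a length of 2 raises a ValueError.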
