<h1>Working with numerical data</h1>
<h2>Python Lists</h2>
<p>yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity</p>

In [1]:
w1, w2, w3 = 0.3, 0.2, 0.5
kanto_temp = 73
kanto_rainfall = 67
kanto_humidity = 43

In [2]:
kanto_yield_apples = kanto_temp * w1 + kanto_rainfall * w2 + kanto_humidity * w3
kanto_yield_apples

56.8

In [3]:
print("The expected yield of apples in Kanto region is {} tons per hectare.".format(kanto_yield_apples))

The expected yield of apples in Kanto region is 56.8 tons per hectare.


In [4]:
kanto = [73, 67, 43]
johto = [91, 88, 64]
hoenn = [87, 134, 58]
sinnoh = [102, 43, 37]
unova = [69, 96, 70]

In [5]:
weights = [w1, w2, w3]

In [6]:
def crop_yield(region, weights):
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

In [7]:
crop_yield(kanto, weights)

56.8

In [8]:
crop_yield(johto, weights)

76.9

<h1>Python List to Numpy Arrays</h1>

In [9]:
import numpy as np
kanto = np.array([73, 67, 43])
kanto

array([73, 67, 43])

In [10]:
weights = np.array([w1, w2, w3])
weights

array([0.3, 0.2, 0.5])

In [11]:
type(kanto)

numpy.ndarray

<h1>Operating in Numpy Arrays</h1>

In [12]:
np.dot(kanto, weights)             # now compute the dot product of the two vectors using the np.dot function

56.8

In [13]:
(kanto * weights).sum()            # The * operator performs element-wise multiplication of two arrays (assuming they have the same size), and the sum method calculates the sum of the numbers in an array.

56.8

<h1>Benefits of using Numpy arrays</h1>
<h4>There are a couple of important benefits of using Numpy arrays instead of Python lists for operating on numerical data:</h4>

<p>Ease of use: You can write small, concise and intuitive mathematical expressions like (kanto * weights).sum() rather than using loops and custom functions like crop_yield.</p>
<p>Performance: Numpy operations and functions are implemented internally in C, which makes them much faster than Python statements and loops, which are interpreted at runtime.
Here's a quick comparison of dot products of two vectors with a thousand elements each, computed using Python loops vs. Numpy arrays.</p>

In [14]:
# Python lists
arr1 = list(range(1000))
arr2 = list(range(1000, 2000))

# Numpy arrays
arr1_np = np.array(arr1)
arr2_np = np.array(arr2)

In [15]:
%%time
result = 0
for x1, x2 in zip(arr1, arr2):
    result += x1*x2
result

Wall time: 0 ns


832333500

In [16]:
%%time
np.dot(arr1_np, arr2_np)

Wall time: 0 ns


832333500
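
<p>The vectors above have only 1,000 elements, so both cells finish too quickly for %%time to register a difference. The sketch below is one way to make the gap visible. It assumes vectors of one million elements and uses the standard-library timeit module instead of %%time; the helper name loop_dot is hypothetical, and the timings printed will vary from machine to machine.</p>

```python
import timeit
import numpy as np

# Assumption: one million elements per vector, large enough to measure
n = 1_000_000
a = list(range(n))
b = list(range(n, 2 * n))

# Use 64-bit integers explicitly so the sum does not overflow on platforms
# where the default integer type is 32-bit
a_np = np.array(a, dtype=np.int64)
b_np = np.array(b, dtype=np.int64)

def loop_dot(xs, ys):
    """Dot product computed with a plain Python loop (hypothetical helper)."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total

# Average over a few runs to smooth out noise
runs = 3
loop_seconds = timeit.timeit(lambda: loop_dot(a, b), number=runs) / runs
numpy_seconds = timeit.timeit(lambda: np.dot(a_np, b_np), number=runs) / runs

print(f"Python loop: {loop_seconds:.4f} s per run")
print(f"np.dot:      {numpy_seconds:.4f} s per run")
```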

<h1>Multi-dimensional Numpy arrays</h1>

In [17]:
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
climate_data

array([[ 73,  67,  43],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

In [18]:
# 2D array (matrix)
climate_data.shape

(5, 3)

In [19]:
# 3D array 
arr3 = np.array([
    [[11, 12, 13], 
     [13, 14, 15]], 
    [[15, 16, 17], 
     [17, 18, 19.5]]])
arr3.shape

(2, 2, 3)

In [20]:
climate_data.dtype   # To check data type of np array.

dtype('int32')

<p>We can use the np.matmul function from Numpy, or simply use the @ operator to perform matrix multiplication.</p>

In [21]:
np.matmul(climate_data, weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])

In [22]:
climate_data @ weights

array([56.8, 76.9, 81.9, 57.7, 74.9])
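
<p>To see what the matrix multiplication is doing, it can help to compare it with one dot product per row. The following is a minimal sketch using the five regions from above; the variable names row_by_row and all_at_once are only for illustration.</p>

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

# One dot product per row (region), collected into a new array
row_by_row = np.array([np.dot(row, weights) for row in climate_data])

# The same computation expressed as a single matrix-vector product
all_at_once = climate_data @ weights

# Both approaches should agree up to floating-point rounding
print(np.allclose(row_by_row, all_at_once))
```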

## Working with CSV Files

In [23]:
import urllib.request

urllib.request.urlretrieve(
    'https://hub.jovian.ml/wp-content/uploads/2020/08/climate.csv', 
    'climate.txt')

('climate.txt', <http.client.HTTPMessage at 0x1d2bfb37948>)

In [24]:
climate_data = np.genfromtxt('climate.txt', delimiter=',', skip_header=1)

In [25]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [26]:
climate_data.shape

(10000, 3)
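
To check how `delimiter` and `skip_header` behave without downloading anything, `np.genfromtxt` can also read from an in-memory text buffer. This is a small sketch: the three data rows are copied from the output above, and the header line is an assumption about what the file's first row looks like.

```python
import io
import numpy as np

# Assumed header line followed by three sample rows
sample_csv = io.StringIO(
    "temperature,rainfall,humidity\n"
    "25,76,99\n"
    "39,65,70\n"
    "59,45,77\n"
)

# skip_header=1 drops the column names; values are parsed as floats
sample = np.genfromtxt(sample_csv, delimiter=',', skip_header=1)

print(sample.shape)
print(sample.dtype)
```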

### We can now use the matrix multiplication operator @ to predict the yield of apples for the entire dataset using a given set of weights.

In [27]:
weights = np.array([0.3, 0.2, 0.5])

In [28]:
yields = climate_data @ weights
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [29]:
yields.shape

(10000,)

#### We can now add the yields back to climate_data as a fourth column using the np.concatenate function.

In [30]:
climate_results = np.concatenate((climate_data, yields.reshape(10000, 1)), axis=1)

In [31]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])
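
The `reshape` call is needed because `np.concatenate` along `axis=1` expects both arrays to be 2-D with the same number of rows. As a hedged sketch on a two-row sample, `reshape(-1, 1)` does the same job without hard-coding the row count, since `-1` asks Numpy to infer that dimension.

```python
import numpy as np

data = np.array([[25., 76., 99.],
                 [39., 65., 70.]])
yields = data @ np.array([0.3, 0.2, 0.5])   # 1-D, shape (2,)

# -1 lets Numpy infer the number of rows; the result is a column of shape (2, 1)
column = yields.reshape(-1, 1)

# Append the column to the right of the existing columns
combined = np.concatenate((data, column), axis=1)

print(combined.shape)
```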

#### Let's write the final results from our computation above back to a file using the np.savetxt function.

In [32]:
np.savetxt('climate_results.txt', 
           climate_results, 
           fmt='%.2f', 
           delimiter=',',    # match the comma-separated header (the default delimiter is a space)
           header='temperature,rainfall,humidity,yield_apples', 
           comments='')
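
One way to confirm the file was written as intended is to read it back and compare. The sketch below does this round trip on a two-row sample in a temporary directory; the file name `results_check.txt` is hypothetical, and `delimiter=','` is passed so that the data rows use the same separator as the header.

```python
import os
import tempfile
import numpy as np

results = np.array([[25., 76., 99., 72.2],
                    [39., 65., 70., 59.7]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'results_check.txt')  # hypothetical file name

    # Write comma-separated values with a header row and no comment prefix
    np.savetxt(path, results, fmt='%.2f', delimiter=',',
               header='temperature,rainfall,humidity,yield_apples',
               comments='')

    # Read the header line, then load the numeric rows back
    with open(path) as f:
        first_line = f.readline().strip()
    reloaded = np.genfromtxt(path, delimiter=',', skip_header=1)

print(first_line)
print(np.allclose(reloaded, results))
```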

## Arithmetic operations and broadcasting

In [33]:
arr2 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])

In [34]:
arr3 = np.array([[11, 12, 13, 14], 
                 [15, 16, 17, 18], 
                 [19, 11, 12, 13]])

In [35]:
# Adding a scalar
arr2 + 3

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12,  4,  5,  6]])

In [36]:
# Element-wise subtraction
arr3 - arr2

array([[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]])

In [37]:
# Division by scalar
arr2 / 2

array([[0.5, 1. , 1.5, 2. ],
       [2.5, 3. , 3.5, 4. ],
       [4.5, 0.5, 1. , 1.5]])

In [38]:
# Element-wise multiplication
arr2 * arr3

array([[ 11,  24,  39,  56],
       [ 75,  96, 119, 144],
       [171,  11,  24,  39]])

In [39]:
# Modulus with scalar
arr2 % 4

array([[1, 2, 3, 0],
       [1, 2, 3, 0],
       [1, 1, 2, 3]], dtype=int32)

Numpy arrays also support *broadcasting*, which allows arithmetic operations between two arrays that have a different number of dimensions but compatible shapes. Let's look at an example to see how it works.

In [40]:
arr2 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])
arr2.shape

(3, 4)

In [41]:
arr4 = np.array([4, 5, 6, 7])
arr4.shape

(4,)

In [42]:
arr2 + arr4

array([[ 5,  7,  9, 11],
       [ 9, 11, 13, 15],
       [13,  6,  8, 10]])

When the expression `arr2 + arr4` is evaluated, `arr4` (which has the shape `(4,)`) is replicated 3 times to match the shape `(3, 4)` of `arr2`. This is pretty useful, because numpy performs the replication without actually creating 3 copies of the lower-dimensional array.

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png" width="360">

Broadcasting only works if one of the arrays can be replicated to exactly match the shape of the other array.
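
As an illustration of that rule, the sketch below tries an incompatible pair and a compatible one. The arrays `arr5` and `col` are hypothetical and not part of the examples above. Numpy raises a `ValueError` when the shapes cannot be matched.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])        # shape (3, 4)

arr5 = np.array([7, 8])               # shape (2,): cannot be stretched to 4 columns

try:
    arr2 + arr5
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("Broadcasting failed:", err)

# A column of shape (3, 1) is compatible: it is replicated across the 4 columns
col = np.array([[10], [20], [30]])
print((arr2 + col).shape)
```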

In [43]:
arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

In [44]:
arr1 == arr2

array([[False,  True,  True],
       [False, False,  True]])

In [45]:
arr1 != arr2

array([[ True, False, False],
       [ True,  True, False]])

In [46]:
arr1 >= arr2

array([[False,  True,  True],
       [ True,  True,  True]])

In [47]:
arr1 < arr2

array([[ True, False, False],
       [False, False, False]])

A common use case for this is to count the number of equal elements in two arrays using the sum method. Remember that True evaluates to 1 and False evaluates to 0 when booleans are used in arithmetic operations.

In [48]:
(arr1 == arr2).sum()

3
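
Because booleans behave as 1 and 0, the same idea extends to the fraction of matching elements. This is a small sketch using the `mean` method on the comparison result; the name `matches` is only for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

matches = (arr1 == arr2)

# Count of equal elements: each True contributes 1 to the sum
equal_count = matches.sum()

# Fraction of equal elements: the mean of the 1s and 0s
equal_fraction = matches.mean()

print(equal_count, equal_fraction)
```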

## Array indexing and slicing

Numpy extends Python's list indexing notation using `[]` to multiple dimensions in a fairly intuitive fashion. You can provide a comma separated list of indices or ranges to select a specific element or a subarray (also called slice) from a numpy array.

In [49]:
arr3 = np.array([
    [[11, 12, 13, 14], 
     [13, 14, 15, 19]], 
    
    [[15, 16, 17, 21], 
     [63, 92, 36, 18]], 
    
    [[98, 32, 81, 23],      
     [17, 18, 19.5, 43]]])

arr3.shape

(3, 2, 4)

In [50]:
# Single element
arr3[1, 1, 2]

36.0

In [51]:
# Subarray using ranges
arr3[1:, 0:1, :2]

array([[[15., 16.]],

       [[98., 32.]]])

In [52]:
# Mixing indices and ranges
arr3[1:, 1, 3]

array([18., 43.])

In [53]:
# Mixing indices and ranges
arr3[1:, 1, :3]

array([[63. , 92. , 36. ],
       [17. , 18. , 19.5]])

In [54]:
# Using fewer indices
arr3[1]

array([[15., 16., 17., 21.],
       [63., 92., 36., 18.]])

In [55]:
# Using fewer indices
arr3[:2, 1]

array([[13., 14., 15., 19.],
       [63., 92., 36., 18.]])
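
A useful way to read these results is through their shapes: each range keeps its dimension, while each single index removes one. The sketch below checks this for a few of the expressions above.

```python
import numpy as np

arr3 = np.array([
    [[11, 12, 13, 14],
     [13, 14, 15, 19]],

    [[15, 16, 17, 21],
     [63, 92, 36, 18]],

    [[98, 32, 81, 23],
     [17, 18, 19.5, 43]]])

# Three ranges: all three dimensions are kept
print(arr3[1:, 0:1, :2].shape)   # (2, 1, 2)

# One index in the middle: that dimension is dropped
print(arr3[1:, 1, :3].shape)     # (2, 3)

# A single index: the remaining two dimensions are returned whole
print(arr3[1].shape)             # (2, 4)

# Three indices: a single element, which has an empty shape
print(arr3[1, 1, 2].shape)       # ()
```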

## Other ways of creating Numpy arrays

Numpy also provides some handy functions to create arrays of a desired shape with fixed or random values. Check out the [official documentation](https://numpy.org/doc/stable/reference/routines.array-creation.html) or use the `help` function to learn more about the following functions.

In [56]:
# All zeros
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [57]:
# All ones
np.ones([2, 2, 3])

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [58]:
# Identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [59]:
# Random vector
np.random.rand(5)

array([0.79057929, 0.65004457, 0.29502067, 0.91458165, 0.49064099])

In [60]:
# Random matrix
np.random.randn(2, 3)

array([[-1.25164135, -1.08177586, -2.0785208 ],
       [ 1.34529207, -1.40426674,  0.81000066]])

In [61]:
# Fixed value
np.full([2, 3], 42)

array([[42, 42, 42],
       [42, 42, 42]])

In [62]:
# Range with start, end and step
np.arange(10, 90, 3)

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
       61, 64, 67, 70, 73, 76, 79, 82, 85, 88])

In [63]:
# Equally spaced numbers in a range
np.linspace(3, 27, 9)

array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27.])