### Numpy, Pandas and Matplotlib

We start with a simple example: estimating the yield of apples per hectare in a region from three climate variables: temperature, rainfall, and humidity. The yield is modelled as a weighted sum of these variables, and the weights are stored in a separate list.

In [1]:
weights = [0.3, 0.2, 0.5]
kanto = [73,67,43]
johto = [91,88,64]
hoen = [87,134,58]
sinnoh = [102,43,37]
unova = [69,96,70]

def crop_yield(region, weights):
    # Weighted sum: multiply each climate value by its weight and add up the products
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

In [2]:
crop_yield(kanto,weights)

56.8
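As a hand check of this output, assuming (as the order of the lists suggests) that the weights 0.3, 0.2 and 0.5 belong to temperature, rainfall and humidity respectively, the weighted sum for kanto works out as:

$$
\text{yield}_{\text{kanto}} = 0.3 \cdot 73 + 0.2 \cdot 67 + 0.5 \cdot 43 = 21.9 + 13.4 + 21.5 = 56.8
$$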

In [3]:
crop_yield(sinnoh,weights)

57.699999999999996

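Note that zip pairs up the two lists element by element: it yields tuples whose first element comes from the first list and whose second element comes from the second list. The crop_yield function multiplies the two values in each pair and adds up the products, so in effect it computes the dot product of the two lists.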
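To make the pairing concrete, here is a small sketch that materialises the zip output for the kanto values and then sums the pairwise products; it re-declares the two lists so it runs on its own.

```python
kanto = [73, 67, 43]
weights = [0.3, 0.2, 0.5]

# zip pairs the i-th element of each list into a tuple
pairs = list(zip(kanto, weights))
print(pairs)  # [(73, 0.3), (67, 0.2), (43, 0.5)]

# Summing the product of each pair gives the dot product
dot = sum(x * w for x, w in pairs)
print(round(dot, 2))  # 56.8
```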

In [4]:
import numpy as np
kanto = np.array(kanto)
weights = np.array(weights)

In [5]:
type(kanto)

numpy.ndarray

In [6]:
type(weights)

numpy.ndarray

Next, we can use the np.dot function to compute the dot product of the two NumPy arrays we created.

In [7]:
np.dot(kanto,weights)

56.8

In [8]:
(kanto*weights).sum()

56.8
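The last cell works because * on NumPy arrays multiplies element by element, and .sum() then adds up those products. A minimal sketch showing the intermediate element-wise array:

```python
import numpy as np

kanto = np.array([73, 67, 43])
weights = np.array([0.3, 0.2, 0.5])

# * multiplies element by element; it does not compute a dot product by itself
products = kanto * weights
print(products)

# Summing the element-wise products gives the same value as np.dot
print(np.isclose(products.sum(), np.dot(kanto, weights)))  # True
```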

### Comparing the performance of NumPy arrays and Python lists

In [9]:
arr1 = list(range(1000000))
arr2 = list(range(1000000,2000000))
arr1_np = np.array(arr1)
arr2_np = np.array(arr2)

In [10]:
%%time
result = 0
for x1,x2 in zip(arr1,arr2):
    result += x1*x2
result

Wall time: 268 ms


833332333333500000

In [11]:
%%time
np.dot(arr1_np,arr2_np)

Wall time: 2 ms


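-1942957984

Note - this NumPy result is wrong: the true dot product is 833332333333500000, as the list version shows. The arrays were created with NumPy's default integer type, which was 32-bit (int32) on Windows before NumPy 2.0; the int32 dtype reported for climate_data further down suggests this notebook ran with that default. The true result does not fit in 32 bits, so the integer arithmetic overflowed silently. Passing dtype=np.int64 to np.array avoids the overflow.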
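A minimal sketch of the same comparison with an explicit 64-bit integer dtype, so NumPy returns the exact result on any platform. It uses time.perf_counter instead of the %%time magic so it runs as a plain script; the timings it prints will vary by machine.

```python
import time

import numpy as np

arr1 = list(range(1000000))
arr2 = list(range(1000000, 2000000))

# An explicit 64-bit dtype avoids silent overflow where the default int is 32-bit
arr1_np = np.array(arr1, dtype=np.int64)
arr2_np = np.array(arr2, dtype=np.int64)

start = time.perf_counter()
list_result = sum(x1 * x2 for x1, x2 in zip(arr1, arr2))
list_time = time.perf_counter() - start

start = time.perf_counter()
np_result = int(np.dot(arr1_np, arr2_np))
np_time = time.perf_counter() - start

print(list_result == np_result)  # True
print(f"list: {list_time * 1000:.1f} ms, numpy: {np_time * 1000:.1f} ms")
```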

### Multi-Dimensional Numpy Arrays

In [13]:
climate_data = np.array(
                        [[73,67,43],
                        [91,88,64],
                        [87,134,58],
                        [102,43,37],
                        [69,96,70]]
                        )

In [14]:
climate_data

array([[ 73,  67,  43],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

In [15]:
climate_data.shape

(5, 3)

In [16]:
climate_data.dtype

dtype('int32')

In [17]:
np.matmul(climate_data,weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])

In [18]:
climate_data @ weights

array([56.8, 76.9, 81.9, 57.7, 74.9])

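Note - the @ operator performs matrix multiplication in NumPy and is equivalent to np.matmul. Multiplying the (5, 3) climate_data matrix by the length-3 weights vector gives one yield per region, i.e. a vector of length 5.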
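One way to see what the matrix-vector product does: each entry of the result is the dot product of one row (one region) with the weights. A small sketch that checks this row by row:

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

# Dot product of each row with the weights, computed one region at a time
by_rows = np.array([np.dot(row, weights) for row in climate_data])

print(np.allclose(by_rows, climate_data @ weights))  # True
print((climate_data @ weights).shape)                # (5,)
```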

In [19]:
import urllib.request
urllib.request.urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/08/climate.csv','climate.txt')

('climate.txt', <http.client.HTTPMessage at 0x22676928970>)

In [21]:
climate_data = np.genfromtxt('climate.txt',delimiter=',',skip_header=1)

In [22]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [23]:
climate_data.shape

(10000, 3)
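The downloaded file has a header row, which skip_header=1 skips, and np.genfromtxt parses values as floats by default, which is why the array prints 25. rather than 25. A sketch on an in-memory CSV, assuming the header names used later in this notebook and the first three data rows shown above, so it runs without network access:

```python
import io

import numpy as np

# Hypothetical stand-in for climate.txt: a header line plus three data rows
csv_text = "temperature,rainfall,humidity\n25,76,99\n39,65,70\n59,45,77\n"

# skip_header=1 drops the column names; values are parsed as float64 by default
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(data.shape)  # (3, 3)
print(data.dtype)  # float64
```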

In [24]:
yields = climate_data @ weights

In [25]:
yields.shape

(10000,)

In [26]:
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [28]:
yields.reshape(10000,1)

array([[72.2],
       [59.7],
       [65.2],
       ...,
       [71.1],
       [80.7],
       [73.4]])

In [31]:
climate_results=np.concatenate((climate_data,yields.reshape(10000,1)),axis=1)

In [32]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])

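Note - we use axis=1 because we want to add the yields as a new column, and columns run along axis 1 of a 2-D array. To append rows instead (stacking arrays vertically), we would use axis=0. The yields are first reshaped from (10000,) to (10000, 1) because np.concatenate requires both arrays to have the same number of dimensions.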
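A small sketch contrasting the two axes, using the first rows of the data shown above; reshape(-1, 1) is used here as an equivalent of reshape(2, 1) that lets NumPy infer the row count.

```python
import numpy as np

data = np.array([[25., 76., 99.],
                 [39., 65., 70.]])
yields = np.array([72.2, 59.7])

# axis=1: add the yields as a new column -> shape (2, 4)
with_col = np.concatenate((data, yields.reshape(-1, 1)), axis=1)

# axis=0: append another row underneath -> shape (3, 3)
with_row = np.concatenate((data, np.array([[59., 45., 77.]])), axis=0)

print(with_col.shape, with_row.shape)  # (2, 4) (3, 3)
```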

In [33]:
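# delimiter=',' makes the data rows match the comma-separated header; comments='' stops NumPy prefixing the header with '# '
np.savetxt('climate_results.txt',climate_results,fmt='%.2f',delimiter=',',header='temperature,rainfall,humidity,yield_apples',comments='')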
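To see what the saved file looks like, here is a sketch that writes the first two rows of climate_results to an in-memory buffer (io.StringIO) instead of a file and reads them back. Without delimiter=',', np.savetxt separates values with spaces, which would not match the comma-separated header.

```python
import io

import numpy as np

results = np.array([[25., 76., 99., 72.2],
                    [39., 65., 70., 59.7]])

buf = io.StringIO()
np.savetxt(buf, results, fmt='%.2f', delimiter=',',
           header='temperature,rainfall,humidity,yield_apples', comments='')

text = buf.getvalue()
print(text)

# Reading it back: skip the header line, split on commas
loaded = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)
print(loaded.shape)  # (2, 4)
```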

In [34]:
arr1 = np.array([[1,2,3],[3,4,5]])
arr2 = np.array([[2,2,3],[1,2,5]])

In [35]:
arr1 == arr2

array([[False,  True,  True],
       [False, False,  True]])

In [36]:
arr1 > arr2

array([[False, False, False],
       [ True,  True, False]])

In [37]:
(arr1 == arr2).sum()

3
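Comparison operators return boolean arrays, and summing one counts the True entries because True is treated as 1. The same boolean array can also be used as a mask to pick out elements. A short sketch:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

mask = arr1 == arr2
print(mask.sum())                   # 3: True is counted as 1
print(arr1[mask])                   # [2 3 5]: elements where the arrays agree
print(np.array_equal(arr1, arr2))   # False: not every element matches
```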

### Array Indexing and Slicing

In [39]:
arr3 = np.array([
    
    [[11,12,13,14],
    [13,14,15,19]],
    
    [[15,16,17,21],
    [63,92,36,18]],
    
    [[98,32,81,23],
    [17,18,19.5,43]]
    
])

In [40]:
arr3.shape

(3, 2, 4)

In [41]:
arr3[1,1,2]

36.0

In [42]:
arr3[1:,0:1,:2]

array([[[15., 16.]],

       [[98., 32.]]])

In [43]:
arr3[1:,0:1,:2].shape

(2, 1, 2)

In [44]:
arr3[1:,0,:2].shape

(2, 2)

In [45]:
arr3[1:,0,:2]

array([[15., 16.],
       [98., 32.]])
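The last few cells show that a slice such as 0:1 keeps its axis (with length 1), while an integer index such as 0 removes that axis. A sketch using a stand-in array of the same (3, 2, 4) shape built with np.arange:

```python
import numpy as np

# Stand-in 3-D array with the same shape as arr3: 3 blocks, 2 rows, 4 columns
arr = np.arange(24).reshape(3, 2, 4)

kept = arr[1:, 0:1, :2]    # slice 0:1 keeps the row axis (length 1)
dropped = arr[1:, 0, :2]   # integer 0 selects one row and removes that axis

print(kept.shape)     # (2, 1, 2)
print(dropped.shape)  # (2, 2)

# Same numbers either way: only the number of dimensions differs
print(np.array_equal(kept.reshape(2, 2), dropped))  # True
```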

In [46]:
# Uniform distribution: 5 samples from [0, 1)
np.random.rand(5)

array([0.15604808, 0.3362407 , 0.79717831, 0.88391696, 0.48308924])

In [48]:
# Standard normal distribution: 5 samples with mean 0 and standard deviation 1
np.random.randn(5)

array([-0.02128306, -0.4156686 , -0.38283393, -0.81530475, -0.33545442])
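These draws change every time the cells run. A sketch of making them reproducible with NumPy's seeded Generator API (np.random.default_rng); the seed 42 is an arbitrary choice, and the drawn values will differ from the outputs above.

```python
import numpy as np

# A seeded Generator makes the draws reproducible across runs (seed 42 is arbitrary)
rng = np.random.default_rng(42)
uniform = rng.random(5)           # uniform on [0, 1), like np.random.rand(5)
normal = rng.standard_normal(5)   # standard normal, like np.random.randn(5)

# A second generator with the same seed reproduces the same sequence of draws
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(5)))           # True
print(np.array_equal(normal, rng_again.standard_normal(5)))   # True
```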

In [49]:
np.arange(10,90,3)

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
       61, 64, 67, 70, 73, 76, 79, 82, 85, 88])

In [50]:
np.linspace(3,27,9)

array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27.])
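The two functions differ in what they fix: np.arange takes a step and excludes the stop value, while np.linspace takes a number of points and includes the stop value by default, giving a step of (stop - start) / (num - 1). A sketch checking both on the values above:

```python
import numpy as np

a = np.arange(10, 90, 3)
print(a[-1], len(a))  # 88 27: the stop value 90 is excluded

b = np.linspace(3, 27, 9)
print(b[1] - b[0])    # 3.0, i.e. (27 - 3) / (9 - 1)

# endpoint=False excludes the stop value, so the spacing changes
c = np.linspace(3, 27, 9, endpoint=False)
print(c)
```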