# Introduction to Numpy

## Introduction 



Many of the libraries you will use to perform data analysis in Python, as well as many of the mathematical functions you'll use, will involve working with Numpy. Numpy (short for Numerical Python) is used for numeric computing and includes support for multi-dimensional arrays and matrices along with a variety of mathematical functions to apply to them. In this lesson, we will learn about Numpy's primary data structures and how to apply some basic math functions to them.

## Importing Numpy



In order to use Numpy, you must first import it. It is conventional to **alias** it to *np* with the *as* keyword so that you don't have to spell out "numpy" every time you call one of its functions.

```python
# Import numpy under its conventional alias
import numpy as np
```

Once the library has been imported, it is ready to use.

## Numpy Arrays



The basic data structure in Numpy is the **array**, which can be used to represent tabular data (and, more generally, data with any number of dimensions).

You can think of an array as a list of lists in which every element has the same type (typically numeric, since the reason you use Numpy is to do numeric computing). A matrix is just a two-dimensional array.


```python
# 2D array of ones with 10 rows and 4 columns
np.ones((10,4))
```

```
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
```
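The trailing dots in the output (1. rather than 1) show that these are floating-point values. The short sketch below, which is not part of the original lesson, checks the shared element type through the dtype attribute and uses the dtype parameter of np.ones to request integers instead. The variable names are illustrative.

```python
import numpy as np

# np.ones (like np.zeros) fills the array with 64-bit floats by default
ones = np.ones((10, 4))
print(ones.dtype)  # float64

# The dtype parameter requests a different element type
int_ones = np.ones((10, 4), dtype=int)
print(int_ones.dtype)  # a platform-dependent integer type, e.g. int64

# Every element shares one type, so mixing ints and floats upcasts the ints
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```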

```python
# 2D array of zeros with 10 rows and 4 columns
np.zeros((10,4))
```

```
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
```

```python
# 2D array of random floats in [0, 1)
a = np.random.random((10,4))
print(a)
```

```
[[0.01617942 0.38996265 0.97977879 0.58779777]
 [0.11216654 0.45869984 0.62367699 0.73824416]
 [0.58843779 0.82022847 0.26977629 0.12227111]
 [0.10413465 0.7642684  0.96471988 0.20501096]
 [0.43361536 0.78573036 0.33606134 0.10963201]
 [0.86130959 0.70943598 0.58564387 0.83752222]
 [0.7338895  0.71569929 0.59480651 0.09247232]
 [0.34927051 0.69189658 0.63272394 0.01007629]
 [0.09834709 0.62673641 0.22014677 0.23410377]
 [0.11161805 0.19474381 0.9619925  0.58782724]]
```
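np.random.random draws floats uniformly from the half-open interval [0.0, 1.0), so the numbers you see will differ from the ones above on every run. If you need repeatable values, one option (a sketch, assuming NumPy 1.17 or later, where np.random.default_rng is available) is to create a seeded random number generator. The seed 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

first = rng1.random((10, 4))   # same shape as matrix a above
second = rng2.random((10, 4))

print(np.array_equal(first, second))  # True
# Values always fall in [0.0, 1.0)
print(first.min() >= 0.0, first.max() < 1.0)  # True True
```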


The **size** of an array is the total number of elements it contains, counted across all of its dimensions.

The **shape** of an array is the size of the array along each dimension (e.g. number of rows and number of columns for a two-dimensional array).

Let's now check the shape and size of the array a using its shape and size attributes.

```python
# Shape of a
a.shape
```

```
(10, 4)
```

```python
# Size of a
a.size
```

```
40
```

As you can see, the array has a shape of 10 x 4 (just as we specified) and the total number of elements in the array is 40.
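Since size counts every element, it always equals the product of the numbers in shape. The sketch below checks this and also prints ndim, the number of dimensions (an attribute not used elsewhere in this lesson).

```python
import numpy as np

a = np.random.random((10, 4))

rows, cols = a.shape      # unpack the shape tuple
print(rows, cols)         # 10 4
print(a.ndim)             # 2, the number of dimensions
print(a.size == rows * cols)       # True
print(a.size == np.prod(a.shape))  # True, works for any number of dimensions
```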


Now that we have seen a basic two-dimensional array (a matrix), let's see how Numpy handles arrays with more than two dimensions. Here we will make a 3D array.

```python
# Random 3D array: five groups of 2 x 3 matrices
b = np.random.random((5,2,3))
print(b)
```

```
[[[0.04270963 0.74065943 0.94581437]
  [0.08520703 0.68688327 0.91145746]]

 [[0.07179344 0.51410679 0.27434393]
  [0.03876315 0.27981168 0.25361252]]

 [[0.9800798  0.33290112 0.37060055]
  [0.33858149 0.807507   0.33474051]]

 [[0.3606786  0.65971935 0.25664242]
  [0.93515678 0.0441546  0.83539625]]

 [[0.57225676 0.60493996 0.68807439]
  [0.89602732 0.54404326 0.9094691 ]]]
```


This created an array of **five groups**, each of which is a 2 x 3 matrix.

Let's see what happens if we pass four dimensions.

```python
# Random 4D array: two groups of three 4 x 5 matrices
c = np.random.random((2,3,4,5))
print(c)
```

```
[[[[8.77127577e-01 9.06006213e-01 5.94404034e-03 5.05126487e-01
    6.63975884e-01]
   [5.85401655e-01 1.06367358e-01 6.84349167e-01 8.07114352e-01
    2.23284107e-02]
   [4.83278422e-01 3.39832498e-01 2.13732025e-02 1.47060490e-01
    4.54002336e-01]
   [6.27880114e-01 6.83334062e-01 4.22432211e-01 9.36093516e-01
    3.58235892e-01]]

  [[5.14274485e-01 5.08926226e-01 6.23301275e-01 8.05512552e-01
    4.03720371e-01]
   [8.20401518e-01 5.64292833e-01 2.77955592e-01 2.83263380e-01
    6.33626292e-01]
   [6.80555770e-01 6.27382717e-01 3.73720571e-01 3.08718499e-01
    1.63330371e-01]
   [8.02506878e-01 3.92127253e-01 3.72467700e-02 7.03875424e-01
    1.86666014e-01]]

  [[1.11872663e-01 6.42502557e-01 8.86551581e-01 2.38958542e-01
    6.04521164e-01]
   [9.72726865e-01 7.20019589e-01 8.52915317e-01 6.73690689e-01
    8.09018482e-01]
   [7.07504108e-01 9.75812034e-01 7.63639547e-01 4.84049666e-01
    3.94476451e-01]
   [3.60742955e-01 3.83568825e-01 6.54792108e-01 6.13534494e-01
    ...
```

This time, we got **two groups** of *three* 4 x 5 matrices.
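One way to read a shape tuple is from the outside in: the first number counts the outermost groups, and the last two give the rows and columns of each matrix. Indexing (covered in the next section) peels off one level at a time, as this sketch shows. It recreates random arrays with the same shapes as b and c above rather than reusing them.

```python
import numpy as np

b = np.random.random((5, 2, 3))
c = np.random.random((2, 3, 4, 5))

print(b.shape, b[0].shape)   # (5, 2, 3) (2, 3): one group is a 2 x 3 matrix
print(c.shape, c[0].shape)   # (2, 3, 4, 5) (3, 4, 5): one group holds three 4 x 5 matrices
print(c[0, 0].shape)         # (4, 5): a single matrix
print(b.size, c.size)        # 30 120
```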

## Extracting Data from Arrays


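Extracting elements from arrays works much like indexing other Python sequences such as lists: we reference the (zero-based) indexes of the values we want to extract. Unlike a nested list, though, a Numpy array also accepts one index per dimension separated by commas, and a colon (:) selects everything along a dimension.

Below are some examples of how to reference specific rows, columns, and values in a two-dimensional array.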

```python
# A fresh random 10 x 4 matrix for the indexing examples
a = np.random.random((10,4))
print(a)
```

```
[[0.74024591 0.87532375 0.9308208  0.71659891]
 [0.57529683 0.51855885 0.10431804 0.82412266]
 [0.58095258 0.90819798 0.33994561 0.01654893]
 [0.91736055 0.28269243 0.08728579 0.66642938]
 [0.43139283 0.93736825 0.41923749 0.02768283]
 [0.72219965 0.08664036 0.23643587 0.19348195]
 [0.99653653 0.02671875 0.34348874 0.30674306]
 [0.75683787 0.9982862  0.32093589 0.2966172 ]
 [0.35968893 0.60212139 0.41595256 0.28390583]
 [0.39545656 0.82457162 0.93101163 0.68809695]]
```


```python
# First row of matrix a
print(a[0])
```

```
[0.74024591 0.87532375 0.9308208  0.71659891]
```



```python
# First column of matrix a
print(a[:,0])
```

```
[0.74024591 0.57529683 0.58095258 0.91736055 0.43139283 0.72219965
 0.99653653 0.75683787 0.35968893 0.39545656]
```


```python
# Element at row index 4 (the fifth row) and column index 2 (the third column) of a
print(a[4,2])
```

```
0.41923748956344065
```
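Because the values in a are random, it can be hard to see which element an index picks out. The sketch below swaps in a predictable 10 x 4 matrix built with np.arange and reshape (an illustrative substitute, not part of the original lesson), where row i holds the integers 4i to 4i + 3. It also shows negative indexes and slices, which behave as they do for Python lists.

```python
import numpy as np

m = np.arange(40).reshape(10, 4)   # row i is [4i, 4i+1, 4i+2, 4i+3]

print(m[0])             # first row: [0 1 2 3]
print(m[:, 0])          # first column: 0, 4, 8, ..., 36
print(m[4, 2])          # fifth row, third column: 18
print(m[4][2])          # chained indexing gives the same value: 18
print(m[-1])            # last row: [36 37 38 39]
print(m[1:3])           # rows 1 and 2 (the stop index 3 is excluded)
print(m[:, 1:3].shape)  # every row, columns 1 and 2: (10, 2)
```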


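What about arrays that have more than two dimensions? You pass one index per dimension, separated by commas. If you supply fewer indexes than the array has dimensions, the remaining dimensions are returned in full, so you get back a sub-array rather than a single value.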

```python
# A fresh random 4D array: two groups of three 4 x 5 matrices
c = np.random.random((2,3,4,5))
print(c)
```

```
[[[[0.08678098 0.5804784  0.08974516 0.12495143 0.81909481]
   [0.43575185 0.3071634  0.7096875  0.04146953 0.97600159]
   [0.87359699 0.89882962 0.53804867 0.03504244 0.88728242]
   [0.12550027 0.03673878 0.28677385 0.20882083 0.3661817 ]]

  [[0.6062902  0.79022307 0.72212698 0.52742105 0.0625092 ]
   [0.61547385 0.87708731 0.60127654 0.3505286  0.48938043]
   [0.56208447 0.26379103 0.47417847 0.6129672  0.27090565]
   [0.23468663 0.08183805 0.39345705 0.09835865 0.68874647]]

  [[0.97667667 0.50380886 0.03653966 0.12220725 0.7842588 ]
   [0.90049091 0.54064192 0.61279105 0.08748938 0.46362323]
   [0.50082462 0.01763192 0.57878979 0.47492718 0.90249035]
   [0.51831825 0.38156494 0.68848708 0.52933942 0.27989388]]]


 [[[0.31559141 0.02928304 0.15285002 0.2265143  0.76414572]
   [0.38699222 0.57742915 0.12761925 0.74174251 0.4236123 ]
   [0.36289553 0.76602182 0.65328314 0.96739914 0.80004701]
   [0.64566264 0.53336449 0.43013794 0.46666215 0.85683827]]

  [[0.13104401 0.19711662 ...
```

```python
# First group of array c
print(c[0])
```

```
[[[0.08678098 0.5804784  0.08974516 0.12495143 0.81909481]
  [0.43575185 0.3071634  0.7096875  0.04146953 0.97600159]
  [0.87359699 0.89882962 0.53804867 0.03504244 0.88728242]
  [0.12550027 0.03673878 0.28677385 0.20882083 0.3661817 ]]

 [[0.6062902  0.79022307 0.72212698 0.52742105 0.0625092 ]
  [0.61547385 0.87708731 0.60127654 0.3505286  0.48938043]
  [0.56208447 0.26379103 0.47417847 0.6129672  0.27090565]
  [0.23468663 0.08183805 0.39345705 0.09835865 0.68874647]]

 [[0.97667667 0.50380886 0.03653966 0.12220725 0.7842588 ]
  [0.90049091 0.54064192 0.61279105 0.08748938 0.46362323]
  [0.50082462 0.01763192 0.57878979 0.47492718 0.90249035]
  [0.51831825 0.38156494 0.68848708 0.52933942 0.27989388]]]
```


```python
# Second subgroup of the first group (c[0,1] is equivalent)
print(c[0][1])
```

```
[[0.6062902  0.79022307 0.72212698 0.52742105 0.0625092 ]
 [0.61547385 0.87708731 0.60127654 0.3505286  0.48938043]
 [0.56208447 0.26379103 0.47417847 0.6129672  0.27090565]
 [0.23468663 0.08183805 0.39345705 0.09835865 0.68874647]]
```


```python
# Third row of the second subgroup of the first group
print(c[0,1,2])
print(c[0][1][2])  # chained indexing gives the same result
```

```
[0.56208447 0.26379103 0.47417847 0.6129672  0.27090565]
[0.56208447 0.26379103 0.47417847 0.6129672  0.27090565]
```


```python
# Fourth column of the second subgroup of the first group
print(c[0,1,:,3])
print(c[0][1][:,3])
```

```
[0.52742105 0.3505286  0.6129672  0.09835865]
[0.52742105 0.3505286  0.6129672  0.09835865]
```


```python
# Value in the third row and fourth column of the second subgroup
print(c[0,1,2,3])
print(c[0][1][2][3])
```

```
0.6129671970045314
0.6129671970045314
```
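A quick way to see what an index returns is to check the shape of the result: each integer index removes one dimension, while a colon keeps it. This sketch uses a predictable 2 x 3 x 4 x 5 array built with np.arange (an illustrative substitute for the random array c), so the selected values can be checked by hand.

```python
import numpy as np

# 120 consecutive integers arranged as 2 groups of three 4 x 5 matrices
c = np.arange(120).reshape(2, 3, 4, 5)

print(c[0].shape)           # (3, 4, 5): the first group
print(c[0, 1].shape)        # (4, 5): second matrix of the first group
print(c[0, 1, 2].shape)     # (5,): third row of that matrix
print(c[0, 1, :, 3])        # fourth column of that matrix: [23 28 33 38]
# Flat position 0*60 + 1*20 + 2*5 + 3 = 33, so this prints 33
print(c[0, 1, 2, 3])
print(c[0][1][2][3])        # chained indexing reaches the same element
```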


## Converting other Data Structures to Arrays 



If your data is stored in another data structure and you would like to convert it to an array so that you can take advantage of Numpy's mathematical functions, you can do so with the np.array() function, as follows.

```python
# A list of tuples
lst_lst = [(1,2,3),(4,5,6),(7,8,9), (10,12,14)]
```

```python
# Convert to an array
d = np.array(lst_lst)
print(d)
```

```
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 12 14]]
```


This works the same way whether you have a list of lists, a list of tuples, a tuple of lists, or a tuple of tuples, as long as every inner sequence has the same length.
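The sketch below builds the same 4 x 3 data in each of those four forms and confirms that np.array produces identical arrays. Because every value is an integer, the resulting array has an integer dtype. The variable names are illustrative.

```python
import numpy as np

list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 12, 14]]
list_of_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 12, 14)]
tuple_of_lists = tuple(list_of_lists)
tuple_of_tuples = tuple(list_of_tuples)

arrays = [np.array(data) for data in
          (list_of_lists, list_of_tuples, tuple_of_lists, tuple_of_tuples)]

# All four conversions give the same 4 x 3 integer array
print(all(np.array_equal(arrays[0], arr) for arr in arrays))  # True
print(arrays[0].shape)  # (4, 3)
```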

## Numpy Math Functions



Now that we know how to create and navigate arrays, let's take a look at how to perform mathematical calculations on them.

One of the most common (and useful) functions is np.sum, which lets you obtain the sum of any elements you select from an array.

```python
# Sum of all elements in matrix a
np.sum(a)
```

```
20.68610794578207
```

Instead of summing every element of the matrix, we can also sum the values in each column separately. To do that, we pass axis=0, which collapses the rows and leaves one total per column.

```python
# Sum of each column in matrix a
np.sum(a, axis = 0)
```

```
array([6.47596825, 6.06047958, 4.12943242, 4.02022769])
```

To sum the values in each row instead, we use axis=1, which leaves one total per row.

```python
# Sum of each row in matrix a
np.sum(a, axis = 1)
```

```
array([3.26298937, 2.02229637, 1.84564511, 1.95376814, 1.8156814 ,
       1.23875782, 1.67348708, 2.37267716, 1.66166871, 2.83913678])
```
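With random values it is hard to verify these totals by eye, so here is a sketch on a small 2 x 3 matrix (an illustrative choice). It also uses the array method form, m.sum(), which accepts the same axis argument as np.sum.

```python
import numpy as np

m = np.array([[0, 1, 2],
              [3, 4, 5]])

print(np.sum(m))          # 15: every element
print(np.sum(m, axis=0))  # [3 5 7]: one total per column
print(np.sum(m, axis=1))  # [ 3 12]: one total per row
print(m.sum(axis=0))      # the method form gives the same column totals
```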

```python
# Show picture (requires an IPython/Jupyter environment)
from IPython.display import Image
Image(url="https://i.stack.imgur.com/dcoE3.jpg")
```

We can also use the sum operation for arrays of higher dimensions. 

```python
# Array b from earlier and its shape
print(b)
print(b.shape)
```

```
[[[0.04270963 0.74065943 0.94581437]
  [0.08520703 0.68688327 0.91145746]]

 [[0.07179344 0.51410679 0.27434393]
  [0.03876315 0.27981168 0.25361252]]

 [[0.9800798  0.33290112 0.37060055]
  [0.33858149 0.807507   0.33474051]]

 [[0.3606786  0.65971935 0.25664242]
  [0.93515678 0.0441546  0.83539625]]

 [[0.57225676 0.60493996 0.68807439]
  [0.89602732 0.54404326 0.9094691 ]]]
(5, 2, 3)
```


```python
# Print the first two (matrix) groups from b
print(b[:2])
```

```
[[[0.04270963 0.74065943 0.94581437]
  [0.08520703 0.68688327 0.91145746]]

 [[0.07179344 0.51410679 0.27434393]
  [0.03876315 0.27981168 0.25361252]]]
```


```python
# Sum of all the elements in the first two groups of array b
np.sum(b[:2])
```

```
4.84516270515623
```
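The axis argument works for higher-dimensional arrays too. In the sketch below (a predictable 5 x 2 x 3 stand-in for b, built with np.arange), axis=0 adds the five matrices together element by element, while passing a tuple of axes, axis=(1, 2), gives one total per group.

```python
import numpy as np

b = np.arange(30).reshape(5, 2, 3)   # group g holds the integers 6g to 6g + 5

print(np.sum(b[:2]))             # 66: the integers 0 through 11
print(np.sum(b, axis=0).shape)   # (2, 3): the five matrices added together
print(np.sum(b, axis=(1, 2)))    # one total per group: 15, 51, 87, 123, 159
```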

The np.mean function works the same way and is also very useful.

```python
# Mean of all elements in matrix a
np.mean(a)
```

```
0.5171526986445517
```

```python
# Mean of each column in matrix a
np.mean(a, axis = 0)
```

```
array([0.64759682, 0.60604796, 0.41294324, 0.40202277])
```

```python
# Mean of each row in matrix a
np.mean(a, axis = 1)
```

```
array([0.81574734, 0.50557409, 0.46141128, 0.48844204, 0.45392035,
       0.30968946, 0.41837177, 0.59316929, 0.41541718, 0.70978419])
```

```python
# Mean of all the elements in the first two groups of array b
np.mean(b[:2])
```

```
0.4037635587630191
```
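np.mean is the sum divided by the number of elements that were added up: for the whole array that count is size, and along an axis it is the length of that axis. A quick check on a small illustrative matrix:

```python
import numpy as np

m = np.array([[0, 1, 2],
              [3, 4, 5]])

print(np.mean(m))          # 2.5, i.e. 15 / 6
print(np.mean(m, axis=0))  # [1.5 2.5 3.5]: column means
print(np.mean(m, axis=1))  # [1. 4.]: row means

# The same column means computed from np.sum
print(np.allclose(np.mean(m, axis=0), np.sum(m, axis=0) / m.shape[0]))  # True
```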

In addition to letting you perform calculations on individual arrays, Numpy also lets you perform calculations between arrays. For example, let's select two of the subarrays from array c to illustrate how this works.

```python
# Assign a subarray of c to x
x = c[0,0]
print(x)
x.shape
```

```
[[0.08678098 0.5804784  0.08974516 0.12495143 0.81909481]
 [0.43575185 0.3071634  0.7096875  0.04146953 0.97600159]
 [0.87359699 0.89882962 0.53804867 0.03504244 0.88728242]
 [0.12550027 0.03673878 0.28677385 0.20882083 0.3661817 ]]

(4, 5)
```

```python
# Assign another subarray of c to y
y = c[0,1]
print(y)
y.shape
```

```
[[0.6062902  0.79022307 0.72212698 0.52742105 0.0625092 ]
 [0.61547385 0.87708731 0.60127654 0.3505286  0.48938043]
 [0.56208447 0.26379103 0.47417847 0.6129672  0.27090565]
 [0.23468663 0.08183805 0.39345705 0.09835865 0.68874647]]

(4, 5)
```


Because the two arrays have the *same shape*, we can add, subtract, multiply, and divide them. Each operation is applied element by element, pairing up the values at the same position in x and y.

```python
# Add elements of x and y together
np.add(x,y)
```

```
array([[0.69307118, 1.37070147, 0.81187214, 0.65237248, 0.88160401],
       [1.0512257 , 1.1842507 , 1.31096404, 0.39199813, 1.46538202],
       [1.43568146, 1.16262065, 1.01222715, 0.64800964, 1.15818807],
       [0.3601869 , 0.11857683, 0.6802309 , 0.30717949, 1.05492817]])
```

```python
# The + operator performs the same element-wise addition as np.add
print(x + y)
```

```
[[0.69307118 1.37070147 0.81187214 0.65237248 0.88160401]
 [1.0512257  1.1842507  1.31096404 0.39199813 1.46538202]
 [1.43568146 1.16262065 1.01222715 0.64800964 1.15818807]
 [0.3601869  0.11857683 0.6802309  0.30717949 1.05492817]]
```


```python
# Subtract elements of y from elements of x (that is, x - y)
diff = np.subtract(x,y)
print(type(diff))
diff
```

```
<class 'numpy.ndarray'>
array([[-0.51950922, -0.20974467, -0.63238182, -0.40246962,  0.75658561],
       [-0.179722  , -0.56992391,  0.10841097, -0.30905907,  0.48662116],
       [ 0.31151252,  0.63503859,  0.0638702 , -0.57792475,  0.61637677],
       [-0.10918636, -0.04509928, -0.10668319,  0.11046218, -0.32256476]])
```

```python
# Multiply elements of x and y together
np.multiply(x,y)
```

```
array([[0.05261446, 0.45870743, 0.0648074 , 0.06590201, 0.05120096],
       [0.26819387, 0.26940912, 0.42671844, 0.01453626, 0.47763608],
       [0.4910353 , 0.23710319, 0.2551311 , 0.02147987, 0.24036982],
       [0.02945323, 0.00300663, 0.11283319, 0.02053934, 0.25220635]])
```

```python
# Divide elements of x by elements of y (that is, x / y)
np.divide(x,y)
```

```
array([[ 0.14313439,  0.73457537,  0.12427892,  0.23691021, 13.10358772],
       [ 0.7079941 ,  0.35020847,  1.18030134,  0.11830569,  1.99436171],
       [ 1.55420944,  3.40735475,  1.13469653,  0.05716855,  3.27524516],
       [ 0.53475679,  0.44892047,  0.72885682,  2.12305502,  0.531664  ]])
```
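Each of these functions has an arithmetic operator that does the same element-wise work, just as x + y matched np.add. The sketch below checks the other three on small illustrative 2 x 2 arrays whose results are easy to verify by hand. Note that * (like np.multiply) multiplies matching elements; it is not matrix multiplication.

```python
import numpy as np

x = np.array([[2.0, 4.0], [6.0, 8.0]])
y = np.array([[1.0, 2.0], [3.0, 4.0]])

# Element-wise operators match the named Numpy functions
print(np.array_equal(x - y, np.subtract(x, y)))  # True
print(np.array_equal(x * y, np.multiply(x, y)))  # True
print(np.array_equal(x / y, np.divide(x, y)))    # True

print(x * y)  # each entry is the product of the matching entries of x and y
print(x / y)  # x divided by y, position by position
```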

This is only the tip of the iceberg. Numpy has many more functions, which you can and should explore. You can read more about them in the Numpy documentation.

## Summary



In this lesson, we learnt:

- The basics of working with Numpy. 
- How to import the library so that we could use it. 
- What Numpy arrays are, including how to extract data from them and how to convert to them from other Python data structures. 
- How to use basic mathematical functions in Numpy, both to summarize a single array (as a whole or along an axis) and to combine two arrays element by element.