# Introduction to NumPy

In [1]:
import numpy as np

## Datatypes & Attributes

NumPy's main datatype is the ndarray (n-dimensional array):

In [2]:
a1 = np.array([1, 2, 3])
a1

array([1, 2, 3])

In [3]:
type(a1)

numpy.ndarray

In [4]:
a2 = np.array([[1, 2.0, 3.3],
              [4, 5, 6.5]])

a3 = np.array([[[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]],
              [[10, 11, 12],
              [13, 14, 15],
              [16, 17, 18]]])

In [5]:
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [6]:
a3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

### .shape

In [7]:
a1.shape

(3,)

In [8]:
a2.shape

(2, 3)

In [9]:
a3.shape

(2, 3, 3)

### .ndim

In [10]:
a1.ndim, a2.ndim, a3.ndim

(1, 2, 3)

### .dtype

In [11]:
a1.dtype, a2.dtype, a3.dtype

(dtype('int32'), dtype('float64'), dtype('int32'))
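The int32 shown above is platform-dependent: older NumPy releases on Windows default to 32-bit integers, while Linux and macOS typically give int64. If the exact type matters, we can set it explicitly. A minimal sketch (the variable names are only for illustration):

```python
import numpy as np

# Ask for a specific integer width instead of relying on the platform default
explicit_int = np.array([1, 2, 3], dtype=np.int64)

# .astype() returns a converted copy; the original array is left unchanged
as_float32 = explicit_int.astype(np.float32)

print(explicit_int.dtype, as_float32.dtype)  # int64 float32
```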

### .size

In [12]:
a1.size, a2.size, a3.size

(3, 6, 18)
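The three attributes above are related: .ndim is the length of .shape, and .size is the product of the entries in .shape. A small check, rebuilding a3 so the snippet runs on its own:

```python
import numpy as np

a3 = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
               [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

# ndim is the number of axes, i.e. the length of the shape tuple
ndim_matches = a3.ndim == len(a3.shape)

# size is the total element count, i.e. the product of the shape entries
size_from_shape = int(np.prod(a3.shape))

print(a3.shape, a3.ndim, a3.size)  # (2, 3, 3) 3 18
```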

### type()

In [13]:
type(a1), type(a2), type(a3)

(numpy.ndarray, numpy.ndarray, numpy.ndarray)

In [14]:
# Creating a dataframe in pandas from a numpy array
import pandas as pd

df = pd.DataFrame(a2)
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 2.0 | 3.3 |
| 1 | 4.0 | 5.0 | 6.5 |


## Creating arrays with functions

### .ones()

In [15]:
ones_array = np.ones((2, 3))
ones_array

array([[1., 1., 1.],
       [1., 1., 1.]])

The default data type for .ones() is float. We can change that with the dtype argument, as below:

In [16]:
ones_array = np.ones((2, 3), dtype=int)
ones_array

array([[1, 1, 1],
       [1, 1, 1]])

### .zeros()

In [17]:
zeros_array = np.zeros((2, 3))
zeros_array

array([[0., 0., 0.],
       [0., 0., 0.]])

The default data type for .zeros() is also float. We can change that with the dtype argument, as below:

In [18]:
zeros_array = np.zeros((2, 3), dtype=int)
zeros_array

array([[0, 0, 0],
       [0, 0, 0]])

In [19]:
type(ones_array), type(zeros_array)

(numpy.ndarray, numpy.ndarray)

In [20]:
ones_array.dtype, zeros_array.dtype

(dtype('int32'), dtype('int32'))

### .arange(start, stop, step, dtype)

In [21]:
range_array = np.arange(0, 10, 2)
range_array

array([0, 2, 4, 6, 8])

With integer arguments, .arange() returns an integer array (the data type is inferred from the arguments). We can change that as below:

In [22]:
range_array = np.arange(0, 10, 2, dtype=float)
range_array

array([0., 2., 4., 6., 8.])
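To see that .arange() infers its data type from the arguments rather than always returning integers, we can compare an integer step with a float step. A minimal sketch (the variable names are only for illustration):

```python
import numpy as np

int_range = np.arange(0, 10, 2)      # integer arguments -> integer dtype
float_range = np.arange(0, 1, 0.25)  # a float step -> float64 dtype

# 'i' means a signed integer type; the exact width depends on the platform
print(int_range.dtype.kind)   # i
print(float_range.dtype)      # float64
print(float_range.tolist())   # [0.0, 0.25, 0.5, 0.75]
```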

## Creating random arrays

### .randint()

In [23]:
random_array = np.random.randint(0, 10, (3, 5))
random_array

array([[0, 4, 6, 9, 4],
       [7, 9, 2, 5, 2],
       [8, 8, 5, 1, 7]])

In [24]:
random_array.shape

(3, 5)

In [25]:
random_array.size

15

### .random()

In [26]:
random_array2 = np.random.random((5, 3))
random_array2

array([[0.44550604, 0.18908549, 0.5600484 ],
       [0.05966251, 0.89511763, 0.87908442],
       [0.71944542, 0.78416148, 0.16952822],
       [0.61193255, 0.91126739, 0.50605184],
       [0.07454646, 0.72070287, 0.72813224]])

### .rand()

In [27]:
random_array3 = np.random.rand(5, 3)
random_array3

array([[0.77389128, 0.31606112, 0.33779066],
       [0.1443327 , 0.07061009, 0.31472592],
       [0.37916413, 0.54597109, 0.39137425],
       [0.07118915, 0.72399012, 0.5617182 ],
       [0.09369905, 0.68012878, 0.74209052]])

## Pseudo-random numbers

If we run the code below, it creates a different random array every time:

In [28]:
random_array4 = np.random.randint(10, size=(5, 3))
random_array4

array([[1, 3, 5],
       [5, 9, 3],
       [2, 9, 9],
       [4, 5, 4],
       [5, 4, 5]])

Sometimes we want reproducible results, so that another developer running the same code gets the same numbers we do. To do so, we can use the .seed() function:

In [29]:
np.random.seed(0)
random_array5 = np.random.randint(10, size=(5, 3))
random_array5

array([[5, 0, 3],
       [3, 7, 9],
       [3, 5, 2],
       [4, 7, 6],
       [8, 8, 1]])

In [30]:
np.random.seed(1)
random_array6 = np.random.randint(10, size=(5, 3))
random_array6

array([[5, 8, 9],
       [5, 0, 0],
       [1, 7, 6],
       [9, 2, 4],
       [5, 2, 4]])

In [31]:
np.random.seed(2)
random_array7 = np.random.randint(10, size=(5, 3))
random_array7

array([[8, 8, 6],
       [2, 8, 7],
       [2, 1, 5],
       [4, 4, 5],
       [7, 3, 6]])

The number passed to .seed() fixes the starting state of the random number generator, so calling it with the same number before creating a random array in a cell reproduces the same array:

In [32]:
np.random.seed(2)
random_array8 = np.random.randint(10, size=(5, 3))
random_array8

array([[8, 8, 6],
       [2, 8, 7],
       [2, 1, 5],
       [4, 4, 5],
       [7, 3, 6]])

Seeds work with the other random functions too:

In [33]:
np.random.seed(3)
random_array9 = np.random.random(size=(5, 3))
random_array9

array([[0.5507979 , 0.70814782, 0.29090474],
       [0.51082761, 0.89294695, 0.89629309],
       [0.12558531, 0.20724288, 0.0514672 ],
       [0.44080984, 0.02987621, 0.45683322],
       [0.64914405, 0.27848728, 0.6762549 ]])

In [34]:
np.random.seed(3)
random_array10 = np.random.random(size=(5, 3))
random_array10

array([[0.5507979 , 0.70814782, 0.29090474],
       [0.51082761, 0.89294695, 0.89629309],
       [0.12558531, 0.20724288, 0.0514672 ],
       [0.44080984, 0.02987621, 0.45683322],
       [0.64914405, 0.27848728, 0.6762549 ]])
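np.random.seed() sets one global state that all np.random functions share. NumPy also offers np.random.default_rng(), which returns an independent generator object carrying its own seed. A minimal sketch of the same reproducibility idea with that API (the seed value 42 and the variable names are arbitrary choices for illustration):

```python
import numpy as np

# Two generators created with the same seed produce the same stream of numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = rng_a.integers(0, 10, size=(5, 3))   # integers in [0, 10)
second = rng_b.integers(0, 10, size=(5, 3))

same = np.array_equal(first, second)
print(same)  # True
```

Because each generator keeps its own state, seeding one does not affect random numbers drawn elsewhere in the program.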

### Getting the unique items of an array

In [35]:
random_array4

array([[1, 3, 5],
       [5, 9, 3],
       [2, 9, 9],
       [4, 5, 4],
       [5, 4, 5]])

In [36]:
np.unique(random_array4)

array([1, 2, 3, 4, 5, 9])
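np.unique() can also report how often each value occurs, through its return_counts argument. A short sketch that rebuilds random_array4 from the output shown earlier so it runs on its own:

```python
import numpy as np

# Same values as random_array4 above, typed in by hand
random_array4 = np.array([[1, 3, 5],
                          [5, 9, 3],
                          [2, 9, 9],
                          [4, 5, 4],
                          [5, 4, 5]])

# values is sorted; counts[i] is how many times values[i] appears
values, counts = np.unique(random_array4, return_counts=True)

print(values.tolist())  # [1, 2, 3, 4, 5, 9]
print(counts.tolist())  # [1, 1, 2, 3, 5, 3]
```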

## Viewing arrays and matrices

In [37]:
a1

array([1, 2, 3])

In [38]:
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [39]:
a3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

In [40]:
a1[0]

1

In [41]:
a2[0]

array([1. , 2. , 3.3])

In [42]:
a3[0]

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Showing the first two items along each dimension:

In [43]:
a3[:2, :2, :2]

array([[[ 1,  2],
        [ 4,  5]],

       [[10, 11],
        [13, 14]]])

Creating a 4-dimensional array:

In [44]:
a4 = np.random.randint(10, size=(2, 3, 4, 5))
a4

array([[[[2, 1, 3, 5, 8],
         [1, 8, 7, 8, 1],
         [0, 5, 4, 1, 5],
         [4, 7, 6, 0, 0]],

        [[9, 2, 4, 5, 8],
         [8, 7, 5, 1, 1],
         [1, 5, 5, 7, 4],
         [3, 0, 0, 0, 0]],

        [[2, 2, 7, 0, 5],
         [0, 1, 4, 1, 2],
         [2, 4, 8, 0, 6],
         [0, 4, 1, 5, 1]]],


       [[[8, 8, 7, 0, 0],
         [9, 1, 7, 8, 7],
         [4, 0, 0, 4, 3],
         [0, 8, 2, 7, 2]],

        [[1, 3, 2, 4, 1],
         [2, 2, 7, 3, 4],
         [1, 6, 7, 9, 1],
         [0, 0, 5, 8, 4]],

        [[8, 8, 3, 4, 9],
         [2, 5, 4, 7, 9],
         [1, 9, 0, 7, 4],
         [8, 8, 4, 1, 4]]]])

In [45]:
a4.shape

(2, 3, 4, 5)

In [46]:
a4.ndim

4

Getting the first 4 numbers of the innermost arrays:

In [47]:
a4[:, :, :, :4]

array([[[[2, 1, 3, 5],
         [1, 8, 7, 8],
         [0, 5, 4, 1],
         [4, 7, 6, 0]],

        [[9, 2, 4, 5],
         [8, 7, 5, 1],
         [1, 5, 5, 7],
         [3, 0, 0, 0]],

        [[2, 2, 7, 0],
         [0, 1, 4, 1],
         [2, 4, 8, 0],
         [0, 4, 1, 5]]],


       [[[8, 8, 7, 0],
         [9, 1, 7, 8],
         [4, 0, 0, 4],
         [0, 8, 2, 7]],

        [[1, 3, 2, 4],
         [2, 2, 7, 3],
         [1, 6, 7, 9],
         [0, 0, 5, 8]],

        [[8, 8, 3, 4],
         [2, 5, 4, 7],
         [1, 9, 0, 7],
         [8, 8, 4, 1]]]])

## Manipulating and comparing arrays

### Arithmetic

In [48]:
a1

array([1, 2, 3])

In [49]:
ones_array = np.ones(3)
ones_array

array([1., 1., 1.])

In [50]:
a1 + ones_array

array([2., 3., 4.])

In [51]:
a1 - ones_array

array([0., 1., 2.])

In [52]:
a1

array([1, 2, 3])

In [53]:
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [54]:
a1 + a2

array([[2. , 4. , 6.3],
       [5. , 7. , 9.5]])

In [55]:
a2 - a1

array([[0. , 0. , 0.3],
       [3. , 3. , 3.5]])

In [56]:
a1 * ones_array

array([1., 2., 3.])

In [57]:
a1 * a2

array([[ 1. ,  4. ,  9.9],
       [ 4. , 10. , 19.5]])

In [58]:
a3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

In [59]:
# How can you reshape a2 to be compatible with a3?
a2 * a3

ValueError: operands could not be broadcast together with shapes (2,3) (2,3,3) 
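One possible answer to the question above: broadcasting compares shapes from the last axis backwards, so (2, 3) lines up against the trailing (3, 3) of a3 and the 2 clashes with the 3. Adding a trailing axis of length 1 to a2 makes the shapes compatible. This sketch assumes we want each a2[i, j] to scale the innermost row a3[i, j]; other alignments are possible:

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
a3 = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
               [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

# (2, 3) -> (2, 3, 1); the length-1 axis is stretched to match a3's last axis
a2_reshaped = a2.reshape(2, 3, 1)

result = a2_reshaped * a3
print(result.shape)  # (2, 3, 3)
```

a2[:, :, np.newaxis] gives the same (2, 3, 1) shape without spelling out the dimensions.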

In [60]:
a1 / ones_array

array([1., 2., 3.])

In [61]:
a2 / a1

array([[1.        , 1.        , 1.1       ],
       [4.        , 2.5       , 2.16666667]])

In [62]:
a2 // a1

array([[1., 1., 1.],
       [4., 2., 2.]])

In [63]:
a2 ** 2

array([[ 1.  ,  4.  , 10.89],
       [16.  , 25.  , 42.25]])

In [64]:
a2 % 2

array([[1. , 0. , 1.3],
       [0. , 1. , 0.5]])

We can also use NumPy functions for element-wise operations like the ones above:

In [65]:
np.add(a1, a2)

array([[2. , 4. , 6.3],
       [5. , 7. , 9.5]])

In [66]:
np.square(a2)

array([[ 1.  ,  4.  , 10.89],
       [16.  , 25.  , 42.25]])

In [67]:
np.exp(a1)

array([ 2.71828183,  7.3890561 , 20.08553692])

In [68]:
np.log(a1)

array([0.        , 0.69314718, 1.09861229])

### Aggregation
Aggregation = reducing a number of values to a single summary value, such as a sum or a mean

In [69]:
# A python list
python_list = [1, 2, 3]
python_list

[1, 2, 3]

In [70]:
# A numpy array
a1

array([1, 2, 3])

In [71]:
sum(python_list), sum(a1)

(6, 6)

In [72]:
np.sum(python_list), np.sum(a1)

(6, 6)

What is the difference between Python's built-in sum() and NumPy's np.sum()?

We should use Python's functions on Python datatypes and NumPy's functions on NumPy arrays.

The benefit of NumPy's functions is speed: on arrays they run much faster.

We can measure the run time with the %timeit magic command, as below:

In [73]:
# Let's create a massive array first
massive_array = np.random.random(100000)

# Viewing the first 10 numbers:
massive_array[:10]

array([0.45527936, 0.21798577, 0.17721338, 0.07362367, 0.89239319,
       0.64017662, 0.14333232, 0.41412692, 0.04910892, 0.20937335])

In [76]:
%timeit sum(massive_array) # Python's sum()
%timeit np.sum(massive_array) # NumPy's sum()

7.63 ms ± 62.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
41.5 µs ± 260 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


Python's sum() took 7.63 ms per loop, while NumPy's sum() took only 41.5 µs (0.0415 ms).

In [77]:
7.63 / 0.0415

183.85542168674698

NumPy's sum() was about 184 times faster than Python's sum() in this operation.
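%timeit only works inside IPython and Jupyter. In a plain Python script, a similar comparison can be sketched with the standard library's timeit module. The repeat count of 10 is an arbitrary choice, and the timings will differ from machine to machine:

```python
import timeit

import numpy as np

massive_array = np.random.random(100_000)
runs = 10  # how many times each statement is executed

# timeit.timeit returns the total time in seconds for all runs
python_time = timeit.timeit(lambda: sum(massive_array), number=runs)
numpy_time = timeit.timeit(lambda: np.sum(massive_array), number=runs)

print(f"Python's sum(): {python_time / runs * 1e3:.3f} ms per call")
print(f"NumPy's sum():  {numpy_time / runs * 1e3:.3f} ms per call")
```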

### .mean()

In [78]:
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [79]:
np.mean(a2)

3.6333333333333333
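np.mean() without extra arguments averages over every element. The aggregation functions also accept an axis argument to aggregate along one dimension only. A short sketch using the same values as a2 (the variable names are only for illustration):

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# axis=0 collapses the rows: one mean per column, roughly [2.5, 3.5, 4.9]
column_means = np.mean(a2, axis=0)

# axis=1 collapses the columns: one mean per row, roughly [2.1, 5.17]
row_means = np.mean(a2, axis=1)

print(column_means.shape, row_means.shape)  # (3,) (2,)
```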

### .min() & .max()

In [80]:
np.min(a2)

1.0

In [81]:
np.max(a2)

6.5

### .std() & .var() (standard deviation and variance)

In [82]:
np.std(a2)

1.8226964152656422

In [83]:
np.var(a2)

3.3222222222222224

In [85]:
# Standard deviation = square root of variance
np.sqrt(np.var(a2)) == np.std(a2)

True
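The == check returns True here, but exact comparison of floating-point results can fail because of rounding, so np.isclose() is usually the safer test. The sketch below also computes the variance by hand as the mean of squared deviations from the mean, which is what np.var() does with its default settings:

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# Variance by hand: the average squared distance from the mean
manual_var = np.mean((a2 - np.mean(a2)) ** 2)

# Compare with a tolerance instead of exact equality
var_matches = np.isclose(manual_var, np.var(a2))
std_matches = np.isclose(np.sqrt(manual_var), np.std(a2))

print(var_matches, std_matches)  # True True
```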