# NumPy Basics

Welcome to the section on NumPy and Pandas, two of the most widely used Python libraries for data science. NumPy is built around a powerful data structure, the multidimensional array. Pandas is another powerful Python library that provides fast, convenient tools for data analysis.

NumPy is a library written for scientific computing and data analysis. The name stands for Numerical Python, and its style of operating on whole arrays at once is also known as array-oriented computing.

The most basic object in NumPy is the ndarray (or simply an array): an n-dimensional, homogeneous array. By homogeneous, we mean that all the elements in a NumPy array have to be of the same data type, which is most commonly numeric (float or integer).
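
To make the homogeneity rule concrete, here is a minimal sketch (the variable names are only illustrative): when the inputs mix integers and floats, NumPy picks one common dtype for the whole array.

```python
import numpy as np

# Mixed ints and floats are upcast to a single float dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)       # float64

# All-integer input stays an integer array (exact width is platform dependent).
ints = np.array([1, 2, 3])
print(ints.dtype.kind)   # i
```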


# Why NumPy?
Convenience and speed.

NumPy is usually much faster than doing the same computations with plain Python loops and lists.

Vectorised code typically does not contain explicit looping or indexing (all of this happens behind the scenes, in precompiled C code), and thus it is much more concise.

Many NumPy operations are implemented in C, which avoids the general costs of loops in Python: pointer indirection and per-element dynamic type checking. The size of the speed boost depends on which operations you're performing.

NumPy arrays are also more compact than lists, i.e. they take much less storage space than lists.
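
As a rough illustration of the storage claim, the sketch below compares the bytes used by a list of 1000 integers with an array holding the same values. It assumes CPython, where `sys.getsizeof` reports per-object sizes, and the exact list figure will vary between Python versions.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)
# An array stores the raw values in one contiguous buffer.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```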

In [4]:
!pip install numpy





In [5]:
import numpy

In [2]:
import numpy as np

In [7]:
arr = np.array([1, 2, 3])

In [9]:
print(arr)

[1 2 3]

In [12]:
b = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])

In [13]:
b

array([[1, 2, 3],
       [4, 5, 6],
       [6, 7, 8]])

In [14]:
b.shape

(3, 3)

In [15]:
arr.shape

(3,)

In [16]:
b.dtype

dtype('int32')

In [17]:
arr.dtype

dtype('int32')

In [19]:
a = [1, 2, 3]

In [20]:
type(a)

list

In [21]:
print(type(arr))

<class 'numpy.ndarray'>


In [22]:
print(type(b))

<class 'numpy.ndarray'>


In [27]:
lst_1 = [i for i in range(10)]
print(lst_1)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [31]:
lst_2 = [i+2 for i in lst_1]
lst_2

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [30]:
sqr = [item**2 for item in range(11)]
sqr

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [25]:
np.arange(11)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

# Performance measurement
As mentioned above, the key advantages of NumPy are convenience and speed of computation.

You'll often work with extremely large datasets, so it is important to understand how much computation time (and memory) you can save by using NumPy instead of standard Python lists.

In [32]:
c = range(10000)
%timeit [i**3 for i in c]

6.47 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [33]:
c_numpy = np.arange(10000)
%timeit c_numpy**3

36.7 µs ± 6.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Still not convinced? Here is one more interesting example.

In [34]:
l1 = range(10000)
l2 = [i**2 for i in range(10000)]

In [35]:
#multiplying two lists elementwise
%timeit list(map(lambda x, y: x*y, l1, l2))

2.56 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [36]:
a1 = np.array(l1)
b1 = np.array(l2)

In [37]:
%timeit a1*b1

14.9 µs ± 1.91 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
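
`%timeit` only works inside IPython or Jupyter. Outside a notebook, a similar comparison can be sketched with the standard `timeit` module. The array dtype is set to `int64` explicitly here, because 9999**3 does not fit in a 32-bit integer and would silently overflow. Timings depend on the machine, so none are shown.

```python
import timeit
import numpy as np

values = list(range(10000))
values_np = np.arange(10000, dtype=np.int64)

# Time 100 runs of each approach and report the average per run.
t_list = timeit.timeit(lambda: [v ** 3 for v in values], number=100) / 100
t_numpy = timeit.timeit(lambda: values_np ** 3, number=100) / 100
print(f"list: {t_list * 1e6:.1f} µs, numpy: {t_numpy * 1e6:.1f} µs")

# Both approaches produce the same numbers.
same = [v ** 3 for v in values] == (values_np ** 3).tolist()
print(same)
```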


So I can do all of this without even writing a loop? Yes... wow!

# Creating NumPy arrays

There are multiple ways to create a NumPy array. Let's walk through them.

In [39]:
np.arange(1, 11)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [40]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [42]:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [51]:
np.arange(234, 456, 7)

array([234, 241, 248, 255, 262, 269, 276, 283, 290, 297, 304, 311, 318,
       325, 332, 339, 346, 353, 360, 367, 374, 381, 388, 395, 402, 409,
       416, 423, 430, 437, 444, 451])

In [52]:
np.arange(20)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [58]:
np.arange(500, 100, -100)

array([500, 400, 300, 200])

In [53]:
np.arange(0, 22, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [54]:
np.arange(2,12,2)

array([ 2,  4,  6,  8, 10])
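
The `np.arange(start, stop, step)` calls above all follow one rule: `start` is included, `stop` is excluded. A small sketch that checks the element count for one of the examples (the helper variable names are illustrative):

```python
import math
import numpy as np

start, stop, step = 234, 456, 7
seq = np.arange(start, stop, step)

# stop is exclusive, so the element count is ceil((stop - start) / step).
expected_len = math.ceil((stop - start) / step)
print(len(seq), expected_len)   # 32 32
print(seq[-1])                  # 451
```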

In [61]:
np.zeros((3,3), dtype=np.int32)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [62]:
np.zeros(1, dtype=np.int32)

array([0])

In [63]:
np.zeros((1, 1), dtype=np.int32)

array([[0]])

In [64]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [65]:
np.zeros((5,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [3]:
a = np.array([[2, 3], [2, 4], [8, 9]])

In [4]:
a

array([[2, 3],
       [2, 4],
       [8, 9]])

In [5]:
a.shape

(3, 2)

In [6]:
np.zeros_like(a)

array([[0, 0],
       [0, 0],
       [0, 0]])

In [73]:
np.ones((5,2), dtype=np.int32)

array([[1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1]])

In [74]:
np.eye(1, dtype=np.int32)

array([[1]])

In [75]:
np.eye(2)

array([[1., 0.],
       [0., 1.]])

In [9]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [10]:
np.eye(10)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [11]:
np.full((5, 3), 3.3)

array([[3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3]])

In [81]:
np.full((3,3),2.2, dtype= np.int32)

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])
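
Note what happened in the last cell: the fill value 2.2 was cast to the requested integer dtype and became 2. A minimal sketch contrasting the two cases:

```python
import numpy as np

as_float = np.full((2, 2), 2.2)                  # dtype inferred from the fill value
as_int = np.full((2, 2), 2.2, dtype=np.int32)    # fill value cast to int32

print(as_float.dtype, as_float[0, 0])   # float64 2.2
print(as_int.dtype, as_int[0, 0])       # int32 2
```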

In [84]:
np.diag([2, 5, 68, 78])

array([[ 2,  0,  0,  0],
       [ 0,  5,  0,  0],
       [ 0,  0, 68,  0],
       [ 0,  0,  0, 78]])

In [85]:
x = np.diag(np.arange(1, 101))

In [86]:
x

array([[  1,   0,   0, ...,   0,   0,   0],
       [  0,   2,   0, ...,   0,   0,   0],
       [  0,   0,   3, ...,   0,   0,   0],
       ...,
       [  0,   0,   0, ...,  98,   0,   0],
       [  0,   0,   0, ...,   0,  99,   0],
       [  0,   0,   0, ...,   0,   0, 100]])

In [None]:
# Vertical tiling of [1, 2, 3]:
# [[1, 2, 3],
#  [1, 2, 3],
#  [1, 2, 3]]

# Horizontal tiling of [1, 2, 3]:
# [1, 2, 3, 1, 2, 3, 1, 2, 3]

In [15]:
v = np.array([1,2,3])
print(v)
np.tile(v,(3, 3)) 

[1 2 3]


array([[1, 2, 3, 1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3, 1, 2, 3]])

In [18]:
x1 = np.array([[1, 2, 3, 4], [5, 6, 7,8 ]])
np.tile(x1, (2, 2))

array([[1, 2, 3, 4, 1, 2, 3, 4],
       [5, 6, 7, 8, 5, 6, 7, 8],
       [1, 2, 3, 4, 1, 2, 3, 4],
       [5, 6, 7, 8, 5, 6, 7, 8]])
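
The second argument of `np.tile` gives the number of repetitions along each axis, so purely horizontal and purely vertical tiling can be sketched like this:

```python
import numpy as np

v = np.array([1, 2, 3])

horizontal = np.tile(v, 3)        # repeat along the last axis -> shape (9,)
vertical = np.tile(v, (3, 1))     # 3 copies stacked as rows   -> shape (3, 3)

print(horizontal)
print(vertical)
```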

In [36]:
np.random.random()

0.929770621274946

In [117]:
np.random.seed(123)
print(np.random.random())
print(np.random.random())
print(np.random.random())
print(np.random.random())
print(np.random.random())
print(np.random.random())

0.6964691855978616
0.28613933495037946
0.2268514535642031
0.5513147690828912
0.7194689697855631
0.42310646012446096


In [124]:
np.random.seed(53833)
print(np.random.random())
print(np.random.random())
print(np.random.random())

0.019703956740039552
0.3210059265730163
0.5240905815505337
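
Seeding is what makes the numbers above repeatable: the same seed always gives the same sequence. The sketch below shows this with `np.random.seed`, and also with `np.random.default_rng`, a Generator-based alternative that keeps its state in an object rather than globally. The notebook itself does not use it, so treat that part as an optional aside.

```python
import numpy as np

# Same seed, same sequence: handy for reproducible experiments.
np.random.seed(123)
first = np.random.random(3)
np.random.seed(123)
second = np.random.random(3)
print(np.array_equal(first, second))   # True

# Generator objects keep their own state instead of a global one.
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))   # True
```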


In [130]:
x = np.random.random((4, 4))

In [131]:
x

array([[0.20225276, 0.69226061, 0.72378963, 0.4193708 ],
       [0.37456229, 0.80241098, 0.43565871, 0.98762835],
       [0.89022798, 0.35337826, 0.56293712, 0.1326371 ],
       [0.56955352, 0.5806572 , 0.85541562, 0.17058096]])

In [132]:
rounded = np.round(x, 2)

In [133]:
rounded

array([[0.2 , 0.69, 0.72, 0.42],
       [0.37, 0.8 , 0.44, 0.99],
       [0.89, 0.35, 0.56, 0.13],
       [0.57, 0.58, 0.86, 0.17]])

In [134]:
new = rounded * 100

In [135]:
new

array([[20., 69., 72., 42.],
       [37., 80., 44., 99.],
       [89., 35., 56., 13.],
       [57., 58., 86., 17.]])

In [136]:
intt = new.astype(np.int32)
intt

array([[20, 69, 72, 42],
       [37, 80, 44, 99],
       [89, 35, 56, 13],
       [56, 57, 86, 17]])
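
Look closely at the last row: 57. and 58. became 56 and 57. This is not a typo. `astype` drops the fractional part instead of rounding, and values such as 0.57 * 100 are stored as a float just below 57. A minimal sketch of the effect, with `np.rint` as one possible way to round to the nearest integer first:

```python
import numpy as np

scaled = np.array([0.57, 0.58]) * 100
print(scaled.tolist())      # values sit just below 57 and 58

truncated = scaled.astype(np.int32)            # drops the fractional part
nearest = np.rint(scaled).astype(np.int32)     # round to nearest first

print(truncated)   # [56 57]
print(nearest)     # [57 58]
```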

In [137]:
intt.astype(np.float32)

array([[20., 69., 72., 42.],
       [37., 80., 44., 99.],
       [89., 35., 56., 13.],
       [56., 57., 86., 17.]], dtype=float32)

In [138]:
a = np.arange(10)
a.dtype

dtype('int32')

In [139]:
#memory used by each array element in bytes
a.itemsize

4

In [140]:
x = np.arange(24)

In [141]:
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [172]:
x.reshape(-1, 8)

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

In [142]:
z = np.array([[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]])
z.shape

(3, 4)

In [143]:
# -1 marks an unknown dimension for NumPy to work out
# Flattening:
z.reshape(-1)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [146]:
# Reshape with (-1, 1): one column, with the number of rows left for NumPy to infer.
# The result has shape (12, 1), which is compatible with the original shape (3, 4).
z.reshape(-1, 1)

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

In [176]:
# Reshape with (-1, 1) again: the column count is given as 1 and the row count is unknown.
z.reshape(-1,1)

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

In [177]:
# New shape as (-1, 2). row unknown, column 2. 
z.reshape(-1, 2)

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12]])

In [178]:
# Now trying to keep column as unknown. 
# New shape as (1,-1). i.e, row is 1, column unknown. 
z.reshape(1,-1)

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])

In [179]:
# New shape (2, -1). Row 2, column unknown. 
z.reshape(2, -1)

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [180]:
# New shape as (3, -1). Row 3, column unknown. 
z.reshape(3, -1)

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
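
The rule behind all the `-1` examples above is that the total number of elements must stay the same, so NumPy can only fill in the unknown dimension when the known ones divide the total evenly. A short sketch, including a case that fails:

```python
import numpy as np

z = np.arange(1, 13).reshape(3, 4)    # 12 elements

# -1 is replaced by whatever makes the total come out to 12.
print(z.reshape(-1, 6).shape)   # (2, 6)
print(z.reshape(4, -1).shape)   # (4, 3)

# 12 is not divisible by 5, so no valid size exists for the -1.
try:
    z.reshape(-1, 5)
except ValueError as err:
    print("reshape failed:", err)
```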

In [192]:
np.zeros((4, 3, 2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

In [193]:
# -1 lets NumPy work out the remaining dimension automatically
np.arange(18).reshape(2,3,-1)

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]]])

# Accessing NumPy array elements

In [194]:
a = np.array([2,4,6,8,10,12,14,16])

In [195]:
a

array([ 2,  4,  6,  8, 10, 12, 14, 16])

In [196]:
a[2]

6

In [197]:
a[[2,4,6]]

array([ 6, 10, 14])

In [198]:
a[2:]

array([ 6,  8, 10, 12, 14, 16])

In [199]:
a[2:5]

array([ 6,  8, 10])

In [200]:
a[0::2]

array([ 2,  6, 10, 14])

Let's check the same for a 2-D array.

In [201]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [202]:
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [204]:
a[2,2]

9

In [205]:
a > 2

array([[False, False,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [206]:
a[a > 2]

array([3, 4, 5, 6, 7, 8, 9])

In [208]:
a[(a > 2) & (a < 5)]

array([3, 4])
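
Boolean masks can be combined with `&` (and), `|` (or) and `~` (not), and they can also appear on the left-hand side of an assignment. A small sketch along the same lines as the cells above (`outside` and `clipped` are illustrative names):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Parentheses around each condition are required with & and |.
outside = a[(a < 3) | (a > 7)]
print(outside)          # [1 2 8 9]

# A mask on the left-hand side updates the matching elements in place.
clipped = a.copy()
clipped[clipped > 5] = 5
print(clipped)
```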

# Subsets of a NumPy array

In [216]:
a = np.arange(10)

In [217]:
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [218]:
b = a

In [219]:
b

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [220]:
b[0] = 11

In [221]:
b

array([11,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [222]:
# Notice a is also changed
a

array([11,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [223]:
np.shares_memory(a,b)

True

In [226]:
a = np.arange(10)

In [227]:
b = a.copy()

In [228]:
b[0] = 11

In [231]:
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [233]:
np.shares_memory(a,b)

False
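
The same sharing question comes up when you take an actual subset of an array. As a sketch: a basic slice is a view onto the original data, while indexing with a list of positions produces an independent copy.

```python
import numpy as np

a = np.arange(10)

view = a[2:5]            # basic slicing returns a view
picked = a[[2, 3, 4]]    # indexing with a list returns a copy

print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, picked))   # False

view[0] = 99             # writes through to a
print(a[2])              # 99
```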

# More operations

In [234]:
a = np.array([[1,2,3],[4,5,6]])

In [235]:
a

array([[1, 2, 3],
       [4, 5, 6]])

In [236]:
a.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [237]:
b = np.array([[7,8,9],[10,11,12]])

In [238]:
a

array([[1, 2, 3],
       [4, 5, 6]])

In [239]:
b

array([[ 7,  8,  9],
       [10, 11, 12]])

In [241]:
np.vstack((a,b))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [242]:
np.hstack((a,b))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

# Mathematical operations

In [243]:
a = np.arange(1,10)

In [244]:
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [245]:
np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427,
       -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [246]:
np.cos(a)

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219,
        0.96017029,  0.75390225, -0.14550003, -0.91113026])

In [247]:
np.exp(a)

array([2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
       1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
       8.10308393e+03])

In [248]:
np.sum(a)

45

In [249]:
np.median(a)

5.0

In [250]:
a.std()

2.581988897471611

In [251]:
a = np.arange(1,10).reshape(3,3)

In [252]:
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [253]:
np.linalg.det(a)

0.0

In [254]:
np.linalg.inv(a)

LinAlgError: Singular matrix
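
The zero determinant and the failed inversion have the same cause: the rows of this matrix are linearly dependent. One way to check that directly, sketched here with `np.linalg.matrix_rank`:

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

# Rank below the matrix size means the rows are linearly dependent.
rank = np.linalg.matrix_rank(a)
print(rank)    # 2

# Row 0 + row 2 equals twice row 1, which is the dependency in question.
print(np.array_equal(a[0] + a[2], 2 * a[1]))   # True
```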

In [255]:
np.linalg.eig(a)

(array([ 1.61168440e+01, -1.11684397e+00, -1.30367773e-15]),
 array([[-0.23197069, -0.78583024,  0.40824829],
        [-0.52532209, -0.08675134, -0.81649658],
        [-0.8186735 ,  0.61232756,  0.40824829]]))

In [256]:
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [257]:
b = a.T

In [258]:
b

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [259]:
np.dot(a,b)


array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])
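
`np.dot` on two 2-D arrays is a matrix product, which is different from the elementwise `*` used earlier. A short sketch of the difference, also showing the `@` operator as an equivalent spelling for 2-D arrays:

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
b = a.T

matrix_product = a @ b      # same as np.dot(a, b) for 2-D arrays
elementwise = a * b         # multiplies matching positions only

print(np.array_equal(matrix_product, np.dot(a, b)))   # True
print(elementwise[0])                                  # [ 1  8 21]
```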

In [260]:
a = np.array([1,1,0], dtype = bool)
b = np.array([1,0,1], dtype = bool)


In [263]:
np.logical_or(a,b)

array([ True,  True,  True])

In [262]:
np.logical_and(a,b)

array([ True, False, False])

In [264]:
np.all(a == a)

True

In [265]:
a = np.array([[1,2],[3,4]])

In [266]:
a


array([[1, 2],
       [3, 4]])

In [267]:
a.sum()

10

In [268]:
a.sum(axis=0)

array([4, 6])

In [269]:
a.sum(axis=1)

array([3, 7])
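
The `axis` argument names the dimension that gets collapsed: `axis=0` sums down the rows (one result per column), `axis=1` sums across the columns (one result per row). A sketch with two related options, `keepdims` and a per-row `argmax`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

col_sums = a.sum(axis=0)                     # collapse the rows    -> [4 6]
row_sums = a.sum(axis=1)                     # collapse the columns -> [3 7]
row_sums_2d = a.sum(axis=1, keepdims=True)   # keep a column shape (2, 1)

# Row-wise position of the maximum instead of the flattened index.
print(a.argmax(axis=1))                      # [1 1]
```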

In [270]:
a.max()

4

In [272]:
a.argmax()

3

In [273]:
a

array([[1, 2],
       [3, 4]])

In [274]:
a.shape

(2, 2)

In [275]:
a[:,np.newaxis].shape # inserts a new axis in the middle -> 3D

(2, 1, 2)

In [276]:
np.sort(a)

array([[1, 2],
       [3, 4]])

In [277]:
np.argsort(a)

array([[0, 1],
       [0, 1]], dtype=int64)

In [278]:
a = np.array([14, 5, 4, 2])

In [279]:
np.argsort(a)

array([3, 2, 1, 0], dtype=int64)
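
`np.argsort` returns the positions that would sort the array, so indexing with its result gives the sorted values. A minimal sketch, including a descending variant:

```python
import numpy as np

a = np.array([14, 5, 4, 2])
order = np.argsort(a)

print(a[order])          # [ 2  4  5 14], same as np.sort(a)
print(a[order[::-1]])    # [14  5  4  2], descending order
```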

In [280]:
a

array([14,  5,  4,  2])

In [287]:
tf = a == 5

In [288]:
tf

array([False,  True, False, False])