# NumPy
NumPy (short for 'Numerical Python' or 'Numeric Python') is one of the most essential packages for fast mathematical computation on arrays and matrices in Python. It is also very useful when dealing with multi-dimensional data. It makes it easy to integrate code written in C, C++ and Fortran. It also provides numerous functions for the Fourier transform (FT) and linear algebra.




### Why NumPy instead of lists?

One might wonder why one should prefer NumPy arrays when we can simply create lists holding elements of the same data type. If this question has crossed your mind, the following reasons may convince you:


1- NumPy arrays are stored in one contiguous block of memory, whereas a list stores references to separate Python objects. Thus the same data stored as a list requires more space than it does as an array.

2- They are faster to work with, because operations on them run as compiled loops over the whole array, and hence they are more efficient than lists.



3- They are more convenient to work with: arithmetic and mathematical functions apply to every element at once, without explicit loops.
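
The three points above can be checked with a small sketch. It compares the memory used by a list of 1000 integers with the memory used by the same numbers in an array. For the list it uses `sys.getsizeof` (the container plus each integer object), and for the array it uses the `nbytes` attribute. The exact byte counts depend on the platform and Python version, so treat them as illustrative. The variable names are also just illustrative.

```python
import sys
import numpy as np

values = list(range(1000))          # a plain Python list of 1000 ints
arr = np.array(values)              # the same numbers as a NumPy array

# A list stores references to separate int objects, so count both
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = arr.nbytes            # size of the raw data buffer only

print(list_bytes, array_bytes)

# Convenience: arithmetic applies element-wise, no explicit loop needed
doubled_list = [v * 2 for v in values]
doubled_arr = arr * 2
print(doubled_arr[:5])
```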



## NumPy vs. Pandas

Pandas is built on top of NumPy; in other words, pandas needs NumPy to work. So pandas is not an alternative to NumPy. Instead, pandas offers additional methods and a more streamlined way of working with numerical and tabular data in Python.

## Importing NumPy
First you need to import the NumPy library, which can be done by running the following command:


In [55]:
import numpy as np

It is common practice to import numpy with the alias 'np'. Without an alias, we would have to write numpy.function to access each function; with the alias we can simply write np.function. Some of the common NumPy functions and array attributes are listed below:


## Common functions and attributes

| Function / attribute | Task |
|---|---|
| array | Create a NumPy array |
| ndim | Number of dimensions of the array |
| shape | Size of the array along each dimension (e.g. number of rows and columns) |
| size | Total number of elements in the array |
| dtype | Type of the elements in the array, e.g. int64, float64 |
| reshape | Returns the array with a new shape without changing the original array |
| resize | Changes the shape of the original array in place |
| arange | Create a sequence of numbers in an array |
| itemsize | Size in bytes of each element |
| diag | Create a diagonal matrix |
| vstack | Stack arrays vertically |
| hstack | Stack arrays horizontally |

# 1D array
Using NumPy, an array is created with np.array:

In [57]:

a=np.array([1,2,3])
print(a)


[1 2 3]


The elements must be passed as a single list inside square brackets; omitting the brackets, as in np.array(1,2,3), raises an error. To print the array we can use print(a).

In [3]:
# Because of Python's dynamic typing, we can even create heterogeneous lists:
L3 = [True, "2", 3.0, 4]
print(L3)

[True, '2', 3.0, 4]


# Changing the datatype

np.array( ) has an additional dtype parameter, through which one can specify whether the elements are integers, floating-point numbers or complex numbers.

In [60]:
a.dtype
a = np.array([15,25,14,78,96],dtype = "float")
print(a)
a.dtype

[ 15.  25.  14.  78.  96.]


dtype('float64')

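Initially the datatype of 'a' was an integer type ('int32' on the machine used here; on most Linux and macOS systems the default is 'int64'). After the modification it becomes 'float64'.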

1- int32 refers to 32-bit integers, i.e. numbers without a decimal point, which can range from -2147483648 to 2147483647.

Similarly, int16 (16-bit integers) can range from -32768 to 32767.

2- float64 refers to 64-bit (double-precision) floating-point numbers, i.e. numbers with a decimal point.
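
The ranges quoted above need not be memorized. NumPy can report them with `np.iinfo` (for integer types) and `np.finfo` (for floating-point types). A short sketch:

```python
import numpy as np

# np.iinfo reports the limits of integer dtypes
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)   # -32768 32767
print(np.iinfo(np.int32).min, np.iinfo(np.int32).max)   # -2147483648 2147483647

# np.finfo reports properties of floating-point dtypes
print(np.finfo(np.float64).max)   # largest representable float64
print(np.finfo(np.float64).eps)   # gap between 1.0 and the next float64
```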

In [4]:
# Remember that unlike Python lists, NumPy is constrained to arrays that all contain the same type.
# If types do not match, NumPy will upcast if possible (here, integers are up-cast to floating point):
np.array([3.14, 4, 2, 3])
# If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:

np.array([1, 2, 3, 4], dtype='float32')

array([ 1.,  2.,  3.,  4.], dtype=float32)

# Creating the sequence of numbers
If you want to create a sequence of numbers, you can use np.arange. To get the numbers from 20 to 29 we run the following command:

In [61]:
b = np.arange(start = 20,stop = 30)  
b

array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

 In np.arange the end point is always excluded.


# Create an Arithmetic Progression 
np.arange also accepts a step argument, which defines the difference between two consecutive numbers. If step is not provided, it defaults to 1.

Suppose we want to create an arithmetic progression with initial term 20 and common difference 2, up to 30 (30 excluded).

In [62]:
c = np.arange(20,30,2)   #30 is excluded.
c 

array([20, 22, 24, 26, 28])

Keep in mind that the stop argument of np.arange( ) is always excluded.
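
The step can also be negative (to count downwards) or a float. In the second case, floating-point rounding can make the number of elements harder to predict, and np.linspace, which fixes the number of points instead, is often the safer choice. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

# A negative step counts downwards; the stop value (20) is still excluded
down = np.arange(30, 20, -2)
print(down)        # [30 28 26 24 22]

# The step may also be a float; the result then has a float dtype
quarters = np.arange(0, 1, 0.25)
print(quarters)
```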


# Reshaping the arrays 
To reshape the array we can use reshape( )

In [64]:
f = np.arange(101,113)
f.reshape(3,4)
f

array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112])

Note that reshape() returns a reshaped array and does not alter the shape of the original array. To modify the original array in place we can use resize( ):

In [65]:
f.resize(3,4)
f

array([[101, 102, 103, 104],
       [105, 106, 107, 108],
       [109, 110, 111, 112]])

If a dimension is given as -1 in a reshape, that dimension is calculated automatically. This works only if the total number of elements in the array is divisible by the product of the other dimensions given.


In [66]:
f.reshape(3,-1)

array([[101, 102, 103, 104],
       [105, 106, 107, 108],
       [109, 110, 111, 112]])

In the above code we only specified that there should be 3 rows; NumPy automatically calculates the other dimension, i.e. 4 columns.
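
The divisibility condition can be seen directly in a short sketch. It reuses the 12-element array from above. The flag `reshape_failed` is a hypothetical helper name used only to record the outcome.

```python
import numpy as np

f = np.arange(101, 113)            # 12 elements

print(f.reshape(2, -1).shape)      # (2, 6): 12 / 2 = 6 columns
print(f.reshape(-1, 3).shape)      # (4, 3): 12 / 3 = 4 rows

# 12 is not divisible by 5, so NumPy cannot infer the -1 dimension
try:
    f.reshape(5, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```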

# Missing Data
Missing data is represented by NaN (Not a Number), which is available in NumPy as np.nan.

In [10]:
# Create a length-10 integer array filled with zeros
np.zeros(10,dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [74]:
val = np.array([15,10, np.nan, 3, 2, 5, 6, 4])
val.sum()


nan

To ignore missing values, you can use np.nansum(val) which returns 45

To check which elements of an array are missing values, you can use the function np.isnan( ):

In [69]:
np.isnan(val)

array([False, False,  True, False, False, False, False, False], dtype=bool)

In [75]:
np.nansum(val) 

45.0
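
np.isnan() is needed because NaN never compares equal to anything, not even to itself. NumPy also offers other NaN-aware reductions besides np.nansum. A minimal sketch using the same `val` array:

```python
import numpy as np

val = np.array([15, 10, np.nan, 3, 2, 5, 6, 4])

# NaN is not equal to anything, so == cannot detect it
print(np.nan == np.nan)          # False
print((val == np.nan).any())     # False, even though val contains a NaN
print(np.isnan(val).any())       # True
print(np.isnan(val).sum())       # 1 missing value

# Other NaN-aware reductions skip the missing entries
print(np.nanmean(val))           # mean of the 7 valid values
print(np.nanmax(val))            # 15.0
```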

# 2D arrays
A 2D array in numpy can be created in the following manner:

In [70]:
g = np.array([(10,20,30),(40,50,60)])
#Alternatively
g = np.array([[10,20,30],[40,50,60]])
g

array([[10, 20, 30],
       [40, 50, 60]])

The dimension, total number of elements and shape can be ascertained by ndim, size and shape respectively:

In [71]:
print(g.ndim)
print(g.size)
print(g.shape)

2
6
(2, 3)


# Creating some common matrices 
NumPy provides utilities to create some standard matrices which are commonly used in linear algebra.
To create a matrix of all zeros of 2 rows and 4 columns we can use np.zeros( ):

In [73]:
np.zeros( (2,4) )

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

Here the dtype can also be specified. For np.zeros the default dtype is float64; to get integers instead we write 'dtype = np.int16':

In [76]:
np.zeros([2,4],dtype=np.int16)   

array([[0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int16)

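np.empty creates an array without initializing its entries, so it contains whatever values happen to be in memory. These are not random numbers between 0 and 1; for those, use np.random.random.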


In [78]:
np.empty( (2,3) )       

array([[  4.94065646e-323,   9.88131292e-323,   1.48219694e-322],
       [  1.97626258e-322,   2.47032823e-322,   2.96439388e-322]])

### Note: The results may vary every time you run np.empty.
To create a matrix of all ones we write np.ones( ). We can create a 3 * 3 matrix of all ones by:

In [79]:
np.ones([3,3])

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [11]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

To create a diagonal matrix we can write np.diag( ). To create a diagonal matrix where the diagonal elements are 14,15,16 and 17 we write:

In [80]:
np.diag([14,15,16,17])

array([[14,  0,  0,  0],
       [ 0, 15,  0,  0],
       [ 0,  0, 16,  0],
       [ 0,  0,  0, 17]])

To create an identity matrix we can use np.eye( ) .


In [81]:
np.eye(5,dtype = "int")

array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])

__By default the datatype in np.eye( ) is 'float' thus we write dtype = "int" to convert it to integers.__


## Reshaping 2D arrays
To get a flattened 1D array we can use ravel( ) 

In [97]:
g = np.array([(10,20,30),(40,50,60)])
print(g.ravel())
print(g)

[10 20 30 40 50 60]
[[10 20 30]
 [40 50 60]]
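
ravel() returns a view of the original data whenever it can, whereas flatten() always returns a copy. The sketch below shows the practical difference: writing into the result of ravel() changes `g`, while writing into the result of flatten() does not. The names `r` and `fl` are illustrative.

```python
import numpy as np

g = np.array([(10, 20, 30), (40, 50, 60)])

r = g.ravel()      # a view when possible (g is contiguous here)
fl = g.flatten()   # always a new copy

r[0] = -1          # writes through to g
fl[1] = -2         # g is unaffected

print(g)
print(np.shares_memory(g, r), np.shares_memory(g, fl))   # True False
```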


To change the shape of a 2D array we can use reshape( ). Writing -1 lets NumPy calculate that dimension automatically, and the original array is not modified.

In [98]:
print(g.reshape(3,-1))  # returns the array with a modified shape
                        #It does not modify the original array
print(g)
print(g.shape)


[[10 20]
 [30 40]
 [50 60]]
[[10 20 30]
 [40 50 60]]
(2, 3)


__Similar to 1D arrays, using resize( ) will modify the shape in the original array.__


In [99]:
g.resize((3,2)) #resize modifies the original array
g  

array([[10, 20],
       [30, 40],
       [50, 60]])

# Time for some matrix algebra
Let us create some arrays A, b and B to be used in this section:

In [100]:
A = np.array([[2,0,1],[4,3,8],[7,6,9]])
b = np.array([1,101,14])
B = np.array([[10,20,30],[40,50,60],[70,80,90]])

In [103]:
print(A.T)             #transpose
print(A.transpose())  #transpose

print(np.trace(A))    # Return the sum along diagonals of the array
print(np.linalg.inv(A))  #Inverse
print(A)

[[2 4 7]
 [0 3 6]
 [1 8 9]]
[[2 4 7]
 [0 3 6]
 [1 8 9]]
14
[[ 0.53846154 -0.15384615  0.07692308]
 [-0.51282051 -0.28205128  0.30769231]
 [-0.07692308  0.30769231 -0.15384615]]
[[2 0 1]
 [4 3 8]
 [7 6 9]]


__Note that transpose does not modify the original array.__

Matrix addition and subtraction can be done in the usual way:

In [105]:
print(A)
print(B)
print(A+B)
print(A-B)

[[2 0 1]
 [4 3 8]
 [7 6 9]]
[[10 20 30]
 [40 50 60]
 [70 80 90]]
[[12 20 31]
 [44 53 68]
 [77 86 99]]
[[ -8 -20 -29]
 [-36 -47 -52]
 [-63 -74 -81]]


Matrix multiplication of A and B can be done with __A.dot(B)__, where A is the matrix on the left-hand side and B is the matrix on the right-hand side. In Python 3.5+ the operator A @ B gives the same result, while A * B multiplies element by element.


In [106]:
A.dot(B)

array([[  90,  120,  150],
       [ 720,  870, 1020],
       [ 940, 1160, 1380]])
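
A quick sketch contrasting the matrix product with element-wise multiplication, using the same A and B. It also shows that the order of the factors matters.

```python
import numpy as np

A = np.array([[2, 0, 1], [4, 3, 8], [7, 6, 9]])
B = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# The @ operator (Python 3.5+) matches A.dot(B) for 2D arrays
print(np.array_equal(A @ B, A.dot(B)))   # True

# * multiplies element by element, which is NOT matrix multiplication
print(A * B)

# Matrix multiplication is not commutative in general
print(np.array_equal(A @ B, B @ A))      # False
```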

To solve the system of linear equations: __Ax = b__ we use __np.linalg.solve( )__


In [108]:
np.linalg.solve(A,b)#Solve a linear matrix equation, or system of linear scalar equations.



array([-13.92307692, -24.69230769,  28.84615385])
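
To check the result, substitute the solution back into the equation. Because of floating-point rounding, the comparison uses np.allclose rather than ==. The sketch also compares the result with the explicit inverse. Calling np.linalg.solve is generally preferred over computing the inverse, but both agree here.

```python
import numpy as np

A = np.array([[2, 0, 1], [4, 3, 8], [7, 6, 9]])
b = np.array([1, 101, 14])

x = np.linalg.solve(A, b)

# A @ x should reproduce b (up to rounding)
print(np.allclose(A @ x, b))                   # True

# Same answer via the inverse matrix
print(np.allclose(x, np.linalg.inv(A) @ b))    # True
```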

The eigenvalues and eigenvectors can be calculated using __np.linalg.eig( )__:


In [109]:
np.linalg.eig(A)

(array([ 14.0874236 ,   1.62072127,  -1.70814487]),
 array([[-0.06599631, -0.78226966, -0.14996331],
        [-0.59939873,  0.54774477, -0.81748379],
        [-0.7977253 ,  0.29669824,  0.55608566]]))

The first array contains the eigenvalues. The second is the matrix of eigenvectors, where each column is the eigenvector corresponding to the eigenvalue at the same position.
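
The column convention can be verified with a small sketch. For every k, the product of A with column k should equal eigenvalue k times that column. As an extra check, the eigenvalues should add up to the trace of A.

```python
import numpy as np

A = np.array([[2, 0, 1], [4, 3, 8], [7, 6, 9]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column k of `eigenvectors` pairs with eigenvalues[k]: A v = lambda v
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    print(np.allclose(A @ v, eigenvalues[k] * v))   # True for every k

# The eigenvalues add up to the trace of A (2 + 3 + 9 = 14)
print(np.isclose(eigenvalues.sum(), np.trace(A)))   # True
```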

# Some mathematical functions 
NumPy provides various trigonometric functions such as sine and cosine. Note that they expect angles in radians:

In [111]:
B = np.array([[0,-20,36],[40,50,1]])
np.sin(B)

array([[ 0.        , -0.91294525, -0.99177885],
       [ 0.74511316, -0.26237485,  0.84147098]])

To check whether the elements of a matrix satisfy a condition, we simply write the condition. For instance, to check which elements of B are greater than 25 we write:

In [112]:
B>25

array([[False, False,  True],
       [ True,  True, False]], dtype=bool)

We get a matrix of Booleans where True indicates that the corresponding element is greater than 25 and False indicates that the condition is not satisfied.

In a similar manner __np.absolute__, __np.sqrt__ and __np.exp__ return the matrices of absolute values, square roots and exponentials respectively.


In [116]:
print(np.absolute(A))
print(np.sqrt(A))
print(np.exp(A))

[[2 0 1]
 [4 3 8]
 [7 6 9]]
[[ 1.41421356  0.          1.        ]
 [ 2.          1.73205081  2.82842712]
 [ 2.64575131  2.44948974  3.        ]]
[[  7.38905610e+00   1.00000000e+00   2.71828183e+00]
 [  5.45981500e+01   2.00855369e+01   2.98095799e+03]
 [  1.09663316e+03   4.03428793e+02   8.10308393e+03]]


In [117]:
#Now we consider a matrix A of shape 3*3:
A = np.arange(1,10).reshape(3,3)
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [119]:
#To find the sum, minimum, maximum, mean, standard deviation and variance respectively we use the following commands:
print(A.sum())
print(A.min())
print(A.max())
print(A.mean())
print(A.std())   #Standard deviation
print(A.var())  #Variance

45
1
9
5.0
2.58198889747
6.66666666667


In [120]:
#argmin( ) and argmax( ) return the index of the minimum and maximum elements, counted in the flattened array:
print(A.argmin())
print(A.argmax())

0
8
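
argmin() and argmax() count positions in the flattened array. To turn such a flat index back into a (row, column) pair, you can use np.unravel_index. A minimal sketch:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

flat_index = A.argmax()                          # 8, counted in the flattened array
row, col = np.unravel_index(flat_index, A.shape)
print(flat_index, row, col)                      # 8 2 2
print(A[row, col])                               # 9
```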


In [121]:
#If we wish to find the above statistics for each row or column then we need to specify the axis:
print(A.sum(axis=0))              
print(A.mean(axis = 0))
print(A.std(axis = 0))
print(A.argmin(axis = 0))

[12 15 18]
[ 4.  5.  6.]
[ 2.44948974  2.44948974  2.44948974]
[0 0 0]


__Setting axis = 0 makes the calculation run downwards, i.e. it gives the statistics for each column.
To find the minimum and the index of the maximum element for each row, the calculation has to run across each row (left to right), so we write axis = 1:__


In [133]:

print(A.min(axis=1))
print(A)
print(A.argmax(axis = 1))  # index of the maximum element in each row
print(A.max(axis=0))       # maximum of each column



[1 4 7]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[2 2 2]
[7 8 9]


In [134]:
#To find the cumulative sum along each row we use cumsum( )

print(A)
A.cumsum(axis=1)


[[1 2 3]
 [4 5 6]
 [7 8 9]]


array([[ 1,  3,  6],
       [ 4,  9, 15],
       [ 7, 15, 24]])

# Creating 3D arrays
Numpy also provides the facility to create 3D arrays. A 3D array can be created as:

In [135]:
X = np.array( [[[  1, 2,3],              
                [ 4, 5, 6]],
               [[7,8,9],
                [10,11,12]]])
print(X.shape)
print(X.ndim)
print(X.size)

(2, 2, 3)
3
12


__X contains two 2D arrays, each with 2 rows and 3 columns. Thus the shape is (2, 2, 3) and the total number of elements is 12.
To calculate the sum along a particular axis we use the axis parameter as follows:__

In [136]:
print(X.sum(axis = 0))
print(X.sum(axis = 1))
print(X.sum(axis = 2))

[[ 8 10 12]
 [14 16 18]]
[[ 5  7  9]
 [17 19 21]]
[[ 6 15]
 [24 33]]


axis = 0 adds up the corresponding elements of the two 2D arrays. axis = 1 returns the sum of each column within each matrix, and axis = 2 returns the sum of each row within each matrix.

In [138]:
X.ravel()  #ravel( ) writes all the elements in a single array.


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

# Indexing in arrays
It is important to note that Python indexing starts from 0. The slicing syntax is as follows:

- x[start:end] : elements of x from index start up to index end (end excluded)
- x[start:] : elements from index start through the last element
- x[:end] : elements from the beginning up to index end (end excluded)

To extract the 3rd element we write the index as 2, because indexing starts from 0.

In [140]:
x = np.arange(10)
print(x[2])
print(x[2:5])

2
[2 3 4]


Note that x[2:5] selects the elements from index 2 up to index 5 (index 5 excluded).
To set every third element, from the start up to index 7 (excluded), to 123 we write:

In [142]:
x[:7:3] = 123
x

array([123,   1,   2, 123,   4,   5, 123,   7,   8,   9])

In [143]:
x = np.arange(10)
x[ : :-1]                                 # reversed x

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [144]:
# Note that the above command does not modify the original array.
# Consider a 3D array:
X = np.array( [[[  1, 2,3],            
                [ 4, 5, 6]],
               [[7,8,9],
                [10,11,12]]])

In [145]:
# To extract the 2nd matrix we write:
X[1,...]                                   # same as X[1,:,:] or X[1]

array([[ 7,  8,  9],
       [10, 11, 12]])

In [12]:
#create a 3x3 array filled with 3.14
np.full((3,3), 3.14)

array([[ 3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14]])

In [146]:
# Remember Python indexing starts from 0; that is why we wrote 1 to extract the 2nd 2D array.
# To extract the first element from all the rows we write:
X[...,0]                                   # same as X[:,:,0] 

array([[ 1,  4],
       [ 7, 10]])

## Finding the positions of elements that satisfy a given condition


### Indexing with Arrays of Indices
Consider a 1D array.

In [149]:
x = np.arange(11,35,2)                    
x

array([11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33])

In [148]:
a = np.array([8, 3, 7, 0, 4, 2, 5, 2])
np.where(a > 4)   #np.where locates the positions in the array where element of array is greater than 4. 


(array([0, 2, 6]),)
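
np.where with a single argument returns a tuple holding one index array per dimension. That is why the result above is wrapped in a tuple. With three arguments, np.where instead picks values element-wise. A minimal sketch (`idx` is an illustrative name):

```python
import numpy as np

a = np.array([8, 3, 7, 0, 4, 2, 5, 2])

# One argument: a tuple of index arrays, so take element [0] for a 1D array
idx = np.where(a > 4)[0]
print(idx)                         # [0 2 6]
print(a[idx])                      # [8 7 5]

# Three arguments: keep a where the condition holds, otherwise use 0
print(np.where(a > 4, a, 0))       # [8 0 7 0 0 0 5 0]
```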

In [150]:
#We form a 1D array i which subsets the elements of x as follows:
i = np.array( [0,1,5,3,7,9 ] )
x[i]   

array([11, 13, 21, 17, 25, 29])

In [152]:
#In a similar manner we create a 2D array j of indices to subset x.
j = np.array( [ [ 0, 1], [ 6, 2 ] ] )    
x[j]             

array([[11, 13],
       [23, 15]])

In [153]:
#Similarly, we can use 2D arrays of indices i and j for both dimensions of x
x = np.arange(15).reshape(3,5)
x
i = np.array( [ [0,1],                        # indices for the first dim
                [2,0] ] )
j = np.array( [ [1,1],                        # indices for the second dim
                [2,0] ] )

In [154]:
#To use i as the row indices and j as the column indices we write:
x[i,j]                                     # i and j must have the same (or broadcastable) shape

array([[ 1,  6],
       [12,  0]])

In [155]:
#To take the rows given by i from the 3rd column (index 2) we write:
x[i,2]

array([[ 2,  7],
       [12,  2]])

In [157]:
#To select the columns given by j from every row we write:
x[:,j]                                    

array([[[ 1,  1],
        [ 2,  0]],

       [[ 6,  6],
        [ 7,  5]],

       [[11, 11],
        [12, 10]]])

In [158]:
#In the output above, the three 2x2 blocks apply j to the 1st, 2nd and 3rd row respectively.

#You can also use indexing with arrays to assign the values:
x = np.arange(10)
x
x[[4,5,8,1,2]] = 0
x

array([0, 0, 0, 3, 0, 0, 6, 7, 0, 9])

In [159]:
# 0 is assigned to the elements at indices 4, 5, 8, 1 and 2 of x.
# When the list of indices contains repetitions then it assigns the last value to that index:
x = np.arange(10)
x
x[[4,4,2,3]] = [100,200,300,400]
x

array([  0,   1, 300, 400, 200,   5,   6,   7,   8,   9])

In [160]:
# Notice that for the 5th element(i.e. 4th index) the value assigned is 200, not 100.
# Caution: If one is using += operator on repeated indices then it carries out the operator only once on
# repeated indices.
x = np.arange(10)
x[[1,1,1,7,7]]+=1
x

array([0, 2, 2, 3, 4, 5, 6, 8, 8, 9])

Although indices 1 and 7 are repeated, they are incremented only once.
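
If each repetition should count, the ufunc method `np.add.at` performs the addition unbuffered, so every occurrence of an index is applied. A minimal sketch with the same indices:

```python
import numpy as np

x = np.arange(10)

# Unbuffered: index 1 is incremented three times, index 7 twice
np.add.at(x, [1, 1, 1, 7, 7], 1)
print(x)    # [0 4 2 3 4 5 6 9 8 9]
```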


# Indexing with Boolean arrays 
We create a 2D array and store our condition in b. Each element of b is True where the condition holds and False otherwise.

In [161]:
a = np.arange(12).reshape(3,4)
b = a > 4
b    

array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

In [162]:
# Note that 'b' is a Boolean with same shape as that of 'a'.
# To select the elements from 'a' which adhere to condition 'b' we write:
a[b]                                       

array([ 5,  6,  7,  8,  9, 10, 11])

In [163]:
# a[b] returns a 1D array of the selected elements ('a' itself keeps its shape)
# This property can be very useful in assignments:
a[b] = 0                                  
a

array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

All elements of 'a' greater than 4 become 0.

As with integer indexing, we can also use Boolean arrays to select whole rows and columns.
Let x be the original matrix, and let 'y' and 'z' be Boolean arrays that select its rows and columns respectively.

In [164]:
x = np.arange(15).reshape(3,5)
y = np.array([True,True,False])             # first dim selection
z = np.array([True,True,False,True,False])       # second dim selection

In [166]:
#We write the x[y,:] which will select only those rows where y is True.
print(x[y,:])                             # selecting rows
print(x[y]  )                                   # same thing
#Writing x[:,z] will select only those columns where z is True.
print(x[:,z]  )                                 # selecting columns

[[0 1 2 3 4]
 [5 6 7 8 9]]
[[0 1 2 3 4]
 [5 6 7 8 9]]
[[ 0  1  3]
 [ 5  6  8]
 [10 11 13]]
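
Several conditions can be combined into one Boolean mask with `&` (and), `|` (or) and `~` (not). Python's `and` and `or` keywords do not work element-wise; on arrays with more than one element they raise a ValueError. The parentheses are required because `&` and `|` bind more tightly than comparisons. A minimal sketch:

```python
import numpy as np

a = np.arange(12)

between = a[(a > 2) & (a < 6)]     # both conditions true
outside = a[(a < 2) | (a > 9)]     # at least one condition true
print(between)   # [3 4 5]
print(outside)   # [ 0  1 10 11]
```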


# Stacking various arrays
Let us consider 2 arrays A and B:

In [168]:
A = np.array([[10,20,30],[40,50,60]])
B = np.array([[100,200,300],[400,500,600]])

In [169]:
#To join them vertically we use np.vstack( ).
np.vstack((A,B))   #Stacking vertically

array([[ 10,  20,  30],
       [ 40,  50,  60],
       [100, 200, 300],
       [400, 500, 600]])

In [170]:
#To join them horizontally we use np.hstack( ).
np.hstack((A,B))   #Stacking horizontally

array([[ 10,  20,  30, 100, 200, 300],
       [ 40,  50,  60, 400, 500, 600]])

In [172]:
# np.newaxis turns a 1D array into a 2D column vector (here of shape (2, 1)).
a = np.array([4.,1.])
b = np.array([2.,8.])
a[:,np.newaxis]

array([[ 4.],
       [ 1.]])

In [175]:
#np.column_stack( ) stacks 1D arrays as columns into a 2D array. For 2D column vectors like a[:,np.newaxis] it gives the same result as np.hstack( ):
np.column_stack((a[:,np.newaxis],b[:,np.newaxis]))
np.hstack((a[:,np.newaxis],b[:,np.newaxis])) # same as column_stack

array([[ 4.,  2.],
       [ 1.,  8.]])
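
The two functions differ when they are given plain 1D arrays. column_stack turns each one into a column, while hstack joins them end to end. A minimal sketch:

```python
import numpy as np

a = np.array([4., 1.])
b = np.array([2., 8.])

# column_stack: each 1D array becomes a column -> 2x2 array
print(np.column_stack((a, b)))

# hstack: 1D arrays are concatenated into a longer 1D array
print(np.hstack((a, b)))         # [4. 1. 2. 8.]
```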

# Splitting the arrays

In [179]:

#Consider an array 'z' of 15 elements:
z = np.arange(1,16)
print(z)
#Using np.hsplit( ) one can split the arrays
print("\n",np.hsplit(z,5))   # Split z into 5 arrays

#It splits 'z' into 5 arrays of equal length.
#Passing a tuple of indices (3,5) splits 'z' at those positions instead:
print(np.hsplit(z,(3,5)))   

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]

 [array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9]), array([10, 11, 12]), array([13, 14, 15])]
[array([1, 2, 3]), array([4, 5]), array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])]


In [180]:
#It splits 'z' after the third and the fifth element.
#For 2D arrays np.hsplit( ) works as follows:
A = np.arange(1,31).reshape(3,10)
print(A)
print(np.hsplit(A,5) )  # Split A into 5 arrays

[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]
[array([[ 1,  2],
       [11, 12],
       [21, 22]]), array([[ 3,  4],
       [13, 14],
       [23, 24]]), array([[ 5,  6],
       [15, 16],
       [25, 26]]), array([[ 7,  8],
       [17, 18],
       [27, 28]]), array([[ 9, 10],
       [19, 20],
       [29, 30]])]


__In the above command A gets split into 5 arrays of the same shape.
To split after the third and the fifth column we write:__

In [None]:

np.hsplit(A,(3,5))  
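
When given a number of sections, np.hsplit requires the array to divide into equal parts. The sketch below shows what happens otherwise. It also uses np.array_split, which allows unequal parts, and np.vsplit, which splits along the rows. `equal_split_ok`, `parts`, `top` and `rest` are illustrative names.

```python
import numpy as np

z = np.arange(1, 8)       # 7 elements

# hsplit needs an equal division when given a number of sections
try:
    np.hsplit(z, 3)
    equal_split_ok = True
except ValueError:
    equal_split_ok = False

# array_split allows unequal parts; the first parts get the extra elements
parts = np.array_split(z, 3)
print([p.tolist() for p in parts])   # [[1, 2, 3], [4, 5], [6, 7]]

# vsplit splits a 2D array along the rows (here after the first row)
A = np.arange(1, 31).reshape(3, 10)
top, rest = np.vsplit(A, (1,))
print(top.shape, rest.shape)         # (1, 10) (2, 10)
```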

# Copying
Consider an array x

In [184]:
x = np.arange(1,16)
#Plain assignment does not copy the data: y is just another name for the same array object
y = x          
print(y is x)
# Let us change the shape of y
y.shape = 3,5 
# Note that it alters the shape of x
print(x.shape)

True
(3, 5)


# Creating a view of the data
Let us store z as a view of x by:

In [185]:
z = x.view()
z is x  

False

In [191]:
# Thus z is not x.
# Changing the shape of z
z.shape = 5,3  
# Creating a view does not alter the shape of x
print(x.shape)

# Changing an element in z
z[0,0] = 1234     
# Note that the value in x also changes:
x

(3, 5)


array([[1234,    2,    3,    4,    5],
       [   6,    7,    8,    9,   10],
       [  11,   12,   13,   14,   15]])

__Thus changing the shape of a view does not affect the original array, but changing the values in a view does affect the original data.__

# Creating a copy of the data:
Now let us create z as a copy of x:


In [192]:
z = x.copy()                        
#Note that z is not x
z is x
#Changing the value in z
z[0,0] = 9999
#No alterations are made in x.
x

array([[1234,    2,    3,    4,    5],
       [   6,    7,    8,    9,   10],
       [  11,   12,   13,   14,   15]])

pandas may sometimes raise a 'SettingWithCopyWarning' because it cannot tell whether a new DataFrame (created as a subset of another DataFrame) is a view or a copy. In such situations, make the intent explicit, for example by calling .copy() on the subset; otherwise modifications may not end up where you expect. NumPy itself gives no such warning, so for arrays it is up to you to know whether you hold a view or a copy.
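
For NumPy arrays you can check directly whether you hold a view or a copy. The sketch below uses the `base` attribute and np.shares_memory. It also shows that indexing with an integer array returns a copy, unlike slicing. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(6)

view = arr[1:4]              # basic slicing -> view
copied = arr[1:4].copy()     # explicit copy
fancy = arr[[1, 2, 3]]       # integer-array ("fancy") indexing -> copy

# .base refers to the array that owns the memory (None for an owner)
print(view.base is arr)      # True
print(copied.base is None)   # True

# np.shares_memory checks whether two arrays overlap in memory
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, fancy))   # False
```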

In [13]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [15]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0,1,5)

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

In [16]:
# Create a 3x3 array of normally distributed
# random values with mean 0 and standard deviation 1
np.random.normal(0,1,(3,3))

array([[-0.29834626, -1.10027182,  0.28811566],
       [-0.66430813,  0.06023781,  1.22030034],
       [ 0.12435259,  0.15463083, -1.74274102]])

In [18]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0,10,(3,3))

array([[1, 6, 9],
       [2, 2, 0],
       [1, 3, 5]])

In [22]:
# Create a 3x3 identity matrix
np.eye(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [23]:
# Create an uninitialized array of three integers
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([ 1.,  1.,  1.])

In [24]:
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

Other attributes include itemsize, which lists the size (in bytes) of each array element, and nbytes, which lists the total size (in bytes) of the array:
In general, we expect that nbytes is equal to itemsize times size.



In [31]:
print(x1)

print("\n",x2)
print("\n\n",x3)

print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

# Another useful attribute is the dtype, the data type of the array
print("dtype:", x3.dtype)

print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

[4 8 0 1 0 6]

 [[9 8 3 9]
 [8 1 3 5]
 [4 5 0 3]]


 [[[4 6 6 4 4]
  [0 8 3 8 6]
  [3 8 1 6 6]
  [0 6 8 2 4]]

 [[6 5 9 4 7]
  [8 6 0 1 5]
  [3 0 9 3 8]
  [0 5 4 9 3]]

 [[1 1 4 6 4]
  [5 4 7 6 8]
  [5 3 5 6 5]
  [5 4 8 9 4]]]
x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes


# Array Indexing: Accessing Single Elements
If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar. In a one-dimensional array, the ith value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:


In [32]:
x1

array([4, 8, 0, 1, 0, 6])

In [33]:
x1[0]

4

In [34]:
# To index from the end of the array, you can use negative indices:
x1[-1]

6

In [35]:
x1[-2]

0

In [36]:
# In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:
x2[2,1]

5

In [38]:
# Values can also be modified using any of the above index notation:
x2[0,0]=12
x2

array([[12,  8,  3,  9],
       [ 8,  1,  3,  5],
       [ 4,  5,  0,  3]])

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value to an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

In [39]:
x1[0] = 3.14159  # this will be truncated!
x1

array([3, 8, 0, 1, 0, 6])

# Array Slicing: Accessing Subarrays
Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

x[start:stop:step]

If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1. We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.


In [40]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [41]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [42]:
x[5:]  # elements from index 5 onwards

array([5, 6, 7, 8, 9])

In [43]:
x[4:7]  # middle sub-array


array([4, 5, 6])

In [44]:
x[::2]  # every other element

array([0, 2, 4, 6, 8])

In [47]:
x[1::3]  # every third element, starting at index 1


array([1, 4, 7])

__A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:__

In [49]:
x[::-1]  # all elements, reversed


array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [50]:
x[5::-2]  # reversed every other from index 5

array([5, 3, 1])

# Multi-dimensional subarrays
Multi-dimensional slices work in the same way, with multiple slices separated by commas. For example:

In [51]:
x2

array([[12,  8,  3,  9],
       [ 8,  1,  3,  5],
       [ 4,  5,  0,  3]])

In [52]:
x2[:2, :3]  # two rows, three columns

array([[12,  8,  3],
       [ 8,  1,  3]])

In [58]:
x2[:3, ::2]  # all rows, every other column

array([[12,  3],
       [ 8,  3],
       [ 4,  0]])

In [60]:
x2[:3, :2]  # all rows, first two columns

array([[12,  8],
       [ 8,  1],
       [ 4,  5]])

In [75]:
x2[:3, 2::]  # the last two columns 

array([[3, 9],
       [3, 5],
       [0, 3]])

In [76]:
x2[1:3, 2::]  # last two rows, last two columns

array([[3, 5],
       [0, 3]])

In [77]:
x2[:3, 2:3:]  # only the third column (index 2), kept as a 2D array

array([[3],
       [3],
       [0]])

In [79]:
x2[:3, 3::]  # only the last column, kept as a 2D array

array([[9],
       [5],
       [3]])

In [54]:
# Finally, subarray dimensions can even be reversed together:
x2[::-1, ::-1]

array([[ 3,  0,  5,  4],
       [ 5,  3,  1,  8],
       [ 9,  3,  8, 12]])

# Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

In [88]:
print(x2,"\n")
print(x2[:, 0],"\n")  # first column of x2
print(x2[0, :],"\n")  # first row of x2
# In the case of row access, the empty slice can be omitted for a more compact syntax:
print(x2[0])  # equivalent to x2[0, :]

[[12  8  3  9]
 [ 8  1  3  5]
 [ 4  5  0  3]] 

[12  8  4] 

[12  8  3  9] 

[12  8  3  9]


# Subarrays as no-copy views

One important (and extremely useful) thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies.

More generally, some NumPy functions return a copy of the input array, while others return a view. When the contents are physically stored in another location, the result is called a copy. If instead a different view of the same memory content is provided, it is called a view. The difference can be seen in the following example:

In [98]:
l=[1,2,3,4,5] # Python list
print("l=",l)
k=l[1:3]  
print("\nk=",k)
l[1]=100        # if we change an item in the list l, the change cannot be seen in the list k
print("\nl=",l)
print("\nk=",k)

a=np.array([1,2,3,4])
print("\na=",a)
b=a[1:3]
print("\nb=",b)
a[1]=200         # if we change an item in the array a, the change can be seen in the view b
print("\na=",a)
print("\nb=",b)



l= [1, 2, 3, 4, 5]

k= [2, 3]

l= [1, 100, 3, 4, 5]

k= [2, 3]

a= [1 2 3 4]

b= [2 3]

a= [  1 200   3   4]

b= [200   3]


This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.

In [102]:
print(x2) 
# Let's extract a 2×2 subarray from this
x2_sub = x2[:2, :2]
print(x2_sub)
# Now if we modify this subarray, we'll see that the original array is changed! Observe
x2_sub[0, 0] = 99
print(x2_sub)
print(x2)

[[12  8  3  9]
 [ 8  1  3  5]
 [ 4  5  0  3]]
[[12  8]
 [ 8  1]]
[[99  8]
 [ 8  1]]
[[99  8  3  9]
 [ 8  1  3  5]
 [ 4  5  0  3]]
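Rather than inferring view versus copy from what changes, you can ask NumPy directly with `np.shares_memory`, or inspect the `.base` attribute of the result. A minimal sketch that rebuilds the `x2` values shown above; the fancy-indexing line is added only for contrast, since indexing with a list of integers returns a copy rather than a view:

```python
import numpy as np

x2 = np.array([[12, 8, 3, 9],
               [8, 1, 3, 5],
               [4, 5, 0, 3]])
x2_sub = x2[:2, :2]                   # basic slicing: a view

print(np.shares_memory(x2, x2_sub))   # True: both use the same buffer
print(x2_sub.base is x2)              # True: the view refers back to x2

rows = x2[[0, 1]]                     # fancy (integer-list) indexing: a copy
print(np.shares_memory(x2, rows))     # False
```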


# Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method:

In [103]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  8]
 [ 8  1]]


If we now modify this subarray, the original array is not touched:

In [104]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
print(x2)

[[42  8]
 [ 8  1]]
[[99  8  3  9]
 [ 8  1  3  5]
 [ 4  5  0  3]]
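A practical reason to call `copy()`: a view keeps a reference to its base array, so holding on to a small slice of a large array keeps the whole large buffer alive. A sketch under that assumption; the array size is an arbitrary choice for illustration:

```python
import numpy as np

big = np.zeros(1_000_000)          # about 8 MB of float64
view_piece = big[:10]              # view: holds a reference to big
copy_piece = big[:10].copy()       # independent 10-element array

print(view_piece.base is big)      # True: big cannot be freed while view_piece exists
print(copy_piece.base is None)     # True: copy_piece owns its own data
print(copy_piece.nbytes)           # 80 (10 elements * 8 bytes)

del big  # the 8 MB buffer stays alive, still referenced by view_piece.base
```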


# Reshaping of arrays

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:

In [105]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that for this to work, the size of the initial array must match the size of the reshaped array. Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
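Both the size rule and the view behaviour can be checked directly. A short sketch; passing `-1` for one dimension asks NumPy to infer that dimension from the total size:

```python
import numpy as np

a = np.arange(1, 10)              # 9 elements

grid = a.reshape((3, 3))          # 9 == 3 * 3, so this works
auto = a.reshape((3, -1))         # -1 lets NumPy infer the missing dimension
print(auto.shape)                 # (3, 3)

# a is contiguous, so the reshaped array is a view of the same buffer
print(np.shares_memory(a, grid))  # True

try:
    a.reshape((2, 4))             # 8 != 9 elements
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```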

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or more easily by using np.newaxis (an alias for None) within a slice operation:

In [110]:
x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3))

array([[1, 2, 3]])

In [111]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [112]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [113]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
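Because `np.newaxis` is simply `None`, the two spellings are interchangeable, and `np.expand_dims` is a function form of the same operation. A quick sketch:

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.newaxis is None)         # True: np.newaxis is an alias for None
col = x[:, None]                  # same as x[:, np.newaxis]
col2 = np.expand_dims(x, axis=1)  # function form: insert a new axis at position 1
row = np.expand_dims(x, axis=0)   # new axis at position 0

print(col.shape, col2.shape, row.shape)  # (3, 1) (3, 1) (1, 3)
print(np.array_equal(col, col2))         # True
```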

# Array concatenation and splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.


Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as we can see here:

In [114]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [115]:
# You can also concatenate more than two arrays at once:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


In [116]:
# It can also be used for two-dimensional arrays:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [117]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [118]:
# concatenate along the first axis (axis=0 is the default)
np.concatenate([grid, grid], axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
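The arrays being joined must agree in every dimension except the one they are concatenated along. A small sketch with arbitrary values:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])         # shape (2, 3)
extra_cols = np.array([[7, 8],
                       [9, 10]])     # shape (2, 2)

# Along axis=1 only the number of rows (2) must match
wide = np.concatenate([grid, extra_cols], axis=1)
print(wide.shape)                    # (2, 5)

# Along axis=0 the number of columns must match, and 3 != 2
try:
    np.concatenate([grid, extra_cols], axis=0)
    axis0_failed = False
except ValueError as err:
    axis0_failed = True
    print("ValueError:", err)
```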

Axes are defined for arrays with more than one dimension. A two-dimensional array has two axes: the first runs vertically downwards across rows (axis 0), and the second runs horizontally across columns (axis 1). Many operations can be carried out along one of these axes. For example, summing along axis 0 collapses the rows and gives one total per column, while summing along axis 1 collapses the columns and gives one total per row:

In [120]:
print(grid)
np.sum(grid,axis=0)

[[1 2 3]
 [4 5 6]]


array([5, 7, 9])

In [121]:
np.sum(grid,axis=1)

array([ 6, 15])
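The `keepdims=True` argument keeps the reduced axis with length 1, which is handy when the result is combined with the original array afterwards (this relies on broadcasting). A sketch that divides each element by its row total:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

col_totals = grid.sum(axis=0)                    # one value per column
row_totals = grid.sum(axis=1)                    # one value per row
row_totals_2d = grid.sum(axis=1, keepdims=True)  # shape (2, 1) instead of (2,)

print(col_totals)           # [5 7 9]
print(row_totals)           # [ 6 15]
print(row_totals_2d.shape)  # (2, 1)

# Each element divided by its own row total; every row now sums to (about) 1
row_fractions = grid / row_totals_2d
print(row_fractions.sum(axis=1))
```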

For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

In [122]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [123]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

In [124]:
grid

array([[9, 8, 7],
       [6, 5, 4]])
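Under the hood, np.vstack and np.hstack are concatenations along axis 0 and axis 1, with 1-D inputs first promoted to rows. A sketch checking this against np.concatenate with the arrays used above:

```python
import numpy as np

x = np.array([1, 2, 3])              # 1-D, shape (3,)
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])         # shape (2, 3)
y = np.array([[99],
              [99]])                 # shape (2, 1)

# vstack treats the 1-D x as a row of shape (1, 3)
v1 = np.vstack([x, grid])
v2 = np.concatenate([x[np.newaxis, :], grid], axis=0)
print(np.array_equal(v1, v2))        # True

# For 2-D inputs, hstack is concatenation along axis 1
h1 = np.hstack([grid, y])
h2 = np.concatenate([grid, y], axis=1)
print(np.array_equal(h1, h2))        # True

# For 1-D inputs, hstack simply joins them end to end
print(np.hstack([x, x]))             # [1 2 3 1 2 3]
```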

In [139]:
# Similarly, np.dstack stacks arrays along the third axis.
a = np.array([1,2,3])
b = np.array([2,3,4])
print(a)
print(b)
print(np.dstack((a,b)))#Stack arrays in sequence depth wise (along third axis).
print(np.stack((a, b))) #Join a sequence of arrays along a new axis.
print(np.stack((a, b), axis=-1))


[1 2 3]
[2 3 4]
[[[1 2]
  [2 3]
  [3 4]]]
[[1 2 3]
 [2 3 4]]
[[1 2]
 [2 3]
 [3 4]]
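Comparing the result shapes makes the difference between these three functions easier to see. A short sketch with the same `a` and `b`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

print(np.stack((a, b)).shape)           # (2, 3): new axis in front
print(np.stack((a, b), axis=-1).shape)  # (3, 2): new axis at the end
print(np.dstack((a, b)).shape)          # (1, 3, 2): 1-D inputs are promoted to (1, 3, 1) first

# For 1-D inputs, dstack equals stack(axis=-1) with a leading axis added
same = np.array_equal(np.dstack((a, b)), np.stack((a, b), axis=-1)[np.newaxis])
print(same)                             # True
```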


# Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:

In [168]:
x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
print(np.split(x,2))
x1, x2, x3 = np.split(x, [3, 5])
print("\n",x1, x2, x3)

[array([ 1,  2,  3, 99]), array([99,  3,  2,  1])]

 [1 2 3] [99 99] [3 2 1]
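When the second argument of np.split is an integer rather than a list of indices, the array must divide into that many equal parts; np.array_split relaxes this requirement. A sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])   # 8 elements

halves = np.split(x, 2)                    # 8 / 2 = 4 elements each
print([h.tolist() for h in halves])        # [[1, 2, 3, 99], [99, 3, 2, 1]]

# An integer section count must divide the length exactly
try:
    np.split(x, 3)                         # 8 is not divisible by 3
    uneven_failed = False
except ValueError as err:
    uneven_failed = True
    print("ValueError:", err)

# np.array_split allows unequal sections; the first sections get the extra elements
parts = np.array_split(x, 3)
print([p.tolist() for p in parts])         # [[1, 2, 3], [99, 99, 3], [2, 1]]
```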


Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar:

In [169]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [170]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [171]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


In [181]:
print(grid.ndim)
x = np.arange(16.0).reshape(2, 2, 4)
print(x)
print(x.ndim)
print(np.dsplit(x, 2)) #dsplit only works on arrays of 3 or more dimensions
print(np.ndim(np.dsplit(x,2)))

2
[[[  0.   1.   2.   3.]
  [  4.   5.   6.   7.]]

 [[  8.   9.  10.  11.]
  [ 12.  13.  14.  15.]]]
3
[array([[[  0.,   1.],
        [  4.,   5.]],

       [[  8.,   9.],
        [ 12.,  13.]]]), array([[[  2.,   3.],
        [  6.,   7.]],

       [[ 10.,  11.],
        [ 14.,  15.]]])]
4
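np.vsplit, np.hsplit and np.dsplit are np.split along axis 0, 1 and 2 respectively, which is why np.dsplit needs at least three dimensions. A sketch checking the equivalence on the arrays used above:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))
cube = np.arange(16.0).reshape(2, 2, 4)

# vsplit splits along axis 0 (rows)
upper, lower = np.vsplit(grid, [2])
upper2, lower2 = np.split(grid, [2], axis=0)

# hsplit splits along axis 1 (columns) for arrays with 2 or more dimensions
left, right = np.hsplit(grid, [2])
left2, right2 = np.split(grid, [2], axis=1)

# dsplit splits along axis 2 (depth)
d1, d2 = np.dsplit(cube, 2)
e1, e2 = np.split(cube, 2, axis=2)

pairs = [(upper, upper2), (lower, lower2), (left, left2),
         (right, right2), (d1, e1), (d2, e2)]
print(all(np.array_equal(p, q) for p, q in pairs))  # True
print(d1.shape)                                     # (2, 2, 2)
```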


__Up until now, we have been discussing some of the basic nuts and bolts of NumPy; in the next few sections, we will dive into the reasons that NumPy is so important in the Python data science world. Namely, it provides an easy and flexible interface to optimized computation with arrays of data.__

# Python NumPy Array v/s List

We use NumPy arrays instead of Python lists for three main reasons:


### Less Memory

### Fast

### Convenient


The first reason to choose a NumPy array is that it occupies less memory than a list. Second, it is fast to execute, and third, it is very convenient to work with. These are the major advantages that NumPy arrays have over lists. Let's check these points one by one, starting with the example below:




In [185]:
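import time
import sys
S = list(range(1000))
# rough estimate: size of one Python int object times the number of elements
print(sys.getsizeof(5)*len(S))
 
D = np.arange(1000)
# size of the array's data buffer in bytes (equivalent to D.nbytes)
print(D.size*D.itemsize)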

28000
8000


The output above shows that the list (S) takes roughly 28000 bytes (28 bytes for each Python int object, without even counting the list's own array of pointers), whereas the NumPy array takes just 8000 bytes (8 bytes per 64-bit integer). This large difference is one reason NumPy arrays are preferred over lists for numerical data.
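The estimate above ignores the list's own internal array of pointers. A somewhat more careful sketch adds the size of the list object to the size of every element; it is still approximate, because CPython shares the objects for small integers (roughly -5 to 256) instead of allocating one per element:

```python
import sys
import numpy as np

lst = list(range(1000))
# list object itself (header plus one pointer per element)
# plus the size of each int object it points to
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst)

arr = np.arange(1000)
arr_bytes = arr.nbytes          # same as arr.size * arr.itemsize

print(list_bytes, arr_bytes)
```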

In [192]:
SIZE = 1000000
 
L1= range(SIZE)
L2= range(SIZE)
A1= np.arange(SIZE)
A2=np.arange(SIZE)
 
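start= time.time()
result=[x + y for x, y in zip(L1,L2)]   # element-wise sum with a Python-level loop
print((time.time()-start)*1000)         # elapsed time in milliseconds
 
start=time.time()
result= A1+A2                           # element-wise sum, vectorized in compiled code
print((time.time()-start)*1000)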


127.97880172729492
34.47580337524414


In the code above, we defined two ranges and two NumPy arrays of one million elements each, and then timed the element-wise sum of the ranges and of the arrays. In this run the Python version took about 128 ms, whereas the NumPy version took about 34 ms; the exact numbers depend on the machine, but NumPy is clearly faster. Notice also that the Python version needs an explicit loop (a list comprehension over zip) to add the elements pairwise, whereas for the NumPy arrays we simply write A1 + A2. That's why working with NumPy is much easier and more convenient than working with lists.

Therefore, the examples above illustrate why you should choose a NumPy array rather than a list for numerical work!

Moving forward, let's focus on some NumPy operations.




__Python lists are made for heterogeneous types, while NumPy arrays work on homogeneous types.__

Python lists support adding and removing elements, whereas a NumPy array has a fixed size (functions such as np.append return a new array).

The functions being compared are not described anywhere, so we cannot assume that they use the same algorithm.

The sum of the Python list takes more time than its mean, which is supposed to take more time, and the NumPy mean takes twice as long as the NumPy sum.

The protocol runs each function only once; it should run each function many times (100 or more) and report an average.
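Following the last point, a fairer measurement repeats each operation many times and keeps the best result. A sketch using the standard-library `timeit` module instead of a single `datetime` measurement; the array size and the repeat counts are arbitrary choices:

```python
import timeit
import numpy as np

n = 100_000
larray = list(range(n))
narray = np.arange(n)

runs = 100
# each timing calls the function `runs` times; repeat=5 does that 5 times
list_times = timeit.repeat(lambda: sum(larray), number=runs, repeat=5)
numpy_times = timeit.repeat(lambda: narray.sum(), number=runs, repeat=5)

# report the best repeat, converted to microseconds per call
list_us = min(list_times) / runs * 1e6
numpy_us = min(numpy_times) / runs * 1e6
print(f"sum(larray):  {list_us:.1f} microseconds per call")
print(f"narray.sum(): {numpy_us:.1f} microseconds per call")
```

Taking the minimum rather than the mean is a common convention, since slower repeats usually reflect interference from other processes rather than the code being measured.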

In [29]:
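from numpy import arange
from datetime import datetime
def calculate_time(expression):

    narray = arange(100000)
    larray = list(range(100000))   # a real Python list, built before timing starts
    start = datetime.now()
    val = eval(expression)
    end = datetime.now()
    print(val)                     # print outside the timed region
    # total_seconds() includes whole seconds; .microseconds alone would drop them
    return "%d micro seconds  %s" %((end-start).total_seconds()*1e6, expression)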

In [30]:
numpy_op1 = "narray.sum()"

list_op1 = "sum(larray)"

print (calculate_time(numpy_op1))

print (calculate_time(list_op1))

4999950000
2062 micro seconds  narray.sum()
4999950000
9560 micro seconds  sum(larray)


In [31]:
numpy_op2 = "narray.min()"

list_op2 = "sorted(larray)[0]"

print (calculate_time(numpy_op2))

print (calculate_time(list_op2))

0
797 micro seconds  narray.min()
0
20317 micro seconds  sorted(larray)[0]


In [32]:
numpy_op3 = "narray.mean()"

list_op3= "sum(larray)/len(larray)"

print (calculate_time(numpy_op3))

print (calculate_time(list_op3))

49999.5
2023 micro seconds  narray.mean()
49999.5
10206 micro seconds  sum(larray)/len(larray)


In [33]:
numpy_op4 = "narray.max()"

list_op4 = "sorted(larray,reverse=True)[0]"

print (calculate_time(numpy_op4))

print (calculate_time(list_op4))

99999
1600 micro seconds  narray.max()
99999
21885 micro seconds  sorted(larray,reverse=True)[0]


https://www.scipy.org/scipylib/faq.html#what-advantages-do-numpy-arrays-offer-over-nested-python-lists

# What advantages do NumPy arrays offer over (nested) Python lists?

Python’s lists are efficient general-purpose containers. They support (fairly) efficient insertion, deletion, appending, and concatenation, and Python’s list comprehensions make them easy to construct and manipulate. However, they have certain limitations: they don’t support “vectorized” operations like elementwise addition and multiplication, and the fact that they can contain objects of differing types means that Python must store type information for every element, and must execute type dispatching code when operating on each element. This also means that very few list operations can be carried out by efficient C loops: each iteration would require type checks and other Python API bookkeeping.
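The missing vectorized operations show up immediately with the `+` operator, which means something different for lists and arrays. A small sketch with arbitrary values:

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]
arr_a = np.array(list_a)
arr_b = np.array(list_b)

print(list_a + list_b)   # [1, 2, 3, 10, 20, 30]: + concatenates lists
print(arr_a + arr_b)     # [11 22 33]: + adds element-wise

# Element-wise addition with lists needs an explicit Python-level loop
print([x + y for x, y in zip(list_a, list_b)])   # [11, 22, 33]

# The array stores one dtype for all elements instead of per-element type info
print(arr_a.dtype)
```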

In [36]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

In [40]:
x = np.arange(1, 6)
print(x)
np.add.reduce(x)

[1 2 3 4 5]


15

In [41]:
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15])

In [42]:
np.multiply.accumulate(x)

array([  1,   2,   6,  24, 120])

We saw in the previous section how NumPy's universal functions can be used to vectorize operations and thereby remove slow Python loops. Another means of vectorizing operations is to use NumPy's broadcasting functionality. Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.

# Introducing broadcasting

Recall that for arrays of the same size, binary operations are performed on an element-by-element basis:

In [43]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

array([5, 6, 7])

Broadcasting allows these types of binary operations to be performed on arrays of different sizes. For example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array:

In [44]:
a + 5

array([5, 6, 7])

In [47]:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]

print(a)
print(b)

[0 1 2]
[[0]
 [1]
 [2]]
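Adding these two arrays broadcasts both operands: `a` has shape (3,) and `b` has shape (3, 1). NumPy compares shapes starting from the trailing dimension, treats a missing leading dimension as size 1, and stretches any size-1 dimension to match the other array, so the result has shape (3, 3). Dimensions that differ and are not 1 raise an error. A sketch:

```python
import numpy as np

a = np.arange(3)                  # shape (3,), treated as (1, 3)
b = np.arange(3)[:, np.newaxis]   # shape (3, 1)

result = a + b                    # (1, 3) and (3, 1) both stretch to (3, 3)
print(result)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

print(np.broadcast(a, b).shape)   # (3, 3)

# Trailing dimensions 2 and 3 differ and neither is 1, so this cannot broadcast
try:
    np.ones((3, 2)) + np.ones(3)
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```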


In [50]:
m = np.random.random((6, 2))
print(m)
m.take([0, 2, 5], axis=0)

[[ 0.42877087  0.22164951]
 [ 0.64109942  0.15716184]
 [ 0.3563722   0.83836196]
 [ 0.95433898  0.48963485]
 [ 0.19642583  0.53002584]
 [ 0.74516524  0.76451485]]


array([[ 0.42877087,  0.22164951],
       [ 0.3563722 ,  0.83836196],
       [ 0.74516524,  0.76451485]])

In [52]:
a=np.array([1,2,3,4,5,13,11,33,66,32,77,55,38,99,85,19])
np.where(a>6)  # returns the indices of the elements greater than 6

(array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]),)
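With a single argument, np.where returns the indices where a condition holds (the same as np.nonzero). Often you want the values themselves, which a boolean mask gives directly, or a three-argument np.where that picks between two choices element by element. A sketch on the same array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 13, 11, 33, 66, 32, 77, 55, 38, 99, 85, 19])
mask = a > 6                       # boolean array, True where the condition holds

idx = np.where(mask)[0]            # indices; one-argument np.where == np.nonzero
values = a[mask]                   # the matching values themselves
replaced = np.where(mask, a, 0)    # keep a where mask is True, else use 0

print(idx)       # [ 5  6  7  8  9 10 11 12 13 14 15]
print(values)    # [13 11 33 66 32 77 55 38 99 85 19]
print(replaced)  # [ 0  0  0  0  0 13 11 33 66 32 77 55 38 99 85 19]
```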

# References:

https://jakevdp.github.io/PythonDataScienceHandbook/index.html   <br> 
http://www.pansop.com/1021/<br> 
https://www.listendata.com/2017/12/numpy-tutorial.html<br> 
https://www.edureka.co/blog/python-numpy-tutorial/?utm_source=facebook&utm_medium=content-link&utm_campaign=social-media-edureka-july-aj<br> 