# What is NumPy

NumPy stands for "Numerical Python". It is a Python package that provides fast, efficient operations on arrays of homogeneous data. It is the core library for scientific computing in Python, providing a high-performance multidimensional array object and tools for working with these arrays.

NumPy is one of the essential packages in your data science journey because it equips you with an array data structure that offers several benefits over Python's built-in data structures such as lists.

## NumPy Arrays

The central feature of NumPy is the array object class, also called the ndarray. Arrays are very similar to lists in Python, except that every element of an array must be of the same type, typically a numeric type like float or int (a list, by contrast, can hold items of different types). An array is much like an n-dimensional matrix.

Arrays make operations on large amounts of numeric data very fast and are generally much more memory-efficient than lists. A Python list is an array of pointers to Python objects: each element costs one pointer (4 bytes on a 32-bit build, 8 bytes on a 64-bit build) plus a full Python object, which carries a type pointer and a reference count in addition to the value. A NumPy array, in contrast, is a contiguous block of uniform values: single-precision numbers take 4 bytes each and double-precision numbers take 8 bytes each. Arrays can also have any number of dimensions.
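The memory difference described above can be checked with a minimal sketch. The variable names (py_list, arr, list_bytes, array_bytes) are made up for this illustration, and the exact sizes reported by sys.getsizeof depend on the Python build, so no specific numbers are claimed.

```python
import sys
import numpy as np

n = 1000
py_list = [float(i) for i in range(n)]
arr = np.array(py_list, dtype=np.float64)

# Memory of the list: the list's own pointer storage plus one Python float object per element.
# These sizes vary between Python builds.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Memory of the array data: exactly itemsize bytes per element.
array_bytes = arr.nbytes

print("list  :", list_bytes, "bytes")
print("array :", array_bytes, "bytes")
print("bytes per array element:", arr.itemsize)
```

The array figure covers only the data buffer; the ndarray object itself adds a small fixed overhead that does not grow with the number of elements.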

# Creating NumPy arrays
The syntax for creating a NumPy array is:

numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Here, the arguments are:

object: Any object exposing the array interface, such as a list or a nested list
dtype: Desired data type of the array, optional
copy: Optional. By default (True), the object is copied
order: Memory layout: C (row-major), F (column-major), A (any) or K (keep the layout of the input; default)
subok: By default (False), the returned array is forced to be a base-class array. If True, sub-classes are passed through
ndmin: Specifies the minimum number of dimensions of the resulting array

Let's see how you can create a simple array by first importing the package numpy as np:

import numpy as np
a = np.array([1,2,3,4])               # creates a 1-dimensional array
b = np.array([[1,2,3,4], [5,6,7,8]])    # creates a 2-dimensional array
print(a)
print('----')
print(b)
Its output will be

[1 2 3 4]
----
[[1 2 3 4]
 [5 6 7 8]]
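
The dtype and ndmin arguments listed above can be tried out in the same way. This is a small sketch; the variable names as_float and at_least_2d are chosen only for illustration.

```python
import numpy as np

# dtype forces the element type: the integers are stored as 64-bit floats
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float, as_float.dtype)

# ndmin=2 prepends axes until the result has at least 2 dimensions
at_least_2d = np.array([1, 2, 3], ndmin=2)
print(at_least_2d, at_least_2d.shape)
```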
 
# Advantages of using NumPy

Free and open source
Fast reading and writing of array items
Much lower memory use and running time than Python lists for numeric workloads
A large collection of built-in functions, including linear algebra routines


In [15]:
import numpy as np

a=np.array([[1,2,3],[5,6,7]])
b=np.arange(1,11)
print('Array elements in a : ',a)
print('Shape of array a : ',a.shape)
print('Dimension of a : ',a.ndim)
print('Size of a : ',a.size)
print('Array elements in b : ',b)
print('Shape of b : ',b.shape)
reshaped=b.reshape(5,2)
print('Reshaped Array is : ',reshaped)

Array elements in a :  [[1 2 3]
 [5 6 7]]
Shape of array a :  (2, 3)
Dimension of a :  2
Size of a :  6
Array elements in b :  [ 1  2  3  4  5  6  7  8  9 10]
Shape of b :  (10,)
Reshaped Array is :  [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]


# Check Attributes of the array
In this task, you will check the important attributes of a NumPy array.

Make a NumPy array named array that consists of the first 10 natural numbers using the np.arange(1, 11) command (you will learn how to create such arrays later in the course).
Check its number of dimensions and save it as dim.
Reshape the array into a (5, 2) array by calling .reshape(5, 2) on it. This operation returns a new array object, so do not forget to assign the result to a new variable, reshaped.
Check the new number of dimensions and save it as new_dim.
Print out dim and new_dim.
Test Cases: The calculated value of dim should be 1.
The calculated value of new_dim should be 2.

In [16]:
import numpy as np
# Code starts here

# initialize NumPy array
array=np.arange(1,11)

# check dimensions
dim=(array.ndim)
print(dim)


# reshaped array
reshaped=array.reshape(5,2)
print(reshaped)
# check dimensions of the reshaped array

new_dim=(reshaped.ndim)
print(new_dim)
# Code ends here

1
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]
2


In [22]:
#Creates an uninitialized (arbitrary) array of specified shape and dtype

import numpy as np
np.empty((5,4),dtype='int8')

array([[   0,  -56,   63,   59],
       [-127,    2,    0,    0],
       [   0,    0,    0,    0],
       [   0,    0,    0,    0],
       [   3,    0,    2,    0]], dtype=int8)

In [25]:
#Creates a new array of specified size, filled with zeros
np.zeros((3,4),dtype='int8')

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

In [26]:
np.ones((3,4),dtype='int8')

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

In [27]:
#Creates a new array of given shape and type, filled with a constant value

np.full((3,4),7)

array([[7, 7, 7, 7],
       [7, 7, 7, 7],
       [7, 7, 7, 7]])

In [28]:
#Creates a 2-D array with ones on the diagonal and zeros elsewhere

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [29]:
import numpy as np
# identity 4x4 matrix
array=np.eye((4),dtype='float32')

# display
print(array)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [32]:
#python list
a=[1,2,3]
print(type(a))
#convert to NumPy array
b=np.asarray(a)
print(b)
print(type(b))

<class 'list'>
[1 2 3]
<class 'numpy.ndarray'>


In [33]:
#python tuples
a=((1,2),(3,4))

#convert to NumPy array
b=np.asarray(a)
print(b)


[[1 2]
 [3 4]]


In [38]:
a=np.fromiter([1,2,3,4],dtype='int8')
b=np.fromiter((1,2,3,4),dtype='int8')
c=np.fromiter(range(1,5),dtype='int8')
d=np.fromiter('Raunak',dtype='S50')

print("Array a is ",a)
print("Array b is ",b)
print("Array c is ",c)
print("Array d is ",d)
print("Data type of d is ",type(d))

Array a is  [1 2 3 4]
Array b is  [1 2 3 4]
Array c is  [1 2 3 4]
Array d is  [b'R' b'a' b'u' b'n' b'a' b'k']
Data type of d is  <class 'numpy.ndarray'>


In [39]:
# NumPy array from 1 to 19
print(np.arange(1,20,dtype='int32'))

# NumPy array from 1 to 19 with step size 2
print(np.arange(1,20,2,dtype='int8'))

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 1  3  5  7  9 11 13 15 17 19]


In [53]:
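# 100 evenly spaced values from 1 to 20 (both ends included), cast to int8;
# retstep=True also returns the step size, so the result is an (array, step) tuple
print(np.linspace(1,20,100,endpoint=True,retstep=True,dtype='int8'))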

(array([ 1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  4,
        4,  4,  4,  4,  5,  5,  5,  5,  5,  5,  6,  6,  6,  6,  6,  7,  7,
        7,  7,  7,  8,  8,  8,  8,  8,  9,  9,  9,  9,  9, 10, 10, 10, 10,
       10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13,
       14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 17,
       17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 20],
      dtype=int8), 0.1919191919191919)


In [55]:
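# rows of unequal length cannot form a regular 2-D array; dtype=object stores each row as a list
# (recent NumPy versions raise a ValueError if dtype=object is omitted here)
np.array([[2,3,4],[5,6,7],[8,9,10,11]],dtype=object)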

array([list([2, 3, 4]), list([5, 6, 7]), list([8, 9, 10, 11])],
      dtype=object)

In [56]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [58]:
import numpy as np
a = np.arange(9).reshape(3,3)
print(a)
print(a[1,1])

[[0 1 2]
 [3 4 5]
 [6 7 8]]
4


In [68]:
# Code starts here
import numpy as np
# initialize array
array=np.array([3,4.5,3+5j,0])

# boolean filter

real=np.isreal(array)
real_array=array[real]
print(real_array)

# boolean filter

imag=np.iscomplex(array)
imag_array=array[imag]
print(imag_array)


# Code ends here

[3. +0.j 4.5+0.j 0. +0.j]
[3.+5.j]


# What is vectorization?
Vectorization is NumPy's ability to perform an operation on an entire array at once rather than on one element at a time. Looping over an array or any other data structure in Python involves a lot of interpreter overhead. Vectorized operations in NumPy delegate the looping to highly optimized C and Fortran routines, which makes the Python code both cleaner and faster.

Examples
You have already come across such an example in Boolean indexing.

import numpy as np
a = np.array([1,2,3,4,5,6,7])
print(a[a > 2])
The above code block outputs the array [3 4 5 6 7] because the comparison a > 2 is evaluated for every element at once, and the resulting Boolean mask selects the elements greater than 2.
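
To make the contrast with an explicit loop concrete, here is a minimal sketch that squares every element twice, once with a Python loop and once with a single vectorized expression. The names values, squared_loop and squared_vec are illustrative only.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7])

# explicit Python loop: one interpreter-level step per element
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2

# vectorized: one expression, the loop runs inside NumPy
squared_vec = values ** 2

print(squared_vec)
print(np.array_equal(squared_loop, squared_vec))
```

Both versions produce the same array; the vectorized form is shorter and avoids the per-element interpreter overhead.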

Vectorized operations
Now, let us look at how you can do some elementary vectorized operations like addition, subtraction, multiplication, etc. The examples below show each type of operation and its corresponding output.

Addition
Two ways to go about it, using either + or np.add()

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b=np.array([[10,11,12],[13,14,15],[16,17,18]])

# element wise addition
print(a+b)
print('==========')
print(np.add(a,b))
Output

[[11 13 15]
 [17 19 21]
 [23 25 27]]
==========
[[11 13 15]
 [17 19 21]
 [23 25 27]]
Subtraction
Two ways, - or np.subtract()

# element wise subtractions
print(a-b)
print('==========')
print(np.subtract(a,b))
Output

[[-9 -9 -9]
 [-9 -9 -9]
 [-9 -9 -9]]
==========
[[-9 -9 -9]
 [-9 -9 -9]
 [-9 -9 -9]]
Multiplication
Two ways, * or np.multiply()

# element wise multiplication
print(a*b)
print('==========')
print(np.multiply(a,b))
Output

[[ 10  22  36]
 [ 52  70  90]
 [112 136 162]]
==========
[[ 10  22  36]
 [ 52  70  90]
 [112 136 162]]
Division
Two ways, / or np.divide()

# element wise division
print(a/b)
print('==========')
print(np.divide(a,b))
Output

[[0.1        0.18181818 0.25      ]
 [0.30769231 0.35714286 0.4       ]
 [0.4375     0.47058824 0.5       ]]
==========
[[0.1        0.18181818 0.25      ]
 [0.30769231 0.35714286 0.4       ]
 [0.4375     0.47058824 0.5       ]]
Square root transform
Use np.sqrt()

# element wise square root transform
a = np.array([[1,4,9],[16,25,36]])
print(np.sqrt(a))
Output

[[1. 2. 3.]
 [4. 5. 6.]]
Log transform
Use np.log()

# element-wise log transform (natural logarithm)
a = np.array([[1,4,9],[16,25,36]])
print(np.log(a))
Output

[[0.         1.38629436 2.19722458]
 [2.77258872 3.21887582 3.58351894]]
Aggregate operations
Aggregate operations are those that reduce an array, or one axis of it, to summary values. Some commonly used aggregate operations are listed below:

| Command | Description |
| --- | --- |
| a.sum() | Array-wise sum |
| a.min() | Array-wise minimum value |
| a.max(axis=0) | Maximum value of each column (taken along axis 0) |
| a.cumsum(axis=1) | Cumulative sum along each row |
| a.mean() | Mean |
| np.median(a) | Median |
| np.corrcoef(a) | Correlation coefficients (each row is treated as a variable) |
| np.std(a) | Standard deviation |
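
As a quick check of the commands in the table, the sketch below applies each of them to a small 2x3 array. The array and the variable names are chosen only for this illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

total = a.sum()                 # 21
smallest = a.min()              # 1
col_max = a.max(axis=0)         # [4 5 6], one maximum per column
row_cumsum = a.cumsum(axis=1)   # [[1 3 6], [4 9 15]], running total along each row
mean = a.mean()                 # 3.5
median = np.median(a)           # 3.5
std = np.std(a)                 # population standard deviation of all six values
corr = np.corrcoef(a)           # 2x2 matrix: the two rows are treated as two variables

print(total, smallest, mean, median, std)
print(col_max)
print(row_cumsum)
print(corr)
```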
Array comparison
You already saw how you can perform element-wise comparison of array elements. With NumPy you can also compare entire arrays. Use the function np.array_equal() for array comparison, which returns True only if both arrays have the same shape and the same elements. It is illustrated with the examples below:

a = np.array([1,2,3,4])
b = np.array([1,2,3,4])
print(np.array_equal(a,b))
Output

True
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
print(np.array_equal(a,b))
Output

False
Understanding Axes notation
In NumPy, an axis refers to a single dimension of a multidimensional array. By changing axis you can compute across dimensions, whereas not specifying axis will result in computation over the entire array.

a = np.array([[1,4,9],[16,25,36]])

# sum along axis 0 (down the rows): one total per column
print(a.sum(axis=0))
print('==========')

# sum along axis 1 (across the columns): one total per row
print(a.sum(axis=1))
print('==========')

# computes total sum
print(a.sum())
Output

[17 29 45]
==========
[14 77]
==========
91
As the example above shows, you can calculate the sum of each column, of each row and of the entire array just by changing the axis parameter. Try it on arrays with 3 or more dimensions!
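
A minimal sketch of the same idea on a 3-dimensional array follows. The shape (2, 3, 4) is an arbitrary choice for illustration; summing along an axis removes that axis from the shape of the result.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # two blocks of 3 rows and 4 columns

sum_axis0 = a.sum(axis=0)            # collapses the 2 blocks, shape (3, 4)
sum_axis1 = a.sum(axis=1)            # collapses the rows, shape (2, 4)
sum_axis2 = a.sum(axis=2)            # collapses the columns, shape (2, 3)
per_block = a.sum(axis=(1, 2))       # one total per block, shape (2,)
total = a.sum()                      # total over every element

print(sum_axis0.shape, sum_axis1.shape, sum_axis2.shape)
print(per_block)
print(total)
```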



In [76]:
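a = np.array([[1,2,3,4],[2,3,4,5]])
b = np.array([1,2,3,4])
np.sum(a+b)               # b is added to each row of a; the result is not displayed
print(a.cumsum(axis=1))   # cumulative sum along each row
a.sum()                   # total of all elements; the result is not displayed
np.corrcoef(a)            # last expression in the cell, so the notebook displays it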

[[ 1  3  6 10]
 [ 2  5  9 14]]


array([[1., 1.],
       [1., 1.]])

In [78]:
a = np.array([[1,4,9],[16,25,36]])
print(a)

[[ 1  4  9]
 [16 25 36]]


In [79]:
a.sum(axis=0)

array([17, 29, 45])

In [80]:
a.sum(axis=1)

array([14, 77])

In [82]:
b=np.array([[5,6,7],[1,2,3],[2,3,4]])

In [84]:
b.sum(axis=0)

array([ 8, 11, 14])

In [85]:
b.sum(axis=1)

array([18,  6,  9])

# Buy/Sell for maximum profit
In this task, you will combine vectorization and aggregation methods to solve a BUY-SELL problem. You have prices at 3 intervals of a day for 2 consecutive days, and you can buy and sell only once. You buy first and then sell for the maximum profit.

Initialize an array prices = [[40, 35, 20], [21, 48, 70]]. Here, 40 is the market price during the first interval of the first day, 35 during the second interval and so on. Similarly, 21 is the price during the first interval of the second day and 48 during the second interval
Flatten the array with the .flatten() method. This method converts your 2-D array into a 1-D array. Store the result in the variable prices
Find the minimum over the flattened array using np.min(). This will be your buying_price
Also find the index of the buying price by first converting the array into a list using list(array) and then calling the .index(buying_price) method. Store the index in a variable index
Create a new subset starting from the index (found in the previous step) till the end of the array and find its maximum value using np.max(). This will be your selling_price
Find the difference between the selling and buying prices and save it in the variable profit
Test Cases: The calculated value of buying_price should be 20.
The calculated value of selling_price should be 70.
The calculated value of profit should be 50.

In [88]:
import numpy as np
# initialize array for prices
prices = np.array([[40,35,20],[21,48,70]])

print(prices)
# Code starts here

# flatten the 2-D array into a 1-D array
prices=prices.flatten()
print(prices)

# minimum price
buying_price=np.min(prices)
print(buying_price)

# index of buying price
index=list(prices).index(buying_price)
print(index)
print(prices)

# selling price: maximum over the subset from the buying index to the end
selling_price=np.max(prices[index:])
print(selling_price)

# profit
profit=selling_price-buying_price

# display
print(profit)


# Code ends here

[[40 35 20]
 [21 48 70]]
[40 35 20 21 48 70]
20
2
[40 35 20 21 48 70]
70
50
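
The index lookup in the task goes through a Python list. As an alternative sketch (not part of the original task), np.argmin returns the position of the minimum directly, so the conversion to a list can be skipped.

```python
import numpy as np

prices = np.array([[40, 35, 20], [21, 48, 70]]).flatten()

# position of the lowest price, found without converting to a list
index = int(np.argmin(prices))
buying_price = prices[index]

# highest price from the buying position onwards
selling_price = prices[index:].max()
profit = selling_price - buying_price

print(index, buying_price, selling_price, profit)
```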


# What is broadcasting?
Have you wondered how an operation such as np.array([1, 2, 3]) + 4 is carried out successfully? It works because of the broadcasting power of NumPy arrays. Let's discuss this property in detail.

In NumPy, arrays do not need to have the same shape to be combined in an operation, as long as their shapes are compatible under the following rules:

If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
The arrays can be broadcast together if they are compatible in all dimensions.
After broadcasting, each array behaves as if it had shape equal to the element-wise maximum of shapes of the two input arrays.
In any dimension where one array had size 1 and the other array had a size greater than 1, the first array behaves as if it were copied along that dimension.
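
The rules above can be seen at work in a small sketch. The arrays a, b and c are made up for this illustration; the last case shows a pair of shapes that is not compatible.

```python
import numpy as np

a = np.ones((4, 3))               # shape (4, 3)
b = np.arange(3)                  # shape (3,), treated as (1, 3)
c = np.arange(4).reshape(4, 1)    # shape (4, 1)

# (4, 3) with (1, 3): compatible, b is repeated along the first axis
print((a + b).shape)

# (1, 3) with (4, 1): compatible, each array is stretched along its size-1 axis
print((b + c).shape)

# (4, 3) with (4,) treated as (1, 4): the last axes are 3 and 4, so this fails
try:
    a + np.arange(4)
    compatible = True
except ValueError as err:
    compatible = False
    print("not broadcastable:", err)
```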


In [89]:
np.arange(3)+5

array([5, 6, 7])

In [92]:
np.ones((3,3))+np.arange(3)

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [94]:
np.arange(3).reshape((3,1))+np.arange(3)

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

# Normalize a 5x5 matrix
In this task, you will normalize a matrix, i.e. subtract the minimum value from every element and divide by the range.

Create a random 5x5 matrix with the help of np.random.random((5,5)) and save it as z
Calculate the minimum and maximum values of the 5x5 array with the help of the .min() and .max() methods respectively. Save them as zmin and zmax
Now, using the power of broadcasting, subtract the minimum value from each element and divide by the range (maximum - minimum) to normalize. Save the normalized array as z_norm
Print the normalized array
Test Cases: The calculated value of zmax should be 0.97.
The calculated value of zmin should be 0.02.
The first element of z_norm should be 0.03 (rounded to two decimal places).

In [97]:
import numpy as np
np.random.seed(21)

# Code starts here

# create random 5x5 array
z=np.random.random((5,5))
print(z)

# minimum and maximum values
zmax=np.max(z)
print('%.2f'%zmax)
zmin=np.min(z)
print('%.2f'%zmin)

# normalize
z_norm=(z-zmin)/(zmax-zmin)

# display
print(z_norm)

[[0.04872488 0.28910966 0.72096635 0.02161625 0.20592277]
 [0.05077326 0.30227189 0.66391029 0.30811439 0.58359128]
 [0.06957095 0.86740448 0.13324052 0.17812466 0.49592955]
 [0.86369964 0.75894384 0.97048513 0.75930255 0.38425003]
 [0.40871833 0.71336043 0.27066977 0.85410287 0.91316397]]
0.97
0.02
[[0.02856942 0.28190767 0.73703555 0.         0.19423813]
 [0.03072817 0.29577917 0.67690496 0.3019365  0.59225784]
 [0.05053881 0.89136471 0.1176393  0.16494209 0.49987233]
 [0.88746023 0.77705951 1.         0.77743756 0.38217481]
 [0.40796162 0.72901978 0.26247412 0.87734633 0.93959001]]
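
The same normalization is easier to follow on a tiny array whose values can be checked by hand. The sketch below uses a made-up 2x2 array; the per-column variant at the end is an extra illustration of broadcasting a 1-D array against a 2-D one and is not part of the task.

```python
import numpy as np

z = np.array([[2.0, 4.0], [6.0, 10.0]])

# whole-array normalization: the scalars zmin and zmax are broadcast to every element
zmin, zmax = z.min(), z.max()
z_norm = (z - zmin) / (zmax - zmin)
print(z_norm)          # [[0. 0.25] [0.5 1.]]

# per-column normalization: arrays of shape (2,) are broadcast across the rows
col_min, col_max = z.min(axis=0), z.max(axis=0)
z_norm_cols = (z - col_min) / (col_max - col_min)
print(z_norm_cols)     # [[0. 0.] [1. 1.]]
```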


# Speed of NumPy vs Python lists
The code snippet below compares the speed of element-wise addition of two Python lists with element-wise addition of two NumPy arrays.

In [98]:
# import packages
import time
import numpy as np

np.random.seed(21)

# initialize variable
num = 10000

# initialize lists
l1, l2 = [i for i in range(num)], [i+2 for i in range(num)]

# initialize arrays
a1, a2 = np.array(l1), np.array(l2)

# start time
start_list = time.time()

# element-wise addition for both lists
sum_lists = [i+j for i, j in zip(l1, l2)]

# stop time
stop_list = time.time()

# display time
print(stop_list - start_list)


# start time
start_array = time.time()

# element-wise addition for both arrays
sum_arrays = a1 + a2

# stop time
stop_array = time.time()

# display time
print(stop_array - start_array)

0.0030167102813720703
0.0009794235229492188
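
A single time.time() measurement is noisy, so the gap above can vary a lot from run to run. As a sketch of a more stable comparison, the standard-library timeit module can repeat each operation many times and report an average. The helper names add_lists and add_arrays and the repeat count are choices made for this illustration, and the timings depend on the machine.

```python
import timeit
import numpy as np

num = 10000
l1, l2 = list(range(num)), [i + 2 for i in range(num)]
a1, a2 = np.array(l1), np.array(l2)

def add_lists():
    # element-wise addition with a Python-level loop
    return [i + j for i, j in zip(l1, l2)]

def add_arrays():
    # element-wise addition performed inside NumPy
    return a1 + a2

repeats = 200
list_time = timeit.timeit(add_lists, number=repeats) / repeats
array_time = timeit.timeit(add_arrays, number=repeats) / repeats

print("list  average seconds per run:", list_time)
print("array average seconds per run:", array_time)
```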


# Quadratic Equation
Find the roots of a quadratic equation.
The quadratic equation that you will be solving is $x^2 - 4x + 4 = 0$.
Make an array coeff to store the coefficients (1, -4, 4) of the quadratic equation.
Pass this array coeff as an argument to the np.roots() function, which returns the roots of the quadratic equation $x^2 - 4x + 4 = 0$. Save the result as roots.
Print out roots to see your result.
Test Cases: roots[0] should be 2.0 and roots[1] should be 2.0.
coeff[0] should be 1, coeff[1] should be -4 and coeff[2] should be 4.

In [99]:
# import packages
import numpy as np

# Code starts here

# co-efficients of x
coeff=np.array([1,-4,4])

# roots of equation
roots=np.roots(coeff)


# display roots
print(roots)


# Code ends here

[2. 2.]
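
The same approach works for any polynomial whose coefficients are listed from the highest power down. The sketch below uses a different, made-up equation, x^2 - 5x + 6 = 0, and checks the result by substituting the roots back with np.polyval.

```python
import numpy as np

# coefficients of x^2 - 5x + 6, highest power first
coeff = np.array([1, -5, 6])

roots = np.roots(coeff)

# substituting each root back into the polynomial should give (nearly) zero
residuals = np.polyval(coeff, roots)

print(np.sort(roots))
print(residuals)
```

The order in which np.roots returns the roots is not something to rely on, which is why the sketch sorts them before printing.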


In [100]:
# Code starts here
import numpy as np
# centigrade temperatures
centigrade_temps = np.array([0, 10, 25, 32, 80, 99.99])

# function for conversion: F = (C*9 + 32*5)/5, which equals C*9/5 + 32
convert = lambda centigrade : (centigrade*9+32*5)/5

# def convert(centigrade):
#     fahrenheit_temps=(centigrade*9+32*5)/5
#     return fahrenheit_temps

fahrenheit_temps=(convert(centigrade_temps))

# display fahrenheit temperatures
print(fahrenheit_temps)

# Code ends here

[ 32.     50.     77.     89.6   176.    211.982]


# Solve for X

In [101]:
# Code starts here
import numpy as np
# initialize matrix A and b
A=np.array([[2,1,2],[3,0,1],[1,1,-1]])
b=np.array([[-3],[5],[2]])
print(A)
print(b)

# Solve for x
x=np.linalg.solve(A,b)
print(x)

# Check solution
check=np.allclose(np.dot(A,x),b)
print(check)

# Code ends here

[[ 2  1  2]
 [ 3  0  1]
 [ 1  1 -1]]
[[-3]
 [ 5]
 [ 2]]
[[ 2.5]
 [-3. ]
 [-2.5]]
True


In [102]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

# Find a non-singular matrix
Assume that A is a non-singular matrix
Simplify the equation $3A = A^2 + AB$ until it reduces to a form that needs only simple array arithmetic. Because A is non-singular, you can multiply both sides by $A^{-1}$ to get $3I = A + B$, so $A = 3I - B$, where I is the identity matrix
Initialize array B using np.array() and I using np.identity()
Save the resultant array as A, which is given mathematically by $A = 3I - B$
Display matrix A
Test Cases: The calculated value of A should be [[1, 0, 1], [0, 1, 1], [1, 0, 2]].

In [103]:
import numpy as np
# initialize array B and Identity matrix
B=np.array([[2,0,-1],[0,2,-1],[-1,0,1]])
I=np.identity(3)
print(B)
print(I)

# calculate result

A=3*I-B

print(A)


# Code ends here

[[ 2  0 -1]
 [ 0  2 -1]
 [-1  0  1]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1. 0. 1.]
 [0. 1. 1.]
 [1. 0. 2.]]
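
As a sanity check (not part of the original task), the sketch below confirms that this A satisfies the original equation 3A = A^2 + AB and that its determinant is non-zero, which is what non-singular means. The names satisfies_equation and determinant are illustrative only.

```python
import numpy as np

B = np.array([[2, 0, -1], [0, 2, -1], [-1, 0, 1]])
I = np.identity(3)
A = 3 * I - B

# A @ A is the matrix product A^2 (A * A would be element-wise)
satisfies_equation = np.allclose(3 * A, A @ A + A @ B)

# a non-singular matrix has a non-zero determinant
determinant = np.linalg.det(A)

print(satisfies_equation)
print(determinant)
```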
