# NumPy

NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.

## Why is NumPy fast?
- Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” in optimized, pre-compiled C code. Vectorized code has many advantages, among which are:

- vectorized code is more concise and easier to read

- fewer lines of code generally means fewer bugs

- the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs)

- vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult to read for loops.
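
As a small illustration of the points above, the sketch below squares the same values twice, once with an explicit loop and once with a vectorized expression. The names `values`, `squares_loop` and `squares_vec` are made up for this example.

```python
import numpy as np

values = np.arange(1, 6)  # array([1, 2, 3, 4, 5])

# Explicit loop: each element is handled in Python, one at a time
squares_loop = []
for v in values:
    squares_loop.append(int(v) ** 2)

# Vectorized: the loop runs inside NumPy's compiled code
squares_vec = values ** 2

print(squares_loop)  # [1, 4, 9, 16, 25]
print(squares_vec)   # [ 1  4  9 16 25]
```

Both versions give the same numbers; the vectorized one is a single expression that reads like the math it computes.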

## What is an array
- An array is a data structure that stores values of the same data type. In Python, this is the main difference between arrays and lists: a Python list can contain values of different data types, while a NumPy array holds values of a single data type.
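
A minimal sketch of what "same data type" means in practice: when a list mixes integers and floats, NumPy converts every value to one common type. The names `mixed_list` and `as_array` are illustrative only.

```python
import numpy as np

mixed_list = [1, 2.5, 3]         # a Python list may mix int and float
as_array = np.array(mixed_list)  # NumPy picks one common dtype for all values

print(as_array)
print(as_array.dtype)  # float64
```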

In [1]:
import numpy as np

In [2]:
my_lst=[1,2,3,4,5]

arr=np.array(my_lst)

In [3]:
print(arr)

[1 2 3 4 5]


In [4]:
type(arr)

numpy.ndarray

In [12]:
## Multinested array
my_lst1=[1,2,3,4,5]
my_lst2=[2,3,4,5,6]
my_lst3=[9,7,6,8,9]

arr1=np.array([my_lst1,my_lst2,my_lst3])

In [13]:
arr1

array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [9, 7, 6, 8, 9]])

In [7]:
## check the shape of the array

arr1.shape

(3, 5)

## Indexing/ Slicing arrays
Slicing in python means taking elements from one given index to another given index.

We pass slice instead of index like this: [start:end].

We can also define the step, like this: [start:end:step].

If we don't pass start, it is considered 0.

If we don't pass end, it is considered the length of the array in that dimension.

If we don't pass step, it is considered 1.

In [9]:
#Note: The result includes the start index, but excludes the end index.
my_lst=[1,2,3,4,5]

arr=np.array(my_lst)

In [10]:
## Accessing the array elements

arr

array([1, 2, 3, 4, 5])

In [11]:
arr[3]

4

In [14]:
arr1

array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [9, 7, 6, 8, 9]])

In [15]:
arr1[1:,:2]

array([[2, 3],
       [9, 7]])

In [16]:
arr1[:,3:]

array([[4, 5],
       [5, 6],
       [8, 9]])

In [17]:
arr

array([1, 2, 3, 4, 5])

In [18]:
arr[3:]=100

In [19]:
arr

array([  1,   2,   3, 100, 100])

In [24]:
arr[2:]=100

In [25]:
arr

array([  1,   2, 100, 100, 100])

In [5]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[-3:-1])

[5 6]


In [6]:
#Use the step value to determine the step of the slicing:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5:2])

[2 4]


In [7]:
#Return every other element from the entire array
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[::2])

[1 3 5 7]


In [8]:
#Slicing 2-D Arrays
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 1:4])

[7 8 9]


In [11]:
#From both elements, return index 2:
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[0:2, 2])

[3 8]


In [12]:
#From both elements, slice index 1 to index 4 (not included), this will return a 2-D array:
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[0:2, 1:4])

[[2 3 4]
 [7 8 9]]




## Creating Arrays

### From list / tuple

In [26]:
a = np.array([1, 2, 3])

In [27]:
a

array([1, 2, 3])

### Zero/Ones

In [29]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [30]:
np.ones((3, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

### Range / Random

In [36]:
np.random.randint(1, 10, size=(3, 3))


array([[4, 2, 9],
       [4, 3, 6],
       [1, 4, 5]])

In [31]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [32]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

In [33]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [34]:
np.random.rand(3, 3)

array([[0.15076019, 0.08682067, 0.90004063],
       [0.70664481, 0.47422101, 0.59558624],
       [0.60037933, 0.82522903, 0.73629568]])

In [35]:
np.random.randn(3, 3)

array([[ 0.79601987,  1.3656323 , -0.12959906],
       [ 0.58313045,  0.53964368, -1.11627258],
       [-0.0639479 , -0.6626367 ,  0.78479084]])

### Array Info & Properties

In [41]:
arr = np.array([1, 2, 3, 4, 5])
arr.itemsize #how much memory one element takes, in bytes

4

In [37]:
arr.shape

(5,)

In [38]:
arr.ndim

1

In [39]:
arr.size #how many elements

5

In [40]:
arr.dtype

dtype('int32')

### Indexing and Slicing

In [53]:
arr = arr1[1:3, 0:2]
arr

array([[2, 3],
       [9, 7]])

In [42]:
arr = np.array([1, 2, 3, 4, 5])  # back to a 1-D array for the next examples
arr[0]

1

In [45]:
arr1

array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [9, 7, 6, 8, 9]])

In [44]:
arr1[1, 2]

4

In [46]:
arr[:2]

array([1, 2])

In [48]:
arr1[:, 1]

array([2, 3, 7])

### Reshaping and Transpose

In [32]:
#Convert the following 1-D array with 12 elements into a 2-D array.
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)

print(newarr)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [33]:
#Convert the following 1-D array with 12 elements into a 3-D array.
#The outermost dimension will have 2 arrays that contains 3 arrays, each with 2 elements:

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(2, 3, 2)

print(newarr)


[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]


In [34]:
#reshape returns a view when it can: base is the original array, not None
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

print(arr.reshape(2, 4).base)

[1 2 3 4 5 6 7 8]


### Unknown Dimension
- You are allowed to have one "unknown" dimension.

- Meaning that you do not have to specify an exact number for one of the dimensions in the reshape method.

- Pass -1 as the value, and NumPy will calculate this number for you.

In [35]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(2, 2, -1)

print(newarr)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


In [36]:
#Convert the array into a 1D array:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

newarr = arr.reshape(-1)

print(newarr)

[1 2 3 4 5 6]


In [37]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
newarr = arr.reshape(6)
print(newarr)

[1 2 3 4 5 6]


In [62]:
arr1

array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [9, 7, 6, 8, 9]])

In [56]:
arr1.reshape(5, 3)

array([[1, 2, 3],
       [4, 5, 2],
       [3, 4, 5],
       [6, 9, 7],
       [6, 8, 9]])

In [58]:
arr1.flatten() # convert into one dimension; always returns a copy

array([1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 9, 7, 6, 8, 9])

In [59]:
arr1.ravel() # convert into one dimension; returns a view when possible, so it is usually faster

array([1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 9, 7, 6, 8, 9])
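
The difference between the copy from flatten() and the view from ravel() can be checked directly. This is a sketch with a made-up array called `grid`; `np.shares_memory` reports whether two arrays use the same underlying data.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = grid.flatten()  # always a new array
flat_view = grid.ravel()    # a view here, because grid is stored contiguously

flat_copy[0] = 100  # does not touch grid
flat_view[1] = 200  # writes through to grid

print(grid)
print(np.shares_memory(grid, flat_copy))  # False
print(np.shares_memory(grid, flat_view))  # True
```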

In [63]:
arr1.T

array([[1, 2, 9],
       [2, 3, 7],
       [3, 4, 6],
       [4, 5, 8],
       [5, 6, 9]])

### Mathematical Operations

In [73]:
a = np.array([[4, 6], [8, 10]])
np.sqrt(a)

array([[2.        , 2.44948974],
       [2.82842712, 3.16227766]])

In [65]:
arr = arr1[1:3, 0:2]  # rows 1-2 and columns 0-1 of arr1
arr + 10

array([[12, 13],
       [19, 17]])

In [66]:
arr * 2

array([[ 4,  6],
       [18, 14]])

In [70]:
arr1 = ([4,6],[8,10])
arr2 = ([10,12],[14,16])  # defined but not used below
a= np.array(arr1)
b= np.array(arr1)  # note: b is also built from arr1, so b equals a
a+b

array([[ 8, 12],
       [16, 20]])

In [72]:
a.dot(b)

array([[ 64,  84],
       [112, 148]])

In [74]:
np.exp(a)


array([[   54.59815003,   403.42879349],
       [ 2980.95798704, 22026.46579481]])

In [75]:
np.log(a)

array([[1.38629436, 1.79175947],
       [2.07944154, 2.30258509]])

### Aggregation Functions

In [76]:
np.sum(a)

28

In [77]:
np.mean(a)


7.0

In [79]:
np.median(a) #Median of all values

7.0

In [83]:
np.std(a)

2.23606797749979

In [81]:
np.median(a, axis=0) #Column-wise median

array([6., 8.])

In [82]:
np.median(a, axis=1) #Row-wise median

array([5., 9.])

In [84]:
np.var(a)

5.0

In [85]:
np.min(a)

4

In [86]:
np.max(a)

10

In [89]:
np.sum(a, axis=0)  # column-wise

array([12, 16])

In [88]:
np.sum(a, axis=1)  # row-wise

array([10, 18])

In [90]:
a > 5

array([[False,  True],
       [ True,  True]])

In [91]:
a[a > 5]


array([ 6,  8, 10])

In [93]:
np.where(a > 5, 1, 0) #Performs conditional element-wise selection True = 1 and False = 0.

array([[0, 1],
       [1, 1]])

### Statistical Functions (Very Important)

In [96]:
np.percentile(a, 60) #Returns the 60th percentile of the data.


7.6

In [97]:
np.corrcoef(a, b)

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
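
np.corrcoef treats each row as one variable. Each row of a and b holds only two values, so every pair of rows is perfectly correlated, which is why the matrix above is all ones. A sketch with longer 1-D arrays (the names `x`, `y_up` and `y_down` are made up here) gives a more informative result:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y_up = 2 * x + 1      # rises together with x
y_down = -3 * x + 10  # falls as x rises

# corrcoef returns a matrix; entry [0, 1] is the correlation between the two inputs
print(np.corrcoef(x, y_up)[0, 1])    # close to 1
print(np.corrcoef(x, y_down)[0, 1])  # close to -1
```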

In [98]:
np.unique(a)

array([ 4,  6,  8, 10])

### Linear Algebra



In [99]:
np.dot(a, b)

array([[ 64,  84],
       [112, 148]])

In [100]:
a @ b

array([[ 64,  84],
       [112, 148]])

In [101]:
np.linalg.det(a) #Determinant

-7.999999999999998

In [102]:
np.linalg.inv(a) #Inverse of matrix

array([[-1.25,  0.75],
       [ 1.  , -0.5 ]])

In [103]:
np.linalg.eig(a) #Eigenvalues & Eigenvectors

EigResult(eigenvalues=array([-0.54983444, 14.54983444]), eigenvectors=array([[-0.79681209, -0.49436913],
       [ 0.60422718, -0.86925207]]))
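
One way to sanity-check these results is to verify the defining relations: a matrix times one of its eigenvectors equals the eigenvalue times that vector, and a matrix times its inverse is the identity. The sketch below uses the same values as a, under the hypothetical name `m`.

```python
import numpy as np

m = np.array([[4, 6], [8, 10]])

eigenvalues, eigenvectors = np.linalg.eig(m)

# Each column of eigenvectors pairs with the eigenvalue at the same position
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(m @ v, eigenvalues[i] * v))  # True

# A matrix times its inverse gives the identity matrix (up to rounding)
print(np.allclose(m @ np.linalg.inv(m), np.eye(2)))  # True
```

np.allclose is used instead of == because floating-point results carry small rounding errors, as the determinant output above shows.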

### Stack, Split & Join

#### Joining NumPy Arrays
- Joining means putting contents of two or more arrays in a single array.

- In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.

- We pass a sequence of arrays that we want to join to the concatenate() function, along with the axis. If axis is not explicitly passed, it is taken as 0.

#### Joining Arrays Using Stack Functions
- Stacking is the same as concatenation; the only difference is that stacking is done along a new axis.

- We can concatenate two 1-D arrays along the second axis, which results in putting them one over the other, i.e. stacking.

- We pass a sequence of arrays that we want to join to the stack() method along with the axis. If axis is not explicitly passed it is taken as 0.
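
A short sketch of stack() itself, since the cells below use the vstack/hstack helpers instead. The array names `first` and `second` are illustrative.

```python
import numpy as np

first = np.array([1, 2, 3])
second = np.array([4, 5, 6])

rows = np.stack((first, second))             # axis=0 by default: each input becomes a row
columns = np.stack((first, second), axis=1)  # new axis in second position: each input becomes a column

print(rows.shape)     # (2, 3)
print(columns.shape)  # (3, 2)
print(columns)
```

Unlike concatenate(), the result has one more dimension than the inputs.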

In [104]:
np.vstack((a, b)) #Stacks arrays vertically (top to bottom), so the result has more rows

array([[ 4,  6],
       [ 8, 10],
       [ 4,  6],
       [ 8, 10]])

In [105]:
np.hstack((a, b)) #Stacks arrays horizontally (side by side), so the result has more columns

array([[ 4,  6,  4,  6],
       [ 8, 10,  8, 10]])

In [106]:
np.concatenate((a, b), axis=0) #General joining function; for these 2-D arrays, axis=0 gives the same result as vstack

array([[ 4,  6],
       [ 8, 10],
       [ 4,  6],
       [ 8, 10]])

In [49]:
#Join two arrays

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

[1 2 3 4 5 6]


In [50]:
#Join two 2-D arrays along rows (axis=1):

import numpy as np

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr)

[[1 2 5 6]
 [3 4 7 8]]


#### Stacking Along Height (depth)
- NumPy provides a helper function: dstack() to stack along height, which is the same as depth

In [51]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.dstack((arr1, arr2))

print(arr)

[[[1 4]
  [2 5]
  [3 6]]]


#### Splitting NumPy Arrays
- Splitting is the reverse operation of joining.

- Joining merges multiple arrays into one, and splitting breaks one array into multiple.

- We use array_split() for splitting arrays: we pass it the array we want to split and the number of splits. The split() function works the same way, but it raises an error when the array cannot be divided into equal parts.
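
The sketch below shows how split() and array_split() differ when the array does not divide evenly. The names `numbers`, `split_failed` and `parts` are made up for this example.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5, 6])

# 6 elements cannot be divided into 4 equal parts, so split() raises ValueError
try:
    np.split(numbers, 4)
    split_failed = False
except ValueError:
    split_failed = True

# array_split() accepts the uneven division instead
parts = np.array_split(numbers, 4)

print(split_failed)                 # True
print([p.tolist() for p in parts])  # [[1, 2], [3, 4], [5], [6]]
```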

In [108]:
arr = np.array([1, 2, 3, 4, 5, 6])
np.split(arr, 3)

[array([1, 2]), array([3, 4]), array([5, 6])]

In [53]:
#If the array has fewer elements than required, it will adjust from the end accordingly.
arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 4)

print(newarr)

[array([1, 2]), array([3, 4]), array([5]), array([6])]


In [54]:
#Access the split arrays:
arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr[0])
print(newarr[1])
print(newarr[2])

[1 2]
[3 4]
[5 6]


In [55]:
#Split the 2-D array into three 2-D arrays.
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])

newarr = np.array_split(arr, 3)

print(newarr)

[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]


In [56]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])

newarr = np.array_split(arr, 3)

print(newarr)

[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 7,  8,  9],
       [10, 11, 12]]), array([[13, 14, 15],
       [16, 17, 18]])]


In [57]:
#Split the 2-D array into three 2-D arrays along columns.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])

newarr = np.array_split(arr, 3, axis=1)

print(newarr)

[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]


In [58]:
#Use the hsplit() method to split the 2-D array into three 2-D arrays along columns.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])

newarr = np.hsplit(arr, 3)

print(newarr)

[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]


- Note: Similar alternates to vstack() and dstack() are available as vsplit() and dsplit().
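
A brief sketch of vsplit() and dsplit(), using small made-up arrays named `table` and `cube`. Note that dsplit() needs an array with at least three dimensions.

```python
import numpy as np

table = np.arange(1, 13).reshape(4, 3)  # 4 rows, 3 columns
top, bottom = np.vsplit(table, 2)       # split along rows (axis 0)

print(top)
print(bottom)

cube = np.arange(8).reshape(2, 2, 2)
front, back = np.dsplit(cube, 2)        # split along the third axis (depth)

print(front.shape)  # (2, 2, 1)
```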

### Handling Missing / Special Values

In [109]:
np.nan

nan

In [110]:
arr = np.array([1, 2, 3, 4, 5, 6])
np.isnan(arr)

array([False, False, False, False, False, False])

In [111]:
np.isinf(arr)

array([False, False, False, False, False, False])

In [112]:
np.nanmean(arr)

3.5

### Broadcasting 

In [113]:
arr = np.array([[1,2,3],[4,5,6]])
arr + np.array([10,20,30])


array([[11, 22, 33],
       [14, 25, 36]])
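
In the cell above, the 1-D array of shape (3,) is stretched across both rows of the (2, 3) array. The following sketch, with made-up names `grid` and `per_row`, shows the same idea with a column of shape (2, 1), and what happens when the shapes are not compatible.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
per_row = np.array([[100], [200]])        # shape (2, 1)

# The (2, 1) column is repeated across the 3 columns of grid
print(grid + per_row)

# Shapes (2, 3) and (2,) do not line up on the last axis, so NumPy raises an error
try:
    grid + np.array([10, 20])
    mismatch = False
except ValueError:
    mismatch = True

print(mismatch)  # True
```

Shapes are compared from the last axis backwards; two sizes are compatible when they are equal or when one of them is 1.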

#### Some conditions very useful in Exploratory Data Analysis 

In [114]:
val=2

arr[arr<3]

array([1, 2])

In [115]:
# Create arrays and reshape

np.arange(0,10).reshape(5,2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [116]:
arr1=np.arange(0,10).reshape(2,5)

In [117]:
arr2=np.arange(0,10).reshape(2,5)

In [118]:
arr1 * arr2

array([[ 0,  1,  4,  9, 16],
       [25, 36, 49, 64, 81]])

In [1]:
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr[0, 1, 2])

6


In [3]:
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr[1, 1, 2])


12


In [None]:
# arr[0, 1, 2] prints the value 6.

# And this is why:

# The first number represents the first dimension, which contains two arrays:
# [[1, 2, 3], [4, 5, 6]]
# and:
# [[7, 8, 9], [10, 11, 12]]
# Since we selected 0, we are left with the first array:
# [[1, 2, 3], [4, 5, 6]]

# The second number represents the second dimension, which also contains two arrays:
# [1, 2, 3]
# and:
# [4, 5, 6]
# Since we selected 1, we are left with the second array:
# [4, 5, 6]

# The third number represents the third dimension, which contains three values:
# 4
# 5
# 6
# Since we selected 2, we end up with the third value:
# 6
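
The explanation above can be followed one step at a time in code. This sketch picks out each dimension separately; the names `block`, `row` and `value` are illustrative.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

block = arr[0]  # first dimension  -> [[1, 2, 3], [4, 5, 6]]
row = block[1]  # second dimension -> [4, 5, 6]
value = row[2]  # third dimension  -> 6

print(value)                  # 6
print(value == arr[0, 1, 2])  # True
```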

## Data Types in NumPy
NumPy has some extra data types, and refers to data types with one character, like i for integers, u for unsigned integers, etc.

Below is a list of all data types in NumPy and the characters used to represent them.

- i - integer
- b - boolean
- u - unsigned integer
- f - float
- c - complex float
- m - timedelta
- M - datetime
- O - object
- S - string
- U - unicode string
- V - fixed chunk of memory for other type ( void )


In [14]:
arr = np.array([1, 2, 3, 4])

print(arr.dtype)

int32


In [15]:
arr = np.array(['apple', 'banana', 'cherry'])

print(arr.dtype)

<U6


In [16]:
# We use the array() function to create arrays, this function can take an optional argument: dtype that allows us to define the expected data type of the array elements:
arr = np.array([1, 2, 3, 4], dtype='S')

print(arr)
print(arr.dtype)

[b'1' b'2' b'3' b'4']
|S1


| dtype   | Meaning        |
| ------- | -------------- |
| `'S'`   | Byte string    |
| `'U'`   | Unicode string |
| `int`   | Integer        |
| `float` | Decimal number |
| `b'1'`  | Byte-string value as printed (the `b` prefix is not a dtype) |

- dtype='S' creates fixed-length byte strings, while dtype='U' creates fixed-length Unicode strings.
- dtype=str is internally treated as Unicode ('U').
- For text processing, dtype='U' or str is preferred.
- S1 indicates a byte string of length 1: each integer was converted to a one-character byte string.
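
A small sketch comparing the three string options on the same input. The array names are made up; the number after S or U in the dtype is the length of the longest string.

```python
import numpy as np

words = ['apple', 'kiwi']

as_bytes = np.array(words, dtype='S')  # fixed-length byte strings
as_text = np.array(words, dtype='U')   # fixed-length Unicode strings
as_str = np.array(words, dtype=str)    # str is treated as Unicode

print(as_bytes, as_bytes.dtype)
print(as_text, as_text.dtype)
print(as_str.dtype == as_text.dtype)  # True
```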

In [17]:
#Create an array with data type 4 bytes integer:

import numpy as np

arr = np.array([1, 2, 3, 4], dtype='i4')

print(arr)
print(arr.dtype)

[1 2 3 4]
int32


In [18]:
#A non integer string like 'a' can not be converted to integer (will raise an error):

import numpy as np

arr = np.array(['a', '2', '3'], dtype='i')

ValueError: invalid literal for int() with base 10: 'a'

## Converting Data Type on Existing Arrays
1. The best way to change the data type of an existing array, is to make a copy of the array with the astype() method.

2. The astype() function creates a copy of the array, and allows you to specify the data type as a parameter.

3. The data type can be specified using a string, like 'f' for float, 'i' for integer etc. or you can use the data type directly like float for float and int for integer.

In [19]:
import numpy as np

arr = np.array([1.1, 2.1, 3.1])

newarr = arr.astype('i')

print(newarr)
print(newarr.dtype)

[1 2 3]
int32


In [20]:
arr = np.array([1.1, 2.1, 3.1])

newarr = arr.astype(int)

print(newarr)
print(newarr.dtype)

[1 2 3]
int32


In [21]:
# Change data type from integer to boolean:

import numpy as np

arr = np.array([1, 0, 3])

newarr = arr.astype(bool)

print(newarr)
print(newarr.dtype)

[ True False  True]
bool


## The Difference Between Copy and View
- The main difference between a copy and a view of an array is that the copy is a new array, and the view is just a view of the original array.

- The copy owns the data and any changes made to the copy will not affect original array, and any changes made to the original array will not affect the copy.

- The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.

In [22]:
#Make a copy, change the original array, and display both arrays:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42

print(arr)
print(x)

[42  2  3  4  5]
[1 2 3 4 5]


In [23]:
#Make a view, change the original array, and display both arrays:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42

print(arr)
print(x)

[42  2  3  4  5]
[42  2  3  4  5]


### Check if Array Owns its Data
- As mentioned above, copies own the data and views do not own the data, but how can we check this?

- Every NumPy array has the attribute base that returns None if the array owns the data.

- Otherwise, the base attribute refers to the original object.

In [24]:
## The copy returns None.
## The view returns the original array.
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

x = arr.copy()
y = arr.view()

print(x.base)
print(y.base)

None
[1 2 3 4 5]
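
The same check explains why assigning to a slice changes the original array: basic slicing returns a view. The sketch below contrasts it with indexing by a list of positions, which returns a copy. The names `part` and `picked` are made up for this example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

part = arr[1:4]          # basic slicing returns a view
picked = arr[[1, 2, 3]]  # indexing with a list of positions returns a copy

print(part.base is arr)                # True
print(np.shares_memory(arr, picked))   # False

part[0] = 99  # changes arr as well
print(arr)    # [ 1 99  3  4  5]
```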


## Shape of an Array
The shape of an array is the number of elements in each dimension.

### Get the Shape of an Array
NumPy arrays have an attribute called shape that returns a tuple with each index having the number of corresponding elements.

In [25]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)

(2, 4)


- Create an array with 5 dimensions using ndmin, starting from a vector with the values 1, 2, 3, 4, and verify that the last dimension has size 4:

In [28]:
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('shape of array :', arr.shape)

[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)


- (.ndim) returns the number of dimensions (axes) of the array, while (.shape) returns a tuple representing the size of the array along each dimension.

#### 2D Array

In [31]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) 
print(arr.ndim)

(2, 3)
2


#### 3D Array

In [30]:
arr = np.array([[[1,2],[3,4]]])
print(arr.ndim)
print(arr.shape)

3
(1, 2, 2)


shape (1, 2, 2) means:
- 1 block

- Each block has 2 rows

- Each row has 2 columns

### Iterating Arrays
- Iterating means going through elements one by one.

- As we deal with multi-dimensional arrays in NumPy, we can do this using the basic for loop of Python.

- If we iterate on a 1-D array it will go through each element one by one.

In [38]:
#Iterate on the elements of the following 1-D array:
arr = np.array([1, 2, 3])

for x in arr:
    print(x)

1
2
3


In [39]:
#Iterate on the elements of the following 2-D array:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    print(x)

[1 2 3]
[4 5 6]


In [40]:
#To return the actual values, the scalars, we have to iterate the arrays in each dimension.



import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    for y in x:
        print(y)

1
2
3
4
5
6


In [41]:
# Iterate on the elements of the following 3-D array:

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
    print(x)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]


In [42]:
#Iterate down to the scalars:

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
    for y in x:
        for z in y:
            print(z)

1
2
3
4
5
6
7
8
9
10
11
12


### Iterating Arrays Using nditer()
- The function **nditer()** is a helping function that can be used from very basic to very advanced iterations. It solves some basic issues which we face in iteration; let's go through it with examples.

#### Iterating on Each Scalar Element
- With basic for loops, iterating through each scalar of an n-dimensional array requires n nested for loops, which can be difficult to write for arrays with very high dimensionality.

In [43]:
# Iterate through the following 3-D array:

import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for x in np.nditer(arr):
  print(x)

1
2
3
4
5
6
7
8


### Iterating Array With Different Data Types
We can use **op_dtypes** argument and pass it the expected datatype to change the datatype of elements while iterating.

NumPy does not change the data type of the element in-place (where the element is in array) so it needs some other space to perform this action, that extra space is called buffer, and in order to enable it in **nditer() we pass flags=['buffered']**.

In [45]:
#Iterate through the array as a string:

import numpy as np

arr = np.array([1, 2, 3])

for x in np.nditer(arr, flags=['buffered'], op_dtypes=['S']):
    print(x)

b'1'
b'2'
b'3'


In [46]:
#Iterate through every scalar element of the 2D array skipping 1 element:

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for x in np.nditer(arr[:, ::2]):
    print(x)

1
3
5
7


### Enumerated Iteration Using ndenumerate()
- Enumeration means mentioning the sequence number of things one by one.

- Sometimes we require the corresponding index of the element while iterating; the ndenumerate() method can be used for those use cases.

In [47]:
#Enumerate on the following 1D array's elements:

import numpy as np

arr = np.array([1, 2, 3])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0,) 1
(1,) 2
(2,) 3


In [48]:
#Enumerate on following 2D array's elements:

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8


### Searching Arrays
- You can search an array for a certain value, and return the indexes that get a match.

- To search an array, use the where() method.

In [59]:
arr = np.array([1, 2, 3, 4, 5, 4, 4])

x = np.where(arr == 4)

print(x)

(array([3, 5, 6], dtype=int64),)


In [60]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

x = np.where(arr%2 == 0)

print(x)

(array([1, 3, 5, 7], dtype=int64),)


In [61]:
arr = np.array([6, 7, 8, 9])

x = np.searchsorted(arr, 7)

print(x)

1


In [62]:
#Find the indexes where the value 7 should be inserted, starting from the right:
arr = np.array([6, 7, 8, 9])

x = np.searchsorted(arr, 7, side='right')

print(x)

2


In [63]:
arr = np.array([1, 3, 5, 7])

x = np.searchsorted(arr, [2, 4, 6])

print(x)

[1 2 3]


The return value is an array: [1 2 3] containing the three indexes where 2, 4, 6 would be inserted in the original array to maintain the order.
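
To see that these indexes really keep the order, the sketch below feeds them to np.insert(), which places each new value before the given position of the original array. The names `new_values`, `positions` and `merged` are illustrative.

```python
import numpy as np

arr = np.array([1, 3, 5, 7])
new_values = [2, 4, 6]

positions = np.searchsorted(arr, new_values)  # [1 2 3]

# Each value is inserted before the matching position of the original array
merged = np.insert(arr, positions, new_values)

print(merged)  # [1 2 3 4 5 6 7]
```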

### Sorting Arrays
- Sorting means putting elements in an ordered sequence.

- Ordered sequence is any sequence that has an order corresponding to elements, like numeric or alphabetical, ascending or descending.

- NumPy has a function called sort() that returns a sorted copy of a specified array. The ndarray object also has a sort() method, which sorts the array in place.

In [64]:
arr = np.array([3, 2, 0, 1])

print(np.sort(arr))

[0 1 2 3]


- Note: np.sort() returns a sorted copy of the array, leaving the original array unchanged.
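
A minimal sketch of the two ways to sort: the np.sort() function, which leaves the original alone, and the sort() method of the array, which reorders the array itself. Reversing with [::-1] is one common way to get descending order; the names `sorted_copy` and `descending` are made up here.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)  # new array; arr keeps its order
print(arr)                  # [3 2 0 1]
print(sorted_copy)          # [0 1 2 3]

arr.sort()                  # method form: sorts arr itself and returns None
print(arr)                  # [0 1 2 3]

descending = np.sort(arr)[::-1]  # sort ascending, then reverse
print(descending)                # [3 2 1 0]
```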

In [65]:
arr = np.array(['banana', 'cherry', 'apple'])

print(np.sort(arr))

['apple' 'banana' 'cherry']


In [66]:
arr = np.array([True, False, True])

print(np.sort(arr))

[False  True  True]


In [67]:
arr = np.array([[3, 2, 4], [5, 0, 1]])

print(np.sort(arr))

[[2 3 4]
 [0 1 5]]


### Filtering Arrays
- Getting some elements out of an existing array and creating a new array out of them is called filtering.

- In NumPy, you filter an array using a boolean index list.

    - A boolean index list is a list of booleans corresponding to indexes in the array.

- If the value at an index is True that element is contained in the filtered array, if the value at that index is False that element is excluded from the filtered array.

In [68]:
arr = np.array([41, 42, 43, 44])

x = [True, False, True, False]

newarr = arr[x]

print(newarr)

[41 43]


- The new array contains only the values where the filter array had the value True, in this case indexes 0 and 2.

In [69]:
import numpy as np

arr = np.array([41, 42, 43, 44])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
    
    # if the element is higher than 42, set the value to True, otherwise False:
    if element > 42:
        filter_arr.append(True)
    else:
        filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False, False, True, True]
[43 44]


In [71]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
    # if the element is completely divisible by 2, set the value to True, otherwise False
    if element % 2 == 0:
        filter_arr.append(True)
    else:
        filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False, True, False, True, False, True, False]
[2 4 6]


In [72]:
import numpy as np

arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False False  True  True]
[43 44]


### Random Numbers in NumPy

In [73]:
# Generate a random integer from 0 to 99 (the upper bound 100 is excluded):
from numpy import random

x = random.randint(100)

print(x)

70


In [75]:
#Generate a random float from 0 to 1:

from numpy import random

x = random.rand()

print(x)

0.6277402374603388


In [77]:
#Generate a 1-D array containing 5 random integers from 0 to 99:

from numpy import random

x=random.randint(100, size=(5))

print(x)

[33 70 52  4 72]


In [79]:
#Generate a 2-D array with 3 rows, each row containing 5 random integers from 0 to 99:

from numpy import random

x = random.randint(100, size=(3, 5))

print(x)

[[73 60 32  3 18]
 [37 73 41 12 23]
 [87 76 71 57 83]]


In [80]:
#Generate a 1-D array containing 5 random floats:

from numpy import random

x = random.rand(5)

print(x)

[0.32794916 0.90136115 0.76708027 0.33391183 0.6087352 ]


In [82]:
#Generate a 2-D array with 3 rows, each row containing 5 random floats:

from numpy import random

x = random.rand(3,5)

print(x)

[[0.09482498 0.29751199 0.1428368  0.73461042 0.44525366]
 [0.27793656 0.63339796 0.08207848 0.44897731 0.85200079]
 [0.79571343 0.71077341 0.21161842 0.58289421 0.10019994]]


In [83]:
#Return one of the values in an array:

from numpy import random

x = random.choice([3, 5, 7, 9])

print(x)

5


In [84]:
#Generate a 2-D array that consists of the values in the array parameter (3, 5, 7, and 9):

from numpy import random

x = random.choice([3, 5, 7, 9], size=(3, 5))

print(x)

[[7 9 7 3 9]
 [3 7 9 9 3]
 [7 5 5 7 7]]


### What is Data Distribution?
- Data Distribution is a list of all possible values, and how often each value occurs.

- Such lists are important when working with statistics and data science.

- The random module offers methods that return randomly generated data distributions.



#### Random Distribution
- A random distribution is a set of random numbers that follow a certain probability density function.

In [85]:
# Generate a 1-D array containing 100 values, where each value has to be 3, 5, 7 or 9.

# The probability for the value to be 3 is set to be 0.1

# The probability for the value to be 5 is set to be 0.3

# The probability for the value to be 7 is set to be 0.6

# The probability for the value to be 9 is set to be 0

from numpy import random

x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))

print(x)

[7 7 5 7 7 7 7 3 5 7 5 7 7 5 7 7 7 7 7 7 5 7 5 5 5 7 7 5 7 7 5 7 7 7 7 7 7
 5 5 7 5 3 7 7 7 5 3 5 7 5 7 7 5 7 5 7 3 7 7 7 7 7 7 5 3 7 7 7 5 7 5 7 5 7
 7 7 5 3 5 5 5 5 5 7 7 3 5 5 7 3 5 5 5 5 3 5 3 7 5 7]


In [86]:
from numpy import random

x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(3, 5))

print(x)

[[7 7 5 3 7]
 [7 7 7 5 3]
 [7 7 3 7 7]]


- Here 9 will never occur because the probability of 9 is 0.0.
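
With a larger sample, the observed frequencies can be compared with the probabilities that were passed in. This is a sketch only: the fixed seed and the sample size of 1000 are assumptions made for repeatability, not part of the example above.

```python
import numpy as np
from numpy import random

random.seed(0)  # fixed seed so the run is repeatable (an assumption for this sketch)

x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=1000)

# Count how often each value occurs and print its share of the sample
values, counts = np.unique(x, return_counts=True)
for value, count in zip(values, counts):
    print(value, count / x.size)
```

The printed shares should land near 0.1, 0.3 and 0.6, and 9 does not appear at all.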

### Random Permutations of Elements
- A permutation refers to an arrangement of elements. e.g. [3, 2, 1] is a permutation of [1, 2, 3] and vice-versa.

- The NumPy Random module provides two methods for this: shuffle() and permutation().

#### Shuffling Arrays
- Shuffle means changing arrangement of elements in-place. i.e. in the array itself.

In [87]:
from numpy import random
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

random.shuffle(arr)

print(arr)

[5 4 3 1 2]


In [88]:
from numpy import random
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(random.permutation(arr))

[2 1 3 4 5]
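
The two cells above differ in what happens to the original array. A sketch that makes this visible (the names `shuffled_copy`, `unchanged` and `result` are made up here):

```python
import numpy as np
from numpy import random

arr = np.array([1, 2, 3, 4, 5])

shuffled_copy = random.permutation(arr)  # returns a new, rearranged array
unchanged = arr.tolist() == [1, 2, 3, 4, 5]
print(unchanged)       # True: permutation() left arr as it was
print(shuffled_copy)

result = random.shuffle(arr)  # rearranges arr itself
print(result)          # None: shuffle() works in place and returns nothing
print(arr)
```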


### NumPy ufuncs


#### What are ufuncs?
- ufuncs stands for "Universal Functions" and they are NumPy functions that operate on the ndarray object.

#### Why use ufuncs?
- ufuncs are used to implement vectorization in NumPy, which is much faster than iterating over elements.

- They also provide broadcasting and additional methods like reduce, accumulate etc. that are very helpful for computation.

ufuncs also take additional arguments, like:

- **where** boolean array or condition defining where the operations should take place.

- **dtype** defining the return type of elements.

- **out** output array where the return value should be copied.
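
The methods and arguments listed above can be tried on the add ufunc. This is a minimal sketch with made-up arrays `x` and `y`; the output array is filled with zeros first so that positions skipped by where have a known value.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

total = np.add.reduce(x)        # 1 + 2 + 3 + 4
running = np.add.accumulate(x)  # running totals

# out: write the result into an existing array
# where: only compute positions where the mask is True
result = np.zeros(4, dtype=int)
np.add(x, y, out=result, where=x > 2)

# dtype: choose the type of the returned elements
as_float = np.add(x, y, dtype=float)

print(total)     # 10
print(running)   # [ 1  3  6 10]
print(result)    # [ 0  0 33 44]
print(as_float)  # [11. 22. 33. 44.]
```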

#### What is Vectorization?
- Converting iterative statements into a vector based operation is called vectorization.

- It is faster as modern CPUs are optimized for such operations.

In [91]:
#Without ufunc, we can use Python's built-in zip() method:

x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = []

for i, j in zip(x, y):
    z.append(i + j)
print(z)

[5, 7, 9, 11]


In [92]:
#NumPy has a ufunc for this, called add(x, y) that will produce the same result.
#With ufunc, we can use the add() function:

import numpy as np

x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = np.add(x, y)

print(z)

[ 5  7  9 11]


In [93]:
#Create your own ufunc for addition:
# function - the name of the function. - myadd
# inputs - the number of input arguments (arrays). -2
# outputs - the number of output arrays.- 1
import numpy as np

def myadd(x, y):
    return x+y

myadd = np.frompyfunc(myadd, 2, 1)

print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))

[6 8 10 12]
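
Two details of frompyfunc() are worth checking: the object it returns is a real ufunc, and the arrays it produces have the generic object dtype, so a conversion with astype() is often useful afterwards. The name `myadd_ufunc` is used in this sketch to keep the plain function and the ufunc apart.

```python
import numpy as np

def myadd(x, y):
    return x + y

myadd_ufunc = np.frompyfunc(myadd, 2, 1)

print(type(myadd))        # <class 'function'>
print(type(myadd_ufunc))  # <class 'numpy.ufunc'>
print(type(np.add))       # <class 'numpy.ufunc'>

result = myadd_ufunc([1, 2, 3, 4], [5, 6, 7, 8])
print(result.dtype)       # object

as_int = result.astype(int)  # convert to a numeric dtype
print(as_int)                # [ 6  8 10 12]
```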


### Rounding Decimals
There are primarily five ways of rounding off decimals in NumPy:

- truncation
- fix
- rounding
- floor
- ceil


In [94]:
arr = np.trunc([-3.1666, 3.6667])

print(arr)

[-3.  3.]


In [95]:
arr = np.fix([-3.1666, 3.6667])

print(arr)

[-3.  3.]


In [96]:
#Round off 3.1666 to 2 decimal places:

import numpy as np

arr = np.around(3.1666, 2)

print(arr)

3.17
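
A point that is easy to miss with `around()`: values exactly halfway between two integers are rounded to the nearest even integer, not always upward. A small sketch (the array name `halves` is only for illustration):

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# Halfway cases go to the nearest even integer
rounded = np.around(halves)
print(rounded)  # [0. 2. 2. 4.]
```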


In [97]:
#The floor() function rounds each value down to the nearest integer that is less than or equal to it:


import numpy as np

arr = np.floor([-3.1666, 3.6667])

print(arr)

[-4.  3.]


In [98]:
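#The ceil() function rounds each value up to the nearest integer that is greater than or equal to it:

import numpy as np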

arr = np.ceil([-3.1666, 3.6667])

print(arr)

[-3.  4.]


### Logs
NumPy provides functions to compute logarithms at base 2, e and 10.

We will also explore how we can take the log at any base by creating a custom ufunc.

The log functions place -inf in the result where the input is 0 and nan where the input is negative, since no finite real logarithm exists for those values.
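
A short sketch of what the log functions return for inputs that have no finite real logarithm: 0 gives -inf and a negative number gives nan. NumPy also emits a RuntimeWarning in both cases; `np.errstate` is used here only to keep the output clean.

```python
import numpy as np

values = np.array([1.0, 0.0, -1.0])

# Silence the divide-by-zero and invalid-value warnings for this block only
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(values)

print(logs)               # log(1) = 0, log(0) = -inf, log(-1) = nan
print(np.isneginf(logs))  # True only for the input 0
print(np.isnan(logs))     # True only for the negative input
```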

In [101]:
#Find log at base 2 of all elements of following array:

import numpy as np

arr = np.arange(1, 11)

print(np.log2(arr))

[0.         1.         1.5849625  2.         2.32192809 2.5849625
 2.80735492 3.         3.169925   3.32192809]


In [102]:
#Find log at base 10 of all elements of following array:

import numpy as np

arr = np.arange(1, 10)

print(np.log10(arr))

[0.         0.30103    0.47712125 0.60205999 0.69897    0.77815125
 0.84509804 0.90308999 0.95424251]


In [103]:
#Find log at base e of all elements of following array:

import numpy as np

arr = np.arange(1, 10)

print(np.log(arr))


[0.         0.69314718 1.09861229 1.38629436 1.60943791 1.79175947
 1.94591015 2.07944154 2.19722458]


- NumPy does not provide a function that takes the log at an arbitrary base, so we can wrap the built-in math.log(), which accepts a value and a base, using frompyfunc() with two input parameters and one output parameter:

In [105]:
from math import log
import numpy as np

nplog = np.frompyfunc(log, 2, 1)

print(nplog(100, 15))


1.7005483074552052
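
As an alternative to wrapping `math.log()`, the change-of-base identity log_b(x) = ln(x) / ln(b) can be written with NumPy's own `log()`, which keeps the computation vectorized and returns a float array. This is a sketch; `log_base` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def log_base(x, base):
    """Hypothetical helper: log of x at an arbitrary base via change of base."""
    return np.log(x) / np.log(base)

single = log_base(100, 15)
print(single)  # approximately 1.7005, matching the frompyfunc() result above

arr = np.array([1, 15, 225])
powers = log_base(arr, 15)
print(powers)  # approximately [0, 1, 2], since 15**2 == 225
```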


### NumPy Summations

In [106]:
#Add the values in arr1 to the values in arr2, element by element:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 3])

newarr = np.add(arr1, arr2)

print(newarr)

[2 4 6]


In [107]:
#Sum all the values in arr1 and arr2 into a single number:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 3])

newarr = np.sum([arr1, arr2])

print(newarr)

12


In [108]:
#Perform summation over axis 1 (one sum per input array, i.e. per row):

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 3])

newarr = np.sum([arr1, arr2], axis=1)

print(newarr)

[6 6]
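
To make the role of `axis` easier to see, the sketch below uses two rows with different values, so the per-column and per-row sums differ. The variable names are only for illustration.

```python
import numpy as np

stacked = np.array([[1, 2, 3],
                    [4, 5, 6]])

col_sums = np.sum(stacked, axis=0)  # collapse the rows: one sum per column
row_sums = np.sum(stacked, axis=1)  # collapse the columns: one sum per row
total = np.sum(stacked)             # no axis: sum of every element

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
print(total)     # 21
```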


#### Cumulative Sum
- Cumulative sum means adding the elements of an array one at a time and keeping each partial result.

- E.g. the partial sums of [1, 2, 3, 4] would be [1, 1+2, 1+2+3, 1+2+3+4] = [1, 3, 6, 10].

- Perform a partial sum with the cumsum() function.

In [109]:
arr = np.array([1, 2, 3])

newarr = np.cumsum(arr)

print(newarr)

[1 3 6]
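
Two properties of the partial sums can be checked directly: the last partial sum equals the total, and `cumsum()` gives the same result as the `accumulate` method of the `add` ufunc mentioned earlier. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

running = np.cumsum(arr)
print(running)  # [ 1  3  6 10]

# The last partial sum is the sum of the whole array
print(running[-1] == np.sum(arr))  # True

# accumulate on the add ufunc produces the same partial sums
via_ufunc = np.add.accumulate(arr)
print(np.array_equal(running, via_ufunc))  # True
```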


In [110]:
#Find the product of the elements of this array:

import numpy as np

arr = np.array([1, 2, 3, 4])

x = np.prod(arr)

print(x)


24


In [111]:
#Find the product of the elements of two arrays:

import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

x = np.prod([arr1, arr2])

print(x)

40320


In [112]:
#Find the product over axis 1 (one product per input array, i.e. per row):

import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

newarr = np.prod([arr1, arr2], axis=1)

print(newarr)


[  24 1680]


In [113]:
#Take the cumulative product of all elements of the following array:

import numpy as np

arr = np.array([5, 6, 7, 8])

newarr = np.cumprod(arr)

print(newarr)

[   5   30  210 1680]


### Finding LCM (Lowest Common Multiple)
The Lowest Common Multiple is the smallest positive number that is a multiple of both numbers.

In [114]:
num1 = 4
num2 = 6

x = np.lcm(num1, num2)

print(x)

12


In [115]:
#To find the LCM of all values in an array, use the reduce() method:

arr = np.array([3, 6, 9])

x = np.lcm.reduce(arr)

print(x)

18


In [116]:
#Find the LCM of all values of an array where the array contains all integers from 1 to 10:

import numpy as np

arr = np.arange(1, 11)

x = np.lcm.reduce(arr)

print(x)

2520


### Finding GCD (Greatest Common Divisor)
The GCD (Greatest Common Divisor), also known as HCF (Highest Common Factor) is the biggest number that is a common factor of both of the numbers.

In [117]:
num1 = 6
num2 = 9

x = np.gcd(num1, num2)

print(x)

3


In [118]:
arr = np.array([20, 8, 32, 36, 16])

x = np.gcd.reduce(arr)

print(x)

4
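
Both `gcd()` and `lcm()` are ufuncs, so they also work element by element on two arrays. The sketch below additionally checks the identity gcd(a, b) * lcm(a, b) = a * b for positive integers; the sample values are arbitrary.

```python
import numpy as np

a = np.array([4, 6, 21])
b = np.array([6, 9, 14])

gcds = np.gcd(a, b)
lcms = np.lcm(a, b)

print(gcds)  # [2 3 7]
print(lcms)  # [12 18 42]

# For positive integers, gcd * lcm equals the product of the two numbers
print(np.array_equal(gcds * lcms, a * b))  # True
```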


### Trigonometric Functions
NumPy provides the ufuncs sin(), cos() and tan() that take values in radians and produce the corresponding sin, cos and tan values.

In [119]:
x = np.sin(np.pi/2)

print(x)

1.0


In [120]:
arr = np.array([np.pi/2, np.pi/3, np.pi/4, np.pi/5])

x = np.sin(arr)

print(x)

[1.         0.8660254  0.70710678 0.58778525]


In [121]:
#Convert all of the values in following array arr to radians:

import numpy as np

arr = np.array([90, 180, 270, 360])

x = np.deg2rad(arr)

print(x)

[1.57079633 3.14159265 4.71238898 6.28318531]


In [122]:
#Convert all of the values in following array arr to degrees:

import numpy as np

arr = np.array([np.pi/2, np.pi, 1.5*np.pi, 2*np.pi])

x = np.rad2deg(arr)

print(x)

[ 90. 180. 270. 360.]
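
Since the trigonometric ufuncs expect radians, angles given in degrees have to be converted first. A small sketch combining `deg2rad()` with `sin()`:

```python
import numpy as np

degrees = np.array([0, 30, 90])

# Convert to radians before calling sin()
sines = np.sin(np.deg2rad(degrees))

print(sines)  # approximately [0, 0.5, 1]
```

Passing the degree values directly to `np.sin()` would silently treat them as radians and give unrelated results.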


#### Finding Angles
- Finding angles from sine, cosine and tangent values is done with the inverse functions (arcsin, arccos, arctan).

- NumPy provides the ufuncs arcsin(), arccos() and arctan(), which return the angle in radians for the given sin, cos and tan values.

In [123]:
x = np.arcsin(1.0)

print(x)

1.5707963267948966


In [124]:
arr = np.array([1, -1, 0.1])

x = np.arcsin(arr)

print(x)

[ 1.57079633 -1.57079633  0.10016742]
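
The inverse functions return radians, so `rad2deg()` is useful when the angle is wanted in degrees. The sketch also shows that `arcsin()` returns nan for an input outside [-1, 1], where no real angle exists; `np.errstate` only suppresses the accompanying warning.

```python
import numpy as np

ratios = np.array([0.0, 0.5, 1.0])

# arcsin returns radians; convert to degrees for readability
angles_deg = np.rad2deg(np.arcsin(ratios))
print(angles_deg)  # approximately [0, 30, 90]

# Inputs outside [-1, 1] have no real arcsin
with np.errstate(invalid="ignore"):
    outside = np.arcsin(2.0)
print(outside)  # nan
```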


#### Hypotenuse
Finding the hypotenuse using the Pythagorean theorem in NumPy.

- NumPy provides the hypot() function, which takes the base and perpendicular values and returns the hypotenuse based on the Pythagorean theorem.

In [125]:
base = 3
perp = 4

x = np.hypot(base, perp)

print(x)

5.0
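
`hypot()` is a ufunc as well, so it accepts arrays and computes one hypotenuse per pair of values. The sketch compares it with the formula sqrt(base² + perp²) written out by hand; the side lengths are sample values.

```python
import numpy as np

bases = np.array([3.0, 5.0, 8.0])
perps = np.array([4.0, 12.0, 15.0])

hyps = np.hypot(bases, perps)
print(hyps)  # [ 5. 13. 17.]

# Same result as applying the Pythagorean theorem directly
manual = np.sqrt(bases**2 + perps**2)
print(np.allclose(hyps, manual))  # True
```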


### Hyperbolic Functions
- NumPy provides the ufuncs sinh(), cosh() and tanh() that take numeric values and produce the corresponding sinh, cosh and tanh values.

In [126]:
x = np.sinh(np.pi/2)

print(x)

2.3012989023072947


In [127]:
arr = np.array([np.pi/2, np.pi/3, np.pi/4, np.pi/5])

x = np.cosh(arr)

print(x)

[2.50917848 1.60028686 1.32460909 1.20397209]


In [128]:
#Find the inverse hyperbolic sine (arcsinh) of 1.0:

import numpy as np

x = np.arcsinh(1.0)

print(x)

0.881373587019543


In [129]:
#Find the inverse hyperbolic tangent (arctanh) of all the values in the array:

import numpy as np

arr = np.array([0.1, 0.2, 0.5])

x = np.arctanh(arr)

print(x)

[0.10033535 0.20273255 0.54930614]
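
The hyperbolic functions are defined through the exponential function: sinh(x) = (e^x - e^-x) / 2 and cosh(x) = (e^x + e^-x) / 2. The sketch below checks NumPy's results against these definitions and confirms that `arcsinh()` undoes `sinh()`; the input values are arbitrary.

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0])

# Definitions in terms of the exponential function
sinh_manual = (np.exp(x) - np.exp(-x)) / 2
cosh_manual = (np.exp(x) + np.exp(-x)) / 2

print(np.allclose(np.sinh(x), sinh_manual))  # True
print(np.allclose(np.cosh(x), cosh_manual))  # True

# arcsinh is the inverse of sinh
round_trip = np.arcsinh(np.sinh(x))
print(np.allclose(round_trip, x))  # True
```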


### NumPy Set Operations
- A set in mathematics is a collection of unique elements.

- Sets are useful when a task involves frequent intersection, union and difference operations.

#### Create Sets in NumPy
We can use NumPy's unique() function to find the unique elements of any array and so create a set array. Remember that set arrays should only be 1-D arrays.

In [130]:
arr = np.array([1, 1, 1, 2, 3, 4, 5, 5, 6, 7])

x = np.unique(arr)

print(x)

[1 2 3 4 5 6 7]
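
`unique()` can also report how often each value occurs through its `return_counts` argument. A short sketch using the same array as above:

```python
import numpy as np

arr = np.array([1, 1, 1, 2, 3, 4, 5, 5, 6, 7])

# values are the sorted unique elements, counts their number of occurrences
values, counts = np.unique(arr, return_counts=True)

print(values)  # [1 2 3 4 5 6 7]
print(counts)  # [3 1 1 1 2 1 1]
```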


In [131]:
#To find the unique values across two arrays, use the union1d() function.

#Find union of the following two set arrays:

import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

newarr = np.union1d(arr1, arr2)

print(newarr)

[1 2 3 4 5 6]


In [132]:
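#Find the intersection of the following two set arrays.
#assume_unique=True tells NumPy that neither input contains duplicates, which can speed up the computation: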

import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

newarr = np.intersect1d(arr1, arr2, assume_unique=True)

print(newarr)

[3 4]


In [133]:
#Find the difference of set1 from set2, i.e. the values in set1 that are not present in set2:

import numpy as np

set1 = np.array([1, 2, 3, 4])
set2 = np.array([3, 4, 5, 6])

newarr = np.setdiff1d(set1, set2, assume_unique=True)

print(newarr)

[1 2]


In [134]:
#Find the symmetric difference of the set1 and set2:

import numpy as np

set1 = np.array([1, 2, 3, 4])
set2 = np.array([3, 4, 5, 6])

newarr = np.setxor1d(set1, set2, assume_unique=True)

print(newarr)

[1 2 5 6]
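
The examples above pass `assume_unique=True` because their inputs contain no duplicates. When the inputs may contain repeated values, leaving `assume_unique` at its default of False is the safer choice, as in this sketch (the arrays `a` and `b` are sample data):

```python
import numpy as np

# Inputs with repeated values
a = np.array([1, 2, 2, 3, 4, 4])
b = np.array([3, 4, 4, 5, 6])

# With the default assume_unique=False, duplicates are handled for us
common = np.intersect1d(a, b)
only_a = np.setdiff1d(a, b)
either = np.setxor1d(a, b)
both = np.union1d(a, b)

print(common)  # [3 4]
print(only_a)  # [1 2]
print(either)  # [1 2 5 6]
print(both)    # [1 2 3 4 5 6]
```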
