<a href="https://colab.research.google.com/github/Hatsuhinode/Feature-Engineering/blob/main/Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Arrays in Numpy

A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.

The array class in NumPy is called **ndarray**.

---

In [None]:
import numpy as np

---

## Creating Numpy Array

##### In NumPy, the number of dimensions of an array is called the **rank of the array**.

---

### np.array

#### Rank 1 Array

In [None]:
rank1Array = np.array([10,20,30])
rank1Array

array([10, 20, 30])

In [None]:
type(rank1Array)

numpy.ndarray

In [None]:
print('The dimension of array is : ',rank1Array.ndim)

The dimension of array is :  1


---

#### Rank 2 Array

In [None]:
rank2Array = np.array([[10,20,30],
                       [50,60,70],
                       [80,90,100]])
rank2Array

array([[ 10,  20,  30],
       [ 50,  60,  70],
       [ 80,  90, 100]])

In [None]:
type(rank2Array)

numpy.ndarray

In [None]:
print('The dimension of array is : ',rank2Array.ndim)

The dimension of array is :  2


---

#### Rank 3 Array

In [None]:
rank3Array = np.array([[[10.,20],[30,40]],
                       [[40,50],[60,70]],
                       [[80,90],[100,110]]])
rank3Array

array([[[ 10.,  20.],
        [ 30.,  40.]],

       [[ 40.,  50.],
        [ 60.,  70.]],

       [[ 80.,  90.],
        [100., 110.]]])

**Axis 0**: This is the first dimension and often represents different samples or instances in the data.

**Axis 1**: The second dimension, often denoting features or attributes for each sample.

**Axis 2**: The third dimension, when dealing with a 3D array, represents additional structure or depth in the data.
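
As a small sketch of what these axes mean in practice, reducing along an axis removes that axis from the shape. The names `demo`, `sumAxis0`, and `sumAxis2` are illustrative and not part of the original notebook.

```python
import numpy as np

# Illustrative rank 3 array with shape (3, 2, 2), mirroring rank3Array above
demo = np.array([[[10., 20], [30, 40]],
                 [[40, 50], [60, 70]],
                 [[80, 90], [100, 110]]])

print(demo.shape)        # (3, 2, 2)

# Summing along axis 0 collapses the first dimension (the 3 blocks)
sumAxis0 = demo.sum(axis=0)
print(sumAxis0.shape)    # (2, 2)

# Summing along axis 2 collapses the last dimension (pairs inside each row)
sumAxis2 = demo.sum(axis=2)
print(sumAxis2.shape)    # (3, 2)
```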

In [None]:
type(rank3Array)

numpy.ndarray

In [None]:
print('The dimension of array is : ',rank3Array.ndim)

The dimension of array is :  3


---

#### Boolean array

In [None]:
boolArray1 = np.array([True, False, False, True])
boolArray1

array([ True, False, False,  True])

In [None]:
boolArray1.dtype

dtype('bool')

---

### np.random.choice

In [None]:
boolArray2 = np.random.choice([True, False], size=4)
boolArray2

array([ True,  True,  True,  True])

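The **choice()** function draws random samples from a given one-dimensional array (or sequence) and returns them as a NumPy array. Here **size=4** requests four samples; by default they are drawn with replacement, so the same value can appear more than once.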
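
The sketch below illustrates two optional parameters of **np.random.choice**, `replace` and `p`. The seed value and the variable names are arbitrary choices made for this example.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable

# Sampling without replacement: each value can appear at most once
uniqueSample = np.random.choice([10, 20, 30, 40, 50], size=3, replace=False)
print(uniqueSample)

# Sampling with custom probabilities: True is drawn about 90% of the time
weightedBools = np.random.choice([True, False], size=1000, p=[0.9, 0.1])
print(weightedBools.mean())   # fraction of True values, close to 0.9
```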

---

### Boolean array based on condition

In [None]:
Narray = np.array([1, 2, 3, 4, 5])
result = Narray > 2

In [None]:
print(result)
print(type(result))
print(result.dtype)

[False False  True  True  True]
<class 'numpy.ndarray'>
bool
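
A boolean array like this is most often used as a mask. The following is a minimal sketch with illustrative names (`values`, `mask`) showing how the mask selects and counts matching elements.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
mask = values > 2                  # boolean array with the same shape as values

selected = values[mask]            # keeps only elements where mask is True
matchCount = np.count_nonzero(mask)

print(selected)                    # elements greater than 2
print(matchCount)                  # number of True entries in the mask
```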


---

### np.random.randint

#### Generating random integer array

In [None]:
# Creating a 1D array of 9 random integers between 0 (inclusive) and 50 (exclusive)
intArray = np.random.randint(0,50,9)

print(intArray)
print(type(intArray))

[45  8 38 39 34 31 36  4 28]
<class 'numpy.ndarray'>


In [None]:
# Creating an array of shape (3, 2, 6) with random integers between 1 (inclusive) and 9 (exclusive)
shapeArray = np.random.randint(1, 9, (3, 2, 6))
shapeArray

array([[[6, 2, 2, 5, 3, 6],
        [7, 5, 3, 2, 8, 2]],

       [[7, 5, 3, 7, 8, 4],
        [6, 4, 4, 2, 7, 5]],

       [[1, 2, 3, 2, 8, 5],
        [1, 6, 6, 4, 6, 5]]])

---

### np.random.random

#### Generating random float array

In [None]:
floatArray = np.random.random(5)
# Generating an array of 5 random floats in the half-open interval [0.0, 1.0).
print(floatArray)
print(type(floatArray))

[0.62283401 0.73995152 0.41866343 0.36761763 0.16902881]
<class 'numpy.ndarray'>


---

### np.random.normal

#### Generating array of numbers from normal distribution

In [None]:
NormalArray = np.random.normal(0, 10, 5)
# Generating an array of 5 random numbers from normal distribution.

NormalArray

array([ 5.16084037,  4.5539286 ,  5.93686203,  3.70506334, 13.45378072])

In the above code **np.random.normal**(**0**, **10**, 5):

- 0 is the mean (center) of the distribution.
- 10 is the standard deviation, which determines the spread of the distribution.
- 5 is the number of samples to draw.
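
With only 5 samples the mean and spread are hard to see. As a rough check (the sample size of 100,000 and the seed are arbitrary assumptions for this sketch), a larger sample should have a mean near 0 and a standard deviation near 10.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed for repeatability

# Draw many samples so the empirical statistics approach the parameters
largeSample = np.random.normal(0, 10, 100_000)

sampleMean = largeSample.mean()
sampleStd = largeSample.std()

print('Sample mean :', sampleMean)               # close to 0
print('Sample standard deviation :', sampleStd)  # close to 10
```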

---

## Shape of Numpy Array

In [None]:
NumpArray1 = np.array([[10,20,30],
                    [50,60,70],
                    [80,90,100],
                    [11,22,33]])
print('The shape of array is : ',NumpArray1.shape)

The shape of array is :  (4, 3)


---

## Size of Numpy Array

In [None]:
NumpArray2 = np.array([[10,20,30],
                    [50,60,70],
                    [80,90,100],
                    [11,22,33]])
print('The size of array is : ',NumpArray2.size)

The size of array is :  12


---

## Reshaping Numpy Array

In [None]:
NumpArray3 = np.arange(1, 12, 2)
print(NumpArray3)
print(type(NumpArray3))
print(NumpArray3.shape)

[ 1  3  5  7  9 11]
<class 'numpy.ndarray'>
(6,)


##### **np.arange(start, stop, step)** generates a NumPy array that includes 'start' but excludes 'stop'. The step is the difference between consecutive numbers in the array.
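
A few more calls make the start/stop/step behaviour concrete. The variable names below are illustrative only.

```python
import numpy as np

# stop is excluded: 11 itself is never produced here
upToEleven = np.arange(1, 11, 2)
print(upToEleven.tolist())     # [1, 3, 5, 7, 9]

# With a single argument, start defaults to 0 and step to 1
firstFive = np.arange(5)
print(firstFive.tolist())      # [0, 1, 2, 3, 4]

# A negative step counts downwards; stop (0) is still excluded
countdown = np.arange(10, 0, -3)
print(countdown.tolist())      # [10, 7, 4, 1]
```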

In [None]:
reshapedArr3_1 = NumpArray3.reshape(2, 3)
print(reshapedArr3_1)
print(reshapedArr3_1.shape)

[[ 1  3  5]
 [ 7  9 11]]
(2, 3)


In [None]:
reshapedArr3_2 = NumpArray3.reshape(3, 2)
print(reshapedArr3_2)
print(reshapedArr3_2.shape)

[[ 1  3]
 [ 5  7]
 [ 9 11]]
(3, 2)
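
Reshaping only works when the new shape holds exactly the same number of elements. The sketch below (names such as `base` and `inferred` are illustrative) shows `-1` as a placeholder that NumPy infers, and what happens with an incompatible shape.

```python
import numpy as np

base = np.arange(1, 12, 2)          # 6 elements

# -1 asks NumPy to infer that dimension from the total size
inferred = base.reshape(2, -1)
print(inferred.shape)               # (2, 3)

# An incompatible shape raises ValueError (6 elements cannot fill 4 x 2)
reshapeFailed = False
try:
    base.reshape(4, 2)
except ValueError as err:
    reshapeFailed = True
    print('reshape failed:', err)

# flatten() goes back to a 1D copy
flat = inferred.flatten()
print(flat.shape)                   # (6,)
```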


---

## Indexing in Numpy

In [None]:
# Rank 2 numpy array
nArray1 = np.array([[10,20,30],
                    [50,60,70],
                    [80,90,100]])

In [None]:
# Accessing 70
nArray1[(1,2)]
# (rowNumber, columnNumber)
# Index starts from 0.

70

In [None]:
# Rank 3 numpy array
nArray2 = np.array([[[10,20],[30,40]],
                    [[40,50],[60,70]],
                    [[80,90],[100,110]]])

In [None]:
# Accessing 100
nArray2[(2,1,0)]

100

---

## Slicing in Numpy

In [None]:
# Rank 2 numpy array
nArray3 = np.array([[10,20,30],
                    [50,60,70],
                    [80,90,100]])

In [None]:
# Accessing [10,30]
          # [50,70]


nArray3[:2 , ::2]
# (rowNumber, columnNumber)
# Index starts from 0.

array([[10, 30],
       [50, 70]])

In the above code **nArray3**[**:2** , **::2**]

**:2** in the first position indicates that you're selecting rows up to, but not including, index 2 (from the start of the array).

**::2** in the second position indicates that you're selecting columns with a step size of 2. This means you'll select every other element along the columns starting from the beginning.
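
One detail worth knowing is that a basic slice is a view of the original data, not a copy. The following sketch uses illustrative names (`grid`, `sub`, `independent`) to show the explicit `slice` form of the same selection and the effect of writing through a view.

```python
import numpy as np

grid = np.array([[10, 20, 30],
                 [50, 60, 70],
                 [80, 90, 100]])

# The shorthand :2 and ::2 are equivalent to explicit slice objects
sub = grid[:2, ::2]
subExplicit = grid[slice(None, 2), slice(None, None, 2)]
sameSelection = np.array_equal(sub, subExplicit)
print(sameSelection)          # True

# Basic slices are views: writing to the slice changes the original
sub[0, 0] = -1
print(grid[0, 0])             # -1

# Use copy() when an independent array is needed
independent = grid[:2, ::2].copy()
independent[0, 1] = 999
print(grid[0, 2])             # still 30
```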

In [None]:
# Rank 2 numpy array
nArray4 = np.array([[10,20,30],
                    [50,60,70],
                    [80,90,100],
                    [11,22,33]])

In [None]:
# Accessing [10,30]
          # [11,33]

nArray4[::3 , ::2]

# 3 and 2 are the step sizes along the rows and columns respectively.

array([[10, 30],
       [11, 33]])

In [None]:
# Rank 3 numpy array
nArray5 = np.array([[[10,20],[30,40]],
                    [[40,50],[60,70]],
                    [[80,90],[100,110]]])

In [None]:
# Accessing 100
nArray5[(2,1,0)]

100

In [None]:
# Rank 2 numpy array
nArray6 = np.array([[10,20,30,11,44,77],
                    [50,60,70,22,55,88],
                    [80,90,100,33,66,99],
                    [1,2,3,4,5,6]])

In [None]:
# Accessing [10,30,11,77]
          # [50,70,22,88]
          # [1,3,4,6]

nArray6[[0, 1, 3]][:, [0, 2, 3, 5]]

array([[10, 30, 11, 77],
       [50, 70, 22, 88],
       [ 1,  3,  4,  6]])
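
The chained indexing above selects rows first and columns second. As an alternative sketch, `np.ix_` builds the same row/column cross product in one step. The names `table`, `rows`, and `cols` are illustrative.

```python
import numpy as np

table = np.array([[10, 20, 30, 11, 44, 77],
                  [50, 60, 70, 22, 55, 88],
                  [80, 90, 100, 33, 66, 99],
                  [1, 2, 3, 4, 5, 6]])

rows = [0, 1, 3]
cols = [0, 2, 3, 5]

# Two-step selection, as in the cell above
chained = table[rows][:, cols]

# One-step selection: np.ix_ pairs every chosen row with every chosen column
crossed = table[np.ix_(rows, cols)]
print(np.array_equal(chained, crossed))   # True

# Passing two equal-length lists directly pairs them element by element:
# this picks (0, 0) and (1, 2), not a 2 x 2 block
paired = table[[0, 1], [0, 2]]
print(paired.tolist())                    # [10, 70]
```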

---

## Data types in Numpy

In [None]:
nArray7 = np.array([10, 20])
print('The datatype of numpy array is : ')
print(nArray7.dtype)

The datatype of numpy array is : 
int64


In [None]:
nArray7 = np.array([1.0, 2.0])
print('The datatype of numpy array is : ')
print(nArray7.dtype)

The datatype of numpy array is : 
float64


In [None]:
nArray8 = np.array([1, 2], dtype = np.float64)
print(nArray8.dtype)

float64


In [None]:
nArray8

array([1., 2.])

**int64** and **float64** are specific data types provided by NumPy to represent integers and floating-point numbers respectively, but with a **specific size** (in this case, **64 bits** or **8 bytes**).

In **Python**, **int** has **arbitrary precision** (it grows as large as memory allows), while **float** is typically a 64-bit double-precision number.
**NumPy's** **int64** and **float64** always occupy exactly 64 bits, so an **int64** can overflow where a Python **int** cannot.
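
The fixed size can be inspected directly. This is a minimal sketch with illustrative variable names; it also shows `astype`, which converts an array to another data type.

```python
import numpy as np

# Each int64 / float64 element occupies 8 bytes
int64Bytes = np.dtype(np.int64).itemsize
float64Bytes = np.dtype(np.float64).itemsize
print(int64Bytes, float64Bytes)       # 8 8

# The largest value an int64 can hold
largestInt64 = np.iinfo(np.int64).max
print(largestInt64)                   # 9223372036854775807

# A plain Python int is not limited to 64 bits
bigPythonInt = 2 ** 100
print(bigPythonInt > largestInt64)    # True

# astype converts between data types; float to int drops the fractional part
floats = np.array([1.9, -2.7, 3.5])
asInts = floats.astype(np.int64)
print(asInts.tolist())                # [1, -2, 3]
```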


---

## Searching in Numpy

### np.where

In [None]:
nArray9 = np.array([10,20,30,40,50,60])

In [None]:
index9 = np.where(nArray9 == 40)
index9

(array([3]),)

This is a **tuple with one element**. That single element is a **NumPy array**.

Inside the tuple, there's a NumPy array with one element, which is 3.

In [None]:
index9[0]

array([3])

In [None]:
index9[0].dtype

dtype('int64')

---

In [None]:
nArray10 = np.array([[10,20,30,11,44,77],
                    [50,60,70,22,55,88],
                    [80,90,100,33,66,99],
                    [1,2,3,4,5,6]])

In [None]:
index10 = np.where(nArray10 == 33)
index10

(array([2]), array([3]))

#### np.where returns a tuple of arrays (**instances of ndarray**) that contain the indices where the specified condition is met in the input array.

---

In [None]:
nArray11 = np.array([[10,20,33,11,44,77],
                    [50,60,70,22,55,88],
                    [80,90,100,33,66,99],
                    [1,2,3,4,5,6]])

In [None]:
index11 = np.where(nArray11 == 33)
index11

(array([0, 2]), array([2, 3]))

Variable **index11** is a tuple of two arrays: the first holds the row indices and the second holds the column indices of each match. Here the value 33 is found at positions (0, 2) and (2, 3).

In [None]:
for numbers in index11[0] :
    print(numbers)

0
2


In [None]:
index11[1].dtype

dtype('int64')

##### When **np.where** is called with only a condition, the number of arrays in the returned tuple equals the number of dimensions of the input array, and the length of each array equals the number of elements that satisfy the condition.
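
To turn the separate index arrays into (row, column) pairs, they can be zipped together. The sketch below uses illustrative names and also shows `np.argwhere` and the three-argument form of `np.where`, which are related calls not used elsewhere in this notebook.

```python
import numpy as np

matrix = np.array([[10, 20, 33, 11, 44, 77],
                   [50, 60, 70, 22, 55, 88],
                   [80, 90, 100, 33, 66, 99],
                   [1, 2, 3, 4, 5, 6]])

# Unpack the tuple: one index array per dimension
rowIdx, colIdx = np.where(matrix == 33)

# Pair the indices up into (row, column) coordinates
coordinates = [(int(r), int(c)) for r, c in zip(rowIdx, colIdx)]
print(coordinates)              # [(0, 2), (2, 3)]

# np.argwhere returns the same coordinates as one 2D array
pairs = np.argwhere(matrix == 33)
print(pairs.tolist())           # [[0, 2], [2, 3]]

# Three-argument form: pick 0 where the condition holds, else the old value
replaced = np.where(matrix == 33, 0, matrix)
print(replaced[0, 2], replaced[2, 3])   # 0 0
```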

---

## Statistical operations

### Calculating Mean, Median, and Standard Deviation

#### Mean

In [None]:
array1 = np.random.randint(0, 10, 5)
array1

array([7, 4, 1, 8, 8])

In [None]:
meanValue1 = np.mean(array1)
print('The mean of array is : ', meanValue1)

The mean of array is :  5.6


---

In [None]:
np.random.seed(11)
array2 = np.random.randint(0, 10, 5)
array2

array([9, 0, 1, 7, 1])

In [None]:
meanValue2 = np.mean(array2)
print('The mean of array is : ', meanValue2)

The mean of array is :  3.6


##### np.random.seed() is used to set the seed for the NumPy random number generator. It initializes the random number generator, allowing you to **produce reproducible results**. **When you set the seed to a specific value, the sequence of random numbers generated will be the same every time you run your code**.
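
The sketch below checks this reproducibility programmatically. It also shows `np.random.default_rng`, the generator interface that NumPy's documentation recommends for new code; using it here is a suggestion of this example, not something the notebook relies on.

```python
import numpy as np

# Re-seeding the global generator replays the same sequence
np.random.seed(11)
firstDraw = np.random.randint(0, 10, 5)
np.random.seed(11)
secondDraw = np.random.randint(0, 10, 5)
print(np.array_equal(firstDraw, secondDraw))   # True

# Two independent Generator objects with the same seed also agree
rngA = np.random.default_rng(11)
rngB = np.random.default_rng(11)
drawA = rngA.integers(0, 10, 5)
drawB = rngB.integers(0, 10, 5)
print(np.array_equal(drawA, drawB))            # True
```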



In [None]:
np.random.seed(11)
array2_1 = np.random.randint(0, 10, 5)
array2_1

array([9, 0, 1, 7, 1])

---

In [None]:
array3 = np.random.randint(0, 10, (2,2))
array3

array([[7, 2],
       [8, 0]])

In [None]:
meanValue3 = np.mean(array3)
print('The mean of array is : ', meanValue3)

The mean of array is :  4.25


#### Median

In [None]:
array4 = np.random.randint(0, 10,5)
array4

array([0, 4, 2, 1, 5])

In [None]:
medianValue4 = np.median(array4)
print('The median of array is : ', medianValue4)

The median of array is :  2.0


---

In [None]:
array5 = np.random.randint(0, 10, (2,2))
array5

array([[5, 7],
       [4, 1]])

In [None]:
medianValue5 = np.median(array5)
print('The median of array is : ', medianValue5)

The median of array is :  4.5
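
For arrays with more than one dimension, both functions accept an `axis` argument. The following sketch (the array `scores` and the other names are illustrative) compares the overall statistic with per-column and per-row results.

```python
import numpy as np

scores = np.array([[7, 2],
                   [8, 0]])

# Without axis, the statistic is computed over all elements
overallMean = np.mean(scores)
print(overallMean)                # 4.25

# axis=0 reduces down the rows, giving one value per column
columnMeans = np.mean(scores, axis=0)
print(columnMeans.tolist())       # [7.5, 1.0]

# axis=1 reduces across the columns, giving one value per row
rowMeans = np.mean(scores, axis=1)
print(rowMeans.tolist())          # [4.5, 4.0]

# np.median accepts axis in the same way
columnMedians = np.median(scores, axis=0)
print(columnMedians.tolist())     # [7.5, 1.0]
```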


---

#### Standard deviation

In [None]:
array6 = np.random.randint(0, 10,5)
array6

array([8, 8, 1, 3, 6])

In [None]:
sdValue6 = np.std(array6)
print('The standard deviation of array is : ', sdValue6)

The standard deviation of array is :  2.785677655436824
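
By default **np.std** computes the population standard deviation (it divides by N). As a sketch with illustrative names, passing `ddof=1` gives the sample standard deviation (dividing by N - 1), and the default can be reproduced by hand.

```python
import numpy as np

data = np.array([8, 8, 1, 3, 6])

# Default: population standard deviation, divides by N
populationStd = np.std(data)

# ddof=1: sample standard deviation, divides by N - 1
sampleStd = np.std(data, ddof=1)

# Manual computation of the default, for comparison
manualStd = np.sqrt(np.sum((data - data.mean()) ** 2) / len(data))

print('Population standard deviation :', populationStd)
print('Sample standard deviation :', sampleStd)
print('Manual matches default :', np.isclose(populationStd, manualStd))
```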


---

### Finding Min, Max, and Sum

In [None]:
array7 = np.array([5, 10, 15, 20, 25, -10])
array7

array([  5,  10,  15,  20,  25, -10])

In [None]:
minValue = np.min(array7)
maxValue = np.max(array7)
totalSum = np.sum(array7)

In [None]:
print('The minimum value is : ', minValue)
print('The maximum value is : ', maxValue)
print('The total sum is : ', totalSum)

The minimum value is :  -10
The maximum value is :  25
The total sum is :  65


---

## Sorting Numpy Array

In [None]:
array8 = np.array([3, 1, 6, 5, 4])
sortedArray8 = np.sort(array8)

In [None]:
print('The unsorted array is : ')
print(array8)
print('The sorted array is : ')
print(sortedArray8)

The unsorted array is : 
[3 1 6 5 4]
The sorted array is : 
[1 3 4 5 6]


In [None]:
# Sorting in descending order
DecSortedArray8 = np.sort(array8)[::-1]

**[::-1]**: It is a slice notation used to reverse the sorted array obtained from **np.sort(array8)**.

**The third value in the slice is the step, which indicates how many elements to move for each step. When set to -1, the slice walks backwards, which reverses the sequence.**
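
Two related points, sketched below with illustrative names: **np.sort** returns a new array and leaves the original untouched, while the `sort()` method sorts in place. `np.argsort` is one alternative way to get a descending order.

```python
import numpy as np

numbers = np.array([3, 1, 6, 5, 4])

# np.sort returns a sorted copy; reversing it gives descending order
descending = np.sort(numbers)[::-1]
print(descending.tolist())        # [6, 5, 4, 3, 1]
print(numbers.tolist())           # original unchanged: [3, 1, 6, 5, 4]

# argsort returns the indices that would sort the array
order = np.argsort(numbers)[::-1]
viaArgsort = numbers[order]
print(viaArgsort.tolist())        # [6, 5, 4, 3, 1]

# The sort() method modifies the array in place
inPlace = numbers.copy()
inPlace.sort()
print(inPlace.tolist())           # [1, 3, 4, 5, 6]
```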


In [None]:
print('The sorted array in descending order is : ')
DecSortedArray8

The sorted array in descending order is : 


array([6, 5, 4, 3, 1])

---

## Deleting elements from Numpy Array

In [None]:
array9 = np.array([1, 2, 3, 4, 5])
array9

array([1, 2, 3, 4, 5])

In [None]:
newArray9 = np.delete(array9, 2)   # 2 is an index
newArray9

array([1, 2, 4, 5])

---

In [None]:
array10 = np.array([[1, 2, 3, 4, 5],
                  [10,20,30,40,50],
                  [11,22,33,44,55]])
array10

array([[ 1,  2,  3,  4,  5],
       [10, 20, 30, 40, 50],
       [11, 22, 33, 44, 55]])

In [None]:
array10.shape

(3, 5)

In [None]:
newArray10 = np.delete(array10, 1,axis = 1)
newArray10

array([[ 1,  3,  4,  5],
       [10, 30, 40, 50],
       [11, 33, 44, 55]])

##### **np.delete**(array10, **1**,**axis = 1**) is used for deleting elements **at index 1**  along **axis 1 (columns).**

In [None]:
newArray11 = np.delete(array10, 1,axis = 0)
newArray11

array([[ 1,  2,  3,  4,  5],
       [11, 22, 33, 44, 55]])
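
In the cell above, **axis = 0** removes the row at index 1. The sketch below (with illustrative names) adds three details: **np.delete** returns a new array rather than changing the original, it accepts a list of indices, and without `axis` it works on the flattened array.

```python
import numpy as np

original = np.array([[1, 2, 3, 4, 5],
                     [10, 20, 30, 40, 50],
                     [11, 22, 33, 44, 55]])

# Delete columns 0 and 2 in one call
withoutCols = np.delete(original, [0, 2], axis=1)
print(withoutCols.shape)      # (3, 3)

# The original array is left unchanged
print(original.shape)         # (3, 5)

# Without axis, the array is flattened before the element is removed
flattened = np.delete(original, 1)
print(flattened.shape)        # (14,)
```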

---