<a href="https://colab.research.google.com/github/sahanal2603/Data-Science-and-Analytics/blob/master/Practice_Numpy_Lab_2_%5BNumpy%5D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This content is taken directly from the tutorial session of the TEQIP III sponsored Faculty Development Program conducted by IIT Gandhinagar from 11 to 23 January 2021

# **Importing NumPy**
We'll start with the standard NumPy import, under the alias *'np'*.

In [2]:
import numpy as np

# **Creating Arrays from Python Lists**
Use np.array to create arrays from Python lists.

In [None]:
# integer array:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

Unlike Python lists, a NumPy array holds elements of a single type. If the types do not match, NumPy upcasts where possible.
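
As a quick check of the upcasting rule, the sketch below inspects the `dtype` NumPy chooses for a few mixed inputs. The variable names are illustrative only, and the exact integer width depends on the platform, so only the dtype *kind* is printed for the integer and string cases.

```python
import numpy as np

# Integers only: NumPy picks an integer dtype.
ints = np.array([1, 4, 2])
# A single float is enough to upcast every element to floating point.
mixed_float = np.array([3.14, 4, 2])
# A complex value upcasts the whole array to complex.
mixed_complex = np.array([1, 2.5, 3 + 0j])
# A string forces every element to be stored as a Unicode string.
mixed_str = np.array([1, "two"])

print(ints.dtype.kind)        # 'i' (signed integer)
print(mixed_float.dtype)      # float64
print(mixed_complex.dtype)    # complex128
print(mixed_str.dtype.kind)   # 'U' (Unicode string)
```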

In [None]:
#Here, integers are up-cast to floating point:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])

In [None]:
#Use the 'dtype' keyword to explicitly set data type of the resulting array:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

In [None]:
a=[]
for i in [2,4,6]:
    temp=np.arange(i,i+4)
    a.append(temp)


print(len(a))
print(a)

3
[array([2, 3, 4, 5]), array([4, 5, 6, 7]), array([6, 7, 8, 9])]


In [None]:
#Initializing a multidimensional array using a list of lists:
b=np.array([range(i, i + 4) for i in [2, 4, 6]])
len(b)
print(b)
print(b[0,2])

[[2 3 4 5]
 [4 5 6 7]
 [6 7 8 9]]
4


# **Creating Arrays from Scratch**

In [None]:
# Create a length-5 integer array filled with zeros
np.zeros(5, dtype=int)

array([0, 0, 0, 0, 0])

In [None]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20 (the stop value is excluded), stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [None]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
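
The two sequence builders treat the end point differently, which is easy to miss. The sketch below contrasts them; `endpoint=False` is a standard `np.linspace` keyword, and the variable names are just for illustration.

```python
import numpy as np

half_open = np.arange(0, 20, 2)                  # stop value 20 is excluded
closed = np.linspace(0, 1, 5)                    # both end points are included
no_end = np.linspace(0, 1, 5, endpoint=False)    # stop value 1 is excluded

print(half_open[-1])   # 18
print(closed[-1])      # 1.0
print(no_end[-1] < 1)  # True
```

`np.arange` takes a step size, while `np.linspace` takes the number of points, so `np.linspace` is usually the safer choice when the step is not an integer.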

In [None]:
# Create a 3x5 array of uniformly distributed
# random values between 0 and 1
a=np.random.random(size=(3,5))
print(a)

[[0.55329853 0.00637329 0.52392451 0.5378456  0.49919287]
 [0.37288652 0.87837944 0.4621888  0.12026353 0.36454461]
 [0.86523128 0.54717397 0.4216565  0.43825028 0.33394441]]
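
Recent NumPy versions (1.17 and later) also offer the `Generator` interface through `np.random.default_rng`. The sketch below is an alternative to the calls used in this notebook, not a replacement for them; a seeded generator produces a different stream from `np.random.seed`, so the values will not match the output above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded generator for reproducibility
a = rng.random(size=(3, 5))           # uniform floats in [0, 1)
b = rng.integers(0, 10, size=6)       # integers from 0 (inclusive) to 10 (exclusive)

print(a.shape)  # (3, 5)
print(b.shape)  # (6,)
```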


In [None]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

# **Basics of NumPy Arrays**

## **1. NumPy Array Attributes**
*Determining the size, shape, memory consumption, and data types of arrays*

In [4]:
#Define three random arrays, a one-dimensional, two-dimensional, and three-dimensional array. 
#We'll use NumPy's random number generator.
#Use 'seed' to ensure that the same random arrays are generated each time this code is run.

np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [5]:
#Each array has attributes:
# (1) ndim (the number of dimensions), 
# (2) shape (the size of each dimension), and 
# (3) size (the total size of the array)

print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
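
The attributes for data type and memory consumption are `dtype`, `itemsize`, and `nbytes`. The sketch below uses a separate array with an explicit `int64` dtype (an assumption made here so the byte counts are the same on every platform).

```python
import numpy as np

x = np.zeros((3, 4, 5), dtype=np.int64)

print("dtype:   ", x.dtype)      # int64
print("itemsize:", x.itemsize)   # 8 (bytes per element)
print("nbytes:  ", x.nbytes)     # 480 (total bytes = size * itemsize)
```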


## **2. NumPy Array Indexing**
*Getting and setting the value of individual array elements*

In [6]:
#Print x1
x1

array([5, 0, 3, 3, 7, 9])

In [7]:
#Access first element from x1 (1D Array)
#This is similar to accessing elements from lists in Python
x1[0]

5

In [8]:
#Access second last element from x1
x1[-2]

7

In [9]:
#Print x2
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [10]:
#Access element at first row, first column in x2 (2D Array)
x2[0, 0]

3

In [11]:
#Access element at third row, fourth column counting from the end, in x2
print("Old value:", x2[2, -4])

#Set value of element at third row, fourth column counting from the end, in x2 to 11
x2[2, -4] = 11
print("New value:", x2[2, -4])

Old value: 1
New value: 11


## **3. NumPy Array Slicing**
*Getting and setting smaller subarrays within a larger array*

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.
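
The defaults listed above apply to a positive step. With a negative step the defaults for `start` and `stop` swap, so the slice runs from the last element back to the first. A small sketch on a throwaway array (not one of the arrays defined earlier):

```python
import numpy as np

x = np.arange(10)

print(x[:])       # same as x[0:10:1]
print(x[::2])     # [0 2 4 6 8]
print(x[::-1])    # whole array reversed: start defaults to the last element
print(x[5::-1])   # [5 4 3 2 1 0]
print(x[:5:-1])   # [9 8 7 6]
```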

### **3.1 One-dimensional subarrays**

In [12]:
x1

array([5, 0, 3, 3, 7, 9])

In [13]:
print(x1[:3])  # first three elements
print(x1[4:])  # elements from index 4 onward
print(x1[1:4])  # middle sub-array (indices 1 to 3)
print(x1[::-1])  # all elements, reversed
print(x1[4::-2])  # every other element, going backward from index 4
print(x1[::3])  # every third element, starting at index 0

[5 0 3]
[7 9]
[0 3 3]
[9 7 3 3 0 5]
[7 3 5]
[5 3]


### **3.2 Multi-dimensional subarrays**

In [14]:
x2

array([[ 3,  5,  2,  4],
       [ 7,  6,  8,  8],
       [11,  6,  7,  7]])

In [15]:
x2[:2, :3]  # two rows, three columns

array([[3, 5, 2],
       [7, 6, 8]])

In [16]:
x2[:3, ::2]  # all rows, every other column

array([[ 3,  2],
       [ 7,  8],
       [11,  7]])

In [17]:
print(x2[:, 0])  # first column of x2

[ 3  7 11]


In [18]:
print(x2[-1, :])  # last row of x2

[11  6  7  7]


### **3.3 Creating copies of arrays**

In [19]:
print(x2)

#Extract a  2×2  subarray from x2
x2_sub = x2[:2, :2]
print("\n", x2_sub)

[[ 3  5  2  4]
 [ 7  6  8  8]
 [11  6  7  7]]

 [[3 5]
 [7 6]]


In [20]:
#Now if we modify this subarray, we'll see that the original array is changed!
x2_sub[0, 0] = 99
print(x2_sub)
print("\n", x2)

[[99  5]
 [ 7  6]]

 [[99  5  2  4]
 [ 7  6  8  8]
 [11  6  7  7]]


In [21]:
#To copy the data within an array or a subarray use the copy() method
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

x2_sub_copy[0, 0] = 42
print("\n", x2_sub_copy)

print("\n", x2)

[[99  5]
 [ 7  6]]

 [[42  5]
 [ 7  6]]

 [[99  5  2  4]
 [ 7  6  8  8]
 [11  6  7  7]]
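
To check whether two arrays share the same underlying data, `np.shares_memory` can be used. The sketch below repeats the view-versus-copy experiment on a fresh array; the names `grid`, `view`, and `copied` are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))
view = grid[:2, :2]             # slicing returns a view on the same data
copied = grid[:2, :2].copy()    # copy() allocates independent data

print(np.shares_memory(grid, view))     # True
print(np.shares_memory(grid, copied))   # False

view[0, 0] = 99     # writes through to grid
copied[0, 1] = -1   # leaves grid untouched
print(grid[0, 0], grid[0, 1])  # 99 1
```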


## **4. NumPy Array Reshaping**
*Changing the shape of a given array*

In [22]:
grid = np.arange(1, 10)
print(grid)

grid = np.arange(1, 10).reshape((3, 3))
print("\n", grid)

[1 2 3 4 5 6 7 8 9]

 [[1 2 3]
 [4 5 6]
 [7 8 9]]


In [23]:
x = np.array([1, 2, 3])
print(x)

# row vector via reshape
print("\n", x.reshape((1, 3)))

# column vector via reshape
print("\n", x.reshape((3, 1)))

[1 2 3]

 [[1 2 3]]

 [[1]
 [2]
 [3]]
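
Two related conveniences are worth knowing, shown here as a short sketch: passing `-1` lets `reshape` infer one dimension from the array size, and `np.newaxis` inside a slice adds an axis of length 1 without spelling out the full shape.

```python
import numpy as np

x = np.arange(12)

print(x.reshape((3, -1)).shape)   # (3, 4): the -1 is inferred as 12 / 3
print(x[np.newaxis, :].shape)     # (1, 12): row vector
print(x[:, np.newaxis].shape)     # (12, 1): column vector
```

The total number of elements must stay the same, so `x.reshape((5, -1))` would raise a `ValueError` here because 12 is not divisible by 5.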


## **5. NumPy Array Concatenation and Splitting**
*Combining multiple arrays into one, and splitting one array into many*

In [24]:
#Array concatenation: One-dimensional array

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.array([99, 99, 99])
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


In [25]:
#Array concatenation: Two-dimensional array

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# concatenate along the first axis
print(np.concatenate([grid, grid],axis=0))

# concatenate along the second axis (zero-indexed)
print("\n", np.concatenate([grid, grid], axis=1))

[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]

 [[1 2 3 1 2 3]
 [4 5 6 4 5 6]]


In [26]:
#Array concatenation: Arrays with mixed dimensions

#When working with arrays of mixed dimensions, it can be clearer to use:
# (1) np.vstack (vertical stack) and 
# (2) np.hstack (horizontal stack) functions

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
print(np.vstack([x, grid]))

# horizontally stack the arrays
y = np.array([[99],
              [99]])
print("\n", np.hstack([grid, y]))

[[1 2 3]
 [9 8 7]
 [6 5 4]]

 [[ 9  8  7 99]
 [ 6  5  4 99]]


In [27]:
#Splitting of arrays

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2 = np.split(x, [3]) #Here '3' is the only split index, so we get two subarrays
print(x1, x2)

[1 2 3] [99 99  3  2  1]
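
In general, N split indices produce N + 1 subarrays. The sketch below splits the same list at two indices; the names `first`, `middle`, and `last` are chosen here so that the arrays defined earlier are not overwritten.

```python
import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]

# Two split indices (3 and 5) give three subarrays
first, middle, last = np.split(x, [3, 5])
print(first, middle, last)  # [1 2 3] [99 99] [3 2 1]
```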


In [28]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [29]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print("\n", lower)

[[0 1 2 3]
 [4 5 6 7]]

 [[ 8  9 10 11]
 [12 13 14 15]]


In [30]:
left, right = np.hsplit(grid, [2])
print(left)
print("\n", right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]

 [[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


# **Computation on NumPy Arrays: Universal Functions (ufuncs)**

In [3]:
#Set 1: Array Arithmetic

x = np.arange(4)
print("x     =", x)
print("x + 2 =", x + 2)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

x     = [0 1 2 3]
x + 2 = [2 3 4 5]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


In [None]:
np.add(x, 2)

The following table lists the arithmetic operators implemented in NumPy:

| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|

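The correspondence in the table can be verified directly. The following sketch compares each operator with its ufunc on a small array using `np.array_equal`; every line is expected to print `True`.

```python
import numpy as np

x = np.arange(4)

print(np.array_equal(x + 2, np.add(x, 2)))
print(np.array_equal(x - 5, np.subtract(x, 5)))
print(np.array_equal(-x, np.negative(x)))
print(np.array_equal(x * 2, np.multiply(x, 2)))
print(np.array_equal(x / 2, np.divide(x, 2)))
print(np.array_equal(x // 2, np.floor_divide(x, 2)))
print(np.array_equal(x ** 2, np.power(x, 2)))
print(np.array_equal(x % 2, np.mod(x, 2)))
```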

In [33]:
#Set 2: Absolute Value

x = np.array([-2, -1, 0, 1, 2])
print(abs(x))
print(np.absolute(x))
print(np.abs(x))

[2 1 0 1 2]
[2 1 0 1 2]
[2 1 0 1 2]


In [32]:
#Set 3: Trigonometric Functions

theta = np.linspace(0, np.pi, 3)
print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

x = [-1, 0, 1]
print("\nx         = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

theta      =  [0.         1.57079633 3.14159265]
sin(theta) =  [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) =  [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) =  [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]

x         =  [-1, 0, 1]
arcsin(x) =  [-1.57079633  0.          1.57079633]
arccos(x) =  [3.14159265 1.57079633 0.        ]
arctan(x) =  [-0.78539816  0.          0.78539816]
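
Values such as `1.2246468e-16` in the output above are floating-point rounding error, not exact zeros. When comparing such results, a tolerance-based check like `np.isclose` or `np.allclose` is usually more appropriate than `==`. A minimal sketch:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
expected = np.array([0.0, 1.0, 0.0])

print(np.sin(np.pi) == 0.0)                   # False: the result is about 1.2e-16
print(np.isclose(np.sin(np.pi), 0.0))         # True: equal within the default tolerance
print(np.allclose(np.sin(theta), expected))   # True
```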


In [31]:
#Set 4: Exponents and Logarithms

x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

x = [1, 2, 4, 10]
print("\nx        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]

x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


# **Aggregations: Min, Max, and Everything In Between**

In [34]:
x = np.arange(1, 11)
print(x)

[ 1  2  3  4  5  6  7  8  9 10]


In [None]:
#Find min, max and sum over 1D array
np.min(x), np.max(x), np.sum(x)

(1, 10, 55)

In [35]:
#Create a multi-dimensional 3x3 array with values from 1 to 9
M = np.arange(1, 10).reshape((3, 3))
print(M)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [36]:
#Find min, max and sum over multidimensional array
np.min(M), np.max(M), np.sum(M)

(1, 9, 45)

In [37]:
# To find minimum value within each column specify 'axis=0'
print(np.min(M, axis=0))

# To find maximum value within each row specify 'axis=1'
print(np.max(M, axis=1))

[1 2 3]
[3 6 9]


In [38]:
#Assume 'heights' to be a NumPy array containing heights in centimeters of 15 individuals

heights = np.array([189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183])

print("Heights:           ", heights)
print("Mean height:       ", np.mean(heights))
print("Standard deviation:", np.std(heights))
print("Minimum height:    ", np.min(heights))
print("Maximum height:    ", np.max(heights))
print("25th percentile:   ", np.percentile(heights, 25))
print("Median:            ", np.median(heights))
print("75th percentile:   ", np.percentile(heights, 75))

Heights:            [189 170 189 163 183 171 185 168 173 183 173 173 175 178 183]
Mean height:        177.06666666666666
Standard deviation: 7.637335195530498
Minimum height:     163
Maximum height:     189
25th percentile:    172.0
Median:             175.0
75th percentile:    183.0
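
Note that `np.std` computes the population standard deviation by default (it divides by n). If the heights are treated as a sample, the `ddof=1` argument divides by n - 1 instead. The sketch below shows both and checks the relationship between them.

```python
import numpy as np

heights = np.array([189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183])
n = heights.size

pop_std = np.std(heights)               # default ddof=0: divide by n
sample_std = np.std(heights, ddof=1)    # ddof=1: divide by n - 1

print("Population std:", pop_std)
print("Sample std:    ", sample_std)
print(np.isclose(sample_std, pop_std * np.sqrt(n / (n - 1))))  # True
```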


### Other aggregation functions

NumPy provides many other aggregation functions, but we won't discuss them in detail here.
Additionally, most aggregates have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value (for a fuller discussion of missing data, see [Handling Missing Data](03.04-Missing-Values.ipynb)).
Some of these ``NaN``-safe functions were not added until NumPy 1.8, so they will not be available in older NumPy versions.

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |

These aggregates come up often in everyday NumPy work.
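
To illustrate the difference between an aggregate and its ``NaN``-safe counterpart, the sketch below applies both to a small array with one missing value (the array is made up for this example).

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

print(np.mean(values))      # nan: a single NaN propagates through the result
print(np.nanmean(values))   # 2.0
print(np.nansum(values))    # 4.0
print(np.nanmax(values))    # 3.0
```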

In [40]:
x=np.array([1,2,3,4,5,0,2,3,4])
a=np.argmin(x)
print(a)


5


# **Sorting Arrays**

In [41]:
#Sort 'heights' array in the above section

print(heights)
print(np.sort(heights))

[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183]
[163 168 170 171 173 173 173 175 178 183 183 183 185 189 189]


In [42]:
#A related function is 'argsort', which instead returns the indices of the sorted elements

i = np.argsort(heights)
print(i)

[ 3  7  1  5  8 10 11 12 13  4  9 14  6  0  2]
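
The indices returned by `argsort` can be passed back as an index to reorder the array, or any other array of the same length. A short sketch, redefining `heights` so the block runs on its own:

```python
import numpy as np

heights = np.array([189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183])

order = np.argsort(heights)

# Indexing with the argsort result reproduces the sorted array
print(np.array_equal(heights[order], np.sort(heights)))  # True

# The last three indices point at the three largest values
print(heights[order[-3:]])  # [185 189 189]
```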


In [43]:
#Sorting along rows or columns

rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

# sort each column of X
print("\n", np.sort(X, axis=0))

# sort each row of X
print("\n", np.sort(X, axis=1))

[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]

 [[2 1 4 0 1 5]
 [5 2 5 4 3 7]
 [6 3 7 4 6 7]
 [7 6 7 4 9 9]]

 [[3 4 6 6 7 9]
 [2 3 4 6 7 7]
 [1 2 4 5 7 7]
 [0 1 4 5 5 9]]


# **Structured Data: NumPy's Structured Arrays**

Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we'd like to store these values for use in a Python program. It would be possible to store these in three separate arrays.

In [44]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

But this is a bit clumsy. There's nothing here that tells us that the three arrays are related; it would be more natural if we could use a single structure to store all of this data. NumPy can handle this through structured arrays, which are arrays with compound data types.

In [45]:
x = np.zeros(4, dtype=int)
print("x          =", x)

# Use a compound data type for structured arrays
# Here 'U10' translates to "Unicode string of maximum length 10," 
#       'i4' translates to "4-byte (i.e., 32 bit) integer," and 
#       'f8' translates to "8-byte (i.e., 64 bit) float."
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})

print("data       =", data)
print("data.dtype =", data.dtype)

x          = [0 0 0 0]
data       = [('', 0, 0.) ('', 0, 0.) ('', 0, 0.) ('', 0, 0.)]
data.dtype = [('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


In [46]:
#Fill the empty container array with our list values
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


In [47]:
# Get all names
print("Names                      :", data['name'])

# Get first row of data
print("First row of data          :", data[0])

# Get the name from the last row
print("Name from last row         :", data[-1]['name'])

# Get names where age is under 30
print("Names where age is under 30:", data[data['age'] < 30]['name'])

Names                      : ['Alice' 'Bob' 'Cathy' 'Doug']
First row of data          : ('Alice', 25, 55.)
Name from last row         : Doug
Names where age is under 30: ['Alice' 'Doug']
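
A structured array can also be built in one step from a list of tuples, and `np.sort` accepts an `order` argument naming the field to sort by. The sketch below uses the same four records; the variable names `people` and `by_age` are illustrative.

```python
import numpy as np

people = np.array(
    [("Alice", 25, 55.0), ("Bob", 45, 85.5), ("Cathy", 37, 68.0), ("Doug", 19, 61.5)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

# Sort whole records by the 'age' field
by_age = np.sort(people, order="age")
print(by_age["name"])            # ['Doug' 'Alice' 'Cathy' 'Bob']

# Each field behaves like an ordinary array
print(people["weight"].mean())   # 67.5
```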


# **NumPy: Axes and Dimensions**

In [48]:
#The vector has one axis since it is 1-dimensional. So you can only apply a 
#function across axis-0. Axes are always 0 indexed.

np.sum([1,2,3,4], axis=0)

10

In [49]:
#A matrix is 2-dimensional, so it has 2 axes.

#Sum along axis 0 (down the rows): one total per column
data = np.array([[1,2,3], [4,5,6]])
print(data)

total = np.sum(data, axis=0)  # named 'total' to avoid shadowing the built-in sum()
print("\n", total)

#Sum along axis 1 (across the columns): one total per row
total = np.sum(data, axis=1)
print("\n", total)

#Omit the axis argument to get the sum of all elements
total = np.sum(data)
print("\n", total)

[[1 2 3]
 [4 5 6]]

 [5 7 9]

 [ 6 15]

 21


In [50]:
#3-Dimensional data (Concept)
#A 3D array is a stack of 2D arrays (matrices).

#1. Summing across axis-0 adds the matrices together, element by element.
#2. Summing across axis-1 adds the row vectors inside each matrix.
#3. Summing across axis-2 adds the scalars inside each vector.

data = np.array([[[1,1,1],
                  [3,3,3]], 
                 
                 [[2,2,2], 
                  [4,4,4]]])

print(data)

total = np.sum(data, axis=0)  # 'total' avoids shadowing the built-in sum()
print("\n Output of axis=0:\n", total)

total = np.sum(data, axis=1)
print("\n Output of axis=1:\n", total)

total = np.sum(data, axis=2)
print("\n Output of axis=2:\n", total)

[[[1 1 1]
  [3 3 3]]

 [[2 2 2]
  [4 4 4]]]

 Output of axis=0:
 [[3 3 3]
 [7 7 7]]

 Output of axis=1:
 [[4 4 4]
 [6 6 6]]

 Output of axis=2:
 [[ 3  9]
 [ 6 12]]
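
A useful way to remember the axis argument: the axis you aggregate over disappears from the shape of the result. The sketch below demonstrates this on an array of shape `(2, 3, 4)`, chosen so that each axis has a distinct length; `keepdims=True` keeps the reduced axis with length 1.

```python
import numpy as np

data = np.zeros((2, 3, 4))

for axis in range(data.ndim):
    # The summed axis is removed from the output shape
    print("axis =", axis, "->", np.sum(data, axis=axis).shape)
# axis = 0 -> (3, 4)
# axis = 1 -> (2, 4)
# axis = 2 -> (2, 3)

print(np.sum(data, axis=0, keepdims=True).shape)  # (1, 3, 4)
```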
