NUMPY

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.

For example, the array for the coordinates of a point in 3D space, [1, 2, 1], has one axis. That axis has 3 elements in it, so we say it has a length of 3. 
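As a quick check of the axis terminology, the sketch below builds the point from the example and a small 2-D table (the table values are made up for illustration) and reads their `ndim` and `shape`.

```python
import numpy as np

point = np.array([1, 2, 1])   # coordinates of a point in 3D space
print(point.ndim)    # 1 -> one axis
print(point.shape)   # (3,) -> that axis has length 3

# A 2x3 table has two axes: axis 0 (rows) has length 2,
# axis 1 (columns) has length 3
table = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 2.0]])
print(table.ndim)    # 2
print(table.shape)   # (2, 3)
```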

NumPy arrays are faster than Python lists for numerical work, which is why we use NumPy.

WHY NUMPY IS FASTER THAN LIST:

>"NumPy is faster because arrays are stored contiguously in memory, use fixed data types eliminating type-checking overhead, and operations execute in optimized C code rather than Python."

>Python lists can store mixed types, so Python has to check the type of EVERY element before operating on it.

>simple rule to remember:

* Numbers + Math + Big Data = NumPy

* Mixed types + Small data + Flexibility = Python List
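One rough way to see the speed difference is to time the same computation both ways. This is only a sketch: the array size `n` and the repeat count are arbitrary choices, and the actual timings depend on the machine, so no particular ratio is claimed.

```python
import timeit
import numpy as np

n = 100_000                      # arbitrary size chosen for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

def square_list():
    # Python-level loop: one object at a time
    return [x * x for x in py_list]

def square_array():
    # Single call that runs in compiled code
    return np_arr * np_arr

# Both approaches produce the same values
assert square_list() == square_array().tolist()

t_list = timeit.timeit(square_list, number=20)
t_arr = timeit.timeit(square_array, number=20)
print(f"list: {t_list:.4f}s  numpy: {t_arr:.4f}s")
```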

>🧠 TOPIC 1 - Creating Arrays
* NumPy is used to work with arrays. The array object in NumPy is called ndarray.
* We can create a NumPy ndarray object by using the array() function

To create an ndarray, we can pass a list, tuple or any array-like object into the array() function, and it will be converted into an ndarray.
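A minimal sketch of the "any array-like object" point: a list, a tuple and a `range` all give the same array. The variable names are only illustrative.

```python
import numpy as np

from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))
from_range = np.array(range(1, 4))      # range is also array-like

print(from_list, from_tuple, from_range)       # all three print [1 2 3]
print(np.array_equal(from_list, from_tuple))   # True

# The optional dtype argument fixes the element type at creation time
as_float = np.array([1, 2, 3], dtype=float)
print(as_float)          # [1. 2. 3.]
print(as_float.dtype)    # float64
```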

In [7]:
import numpy as np

# From a list
a = np.array([1, 2, 3, 4, 5])
print(a)          # [1 2 3 4 5]
print(type(a))    # <class 'numpy.ndarray'>

# Zeros and ones - used constantly in ML
zeros = np.zeros(5)          # [0. 0. 0. 0. 0.]
ones = np.ones(5)            # [1. 1. 1. 1. 1.]

# Range - like Python's range but returns an array
r = np.arange(0, 10, 2)     # [0 2 4 6 8]

# Evenly spaced numbers - very useful in ML
l = np.linspace(0, 1, 5)    # [0.   0.25 0.5  0.75 1.  ]

# Random arrays
rand = np.random.randint(0, 100, 5)   # 5 random integers from 0 to 99 (100 is excluded)
randf = np.random.rand(5)              # 5 random floats in [0, 1)

[1 2 3 4 5]
<class 'numpy.ndarray'>
[0.  0.5 1. ]


>🧠 TOPIC 2 - Array Properties

In [None]:
import numpy as np

a = np.array([1, 2, 3, 4, 5])

print(a.shape)     # (5,) - 5 elements, 1 dimension
print(a.dtype)     # int64 - data type of elements
print(a.ndim)      # 1 - number of dimensions
print(len(a))      # 5

# 2D array - matrix
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(matrix.shape)   # (3, 3) - 3 rows, 3 columns
print(matrix.ndim)    # 2

(5,)
int64
1
5
(3, 3)
2


DIMENSIONS IN ARRAYS:

>0-D Arrays

0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.


In [4]:
import numpy as np
arr = np.array(45)
print(arr)

45
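To confirm that this really is a 0-D array, a short sketch can inspect its `ndim` and `shape`, and pull the value out as a plain Python number with `item()`.

```python
import numpy as np

arr = np.array(45)
print(arr.ndim)     # 0 -> no axes at all
print(arr.shape)    # () -> empty shape tuple
print(arr.item())   # 45 as a plain Python int
```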


>1-D Arrays

An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.

In [5]:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)

[1 2 3 4]


>2-D Arrays

An array that has 1-D arrays as its elements is called a 2-D array.

These are often used to represent matrices or 2nd-order tensors.

In [None]:
import numpy as np
arr = np.array(([1, 2, 3, 4], 
                [2, 3, 4, 6]))
print(arr)

[[1 2 3 4]
 [2 3 4 6]]


>3-D arrays

An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

These are often used to represent 3rd-order tensors.

In [12]:
import numpy as np
arr = np.array([
    [[1, 2, 3],
     [3, 4, 5]],
    [[9, 10, 8],
     [3, 5, 6]],
])
print(arr)

[[[ 1  2  3]
  [ 3  4  5]]

 [[ 9 10  8]
  [ 3  5  6]]]


>Use the ndim attribute to check how many dimensions a NumPy array has.

In [10]:
import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3


An array can have any number of dimensions. To create a higher-dimensional array directly, use the ndmin argument.

Create an array with 5 dimensions and verify that it has 5 dimensions:

In [1]:
import numpy as np

arr = np.array(([1, 2, 3, 4]), ndmin=5)

print(arr)
print('number of dimensions :', arr.ndim)

[[[[[1 2 3 4]]]]]
number of dimensions : 5


In this array the innermost dimension (5th dim) has 4 elements, the 4th dim has 1 element that is the vector, the 3rd dim has 1 element that is the matrix containing the vector, the 2nd dim has 1 element that is a 3-D array, and the 1st dim has 1 element that is a 4-D array.
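The nesting described above is easier to see from the shape. The sketch below prints it and, as an assumption about an equivalent route, rebuilds the same array with `reshape` for comparison.

```python
import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr.shape)            # (1, 1, 1, 1, 4)

# Indexing through the four length-1 axes reaches the original vector
print(arr[0, 0, 0, 0])      # [1 2 3 4]

# reshape can build the same 5-D array from the 1-D one
same = np.array([1, 2, 3, 4]).reshape(1, 1, 1, 1, 4)
print(np.array_equal(arr, same))   # True
```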

>#Array indexing

You can access an array element by referring to its index number.

The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.

In [15]:
import numpy as np
arr = np.array([1,2,3])
print(arr[0])

1


>For 2-D Arrays

In [16]:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1])

2nd element on 1st row:  2


>For 3-D Arrays

In [17]:
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr[0, 1, 2])

6


arr[0, 1, 2] prints the value 6.

And this is why:

The first number represents the first dimension, which contains two arrays:

[[1, 2, 3], [4, 5, 6]]
and:
[[7, 8, 9], [10, 11, 12]]

Since we selected 0, we are left with the first array:
[[1, 2, 3], [4, 5, 6]]

The second number represents the second dimension, which also contains two arrays:

[1, 2, 3]
and:
[4, 5, 6]

Since we selected 1, we are left with the second array:
[4, 5, 6]

The third number represents the third dimension, which contains three values:

4
5
6
Since we selected 2, we end up with the third value:

6
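The same walk-through can be run one index at a time. In this sketch the names `step1`, `step2` and `step3` are only illustrative.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

step1 = arr[0]        # first 2-D block: [[1 2 3] [4 5 6]]
step2 = step1[1]      # second row of that block: [4 5 6]
step3 = step2[2]      # third value of that row: 6

print(step1.shape)    # (2, 3)
print(step2)          # [4 5 6]
print(step3)          # 6
print(step3 == arr[0, 1, 2])   # True
```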

>Negative Indexing

Use negative indexing to access an array from the end: -1 is the last element, -2 the one before it, and so on.

In [21]:
# print the last element from the 2nd dim:
import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('Last element from 2nd dim: ', arr[1, -1])

Last element from 2nd dim:  10
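A short sketch of how negative indices map onto ordinary ones; the arrays `arr` and `grid` here are made-up examples.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
n = len(arr)

print(arr[-1])     # 50, same as arr[n - 1]
print(arr[-2])     # 40, same as arr[n - 2]

# Negative indices work on every axis of a multi-dimensional array
grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(grid[-1, -1])    # 10 -> last row, last column
print(grid[-2, 0])     # 1  -> first row, first column
```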


>#Slicing Arrays

We pass a slice instead of a single index, like this: [start:end]. The element at index end is not included.

We can also define the step, like this: [start:end:step].



In [27]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[:5])  # start is omitted, so it defaults to 0
print(arr[1:5:2])
print(arr[-3: -1])

[1 2 3 4 5]
[2 4]
[5 6]
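A few more slice forms are sketched below, along with one behaviour worth knowing: a basic slice of a NumPy array is a view onto the same data, so writing through it changes the original. The names `view` and `safe` are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[4:])      # [5 6 7]  -> end defaults to the array length
print(arr[::2])     # [1 3 5 7] -> every second element
print(arr[::-1])    # [7 6 5 4 3 2 1] -> negative step reverses

# A basic slice is a view: writing through it changes the original array
view = arr[:3]
view[0] = 99
print(arr)          # [99  2  3  4  5  6  7]

# .copy() gives independent data
safe = arr[:3].copy()
safe[0] = -1
print(arr[0])       # 99 (unchanged by the edit to the copy)
```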


>2-D Arrays slicing

In [31]:
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# From the second row, slice elements from index 1 to index 4 (not included)
print(arr[1, 1:4])

# From both rows, return the element at index 2
print(arr[0:2, 2])

# From both rows, slice index 1 to index 4 (not included);
# this returns a 2-D array
print(arr[0:2, 1:4])


[7 8 9]
[3 8]
[[2 3 4]
 [7 8 9]]


>🧠 TOPIC 4 - Array Operations
* This is where NumPy shines. Every operation works on the whole array at once, with no loops needed.

In [None]:
import numpy as np
a = np.array([1, 2, 3, 4, 5])

# Arithmetic
print(a + 10)      # [11 12 13 14 15]
print(a * 2)       # [2  4  6  8 10]      #This is called vectorisation
print(a ** 2)      # [1  4  9 16 25]        
print(a / 2)       # [0.5 1.  1.5 2.  2.5]

# Two arrays
b = np.array([10, 20, 30, 40, 50])
print(a + b)       # [11 22 33 44 55]
print(a * b)       # [10 40 90 160 250]

# Statistical operations - used in every ML project
print(a.sum())     # 15
print(a.mean())    # 3.0
print(a.max())     # 5
print(a.min())     # 1
print(a.std())     # standard deviation - measures spread of data


# Create array
marks = np.array([78, 92, 67, 88, 45, 87])

# Boolean operations (VERY IMPORTANT)
print("\n--- Boolean Magic ---")          
print("Marks > 80:", marks > 80)            # This is called boolean masking
print("Students who scored > 80:", marks[marks > 80])
print("How many scored > 80:", (marks > 80).sum())

[11 12 13 14 15]
[ 2  4  6  8 10]
[ 1  4  9 16 25]
[0.5 1.  1.5 2.  2.5]
[11 22 33 44 55]
[ 10  40  90 160 250]
15
3.0
5
1
1.4142135623730951

--- Boolean Magic ---
Marks > 80: [False  True False  True False  True]
Students who scored > 80: [92 88 87]
How many scored > 80: 3


>Vectorization - applying one operation to ALL elements at once, without writing a loop.

>"Vectorization means applying an operation to an entire array at once instead of iterating element by element in Python. NumPy hands the whole operation to optimized C code, which loops over the elements far faster than a Python loop can."

>Boolean masking means filtering the data by a specific condition.
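The two ideas can be put side by side in a small sketch: a Python loop versus the vectorized expression, followed by masks that combine two conditions. The thresholds are arbitrary values chosen for illustration.

```python
import numpy as np

marks = np.array([78, 92, 67, 88, 45, 87])

# Loop version: handle one element at a time in Python
curved_loop = []
for m in marks:
    curved_loop.append(m + 5)

# Vectorized version: one expression for the whole array
curved_vec = marks + 5
print(np.array_equal(curved_loop, curved_vec))   # True

# Boolean masking with two conditions: use & (and) or | (or),
# with parentheses around each comparison
mid_band = marks[(marks >= 70) & (marks < 90)]
print(mid_band)     # [78 88 87]

extremes = marks[(marks < 50) | (marks > 90)]
print(extremes)     # [92 45]
```

The parentheses matter because `&` and `|` bind more tightly than the comparison operators.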

>🧠 TOPIC 5 - Reshaping
* In ML you constantly need to change the shape of your data without changing the values:

In [None]:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])

reshaped = a.reshape(2, 3)   # 2 rows, 3 columns
print(reshaped)
# [[1 2 3]
#  [4 5 6]]

reshaped2 = a.reshape(3, 2)  # 3 rows, 2 columns
print(reshaped2)
# [[1 2]
#  [3 4]
#  [5 6]]

# Flatten - collapse any shape back to 1D
flat = reshaped.flatten()
print(flat)   # [1 2 3 4 5 6]

Rule: reshape(rows, cols) requires rows × cols to equal the total number of elements. 2×3=6 and 3×2=6, so both work for an array of 6 elements.
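The rule above can be checked directly. This sketch also uses `-1`, which asks NumPy to infer one dimension from the total size, and shows the error raised when the sizes do not match.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to work out that dimension from the total size
print(a.reshape(2, -1).shape)   # (2, 3)
print(a.reshape(-1, 2).shape)   # (3, 2)

# A shape whose product is not 6 is rejected
try:
    a.reshape(4, 2)             # 4 x 2 = 8, but there are only 6 elements
except ValueError as err:
    print("reshape failed:", err)
```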

Task 1 - Array Basics (20 mins)

In [29]:
import numpy as np

arr = np.arange(0,11,2)
arr2 = np.linspace(0,100,6)
matrix = np.zeros((3,3))
#print(matrix)
matrix2=np.identity(4)
print(matrix2)
print(arr.shape)
print(arr.dtype)
print(arr.ndim)
print(arr.sum())
print(arr.mean())
print(arr.std())

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
(6,)
int64
1
30
5.0
3.415650255319866


Task 2 - Indexing and Slicing (20 mins)

In [37]:
import numpy as np

data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])

print(data[1])
print(data[:,-1])
print(data[:2,2:4])
print(data[:,[1,3]])
print(data[2][2])


[50 60 70 80]
[ 40  80 120]
[[30 40]
 [70 80]]
[[ 20  40]
 [ 60  80]
 [100 120]]
110



Task 3 - Operations + Real World Thinking (30 mins)

In [38]:
import numpy as np

marks = np.array([
    [78, 85, 90, 88],
    [92, 78, 85, 95],
    [65, 70, 75, 80],
    [88, 92, 78, 85],
    [55, 60, 65, 70]
])

print(marks.sum(axis=1))
print(marks.mean(axis=0))
print(marks.max())
totals = marks.sum(axis=1)         # total per student
print(np.argmax(totals))           # index of student with highest total
print(marks/marks.max())

[341 350 290 343 250]
[75.6 77.  78.6 83.6]
95
1
[[0.82105263 0.89473684 0.94736842 0.92631579]
 [0.96842105 0.82105263 0.89473684 1.        ]
 [0.68421053 0.73684211 0.78947368 0.84210526]
 [0.92631579 0.96842105 0.82105263 0.89473684]
 [0.57894737 0.63157895 0.68421053 0.73684211]]


>numpy.argmax() and numpy.argmin() return the index (position) of the maximum and minimum value in an array, respectively. If that value occurs more than once, the index of the first occurrence is returned.
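On a 2-D array the result depends on the `axis` argument. A minimal sketch, using a smaller made-up marks table (rows as students and columns as subjects is an assumption for illustration):

```python
import numpy as np

marks = np.array([
    [78, 85, 90, 88],
    [92, 78, 85, 95],
    [65, 70, 75, 80],
])

# Without axis, argmax returns a position in the flattened array
flat_pos = marks.argmax()
print(flat_pos)                                  # 7
print(np.unravel_index(flat_pos, marks.shape))   # row 1, column 3

# With axis, it returns one index per row or per column
print(marks.argmax(axis=1))   # [2 3 3] -> best subject for each row
print(marks.argmin(axis=0))   # [2 2 2 2] -> lowest row in each column
```

`np.unravel_index` converts the flat position back into a (row, column) pair.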

Task 4 - Rewrite Your Old Code

In [36]:
import numpy as np

# Your old CSV data as arrays
names = np.array(["Sushant", "Rahul", "Aman", "Neha"])
marks = np.array([78, 92, 67, 88])

# OLD way (what you did before - manual loop)
# highest_mark = marks[0]
# for mark in marks:
#     if mark > highest_mark:
#         highest_mark = mark

# NEW way (NumPy - clean & fast)
highest_mark = marks.max()
highest_index = marks.argmax()
highest_name = names[highest_index]

lowest_mark = marks.min()
lowest_index = marks.argmin()
lowest_name = names[lowest_index]

print(f"Highest mark: {highest_mark} by {highest_name}")
print(f"Lowest mark: {lowest_mark} by {lowest_name}")
print(f"Average: {marks.mean():.2f}")

# Bonus: Find all students who scored above average
avg = marks.mean()
above_avg_mask = marks > avg
above_avg_students = names[above_avg_mask]
print(f"\nStudents above average ({avg:.2f}):")
print(above_avg_students)



Highest mark: 92 by Rahul
Lowest mark: 67 by Aman
Average: 81.25

Students above average (81.25):
['Rahul' 'Neha']


> Quick Consolidation Task (15 minutes)

In [3]:
import numpy as np

# Test 1: Fixed type enforcement
arr = np.array([1, 2, 3])
print("NumPy array dtype:", arr.dtype)

# What happens with mixed types?
mixed = np.array([1, 2.5, 3])
print("Mixed int+float dtype:", mixed.dtype)  # observe this!

# What happens with string + int?
mixed2 = np.array([1, 2, "three"])
print("Mixed int+string dtype:", mixed2.dtype)  # observe this!

# Test 2: True vectorization
marks = np.array([78, 92, 67, 88, 45])

# This is vectorization - NO loop, operates on all at once
curved_marks = marks + 5
print("\nOriginal:", marks)
print("After curve (+5):", curved_marks)

# This is boolean masking (different from vectorization!)
passed = marks[marks >= 70]
print("Students who passed (>=70):", passed)

NumPy array dtype: int64
Mixed int+float dtype: float64
Mixed int+string dtype: <U21

Original: [78 92 67 88 45]
After curve (+5): [83 97 72 93 50]
Students who passed (>=70): [78 92 88]


>What is <U21?

< U 21

 21 = maximum number of characters each element can hold (not the length of the longest string here)

 U = Unicode (text/string type)
 
 < = little-endian (byte order used in memory)

why did NumPy do this?

NumPy cannot store mixed types. It has one strict rule:

Every element in an array MUST be the same type.

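NumPy can't store ints and strings together, so it converts all the data to strings.

1    → "1"

2    → "2"

"three" → "three"  (already a string)

The longest string, "three", has only 5 characters, but the ints started out as int64, and NumPy reserves 21 characters so that the text form of any 64-bit integer fits.

So dtype = <U21 (Unicode, up to 21 characters per element)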

This is called **"upcasting"** - NumPy converts everything to the most flexible type that can hold all values.
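A small sketch for checking this behaviour. `np.result_type` reports the common type NumPy would choose for numeric inputs, and the string examples compare element widths. The exact width of the mixed array depends on the platform's default integer size, so only the relative comparison is shown.

```python
import numpy as np

# np.result_type reports the common type NumPy would pick for numeric inputs
print(np.result_type(np.int64, np.float64))       # float64
print(np.result_type(np.float64, np.complex128))  # complex128

# Strings only: the width matches the longest string
words = np.array(["one", "two", "three"])
print(words.dtype)            # <U5 on a little-endian machine

# Ints mixed with strings: every value becomes text, and the width
# must also leave room for the text form of any integer
mixed = np.array([1, 2, "three"])
print(mixed.dtype.kind)       # U
print(mixed[0] == "1")        # True
print(mixed.dtype.itemsize > words.dtype.itemsize)   # True
```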

**The upcasting hierarchy:**
```
int → float → complex → string
(least flexible)        (most flexible)
```

In [4]:
import numpy as np

# int + float = float (not string!)
arr1 = np.array([1, 2, 2.5])
print(arr1.dtype)  # float64

# int + string = string (string wins)
arr2 = np.array([1, 2, "three"])
print(arr2.dtype)  # <U21

# int only = int
arr3 = np.array([1, 2, 3])
print(arr3.dtype)  # int64

float64
<U21
int64


> Last task

In [5]:
import numpy as np

# Array shape - VERY important for ML later
arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])

print("1D shape:", arr_1d.shape)   # (5,)
print("2D shape:", arr_2d.shape)   # (2, 3)
print("1D dimensions:", arr_1d.ndim)  # 1
print("2D dimensions:", arr_2d.ndim)  # 2

# Accessing elements in 2D array
print("\nFirst row:", arr_2d[0])        # [1, 2, 3]
print("Element row1,col2:", arr_2d[0, 2])  # 3
print("Entire column 0:", arr_2d[:, 0])    # [1, 4]

1D shape: (5,)
2D shape: (2, 3)
1D dimensions: 1
2D dimensions: 2

First row: [1 2 3]
Element row1,col2: 3
Entire column 0: [1 4]
