# NumPy

### What is NumPy?
1. NumPy (Numerical Python) is a Python library for fast numerical computation.
2. It provides
   1. ndarray (n-dimensional array), a more powerful, typed counterpart to the Python list
   2. Vectorized operations that apply to entire arrays without writing loops
   3. Linear algebra, statistics and mathematical functions
   4. Interface with C, C++, Fortran Code for performance

### Why is NumPy Important?
1. Performance
   1. Python lists are slow for numeric work (dynamic typing, per-object overhead)
   2. NumPy arrays are stored in contiguous memory and use C-optimized code -> often 10-100x faster on large numeric workloads
2. Foundation of Data Science
   1. Pandas, SciPy and Scikit-learn are built on top of NumPy; TensorFlow and PyTorch use their own tensor types but interoperate closely with NumPy arrays
3. Math & Data Friendly
   1. Easy matrix operations, which are critical in ML (linear algebra, optimization etc)

### Key Features of NumPy
1. N-dimensional arrays (ndarray)
2. Vectorized operations (no Python loops needed)
3. Broadcasting (automatic expansion of arrays for operations)
4. Random number generation (important in ML & simulations)
5. Linear algebra functions (dot product, eigenvalues, matrix multiplications)
6. Fourier transforms (signal processing)
7. Integration with C/C++/Fortran for high performance
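
Of the features above, broadcasting is the least obvious, so a small sketch may help. The hotel prices, discount factors and surcharge values below are hypothetical illustration data, not from any real dataset.

```python
import numpy as np

# Hypothetical nightly prices: 3 hotels (rows) x 4 nights (columns)
prices = np.array([[100.0, 110.0, 120.0, 130.0],
                   [200.0, 210.0, 220.0, 230.0],
                   [ 50.0,  55.0,  60.0,  65.0]])

# One discount factor per hotel, shaped (3, 1) so it stretches across the columns
discount = np.array([[0.9], [0.8], [1.0]])

# Shapes (3, 4) and (3, 1) broadcast to (3, 4); no explicit loop is needed
discounted = prices * discount

# A 1-D array of length 4 is matched against the last axis, so it applies to every row
weekend_surcharge = np.array([0.0, 0.0, 20.0, 20.0])
with_surcharge = prices + weekend_surcharge

print(discounted.shape)
print(with_surcharge[0])
```

Broadcasting compares shapes from the last axis backwards; two dimensions are compatible when they are equal or when one of them is 1.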

### Real-Life Use Cases of NumPy
1. Data Science & Analytics
   1. Handling large datasets
   2. Preprocessing data (normalization, scaling, missing value handling)
   3. Example: Converting customer transaction logs into structured numeric data for modeling
2. Machine Learning
   1. Scikit-learn computations are built on NumPy; TensorFlow and PyTorch use their own tensor backends with NumPy-like APIs and convert to and from NumPy arrays
   2. Tasks like Gradient descent, Feature scaling, Loss calculations etc
   3. Example: Training a model for hotel price prediction on travel portal
3. Image Processing (Computer Vision)
   1. Images are matrices of pixel values
   2. NumPy makes it easy to manipulate them
   3. Example: compressing, filtering or detecting objects in travel photos
4. Financial Analysis
   1. Stock price movements, risk models, simulations
   2. Example: Monte Carlo simulations to predict hotel booking demand on a travel portal
5. Scientific Research
   1. Physics
   2. Biology
   3. Chemistry
6. Recommendation Systems
   1. Matrix factorization
   2. Example: Recommending destinations to customers based on booking history
7. Natural Language Processing (NLP)
   1. Text -> vectors (Bag of words, Word2Vec)
   2. Example: Converting customer reviews into embeddings for sentiment analysis
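
As one concrete instance of the preprocessing tasks listed above (missing value handling and scaling), here is a minimal sketch. The feature matrix is hypothetical, and filling with the column mean followed by min-max scaling is just one common choice among several.

```python
import numpy as np

# Hypothetical feature matrix: rows are bookings, columns are
# [nights, price_per_night]; np.nan marks a missing value
X = np.array([[2.0, 100.0],
              [5.0, np.nan],
              [3.0, 300.0],
              [4.0, 200.0]])

# Fill each missing entry with its column mean (computed while ignoring NaNs)
col_mean = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_mean, X)

# Min-max scaling to the range [0, 1], column by column
col_min = X_filled.min(axis=0)
col_max = X_filled.max(axis=0)
X_scaled = (X_filled - col_min) / (col_max - col_min)

print(X_scaled)
```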

### NumPy in Travel Portal Context
1. Customer Behaviour Analysis -> Understand booking patterns across destinations
2. Ad Spend Optimization -> Vectorized calculation of ROAS (Return on Ad Spend)
3. Dynamic Pricing -> Real-time adjustment of hotel/destination pricing
4. Fraud Detection -> Detect unusual booking/payment behaviors
5. Recommendation Engine -> Suggest hotels/destinations like Amazon suggests products
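
The vectorized ROAS calculation mentioned above could look roughly like this. The campaign figures are hypothetical, and returning NaN for campaigns with zero spend is an assumption about how such cases should be reported.

```python
import numpy as np

# Hypothetical per-campaign figures, all in the same currency
ad_spend = np.array([1000.0, 500.0, 0.0, 250.0])
revenue = np.array([4000.0, 750.0, 0.0, 1000.0])

# ROAS = revenue / ad spend, computed for all campaigns at once.
# Campaigns with zero spend keep the NaN placeholder instead of dividing by zero.
roas = np.full_like(ad_spend, np.nan)
np.divide(revenue, ad_spend, out=roas, where=ad_spend > 0)

print(roas)
```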

In [1]:
import numpy as np
import sys

In [25]:
# Heterogeneous List
# A list can contain elements of different data types
# Very flexible, but operations can be slower and less efficient because Python has to check each element's type
p_list = [1,2,3,"First",4,True,5]
print(p_list)
for item in p_list:
    print(f"Element {item} ({type(item)}): {sys.getsizeof(item)} bytes")


# Homogeneous Array
# An array where all elements are of the same type
# Makes operations easier and often faster because the element type is known in advance
# NumPy arrays are homogeneous: mixed inputs are coerced to one common dtype (here, strings)
# Memory can be allocated in a contiguous block

np_array = np.array([1,2,3,"First",4,True,5])
print(np_array)
for item in np_array:
    print(f"Element {item} ({type(item)}): {sys.getsizeof(item)} bytes")

[1, 2, 3, 'First', 4, True, 5]
Element 1 (<class 'int'>): 28 bytes
Element 2 (<class 'int'>): 28 bytes
Element 3 (<class 'int'>): 28 bytes
Element First (<class 'str'>): 54 bytes
Element 4 (<class 'int'>): 28 bytes
Element True (<class 'bool'>): 28 bytes
Element 5 (<class 'int'>): 28 bytes
['1' '2' '3' 'First' '4' 'True' '5']
Element 1 (<class 'numpy.str_'>): 82 bytes
Element 2 (<class 'numpy.str_'>): 82 bytes
Element 3 (<class 'numpy.str_'>): 82 bytes
Element First (<class 'numpy.str_'>): 86 bytes
Element 4 (<class 'numpy.str_'>): 82 bytes
Element True (<class 'numpy.str_'>): 85 bytes
Element 5 (<class 'numpy.str_'>): 82 bytes
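
Note that `sys.getsizeof` on an element taken out of an array measures the boxed scalar object, not the space the value occupies inside the array. A short sketch of how the array's own storage can be inspected through `dtype`, `itemsize` and `nbytes` (the variable names are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3, 4, 5], dtype=np.int64)
mixed = np.array([1, 2, 3, "First", 4, True, 5])

# Each int64 element occupies 8 bytes in the array's data buffer
print(ints.dtype, ints.itemsize, ints.nbytes)

# The mixed input was coerced to a fixed-width Unicode string dtype (kind 'U')
print(mixed.dtype, mixed.dtype.kind)
```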


In [3]:
%%time
# Execution time of a plain Python loop that appends to a list
p_list = range(1000000)
list_mul = []
for i in p_list:
    list_mul.append(i * 2)

for i in p_list:
    list_mul.append(i * 5)

print(len(list_mul))

2000000
CPU times: user 589 ms, sys: 87.5 ms, total: 676 ms
Wall time: 200 ms


In [4]:
%%time
# Creating numpy array
np_array = np.array(range(1000000))

CPU times: user 66.7 ms, sys: 9.36 ms, total: 76 ms
Wall time: 76.6 ms


In [5]:
%%time
# Execution time of numpy array
array_mul = np_array * 2

array_mul_2 = np_array * 5
array_mul = np.append(array_mul, array_mul_2, axis=0)

print(len(array_mul))

2000000
CPU times: user 3.25 ms, sys: 5.28 ms, total: 8.53 ms
Wall time: 7.06 ms


In [6]:
# Check the number of dimensions
array_mul.ndim

1

In [7]:
# Shape of an array
array_mul.shape

(2000000,)

In [8]:
# Two-dimensional array
arr2 = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])
arr2.ndim

2

In [9]:
arr2.shape

(4, 3)

### Vectorization
- Vectorization replaces explicit Python loops with array/matrix operations that are executed internally in C/C++ code
- Instead of processing data element by element, we operate on the whole array at once

#### Why is Vectorization Fast?
- Uses low-level optimized code (C/Fortran)
- Runs on contiguous memory (NumPy arrays)
- Can leverage SIMD instructions & parallelism

#### Where do we use vectorization?
- NumPy & Pandas: Array and dataframe operations.
- Machine Learning: Training models (matrix multiplications, dot products).
- Deep Learning: Libraries like TensorFlow & PyTorch are fully vectorized.
- Data Analytics: Faster filtering, aggregations, and transformations.
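
A minimal sketch of the "filtering and aggregation" case, comparing a loop with the equivalent boolean-mask expression. The booking amounts and the threshold of 100 are hypothetical.

```python
import numpy as np

# Hypothetical booking amounts
amounts = np.array([120.0, 80.0, 300.0, 45.0, 210.0])

# Loop version: test and accumulate one element at a time
total_loop = 0.0
for x in amounts:
    if x > 100:
        total_loop += x

# Vectorized version: build a boolean mask, select with it, then sum
mask = amounts > 100
total_vec = amounts[mask].sum()

print(total_loop, total_vec)
```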

In [10]:
%%time
# Without vectorization
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = []
for num in numbers:
    result.append(num * 2)
print(result)

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
CPU times: user 45 µs, sys: 3 µs, total: 48 µs
Wall time: 52.2 µs


In [11]:
%%time
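# With vectorization
# Note: on an array this small, array creation and call overhead dominate,
# so the plain loop can be faster; the benefit shows up on large arrays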
numbers = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
result = numbers * 2
print(result)

[ 2  4  6  8 10 12 14 16 18 20]
CPU times: user 225 µs, sys: 93 µs, total: 318 µs
Wall time: 286 µs


In [12]:
%%time
# Dot Product - Loop way
a = [1, 2, 3]
b = [4, 5, 6]
dot = 0
for i in range(len(a)):
    dot += a[i] * b[i]
print(dot)

32
CPU times: user 74 µs, sys: 21 µs, total: 95 µs
Wall time: 122 µs


In [13]:
%%time
# Dot Product - Vectorized NumPy way
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a,b))

32
CPU times: user 59 µs, sys: 11 µs, total: 70 µs
Wall time: 64.8 µs
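
The same idea extends from dot products to matrix multiplication. The sketch below uses the `@` operator; the diagonal matrix `M` is an arbitrary example chosen so the result is easy to check by hand.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# For 1-D arrays, @ gives the same inner product as np.dot
inner = a @ b                      # 1*4 + 2*5 + 3*6

# For a 2-D array and a 1-D array, @ is a matrix-vector product
M = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 3]])
Mv = M @ a

# Element-wise multiplication (*) is a different operation
elementwise = a * b

print(inner, Mv, elementwise)
```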


In [14]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [15]:
np.arange(2,5)

array([2, 3, 4])

In [24]:
np.arange(2,10.5,2)

array([ 2.,  4.,  6.,  8., 10.])
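
With a non-integer stop or step, the length of an `np.arange` result depends on floating-point rounding. When the number of points matters more than the step, `np.linspace` is often the safer choice, as this small comparison suggests:

```python
import numpy as np

# arange: the stop value is excluded and the length follows from the step
by_step = np.arange(2, 10.5, 2)

# linspace: both endpoints are included and the number of points is explicit
by_count = np.linspace(2, 10, 5)

print(by_step)
print(by_count)
```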

In [17]:
# Implicit type casting
np.array([1.2, 2.3, 4.5, True])

array([1.2, 2.3, 4.5, 1. ])

In [18]:
# Explicit type casting
np.array([1.2, 2.3, 4.5, True], dtype=float)

array([1.2, 2.3, 4.5, 1. ])

In [19]:
arr3 = np.array([1.2, 2.3, 4.5, True])
arr3.astype(int)

array([1, 2, 4, 1])
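
As the output shows, `astype(int)` drops the fractional part rather than rounding (4.5 became 4). If rounding is what you want, one option is to round first; the values below are illustrative.

```python
import numpy as np

values = np.array([1.2, 2.7, -1.7, 4.5])

# astype(int) truncates toward zero
truncated = values.astype(int)

# np.rint rounds to the nearest integer first (ties go to the even neighbour)
rounded = np.rint(values).astype(int)

print(truncated)
print(rounded)
```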

In [22]:
# Fancy indexing: select several indices at once (plain Python lists do not support this)
arr3[[1,2]]

array([2.3, 4.5])
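
Two related points, sketched under the same values as `arr3`: indexing with a list of integers returns a copy, and a boolean mask can be used in the same position to select by condition.

```python
import numpy as np

arr = np.array([1.2, 2.3, 4.5, 1.0])

# Integer-array ("fancy") indexing returns a new array, not a view
picked = arr[[1, 2]]
picked[0] = 99.0            # changes only the copy

# Boolean-mask indexing selects the elements where the condition holds
large = arr[arr > 2]

print(arr)
print(large)
```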

### Slicing
[ START_INDEX : END_INDEX : STEP ]

The slice starts at START_INDEX, stops just before END_INDEX (the end is exclusive), and advances by STEP.

In [26]:
arr4 = np.arange(10)

In [28]:
arr4[2]

2

In [42]:
arr5 = [0,1,2,3,4,5,6,7,8,9,10]
arr5[1:6]

[1, 2, 3, 4, 5]

In [43]:
arr5[1:7:2]

[1, 3, 5]

In [47]:
arr4[5:] = 10
arr4

array([ 0,  1,  2,  3,  4, 10, 10, 10, 10, 10])
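
One difference from list slicing is worth spelling out: a slice of a Python list is a new list, while a basic slice of a NumPy array is a view onto the same data. A minimal sketch (variable names are illustrative):

```python
import numpy as np

# A list slice is an independent copy
py_list = [0, 1, 2, 3, 4]
list_slice = py_list[1:4]
list_slice[0] = 99          # py_list is unchanged

# A NumPy slice is a view: writing through it changes the original array
arr = np.arange(5)
arr_view = arr[1:4]
arr_view[0] = 99            # arr[1] is now 99

# Call .copy() when an independent array is needed
other = np.arange(5)
arr_copy = other[1:4].copy()
arr_copy[0] = 99            # other is unchanged

print(py_list, arr, other)
```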

In [49]:
arr6 = np.array(range(20))
arr6

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [74]:
print(arr6.ndim, arr6.shape)

1 (20,)


In [64]:
# Reshape array
o1 = arr6.reshape(4,5)
print(o1)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
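
Two details about `reshape` that the cell above does not show: one dimension can be given as `-1` and NumPy infers it, and for a contiguous array like this one the result shares memory with the original. A short sketch:

```python
import numpy as np

arr = np.arange(20)

# -1 asks NumPy to infer the dimension: 20 elements / 4 rows = 5 columns
grid = arr.reshape(4, -1)

# The reshaped array is a view here, so a write is visible through both names
grid[0, 0] = 100

# ravel() flattens back to one dimension
flat = grid.ravel()

print(grid.shape, arr[0], flat.shape)
```

The total number of elements must stay the same; `arr.reshape(3, -1)` would raise a `ValueError` because 20 is not divisible by 3.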


In [70]:
o1[1:3, 1:3]

array([[ 6,  7],
       [11, 12]])

In [72]:
o1[2:, 3:]

array([[13, 14],
       [18, 19]])
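
A few more 2-D indexing patterns on the same 4x5 layout, as a quick sketch: a whole row, a whole column, a strided selection and a reversed row.

```python
import numpy as np

o1 = np.arange(20).reshape(4, 5)

second_row = o1[1]               # a single index selects a row
third_col = o1[:, 2]             # ':' keeps every row, 2 picks the column
every_other = o1[::2, ::2]       # rows 0 and 2, columns 0, 2 and 4
last_row_reversed = o1[-1, ::-1] # negative index and negative step

print(second_row)
print(third_col)
print(every_other)
print(last_row_reversed)
```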

In [145]:
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
# Reverse the row order, then transpose: rotates the matrix 90 degrees clockwise
a[::-1].T

array([[7, 4, 1],
       [8, 5, 2],
       [9, 6, 3]])
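
Reversing the rows and then transposing amounts to a 90-degree clockwise rotation. NumPy also provides `np.rot90` for this; the sketch below checks that the two approaches agree on this matrix.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Reverse the rows, then transpose
flipped_then_transposed = a[::-1].T

# np.rot90 rotates counterclockwise for positive k, so k=-1 is one clockwise turn
rotated = np.rot90(a, k=-1)

print(np.array_equal(flipped_then_transposed, rotated))
```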