<a href="https://colab.research.google.com/github/prof-sd1/CSEC_CPD/blob/main/Module_3_Core_Python_Libraries_for_Data_Science.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 3: Core Python Libraries for Data Science

## 3.1 Introduction to Python Libraries for Data Science

### Why Libraries Matter

In Python, libraries are pre-written collections of code that simplify complex tasks. For data science, they provide powerful tools for:

- Numerical computation

- Data manipulation

- Visualization

- Machine learning (covered in later modules)

Instead of writing everything from scratch, we import and use these libraries to save time, boost productivity, and maintain clean code.

### Core Python Libraries in This Module

| Library        | Purpose                                                                    |
| -------------- | -------------------------------------------------------------------------- |
| **NumPy**      | Fast mathematical operations on large, multi-dimensional arrays            |
| **Pandas**     | Manipulate, clean, and analyze structured tabular data                     |
| **Matplotlib** | Create static charts (line, bar, scatter, etc.)                            |
| **Seaborn**    | Create attractive statistical plots with minimal code                      |
| **Plotly**     | Create **interactive** visualizations and dashboards (zoom, hover, filter) |


### Key Benefits of Using Libraries

- Pre-built Functions: Ready-to-use tools save development time

- Community Support: Large user base, active development, good documentation

- Scalability: Optimized for performance, even with large datasets

- Interoperability: Libraries work together smoothly (e.g., Pandas + Seaborn)

## 3.2 NumPy: Arrays, Vectorized Operations, Linear Algebra

### What is NumPy?

**NumPy** stands for **Numerical Python**. It is the foundational Python library for numerical computing and is used for:

- Creating arrays (1D, 2D, or multi-dimensional)

- Performing math operations on entire arrays without writing explicit loops

- Solving linear algebra problems such as matrix multiplication and systems of linear equations
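The linear algebra use case can be sketched briefly. The example below is a minimal illustration, assuming a small 2x2 system chosen only for demonstration; the names `A`, `b`, `product`, and `solution` are hypothetical and not part of the module's own examples.

```python
import numpy as np

# Coefficient matrix and right-hand side for the system:
#   2x + 1y = 5
#   1x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Matrix-vector product with the @ operator
product = A @ np.array([1.0, 3.0])

# Solve A x = b for the unknown vector x
solution = np.linalg.solve(A, b)

print(product)   # the vector [1, 3] maps back onto b
print(solution)  # approximately x = 1, y = 3
```

Here `np.linalg.solve` is used rather than computing an explicit inverse, which is generally the more numerically stable choice for a single system.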



### Why Use NumPy Instead of Lists?

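| Feature         | Python Lists                               | NumPy Arrays                                   |
| --------------- | ------------------------------------------ | ---------------------------------------------- |
| Speed           | Slow for large data (interpreted loops)    | Much faster (operations run in compiled code)  |
| Math operations | Manual (use loops or comprehensions)       | Direct (use `+`, `*`, etc. on whole arrays)    |
| Memory          | Higher (each element is a separate object) | Lower (one contiguous block of a single dtype) |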
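The "Math operations" row can be made concrete with a short comparison. This is a minimal sketch; the `prices` data and variable names are made up for illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# With a list, * repeats the list instead of doing arithmetic
repeated = prices * 2

# Element-wise math on a list needs an explicit loop or comprehension
doubled_list = [p * 2 for p in prices]

# With an array, the same operator acts on every element at once
price_arr = np.array(prices)
doubled_arr = price_arr * 2

print(repeated)      # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
print(doubled_list)  # [20.0, 40.0, 60.0]
print(doubled_arr)   # [20. 40. 60.]
```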


### Import NumPy

In [1]:
import numpy as np

### 1. Creating Arrays

In [28]:
arr = np.array([1, 2, 3])
print(arr)
print(type(arr))
print(arr.shape)  # length along each axis
print(arr.ndim)   # number of dimensions (axes)
print(arr.dtype)  # data type of the elements
print(arr.size)   # total number of elements

[1 2 3]
<class 'numpy.ndarray'>
(3,)
1
int64
3


In [26]:
# Multi-dimensional arrays
matrix = np.array([[1, 2], [3, 4]])
print(matrix)
print(type(matrix))
print(matrix.shape)
print(matrix.ndim)
print(matrix.dtype)
print(matrix.size)


[[1 2]
 [3 4]]
<class 'numpy.ndarray'>
(2, 2)
2
int64
4
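The attributes printed above are related to one another: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. The sketch below checks this on a hypothetical 3D array named `cube`, which is not part of the original notebook.

```python
import numpy as np

# A 3D array: 2 blocks, each with 2 rows and 2 columns
cube = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]]])

print(cube.shape)  # (2, 2, 2)
print(cube.ndim)   # 3
print(cube.size)   # 8

# ndim counts the axes listed in shape
print(cube.ndim == len(cube.shape))      # True
# size is the product of the axis lengths
print(cube.size == int(np.prod(cube.shape)))  # True
```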


### 2. Initial Placeholders in NumPy Arrays

In [21]:
# Create a 4x5 NumPy array filled with zeros (floats by default)
x = np.zeros((4,5))
print(x)


[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [20]:
# create a numpy array of ones
y = np.ones((3,3))
print(y)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [17]:
# create an identity matrix
a = np.eye(3)
print(a)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [14]:
# Values from 0 up to (but not including) 10, in steps of 2
np.arange(0, 10, 2)    # [0 2 4 6 8]


array([0, 2, 4, 6, 8])

In [15]:
# 5 evenly spaced values from 0 to 1, both endpoints included
np.linspace(0, 1, 5)   # [0.  0.25 0.5  0.75  1.]

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
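The two functions differ in what you specify: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a point count and includes the stop value by default. A small side-by-side sketch, with illustrative variable names:

```python
import numpy as np

# arange: fixed step, stop value is excluded
by_step = np.arange(0, 10, 2)

# linspace: fixed number of points, stop value is included by default
by_count = np.linspace(0, 10, 5)

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 10, 5, endpoint=False)

print(by_step)    # [0 2 4 6 8]
print(by_count)   # [ 0.   2.5  5.   7.5 10. ]
print(half_open)  # [0. 2. 4. 6. 8.]
```

When the step is not an integer, `np.linspace` is usually the safer choice because the number of elements does not depend on floating-point rounding.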

In [18]:
# Create a 3x4 NumPy array of random floats in the interval [0, 1)
b = np.random.random((3,4))
print(b)

[[0.82965816 0.34393895 0.3698941  0.05025047]
 [0.91918485 0.98892513 0.15822587 0.68490854]
 [0.15637057 0.31575348 0.6234471  0.29623733]]


In [19]:
# Create a 5x4 array filled with a given value (here, 5)
z = np.full((5,4),5)
print(z)

[[5 5 5 5]
 [5 5 5 5]
 [5 5 5 5]
 [5 5 5 5]
 [5 5 5 5]]


In [22]:
# Create a 3x5 array of random integers from 10 (inclusive) to 100 (exclusive)
c = np.random.randint(10,100,(3,5))
print(c)

[[72 72 26 17 99]
 [34 65 63 19 87]
 [73 55 84 71 36]]
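Random arrays like the ones above change on every run. If reproducible results are needed, one option is a seeded generator created with `np.random.default_rng`. This is a sketch using NumPy's newer `Generator` interface rather than the `np.random.random` and `np.random.randint` calls shown in this module; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

floats_a = rng_a.random((3, 4))   # floats in [0, 1)
floats_b = rng_b.random((3, 4))

# Integers from 10 (inclusive) to 100 (exclusive)
ints = rng_a.integers(10, 100, size=(3, 5))

print(np.array_equal(floats_a, floats_b))         # True
print(bool(ints.min() >= 10 and ints.max() < 100))  # True
```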


### Convert a List to a NumPy Array

In [23]:
list1 = [10,20,20,20,50]

np_array = np.asarray(list1)
print(np_array)
type(np_array)

[10 20 20 20 50]


numpy.ndarray
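`np.asarray` and `np.array` behave the same for a list, but they differ when the input is already an array: `np.asarray` returns the existing array without copying, while `np.array` copies by default. A minimal sketch with illustrative names:

```python
import numpy as np

values = [10, 20, 20, 20, 50]
from_list = np.asarray(values)   # a list is always converted into a new array

# Given an existing array, asarray returns the same object
same = np.asarray(from_list)
# np.array makes an independent copy by default
copy = np.array(from_list)

# Changing the original is visible through `same` but not through `copy`
from_list[0] = 99

print(same is from_list)  # True
print(same[0])            # 99
print(copy[0])            # 10
```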

### Mathematical Operations on NumPy Arrays

In [30]:
a = np.random.randint(0,10,(3,3))
b = np.random.randint(10,20,(3,3))

In [35]:
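print(np.add(a, b))       # element-wise a + b
print(np.subtract(a, b))  # element-wise a - b
print(np.multiply(a, b))  # element-wise a * b (not matrix multiplication)
print(np.divide(a, b))    # element-wise a / b (result is always float)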

[[13 11 27]
 [17 19 20]
 [21 19 19]]
[[-13  -9  -9]
 [ -9 -17  -6]
 [-17 -15 -19]]
[[  0  10 162]
 [ 52  18  91]
 [ 38  34   0]]
[[0.         0.1        0.5       ]
 [0.30769231 0.05555556 0.53846154]
 [0.10526316 0.11764706 0.        ]]
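The same element-wise operations are also available through the ordinary arithmetic operators. The sketch below uses small fixed arrays (instead of random ones) so the results are predictable, and it contrasts element-wise `*` with the matrix product `@`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

# Operators give the same results as the named functions
print(np.array_equal(a + b, np.add(a, b)))        # True
print(np.array_equal(a * b, np.multiply(a, b)))   # True

# A scalar is applied to every element (broadcasting)
scaled = a * 10

# * multiplies matching elements; @ performs matrix multiplication
elementwise = a * b
matmul = a @ b

print(scaled)       # [[10 20] [30 40]]
print(elementwise)  # [[ 10  40] [ 90 160]]
print(matmul)       # [[ 70 100] [150 220]]
```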


In [36]:
a = np.array([4, 2, 3, 1])

np.sum(a)         # 10
np.mean(a)        # 2.5
np.std(a)         # ≈ 1.118 (population standard deviation, ddof=0)
np.max(a)         # 4
np.min(a)         # 1
np.sort(a)        # [1 2 3 4]


array([1, 2, 3, 4])
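On 2D arrays, these functions accept an `axis` argument that controls whether the calculation runs down the columns (`axis=0`) or across the rows (`axis=1`). The sketch below uses a made-up `scores` array and also shows that `np.sort` returns a sorted copy, while the `.sort()` method sorts in place.

```python
import numpy as np

scores = np.array([[4, 2, 3],
                   [1, 6, 5]])

total = np.sum(scores)                # sum of all elements
col_sums = np.sum(scores, axis=0)     # one sum per column
row_means = np.mean(scores, axis=1)   # one mean per row

print(total)      # 21
print(col_sums)   # [5 8 8]
print(row_means)  # [3. 4.]

# np.sort returns a new array; the original is unchanged
sorted_copy = np.sort(scores, axis=1)
print(sorted_copy.tolist())  # [[2, 3, 4], [1, 5, 6]]
print(scores.tolist())       # [[4, 2, 3], [1, 6, 5]]

# The .sort() method modifies the array in place
scores.sort(axis=1)
print(scores.tolist())       # [[2, 3, 4], [1, 5, 6]]
```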