# THEORY


1. What is NumPy, and why is it widely used in Python?

NumPy (Numerical Python) is a Python library for numerical computation.
It is widely used because:

Provides multi-dimensional arrays that are faster and more memory-efficient than Python lists for numerical work.

Supports vectorized operations (no explicit loops).

Has functions for linear algebra, random numbers, FFT, statistics, etc.

Basis for Pandas, Scikit-learn, TensorFlow, etc.



---

2. How does broadcasting work in NumPy?

Broadcasting lets NumPy perform arithmetic on arrays of different shapes without explicitly copying data.
Example:

import numpy as np
a = np.array([1, 2, 3])
b = 2
print(a + b)   # [3 4 5]

The scalar b is broadcast to match the shape of a.
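Broadcasting also works between arrays of different dimensions, not only between an array and a scalar. The following is a minimal sketch with made-up values (`matrix`, `row`, and `col` are illustrative names, not from the notes above):

```python
import numpy as np

# Illustrative values chosen for this sketch.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)

# Shapes are compared from the trailing dimension: (2, 3) and (3,) are
# compatible, so `row` behaves as if it were repeated for every row.
result = matrix + row
print(result)
# [[11 22 33]
#  [14 25 36]]

# A column of shape (2, 1) is stretched across the columns instead.
col = np.array([[100], [200]])
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]
```

Shapes that cannot be aligned this way, for example (2, 3) and (2,), raise a ValueError rather than being silently padded.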


---

3. What is a Pandas DataFrame?

A 2D labeled data structure (like an Excel sheet).

Has rows & columns with labels.

Each column can have different data types.
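As a small illustration of these points, the sketch below builds a DataFrame from a dictionary. The column names, row labels, and values are hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],         # text column
        "age": [29, 35, 41],                      # integer column
        "salary": [52000.0, 61000.5, 58000.0],    # float column
    },
    index=["e1", "e2", "e3"],                     # row labels
)

print(df)
print(df.dtypes)             # each column keeps its own dtype
print(df.loc["e2", "age"])   # label-based lookup -> 35
```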



---

4. Explain the use of groupby() method in Pandas.

Groups data based on column(s).

Allows applying functions like sum, mean, count.


df.groupby("Department")["Salary"].mean()
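The one-liner above assumes a DataFrame with "Department" and "Salary" columns. A runnable sketch with invented sample rows might look like this:

```python
import pandas as pd

# Invented sample rows; only the column names come from the snippet above.
df = pd.DataFrame({
    "Department": ["HR", "IT", "IT", "HR", "Sales"],
    "Salary": [40000, 60000, 70000, 50000, 45000],
})

# Mean salary per department.
mean_salary = df.groupby("Department")["Salary"].mean()
print(mean_salary)   # HR 45000.0, IT 65000.0, Sales 45000.0

# Several aggregations at once.
summary = df.groupby("Department")["Salary"].agg(["sum", "mean", "count"])
print(summary)
```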


---

5. Why is Seaborn preferred for statistical visualizations?

Built on Matplotlib.

Provides high-level API for beautiful plots.

Has built-in themes & functions for statistical plots like violin, boxplot, heatmap.



---

6. Differences between NumPy arrays and Python lists?

| Feature | NumPy array | Python list |
|---|---|---|
| Speed | Faster (C backend) | Slower |
| Size | Fixed | Dynamic |
| Operations | Vectorized | Need loops |
| Storage | Homogeneous data | Heterogeneous data |
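A few of these differences can be seen directly in a short sketch (variable names are illustrative):

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# A list can mix types and grow in place.
py_list.append("four")

# `*` repeats a list but multiplies an array element by element.
repeated = [1, 2, 3] * 2       # [1, 2, 3, 1, 2, 3]
doubled = arr * 2              # array([2, 4, 6])

# An array has one dtype; mixed numeric input is converted to a common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)             # float64

# Arrays have a fixed size: np.append returns a new array, `arr` is unchanged.
longer = np.append(arr, 4)
print(arr.size, longer.size)   # 3 4
```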



---

7. What is a heatmap, and when should it be used?

A color-coded matrix representation.

Used for showing correlation, frequency, or intensity of values.
Example: Correlation matrix of dataset.



---

8. What does vectorized operation mean in NumPy?

Performing operations on entire arrays without explicit loops.

a = np.array([1,2,3])
b = np.array([4,5,6])
print(a+b)  # [5 7 9]


---

9. How does Matplotlib differ from Plotly?

Matplotlib → Primarily static plots (2D, with 3D support through mplot3d); highly customizable but verbose.

Plotly → Interactive plots with zoom and hover, 3D charts, and dashboards (via Dash).



---

10. Significance of hierarchical indexing in Pandas?

Allows multi-level indexing (row or column labels with more than one level, i.e. a MultiIndex).

Useful for grouped or panel data.
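A minimal sketch of a two-level row index, assuming hypothetical sales figures keyed by region and year:

```python
import pandas as pd

# Hypothetical figures indexed by (region, year).
index = pd.MultiIndex.from_tuples(
    [("North", 2023), ("North", 2024), ("South", 2023), ("South", 2024)],
    names=["region", "year"],
)
sales = pd.Series([100, 120, 80, 95], index=index, name="sales")

print(sales.loc["North"])            # all years for one region
print(sales.loc[("South", 2024)])    # one (region, year) entry -> 95

# Aggregate over one level of the index.
by_year = sales.groupby(level="year").sum()
print(by_year)                       # 2023 -> 180, 2024 -> 215

# Move the inner level to columns to get a region-by-year table.
wide = sales.unstack("year")
print(wide)
```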



---

11. Role of Seaborn’s pairplot()?

Creates a grid of pairwise scatter plots for the numerical variables, with each variable's own distribution on the diagonal.

Helps see relationships & correlations.



---

12. Role of describe() function in Pandas?

Provides summary statistics for numeric columns: count, mean, std, min, max, and quartiles.
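For example, on a small column of scores (sample values assumed for illustration):

```python
import pandas as pd

# Sample values assumed for illustration.
scores = pd.DataFrame({"Score": [85, 90, 78, 88, 95]})

stats = scores["Score"].describe()
print(stats)
# count 5, mean 87.2, min 78, 25% 85, 50% 88, 75% 90, max 95
# (std is the sample standard deviation, about 6.3)
```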



---

13. Why is handling missing data important?

Missing data can cause biased analysis or errors.

Pandas provides methods like .fillna(), .dropna().
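A short sketch of both methods on a small frame with gaps (the data is made up, and filling with the column mean is just one possible strategy):

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [4.0, 5.0, np.nan],
})

print(df.isna().sum())           # missing values per column: A 1, B 1

dropped = df.dropna()            # keep only rows with no missing values
filled = df.fillna(df.mean())    # replace NaN with each column's mean

print(dropped)                   # only row 0 remains
print(filled)                    # A gets 2.0, B gets 4.5
```

Dropping rows loses data while filling can distort the distribution, so the right choice depends on how much is missing and why.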



---

14. Benefits of using Plotly for data visualization?

Interactive.

Supports dashboards.

2D, 3D plots.

Web integration.



---

15. How does NumPy handle multidimensional arrays?

Stores them as ndarrays.

Supports reshaping, slicing, broadcasting, matrix operations.
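These operations can be sketched on a small 3x4 array (values chosen only for illustration):

```python
import numpy as np

a = np.arange(12)             # [0, 1, ..., 11]
m = a.reshape(3, 4)           # 3 rows, 4 columns

print(m.ndim, m.shape)        # 2 (3, 4)
print(m[1, 2])                # row 1, column 2 -> 6
print(m[:, 0])                # first column -> [0 4 8]
print(m[0:2, 1:3])            # sub-block -> [[1 2], [5 6]]
print(m.sum(axis=0))          # column sums -> [12 15 18 21]

product = m @ m.T             # matrix product, shape (3, 3)
print(product.shape)
```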



---

16. Role of Bokeh in data visualization?

Interactive, web-based visualizations.

Good for large datasets.

Integrates with Flask/Django.



---

17. Difference between apply() and map() in Pandas?

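map() → element-wise on a Series; accepts a function, a dict, or another Series.

apply() → applies a function along rows or columns of a DataFrame; on a Series it works element-wise.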
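The difference is easiest to see side by side. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Series.map: transform each element of a single column.
squared = df["A"].map(lambda x: x ** 2)                    # 1, 4, 9

# Series.map also accepts a dict for value lookups.
labels = df["A"].map({1: "low", 2: "mid", 3: "high"})

# DataFrame.apply: the function receives a whole column (default axis=0) ...
col_range = df.apply(lambda col: col.max() - col.min())    # A 2, B 20

# ... or a whole row (axis=1).
row_total = df.apply(lambda row: row["A"] + row["B"], axis=1)  # 11, 22, 33
```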



---

18. Some advanced features of NumPy?

Broadcasting.

Masked arrays.

FFT, linear algebra, random sampling.
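A brief tour of some of these features, as a sketch with made-up inputs (the -999 sentinel and the small linear system are assumptions for illustration):

```python
import numpy as np

# Masked array: treat -999 as a "missing" sentinel and ignore it.
data = np.array([1.0, -999.0, 3.0, 5.0])
masked = np.ma.masked_equal(data, -999.0)
print(masked.mean())                  # 3.0

# Linear algebra: solve 2x + y = 5 and x + 3y = 10.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)
print(x)                              # x = 1, y = 3

# Random sampling with a seeded generator for reproducibility.
rng = np.random.default_rng(seed=0)
sample = rng.integers(low=0, high=10, size=5)

# FFT of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(np.abs(spectrum))
```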



---

19. How does Pandas simplify time series analysis?

Built-in support for datetime indexing.

Provides resampling, shifting, rolling windows.
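A minimal sketch of these features on six days of invented daily values:

```python
import pandas as pd

# Invented daily values for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([10, 12, 11, 15, 14, 18], index=dates)

# Datetime indexing: slice by date strings (both ends inclusive).
window = ts.loc["2024-01-02":"2024-01-04"]    # 12, 11, 15

# Resampling: aggregate into 2-day bins.
two_day = ts.resample("2D").sum()             # 22, 26, 32

# Shifting: move values forward one step; the first becomes NaN.
lagged = ts.shift(1)

# Rolling window: 3-day moving average; the first two entries are NaN.
rolling = ts.rolling(window=3).mean()
print(rolling)
```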



---

20. Role of a pivot table in Pandas?

Reshape & summarize data.

Similar to Excel pivot tables.
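As a sketch, assuming hypothetical sales records with "Region", "Product", and "Sales" columns:

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "North"],
    "Product": ["A", "B", "A", "B", "A"],
    "Sales": [100, 150, 80, 120, 60],
})

# Rows become regions, columns become products, cells hold summed sales.
table = pd.pivot_table(df, index="Region", columns="Product",
                       values="Sales", aggfunc="sum")
print(table)
# North: A 160, B 150
# South: A 80,  B 120
```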



---

21. Why is NumPy’s slicing faster than Python’s list slicing?

NumPy basic slicing returns a view onto the same memory (no data copy).

Python list slicing builds a new list and copies the element references.
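The practical consequence is that writing through a NumPy slice changes the original array, while a list slice is independent. A small sketch:

```python
import numpy as np

arr = np.arange(5)                    # [0 1 2 3 4]
view = arr[1:4]                       # a view, no data copied
view[0] = 99
print(arr)                            # [ 0 99  2  3  4]
print(np.shares_memory(arr, view))    # True

lst = [0, 1, 2, 3, 4]
part = lst[1:4]                       # a new list
part[0] = 99
print(lst)                            # [0, 1, 2, 3, 4], unchanged

# Use .copy() when an independent array is needed.
independent = arr[1:4].copy()
print(np.shares_memory(arr, independent))   # False
```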



---

22. Common use cases for Seaborn?

Correlation analysis (heatmaps).

Regression analysis (lmplot).

Distribution visualization (histplot, kdeplot, boxplot; the older distplot is deprecated).



---

# PRACTICAL

1. Create a 2D NumPy array and calculate the sum of each row

import numpy as np

a = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])
print("Array:\n", a)
print("Row sums:", a.sum(axis=1))

Output:

Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Row sums: [ 6 15 24]


---

2. Pandas script to find mean of a specific column

import pandas as pd

df = pd.DataFrame({"A":[1,2,3,4], "B":[10,20,30,40]})
print("DataFrame:\n", df)
print("Mean of column B:", df["B"].mean())

Output:

DataFrame:
    A   B
0  1  10
1  2  20
2  3  30
3  4  40
Mean of column B: 25.0


---

3. Create a scatter plot using Matplotlib

import matplotlib.pyplot as plt

x = [1,2,3,4,5]
y = [2,4,6,8,10]
plt.scatter(x,y,color="red")
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

Output: A scatter plot with red points along a straight line.


---

4. Correlation matrix with Seaborn heatmap

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "A":[1,2,3,4,5],
    "B":[5,4,3,2,1],
    "C":[2,3,4,5,6]
})

corr = df.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()

Output: Heatmap showing correlation values (A vs B is -1.0, A vs C is 1.0, etc.).


---

5. Generate a bar plot using Plotly

import plotly.express as px

df = pd.DataFrame({"Category":["A","B","C"], "Values":[10,20,15]})
fig = px.bar(df, x="Category", y="Values", title="Bar Plot")
fig.show()

Output: Interactive bar chart with 3 bars (A=10, B=20, C=15).


---

6. Create a DataFrame and add a new column

df = pd.DataFrame({"A":[10,20,30], "B":[2,4,6]})
df["C"] = df["A"] / df["B"]
print(df)

Output:

    A  B    C
0  10  2  5.0
1  20  4  5.0
2  30  6  5.0


---

7. Element-wise multiplication of two NumPy arrays

a = np.array([1,2,3])
b = np.array([4,5,6])
print(a * b)

Output:

[ 4 10 18]


---

8. Line plot with multiple lines

import numpy as np

x = np.linspace(0,10,100)
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.legend()
plt.title("Line Plot with Multiple Lines")
plt.show()

Output: A line graph with sin(x) and cos(x) curves.


---

9. Filter Pandas DataFrame (column > threshold)

df = pd.DataFrame({"Name":["A","B","C","D"],
                   "Marks":[45, 80, 60, 30]})
filtered = df[df["Marks"] > 50]
print(filtered)

Output:

  Name  Marks
1    B     80
2    C     60


---

10. Histogram with Seaborn

# Reuses df (the Marks DataFrame) from exercise 9.
sns.histplot(df["Marks"], bins=5, kde=True)
plt.title("Histogram of Marks")
plt.show()

Output: Histogram showing distribution of student marks.


---

11. Matrix multiplication using NumPy

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
print(np.dot(a,b))

Output:

[[19 22]
 [43 50]]


---

12. Load CSV and display first 5 rows

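# Write a small sample file so there is a CSV to load.
# With real data, skip this step and pass your own path to read_csv.
pd.DataFrame({
    "ID":[1,2,3,4,5],
    "Name":["A","B","C","D","E"],
    "Score":[85,90,78,88,95]
}).to_csv("data.csv", index=False)

df = pd.read_csv("data.csv")
print(df.head())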

Output:

   ID Name  Score
0   1    A     85
1   2    B     90
2   3    C     78
3   4    D     88
4   5    E     95


---

13. 3D Scatter Plot with Plotly

df = pd.DataFrame({
    "x":[1,2,3,4,5],
    "y":[10,20,30,40,50],
    "z":[5,15,25,35,45],
    "Category":["A","B","A","B","A"]
})
fig = px.scatter_3d(df, x="x", y="y", z="z", color="Category", size="z")
fig.show()

Output: Interactive 3D scatter plot (points colored by category and sized by z).