
## **Theoretical Questions with Answers**

### **1. What is NumPy, and why is it widely used in Python?**  
   - NumPy (Numerical Python) is a library for numerical computing in Python. It is widely used because it provides fast operations on large arrays and matrices, supports mathematical functions, and offers efficient memory usage.

---

### **2. How does broadcasting work in NumPy?**  
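   - Broadcasting allows element-wise operations on arrays of different shapes without explicitly replicating data. NumPy compares the shapes from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1, and the size-1 dimension is virtually stretched to match the other without copying.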
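
A minimal sketch of broadcasting, using made-up arrays `col` and `row` (illustrative names, not from the question). It shows a compatible pair of shapes and one incompatible pair:

```python
import numpy as np

# A (3, 1) column and a (1, 4) row broadcast to a common (3, 4) shape.
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([[1, 2, 3, 4]])      # shape (1, 4)
table = col + row                   # shape (3, 4), no manual tiling needed
print(table)

# Shapes are compared from the last dimension backwards;
# each pair of sizes must be equal, or one of them must be 1.
try:
    np.ones((3, 2)) + np.ones(3)    # trailing sizes 2 and 3 do not match
except ValueError as err:
    print("Cannot broadcast:", err)
```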

---

### **3. What is a Pandas DataFrame?**  
   - A Pandas DataFrame is a two-dimensional, labeled data structure with rows and columns, similar to a database table, an Excel spreadsheet, or an R data frame. Each column can hold a different data type.

---

### **4. Explain the use of the `groupby()` method in Pandas.**  
   - The `groupby()` method is used to group data based on one or more columns and apply aggregate functions like sum, mean, or count.
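
As a small illustration, assuming a hypothetical `sales` table with `region` and `units` columns, grouping and aggregating might look like this:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "units": [10, 20, 30, 40, 50],
})

# One aggregate per group.
totals = sales.groupby("region")["units"].sum()

# Several aggregates at once.
summary = sales.groupby("region")["units"].agg(["mean", "count"])

print(totals)
print(summary)
```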

---

### **5. Why is Seaborn preferred for statistical visualizations?**  
   - Seaborn is preferred because it provides built-in statistical plots, better aesthetics, and simplifies complex visualizations with minimal code.

---

### **6. What are the differences between NumPy arrays and Python lists?**  
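   - NumPy arrays store elements of a single data type in one contiguous block, so they are typically **faster** for numerical work, support **vectorized operations**, and use **less memory** than Python lists. Lists can hold heterogeneous objects, but they store a reference per element and need explicit loops for element-wise computations.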
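
A short sketch of how the same operator behaves differently on the two containers (the variable names are illustrative):

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

doubled_list = values * 2   # list repetition: the list is concatenated with itself
doubled_arr = arr * 2       # element-wise arithmetic on every entry

mixed = [1, "two", 3.0]     # a list may hold objects of different types

print(doubled_list)
print(doubled_arr)
print(arr.dtype)            # one dtype for the whole array (exact integer type is platform-dependent)
```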

---

### **7. What is a heatmap, and when should it be used?**  
   - A heatmap is a graphical representation of data using colors. It is used to visualize correlations, distributions, and relationships in a matrix format.

---

### **8. What does the term “vectorized operation” mean in NumPy?**  
   - A vectorized operation applies functions to entire NumPy arrays without explicit loops, making computations faster and more efficient.
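
To make the contrast concrete, here is a sketch that converts made-up Celsius readings to Fahrenheit twice, once with an explicit loop and once vectorized:

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Explicit Python loop: one interpreted iteration per element.
loop_result = np.empty_like(celsius)
for i in range(len(celsius)):
    loop_result[i] = celsius[i] * 9 / 5 + 32

# Vectorized: the same arithmetic expressed once for the whole array.
fahrenheit = celsius * 9 / 5 + 32

print(fahrenheit)
```

Both versions compute the same values; the vectorized form pushes the loop into NumPy's compiled code.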

---

### **9. How does Matplotlib differ from Plotly?**  
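   - **Matplotlib** is a general-purpose plotting library that offers fine-grained control and is most often used to produce static figures (it also supports animation and interactive backends), while **Plotly** renders interactive, browser-based charts with hover, zoom, and pan built in.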

---

### **10. What is the significance of hierarchical indexing in Pandas?**  
   - Hierarchical indexing allows multiple levels of indexing, making it easier to work with complex, multi-dimensional datasets.
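
A minimal sketch of a two-level index, assuming invented `year`/`quarter` labels and revenue figures:

```python
import pandas as pd

# Hypothetical data: revenue indexed by (year, quarter).
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([100, 120, 130, 150], index=index, name="revenue")

year_2024 = revenue.loc["2024"]              # select by the outer level only
single = revenue.loc[("2023", "Q2")]         # select one (year, quarter) pair
yearly = revenue.groupby(level="year").sum() # aggregate over the inner level
by_quarter = revenue.unstack("quarter")      # move a level into columns

print(by_quarter)
```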

---

### **11. What is the role of Seaborn’s `pairplot()` function?**  
   - The `pairplot()` function draws a grid of pairwise scatter plots for the numerical columns in a dataset, with each variable's univariate distribution on the diagonal, helping visualize relationships between variables.

---

### **12. What is the purpose of the `describe()` function in Pandas?**  
   - The `describe()` function returns summary statistics for the numerical columns of a DataFrame: count, mean, standard deviation, minimum, the 25th, 50th (median), and 75th percentiles, and maximum.
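
For instance, on a small made-up DataFrame the rows of the result can be inspected directly:

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"A": [10, 20, 30, 40], "B": [1.5, 2.5, 3.5, 4.5]})

stats = df.describe()
print(stats)

# The row labels name each statistic; the median appears as "50%".
print(list(stats.index))
```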

---

### **13. Why is handling missing data important in Pandas?**  
   - Handling missing data is crucial to prevent biased analysis and errors. Techniques like imputation, dropping rows, or using interpolation can be used.
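
The three techniques named above can be sketched on a toy Series (the values are invented for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

n_missing = s.isna().sum()        # count the gaps first

dropped = s.dropna()              # drop entries with missing values
filled = s.fillna(s.mean())       # impute with the mean of the observed values
interpolated = s.interpolate()    # linear interpolation between neighbours

print(n_missing)
print(filled.tolist())
print(interpolated.tolist())
```

Which option is appropriate depends on why the data is missing; this sketch only shows the mechanics.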

---

### **14. What are the benefits of using Plotly for data visualization?**  
   - Plotly supports interactive plots, web-based visualizations, real-time updates, and high-quality graphics.

---

### **15. How does NumPy handle multidimensional arrays?**  
   - NumPy represents multidimensional arrays as `ndarray` objects, allowing efficient storage and manipulation of data across multiple dimensions.
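
A brief sketch with a three-dimensional array (the name `cube` is illustrative) shows the main attributes and axis-wise operations:

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks of 3 rows by 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

print(cube.ndim, cube.shape, cube.size)

last = cube[1, 2, 3]              # one index per dimension
block_sum = cube.sum(axis=0)      # collapse the first axis
flat = cube.reshape(-1)           # same data viewed as one dimension

print(last)
print(block_sum.shape)
```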

---

### **16. What is the role of Bokeh in data visualization?**  
   - Bokeh is a visualization library used for creating interactive, web-friendly plots with real-time updates.

---

### **17. Explain the difference between `apply()` and `map()` in Pandas.**  
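   - `DataFrame.apply()` applies a function along an axis of a DataFrame (to each column by default, or to each row with `axis=1`), while `Series.map()` transforms a Series element-wise and also accepts a dict or Series as a lookup table. A Series has its own `apply()` as well, which calls the function on each value.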
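
A small side-by-side sketch, using an invented two-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# DataFrame.apply: the function receives a whole column (axis=0, the default) ...
col_ranges = df.apply(lambda col: col.max() - col.min())
# ... or a whole row (axis=1).
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Series.map: the function (or a dict) is applied to each value in turn.
squared = df["A"].map(lambda v: v ** 2)
labels = df["A"].map({1: "low", 2: "mid", 3: "high"})

print(col_ranges)
print(row_sums)
print(labels)
```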

---

### **18. What are some advanced features of NumPy?**  
   - Advanced features include broadcasting, vectorized operations, random sampling, linear algebra functions, and Fourier transforms.
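
A quick tour of three of these features, as a sketch with made-up inputs (the seed, matrix, and signal are arbitrary choices):

```python
import numpy as np

# Random sampling with a seeded generator for reproducibility.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform: a sine with one cycle over 8 samples.
t = np.arange(8)
signal = np.sin(2 * np.pi * t / 8)
spectrum = np.fft.rfft(signal)
peak_bin = np.argmax(np.abs(spectrum))   # index of the dominant frequency bin

print(x)
print(peak_bin)
```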

---

### **19. How does Pandas simplify time series analysis?**  
   - Pandas provides date-time indexing, resampling, rolling windows, and built-in functions to analyze time series data efficiently.
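
These features can be sketched on six days of invented daily data:

```python
import pandas as pd

# Hypothetical daily series starting on an arbitrary date.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Date-time indexing: slice with date strings (both endpoints included).
window = ts.loc["2024-01-02":"2024-01-04"]

# Resampling: aggregate into 2-day bins.
two_day_sums = ts.resample("2D").sum()

# Rolling window: 3-day moving average (first two entries are NaN).
rolling_mean = ts.rolling(window=3).mean()

print(window)
print(two_day_sums)
print(rolling_mean)
```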

---

### **20. What is the role of a pivot table in Pandas?**  
   - A pivot table summarizes large datasets by aggregating values based on multiple categories.
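
As an illustration, assuming a hypothetical `sales` table with `region`, `product`, and `units` columns:

```python
import pandas as pd

# Invented sample data.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "units": [10, 20, 30, 40, 50],
})

# Rows come from one category, columns from another, cells are aggregated values.
table = sales.pivot_table(
    index="region", columns="product", values="units", aggfunc="sum"
)
print(table)
```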

---

### **21. Why is NumPy’s array slicing faster than Python’s list slicing?**  
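   - Basic slicing of a NumPy array returns a view onto the same memory buffer, so no elements are copied; slicing a Python list builds a new list and copies every reference in the slice. NumPy's contiguous, fixed-type storage also makes any subsequent access to the sliced data cheaper.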
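
One factor worth seeing in code is that basic NumPy slicing returns a view rather than a copy, whereas list slicing copies. A minimal sketch:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]            # a view: shares memory with arr, nothing is copied
view[0] = 99               # writing through the view changes arr as well
print(arr[2], np.shares_memory(arr, view))

lst = list(range(10))
part = lst[2:5]            # a new list: the sliced references are copied
part[0] = 99               # the original list is unaffected
print(lst[2])

independent = arr[2:5].copy()   # request a copy explicitly when one is needed
```

Because a view aliases the original data, call `.copy()` when the slice must be modified independently.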

---

### **22. What are some common use cases for Seaborn?**  
   - Seaborn is commonly used for correlation heatmaps, categorical plots, violin plots, pair plots, and regression visualizations.

---

## **Practical Questions with Answers**

### **1. How do you create a 2D NumPy array and calculate the sum of each row?**
```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
row_sums = np.sum(arr, axis=1)
print(row_sums)
```

---

### **2. Write a Pandas script to find the mean of a specific column in a DataFrame.**
```python
import pandas as pd

data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)
mean_value = df['A'].mean()
print(mean_value)
```

---

### **3. Create a scatter plot using Matplotlib.**
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 50]

plt.scatter(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot Example")
plt.show()
```

---

### **4. Calculate the correlation matrix with Pandas and visualize it with a Seaborn heatmap.**
```python
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(5, 5)
df = pd.DataFrame(data, columns=["A", "B", "C", "D", "E"])
corr_matrix = df.corr()

sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.show()
```

---

### **5. Generate a bar plot using Plotly.**
```python
import plotly.express as px
import pandas as pd

data = {'Category': ['A', 'B', 'C'], 'Values': [10, 20, 30]}
df = pd.DataFrame(data)

fig = px.bar(df, x='Category', y='Values', title="Bar Plot Example")
fig.show()
```

---

### **6. Create a DataFrame and add a new column based on an existing column.**
```python
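import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
df['C'] = df['A'] * 2  # new column derived from column 'A'
print(df)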
```

---

### **7. Perform element-wise multiplication of two NumPy arrays.**
```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

result = arr1 * arr2
print(result)
```

---

### **8. Create a line plot with multiple lines using Matplotlib.**
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 50]

plt.plot(x, y, label="Line 1")
plt.plot(x, [i * 2 for i in y], label="Line 2")
plt.legend()
plt.show()
```

---

### **9. Filter rows where a column value is greater than a threshold.**
```python
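import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
filtered_df = df[df['A'] > 15]  # boolean mask keeps rows where 'A' exceeds 15
print(filtered_df)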
```

---

### **10. Create a histogram using Seaborn.**
```python
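import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(x=[10, 20, 30], bins=5, kde=True)  # sample values; pass df['A'] for a DataFrame column
plt.show()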
```

---

### **11. Perform matrix multiplication using NumPy.**
```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# For 2-D arrays, np.dot is equivalent to matrix1 @ matrix2 (np.matmul).
result = np.dot(matrix1, matrix2)
print(result)
```

---

### **12. Load a CSV file and display its first 5 rows.**
```python
import pandas as pd

df = pd.read_csv("data.csv")  # assumes data.csv exists in the working directory
print(df.head())
```

---

### **13. Create a 3D scatter plot using Plotly.**
```python
import plotly.express as px
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6], 'z': [7, 8, 9]})
fig = px.scatter_3d(df, x='x', y='y', z='z', title="3D Scatter Plot")
fig.show()
```
