1. What is NumPy, and why is it widely used in Python?

NumPy (Numerical Python) is a powerful library for numerical computations. It's widely used for its support of large multi-dimensional arrays and matrices, and for its collection of mathematical functions that operate efficiently on these arrays.


---

2. How does broadcasting work in NumPy?

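Broadcasting lets NumPy perform element-wise operations on arrays of different but compatible shapes. Shapes are compared from the trailing dimension backward; two dimensions are compatible when they are equal or when one of them is 1. The smaller array is then treated as if it were stretched to the larger shape, without actually copying its data.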
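
As a minimal sketch of the behavior described above (the array names and values are made up for illustration), a 1D row and a single-column array can each be combined with a 2D array:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# (2, 3) + (3,): the row is reused for every row of the matrix
added_row = matrix + row

# (2, 3) + (2, 1): the column is reused for every column of the matrix
added_col = matrix + col

print(added_row)
print(added_col)
```

Shapes that are not compatible, such as (2, 3) and (2,), raise a ValueError instead of being broadcast.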


---

3. What is a Pandas DataFrame?

A DataFrame is a 2D labeled data structure in Pandas with columns of potentially different types, similar to a table in SQL or Excel.
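
A small illustrative sketch (the column names and values are invented) shows how each column keeps its own type:

```python
import pandas as pd

# Each key becomes a column label; each column can hold a different type
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara"],
    "age": [31, 25, 40],
    "score": [88.5, 92.0, 79.5],
})

print(df)
print(df.dtypes)   # one dtype per column
print(df.shape)    # (rows, columns)
```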


---

4. Explain the use of the groupby() method in Pandas

The groupby() method splits data into groups based on some criteria, then applies a function (like sum, mean) to each group independently and combines the results.
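
The split-apply-combine steps can be sketched as follows, using a made-up sales table:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "amount": [100, 200, 150, 50, 250],
})

# Split by region, apply an aggregation to each group, combine into one Series
totals = sales.groupby("region")["amount"].sum()
means = sales.groupby("region")["amount"].mean()

print(totals)
print(means)
```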


---

5. Why is Seaborn preferred for statistical visualizations?

Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics with less code.


---

6. What are the differences between NumPy arrays and Python lists?

NumPy arrays are faster and more memory efficient.

They support vectorized operations, unlike lists.

NumPy arrays are homogeneous (same data type), lists are heterogeneous.
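
A short sketch of these differences, with illustrative values only:

```python
import numpy as np

values_list = [1, 2, 3]
values_arr = np.array([1, 2, 3])

# The same operator means different things for the two types
doubled_list = values_list * 2   # list repetition
doubled_arr = values_arr * 2     # element-wise multiplication

# A list can mix types; an array converts everything to one dtype
mixed_list = [1, "two", 3.0]
coerced = np.array([1, 2.5, 3])  # integers are upcast to float64

print(doubled_list)
print(doubled_arr)
print(coerced.dtype)
```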



---

7. What is a heatmap, and when should it be used?

A heatmap is a graphical representation of data using color to represent values. It's useful for visualizing correlation matrices and data density.


---

8. What does the term “vectorized operation” mean in NumPy?

A vectorized operation allows array operations without explicit loops, improving performance by applying operations to entire arrays at once.
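
To make the contrast concrete, here is a hedged sketch that converts made-up Celsius values to Fahrenheit, first with an explicit loop and then with one vectorized expression:

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Explicit Python loop: one element at a time
loop_result = np.empty_like(celsius)
for i in range(celsius.size):
    loop_result[i] = celsius[i] * 9 / 5 + 32

# Vectorized: the whole array in a single expression, executed in compiled code
vectorized_result = celsius * 9 / 5 + 32

print(loop_result)
print(vectorized_result)
```

Both produce the same values; the vectorized form is shorter and, on large arrays, typically much faster.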


---

9. How does Matplotlib differ from Plotly?

Matplotlib primarily produces static figures and is well suited to print and publication.

Plotly offers interactive plots ideal for web-based dashboards.



---

10. What is the significance of hierarchical indexing in Pandas?

Hierarchical indexing allows multi-level (nested) indexing of data, enabling advanced data manipulation and subsetting in complex datasets.
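
A minimal sketch of a two-level index, assuming an invented revenue Series keyed by year and quarter:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 15, 18], index=index)

# Select everything under one outer label
revenue_2024 = revenue.loc["2024"]

# Cross-section on the inner level: Q1 across all years
q1_all_years = revenue.xs("Q1", level="quarter")

# A full key returns a single value
single = revenue.loc[("2024", "Q2")]

print(revenue_2024)
print(q1_all_years)
print(single)
```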


---

11. What is the role of Seaborn’s pairplot() function?

pairplot() creates a grid of scatter plots, one for each pair of numeric variables in a dataset, with each variable's own distribution on the diagonal. It is often used in exploratory data analysis.


---

12. What is the purpose of the describe() function in Pandas?

The describe() function returns summary statistics for the numeric columns of a DataFrame: count, mean, standard deviation, min, the 25th, 50th, and 75th percentiles, and max.
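
As an illustration with made-up height values, the result is itself a DataFrame whose rows are the statistics:

```python
import pandas as pd

df = pd.DataFrame({"height": [150, 160, 170, 180, 190]})

summary = df.describe()
print(summary)

# Individual statistics can be looked up by row label
mean_height = summary.loc["mean", "height"]
median_height = summary.loc["50%", "height"]
```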


---

13. Why is handling missing data important in Pandas?

Missing data can bias or distort results. Pandas provides functions like fillna() and dropna() to clean data before analysis.
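
A brief sketch of both approaches on an invented table. The isna() call is an addition used here only to count the gaps; filling with the column mean is one possible strategy, not a recommendation for every dataset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0],
    "city": ["Oslo", "Rome", None],
})

# Count missing values per column
missing_counts = df.isna().sum()

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: fill missing temperatures with the column mean (NaN is skipped)
filled = df.fillna({"temp": df["temp"].mean()})

print(missing_counts)
print(dropped)
print(filled)
```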


---

14. What are the benefits of using Plotly for data visualization?

Plotly offers interactivity, beautiful design, support for 3D plotting, and is well-suited for web applications and dashboards.


---

15. How does NumPy handle multidimensional arrays?

NumPy supports ndarrays, which are n-dimensional arrays. It provides functions for reshaping, slicing, and mathematical operations across dimensions.
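
The following sketch reshapes a range of integers into a 3D array and then slices and reduces it; the shape (2, 3, 4) is an arbitrary choice for illustration:

```python
import numpy as np

# 24 values arranged as 2 blocks of 3 rows and 4 columns
cube = np.arange(24).reshape(2, 3, 4)

first_block = cube[0]               # shape (3, 4)
last_column = cube[:, :, -1]        # shape (2, 3): last column of every row
sum_over_blocks = cube.sum(axis=0)  # shape (3, 4): adds the two blocks

print(cube.ndim, cube.shape)
print(last_column)
print(sum_over_blocks)
```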


---

16. What is the role of Bokeh in data visualization?

Bokeh is used for creating interactive and real-time plots for web applications. It allows high-performance streaming and dynamic visualizations.


---

17. Explain the difference between apply() and map() in Pandas

map() is used for element-wise transformations on a Series.

apply() is more flexible: on a Series it applies a function to each value, and on a DataFrame it applies a function along an axis (to each column by default, or to each row with axis=1).
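
A small sketch of the difference, using an invented two-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# map(): element-wise transformation of a single Series
squared = df["a"].map(lambda v: v ** 2)

# apply() on a DataFrame: the function receives one whole column at a time
col_ranges = df.apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives one row at a time
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(squared)
print(col_ranges)
print(row_sums)
```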



---

18. What are some advanced features of NumPy?

Broadcasting

Memory mapping

Masked arrays

Structured arrays

Vectorized computations
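
Two of the less familiar items in this list, masked arrays and structured arrays, can be sketched as follows. The sentinel value -999.0 and the field names are assumptions made for the example:

```python
import numpy as np
import numpy.ma as ma

# Masked array: the middle reading is flagged as invalid and ignored by mean()
readings = ma.masked_array([1.0, -999.0, 3.0], mask=[False, True, False])
mean_valid = readings.mean()

# Structured array: each element is a record with named, typed fields
people = np.array(
    [("Ana", 31), ("Ben", 25)],
    dtype=[("name", "U10"), ("age", "i4")],
)
ages = people["age"]

print(mean_valid)
print(ages)
```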


---

19. How does Pandas simplify time series analysis?

Pandas has robust support for date/time indexing, resampling, shifting, and window functions, making time series manipulation and analysis easy.
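
A minimal sketch of resampling, shifting, and a rolling window on a made-up daily series:

```python
import pandas as pd

# Six consecutive days, starting from an arbitrary date
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Resample: aggregate into 2-day bins
two_day_totals = ts.resample("2D").sum()

# Shift: move values forward one step, so each day sees the previous day's value
previous_day = ts.shift(1)

# Window function: 3-day rolling mean (undefined until 3 values are available)
rolling_mean = ts.rolling(window=3).mean()

print(two_day_totals)
print(previous_day)
print(rolling_mean)
```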


---

20. What is the role of a pivot table in Pandas?

A pivot table summarizes data by aggregating it based on two dimensions (index and columns), making it easier to analyze patterns.
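
As an illustrative sketch (the table and its column names are invented), regions become rows, products become columns, and amounts are summed in each cell:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "amount": [10, 20, 30, 40, 50],
})

table = sales.pivot_table(
    index="region",      # one row per region
    columns="product",   # one column per product
    values="amount",     # the values to aggregate
    aggfunc="sum",       # how to combine duplicates in a cell
)

print(table)
```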


---

21. Why is NumPy’s array slicing faster than Python’s list slicing?

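Basic slicing of a NumPy array returns a view onto the same underlying memory buffer, so no elements are copied and the cost does not depend on the length of the slice. Slicing a Python list builds a new list and copies a reference for every element in the slice. Because a view shares memory with the original array, writing to it also changes the original; call .copy() when an independent array is needed.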
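
The difference in copying behavior can be checked directly. This sketch uses small made-up sequences, and np.shares_memory is used only to confirm that the slice and the array share a buffer:

```python
import numpy as np

# List slicing creates a new list: changing it leaves the original alone
numbers = [0, 1, 2, 3, 4]
list_slice = numbers[1:4]
list_slice[0] = 99

# Array slicing creates a view: changing it changes the original array
arr = np.arange(5)
arr_slice = arr[1:4]
arr_slice[0] = 99
shares = np.shares_memory(arr, arr_slice)

# An explicit copy is independent of the original
independent = arr[1:4].copy()
independent[1] = -1

print(numbers)
print(arr)
print(shares)
```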


---

22. What are some common use cases for Seaborn?

Visualizing distributions (histplot, kdeplot)

Pairwise relationships (pairplot)

Categorical data (boxplot, violinplot)

Heatmaps and correlation matrices


1. How do you create a 2D NumPy array and calculate the sum of each row?

import numpy as np



arr = np.array([[1, 2, 3], [4, 5, 6]])
row_sums = np.sum(arr, axis=1)
print(row_sums)


2. Write a Pandas script to find the mean of a specific column in a DataFrame

import pandas as pd



df = pd.DataFrame({'A': [10, 20, 30], 'B': [5, 15, 25]})
mean_B = df['B'].mean()
print(mean_B)



3. Create a scatter plot using Matplotlib

import matplotlib.pyplot as plt


x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()



4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt


data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

corr = data.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()


5. Generate a bar plot using Plotly

import plotly.express as px
import pandas as pd


df = pd.DataFrame({
    'Category': ['A', 'B', 'C'],
    'Values': [10, 20, 15]
})

fig = px.bar(df, x='Category', y='Values', title='Bar Plot')
fig.show()


6. Create a DataFrame and add a new column based on an existing column

import pandas as pd


df = pd.DataFrame({'Salary': [1000, 2000, 3000]})
df['Bonus'] = df['Salary'] * 0.1
print(df)


7. Write a program to perform element-wise multiplication of two NumPy arrays

import numpy as np



a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b
print(result)



8. Create a line plot with multiple lines using Matplotlib

import matplotlib.pyplot as plt



x = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]
y2 = [2, 3, 5, 7]

plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.title('Multiple Lines')
plt.show()



9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold

import pandas as pd



df = pd.DataFrame({'Age': [20, 30, 40, 25]})
filtered = df[df['Age'] > 25]
print(filtered)


10. Create a histogram using Seaborn to visualize a distribution

import seaborn as sns
import matplotlib.pyplot as plt


data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
sns.histplot(data, bins=4, kde=True)
plt.show()



11. Perform matrix multiplication using NumPy

import numpy as np


a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
result = a @ b  # matrix product; same result as np.dot(a, b) for 2D arrays
print(result)


12. Use Pandas to load a CSV file and display its first 5 rows

import pandas as pd



df = pd.read_csv('filename.csv')  # Replace with actual path
print(df.head())



13. Create a 3D scatter plot using Plotly

import plotly.express as px
import pandas as pd



df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6],
    'z': [7, 8, 9]
})

fig = px.scatter_3d(df, x='x', y='y', z='z', title='3D Scatter Plot')
fig.show()