# Theory Questions

1. What is NumPy, and why is it widely used in Python?
-->NumPy is a powerful library for numerical computing in Python. It provides efficient array operations, mathematical functions, and supports high-performance multi-dimensional data processing.

2. How does broadcasting work in NumPy?
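-->Broadcasting allows operations between arrays of different but compatible shapes. NumPy compares the shapes from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1. The smaller array is then treated as if it were stretched to match the larger one, without actually copying its data.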
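
A minimal sketch of broadcasting, assuming small illustrative arrays (the names `matrix`, `row`, and `col` are made up for this example):

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,): stretched across both rows
col = np.array([[100], [200]])       # shape (2, 1): stretched across all columns

added_row = matrix + row
added_col = matrix + col
print(added_row)   # [[11 22 33] [14 25 36]]
print(added_col)   # [[101 102 103] [204 205 206]]

# Shapes (2, 3) and (2,) are not compatible: the trailing dimensions are 3 and 2.
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)   # True
```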

3. What is a Pandas DataFrame?
-->A DataFrame is a 2D labeled data structure in Pandas, similar to a table in SQL or Excel, with rows and columns for easy data manipulation and analysis.

4. Explain the use of the groupby() method in Pandas.
-->groupby() splits the data into groups based on one or more keys, applies a function (such as sum or mean) to each group, and then combines the results. This split-apply-combine pattern is the basis of most aggregation in Pandas.
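
The split-apply-combine steps can be sketched as follows, using a made-up `sales` table (the column names are assumptions for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [100, 150, 200, 50],
})

# Split by Region, apply sum to each group, combine into one Series.
totals = sales.groupby('Region')['Sales'].sum()
print(totals)

# Several aggregations at once produce one column per function.
summary = sales.groupby('Region')['Sales'].agg(['sum', 'mean'])
print(summary)
```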

5. Why is Seaborn preferred for statistical visualizations?
-->Seaborn provides a high-level interface, built on top of Matplotlib, for drawing statistical graphics. It works directly with DataFrames, applies sensible default styling, and needs less code than plain Matplotlib for the same plot.

6. What are the differences between NumPy arrays and Python lists?
-->NumPy arrays are faster, use less memory, and support vectorized operations, while Python lists are more flexible but slower and not suited for numerical computation.
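
One way to see the difference is that the same operator behaves differently on each type. A small sketch with illustrative values:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

list_times_two = py_list * 2     # repeats the list
array_times_two = np_arr * 2     # multiplies every element
print(list_times_two)            # [1, 2, 3, 1, 2, 3]
print(array_times_two)           # [2 4 6]

# A list can hold mixed types; an array converts everything to one dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)               # float64
```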

7. What is a heatmap, and when should it be used?
-->A heatmap is a 2D graphical representation of data where values are represented by color, useful for showing correlation matrices and patterns in data.

8. What does the term “vectorized operation” mean in NumPy?
-->It refers to performing operations on entire arrays without explicit loops, which is faster and more efficient due to underlying C optimizations.
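
As a rough sketch, the two computations below give the same result, but the second one runs the arithmetic inside NumPy instead of in a Python-level loop:

```python
import numpy as np

values = np.arange(5)    # array([0, 1, 2, 3, 4])

# Explicit loop: Python handles one element at a time.
looped = np.array([v * 2 + 1 for v in values])

# Vectorized: the whole array is processed in a single expression.
vectorized = values * 2 + 1

print(vectorized)        # [1 3 5 7 9]
```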

9. How does Matplotlib differ from Plotly?
-->Matplotlib is mainly used for static, publication-style plots with fine-grained control over every element; Plotly creates interactive, web-friendly visualizations with built-in zoom and hover features.

10. What is the significance of hierarchical indexing in Pandas?
-->Hierarchical indexing allows multiple levels of indexing on rows or columns, enabling more complex data structures and flexible data access.
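
A minimal sketch of a two-level row index, assuming made-up revenue figures and level names (`Year`, `Quarter`):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('2023', 'Q1'), ('2023', 'Q2'), ('2024', 'Q1'), ('2024', 'Q2')],
    names=['Year', 'Quarter'],
)
revenue = pd.Series([10, 12, 15, 18], index=index)

year_2024 = revenue.loc['2024']                     # all rows under one outer label
single = revenue.loc[('2024', 'Q2')]                # one value, both levels given
q1_all_years = revenue.xs('Q1', level='Quarter')    # cross-section on the inner level

print(year_2024)
print(single)            # 18
print(q1_all_years)
```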

11. What is the role of Seaborn’s pairplot() function?
-->pairplot() draws a grid of scatter plots for every pair of numeric variables in a dataset, with each variable's own distribution on the diagonal. It is helpful for exploring relationships and distributions at once.

12. What is the purpose of the describe() function in Pandas?
-->describe() provides summary statistics (mean, count, std, etc.) for each numeric column, giving a quick overview of the dataset.
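
For instance, on a small illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Scores': [85, 90, 78, 92]})

stats = df.describe()    # rows: count, mean, std, min, 25%, 50%, 75%, max
print(stats)
print(stats.loc['mean', 'Scores'])   # 86.25
```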

13. Why is handling missing data important in Pandas?
-->Missing data can skew analysis or cause errors; Pandas provides methods to detect, fill, or drop missing values to maintain data integrity.
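
The three steps (detect, fill, drop) can be sketched like this; filling with the column mean is only one possible strategy and is used here as an assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Score': [70.0, np.nan, 90.0]})

missing_count = df['Score'].isna().sum()            # detect: count missing values
filled = df['Score'].fillna(df['Score'].mean())     # fill: mean() skips NaN, so 80.0
dropped = df.dropna()                               # drop: remove rows with any NaN

print(missing_count)     # 1
print(filled.tolist())   # [70.0, 80.0, 90.0]
print(len(dropped))      # 2
```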

14. What are the benefits of using Plotly for data visualization?
-->Plotly offers interactive, web-ready visualizations, supports dashboards, and handles complex chart types with ease, ideal for presentations.

15. How does NumPy handle multidimensional arrays?
-->NumPy uses the ndarray object to support N-dimensional arrays with efficient storage, indexing, and broadcasting capabilities.
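
A short sketch with a 3-dimensional array (the name `cube` and its shape are chosen only for illustration):

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)   # values 0..23 arranged as 2 x 3 x 4

print(cube.ndim)         # 3
print(cube.shape)        # (2, 3, 4)

last = cube[1, 2, 3]             # one index per dimension
plane = cube[:, 0, :]            # slice: first row of every block, shape (2, 4)
summed = cube.sum(axis=0)        # reduce along the first axis, shape (3, 4)

print(last)              # 23
```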

16. What is the role of Bokeh in data visualization?
-->Bokeh is a Python library for creating interactive, browser-based visualizations with ease, especially useful for dashboards and web apps.

17. Explain the difference between apply() and map() in Pandas.
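-->map() is a Series method for element-wise transformation; it accepts a function, a dict, or another Series to look values up in. apply() is more general: on a DataFrame it passes each column (axis=0) or each row (axis=1) to the function, and on a Series it calls the function on each value.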
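
A small sketch of the difference, using a made-up table of marks:

```python
import pandas as pd

df = pd.DataFrame({'Math': [70, 80], 'Science': [90, 60]})

# Series.map: the function sees one value at a time.
grades = df['Math'].map(lambda score: 'Pass' if score > 75 else 'Fail')

# DataFrame.apply with axis=1: the function sees one row (a Series) at a time.
row_totals = df.apply(lambda row: row['Math'] + row['Science'], axis=1)

# DataFrame.apply with the default axis=0: the function sees one column at a time.
col_max = df.apply(lambda col: col.max())

print(grades.tolist())       # ['Fail', 'Pass']
print(row_totals.tolist())   # [160, 140]
print(col_max.tolist())      # [80, 90]
```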

18. What are some advanced features of NumPy?
-->NumPy supports linear algebra, Fourier transforms, random number generation, and memory-mapped arrays for large data processing.
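
A brief sketch touching three of these areas; the equations, signal, and seed are arbitrary illustrative choices:

```python
import numpy as np

# Linear algebra: solve 2x + y = 3 and x + 3y = 5.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(A, b)        # x = 0.8, y = 1.4

# Fourier transform of a constant signal: only the zero-frequency term is non-zero.
spectrum = np.fft.fft(np.ones(4))

# Random numbers: a seeded Generator gives reproducible draws.
rng = np.random.default_rng(seed=42)
samples = rng.integers(low=0, high=10, size=5)

print(solution)
print(spectrum)
print(samples)
```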

19. How does Pandas simplify time series analysis?
-->Pandas provides powerful time series tools like resampling, time-based indexing, date range generation, and rolling window operations.
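
These tools can be sketched on a short daily series (the dates and values are made up):

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=6, freq='D')   # date range generation
ts = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

first_three = ts.loc['2024-01-01':'2024-01-03']   # time-based slicing, end inclusive
two_day_sums = ts.resample('2D').sum()            # resample into 2-day bins
rolling_mean = ts.rolling(window=3).mean()        # 3-day rolling window

print(first_three.tolist())    # [1, 2, 3]
print(two_day_sums.tolist())   # [3, 7, 11]
print(rolling_mean)
```

The first two values of `rolling_mean` are NaN because a full 3-day window is not yet available.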

20. What is the role of a pivot table in Pandas?
-->Pivot tables summarize data by reorganizing it based on categories, allowing aggregation and comparison across different dimensions.
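
As an illustration, the sketch below reshapes a made-up `sales` table so that regions become rows and products become columns:

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Product': ['A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 50],
})

# One row per Region, one column per Product, cells hold the summed Sales.
table = sales.pivot_table(index='Region', columns='Product',
                          values='Sales', aggfunc='sum')
print(table)
```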

21. Why is NumPy’s array slicing faster than Python’s list slicing?
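-->Basic NumPy slices return views onto the same memory rather than copies, so no elements are moved when the slice is taken. Slicing a Python list builds a new list and copies a reference for every element. Operations on the slice also run in compiled C code over contiguous, typed data.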
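
The view-versus-copy behaviour can be checked directly. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
view = arr[1:4]          # basic slice: a view, no data copied
view[0] = 99             # writes through to arr
shares = np.shares_memory(arr, view)

lst = [1, 2, 3, 4, 5]
copy = lst[1:4]          # list slice: a new list
copy[0] = 99             # lst is unchanged

print(arr)               # [ 1 99  3  4  5]
print(lst)               # [1, 2, 3, 4, 5]
print(shares)            # True
```

Because a view shares memory with its source, call `.copy()` on the slice when an independent array is needed.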

22. What are some common use cases for Seaborn?
-->Seaborn is commonly used for visualizing distributions, correlations, regression plots, categorical plots, and statistical relationships.



# Practical Questions

1. How do you create a 2D NumPy array and calculate the sum of each row?

In [1]:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
row_sums = np.sum(arr, axis=1)
print(row_sums)

2. Write a Pandas script to find the mean of a specific column in a DataFrame

In [None]:
import pandas as pd

df = pd.DataFrame({'Scores': [85, 90, 78, 92]})
mean_score = df['Scores'].mean()
print(mean_score)

3. Create a scatter plot using Matplotlib

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

4. How do you calculate a correlation matrix with Pandas and visualize it with a Seaborn heatmap?

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1],
    'C': [5, 6, 7, 8]
})
corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()


5. Generate a bar plot using Plotly

In [None]:
import pandas as pd
import plotly.express as px

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry'], 'Count': [10, 20, 15]})
fig = px.bar(df, x='Fruit', y='Count', title='Fruit Count')
fig.show()


6. Create a DataFrame and add a new column based on an existing column

In [None]:
import pandas as pd

df = pd.DataFrame({'Score': [70, 80, 90]})
df['Passed'] = df['Score'] > 75
print(df)


7. Write a program to perform element-wise multiplication of two NumPy arrays

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b
print(result)


8. Create a line plot with multiple lines using Matplotlib

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y1 = [10, 15, 20, 25]
y2 = [5, 10, 15, 20]

plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.title("Multiple Lines")
plt.show()


9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold

In [None]:
import pandas as pd

df = pd.DataFrame({'Age': [18, 25, 30, 17]})
filtered_df = df[df['Age'] > 20]
print(filtered_df)

10. Create a histogram using Seaborn to visualize a distribution

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

data = [10, 20, 20, 30, 40, 50, 60, 70]
sns.histplot(data, bins=5, kde=True)
plt.title("Histogram with KDE")
plt.show()

11. Perform matrix multiplication using NumPy

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
product = a @ b  # matrix product; same result as np.dot(a, b) for 2-D arrays
print(product)

12. Use Pandas to load a CSV file and display its first 5 rows

In [None]:
import pandas as pd

df = pd.read_csv('your_file.csv')  # replace with actual file path
print(df.head())

13. Create a 3D scatter plot using Plotly

In [None]:
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10, 20, 25, 30],
    'z': [5, 15, 20, 35]
})
fig = px.scatter_3d(df, x='x', y='y', z='z', title='3D Scatter Plot')
fig.show()