# Theoretical Questions

**Q1- What is NumPy, and why is it widely used in Python?**
- NumPy is a powerful Python library for numerical computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them efficiently.

**Q2- How does broadcasting work in NumPy?**
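- Broadcasting lets NumPy apply arithmetic to arrays of different shapes. Shapes are compared from the trailing dimension backward; two dimensions are compatible when they are equal or one of them is 1, and a size-1 (or missing) dimension is virtually stretched to match the other array without copying data.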
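
A minimal sketch of broadcasting, using made-up arrays (the names `matrix`, `row`, and `col` are illustrative only). It shows a 1-D array and a column array each being combined with a 2-D array:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# row is treated as shape (1, 3) and reused for both rows of matrix
added_row = matrix + row
# col's size-1 axis is reused for all three columns of matrix
added_col = matrix + col

print(added_row)   # [[11 22 33] [14 25 36]]
print(added_col)   # [[101 102 103] [204 205 206]]
```

Shapes that cannot be aligned this way, such as (2, 3) with (2,), raise a `ValueError`.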


**Q3- What is a Pandas DataFrame?**
- A Pandas DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns), ideal for data manipulation and analysis.

**Q4- Explain the use of the groupby() method in Pandas.**
- The groupby() method in Pandas splits data into groups based on some criteria, applies a function to each group, and combines the results, useful for aggregation and transformation.
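
The split-apply-combine pattern can be sketched as follows, assuming a small hypothetical `sales` table. The example shows one aggregation (one value per group) and one transformation (one value per original row):

```python
import pandas as pd

# Hypothetical data for illustration
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100, 150, 200, 50],
})

# Aggregation: one total per region
totals = sales.groupby("region")["amount"].sum()

# Transformation: the group mean is broadcast back to every row
sales["region_mean"] = sales.groupby("region")["amount"].transform("mean")

print(totals)
print(sales)
```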

**Q5- Why is Seaborn preferred for statistical visualizations?**
- Seaborn is preferred for its high-level interface and aesthetically pleasing statistical plots. It integrates well with Pandas and simplifies complex visualizations like heatmaps and regression plots.

**Q6- What are the differences between NumPy arrays and Python lists?**
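- NumPy arrays store elements of a single data type in a contiguous block of memory, which makes them faster and more memory-efficient for numerical work and enables vectorized operations. Python lists can hold mixed types and are more flexible, but element-wise arithmetic on them requires explicit loops and is slower.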
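
A short illustration of the behavioral difference, with arbitrary example values. The same `* 2` expression means repetition for a list and element-wise arithmetic for an array:

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

repeated_list = values * 2   # list repetition: [1, 2, 3, 1, 2, 3]
doubled_arr = arr * 2        # element-wise: [2 4 6]

# A list can mix types; an array has a single dtype for all elements
mixed = [1, "two", 3.0]

print(repeated_list)
print(doubled_arr, arr.dtype)
```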

**Q7- What is a heatmap, and when should it be used?**
- A heatmap is a graphical representation of data where values are depicted by color. It’s useful for visualizing correlation matrices or frequency distributions.

**Q8- What does the term “vectorized operation” mean in NumPy?**
- Vectorized operations refer to performing operations on entire arrays without explicit loops, resulting in faster and more concise code.
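
As a small sketch, here is the same temperature conversion written with an explicit loop and as a vectorized expression (the variable names are illustrative):

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Explicit Python loop over elements
fahrenheit_loop = np.array([c * 9 / 5 + 32 for c in celsius])

# Vectorized: the arithmetic is applied to the whole array at once
fahrenheit_vec = celsius * 9 / 5 + 32

print(fahrenheit_vec)   # [ 32.  77. 212.]
```

Both produce the same values; the vectorized form is shorter and runs the loop in compiled code rather than in the Python interpreter.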

**Q9- How does Matplotlib differ from Plotly?**
- Matplotlib is a general-purpose plotting library that produces mainly static figures and offers fine-grained control over every plot element, while Plotly produces interactive, web-based plots with built-in zooming, panning, and hover tooltips.

**Q10- What is the significance of hierarchical indexing in Pandas?**
- Hierarchical indexing allows multiple levels of indexing in a DataFrame, enabling complex data organization and easier access to nested data.
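
A minimal sketch of a two-level index, assuming hypothetical revenue figures keyed by year and quarter:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.DataFrame({"revenue": [10, 12, 15, 18]}, index=index)

# Select every row under one outer-level key
year_2024 = revenue.loc["2024"]

# Select a single cell by giving both levels
single = revenue.loc[("2024", "Q2"), "revenue"]

# Cross-section on the inner level: Q1 of every year
q1_all = revenue.xs("Q1", level="quarter")

print(year_2024)
print(single)      # 18
print(q1_all)
```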

**Q11- What is the role of Seaborn’s pairplot() function?**
- Seaborn’s pairplot() draws a grid of scatter plots for every pairwise combination of numeric variables in a dataset, with each variable’s own distribution on the diagonal, helping visualize relationships and distributions.

**Q12- What is the purpose of the describe() function in Pandas?**
- The describe() function provides summary statistics for numerical columns by default: count, mean, standard deviation, minimum, the 25th, 50th, and 75th percentiles, and maximum.
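
For example, on a small illustrative column of scores:

```python
import pandas as pd

scores = pd.DataFrame({"score": [45, 67, 89, 34, 76]})
summary = scores.describe()

print(summary)
# Individual statistics can be read by row label
print(summary.loc["mean", "score"])   # 62.2
print(summary.loc["50%", "score"])    # 67.0 (the median)
```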

**Q13- Why is handling missing data important in Pandas?**
- Handling missing data ensures data integrity and accuracy in analysis. Pandas offers functions like dropna() and fillna() to manage missing values.
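
A brief sketch of detecting, dropping, and filling missing values. The table and the fill choices (column mean, the string "unknown") are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [21.0, np.nan, 23.0],
    "city": ["Oslo", "Lima", None],
})

# Count missing values per column
missing_per_column = df.isna().sum()

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: fill each column with a chosen replacement
filled = df.fillna({"temp": df["temp"].mean(), "city": "unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```

Which option is appropriate depends on the analysis; dropping loses rows, while filling introduces assumed values.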

**Q14- What are the benefits of using Plotly for data visualization?**
- Plotly provides interactive, publication-quality graphs with features like zoom, hover info, and real-time updates, ideal for dashboards and web apps.

**Q15- How does NumPy handle multidimensional arrays?**
- NumPy uses the ndarray object to handle multidimensional arrays, supporting operations across axes and efficient reshaping, slicing, and broadcasting.
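
A small sketch of working with a 3-D `ndarray`; the shape (2, 3, 4) is an arbitrary choice for illustration:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)   # 2 blocks, 3 rows, 4 columns
print(cube.ndim, cube.shape)            # 3 (2, 3, 4)

# Reduce along axis 0: adds the two blocks element by element
block_sum = cube.sum(axis=0)            # shape (3, 4)

# Slice: last column of the first block
last_col = cube[0, :, -1]               # [ 3  7 11]

# Reshape to 2-D without changing the element order
flat = cube.reshape(6, 4)
```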

**Q16- What is the role of Bokeh in data visualization?**
- Bokeh is a Python library for creating interactive visualizations in web browsers. It supports streaming data and complex layouts.

**Q17- Explain the difference between apply() and map() in Pandas.**
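- apply() runs a function along an axis of a DataFrame (once per column or once per row) or over the values of a Series, while Series.map() transforms a Series element by element using a function, a dictionary, or another Series.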
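
The difference can be sketched on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply() on a DataFrame: the function receives a whole column (axis=0, the default)
col_ranges = df.apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives a whole row
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# map() on a Series: each element is looked up or transformed individually
labels = df["a"].map({1: "low", 2: "mid", 3: "high"})
squared = df["a"].map(lambda v: v ** 2)

print(col_ranges)
print(row_sums)
print(labels)
print(squared)
```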

**Q18- What are some advanced features of NumPy?**
- Advanced NumPy features include broadcasting, masked arrays, memory mapping, linear algebra operations, and integration with C/C++ for performance.

**Q19- How does Pandas simplify time series analysis?**
- Pandas simplifies time series analysis with datetime indexing, resampling, frequency conversion, and built-in functions for rolling statistics.
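
A minimal sketch of these features on six days of invented daily values:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Resample: aggregate daily values into 2-day bins
two_day_totals = daily.resample("2D").sum()     # 3, 7, 11

# Rolling statistic: 3-day moving average (first two entries are NaN)
rolling_mean = daily.rolling(window=3).mean()

# Datetime indexing: slice by date string
late = daily["2024-01-05":]

print(two_day_totals)
print(rolling_mean)
print(late)
```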

**Q20- What is the role of a pivot table in Pandas?**
- Pivot tables summarize data by grouping and aggregating values, making it easier to analyze patterns and trends across categories.
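
As an illustration, a pivot table over a hypothetical `orders` table, summing amounts by region and product:

```python
import pandas as pd

orders = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "A"],
    "amount": [100, 200, 50, 70],
})

table = pd.pivot_table(
    orders,
    index="region",       # one row per region
    columns="product",    # one column per product
    values="amount",
    aggfunc="sum",
    fill_value=0,         # region/product pairs with no orders show 0
)

print(table)
```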

**Q21- Why is NumPy’s array slicing faster than Python’s list slicing?**
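- Basic slicing of a NumPy array returns a view onto the existing data buffer, so no elements are copied regardless of the slice size. Slicing a Python list builds a new list and copies a reference for every selected element, so its cost grows with the slice length.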
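
One concrete difference is that a basic NumPy slice is a view of the original data, while a list slice is a new list. A short sketch with arbitrary values:

```python
import numpy as np

arr = np.arange(5)
view = arr[1:4]        # no data copied; view shares memory with arr
view[0] = 99           # also changes arr[1]

items = list(range(5))
part = items[1:4]      # new list with copied references
part[0] = 99           # items is unaffected

print(arr)                            # [ 0 99  2  3  4]
print(items)                          # [0, 1, 2, 3, 4]
print(np.shares_memory(arr, view))    # True
```

Call `.copy()` on the slice when an independent array is needed.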

**Q22- What are some common use cases for Seaborn?**
- Seaborn is commonly used for visualizing distributions, correlations, categorical data, regression analysis, and creating heatmaps and pair plots.

# Practical Questions

In [None]:
#Q1- How do you create a 2D NumPy array and calculate the sum of each row?
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_sums = np.sum(arr, axis=1)
print("Row sums:", row_sums)

In [None]:
#Q2- Write a Pandas script to find the mean of a specific column in a DataFrame.
import pandas as pd
data = {'A': [10, 20, 30], 'B': [15, 25, 35]}
df = pd.DataFrame(data)
mean_B = df['B'].mean()
print("Mean of column B:", mean_B)

In [None]:
#Q3- Create a scatter plot using Matplotlib.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]
plt.scatter(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot")
plt.show()

In [None]:
#Q4- How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?
import matplotlib.pyplot as plt  # needed for plt.title() and plt.show() below
import pandas as pd
import seaborn as sns
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1],
    'C': [2, 3, 4, 5]
})
corr = df.corr()  # pandas computes the correlation matrix; Seaborn only draws it
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()

In [None]:
#Q5- Generate a bar plot using Plotly.
import pandas as pd
import plotly.express as px
data = {'Fruits': ['Apple', 'Banana', 'Cherry'], 'Quantity': [10, 15, 7]}
df = pd.DataFrame(data)
fig = px.bar(df, x='Fruits', y='Quantity', title="Fruit Quantity")
fig.show()

In [None]:
#Q6- Create a DataFrame and add a new column based on an existing column.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4]})
df['B'] = df['A'] * 2
print(df)

In [None]:
#Q7- Write a program to perform element-wise multiplication of two NumPy arrays.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.multiply(a, b)
print("Element-wise multiplication:", result)

In [None]:
#Q8- Create a line plot with multiple lines using Matplotlib.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]
y2 = [2, 3, 5, 7]
plt.plot(x, y1, label='y1 = x^2')
plt.plot(x, y2, label='y2 = primes')
plt.legend()
plt.title("Multiple Line Plot")
plt.show()

In [None]:
#Q9- Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.
import pandas as pd
df = pd.DataFrame({'Score': [45, 67, 89, 34, 76]})
filtered_df = df[df['Score'] > 50]
print(filtered_df)

In [None]:
#Q10- Create a histogram using Seaborn to visualize a distribution.
import seaborn as sns
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
sns.histplot(data, bins=5, kde=True)
plt.title("Histogram")
plt.show()

In [None]:
#Q11- Perform matrix multiplication using NumPy.
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
result = A @ B  # matrix product; same result as np.dot(A, B) for 2-D arrays
print("Matrix multiplication result:\n", result)

In [None]:
#Q12- Use Pandas to load a CSV file and display its first 5 rows.
import pandas as pd
df = pd.read_csv("fileName.csv")
print(df.head())

In [None]:
#Q13- Create a 3D scatter plot using Plotly.
import plotly.express as px
import pandas as pd
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10, 11, 12, 13],
    'z': [100, 200, 300, 400]
})
fig = px.scatter_3d(df, x='x', y='y', z='z', title="3D Scatter Plot")
fig.show()