# Data Toolkit – Assignment Solutions



## Theoretical Questions and Answers

### Q: What is NumPy, and why is it widely used in Python?
**Answer:**
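NumPy is a core Python library for numerical computing. Its central object, the ndarray, is a multidimensional array whose elements all share one data type. NumPy is widely used because its operations run in compiled C code over compactly stored data, which makes them much faster than equivalent pure-Python loops. It also serves as the foundation for many data science libraries, including Pandas, SciPy, Matplotlib, and scikit-learn.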


### Q: How does broadcasting work in NumPy?
**Answer:**
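Broadcasting allows NumPy to operate on arrays of different shapes. NumPy compares the shapes starting from the trailing (rightmost) dimension; two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1. Size-1 dimensions are virtually stretched to match the other array without copying the smaller array's data. This makes operations faster and more memory-efficient and removes the need for explicit loops.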
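A minimal sketch of the broadcasting rules, using small made-up arrays: a (3, 1) column and a (3,) row combine into a (3, 3) result, while shapes (3,) and (2,) are incompatible and raise a ValueError.

```python
import numpy as np

# A column of shape (3, 1) and a row of shape (3,) (illustrative values)
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])

# Trailing dimensions: 1 vs 3 -> compatible, the 1 is stretched to 3
# Leading dimensions: 3 vs missing -> row is treated as shape (1, 3)
grid = col + row
print(grid.shape)  # (3, 3)
print(grid)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Shapes (3,) and (2,): trailing sizes 3 and 2 differ and neither is 1
try:
    np.ones(3) + np.ones(2)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print("(3,) + (2,) broadcasts:", broadcast_ok)  # False
```

The stretched inputs are never materialized; only the (3, 3) result is allocated.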


### Q: What is a Pandas DataFrame?
**Answer:**
A Pandas DataFrame is a two-dimensional labeled data structure similar to a spreadsheet or SQL table. It has labeled rows (the index) and labeled columns; each column holds a single data type, but different columns can hold different types. DataFrames make data manipulation and analysis straightforward and are widely used in data science.
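A short sketch with hypothetical data showing the labeled rows and columns and the per-column data types described above.

```python
import pandas as pd

# Hypothetical example data; each column holds one dtype
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cho"],
        "age": [28, 35, 41],
        "score": [88.5, 92.0, 79.5],
    },
    index=["r1", "r2", "r3"],  # row labels (the index)
)

print(df.dtypes)            # one dtype per column
print(df.loc["r2", "age"])  # label-based lookup -> 35
print(df["score"].mean())   # a column-wise operation
print(df.shape)             # (3, 3)
```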


### Q: Explain the use of the groupby() method in Pandas.
**Answer:**
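The groupby() method splits rows into groups based on the values of one or more columns, applies a function to each group, and combines the results (the split-apply-combine pattern). Typical aggregations include sum(), mean(), and count(). This makes it easy to analyze subsets of data and to summarize large datasets.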
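A minimal split-apply-combine sketch; the sales table is made-up illustration data.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "amount": [100, 200, 150, 50, 300],
})

# Split rows by region, sum each group, combine into one Series
totals = sales.groupby("region")["amount"].sum()
print(totals)  # East 300, North 250, South 250 (group keys are sorted)

# Several aggregations in one pass
summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])
print(summary)
```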


### Q: Why is Seaborn preferred for statistical visualizations?
**Answer:**
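Seaborn is built on Matplotlib and provides a high-level interface for creating statistical plots. It offers attractive default styles and color palettes. Seaborn accepts Pandas DataFrames directly, so columns can be referenced by name. It also computes statistical summaries such as kernel density estimates and regression fits automatically, which simplifies complex statistical visualization tasks.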


### Q: What are the differences between NumPy arrays and Python lists?
**Answer:**
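NumPy arrays store elements of a single data type in a compact block of memory, while Python lists store references to objects that can be of mixed types. As a result, NumPy arrays use less memory and are much faster for numerical operations. Lists are more flexible (they can hold anything and grow cheaply with append()) but slower for math. NumPy also supports vectorized operations: arr * 2 doubles every element, whereas list * 2 repeats the list.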
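A small comparison, using illustrative values, of how the same operators behave on a list and on an array, and how an array coerces mixed numbers to one dtype.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On a list, * repeats the sequence; on an array it multiplies each element
print(values * 2)  # [1, 2, 3, 1, 2, 3]
print(arr * 2)     # [2 4 6]

# Lists can mix types; an array coerces its elements to one dtype
mixed_list = [1, "two", 3.0]
print([type(x).__name__ for x in mixed_list])  # ['int', 'str', 'float']
coerced = np.array([1, 2.5, 3])
print(coerced.dtype)  # float64

# Element-wise addition needs a comprehension for lists, a single + for arrays
list_sum = [x + y for x, y in zip(values, [10, 20, 30])]
arr_sum = arr + np.array([10, 20, 30])
print(list_sum, arr_sum)
```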


### Q: What is a heatmap, and when should it be used?
**Answer:**
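A heatmap represents the values of a matrix as colors, with color intensity showing magnitude. It is commonly used for correlation matrices, confusion matrices, and other two-dimensional grids of values. Heatmaps help identify patterns and relationships at a glance and are useful in exploratory data analysis.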


### Q: What does the term vectorized operation mean in NumPy?
**Answer:**
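Vectorized operations apply a computation to an entire array in a single expression, such as a + b or np.sqrt(a). They eliminate explicit Python loops because the looping happens inside NumPy's compiled code. This usually improves performance significantly and makes code shorter and easier to read.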
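A sketch contrasting an explicit Python loop with the equivalent vectorized expression; both compute element-wise squares. The function names squares_loop and squares_vectorized are hypothetical, and the timings depend on the machine, so no specific numbers are claimed.

```python
import timeit

import numpy as np

x = np.arange(100_000, dtype=np.float64)

# Explicit Python loop: one interpreter iteration per element
def squares_loop(a):
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = a[i] * a[i]
    return out

# Vectorized: one expression, the loop runs inside NumPy's compiled code
def squares_vectorized(a):
    return a * a

# Both approaches give identical results
print(np.array_equal(squares_loop(x[:1000]), squares_vectorized(x[:1000])))  # True

# Rough timing comparison (results vary by machine)
t_loop = timeit.timeit(lambda: squares_loop(x), number=1)
t_vec = timeit.timeit(lambda: squares_vectorized(x), number=1)
print(f"loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s")
```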


### Q: How does Matplotlib differ from Plotly?
**Answer:**
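Matplotlib mainly produces static figures that can be saved as PNG, PDF, or SVG files. Plotly produces interactive charts rendered in the browser, with built-in zooming, panning, and hover tooltips. Matplotlib is mature, highly customizable, and very widely used, while Plotly's plotly.express module creates interactive charts with very little code.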


### Q: What is the significance of hierarchical indexing in Pandas?
**Answer:**
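Hierarchical indexing (a MultiIndex) allows a DataFrame or Series to have multiple index levels. It lets you represent higher-dimensional data in a two-dimensional table. This enables slicing by outer or inner levels, grouping by level, and reshaping with stack() and unstack(), which improves data organization.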
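A minimal MultiIndex sketch with made-up visitor counts indexed by (city, year), showing selection by the outer level, by a full key, by the inner level with xs(), grouping by level, and reshaping with unstack().

```python
import pandas as pd

# Two index levels: city (outer) and year (inner); values are illustrative
index = pd.MultiIndex.from_tuples(
    [("Paris", 2022), ("Paris", 2023), ("Rome", 2022), ("Rome", 2023)],
    names=["city", "year"],
)
df = pd.DataFrame({"visitors": [30, 34, 25, 29]}, index=index)

paris = df.loc["Paris"]                        # all rows for one outer label
rome_2023 = df.loc[("Rome", 2023), "visitors"]  # one (city, year) pair -> 29
year_2022 = df.xs(2022, level="year")          # cross-section on the inner level
per_city = df.groupby(level="city")["visitors"].sum()
wide = df["visitors"].unstack("year")          # inner level becomes columns

print(paris, rome_2023, year_2022, per_city, wide, sep="\n\n")
```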


### Q: What is the role of Seaborn’s pairplot() function?
**Answer:**
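pairplot() visualizes pairwise relationships between the numeric variables of a DataFrame. It draws a grid of scatter plots for every pair of variables and a distribution plot of each single variable on the diagonal, and the hue parameter can color points by category. Pairplot helps identify correlations and is useful for exploratory analysis.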


### Q: What is the purpose of the describe() function in Pandas?
**Answer:**
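The describe() function generates summary statistics for the numeric columns of a DataFrame or Series: count, mean, standard deviation, minimum, the 25th, 50th, and 75th percentiles, and maximum. With include="all" it also summarizes non-numeric columns (count, unique, top, freq). This helps you understand the data distribution and is useful for quick data inspection.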
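A short sketch on made-up data showing which statistics describe() reports by default and how include="all" adds the non-numeric column.

```python
import pandas as pd

# Hypothetical data: one numeric and one text column
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "team": ["a", "b", "a", "b", "a"],
})

# By default only numeric columns are summarized
stats = df.describe()
print(stats)
print(list(stats.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# include="all" also summarizes non-numeric columns (count, unique, top, freq)
print(df.describe(include="all"))
```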


### Q: Why is handling missing data important in Pandas?
**Answer:**
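Missing values (NaN or None) can lead to incorrect or biased analysis results, because many operations either skip them silently or propagate them. Handling missing values improves data quality. Pandas provides isna() to detect them, dropna() to remove them, and fillna() to replace them. Choosing an appropriate strategy helps keep the resulting insights accurate.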
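A minimal sketch of detecting, dropping, and filling missing values; the table is hypothetical, and filling with the column mean is just one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps
df = pd.DataFrame({
    "temp": [21.5, np.nan, 23.0, np.nan],
    "city": ["Oslo", "Oslo", None, "Lima"],
})

# Count missing values per column
print(df.isna().sum())  # temp 2, city 1

# Option 1: drop every row that contains a missing value
dropped = df.dropna()
print(dropped)

# Option 2: fill numeric gaps with the column mean, text gaps with a placeholder
filled = df.fillna({"temp": df["temp"].mean(), "city": "unknown"})
print(filled)
```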


### Q: What are the benefits of using Plotly for data visualization?
**Answer:**
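Plotly produces interactive, visually appealing plots with zooming, panning, and hover tooltips. It supports complex visualizations such as 3D charts and maps. Plotly figures can be saved as standalone HTML files or embedded in web applications, for example with Dash. The interactivity lets viewers explore the data themselves.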


### Q: How does NumPy handle multidimensional arrays?
**Answer:**
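NumPy represents multidimensional arrays with the ndarray object. Each array has a shape (the size along each axis), an ndim (the number of axes), and a single dtype, and its elements are typically stored in one contiguous block of memory that NumPy navigates using strides. NumPy supports indexing and slicing along every axis as well as reductions along a chosen axis. This enables fast numerical operations.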
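A short sketch of a 3-D array built with arange and reshape, showing its attributes, per-axis indexing and slicing, and a reduction along one axis.

```python
import numpy as np

# A 3-D array: 2 blocks, each with 3 rows and 4 columns
a = np.arange(24).reshape(2, 3, 4)

print(a.ndim)   # 3
print(a.shape)  # (2, 3, 4)
print(a.dtype)  # an integer dtype (commonly int64)

# One index per axis: block 1, row 2, column 3
print(a[1, 2, 3])  # 23
# Slicing per axis: column 0 of every row in block 0
print(a[0, :, 0])  # [0 4 8]
# Summing over axis 0 collapses the block axis
print(a.sum(axis=0).shape)  # (3, 4)
```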


### Q: What is the role of Bokeh in data visualization?
**Answer:**
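Bokeh creates interactive visualizations for web browsers. You define plots in Python, and Bokeh renders them in the browser with its JavaScript library, BokehJS. It supports streaming and real-time updates, for example through a Bokeh server application, which makes it useful for dashboards.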


### Q: Explain the difference between apply() and map() in Pandas.
**Answer:**
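Series.map() performs element-wise substitution on a Series; it accepts a function, a dict, or another Series as a lookup table (values missing from a dict become NaN). apply() works on both Series and DataFrames. On a DataFrame, apply() passes each column (axis=0, the default) or each row (axis=1) to the function. map() is simpler, while apply() is more general.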
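A small sketch with illustrative data contrasting Series.map() (element-wise, with a dict or a function) and DataFrame.apply() along each axis.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"])

# Series.map with a dict: a lookup table (unmatched values would become NaN)
mapped = s.map({"cat": "feline", "dog": "canine"})
print(mapped.tolist())  # ['feline', 'canine', 'feline']
# Series.map with a function: applied to each element
print(s.map(len).tolist())  # [3, 3, 3]

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# DataFrame.apply, axis=0 (default): the function receives each column
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range.tolist())  # [2, 20]
# DataFrame.apply, axis=1: the function receives each row
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)
print(row_sum.tolist())  # [11, 22, 33]
```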


### Q: What are some advanced features of NumPy?
**Answer:**
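Beyond broadcasting and vectorization, NumPy provides the np.linalg module for linear algebra (solving linear systems, inverses, eigenvalues), the np.random module for random number generation, and np.memmap for memory-mapped files, which let you work with arrays that are too large to fit in RAM. These features make NumPy the basis of most numerical work in Python.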
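A compact sketch touching each feature listed above: solving a small linear system, drawing from a seeded random Generator, and writing then re-reading a memory-mapped file. The system, seed, and temporary file name data.dat are illustrative choices.

```python
import os
import tempfile

import numpy as np

# Linear algebra: solve 2x + y = 5 and x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)
print(solution)  # [1. 3.]

# Random numbers from a seeded Generator (reproducible across runs)
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(samples.shape)  # (5,)

# Memory-mapped file: the array lives on disk and is paged in on demand
path = os.path.join(tempfile.mkdtemp(), "data.dat")
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000,))
mm[:] = np.arange(1000, dtype=np.float32)
mm.flush()  # write changes to disk

reopened = np.memmap(path, dtype=np.float32, mode="r", shape=(1000,))
print(reopened[-1])  # 999.0
```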


### Q: How does Pandas simplify time series analysis?
**Answer:**
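Pandas supports datetime indexing (DatetimeIndex) and resampling to a different frequency with resample(). It provides rolling-window calculations with rolling() and lag or lead operations with shift(). Time-based slicing works with partial date strings such as "2024-01". These tools make Pandas widely used in financial and other time series analysis.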
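A minimal sketch of the time series tools above, applied to a synthetic hourly series covering two days (the dates and values are made up).

```python
import pandas as pd

# Synthetic hourly readings over two days: values 0..47
idx = pd.date_range("2024-01-01", periods=48, freq="h")
ts = pd.Series(range(48), index=idx)

# Time-based slicing with a partial date string: all of January 2
day_two = ts.loc["2024-01-02"]
print(len(day_two))  # 24

# Resample hourly data to daily means
daily = ts.resample("D").mean()
print(daily.tolist())  # [11.5, 35.5]

# 3-hour rolling mean and a one-step lag
rolling = ts.rolling(window=3).mean()
lagged = ts.shift(1)
print(rolling.head(), lagged.head(), sep="\n")
```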


### Q: What is the role of a pivot table in Pandas?
**Answer:**
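A pivot table summarizes data by aggregating values across two dimensions: index and columns choose the row and column keys, values chooses the data to aggregate, and aggfunc chooses the aggregation (mean by default). It reshapes long-format data into a compact grid, which helps identify trends, and it is commonly used in reporting.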
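A minimal pivot_table() sketch on hypothetical revenue records, summing revenue by region (rows) and quarter (columns).

```python
import pandas as pd

# Hypothetical long-format records: one row per sale
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "revenue": [100, 120, 80, 90, 60],
})

# Rows = region, columns = quarter, cells = summed revenue
table = pd.pivot_table(
    sales,
    index="region",
    columns="quarter",
    values="revenue",
    aggfunc="sum",
    fill_value=0,  # used for any region/quarter pair with no data
)
print(table)  # North: Q1 160, Q2 120; South: Q1 80, Q2 90
```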


### Q: Why is NumPy’s array slicing faster than Python’s list slicing?
**Answer:**
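Basic NumPy slicing returns a view: a new array object that points to the same memory as the original, so no element data is copied. Slicing a Python list, by contrast, builds a new list and copies every element reference. Skipping the copy saves both memory and time, and later operations on the view still run at C-level speed. Because a view shares data, modifying it also modifies the original array; call .copy() when an independent array is needed.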
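A short sketch demonstrating that a NumPy slice shares memory with its source while a list slice does not; np.shares_memory is used to confirm the sharing.

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]   # basic slicing: no element data copied
view[0] = 99      # writes through to the original array
print(arr[:6])    # [ 0  1 99  3  4  5]
print(np.shares_memory(arr, view))  # True

lst = list(range(10))
part = lst[2:5]   # list slicing builds a new list
part[0] = 99
print(lst[:6])    # [0, 1, 2, 3, 4, 5] (original unchanged)

# .copy() gives an independent array
independent = arr[2:5].copy()
print(np.shares_memory(arr, independent))  # False
```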


### Q: What are some common use cases for Seaborn?
**Answer:**
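Seaborn is used for statistical data visualization. Common use cases include plotting distributions (histplot(), kdeplot()), comparing categories (boxplot(), violinplot(), barplot()), showing regression fits (regplot(), lmplot()), and visualizing correlation matrices (heatmap()). Seaborn produces polished plots with little code and is widely used in data exploration.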


## Practical Questions and Programs

### Q: How do you create a 2D NumPy array and calculate the sum of each row?

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.sum(axis=1))  # axis=1 sums across each row -> [ 6 15]

### Q: Write a Pandas script to find the mean of a specific column in a DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
print(df['A'].mean())  # mean of column 'A' -> 2.0

### Q: Create a scatter plot using Matplotlib.

import matplotlib.pyplot as plt
plt.scatter([1, 2, 3], [4, 5, 6])  # x values first, then y values
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter plot')
plt.show()

### Q: How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

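import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
corr = df.corr()  # Pandas computes the correlation matrix; Seaborn only draws it
sns.heatmap(corr, annot=True, vmin=-1, vmax=1)  # annot=True writes each value in its cell
plt.show()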

### Q: Generate a bar plot using Plotly.

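import plotly.express as px
df = px.data.tips()  # sample restaurant-tips dataset bundled with Plotly Express
fig = px.bar(df, x='day', y='total_bill')  # bills on the same day are stacked into one bar
fig.show()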

### Q: Create a DataFrame and add a new column based on an existing column.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'] * 2  # new column derived from column 'A'
print(df)

### Q: Write a program to perform element-wise multiplication of two NumPy arrays.

import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a * b)  # element-wise product -> [ 4 10 18]

### Q: Create a line plot with multiple lines using Matplotlib.

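import matplotlib.pyplot as plt
x = [1, 2, 3]
plt.plot(x, [1, 4, 9], label='y = x^2')  # first line
plt.plot(x, [2, 3, 4], label='y = x + 1')  # second line on the same axes
plt.legend()  # show the line labels
plt.show()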


### Q: Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

import pandas as pd
df = pd.DataFrame({'A': [1, 5, 10]})
threshold = 4
print(df[df['A'] > threshold])  # boolean mask keeps only rows where A > 4

### Q: Create a histogram using Seaborn to visualize a distribution.

import seaborn as sns
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3]
sns.histplot(data, discrete=True)  # discrete=True centers one bar on each integer value
plt.show()

### Q: Perform matrix multiplication using NumPy.

import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(a @ b)  # matrix product, same as np.matmul(a, b) or a.dot(b) -> [[19 22] [43 50]]

### Q: Use Pandas to load a CSV file and display its first 5 rows.

import pandas as pd
df = pd.read_csv('file.csv')  # replace 'file.csv' with the path to your CSV file
print(df.head())  # head() returns the first 5 rows by default

### Q: Create a 3D scatter plot using Plotly.

import plotly.express as px
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6], 'z': [7, 8, 9]})
fig = px.scatter_3d(df, x='x', y='y', z='z')  # map each column to one of the three axes
fig.show()