### Q1. What is NumPy, and why is it widely used in Python?

NumPy (Numerical Python) is the foundational library for fast numerical computing in Python. Its core type is the `ndarray`, a homogeneous, fixed-dtype, N-dimensional array that is usually stored in one contiguous block of memory and supports vectorized operations implemented in C. Compared with native Python lists, NumPy arrays of numeric data use far less memory, support SIMD-friendly vectorized math and broadcasting, and interoperate with C/Fortran code. This makes linear algebra, FFTs, random sampling, and element-wise computations much faster and more concise. NumPy also underpins pandas, SciPy, scikit-learn, and many other libraries, which is why it is ubiquitous in the Python data stack.
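As a rough illustration of the memory and conciseness claims, the sketch below compares a list of integers with an equivalent `int64` array. The variable names and the size `n` are arbitrary choices for the example, and the list figure assumes CPython's object model.

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# The array stores n raw 8-byte integers in a single buffer.
print(arr.nbytes)  # 800000

# The list stores n pointers plus a separate Python int object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
print(list_bytes)

# The same computation, vectorized versus an explicit Python-level loop.
squares_arr = arr ** 2
squares_list = [v * v for v in py_list]
```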

### Q2. How does broadcasting work in NumPy?

Broadcasting lets arrays of different shapes participate in arithmetic by *logically* expanding dimensions of size 1 without copying data. NumPy compares shapes from right to left, treating missing leading dimensions as size 1; two dimensions are compatible if they are equal or if either is 1. If neither condition holds, NumPy raises a `ValueError`. When an axis has size 1, its values are conceptually repeated to match the other shape for the operation. Broadcasting removes boilerplate such as manual tiling, and it stays memory-efficient because NumPy does not materialize the repeated data.
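A minimal sketch of these rules, using small made-up arrays: a `(3, 1)` column combined with a `(4,)` row produces a `(3, 4)` result, and `np.broadcast_to` shows that the expansion is only virtual.

```python
import numpy as np

# Shapes are aligned from the right: (3, 1) and (4,) broadcast to (3, 4).
col = np.arange(3).reshape(3, 1)   # [[0], [1], [2]]
row = np.arange(4)                 # [0, 1, 2, 3]
table = col * 10 + row

print(table.shape)  # (3, 4)
print(table)

# np.broadcast_to returns a read-only view whose stride along the repeated
# axis is 0, so the "repeated" rows all point at the same memory.
expanded = np.broadcast_to(row, (3, 4))
print(expanded.strides[0])  # 0

# Trailing dimensions of 3 and 4 are neither equal nor 1, so this fails.
try:
    np.ones(3) + np.ones(4)
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("broadcast error:", exc)
```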

### Q3. What is a pandas DataFrame?

A pandas DataFrame is a 2-D labeled, tabular data structure with rows and columns. Each column is a `Series` with its own data type, so a single DataFrame can hold heterogeneous data across columns. DataFrames offer rich indexing (labels, boolean masks, `loc`/`iloc`), automatic index alignment, missing-data handling, joins/merges, groupby, reshaping (pivot/melt), time-series features, and I/O to formats like CSV, Excel, SQL, and Parquet, which makes them a high-level tool for data wrangling and analysis.

### Q4. Explain the use of the `groupby()` method in pandas.

`groupby()` splits data into groups based on keys (one or more columns), applies a function to each group (aggregate, transform, or filter), and combines results. Typical uses include aggregations (e.g., sum, mean, count), windowed computations within categories, and per‑group transformations (e.g., z‑scores within each category). It’s central for analyzing patterns across categories such as sales by region or averages per class.
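The three kinds of per-group operation can be sketched on a small, made-up sales table (the column names and values are illustrative only):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "amount": [100, 300, 50, 150, 100],
})

grouped = sales.groupby("region")["amount"]

# Aggregate: one value per group.
totals = grouped.sum()
print(totals)

# Transform: a result aligned with the original rows (per-group mean here).
sales["region_mean"] = grouped.transform("mean")
print(sales)

# Filter: keep only the rows of groups whose total exceeds 350.
big = sales.groupby("region").filter(lambda g: g["amount"].sum() > 350)
print(big)
```

Aggregation shrinks the data to one row per group, transformation keeps the original length, and filtering drops whole groups.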

### Q5. Why is Seaborn preferred for statistical visualizations?

Seaborn sits atop Matplotlib and streamlines statistical plotting with sensible defaults, theming, and functions that accept tidy DataFrames. It integrates estimation and confidence intervals, categorical plotting, and distribution plots with fewer lines of code. It also manages color palettes and facets well. In short, it lowers the effort to produce statistically informative visuals with consistent aesthetics.

### Q6. What are the differences between NumPy arrays and Python lists?

• **Homogeneity:** NumPy arrays store one fixed dtype; lists hold arbitrary Python objects.
• **Memory/Layout:** Arrays are contiguous blocks enabling SIMD; lists are arrays of pointers.
• **Speed:** Vectorized array ops run in optimized C; list loops are Python‑level and slower.
• **Functionality:** Arrays support broadcasting, slicing as views, linear algebra, FFTs; lists offer none of these (list slices are always copies).
• **Safety:** Fixed dtype catches type issues early; lists allow mixed types which may hide errors.
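A short sketch of how these differences show up in practice; the values are arbitrary. Note in particular that the same operator means different things for the two types.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# `*` repeats a list but multiplies an array element-wise.
print(py_list * 2)  # [1, 2, 3, 1, 2, 3]
print(arr * 2)      # [2 4 6]

# `+` concatenates lists but adds arrays element-wise.
print(py_list + py_list)  # [1, 2, 3, 1, 2, 3]
print(arr + arr)          # [2 4 6]

# A list can mix types freely; an array coerces everything to one dtype.
mixed_list = [1, 2.5, 3]
mixed_arr = np.array(mixed_list)
print([type(v).__name__ for v in mixed_list])  # ['int', 'float', 'int']
print(mixed_arr.dtype)                         # float64
```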

### Q7. What is a heatmap, and when should it be used?

A heatmap is a color‑encoded matrix where cell color represents magnitude. It’s ideal for visualizing correlation matrices, confusion matrices, pivot tables, or any 2‑D grid of values to reveal patterns, clusters, or outliers at a glance. Annotations and colorbars aid interpretation.

### Q8. What does the term “vectorized operation” mean in NumPy?

A vectorized operation applies a computation across entire arrays without explicit Python loops. The work is offloaded to compiled, low-level routines (SIMD-friendly loops for element-wise ufuncs, BLAS for routines such as `np.dot`), yielding cleaner code and substantial speedups. Examples include `a+b`, `a*b`, `np.dot`, and `np.exp(a)`.
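To make the contrast concrete, the sketch below computes the same expression with an explicit loop and with a single vectorized expression. The arrays are made-up examples.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Loop version: each iteration is interpreted at the Python level.
looped = np.empty_like(a)
for i in range(len(a)):
    looped[i] = a[i] * b[i] + 1.0

# Vectorized version: the loop runs inside NumPy's compiled code.
vectorized = a * b + 1.0

print(vectorized)                        # [11. 41. 91.]
print(np.array_equal(looped, vectorized))  # True
```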

### Q9. How does Matplotlib differ from Plotly?

Matplotlib is primarily a static plotting library with fine-grained control and a vast ecosystem; it outputs publication-quality figures (PNG, PDF, SVG). Plotly focuses on interactive, web-native graphics (zoom, hover tooltips) that can be embedded in HTML pages and dashboards. Choose Matplotlib for static reports and traditional scientific workflows; choose Plotly for interactive exploration and sharing.

### Q10. What is the significance of hierarchical indexing in pandas?

Hierarchical (MultiIndex) indexing allows multiple levels of row/column labels (e.g., (city, year)). It supports compact representation of high‑dimensional data, enables partial indexing/slicing across levels, and makes reshaping (stack/unstack) flexible. This is especially helpful for panel/time‑by‑category analyses.
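A small sketch of partial indexing and reshaping on a two-level (city, year) index. The numbers are placeholder values, not real statistics.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Paris", "Tokyo"], [2022, 2023]], names=["city", "year"]
)
values = pd.Series([10, 12, 30, 33], index=index, name="value")

# Partial indexing on the outer level returns the inner level for that city.
paris = values.loc["Paris"]
print(paris)

# A cross-section selects on an inner level across all cities.
in_2023 = values.xs(2023, level="year")
print(in_2023)

# unstack moves one index level into the columns (long -> wide).
wide = values.unstack("year")
print(wide)
```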

### Q11. What is the role of Seaborn’s `pairplot()` function?

`pairplot()` draws pairwise relationships for a DataFrame: a scatter plot for each pair of numeric variables, with a histogram or KDE of each variable on the diagonal. It's useful for quick EDA to spot linear/nonlinear associations, clusters, or anomalies; `hue` can reveal class structure across variables.

### Q12. What is the purpose of the `describe()` function in pandas?

`describe()` provides summary statistics: count, mean, std, min, quartiles, and max for numeric columns, and count, unique, top, and freq for categorical columns. By default only numeric columns are summarized; pass `include="all"` to cover the rest. It's a fast way to sanity-check ranges, spot missing values (a `count` lower than the number of rows), and get a feel for distribution shape.
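A minimal sketch on a made-up frame with one numeric column (containing a missing value) and one text column:

```python
import pandas as pd

df = pd.DataFrame({
    "score": [1.0, 2.0, 3.0, 4.0, None],
    "grade": ["a", "b", "a", "a", "b"],
})

# Default: numeric columns only. `count` excludes the missing value.
numeric_summary = df.describe()
print(numeric_summary)

# include="all" adds unique/top/freq rows for non-numeric columns.
full_summary = df.describe(include="all")
print(full_summary)
```

Here `count` for `score` is 4 although the frame has 5 rows, which is the quickest hint that a value is missing.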

### Q13. Why is handling missing data important in pandas?

Missing values can bias statistics, break models, and distort visualizations. Handling them deliberately, by detecting with `isna()` and then imputing (mean, median, or model-based), dropping, or flagging, keeps analyses valid and reproducible. pandas provides `fillna`, `dropna`, `interpolate`, and NA-aware reductions that skip missing values by default.
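The main options can be compared side by side on a small made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Detect: count the missing entries.
n_missing = int(s.isna().sum())
print(n_missing)  # 2

# Drop: keep only the observed values.
dropped = s.dropna()

# Impute: replace gaps with the median of the observed values (3.0 here).
filled = s.fillna(s.median())

# Interpolate: fill gaps linearly between neighbouring observations.
interpolated = s.interpolate()

# Reductions skip NaN by default, so the mean uses only 1, 3, and 5.
print(s.mean())  # 3.0
```

Which strategy is appropriate depends on why the data is missing; the sketch only shows the mechanics.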

### Q14. What are the benefits of using Plotly for data visualization?

Plotly offers interactivity (hover, zoom, pan, selection), rich 2‑D/3‑D chart types, web embedding, and seamless export to HTML. It accelerates exploratory analysis and stakeholder communication by making insights discoverable via tooltips and interactions without extra coding.

### Q15. How does NumPy handle multidimensional arrays?

NumPy generalizes arrays to N dimensions with a shape tuple and strides describing memory layout. Operations respect broadcasting rules across axes. Views allow slicing without copying; `axis` parameters control reductions (e.g., sum over rows vs columns). Many linear‑algebra routines support 2‑D/stacked arrays (batched operations).
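A short sketch of shape, strides, axis-wise reductions, and a batched matrix product. The `int64` dtype is set explicitly so the stride values are the same on every platform.

```python
import numpy as np

cube = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

print(cube.ndim)     # 3
print(cube.shape)    # (2, 3, 4)
# Bytes to step along each axis: 3*4*8, 4*8, and 8.
print(cube.strides)  # (96, 32, 8)

# Reducing over an axis removes that axis from the result's shape.
over_first = cube.sum(axis=0)   # shape (3, 4)
over_last = cube.sum(axis=2)    # shape (2, 3)

# `@` treats leading dimensions as a batch of matrices:
# five (2, 3) matrices times five (3, 4) matrices.
batch = np.ones((5, 2, 3)) @ np.ones((5, 3, 4))
print(batch.shape)  # (5, 2, 4)
```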

### Q16. What is the role of Bokeh in data visualization?

Bokeh is a Python library for interactive, browser‑based visualizations rendered via HTML/JS. It excels at dashboards with linked brushing and server‑backed apps that update plots from Python callbacks, bridging static plotting and full web frameworks with relatively little code.

### Q17. Explain the difference between `apply()` and `map()` in pandas.

`Series.map(func)` transforms a single Series element-wise; it also accepts a dict or Series to remap values. `DataFrame.apply(func, axis=0/1)` passes each column (`axis=0`) or each row (`axis=1`) to the function as a Series, and the function can return a scalar or a Series. Prefer built-in vectorized operations where they exist; `apply` is a flexible fallback when no vectorized method is available.
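The difference is easiest to see on a tiny made-up frame (column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "grade": ["a", "b", "a"],
    "x": [1, 2, 3],
    "y": [10, 20, 30],
})

# Series.map with a dict: element-wise remapping of one column.
points = df["grade"].map({"a": 4, "b": 3})

# DataFrame.apply with the default axis=0: the function sees each column.
col_range = df[["x", "y"]].apply(lambda col: col.max() - col.min())

# DataFrame.apply with axis=1: the function sees each row.
row_sum = df[["x", "y"]].apply(lambda row: row["x"] + row["y"], axis=1)

# The vectorized equivalent of the row-wise apply, usually much faster.
vectorized = df["x"] + df["y"]
```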

### Q18. What are some advanced features of NumPy?

Advanced features include broadcasting, strides and views, structured/record dtypes, memory-mapped arrays, ufuncs (including generalized ufuncs) and the `np.vectorize` convenience wrapper, masked arrays, random `Generator` objects built on BitGenerators, linear algebra (`np.linalg`), FFTs, polynomial tools, and interoperability with Numba/Cython/SciPy.

### Q19. How does pandas simplify time series analysis?

pandas provides datetime indexing, resampling (up/down sampling with aggregations), window functions (rolling/expanding), time‑zone handling, shifting/lagging, period/frequency conversions, and rich date parsing. This simplifies tasks like daily → monthly aggregations, moving averages, and event alignment.
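A minimal sketch of a daily-to-monthly aggregation, a moving average, and a lag, on a made-up daily series. The `"MS"` (month-start) frequency alias is used here because it is accepted across pandas versions.

```python
import pandas as pd

# 60 consecutive days starting 2024-01-01, with values 0..59.
idx = pd.date_range("2024-01-01", periods=60, freq="D")
daily = pd.Series(range(60), index=idx)

# Downsample: daily -> monthly totals, labelled by month start.
monthly = daily.resample("MS").sum()
print(monthly)

# 7-day moving average; the first 6 entries are NaN (window not yet full).
rolling_mean = daily.rolling(window=7).mean()

# Lag by one day and compute the day-over-day change.
lagged = daily.shift(1)
change = daily.diff()
```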

### Q20. What is the role of a pivot table in pandas?

Pivot tables reshape long/tidy data into a matrix with chosen index/columns and aggregated values. They summarize metrics across two or more keys (e.g., mean sales by (region × product)) and are great for reporting, heatmaps, and feeding models that expect wide data.
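As a sketch, the long table below (made-up values) is pivoted into a region-by-product matrix of mean sales:

```python
import pandas as pd

long_df = pd.DataFrame({
    "region": ["N", "N", "N", "S", "S"],
    "product": ["A", "A", "B", "A", "B"],
    "sales": [10, 20, 5, 8, 12],
})

# Rows become regions, columns become products; duplicate (region, product)
# pairs are combined with the aggregation function.
wide = long_df.pivot_table(
    index="region", columns="product", values="sales", aggfunc="mean"
)
print(wide)
```

The two `("N", "A")` rows collapse into their mean of 15, which is the step that distinguishes `pivot_table` from a plain reshape.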

### Q21. Why is NumPy’s array slicing faster than Python’s list slicing?

Basic array slices are *views* that reference the same memory with adjusted strides, so no elements are copied. List slicing creates a new list and copies every object reference into it, which costs time proportional to the slice length. Note that this applies to basic slicing only: indexing with integer or boolean arrays returns a copy. NumPy's compiled routines and homogeneous dtypes further accelerate access.
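The view-versus-copy behaviour can be checked directly; the sketch below uses small made-up sequences.

```python
import numpy as np

arr = np.arange(10)
lst = list(range(10))

# A basic slice of an array is a view onto the same buffer.
view = arr[2:8:2]                              # elements 2, 4, 6
print(np.shares_memory(arr, view))             # True
print(view.strides[0] == 2 * arr.strides[0])   # True: step 2 doubles the stride

# Writing through the view changes the original array.
view[0] = -1
print(arr[2])  # -1

# A list slice is an independent list; the original is untouched.
sub = lst[2:8:2]
sub[0] = -1
print(lst[2])  # 2

# An explicit copy detaches the slice from the original array.
detached = arr[2:8:2].copy()
print(np.shares_memory(arr, detached))  # False
```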

### Q22. What are some common use cases for Seaborn?

Typical use cases: quick EDA (pairplot, jointplot), visualizing distributions (hist, KDE, ECDF), categorical comparisons (bar/point/box/violin/swarm plots), regression diagnostics (regplot/lmplot), correlation heatmaps, and faceted grids for multi‑panel comparisons with minimal code.

### Task 1. Create a 2D NumPy array and calculate the sum of each row

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_sums = A.sum(axis=1)  # axis=1 sums across the columns of each row
print("Array:\n", A)
print("Row sums:", row_sums)
```


### Task 2. Find the mean of a specific column in a DataFrame

```python
import pandas as pd

df = pd.DataFrame({'city': ['A', 'B', 'C', 'D'], 'value': [10, 14, 9, 12]})
mean_val = df['value'].mean()
print(df)
print("Mean of 'value':", mean_val)
```


### Task 3. Create a scatter plot using Matplotlib

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.2 * np.random.randn(50)  # sine wave with Gaussian noise
plt.figure()
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```


### Task 4. Calculate the correlation matrix and visualize it with a heatmap

```python
# Using pandas corr and Matplotlib imshow for a heatmap-like view
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame({
    'feat1': np.random.randn(100),
    'feat2': np.random.randn(100) * 0.5 + 0.2,
    'feat3': np.random.randn(100) * 2 - 1
})
corr = data.corr(numeric_only=True)
print("Correlation Matrix:\n", corr)

plt.figure()
im = plt.imshow(corr.values, interpolation='nearest', aspect='auto')
plt.colorbar(im)
plt.xticks(range(len(corr.columns)), corr.columns, rotation=45)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.title("Correlation Heatmap (Matplotlib)")
plt.tight_layout()
plt.show()
```


### Task 5. Generate a bar plot using Plotly

```python
import plotly.express as px
import pandas as pd

sales = pd.DataFrame({'product': ['A', 'B', 'C', 'D'], 'revenue': [120, 90, 150, 60]})
fig = px.bar(sales, x='product', y='revenue', title="Bar Plot (Plotly)")
fig.show()
```


### Task 6. Create a DataFrame and add a new column based on an existing column

```python
import pandas as pd

df2 = pd.DataFrame({'price': [100, 250, 400], 'qty': [2, 1, 5]})
df2['total'] = df2['price'] * df2['qty']  # element-wise product of two columns
print(df2)
```


### Task 7. Element-wise multiplication of two NumPy arrays

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
prod = x * y  # `*` multiplies element by element (not a dot product)
print("x:", x, "y:", y, "element-wise product:", prod)
```


### Task 8. Create a line plot with multiple lines using Matplotlib

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 2 * np.pi, 200)
plt.figure()
plt.plot(t, np.sin(t), label='sin')
plt.plot(t, np.cos(t), label='cos')
plt.title("Multiple Lines")
plt.xlabel("t")
plt.ylabel("value")
plt.legend()
plt.show()
```


### Task 9. Filter DataFrame rows where a column value is greater than a threshold

```python
import pandas as pd

df3 = pd.DataFrame({'name': ['p', 'q', 'r', 's'], 'score': [65, 82, 58, 90]})
filtered = df3[df3['score'] > 70]  # boolean mask keeps rows where the condition is True
print("Original:\n", df3)
print("\nFiltered (score > 70):\n", filtered)
```


### Task 10. Create a histogram to visualize a distribution

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
samples = np.random.normal(loc=0, scale=1, size=500)
plt.figure()
plt.hist(samples, bins=25)
plt.title("Distribution Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```


### Task 11. Perform matrix multiplication using NumPy

```python
import numpy as np

M = np.array([[1, 2], [3, 4], [5, 6]])      # 3x2
N = np.array([[7, 8, 9], [10, 11, 12]])     # 2x3
MN = M @ N                                  # (3x2) @ (2x3) -> 3x3
print("M:\n", M)
print("N:\n", N)
print("M @ N (3x3):\n", MN)
```


### Task 12. Use pandas to load a CSV file and display its first 5 rows

```python
import pandas as pd

# Create small CSV (written to the current working directory)
tmp = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'val': [10, 20, 15, 30, 25]})
csv_path = 'sample.csv'
tmp.to_csv(csv_path, index=False)

# Load and display
loaded = pd.read_csv(csv_path)
print("Loaded from:", csv_path)
print(loaded.head())  # head() returns the first 5 rows by default
```


### Task 13. Create a 3D scatter plot using Plotly

```python
import numpy as np
import pandas as pd
import plotly.express as px

np.random.seed(123)
pts = pd.DataFrame({
    'x': np.random.randn(200),
    'y': np.random.randn(200),
    'z': np.random.randn(200),
    'label': np.random.choice(['A', 'B'], size=200)
})
fig = px.scatter_3d(pts, x='x', y='y', z='z', color='label', title='3D Scatter (Plotly)')
fig.show()
```
