1. What is NumPy, and why is it widely used in Python?
  - NumPy (Numerical Python) is a powerful open-source library used for numerical and scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on these arrays efficiently.

  - Key Features:

   - N-dimensional arrays: The ndarray object is faster and more compact than Python lists.

   - Mathematical operations: You can perform operations like addition, subtraction, multiplication, and division directly on arrays without writing loops.

   - Broadcasting and vectorization: Broadcasting allows operations between arrays of different shapes, and vectorization applies an operation to a whole array at once.

   - Integration: Works seamlessly with libraries like Pandas, Matplotlib, and Scikit-learn.

  - Why it’s widely used:

   - It’s the foundation of data analysis and machine learning in Python.

   - Its core routines are implemented in compiled C, so array operations run far faster than equivalent Python loops.

   - It simplifies mathematical computations and enables matrix manipulation, which is essential for scientific applications.



2. How does broadcasting work in NumPy?
  - Broadcasting is the mechanism that lets NumPy perform arithmetic on arrays of different shapes. Rather than making physical copies of the smaller array, NumPy treats it as if it were “stretched” to the shape of the larger one. No data is actually duplicated, which keeps the operation memory-efficient.

  - Rules of Broadcasting:

   - If the arrays have a different number of dimensions, the shape of the one with fewer dimensions is padded with ones on the left.

   - NumPy then compares the sizes dimension by dimension:

   - If the sizes are equal, or one of them is 1, the dimensions are compatible; a size-1 dimension is stretched to match the other.

   - If the sizes differ and neither is 1, NumPy raises a ValueError.

   - Example:

     - import numpy as np
     - a = np.array([1, 2, 3])
     - b = np.array([[10], [20], [30]])
     - print(a + b)
   - Output:
     - [[11 12 13]
     - [21 22 23]
     - [31 32 33]]


   - Here a has shape (3,) and b has shape (3, 1). NumPy broadcasts both to the common shape (3, 3): a is repeated down the rows and b across the columns.
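
  - The rules above can be checked directly. The following is a minimal sketch, assuming NumPy 1.20 or later (for `np.broadcast_shapes`); the shapes are chosen only for illustration.

```python
import numpy as np

# Shapes (3,) and (3, 1): the 1-D shape is padded on the left to (1, 3),
# then each size-1 dimension is stretched, giving (3, 3).
result_shape = np.broadcast_shapes((3,), (3, 1))
print(result_shape)  # (3, 3)

# Shapes (2, 3) and (4,): the trailing sizes 3 and 4 differ and neither is 1,
# so broadcasting fails with a ValueError.
try:
    np.ones((2, 3)) + np.ones(4)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```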

3. What is a Pandas DataFrame?
  - A DataFrame is a two-dimensional, tabular data structure provided by the Pandas library, similar to an Excel spreadsheet or SQL table.

  - It has rows and columns, where each column can have a different data type (integer, float, string, etc.).

  - Features:

  - Labeled axes (rows and columns)

  - Can handle missing data

  - Supports powerful data manipulation, filtering, and grouping

  - Can import/export data from sources like CSV, Excel, or databases

  - Example:

   - import pandas as pd
   - data = {
     - 'Name': ['Raj', 'Ravi', 'Simran'],
     - 'Age': [25, 28, 22],
     - 'City': ['Delhi', 'Mumbai', 'Pune']
     }
     - df = pd.DataFrame(data)
     - print(df)


  - Output:
    - Name  Age    City
    - 0    Raj   25   Delhi
    - 1   Ravi   28  Mumbai
    - 2 Simran   22    Pune


   - DataFrames are widely used for data cleaning, analysis, and preparing data for visualization.

4. Explain the use of the groupby() method in Pandas.
  - The groupby() method in Pandas is used to split data into groups, apply operations, and then combine the results.

  - It’s especially useful for analyzing large datasets — for example, calculating average sales per region or total marks per student.

  - Three-step process:

  - Split: Divide data into groups based on some criteria.

  - Apply: Perform a function (sum, mean, count, etc.) on each group.

  - Combine: Merge the per-group results into a new Series or DataFrame.

  - Example:

   - import pandas as pd
   - data = {'City': ['Delhi', 'Delhi', 'Mumbai', 'Pune'],
        'Sales': [100, 150, 200, 130]}
   - df = pd.DataFrame(data)
   - result = df.groupby('City')['Sales'].sum()
   - print(result)


  - Output:
    - City
    - Delhi     250
    - Mumbai    200
    - Pune      130
    - Name: Sales, dtype: int64

   - This helps in data summarization and aggregation.
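
  - The apply step is not limited to one function. As a small sketch reusing the same illustrative data, `agg()` can compute several statistics per group in one call:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Pune"],
    "Sales": [100, 150, 200, 130],
})

# Split by City, apply three aggregations to Sales, combine into one DataFrame.
summary = df.groupby("City")["Sales"].agg(["sum", "mean", "count"])
print(summary)
```

  - The result has one row per city and one column per aggregation.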

5. Why is Seaborn preferred for statistical visualizations?
  - Seaborn is a Python data visualization library built on top of Matplotlib. It’s preferred for statistical visualizations because it provides high-level functions that make complex visualizations simple and beautiful.

  - Reasons for preference:

  - Built-in themes: Produces clean, professional-looking charts.

  - Easy integration with Pandas: Works directly with DataFrames.

  - Statistical plots: Offers specialized charts like boxplots, violin plots, pairplots, and heatmaps.

  - Automatic estimation: Can display trends, distributions, and confidence intervals automatically.


6. What are the differences between NumPy arrays and Python lists?
  | Feature | NumPy Array | Python List |
  | --- | --- | --- |
  | Data Type | Homogeneous (same type) | Heterogeneous (mixed types) |
  | Performance | Faster (operations run in compiled C) | Slower (element by element in the interpreter) |
  | Memory Usage | Less memory | More memory |
  | Operations | Supports vectorized (element-wise) operations | Requires loops or comprehensions |
  | Dimensions | Supports multi-dimensional arrays | One-dimensional; more dimensions require nested lists |
  | Functionality | Offers many mathematical functions | Limited built-in operations |

  - The + operator illustrates the difference: on NumPy arrays it adds element-wise, while on lists it concatenates.
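
  - A short sketch of two of these differences, using made-up values: how `+` behaves on each type, and how an array converts mixed inputs to a single dtype.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

list_result = py_list + py_list      # concatenation
array_result = np_array + np_array   # element-wise addition
print(list_result)   # [1, 2, 3, 1, 2, 3]
print(array_result)  # [2 4 6]

# A list keeps each element's own type; an array converts to one common dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)   # float64
```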

7. What is a heatmap, and when should it be used?
  - A heatmap is a graphical representation of data where individual values are represented using color gradients.

  - Purpose:
  - It helps visualize the relationship, intensity, or pattern between variables — especially useful for correlation matrices or large datasets.

  - When to use:

  - To see correlations between variables

  - To analyze missing data

  - To visualize data density or frequency

  - For example, a correlation heatmap shows how strongly each pair of variables is related.

8. What does the term “vectorized operation” mean in NumPy?
  - Vectorized operation means performing operations on entire arrays instead of individual elements, without using explicit loops.
  
  - This is possible because NumPy implements these operations in optimized C code, which runs much faster than an equivalent Python loop.
  - No explicit Python loop is required: NumPy applies the operation to every element internally.

  - Benefits:

  - Faster execution

  - Cleaner and more readable code

  - Efficient memory usage

  - Vectorization is one of the main reasons NumPy is preferred for mathematical and data operations.
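
  - To make the contrast concrete, here is a minimal sketch that squares the same array twice, once with an explicit loop and once in vectorized form. The variable names are illustrative only.

```python
import numpy as np

values = np.arange(1, 6)  # array([1, 2, 3, 4, 5])

# Explicit Python loop: one interpreter step per element.
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2

# Vectorized form: the loop runs inside NumPy's compiled code.
squared_vec = values ** 2

print(squared_vec)  # [ 1  4  9 16 25]
```

  - Both versions give the same result; the vectorized one is shorter and, on large arrays, typically much faster.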

9. How does Matplotlib differ from Plotly?
  | Feature | Matplotlib | Plotly |
  | --- | --- | --- |
  | Type | Static visualization library | Interactive visualization library |
  | Interactivity | Limited | Highly interactive (zoom, hover, click) |
  | Ease of Use | Requires more code for styling | Easier, modern API |
  | Output | Good for reports and publications | Good for dashboards and web apps |
  | Integration | Works with Seaborn and Pandas | Works with Dash (web framework) |

  - In short:

  - Matplotlib: Best for static plots (research papers, simple graphs).

  - Plotly: Best for interactive dashboards and data exploration.

10. What is the significance of hierarchical indexing in Pandas?
  - Hierarchical indexing (MultiIndex) allows multiple levels of index labels in a Pandas DataFrame or Series.
  - It helps represent higher-dimensional data in a two-dimensional table.

  - Benefits:

  - Makes it easier to work with complex datasets.

  - Enables grouping and subsetting at multiple levels.

  - Useful for pivot tables and time-series data.

  - Hierarchical indexing provides flexibility and better data organization.
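
  - A small sketch of a two-level index, assuming hypothetical sales figures keyed by city and year. It shows selection on each level, grouping by a level, and reshaping with `unstack()`.

```python
import pandas as pd

# Hypothetical sales figures indexed by (City, Year).
index = pd.MultiIndex.from_tuples(
    [("Delhi", 2023), ("Delhi", 2024), ("Mumbai", 2023), ("Mumbai", 2024)],
    names=["City", "Year"],
)
sales = pd.Series([100, 150, 200, 130], index=index, name="Sales")

delhi = sales.loc["Delhi"]                    # select on the outer level
year_2024 = sales.xs(2024, level="Year")      # cross-section on the inner level
per_city = sales.groupby(level="City").sum()  # aggregate over one level
wide = sales.unstack("Year")                  # inner level becomes columns
print(wide)
```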

11. What is the role of Seaborn’s pairplot() function?
  - The pairplot() function in Seaborn is used to visualize the pairwise relationships between variables in a dataset. It automatically creates a grid of plots, where each numeric variable is plotted against every other numeric variable.
  - On the diagonal, it shows the distribution (usually a histogram or density plot) of each variable, and on the off-diagonal, it shows scatter plots that reveal relationships between variables.

  - This function is especially helpful for exploratory data analysis (EDA), as it helps identify trends, correlations, clusters, and potential outliers within the data.
  - It is commonly used when you want to quickly understand how different features in a dataset relate to each other.

12. What is the purpose of the describe() function in Pandas?
  - The describe() function in Pandas provides a statistical summary of a DataFrame or Series. It helps analysts quickly understand the distribution and characteristics of the data without performing calculations manually.

  - For numeric columns, it shows count, mean, standard deviation, minimum, quartiles, and maximum values.
  - For categorical data, it can display the count, unique values, most frequent value, and its frequency.

  - This function is highly useful in data preprocessing and exploration, as it gives an overview of central tendency, spread, and range, helping to detect anomalies or data imbalances.
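
  - As an illustration with made-up data, `describe()` can be called on a whole DataFrame (numeric columns are summarized by default) or on a single text column:

```python
import pandas as pd

df = pd.DataFrame({
    "Marks": [60, 85, 45, 90],
    "City": ["Delhi", "Pune", "Delhi", "Mumbai"],
})

numeric_summary = df.describe()              # count, mean, std, min, quartiles, max
categorical_summary = df["City"].describe()  # count, unique, top, freq
print(numeric_summary)
print(categorical_summary)
```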

13. Why is handling missing data important in Pandas?
  - Handling missing data is essential because incomplete datasets can lead to inaccurate analysis and misleading conclusions. Missing values can affect statistical calculations, visualizations, and machine learning models.

  - If missing data is not addressed, it can cause:

  - Unexpected results in computations (NaN propagates through element-wise arithmetic, so a single missing value can turn a result into NaN)

  - Biased results in summaries and correlations

  - Reduced accuracy in predictive models

  - Pandas provides various tools for handling missing data, such as filling missing values, dropping incomplete rows or columns, or imputing data statistically.
  - Proper handling ensures data integrity, reliability, and consistency during analysis.
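
  - The three approaches mentioned above can be sketched as follows. The data is invented, and mean imputation is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Marks": [80.0, np.nan, 85.0], "Age": [25.0, 28.0, np.nan]})

missing_per_column = df.isna().sum()  # count missing values per column
dropped = df.dropna()                 # keep only complete rows
filled = df.fillna(df.mean())         # impute each column with its own mean
print(missing_per_column)
print(filled)
```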

14. What are the benefits of using Plotly for data visualization?
  - Plotly is a powerful visualization library known for creating interactive, web-based visualizations. It provides a modern interface that allows users to explore data visually through actions like zooming, hovering, and filtering.

  - Benefits include:

  - Interactivity: Users can engage with the plots directly, making it ideal for dashboards and presentations.

  - Wide range of charts: Supports line charts, scatter plots, 3D graphs, maps, and more.

  - Integration: Works seamlessly with web applications (via Dash) and Jupyter notebooks.

  - Aesthetic visuals: Automatically produces clean and professional visuals with minimal customization.

  - Plotly is often used for data exploration, reporting, and dashboard creation in data science and business analytics.

15. How does NumPy handle multidimensional arrays?
  - NumPy handles multidimensional arrays through its ndarray object, which can represent data with any number of dimensions. Each dimension is called an axis, and NumPy efficiently stores and manipulates data across these axes.

  - It uses contiguous memory blocks and optimized C-based operations, allowing it to perform mathematical and statistical operations quickly across all dimensions.
  - Multidimensional arrays are indexed using tuples, enabling access to specific elements, rows, columns, or subarrays.

  - NumPy also provides advanced operations like reshaping, transposing, stacking, and broadcasting, which make it a core tool for numerical and scientific computing.
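
  - A brief sketch of axes, tuple indexing, and reshaping on a small 3 x 4 array (the values are arbitrary):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)  # 2-D array: 3 rows (axis 0), 4 columns (axis 1)

element = arr[1, 2]                # tuple index: row 1, column 2
column = arr[:, 0]                 # first column
col_sums = arr.sum(axis=0)         # collapse axis 0: one sum per column
transposed = arr.T                 # shape (4, 3)
cube = arr.reshape(2, 3, 2)        # the same 12 values arranged as a 3-D array

print(arr.ndim, arr.shape)         # 2 (3, 4)
print(col_sums)                    # [12 15 18 21]
```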

16. What is the role of Bokeh in data visualization?
  - Bokeh is a Python library designed for creating interactive and real-time visualizations for modern web browsers. It focuses on producing rich, interactive graphics that can be embedded in web applications easily.

  - Its key role is to allow developers and analysts to build interactive dashboards and visual analytics tools without requiring JavaScript.
  - Bokeh supports zooming, panning, tooltips, and real-time streaming data, which makes it ideal for dynamic data visualization.

  - It integrates well with Pandas and NumPy and is often used for data-driven storytelling and web-based data exploration.

17. Explain the difference between apply() and map() in Pandas.
  - Both apply() and map() are used to apply functions to data in Pandas, but they differ in scope and flexibility.

  - map():
  - Used primarily on Series objects. It applies a function, dictionary, or Series element-wise to transform or modify the data. It is suitable for simple transformations or mappings.

  - apply():
  - Works on both Series and DataFrames. It can apply a function along an axis (rows or columns) and handle more complex operations. It is used when transformations depend on multiple columns or rows.

  - In summary, map() is for element-wise transformations on one column, while apply() is for custom, multi-dimensional operations on the entire dataset.
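
  - The difference in scope can be sketched with a small, made-up DataFrame. The column names `Bonus`, `Passed`, and `Total` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [80, 90, 85], "Bonus": [5, 0, 10]})

# map(): element-wise on a single Series.
df["Passed"] = df["Marks"].map(lambda m: m >= 85)

# apply() with axis=1: the function receives one row at a time,
# so it can combine several columns.
df["Total"] = df.apply(lambda row: row["Marks"] + row["Bonus"], axis=1)

# apply() with the default axis=0: the function receives one column at a time.
column_max = df[["Marks", "Bonus"]].apply(lambda col: col.max())
print(df)
```

  - For simple arithmetic like `Total`, the vectorized form `df["Marks"] + df["Bonus"]` is usually faster; the row-wise `apply()` is shown only to illustrate the mechanism.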

18. What are some advanced features of NumPy?
  - NumPy offers several advanced features that make it essential for scientific and analytical computing:

  - Broadcasting: Enables arithmetic between arrays of different shapes efficiently.

  - Vectorization: Allows operations on entire arrays without explicit loops.

  - Masked arrays: Supports ignoring or masking invalid or missing values in computations.

  - Linear algebra functions: Includes matrix multiplication, eigenvalues, and decompositions.

  - Random module: Provides tools for generating random numbers and performing simulations.

  - Memory-efficient slicing: Allows working with subarrays without copying data.

  - Universal functions (ufuncs): Provide fast, element-wise operations implemented in C.

  - These features make NumPy not only fast but also capable of handling large-scale data and complex mathematical operations efficiently.
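
  - A few of these features in one short sketch. The sentinel value -999 and the small linear system are invented for the example.

```python
import numpy as np

# Masked array: the sentinel -999 is excluded from the mean.
readings = np.ma.masked_equal([10, -999, 30], -999)
masked_mean = readings.mean()  # (10 + 30) / 2

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random module: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

# Universal function: np.sqrt is applied element-wise in compiled code.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
print(masked_mean, x, roots)
```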

19. How does Pandas simplify time series analysis?
   - Pandas includes built-in support for time series data, making it easier to analyze and manipulate date-time information. It can parse many common date formats (for example with pd.to_datetime) and provides specialized types such as Timestamp and DatetimeIndex.

   - It simplifies time series analysis through:

   - Resampling: Aggregating data over specific time periods (daily, monthly, yearly).

   - Shifting and lagging: Comparing current and past values.

   - Rolling windows: Calculating moving averages or trends.

   - Frequency conversion: Changing time intervals for better pattern recognition.

   - These features allow analysts to handle data with temporal relationships, such as stock prices, sales trends, and sensor readings, efficiently.
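
   - The operations listed above can be sketched on six days of invented sales data:

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 20, 30, 40, 50, 60], index=dates)

two_day_totals = sales.resample("2D").sum()  # resampling: aggregate every 2 days
previous_day = sales.shift(1)                # shifting: the previous day's value
daily_change = sales - previous_day          # compare current and past values
moving_avg = sales.rolling(window=3).mean()  # rolling window: 3-day moving average
print(two_day_totals)
```

   - The first entries of `daily_change` and `moving_avg` are NaN because there is not yet enough history to compute them.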

20. What is the role of a pivot table in Pandas?
  - A pivot table in Pandas is used to summarize, reorganize, and analyze data. It turns the unique values of chosen columns into row and column labels and fills each cell with an aggregate such as sum, mean, or count.

  - It allows you to view data from different perspectives by grouping and summarizing based on specific keys or categories.
  - Pivot tables are particularly useful for data summarization and reporting, similar to those used in spreadsheet software like Microsoft Excel.

  - They help in analyzing relationships between different features and generating multi-dimensional summaries of large datasets.
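
  - A minimal sketch with `pd.pivot_table`, using made-up sales data by city and year:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "Year": [2023, 2024, 2023, 2024],
    "Sales": [100, 150, 200, 130],
})

# Rows come from City, columns from Year, and each cell holds the summed Sales.
table = pd.pivot_table(df, values="Sales", index="City", columns="Year", aggfunc="sum")
print(table)
```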

21. Why is NumPy’s array slicing faster than Python’s list slicing?
  - NumPy’s array slicing is faster because of its contiguous memory allocation and fixed data types. NumPy arrays are implemented in C, allowing direct memory access and operations at machine speed, while a Python list stores references to separate objects that can be of any type and can live anywhere in memory.

  - Basic slicing in NumPy does not copy the data; it creates a view of the same memory block. In contrast, slicing a Python list builds a new list and copies a reference for every element, which costs extra memory and time.

  - This efficiency makes NumPy ideal for large-scale numerical and matrix operations.
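
  - The view-versus-copy behaviour can be observed directly. In this small sketch, writing to a NumPy slice changes the original array, while writing to a list slice does not change the original list.

```python
import numpy as np

arr = np.arange(5)    # array([0, 1, 2, 3, 4])
lst = list(range(5))

arr_slice = arr[1:4]  # a view onto arr's memory
lst_slice = lst[1:4]  # a new list

arr_slice[0] = 99     # writes through to arr
lst_slice[0] = 99     # leaves lst unchanged

print(arr)            # [ 0 99  2  3  4]
print(lst)            # [0, 1, 2, 3, 4]
print(np.shares_memory(arr, arr_slice))  # True
```

  - When an independent array is needed, `arr[1:4].copy()` makes an explicit copy.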

22. What are some common use cases for Seaborn?

  - Seaborn is widely used for statistical data visualization due to its simplicity and visual appeal. Some common use cases include:

  - Exploratory Data Analysis (EDA): To identify trends, patterns, and outliers.

  - Distribution plots: For analyzing data spread and skewness using histograms or KDE plots.

  - Categorical data visualization: For comparing data across categories with bar plots, box plots, or violin plots.

  - Correlation analysis: Using heatmaps to identify relationships between numerical variables.

  - Pairwise comparisons: To explore relationships between multiple variables using pairplots.

  - Regression analysis: To visualize relationships between variables and fitted regression lines.

  - Seaborn provides an easy and effective way to visualize statistical insights in data science and analytics projects.

In [None]:
# 1. How do you create a 2D NumPy array and calculate the sum of each row?

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_sum = arr.sum(axis=1)
print(row_sum)

In [None]:
# 2. Write a Pandas script to find the mean of a specific column in a DataFrame.

import pandas as pd
data = {'Name': ['A', 'B', 'C'], 'Marks': [80, 90, 85]}
df = pd.DataFrame(data)
mean_value = df['Marks'].mean()
print(mean_value)

In [None]:
# 3. Create a scatter plot using Matplotlib.

import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [5, 7, 8, 5, 6]
plt.scatter(x, y)
plt.show()

In [None]:
# 4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

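import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5], 'C': [5,4,3,2]})
corr = data.corr()  # the correlation matrix itself is computed by Pandas
sns.heatmap(corr, annot=True, cmap='coolwarm')  # Seaborn draws it as a heatmap
plt.show()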

In [None]:
# 5. Generate a bar plot using Plotly.

import plotly.express as px
data = {'Fruit': ['Apple', 'Banana', 'Mango'], 'Quantity': [10, 20, 15]}
fig = px.bar(data, x='Fruit', y='Quantity', title='Fruit Quantity')
fig.show()

In [None]:
# 6. Create a DataFrame and add a new column based on an existing column.

import pandas as pd
data = {'Name': ['A', 'B', 'C'], 'Marks': [80, 90, 85]}
df = pd.DataFrame(data)
df['Grade'] = df['Marks'] + 5  # new column computed from the existing 'Marks' column
print(df)

In [None]:
# 7. Write a program to perform element-wise multiplication of two NumPy arrays.

import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a * b
print(result)


In [None]:
# 8. Create a line plot with multiple lines using Matplotlib.

import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()

In [None]:
# 9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D'], 'Marks': [60, 85, 45, 90]}
df = pd.DataFrame(data)
filtered = df[df['Marks'] > 70]
print(filtered)


In [None]:
# 10. Create a histogram using Seaborn to visualize a distribution.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({'Scores': [55, 60, 65, 70, 75, 80, 85, 90]})
sns.histplot(data['Scores'], bins=5, kde=True)  # kde=True overlays a density curve
plt.show()

In [None]:
# 11. Perform matrix multiplication using NumPy.

import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
result = a @ b  # matrix product; equivalent to np.dot(a, b) for 2-D arrays
print(result)

In [None]:
# 12. Use Pandas to load a CSV file and display its first 5 rows.

import pandas as pd
df = pd.read_csv('data.csv')  # assumes a file named data.csv exists in the working directory
print(df.head())

In [None]:
# 13. Create a 3D scatter plot using Plotly.

import plotly.express as px
data = {'x': [1, 2, 3, 4], 'y': [10, 15, 13, 17], 'z': [5, 6, 7, 8]}
fig = px.scatter_3d(data, x='x', y='y', z='z', title='3D Scatter Plot')
fig.show()