Q1. What is NumPy, and why is it widely used in Python?
- NumPy (Numerical Python) is a library for numerical computing that provides support for large, multi-dimensional arrays and matrices. It includes optimized mathematical functions for performing fast operations on arrays, such as linear algebra, statistical analysis, and random number generation. NumPy is widely used because it offers significant performance improvements over Python lists, supports vectorized operations, and serves as the foundation for many data science libraries like Pandas, SciPy, and TensorFlow.

Q2. How does broadcasting work in NumPy?
- Broadcasting allows NumPy to perform element-wise operations on arrays of different shapes without explicit loops. Shapes are compared starting from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. When the shapes are compatible, NumPy virtually stretches the smaller array along the missing or size-1 dimensions to match the larger array; when they are not, it raises a ValueError. Because the stretched values are never actually copied, broadcasting avoids unnecessary data duplication and keeps computations efficient.
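
As a rough illustration of the rule, the sketch below combines a (3, 3) matrix with a (3,) row and a (3, 1) column. The array names and values are made up for the example.

```python
import numpy as np

# A (3, 3) matrix and a (3,) row vector: the trailing dimensions match,
# so the row is reused for every row of the matrix.
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = np.array([10, 20, 30])
added = matrix + row

# A (3, 1) column against a (3,) row: each size-1 dimension is stretched,
# giving a (3, 3) result.
column = np.array([[1], [2], [3]])
outer = column * row

print(added)
print(outer)
```

Neither `row` nor `column` is physically repeated in memory; NumPy only behaves as if it were.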

Q3. What is a Pandas DataFrame?
- A Pandas DataFrame is a two-dimensional labeled data structure, similar to a table in a database or an Excel spreadsheet. It consists of rows and columns, where each column can have a different data type. DataFrames allow efficient data manipulation, filtering, aggregation, and analysis, making them a fundamental tool for data science and machine learning.

Q4. Explain the use of the groupby() method in Pandas.
- The groupby() method in Pandas is used to group data based on one or more columns and then apply aggregation functions such as sum, mean, count, or max. This method is useful for analyzing and summarizing large datasets by categories. For example, in a sales dataset, groupby('Region').sum() can provide total sales per region.
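
A minimal sketch of the sales example, assuming a small made-up dataset with hypothetical `Region` and `Sales` columns:

```python
import pandas as pd

# Hypothetical sales data; the column names and values are illustrative only.
sales = pd.DataFrame({
    "Region": ["East", "West", "East", "West", "North"],
    "Sales": [100, 200, 150, 50, 300],
})

# Total sales per region. Selecting the column first keeps the
# aggregation limited to the values of interest.
totals = sales.groupby("Region")["Sales"].sum()

# Several aggregations at once with agg().
summary = sales.groupby("Region")["Sales"].agg(["sum", "mean", "count"])

print(totals)
print(summary)
```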

Q5. Why is Seaborn preferred for statistical visualizations?
- Seaborn is preferred for statistical visualizations because it is built on top of Matplotlib and provides a high-level interface for creating aesthetically pleasing and informative plots. It simplifies the creation of complex visualizations such as violin plots, box plots, and heatmaps. Seaborn also integrates well with Pandas, making it easy to visualize relationships between variables in structured datasets.

Q6. What are the differences between NumPy arrays and Python lists?

- Speed: NumPy arrays are faster for numerical work because they use fixed data types and optimized C-based operations.
- Memory Efficiency: NumPy arrays generally require less memory than Python lists because they store raw values in one block rather than references to separate Python objects.
- Vectorized Operations: NumPy supports element-wise operations without explicit loops.
- Data Type Consistency: Python lists can store mixed data types, whereas all elements of a NumPy array share a single data type (dtype).
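
The behavioral differences can be seen in a short sketch (variable names are illustrative):

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_array = np.array(py_list)

# Multiplying a list repeats it; multiplying an array scales each element.
list_times_two = py_list * 2
array_times_two = np_array * 2

# A list keeps mixed types as they are; an array coerces them to one dtype.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)

print(list_times_two)     # the list repeated twice
print(array_times_two)    # every element doubled
print(mixed_array.dtype)  # a single floating-point dtype
```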

Q7. What is a heatmap, and when should it be used?
- A heatmap is a data visualization that represents values using color gradients. With a typical colormap, higher values are shown in warmer colors and lower values in cooler colors, although the mapping depends on the colormap chosen. It is useful for analyzing correlation matrices, identifying trends in large datasets, and visualizing relationships between multiple variables in a dataset.

Q8. What does the term “vectorized operation” mean in NumPy?
- A vectorized operation in NumPy refers to performing operations on entire arrays at once instead of using loops. These operations leverage optimized C-based implementations, making them significantly faster than iterating through elements manually. Examples include arithmetic operations (array1 + array2) and mathematical functions (np.sin(array)).
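
To make the contrast concrete, here is a small sketch that computes the same result with a loop and with a vectorized expression (the formula is arbitrary):

```python
import numpy as np

values = np.arange(5, dtype=float)  # 0.0, 1.0, 2.0, 3.0, 4.0

# Loop version: one Python-level operation per element.
looped = np.empty_like(values)
for i in range(values.size):
    looped[i] = values[i] ** 2 + 1

# Vectorized version: the same arithmetic written once for the whole array.
vectorized = values ** 2 + 1

print(vectorized)
```

Both produce identical values; on large arrays the vectorized form is usually much faster because the loop runs inside NumPy's compiled code.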

Q9. How does Matplotlib differ from Plotly?
- Matplotlib is primarily used for static visualizations, offering fine control over plots but requiring more code for customization. Plotly, on the other hand, provides interactive visualizations with features like zooming, tooltips, and real-time updates. Plotly is more suitable for dashboards and web applications, while Matplotlib is often used for detailed analysis in research.

Q10. What is the significance of hierarchical indexing in Pandas?
- Hierarchical indexing allows Pandas to use multiple levels of indexing for rows and columns. It enables working with more complex datasets, making data retrieval, slicing, and aggregation more efficient. For example, a dataset with multi-level indices for country and city allows querying data at different granularities.
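
A minimal sketch of the country and city example, using a MultiIndex on a Series. The figures are invented placeholders, not real data.

```python
import pandas as pd

# Two index levels: country (outer) and city (inner).
index = pd.MultiIndex.from_tuples(
    [("India", "Delhi"), ("India", "Mumbai"),
     ("Japan", "Osaka"), ("Japan", "Tokyo")],
    names=["country", "city"],
)
# Placeholder values for illustration only.
sales = pd.Series([10, 20, 30, 40], index=index, name="sales")

india = sales.loc["India"]                       # every city in one country
tokyo = sales.loc[("Japan", "Tokyo")]            # one (country, city) entry
by_country = sales.groupby(level="country").sum()  # aggregate the outer level

print(india)
print(tokyo)
print(by_country)
```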

Q11. What is the role of Seaborn’s pairplot() function?
- The pairplot() function in Seaborn creates a grid of scatter plots for all pairwise combinations of numerical variables in a dataset, with each variable's own distribution shown on the diagonal by default. It helps in understanding relationships between multiple features, detecting patterns, and identifying correlations, making it a useful tool for exploratory data analysis.

Q12. What is the purpose of the describe() function in Pandas?
- The describe() function provides summary statistics for numerical columns, including count, mean, standard deviation, min, max, and quartiles. It is useful for getting a quick statistical overview of a dataset and identifying potential anomalies or trends.

Q13. Why is handling missing data important in Pandas?
- Handling missing data is crucial to maintain data integrity and avoid biased results. Pandas provides functions like dropna() to remove missing values and fillna() to replace them with appropriate values. Proper handling ensures accurate analysis and prevents errors in machine learning models.
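
A short sketch of the two functions mentioned above, on a made-up DataFrame with hypothetical `age` and `city` columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 35],
    "city": ["Pune", None, "Delhi"],
})

# Count missing values per column before deciding how to treat them.
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill each column with a value that suits it
# (the column mean for age, a placeholder label for city).
filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```

Which option is appropriate depends on the dataset; filling with the mean is only one of several possible strategies.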

Q14. What are the benefits of using Plotly for data visualization?
- Plotly provides interactive visualizations, allowing users to zoom, hover, and filter data dynamically. It supports a variety of chart types, including 3D plots and geospatial maps. Plotly is particularly useful for creating dashboards and sharing visual insights in web applications.

Q15. How does NumPy handle multidimensional arrays?
- NumPy allows efficient handling of multidimensional arrays using ndarray objects. It supports reshaping, slicing, broadcasting, and mathematical operations across multiple dimensions. Functions like np.reshape() and np.transpose() make it easy to manipulate array structures.
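
The operations named above can be sketched on a small array as follows:

```python
import numpy as np

flat = np.arange(12)          # values 0 through 11
grid = flat.reshape(3, 4)     # 3 rows, 4 columns
transposed = grid.T           # shape becomes (4, 3)

column_totals = grid.sum(axis=0)  # sum down the rows, one total per column
row_totals = grid.sum(axis=1)     # sum across the columns, one total per row
block = grid[:2, 1:3]             # first two rows, middle two columns

print(grid)
print(transposed.shape)
print(column_totals, row_totals)
print(block)
```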

Q16. What is the role of Bokeh in data visualization?
- Bokeh is a Python library for interactive and web-friendly visualizations. It allows users to create dynamic plots that can be embedded in web applications, making it suitable for dashboards and real-time data visualization. It supports linking multiple plots and handling large datasets efficiently.

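Q17. Explain the difference between apply() and map() in Pandas.
- `apply()` works on both DataFrames and Series. On a DataFrame it passes each column (or each row, with `axis=1`) to the function; on a Series it calls the function on each value.
- `map()` is most commonly used on a Series, where it applies a function, dictionary, or Series lookup element-wise. Pandas 2.1 and later also provide `DataFrame.map()` for element-wise functions, replacing `applymap()`.
- `apply()` is more flexible and can handle complex transformations, while `map()` is mainly for simple element-wise operations.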
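
A brief sketch of both methods, assuming a small made-up DataFrame with hypothetical `price` and `qty` columns:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Series.map with a function: applied to each element.
doubled = df["price"].map(lambda p: p * 2)

# Series.map with a dict: each value is looked up in the mapping.
codes = pd.Series(["a", "b", "a"])
labels = codes.map({"a": "Alpha", "b": "Beta"})

# DataFrame.apply with axis=1: the function receives one row at a time.
total = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# DataFrame.apply with the default axis=0: the function receives each column.
col_range = df.apply(lambda col: col.max() - col.min())

print(doubled.tolist(), labels.tolist())
print(total.tolist())
print(col_range)
```

For simple arithmetic like `price * qty`, plain column arithmetic (`df["price"] * df["qty"]`) is usually faster than `apply()`; the row-wise form is shown here only to illustrate the API.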

Q18. What are some advanced features of NumPy?
- Broadcasting: Enables operations on arrays of different shapes.
- Linear Algebra: Functions like matrix multiplication and eigenvalues.
- Random Sampling: Generates random numbers efficiently.
- Advanced Indexing: Enables complex selections and modifications.
- FFT (Fast Fourier Transform): Used in signal processing and data analysis.
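
A compact sketch touching on several of these features; the inputs are arbitrary and chosen so the results are easy to check by hand:

```python
import numpy as np

# Linear algebra: eigenvalues of a diagonal matrix are its diagonal entries.
a = np.array([[2.0, 0.0], [0.0, 3.0]])
eigenvalues = np.linalg.eigvals(a)

# Random sampling: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=4)

# Advanced indexing: select by a list of positions or by a boolean mask.
data = np.array([10, 20, 30, 40, 50])
picked = data[[0, 2, 4]]
large = data[data > 25]

# FFT: a constant signal has all of its energy in the zero-frequency term.
spectrum = np.fft.fft(np.ones(4))

print(eigenvalues)
print(samples.shape)
print(picked, large)
print(spectrum)
```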


Q19. How does Pandas simplify time series analysis?
- Pandas provides built-in support for handling time series data, including datetime indexing, resampling, and rolling window operations. These features make it easy to analyze trends, detect seasonality, and perform time-based aggregations.
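
A minimal sketch of datetime indexing, resampling, and a rolling window on six days of made-up daily values:

```python
import pandas as pd

# Six consecutive daily observations (illustrative values).
dates = pd.date_range("2024-01-01", periods=6, freq="D")
series = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resampling: aggregate into two-day bins.
two_day_sum = series.resample("2D").sum()

# Rolling window: mean over the current and two previous observations.
rolling_mean = series.rolling(window=3).mean()

# Datetime indexing: slice by date strings (both endpoints are included).
window = series.loc["2024-01-02":"2024-01-04"]

print(two_day_sum)
print(rolling_mean)
print(window)
```

The first two entries of `rolling_mean` are NaN because a full three-value window is not yet available.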

Q20. What is the role of a pivot table in Pandas?
- A pivot table in Pandas is used to reorganize and summarize data by grouping and aggregating values based on specified columns. It is useful for analyzing large datasets, allowing users to extract meaningful insights from structured data.
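
As a sketch, the made-up sales records below are summarized with `pivot_table()`. The `region`, `product`, and `amount` column names are assumptions for the example.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West"],
    "product": ["A", "A", "B", "A", "B"],
    "amount": [100, 50, 150, 200, 50],
})

# Rows become regions, columns become products, and each cell holds
# the summed amount for that (region, product) pair.
pivot = sales.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum"
)

print(pivot)
```

The two East/A records are combined into a single cell, which is the grouping and aggregation step described above.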

Q21. Why is NumPy’s array slicing faster than Python’s list slicing?
- NumPy’s array slicing is faster than Python’s list slicing because NumPy arrays store their data in a single memory buffer and basic slicing returns a view instead of a copy. Slicing a NumPy array does not copy any elements; the result refers to the original data, so the operation is cheap regardless of the slice length, and changes made through the view affect the original array. In contrast, a Python list stores references to separately allocated objects, and slicing builds a new list containing copies of those references, which costs time and memory proportional to the slice length.
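
The view-versus-copy behavior can be checked directly with a small sketch:

```python
import numpy as np

# NumPy: a basic slice is a view onto the same data.
arr = np.arange(5)
view = arr[1:4]
view[0] = 99                      # also changes arr[1]
shares = np.shares_memory(arr, view)

# Python list: a slice is a new list.
lst = [0, 1, 2, 3, 4]
sub = lst[1:4]
sub[0] = 99                       # lst is left untouched

# Use .copy() when an independent NumPy array is needed.
independent = arr[1:4].copy()
independent[0] = -1               # arr is left untouched

print(arr, shares)
print(lst)
```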

Q22. What are some common use cases for Seaborn?
- Statistical Visualizations: Creating box plots, violin plots, and swarm plots to analyze distributions.
- Correlation Analysis: Using heatmaps to visualize relationships between variables in a dataset.
- Pairwise Relationships: Using pairplot() to explore dependencies between multiple numerical features.
- Time Series Analysis: Plotting trends and seasonal variations in time-series data.
- Categorical Data Analysis: Using count plots and bar plots to study categorical variable distributions.
- Regression Analysis: Using regplot() to visualize relationships between independent and dependent variables.

In [None]:
# Q1. How do you create a 2D NumPy array and calculate the sum of each row?
import numpy as np

# Creating a 2D NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Calculating the sum of each row
row_sums = np.sum(arr, axis=1)

print(row_sums)


In [None]:
# Q2. Write a Pandas script to find the mean of a specific column in a DataFrame
import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Salary': [50000, 60000, 70000, 80000]}

df = pd.DataFrame(data)

# Calculating the mean of the 'Salary' column
mean_salary = df['Salary'].mean()

print("Mean Salary:", mean_salary)


In [None]:
# Q3. Create a scatter plot using Matplotlib.
import matplotlib.pyplot as plt

# Sample data
x = [10, 20, 30, 40, 50]
y = [5, 15, 25, 35, 45]

# Creating the scatter plot
plt.scatter(x, y, color='blue', marker='o')

# Adding labels and title
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Simple Scatter Plot")

# Display the plot
plt.show()


In [None]:
# Q4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [2, 3, 4, 5, 6],
        'C': [5, 4, 3, 2, 1]}

df = pd.DataFrame(data)

# Calculating the correlation matrix (done by Pandas; Seaborn only draws it)
corr_matrix = df.corr()

# Creating a heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")

# Display the plot
plt.title("Correlation Matrix Heatmap")
plt.show()


In [None]:
# Q5. Generate a bar plot using Plotly.
import pandas as pd
import plotly.express as px

# Sample data
data = {'Category': ['A', 'B', 'C', 'D'],
        'Value': [10, 15, 7, 12]}

# Create a DataFrame
df = pd.DataFrame(data)

# Creating the bar plot
fig = px.bar(df, x='Category', y='Value', title='Bar Plot using Plotly')

# Show the plot
fig.show()


In [None]:
# Q6. Create a DataFrame and add a new column based on an existing column.
import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40]}

df = pd.DataFrame(data)

# Adding a new column 'Age_in_5_years' based on the 'Age' column
df['Age_in_5_years'] = df['Age'] + 5

# Display the updated DataFrame
print(df)


In [None]:
# Q7. Write a program to perform element-wise multiplication of two NumPy arrays.
import numpy as np

# Creating two NumPy arrays
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])

# Element-wise multiplication
result = array1 * array2

# Display the result
print("Element-wise multiplication result:", result)


In [None]:
# Q8. Create a line plot with multiple lines using Matplotlib.
import matplotlib.pyplot as plt

# Sample data for multiple lines
x = [0, 1, 2, 3, 4, 5]
y1 = [0, 1, 4, 9, 16, 25]  # y = x^2
y2 = [0, 1, 2, 3, 4, 5]    # y = x
y3 = [0, 1, 3, 6, 10, 15]  # y = triangular numbers

# Creating the line plot
plt.plot(x, y1, label='y = x^2', color='blue')
plt.plot(x, y2, label='y = x', color='green')
plt.plot(x, y3, label='y = triangular numbers', color='red')

# Adding labels and title
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Multiple Line Plot")

# Adding a legend
plt.legend()

# Display the plot
plt.show()


In [None]:
# Q9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.
import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Salary': [50000, 60000, 70000, 80000]}

df = pd.DataFrame(data)

# Setting a threshold for filtering
threshold = 60000

# Filtering rows where 'Salary' is greater than the threshold
filtered_df = df[df['Salary'] > threshold]

# Display the filtered DataFrame
print(filtered_df)


In [None]:
# Q10. Create a histogram using Seaborn to visualize a distribution.
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = [23, 45, 56, 67, 89, 45, 34, 23, 56, 67, 45, 67, 89, 23, 56]

# Create the histogram
sns.histplot(data, kde=True, bins=10, color='blue')

# Adding labels and title
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("Histogram to Visualize Distribution")

# Display the plot
plt.show()


In [None]:
# Q11. Perform matrix multiplication using NumPy.
import numpy as np

# Define two matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Perform matrix multiplication using @ operator
result = A @ B

# Alternatively, using np.matmul()
# result = np.matmul(A, B)

# Display the result
print("Matrix multiplication result:\n", result)


In [None]:
# Q12. Use Pandas to load a CSV file and display its first 5 rows.
import pandas as pd

# Load the CSV file (replace 'your_file.csv' with the actual file path)
df = pd.read_csv('your_file.csv')

# Display the first 5 rows of the DataFrame
print(df.head())


In [None]:
# Q13. Create a 3D scatter plot using Plotly.
import plotly.express as px
import pandas as pd

# Sample data
data = {'x': [1, 2, 3, 4, 5],
        'y': [5, 4, 3, 2, 1],
        'z': [10, 11, 12, 13, 14],
        'label': ['A', 'B', 'C', 'D', 'E']}

# Create a DataFrame
df = pd.DataFrame(data)

# Create a 3D scatter plot
fig = px.scatter_3d(df, x='x', y='y', z='z', color='label', title='3D Scatter Plot')

# Show the plot
fig.show()
