
1. **What is NumPy, and why is it widely used in Python?**
   - NumPy (Numerical Python) is a fundamental library for numerical computations in Python, offering:
     - Multidimensional array objects
     - Functions for performing operations like linear algebra, Fourier transforms, and random number generation
     - Broadcasting capabilities allowing operations on arrays of different shapes and sizes
     - Integration with other tools like Pandas, Matplotlib, and SciPy
   It’s valued for its efficiency, performance, and ease of use in handling large datasets.

2. **How does broadcasting work in NumPy?**
   - Broadcasting allows NumPy to perform element-wise operations on arrays of different shapes without making multiple copies of data. It essentially "stretches" the smaller array across the larger one for compatibility, enabling efficient and concise code.
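
A minimal sketch of broadcasting, using made-up arrays (`matrix`, `row`, and `column` are illustrative names). Two dimensions are compatible when they are equal or when one of them is 1, with shapes compared from the trailing dimension backwards:

```python
import numpy as np

# A (3, 3) matrix and a (3,) row vector: the trailing dimensions match.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])

# The row vector is applied to every row of the matrix.
shifted = matrix + row

# A (3, 1) column and a (3,) row broadcast to a (3, 3) result.
column = np.array([[1], [2], [3]])
outer_sum = column + row

print(shifted)
print(outer_sum.shape)  # (3, 3)
```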

3. **What is a Pandas DataFrame?**
   - A DataFrame in Pandas is a two-dimensional, size-mutable, and heterogeneous data structure with labeled axes (rows and columns). Think of it as an in-memory spreadsheet or SQL table. It’s designed to simplify data manipulation, cleaning, and analysis.

4. **Explain the use of the groupby() method in Pandas**
   - The `groupby()` method in Pandas is used for splitting data into groups based on some criteria, then applying a function to each group independently, and finally combining the results. It’s useful for aggregation, transformation, and filtration operations.
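
The split-apply-combine pattern can be sketched as follows. The `sales` DataFrame and its column names are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "amount": [100, 150, 200, 50, 300],
})

# Aggregation: one total per region
totals = sales.groupby("region")["amount"].sum()

# Transformation: the group mean, aligned back to the original rows
sales["region_mean"] = sales.groupby("region")["amount"].transform("mean")

# Filtration: keep only rows from regions whose total exceeds 250
large = sales.groupby("region").filter(lambda g: g["amount"].sum() > 250)

print(totals)
print(sales)
print(large)
```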

5. **Why is Seaborn preferred for statistical visualizations?**
   - Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics. It comes with several built-in themes and color palettes to make visualizations more appealing and interpretable, and it works directly with Pandas data structures.

6. **What are the differences between NumPy arrays and Python lists?**
   - NumPy arrays offer:
     - More efficient storage and operation processing
     - Ability to perform complex mathematical operations
     - Support for multidimensional arrays
   Python lists, on the other hand, are more flexible (can contain different data types) but are not optimized for numerical operations.
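
A short sketch of the difference in behavior, with illustrative variable names. The same `*` operator repeats a list but multiplies an array element by element:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

# On a list, * repeats the sequence
repeated_list = numbers_list * 2

# On an array, * is an element-wise arithmetic operation
doubled_array = numbers_array * 2

# A list can mix types; an array has one dtype for all elements
mixed = [1, "two", 3.0]

print(repeated_list)        # [1, 2, 3, 1, 2, 3]
print(doubled_array)        # [2 4 6]
print(numbers_array.dtype)  # an integer dtype; the exact width is platform-dependent
```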

7. **What is a heatmap, and when should it be used?**
   - A heatmap is a graphical representation of data where values are depicted by color. It’s especially useful in identifying patterns, correlations, and anomalies in a dataset by visualizing metrics.

8. **What does the term “vectorized operation” mean in NumPy?**
   - A vectorized operation applies an operation to whole arrays at once, without explicit Python loops. The looping happens inside NumPy's precompiled, optimized C code, which gives better performance and shorter code.
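
To make the contrast concrete, here is a small sketch that computes squares with an explicit loop and then with a vectorized expression (variable names are illustrative):

```python
import numpy as np

values = np.arange(1, 6)  # [1 2 3 4 5]

# Explicit Python loop: one element at a time
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2

# Vectorized: the whole array in a single expression
squared_vec = values ** 2

print(squared_vec)  # [ 1  4  9 16 25]
```

Both versions produce the same result. On large arrays the vectorized form is typically much faster because the loop runs in compiled code.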

9. **How does Matplotlib differ from Plotly?**
   - Matplotlib focuses on flexibility and customization, offering static, highly customizable plots. Plotly, however, emphasizes interactivity, providing dynamic and easy-to-create interactive charts suitable for web applications.

10. **What is the significance of hierarchical indexing in Pandas?**
    - Hierarchical indexing (MultiIndex) allows Pandas to handle more complex data structures by maintaining multiple levels of index labels. It simplifies data querying, aggregation, and reshaping operations.
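
A minimal sketch of a two-level index, using invented revenue figures. It shows selection on the outer level, aggregation by level, and reshaping with `unstack()`:

```python
import pandas as pd

# Hypothetical data indexed by (year, quarter)
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 15, 18], index=index, name="revenue")

# Query: all quarters of one year
revenue_2024 = revenue.loc["2024"]

# Aggregate: total per year
per_year = revenue.groupby(level="year").sum()

# Reshape: move the inner level into columns
table = revenue.unstack("quarter")

print(revenue_2024)
print(per_year)
print(table)
```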

11. **What is the role of Seaborn’s pairplot() function?**
    - The `pairplot()` function in Seaborn creates a grid of scatter plots and histograms to visualize pairwise relationships between features in a dataset, aiding in exploratory data analysis.

12. **What is the purpose of the describe() function in Pandas?**
    - The `describe()` function generates summary statistics for the numerical columns in a DataFrame: count, mean, standard deviation, min, max, and quartile values.
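
For example, on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})

# One column of statistics per numeric column
summary = df.describe()

print(summary)
print(summary.loc["mean", "A"])  # 3.0
```

The result is itself a DataFrame, so individual statistics can be looked up with `.loc` as shown.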

13. **Why is handling missing data important in Pandas?**
    - Handling missing data is crucial as it can bias the analysis, reduce model accuracy, and lead to incorrect insights if not managed properly. Pandas provides methods to identify, fill, and drop missing values efficiently.
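
The three steps (identify, fill, drop) might look like this on a small invented DataFrame. Filling with the column mean is only one possible strategy, chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})

# Identify: count missing values per column
missing_counts = df.isna().sum()

# Fill: replace each missing value with its column mean
filled = df.fillna(df.mean())

# Drop: remove every row that contains a missing value
dropped = df.dropna()

print(missing_counts)
print(filled)
print(dropped)
```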

14. **What are the benefits of using Plotly for data visualization?**
    - Plotly offers interactive plots, easy integration with web applications, a wide array of chart types, and a user-friendly interface. It facilitates sharing visualizations online with interactive capabilities.

15. **How does NumPy handle multidimensional arrays?**
    - NumPy’s `ndarray` enables the creation and manipulation of multidimensional arrays easily, supporting operations like reshaping, indexing, and matrix manipulation, with high performance due to optimized C code.
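
A brief sketch of reshaping and indexing, with illustrative names. The same twelve values are viewed as a 2D grid and as a 3D block:

```python
import numpy as np

flat = np.arange(12)          # 0, 1, ..., 11

grid = flat.reshape(3, 4)     # 3 rows, 4 columns
cube = flat.reshape(2, 3, 2)  # three dimensions

print(grid[1, 2])    # 6  (row 1, column 2)
print(grid[:, 0])    # [0 4 8]  (first column)
print(grid.T.shape)  # (4, 3)  (transpose)
print(cube.ndim)     # 3
```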

16. **What is the role of Bokeh in data visualization?**
    - Bokeh is designed for creating interactive and scalable visualizations for modern web browsers. It supports large datasets, offers a flexible syntax, and integrates well with web technologies.

17. **Explain the difference between apply() and map() in Pandas**
    - `apply()` can be used with DataFrames or Series to apply a function along an axis (rows or columns). `map()` is primarily a Series method that maps each value using a dictionary, a Series, or a function, so it works element-wise. Pandas 2.1 and later also provide `DataFrame.map()` for element-wise functions.
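
A small sketch contrasting the two, on a made-up DataFrame with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({
    "grade": ["a", "b", "a"],
    "score": [90, 75, 60],
    "bonus": [5, 10, 0],
})

# Series.map with a dictionary: value-by-value lookup
labels = df["grade"].map({"a": "excellent", "b": "good"})

# Series.map with a function: applied to each element
doubled = df["score"].map(lambda s: s * 2)

# DataFrame.apply with axis=1: the function receives one row at a time
total = df[["score", "bonus"]].apply(lambda row: row["score"] + row["bonus"], axis=1)

# DataFrame.apply with the default axis=0: the function receives one column at a time
col_max = df[["score", "bonus"]].apply(lambda col: col.max())

print(labels.tolist())  # ['excellent', 'good', 'excellent']
print(total.tolist())   # [95, 85, 60]
```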

18. **What are some advanced features of NumPy?**
    - Advanced features include broadcasting, vectorized operations, linear algebra functions, random number capabilities, FFT (Fast Fourier Transform) operations, and support for memory-mapped files.

19. **How does Pandas simplify time series analysis?**
    - Pandas offers robust support for time series data, including datetime indexing, resampling, time zone handling, period conversion, and rolling window statistics, making it easier to perform time-based operations and analysis.
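
A minimal sketch of datetime indexing, resampling, and rolling statistics, using six days of invented values:

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resampling: sum over 2-day bins
two_day = ts.resample("2D").sum()

# Rolling window: 3-day moving average (the first two entries are NaN)
rolling = ts.rolling(window=3).mean()

# Datetime indexing: label-based slices include both endpoints
first_three = ts.loc["2024-01-01":"2024-01-03"]

print(two_day.tolist())  # [3, 7, 11]
print(rolling)
print(first_three)
```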

20. **What is the role of a pivot table in Pandas?**
   - A pivot table in Pandas is a powerful tool used for data aggregation and summarization. It reorganizes and reshapes data by sorting, grouping, and computing aggregate statistics like mean, sum, count, etc., providing a concise overview of large datasets. The `pivot_table()` function in Pandas helps in creating pivot tables easily, making it invaluable for data analysis and reporting.
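
As a sketch, with an invented `sales` table, `pivot_table()` can sum amounts per region and product. Note the two rows for North/A, which are combined by the aggregation:

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South"],
    "product": ["A", "A", "B", "A", "B"],
    "amount": [100, 50, 200, 150, 50],
})

# Rows: region, columns: product, cells: summed amount
pivot = pd.pivot_table(
    sales,
    values="amount",
    index="region",
    columns="product",
    aggfunc="sum",
)

print(pivot)
```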

21. **Why is NumPy’s array slicing faster than Python’s list slicing?**
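   - NumPy’s array slicing is faster due to:
     - **Views instead of copies:** Basic slicing of a NumPy array returns a view onto the same memory, so no elements are copied. Slicing a Python list builds a new list and copies every reference in the slice.
     - **Homogeneous data types:** All elements of a NumPy array have the same fixed-size data type, so the position of any element can be computed directly.
     - **Memory layout:** Array data lives in a single block of memory described by an offset, shape, and strides. A slice only needs new values for these, which also keeps memory access cache-friendly.
     - **Optimized C backend:** NumPy’s operations on the sliced data run at the C level, which is faster than element-by-element work in the Python interpreter.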
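
One way to observe the difference: basic slicing of a NumPy array returns a view on the original data, while slicing a list builds a new list. A small sketch with illustrative names:

```python
import numpy as np

arr = np.arange(10)
lst = list(range(10))

arr_slice = arr[2:5]  # view: shares memory with arr
lst_slice = lst[2:5]  # new list: elements are copied over

# Writing through the slices shows which one is linked to the original
arr_slice[0] = 99
lst_slice[0] = 99

shares = np.shares_memory(arr, arr_slice)

print(arr[2])  # 99 (the original array changed)
print(lst[2])  # 2  (the original list did not)
print(shares)  # True
```

Because the array slice is a view, call `.copy()` on it when an independent array is needed.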

22. **What are some common use cases for Seaborn?**
   - Some prevalent use cases for Seaborn include:
     - **Exploratory Data Analysis (EDA):** Creating informative and aesthetically pleasing visual representations of data distributions and relationships.
     - **Statistical visualizations:** Plotting statistical models to identify relationships amongst variables using pairplots, boxplots, violin plots, etc.
     - **Heatmaps:** Visualizing correlation matrices and identifying trends or anomalies in large datasets.
     - **Time series data:** Plotting time series data to identify trends over time.



# Practical

In [None]:
#1. How do you create a 2D NumPy array and calculate the sum of each row?

import numpy as np

# Create a 2D array
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Calculate the sum of each row
row_sums = np.sum(array, axis=1)
print("Sum of each row:", row_sums)


In [None]:
#2. Write a Pandas script to find the mean of a specific column in a DataFrame.

import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': [5, 6, 7, 8, 9]}
df = pd.DataFrame(data)

# Calculate the mean of column 'A'
mean_A = df['A'].mean()
print("Mean of column 'A':", mean_A)


In [None]:
#3. Create a scatter plot using Matplotlib.

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]

# Create scatter plot
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()


In [None]:
#4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap?

import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.title() and plt.show() below
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = np.random.rand(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Calculate the correlation matrix (Pandas computes it; Seaborn only draws it)
correlation_matrix = df.corr()

# Visualize with a heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix Heatmap')
plt.show()


In [None]:
#5. Generate a bar plot using Plotly.

import plotly.graph_objects as go

# Sample data
categories = ['Category A', 'Category B', 'Category C']
values = [10, 15, 7]

# Create a bar plot
fig = go.Figure(data=[
    go.Bar(name='Values', x=categories, y=values)
])

# Set plot title and labels
fig.update_layout(title='Bar Plot',
                  xaxis_title='Categories',
                  yaxis_title='Values')

# Show plot
fig.show()


In [None]:
#6. Create a DataFrame and add a new column based on an existing column.

import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Add a new column 'C' based on column 'A' (e.g., square of 'A')
df['C'] = df['A'] ** 2
print(df)


In [None]:
#7. Write a program to perform element-wise multiplication of two NumPy arrays.

import numpy as np

# Create two sample arrays
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([5, 4, 3, 2, 1])

# Perform element-wise multiplication
result = np.multiply(array1, array2)
print("Element-wise multiplication result:", result)



In [None]:
#8. Create a line plot with multiple lines using Matplotlib.

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y1 = [10, 20, 30, 40, 50]
y2 = [15, 25, 35, 45, 55]

# Create line plot
plt.plot(x, y1, label='Line 1', marker='o')
plt.plot(x, y2, label='Line 2', marker='s')
plt.title('Multiple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()


In [None]:
#9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold.

import pandas as pd

# Create a sample DataFrame
data = {'A': [5, 10, 15, 20, 25], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Filter rows where column 'A' is greater than 15
filtered_df = df[df['A'] > 15]
print(filtered_df)


In [None]:
#10. Create a histogram using Seaborn to visualize a distribution.

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

# Create histogram
sns.histplot(data, bins=5, kde=True)
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()


In [None]:
#11. Perform matrix multiplication using NumPy.

import numpy as np

# Create two 2D arrays (matrices)
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Perform matrix multiplication
result = np.matmul(matrix1, matrix2)  # or you can use matrix1 @ matrix2
print("Matrix multiplication result:\n", result)


In [None]:
#12. Use Pandas to load a CSV file and display its first 5 rows.

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('your_file.csv')  # Replace 'your_file.csv' with the path to your CSV file

# Display the first 5 rows of the DataFrame
print(df.head())


In [None]:
#13. Create a 3D scatter plot using Plotly.

import plotly.graph_objects as go

# Sample data
x = [0, 1, 2, 3, 4, 5]
y = [10, 11, 12, 13, 14, 15]
z = [5, 6, 7, 8, 9, 10]

# Create a 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=x,
    y=y,
    z=z,
    mode='markers',
    marker=dict(
        size=5,
        color=z,  # Setting the color based on the z values
        colorscale='Viridis',  # Color scale
        opacity=0.8
    )
)])

# Set plot title and labels
fig.update_layout(title='3D Scatter Plot',
                  scene=dict(
                      xaxis_title='X Axis',
                      yaxis_title='Y Axis',
                      zaxis_title='Z Axis'))

# Show plot
fig.show()
