Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [6]:
import pandas as pd

# Create a Pandas Series
data = [4, 8, 15, 16, 23, 42]
my_series = pd.Series(data)

# Print the Pandas Series
print(my_series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


Q2. Create a list variable containing 10 elements, apply the pandas.Series function to it, and print the result.

In [5]:
import pandas as pd

# Create a list containing 10 elements
my_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Convert the list to a Pandas Series
my_series = pd.Series(my_list)

# Print the Pandas Series
print(my_series)


0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
dtype: int64


Q3. Create a Pandas DataFrame that contains the following data:

Name    Age  Gender
Alice   25   Female
Bob     30   Male
Claire  27   Female

Then, print the DataFrame.

In [4]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


Q4. What is a ‘DataFrame’ in pandas, and how is it different from a pandas.Series? Explain with an example.

In pandas, a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure. It can be thought of as a spreadsheet or SQL table where data is organized in rows and columns. Each column in a DataFrame is represented as a pandas.Series object, and these Series share a common index, allowing for easy alignment of data.

Here's how a DataFrame is different from a Series:

DataFrame:

Two-dimensional structure with rows and columns.
Can contain multiple columns, each of which can have a different data type.
Provides a tabular view of data, similar to a spreadsheet.
Supports both row and column indexing.
Ideal for representing structured data like CSV files, SQL tables, or Excel spreadsheets.

Series:

One-dimensional data structure, essentially a labeled array.
Stores its values under a single dtype (mixed values fall back to the generic object dtype).
Represents a single column or row of data.
Indexed by labels, which default to the integers 0, 1, 2, and so on.
Useful for representing one-dimensional data or a single column/row from a DataFrame.

Here's an example to illustrate the difference between a DataFrame and a Series:



In [2]:
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22]}
df = pd.DataFrame(data)

# Creating a Series
ages = pd.Series([25, 30, 22], name='Age')

# Displaying the DataFrame and Series
print("DataFrame:")
print(df)
print("\nSeries:")
print(ages)


DataFrame:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22

Series:
0    25
1    30
2    22
Name: Age, dtype: int64


In this example, we create a DataFrame df with two columns ('Name' and 'Age'). Each column is a Series, but they are combined into a DataFrame to represent a structured dataset. On the other hand, the ages Series contains only the 'Age' column data and is a one-dimensional data structure.
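
To make the relationship concrete, here is a minimal sketch that reuses the same illustrative data. It checks that selecting one column of a DataFrame returns a Series sharing the DataFrame's index, and compares the dimensions of the two objects. The variable name age_column is chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22]})

# Selecting a single column returns a Series
age_column = df['Age']
print(type(age_column).__name__)          # Series

# The column shares the DataFrame's row index
print(age_column.index.equals(df.index))  # True

# A DataFrame is two-dimensional, a Series is one-dimensional
print(df.shape)                           # (3, 2)
print(age_column.shape)                   # (3,)
```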

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can
you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate data in a DataFrame. Here are some common functions and methods you can use:

1. Selecting and Filtering Data:

df['column_name'] or df.column_name: Select a specific column.
df[['col1', 'col2']]: Select multiple columns.
df.loc[row_label] or df.iloc[row_position]: Select rows by label or by integer position.
df[df['column_name'] > value]: Filter rows based on a condition.
Example: Select the rows where the 'Age' column is greater than 30.
filtered_df = df[df['Age'] > 30]


2. Sorting Data:

df.sort_values(by='column_name'): Sort the DataFrame by a specific column.
Example: Sorting the DataFrame by the 'Score' column in descending order.
sorted_df = df.sort_values(by='Score', ascending=False)


3. Aggregation and Summary Statistics:

df.groupby('group_column').agg({'value_column': 'function'}): Perform aggregation operations (e.g., mean, sum) on grouped data.
Example: Calculate the mean score for each distinct age.
mean_scores = df.groupby('Age').agg({'Score': 'mean'})


4. Changing Data:

df['new_column'] = some_value: Add a new column to the DataFrame.
df['column_name'].apply(function): Apply a function to a column.
Example: Create a new column 'Grade' based on the 'Score' column.
df['Grade'] = df['Score'].apply(lambda x: 'A' if x >= 90 else 'B' if x >= 80 else 'C')


5. Handling Missing Data:

df.isnull(), df.notnull(): Check for missing values.
df.dropna(): Remove rows with missing values.
df.fillna(value): Replace missing values with a specific value.
Example: Replace missing values in the 'Age' column with the mean age.
mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)


6. Merging and Joining DataFrames:

pd.concat([df1, df2]): Concatenate two DataFrames.
df1.merge(df2, on='key'): Merge two DataFrames on a common column.
Example: Merge two DataFrames based on a common 'ID' column.
merged_df = df1.merge(df2, on='ID')
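
The snippets above assume a DataFrame that already has 'Age', 'Score', and 'ID' columns. As a rough end-to-end sketch, the following block builds a small made-up dataset (the names, values, and the cities table are hypothetical and exist only for illustration) and runs one function from each group in turn.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one Age value is deliberately missing
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 35, np.nan, 30],
    'Score': [95, 89, 78, 92],
})

# Handling missing data: fill the missing age with the mean age (30.0)
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Selecting and filtering: rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df['Name'].tolist())   # ['Bob']

# Sorting: highest score first
sorted_df = df.sort_values(by='Score', ascending=False)
print(sorted_df['Name'].tolist())     # ['Alice', 'David', 'Bob', 'Charlie']

# Aggregation: mean score for each distinct age
mean_scores = df.groupby('Age').agg({'Score': 'mean'})
print(mean_scores)

# Changing data: derive a letter grade from the score
df['Grade'] = df['Score'].apply(
    lambda x: 'A' if x >= 90 else 'B' if x >= 80 else 'C'
)
print(df['Grade'].tolist())           # ['A', 'B', 'C', 'A']

# Merging: attach a second (hypothetical) table on the shared ID column
cities = pd.DataFrame({'ID': [1, 2, 3, 4],
                       'City': ['Paris', 'Berlin', 'Madrid', 'Rome']})
merged_df = df.merge(cities, on='ID')
print(merged_df.columns.tolist())
```

The missing value is filled first so that the filtering and grouping steps work on a complete 'Age' column. In a real analysis the order of these steps depends on the question being asked.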


Q6. Which of the following are mutable in nature: Series, DataFrame, Panel?

In pandas, the mutability of Series, DataFrame, and Panel objects can be explained as follows:

Series: Series objects are mutable. This means you can modify the values of a Series after it has been created. You can change individual elements within a Series by assigning new values to specific index labels.

Example:
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3, 4, 5])

# Modify a value in the Series
s[2] = 10

DataFrame: DataFrame objects are also mutable. You can modify the contents of a DataFrame by adding, updating, or deleting rows and columns. You can change the values of specific cells, columns, or rows.

Example:
import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Modify a value in the DataFrame
df.at[1, 'B'] = 10
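
The two snippets above change single values. The sketch below, which uses the same small made-up objects, prints the results and also shows a DataFrame changing shape in place when a column is added and another is deleted. The column name 'C' is introduced only for this illustration.

```python
import pandas as pd

# Series: change a value by position
s = pd.Series([1, 2, 3, 4, 5])
s.iloc[2] = 10
print(s.tolist())             # [1, 2, 10, 4, 5]

# DataFrame: change one cell by row label and column name
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.at[1, 'B'] = 10

# DataFrame: add a column, then delete another one in place
df['C'] = df['A'] + df['B']
del df['A']
print(df.columns.tolist())    # ['B', 'C']
print(df['C'].tolist())       # [5, 12, 9]
```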
 
    
Panel: Older versions of pandas included a data structure called Panel, which was designed to handle three-dimensional data and was also mutable. Panel was deprecated in version 0.20.0 and removed in version 0.25.0, so it is not available in current releases. For data with more than two dimensions, the pandas deprecation notice recommended a DataFrame with a MultiIndex or the xarray package. Of the three structures, only Series and DataFrame therefore remain, and both are mutable.

Q7. Create a DataFrame using multiple Series. Explain with an example.

In [1]:
import pandas as pd

# Creating multiple Series
names = pd.Series(["Alice", "Bob", "Charlie", "David"])
ages = pd.Series([25, 30, 22, 35])
scores = pd.Series([95, 89, 78, 92])

# Creating a DataFrame by combining the Series
data = {
    "Name": names,
    "Age": ages,
    "Score": scores
}

df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)


      Name  Age  Score
0    Alice   25     95
1      Bob   30     89
2  Charlie   22     78
3    David   35     92


In this example:

We import the pandas library as pd.
We create three Series: names, ages, and scores. These Series represent columns of our DataFrame.
We create a dictionary called data where each key represents the column name, and the corresponding value is the Series that will populate that column in the DataFrame.
We use the pd.DataFrame() constructor to create the DataFrame df by passing the data dictionary.
Finally, we print the DataFrame, which produces the output shown above.
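
As a complementary sketch, a DataFrame can also be assembled from Series with pd.concat along the column axis, in which case each Series' name becomes the column name. The example below deliberately gives the Series different, made-up index labels to show that pandas aligns rows on the index and fills the gaps with NaN rather than pairing values by position.

```python
import pandas as pd

# Two Series with partly different index labels (hypothetical data)
names = pd.Series(["Alice", "Bob", "Charlie"], index=["a", "b", "c"], name="Name")
ages = pd.Series([25, 30], index=["a", "b"], name="Age")

# axis=1 places each Series side by side as a column
df = pd.concat([names, ages], axis=1)
print(df)

# Row "c" has no age, so pandas stores NaN there
print(df.loc["c", "Age"])
```

Because of the missing entry, the 'Age' column is stored as floating point instead of integers.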