## Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [2]:
import pandas as pd

# Creating the Pandas Series
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

# Printing the Series
print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


## Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [3]:
import pandas as pd

# Creating the list with 10 elements
data_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Converting the list to a Pandas Series
series = pd.Series(data_list)

# Printing the Series
print(series)


0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


## Q3. Create a Pandas DataFrame that contains the following data: Name (Alice, Bob, Claire), Age (25, 30, 27), Gender (Female, Male, Female). Then, print the DataFrame.

In [4]:
import pandas as pd

# Creating the data dictionary
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

# Converting the dictionary to a Pandas DataFrame
df = pd.DataFrame(data)

# Printing the DataFrame
print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


## Q4. What is a ‘DataFrame’ in pandas, and how is it different from pandas.Series? Explain with an example.

A DataFrame in pandas is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table, or a dictionary of Series objects. Each column in a DataFrame can be of a different data type, and it allows for a wide range of operations and data manipulations.

A Series, on the other hand, is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, etc.). Think of a Series as a single column of data.

Key Differences

    Dimensions: A Series is one-dimensional (like a single column), while a DataFrame is two-dimensional (like a table with rows and columns).
    Data Structure: A Series can be thought of as a single column of data with an index, whereas a DataFrame consists of multiple columns, each of which is a Series sharing the same row index.
    Data Types: In a DataFrame, each column can have its own dtype (e.g., integers, floats, strings), whereas a Series has exactly one dtype. A Series that holds mixed values falls back to the generic object dtype.

In [5]:
import pandas as pd

# Creating a Series
data_series = pd.Series([10, 20, 30, 40, 50], name="Numbers")
print("Series:")
print(data_series)
print()

# Creating a DataFrame
data_dict = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}
data_frame = pd.DataFrame(data_dict)
print("DataFrame:")
print(data_frame)


Series:
0    10
1    20
2    30
3    40
4    50
Name: Numbers, dtype: int64

DataFrame:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
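
The differences listed above can be checked directly on the same sample data. The following is a minimal sketch; the variable names `df` and `age` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
})

# Selecting a single column by name returns a Series
age = df['Age']
print(type(age).__name__)   # Series
print(age.ndim, df.ndim)    # 1 2

# A Series reports one dtype; a DataFrame reports one dtype per column
print(age.dtype)
print(df.dtypes)

# Selecting with a list of column names keeps the result two-dimensional
age_frame = df[['Age']]
print(type(age_frame).__name__)   # DataFrame
```

Note that `df['Age']` and `df[['Age']]` hold the same values but differ in type, which matters when a later call expects one or the other.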


## Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide array of functions to manipulate data in a DataFrame. Here are some common functions and their typical use cases:

    head() and tail(): View the first or last few rows of the DataFrame.
        Example: df.head(3) to see the first 3 rows of the DataFrame.

    info(): Get a concise summary of the DataFrame, including the data types of each column and the number of non-null values.
        Example: df.info() to understand the structure and data types of your DataFrame.

    describe(): Generate descriptive statistics of numeric columns.
        Example: df.describe() to get summary statistics such as count, mean, standard deviation, minimum, quartiles (including the median), and maximum.

    loc[] and iloc[]: Access a group of rows and columns by labels or integer positions, respectively.
        Example: df.loc[0, 'Name'] to access the value in the first row and the 'Name' column.

    drop(): Remove rows or columns.
        Example: df.drop('Age', axis=1) to remove the 'Age' column.

    fillna(): Fill missing values.
        Example: df.fillna(0) to replace all NaN values with 0.

    groupby(): Group data by one or more columns and perform aggregate functions.
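        Example: df.groupby('Gender')['Age'].mean() to get the mean age for each gender. Select the numeric column first (or pass numeric_only=True), because in pandas 2.0 and later mean() raises a TypeError when the group contains non-numeric columns such as 'Name'.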

    merge(): Merge two DataFrames.
        Example: pd.merge(df1, df2, on='Name') to merge df1 and df2 on the 'Name' column.

    apply(): Apply a function along an axis of the DataFrame, or to each value of a single column.
        Example: df['Age'].apply(lambda x: x + 1) to add 1 to every value in the 'Age' column.

    sort_values(): Sort by the values along either axis.
        Example: df.sort_values(by='Age') to sort the DataFrame by the 'Age' column.
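
Several of the functions listed above can be tried together on a small table. This is a minimal sketch on hypothetical data: the `people` DataFrame and its missing age are made up for illustration, and the choice to fill the gap with the column mean is an assumption, not a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing age
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire', 'Dan'],
    'Age': [25, np.nan, 27, 41],
    'Gender': ['Female', 'Male', 'Female', 'Male']
})

first_two = people.head(2)                  # first two rows
by_label = people.loc[0, 'Name']            # row label 0, column 'Name'
by_position = people.iloc[0, 0]             # first row, first column by position

# drop() returns a new DataFrame; 'people' keeps its 'Gender' column
no_gender = people.drop('Gender', axis=1)

# Fill the missing age with the mean of the known ages
filled = people.fillna({'Age': people['Age'].mean()})

# Sort by age, oldest first, then add one year to every age
oldest_first = filled.sort_values(by='Age', ascending=False)
next_year = filled['Age'].apply(lambda x: x + 1)

print(oldest_first)
print(next_year)
```

Each call above returns a new object, so the steps can be chained or inspected one at a time without changing `people`.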

Example Use Case

Let's say we have a DataFrame containing information about employees and we want to find the average age of employees grouped by their department:

In [6]:
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire', 'Dan', 'Eve'],
    'Age': [25, 30, 27, 40, 35],
    'Department': ['HR', 'Engineering', 'HR', 'Engineering', 'Finance']
}

df = pd.DataFrame(data)

# Group by 'Department' and calculate the mean age
average_age_by_department = df.groupby('Department')['Age'].mean()

print(average_age_by_department)


Department
Engineering    35.0
Finance        35.0
HR             26.0
Name: Age, dtype: float64
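
The same use case can be extended to compute more than one statistic per department and attach the result to each employee row. The sketch below assumes the employee data from the example above; the names `employees`, `stats`, `mean_age`, and `headcount` are illustrative.

```python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire', 'Dan', 'Eve'],
    'Age': [25, 30, 27, 40, 35],
    'Department': ['HR', 'Engineering', 'HR', 'Engineering', 'Finance']
})

# Named aggregation: one output column per (input column, function) pair
stats = employees.groupby('Department').agg(
    mean_age=('Age', 'mean'),
    headcount=('Name', 'count')
).reset_index()

# Left merge so every employee row gains its department's statistics
with_stats = pd.merge(employees, stats, on='Department', how='left')
print(with_stats)
```

Using `how='left'` keeps every row of `employees` even if a department were missing from `stats`.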


## Q6. Which of the following is mutable in nature Series, DataFrame, Panel?

In pandas, both Series and DataFrame are mutable, meaning you can change their contents after they are created. Panel was mutable as well, but it was deprecated in pandas 0.20.0 and removed in 0.25.0, so it is not available in recent versions. For three-dimensional data, the recommended alternatives are a DataFrame with a MultiIndex or the xarray library.

Mutability of Series and DataFrame

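    Series: A one-dimensional labeled array whose values you can change in place. The pandas documentation describes a Series as value-mutable but not size-mutable; methods that remove elements, such as drop(), return a new Series by default.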
    DataFrame: A two-dimensional labeled data structure that allows modifications such as changing values, adding or removing columns or rows, and altering the structure of the data.
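
Changing values and changing size are separate questions, and it can help to see which operations modify an object and which return a new one. The following is a minimal sketch; the names `s`, `shorter`, `frame`, and `trimmed` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Value change: modifies s in place
s.loc['a'] = 10

# drop() returns a new, shorter Series; s itself still has three elements
shorter = s.drop('c')

frame = pd.DataFrame({'A': [1, 2, 3]})

# Column insertion: modifies frame in place
frame['B'] = [4, 5, 6]

# drop() again returns a new DataFrame and leaves frame unchanged
trimmed = frame.drop(columns='A')

print(len(s), len(shorter))
print(list(frame.columns), list(trimmed.columns))
```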

Example of Mutability
Series

In [7]:
import pandas as pd

# Creating a Series
data_series = pd.Series([1, 2, 3, 4, 5])
print("Original Series:")
print(data_series)

# Modifying the Series
data_series[0] = 10
print("\nModified Series:")
print(data_series)


Original Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64

Modified Series:
0    10
1     2
2     3
3     4
4     5
dtype: int64


DataFrame

In [8]:
import pandas as pd

# Creating a DataFrame
data_frame = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
print("Original DataFrame:")
print(data_frame)

# Modifying the DataFrame
data_frame.loc[0, 'A'] = 10  # Use .loc; chained indexing (data_frame['A'][0] = 10) may not update the DataFrame
data_frame['C'] = [7, 8, 9]  # Adding a new column
print("\nModified DataFrame:")
print(data_frame)


Original DataFrame:
   A  B
0  1  4
1  2  5
2  3  6

Modified DataFrame:
    A  B  C
0  10  4  7
1   2  5  8
2   3  6  9
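
Since Panel is no longer available, three-dimensional data can be approximated inside pandas with a MultiIndex on the rows. The sketch below is one possible layout under made-up data; the level names `item` and `date` and the column names `open` and `close` are hypothetical.

```python
import pandas as pd

# Hypothetical 3D data: 2 items x 2 dates x 2 measurements
index = pd.MultiIndex.from_product(
    [['item1', 'item2'], ['2024-01-01', '2024-01-02']],
    names=['item', 'date']
)
cube = pd.DataFrame(
    {'open': [1.0, 1.1, 2.0, 2.1], 'close': [1.05, 1.15, 2.05, 2.15]},
    index=index
)

# Selecting one item yields an ordinary 2D DataFrame indexed by date
item1 = cube.loc['item1']

# Cross-section along the second level: all items on one date
day2 = cube.xs('2024-01-02', level='date')

print(item1)
print(day2)
```

For larger or truly N-dimensional arrays, xarray is likely the better fit; the MultiIndex approach keeps everything within pandas.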


## Q7. Create a DataFrame using multiple Series. Explain with an example.

In [9]:
import pandas as pd

# Creating multiple Series
name_series = pd.Series(['Alice', 'Bob', 'Claire'])
age_series = pd.Series([25, 30, 27])
gender_series = pd.Series(['Female', 'Male', 'Female'])

# Creating a DataFrame using the Series
data_frame = pd.DataFrame({
    'Name': name_series,
    'Age': age_series,
    'Gender': gender_series
})

# Printing the DataFrame
print(data_frame)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
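
When a DataFrame is built from several Series, pandas aligns them on their index labels rather than on position. The sketch below illustrates this with Series of unequal length; the index labels `a`, `b`, `c` and the variable names are made up for the example.

```python
import pandas as pd

name = pd.Series(['Alice', 'Bob', 'Claire'], index=['a', 'b', 'c'])
age = pd.Series([25, 30], index=['a', 'b'])   # no entry for label 'c'

# Dictionary of Series: keys become column names, rows align on index labels
from_dict = pd.DataFrame({'Name': name, 'Age': age})

# Equivalent construction with concat; each Series name becomes a column name
from_concat = pd.concat([name.rename('Name'), age.rename('Age')], axis=1)

print(from_dict)
print(from_concat)
```

In both results the row labelled `c` has a missing value (NaN) for `Age`, because `age` has no entry for that label.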
