# DataFrame Operations in Pandas

This notebook demonstrates common DataFrame operations in Pandas: column- and row-wise statistics, shifting values, applying user-defined functions with `agg()` and `transform()`, counting non-null values, and string operations.

In [1]:
import pandas as pd
import numpy as np

## Creating a DataFrame for Patient Data

We will create a DataFrame to represent data for ten patients: age, height (in cm), and weight (in kg).

In [2]:
# Create a dictionary with patient data
patients_data = {
    'Age': [25, 34, 45, 56, 28, 40, 33, 29, 50, 60],
    'Height': [170, 165, 180, 175, 160, 155, 185, 172, 168, 174],  # Height in cm
    'Weight': [70, 65, 85, 90, 60, 55, 88, 75, 68, 77]  # Weight in kg
}

# Create a Pandas DataFrame from the dictionary
patients_df = pd.DataFrame(patients_data, index=list(range(10)))

# Display the DataFrame
patients_df

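|   | Age | Height | Weight |
|---|-----|--------|--------|
| 0 | 25 | 170 | 70 |
| 1 | 34 | 165 | 65 |
| 2 | 45 | 180 | 85 |
| 3 | 56 | 175 | 90 |
| 4 | 28 | 160 | 60 |
| 5 | 40 | 155 | 55 |
| 6 | 33 | 185 | 88 |
| 7 | 29 | 172 | 75 |
| 8 | 50 | 168 | 68 |
| 9 | 60 | 174 | 77 |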
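
Passing `index=list(range(10))` is optional here: when no index is given, Pandas assigns a default `RangeIndex` starting at 0, so the data come out the same. A quick check, assuming the same `patients_data` dictionary as above (the names `explicit` and `default` are just illustrative):

```python
import pandas as pd

patients_data = {
    'Age': [25, 34, 45, 56, 28, 40, 33, 29, 50, 60],
    'Height': [170, 165, 180, 175, 160, 155, 185, 172, 168, 174],  # cm
    'Weight': [70, 65, 85, 90, 60, 55, 88, 75, 68, 77],  # kg
}

explicit = pd.DataFrame(patients_data, index=list(range(10)))
default = pd.DataFrame(patients_data)  # no index argument

# The default index is a RangeIndex covering 0..9, so both frames hold identical data
print(type(default.index).__name__)  # RangeIndex
print(explicit.equals(default))      # True
print(default.shape)                 # (10, 3)
```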


## Calculating the Mean Value for Each Column

We will calculate the mean value for each column in the DataFrame.

In [3]:
# Calculate the mean value for each column
column_mean = patients_df.mean()

# Display the mean values for each column
column_mean

Age        40.0
Height    170.4
Weight     73.3
dtype: float64
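
The patient data has no missing values, but `mean()` skips `NaN` by default (`skipna=True`). A minimal sketch on a small hypothetical frame with one missing weight shows the difference when `skipna=False` is passed:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing weight, used only to show NaN handling
df = pd.DataFrame({
    'Age': [25, 34, 45],
    'Weight': [70, np.nan, 85],
})

skip = df.mean()                # default skipna=True: the NaN is ignored
strict = df.mean(skipna=False)  # any NaN in a column makes that column's mean NaN

print(skip)    # Age ~34.67, Weight 77.5 (mean of 70 and 85)
print(strict)  # Age ~34.67, Weight NaN
```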

## Calculating the Mean Value for Each Row

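We will calculate the mean value for each row in the DataFrame by passing `axis=1`. Because the columns use different units (years, centimetres, kilograms), these row means are not physically meaningful; they simply illustrate aggregating across columns.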

In [5]:
# Calculate the mean value for each row
row_mean = patients_df.mean(axis=1)

# Display the mean values for each row
row_mean

0     88.333333
1     88.000000
2    103.333333
3    107.000000
4     82.666667
5     83.333333
6    102.000000
7     92.000000
8     95.333333
9    103.666667
dtype: float64
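
To make the `axis` argument concrete: `axis=0` (the default) aggregates down each column and returns one value per column, while `axis=1` aggregates across each row and returns one value per row. A small sketch that rebuilds the patient frame and checks row 0 by hand:

```python
import pandas as pd

patients_df = pd.DataFrame({
    'Age': [25, 34, 45, 56, 28, 40, 33, 29, 50, 60],
    'Height': [170, 165, 180, 175, 160, 155, 185, 172, 168, 174],
    'Weight': [70, 65, 85, 90, 60, 55, 88, 75, 68, 77],
})

col_mean = patients_df.mean(axis=0)  # indexed by column labels
row_mean = patients_df.mean(axis=1)  # indexed by row labels

print(col_mean.index.tolist())  # ['Age', 'Height', 'Weight']
print(len(row_mean))            # 10

# Row 0 by hand: (25 + 170 + 70) / 3
manual = patients_df.loc[0].sum() / patients_df.shape[1]
print(abs(manual - row_mean.loc[0]) < 1e-9)  # True
```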

## Creating a DataFrame for Student Data

We will create a DataFrame to represent student data, including names, ages, marks, and GPAs. Then we will calculate the mean value for each numeric column, excluding the 'Name' column.

In [7]:
# Create a dictionary with student data
students = {
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [22, 25, 26],
    'Marks': [95, 82, 87],
    'GPA': [4, 3.2, 3.85]
}

# Create a 1-based index with one label per student (row)
index = list(range(1, len(students['Name']) + 1))

# Create a Pandas DataFrame from the dictionary
students_df = pd.DataFrame(students, index=index)

# Display the DataFrame
students_df

|   | Name | Age | Marks | GPA |
|---|------|-----|-------|-----|
| 1 | Eric | 22 | 95 | 4.0 |
| 2 | Ivy | 25 | 82 | 3.2 |
| 3 | Jude | 26 | 87 | 3.85 |
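
When sizing an index from a dictionary of columns, note that `len()` of a dictionary counts its keys (here the four columns), not its rows. A row-based index should therefore be sized from the length of one column's list:

```python
import pandas as pd

students = {
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [22, 25, 26],
    'Marks': [95, 82, 87],
    'GPA': [4, 3.2, 3.85],
}

print(len(students))          # 4: number of keys (columns)
print(len(students['Name']))  # 3: number of rows

# A 1-based index sized by the number of rows, not the number of columns
index = list(range(1, len(students['Name']) + 1))
students_df = pd.DataFrame(students, index=index)
print(students_df.index.tolist())  # [1, 2, 3]
```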


## Calculating the Mean Value for Each Numeric Column

We will calculate the mean value for each numeric column in the DataFrame. `select_dtypes(include='number')` keeps only the numeric columns, which excludes the string-valued 'Name' column.

In [8]:
# Calculate the mean value for each numeric column
selected_column_mean = students_df.select_dtypes(include='number').mean()

# Display the mean values for each numeric column
selected_column_mean

Age      24.333333
Marks    88.000000
GPA       3.683333
dtype: float64
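
`DataFrame.mean()` also accepts `numeric_only=True`, which skips non-numeric columns such as `Name` without a separate selection step. In recent Pandas versions (2.0 and later) the default is `numeric_only=False`, so calling `mean()` on this frame without either approach is expected to raise a `TypeError` because of the string column. A sketch comparing the two approaches:

```python
import numpy as np
import pandas as pd

students_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [22, 25, 26],
    'Marks': [95, 82, 87],
    'GPA': [4, 3.2, 3.85],
}, index=[1, 2, 3])

# Two ways to average only the numeric columns
via_select = students_df.select_dtypes(include='number').mean()
via_flag = students_df.mean(numeric_only=True)

print(via_flag.index.tolist())                    # ['Age', 'Marks', 'GPA']
print(np.allclose(via_select, via_flag))          # True
```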

## Shifting Values in the DataFrame

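We will shift the rows of the DataFrame down by 2 positions with `shift(2)`. The first two rows are filled with `NaN` and the last two original rows drop off; because `NaN` is a floating-point value, the integer columns are upcast to `float64`.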

In [9]:
# Create a copy of the DataFrame for shifting values
patients_df2 = patients_df.copy()

# Shift the values by 2 positions
shifted_values = patients_df2.shift(2)

# Display the shifted DataFrame
shifted_values

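|   | Age | Height | Weight |
|---|-----|--------|--------|
| 0 | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN |
| 2 | 25.0 | 170.0 | 70.0 |
| 3 | 34.0 | 165.0 | 65.0 |
| 4 | 45.0 | 180.0 | 85.0 |
| 5 | 56.0 | 175.0 | 90.0 |
| 6 | 28.0 | 160.0 | 60.0 |
| 7 | 40.0 | 155.0 | 55.0 |
| 8 | 33.0 | 185.0 | 88.0 |
| 9 | 29.0 | 172.0 | 75.0 |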
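
A few common variations on `shift()`, sketched on a small hypothetical `ages` frame: a negative period shifts upward, `fill_value` replaces the `NaN` gap (and so avoids the float upcast), and subtracting a one-row shift gives the row-to-row change, which matches `diff()`:

```python
import pandas as pd

# Hypothetical single-column frame for illustration
ages = pd.DataFrame({'Age': [25, 34, 45, 56]})

down = ages.shift(2)                  # NaN fills the gap, so int64 becomes float64
up = ages.shift(-1)                   # negative periods shift values upward
filled = ages.shift(2, fill_value=0)  # an integer fill value keeps int64

# Change from the previous row; equivalent to ages.diff()
change = ages - ages.shift(1)

print(down['Age'].tolist())        # [nan, nan, 25.0, 34.0]
print(up['Age'].tolist())          # [34.0, 45.0, 56.0, nan]
print(filled['Age'].tolist())      # [0, 0, 25, 34]
print(filled['Age'].dtype)         # int64
print(change.equals(ages.diff()))  # True
```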


## Applying User-Defined Functions

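We will demonstrate how to apply user-defined functions using `agg()` and `transform()`. `agg()` reduces each column to a single value, while `transform()` returns a result with the same shape as the input.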

In [10]:
# Create a copy of the DataFrame for reducing values
patients_df3 = patients_df.copy()

# Apply a user-defined function to reduce values
patients_df3_mean = patients_df3.agg(lambda x: np.mean(x) * 2)

# Display the result (a Series with one value per column)
patients_df3_mean

Age        80.0
Height    340.8
Weight    146.6
dtype: float64
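
`agg()` also accepts a list of functions or a dictionary mapping column names to functions. With a list, the result is a DataFrame with one row per function; with a dictionary of single reductions, the result is a Series with one value per listed column. A sketch using built-in function names:

```python
import pandas as pd

patients_df = pd.DataFrame({
    'Age': [25, 34, 45, 56, 28, 40, 33, 29, 50, 60],
    'Height': [170, 165, 180, 175, 160, 155, 185, 172, 168, 174],
    'Weight': [70, 65, 85, 90, 60, 55, 88, 75, 68, 77],
})

# A list of functions: one row per function, one column per DataFrame column
summary = patients_df.agg(['mean', 'max'])
print(summary)

# A dict: a different reduction for each column, returned as a Series
per_column = patients_df.agg({'Age': 'mean', 'Height': 'max', 'Weight': 'min'})
print(per_column)  # Age 40.0, Height 185, Weight 55
```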

### Transforming Values with `transform()`

We will use the `transform()` method to apply a user-defined function whose result has the same shape as the input; here, every value is squared.

In [11]:
# Create a copy of the DataFrame for transforming values
patients_df4 = patients_df.copy()

# Square every value; the result has the same shape as the input
patients_df4_square = patients_df4.transform(lambda x: x ** 2)

# Display the squared DataFrame
patients_df4_square

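|   | Age | Height | Weight |
|---|-----|--------|--------|
| 0 | 625 | 28900 | 4900 |
| 1 | 1156 | 27225 | 4225 |
| 2 | 2025 | 32400 | 7225 |
| 3 | 3136 | 30625 | 8100 |
| 4 | 784 | 25600 | 3600 |
| 5 | 1600 | 24025 | 3025 |
| 6 | 1089 | 34225 | 7744 |
| 7 | 841 | 29584 | 5625 |
| 8 | 2500 | 28224 | 4624 |
| 9 | 3600 | 30276 | 5929 |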
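
A more typical use of `transform()` is a per-column computation that still returns one value per row, such as standardizing each column to z-scores (here using the sample standard deviation, `ddof=1`, which is the Pandas default). By contrast, a function that reduces each column to a single value does not preserve the shape, and `transform()` rejects it with a `ValueError`:

```python
import pandas as pd

patients_df = pd.DataFrame({
    'Age': [25, 34, 45, 56, 28, 40, 33, 29, 50, 60],
    'Height': [170, 165, 180, 175, 160, 155, 185, 172, 168, 174],
    'Weight': [70, 65, 85, 90, 60, 55, 88, 75, 68, 77],
})

# Per-column z-scores: same shape as the input
z_scores = patients_df.transform(lambda col: (col - col.mean()) / col.std())
print(z_scores.shape)  # (10, 3)

# A reducing function returns one value per column, which transform() rejects
rejected = False
try:
    patients_df.transform(lambda col: col.mean())
except ValueError as err:
    rejected = True
    print('ValueError:', err)
```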


## Counting Values in Each Column

We will count the number of non-null values in each column.

In [12]:
# Count the number of non-null values in each column
values_count = patients_df.count()

# Display the count of non-null values in each column
values_count

Age       10
Height    10
Weight    10
dtype: int64
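
Since the patient data is complete, every column reports 10. On a small hypothetical frame with missing entries, `count()` differs from `len()`, and the number of missing values per column is available from `isna().sum()`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing entries
df = pd.DataFrame({
    'Age': [25, np.nan, 45, 56],
    'Weight': [70, 65, np.nan, np.nan],
})

print(df.count())        # non-null values per column: Age 3, Weight 2
print(len(df))           # total rows, including missing ones: 4
print(df.isna().sum())   # missing values per column: Age 1, Weight 2
print(df.count(axis=1))  # non-null values per row: 2, 1, 1, 1
```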

## String Methods

We will demonstrate string operations using the `str` accessor, which applies string methods to every element of a Series of strings.

### Converting Strings to Lowercase

We will create a Series with course names and convert them to lowercase.


In [13]:
# Create a Series with course names
courses = pd.Series(['Java', 'Python', 'JavaScript'], index=[1, 2, 3])

# Convert the strings to lowercase
lower_case = courses.str.lower()

# Display the lowercase Series
lower_case

1          java
2        python
3    javascript
dtype: object
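
The `str` accessor offers many other vectorized string methods. A short sketch on the same course names; the boolean result of `str.contains()` can also be used directly to filter the Series:

```python
import pandas as pd

courses = pd.Series(['Java', 'Python', 'JavaScript'], index=[1, 2, 3])

print(courses.str.upper().tolist())           # ['JAVA', 'PYTHON', 'JAVASCRIPT']
print(courses.str.len().tolist())             # [4, 6, 10]
print(courses.str.contains('Java').tolist())  # [True, False, True]
print(courses.str.startswith('P').tolist())   # [False, True, False]

# Literal (non-regex) replacement
print(courses.str.replace('Script', '', regex=False).tolist())  # ['Java', 'Python', 'Java']

# Boolean mask from str.contains() used to filter the Series
java_like = courses[courses.str.contains('Java')]
print(java_like.index.tolist())  # [1, 3]
```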