Pandas provides a wide range of methods and attributes for inspecting and manipulating data in a DataFrame. Here are some common ones, along with examples of when you might use them:

head() and tail(): Return the first n rows (head()) or the last n rows (tail()) of a DataFrame, with n defaulting to 5. They are useful for quickly inspecting data. Because this example DataFrame has only three rows, both calls return the whole table.

In [1]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print("First few rows:")
print(df.head())

print("\nLast few rows:")
print(df.tail())


First few rows:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female

Last few rows:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
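Both methods accept an optional n argument that sets how many rows to return. A minimal sketch on a hypothetical six-row DataFrame (the scores data is invented for illustration), where the two ends of the table actually differ:

```python
import pandas as pd

# Hypothetical six-row DataFrame, larger than the three-row example above
scores = pd.DataFrame({
    'Student': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Score': [88, 92, 75, 64, 99, 81],
})

first_two = scores.head(2)   # rows with index labels 0 and 1
last_three = scores.tail(3)  # rows with index labels 3, 4 and 5

print(first_two)
print(last_three)
```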


shape: An attribute (not a method, so it is used without parentheses) that holds the dimensions of the DataFrame as a (rows, columns) tuple.

In [2]:
print("DataFrame shape:", df.shape)


DataFrame shape: (3, 3)
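Because shape is a plain tuple, it can be unpacked or indexed directly. A short sketch that rebuilds the same example DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# Unpack the (rows, columns) tuple into two variables
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# len() counts rows (same as df.shape[0]); size counts all cells (rows * columns)
print(len(df), df.size)
```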


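describe(): Generates summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of the DataFrame by default. Here Age is the only numeric column, so it is the only one summarized.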

In [3]:
print(df.describe())


             Age
count   3.000000
mean   27.333333
std     2.516611
min    25.000000
25%    26.000000
50%    27.000000
75%    28.500000
max    30.000000
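Non-numeric columns can be included through the include parameter. A sketch using include='all', which adds count, unique, top (most frequent value), and freq rows for the text columns; statistics that do not apply to a column are shown as NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# include='all' summarizes every column, numeric and non-numeric alike
summary = df.describe(include='all')
print(summary)
```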


info(): Prints a concise summary of the DataFrame, including the index range, each column's data type and non-null count, and the memory usage. It writes the summary directly and returns None, so call it without print(); wrapping it in print() would add a stray "None" line after the summary.

In [4]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    3 non-null      object
 1   Age     3 non-null      int64
 2   Gender  3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
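When you need the same information inside a program rather than on screen, info() can write to a text buffer through its buf parameter, and dtypes and isna().sum() return the per-column types and missing-value counts as Series. A minimal sketch:

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# Capture the info() report as a string instead of printing it
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()

# Per-column data types as a Series
print(df.dtypes)

# Per-column count of missing values (non-null count = len(df) minus this)
print(df.isna().sum())
```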


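loc[] and iloc[]: Indexers for label-based (loc) and integer position-based (iloc) selection, respectively. Because this DataFrame uses the default index, its labels 0, 1, 2 coincide with the row positions, so df.loc[0] and df.iloc[0] would return the same row here.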

In [5]:
print("Using loc:")
print(df.loc[0])  # Access row by label

print("\nUsing iloc:")
print(df.iloc[1])  # Access row by integer position


Using loc:
Name       Alice
Age           25
Gender    Female
Name: 0, dtype: object

Using iloc:
Name       Bob
Age         30
Gender    Male
Name: 1, dtype: object
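The difference between the two shows up once labels and positions stop matching, for example after sorting, which keeps each row's original label. A sketch of that case, plus loc with a boolean mask and a column label:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# After sorting, the row labelled 0 is no longer in position 0
by_age = df.sort_values(by='Age', ascending=False)

row_label_0 = by_age.loc[0]      # row whose index label is 0 (Alice)
row_position_0 = by_age.iloc[0]  # first row in the current order (Bob)
print(row_label_0['Name'], row_position_0['Name'])

# loc accepts a boolean mask for rows and a label for the column
older_names = df.loc[df['Age'] > 26, 'Name']
print(list(older_names))
```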


groupby(): Splits the rows into groups based on the values in one or more columns, so that an aggregation function (such as mean()) can be applied to each group.

In [6]:
grouped = df.groupby('Gender')
print(grouped['Age'].mean())  # Mean age by gender


Gender
Female    26.0
Male      30.0
Name: Age, dtype: float64
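Several aggregations can be computed in one pass with agg(), which takes a list of function names and returns one column per function. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# One row per group (sorted by key by default), one column per aggregation
age_stats = df.groupby('Gender')['Age'].agg(['mean', 'min', 'max', 'count'])
print(age_stats)
```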


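sort_values(): Sorts the DataFrame by the values in one or more columns and returns a new DataFrame, leaving the original unchanged unless inplace=True is passed. Each row keeps its original index label, as the output below shows.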

In [7]:
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)


     Name  Age  Gender
1     Bob   30    Male
2  Claire   27  Female
0   Alice   25  Female
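by= and ascending= also accept lists, which sorts on several columns with a separate direction for each. A sketch that sorts by Gender alphabetically, then by Age from oldest to youngest within each gender, and renumbers the rows afterwards:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# First key: Gender ascending; second key: Age descending
multi_sorted = df.sort_values(by=['Gender', 'Age'], ascending=[True, False])

# reset_index(drop=True) replaces the old labels with 0..n-1
multi_sorted = multi_sorted.reset_index(drop=True)
print(multi_sorted)
```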


drop(): Removes the specified rows or columns by label and returns a new DataFrame; the original is left unchanged unless inplace=True is passed.

In [8]:
df_dropped = df.drop(index=0)  # Drop the row with index label 0 (the first row here)
print(df_dropped)


     Name  Age  Gender
1     Bob   30    Male
2  Claire   27  Female
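Columns are dropped the same way through the columns parameter. A sketch that also confirms the original DataFrame is untouched:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# columns= drops by column name and returns a new DataFrame
no_gender = df.drop(columns=['Gender'])
print(no_gender)

# df still has all three columns because the result was not assigned back to it
print(df.columns.tolist())
```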


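fillna(): Fills missing values (NaN or None) with a specified value, or with per-column values passed as a dict. Forward and backward filling are available through the separate ffill() and bfill() methods. This example DataFrame has no missing values, so the output below is unchanged from the original.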

In [9]:
df_filled = df.fillna(0)  # Fill missing values with 0
print(df_filled)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
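To see fillna() actually change something, the data needs gaps. A sketch on a hypothetical variant of the example data with one missing Age and one missing Gender, filling each column with its own value via a dict:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the example data with two missing values
df_missing = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, np.nan, 27],
    'Gender': ['Female', 'Male', None],
})

print(df_missing.isna().sum())  # missing values per column

# Fill Age with the mean of the known ages and Gender with a placeholder label
df_filled = df_missing.fillna({'Age': df_missing['Age'].mean(), 'Gender': 'Unknown'})
print(df_filled)
```

The mean is computed while skipping NaN, so Bob's age is filled with 26.0, the average of 25 and 27.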


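apply(): Applies a function along an axis. Series.apply calls the function on each element, while DataFrame.apply calls it on each column (the default, axis=0) or on each row (axis=1). Here it is called on the Age column, and the result is stored in a new column, which modifies df itself.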

In [10]:
df['Age_squared'] = df['Age'].apply(lambda x: x**2)
print(df)


     Name  Age  Gender  Age_squared
0   Alice   25  Female          625
1     Bob   30    Male          900
2  Claire   27  Female          729
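A sketch of the row-wise form, where each row is passed to the function as a Series, followed by the vectorized equivalent of the squaring above. For simple arithmetic like this, the vectorized expression gives the same values without calling a Python function per element:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# axis=1: the lambda receives one row at a time and can read several columns
labels = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)
print(labels.tolist())

# Vectorized arithmetic on the whole column at once
squared = df['Age'] ** 2
print(squared.tolist())
```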
