Let's import the NumPy and Pandas libraries

In [None]:
import numpy as np  # needed for np.nan, which Pandas uses to represent missing numeric values
import pandas as pd

We can perform element-wise operations and broadcasting on the rows and columns of DataFrames. Each row and each column in a Pandas DataFrame can be considered its own Pandas Series.

In [None]:
data1 = {
    'Name': ['Ashok', 'Bob', 'Chandni', 'Dawood', 'Esha'],
    'Age': [25, 30, 18, 22, 30],
    'City': ['Mumbai', 'Kolkata', 'Delhi', 'Bengaluru', 'Agra'],
    'Income June': [30000, 40000, 50000, 60000, np.nan],
    'Income July': [30500, 40000, 50000, np.nan, 70000]
}
df1 = pd.DataFrame(data1)
df1

In [None]:
df1['Income June + July'] = df1['Income June'] + df1['Income July']
df1

In [None]:
df1['Income June + July'] = df1['Income June'].add(df1['Income July'], fill_value = 0)  # Series.add() with fill_value = 0 treats a value missing on one side as 0, so the sum is no longer NaN
df1
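The difference between `+` and `.add(..., fill_value = 0)` is easier to see on two tiny Series. The following is a minimal sketch with made-up values; the names `june` and `july` are illustrative and not part of the notebook's data. It also shows a case the DataFrame above does not contain: when both sides are missing, the result stays missing even with `fill_value`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate NaN handling
june = pd.Series([100.0, 200.0, np.nan, np.nan])
july = pd.Series([10.0, np.nan, 30.0, np.nan])

plain_sum = june + july                      # NaN wherever either side is missing
filled_sum = june.add(july, fill_value=0)    # a value missing on one side counts as 0

print(plain_sum.tolist())
print(filled_sum.tolist())
```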

In [None]:
df1['Tax on Income'] = 0.1 * df1['Income June + July']
df1

In [None]:
df1[['Income June', 'Income July']] = df1[['Income June', 'Income July']].fillna(0)
df1

Because DataFrame rows and columns are Series, Series methods such as `.sum()`, `.mean()` and `.sort_values()` work on them directly, as do most NumPy functions that accept array-like input.

In [None]:
df1['Income June'].sum()

In [None]:
df1['Income June'].mean()

In [None]:
df1['Age'].sort_values()
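One detail worth keeping in mind with these reductions is how missing values are treated. The sketch below uses a small hypothetical Series (`income` is an illustrative name) to show that `.sum()` and `.mean()` skip NaN by default, and that filling NaN with 0 first, as was done for the income columns earlier, changes the mean because the zeros are counted.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only
income = pd.Series([30000, 40000, np.nan], name='Income')

total = income.sum()                       # NaN is skipped by default
mean_skip = income.mean()                  # average of the two non-missing values
mean_strict = income.mean(skipna=False)    # NaN, because a value is missing
mean_filled = income.fillna(0).mean()      # the filled-in 0 is counted as a real value

# NumPy functions accept a Series and return a Series
roots = np.sqrt(income)

print(total, mean_skip, mean_strict, mean_filled)
```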

You can also perform operations on an entire DataFrame. For example, to sort the whole DataFrame by the 'Age' column:

In [None]:
df1 = df1.sort_values('Age')
df1

We can also sort by more than one column by passing the column names as a list; rows that tie on the first column are ordered by the next one.

In [None]:
df1 = df1.sort_values(['Age', 'City'])
df1

In [None]:
df1 = df1.reset_index(drop = True)  # resetting the index so the reordered DataFrame is numbered in order from 0 to 4
df1
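As a possible variation on the sorting cells above, `sort_values` also accepts one sort direction per column, and its `ignore_index` parameter renumbers the result so a separate `reset_index` call is not needed. This is a small sketch on hypothetical data (`people` is an illustrative name), not a replacement for the notebook's approach.

```python
import pandas as pd

# Hypothetical toy data
people = pd.DataFrame({
    'Name': ['Bob', 'Esha', 'Ashok'],
    'Age': [30, 30, 25],
    'City': ['Kolkata', 'Agra', 'Mumbai'],
})

# Oldest first; ties on 'Age' are broken alphabetically by 'City'.
# ignore_index=True relabels the rows 0..n-1 in the new order.
sorted_people = people.sort_values(
    ['Age', 'City'], ascending=[False, True], ignore_index=True
)
print(sorted_people)
```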

You can use conditional filtering to select and/or modify only certain portions of a DataFrame

In [None]:
df1.loc[df1['City'].isin(['Mumbai', 'Bengaluru']), 'Tax on Income'] += 500  # adding extra tax to rows whose 'City' is either 'Mumbai' or 'Bengaluru'
df1

In [None]:
df1.query('Age >= 25')

In [None]:
df1.query('Age >= 25 & `Income June` > 30000')  # backticks let the query string refer to column names that contain spaces
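A `query` string is a shorthand for a boolean mask. The sketch below, on hypothetical data, shows the two forms side by side; `min_age` is an illustrative local variable, referenced inside the query string with the `@` prefix. In the mask form each comparison needs its own parentheses, because `&` binds more tightly than `>=` and `>` in ordinary Python.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'Age': [25, 30, 18],
    'Income June': [30000, 40000, 50000],
})
min_age = 25  # illustrative threshold

# String form: @min_age refers to the local Python variable
via_query = df.query('Age >= @min_age and `Income June` > 30000')

# Equivalent boolean-mask form
via_mask = df[(df['Age'] >= min_age) & (df['Income June'] > 30000)]

print(via_query)
```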

As with Series, you can use `.apply()` and `.map()` to run a function over the elements of a column, or, with `DataFrame.apply()`, over whole rows or columns.

In [None]:
# We have selected a single column here, so this is Series.apply(): it has no axis argument and calls the function on each element of 'Tax on Income'
df1['Tax in Dollars'] = df1['Tax on Income'].apply(lambda x: x / 84)  # assuming $1 = Rs. 84
df1

In [None]:
# Here, DataFrame.apply() with axis = 1 passes each row to the function as a Series (the default, axis = 0, passes each column)
df1['Tax Statement'] = df1.apply(lambda row: f"{row['Name']}, aged {row['Age']}, owes a tax of ${row['Tax in Dollars']:.2f}.", axis = 1)
df1

In [None]:
df1['Tax Warning'] = df1['Tax Statement'].map(lambda x: x.upper())  # Series.map() has no axis parameter; it applies the function to each element of one Series
df1
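For simple arithmetic or string transformations, a vectorized expression gives the same result as `.apply()` or `.map()` with a lambda and is usually the more idiomatic choice. The following sketch uses hypothetical data (`tax` is an illustrative name) and the same assumed exchange rate of Rs. 84 to the dollar.

```python
import pandas as pd

# Hypothetical toy data
tax = pd.DataFrame({'Name': ['Ashok', 'Bob'], 'Tax on Income': [8400.0, 4200.0]})
RUPEES_PER_DOLLAR = 84  # same assumed rate as in the cells above

# Element-wise function call versus plain vectorized arithmetic
with_apply = tax['Tax on Income'].apply(lambda x: x / RUPEES_PER_DOLLAR)
vectorized = tax['Tax on Income'] / RUPEES_PER_DOLLAR

# Element-wise function call versus the .str accessor
upper_map = tax['Name'].map(lambda s: s.upper())
upper_str = tax['Name'].str.upper()

print(vectorized.tolist(), upper_str.tolist())
```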

You can drop rows or columns using `.drop()`

In [None]:
df1.drop(columns = ['Income June', 'Income July'], inplace = True)
df1

In [None]:
df1.drop(labels = 1, inplace = True)  # drops the row whose index label is 1 (labels refer to the row index by default)
df1
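Without `inplace = True`, `.drop()` leaves the original DataFrame untouched and returns a new one, which can be assigned to a variable. The sketch below, on hypothetical data, shows this along with the `index` keyword for rows and `errors='ignore'` for labels that may not exist.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Age': [1, 2, 3],
    'City': ['X', 'Y', 'Z'],
})

fewer_cols = df.drop(columns=['City'])   # new DataFrame without the 'City' column
fewer_rows = df.drop(index=[1])          # new DataFrame without the row labelled 1

# errors='ignore' skips labels that are not present instead of raising KeyError
unchanged = df.drop(columns=['Missing'], errors='ignore')

print(fewer_cols.columns.tolist(), fewer_rows.index.tolist())
```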

We can also chain methods on Pandas DataFrames

In [None]:
df2 = df1.loc[df1['Age'] > 20, ['Name', 'City', 'Income June + July']].sort_values('Income June + July').reset_index(drop = True)
df2
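Long chains like the one above can be easier to read when wrapped in parentheses with one method per line. This is a stylistic sketch on hypothetical data (`df` and `result` are illustrative names); it performs the same three steps of filtering, sorting and renumbering.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'Name': ['Ashok', 'Chandni', 'Dawood'],
    'Age': [25, 18, 22],
    'City': ['Mumbai', 'Delhi', 'Bengaluru'],
    'Income June + July': [60500.0, 100000.0, 60000.0],
})

result = (
    df.loc[df['Age'] > 20, ['Name', 'City', 'Income June + July']]  # filter rows, pick columns
      .sort_values('Income June + July')                            # lowest income first
      .reset_index(drop=True)                                       # renumber rows from 0
)
print(result)
```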