### Pandas - Most frequently used functions

###### 1. Sort a DataFrame - sort_values()

- We can use the pandas dataframe sort_values() function to sort a dataframe.
<br>

- It lets us sort a dataframe by one or more columns,<br>
  choose the sorting algorithm, decide where NaNs are placed,<br> 
  supply a custom key for sorting, etc. 
<br>

**Syntax:-** df.sort_values(by, ascending=True, inplace=False)
    
- Pass the column or list of columns to sort by to the by parameter.
<br>
- By default, it returns a sorted dataframe and does not alter the original dataframe.
<br>

- If we wish to modify the original dataframe, pass inplace=True
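
The remaining options mentioned above can be sketched on a small, made-up dataframe. The `scores` data and its column names are assumptions for illustration only; the sketch uses the `ascending`, `na_position`, and `key` parameters of sort_values() (`key` needs pandas 1.1 or later).

```python
import numpy as np
import pandas as pd

# hypothetical data, used only to illustrate the parameters
scores = pd.DataFrame({
    'Name': ['bob', 'Alice', 'Carol'],
    'Score': [2.0, np.nan, 1.0]
})

# descending order; NaNs are placed last by default
by_score_desc = scores.sort_values(by='Score', ascending=False)

# place the NaN rows first instead
nan_first = scores.sort_values(by='Score', na_position='first')

# case-insensitive sort: key receives the column as a Series and returns a Series
by_name = scores.sort_values(by='Name', key=lambda s: s.str.lower())

print(by_score_desc, nan_first, by_name, sep="\n\n")
```

Without the key, 'Carol' would sort before 'bob', because uppercase letters compare lower than lowercase ones.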

In [38]:
import numpy as np
import pandas as pd

data = {
    'Name': ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird'],
    'Height': [198,206,198,206],
    'Championships': [5,4,6,3]
}

df = pd.DataFrame(data)

df

# The dataframe df contains the height in cm and the number of championship victories
# of four of the most celebrated basketball players in the NBA.

Unnamed: 0,Name,Height,Championships
0,Kobe Bryant,198,5
1,LeBron James,206,4
2,Michael Jordan,198,6
3,Larry Bird,206,3


**Example 1:- Sort dataframe by a single column**

In [40]:
# To sort a dataframe by a single column, pass the column name to the by parameter of the sort_values() function.

# For instance, to sort the above dataframe by Height:

df_sorted = df.sort_values(by='Height')

df_sorted

# We can see that the returned dataframe is sorted on Height. 
# Also notice that the returned dataframe retains the row indexes from the original dataframe.

Unnamed: 0,Name,Height,Championships
0,Kobe Bryant,198,5
2,Michael Jordan,198,6
1,LeBron James,206,4
3,Larry Bird,206,3


**Example 2:- Reset the index (ignore_index=True)**

In [43]:
# If we do not want to retain the indexes, 
# pass ignore_index=True to the function or reset the index independently.

df_sorted = df.sort_values(by='Height', ignore_index=True)

df_sorted

Unnamed: 0,Name,Height,Championships
0,Kobe Bryant,198,5
1,Michael Jordan,198,6
2,LeBron James,206,4
3,Larry Bird,206,3


**Example 3:- Sort dataframe by multiple columns**

In [42]:
# We can also sort a pandas dataframe by multiple columns. 

# For this, pass the columns by which we want to sort the dataframe as a list to the by parameter.

# Example, to sort the dataframe df by Height and Championships:

df_sorted = df.sort_values(by=['Height','Championships'])

df_sorted

# In the above example, we sort the dataframe df by the columns Height and Championships in ascending order. 
# That is, first by Height and then, within equal heights, by Championships. 
# We can see that LeBron James and Larry Bird have the same height,
# but because Larry Bird has fewer championships he is sorted above LeBron James.

Unnamed: 0,Name,Height,Championships
0,Kobe Bryant,198,5
2,Michael Jordan,198,6
3,Larry Bird,206,3
1,LeBron James,206,4
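
As a small sketch of a related option, the `ascending` parameter also accepts a list with one entry per column in `by`, so each column can have its own sort direction. The variable name `df_mixed` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird'],
    'Height': [198, 206, 198, 206],
    'Championships': [5, 4, 6, 3]
})

# Height ascending, Championships descending within equal heights
df_mixed = df.sort_values(by=['Height', 'Championships'], ascending=[True, False])

print(df_mixed)
```

Here Michael Jordan comes before Kobe Bryant, and LeBron James before Larry Bird, because ties on Height are broken by the larger number of championships.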


**Example 4:- Sort dataframe by a different sorting algorithm**

In [46]:
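# The sort_values() function also allows you to choose the sorting algorithm.

# The parameter kind controls this behavior. It takes 'quicksort', 'mergesort', 'heapsort', and 'stable' as values
# and is 'quicksort' by default. It is only applied when sorting on a single column or label.
# Note that 'quicksort' and 'heapsort' are not stable, so rows with equal keys (ties) may come out in a different order.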

# For example, to sort a dataframe by a column using 'mergesort' as the sorting algorithm:

df_sorted = df.sort_values(by='Height', kind='mergesort')

df_sorted

Unnamed: 0,Name,Height,Championships
0,Kobe Bryant,198,5
2,Michael Jordan,198,6
1,LeBron James,206,4
3,Larry Bird,206,3


In [47]:
df_sorted = df.sort_values(by='Height', kind='heapsort')

df_sorted

Unnamed: 0,Name,Height,Championships
0,Kobe Bryant,198,5
2,Michael Jordan,198,6
3,Larry Bird,206,3
1,LeBron James,206,4


In [48]:
df_sorted = df.sort_values(by='Height', kind='quicksort')

df_sorted

Unnamed: 0,Name,Height,Championships
0,Kobe Bryant,198,5
2,Michael Jordan,198,6
1,LeBron James,206,4
3,Larry Bird,206,3
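
When the relative order of tied rows matters, a stable algorithm can be requested. The following is a minimal sketch, assuming a pandas version that accepts kind='stable'; 'mergesort' gives the same guarantee.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird'],
    'Height': [198, 206, 198, 206],
    'Championships': [5, 4, 6, 3]
})

# a stable sort keeps rows with equal Height in their original relative order
df_stable = df.sort_values(by='Height', kind='stable')
df_merge = df.sort_values(by='Height', kind='mergesort')

print(df_stable)
```

With a stable sort, LeBron James (index 1) always stays above Larry Bird (index 3), since that is their order in the original dataframe.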


###### 2. Change Order of Columns of a Pandas DataFrame

- During the data preprocessing and feature creation stage,<br>
  we may end up with columns that are not in the order we'd like. 
<br>

- We’ll look at how to change the order of columns of a pandas dataframe.
<br>
- To change the order of columns of a dataframe, we can pass a list with columns
  in the desired order to [] (that is, indexing with []). 
  
  **syntax:-** df_correct_order = df[[col1, col2, col3, ..., coln]] 
  <br>
- Generally, we use [ ] in Pandas dataframes to subset a dataframe,<br>
  but it can also be used to reorder the columns. 
<br>
- We can also use .loc and .iloc to change the order of columns of a dataframe.

In [49]:
import pandas as pd

data = {
    'Name': ['Microsoft Corporation', 'Google, LLC', 'Tesla, Inc.',\
             'Apple Inc.', 'Netflix, Inc.'],
    'Shares': [100, 50, 150, 200, 80],
    'Symbol': ['MSFT', 'GOOG', 'TSLA', 'AAPL', 'NFLX']
}

df = pd.DataFrame(data) # create dataframe

df # display the dataframe

# Here, df is a dataframe of a sample stock portfolio with columns Name, Shares, Symbol.

# We want to reorder the columns such that the resulting dataframe has columns in the order Name, Symbol, Shares

Unnamed: 0,Name,Shares,Symbol
0,Microsoft Corporation,100,MSFT
1,"Google, LLC",50,GOOG
2,"Tesla, Inc.",150,TSLA
3,Apple Inc.,200,AAPL
4,"Netflix, Inc.",80,NFLX


**Example 1:- Change column order using [ ]**

In [50]:
# We can pass the columns in the order which we like as a list.

# New dataframe with different column order

df_new = df[['Name', 'Symbol', 'Shares']]

df_new # display the dataframe

Unnamed: 0,Name,Symbol,Shares
0,Microsoft Corporation,MSFT,100
1,"Google, LLC",GOOG,50
2,"Tesla, Inc.",TSLA,150
3,Apple Inc.,AAPL,200
4,"Netflix, Inc.",NFLX,80


**Example 2:- Change column order using .loc**

In [51]:
# We can also reorder a pandas dataframe by indexing it using .loc. 

# This way, we can reorder columns using their names as we did in the previous example.

# New dataframe with different column order

df_new = df.loc[:, ['Name', 'Symbol', 'Shares']]

df_new # display the dataframe

Unnamed: 0,Name,Symbol,Shares
0,Microsoft Corporation,MSFT,100
1,"Google, LLC",GOOG,50
2,"Tesla, Inc.",TSLA,150
3,Apple Inc.,AAPL,200
4,"Netflix, Inc.",NFLX,80


**Example 3:- Change column order using .iloc**

In [52]:
# We can also change the column order of a dataframe by indexing it using .iloc.

# Here, we pass the column indexes instead of their names in the order that we want.

# new dataframe with different column order

df_new = df.iloc[:, [0, 2, 1]]

df_new

Unnamed: 0,Name,Symbol,Shares
0,Microsoft Corporation,MSFT,100
1,"Google, LLC",GOOG,50
2,"Tesla, Inc.",TSLA,150
3,Apple Inc.,AAPL,200
4,"Netflix, Inc.",NFLX,80
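
When a dataframe has many columns, typing the full list can be tedious. The sketch below builds the new order programmatically and also shows reindex() as an alternative; the names `front`, `new_order`, `df_front`, and `df_reindexed` are illustrative assumptions, and the data is a shortened version of the portfolio above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Microsoft Corporation', 'Google, LLC', 'Tesla, Inc.'],
    'Shares': [100, 50, 150],
    'Symbol': ['MSFT', 'GOOG', 'TSLA']
})

# move 'Symbol' to the front and keep the remaining columns in their current order
front = ['Symbol']
new_order = front + [col for col in df.columns if col not in front]
df_front = df[new_order]

# reindex() with an explicit column list is another way to reorder
df_reindexed = df.reindex(columns=['Name', 'Symbol', 'Shares'])

print(df_front.columns.tolist())
print(df_reindexed.columns.tolist())
```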


###### 3. Count of Unique Values in Each Column

- Generally, each column of a pandas dataframe represents a different feature of the data. 
<br>
- It may be continuous, categorical, or something totally different like distinct texts. 
<br>
- If you’re not sure about the nature of the values you’re dealing with,
  it might be a good exploratory step to know about the count of distinct values. 
<br>

- Here, we’ll look at how to get the count of unique values in each column of a pandas dataframe.

**nunique() function:-**
    
  - To count the unique values of each column of a dataframe, we can use the pandas dataframe nunique() function. 
<br>

**Syntax:-** counts = df.nunique()
    
  - Here, df is the dataframe for which you want to know the unique counts. 
  - It returns a pandas Series of counts. 
  - By default, the pandas dataframe nunique() function counts the distinct values along axis=0,
    that is, down the rows of each column, which gives you the count of distinct values in each column.

In [53]:
import pandas as pd
import numpy as np

# create a sample dataframe
data = {
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
}

df = pd.DataFrame(data)

df

Unnamed: 0,EmpCode,Gender,Age,Department
0,E1,Male,27,Accounting
1,E2,Female,24,Sales
2,E3,Female,29,Accounting
3,E4,Male,24,
4,E5,Male,25,Sales


**Example 1:- Count of unique values in each column**

In [54]:
# Using the pandas dataframe nunique() function with default parameters
# gives a count of all the distinct values in each column.

print(df.nunique()) # count of unique values in each column

EmpCode       5
Gender        2
Age           4
Department    2
dtype: int64


In [None]:
# In the above example, the nunique() function returns a 
# pandas Series with counts of distinct values in each column. 

# Note that, for the Department column we only have two distinct values as the nunique() function, 
# by default, ignores all NaN values.

**Example 2:- Count of unique values in each row**

In [None]:
# We can also get the count of distinct values in each row
# by setting the axis parameter to 1 or 'columns' in the nunique() function.

print(df.nunique(axis=1))  # count of unique values in each row

# In the above example, we can see that we have 4 distinct values in each row 
# except for the row with index 3 which has 3 unique values due to the presence of a NaN value.
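
As a brief sketch of the NaN handling described above, nunique() has a `dropna` parameter; passing dropna=False counts NaN as a distinct value. The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
})

# default: NaN is ignored, so Department has 2 distinct values
counts_default = df.nunique()

# dropna=False: NaN is counted as its own value
counts_with_nan = df.nunique(dropna=False)

# distinct values per row
row_counts = df.nunique(axis=1)

print(counts_with_nan)
print(row_counts)
```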

**Example 3:- value_counts()**

In [57]:
# If we want to know the count of each of the distinct values of a specific column, 
# we can use the pandas value_counts() function.

# In the above dataframe df, if we want to know the count of each distinct value in the column Gender,

# we can use:

print(df['Gender'].value_counts())

# In the above example, the pandas series value_counts() function 
# is used to get the counts of 'Male' and 'Female', 
# the distinct values in the column Gender of the dataframe df.

Gender
Male      3
Female    2
Name: count, dtype: int64
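
Two further options of value_counts() are often handy; the sketch below assumes the same employee data as above. `normalize=True` returns proportions instead of counts, and `dropna=False` includes NaN in the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
})

# share of each gender instead of raw counts
gender_share = df['Gender'].value_counts(normalize=True)

# include the missing department in the counts
dept_counts = df['Department'].value_counts(dropna=False)

print(gender_share)
print(dept_counts)
```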


###### 4. Replace Values in a DataFrame

- When working with pandas dataframes, it might be handy to know how to quickly replace values.
<br>
- The pandas dataframe replace() function is used to replace values in a pandas dataframe. 
<br>
- It allows you the flexibility to replace a single value, multiple values,<br>
  or even use regular expressions for regex substitutions.

  **Syntax:-** df_rep = df.replace(to_replace, value)
  <br>

  - Here, **to_replace** is the value or values to be replaced and **value** is the value to replace them with.<br> 

  - By default, the pandas dataframe replace() function returns a copy of the dataframe with the values replaced.<br>

  - If we want to replace the values in place, pass inplace=True.


**Example 1:- Replace values throughout the dataframe**

In [58]:
# The replace() function replaces all occurrences of the value with the desired value.

import pandas as pd

df = pd.DataFrame({'A': ['a','b','c'], 'B':['b','c','d']})

print("Original DataFrame:\n", df)

df_rep = df.replace('b', 'e')  # replace b with e

print("\nAfter replacing:\n", df_rep)

Original DataFrame:
    A  B
0  a  b
1  b  c
2  c  d

After replacing:
    A  B
0  a  e
1  e  c
2  c  d


**Example 2:- Replace values in a particular column**

In [59]:
# The pandas dataframe replace() function allows us the flexibility 
# to replace values in specific columns without affecting values in other columns.

import pandas as pd

df = pd.DataFrame({'A': ['a','b','c'], 'B':['b','c','d']})

print("Original DataFrame:\n", df)

df_rep = df.replace({'A': 'b'}, 'e') # replace b with e

print("\nAfter replacing:\n", df_rep)

# In the above example, we replace the occurrences of b in just the column A of the dataframe. 
# For this, we pass a dictionary to the to_replace parameter. 
# Here the dictionary {'A': 'b'} tells the replace function that 
# we want to replace the value b in the column A. 
# And the 'e' passed to the value parameter is used to replace all relevant matches.

Original DataFrame:
    A  B
0  a  b
1  b  c
2  c  d

After replacing:
    A  B
0  a  b
1  e  c
2  c  d


**Example 3:- Replace multiple values together**

In [61]:
# We can also perform multiple replacements together. 
# For example, if we want to replace a with b, b with c and c with d in the above dataframe,
# we can pass just a single dictionary to the replace function.
# Each original value is replaced once; the replacements are not chained (a becomes b, not d).

import pandas as pd

df = pd.DataFrame({'A': ['a','b','c'], 'B':['b','c','d']})
print("Original DataFrame:\n", df)

df_rep = df.replace({'a':'b', 'b':'c', 'c':'d'}) # replace a with b, b with c, and c with d

print("\nAfter replacing:\n", df_rep)

Original DataFrame:
    A  B
0  a  b
1  b  c
2  c  d

After replacing:
    A  B
0  b  c
1  c  d
2  d  d
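
Two other forms of the `to_replace` argument can be sketched as follows, using the same small dataframe: a nested dictionary for per-column replacements, and a pair of equal-length lists. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']})

# nested dictionary: {column: {old_value: new_value}}
df_nested = df.replace({'A': {'b': 'e'}, 'B': {'d': 'z'}})

# two lists of equal length: the i-th old value maps to the i-th new value
df_lists = df.replace(['a', 'b'], ['x', 'y'])

print(df_nested)
print(df_lists)
```

The nested form is useful when the same value should be replaced differently, or only, in particular columns.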


**Example 4 :- Replace using Regex match**

- To replace values within a dataframe via a regular expression match, **pass regex=True** to the replace function. 
<br>
- Keep in mind that we pass the regular expression string to the **to_replace** parameter and<br>
  the replacement string to the **value** parameter. 
<br>

- Also, note that regular expressions will only substitute for strings.

In [62]:
import pandas as pd

df = pd.DataFrame({'A': ['tap','cap','map'], 'B':['cap','map', 'tap']})

print("Original DataFrame:\n", df)

df_rep = df.replace(to_replace='ap', value='op', regex=True) # replace ap with op

print("\nAfter replacing:\n", df_rep)

Original DataFrame:
      A    B
0  tap  cap
1  cap  map
2  map  tap

After replacing:
      A    B
0  top  cop
1  cop  mop
2  mop  top


###### 5. Filter DataFrame for multiple conditions

- Filtering is one of the most common dataframe manipulations in pandas. 
<br>

- When working with data in pandas dataframes, we’ll often encounter situations<br> 
  where we need to filter the dataframe to get a specific selection of rows<br>
  based on our criteria which may even involve multiple conditions.


In [63]:
import pandas as pd

data = {
    'Name': ['Microsoft Corporation', 'Google, LLC', 'Tesla, Inc.',\
             'Apple Inc.', 'Netflix, Inc.'],
    'Symbol': ['MSFT', 'GOOG', 'TSLA', 'AAPL', 'NFLX'],
    'Industry': ['Tech', 'Tech', 'Automotive', 'Tech', 'Entertainment'],
    'Shares': [100, 50, 150, 200, 80]
}

df = pd.DataFrame(data)

df

Unnamed: 0,Name,Symbol,Industry,Shares
0,Microsoft Corporation,MSFT,Tech,100
1,"Google, LLC",GOOG,Tech,50
2,"Tesla, Inc.",TSLA,Automotive,150
3,Apple Inc.,AAPL,Tech,200
4,"Netflix, Inc.",NFLX,Entertainment,80


**Example 1:- How to filter a dataframe for multiple conditions?**

- Pandas dataframes allow for boolean indexing which is quite an efficient way to filter
  a dataframe for multiple conditions.
<br>
- In boolean indexing, boolean vectors generated based on the conditions are used to filter the data. 
<br>

- Multiple conditions involving the operators | (for or operation), & (for and operation), 
  and ~ (for not operation) can be grouped using parentheses ().

In [64]:
# In the sample dataframe created, let’s filter for all the stocks that are in the Tech industry and
# have 100 or more shares in the portfolio.

df_filtered = df[(df['Industry']=='Tech')&(df['Shares']>=100)]

df_filtered

Unnamed: 0,Name,Symbol,Industry,Shares
0,Microsoft Corporation,MSFT,Tech,100
3,Apple Inc.,AAPL,Tech,200


**Things to remember**

- We should keep in mind the following two things,
  when using boolean indexing to filter dataframes for multiple conditions:

**1) Use the operators &, |, ~ instead of and, or, not respectively**

In [65]:
# Pandas provides operators & (for and), | (for or), and ~ (for not) to 
# apply logical operations on series and to chain multiple conditions together when filtering a pandas dataframe. 

# If we instead use the Python logical operators, it results in an error.

# For example, if we filter for stocks having shares in the range of 100 to 150 using `and`, we get an error:

df_filtered = df[(df['Shares']>=100) and (df['Shares']<=150)]

print(df_filtered)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [None]:
# The error occurred because python’s logical operators (and, or, not) are meant to be used with boolean values
# so when you try to use them with a series or an array,
# it’s not clear how to determine whether it’s True or False and hence it results in a ValueError.

**2) Use parentheses () to group multiple conditions**

In [66]:
# If we do not use parentheses () to group our conditions, 
# Python evaluates the expression based on operator precedence,
# which can give unintended results with operators &, | and ~.

# For example, if we filter for stocks having shares in the range 100 to 150 without using parentheses, we get an error:

df_filtered = df[df['Shares']>=100 & df['Shares']<=150]

print(df_filtered)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [None]:
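# In the above example, the error occurs because, in the absence of parentheses (), 
# the expression df['Shares']>=100 & df['Shares']<=150 is evaluated as

# df['Shares'] >= (100 & df['Shares']) <= 150

# since the bitwise & operator has higher precedence
# than the comparison operators >= and <= and is evaluated first.
# Python then treats the chained comparison a >= b <= c as (a >= b) and (b <= c),
# and that implicit `and` on a Series raises the same ValueError as before.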
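
Both pitfalls can be reproduced safely by catching the exception. This is a minimal sketch; the helper `raises_value_error` is a hypothetical function written for this illustration and is not part of pandas.

```python
import pandas as pd

shares = pd.Series([100, 50, 150, 200, 80])

def raises_value_error(build_mask):
    """Return True if calling build_mask() raises a ValueError."""
    try:
        build_mask()
    except ValueError:
        return True
    return False

# Python's `and` needs a single True/False, which a Series cannot provide
and_fails = raises_value_error(lambda: (shares >= 100) and (shares <= 150))

# without parentheses, & binds first and a chained comparison remains
no_parens_fails = raises_value_error(lambda: shares >= 100 & shares <= 150)

# parenthesized conditions combined with & work as intended
mask = (shares >= 100) & (shares <= 150)

print(and_fails, no_parens_fails)
print(mask.tolist())
```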

**Conclusion**

In [68]:
# Boolean indexing is an effective way to filter a pandas dataframe based on multiple conditions.

# But remember to use parentheses to group conditions together and use operators &, |, and ~ 
# for performing logical operations on series.

# If we want to filter for stocks having shares in the range of 100 to 150, the correct usage would be:
    
df_filtered = df[(df['Shares']>=100) & (df['Shares']<=150)]

df_filtered

Unnamed: 0,Name,Symbol,Industry,Shares
0,Microsoft Corporation,MSFT,Tech,100
2,"Tesla, Inc.",TSLA,Automotive,150
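
Some common conditions have shorter spellings. The sketch below uses Series.between(), Series.isin(), the ~ operator, and DataFrame.query() on a trimmed copy of the portfolio data; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Symbol': ['MSFT', 'GOOG', 'TSLA', 'AAPL', 'NFLX'],
    'Industry': ['Tech', 'Tech', 'Automotive', 'Tech', 'Entertainment'],
    'Shares': [100, 50, 150, 200, 80]
})

# range check; both endpoints are included by default
in_range = df[df['Shares'].between(100, 150)]

# membership check instead of several == conditions joined with |
tech_or_auto = df[df['Industry'].isin(['Tech', 'Automotive'])]

# negation with ~ (note the parentheses around the condition)
not_tech = df[~(df['Industry'] == 'Tech')]

# query() takes the condition as a string, where `and`/`or`/`not` are allowed
tech_100 = df.query("Industry == 'Tech' and Shares >= 100")

print(in_range, tech_or_auto, not_tech, tech_100, sep="\n\n")
```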


###### 6. Drop Duplicates

- The drop_duplicates() function can be used to remove duplicate rows from a dataframe/dataset. 
<br>

- It also gives us the flexibility to identify duplicates based on certain columns through the subset parameter. 
<br>

  **syntax:-** df.drop_duplicates()

  - It returns a dataframe with the duplicate rows removed.<br> 
  - It drops the duplicates except for the first occurrence by default.<br>
  - We can change this behavior through the parameter keep which takes in 'first', 'last', or False.<br>
 
  - To modify the dataframe in-place pass the argument inplace=True.



**Example 1:- Drop duplicate rows based on all columns**

In [69]:
# By default, the drop_duplicates() function identifies the duplicates taking all the columns into consideration.

# It then drops the duplicate rows and just keeps their first occurrence.

import pandas as pd

# create a sample dataframe with duplicate rows

data = {
    'Pet': ['Cat', 'Dog', 'Dog', 'Dog', 'Cat'],
    'Color': ['Brown', 'Golden', 'Golden', 'Golden', 'Black'],
    'Eyes': ['Black', 'Black', 'Black', 'Brown', 'Green']
}

df = pd.DataFrame(data)

# print the dataframe
print("The original dataframe:\n")

df

The original dataframe:



Unnamed: 0,Pet,Color,Eyes
0,Cat,Brown,Black
1,Dog,Golden,Black
2,Dog,Golden,Black
3,Dog,Golden,Brown
4,Cat,Black,Green


In [72]:
# drop duplicates

df1 = df.drop_duplicates()

print("After dropping duplicates:")

df1

# In the above example, you can see that the rows with
# index 1 and 2 have the same values for all the three columns.

# On applying the drop_duplicates() function, 
# the first occurrence (index 1) is retained and the remaining duplicate (index 2) is dropped. 

# As a result, the dataframe returned does not have a continuous index.

After dropping duplicates:


Unnamed: 0,Pet,Color,Eyes
0,Cat,Brown,Black
1,Dog,Golden,Black
3,Dog,Golden,Brown
4,Cat,Black,Green


**Example 2:- Getting a continuous index (ignore_index=True)**

In [75]:
# If we want the returned dataframe to have a continuous index 
# pass ignore_index=True to the drop_duplicates() function 
# or reset the index of the returned dataframe.

df1 = df.drop_duplicates(ignore_index=True)

print("After dropping duplicates:")

df1

After dropping duplicates:


Unnamed: 0,Pet,Color,Eyes
0,Cat,Brown,Black
1,Dog,Golden,Black
2,Dog,Golden,Brown
3,Cat,Black,Green
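
The `subset` and `keep` parameters mentioned earlier can be sketched with the same pets data; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Pet': ['Cat', 'Dog', 'Dog', 'Dog', 'Cat'],
    'Color': ['Brown', 'Golden', 'Golden', 'Golden', 'Black'],
    'Eyes': ['Black', 'Black', 'Black', 'Brown', 'Green']
})

# treat rows as duplicates when Pet and Color match, regardless of Eyes
by_subset = df.drop_duplicates(subset=['Pet', 'Color'])

# keep the last occurrence of each duplicated row instead of the first
keep_last = df.drop_duplicates(keep='last')

# drop every row that has a duplicate, keeping none of them
keep_none = df.drop_duplicates(keep=False)

print(by_subset, keep_last, keep_none, sep="\n\n")
```

With subset=['Pet', 'Color'], the three golden dogs collapse to the first one (index 1), even though the dog at index 3 has different eyes.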


###### 7. equals() function

- The pandas dataframe function equals() is used to compare two dataframes for equality. 
<br>

- It returns True if the two dataframes have the same shape and elements; corresponding columns must also have the same dtype and the same labels (see Examples 3 and 4).

 **Syntax :-** df1.equals(df2)

  - Here, df1 and df2 are the two dataframes you want to compare. 
<br>

  - Note that NaNs in the same location are considered equal.

**Example 1:- Compare two identical dataframes**

In [84]:
import pandas as pd

# two identical dataframes

df1 = pd.DataFrame({'A': [1,2], 'B': ['x', 'y']})

df2 = pd.DataFrame({'A': [1,2], 'B': ['x', 'y']})

# print the two dataframes

print("DataFrame df1:")

print(df1)

print("\nDataFrame df2:")

print(df2)

print()

# check if both are equal

print("Two dataframes are equal:-", df1.equals(df2))

DataFrame df1:
   A  B
0  1  x
1  2  y

DataFrame df2:
   A  B
0  1  x
1  2  y

Two dataframes are equal:- True


**Example 2:- Compare two identical dataframes with NaNs**

In [87]:
import pandas as pd
import numpy as np

# two identical dataframes

df1 = pd.DataFrame({'A': [1,np.nan], 'B': ['x', None]})

df2 = pd.DataFrame({'A': [1,np.nan], 'B': ['x', None]})

print("DataFrame df1:")

print(df1)

print("\nDataFrame df2:")

print(df2)

# check if both are equal

print("\nAre both equal?:-",df1.equals(df2))

# Here, we can see that NaNs and None are considered equal if they occur at the same location.

DataFrame df1:
     A     B
0  1.0     x
1  NaN  None

DataFrame df2:
     A     B
0  1.0     x
1  NaN  None

Are both equal?:- True


**Example 3:- Compare two dataframes with equal values but different dtypes**

In [88]:
import pandas as pd
import numpy as np

# two dataframes with equal values but different dtypes for column A

df1 = pd.DataFrame({'A': [1,2], 'B': ['x', 'y']})

df2 = pd.DataFrame({'A': [1.0,2.0], 'B': ['x', 'y']})

# print the two dataframes

print("DataFrame df1:")
print(df1)

print("\nDataFrame df2:")
print(df2)

# check if both are equal

print("\nAre both equal?:-",df1.equals(df2))

DataFrame df1:
   A  B
0  1  x
1  2  y

DataFrame df2:
     A  B
0  1.0  x
1  2.0  y

Are both equal?:- False


**Example 4:- Compare dataframes with same elements but different column names**

In [89]:
# What will the equals() function return if two dataframes have the same elements but different column names?

import pandas as pd
import numpy as np

# two dataframes with the same elements but different column names

df1 = pd.DataFrame({'A': [1,2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'C': [1,2], 'D': ['x', 'y']})

# print the two dataframes

print("DataFrame df1:")
print(df1)

print("\nDataFrame df2:")
print(df2)

# check if both are equal

print("\nAre both equal?:-",df1.equals(df2))

DataFrame df1:
   A  B
0  1  x
1  2  y

DataFrame df2:
   C  D
0  1  x
1  2  y

Are both equal?:- False
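
If only the values matter, one possible approach is to align the dtypes or the column labels before calling equals(). The sketch below is one way to do this, not the only one; the dataframe and variable names are illustrative.

```python
import pandas as pd

df_int = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df_float = pd.DataFrame({'A': [1.0, 2.0], 'B': ['x', 'y']})
df_renamed = pd.DataFrame({'C': [1, 2], 'D': ['x', 'y']})

# cast df_float to the dtypes of df_int, then compare
same_after_cast = df_int.equals(df_float.astype(df_int.dtypes.to_dict()))

# give df_renamed the column labels of df_int, then compare
same_after_relabel = df_int.equals(df_renamed.set_axis(df_int.columns, axis=1))

print(same_after_cast, same_after_relabel)
```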


###### 8. Get Column Names

- While working with pandas dataframes, it may happen that
  we require a list of all the column names present in a dataframe. 
  <br>
  
- We can use **df.columns** to get the column names but it returns them as an Index object.

In [90]:
import pandas as pd

data = {
    "Name": ["Google, LLC", "Microsoft Corporation", "Tesla, Inc."],
    "Symbol": ["GOOG", "MSFT", "TSLA"],
    "Shares": [100, 50, 80],
}

df = pd.DataFrame(data) # create dataframe

df  # display dataframe

Unnamed: 0,Name,Symbol,Shares
0,"Google, LLC",GOOG,100
1,Microsoft Corporation,MSFT,50
2,"Tesla, Inc.",TSLA,80


**Example 1:- Get column names**

In [92]:
df.columns

Index(['Name', 'Symbol', 'Shares'], dtype='object')

**Example 2:- Using the list() function**

In [94]:
# Pass the dataframe to the list() function to get the list of column names.

list(df)

['Name', 'Symbol', 'Shares']
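
A few closely related idioms, shown as a short sketch on the same data: converting the Index with tolist(), testing whether a column exists, and counting the columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Google, LLC", "Microsoft Corporation", "Tesla, Inc."],
    "Symbol": ["GOOG", "MSFT", "TSLA"],
    "Shares": [100, 50, 80],
})

cols = df.columns.tolist()          # same result as list(df)
has_symbol = 'Symbol' in df.columns  # membership test on the Index
n_cols = len(df.columns)             # number of columns

print(cols, has_symbol, n_cols)
```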

###### 9. Select One or More Columns

- There are a number of ways in which you can select a subset of columns in pandas. 
<br>

- We can select them by their names or their indexes.

**Example 1:- By passing column names as a list to the indexing operator [ ]**

In [98]:
import pandas as pd

# create a sample dataframe

data = {
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
}

df = pd.DataFrame(data)

print("The original dataframe:-") # print the dataframe

df

The original dataframe:-


Unnamed: 0,Name,Age,Department
0,Jim,26,Sales
1,Dwight,28,Sales
2,Angela,27,Accounting
3,Tobi,32,Human Resources


In [100]:
df_selected = df[['Name', 'Department']] # select columns 'Name' and 'Department'

print("Dataframe with the selected columns:-")

df_selected

# In the above example, we select the columns Name and Department from the dataframe df
# by passing them as a list to the indexing operator [].

# We can see that the returned dataframe just has those two columns.

Dataframe with the selected columns:-


Unnamed: 0,Name,Department
0,Jim,Sales
1,Dwight,Sales
2,Angela,Accounting
3,Tobi,Human Resources


**Example 2:- Using the .loc property**

- .loc is a pandas dataframe property used for accessing rows or columns of a dataframe by their labels. 
<br>

- We can use it to select a subset of columns of a dataframe by their names.

In [102]:
import pandas as pd

# create a sample dataframe

data = {
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
}

df = pd.DataFrame(data)

print("The original dataframe:-") # print the dataframe

df

The original dataframe:-


Unnamed: 0,Name,Age,Department
0,Jim,26,Sales
1,Dwight,28,Sales
2,Angela,27,Accounting
3,Tobi,32,Human Resources


In [103]:
df_selected = df.loc[:,['Name', 'Department']] # select columns 'Name' and 'Department'

print("\nDataframe with the selected columns:-")

df_selected

# In the above example, we use df.loc[:,['Name', 'Department']] to select columns Name and Department. 

# Note that the : before the , is used so that we get all the rows for the two columns. 

# We can instead pass a specific row slice or a list of row labels if we need only some of the rows.


Dataframe with the selected columns:-


Unnamed: 0,Name,Department
0,Jim,Sales
1,Dwight,Sales
2,Angela,Accounting
3,Tobi,Human Resources


**Example 3:- Select columns by index in pandas - .iloc**

In [105]:
# We can also select columns by giving their indexes using the .iloc property of the dataframe.

import pandas as pd

# create a sample dataframe

data = {
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
}

df = pd.DataFrame(data)

print("The original dataframe:-")

df

The original dataframe:-


Unnamed: 0,Name,Age,Department
0,Jim,26,Sales
1,Dwight,28,Sales
2,Angela,27,Accounting
3,Tobi,32,Human Resources


In [106]:
df_selected = df.iloc[:,[0, 2]] # select columns 'Name' and 'Department'

print("Dataframe with the selected columns:-")

df_selected

# In the above example, we use the column indexes 0 and 2 to select columns Name and Department respectively
# from the dataframe df.

Dataframe with the selected columns:-


Unnamed: 0,Name,Department
0,Jim,Sales
1,Dwight,Sales
2,Angela,Accounting
3,Tobi,Human Resources
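
Beyond lists of names or positions, columns can also be selected with slices or by dtype. The following sketch uses the same employee data; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
})

# label slice with .loc: both endpoints are included
by_label_slice = df.loc[:, 'Name':'Age']

# positional slice with .iloc: the end position is excluded
by_position_slice = df.iloc[:, 0:2]

# select columns by dtype
numeric_only = df.select_dtypes(include='number')

# a single label (not a list) returns a Series instead of a dataframe
age_series = df['Age']

print(by_label_slice.columns.tolist())
print(numeric_only.columns.tolist())
```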


#### 10. Rename Column Names

- While working with data, we may need to change the names of some or all the columns of a dataframe.
<br>

- We can use the Pandas dataframe rename() function to rename column names in Pandas.


**Top 3 Methods to Rename Column Names in Pandas**

- **Method 1:-** Use the Pandas dataframe rename() function to modify specific column names.
<br>
- **Method 2:-** Use the Pandas dataframe set_axis() method to change all your column names.
<br>

- **Method 3:-** Set the dataframe’s columns attribute to your new list of column names.

###### Method 1: Use Pandas rename() function to rename columns

- The Pandas dataframe rename() function is a quite versatile function
  used not only to rename column names but also row indices.
<br>
- We can use this function to rename specific columns. 
<br>
- We can rename a single column or several columns in one call. 
<br>
- The following is the syntax to change column names using the Pandas rename() function.

  **df.rename(columns={"OldName":"NewName"})** <br>

- The rename() function returns a new dataframe with 
  renamed axis labels (i.e. the renamed columns or rows depending on usage).
<br>

- To modify the dataframe in place set the argument inplace to True.

**Example :- Change the name of a specific column**

In [107]:
# Here, we will create a dataframe storing the category and color information 
# of some pets in the columns “Category” and “Color” respectively.

import pandas as pd

# data of pets
data = {'Category': ['Dog', 'Cat', 'Rabbit', 'Parrot'],
       'Color': ['brown', 'black', 'white', 'green']}

df = pd.DataFrame(data)

df # display the dataframe

Unnamed: 0,Category,Color
0,Dog,brown
1,Cat,black
2,Rabbit,white
3,Parrot,green


In [108]:
# change column name "Category" to "Pet"

df = df.rename(columns={"Category":"Pet"})

df  # display the dataframe

Unnamed: 0,Pet,Color
0,Dog,brown
1,Cat,black
2,Rabbit,white
3,Parrot,green
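
The rename() function also accepts several mappings at once, or a function applied to every column name. The sketch below additionally shows the `errors` parameter; the new names such as 'Colour' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['Dog', 'Cat', 'Rabbit', 'Parrot'],
                   'Color': ['brown', 'black', 'white', 'green']})

# rename several columns in one call
df_multi = df.rename(columns={'Category': 'Pet', 'Color': 'Colour'})

# apply a function to every column name
df_lower = df.rename(columns=str.lower)

# by default, a key that is not a column is silently ignored
df_unchanged = df.rename(columns={'Missing': 'X'})

# with errors='raise', the same call raises a KeyError
try:
    df.rename(columns={'Missing': 'X'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(df_multi.columns.tolist(), df_lower.columns.tolist(), raised)
```

Passing errors='raise' is a cheap way to catch typos in column names.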


###### Method 2: Use Pandas set_axis() function to rename column names

- The Pandas dataframe set_axis() method can be used to rename a dataframe’s columns
  by passing a list of all columns with their new names.
<br>
- Note that the length of this list must be equal to the number of columns in the dataframe. 
<br>
- The following is the syntax: **df.set_axis(new_column_list, axis=1)**
<br>
- We have to explicitly specify the axis as 1 or 'columns' to update column names 
  since its default is 0 (which modifies the axis for rows).
<br>

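- It returns a new dataframe with the updated axis. 
  Older pandas versions accepted inplace=True here, but that argument was deprecated in pandas 1.5 and removed in pandas 2.0, so assign the result back instead (df = df.set_axis(...)).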

**Example :- Change the name of a column using set_axis()**

In [111]:
# We’ll take the same use case as above, and change the column name “Category” to “Pet” 
# in a dataframe but this time we will be using the set_axis() method.

import pandas as pd

# data of pets
data = {'Category': ['Dog', 'Cat', 'Rabbit', 'Parrot'],
       'Color': ['brown', 'black', 'white', 'green']}

df = pd.DataFrame(data) # create pandas dataframe

print("Dataframe columns:", df.columns) # print dataframe columns

df

Dataframe columns: Index(['Category', 'Color'], dtype='object')


Unnamed: 0,Category,Color
0,Dog,brown
1,Cat,black
2,Rabbit,white
3,Parrot,green


In [110]:
# change column name Category to Pet

df = df.set_axis(["Pet", "Color"], axis=1)

print("Dataframe columns:", df.columns) # print dataframe columns

df

# In the above example, the set_axis() function is used to rename the column Category to Pet in the dataframe df. 

# Note that we had to provide the list of all the columns for the dataframe
# even if we had to change just one column name.

Dataframe columns: Index(['Pet', 'Color'], dtype='object')


Unnamed: 0,Pet,Color
0,Dog,brown
1,Cat,black
2,Rabbit,white
3,Parrot,green


###### Method 3: Rename columns in Pandas by changing its attribute

- We can also update a dataframe’s columns by setting its columns attribute to a new list of column names.
<br>
- The following is the syntax:- **df.columns = new_column_list**
<br>

- Note that new_column_list must be of the same length as the number of columns in our dataframe.

**Example :- Update columns attribute to change column names**

In [112]:
# Create a dataframe with “Category” and “Color” columns and 
# then change the column name “Category” to “Pet”, but this time we’ll do it by updating the columns attribute.

import pandas as pd

# data of pets
data = {'Category': ['Dog', 'Cat', 'Rabbit', 'Parrot'],
       'Color': ['brown', 'black', 'white', 'green']}

df = pd.DataFrame(data) # create pandas dataframe

print("Dataframe columns:", df.columns) # print dataframe columns

Dataframe columns: Index(['Category', 'Color'], dtype='object')


In [113]:
df.columns = ["Pet", "Color"]  # change column name Category to Pet

print("Dataframe columns:", df.columns)

# In the above example, we change the column names of the dataframe df by setting df.columns to a new column list. 
# Like the set_axis() function, we had to provide the list of all the columns for the dataframe
# even though we wanted to change just one column name.

Dataframe columns: Index(['Pet', 'Color'], dtype='object')
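
The length requirement mentioned above can be checked with a small sketch. The two-row dataframe and the variable names `mismatch_error` and `columns_after` are made up for illustration; the point is only that assigning a list of the wrong length raises a ValueError and leaves the column names as they were.

```python
import pandas as pd

# small illustrative dataframe (not the one used in the cells above)
df = pd.DataFrame({'Category': ['Dog', 'Cat'], 'Color': ['brown', 'black']})

mismatch_error = None
try:
    # only one name is supplied for a dataframe with two columns
    df.columns = ['Pet']
except ValueError as exc:
    mismatch_error = exc
    print("Assignment failed:", exc)

# the failed assignment did not change the column names
columns_after = list(df.columns)
print(columns_after)
```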


###### 11. Drop one or more Columns

- Pandas dataframes are quite powerful for manipulating data.
  While working with data, particularly during EDA (Exploratory Data Analysis) and data preprocessing,
  we often need to remove one or more columns. 
<br>
- To drop one or more columns from a pandas dataframe, 
  use the dataframe's drop() function with axis set to 1. 
<br>
- The following is the syntax: **df.drop(cols_to_drop, axis=1)**
<br>  
- Here, cols_to_drop is the index or column label to drop;
  to drop more than one column, pass a list of labels. 
<br>
- The axis represents the axis to remove the labels from,
  it defaults to 0 but if you want to drop columns pass the axis as 1 (i.e. 0 for rows and 1 for columns).
<br>
- Also note that the drop() function does not modify the dataframe in-place by default. 
  It returns a copy of the dataframe with the labels dropped.
<br>

- If you want to modify the dataframe in-place pass the argument inplace=True to the function.
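
As a minimal sketch of the points above: the snippet below uses the `columns=` keyword of drop(), which is an alternative to passing the labels together with axis=1, and then shows the effect of inplace=True. The small dataframe and the variable names (`without_c`, `original_cols`, `result`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a1', 'a2'], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']})

# columns=[...] is equivalent to passing the labels with axis=1
without_c = df.drop(columns=['C'])

# drop() returned a new dataframe, so df still has all three columns
original_cols = list(df.columns)

# with inplace=True, df itself is modified and the call returns None
result = df.drop('B', axis=1, inplace=True)

print(list(without_c.columns))
print(original_cols)
print(result, list(df.columns))
```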

**Example 1:- Drop a single column by name**

In [117]:
import pandas as pd

# create a sample dataframe
data = {
    'A': ['a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c2', 'c3'],
    'D': ['d1', 'd2', 'd3']
}

df = pd.DataFrame(data)


print("Original Dataframe:\n")

df

Original Dataframe:



Unnamed: 0,A,B,C,D
0,a1,b1,c1,d1
1,a2,b2,c2,d2
2,a3,b3,c3,d3


In [118]:
# remove column C

df = df.drop('C', axis=1)

print("After dropping C:-")

df

# In the above example, a sample dataframe df is created with four columns A, B, C, and D. 

# Then, the column C is dropped using the drop() function. 

# Notice that since we had to drop just a single column we didn’t need to pass a list.

After dropping C:-


Unnamed: 0,A,B,D
0,a1,b1,d1
1,a2,b2,d2
2,a3,b3,d3


**Example 2: Drop multiple columns by name**

In [119]:
import pandas as pd

# create a sample dataframe
data = {
    'A': ['a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c2', 'c3'],
    'D': ['d1', 'd2', 'd3']
}

df = pd.DataFrame(data)

# print the dataframe

print("Original Dataframe:-")

df

Original Dataframe:-


Unnamed: 0,A,B,C,D
0,a1,b1,c1,d1
1,a2,b2,c2,d2
2,a3,b3,c3,d3


In [120]:
# remove columns C and D

df = df.drop(['C', 'D'], axis=1)

print("After dropping columns C and D:-")

df

# In the above example, the columns C and D are dropped from the dataframe df. 
# Note that we had to provide the list of column names to drop since we were dropping multiple columns together.

After dropping columns C and D:-


Unnamed: 0,A,B
0,a1,b1
1,a2,b2
2,a3,b3


**Drop columns by index**

- To drop columns by column number, pass df.columns[i] to the drop() function 
  where i is the column index of the column you want to drop. 
<br>

- To drop multiple columns by their indices pass df.columns[[i, j, k]]
  where i, j, k are the column indices of the columns you want to drop.

**Example 3:- Drop a single column by index**

In [121]:
import pandas as pd

# create a sample dataframe

data = {
    'A': ['a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c2', 'c3'],
    'D': ['d1', 'd2', 'd3']
}

df = pd.DataFrame(data)

print("Original Dataframe:-")

df

Original Dataframe:-


Unnamed: 0,A,B,C,D
0,a1,b1,c1,d1
1,a2,b2,c2,d2
2,a3,b3,c3,d3


In [122]:
# remove column by index

df = df.drop(df.columns[2], axis=1)

print("After dropping C:-")

df

# In the above example, the column C is dropped using its index 2 from the dataframe df.

After dropping C:-


Unnamed: 0,A,B,D
0,a1,b1,d1
1,a2,b2,d2
2,a3,b3,d3


**Example 4:- Drop multiple columns with their index**

In [124]:
import pandas as pd

data = {
    'A': ['a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c2', 'c3'],
    'D': ['d1', 'd2', 'd3']
}

df = pd.DataFrame(data)

print("Original Dataframe:-")

df

Original Dataframe:-


Unnamed: 0,A,B,C,D
0,a1,b1,c1,d1
1,a2,b2,c2,d2
2,a3,b3,c3,d3


In [125]:
# remove columns C and D

df = df.drop(df.columns[[2, 3]], axis=1)

print("After dropping columns C and D:-")

df

# In the above example, columns with index 2 and 3 are dropped from the dataframe df.

After dropping columns C and D:-


Unnamed: 0,A,B
0,a1,b1
1,a2,b2
2,a3,b3
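
Because df.columns supports the usual positional indexing, negative positions and slices should work in the same way as the single positions and lists used above. The following is a small sketch under that assumption; the variable names `last_dropped` and `middle_dropped` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a2'],
    'B': ['b1', 'b2'],
    'C': ['c1', 'c2'],
    'D': ['d1', 'd2']
})

# drop the last column, whatever its name is
last_dropped = df.drop(df.columns[-1], axis=1)

# drop the columns at positions 1 and 2 (the slice end is exclusive)
middle_dropped = df.drop(df.columns[1:3], axis=1)

print(list(last_dropped.columns))
print(list(middle_dropped.columns))
```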


#### 12. Reset Index 

- Pandas dataframes are quite powerful for manipulating data.<br>
  Often we need to reset a dataframe's index back to the default integer index.

 **Why reset the index of a dataframe?**
 - As we apply common operations such as filtering, removing NaNs, etc.,<br>
   we may end up with a smaller dataframe whose indices are no longer continuous.

![image.png](attachment:image.png)

- Having a dataframe with non-continuous indices may not be an issue by itself<br>
  but performing operations on such dataframes along with dataframes with continuous or<br>
  a different index scheme could give some unintended results.
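
One way such unintended results can show up is through index alignment: pandas matches rows by index label, not by position, when combining two objects. The sketch below uses two made-up series (`scores` and `bonus`) to illustrate this; it is only an example of the issue, not the only way it can occur.

```python
import pandas as pd

scores = pd.Series([10, 20, 30, 40])

# filtering keeps the original labels, so the index is now 1, 2, 3
kept = scores[scores > 15]

# a second series with the default index 0, 1, 2
bonus = pd.Series([1, 1, 1])

# addition aligns on labels: labels 0 and 3 exist on one side only and become NaN
misaligned = kept + bonus

# after resetting the index, both sides share the labels 0, 1, 2
aligned = kept.reset_index(drop=True) + bonus

print(misaligned)
print(aligned)
```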

 **syntax:-** df.reset_index()

- The above function returns a copy of your dataframe with its old index 
  as a new column and having a continuous integer index from 0.
<br>
- **Pass drop=True** to the above function if you don’t want the old index as a new column in your dataframe.
<br>

- **Pass inplace=True** if you want to modify the dataframe in-place.
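
The two parameters can also be combined. The following sketch assumes a small made-up dataframe in which dropna() removes two rows; it then resets the index in-place without keeping the old index as a column. The names `index_before` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sam', 'Tim', 'Rahul', 'Emma'],
    'Age': [14, np.nan, 16, np.nan]
})

# removing the rows with NaN leaves the index labels 0 and 2
df = df.dropna()
index_before = list(df.index)

# drop=True discards the old index; inplace=True modifies df and returns None
result = df.reset_index(drop=True, inplace=True)

print(index_before)
print(result)
print(df)
```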

**Example 1: reset_index() with default parameters**

In [127]:
import pandas as pd

data = {
    'Name': ['Sam', 'Tim', 'Rahul', 'Emma', 'Kyle'],
    'Age': [14, 21, 16, 18, 23],
    'Country': ['UK', 'India', 'USA', 'Germany', 'France'],
    'Language': ['English', 'Hindi', 'English', 'German', 'French']
}

df = pd.DataFrame(data, index=[1,3,5,7,9])

print("Before reset index:-")

df

Before reset index:-


Unnamed: 0,Name,Age,Country,Language
1,Sam,14,UK,English
3,Tim,21,India,Hindi
5,Rahul,16,USA,English
7,Emma,18,Germany,German
9,Kyle,23,France,French


In [128]:
# reset the index with default parameters

df = df.reset_index()

print("After reset index:-")

df

# In the above example, we reset the index of the dataframe df using reset_index() with default parameters.
# We can see that a new column by the name index has been created storing the values of the old index 
# and the dataframe’s index is reset to continuous integers from 0.

After reset index:-


Unnamed: 0,index,Name,Age,Country,Language
0,1,Sam,14,UK,English
1,3,Tim,21,India,Hindi
2,5,Rahul,16,USA,English
3,7,Emma,18,Germany,German
4,9,Kyle,23,France,French


**Example 2: With drop=True parameter**

In [130]:
import pandas as pd

data = {
    'Name': ['Sam', 'Tim', 'Rahul', 'Emma', 'Kyle'],
    'Age': [14, 21, 16, 18, 23],
    'Country': ['UK', 'India', 'USA', 'Germany', 'France'],
    'Language': ['English', 'Hindi', 'English', 'German', 'French']
}

df = pd.DataFrame(data, index=[1,3,5,7,9])

print("Before reset index:-")

df

Before reset index:-


Unnamed: 0,Name,Age,Country,Language
1,Sam,14,UK,English
3,Tim,21,India,Hindi
5,Rahul,16,USA,English
7,Emma,18,Germany,German
9,Kyle,23,France,French


In [131]:
# reset the index with drop=True

df = df.reset_index(drop=True)

print("After reset index:-")

df

# We can see that using the drop=True gave a dataframe without the old index as an additional column.

After reset index:-


Unnamed: 0,Name,Age,Country,Language
0,Sam,14,UK,English
1,Tim,21,India,Hindi
2,Rahul,16,USA,English
3,Emma,18,Germany,German
4,Kyle,23,France,French


#### 13. min()

- Get min value in one or more columns.
<br>
- Pandas dataframe.min() function returns the minimum of the values in the given object.
<br>
- If the input is a series, the method will return a scalar which will be the minimum of the values in the series.
<br>
- If the input is a dataframe, then the method will return a series with minimum of values
  over the specified axis in the dataframe.
<br>

- By default the axis is the index axis (axis=0), so the minimum is computed for each column.
<br>

![image.png](attachment:image.png)
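
To make the role of the axis concrete, here is a minimal sketch on a cut-down, two-row version of the javelin data. It compares the default column-wise minimum with the row-wise minimum (axis=1) and also shows the `skipna` parameter of min(), which controls whether NaN values are ignored. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Attempt1': [87.03, 83.98],
    'Attempt2': [87.58, np.nan],
    'Attempt3': [76.79, np.nan]
})

# default axis=0: one minimum per column
col_min = df.min()

# axis=1: one minimum per row; NaN values are skipped by default
row_min = df.min(axis=1)

# skipna=False: a row containing NaN gives NaN
row_min_strict = df.min(axis=1, skipna=False)

print(col_min)
print(row_min)
print(row_min_strict)
```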

In [2]:
import numpy as np
import pandas as pd

# create a pandas dataframe

df = pd.DataFrame({
    'Name': ['Neeraj Chopra', 'Jakub Vadlejch', 'Vitezslav Vesely', 'Julian Weber', 'Arshad Nadeem'],
    'Country': ['India', 'Czech Republic', 'Czech Republic', 'Germany', 'Pakistan'],
    'Attempt1': [87.03, 83.98, 79.79, 85.30, 82.40],
    'Attempt2': [87.58, np.nan, 80.30, 77.90, np.nan],
    'Attempt3': [76.79, np.nan, 85.44, 78.00, 84.62],
    'Attempt4': [np.nan, 82.86, np.nan, 83.10, 82.91],
    'Attempt5': [np.nan, 86.67, 84.98, 85.15, 81.98],
    'Attempt6': [84.24, np.nan, np.nan, 75.72, np.nan]
})

# display the dataframe
df

Unnamed: 0,Name,Country,Attempt1,Attempt2,Attempt3,Attempt4,Attempt5,Attempt6
0,Neeraj Chopra,India,87.03,87.58,76.79,,,84.24
1,Jakub Vadlejch,Czech Republic,83.98,,,82.86,86.67,
2,Vitezslav Vesely,Czech Republic,79.79,80.3,85.44,,84.98,
3,Julian Weber,Germany,85.3,77.9,78.0,83.1,85.15,75.72
4,Arshad Nadeem,Pakistan,82.4,,84.62,82.91,81.98,


**Example 1:- Min value in a single pandas column**

In [5]:
# To get the minimum value in a pandas column, use the min() function as follows. 

# For example, let’s get the minimum distance the javelin was thrown in the first attempt.

# min value in Attempt1

print("Minimum value in column 'Attempt1':-", df['Attempt1'].min())

Minimum value in column 'Attempt1':- 79.79


**Example 2:- To get the index corresponding to the min value with the pandas idxmin() function.**

In [8]:
# Note that we can get the index corresponding to the min value with the pandas idxmin() function. 

# Let’s get the name of the athlete who threw the shortest in the first attempt with this index.

# index corresponding min value

i = df['Attempt1'].idxmin()

print("Index of minimum value in column 'Attempt1':-", i)

print()

# display the name corresponding to this index

print("Name of the athlete who has the minimum value:-", df['Name'][i])

Index of minimum value in column 'Attempt1':- 2

Name of the athlete who has the minimum value:- Vitezslav Vesely


**Example 3:- Min value in two pandas columns**

In [10]:
# We can also get the min value of multiple pandas columns with the pandas min() function. 
# For example, let’s find the minimum values in “Attempt1” and “Attempt2” respectively.

# get min values in columns "Attempt1" and "Attempt2"

print("Minimum value of multiple columns:-\n\n", df[['Attempt1', 'Attempt2']].min())

Minimum value of multiple columns:-

 Attempt1    79.79
Attempt2    77.90
dtype: float64


**Example 4:- Min value for each column in the dataframe**

In [14]:
# Similarly, we can get the min value for each column in the dataframe. 

# Apply the min() function over the entire dataframe instead of a single column or a selection of columns.

# get min values in each column of the dataframe

print(df.min())

# We get the minimum values in each column of the dataframe df.

# Note that we also get min values for text columns based on their string comparisons in python.

Name         Arshad Nadeem
Country     Czech Republic
Attempt1             79.79
Attempt2              77.9
Attempt3             76.79
Attempt4             82.86
Attempt5             81.98
Attempt6             75.72
dtype: object


**Example 5:- Min value for only numerical columns**

In [13]:
# If you only want the min values for all the numerical columns in the dataframe, 
# pass numeric_only=True to the min() function.

print(df.min(numeric_only=True))

Attempt1    79.79
Attempt2    77.90
Attempt3    76.79
Attempt4    82.86
Attempt5    81.98
Attempt6    75.72
dtype: float64


**Example 6:- Min value between two pandas columns**

In [20]:
# What if you want to get the minimum value between two columns?

# We can do so by using the pandas min() function twice. 

# For example, let’s get the minimum value considering both “Attempt1” and “Attempt2”.

# min value over two columns

print(df[['Attempt1', 'Attempt2']].min())

print()

print("Minimum values between two columns:-", df[['Attempt1', 'Attempt2']].min().min())

Attempt1    79.79
Attempt2    77.90
dtype: float64

Minimum values between two columns:- 77.9


**Example 7:- Min value in the entire dataframe**

In [24]:
# We can also get the single smallest value in the entire dataframe. 

# For example, let’s get the smallest value in the dataframe df irrespective of the column.

# min value over the entire dataframe

print(df.min(numeric_only=True))

print()

print("Minimum value of entire dataframe:-", df.min(numeric_only=True).min())

Attempt1    79.79
Attempt2    77.90
Attempt3    76.79
Attempt4    82.86
Attempt5    81.98
Attempt6    75.72
dtype: float64

Minimum value of entire dataframe:- 75.72


#### 14. max()

- To get the max value in one or more columns of a pandas dataframe.
<br>
- The max() function is used to get the maximum of the values for the requested axis.
<br>
- Pandas dataframe.max() method finds the maximum of the values in the object and returns it.
<br>
- If the input is a series, the method will return a scalar which will be the maximum of the values in the series.
<br>
- If the input is a dataframe, the method returns a series holding the maximum value
  along the specified axis of the dataframe. 
<br>
- The index axis (axis=0) is the default, so the maximum is computed for each column.
<br>

- If we want the index of the maximum, use idxmax.
<br>

![image.png](attachment:image.png)
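
As a small sketch of max() and idxmax() along the other axis: with axis=1 the maximum is taken across the columns of each row, which here gives each athlete's best throw and the attempt in which it happened. The data is a cut-down version of the javelin table, and the column names `Best` and `BestAttempt` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Neeraj Chopra', 'Jakub Vadlejch'],
    'Attempt1': [87.03, 83.98],
    'Attempt2': [87.58, np.nan],
    'Attempt5': [np.nan, 86.67]
})

# keep only the numeric attempt columns for the row-wise computation
attempts = df[['Attempt1', 'Attempt2', 'Attempt5']]

# axis=1: best distance per athlete (NaN values are skipped)
df['Best'] = attempts.max(axis=1)

# idxmax(axis=1): label of the column holding that best distance
df['BestAttempt'] = attempts.idxmax(axis=1)

print(df[['Name', 'Best', 'BestAttempt']])
```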

In [25]:
import numpy as np
import pandas as pd

# create a pandas dataframe

df = pd.DataFrame({
    'Name': ['Neeraj Chopra', 'Jakub Vadlejch', 'Vitezslav Vesely', 'Julian Weber', 'Arshad Nadeem'],
    'Country': ['India', 'Czech Republic', 'Czech Republic', 'Germany', 'Pakistan'],
    'Attempt1': [87.03, 83.98, 79.79, 85.30, 82.40],
    'Attempt2': [87.58, np.nan, 80.30, 77.90, np.nan],
    'Attempt3': [76.79, np.nan, 85.44, 78.00, 84.62],
    'Attempt4': [np.nan, 82.86, np.nan, 83.10, 82.91],
    'Attempt5': [np.nan, 86.67, 84.98, 85.15, 81.98],
    'Attempt6': [84.24, np.nan, np.nan, 75.72, np.nan]
})

# display the dataframe
df

Unnamed: 0,Name,Country,Attempt1,Attempt2,Attempt3,Attempt4,Attempt5,Attempt6
0,Neeraj Chopra,India,87.03,87.58,76.79,,,84.24
1,Jakub Vadlejch,Czech Republic,83.98,,,82.86,86.67,
2,Vitezslav Vesely,Czech Republic,79.79,80.3,85.44,,84.98,
3,Julian Weber,Germany,85.3,77.9,78.0,83.1,85.15,75.72
4,Arshad Nadeem,Pakistan,82.4,,84.62,82.91,81.98,


**Example 1:- Maximum value in a single pandas column**

In [27]:
# To get the maximum value in a pandas column, use the max() function as follows. 

# For example, let’s get the maximum value achieved in the first attempt.

# max value in Attempt1

print("Maximum value in column 'Attempt1':-", df['Attempt1'].max())

Maximum value in column 'Attempt1':- 87.03


**Example 2:- To get the index corresponding to the max value with the pandas idxmax() function.**

In [28]:
# Note that you can get the index corresponding to the max value with the pandas idxmax() function.

# Let’s get the name of the athlete who threw the longest in the first attempt with this index.

# index corresponding max value

i = df['Attempt1'].idxmax()

print("Index of maximum value in column 'Attempt1':-", i)

print()

# display the name corresponding to this index

print("Name of the athlete who has the maximum value:-", df['Name'][i])

Index of maximum value in column 'Attempt1':- 0

Name of the athlete who has the maximum value:- Neeraj Chopra


**Example 3:- Maximum value in two pandas columns**

In [30]:
# We can also get the max value of multiple pandas columns with the pandas max() function. 

# For example, let’s find the maximum values in “Attempt1” and “Attempt2” respectively.

# get max values in columns "Attempt1" and "Attempt2"

print("Maximum value of multiple columns:-\n\n", df[['Attempt1', 'Attempt2']].max())

Maximum value of multiple columns:-

 Attempt1    87.03
Attempt2    87.58
dtype: float64


**Example 4:- Maximum value for each column in the dataframe**

In [32]:
# Similarly, we can get the max value for each column in the dataframe. 
# Apply the max function over the entire dataframe instead of a single column or a selection of columns.

# get max values in each column of the dataframe

print(df.max())

# We get the maximum values in each column of the dataframe df. 
# Note that we also get max values for text columns based on their string comparisons in python.

Name        Vitezslav Vesely
Country             Pakistan
Attempt1               87.03
Attempt2               87.58
Attempt3               85.44
Attempt4                83.1
Attempt5               86.67
Attempt6               84.24
dtype: object


**Example 5:- Maximum value for only numerical columns**

In [33]:
# If we only want the max values for all the numerical columns in the dataframe, 
# pass numeric_only=True to the max() function.

# get max values of only numerical columns

print(df.max(numeric_only=True))

Attempt1    87.03
Attempt2    87.58
Attempt3    85.44
Attempt4    83.10
Attempt5    86.67
Attempt6    84.24
dtype: float64


**Example 6:- Maximum value between two pandas columns**

In [35]:
# What if we want to get the maximum value between two columns?
# we can do so by using the pandas max() function twice. 

# For example, let’s get the maximum value considering both “Attempt1” and “Attempt2”.

print(df[['Attempt1', 'Attempt2']].max())

print()

print("Maximum values between two columns:-", df[['Attempt1', 'Attempt2']].max().max())

Attempt1    87.03
Attempt2    87.58
dtype: float64

Maximum values between two columns:- 87.58


**Example 7:- Maximum value in the entire dataframe**

In [36]:
# we can also get the single biggest value in the entire dataframe. 

# For example, let’s get the biggest value in the dataframe df irrespective of the column.

print(df.max(numeric_only=True))

print()

print("Maximum value of entire dataframe:-", df.max(numeric_only=True).max())

Attempt1    87.03
Attempt2    87.58
Attempt3    85.44
Attempt4    83.10
Attempt5    86.67
Attempt6    84.24
dtype: float64

Maximum value of entire dataframe:- 87.58
