# **Guided Lab 343.3.10 - Sorting a Pandas DataFrame**

## **Learning Objective:**

In this lab, you will sort the rows of a Pandas DataFrame by the values in one or more columns using the **`.sort_values()`** method.

By the end of this lab, learners will be able to:
- Explain the concept of sorting data in a Pandas DataFrame.
- Use the `.sort_values()` method to sort a DataFrame by one or more columns.
- Specify the sorting order (ascending or descending) using the ascending parameter.
- Sort a DataFrame by multiple columns with different sorting orders.
- Apply sorting techniques to real-world data scenarios.

# **Example 1**

You can use the **`.sort_values()`** method to sort the values in a DataFrame along either axis (rows or columns). Typically, you sort the rows of a DataFrame by the values of one or more columns:

In [1]:
import pandas as pd

In [7]:
# Define a dictionary containing student data
data = {'Name': ['Jane', 'Princi', 'James', 'Fadi', 'Byers'],
        'Height': [5.1, 6.2, 5.1, 5.2, 5.5],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc', ''],
        'Score 1': [56, 86, 77, 45, None],
        'Score 2': [50, 96, 60, 30, None]}
print("------before -------")
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df)
print("------after adding column -------")
# Add an 'address' column from a list using DataFrame.assign(), which returns a new DataFrame
df = df.assign(address = ['NYC', 'NJ', 'CA', 'PA', ''])
print(df)


------before -------
     Name  Height Qualification  Score 1  Score 2
0    Jane     5.1           Msc     56.0     50.0
1  Princi     6.2            MA     86.0     96.0
2   James     5.1           Msc     77.0     60.0
3    Fadi     5.2           Msc     45.0     30.0
4   Byers     5.5                    NaN      NaN
------after adding column -------
     Name  Height Qualification  Score 1  Score 2 address
0    Jane     5.1           Msc     56.0     50.0     NYC
1  Princi     6.2            MA     86.0     96.0      NJ
2   James     5.1           Msc     77.0     60.0      CA
3    Fadi     5.2           Msc     45.0     30.0      PA
4   Byers     5.5                    NaN      NaN        


In [None]:
print("------after sorting -------")
print(df.sort_values(by='Score 1', ascending=False))

# Use na_position to put NaN values first or last
print("------more sorting -------")
print(df.sort_values(by='Score 1', ascending=False, na_position='first'))


------after sorting -------
     Name  Height Qualification  Score 1  Score 2 address
1  Princi     6.2            MA     86.0     96.0      NJ
2   James     5.1           Msc     77.0     60.0      CA
0    Jane     5.1           Msc     56.0     50.0     NYC
3    Fadi     5.2           Msc     45.0     30.0      PA
4   Byers     5.5                    NaN      NaN        
------more sorting -------
     Name  Height Qualification  Score 1  Score 2 address
4   Byers     5.5                    NaN      NaN        
1  Princi     6.2            MA     86.0     96.0      NJ
2   James     5.1           Msc     77.0     60.0      CA
0    Jane     5.1           Msc     56.0     50.0     NYC
3    Fadi     5.2           Msc     45.0     30.0      PA
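
Both calls above print a sorted copy; `.sort_values()` does not reorder the DataFrame it is called on unless you reassign the result. The sketch below illustrates this on a small stand-in frame (the name `scores` and its three rows are chosen for illustration only). It also shows `ignore_index=True`, which assumes pandas 1.0 or later.

```python
import pandas as pd

# Small stand-in frame for illustration (not the lab's full DataFrame)
scores = pd.DataFrame({'Name': ['Jane', 'Princi', 'James'],
                       'Score 1': [56, 86, 77]})

# sort_values() returns a new, sorted DataFrame ...
sorted_scores = scores.sort_values(by='Score 1', ascending=False)

# ... and leaves the original row order untouched
print(scores['Name'].tolist())         # ['Jane', 'Princi', 'James']
print(sorted_scores['Name'].tolist())  # ['Princi', 'James', 'Jane']

# The original index labels travel with the rows (1, 2, 0 here).
# ignore_index=True renumbers the sorted rows 0..n-1 instead.
renumbered = scores.sort_values(by='Score 1', ascending=False, ignore_index=True)
print(renumbered)
```

To keep the sorted order, assign the result back to a variable, for example `scores = scores.sort_values(by='Score 1')`.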


### **Sort by Two Columns**

To sort by multiple columns, pass lists to `by` and `ascending`, with one `ascending` flag per column in `by`, as shown below.


In [None]:
print(df.sort_values(by=['Score 1', 'Height'], ascending=[False, True]))
print()
# Each flag in `ascending` applies to the matching column in `by`. The second flag only
# affects rows that tie on 'Score 1'; this data has no such ties, so both outputs match.
print(df.sort_values(by=['Score 1', 'Height'], ascending=[False, False]))


     Name  Height Qualification  Score 1  Score 2 address
1  Princi     6.2            MA     86.0     96.0      NJ
2   James     5.1           Msc     77.0     60.0      CA
0    Jane     5.1           Msc     56.0     50.0     NYC
3    Fadi     5.2           Msc     45.0     30.0      PA
4   Byers     5.5                    NaN      NaN        

     Name  Height Qualification  Score 1  Score 2 address
1  Princi     6.2            MA     86.0     96.0      NJ
2   James     5.1           Msc     77.0     60.0      CA
0    Jane     5.1           Msc     56.0     50.0     NYC
3    Fadi     5.2           Msc     45.0     30.0      PA
4   Byers     5.5                    NaN      NaN        
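
The two outputs above are identical because no two rows share a `Score 1` value, so the second flag never comes into play. A minimal sketch of a case where it does matter, assuming a cut-down copy of the student data (the name `heights` is illustrative) sorted by `Height` first, where Jane and James tie at 5.1:

```python
import pandas as pd

# Cut-down copy of the student data; Jane and James share Height 5.1
heights = pd.DataFrame({'Name': ['Jane', 'Princi', 'James', 'Fadi'],
                        'Height': [5.1, 6.2, 5.1, 5.2],
                        'Score 1': [56, 86, 77, 45]})

# Height ascending; rows with equal Height ordered by Score 1 ascending
asc_ties = heights.sort_values(by=['Height', 'Score 1'], ascending=[True, True])

# Height ascending; rows with equal Height ordered by Score 1 descending
desc_ties = heights.sort_values(by=['Height', 'Score 1'], ascending=[True, False])

print(asc_ties['Name'].tolist())   # ['Jane', 'James', 'Fadi', 'Princi']
print(desc_ties['Name'].tolist())  # ['James', 'Jane', 'Fadi', 'Princi']
```

Only the tied pair swaps places between the two results; Fadi and Princi stay where the first sort key puts them.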




---






# **Example 2**

In [19]:
df_cars = pd.read_json('./Data/cars.json')

print("------before -------")

df_cars

------before -------


|  | Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin | quantity | city |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chevrolet Vega | 25.0 | 4 | 140.0 | 75 | 2542 | 17.0 | 74 | US | 177 | NJ |
| 1 | Chevrolet Vega (sw) | 22.0 | 4 | 140.0 | 72 | 2408 | 19.0 | 71 | US | 91 | DALLAS |
| 2 | Chevrolet Vega 2300 | 28.0 | 4 | 140.0 | 90 | 2264 | 15.5 | 71 | US | 74 | TEXAS |
| 3 | Chevrolet Woody | 24.5 | 4 | 98.0 | 60 | 2164 | 22.1 | 76 | US | 241 | OH |
| 4 | Chevrolete Chevelle Malibu | 16.0 | 6 | 250.0 | 105 | 3897 | 18.5 | 75 | US | 206 | NewYork |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 156 | Mercury Capri v6 | 21.0 | 6 | 155.0 | 107 | 2472 | 14.0 | 73 | US | 158 | NewYork |
| 157 | Mercury Cougar Brougham | 15.0 | 8 | 302.0 | 130 | 4295 | 14.9 | 77 | US | 27 | NJ |
| 158 | Mercury Grand Marquis | 16.5 | 8 | 351.0 | 138 | 3955 | 13.2 | 79 | US | 332 | DALLAS |
| 159 | Mercury Lynx l | 36.0 | 4 | 98.0 | 70 | 2125 | 17.3 | 82 | US | 425 | TEXAS |


In [21]:
print("------after sorting column -------")
df_cars.sort_values(by=['quantity'], ascending=[True])

------after sorting column -------


|  | Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin | quantity | city |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 123 | Ford Torino | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | US | 5 | OH |
| 112 | Ford Mustang II 2+2 | 25.5 | 4 | 140.0 | 89 | 2755 | 15.8 | 77 | US | 5 | NJ |
| 6 | Chevy S-10 | 31.0 | 4 | 119.0 | 82 | 2720 | 19.4 | 82 | US | 7 | NewYork |
| 7 | Chrysler Cordoba | 15.5 | 8 | 400.0 | 190 | 4325 | 12.2 | 77 | US | 7 | NJ |
| 121 | Ford Ranger | 28.0 | 4 | 120.0 | 79 | 2625 | 18.6 | 82 | US | 7 | DALLAS |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 128 | Honda Accord | 36.0 | 4 | 107.0 | 75 | 2205 | 14.5 | 82 | Japan | 427 | OH |
| 134 | Honda Civic (auto) | 32.0 | 4 | 91.0 | 67 | 1965 | 15.7 | 82 | Japan | 430 | TEXAS |
| 22 | Datsun 310 GX | 38.0 | 4 | 91.0 | 67 | 1995 | 16.2 | 82 | Japan | 431 | NJ |
| 8 | Chrysler Lebaron Medallion | 26.0 | 4 | 156.0 | 92 | 2585 | 14.5 | 82 | US | 434 | DALLAS |


In [22]:
df_cars.sort_values(by=['quantity','Car'], ascending=[False,True])

|  | Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin | quantity | city |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 110 | Ford Mustang GL | 27.0 | 4 | 140.0 | 86 | 2790 | 15.6 | 82 | US | 439 | OH |
| 8 | Chrysler Lebaron Medallion | 26.0 | 4 | 156.0 | 92 | 2585 | 14.5 | 82 | US | 434 | DALLAS |
| 22 | Datsun 310 GX | 38.0 | 4 | 91.0 | 67 | 1995 | 16.2 | 82 | Japan | 431 | NJ |
| 134 | Honda Civic (auto) | 32.0 | 4 | 91.0 | 67 | 1965 | 15.7 | 82 | Japan | 430 | TEXAS |
| 128 | Honda Accord | 36.0 | 4 | 107.0 | 75 | 2205 | 14.5 | 82 | Japan | 427 | OH |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 121 | Ford Ranger | 28.0 | 4 | 120.0 | 79 | 2625 | 18.6 | 82 | US | 7 | DALLAS |
| 130 | Honda Accord LX | 29.5 | 4 | 98.0 | 68 | 2135 | 16.6 | 78 | Japan | 7 | NJ |
| 147 | Mazda GLC Deluxe | 34.1 | 4 | 86.0 | 65 | 1975 | 15.2 | 79 | Japan | 7 | NJ |
| 112 | Ford Mustang II 2+2 | 25.5 | 4 | 140.0 | 89 | 2755 | 15.8 | 77 | US | 5 | NJ |
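
Real-world text columns such as `city` often mix letter cases (`NJ`, `NewYork`, `DALLAS`), and the default string sort compares characters by code point, so uppercase letters sort before lowercase ones. The sketch below is one way to handle that, assuming pandas 1.1 or later for the `key` parameter. It uses a five-row stand-in built from the first rows shown above (the name `cars_sample` is illustrative) rather than the full `cars.json` file.

```python
import pandas as pd

# Five-row stand-in built from the first rows of the cars table (not the full file)
cars_sample = pd.DataFrame({
    'Car': ['Chevrolet Vega', 'Chevrolet Vega (sw)', 'Chevrolet Vega 2300',
            'Chevrolet Woody', 'Chevrolete Chevelle Malibu'],
    'quantity': [177, 91, 74, 241, 206],
    'city': ['NJ', 'DALLAS', 'TEXAS', 'OH', 'NewYork'],
})

# Default sort is case-sensitive: 'NJ' sorts before 'NewYork' because 'J' < 'e'
default_order = cars_sample.sort_values(by='city')
print(default_order['city'].tolist())
# ['DALLAS', 'NJ', 'NewYork', 'OH', 'TEXAS']

# key receives the column as a Series; lower-casing it makes the comparison case-insensitive
folded_order = cars_sample.sort_values(by='city', key=lambda col: col.str.lower())
print(folded_order['city'].tolist())
# ['DALLAS', 'NewYork', 'NJ', 'OH', 'TEXAS']
```

The `key` function changes only the values used for comparison; the `city` column in the result keeps its original spelling.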


In [23]:
student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set index using column
student_df = student_df.set_index('Name')
print(student_df)


    Name  Age  Marks
0    Joe   20  85.10
1    Nat   21  77.80
2  Harry   19  91.54
       Age  Marks
Name             
Joe     20  85.10
Nat     21  77.80
Harry   19  91.54
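
The cell above moves `Name` into the index but does not sort anything yet. One possible next step, sketched here under the assumption that the goal is to order rows by their index labels, is `.sort_index()`. `.sort_values()` still works on the remaining columns after the index is set.

```python
import pandas as pd

# Rebuild the student frame from the cell above, with 'Name' as the index
student_df = pd.DataFrame({'Name': ['Joe', 'Nat', 'Harry'],
                           'Age': [20, 21, 19],
                           'Marks': [85.10, 77.80, 91.54]}).set_index('Name')

# Sort rows by index label (alphabetical by name)
by_name = student_df.sort_index()
print(by_name.index.tolist())   # ['Harry', 'Joe', 'Nat']

# Sort rows by a column value; the 'Name' index labels travel with their rows
by_marks = student_df.sort_values(by='Marks')
print(by_marks.index.tolist())  # ['Nat', 'Joe', 'Harry']
```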
