## Sorting Data

* Sort by a column: `dataframe.sort_values(by="column_name")`
* Sort by a column in descending order: `dataframe.sort_values(by="column_name", ascending=False)`
* Sort by one column and break ties with a second column, both in descending order: `dataframe.sort_values(by=["column_name_1", "column_name_2"], ascending=False)`
* Sort by one column in descending order and break ties with a second column in ascending order: `dataframe.sort_values(by=["column_name_1", "column_name_2"], ascending=[False, True])`
* `sort_values` returns a sorted copy by default; to change the original DataFrame, pass `inplace=True`.
* To restore the original index order: `dataframe.sort_index()`
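
The difference between getting a sorted copy and sorting in place can be sketched as follows. This is a minimal illustration on a small, made-up DataFrame (the variable names `sorted_df`, `result`, and `restored` are assumptions for the example, not part of the notebook below).

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration.
df = pd.DataFrame({
    "first": ["Ali", "Veli", "Cem"],
    "last": ["Bir", "Yedi", "Sekiz"],
})

# Without inplace=True, sort_values returns a new sorted DataFrame
# and leaves df in its original row order.
sorted_df = df.sort_values(by="last", ascending=False)
order_in_copy = sorted_df.index.tolist()   # index labels travel with their rows
order_in_original = df.index.tolist()      # unchanged

# With inplace=True, df itself is reordered and the call returns None.
result = df.sort_values(by="last", ascending=False, inplace=True)
order_after_inplace = df.index.tolist()

# sort_index() orders rows by index label, which restores the original order here.
restored = df.sort_index()

print(order_in_copy, order_in_original, order_after_inplace, restored.index.tolist())
```

Because each row keeps its index label through the sort, `sort_index()` can undo the reordering as long as the index has not been reset.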

In [4]:
import pandas as pd

In [12]:
people = {
    "first": ["Ali", "Veli", "Cem", "David", "Elif", "Fahri"],
    "last": ["Bir", "Yedi", "Sekiz", "Yedi", "Bir", "Sekiz"],
    "email": ["birali@pg.com", "yediveli@pg.com", "sekizcem@pg.com",
              "ydidavid@pg.com", "birelif@pg.com", "sekizfahri@pg.com"],
}

In [13]:
df = pd.DataFrame(people)
df

Unnamed: 0,first,last,email
0,Ali,Bir,birali@pg.com
1,Veli,Yedi,yediveli@pg.com
2,Cem,Sekiz,sekizcem@pg.com
3,David,Yedi,ydidavid@pg.com
4,Elif,Bir,birelif@pg.com
5,Fahri,Sekiz,sekizfahri@pg.com


In [14]:
# sort by a column: `dataframe.sort_values(by="column name")`

df.sort_values(by="last")

Unnamed: 0,first,last,email
0,Ali,Bir,birali@pg.com
4,Elif,Bir,birelif@pg.com
2,Cem,Sekiz,sekizcem@pg.com
5,Fahri,Sekiz,sekizfahri@pg.com
1,Veli,Yedi,yediveli@pg.com
3,David,Yedi,ydidavid@pg.com


In [15]:
# sort by a column in descending order: `dataframe.sort_values(by="column_name", ascending=False)`

df.sort_values(by="last", ascending=False)

Unnamed: 0,first,last,email
1,Veli,Yedi,yediveli@pg.com
3,David,Yedi,ydidavid@pg.com
2,Cem,Sekiz,sekizcem@pg.com
5,Fahri,Sekiz,sekizfahri@pg.com
0,Ali,Bir,birali@pg.com
4,Elif,Bir,birelif@pg.com


In [16]:
# sort by one column and break ties with a second column, both in descending order:
# `dataframe.sort_values(by=["column_name_1", "column_name_2"], ascending=False)`

df.sort_values(by=["last", "first"], ascending=False)

Unnamed: 0,first,last,email
1,Veli,Yedi,yediveli@pg.com
3,David,Yedi,ydidavid@pg.com
5,Fahri,Sekiz,sekizfahri@pg.com
2,Cem,Sekiz,sekizcem@pg.com
4,Elif,Bir,birelif@pg.com
0,Ali,Bir,birali@pg.com


In [17]:
# sort by one column in descending order and break ties with a second column in ascending order:
# `dataframe.sort_values(by=["column_name_1", "column_name_2"], ascending=[False, True])`

df.sort_values(by=["last", "first"], ascending=[False, True])

Unnamed: 0,first,last,email
3,David,Yedi,ydidavid@pg.com
1,Veli,Yedi,yediveli@pg.com
2,Cem,Sekiz,sekizcem@pg.com
5,Fahri,Sekiz,sekizfahri@pg.com
0,Ali,Bir,birali@pg.com
4,Elif,Bir,birelif@pg.com


In [18]:
# to change the original DataFrame instead of returning a sorted copy, pass `inplace=True`

df.sort_values(by=["last", "first"], ascending=[False, True], inplace=True)
df

Unnamed: 0,first,last,email
3,David,Yedi,ydidavid@pg.com
1,Veli,Yedi,yediveli@pg.com
2,Cem,Sekiz,sekizcem@pg.com
5,Fahri,Sekiz,sekizfahri@pg.com
0,Ali,Bir,birali@pg.com
4,Elif,Bir,birelif@pg.com


In [20]:
# to restore the original index order: `dataframe.sort_index()`
df.sort_index(inplace=True)
df

Unnamed: 0,first,last,email
0,Ali,Bir,birali@pg.com
1,Veli,Yedi,yediveli@pg.com
2,Cem,Sekiz,sekizcem@pg.com
3,David,Yedi,ydidavid@pg.com
4,Elif,Bir,birelif@pg.com
5,Fahri,Sekiz,sekizfahri@pg.com


## The largest and smallest values in a DataFrame

* `series.nlargest(10)`: returns the 10 largest values in the series.
* `dataframe.nlargest(10, "column_name")`: returns all columns for the rows holding the 10 largest values of the specified column.
* `nsmallest(n)` works the same way on both a Series and a DataFrame, returning the smallest values instead.
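
As a rough sketch of how the Series and DataFrame variants relate, the example below uses a small hand-made DataFrame (the values and the names `salaries`, `top2_series`, `top2_rows`, `bottom2_rows`, and `via_sort` are assumptions for illustration). With no tied values, `nlargest` gives the same rows as sorting in descending order and taking the head.

```python
import pandas as pd

# Hypothetical data, used only for this illustration.
salaries = pd.DataFrame({
    "Age": [18, 19, 20, 21, 22],
    "Python": [20046, 17100, 20000, 24744, 30500],
})

# Series version: only the values of one column, largest first.
top2_series = salaries["Python"].nlargest(2)

# DataFrame version: full rows, ordered by the named column.
top2_rows = salaries.nlargest(2, "Python")
bottom2_rows = salaries.nsmallest(2, "Python")

# Same rows obtained by sorting and taking the first two.
via_sort = salaries.sort_values(by="Python", ascending=False).head(2)

print(top2_series)
print(top2_rows)
print(bottom2_rows)
```

In both variants the original index labels are kept, so the result shows which rows the values came from.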

In [24]:
dev_salaries = pd.read_csv("Data/dev_salaries.csv")  
dev_salaries.head()

Unnamed: 0,Age,All_Devs,Python,JavaScript
0,18,17784,20046,16446
1,19,16500,17100,16791
2,20,18012,20000,18942
3,21,20628,24744,21780
4,22,25206,30500,25704


In [25]:
# `series.nlargest(n)`, here with n=8

dev_salaries["Python"].nlargest(8)

36    122870
37    120000
35    112542
30    112285
33    108423
32    104708
29    102736
34    101407
Name: Python, dtype: int64

In [27]:
# `dataframe.nsmallest(n, "column_name")`, here with n=8

dev_salaries.nsmallest(8, 'Python')

Unnamed: 0,Age,All_Devs,Python,JavaScript
1,19,16500,17100,16791
2,20,18012,20000,18942
0,18,17784,20046,16446
3,21,20628,24744,21780
4,22,25206,30500,25704
5,23,30252,37732,29000
6,24,34368,41247,34372
11,29,53200,45000,53437


Source:
* [Corey Schafer - Python Pandas Tutorial](https://www.youtube.com/watch?v=ZyhVh-qRZPA&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&index=1)