# Select and Sort Columns in a DataFrame

This example shows how to:

1. select specific columns;
2. sort on specific columns.

In [1]:
from scripts import query
import pandas as pd

In [2]:
# import the tracks table
chinook = query.Database(db="data/chinook.db")
tbl = query.Table(table="tracks", db=chinook)
df_tracks = tbl.toDF()

In [3]:
df_tracks.head()

Unnamed: 0,TrackId,Name,AlbumId,MediaTypeId,GenreId,Composer,Milliseconds,Bytes,UnitPrice
0,1,For Those About To Rock (We Salute You),1,1,1,"Angus Young, Malcolm Young, Brian Johnson",343719,11170334,0.99
1,2,Balls to the Wall,2,2,1,,342562,5510424,0.99
2,3,Fast As a Shark,3,2,1,"F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho...",230619,3990994,0.99
3,4,Restless and Wild,3,2,1,"F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D...",252051,4331779,0.99
4,5,Princess of the Dawn,3,2,1,Deaffy & R.A. Smith-Diesel,375418,6290521,0.99


## Selecting columns

~~~python
df[["col1", "col2", ...]]
~~~

Note that column names are case sensitive, and that passing a list of names (the inner brackets) returns a DataFrame.
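As a minimal sketch of the selection syntax above, the snippet below uses a small made-up DataFrame (the values are hypothetical and not taken from the chinook `tracks` table). It shows a list of names returning a DataFrame, a single name returning a Series, and a wrongly cased name raising a `KeyError`.

```python
import pandas as pd

# Hypothetical toy data, not the chinook tracks table
df = pd.DataFrame({
    "TrackId": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "UnitPrice": [0.99, 0.99, 1.99],
})

# A list of column names returns a DataFrame with only those columns
subset = df[["TrackId", "Name"]]

# A single column name (no inner list) returns a Series
names = df["Name"]

# Column names are case sensitive: "Trackid" does not match "TrackId"
try:
    df[["Trackid"]]
    wrong_case_found = True
except KeyError:
    wrong_case_found = False
```

The columns come back in the order given in the list, so the same syntax can also be used to reorder columns.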

**Example 1.** View only the `TrackId`, `Name`, `Composer`, and `UnitPrice` columns.

In [5]:
df_tracks[["TrackId", "Name", "Composer", "UnitPrice"]].head()

Unnamed: 0,TrackId,Name,Composer,UnitPrice
0,1,For Those About To Rock (We Salute You),"Angus Young, Malcolm Young, Brian Johnson",0.99
1,2,Balls to the Wall,,0.99
2,3,Fast As a Shark,"F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho...",0.99
3,4,Restless and Wild,"F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D...",0.99
4,5,Princess of the Dawn,Deaffy & R.A. Smith-Diesel,0.99


## Sorting on columns

~~~python
DataFrame.sort_values(by, axis=0, ascending=True,
                      inplace=False, kind='quicksort',
                      na_position='last', ignore_index=False, key=None)
~~~
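A few of the parameters in this signature can be illustrated with a small sketch. The data below is hypothetical. It shows that `sort_values` returns a new DataFrame by default (`inplace=False`), that the original index labels travel with the rows unless `ignore_index=True` is passed, and that `key` applies a function to the column before comparing.

```python
import pandas as pd

# Hypothetical toy data, not the chinook tracks table
df = pd.DataFrame({"AlbumId": [3, 1, 2], "Name": ["b", "a", "C"]})

# Returns a sorted copy; df itself is left unchanged
result = df.sort_values(by="AlbumId")

# Same sort, but the index is renumbered 0..n-1
renumbered = df.sort_values(by="AlbumId", ignore_index=True)

# Default string sort puts uppercase before lowercase;
# key lets us compare on a lowercased version instead
default_names = df.sort_values(by="Name")
caseless_names = df.sort_values(by="Name", key=lambda s: s.str.lower())
```

Because the result is a copy, assign it to a variable (or chain `.head()` as in the examples that follow) rather than expecting the original DataFrame to change.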

**Example 2.** Select the `Name`, `Milliseconds`, `Composer` and `AlbumId` columns, and sort on `AlbumId`.

In [15]:
df_sorted = df_tracks[["Name", "Milliseconds", "Composer", "AlbumId"]]

In [9]:
# sort ascending (the default)
df_sorted.sort_values(by="AlbumId").head()

Unnamed: 0,Name,Milliseconds,AlbumId
0,For Those About To Rock (We Salute You),343719,1
13,Spellbound,270863,1
12,Night Of The Long Knives,205688,1
11,Breaking The Rules,263288,1
10,C.O.D.,199836,1


In [10]:
# sort descending
df_sorted.sort_values(by="AlbumId", ascending=False).head()

Unnamed: 0,Name,Milliseconds,AlbumId
3502,Koyaanisqatsi,206005,347
3501,"Quintet for Horn, Violin, 2 Violas, and Cello ...",221331,346
3500,"L'orfeo, Act 3, Sinfonia (Orchestra)",66639,345
3499,"String Quartet No. 12 in C Minor, D. 703 ""Quar...",139200,344
3498,Pini Di Roma (Pinien Von Rom) \ I Pini Della V...,286741,343


**Example 3.** Multisort on `AlbumId` and `Milliseconds`.

In [12]:
# sort both columns ascending
df_sorted.sort_values(by=["AlbumId", "Milliseconds"]).head()

Unnamed: 0,Name,Milliseconds,AlbumId
10,C.O.D.,199836,1
8,Snowballed,203102,1
5,Put The Finger On You,205662,1
12,Night Of The Long Knives,205688,1
7,Inject The Venom,210834,1


In [13]:
# sort AlbumId ascending, then Milliseconds descending
df_sorted.sort_values(by=["AlbumId", "Milliseconds"], ascending=[True, False]).head()

Unnamed: 0,Name,Milliseconds,AlbumId
0,For Those About To Rock (We Salute You),343719,1
13,Spellbound,270863,1
9,Evil Walks,263497,1
11,Breaking The Rules,263288,1
6,Let's Get It Up,233926,1


Note: to sort the columns in different directions, pass `ascending` as a list of booleans with the same length as the `by` list, one entry per column. A single boolean applies to every column in `by`.
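The length rule can be checked with a short sketch on made-up data (the values are hypothetical). A matching list sorts each column in its own direction, while a list of the wrong length raises a `ValueError`.

```python
import pandas as pd

# Hypothetical toy data, not the chinook tracks table
df = pd.DataFrame({
    "AlbumId": [1, 1, 2, 2],
    "Milliseconds": [200, 300, 150, 100],
})

# One boolean per column in `by`: AlbumId ascending, Milliseconds descending
mixed = df.sort_values(by=["AlbumId", "Milliseconds"], ascending=[True, False])

# A list whose length differs from `by` is rejected
try:
    df.sort_values(by=["AlbumId", "Milliseconds"], ascending=[True])
    mismatch_allowed = True
except ValueError:
    mismatch_allowed = False
```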

**Example 4.** Sort on `Composer` with `None` values first.

Note that `na_position="last"` is the default.

In [16]:
df_sorted.sort_values(by="Composer", na_position="first").head()

Unnamed: 0,Name,Milliseconds,Composer,AlbumId
1,Balls to the Wall,342562,,2
62,Desafinado,185338,,8
63,Garota De Ipanema,285048,,8
64,Samba De Uma Nota Só (One Note Samba),137273,,8
65,Por Causa De Você,169900,,8
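To compare the two `na_position` settings side by side, here is a minimal sketch on hypothetical data with one missing composer. The missing value is placed at the end by default and at the start with `na_position="first"`. The non-missing values keep their normal sort order in both cases.

```python
import pandas as pd

# Hypothetical toy data, not the chinook tracks table
df = pd.DataFrame({
    "Name": ["t1", "t2", "t3"],
    "Composer": ["Zed", None, "Abe"],
})

# Default: missing values are placed last
na_last = df.sort_values(by="Composer")

# Missing values are placed first
na_first = df.sort_values(by="Composer", na_position="first")
```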
