# The DataFrame object

The pandas <code>DataFrame</code> is a two-dimensional table of data with rows and columns.

The <code>DataFrame</code> is two-dimensional because it requires two points of reference—a row and a column—to isolate a value from the data set.

## Creating a DataFrame from a dictionary

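The next example passes a dictionary of string keys and list values to the <code>DataFrame</code> constructor. Each key becomes a column name, and each list supplies that column’s values. Pandas returns a <code>DataFrame</code> with three columns.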

In [1]:
import pandas as pd
import numpy as np

city_data = {
    "City": ["New York City", "Paris", "Barcelona", "Rome"],
    "Country": ["United States", "France", "Spain", "Italy"],
    "Population": [8600000, 2141000, 5515000, 2873000]
}
cities = pd.DataFrame(city_data)
cities

Unnamed: 0,City,Country,Population
0,New York City,United States,8600000
1,Paris,France,2141000
2,Barcelona,Spain,5515000
3,Rome,Italy,2873000


## Creating a DataFrame from a NumPy ndarray

The <code>DataFrame</code> constructor’s <code>data</code> parameter also accepts a NumPy <code>ndarray</code>.

We can generate an <code>ndarray</code> of any size with the <code>randint</code> function in NumPy’s <code>random</code> module.

The next example creates a 3 x 5 <code>ndarray</code> of random integers between 1 and 100. The upper bound of 101 passed to <code>randint</code> is exclusive:

In [2]:
random_data = np.random.randint(1, 101, [3, 5])
random_data

array([[ 85,  68,  41,  58,  64],
       [ 40,  61,  35,  75,   2],
       [ 20,  99,  69, 100,  36]])

Let’s pass our <code>ndarray</code> into the <code>DataFrame</code> constructor.

The <code>ndarray</code> has neither row labels nor column labels. Thus, pandas uses a numeric index for both the row axis and column axis:

In [3]:
pd.DataFrame(data = random_data)

Unnamed: 0,0,1,2,3,4
0,85,68,41,58,64
1,40,61,35,75,2
2,20,99,69,100,36
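To make the "numeric index for both axes" concrete, here is a minimal sketch. It swaps the random array for a deterministic one built with <code>np.arange</code> (an assumption made only so the result is reproducible) and inspects the labels pandas generates:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random 3 x 5 array used above.
values = np.arange(15).reshape(3, 5)
frame = pd.DataFrame(data=values)

# With no labels supplied, both axes receive a RangeIndex starting at 0.
print(frame.index)    # RangeIndex(start=0, stop=3, step=1)
print(frame.columns)  # RangeIndex(start=0, stop=5, step=1)
```

The row axis counts the three rows and the column axis counts the five columns, which matches the headers in the output above.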


We can manually set the row labels with the <code>DataFrame</code> constructor’s <code>index</code> parameter, which accepts any iterable object, including a list, tuple, or <code>ndarray</code>.

In [4]:
row_labels = ["Morning", "Afternoon", "Evening"]
temperatures = pd.DataFrame(data = random_data, index = row_labels)
temperatures

Unnamed: 0,0,1,2,3,4
Morning,85,68,41,58,64
Afternoon,40,61,35,75,2
Evening,20,99,69,100,36


We can set the column names with the constructor’s <code>columns</code> parameter. The <code>ndarray</code> includes five columns, so we must pass an iterable with five items.

In [9]:
row_labels = ["Morning", "Afternoon", "Evening"]

column_labels = ("Monday","Tuesday","Wednesday","Thursday","Friday")

pd.DataFrame(
    data = random_data,
    index = row_labels,
    columns = column_labels,
)

Unnamed: 0,Monday,Tuesday,Wednesday,Thursday,Friday
Morning,85,68,41,58,64
Afternoon,40,61,35,75,2
Evening,20,99,69,100,36



## National Basketball Association (NBA) data

In [10]:
pd.read_csv("../data/nba.csv")

Unnamed: 0,Name,Team,Position,Birthday,Salary
0,Shake Milton,Philadelphia 76ers,SG,9/26/96,1445697
1,Christian Wood,Detroit Pistons,PF,9/27/95,1645357
2,PJ Washington,Charlotte Hornets,PF,8/23/98,3831840
3,Derrick Rose,Detroit Pistons,PG,10/4/88,7317074
4,Marial Shayok,Philadelphia 76ers,G,7/26/95,79568
...,...,...,...,...,...
445,Austin Rivers,Houston Rockets,PG,8/1/92,2174310
446,Harry Giles,Sacramento Kings,PF,4/22/98,2578800
447,Robin Lopez,Milwaukee Bucks,C,4/1/88,4767000
448,Collin Sexton,Cleveland Cavaliers,PG,1/4/99,4764960


Pandas imports the Birthday column values as strings rather than as datetimes, limiting the number of operations we can perform on them. We can use the <code>parse_dates</code> parameter to coerce the values into datetimes:

In [15]:
nba = pd.read_csv("../data/nba.csv", parse_dates = ["Birthday"])
nba

Unnamed: 0,Name,Team,Position,Birthday,Salary
0,Shake Milton,Philadelphia 76ers,SG,1996-09-26,1445697
1,Christian Wood,Detroit Pistons,PF,1995-09-27,1645357
2,PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
3,Derrick Rose,Detroit Pistons,PG,1988-10-04,7317074
4,Marial Shayok,Philadelphia 76ers,G,1995-07-26,79568
...,...,...,...,...,...
445,Austin Rivers,Houston Rockets,PG,1992-08-01,2174310
446,Harry Giles,Sacramento Kings,PF,1998-04-22,2578800
447,Robin Lopez,Milwaukee Bucks,C,1988-04-01,4767000
448,Collin Sexton,Cleveland Cavaliers,PG,1999-01-04,4764960
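The effect of <code>parse_dates</code> can be checked without the CSV file. The sketch below is an illustration under assumptions: it copies three rows from the output above into an in-memory string and reads it through <code>io.StringIO</code> in place of <code>nba.csv</code>:

```python
import io

import pandas as pd

# Three rows copied from the output above, held in memory so that the
# sketch does not depend on the nba.csv file.
csv_text = """Name,Team,Position,Birthday,Salary
Shake Milton,Philadelphia 76ers,SG,9/26/96,1445697
Christian Wood,Detroit Pistons,PF,9/27/95,1645357
PJ Washington,Charlotte Hornets,PF,8/23/98,3831840
"""

as_text = pd.read_csv(io.StringIO(csv_text))
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Birthday"])

# Without parse_dates, the Birthday column holds plain strings.
print(pd.api.types.is_datetime64_any_dtype(as_text["Birthday"]))   # False
print(pd.api.types.is_datetime64_any_dtype(as_dates["Birthday"]))  # True

# Datetime values unlock the .dt accessor, for example the birth year.
print(as_dates["Birthday"].dt.year.tolist())  # [1996, 1995, 1998]
```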


A <code>DataFrame</code> can hold heterogeneous data. 

*Heterogeneous* means mixed or varied. One column can hold integers, and another can hold strings. 

The <code>dtypes</code> attribute returns a <code>Series</code> that pairs each column name with its data type:

In [16]:
nba.dtypes

Name                object
Team                object
Position            object
Birthday    datetime64[ns]
Salary               int64
dtype: object

The <code>object</code> data type is pandas’ lingo for complex objects including strings.

A <code>DataFrame</code> consists of several smaller objects: an index that holds the row labels, an index that holds the column labels, and a data container that holds the values. The <code>index</code> attribute exposes the index of the <code>DataFrame</code>:

In [17]:
nba.index

RangeIndex(start=0, stop=450, step=1)

Pandas uses a separate index object to store a <code>DataFrame</code>’s columns. We can access it via the <code>columns</code> attribute:

In [18]:
nba.columns

Index(['Name', 'Team', 'Position', 'Birthday', 'Salary'], dtype='object')

The <code>shape</code> attribute returns the <code>DataFrame</code>’s dimensions in a tuple.
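As a small illustration of <code>shape</code>, the sketch below builds a four-row, three-column sample from values shown earlier (the <code>sample</code> variable is a stand-in, not <code>nba</code> itself). Based on the index and columns printed above, <code>nba.shape</code> would be expected to report 450 rows and 5 columns.

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Shake Milton", "Christian Wood", "PJ Washington", "Derrick Rose"],
    "Team": ["Philadelphia 76ers", "Detroit Pistons", "Charlotte Hornets", "Detroit Pistons"],
    "Salary": [1445697, 1645357, 3831840, 7317074],
})

# shape is a tuple of (number of rows, number of columns).
rows, columns = sample.shape
print(sample.shape)  # (4, 3)
```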

The <code>sample</code> method extracts random rows from the <code>DataFrame</code>. Its first parameter specifies the number of rows:

In [21]:
nba.sample(4)

Unnamed: 0,Name,Team,Position,Birthday,Salary
58,Marc Gasol,Toronto Raptors,C,1985-01-29,25595700
416,Vincent Poirier,Boston Celtics,C,1993-10-17,2505793
41,DaQuan Jeffries,Sacramento Kings,SG,1997-08-30,898310
116,Duncan Robinson,Miami Heat,PF,1994-04-22,1416852


Suppose that we want to find out how many teams, salaries, and positions exist in this data set. We invoke the <code>nunique</code> method to count the number of unique values per column:

In [22]:
nba.nunique()

Name        450
Team         30
Position      9
Birthday    430
Salary      269
dtype: int64

What if we want to identify multiple max values, such as the four highest-paid players in the data set? 

The <code>nlargest</code> method retrieves a subset of rows in which a given column has the largest values in the <code>DataFrame</code>.

We pass the number of rows to extract to its <code>n</code> parameter and the column to use for sorting to its <code>columns</code> parameter.

In [23]:
nba.nlargest(n = 4, columns = "Salary")

Unnamed: 0,Name,Team,Position,Birthday,Salary
205,Stephen Curry,Golden State Warriors,PG,1988-03-14,40231758
38,Chris Paul,Oklahoma City Thunder,PG,1985-05-06,38506482
219,Russell Westbrook,Houston Rockets,PG,1988-11-12,38506482
251,John Wall,Washington Wizards,PG,1990-09-06,38199000


Now we want to find the three oldest players in the league.

We can accomplish this task by getting the three earliest dates in the Birthday column.

The <code>nsmallest</code> method can help us; it returns a subset of rows in which a given column has the smallest values in the data set.

The smallest datetime values are those that occur earliest in chronological order.

Note that the <code>nlargest</code> and <code>nsmallest</code> methods can be invoked only on numeric or datetime columns:

In [24]:
nba.nsmallest(n = 3, columns = ["Birthday"])

Unnamed: 0,Name,Team,Position,Birthday,Salary
98,Vince Carter,Atlanta Hawks,PF,1977-01-26,2564753
196,Udonis Haslem,Miami Heat,C,1980-06-09,2564753
262,Kyle Korver,Milwaukee Bucks,PF,1981-03-17,6004753
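One way to build intuition for <code>nsmallest</code> is to compare it with sorting the whole <code>DataFrame</code> and keeping the top rows. The sketch below assumes a four-row sample with no tied birthdays; <code>sort_values</code> is covered in the next section.

```python
import pandas as pd

players = pd.DataFrame({
    "Name": ["Shake Milton", "Christian Wood", "PJ Washington", "Derrick Rose"],
    "Birthday": pd.to_datetime(
        ["1996-09-26", "1995-09-27", "1998-08-23", "1988-10-04"]
    ),
    "Salary": [1445697, 1645357, 3831840, 7317074],
})

# The two earliest birthdays belong to the two oldest players.
oldest = players.nsmallest(n=2, columns="Birthday")
print(oldest["Name"].tolist())  # ['Derrick Rose', 'Christian Wood']

# Sorting in ascending order and keeping the first two rows picks the same players.
sorted_head = players.sort_values("Birthday").head(2)
print(sorted_head["Name"].tolist())  # ['Derrick Rose', 'Christian Wood']
```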


## Sorting a DataFrame

### Sorting by a single column

In [26]:
# The two lines below are equivalent
nba.sort_values("Name").head()
nba.sort_values(by = "Name").head()

Unnamed: 0,Name,Team,Position,Birthday,Salary
52,Aaron Gordon,Orlando Magic,PF,1995-09-16,19863636
101,Aaron Holiday,Indiana Pacers,PG,1996-09-30,2239200
437,Abdel Nader,Oklahoma City Thunder,SF,1993-09-25,1618520
81,Adam Mokoka,Chicago Bulls,G,1998-07-18,79568
399,Admiral Schofield,Washington Wizards,SF,1997-03-30,1000000


In [27]:
nba.sort_values("Name", ascending = False).head()

Unnamed: 0,Name,Team,Position,Birthday,Salary
248,Zylan Cheatham,New Orleans Pelicans,SF,1995-11-17,79568
137,Zion Williamson,New Orleans Pelicans,F,2000-07-06,9757440
312,Zhaire Smith,Philadelphia 76ers,SG,1999-06-04,3058800
302,Zach Norvell,Los Angeles Lakers,SG,1997-12-09,79568
159,Zach LaVine,Chicago Bulls,PG,1995-03-10,19500000


In [28]:
nba.sort_values("Birthday", ascending = False).head()

Unnamed: 0,Name,Team,Position,Birthday,Salary
136,Sekou Doumbouya,Detroit Pistons,SF,2000-12-23,3285120
432,Talen Horton-Tucker,Los Angeles Lakers,GF,2000-11-25,898310
137,Zion Williamson,New Orleans Pelicans,F,2000-07-06,9757440
313,RJ Barrett,New York Knicks,SG,2000-06-14,7839960
392,Jalen Lecque,Phoenix Suns,G,2000-06-13,898310


### Sorting by multiple columns

In [29]:
nba.sort_values(by = ["Team", "Name"])

Unnamed: 0,Name,Team,Position,Birthday,Salary
359,Alex Len,Atlanta Hawks,C,1993-06-16,4160000
167,Allen Crabbe,Atlanta Hawks,SG,1992-04-09,18500000
276,Brandon Goodwin,Atlanta Hawks,PG,1995-10-02,79568
438,Bruno Fernando,Atlanta Hawks,C,1998-08-15,1400000
194,Cam Reddish,Atlanta Hawks,SF,1999-09-01,4245720
...,...,...,...,...,...
418,Jordan McRae,Washington Wizards,PG,1991-03-28,1645357
273,Justin Robinson,Washington Wizards,PG,1997-10-12,898310
428,Moritz Wagner,Washington Wizards,C,1997-04-26,2063520
21,Rui Hachimura,Washington Wizards,PF,1998-02-08,4469160


We can sort each column in a different order. We might want to sort the teams in ascending order and then sort the salaries within those teams in descending order, for example.

In [30]:
nba.sort_values(by = ["Team", "Salary"], ascending = [True, False])

Unnamed: 0,Name,Team,Position,Birthday,Salary
111,Chandler Parsons,Atlanta Hawks,SF,1988-10-25,25102512
28,Evan Turner,Atlanta Hawks,PG,1988-10-27,18606556
167,Allen Crabbe,Atlanta Hawks,SG,1992-04-09,18500000
213,De'Andre Hunter,Atlanta Hawks,SF,1997-12-02,7068360
339,Jabari Parker,Atlanta Hawks,PF,1995-03-15,6500000
...,...,...,...,...,...
80,Isaac Bonga,Washington Wizards,PG,1999-11-08,1416852
399,Admiral Schofield,Washington Wizards,SF,1997-03-30,1000000
273,Justin Robinson,Washington Wizards,PG,1997-10-12,898310
283,Garrison Mathews,Washington Wizards,SG,1996-10-24,79568


### Sorting by column index

A <code>DataFrame</code> is a two-dimensional data structure, so we can also sort its other axis: the column labels.

To sort the <code>DataFrame</code> columns in order, we rely on the <code>sort_index</code> method.

We add an <code>axis</code> parameter and pass it an argument of <code>"columns"</code> or <code>1</code>. The next example sorts the columns in ascending order:

In [31]:
nba.sort_index(axis = 1).head()

Unnamed: 0,Birthday,Name,Position,Salary,Team
0,1996-09-26,Shake Milton,SG,1445697,Philadelphia 76ers
1,1995-09-27,Christian Wood,PF,1645357,Detroit Pistons
2,1998-08-23,PJ Washington,PF,3831840,Charlotte Hornets
3,1988-10-04,Derrick Rose,PG,7317074,Detroit Pistons
4,1995-07-26,Marial Shayok,G,79568,Philadelphia 76ers
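The two spellings of the axis argument, <code>1</code> and <code>"columns"</code>, can be compared directly. This is a minimal sketch on a two-row stand-in <code>DataFrame</code> built from values shown above:

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Shake Milton", "Christian Wood"],
    "Team": ["Philadelphia 76ers", "Detroit Pistons"],
    "Position": ["SG", "PF"],
    "Salary": [1445697, 1645357],
})

by_number = sample.sort_index(axis=1)
by_name = sample.sort_index(axis="columns")

# Both calls reorder the column labels alphabetically.
print(by_number.columns.tolist())  # ['Name', 'Position', 'Salary', 'Team']
print(by_name.columns.tolist() == by_number.columns.tolist())  # True
```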


In [33]:
nba.sort_index(axis = 1, ascending = False).head()

Unnamed: 0,Team,Salary,Position,Name,Birthday
0,Philadelphia 76ers,1445697,SG,Shake Milton,1996-09-26
1,Detroit Pistons,1645357,PF,Christian Wood,1995-09-27
2,Charlotte Hornets,3831840,PF,PJ Washington,1998-08-23
3,Detroit Pistons,7317074,PG,Derrick Rose,1988-10-04
4,Philadelphia 76ers,79568,G,Marial Shayok,1995-07-26


## Setting a new index

The <code>set_index</code> method returns a new <code>DataFrame</code> with a given column set as the index. Its first parameter, <code>keys</code>, accepts the column name as a string:

In [36]:
# set_index returns a new DataFrame; inplace = True modifies nba itself
nba.set_index(keys = "Name")
nba.set_index("Name", inplace = True)
nba

Name,Team,Position,Birthday,Salary
Shake Milton,Philadelphia 76ers,SG,1996-09-26,1445697
Christian Wood,Detroit Pistons,PF,1995-09-27,1645357
PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
Derrick Rose,Detroit Pistons,PG,1988-10-04,7317074
Marial Shayok,Philadelphia 76ers,G,1995-07-26,79568
...,...,...,...,...
Austin Rivers,Houston Rockets,PG,1992-08-01,2174310
Harry Giles,Sacramento Kings,PF,1998-04-22,2578800
Robin Lopez,Milwaukee Bucks,C,1988-04-01,4767000
Collin Sexton,Cleveland Cavaliers,PG,1999-01-04,4764960
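The difference between the returned copy and the original is easy to miss. The sketch below, which uses a two-row stand-in for <code>nba</code>, shows that <code>set_index</code> leaves the original untouched unless we reassign the variable (or pass <code>inplace = True</code>):

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Shake Milton", "Christian Wood"],
    "Team": ["Philadelphia 76ers", "Detroit Pistons"],
})

# set_index returns a new DataFrame with Name as the index.
indexed = sample.set_index("Name")
print(indexed.index.name)       # Name
print(sample.columns.tolist())  # ['Name', 'Team'] (original unchanged)

# Reassigning the variable keeps the change without inplace = True.
sample = sample.set_index("Name")
print(sample.columns.tolist())  # ['Team']
```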


## Selecting columns and rows from a DataFrame
### Selecting a single column from a DataFrame

In [37]:
nba.Salary

Name
Shake Milton       1445697
Christian Wood     1645357
PJ Washington      3831840
Derrick Rose       7317074
Marial Shayok        79568
                    ...   
Austin Rivers      2174310
Harry Giles        2578800
Robin Lopez        4767000
Collin Sexton      4764960
Ricky Rubio       16200000
Name: Salary, Length: 450, dtype: int64

We can also extract a column by passing its name in square brackets after the <code>DataFrame</code>:

In [38]:
nba["Position"]

Name
Shake Milton      SG
Christian Wood    PF
PJ Washington     PF
Derrick Rose      PG
Marial Shayok      G
                  ..
Austin Rivers     PG
Harry Giles       PF
Robin Lopez        C
Collin Sexton     PG
Ricky Rubio       PG
Name: Position, Length: 450, dtype: object

The advantage of the square-bracket syntax is that it supports column names with spaces.
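A short sketch of that difference, using a hypothetical two-row <code>DataFrame</code> whose second column name contains spaces:

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": ["Shake Milton", "Christian Wood"],
    "Date of Birth": ["1996-09-26", "1995-09-27"],
})

# Attribute syntax works for names that are valid Python identifiers.
print(sample.Name.tolist())  # ['Shake Milton', 'Christian Wood']

# A name containing spaces needs square brackets;
# writing sample.Date of Birth would be a SyntaxError.
print(sample["Date of Birth"].tolist())  # ['1996-09-26', '1995-09-27']
```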

### Selecting multiple columns from a DataFrame

To extract multiple <code>DataFrame</code> columns, declare a pair of opening and closing square brackets; then pass the column names in a list. 

The result will be a new <code>DataFrame</code> whose columns are in the same order as the list elements.

The next example targets the Salary and Birthday columns:

In [39]:
nba[["Salary", "Birthday"]]

Name,Salary,Birthday
Shake Milton,1445697,1996-09-26
Christian Wood,1645357,1995-09-27
PJ Washington,3831840,1998-08-23
Derrick Rose,7317074,1988-10-04
Marial Shayok,79568,1995-07-26
...,...,...
Austin Rivers,2174310,1992-08-01
Harry Giles,2578800,1998-04-22
Robin Lopez,4767000,1988-04-01
Collin Sexton,4764960,1999-01-04


We can use the <code>select_dtypes</code> method to select columns based on their data types.

The method accepts two parameters, <code>include</code> and <code>exclude</code>.

The parameters accept a single string or a list, representing the column type(s) that pandas should keep or discard.

The next example selects only string columns from <code>nba</code>:

In [40]:
nba.select_dtypes(include = "object")

Name,Team,Position
Shake Milton,Philadelphia 76ers,SG
Christian Wood,Detroit Pistons,PF
PJ Washington,Charlotte Hornets,PF
Derrick Rose,Detroit Pistons,PG
Marial Shayok,Philadelphia 76ers,G
...,...,...
Austin Rivers,Houston Rockets,PG
Harry Giles,Sacramento Kings,PF
Robin Lopez,Milwaukee Bucks,C
Collin Sexton,Cleveland Cavaliers,PG


The next example selects all columns except string and integer columns:

In [41]:
nba.select_dtypes(exclude = ["object", "int"])

Name,Birthday
Shake Milton,1996-09-26
Christian Wood,1995-09-27
PJ Washington,1998-08-23
Derrick Rose,1988-10-04
Marial Shayok,1995-07-26
...,...
Austin Rivers,1992-08-01
Harry Giles,1998-04-22
Robin Lopez,1988-04-01
Collin Sexton,1999-01-04


## Selecting rows from a DataFrame
### Extracting rows by index label

The <code>loc</code> attribute extracts a row by label. 
 
Type a pair of square brackets immediately after <code>loc</code> and pass in the target index label. 
    
The next example extracts the nba row with an index label of "<code>LeBron James</code>".

In [42]:
nba.loc["LeBron James"]

Team         Los Angeles Lakers
Position                     PF
Birthday    1984-12-30 00:00:00
Salary                 37436858
Name: LeBron James, dtype: object

We can pass a list in between the square brackets to extract multiple rows. 

When the result set includes multiple records, pandas stores the results in a <code>DataFrame</code>:

In [43]:
nba.loc[["Kawhi Leonard", "Paul George"]]

Name,Team,Position,Birthday,Salary
Kawhi Leonard,Los Angeles Clippers,SF,1991-06-29,32742000
Paul George,Los Angeles Clippers,SF,1990-05-02,33005556
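The return type depends on whether we pass a label or a list of labels. A minimal sketch, using a two-row stand-in built from the rows above:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Team": ["Los Angeles Clippers", "Los Angeles Clippers"],
        "Position": ["SF", "SF"],
    },
    index=["Kawhi Leonard", "Paul George"],
)

one_row = sample.loc["Kawhi Leonard"]     # single label -> Series
as_frame = sample.loc[["Kawhi Leonard"]]  # one-item list -> DataFrame

print(type(one_row).__name__)   # Series
print(type(as_frame).__name__)  # DataFrame
print(as_frame.shape)           # (1, 2)
```

Passing a one-item list is a convenient way to keep a single row in <code>DataFrame</code> form.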


We can use <code>loc</code> to extract a sequence of index labels. 

The syntax resembles Python’s list slicing syntax: we provide the starting value, a colon, and the ending value. Unlike a list slice, a <code>loc</code> slice includes the ending label.

Let’s say we wanted to target all players between Otto Porter and Patrick Beverley.

We can sort the <code>DataFrame</code> index to get the player names in alphabetical order and then provide the two player names to the <code>loc</code> accessor. "<code>Otto Porter</code>" represents our lower bound, and "<code>Patrick Beverley</code>" represents the upper bound:

In [44]:
nba.sort_index().loc["Otto Porter":"Patrick Beverley"]

Name,Team,Position,Birthday,Salary
Otto Porter,Chicago Bulls,SF,1993-06-03,27250576
PJ Dozier,Denver Nuggets,PG,1996-10-25,79568
PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
Pascal Siakam,Toronto Raptors,PF,1994-04-02,2351838
Pat Connaughton,Milwaukee Bucks,SG,1993-01-06,1723050
Patrick Beverley,Los Angeles Clippers,PG,1988-07-12,12345680
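Notice that the output includes the Patrick Beverley row: a <code>loc</code> slice keeps its end label, whereas a Python list slice drops its end position. The sketch below contrasts the two on a four-row stand-in built from the output above:

```python
import pandas as pd

names = ["Otto Porter", "PJ Dozier", "PJ Washington", "Pascal Siakam"]
salaries = pd.DataFrame(
    {"Salary": [27250576, 79568, 3831840, 2351838]},
    index=names,
)

# Python list slice: the end position is excluded.
print(names[0:2])  # ['Otto Porter', 'PJ Dozier']

# loc label slice: the end label is included.
selected = salaries.loc["Otto Porter":"PJ Washington"]
print(selected.index.tolist())
# ['Otto Porter', 'PJ Dozier', 'PJ Washington']
```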


We can use <code>loc</code> to pull rows from the middle of the DataFrame to its end. 

Pass the square brackets the starting index label and a colon:

In [45]:
nba.sort_index().loc["Zach Collins":]

Name,Team,Position,Birthday,Salary
Zach Collins,Portland Trail Blazers,C,1997-11-19,4240200
Zach LaVine,Chicago Bulls,PG,1995-03-10,19500000
Zach Norvell,Los Angeles Lakers,SG,1997-12-09,79568
Zhaire Smith,Philadelphia 76ers,SG,1999-06-04,3058800
Zion Williamson,New Orleans Pelicans,F,2000-07-06,9757440
Zylan Cheatham,New Orleans Pelicans,SF,1995-11-17,79568


Turning in the other direction, we can use <code>loc</code> slicing to pull rows from the beginning of the <code>DataFrame</code> to a specific index label. 

Start with a colon and then enter the index label to extract to.

In [46]:
nba.sort_index().loc[:"Al Horford"]

Name,Team,Position,Birthday,Salary
Aaron Gordon,Orlando Magic,PF,1995-09-16,19863636
Aaron Holiday,Indiana Pacers,PG,1996-09-30,2239200
Abdel Nader,Oklahoma City Thunder,SF,1993-09-25,1618520
Adam Mokoka,Chicago Bulls,G,1998-07-18,79568
Admiral Schofield,Washington Wizards,SF,1997-03-30,1000000
Al Horford,Philadelphia 76ers,C,1986-06-03,28000000


### Extracting rows by index position

The <code>iloc</code> (index location) accessor extracts rows by index position, which is helpful when the position of our rows has significance in our data set.

The syntax is similar to the one we used for <code>loc</code>.

Enter a pair of square brackets after <code>iloc</code>, and pass in an integer.

In [47]:
nba.iloc[30]

Team              Brooklyn Nets
Position                     PF
Birthday    1999-04-17 00:00:00
Salary                   898310
Name: Nicolas Claxton, dtype: object

The <code>iloc</code> accessor also accepts a list of index positions to target multiple records. The next example pulls out the players at index positions 100, 200, 300, and 400:

In [48]:
nba.iloc[[100, 200, 300, 400]]

Name,Team,Position,Birthday,Salary
Brian Bowen,Indiana Pacers,SG,1998-10-02,79568
Marco Belinelli,San Antonio Spurs,SF,1986-03-25,5846154
Jarred Vanderbilt,Denver Nuggets,PF,1999-04-03,1416852
Louis King,Detroit Pistons,F,1999-04-06,79568


We can use list-slicing syntax with the <code>iloc</code> accessor as well. 

Note, however, that pandas excludes the index position after the colon.

The next example passes a slice of <code>400:404</code>, which returns the rows at positions 400, 401, 402, and 403:

In [49]:
nba.iloc[400:404]

Name,Team,Position,Birthday,Salary
Louis King,Detroit Pistons,F,1999-04-06,79568
Kostas Antetokounmpo,Los Angeles Lakers,PF,1997-11-20,79568
Rodions Kurucs,Brooklyn Nets,PF,1998-02-05,1699236
Spencer Dinwiddie,Brooklyn Nets,PG,1993-04-06,10605600


In [50]:
nba.iloc[0:10:2]

Name,Team,Position,Birthday,Salary
Shake Milton,Philadelphia 76ers,SG,1996-09-26,1445697
PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
Marial Shayok,Philadelphia 76ers,G,1995-07-26,79568
Kendrick Nunn,Miami Heat,SG,1995-08-03,1416852
Brook Lopez,Milwaukee Bucks,C,1988-04-01,12093024
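The slice <code>0:10:2</code> in the previous example adds a third number, the step, just as Python list slicing does. The sketch below uses a hypothetical ten-row <code>DataFrame</code> of letters to show which positions a step of 2 selects and to confirm that the stop position is excluded:

```python
import pandas as pd

sample = pd.DataFrame({"value": list("abcdefghij")})

# start 0, stop 10 (excluded), step 2 -> positions 0, 2, 4, 6, 8
every_other = sample.iloc[0:10:2]
print(every_other.index.tolist())  # [0, 2, 4, 6, 8]

# The stop position is exclusive: 4:8 -> positions 4, 5, 6, 7
middle = sample.iloc[4:8]
print(middle.index.tolist())  # [4, 5, 6, 7]
```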


### Extracting values from specific columns

Both the <code>loc</code> and <code>iloc</code> attributes accept a second argument representing the column(s) to extract.

If we’re using <code>loc</code>, we have to provide the column name. If we’re using <code>iloc</code>, we have to provide the column position.

The next example uses <code>loc</code> to pull the value at the intersection of the "<code>Giannis Antetokounmpo</code>" row and the Team column:


In [51]:
nba.loc["Giannis Antetokounmpo", "Team"]

'Milwaukee Bucks'

To specify multiple values, we can pass a list for one or both of the arguments to the <code>loc</code> accessor. The next example extracts the row with a "<code>James Harden</code>" index label and the values from the Position and Birthday columns. Pandas returns a <code>Series</code>:

In [52]:
nba.loc["James Harden", ["Position", "Birthday"]]

Position                     PG
Birthday    1989-08-26 00:00:00
Name: James Harden, dtype: object

The next example provides multiple row labels and multiple columns:

In [54]:
nba.loc[
    ["Russell Westbrook", "Anthony Davis"],
    ["Team", "Salary"]
]

Name,Team,Salary
Russell Westbrook,Houston Rockets,38506482
Anthony Davis,Los Angeles Lakers,27093019


We can also use list-slicing syntax to extract multiple columns without explicitly writing out their names. We have four columns in our data set (Team, Position, Birthday, and Salary). Let’s extract all columns from Position to Salary. Pandas includes both endpoints in a <code>loc</code> slice:

In [55]:
nba.loc["Joel Embiid", "Position":"Salary"]

Position                      C
Birthday    1994-03-16 00:00:00
Salary                 27504630
Name: Joel Embiid, dtype: object

We must pass the column names in the order in which they appear in the <code>DataFrame</code>.

The next example yields an empty result because the slice starts at Salary, which comes after Position in the column order. Reading from left to right, pandas finds no columns between the two labels:

In [56]:
nba.loc["Joel Embiid", "Salary":"Position"]

Series([], Name: Joel Embiid, dtype: object)
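If we do want the columns in a different order, one option is to pass a list of names instead of a slice. The sketch below reproduces the empty result on a one-row stand-in (three of the columns from the output above) and then retrieves the same two columns with a list:

```python
import pandas as pd

sample = pd.DataFrame(
    {"Position": ["C"], "Birthday": ["1994-03-16"], "Salary": [27504630]},
    index=["Joel Embiid"],
)

# A slice that runs against the column order selects nothing.
reversed_slice = sample.loc["Joel Embiid", "Salary":"Position"]
print(len(reversed_slice))  # 0

# A list returns the columns in the order in which we write them.
in_list_order = sample.loc["Joel Embiid", ["Salary", "Position"]]
print(in_list_order.index.tolist())  # ['Salary', 'Position']
```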

Let’s say we wanted to target columns by their order rather than by their name.

Remember that pandas assigns an index position to each <code>DataFrame</code> column. 

In <code>nba</code>, the Team column has an index of 0, Position has an index of 1, and so on.

We can pass a column’s index as the second argument to <code>iloc</code>. 

The next example targets the value at the intersection of the row at index 57 and the column at index 3 (Salary):

In [57]:
nba.iloc[57, 3]

796806

We can use list-slicing syntax here as well. The next example pulls all rows from index position 100 up to but not including index position 104. It also includes all columns from the beginning of the columns up to but not including the column at index position 3 (Salary):

In [58]:
nba.iloc[100:104, :3]

Name,Team,Position,Birthday
Brian Bowen,Indiana Pacers,SG,1998-10-02
Aaron Holiday,Indiana Pacers,PG,1996-09-30
Troy Daniels,Los Angeles Lakers,SG,1991-07-15
Buddy Hield,Sacramento Kings,SG,1992-12-17


The <code>iloc</code> and <code>loc</code> accessors are remarkably versatile. 

Their square brackets can accept a single value, a list of values, a list slice, and more. 

The disadvantage of this flexibility is that it demands extra overhead; pandas has to figure out what kind of input we’ve given to <code>iloc</code> or <code>loc</code>.

We can use two alternative attributes, <code>at</code> and <code>iat</code>, when we know that we want to extract a single value from a <code>DataFrame</code>.

The two attributes are speedier because pandas can optimize its searching algorithms when looking for a single value.

The syntax is similar. The <code>at</code> attribute accepts row and column labels:

In [59]:
nba.at["Austin Rivers", "Birthday"]

Timestamp('1992-08-01 00:00:00')

The <code>iat</code> attribute accepts row and column indices:

In [60]:
nba.iat[263, 1]

'PF'

Jupyter Notebook includes several magic commands to help enhance our developer experience.

We declare a cell magic with a <code>%%</code> prefix on the first line of a cell and enter our regular Python code beneath it.

One example is <code>%%timeit</code>, which runs the code in a cell repeatedly and calculates the average time it takes to execute.

<code>%%timeit</code> sometimes runs the cell up to 100,000 times! The next examples use the magic command to compare the speed of the accessors we’ve explored so far:

In [61]:
%%timeit
nba.at["Austin Rivers", "Birthday"]

6.22 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [62]:
%%timeit
nba.loc["Austin Rivers", "Birthday"]

10.1 µs ± 70.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [63]:
%%timeit
nba.iat[263, 1]

15.3 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [64]:
%%timeit
nba.iloc[263, 1]

19.8 µs ± 175 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
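The <code>%%timeit</code> magic is available only inside IPython and Jupyter. In a plain Python script, a rough equivalent is the standard library’s <code>timeit</code> module. The sketch below is an illustration under assumptions: it times the four accessors on a three-row stand-in rather than on <code>nba</code>, so the absolute numbers will differ from those above and will vary by machine and pandas version.

```python
import timeit

import pandas as pd

sample = pd.DataFrame(
    {"Position": ["SG", "PF", "PG"], "Salary": [1445697, 1645357, 7317074]},
    index=["Shake Milton", "Christian Wood", "Derrick Rose"],
)

# Four ways to read the same cell: by label (at, loc) and by position (iat, iloc).
accessors = {
    "at": lambda: sample.at["Derrick Rose", "Salary"],
    "loc": lambda: sample.loc["Derrick Rose", "Salary"],
    "iat": lambda: sample.iat[2, 1],
    "iloc": lambda: sample.iloc[2, 1],
}

# All four expressions return the same value; only the lookup path differs.
results = {name: func() for name, func in accessors.items()}

# Average seconds per call over 1,000 runs of each accessor.
timings = {
    name: timeit.timeit(func, number=1000) / 1000
    for name, func in accessors.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.2f} µs per call")
```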


## Renaming columns or rows

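We can rename a <code>DataFrame</code>’s columns by assigning a list of new names to its <code>columns</code> attribute. The list must supply a name for every column, including the ones we want to keep. The next example changes the name of the Birthday column to Date of Birth and the Salary column to Pay: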

In [66]:
nba.columns = ["Team", "Position", "Date of Birth", "Pay"]
nba

Name,Team,Position,Date of Birth,Pay
Shake Milton,Philadelphia 76ers,SG,1996-09-26,1445697
Christian Wood,Detroit Pistons,PF,1995-09-27,1645357
PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
Derrick Rose,Detroit Pistons,PG,1988-10-04,7317074
Marial Shayok,Philadelphia 76ers,G,1995-07-26,79568
...,...,...,...,...
Austin Rivers,Houston Rockets,PG,1992-08-01,2174310
Harry Giles,Sacramento Kings,PF,1998-04-22,2578800
Robin Lopez,Milwaukee Bucks,C,1988-04-01,4767000
Collin Sexton,Cleveland Cavaliers,PG,1999-01-04,4764960
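Assigning to <code>columns</code> replaces every name at once, so the list has to match the number of columns. The sketch below, using a one-row stand-in, shows pandas rejecting a list that is too short before accepting a complete one:

```python
import pandas as pd

sample = pd.DataFrame({
    "Team": ["Detroit Pistons"],
    "Position": ["PG"],
    "Birthday": ["1988-10-04"],
    "Salary": [7317074],
})

rejected = False
try:
    # Only three names for four columns: pandas raises a ValueError.
    sample.columns = ["Team", "Position", "Pay"]
except ValueError as error:
    rejected = True
    print("rejected:", error)

# The failed assignment leaves the original names in place.
names_after_failure = sample.columns.tolist()
print(names_after_failure)

# A list with one name per column succeeds.
sample.columns = ["Team", "Position", "Date of Birth", "Pay"]
print(sample.columns.tolist())
```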


The <code>rename</code> method is an alternative option that accomplishes the same result. 

We can pass to its <code>columns</code> parameter a dictionary in which the keys are the existing column names and the values are their new names.

The next example alters the Date of Birth column’s name to Birthday:

In [67]:
nba.rename(columns = { "Date of Birth": "Birthday" })

Name,Team,Position,Birthday,Pay
Shake Milton,Philadelphia 76ers,SG,1996-09-26,1445697
Christian Wood,Detroit Pistons,PF,1995-09-27,1645357
PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
Derrick Rose,Detroit Pistons,PG,1988-10-04,7317074
Marial Shayok,Philadelphia 76ers,G,1995-07-26,79568
...,...,...,...,...
Austin Rivers,Houston Rockets,PG,1992-08-01,2174310
Harry Giles,Sacramento Kings,PF,1998-04-22,2578800
Robin Lopez,Milwaukee Bucks,C,1988-04-01,4767000
Collin Sexton,Cleveland Cavaliers,PG,1999-01-04,4764960


Let’s make the operation permanent by assigning the returned <code>DataFrame</code> to the <code>nba</code> variable:

In [68]:
nba = nba.rename(columns = { "Date of Birth": "Birthday" })
nba

Name,Team,Position,Birthday,Pay
Shake Milton,Philadelphia 76ers,SG,1996-09-26,1445697
Christian Wood,Detroit Pistons,PF,1995-09-27,1645357
PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
Derrick Rose,Detroit Pistons,PG,1988-10-04,7317074
Marial Shayok,Philadelphia 76ers,G,1995-07-26,79568
...,...,...,...,...
Austin Rivers,Houston Rockets,PG,1992-08-01,2174310
Harry Giles,Sacramento Kings,PF,1998-04-22,2578800
Robin Lopez,Milwaukee Bucks,C,1988-04-01,4767000
Collin Sexton,Cleveland Cavaliers,PG,1999-01-04,4764960


We can also rename index labels by passing a dictionary to the method’s <code>index</code> parameter. 

The same logic applies; the keys are the old labels, and the values are the new ones. 

The following example swaps "<code>Giannis Antetokounmpo</code>" with his popular nickname "<code>Greek Freak</code>":

In [69]:
nba.loc["Giannis Antetokounmpo"]

Team            Milwaukee Bucks
Position                     PF
Birthday    1994-12-06 00:00:00
Pay                    25842697
Name: Giannis Antetokounmpo, dtype: object

In [70]:
nba = nba.rename(index = { "Giannis Antetokounmpo": "Greek Freak" })

In [71]:
nba.loc["Greek Freak"]

Team            Milwaukee Bucks
Position                     PF
Birthday    1994-12-06 00:00:00
Pay                    25842697
Name: Greek Freak, dtype: object

## Resetting an index

The <code>reset_index</code> method moves the current index to a <code>DataFrame</code> column and replaces the former index with pandas’ numeric index:

In [72]:
nba.reset_index().head()

Unnamed: 0,Name,Team,Position,Birthday,Pay
0,Shake Milton,Philadelphia 76ers,SG,1996-09-26,1445697
1,Christian Wood,Detroit Pistons,PF,1995-09-27,1645357
2,PJ Washington,Charlotte Hornets,PF,1998-08-23,3831840
3,Derrick Rose,Detroit Pistons,PG,1988-10-04,7317074
4,Marial Shayok,Philadelphia 76ers,G,1995-07-26,79568


In [73]:
nba.reset_index().set_index("Team").head()

Team,Name,Position,Birthday,Pay
Philadelphia 76ers,Shake Milton,SG,1996-09-26,1445697
Detroit Pistons,Christian Wood,PF,1995-09-27,1645357
Charlotte Hornets,PJ Washington,PF,1998-08-23,3831840
Detroit Pistons,Derrick Rose,PG,1988-10-04,7317074
Philadelphia 76ers,Marial Shayok,G,1995-07-26,79568


One advantage of avoiding the <code>inplace</code> parameter is that we can chain multiple method calls. 

Let’s chain the <code>reset_index</code> and <code>set_index</code> method calls and overwrite the <code>nba</code> variable with the result:

In [75]:
nba = nba.reset_index().set_index("Team")
nba

Team,Name,Position,Birthday,Pay
Philadelphia 76ers,Shake Milton,SG,1996-09-26,1445697
Detroit Pistons,Christian Wood,PF,1995-09-27,1645357
Charlotte Hornets,PJ Washington,PF,1998-08-23,3831840
Detroit Pistons,Derrick Rose,PG,1988-10-04,7317074
Philadelphia 76ers,Marial Shayok,G,1995-07-26,79568
...,...,...,...,...
Houston Rockets,Austin Rivers,PG,1992-08-01,2174310
Sacramento Kings,Harry Giles,PF,1998-04-22,2578800
Milwaukee Bucks,Robin Lopez,C,1988-04-01,4767000
Cleveland Cavaliers,Collin Sexton,PG,1999-01-04,4764960
