# How to extract data from a pandas DataFrame, Part I
> "In this tutorial we learn how to set and reset index in a dataframe, retrieve specific rows and columns using the loc and iloc accessors, change values in a dataframe."

- toc: true
- branch: master
- badges: true
- comments: true
- author: Dinesh Kesaboina
- categories: [pandas]

In [153]:
import pandas as pd

In [154]:
df = pd.read_csv('./data/jamesbond.csv')
df.head()

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


## How to extract rows

### `set_index` and `reset_index`

* The `read_csv` function has an `index_col` parameter that lets you set the index of your dataframe while loading it.
* Remember that the index of a dataframe is allowed to have duplicate values, whereas the keys of a Python dictionary must be unique. That flexibility is part of what makes a dataframe more powerful than a dictionary.
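
To make the two points above concrete, here is a minimal sketch. It assumes an inline CSV string (a hypothetical stand-in for `./data/jamesbond.csv`, holding three rows copied from the dataset) so that it runs without the file; the variable names are illustrative only.

```python
import io

import pandas as pd

# Inline stand-in for './data/jamesbond.csv' (assumption: only three rows,
# copied from the tutorial's dataset, are needed for the illustration).
csv_text = """Film,Year,Actor
Dr. No,1962,Sean Connery
Casino Royale,1967,David Niven
Casino Royale,2006,Daniel Craig
"""

# index_col sets the index while the data is being loaded.
films = pd.read_csv(io.StringIO(csv_text), index_col="Film")

# Duplicate labels are allowed in an index.
has_unique_labels = films.index.is_unique   # False here

# Looking up a duplicated label returns every matching row.
casino_rows = films.loc["Casino Royale"]
print(casino_rows)
```

A dictionary keyed by film title could only keep one of the two "Casino Royale" entries, while the dataframe keeps both.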

In [155]:
df = pd.read_csv('./data/jamesbond.csv', index_col='Film')
df.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [156]:
df = pd.read_csv('./data/jamesbond.csv')
df.head(3)

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


You can also set the index *after* loading the dataset.

In [157]:
# This returns a new dataframe; pass inplace=True to make the change permanent
df.set_index('Film').head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [158]:
df.set_index('Film', inplace=True)
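
As an aside, the same effect can be had by reassigning the result instead of using `inplace=True`. The sketch below assumes a tiny hand-built dataframe (two films from the dataset) rather than the CSV file, and the names `films` and `indexed` are illustrative.

```python
import pandas as pd

# Tiny hand-built frame (assumption: two rows are enough to show the idea).
films = pd.DataFrame({"Film": ["Dr. No", "Goldfinger"], "Year": [1962, 1964]})

# set_index returns a new dataframe and leaves the original untouched.
indexed = films.set_index("Film")
films_still_has_column = "Film" in films.columns   # True

# Reassigning the result has the same effect as inplace=True.
films = films.set_index("Film")
print(films)
```

Reassignment also makes it possible to chain further method calls, which `inplace=True` does not allow because it returns `None`.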

If you want to undo this and *reset* the index to the default numeric index that pandas provides:

In [159]:
df.reset_index().head()

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


Pay attention to the parameters of the `reset_index()` method here. Its `drop` parameter controls what happens to the old index. By default it is set to `False`, so the old index is kept as a regular column when you reset the index. If you want to discard the old index instead, pass `drop=True` :)

In [160]:
# Film is dropped here. Don't worry this operation is not permanent
df.reset_index(drop=True).head()

Unnamed: 0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,1967,David Niven,Ken Hughes,315.0,85.0,


In [161]:
df.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


We can clearly see that the `Film` column is the index of this dataframe. Now let's say you want to set some other column as the index (say `Year`) and *not* `Film`. We could do this:

In [162]:
df.set_index('Year').head()

Unnamed: 0_level_0,Actor,Director,Box Office,Budget,Bond Actor Salary
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1962,Sean Connery,Terence Young,448.8,7.0,0.6
1963,Sean Connery,Terence Young,543.8,12.6,1.6
1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
1965,Sean Connery,Terence Young,848.1,41.9,4.7
1967,David Niven,Ken Hughes,315.0,85.0,


Notice that this operation dropped our old index by default! This is not what we wanted so let's see how we can fix this...

In [163]:
# You can make this permanent by using the inplace argument of course
df.reset_index().set_index('Year').head()

Unnamed: 0_level_0,Film,Actor,Director,Box Office,Budget,Bond Actor Salary
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1962,Dr. No,Sean Connery,Terence Young,448.8,7.0,0.6
1963,From Russia with Love,Sean Connery,Terence Young,543.8,12.6,1.6
1964,Goldfinger,Sean Connery,Guy Hamilton,820.4,18.6,3.2
1965,Thunderball,Sean Connery,Terence Young,848.1,41.9,4.7
1967,Casino Royale,David Niven,Ken Hughes,315.0,85.0,


### Retrieve rows by index label using `.loc[]` accessor 

In [164]:
df.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


#### Tip
* One optimisation that often helps in pandas is to sort the dataframe by its index.
* It is a lot easier for pandas to find a value in a sorted index than in a randomly ordered one.
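
One way to check whether an index is already sorted is the `is_monotonic_increasing` property of the index. The following is a small sketch on a hand-built frame (four films from the dataset, deliberately out of alphabetical order); it is not part of the original notebook.

```python
import pandas as pd

# Four films from the dataset, in release order rather than alphabetical order.
films = pd.DataFrame(
    {"Year": [1962, 1963, 1964, 1985]},
    index=pd.Index(
        ["Dr. No", "From Russia with Love", "Goldfinger", "A View to a Kill"],
        name="Film",
    ),
)

# "A View to a Kill" comes last, so the labels are not in sorted order yet.
sorted_before = films.index.is_monotonic_increasing

# sort_index returns a new dataframe ordered by its index labels.
films_sorted = films.sort_index()
sorted_after = films_sorted.index.is_monotonic_increasing

print(sorted_before, sorted_after)
```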

In [165]:
df.sort_index(inplace=True)
df.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


* One can access a value in a Series by its label (or, in older pandas versions, by its index position) by placing that label or position within square brackets.
* When it comes to a dataframe, we need to precede those square brackets with the `.loc` accessor.

In [166]:
# If the label matches more than one row, the result is a dataframe
df.loc['Casino Royale']

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [167]:
# If the label matches exactly one row, the result is a series
df.loc['Goldfinger']

Year                         1964
Actor                Sean Connery
Director             Guy Hamilton
Box Office                  820.4
Budget                       18.6
Bond Actor Salary             3.2
Name: Goldfinger, dtype: object
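
Because the return type depends on how many rows match, it can be handy to force a dataframe by passing the label inside a list. Here is a minimal sketch on a hand-built frame (three rows from the dataset); the variable names are illustrative.

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Year": [2006, 1967, 1964],
        "Actor": ["Daniel Craig", "David Niven", "Sean Connery"],
    },
    index=pd.Index(["Casino Royale", "Casino Royale", "Goldfinger"], name="Film"),
)

duplicated = films.loc["Casino Royale"]    # two matching rows: a DataFrame
single = films.loc["Goldfinger"]           # one matching row: a Series
always_frame = films.loc[["Goldfinger"]]   # list of labels: always a DataFrame

print(type(duplicated).__name__, type(single).__name__, type(always_frame).__name__)
```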

We can extract all rows between two index labels like this:

In [168]:
df.loc['Diamonds Are Forever':'Licence to Kill']

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9


* Notice how this slicing is different from list slicing. In list slicing, the last value in the slicing is excluded. However, here the last value in the slice is included in the result.
* Building off of list slicing syntax, we can extract slices that skip a few movies by steps.

In [169]:
df.loc['Diamonds Are Forever':'Licence to Kill':2]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
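
The difference between list slicing and label slicing can be checked side by side. This sketch uses a hand-built frame with four films from the dataset, already in sorted order.

```python
import pandas as pd

titles = ["Diamonds Are Forever", "Die Another Day", "Dr. No", "For Your Eyes Only"]
films = pd.DataFrame(
    {"Year": [1971, 2002, 1962, 1981]},
    index=pd.Index(titles, name="Film"),
)

# List slicing excludes the end position.
list_slice = titles[0:2]

# Label slicing with .loc includes the end label.
label_slice = list(films.loc["Diamonds Are Forever":"Dr. No"].index)

# A step of 2 keeps every other row within the slice.
stepped = list(films.loc["Diamonds Are Forever":"For Your Eyes Only":2].index)

print(list_slice)
print(label_slice)
print(stepped)
```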


* As you can imagine, we can also use other list slicing techniques here. We can pull all data from a given point to the very end of the dataframe

In [170]:
df.loc['The Spy Who Loved Me':]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,533.0,45.1,
The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Tomorrow Never Dies,1997,Pierce Brosnan,Roger Spottiswoode,463.2,133.9,10.0
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


* You can also extract data from the beginning of the dataframe up to a certain point

In [171]:
df.loc[:'Diamonds Are Forever']

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8


* You can extract more than one index label from the dataframe like this:

In [172]:
df.loc[['GoldenEye', 'Casino Royale', 'Dr. No']]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6


### Retrieve rows by index position using `iloc` accessor

We are going to need the standard numeric pandas index. So we will have to reset the index:


In [173]:
df.reset_index(inplace=True)

In [174]:
df.iloc[0]

Film                 A View to a Kill
Year                             1985
Actor                     Roger Moore
Director                    John Glen
Box Office                      275.2
Budget                           54.5
Bond Actor Salary                 9.1
Name: 0, dtype: object

Remember that we previously sorted the dataframe on the `Film` index. Hence the first row we retrieve is "A View to a Kill". Now let's load the dataframe again using the `read_csv` function:

In [175]:
df = pd.read_csv('./data/jamesbond.csv')
df.head()

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [176]:
df.iloc[0]

Film                        Dr. No
Year                          1962
Actor                 Sean Connery
Director             Terence Young
Box Office                   448.8
Budget                           7
Bond Actor Salary              0.6
Name: 0, dtype: object

Similar to `loc` we can extract multiple rows using a python list like so:

In [177]:
df.iloc[[0, 5, 10, 15]]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
5,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
10,The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,533.0,45.1,
15,A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1


As you can guess, we can extract rows between certain index position values as well!

In [178]:
df.iloc[1:8]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
5,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
7,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8


#### (Difference between `loc` and `iloc`)
**Caveat**:
Notice that the last position mentioned in an `iloc` slice is not included, just like Python list slicing! This differs from `loc`, where the end label is included.
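
The caveat is easiest to see on a dataframe with the default numeric index, where the same numbers mean different things to the two accessors. A minimal sketch, using four film titles from the dataset:

```python
import pandas as pd

films = pd.DataFrame(
    {"Film": ["Dr. No", "From Russia with Love", "Goldfinger", "Thunderball"]}
)

# iloc treats 1:3 as positions and excludes the end: rows 1 and 2.
by_position = films.iloc[1:3]

# loc treats 1:3 as labels and includes the end: rows 1, 2 and 3.
by_label = films.loc[1:3]

print(len(by_position), len(by_label))
```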

In [179]:
# 5 most recent bond films
df.iloc[21:]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
21,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
22,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
23,Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1
24,Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
25,Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,
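
Hard-coding the start position (21 above) only works when you know how many rows there are. As with Python lists, `iloc` also accepts negative positions that count from the end. The sketch below assumes a hand-built frame with the five films shown in the output above.

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Film": ["Die Another Day", "Casino Royale", "Quantum of Solace", "Skyfall", "Spectre"],
        "Year": [2002, 2006, 2008, 2012, 2015],
    }
)

# Negative positions count from the end, so no row count is needed.
last_two = films.iloc[-2:]
last_row = films.iloc[-1]

print(last_two)
print(last_row["Film"])
```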


In [180]:
# First five bond films in chronological order
df.iloc[:5]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


#### Note:
Integer positions exist even if your dataframe has its index set to non-integer labels. Let me demonstrate what I mean here:

In [181]:
df.set_index('Film', inplace=True)

In [182]:
df.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [183]:
df.iloc[[0, 5, 10, 15]]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,533.0,45.1,
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1


In [184]:
df.iloc[:5]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


Notice how the integer index syntax still works regardless of what we set our index to be! 
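
If you ever need to translate between the two views, the index's `get_loc` method returns the integer position of a label. It is not used elsewhere in this tutorial, and the sketch below assumes a small hand-built frame with unique labels (for a duplicated label, `get_loc` does not return a single integer).

```python
import pandas as pd

films = pd.DataFrame(
    {"Year": [1962, 1963, 1964]},
    index=pd.Index(["Dr. No", "From Russia with Love", "Goldfinger"], name="Film"),
)

# Position of a label within the index (labels are unique here).
position = films.index.get_loc("Goldfinger")

# The row fetched by position matches the row fetched by label.
same_row = films.iloc[position].equals(films.loc["Goldfinger"])

print(position, same_row)
```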

#### Summary:
* You can slice a pandas dataframe in much the same ways as you can slice a Python list. Just remember to use the `loc` or `iloc` accessor, and that `loc` slices include the end label!
* Remember to use *square brackets* with these accessors and not parentheses like we are so used to!

## How to extract columns (and rows)

In [185]:
# Let's sort the dataframe according to the index (Film)
df.sort_index(inplace=True)
df.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [186]:
# We can access a series object using the loc accessor
df.loc['Die Another Day']

Year                           2002
Actor                Pierce Brosnan
Director               Lee Tamahori
Box Office                    465.4
Budget                        154.2
Bond Actor Salary              17.9
Name: Die Another Day, dtype: object

What if we wanted to extract the actor information only? It's simple!

In [187]:
df.loc['Die Another Day', 'Actor']

'Pierce Brosnan'

Now say we wanted to know the director as well as the box office returns for this movie? That's simple too!

In [188]:
df.loc['Die Another Day', ['Director', 'Box Office']]

Director      Lee Tamahori
Box Office           465.4
Name: Die Another Day, dtype: object

What if we wanted this information for the movie Moonraker as well?

In [189]:
df.loc[['Moonraker', 'Die Another Day'], ['Director', 'Box Office']]

Unnamed: 0_level_0,Director,Box Office
Film,Unnamed: 1_level_1,Unnamed: 2_level_1
Moonraker,Lewis Gilbert,535.0
Die Another Day,Lee Tamahori,465.4


Pandas is remarkably flexible with slicing, so it is completely acceptable to do this:

In [190]:
df.loc['Moonraker', 'Director':'Budget']

Director      Lewis Gilbert
Box Office              535
Budget                 91.5
Name: Moonraker, dtype: object

In [191]:
# Or this
df.loc['Moonraker':'Quantum of Solace', 'Director':'Budget']

Unnamed: 0_level_0,Director,Box Office,Budget
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Moonraker,Lewis Gilbert,535.0,91.5
Never Say Never Again,Irvin Kershner,380.0,86.0
Octopussy,John Glen,373.8,53.9
On Her Majesty's Secret Service,Peter R. Hunt,291.5,37.3
Quantum of Solace,Marc Forster,514.2,181.4


In [192]:
# Or this
df.loc['The World Is Not Enough':, 'Director':]

Unnamed: 0_level_0,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
The World Is Not Enough,Michael Apted,439.5,158.3,13.5
Thunderball,Terence Young,848.1,41.9,4.7
Tomorrow Never Dies,Roger Spottiswoode,463.2,133.9,10.0
You Only Live Twice,Lewis Gilbert,514.2,59.9,4.4


In [193]:
df.loc[:'Die Another Day', :'Budget']

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2


Now predictably, we can extend this behaviour to integer positions too.

In [194]:
df.iloc[14, 2]

'John Glen'

In [195]:
df.iloc[14, 2:5]

Director      John Glen
Box Office        373.8
Budget             53.9
Name: Octopussy, dtype: object

In [196]:
df.iloc[:5, :3]

Unnamed: 0_level_0,Year,Actor,Director
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A View to a Kill,1985,Roger Moore,John Glen
Casino Royale,2006,Daniel Craig,Martin Campbell
Casino Royale,1967,David Niven,Ken Hughes
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton
Die Another Day,2002,Pierce Brosnan,Lee Tamahori


#### Summary:
* First argument to `loc` or `iloc` : rows
* Second argument to `loc` or `iloc` : columns
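
The rows-then-columns rule can be sketched on a small hand-built frame (two films and three columns copied from the outputs above). A bare `:` in the row slot means "all rows".

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Director": ["Lewis Gilbert", "John Glen"],
        "Box Office": [535.0, 373.8],
        "Budget": [91.5, 53.9],
    },
    index=pd.Index(["Moonraker", "Octopussy"], name="Film"),
)

# Rows first, columns second: by label with loc, by position with iloc.
by_label = films.loc["Octopussy", "Director"]
by_position = films.iloc[1, 0]

# A bare colon selects every row, so this returns the whole column.
all_rows_one_column = films.loc[:, "Director"]

print(by_label, by_position)
print(all_rows_one_column)
```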

## Set values

### Set a new value to specific cell

In [197]:
df.head()

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [198]:
df.loc['Dr. No', 'Actor']

'Sean Connery'

In [199]:
# Setting a new value
df.loc['Dr. No', 'Actor'] = 'Sir Sean Connery'

In [200]:
df.loc['Dr. No', 'Actor']

'Sir Sean Connery'

Say we wanted to replace multiple values inside the same row: 

In [201]:
df.loc['Dr. No', ['Box Office', 'Budget', 'Bond Actor Salary']] = [448000000, 7000000, 600000]

In [202]:
df.loc['Dr. No', 'Box Office']

448000000.0
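
Note that the value comes back as `448000000.0` even though an integer was assigned. The likely reason is that the `Box Office` column holds floats, so the new value is stored as a float as well. A minimal sketch on a hand-built frame (two rows and two columns from the dataset):

```python
import pandas as pd

films = pd.DataFrame(
    {"Box Office": [448.8, 543.8], "Budget": [7.0, 12.6]},
    index=pd.Index(["Dr. No", "From Russia with Love"], name="Film"),
)

# Assign integers to two cells of float columns in one row.
films.loc["Dr. No", ["Box Office", "Budget"]] = [448000000, 7000000]

# The columns keep their float dtype, so the values are stored as floats.
box_office_dtype = str(films["Box Office"].dtype)
value = films.loc["Dr. No", "Box Office"]

print(box_office_dtype, value)
```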

### Set multiple values in Dataframe

Let's change Sean Connery's name to Sir Sean Connery in every row where he appears.

In [203]:
df = pd.read_csv('./data/jamesbond.csv', index_col='Film')
df.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [204]:
df[df['Actor'] == 'Sean Connery']

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,


In [205]:
connery_mask = df['Actor'] == 'Sean Connery'
df[connery_mask]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,


#### Wrong way to replace values


In [206]:
df[connery_mask]['Actor']

Film
Dr. No                   Sean Connery
From Russia with Love    Sean Connery
Goldfinger               Sean Connery
Thunderball              Sean Connery
You Only Live Twice      Sean Connery
Diamonds Are Forever     Sean Connery
Never Say Never Again    Sean Connery
Name: Actor, dtype: object

In [207]:
df[connery_mask]['Actor'] = 'Sir Sean Connery'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [208]:
df[connery_mask]['Actor']

Film
Dr. No                   Sean Connery
From Russia with Love    Sean Connery
Goldfinger               Sean Connery
Thunderball              Sean Connery
You Only Live Twice      Sean Connery
Diamonds Are Forever     Sean Connery
Never Say Never Again    Sean Connery
Name: Actor, dtype: object

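Nothing changed. Why? If you read the warning, it says a value is trying to be set on a *copy* of a slice from a dataframe. That's the problem here: `df[connery_mask]` produces a new dataframe, so the assignment lands on that temporary copy. We want our changes to be reflected in the original dataframe and not in a copy of it!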

#### Right way to change values

In [209]:
df.loc[connery_mask, 'Actor']

Film
Dr. No                   Sean Connery
From Russia with Love    Sean Connery
Goldfinger               Sean Connery
Thunderball              Sean Connery
You Only Live Twice      Sean Connery
Diamonds Are Forever     Sean Connery
Never Say Never Again    Sean Connery
Name: Actor, dtype: object

In [210]:
df.loc[connery_mask, 'Actor'] = 'Sir Sean Connery'

In [211]:
df.loc[connery_mask, 'Actor']

Film
Dr. No                   Sir Sean Connery
From Russia with Love    Sir Sean Connery
Goldfinger               Sir Sean Connery
Thunderball              Sir Sean Connery
You Only Live Twice      Sir Sean Connery
Diamonds Are Forever     Sir Sean Connery
Never Say Never Again    Sir Sean Connery
Name: Actor, dtype: object
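
The whole pattern fits in a few lines. The sketch below repeats it on a hand-built frame (three films from the dataset) so that it runs without the CSV file; `rows_to_change` is an illustrative extra that counts how many rows the mask selects.

```python
import pandas as pd

films = pd.DataFrame(
    {"Actor": ["Sean Connery", "David Niven", "Sean Connery"]},
    index=pd.Index(["Dr. No", "Casino Royale", "Goldfinger"], name="Film"),
)

# Boolean mask: True for every row whose Actor is Sean Connery.
connery_mask = films["Actor"] == "Sean Connery"
rows_to_change = int(connery_mask.sum())

# A single .loc call selects rows and column together, so the assignment
# is applied to the original dataframe rather than to a temporary copy.
films.loc[connery_mask, "Actor"] = "Sir Sean Connery"

print(rows_to_change)
print(films)
```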

Hopefully this tutorial was helpful in some way! Check out part II of this tutorial to learn how to rename columns and index labels, delete rows and columns, filter dataframes, use the `apply` function and more!

P.S.: The data used in this tutorial can be found in the GitHub repo with the same name. Visit my GitHub profile for more info.