# Selecting data from a pandas DataFrame

A fundamental task when working with a DataFrame is selecting data from it. One thing you will notice straight away is that there are **many** different ways to do this. At first, this might seem confusing and redundant. But with more practice, you will come to appreciate the flexibility these options provide, and you will learn to choose the most appropriate method for your particular needs.

Before we begin, let's review the main components of a DataFrame.

### The components of a DataFrame
<img src="picture.png" width="900">


Essentially a DataFrame is composed of the following three elements:

* the **index labels** (these are the bold numbers from 0 to 9 on the left hand side of the table)
* the **column names** (these are the bold names on the top of the table)
* the **data** itself (this is everything else inside the actual cells of the table)

Let's set up the DataFrame from the image above so that we can look at it further.

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('data.csv', index_col=0)
df

,Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
0,Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953,>>145,121.0
1,K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45,44.0
2,Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24.0
3,Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26.0
4,Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
5,Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28.0
6,Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39.0
7,Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45.0
8,Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52,67.0
9,Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950,36,47.0


By default, pandas labels the rows with integers starting at `0`, but we can change this to anything that we want, even non-numerical values. For the purposes of our exercise let's change the row index to be non-numerical, for example the column `Mountain` giving the names of the mountains.

In [3]:
df.set_index('Mountain', inplace=True)
df

Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953,>>145,121.0
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45,44.0
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24.0
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26.0
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28.0
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39.0
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45.0
Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52,67.0
Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950,36,47.0


We can now see that the numerical indexing from 0 has been replaced by the column giving the mountain names. Given a DataFrame, we can always check its index labels and column names using the attributes `index` and `columns` respectively.

In [4]:
df.index

Index(['Mount Everest / Sagarmatha / Chomolungma',
       'K2 / Qogir / Godwin Austen', 'Kangchenjunga', 'Lhotse', 'Makalu',
       'Cho Oyu', 'Dhaulagiri I', 'Manaslu', 'Nanga Parbat', 'Annapurna I'],
      dtype='object', name='Mountain')

In [5]:
df.columns

Index(['Height (m)', 'Range', 'Coordinates', 'Parent mountain', 'First ascent',
       'Ascents bef. 2004', 'Failed attempts bef. 2004'],
      dtype='object')

In addition to the index labels and column names we can also refer to rows and columns in the DataFrame using their position. The positioning of rows and columns starts at `0`. In our example we have 10 rows which are from position `0` to position `9` and `7` columns which are from position `0` to position `6`. 
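
To make the distinction between labels and positions concrete, here is a minimal sketch. It assumes a small stand-in DataFrame called `peaks` (a hypothetical name) holding four of the rows above with shortened index labels, since `data.csv` is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for the mountains table: four rows, three columns,
# with values copied from the table above and shortened index labels.
peaks = pd.DataFrame(
    {
        "Height (m)": [8848, 8611, 8586, 8516],
        "Range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
        "First ascent": [1953, 1954, 1955, 1956],
    },
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="Mountain"),
)

# shape gives (number of rows, number of columns)
n_rows, n_cols = peaks.shape

# Label -> position
row_pos = peaks.index.get_loc("Kangchenjunga")
col_pos = peaks.columns.get_loc("Range")

# Position -> label
first_label = peaks.index[0]
last_column = peaks.columns[n_cols - 1]
```

Positions run from `0` to `n_rows - 1` for rows and from `0` to `n_cols - 1` for columns, whatever the labels happen to be.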

Now that we understand the structure of a DataFrame, we can go on with the main topic: selecting data.

# The attribute operator `.` to select columns


The attribute operator allows us to select a single column at a time. This works because the DataFrame object automatically exposes its columns as attributes. Let's look at an example:

In [6]:
df.Range

Mountain
Mount Everest / Sagarmatha / Chomolungma       Mahalangur Himalaya
K2 / Qogir / Godwin Austen                       Baltoro Karakoram
Kangchenjunga                               Kangchenjunga Himalaya
Lhotse                                         Mahalangur Himalaya
Makalu                                         Mahalangur Himalaya
Cho Oyu                                        Mahalangur Himalaya
Dhaulagiri I                                   Dhaulagiri Himalaya
Manaslu                                           Manaslu Himalaya
Nanga Parbat                                 Nanga Parbat Himalaya
Annapurna I                                     Annapurna Himalaya
Name: Range, dtype: object

This returns a Series object whose index is the same as that of the original DataFrame and whose entries correspond to the column which we selected.

Now if we tried to use the attribute operator with the column `'Height (m)'` we would get an error, since Python attribute names must be valid identifiers and so cannot contain characters such as spaces or parentheses. We can get around this using the `getattr` function as follows:

In [7]:
getattr(df, 'Height (m)')

Mountain
Mount Everest / Sagarmatha / Chomolungma    8848
K2 / Qogir / Godwin Austen                  8611
Kangchenjunga                               8586
Lhotse                                      8516
Makalu                                      8485
Cho Oyu                                     8188
Dhaulagiri I                                8167
Manaslu                                     8163
Nanga Parbat                                8126
Annapurna I                                 8091
Name: Height (m), dtype: int64

But in general, the attribute operator is restricted to those columns whose names can be accessed via direct attribute reference. 
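
A second restriction is worth keeping in mind: a column whose name matches an existing DataFrame attribute or method cannot be reached with the `.` operator either. The sketch below uses a hypothetical two-row DataFrame `demo` with an invented column called `count` to illustrate this.

```python
import pandas as pd

# Hypothetical example: "count" is also the name of a DataFrame method.
demo = pd.DataFrame(
    {
        "Range": ["Mahalangur Himalaya", "Baltoro Karakoram"],
        "count": [4, 1],
    }
)

by_attr = demo.count      # the DataFrame.count method, NOT the column
by_key = demo["count"]    # the column, as a Series

range_col = demo.Range    # no clash here, so attribute access works
```

Here `by_attr` is a bound method while `by_key` is the Series we wanted, which is one reason the index operator described next is the safer default.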

# The index operator `[ ]` to select columns


Another way that we can select a specific column is by passing its name to the index operator. Note that the name has to be passed inside quotation marks.

In [8]:
df['Height (m)']

Mountain
Mount Everest / Sagarmatha / Chomolungma    8848
K2 / Qogir / Godwin Austen                  8611
Kangchenjunga                               8586
Lhotse                                      8516
Makalu                                      8485
Cho Oyu                                     8188
Dhaulagiri I                                8167
Manaslu                                     8163
Nanga Parbat                                8126
Annapurna I                                 8091
Name: Height (m), dtype: int64

The index operator also allows us to select multiple columns at a time. We can do this by passing a list containing all the column names that we want to select.

In [9]:
df[['Height (m)', 'Range', 'Coordinates']]

Mountain,Height (m),Range,Coordinates
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿
Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿
Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿


Whenever we pass a list of column names, the returned object is a DataFrame as opposed to a Series.
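
The return type depends on whether a single name or a list is passed, even when the list has only one element. A minimal sketch, assuming a hypothetical stand-in DataFrame `peaks` with shortened index labels:

```python
import pandas as pd

# Hypothetical stand-in for the mountains table (values from the rows above).
peaks = pd.DataFrame(
    {
        "Height (m)": [8848, 8611, 8586, 8516],
        "Range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
    },
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="Mountain"),
)

as_series = peaks["Range"]                 # single name -> Series
as_frame = peaks[["Range"]]                # one-element list -> DataFrame
two_cols = peaks[["Range", "Height (m)"]]  # columns come back in the order listed
```

The double brackets in `peaks[["Range"]]` are simply the index operator applied to a Python list.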

# The index operator `[ ]` to select rows

We can also use the index operator with Python's slice notation. Recall the general syntax for the slice notation for a sequence `a`:
```
a[start:stop:step]
```

When used on a DataFrame, the slicing will be applied to the **rows** of the DataFrame. Here is an example:

In [10]:
df[2:8]

Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24.0
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26.0
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28.0
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39.0
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45.0


This selects the rows starting at position `2` (inclusive) and up to position `8` (exclusive). 

We can also use the slicing notation with the index labels as follows:

In [11]:
df['Lhotse':'Manaslu']

Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26.0
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28.0
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39.0
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45.0


Note that when used with the index labels as opposed to the positions, the slicing notation becomes inclusive of **both** the start and end value.

We have seen how to select multiple rows or multiple columns individually, but what if we would like to simultaneously select a subset of the rows **and** a subset of the columns? This is what the next two operators help us achieve.
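
The two slicing behaviours can be compared side by side. This is a minimal sketch on a hypothetical stand-in DataFrame `peaks` with shortened index labels, not the full table from `data.csv`.

```python
import pandas as pd

# Hypothetical stand-in for the mountains table (values from the rows above).
peaks = pd.DataFrame(
    {"Height (m)": [8848, 8611, 8586, 8516]},
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="Mountain"),
)

# Positional slice: start included, stop excluded -> positions 1 and 2
by_position = peaks[1:3]

# Label slice: both endpoints included -> K2, Kangchenjunga and Lhotse
by_label = peaks["K2":"Lhotse"]
```

Although `"Lhotse"` sits at position `3`, it is part of the label-based result but would not be part of `peaks[1:3]`.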

# The `iloc` operator to select rows and columns based on position

The `iloc` operator allows us to slice both rows and columns using their position. The general syntax is as follows:
```
df.iloc[rows, columns]
```

where **rows** gives the positions of the rows that we want to select and **columns** gives the positions of the columns we want to select. These positions can be specified in several ways:

* single position value, e.g. `3`
* a list of position values, e.g. `[3,5,8]`
* a slice of position values, e.g. `3:8`
* the `:` symbol to select *all* the rows and/or columns

Let's try some examples:

In [12]:
df.iloc[:, 2:6]

Mountain,Coordinates,Parent mountain,First ascent,Ascents bef. 2004
Mount Everest / Sagarmatha / Chomolungma,27°59′17″N 86°55′31″E﻿,,1953,>>145
K2 / Qogir / Godwin Austen,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45
Kangchenjunga,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38
Lhotse,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26
Makalu,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45
Cho Oyu,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79
Dhaulagiri I,28°41′48″N 83°29′35″E﻿,K2,1960,51
Manaslu,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49
Nanga Parbat,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52
Annapurna I,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950,36


Here we are selecting all of the rows, and we are using the slicing notation `2:6` for the columns, which returns the columns starting at position `2` (inclusive) and up to position `6` (exclusive).

In [13]:
df.iloc[::2, 2:]

Mountain,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mount Everest / Sagarmatha / Chomolungma,27°59′17″N 86°55′31″E﻿,,1953,>>145,121.0
Kangchenjunga,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24.0
Makalu,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
Dhaulagiri I,28°41′48″N 83°29′35″E﻿,K2,1960,51,39.0
Nanga Parbat,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52,67.0


Here we are using the slicing notation `::2` for the rows, which selects every second row from the first row up to the last. We are then using the slicing notation `2:` for the columns, which selects every column from position `2` to the last column of the DataFrame.
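
The remaining forms from the list above (a single position and a list of positions) can be sketched in the same way. The DataFrame `peaks` below is a hypothetical stand-in with shortened index labels.

```python
import pandas as pd

# Hypothetical stand-in for the mountains table (values from the rows above).
peaks = pd.DataFrame(
    {
        "Height (m)": [8848, 8611, 8586, 8516],
        "Range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
        "First ascent": [1953, 1954, 1955, 1956],
    },
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="Mountain"),
)

cell = peaks.iloc[3, 0]                # single row and column position -> one value
row = peaks.iloc[3]                    # single row position -> Series
picked = peaks.iloc[[0, 3], [0, 2]]    # lists of positions -> DataFrame
last_two = peaks.iloc[-2:, :]          # negative positions count from the end
```

A single position reduces the dimension (a row becomes a Series, a row and column pair becomes a scalar), whereas lists and slices keep the result as a DataFrame.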

# The `loc` operator to select rows and columns by labels

The `loc` operator is similar to the `iloc` operator except that instead of referencing rows and columns using their position in the DataFrame we use the index labels and column names respectively. The general syntax is exactly the same:

```
df.loc[rows, columns]
```

where **rows** can now take the form of

* a single index label, e.g. `'Makalu'`
* a list of index labels, e.g. `['Makalu', 'Nanga Parbat']`
* a slice of index labels, e.g. `'Makalu':'Nanga Parbat'`
* the `:` symbol to select *all* the rows

and **columns** can take the form of

* a single column name, e.g. `'Range'`
* a list of column names, e.g. `['Range', 'Parent mountain']`
* a slice of column names, e.g. `'Height (m)':'First ascent'`
* the `:` symbol to select *all* the columns

**Remark**: the slicing notation when used with index labels or column names is inclusive of **both** the `start` and `end`! This is in contrast to the usual slicing notation in Python used with integer values, which is inclusive of the `start` but exclusive of the `end`.

Let's try some examples:

In [14]:
df.loc[:,'Height (m)':'First ascent']

Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956
Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953
Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950


Here we used the `:` symbol to select all the rows and we then used the slice notation `'Height (m)':'First ascent'` to select all columns from `'Height (m)'` (inclusive) up to `'First ascent'` (inclusive). In the slice notation we can also include a step size just as before. So if we wanted to select every other column from this previous selection we would write:

In [15]:
df.loc[:,'Height (m)':'First ascent':2]

Mountain,Height (m),Coordinates,First ascent
Mount Everest / Sagarmatha / Chomolungma,8848,27°59′17″N 86°55′31″E﻿,1953
K2 / Qogir / Godwin Austen,8611,35°52′53″N 76°30′48″E﻿,1954
Kangchenjunga,8586,27°42′12″N 88°08′51″E﻿,1955
Lhotse,8516,27°57′42″N 86°55′59″E﻿,1956
Makalu,8485,27°53′23″N 87°05′20″E﻿,1955
Cho Oyu,8188,28°05′39″N 86°39′39″E﻿,1954
Dhaulagiri I,8167,28°41′48″N 83°29′35″E﻿,1960
Manaslu,8163,28°33′00″N 84°33′35″E﻿,1956
Nanga Parbat,8126,35°14′14″N 74°35′21″E﻿,1953
Annapurna I,8091,28°35′44″N 83°49′13″E﻿,1950
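
The single-label and list forms of `loc` can be sketched as follows. This assumes a hypothetical stand-in DataFrame `peaks` with shortened index labels rather than the full table.

```python
import pandas as pd

# Hypothetical stand-in for the mountains table (values from the rows above).
peaks = pd.DataFrame(
    {
        "Height (m)": [8848, 8611, 8586, 8516],
        "Range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
        "First ascent": [1953, 1954, 1955, 1956],
    },
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="Mountain"),
)

height = peaks.loc["Lhotse", "Height (m)"]    # single label pair -> one value
row = peaks.loc["K2"]                         # single index label -> Series
subset = peaks.loc[["Everest", "Lhotse"], ["Range", "First ascent"]]  # lists -> DataFrame

# A label that is not in the index raises KeyError
try:
    peaks.loc["Makalu"]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
```

When the column part is omitted, as in `peaks.loc["K2"]`, all columns are returned.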


# Boolean selection

The methods we have considered so far select rows and columns based on either their position in the DataFrame or their index label and column names respectively. We can also select rows and columns based on a boolean condition. Boolean conditions can be used with either the `[]` operator or the `.loc` operator. Below we consider the possible ways to do this.





## Boolean selection of rows using the `[ ]` operator

This applies when we want to select some rows based on a boolean condition on one or more of the columns. For example, we might be interested in all the rows that correspond to mountains over 8000 meters. The boolean condition for this is

```
df['Height (m)'] > 8000
```

When executed by itself, it will return a Series of `True`/`False` values, one for each row of the DataFrame. When passed to the index operator `[]`, the rows corresponding to `True` values will be returned.

In [16]:
df[df['Height (m)'] > 8000]

Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953,>>145,121.0
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45,44.0
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24.0
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26.0
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28.0
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39.0
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45.0
Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52,67.0
Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950,36,47.0


We can also use boolean selection with multiple criteria. To do this we must use logical operators to combine our conditions. Although in Python we can normally use `and`, `or`, and `not`, these do not work element-wise on a pandas Series (they raise a `ValueError` because the truth value of a whole Series is ambiguous). Instead, we must use the operators

* `&` for and
* `|` for or
* `~` for not

In addition, we must wrap each boolean condition in parentheses, because `&` and `|` bind more tightly than comparison operators such as `>` and `==`. Let's give this a try:

In [17]:
df[(df['Height (m)'] > 8000) & (df['Range']=='Mahalangur Himalaya')]

Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953,>>145,121.0
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26.0
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28.0
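
The other two operators, `|` and `~`, work in the same way. The sketch below assumes a hypothetical stand-in DataFrame `peaks` with shortened index labels, and uses a threshold of 8600 (chosen only for illustration) so that the conditions split the rows.

```python
import pandas as pd

# Hypothetical stand-in for the mountains table (values from the rows above).
peaks = pd.DataFrame(
    {
        "Height (m)": [8848, 8611, 8586, 8516],
        "Range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
    },
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="Mountain"),
)

# Each condition is a boolean Series with one entry per row
tall = peaks["Height (m)"] > 8600
mahalangur = peaks["Range"] == "Mahalangur Himalaya"

either = peaks[tall | mahalangur]      # rows where at least one condition holds
not_mahalangur = peaks[~mahalangur]    # rows where the condition does not hold

# Python's `and` tries to reduce each Series to a single bool, which fails
try:
    peaks[tall and mahalangur]
    and_failed = False
except ValueError:
    and_failed = True
```

Storing each condition in a named variable, as done here, also removes the need for the extra parentheses.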


## Boolean selection of rows and columns using the `.loc` operator

As mentioned before, we can also use the `.loc` operator to perform boolean selection. For example, if we wanted to replicate the results from our last command, we would just pass the combined boolean selection to the `loc` operator and also specify that we want all the columns to be returned. Here is the equivalent command:

In [18]:
df.loc[(df['Height (m)'] > 8000) & (df['Range']=='Mahalangur Himalaya'), :]

Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953,>>145,121.0
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26.0
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52.0
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28.0


So when used with the `:` symbol for column selection, the `.loc` operator is equivalent to the `[]` operator for boolean selection of rows. However, the advantage of `.loc` is that we can change the `:` symbol to any other selection of the columns that we want. For example, in our selection above suppose we now only want the first two columns to be returned. We can achieve this as follows:


In [19]:
df.loc[(df['Height (m)'] > 8000) & (df['Range']=='Mahalangur Himalaya'), 'Height (m)':'Range']

Mountain,Height (m),Range
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya
Lhotse,8516,Mahalangur Himalaya
Makalu,8485,Mahalangur Himalaya
Cho Oyu,8188,Mahalangur Himalaya


In fact, we can even include a boolean selection for the columns as well. The list of boolean values must contain one entry per column. Here is an example:

In [20]:
col_criteria = [True, False, False, False, True, True, False]
df.loc[df['Height (m)'] > 8000, col_criteria] 

Mountain,Height (m),First ascent,Ascents bef. 2004
Mount Everest / Sagarmatha / Chomolungma,8848,1953,>>145
K2 / Qogir / Godwin Austen,8611,1954,45
Kangchenjunga,8586,1955,38
Lhotse,8516,1956,26
Makalu,8485,1955,45
Cho Oyu,8188,1954,79
Dhaulagiri I,8167,1960,51
Manaslu,8163,1956,49
Nanga Parbat,8126,1953,52
Annapurna I,8091,1950,36
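
Rather than typing the column criteria by hand, they can also be computed from the columns themselves. The following is a minimal sketch, assuming a hypothetical stand-in DataFrame `peaks` with shortened index labels and an illustrative threshold of 8600. It keeps only the numeric columns, using `pd.api.types.is_numeric_dtype` to test each column's dtype.

```python
import pandas as pd

# Hypothetical stand-in for the mountains table (values from the rows above).
peaks = pd.DataFrame(
    {
        "Height (m)": [8848, 8611, 8586, 8516],
        "Range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
        "First ascent": [1953, 1954, 1955, 1956],
    },
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="Mountain"),
)

# One True/False entry per column: True where the column holds numbers
numeric_cols = [pd.api.types.is_numeric_dtype(dtype) for dtype in peaks.dtypes]

# Boolean selection on the rows and on the columns at the same time
result = peaks.loc[peaks["Height (m)"] > 8600, numeric_cols]
```

Because the criteria are derived from the DataFrame, they stay the right length if columns are later added or removed.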
