# Filtering rows 1: Indexing with `[]`

By the end of this lecture you will be able to:
- select single rows with `[]` indexing
- select multiple rows with `[]` indexing


In [None]:
import polars as pl

In [None]:
csv_file = "../data/titanic.csv"

In [None]:
df = pl.read_csv(csv_file)
df.head()

## Selecting individual rows with `[]`

Unlike a Pandas `DataFrame`, a Polars `DataFrame` has no explicit index. It does, however, have an implicit integer row number that counts from zero.

We select an individual row by passing its integer row number to `[]`.

In [None]:
df[0]

Note that selecting a single row in this way returns a one-row `DataFrame`. This differs from Pandas, where selecting a single row (for example with `iloc`) returns a `Series`.

## Selecting multiple rows

### List

We can pass a list of integer row numbers to `[]`.

In [None]:
df[[2,3]]

### Slice

We can use slice notation to select a contiguous block of rows.

In [None]:
df[:2]

### Range

We can use a `range` of integers.

In [None]:
df[range(2,4)]

### NumPy array

Polars also accepts a NumPy array of row numbers in `[]`.

In [None]:
import numpy as np
df[np.arange(0,3)]

## Data types not accepted in `[]`

### Boolean lists

We cannot pass a `list` of **Boolean values** to `[]`.

In [None]:
# Passing this list of Boolean values raises an exception,
# so the code is left commented out
# df[
#     [True for _ in range(len(df))]
# ]

The Polars developers chose not to support this, to discourage Pandas-style queries and to encourage the use of expressions, as we see in the next lecture.

### Boolean `Series`

We cannot pass a Boolean `Series` to `[]` either. We see how to apply a Boolean condition with `filter` in the next section.

In [None]:
# Passing a Boolean Series also raises an exception
# df[df["Age"] > 30]

## Use case of indexing with `[]`

Square-bracket indexing has only a limited role in Polars. Selecting rows by integer, list, range or array with `[]` is not available in lazy mode, so we lose the advantages of query optimisation and of streaming large datasets.

We see in the next section that the `filter` method is the primary way to filter rows in Polars.

There are good uses for `[]`, however.

One example is when we are inspecting data interactively and want to see, for example, the first row or the last few rows.

Square bracket indexing is also useful for extracting scalar values from a `DataFrame`.

In this example we extract the value in the first row of the `Age` column.

In [None]:
df[0,'Age']

# Exercises
In the exercises you will develop your understanding of:
- selecting individual rows with `[]`
- selecting multiple rows with `[]`

## Exercise 1
Select the fifth row using `[]`

In [None]:
df = pl.read_csv(csv_file)
df<blank>

Select the first 5 rows using a `slice`

In [None]:
df = pl.read_csv(csv_file)
df<blank>

Select the second to fifth rows using a `range`

In [None]:
df = pl.read_csv(csv_file)
df<blank>

## Solutions

## Solution to Exercise 1
Select the fifth row using `[]`

In [None]:
df = pl.read_csv(csv_file)
df[4]

Select the first 5 rows using a `slice`

In [None]:
df = pl.read_csv(csv_file)
df[:5]

Select the second to fifth rows using a `range`

In [None]:
df = pl.read_csv(csv_file)
df[range(1,5)]