# Introduction to Pandas

In this tutorial you will be introduced to using the pandas DataFrame to read and manipulate data. By the end of this tutorial you should be able to:

- Read in .csv data.
- Select columns.
- Locate elements based on a boolean condition.


## Pandas Documentation

Like other popular Python libraries, pandas is widely used and well documented, so solutions to most common errors are easy to find on Stack Overflow. If at any point you are unsure about syntax, search for what you'd like to do and you're likely to find a solution.

## Reading Data from .csv

Pandas makes reading data from .csv (comma-separated values) files very easy. If you haven't seen a .csv file before, don't worry: it is simply a text file in which the values are separated by commas. Open the bikes.csv file in the data-v1 folder from your desktop and see what it looks like.

Let's now open the bikes.csv file in pandas. 

In [None]:
# Importing the pandas library
import pandas as pd

# Read from csv
bike_df = pd.read_csv('../data-v1/bikes.csv')

bike_df
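
If bikes.csv is not to hand, the same call can be tried on a small in-memory CSV. The sketch below is only an illustration: the column names `Station A` and `Station B` and all the counts are made up, not taken from bikes.csv. It also uses `head()` to preview the first rows rather than printing the whole data frame.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk (values are made up)
csv_text = """Date,Station A,Station B
2012-01-01,35,10
2012-01-02,83,61
2012-01-03,135,91
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows, which is handy for large files
print(sample_df.head(2))

# shape is (number of rows, number of columns)
print(sample_df.shape)
```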

## Getting a List of Columns

A list of columns can be retrieved from the data frame using the code below. 

In [None]:
bike_df.columns.values
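
Note that `.columns.values` returns a NumPy array rather than a Python list. If a plain list is more convenient, `.columns.tolist()` can be used instead. A minimal sketch, assuming a made-up data frame with hypothetical column names:

```python
import pandas as pd

# Hypothetical data frame (column names and values are made up)
sample_df = pd.DataFrame({
    "Date": ["2012-01-01", "2012-01-02"],
    "Station A": [35, 83],
    "Station B": [10, 61],
})

column_array = sample_df.columns.values   # NumPy array of column names
column_list = sample_df.columns.tolist()  # plain Python list of column names

print(type(column_array).__name__)
print(column_list)
```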

## Selecting a Column

The above data is organised in columns, with each column showing the number of bikes at one location. To retrieve a particular column we use the syntax below:

In [None]:
bike_df['Côte-Sainte-Catherine']
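
Selecting with a single column name returns a Series, while passing a list of names returns a smaller DataFrame. The sketch below illustrates the difference on a made-up data frame; the column names are hypothetical.

```python
import pandas as pd

# Hypothetical data frame (column names and values are made up)
sample_df = pd.DataFrame({
    "Date": ["2012-01-01", "2012-01-02"],
    "Station A": [35, 83],
    "Station B": [10, 61],
})

# A single name in square brackets gives a Series
one_column = sample_df["Station A"]

# A list of names (note the double brackets) gives a DataFrame
two_columns = sample_df[["Station A", "Station B"]]

print(type(one_column).__name__)
print(type(two_columns).__name__)
```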

## Setting the Index

The above data is *indexed* by meaningless integers. It would be more convenient to organise it by date. To do this we can set the index to a particular column using the data frame's set_index method.

In [None]:
# We do not usually have to reload data. Done here for demonstration. 
bike_df = pd.read_csv('../data-v1/bikes.csv')

# Setting index
bike_df = bike_df.set_index('Date')

# Selecting column
bike_df['Côte-Sainte-Catherine']
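
As an alternative to calling `set_index` after loading, `read_csv` can set the index while reading through its `index_col` parameter, and `parse_dates=True` asks pandas to try to parse that index as dates. The sketch below assumes a small made-up CSV with ISO-formatted dates; the date format in bikes.csv may differ and could need extra options.

```python
import io

import pandas as pd

# Hypothetical CSV text (column names, dates and counts are made up)
csv_text = """Date,Station A,Station B
2012-01-01,35,10
2012-01-02,83,61
2012-01-03,135,91
"""

# index_col picks the index column; parse_dates=True tries to parse the index as dates
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(indexed_df.index)
print(indexed_df["Station A"])
```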

## Getting Index Values

We can retrieve the index values of the data frame using the code below. They are returned as an array, which can be stored in a variable to be used later.

In [None]:
# Getting index values
indices = bike_df.index.values

# Looping through the list of values and printing each element
for index in indices:
    print(index)
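
Because `.index.values` is a NumPy array, its elements can also be reached by integer position, just like a list. A minimal sketch on a made-up data frame (the column name and dates are hypothetical):

```python
import pandas as pd

# Hypothetical data frame indexed by date strings (values are made up)
sample_df = pd.DataFrame(
    {"Station A": [35, 83, 135]},
    index=pd.Index(["2012-01-01", "2012-01-02", "2012-01-03"], name="Date"),
)

sample_indices = sample_df.index.values

print(len(sample_indices))   # number of index values
print(sample_indices[0])     # first index value
print(sample_indices[-1])    # last index value
```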

## Locating from Index

To locate a row in the data frame by its index label we can use the .loc indexer, which takes the label in square brackets.

```
df.loc['index']
```

In the example below we first randomly select an index from the array of indices retrieved earlier and then use it to retrieve the corresponding row.

In [None]:
import random

random_index = random.choice(indices)
print("The randomly selected index is", random_index)
bike_df.loc[random_index]
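
`.loc` also accepts a column label after the row label, and a slice of labels. The sketch below shows these forms on a made-up data frame; the station names, dates and counts are hypothetical.

```python
import pandas as pd

# Hypothetical data frame indexed by date strings (values are made up)
sample_df = pd.DataFrame(
    {"Station A": [35, 83, 135], "Station B": [10, 61, 91]},
    index=pd.Index(["2012-01-01", "2012-01-02", "2012-01-03"], name="Date"),
)

# One label: the whole row, returned as a Series
row = sample_df.loc["2012-01-02"]

# Row label and column label: a single value
cell = sample_df.loc["2012-01-02", "Station B"]

# Label slice: unlike positional slicing, both end labels are included
rows = sample_df.loc["2012-01-01":"2012-01-02"]

print(row)
print(cell)
print(rows)
```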

**Exercise: using the data frame indexed by date, print every 20th row. Hint: the array _indices_ is indexed as 0, 1, 2, 3, ... so you could loop through it.**



## Locating using boolean operators

We can also locate elements based on boolean operations. This means we can select rows only if they meet a certain condition. This is, again, best illustrated with an example.

In the code below we will use our data frame to create a subset of the data that only contains rows where there were more than 200 bikes in 'du Parc'.

In [None]:
print("The original data frame has shape", bike_df.shape)
sub_df = bike_df.loc[bike_df['du Parc']>200]
print("The reduced data frame has shape", sub_df.shape)

By cutting out all rows in the data frame that have 200 or fewer bikes in 'du Parc', we are left with 258 rows in the data frame instead of the original 310.
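
It can help to look at the condition on its own: a comparison on a column produces a Series of True/False values, and `.loc` keeps the rows where the value is True. Conditions can be combined with `&` (and) or `|` (or), with each condition wrapped in parentheses. A minimal sketch on made-up data, with hypothetical station names:

```python
import pandas as pd

# Hypothetical data frame (column names and values are made up)
sample_df = pd.DataFrame({
    "Station A": [35, 250, 310, 120],
    "Station B": [10, 61, 205, 300],
})

# The comparison alone gives one True/False value per row
mask = sample_df["Station A"] > 200
print(mask)

# Keep only the rows where the mask is True
busy_a = sample_df.loc[mask]

# Combine two conditions; the parentheses around each one are required
busy_both = sample_df.loc[(sample_df["Station A"] > 200) & (sample_df["Station B"] > 200)]

print(busy_a.shape)
print(busy_both.shape)
```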