
In [0]:
# Import libraries
import pandas as pd
import numpy as np


- [Instructor] When working with large data sets, you're often only interested in a smaller subset of your data, and that's why slicing is so important. In this video, I'll teach you how to select columns in pandas, and I'll also show you how to use slicing operations in pandas. I'm working with a car loans data set where I have the DataFrame df, and I'm looking at the first five rows.

Let's say you're only interested in looking at a few columns of your data set. So let's go over how to use brackets to select just a few columns. What the code here does is use double square brackets to output only one column of our data set. As you can see, I've pulled out only the car type column. We can also select multiple columns using double brackets, and let me show you how that's done. Right here, notice that we have a list within these brackets. I'm asking for the car type column and the principal paid column. I run this, and now I have the car type column and the principal paid column. And notice that when I use the built-in type function, this is still a pandas DataFrame.

One thing a lot of beginners have difficulty with when working with pandas is that if they use just single brackets, they end up with something that looks like this. This is called a pandas Series, which is a one-dimensional labeled array. In this case, our labels are zero, one, two, three, and four. Together these labels are called the index. And notice that when I use the built-in type function, I have a pandas Series. Keep in mind that with single brackets you cannot select multiple columns. This will result in a KeyError, as you can see here. This is a really common error that a lot of beginners run into, and it usually comes from wanting to select multiple columns. The simple solution is to use double brackets, which gives you a pandas DataFrame. So I have my DataFrame, and I'm selecting my car type column and my principal paid column.

One reason you might work with a pandas Series is that you can select rows from it using slicing, where you have the series, the start index of what you want to select, and the end index of what you want to select. Keep in mind the end index is not inclusive. This behavior is very similar to Python lists. So I have a pandas Series here where I'm looking at the car type column, and this is the entire car type column. Say I'm only interested in index zero up until, but not including, index 10. In other words, from here to here. I can use a slicing operation. Over here, I have my car type column, which is a pandas Series, and here's my slice. I'm selecting from index zero up until, but not including, index 10. So from zero to nine.

Keep in mind you can also select columns using dot notation. However, this is not the recommended syntax, as you'll see in this cell over here. This results in an error because there's a space in this column name. Dot notation also fails if your column name is the same as one of the DataFrame's attributes or methods. So a safer syntax is just to use single brackets.

And lastly, I want to show you the preferred syntax for selecting columns. This is by using the .loc attribute, which allows you to select columns, index rows, and slice your data. Over here, the colon selects all the rows of my pandas DataFrame, and I'm specifically saying I just want the car type column. Then I call head to show the first five rows. Similarly, if you just want a pandas Series, you take out the square brackets around your column name. So that's it. If, in the future, you're presented with a big data set and you want to look at a subset of it, consider slicing.

### Load Excel File

In [0]:
filename = 'data/car_financing.xlsx'
df = pd.read_excel(filename)

## Slicing
1. How to select columns in pandas 
2. How to use slicing operations in pandas

In [0]:
df.head()

### Select columns using brackets
With square brackets, you can select one or more columns.

In [0]:
# Select one column using double brackets
df[['car_type']].head()

In [0]:
# Select multiple columns using double brackets
df[['car_type', 'Principal Paid']].head()

In [0]:
# This is a Pandas DataFrame
type(df[['car_type']].head())

In [0]:
# Select one column using single brackets
# This produces a pandas Series, which is a one-dimensional labeled array
df['car_type'].head()

In [0]:
# This is a pandas series
type(df['car_type'].head())

In [0]:
# Keep in mind that you can't select multiple columns using single brackets
# pandas treats the pair as a single key, so this will result in a KeyError
df['car_type', 'Principal Paid']

In [0]:
df[['car_type', 'Principal Paid']]
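
The difference between single and double brackets can be checked on a small stand-in table. This is only a sketch: `toy` and its values are made up for illustration and are not taken from the car loans file, although the column names match the ones used above.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
toy = pd.DataFrame({
    'car_type': ['Toyota Sienna', 'Toyota Sienna', 'Honda Civic'],
    'Principal Paid': [484.3, 488.5, 492.1],
})

as_frame = toy[['car_type']]   # list of labels -> DataFrame (two-dimensional)
as_series = toy['car_type']    # single label   -> Series (one-dimensional)

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
print(type(as_series).__name__, as_series.shape)  # Series (3,)

# Two labels inside single brackets are read as one tuple key,
# and no column has that tuple as its name.
try:
    toy['car_type', 'Principal Paid']
except KeyError as err:
    print('KeyError:', err)
```

The shape is the quickest way to tell the two apart: a DataFrame always reports rows and columns, while a Series reports only its length.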

### Pandas Slicing

With a pandas series, we can select rows using slicing like this: series[start_index:end_index]

The end_index is not inclusive. This behavior is very similar to Python lists.

In [0]:
df['car_type']

In [0]:
df['car_type'][0:10]
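
As a minimal sketch of the end-exclusive behavior, the cell below slices a made-up Series (`cars` is a hypothetical stand-in with the default 0, 1, 2, ... index). It also shows one point worth knowing that the slicing above does not cover: label-based slicing with `.loc` includes the end label.

```python
import pandas as pd

# Hypothetical Series with the default integer index 0..4
cars = pd.Series(
    ['Toyota Sienna', 'Honda Civic', 'VW Golf', 'Toyota Corolla', 'Honda Fit'],
    name='car_type',
)

first_three = cars[0:3]             # positions 0, 1, 2 (end is excluded)
same_by_position = cars.iloc[0:3]   # explicit positional slicing, same result
by_label = cars.loc[0:3]            # labels 0 through 3 (end is included)

print(len(first_three))   # 3
print(len(by_label))      # 4

# Same end-exclusive rule as a plain Python list
print(len(cars.tolist()[0:3]))  # 3
```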

In [0]:
# Select column using dot notation. 
# This is not recommended.
df.car_type.head()

In [0]:
"""
This raises a SyntaxError because there is a space in the column name.
Dot notation also fails if your column has the same name
as one of the DataFrame's attributes or methods.
"""
df.Principal Paid

In [0]:
df['Principal Paid']
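
The name-clash problem with dot notation can be seen in a small sketch. The `count` column here is a hypothetical addition, chosen only because `DataFrame.count` is an existing method; it is not a column in the car loans data.

```python
import pandas as pd

# Hypothetical toy data; 'count' deliberately clashes with DataFrame.count
toy = pd.DataFrame({
    'car_type': ['Toyota Sienna', 'Honda Civic'],
    'Principal Paid': [484.3, 488.5],
    'count': [1, 2],
})

# Dot notation works for a simple name that clashes with nothing
print(toy.car_type.equals(toy['car_type']))  # True

# Dot notation returns the method, not the column
print(callable(toy.count))                   # True

# Brackets always return the column
print(toy['count'].tolist())                 # [1, 2]
print(toy['Principal Paid'].tolist())        # [484.3, 488.5]
```

Nothing fails loudly in the clash case, which is what makes it easy to miss: the attribute lookup silently finds the method first.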

### Selecting Columns using loc
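The pandas attribute .loc allows you to select columns, index rows, and slice your data. The first position selects rows (a bare colon means all rows) and the second selects columns.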

In [0]:
# A list of column labels returns a pandas DataFrame
df.loc[:, ['car_type']].head()

In [0]:
# A single column label returns a pandas Series
df.loc[:, 'car_type'].head()
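
To show rows and columns being selected together, here is a short sketch on made-up data (`toy` is a hypothetical stand-in for `df`). It assumes the default integer index, so the row labels are 0, 1, 2, and so on.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
toy = pd.DataFrame({
    'car_type': ['Toyota Sienna', 'Honda Civic', 'VW Golf', 'Toyota Corolla', 'Honda Fit'],
    'Principal Paid': [484.3, 488.5, 492.1, 495.9, 499.8],
})

frame = toy.loc[:, ['car_type']]   # all rows, list of labels  -> DataFrame
series = toy.loc[:, 'car_type']    # all rows, single label    -> Series

# Row labels 0 through 2 (inclusive) and two columns in one call
subset = toy.loc[0:2, ['car_type', 'Principal Paid']]

print(type(frame).__name__)    # DataFrame
print(type(series).__name__)   # Series
print(subset.shape)            # (3, 2)
```

Because `.loc` works with labels, the row slice `0:2` keeps the row labeled 2, unlike the positional slices used earlier.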