
## ----- Very Basic Guide to Using loc and iloc in Pandas ----- ##

I wrote this notebook as a reference for myself, but decided to publish it in case anyone else needs another look at loc and iloc as much as I needed it at one time.  I hope you benefit from it. 

Be sure to run each cell individually as you work through.  Also, be sure to let me know if I've messed up something.  Seriously, I'm learning just like everyone else.  I make my share of mistakes just like everyone else.  Some topics I'm really good with, some topics still challenge me to the point of embarrassment. Still, I'm humble, ready to learn more, ready to help and I enjoy the Kaggle community.  

In [None]:
# run cell
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
df_housing = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/train.csv')

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

To begin, it's best to think of loc as the Pandas command that uses labels to get row and/or column values from a dataframe (df).

Similarly, it's best to think of iloc as the Pandas command that uses integer positions to get row and/or column values from a dataframe (df).

In [None]:
# run this cell to create a sample dataframe
data = {
    'F' : ['a', 'b', 'c'],
    'G' : ['d', 'e', 'f'],
    'H' : ['g', 'h', 'i']
}

df = pd.DataFrame.from_dict(data, orient='index')
df.columns = ['C','D','E']
df.head()

## ----- Index Labels vs Index Integer Positions ----- ##
First, you need to understand the difference between index *LABELS* and index *INTEGER POSITIONS*:

Below is a Pandas dataframe with three rows labeled (F,G,H) and three columns labeled (C,D,E).  In this dataframe, you can think of 'F,G,H' as a row index and 'C', 'D', 'E' as a column index.  Think of each item in the dataframe ('a','b','c','d','e','f','g','h','i') sort of like a cell position containing that data in a spreadsheet.

Labels:

C is the label for the first column, D is the label for the second column and E is the label for the third column.  F is the label for the first row, G is the label for the second row and H is the label for the third row.
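
If it helps to see the labels and their integer positions side by side, here is a small sketch. It rebuilds the same sample dataframe under a different name (`sample`, my own choice, so it doesn't overwrite `df`) and pairs each label with its position.

```python
import pandas as pd

# Rebuild the sample dataframe under a separate name (hypothetical: `sample`).
sample = pd.DataFrame(
    [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    index=['F', 'G', 'H'],
    columns=['C', 'D', 'E'],
)

row_labels = list(sample.index)       # the row labels, in order
column_labels = list(sample.columns)  # the column labels, in order

# enumerate() pairs each label with its integer position (0, 1, 2).
for position, label in enumerate(row_labels):
    print(f"row position {position} has label {label!r}")
for position, label in enumerate(column_labels):
    print(f"column position {position} has label {label!r}")
```

The labels are what loc uses, and the positions are what iloc uses.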


In [None]:
# run cell
df.head()

## ----- Very Basic Example of loc ----- ##
When using 'loc' with Pandas Dataframes, it's helpful to think of the syntax in this manner: 

loc [ row label, column label ]  

The left side of the comma represents rows, and the right side represents columns.

The following code references the row label 'F' and the column label 'C'.  The output is where the row and column labels intersect, the cell containing the letter 'a'.

In [None]:
# run cell
df.head()

In [None]:
# Run this cell

# When using 'loc', it's helpful to think of the syntax in this manner:  loc[row labels, column labels].   
# The left side of the comma represents rows, and the right side represents columns.

# Notice the output of this code is 'a'.

df.loc['F','C']

# We referenced row label 'F' for the row and  column label 'C' for the column and the output is 'a'

Think of 'a', above, as the data at the location (loc) where the row labeled 'F' intersects with the column labeled 'C' in the dataframe, df.  That's exactly what the command df.loc['F','C'] returns.

Now, consider a slightly different dataframe, below.  Notice this dataframe's rows are indexed with integers.

In [None]:
# Run this cell:

data = {
    3 : ['a', 'b', 'c'],
    4 : ['d', 'e', 'f'],
    5 : ['g', 'h', 'i']
}

df = pd.DataFrame.from_dict(data, orient='index')
df.columns = ['C','D','E']
df.head()

Notice in the example code below, we don't use quotes around the 3.  Even though it's used as a label, it's still an integer.  Also, notice loc treats it as a label, NOT as row number 3.  Technically, the row labeled 3 sits at integer position 0.

In [None]:
# Here, we use loc to locate the value at row label 3 and column label C.  Notice we don't put quotes around the 3.
#  Also, notice this returns the value, 'a', the value in the first row and first column.

df.loc[3,'C']

You see the 'a' above?  The 'a' is located at the location, or loc, where the row labeled 3 intersects with the column labeled 'C'.


When accessing rows labeled with integers, you don't put quotes around the labels.  You only need quotes when the labels are strings. This was terribly confusing to me when first learning it.
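
Here's a quick sketch of the quotes rule. The names `df_int` and `df_str` are just ones I made up for this example: the first dataframe has integer row labels, the second has the same labels stored as strings.

```python
import pandas as pd

values = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]

# Same data twice: once with integer row labels, once with string row labels.
df_int = pd.DataFrame(values, index=[3, 4, 5], columns=['C', 'D', 'E'])
df_str = pd.DataFrame(values, index=['3', '4', '5'], columns=['C', 'D', 'E'])

from_int_label = df_int.loc[3, 'C']    # integer label, no quotes
from_str_label = df_str.loc['3', 'C']  # string label, quotes required

# Using the wrong type raises KeyError, because the integer 3 is not
# one of the string labels '3', '4', '5'.
try:
    df_str.loc[3, 'C']
    wrong_type_failed = False
except KeyError:
    wrong_type_failed = True

print(from_int_label, from_str_label, wrong_type_failed)
```

If loc ever gives you a KeyError on a label you can plainly see in the output, checking whether the label is an integer or a string is a good first step.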

## ----- Very Basic Example of iloc ----- ##
When using iloc, it's helpful to think of the syntax in this manner: iloc[ row integer position, column integer position ]

In fact, the i in iloc stands for integer.

Below, we'll access the same values we accessed with loc above, but we'll use iloc this time:


In [None]:
# head of the same dataframe we've been using
df.head()

In [None]:
# When using iloc, it's helpful to think of the syntax in this manner:  iloc[row integer position, column integer position]
# Here, we'll access the same values we accessed with loc above, but we'll use iloc this time:

df.iloc[0,0]

# 'a' is in the spot where the integer location for the row is 0 and the integer location for the column is 0.

Think about what's happening with the output of 'a' above.  The row position in df.iloc[0,0] is the integer 0, and the column position is the integer 0.  In this situation, they ARE NOT labels, they are positions, based on the integer location where row position 0 intersects with column position 0.

Check out another example below:

In [None]:
# Here's another example, what value do you think iloc will return this time?

df.iloc[2,2]

iloc returns 'i' here because 'i' is in the position of the dataframe, df, where row integer position 2 and column integer position 2 intersect.  

In case you're still a little confused, take a look at the dataframe below.  It's the same as the dataframe we've been using, except, I've listed the integer positions to the left of the rows and above the columns.

In [None]:
# run cell
print("      0  1  2")
print("             ")
print("             ")
print("0     a  b  c")
print("1     d  e  f")
print("2     g  h  i")

Above, we have the same dataframe we've been using. But, notice the row and column labels have been removed and replaced with the integer positions.  These integer positions are what iloc uses to find values in the dataframe.
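
If you ever need to go back and forth between a label and its integer position, the sketch below shows one way to do it with `Index.get_loc`. The dataframe name `grid` is my own, and it copies the integer-labeled sample dataframe from earlier.

```python
import pandas as pd

# Copy of the integer-labeled sample dataframe (hypothetical name: `grid`).
grid = pd.DataFrame(
    [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    index=[3, 4, 5],
    columns=['C', 'D', 'E'],
)

# Label -> integer position
row_pos = grid.index.get_loc(5)
col_pos = grid.columns.get_loc('E')

# Integer position -> label
row_label = grid.index[row_pos]
col_label = grid.columns[col_pos]

# Both accessors reach the same cell.
same_cell = grid.loc[row_label, col_label] == grid.iloc[row_pos, col_pos]
print(row_pos, col_pos, grid.iloc[row_pos, col_pos], same_cell)
```

So loc[5, 'E'] and iloc[2, 2] are two ways of pointing at the same 'i'.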

## ----- Accessing Slices of Dataframes Using loc and iloc ----- ##

Depending on your point of view, slicing with loc and iloc is not exactly 'Very Basic', but it's here if you want to take a look.

Recall slicing in Python (example list[5:7]).  You can slice with loc and iloc.  Remember, loc is label based, and
iloc is integer position based.  

For slicing, we'll use some of the housing data from the housing competition.  First, take a look at the row and column labels.  The row labels are integers and the column labels are strings.  

In [None]:
df_housing.head()

## ----- loc ----- ##

Below, we'll access a slice of 2 values from row labels 2,3 and column 'LotFrontage'.  Remember, slicing requires the use of the ':' (colon).  Also, remember to apply the basics of loc and iloc you learned above.  Unlike ordinary Python slicing, a loc slice includes both the start label and the stop label.


In [None]:
# run this code
# Slicing with loc.

# Here, we'll access a slice of 2 values from row labels 2,3 and column 'LotFrontage'.

df_housing.loc[2:3,'LotFrontage']

# The 2:3 gives us the row range of labels 2 to 3 from column 'LotFrontage'.  Remember, we're using loc, 
# so both end labels are included and the returned rows are actually 2 and 3.

## ----- iloc ------ ##

Slicing with iloc works much like slicing with loc, with one key difference: the stop position is not included.  In the code below, we'll take a slice of the information in the dataframe - df_housing - where row integer positions 2 and 3 intersect with column integer position 3.  Notice the data returned.  The row slice (2:4) returned only rows 2 and 3.  Slicing with iloc is like slicing a list in Python.  The positions start at 0 and the stop value is excluded, so you need to set the stop one past the last row you want.

In [None]:
# To slice the exact same information from the dataframe using iloc, use:

df_housing.iloc[2:4,3]

# NOTE:  Two important things here.  To slice rows 2 & 3, we needed to specify 2:4 and to access the 'LotFrontage' 
# column, we needed to list column integer position 3.  Remember, positions start at 0 and the stop value is excluded, so it has to be extended by 1.

In [None]:
# run this cell to include row 4
df_housing.iloc[2:5,3]

Just remember, to slice rows 2,3 and 4 we needed to specify 2:5 and to access the 'LotFrontage' 
column, we needed to list column integer position 3.  Again, positions start at 0 and the stop value is excluded, so it has to be extended by 1.
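
To see the two slicing rules next to each other without the housing data, here is a minimal sketch on a tiny dataframe. The name `toy`, the column name 'value' and the numbers are all made up for illustration.

```python
import pandas as pd

# Made-up values; the default index gives row labels 0..5 that match the positions.
toy = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60]})

by_label = toy.loc[2:4, 'value'].tolist()    # labels 2, 3 and 4 (stop included)
by_position = toy.iloc[2:4, 0].tolist()      # positions 2 and 3 (stop excluded)
print(by_label)      # [30, 40, 50]
print(by_position)   # [30, 40]

# Relabel the rows so labels and positions no longer match.
relabeled = toy.copy()
relabeled.index = [10, 11, 12, 13, 14, 15]

label_slice = relabeled.loc[12:13, 'value'].tolist()   # labels 12 and 13
position_slice = relabeled.iloc[2:4, 0].tolist()       # positions 2 and 3
print(label_slice)      # [30, 40]
print(position_slice)   # [30, 40]
```

In df_housing the row labels happen to match the row positions, which is why it's easy to mix the two up.  Once the labels are different from the positions, loc and iloc need different numbers to reach the same rows.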

## ----- Select an Entire Single Row with iloc or loc ----- ##

In [None]:
df_housing.head()

## ----- iloc ----- ##
The code that follows shows how to select a row(s) using iloc - returning a dataframe and also how to select a row and return a series.  Additionally, you'll find many other examples of using iloc you can play with.  All you need to do is comment out a line using the '#' in front of the line of code and 'uncomment' the code you desire to run by removing the '#' in front of the code.  

Play around with these to see what they will do.
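
Before you start, here is a small sketch of the dataframe-versus-series difference on a made-up dataframe (the name `toy` and its values are only for illustration).  The same pattern should apply to df_housing.

```python
import pandas as pd

# Made-up values, only to show the return types.
toy = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [10, 20, 30, 40, 50]})

as_frame = toy.iloc[[3]]    # list of positions -> DataFrame with one row
as_series = toy.iloc[3]     # single position   -> Series indexed by column label
as_scalar = toy.iloc[3, 1]  # row and column    -> single value

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (1, 2)
print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(as_scalar)                                  # 40
```

The extra pair of square brackets is what keeps the result a dataframe.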

In [None]:
# Here, we'll select the row at integer position 3 (the fourth row) of the df_housing dataframe using iloc. We'll also explore some other 
# iloc commands you might try.

df_housing.iloc[[3]] # selects row 3 (0,1,2,3) of the dataframe and returns it as a dataframe
# df_housing.iloc[3,] # selects row 3, returns it as a series
# df_housing.iloc[3] # selects row 3, returns it as a series
# df_housing.iloc[:] # selects every row and column in the df and returns them in a df
# df_housing.iloc[:, :] # selects every row and column in the df and returns them in a df
# df_housing.iloc[3:5] # returns a dataframe of rows 3 and 4 from df_housing
# df_housing.iloc[3:5, ] # returns a dataframe of rows 3 and 4 from df_housing
# df_housing.iloc[,3:5] # error - invalid syntax
# df_housing.iloc[,[3:5] # error - invalid syntax
# df_housing.iloc[[3:5]] # error - invalid syntax
# df_housing.iloc[3,5] # returns the scalar at the intersection of row 3 and column 5 in df_housing.
# df_housing.iloc[[3,4]] # returns dataframe of rows 3 & 4 from the df_housing df.
# df_housing.iloc[:, 0:2] # returns a dataframe of all rows in columns 0,1 of df_housing


In [None]:
df_housing.head()

## ----- loc ------ ##
The next code will show you how to select rows & columns using loc.  

Additionally, you'll find many other examples of using loc you can play with. All you need to do is comment out a line using the '#' in front of the line of code and 'uncomment' the code you desire to run by removing the '#' in front of the code.

Play around with these loc commands to see what they will do.

In [None]:
df_housing.loc[[3]] # selects the row labeled 3 with all of its columns and returns it as a dataframe
# df_housing.loc[3,] # selects the entire row labeled 3, returns it as a series
# df_housing.loc[3] # selects the entire row labeled 3, returns it as a series
# df_housing.loc[:] # selects every row and column in the df and returns them in a df
# df_housing.loc[:, :] # selects every row and column in the df and returns them in a df
# df_housing.loc[3:5] # returns a dataframe of rows 3,4 and 5 from df_housing - iloc only returns rows 3&4
# df_housing.loc[3:5, ] # returns a dataframe of rows 3,4 and 5 from df_housing - iloc only returns rows 3&4
# df_housing.loc[,3:5] # error - invalid syntax
# df_housing.loc[,[3:5] # error - invalid syntax
# df_housing.loc[[3:5]] # error - invalid syntax
# df_housing.loc[3,5] # raises KeyError: 5, because no column is labeled 5
# df_housing.loc[[3,4]] # returns dataframe of rows 3 & 4 from the df_housing df.
# df_housing.loc[:, 'MSSubClass': 'LotFrontage'] # returns all rows from columns 'MSSubClass' through 'LotFrontage'
# df_housing.loc[3,'MSSubClass':'LotFrontage'] # returns a series of the values where row 3 intersects with 'MSSubClass' through 'LotFrontage'

I hope this notebook has been helpful to you.  If you have any questions, comments, concerns or ideas to make it better, I'm always willing to learn from you.   
  
Thanks,  Ed  

## ----- Credits, Sources, Etc. ----- ##
I would like to take this opportunity to say 'thanks' to the coding and data science content creators/contributors across the internet.  Without y'all asking questions, answering questions, sharing your wisdom and generally always being there when I need to learn something new or just a simple answer - I would not be enjoying learning from this amazing Kaggle community today.

https://stackoverflow.com/questions/31593201/how-are-iloc-and-loc-different  
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html  
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html  

And, I've learned so much here on Kaggle, I'll simply link to the discussion section.  
https://www.kaggle.com/discussion  

Thanks again,  

Ed  