<!-- INTRODUCTION -->
# An introduction to pandas

In this introduction we'll explore the pandas library and some common use cases for working with data.

<!-- TITLE -->
# Importing the data

<!-- TASK 1.1-->
## Importing the appropriate libraries

In this example we'll load four commonly used libraries in data science. I'll give a brief overview of each to get you familiar with them. We'll be using these throughout the course and so I'll go into greater detail as we go along.

In [3]:
## STARTER CODE
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston

# makes the dataframes easier to view. Just formatting.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [None]:
## SOLUTION CODE
import pandas as pd # pandas builds on numpy, adding labelled, table-like data structures that are easier to work with
import numpy as np # numpy provides fast n-dimensional arrays and optimized numerical routines
import matplotlib.pyplot as plt # a low-level plotting library for Python

# sklearn (scikit-learn) is a machine learning library.
# From sklearn we are importing the Boston housing dataset loader.
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import requires a scikit-learn version older than 1.2.
from sklearn.datasets import load_boston

# makes the dataframes easier to view. Just formatting.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

<!-- HINT for 1.1-->
Just run the cell

In [None]:
## TEST for 1.1
# this is where the test script goes for task 1

<!-- TASK for 1.2-->
## Load the boston dataset

Call the `load_boston()` function that we just imported from sklearn, assign the result to a variable named `boston`, and display it. (`Ctrl + Enter` runs the cell; `Shift + Enter` runs the cell and moves to the next one.) As you can see, the output looks a little messy. The object returned by `load_boston()` is a dictionary-like `Bunch` whose main keys are `'data'`, `'DESCR'`, `'feature_names'` and `'target'`. Note that the keys are case sensitive! Because a `Bunch` also exposes its keys as attributes, you can view each one individually with `boston.data`, `boston.DESCR` and so on. Jupyter notebooks support tab completion, so typing `boston.` and pressing `TAB` lists all the available options. Display each of these individually to get a feel for the structure of the dataset.

In [24]:
## STARTER CODE
boston = 
print

In [122]:
## SOLUTION CODE
boston = load_boston()
boston.data
boston.DESCR
boston.feature_names
boston.target

array([[  6.32000000e-03,   1.80000000e+01,   2.31000000e+00, ...,
          1.53000000e+01,   3.96900000e+02,   4.98000000e+00],
       [  2.73100000e-02,   0.00000000e+00,   7.07000000e+00, ...,
          1.78000000e+01,   3.96900000e+02,   9.14000000e+00],
       [  2.72900000e-02,   0.00000000e+00,   7.07000000e+00, ...,
          1.78000000e+01,   3.92830000e+02,   4.03000000e+00],
       ..., 
       [  6.07600000e-02,   0.00000000e+00,   1.19300000e+01, ...,
          2.10000000e+01,   3.96900000e+02,   5.64000000e+00],
       [  1.09590000e-01,   0.00000000e+00,   1.19300000e+01, ...,
          2.10000000e+01,   3.93450000e+02,   6.48000000e+00],
       [  4.74100000e-02,   0.00000000e+00,   1.19300000e+01, ...,
          2.10000000e+01,   3.96900000e+02,   7.88000000e+00]])



array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], 
      dtype='|S7')

array([ 24. ,  21.6,  34.7,  33.4,  36.2,  28.7,  22.9,  27.1,  16.5,
        18.9,  15. ,  18.9,  21.7,  20.4,  18.2,  19.9,  23.1,  17.5,
        20.2,  18.2,  13.6,  19.6,  15.2,  14.5,  15.6,  13.9,  16.6,
        14.8,  18.4,  21. ,  12.7,  14.5,  13.2,  13.1,  13.5,  18.9,
        20. ,  21. ,  24.7,  30.8,  34.9,  26.6,  25.3,  24.7,  21.2,
        19.3,  20. ,  16.6,  14.4,  19.4,  19.7,  20.5,  25. ,  23.4,
        18.9,  35.4,  24.7,  31.6,  23.3,  19.6,  18.7,  16. ,  22.2,
        25. ,  33. ,  23.5,  19.4,  22. ,  17.4,  20.9,  24.2,  21.7,
        22.8,  23.4,  24.1,  21.4,  20. ,  20.8,  21.2,  20.3,  28. ,
        23.9,  24.8,  22.9,  23.9,  26.6,  22.5,  22.2,  23.6,  28.7,
        22.6,  22. ,  22.9,  25. ,  20.6,  28.4,  21.4,  38.7,  43.8,
        33.2,  27.5,  26.5,  18.6,  19.3,  20.1,  19.5,  19.5,  20.4,
        19.8,  19.4,  21.7,  22.8,  18.8,  18.7,  18.5,  18.3,  21.2,
        19.2,  20.4,  19.3,  22. ,  20.3,  20.5,  17.3,  18.8,  21.4,
        15.7,  16.2,

<!-- HINT for 1.2-->
First set `boston = load_boston()`, then print `boston.data`, `boston.DESCR`, `boston.target` and `boston.feature_names`.

In [None]:
## TEST for 1.2
# This is where the test script goes for task 2

<!-- TASK for 1.3-->
## Convert the boston dataset to a pandas dataframe

We now have everything we need to build a dataframe. A pandas dataframe is like a supercharged Excel spreadsheet: it has the same familiar 2-D tabular layout of rows and columns, but it is optimized for performing large calculations quickly, so it keeps up as your datasets grow.

In the example below we create a pandas dataframe and assign it to the variable `boston_df`. Note the capitalization of `pd.DataFrame()`: the name is case sensitive, so `pd.dataframe()` will not work. Create the dataframe by setting the `data` parameter to the appropriate one of `boston.data`, `boston.DESCR`, `boston.target` or `boston.feature_names`, then do the same for the `columns` parameter. Finally, call the `.head()` and `.tail()` methods and the `.shape` attribute on your new dataframe to view the first 5 rows, the last 5 rows, and the number of rows and columns, respectively.
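The same constructor pattern works with any 2-D array plus a matching list of column names. The sketch below is a minimal, self-contained illustration that runs without scikit-learn: a small array (values copied from the first five rows of `CRIM`, `ZN` and `RM` in this notebook's output) stands in for `boston.data`, and the names `toy_data`, `toy_columns` and `toy_df` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for boston.data: 5 rows x 3 columns
# (values copied from the first five rows of CRIM, ZN and RM)
toy_data = np.array([
    [0.00632, 18.0, 6.575],
    [0.02731, 0.0, 6.421],
    [0.02729, 0.0, 7.185],
    [0.03237, 0.0, 6.998],
    [0.06905, 0.0, 7.147],
])
toy_columns = ["CRIM", "ZN", "RM"]  # one name per column of toy_data

# data supplies the values, columns supplies the headers
toy_df = pd.DataFrame(data=toy_data, columns=toy_columns)

print(toy_df.head(2))  # first 2 rows
print(toy_df.tail(2))  # last 2 rows
print(toy_df.shape)    # (rows, columns); an attribute, so no parentheses
```

If the number of names in `columns` does not match the number of columns in `data`, pandas raises a `ValueError`, which is a quick way to spot passing the wrong `boston` key.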

In [27]:
## STARTER CODE
boston_df = pd.DataFrame(data=, columns=)
type(boston_df)

In [116]:
## SOLUTION CODE
boston_df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
type(boston_df)
boston_df.head()
boston_df.tail()
boston_df.shape

pandas.core.frame.DataFrame

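| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |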


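| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 |
| 502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.12 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.9 | 9.08 |
| 503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.9 | 5.64 |
| 504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 |
| 505 | 0.04741 | 0.0 | 11.93 | 0.0 | 0.573 | 6.03 | 80.8 | 2.505 | 1.0 | 273.0 | 21.0 | 396.9 | 7.88 |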


(506, 13)

<!-- HINT for 1.3-->
Call the `.head()` and `.tail()` methods on `boston_df`. `.shape` is an attribute, so you don't need parentheses.

In [None]:
## TEST for 1.3
# This is where the test script goes for task 3

<!-- TASK for 1.4-->
## Display the index, columns and underlying numpy array data

Explore some attributes of the dataframe we've created. In the code below, access the `index`, `columns` and `values` attributes. Then call the `.info()` and `.describe()` methods to get quick summaries of the data: `.info()` reports the index range, each column's dtype and non-null count, and the memory usage, while `.describe()` provides a statistical summary (count, mean, standard deviation, min, quartiles and max) of each numeric column.
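A minimal sketch of the same calls on a small, made-up dataframe (`toy_df` and its values are hypothetical). Because `.info()` prints its report and returns `None`, the sketch passes an `io.StringIO` buffer through its `buf` parameter to capture the text so it can be inspected.

```python
import io

import pandas as pd

# Hypothetical toy frame with two numeric columns
toy_df = pd.DataFrame({"AGE": [65.2, 78.9, 61.1, 45.8],
                       "TAX": [296.0, 242.0, 242.0, 222.0]})

print(toy_df.index)         # a RangeIndex covering rows 0 to 3
print(toy_df.columns)       # the column labels
print(toy_df.values.shape)  # the underlying NumPy array: (4, 2)

# .info() writes to stdout by default; buf= redirects it into a string
buffer = io.StringIO()
toy_df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# .describe() returns a new DataFrame of summary statistics per column
summary = toy_df.describe()
print(summary)
```

Note the split: `index`, `columns` and `values` are attributes, while `info()` and `describe()` are methods, and only `describe()` hands back a dataframe you can slice further.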

In [155]:
## STARTER CODE
# call the index attribute
# call the columns attribute
# call the values attribute
# call the .info() method
# call the .describe() method

In [157]:
## SOLUTION CODE
boston_df.index  # call the index attribute
boston_df.columns  # call the columns attribute
boston_df.values  # call the values attribute
boston_df.info()  # call the .info() method
boston_df.describe()  # call the .describe() method

RangeIndex(start=0, stop=506, step=1)

Index([u'CRIM', u'ZN', u'INDUS', u'CHAS', u'NOX', u'RM', u'AGE', u'DIS',
       u'RAD', u'TAX', u'PTRATIO', u'B', u'LSTAT'],
      dtype='object')

array([[  6.32000000e-03,   1.80000000e+01,   2.31000000e+00, ...,
          1.53000000e+01,   3.96900000e+02,   4.98000000e+00],
       [  2.73100000e-02,   0.00000000e+00,   7.07000000e+00, ...,
          1.78000000e+01,   3.96900000e+02,   9.14000000e+00],
       [  2.72900000e-02,   0.00000000e+00,   7.07000000e+00, ...,
          1.78000000e+01,   3.92830000e+02,   4.03000000e+00],
       ..., 
       [  6.07600000e-02,   0.00000000e+00,   1.19300000e+01, ...,
          2.10000000e+01,   3.96900000e+02,   5.64000000e+00],
       [  1.09590000e-01,   0.00000000e+00,   1.19300000e+01, ...,
          2.10000000e+01,   3.93450000e+02,   6.48000000e+00],
       [  4.74100000e-02,   0.00000000e+00,   1.19300000e+01, ...,
          2.10000000e+01,   3.96900000e+02,   7.88000000e+00]])

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 13 columns):
CRIM       506 non-null float64
ZN         506 non-null float64
INDUS      506 non-null float64
CHAS       506 non-null float64
NOX        506 non-null float64
RM         506 non-null float64
AGE        506 non-null float64
DIS        506 non-null float64
RAD        506 non-null float64
TAX        506 non-null float64
PTRATIO    506 non-null float64
B          506 non-null float64
LSTAT      506 non-null float64
dtypes: float64(13)
memory usage: 51.5 KB


| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.593761 | 11.363636 | 11.136779 | 0.06917 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 |
| std | 8.596783 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.10571 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.3775 | 6.95 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.44 | 11.36 |
| 75% | 3.647423 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.075 | 5.188425 | 24.0 | 666.0 | 20.2 | 396.225 | 16.955 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 |


<!-- HINT for 1.4-->
Use the keywords in the description: `index`, `columns` and `values` are attributes (no parentheses), while `info` and `describe` are methods (add `()`).

In [None]:
## TEST for 1.4
# This is where the test script goes for task 4

<!-- TASK for 1.5-->
## Explore transposing, sorting by index and sorting by a column value

In the code below, call the `.sort_index()` and `.sort_values()` methods. In both cases you need to pass parameters that say what to sort by and whether to sort in ascending or descending order. We will also use the `.T` attribute to transpose the dataframe (swap its rows and columns). Once you've completed these three steps, try limiting the rows each one returns by chaining `.head()` onto the end of the call. You can pass an integer to `.head()` to choose how many rows to take from the top: for the first 8 rows, use `.head(8)`. Also see what happens if you change the keyword argument `axis` from 0 to 1 and `ascending` from `True` to `False`. Can you figure out what is going on?
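A minimal sketch of the three operations on a small, hypothetical dataframe (`toy_df` and its values are made up), including the `axis=1` and `ascending=False` variations the task asks about. The column order is deliberately not alphabetical so that sorting along `axis=1` visibly changes it.

```python
import pandas as pd

# Hypothetical toy frame: 4 rows, columns in non-alphabetical order
toy_df = pd.DataFrame({"TAX": [296.0, 242.0, 311.0, 222.0],
                       "AGE": [65.2, 100.0, 45.8, 54.2],
                       "CRIM": [0.006, 0.027, 0.032, 0.069]})

# axis=0: sort the rows by their index labels, largest label first
rows_desc = toy_df.sort_index(axis=0, ascending=False)

# axis=1: sort the column labels instead (alphabetically, ascending)
cols_sorted = toy_df.sort_index(axis=1)

# Sort the rows by the values in one column, largest AGE first
by_age = toy_df.sort_values(by="AGE", ascending=False)

# .T swaps rows and columns: the column names become the index
transposed = toy_df.T

print(rows_desc.head(2))  # keep only the first 2 rows of the sorted result
print(cols_sorted)
print(by_age)
print(transposed)
```

None of these calls modify `toy_df` itself; each returns a new dataframe, which is why `.head()` can be chained straight onto the result.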

In [42]:
## STARTER CODE
# call the sort_index() method and pass axis=0 and ascending=True
# call the sort_values() method and pass a column name in quotes to the by= parameter
# call the T attribute to transpose the dataframe

In [69]:
## SOLUTION CODE
boston_df.sort_index(axis=0, ascending=False).head(4) # sort rows by index in descending order, then show the first 4
boston_df.sort_values(by='AGE', ascending=False).head(8) # sort rows by the AGE column, largest first, then show the first 8
boston_df.T.head(6) # transpose the dataframe, then show the first 6 rows (the original first 6 columns)

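| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 505 | 0.04741 | 0.0 | 11.93 | 0.0 | 0.573 | 6.03 | 80.8 | 2.505 | 1.0 | 273.0 | 21.0 | 396.9 | 7.88 |
| 504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 |
| 503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.9 | 5.64 |
| 502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.12 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.9 | 9.08 |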


| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 158 | 1.34284 | 0.0 | 19.58 | 0.0 | 0.605 | 6.066 | 100.0 | 1.7573 | 5.0 | 403.0 | 14.7 | 353.89 | 6.43 |
| 159 | 1.42502 | 0.0 | 19.58 | 0.0 | 0.871 | 6.51 | 100.0 | 1.7659 | 5.0 | 403.0 | 14.7 | 364.31 | 7.39 |
| 409 | 14.4383 | 0.0 | 18.1 | 0.0 | 0.597 | 6.852 | 100.0 | 1.4655 | 24.0 | 666.0 | 20.2 | 179.36 | 19.78 |
| 410 | 51.1358 | 0.0 | 18.1 | 0.0 | 0.597 | 5.757 | 100.0 | 1.413 | 24.0 | 666.0 | 20.2 | 2.6 | 10.11 |
| 411 | 14.0507 | 0.0 | 18.1 | 0.0 | 0.597 | 6.657 | 100.0 | 1.5275 | 24.0 | 666.0 | 20.2 | 35.05 | 21.22 |
| 412 | 18.811 | 0.0 | 18.1 | 0.0 | 0.597 | 4.628 | 100.0 | 1.5539 | 24.0 | 666.0 | 20.2 | 28.79 | 34.37 |
| 413 | 28.6558 | 0.0 | 18.1 | 0.0 | 0.597 | 5.155 | 100.0 | 1.5894 | 24.0 | 666.0 | 20.2 | 210.97 | 20.08 |
| 414 | 45.7461 | 0.0 | 18.1 | 0.0 | 0.693 | 4.519 | 100.0 | 1.6582 | 24.0 | 666.0 | 20.2 | 88.27 | 36.98 |


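| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 496 | 497 | 498 | 499 | 500 | 501 | 502 | 503 | 504 | 505 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CRIM | 0.00632 | 0.02731 | 0.02729 | 0.03237 | 0.06905 | 0.02985 | 0.08829 | 0.14455 | 0.21124 | 0.17004 | ... | 0.2896 | 0.26838 | 0.23912 | 0.17783 | 0.22438 | 0.06263 | 0.04527 | 0.06076 | 0.10959 | 0.04741 |
| ZN | 18.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12.5 | 12.5 | 12.5 | 12.5 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| INDUS | 2.31 | 7.07 | 7.07 | 2.18 | 2.18 | 2.18 | 7.87 | 7.87 | 7.87 | 7.87 | ... | 9.69 | 9.69 | 9.69 | 9.69 | 9.69 | 11.93 | 11.93 | 11.93 | 11.93 | 11.93 |
| CHAS | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| NOX | 0.538 | 0.469 | 0.469 | 0.458 | 0.458 | 0.458 | 0.524 | 0.524 | 0.524 | 0.524 | ... | 0.585 | 0.585 | 0.585 | 0.585 | 0.585 | 0.573 | 0.573 | 0.573 | 0.573 | 0.573 |
| RM | 6.575 | 6.421 | 7.185 | 6.998 | 7.147 | 6.43 | 6.012 | 6.172 | 5.631 | 6.004 | ... | 5.39 | 5.794 | 6.019 | 5.569 | 6.027 | 6.593 | 6.12 | 6.976 | 6.794 | 6.03 |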


<!-- HINT for 1.5-->
The pattern for calling a method with parameters and then limiting the rows looks like this: `boston_df.sort_method(define parameters).head(n)`, where `n` is the number of rows you want.

In [None]:
## TEST for 1.5
# This is where the test script goes for task 5

<!-- CONCLUSION for 1-->
Great job completing these tasks!

<!-- TITLE -->
# Slicing & Selecting

<!-- INTRODUCTION For 2-->
## Selecting data by columns or rows

One of the most important skills you will learn, and one you will spend a lot of time refining, is slicing and selecting your data to get exactly the information you want. We'll start with the basics and then gradually add some difficulty.

<!-- TASK 2.1-->
In a pandas dataframe, we can select a single column by putting its name in square brackets after the dataframe. Running `boston_df['CRIM']` selects only the CRIM column, returned as a pandas Series. We can also select rows by position by passing a slice inside the brackets. Running `boston_df[0:10]` selects the first 10 rows, at positions 0 through 9, because the end of a slice is excluded. We don't need to specify the 0, so `boston_df[:10]` does the same thing. Finally, a colon on its own selects all rows in the dataframe.
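A minimal sketch of bracket selection on a small, hypothetical 20-row dataframe (`toy_df` and its values are made up). It shows that the end of a row slice is excluded, and that a single name returns a Series while a list of names returns a dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame with two columns
toy_df = pd.DataFrame({"CRIM": np.linspace(0.0, 1.9, 20),
                       "TAX": np.arange(20, dtype=float)})

tax = toy_df["TAX"]            # one column name -> a Series
tax_frame = toy_df[["TAX"]]    # a list of column names -> a DataFrame
first_ten = toy_df[0:10]       # positions 0..9; the stop value 10 is excluded
first_nine = toy_df[0:9]       # positions 0..8, i.e. only 9 rows
also_ten = toy_df[:10]         # omitting the start defaults to 0
everything = toy_df[:]         # a bare colon selects every row

print(type(tax).__name__, type(tax_frame).__name__)
print(len(first_ten), len(first_nine), len(everything))
```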

In [94]:
## STARTER CODE
boston_df[] # select the single column 'TAX'
boston_df[] # select the first 5 rows
boston_df[] # select all rows

In [95]:
## SOLUTION CODE
boston_df['TAX'] # select the single column 'TAX'
boston_df[0:5] # select the first 5 rows
boston_df[:] # select all rows

0      296.0
1      242.0
2      242.0
3      222.0
4      222.0
5      222.0
6      311.0
7      311.0
8      311.0
9      311.0
10     311.0
11     311.0
12     311.0
13     307.0
14     307.0
15     307.0
16     307.0
17     307.0
18     307.0
19     307.0
20     307.0
21     307.0
22     307.0
23     307.0
24     307.0
25     307.0
26     307.0
27     307.0
28     307.0
29     307.0
       ...  
476    666.0
477    666.0
478    666.0
479    666.0
480    666.0
481    666.0
482    666.0
483    666.0
484    666.0
485    666.0
486    666.0
487    666.0
488    711.0
489    711.0
490    711.0
491    711.0
492    711.0
493    391.0
494    391.0
495    391.0
496    391.0
497    391.0
498    391.0
499    391.0
500    391.0
501    273.0
502    273.0
503    273.0
504    273.0
505    273.0
Name: TAX, dtype: float64

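| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |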


| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 |
| 5 | 0.02985 | 0.0 | 2.18 | 0.0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.12 | 5.21 |
| 6 | 0.08829 | 12.5 | 7.87 | 0.0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5.0 | 311.0 | 15.2 | 395.60 | 12.43 |
| 7 | 0.14455 | 12.5 | 7.87 | 0.0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5.0 | 311.0 | 15.2 | 396.90 | 19.15 |
| 8 | 0.21124 | 12.5 | 7.87 | 0.0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5.0 | 311.0 | 15.2 | 386.63 | 29.93 |
| 9 | 0.17004 | 12.5 | 7.87 | 0.0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5.0 | 311.0 | 15.2 | 386.71 | 17.10 |


<!-- HINT for 2.1-->
Remember to pass the column name as a string, and mind the case. When selecting rows, the colon separates the start and end of the range, and the end is excluded.

<!-- TASK 2.2 -->
## Selection by label

In the code below we'll select part of the dataframe by a given label or set of labels. We will use `.loc[]`, a label-based indexer. With `.loc[]`, the first argument describes the rows you want to select, and the second argument (separated by a comma) describes the columns. So `boston_df.loc[10:20, ['AGE', 'TAX']]` selects the rows labelled 10 through 20, including both 10 and 20 (11 rows in total), and the columns 'AGE' and 'TAX'.
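A minimal sketch of the inclusive behaviour of `.loc[]` on a small, hypothetical 30-row dataframe that reuses a few of the Boston column names (`toy_df` and its values are made up).

```python
import numpy as np
import pandas as pd

# Hypothetical 30-row frame using a few of the Boston column names
toy_df = pd.DataFrame({
    "ZN": np.zeros(30),
    "AGE": np.arange(30, dtype=float),
    "TAX": np.arange(30, dtype=float) * 10,
    "LSTAT": np.ones(30),
})

# Rows labelled 10 through 20; .loc includes BOTH endpoints, so 11 rows
subset = toy_df.loc[10:20, ["AGE", "TAX"]]

# Label slices work for columns too: every column from AGE to LSTAT, inclusive
age_to_lstat = toy_df.loc[:, "AGE":"LSTAT"]

print(subset.shape)                # (11, 2)
print(list(age_to_lstat.columns))  # ['AGE', 'TAX', 'LSTAT']
```

Because the default index labels happen to equal the row positions, `.loc[10:20]` and `.iloc[10:21]` return the same rows here; the two diverge once the index is reordered, for example after `sort_values()`.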

In [None]:
## STARTER CODE for 2.2
boston_df.loc[] # select the first 6 rows and columns AGE and LSTAT
boston_df.loc[] # select rows 5-10 and columns ZN and RM
boston_df.loc[] # select all rows and all columns FROM AGE to LSTAT

In [111]:
## SOLUTION CODE for 2.2
boston_df.loc[:5, ['AGE', 'LSTAT']] # select the first 6 rows and columns AGE and LSTAT
boston_df.loc[5:10, ['ZN', 'RM']] # select rows 5-10 and columns ZN and RM
boston_df.loc[:, 'AGE':'LSTAT'] # select all rows and all columns FROM AGE to LSTAT

| | AGE | LSTAT |
|---|---|---|
| 0 | 65.2 | 4.98 |
| 1 | 78.9 | 9.14 |
| 2 | 61.1 | 4.03 |
| 3 | 45.8 | 2.94 |
| 4 | 54.2 | 5.33 |
| 5 | 58.7 | 5.21 |


| | ZN | RM |
|---|---|---|
| 5 | 0.0 | 6.43 |
| 6 | 12.5 | 6.012 |
| 7 | 12.5 | 6.172 |
| 8 | 12.5 | 5.631 |
| 9 | 12.5 | 6.004 |
| 10 | 12.5 | 6.377 |


| | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|
| 0 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 |
| 1 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 |
| 2 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 |
| 5 | 58.7 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.12 | 5.21 |
| 6 | 66.6 | 5.5605 | 5.0 | 311.0 | 15.2 | 395.60 | 12.43 |
| 7 | 96.1 | 5.9505 | 5.0 | 311.0 | 15.2 | 396.90 | 19.15 |
| 8 | 100.0 | 6.0821 | 5.0 | 311.0 | 15.2 | 386.63 | 29.93 |
| 9 | 85.9 | 6.5921 | 5.0 | 311.0 | 15.2 | 386.71 | 17.10 |


<!-- HINT for 2.2 -->
Remember to specify the rows you want first, followed by the columns. Since we're using `.loc`, you pass labels rather than positions, so for the columns you need to pass the string name of each column.

<!-- TASK 2.3 -->
## Selection of values based on index location

What if we don't know the index labels or column names? We can use `.iloc[]` to select rows and columns purely by integer position. In this case we can't pass column names; we have to refer to columns by their position instead. So if we wanted the first five rows of the 'CRIM', 'ZN' and 'B' columns, our code would look like this: `boston_df.iloc[:5, [0, 1, 11]]`, where 0, 1 and 11 are the positions of 'CRIM', 'ZN' and 'B' respectively. Note that, unlike `.loc[]`, a range in `.iloc[]` excludes its upper end, so the range 0:5 returns positions 0, 1, 2, 3 and 4.
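Counting column positions by hand is error-prone with 13 columns. A minimal sketch, assuming a hypothetical all-zeros stand-in for `boston_df` with the same 13 column names, that looks the positions up with `Index.get_loc` and then feeds them to `.iloc[]`:

```python
import numpy as np
import pandas as pd

# The 13 Boston column names, in the order used by boston_df
columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Hypothetical stand-in data: 10 rows of zeros with the same columns
toy_df = pd.DataFrame(np.zeros((10, len(columns))), columns=columns)

# Look up each column's integer position instead of counting by hand
positions = [toy_df.columns.get_loc(name) for name in ["CRIM", "ZN", "B"]]
print(positions)

# .iloc slices exclude the stop value: rows at positions 0..4 only
subset = toy_df.iloc[:5, positions]
print(subset.shape)  # (5, 3)
```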

In [None]:
## STARTER CODE for 2.3
boston_df.iloc[] # select the first 8 rows and columns from index position 5 and on.
boston_df.iloc[] # select rows 100 to 109 and columns from index position 6 up to (but not including) 9
boston_df.iloc[] # select the first 5 rows and columns 'CRIM', 'ZN' and 'RM'

In [124]:
## SOLUTION CODE for 2.3
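boston_df.iloc[:8, 5:] # select the first 8 rows and columns from index position 5 onward
boston_df.iloc[100:110, 6:9] # select rows 100 to 109 and columns at positions 6, 7 and 8
boston_df.iloc[:5, [0,1,5]] # select the first 5 rows and columns 'CRIM', 'ZN' and 'RM'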

| | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|
| 0 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |
| 5 | 6.43 | 58.7 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.12 | 5.21 |
| 6 | 6.012 | 66.6 | 5.5605 | 5.0 | 311.0 | 15.2 | 395.6 | 12.43 |
| 7 | 6.172 | 96.1 | 5.9505 | 5.0 | 311.0 | 15.2 | 396.9 | 19.15 |


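| | AGE | DIS | RAD |
|---|---|---|---|
| 100 | 79.9 | 2.7778 | 5.0 |
| 101 | 71.3 | 2.8561 | 5.0 |
| 102 | 85.4 | 2.7147 | 5.0 |
| 103 | 87.4 | 2.7147 | 5.0 |
| 104 | 90.0 | 2.421 | 5.0 |
| 105 | 96.7 | 2.1069 | 5.0 |
| 106 | 91.9 | 2.211 | 5.0 |
| 107 | 85.2 | 2.1224 | 5.0 |
| 108 | 97.1 | 2.4329 | 5.0 |
| 109 | 91.2 | 2.5451 | 5.0 |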


| | CRIM | ZN | RM |
|---|---|---|---|
| 0 | 0.00632 | 18.0 | 6.575 |
| 1 | 0.02731 | 0.0 | 6.421 |
| 2 | 0.02729 | 0.0 | 7.185 |
| 3 | 0.03237 | 0.0 | 6.998 |
| 4 | 0.06905 | 0.0 | 7.147 |


<!-- HINT for 2.3 -->
Remember that with `.iloc[]` we can only refer to integer index locations. Also remember that with `.loc[]` and `.iloc[]` we define the rows we want to select first, then columns.

<!-- CONCLUSION for 2-->
Great job slicing and dicing the dataframe.