<a href="https://colab.research.google.com/github/krauseannelize/nb-py-ms-exercises/blob/sprint03/notebooks/s03_pandas_foundation/32_accessing_filtering_dataframe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 32 | Accessing and Filtering DataFrame

## Selecting data in a DataFrame

- Data selection allows you to extract specific rows, columns, or subsets of data for analysis.
- It is a fundamental skill for data manipulation and exploration.
- Columns and rows can be accessed in several ways:

  - Accessing columns by name.
  - Slicing rows and columns using `loc[ ]` and `iloc[ ]`.
  - Filtering rows using boolean indexing.

| Operation | Syntax | Result |
| --- | --- | --- |
| Select column | `df[col]` | Series |
| Select row by label | `df.loc[label]` | Series |
| Select row by integer location | `df.iloc[loc]` | Series |
| Slice rows | `df[5:10]` | DataFrame |
| Select rows by boolean vector | `df[bool_vec]` | DataFrame |
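
The result types in the table can be checked directly. The sketch below uses a small hypothetical frame named `demo` (not part of this notebook) with string row labels, so that label-based and position-based access are easy to tell apart.

```python
import numpy as np
import pandas as pd

# Hypothetical 10x2 frame, used only to illustrate the table above
demo = pd.DataFrame(
    np.arange(20).reshape(10, 2),
    index=list("abcdefghij"),
    columns=["x", "y"],
)

col = demo["x"]                # select column             -> Series
row_by_label = demo.loc["c"]   # select row by label       -> Series
row_by_pos = demo.iloc[2]      # select row by position    -> Series
row_slice = demo[5:10]         # slice rows (positional)   -> DataFrame
masked = demo[demo["x"] > 10]  # boolean vector            -> DataFrame

for name, obj in [
    ("df[col]", col),
    ("df.loc[label]", row_by_label),
    ("df.iloc[loc]", row_by_pos),
    ("df[5:10]", row_slice),
    ("df[bool_vec]", masked),
]:
    print(f"{name:15} -> {type(obj).__name__}")
```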

## Importing `Pandas` & `NumPy`

In [None]:
import pandas as pd
import numpy as np

## Preparing for Random Data Generation

Before creating a sample DataFrame, we set up our environment to generate random numbers in a controlled way:

- We're importing `randn` from `numpy.random` to generate random values drawn from a standard normal distribution to populate our DataFrame.
- By using `np.random.seed(101)`, we ensure that the random numbers generated are reproducible. This means every time the code runs, it produces the same output.
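
A minimal sketch of what the seed does and does not guarantee: reseeding restarts the random stream, while calling `randn` again without reseeding continues it. The variable names here are illustrative only.

```python
import numpy as np
from numpy.random import randn

np.random.seed(101)
first = randn(3)    # first three draws after seeding

second = randn(3)   # no reseed: the stream continues with new values

np.random.seed(101)
third = randn(3)    # reseeded: the stream restarts from the beginning

print(np.array_equal(first, third))   # same seed, same position in the stream
print(np.array_equal(first, second))  # later draws from the same stream
```

This is why re-running only the DataFrame-creation cell, without re-running the seed cell first, produces a table with different values.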

In [None]:
from numpy.random import randn
np.random.seed(101)

## Creating a Sample DataFrame with Random Values

- `randn(5, 4)` generates 5 rows and 4 columns of random numbers
- **Rows** are labeled 'A', 'B', 'C', 'D' and 'E'.
- **Columns** are labeled 'W', 'X', 'Y', and 'Z'.

In [None]:
df = pd.DataFrame(randn(5,4), index='A B C D E'.split(), columns='W X Y Z'.split())
df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


## Accessing Columns by Name

- Use the column name as a key: `df["column_name"]`
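- Use dot notation: `df.column_name` (only works if the column name is a valid Python identifier, so no spaces or special characters, and does not clash with an existing DataFrame attribute or method such as `count`).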
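
The limits of dot notation can be seen on a small hypothetical frame (`tricky` and its column names are made up for this illustration):

```python
import pandas as pd

# Hypothetical frame: one name with a space, one that shadows a method
tricky = pd.DataFrame({
    "unit price": [1.5, 2.0],
    "count": [3, 4],
    "W": [0.1, 0.2],
})

with_space = tricky["unit price"]  # bracket notation handles any name
by_dot = tricky.W                  # fine: 'W' is a plain identifier

by_key = tricky["count"]           # the column, as a Series
method = tricky.count              # NOT the column: the DataFrame.count method

print(type(by_key).__name__)
print(callable(method))
```

Bracket notation is therefore the safer default; dot notation is a convenience for simple names.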

In [None]:
# access a single column by label
df['W']

Unnamed: 0,W
A,2.70685
B,0.651118
C,-2.018168
D,0.188695
E,0.190794


In [None]:
# access a single column using dot notation
df.W

Unnamed: 0,W
A,2.70685
B,0.651118
C,-2.018168
D,0.188695
E,0.190794


In [None]:
# access multiple columns by passing a list of names (returns a DataFrame)
df[['W', 'Z']]

Unnamed: 0,W,Z
A,2.70685,0.503826
B,0.651118,0.605965
C,-2.018168,-0.589001
D,0.188695,0.955057
E,0.190794,0.683509


## Slicing Rows and Columns Using `.loc[ ]`

`loc` is a label-based indexer used to access rows and columns by their labels. Unlike ordinary Python slices, label slices such as `'A':'C'` include both endpoints.

```python
# basic syntax:
df.loc[row_labels, column_labels]
```

Example usage:

```python
# access a specific row by label
df.loc[0]

# selecting rows and columns by labels
df.loc[0:5, ["column_name"]]

# access specific rows and columns
df.loc[0:1, "Name":"City"]

# access specific columns for all rows
df.loc[:, ["Name", "City"]]
```

⚠️ _Note: These examples assume a DataFrame with **numeric row indices**. When using custom-labeled DataFrames, the labels must match._
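
A runnable version of the patterns above, assuming a hypothetical `people` frame with `Name`, `Age` and `City` columns and the default integer index. It also shows that `.loc` slices include the end label, for rows and columns alike.

```python
import pandas as pd

# Hypothetical data matching the column names used in the examples above
people = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo", "Dev"],
    "Age": [34, 28, 41, 23],
    "City": ["Los Angeles", "Chicago", "Chicago", "Los Angeles"],
})

first_row = people.loc[0]                    # row labelled 0 -> Series
label_slice = people.loc[0:2, "Name":"Age"]  # labels 0, 1 AND 2; Name through Age
two_columns = people.loc[:, ["Name", "City"]]

print(label_slice)
```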

In [None]:
# access a specific row by label
df.loc['A']

Unnamed: 0,A
W,2.70685
X,0.628133
Y,0.907969
Z,0.503826


In [None]:
# selecting a single cell by label
# returns a scalar value (float)
df.loc['B', 'Y']

np.float64(-0.8480769834036315)

In [None]:
# selecting a single cell as a 1×1 DataFrame
df.loc[['B'], ['Y']]

Unnamed: 0,Y
B,-0.848077


In [None]:
# selecting one row and multiple columns
# returns a Series
df.loc['A', ['W', 'Y']]

Unnamed: 0,A
W,2.70685
Y,0.907969


In [None]:
# selecting multiple rows and columns
# returns a 2×2 DataFrame
df.loc[['A', 'B'], ['W', 'Y']]

Unnamed: 0,W,Y
A,2.70685,0.907969
B,0.651118,-0.848077


## Slicing Rows and Columns Using `.iloc[ ]`

`iloc` is an integer-position-based indexer used to access rows and columns by their positions, counting from 0. Slices follow the usual Python convention: the end position is excluded.

```python
# basic syntax:
df.iloc[row_indices, column_indices]
```

Example usage:

```python
# access a specific row by index
df.iloc[0]

# access specific rows and columns by index
df.iloc[0:2, 0:2]

# access specific columns for all rows
df.iloc[:, [0, 2]]

# select rows and columns by integer position
df.iloc[0:5, [0, 1]]
```
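
The difference between position and label matters most when the index itself holds integers. A small sketch with a hypothetical `scores` frame whose integer labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical frame: integer labels that do not match row positions
scores = pd.DataFrame({"score": [10, 20, 30]}, index=[2, 0, 1])

by_position = scores.iloc[0]  # first row in the frame (its label is 2)
by_label = scores.loc[0]      # the row whose label is 0 (second row)
last_two = scores.iloc[-2:]   # negative positions count from the end

print(by_position["score"], by_label["score"])  # 10 20
```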

In [None]:
# access a specific row by integer position
# returns the first row (row 'A')
df.iloc[0]

Unnamed: 0,A
W,2.70685
X,0.628133
Y,0.907969
Z,0.503826


In [None]:
# selecting a single cell by integer position (row 1, column 2)
# returns a scalar value (float)
df.iloc[1, 2]

np.float64(-0.8480769834036315)

In [None]:
# selecting a single cell as a 1×1 DataFrame
df.iloc[[1], [2]]

Unnamed: 0,Y
B,-0.848077


In [None]:
# selecting one row and multiple columns
# returns a Series
df.iloc[0, [0, 2]]

Unnamed: 0,A
W,2.70685
Y,0.907969


In [None]:
# selecting multiple rows and columns using list indexing
# returns a 2×2 DataFrame
df.iloc[[0, 1], [0, 2]]

Unnamed: 0,W,Y
A,2.70685,0.907969
B,0.651118,-0.848077


In [None]:
# selecting multiple rows and columns using slice indexing
# returns a 2×2 DataFrame
df.iloc[0:2, 0:2]

Unnamed: 0,W,X
A,2.70685,0.628133
B,0.651118,-0.319318


## Filtering Using `Boolean` Indexing

- `Boolean` indexing allows you to filter rows based on a condition.
- The condition returns a boolean Series, and only rows with `True` are selected.

Example usage:

```python
# filter rows where Age is greater than 30
df[df["Age"] > 30]

# filter rows where City is "Los Angeles"
df[df["City"] == "Los Angeles"]

# combine multiple conditions using & (and), | (or)
df[(df["Age"] > 25) & (df["City"] == "Chicago")]
```
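
The same filters as a self-contained sketch. The `people` frame and its values are hypothetical; the `~` operator (element-wise "not") is included alongside `&` and `|`. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical data with the 'Age' and 'City' columns used above
people = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo", "Dev"],
    "Age": [34, 28, 41, 23],
    "City": ["Los Angeles", "Chicago", "Chicago", "Los Angeles"],
})

older_than_30 = people[people["Age"] > 30]
in_la = people[people["City"] == "Los Angeles"]
chicago_over_25 = people[(people["Age"] > 25) & (people["City"] == "Chicago")]
not_chicago = people[~(people["City"] == "Chicago")]  # ~ negates the mask

print(chicago_over_25)
```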

In [None]:
# view current DataFrame
df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [None]:
# perform element-wise comparison
# returns DataFrame of booleans (Boolean mask)
df>0

Unnamed: 0,W,X,Y,Z
A,True,True,True,True
B,True,False,False,True
C,False,True,True,False
D,True,False,False,True
E,True,True,True,True


In [None]:
# applies the Boolean mask to the whole DataFrame (conditional selection)
# keeps values where the condition is True; all others become NaN, shape is unchanged
df[df>0]

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,,,0.605965
C,,0.740122,0.528813,
D,0.188695,,,0.955057
E,0.190794,1.978757,2.605967,0.683509
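
Masking a DataFrame with a Boolean DataFrame behaves like `DataFrame.where`: the shape is preserved and non-matching cells become `NaN`. A minimal check on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame with two negative values
small = pd.DataFrame({"W": [1.0, -2.0], "X": [-0.5, 3.0]}, index=["A", "B"])

masked = small[small > 0]       # same syntax as in the cell above
same = small.where(small > 0)   # explicit equivalent

print(masked.equals(same))               # True
print(int(masked.isna().sum().sum()))    # 2 cells were replaced by NaN
```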


In [None]:
# filter rows where column 'W' is greater than 0
df[df['W'] > 0]

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [None]:
# filter rows where 'W' > 0
# select corresponding values from column 'Y'
df[df['W'] > 0]['Y']

Unnamed: 0,Y
A,0.907969
B,-0.848077
D,-0.933237
E,2.605967


In [None]:
# filter rows where 'W' > 0
# select corresponding values from column 'Z' and 'X' (use list format)
df[df['W'] > 0][['Z', 'X']]

Unnamed: 0,Z,X
A,0.503826,0.628133
B,0.605965,-0.319318
D,0.955057,-0.758872
E,0.683509,1.978757
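
The two-step form `df[mask][cols]` can also be written as a single `.loc` call that takes the row mask and the column labels together. A sketch on a hypothetical three-row frame, showing that both forms return the same result when reading data:

```python
import pandas as pd

# Hypothetical subset of values, for illustration only
sample = pd.DataFrame(
    {"W": [2.7, -2.0, 0.2], "Y": [0.9, 0.5, 2.6], "Z": [0.5, -0.6, 0.7]},
    index=["A", "C", "E"],
)

chained = sample[sample["W"] > 0][["Z", "Y"]]          # filter, then select
single_step = sample.loc[sample["W"] > 0, ["Z", "Y"]]  # both in one call

print(chained.equals(single_step))  # True
```

The single-step form is generally the safer habit, because assigning through a chained selection may modify a temporary copy rather than the original DataFrame.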


In [None]:
# filter rows where 'W' > 0 AND 'Y' > 1
# use & for 'and' and wrap each condition in parentheses
df[(df['W'] > 0) & (df['Y'] > 1)]

Unnamed: 0,W,X,Y,Z
E,0.190794,1.978757,2.605967,0.683509


In [None]:
# filter rows where 'W' > 0 OR 'Y' > 1
# use | for 'or' and wrap each condition in parentheses
df[(df['W'] > 0) | (df['Y'] > 1)]

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


## More Index Details

Let's discuss some more features of indexing, including resetting the index and setting it to something else. We'll also talk about index hierarchy!

In [None]:
# create a copy of the DataFrame
copy_df = df.copy()
copy_df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [None]:
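# reset to the default 0..n-1 integer index; the old index becomes an 'index' column
# returns a new DataFrame, copy_df itself is not modified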
copy_df.reset_index()

Unnamed: 0,index,W,X,Y,Z
0,A,2.70685,0.628133,0.907969,0.503826
1,B,0.651118,-0.319318,-0.848077,0.605965
2,C,-2.018168,0.740122,0.528813,-0.589001
3,D,0.188695,-0.758872,-0.933237,0.955057
4,E,0.190794,1.978757,2.605967,0.683509


In [None]:
# adding a new 'States' column
copy_df['States'] = 'CA NY WY OR CO'.split()
copy_df

Unnamed: 0,W,X,Y,Z,States
A,2.70685,0.628133,0.907969,0.503826,CA
B,0.651118,-0.319318,-0.848077,0.605965,NY
C,-2.018168,0.740122,0.528813,-0.589001,WY
D,0.188695,-0.758872,-0.933237,0.955057,OR
E,0.190794,1.978757,2.605967,0.683509,CO


In [None]:
# returns a new DataFrame indexed by 'States'; copy_df itself is not modified
copy_df.set_index('States')

States,W,X,Y,Z
CA,2.70685,0.628133,0.907969,0.503826
NY,0.651118,-0.319318,-0.848077,0.605965
WY,-2.018168,0.740122,0.528813,-0.589001
OR,0.188695,-0.758872,-0.933237,0.955057
CO,0.190794,1.978757,2.605967,0.683509


In [None]:
# DataFrame remains unchanged
copy_df

Unnamed: 0,W,X,Y,Z,States
A,2.70685,0.628133,0.907969,0.503826,CA
B,0.651118,-0.319318,-0.848077,0.605965,NY
C,-2.018168,0.740122,0.528813,-0.589001,WY
D,0.188695,-0.758872,-0.933237,0.955057,OR
E,0.190794,1.978757,2.605967,0.683509,CO


In [None]:
# inplace=True parameter modifies original DataFrame
copy_df.set_index('States', inplace=True)

In [None]:
copy_df

States,W,X,Y,Z
CA,2.70685,0.628133,0.907969,0.503826
NY,0.651118,-0.319318,-0.848077,0.605965
WY,-2.018168,0.740122,0.528813,-0.589001
OR,0.188695,-0.758872,-0.933237,0.955057
CO,0.190794,1.978757,2.605967,0.683509
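
The index operations above can be summarised in one short sketch. The `states` frame is hypothetical; the point is that `set_index` and `reset_index` return new DataFrames by default, and that `reset_index(drop=True)` discards the old index instead of turning it into a column.

```python
import pandas as pd

# Hypothetical two-row frame with a 'States' column
states = pd.DataFrame({"W": [0.1, 0.2], "States": ["CA", "NY"]}, index=["A", "B"])

indexed = states.set_index("States")      # new frame; 'states' keeps its A/B index
restored = indexed.reset_index()          # 'States' moves back to a column
dropped = indexed.reset_index(drop=True)  # 'States' is discarded entirely

print(list(states.index))      # ['A', 'B']
print(indexed.index.name)      # States
print(list(restored.columns))  # ['States', 'W']
print(list(dropped.columns))   # ['W']
```

Reassigning the result (`states = states.set_index("States")`) has the same effect as `inplace=True` and makes the change explicit at the call site.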


## Reading Excel files

To load a file from your local computer in Colab, run the following cell and select the file manually when prompted.

In [91]:
from google.colab import files
uploaded = files.upload()

Saving sample-superstore.xls to sample-superstore.xls


In [94]:
# import excel data into a DataFrame and view
df_super = pd.read_excel('sample-superstore.xls', sheet_name=0)
df_super

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,US-2020-103800,2020-01-03,2020-01-07,Standard Class,DP-13000,Darren Powers,Consumer,United States,Houston,...,77095,Central,OFF-PA-10000174,Office Supplies,Paper,"Message Book, Wirebound, Four 5 1/2"" X 4"" Form...",16.448,2,0.2,5.5512
1,2,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-BI-10004094,Office Supplies,Binders,GBC Standard Plastic Binding Systems Combs,3.540,2,0.8,-5.4870
2,3,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-LA-10003223,Office Supplies,Labels,Avery 508,11.784,3,0.2,4.2717
3,4,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-ST-10002743,Office Supplies,Storage,SAFCO Boltless Steel Shelving,272.736,3,0.2,-64.7748
4,5,US-2020-141817,2020-01-05,2020-01-12,Standard Class,MB-18085,Mick Brown,Consumer,United States,Philadelphia,...,19143,East,OFF-AR-10003478,Office Supplies,Art,Avery Hi-Liter EverBold Pen Style Fluorescent ...,19.536,3,0.2,4.8840
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10189,10190,US-2023-143259,2023-12-30,2024-01-03,Standard Class,PO-18865,Patrick O'Donnell,Consumer,United States,New York City,...,10009,East,OFF-BI-10003684,Office Supplies,Binders,Wilson Jones Legal Size Ring Binders,52.776,3,0.2,19.7910
10190,10191,US-2023-115427,2023-12-30,2024-01-03,Standard Class,EB-13975,Erica Bern,Corporate,United States,Fairfield,...,94533,West,OFF-BI-10004632,Office Supplies,Binders,GBC Binding covers,20.720,2,0.2,6.4750
10191,10192,US-2023-156720,2023-12-30,2024-01-03,Standard Class,JM-15580,Jill Matthias,Consumer,United States,Loveland,...,80538,West,OFF-FA-10003472,Office Supplies,Fasteners,Bagged Rubber Bands,3.024,3,0.2,-0.6048
10192,10193,US-2023-143259,2023-12-30,2024-01-03,Standard Class,PO-18865,Patrick O'Donnell,Consumer,United States,New York City,...,10009,East,TEC-PH-10004774,Technology,Phones,Gear Head AU3700S Headset,90.930,7,0.0,2.7279


In [95]:
# view first 5 rows of DataFrame only
df_super.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,US-2020-103800,2020-01-03,2020-01-07,Standard Class,DP-13000,Darren Powers,Consumer,United States,Houston,...,77095,Central,OFF-PA-10000174,Office Supplies,Paper,"Message Book, Wirebound, Four 5 1/2"" X 4"" Form...",16.448,2,0.2,5.5512
1,2,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-BI-10004094,Office Supplies,Binders,GBC Standard Plastic Binding Systems Combs,3.54,2,0.8,-5.487
2,3,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-LA-10003223,Office Supplies,Labels,Avery 508,11.784,3,0.2,4.2717
3,4,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-ST-10002743,Office Supplies,Storage,SAFCO Boltless Steel Shelving,272.736,3,0.2,-64.7748
4,5,US-2020-141817,2020-01-05,2020-01-12,Standard Class,MB-18085,Mick Brown,Consumer,United States,Philadelphia,...,19143,East,OFF-AR-10003478,Office Supplies,Art,Avery Hi-Liter EverBold Pen Style Fluorescent ...,19.536,3,0.2,4.884


In [97]:
# accessing column with label
df_super['Order Date']

Unnamed: 0,Order Date
0,2020-01-03
1,2020-01-04
2,2020-01-04
3,2020-01-04
4,2020-01-05
...,...
10189,2023-12-30
10190,2023-12-30
10191,2023-12-30
10192,2023-12-30


In [98]:
# use .loc to filter rows by condition on 'Order ID'
# returns the full row(s) where 'Order ID' equals 'US-2020-112326'
df_super.loc[df_super['Order ID'] == 'US-2020-112326']

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
1,2,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-BI-10004094,Office Supplies,Binders,GBC Standard Plastic Binding Systems Combs,3.54,2,0.8,-5.487
2,3,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-LA-10003223,Office Supplies,Labels,Avery 508,11.784,3,0.2,4.2717
3,4,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-ST-10002743,Office Supplies,Storage,SAFCO Boltless Steel Shelving,272.736,3,0.2,-64.7748


In [99]:
# use .loc to access row by label (index = 3)
# returns values from columns 'Product Name' and 'Profit' for that row
df_super.loc[3, ['Product Name', 'Profit']]

Unnamed: 0,3
Product Name,SAFCO Boltless Steel Shelving
Profit,-64.7748


In [100]:
# use .loc to filter rows by condition on 'Category'
# returns the full row(s) where 'Category' equals 'Office Supplies'
df_super.loc[df_super['Category'] == 'Office Supplies']

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,US-2020-103800,2020-01-03,2020-01-07,Standard Class,DP-13000,Darren Powers,Consumer,United States,Houston,...,77095,Central,OFF-PA-10000174,Office Supplies,Paper,"Message Book, Wirebound, Four 5 1/2"" X 4"" Form...",16.448,2,0.2,5.5512
1,2,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-BI-10004094,Office Supplies,Binders,GBC Standard Plastic Binding Systems Combs,3.540,2,0.8,-5.4870
2,3,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-LA-10003223,Office Supplies,Labels,Avery 508,11.784,3,0.2,4.2717
3,4,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-ST-10002743,Office Supplies,Storage,SAFCO Boltless Steel Shelving,272.736,3,0.2,-64.7748
4,5,US-2020-141817,2020-01-05,2020-01-12,Standard Class,MB-18085,Mick Brown,Consumer,United States,Philadelphia,...,19143,East,OFF-AR-10003478,Office Supplies,Art,Avery Hi-Liter EverBold Pen Style Fluorescent ...,19.536,3,0.2,4.8840
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10188,10189,US-2023-115427,2023-12-30,2024-01-03,Standard Class,EB-13975,Erica Bern,Corporate,United States,Fairfield,...,94533,West,OFF-BI-10002103,Office Supplies,Binders,"Cardinal Slant-D Ring Binder, Heavy Gauge Vinyl",13.904,2,0.2,4.5188
10189,10190,US-2023-143259,2023-12-30,2024-01-03,Standard Class,PO-18865,Patrick O'Donnell,Consumer,United States,New York City,...,10009,East,OFF-BI-10003684,Office Supplies,Binders,Wilson Jones Legal Size Ring Binders,52.776,3,0.2,19.7910
10190,10191,US-2023-115427,2023-12-30,2024-01-03,Standard Class,EB-13975,Erica Bern,Corporate,United States,Fairfield,...,94533,West,OFF-BI-10004632,Office Supplies,Binders,GBC Binding covers,20.720,2,0.2,6.4750
10191,10192,US-2023-156720,2023-12-30,2024-01-03,Standard Class,JM-15580,Jill Matthias,Consumer,United States,Loveland,...,80538,West,OFF-FA-10003472,Office Supplies,Fasteners,Bagged Rubber Bands,3.024,3,0.2,-0.6048


In [101]:
# use .loc to filter rows where multiple conditions are met
# returns rows where 'Region' is 'Central' AND 'Profit' is greater than 0
df_super.loc[(df_super['Region'] == 'Central') & (df_super['Profit'] > 0)]

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,US-2020-103800,2020-01-03,2020-01-07,Standard Class,DP-13000,Darren Powers,Consumer,United States,Houston,...,77095,Central,OFF-PA-10000174,Office Supplies,Paper,"Message Book, Wirebound, Four 5 1/2"" X 4"" Form...",16.448,2,0.2,5.5512
2,3,US-2020-112326,2020-01-04,2020-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-LA-10003223,Office Supplies,Labels,Avery 508,11.784,3,0.2,4.2717
16,17,US-2020-135405,2020-01-09,2020-01-13,Standard Class,MS-17830,Melanie Seite,Consumer,United States,Laredo,...,78041,Central,OFF-AR-10004078,Office Supplies,Art,Newell 312,9.344,2,0.2,1.1680
17,18,US-2020-135405,2020-01-09,2020-01-13,Standard Class,MS-17830,Melanie Seite,Consumer,United States,Laredo,...,78041,Central,TEC-AC-10001266,Technology,Accessories,Memorex Micro Travel Drive 8 GB,31.200,3,0.2,9.7500
44,45,US-2020-167927,2020-01-20,2020-01-26,Standard Class,XP-21865,Xylona Preis,Consumer,United States,Westland,...,48185,Central,FUR-FU-10002268,Furniture,Furnishings,Ultra Door Push Plate,14.730,3,0.0,4.8609
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10158,10159,US-2023-136539,2023-12-28,2024-01-01,Standard Class,GH-14665,Greg Hansen,Consumer,United States,Round Rock,...,78664,Central,OFF-AR-10001958,Office Supplies,Art,Stanley Bostitch Contemporary Electric Pencil ...,27.168,2,0.2,2.7168
10159,10160,US-2023-135111,2023-12-28,2024-01-02,Standard Class,CS-12400,Christopher Schild,Home Office,United States,Fargo,...,58103,Central,OFF-AR-10004707,Office Supplies,Art,Staples in misc. colors,2.480,1,0.0,0.8680
10163,10164,US-2023-135111,2023-12-28,2024-01-02,Standard Class,CS-12400,Christopher Schild,Home Office,United States,Fargo,...,58103,Central,OFF-BI-10004040,Office Supplies,Binders,Wilson Jones Impact Binders,25.900,5,0.0,12.6910
10181,10182,US-2023-158673,2023-12-29,2024-01-04,Standard Class,KB-16600,Ken Brennan,Corporate,United States,Grand Rapids,...,49505,Central,OFF-PA-10000994,Office Supplies,Paper,Xerox 1915,209.700,2,0.0,100.6560
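
For longer conditions it can help to name each mask before combining them. The sketch below uses a tiny hypothetical `orders` frame that borrows the `Region`, `Profit` and `Product Name` column names from the Superstore data; the rows themselves are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_super with only three of its columns
orders = pd.DataFrame({
    "Region": ["Central", "Central", "East", "West"],
    "Profit": [5.55, -5.49, 4.88, 2.73],
    "Product Name": ["Paper", "Binders", "Art", "Phones"],
})

is_central = orders["Region"] == "Central"
is_profitable = orders["Profit"] > 0

# combine the named masks and pick columns in the same .loc call
central_profitable = orders.loc[is_central & is_profitable, ["Product Name", "Profit"]]

# True counts as 1, so summing a mask counts the matching rows
n_matches = int((is_central & is_profitable).sum())

print(central_profitable)
print(n_matches)  # 1
```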


In [102]:
# use .iloc to select a subset by position
# returns the first 3 rows and the last 2 columns of the DataFrame
df_super.iloc[:3, -2:]

Unnamed: 0,Discount,Profit
0,0.2,5.5512
1,0.8,-5.487
2,0.2,4.2717
