<p style="font-family:Verdana; font-size: 26px; color: orange"> Indexing and Selecting Data with Pandas</p>

In [2]:
import pandas as pd
import numpy as np 

<p style="font-family:Verdana; font-size: 22px; color: orange"> Indexing Data using the [] Operator</p>

<p style="font-family:Verdana; font-size: 22px; color: orange"> Selecting a Single Column</p>

In [3]:
# Load the data
data = pd.read_csv("../nba.csv", index_col="Name")
print("Dataset")
display(data.head(5))

# Select a single column
first = data["Age"]
print("\nSingle Column selected from Dataset")
display(first.head(5))

Dataset


Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0



Single Column selected from Dataset


Name
Avery Bradley    25.0
Jae Crowder      25.0
John Holland     27.0
R.J. Hunter      22.0
Jonas Jerebko    29.0
Name: Age, dtype: float64

<p style="font-family:Verdana; font-size: 22px; color: orange"> Selecting Multiple Columns</p>

In [4]:
first = data[["Age", "College", "Salary"]]
print("\nMultiple Columns selected from Dataset")
display(first.head(5))


Multiple Columns selected from Dataset


Name,Age,College,Salary
Avery Bradley,25.0,Texas,7730337.0
Jae Crowder,25.0,Marquette,6796117.0
John Holland,27.0,Boston University,
R.J. Hunter,22.0,Georgia State,1148640.0
Jonas Jerebko,29.0,,5000000.0
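
The two cells above differ in what they return: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds only one name. A minimal sketch of that difference, assuming a small hand-built frame named `sample` (a hypothetical stand-in for the first rows of the dataset, not the full `nba.csv`):

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the dataset shown above
sample = pd.DataFrame(
    {"Age": [25.0, 22.0], "College": ["Texas", "Georgia State"]},
    index=pd.Index(["Avery Bradley", "R.J. Hunter"], name="Name"),
)

as_series = sample["Age"]    # single label: returns a Series
as_frame = sample[["Age"]]   # list of labels: returns a DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```

The double-bracket form is handy when later code expects a two-dimensional object, for example when more columns may be added to the list.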


> - DataFrame[]: Known as the indexing operator, used for basic selection.
> - DataFrame.loc[]: Label-based indexing for selecting data by row/column labels.
> - DataFrame.iloc[]: Position-based indexing for selecting data by row/column integer positions.
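
As a rough side-by-side of the three indexers listed above, the sketch below reads the same value in each style. The frame `sample` and the variable names are illustrative assumptions; the values are copied from the first rows of the dataset.

```python
import pandas as pd

# Hypothetical stand-in built from the first rows shown earlier
sample = pd.DataFrame(
    {
        "Team": ["Boston Celtics", "Boston Celtics", "Boston Celtics"],
        "Number": [0.0, 99.0, 30.0],
        "Age": [25.0, 25.0, 27.0],
    },
    index=pd.Index(["Avery Bradley", "Jae Crowder", "John Holland"], name="Name"),
)

by_column = sample["Age"]                    # []: select a column by name
by_label = sample.loc["Jae Crowder", "Age"]  # .loc: row label, column label
by_position = sample.iloc[1, 2]              # .iloc: zero-based row and column positions

print(by_column)
print(by_label, by_position)
```

Here `.loc` and `.iloc` reach the same cell: "Jae Crowder" is the row at position 1 and "Age" is the column at position 2.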

<p style="font-family:Verdana; font-size: 22px; color: orange"> Indexing a DataFrame using .loc[ ]</p>

In [5]:
# Select a single row by its index label with .loc[]
# Each lookup returns the row as a Series
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]


print(first, "\n\n\n", second)

Team        Boston Celtics
Number                 0.0
Position                PG
Age                   25.0
Height                 6-2
Weight               180.0
College              Texas
Salary           7730337.0
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object


In [6]:
# Select multiple rows
first = data.loc[["Avery Bradley", "R.J. Hunter"]]
display(first)

Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0


In [7]:
# Select specific rows and columns: DataFrame.loc[["row1", "row2"], ["column1", "column2", "column3"]]
# Here: two rows and three columns, all chosen by label
first = data.loc[["Avery Bradley", "R.J. Hunter"], ["Team", "Number", "Position"]]
print(first)

                         Team  Number Position
Name                                          
Avery Bradley  Boston Celtics     0.0       PG
R.J. Hunter    Boston Celtics    28.0       SG


In [8]:
# Select all rows and some columns: DataFrame.loc[:, ["column1", "column2", "column3"]]
# The bare ":" in the row position means "every row"
first = data.loc[:, ["Team", "Number", "Position"]]
print(first)

                         Team  Number Position
Name                                          
Avery Bradley  Boston Celtics     0.0       PG
Jae Crowder    Boston Celtics    99.0       SF
John Holland   Boston Celtics    30.0       SG
R.J. Hunter    Boston Celtics    28.0       SG
Jonas Jerebko  Boston Celtics     8.0       PF
...                       ...     ...      ...
Shelvin Mack        Utah Jazz     8.0       PG
Raul Neto           Utah Jazz    25.0       PG
Tibor Pleiss        Utah Jazz    21.0        C
Jeff Withey         Utah Jazz    24.0        C
NaN                       NaN     NaN      NaN

[458 rows x 3 columns]
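
Besides lists of labels and the bare `:`, `.loc[]` also accepts label slices, and unlike ordinary Python slices they include the end label. A minimal sketch, assuming a hypothetical four-row frame `sample` copied from the rows shown earlier:

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the dataset
sample = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 4,
        "Number": [0.0, 99.0, 30.0, 28.0],
        "Position": ["PG", "SF", "SG", "SG"],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter"],
        name="Name",
    ),
)

# Label slices in .loc include BOTH endpoints, for rows and for columns
subset = sample.loc["Avery Bradley":"John Holland", "Team":"Number"]
print(subset)
```

The result keeps three rows (through "John Holland") and two columns (through "Number"). Label slicing of this kind relies on the slice labels being present and unambiguous in the index.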


<p style="font-family:Verdana; font-size: 22px; color: orange"> Indexing a DataFrame using <em>.iloc[ ]</em></p>

In [9]:
# Select a single row with .iloc[] by passing its integer position
# Positions are zero-based, so 3 is the fourth row
row2 = data.iloc[3]
print(row2)

Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object


In [10]:
# Select multiple rows by position
row2 = data.iloc[[3, 5, 7]]
display(row2)

Name,Team,Number,Position,Age,Height,Weight,College,Salary
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0


In [11]:
# Select specific rows and columns by position
# Rows 3 and 4, columns 1 and 2 (the "Name" index is not counted as a column)
row2 = data.iloc[[3, 4], [1, 2]]
print(row2)

               Number Position
Name                          
R.J. Hunter      28.0       SG
Jonas Jerebko     8.0       PF


In [12]:
# Select all rows and some columns by position
# The bare ":" means "every row"; [1, 2] picks the second and third columns
row2 = data.iloc[:, [1, 2]]
print(row2)

               Number Position
Name                          
Avery Bradley     0.0       PG
Jae Crowder      99.0       SF
John Holland     30.0       SG
R.J. Hunter      28.0       SG
Jonas Jerebko     8.0       PF
...               ...      ...
Shelvin Mack      8.0       PG
Raul Neto        25.0       PG
Tibor Pleiss     21.0        C
Jeff Withey      24.0        C
NaN               NaN      NaN

[458 rows x 2 columns]
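
Positional slices in `.iloc[]` follow ordinary Python rules: the end position is excluded and negative positions count from the end. The sketch below illustrates this on a hypothetical four-row frame `sample` copied from the rows shown earlier, not on the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the dataset
sample = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 4,
        "Number": [0.0, 99.0, 30.0, 28.0],
        "Position": ["PG", "SF", "SG", "SG"],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter"],
        name="Name",
    ),
)

# Rows at positions 0 and 1, columns at positions 1 and 2 (end positions excluded)
first_two = sample.iloc[0:2, 1:3]

# Negative positions count from the end: -1 is the last row
last_row = sample.iloc[-1]

print(first_two)
print(last_row)
```

This contrasts with `.loc[]`, where a label slice includes its end label.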
