
# Pandas - Indexing, Selecting & Assigning
https://www.kaggle.com/code/residentmario/indexing-selecting-assigning/tutorial

In [2]:
import pandas as pd

## Native accessors

Native Python objects provide convenient ways of indexing data, and pandas supports the same accessors. For example, a DataFrame column can be read as an attribute of the DataFrame.



In [3]:
# Example of native accessors
# Let's create a DataFrame first
starting_five_df = pd.DataFrame({
    'lakers': ['lebron james', 'davis', 'maclung', 'anthony', 'westbrook'],
    'bulls': ['caruso', 'lavine', 'ball',  'derozan', 'vucevic'],
    'hornets': ['ball', 'rozier', 'washington', 'bridges', 'plumlee']
}, index=['pg', 'sg', 'sf', 'pf', 'c'])

# Access the 'lakers' column as an attribute (a native Python accessor)
starting_five_df.lakers

pg    lebron james
sg           davis
sf         maclung
pf         anthony
c        westbrook
Name: lakers, dtype: object

We can also access the column with the indexing operator (`[]`):

In [4]:
starting_five_df['lakers']

pg    lebron james
sg           davis
sf         maclung
pf         anthony
c        westbrook
Name: lakers, dtype: object
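Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute, while the indexing operator has no such restriction. The sketch below uses a hypothetical DataFrame (not from the original tutorial) with a `'first name'` column and a `'count'` column to show both cases:

```python
import pandas as pd

# Hypothetical DataFrame: one column name contains a space, the other
# shares its name with the DataFrame.count() method.
df = pd.DataFrame({
    'first name': ['lebron', 'anthony'],
    'count': [1, 2],
})

# 'first name' is not a valid identifier, so it cannot be written as
# an attribute; the indexing operator handles it fine.
first_names = df['first name']

# Normal attribute lookup finds the built-in method before the column,
# so df.count is the bound method, not the data.
print(callable(df.count))     # True
print(df['count'].tolist())   # [1, 2]
print(first_names.tolist())   # ['lebron', 'anthony']
```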

## Indexing in pandas

The indexing operator and attribute access work on pandas objects just as they do on native Python objects. For more advanced operations, however, pandas provides its own accessor operators, `loc` and `iloc`, and those are the recommended tools.

### Index-based selection

Pandas indexing works in one of two paradigms. The first is index-based selection: selecting data based on its numerical position in the data.

`iloc` follows this paradigm.

In [10]:
# Select the second row (position 1), the shooting guards' row
starting_five_df.iloc[1]

lakers      davis
bulls      lavine
hornets    rozier
Name: sg, dtype: object

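Both `loc` and `iloc` are row-first, column-second. This is the opposite of the native indexing operator, where `starting_five_df['lakers']` selects a column. Therefore it's slightly harder to retrieve a column with `iloc`, but still doable ;).

Here's how we retrieve a column: `:` on its own selects every row, and `1` picks the second column.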

In [11]:
starting_five_df.iloc[:, 1]

pg     caruso
sg     lavine
sf       ball
pf    derozan
c     vucevic
Name: bulls, dtype: object

We are able to select a subset of a column too.

In [12]:
# Select the first two rows of the 'bulls' column
starting_five_df.iloc[:2, 1]

pg    caruso
sg    lavine
Name: bulls, dtype: object

In [13]:
# We can also select with a list of positions: rows pg, sf and c (0, 2, 4)
starting_five_df.iloc[[0, 2, 4], 1]

pg     caruso
sf       ball
c     vucevic
Name: bulls, dtype: object

In [15]:
# Negative positions count from the end, so iloc[-n:] selects the last n rows
# Here, the last 2 rows
n = 2
starting_five_df.iloc[-n:]

| | lakers | bulls | hornets |
|---|---|---|---|
| pf | anthony | derozan | bridges |
| c | westbrook | vucevic | plumlee |
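Because `iloc` is row-first, column-second, a single value is addressed as `iloc[row, column]`, and slices or negative positions can be used on both axes at once. A minimal sketch, rebuilding the same DataFrame so the cell runs on its own:

```python
import pandas as pd

# Same DataFrame as above, rebuilt so this cell is self-contained
starting_five_df = pd.DataFrame({
    'lakers': ['lebron james', 'davis', 'maclung', 'anthony', 'westbrook'],
    'bulls': ['caruso', 'lavine', 'ball', 'derozan', 'vucevic'],
    'hornets': ['ball', 'rozier', 'washington', 'bridges', 'plumlee']
}, index=['pg', 'sg', 'sf', 'pf', 'c'])

# Row position first, column position second: row 1 (sg), column 0 (lakers)
cell = starting_five_df.iloc[1, 0]
print(cell)  # davis

# Negative positions work on both axes: last row (c), last column (hornets)
last_cell = starting_five_df.iloc[-1, -1]
print(last_cell)  # plumlee

# Slices can be combined: first two rows, every column from position 1 onward
block = starting_five_df.iloc[:2, 1:]
print(block)
```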


### Label-based selection

Label-based selection is the second paradigm, and it is what the `loc` operator does. Here we select by a row's index value (its label) instead of its position.

In [16]:
starting_five_df

| | lakers | bulls | hornets |
|---|---|---|---|
| pg | lebron james | caruso | ball |
| sg | davis | lavine | rozier |
| sf | maclung | ball | washington |
| pf | anthony | derozan | bridges |
| c | westbrook | vucevic | plumlee |


In [18]:
# Select the row whose index label is 'sg'
starting_five_df.loc['sg']

lakers      davis
bulls      lavine
hornets    rozier
Name: sg, dtype: object

`loc` is usually the easier choice when your DataFrame has meaningful index labels, since rows and columns can then be picked by name.


In [19]:
starting_five_df.loc[['pg', 'sg'], ['hornets', 'lakers']]

| | hornets | lakers |
|---|---|---|
| pg | ball | lebron james |
| sg | rozier | davis |
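`loc` follows the same row-first, column-second order with labels, and whether you pass a single label or a list of labels decides the shape of the result. A short sketch on the same DataFrame (rebuilt so the cell is self-contained):

```python
import pandas as pd

starting_five_df = pd.DataFrame({
    'lakers': ['lebron james', 'davis', 'maclung', 'anthony', 'westbrook'],
    'bulls': ['caruso', 'lavine', 'ball', 'derozan', 'vucevic'],
    'hornets': ['ball', 'rozier', 'washington', 'bridges', 'plumlee']
}, index=['pg', 'sg', 'sf', 'pf', 'c'])

# Row label first, column label second: a single value
guard = starting_five_df.loc['sg', 'bulls']
print(guard)  # lavine

# A single column label returns a Series...
as_series = starting_five_df.loc[:, 'bulls']
# ...while a one-element list of labels returns a one-column DataFrame
as_frame = starting_five_df.loc[:, ['bulls']]
print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```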


### Choosing between loc and iloc

`iloc` and `loc` use slightly different indexing schemes.

`iloc` uses the Python stdlib indexing scheme, so `[0:10]` selects 10 elements: positions 0 through 9, excluding position 10.

HOWEVER, `loc` indexes inclusively. On a DataFrame with the default integer index and at least 11 rows, `df.loc[0:10]` therefore returns 11 entries, labels 0 through 10.

The reasoning is that a `loc` index can contain any stdlib type, strings included. To select every entry from 'Apples' to 'Potatoes' in an alphabetically sorted string index, the inclusive `df.loc['Apples':'Potatoes']` is easier to write than an exclusive slice, which would need a label just past the upper end, such as `df.loc['Apples':'Potatoet']` ('t' comes after 's' in the alphabet).
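The difference can be checked directly. This sketch assumes a default integer index of 15 rows, plus a hypothetical, alphabetically sorted produce index (the `qty` values are made up for illustration):

```python
import pandas as pd

# Default RangeIndex: labels 0..14 coincide with positions
df = pd.DataFrame({'value': range(15)})

print(len(df.iloc[0:10]))  # 10: positions 0..9, end excluded
print(len(df.loc[0:10]))   # 11: labels 0..10, end included

# Hypothetical sorted string index
produce = pd.DataFrame(
    {'qty': [3, 6, 2, 8, 5]},
    index=['Apples', 'Bananas', 'Carrots', 'Potatoes', 'Tomatoes'],
)

# Inclusive label slice: both endpoints are kept
inclusive = produce.loc['Apples':'Potatoes'].index.tolist()
print(inclusive)  # ['Apples', 'Bananas', 'Carrots', 'Potatoes']

# The exclusive-style workaround: on a sorted index, a label that is not
# present ('Potatoet') is located by sort order, giving the same rows
workaround = produce.loc['Apples':'Potatoet'].index.tolist()
print(workaround == inclusive)  # True
```

The workaround only behaves this way because the index is sorted; the inclusive form is simpler and works whenever both endpoint labels exist.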

## Manipulating the index

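A DataFrame's index is not fixed: it can be replaced with a different one. (The `Index` object itself is immutable, so you change it by replacing the whole index rather than editing individual labels.)

The `set_index()` method is one way to do this: it returns a new DataFrame that uses an existing column as the index.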


In [22]:
starting_five_df.set_index('lakers')

| lakers | bulls | hornets |
|---|---|---|
| lebron james | caruso | ball |
| davis | lavine | rozier |
| maclung | ball | washington |
| anthony | derozan | bridges |
| westbrook | vucevic | plumlee |
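Since `set_index()` returns a new DataFrame by default, the original keeps its position labels, and the new index can be used with `loc` straight away. The sketch also shows replacing an index by assigning to `.index` (the long position names are hypothetical) and that individual labels of an `Index` cannot be assigned:

```python
import pandas as pd

starting_five_df = pd.DataFrame({
    'lakers': ['lebron james', 'davis', 'maclung', 'anthony', 'westbrook'],
    'bulls': ['caruso', 'lavine', 'ball', 'derozan', 'vucevic'],
    'hornets': ['ball', 'rozier', 'washington', 'bridges', 'plumlee']
}, index=['pg', 'sg', 'sf', 'pf', 'c'])

# set_index returns a new DataFrame; the 'lakers' column becomes the index
by_laker = starting_five_df.set_index('lakers')
print(starting_five_df.index.tolist())  # ['pg', 'sg', 'sf', 'pf', 'c']
print(by_laker.columns.tolist())        # ['bulls', 'hornets']

# Rows can now be selected by Lakers player name
print(by_laker.loc['davis', 'bulls'])   # lavine

# Replace the whole index on a copy (hypothetical long position names)
renamed = starting_five_df.copy()
renamed.index = ['point guard', 'shooting guard', 'small forward',
                 'power forward', 'center']

# Editing a single label is not allowed: Index objects are immutable
try:
    renamed.index[0] = 'PG'
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False
```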


## Conditional selection