## Pandas Unit 2: Indexing, Selecting, Assigning

### Indexing using Python native accessors (column first, row second)

In [None]:
import pandas as pd
#reviews = pd.read_csv('world-happiness-report-2021/world-happiness-report-2021.csv', index_col=0)
reviews = pd.read_csv('world-happiness-report-2021/world-happiness-report-2021.csv')
pd.set_option('display.max_rows',5)
reviews

- Access the 'Generosity' column of reviews
   - using the dot operator (like an object attribute)
   - using the indexing operator ([]) (like a dict key)

In [None]:
reviews.Generosity  # only works when the column name is a valid Python identifier (no spaces) and does not clash with a DataFrame attribute

In [None]:
reviews['Generosity']

In [None]:
reviews['Ladder score']  # can contain space
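
The two access styles can be compared on a small hand-made frame. This is only a sketch: the values below are made up, and the `count` column is a hypothetical addition chosen because it shares its name with a DataFrame method.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
df = pd.DataFrame({
    "Generosity": [0.1, -0.2, 0.3],
    "Ladder score": [7.0, 6.0, 5.0],
    "count": [10, 20, 30],  # same name as the DataFrame.count method
})

# For an identifier-like name, both spellings return the same Series.
same = df.Generosity.equals(df["Generosity"])

# A name containing a space is only reachable with brackets.
ladder = df["Ladder score"]

# df.count is the method, not the column; brackets always give the column.
count_is_method = callable(df.count)
count_column = df["count"]

print(same, count_is_method)
```

Bracket access works for every column name, which makes it the safer default in scripts.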

In [None]:
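# column first, row second; 0 is a row label in the default integer index
reviews['Ladder score'][0]
# with index_col=0 the row label would be a country name instead:
# reviews['Ladder score']['Finland']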

### pandas accessor operators: iloc (row first, column second)
- index-based selection (iloc):
   - selects data by its numerical position in the data
   - indexing scheme: iloc follows Python's own slicing convention, where the first element of the range is included and **the last one is excluded**. So 0:10 selects entries 0,...,9.

In [None]:
reviews.iloc[0] # first row

In [None]:
reviews.iloc[:, 0] # first column

In [None]:
reviews.iloc[0:3, 0]  # use a slice: the first column in rows 0 to 2

In [None]:
reviews.iloc[[0, 2, 4], 0]  # use a list of positions

In [None]:
reviews.iloc[-3:, 0]  # a negative index counts from the end: the last three rows

In [None]:
reviews.iloc[:3, 0:3]
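
The iloc calls above can be tried without the CSV file. This is a minimal sketch on a hand-made frame; the column names, index labels, and values are made up for illustration and are not taken from the happiness report. The index labels are deliberately different from the positions, to show that iloc ignores labels entirely.

```python
import pandas as pd

# Hypothetical toy frame; labels and values are made up.
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d", "e"], "score": [5, 4, 3, 2, 1]},
    index=[10, 20, 30, 40, 50],  # labels deliberately differ from positions
)

first_row = df.iloc[0]           # Series holding the row at position 0
first_three = df.iloc[0:3, 0]    # positions 0, 1, 2 of the first column
picked = df.iloc[[0, 2, 4], 0]   # explicit list of row positions
last_two = df.iloc[-2:, 1]       # negative positions count from the end

print(first_three)
print(last_two)
```

The selected rows keep their original labels (10, 20, 30), even though they were chosen by position.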

### pandas accessor operators: loc (row first, column second)
- label-based selection (loc):
   - selects data by its **index label**, not by its position
   - indexing scheme: loc slices are inclusive at both ends. So 0:10 selects entries 0,...,10.

In [None]:
reviews.loc[0, 'Country name']

In [None]:
reviews.loc[:, ['Country name','Regional indicator','Ladder score']]

In [None]:
reviews.loc[:, 'Country name':'Ladder score']

In [None]:
reviews.loc[0:3, 'Country name':'Ladder score'] # indices are inclusive for loc
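
The difference between the two slicing schemes is easiest to see side by side. This is a small sketch on a made-up frame with the default integer index (the columns `x` and `y` are hypothetical), where the same slice `0:3` gives four rows with loc and three with iloc.

```python
import pandas as pd

# Hypothetical toy frame with the default integer index 0..4.
df = pd.DataFrame({"x": range(5), "y": list("abcde")})

by_label = df.loc[0:3, "x"]     # labels 0, 1, 2, 3: the end label is included
by_position = df.iloc[0:3, 0]   # positions 0, 1, 2: the end position is excluded
cols = df.loc[:, "x":"y"]       # a column slice by label also includes "y"

print(len(by_label), len(by_position))
```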

In [None]:
# reread the CSV, this time using the first column ('Country name') as the index
reviews = pd.read_csv('world-happiness-report-2021/world-happiness-report-2021.csv', index_col=0)
pd.set_option('display.max_rows',10)
reviews

In [None]:
reviews.loc["Finland":"Switzerland", 'Ladder score':'lowerwhisker']
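
Once the index holds strings, loc expects those strings as row labels, and integer labels no longer exist. This is a rough sketch on a made-up frame (the labels `P` to `S` and the columns are hypothetical) showing a label slice, a failed integer lookup, and iloc still working by position.

```python
import pandas as pd

# Hypothetical toy frame with string labels as the index.
df = pd.DataFrame(
    {"region": ["north", "north", "south", "south"], "score": [4, 3, 2, 1]},
    index=["P", "Q", "R", "S"],
)

# Rows from label "P" through label "R", in frame order, end included.
block = df.loc["P":"R", "score"]

# There is no row labelled 0 any more, so loc raises KeyError.
try:
    df.loc[0]
    integer_label_found = True
except KeyError:
    integer_label_found = False

# Positional access is unaffected by the choice of index.
first_by_position = df.iloc[0]

print(block)
print(integer_label_found)
```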

### Manipulate the index
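- Label-based selection uses the labels in the index, and the index is not fixed: a different column can take its place
- The set_index() method returns a new DataFrame that uses the given column as its index; the original is unchanged unless the result is assigned back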

In [None]:
reviews = pd.read_csv('world-happiness-report-2021/world-happiness-report-2021.csv')
reviews.set_index('Regional indicator')
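
Because set_index() returns a new DataFrame by default, the original frame keeps its old index until the result is assigned to a name. This is a minimal sketch on a made-up two-row frame (the column names and values are hypothetical).

```python
import pandas as pd

# Hypothetical toy frame; values are made up.
df = pd.DataFrame({"region": ["north", "south"], "score": [2, 1]})

indexed = df.set_index("region")   # new DataFrame; df itself is unchanged
still_default = list(df.index)     # df still has the integer index

# The column moved into the index, so it is now used as the row label.
score_south = indexed.loc["south", "score"]

# reset_index() turns the index back into an ordinary column.
restored = indexed.reset_index()

print(still_default)
print(indexed)
```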

### Conditional selection
- Select rows by a condition on a column
- The comparison returns a boolean Series (True/False for each row)
- That Series can be passed to loc to keep only the rows where it is True

In [None]:
reviews = pd.read_csv('world-happiness-report-2021/world-happiness-report-2021.csv')
reviews['Regional indicator']=='Western Europe'

In [None]:
reviews.loc[reviews['Regional indicator']=='Western Europe']

In [None]:
reviews.loc[reviews['Regional indicator']=='Western Europe',['Country name','Ladder score']]

In [None]:
reviews.loc[(reviews['Regional indicator']=='East Asia') & (reviews['Ladder score']>6.0)]
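
Conditions are combined with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `==` and `>`. This is a small sketch on a made-up frame; the region names and scores are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; values are made up.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "score": [6.5, 5.0, 7.0, 4.0],
})

mask = df["region"] == "east"   # boolean Series, one value per row
east = df.loc[mask]             # rows where the mask is True

# Both conditions must hold.
east_high = df.loc[(df["region"] == "east") & (df["score"] > 6.0)]

# At least one condition must hold.
east_or_high = df.loc[(df["region"] == "east") | (df["score"] > 6.0)]

# ~ inverts the mask; a column list restricts the output columns.
not_east = df.loc[~mask, ["score"]]

print(east_high)
print(not_east)
```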