# Indexing, Selecting, and Assigning

## Native Accessors
In Python, as in languages such as JavaScript, we can access an attribute of an object with dot notation. Pandas exposes each column of a dataframe as an attribute, so a column can be read as `wines.country`. A column can also be read with bracket notation, `wines['country']`, in the same way a dictionary value is read by its key.

In [355]:
import pandas as pd

wines = pd.read_csv('./wines.csv')

wines.head()

Unnamed: 0.1,Unnamed: 0,country,price,comments
0,0,USA,55.0,comments 1
1,1,Canada,,comments 2
2,2,Brazil,33.0,comments 3
3,3,,73.0,comments 4
4,4,USA,37.0,comments 5


However, dot notation only works when the column name is a valid Python identifier, so it must not have leading or trailing white space. If the white space has to stay, we can use bracket notation with the exact column name, or we can rename the column to remove the white space. Here is what happens before the white space is removed.

In [356]:
wines.columns
try:
    print(wines.country)
except AttributeError:
    print('Error accessing country column')

Error accessing country column


Now, look at the effects of stripping the whitespace.

In [357]:
wines.columns = wines.columns.str.strip() # strip() removes leading and trailing whitespace
try:
    print(wines.country)
except AttributeError:
    print('Error accessing country column')

0        USA
1     Canada
2     Brazil
3        NaN
4        USA
Name: country, dtype: object


Alternatively, we can access the column by using the bracket notation.

In [358]:
try:
    print(wines['country'])
except KeyError:
    print('Error accessing country column')

0        USA
1     Canada
2     Brazil
3        NaN
4        USA
Name: country, dtype: object
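
The effect of padded column names can be reproduced without the CSV file. The sketch below is only an illustration: `demo` is a hypothetical frame with made-up values whose column names carry stray spaces. It shows that attribute access fails until the names are stripped, while bracket access works as long as the key matches exactly.

```python
import pandas as pd

# Hypothetical stand-in for wines.csv: note the spaces around the column names.
demo = pd.DataFrame({" country ": ["USA", "Canada"], " price ": [55.0, None]})

try:
    demo.country
    dot_access_worked = True
except AttributeError:
    # There is no column named exactly "country", so the attribute lookup fails.
    dot_access_worked = False

# Bracket notation works when the key matches exactly, spaces included.
padded = demo[" country "]

# After stripping, both notations refer to the same column.
demo.columns = demo.columns.str.strip()
same = demo.country.equals(demo["country"])
print(dot_access_worked, same)  # False True
```

Bracket notation is the safer default when column names come from an external file, since it does not depend on the name being a valid attribute.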


## Indexing in Pandas

There are two main ways to index a dataframe in pandas. The first is the `.loc` accessor, which selects data by label or by a boolean array. The second is the `.iloc` accessor, which selects data by integer position or by an array of integer positions.

### Integer Indexing with `.iloc`

This first example will get the first row of the dataframe.

In [359]:
wines.iloc[0]

Unnamed: 0              0
country               USA
price                  55
comments       comments 1
Name: 0, dtype: object

This next example will get the data from the first row and the third column.

In [360]:
wines.iloc[0, 2]

' 55'

This next example will get the entire second column.

In [361]:
wines.iloc[:, 1]

0        USA
1     Canada
2     Brazil
3        NaN
4        USA
Name: country, dtype: object

This example skips the first row and the first column. That is, it returns everything from the second row and second column onward.

In [362]:
wines.iloc[1:, 1:]

Unnamed: 0,country,price,comments
1,Canada,,comments 2
2,Brazil,33.0,comments 3
3,,73.0,comments 4
4,USA,37.0,comments 5


So, now we will clean up the data a bit by getting rid of the useless first column.

In [363]:
wines = wines.iloc[:,1:]
wines.head()

Unnamed: 0,country,price,comments
0,USA,55.0,comments 1
1,Canada,,comments 2
2,Brazil,33.0,comments 3
3,,73.0,comments 4
4,USA,37.0,comments 5


Of note, negative numbers can be used to index from the end of the dataframe. This gets the entire dataframe except for the last row.

In [364]:
wines.iloc[:-1]

Unnamed: 0,country,price,comments
0,USA,55.0,comments 1
1,Canada,,comments 2
2,Brazil,33.0,comments 3
3,,73.0,comments 4
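
Because the `wines` index happens to be 0, 1, 2, ..., positions and labels coincide, which can hide what `.iloc` actually does. The following sketch uses a hypothetical frame (`demo`, with an index of 10, 20, 30 chosen purely for illustration) to show that `.iloc` always counts positions, whatever the labels are.

```python
import pandas as pd

# Hypothetical frame with a non-default index to separate position from label.
demo = pd.DataFrame(
    {"country": ["USA", "Canada", "Brazil"], "price": [55.0, None, 33.0]},
    index=[10, 20, 30],
)

first_row = demo.iloc[0]       # row at position 0, whose label is 10
cell = demo.iloc[0, 1]         # row position 0, column position 1 (price)
last_row = demo.iloc[-1]       # negative positions count from the end
all_but_last = demo.iloc[:-1]  # the slice end is exclusive, as with Python lists
```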


### Label Indexing with `.loc`

In [365]:
wines.loc[0, 'country']

' USA'

A major difference between `.loc` and `.iloc` is that a `.loc` slice includes its end label, whereas an `.iloc` slice excludes its end position. `.loc` also accepts a list of labels, so if we want to get the last two columns, we can name them directly with the following code.

In [366]:
wines.loc[:, ['price', 'comments']]

Unnamed: 0,price,comments
0,55.0,comments 1
1,,comments 2
2,33.0,comments 3
3,73.0,comments 4
4,37.0,comments 5
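
To make the inclusive and exclusive behavior concrete, here is a minimal sketch on a hypothetical frame (`demo`, with made-up values). The same slice `0:2` returns three rows with `.loc` and two rows with `.iloc`.

```python
import pandas as pd

# Hypothetical data; the default index gives the rows labels 0, 1, 2, 3.
demo = pd.DataFrame(
    {"country": ["USA", "Canada", "Brazil", "USA"], "price": [55.0, None, 33.0, 37.0]}
)

by_label = demo.loc[0:2]      # labels 0, 1, and 2: the end label is included
by_position = demo.iloc[0:2]  # positions 0 and 1: the end position is excluded

print(len(by_label), len(by_position))  # 3 2

# Column labels slice inclusively as well.
cols = demo.loc[:, "country":"price"]
```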


### Changing the Index
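If we want to change the index of the dataframe, we can use the `.set_index()` method, which uses the specified column as the index. Note that `.set_index()` returns a new dataframe rather than modifying `wines`, so the result must be assigned to a variable to be kept. Because the result is not assigned here, `wines.head()` still shows the original index.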

In [367]:
wines.set_index('price')
wines.head()


Unnamed: 0,country,price,comments
0,USA,55.0,comments 1
1,Canada,,comments 2
2,Brazil,33.0,comments 3
3,,73.0,comments 4
4,USA,37.0,comments 5
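
The sketch below shows one way to keep the re-indexed frame, using a hypothetical two-row frame (`demo`) and illustrative variable names. It assigns the return value of `.set_index()` and then looks a row up by its new label.

```python
import pandas as pd

demo = pd.DataFrame({"country": ["USA", "Brazil"], "price": [55.0, 33.0]})

demo.set_index("country")            # result discarded, so demo is unchanged
unchanged = list(demo.columns)       # still ['country', 'price']

indexed = demo.set_index("country")  # keep the returned frame instead
usa_price = indexed.loc["USA", "price"]

restored = indexed.reset_index()     # moves the index back into a column
```

Assigning the result keeps the original frame available, which is convenient in a notebook where cells may be re-run out of order.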


### Conditional Selection

We can use conditional selection to select data from a dataframe. For example, if we want to get all the rows where the `country` column is equal to `USA`, we can use the following conditional logic.

In [368]:
wines.country.str.strip() == "USA"

0     True
1    False
2    False
3    False
4     True
Name: country, dtype: bool

If we take this conditional logic and apply it to the dataframe, we will get the following result.

In [369]:
wines.loc[wines.country.str.strip() == "USA"]

Unnamed: 0,country,price,comments
0,USA,55,comments 1
4,USA,37,comments 5


Note that we apply `.str.strip()` to the column to remove the white space around each value. This is necessary because the white space in the values is not removed when the file is imported.

We can combine conditions using the `&` (and) and `|` (or) operators, wrapping each condition in parentheses. The following code gets all the rows where the `country` column is equal to `USA` and the `price` column is greater than 50.

In [370]:
wines.country = wines.country.str.strip()
wines.price = wines.price.astype(float)

wines.loc[(wines.country == "USA") & (wines.price > 50)]


Unnamed: 0,country,price,comments
0,USA,55.0,comments 1


Note that we applied the `astype()` method to the `price` column to convert it to a float, because the column is imported as a string.

### Built-in Selectors

Pandas comes with a few built-in selectors that can be used to select data from a dataframe. The first is the `isin()` method. This method can be used to select data from a dataframe where the data is in a list of values. For example, if we want to get all the rows where the `country` column is either `USA` or `Brazil`, we can use the following code.

In [371]:
wines.loc[wines.country.isin(['USA', 'Brazil'])]

Unnamed: 0,country,price,comments
0,USA,55.0,comments 1
2,Brazil,33.0,comments 3
4,USA,37.0,comments 5


We can do the same for values that are not null using the following code.

In [372]:
wines.loc[wines.price.notnull()]

Unnamed: 0,country,price,comments
0,USA,55.0,comments 1
2,Brazil,33.0,comments 3
3,,73.0,comments 4
4,USA,37.0,comments 5


Or, we can do the same for values that are null using the following code.

In [373]:
wines.loc[wines.price.isnull()]

Unnamed: 0,country,price,comments
1,Canada,,comments 2
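
These selectors return ordinary boolean masks, so they can be negated with `~` like any other condition. Here is a brief sketch on a hypothetical frame (`demo`, with made-up values including one missing country), showing `isin()` alongside its negation and the two null checks.

```python
import pandas as pd

demo = pd.DataFrame(
    {"country": ["USA", "Canada", "Brazil", None], "price": [55.0, None, 33.0, 73.0]}
)

selected = demo.loc[demo.country.isin(["USA", "Brazil"])]
# The negation also keeps the row whose country is missing.
excluded = demo.loc[~demo.country.isin(["USA", "Brazil"])]

priced = demo.loc[demo.price.notnull()]
unpriced = demo.loc[demo.price.isnull()]
```

Every row falls into exactly one of `priced` and `unpriced`, so the two selections together cover the whole frame.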


### Assigning Data

You can assign data quite simply in Pandas: use the proper selector to select the data you want to change and assign it a new value. For example, if we want to overwrite every entry in the `country` column with the same placeholder string, we can use the following code.

In [374]:
wines["country"] = "unknown CoA"
wines.head()

Unnamed: 0,country,price,comments
0,unknown CoA,55.0,comments 1
1,unknown CoA,,comments 2
2,unknown CoA,33.0,comments 3
3,unknown CoA,73.0,comments 4
4,unknown CoA,37.0,comments 5


And, if we want to change the value of a single cell, we can use the following code.

In [375]:
wines.loc[3, "country"] = "United States of America"
wines.head()

Unnamed: 0,country,price,comments
0,unknown CoA,55.0,comments 1
1,unknown CoA,,comments 2
2,unknown CoA,33.0,comments 3
3,United States of America,73.0,comments 4
4,unknown CoA,37.0,comments 5
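
Assignment also combines with conditional selection: passing a boolean mask and a column name to `.loc` changes only the matching rows. The sketch below uses a hypothetical frame (`demo`); the fill value `"Unknown"` and the column name `has_price` are illustrative.

```python
import pandas as pd

demo = pd.DataFrame(
    {"country": ["USA", None, "Brazil"], "price": [55.0, 73.0, None]}
)

# Fill in missing countries only; other rows are left alone.
demo.loc[demo.country.isnull(), "country"] = "Unknown"

# Assigning to a name that does not exist yet creates a new column.
demo["has_price"] = demo.price.notnull()
```

Doing the row and column selection in a single `.loc` call matters here: chaining two selections, as in `demo[mask]["country"] = ...`, may assign to a temporary copy and leave `demo` unchanged.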
