# Updating data frames

## Updating values in a dataframe

Let's import the packages and data we need:

In [8]:
import numpy as np
import pandas as pd
import random # Use for randomly sampling integers

# Set the seed
random.seed(42)

# Import data
URL = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'
penguins = pd.read_csv(URL)

In [9]:
# Add column body mass in kg
penguins['body_mass_kg'] = penguins['body_mass_g']/1000

# Confirm the new column exists without printing the whole data frame
print('body_mass_kg' in penguins.columns)

True


In [10]:
# Create random 3-digit codes
codes = random.sample(range(100,1000), len(penguins))  # Sampling w/o replacement

# Insert codes at the front of data frame
penguins.insert(loc=0,  # Position of the new column (0 = first)
                column='id_code',
                value=codes)

penguins.head()

   id_code species     island  bill_length_mm  bill_depth_mm  \
0      754  Adelie  Torgersen            39.1           18.7   
1      214  Adelie  Torgersen            39.5           17.4   
2      125  Adelie  Torgersen            40.3           18.0   
3      859  Adelie  Torgersen             NaN            NaN   
4      381  Adelie  Torgersen            36.7           19.3   

   flipper_length_mm  body_mass_g     sex  year  body_mass_kg  
0              181.0       3750.0    male  2007          3.75  
1              186.0       3800.0  female  2007          3.80  
2              195.0       3250.0  female  2007          3.25  
3                NaN          NaN     NaN  2007           NaN  
4              193.0       3450.0  female  2007          3.45  
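
To illustrate how `insert()` behaves, here is a minimal sketch on a made-up three-row data frame. The name `df` and all values are assumptions for illustration, not the penguins data. It shows that `insert()` modifies the data frame in place and that inserting a column name that already exists raises a `ValueError`.

```python
import pandas as pd

# Hypothetical toy data frame (illustrative values, not the penguins data)
df = pd.DataFrame({'species': ['Adelie', 'Gentoo', 'Chinstrap'],
                   'body_mass_g': [3750.0, 5000.0, 3500.0]})

# insert() changes df in place and returns None
df.insert(loc=0, column='id_code', value=[101, 102, 103])
print(list(df.columns))

# Inserting a column whose name already exists raises a ValueError
try:
    df.insert(loc=0, column='id_code', value=[1, 2, 3])
except ValueError as err:
    print('ValueError:', err)
```

Because the change happens in place, re-running a notebook cell that calls `insert()` with the same column name will fail the second time.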

## A single value 

Access a single value in a `pandas.DataFrame` using the locators

- `at[]` to select by labels, or
- `iat[]` to select by position.

The syntax for `at[]` is:
```
df.at[single_index_value, 'column_name']
```
Think of `at[]` as the equivalent to `loc[]` when trying to access a single value.
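
As a rough sketch of that equivalence, the example below builds a made-up two-row data frame (the `df` name, index labels, and values are assumptions for illustration) and reads the same cell with `at[]` and `loc[]` before updating it with `at[]`.

```python
import pandas as pd

# Hypothetical toy data frame indexed by an ID code
df = pd.DataFrame({'species': ['Adelie', 'Gentoo'],
                   'bill_length_mm': [39.5, 46.1]},
                  index=[214, 301])
df.index.name = 'id_code'

# Both locators return the same scalar for a single row/column label pair
value_at = df.at[214, 'bill_length_mm']
value_loc = df.loc[214, 'bill_length_mm']
print(value_at == value_loc)

# at[] can also be used to assign a new value to that single cell
df.at[214, 'bill_length_mm'] = 38.3
print(df.at[214, 'bill_length_mm'])
```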



In [12]:
# Set the ID code as the index of the data frame
penguins = penguins.set_index('id_code')
penguins

| id_code | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year | body_mass_kg |
|---|---|---|---|---|---|---|---|---|---|
| 754 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 | 3.750 |
| 214 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 | 3.800 |
| 125 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 | 3.250 |
| 859 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | 2007 | NaN |
| 381 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 | 3.450 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 140 | Chinstrap | Dream | 55.8 | 19.8 | 207.0 | 4000.0 | male | 2009 | 4.000 |
| 183 | Chinstrap | Dream | 43.5 | 18.1 | 202.0 | 3400.0 | female | 2009 | 3.400 |
| 969 | Chinstrap | Dream | 49.6 | 18.2 | 193.0 | 3775.0 | male | 2009 | 3.775 |
| 635 | Chinstrap | Dream | 50.8 | 19.0 | 210.0 | 4100.0 | male | 2009 | 4.100 |


In [15]:
# Check bill length of penguin with ID 214
penguins.at[214, 'bill_length_mm']

39.5

In [17]:
# Correct bill length of penguin with ID 214
penguins.at[214, 'bill_length_mm'] = 38.3

# Confirm the value was updated
penguins.loc[214]

species                 Adelie
island               Torgersen
bill_length_mm            38.3
bill_depth_mm             17.4
flipper_length_mm        186.0
body_mass_g             3800.0
sex                     female
year                      2007
body_mass_kg               3.8
Name: 214, dtype: object

If we want to access or update a single value by integer position, we use the `iat[]` locator:

Syntax:
```
df.iat[index_integer_location, column_integer_location]
```
Dynamically get the location of a single column:
```
df.columns.get_loc('column_name')
```
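
Putting the two pieces together, a minimal sketch on a made-up data frame (the `df` name and its values are assumptions for illustration) looks up a column position with `get_loc()` and then reads and updates single cells with `iat[]`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data frame with the default integer index
df = pd.DataFrame({'species': ['Adelie', 'Gentoo', 'Chinstrap'],
                   'bill_length_mm': [39.5, 46.1, 49.6]})

# Integer position of the column, found by name
col_pos = df.columns.get_loc('bill_length_mm')

# Read the value in the first row of that column
first = df.iat[0, col_pos]
print(first)

# Overwrite the value in the third row of that column with a missing value
df.iat[2, col_pos] = np.nan
print(df)
```

Looking up the position by name avoids hard-coding a column number that would silently point at the wrong column if columns are added or reordered.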

## Check-in
Obtain the location of the `bill_length_mm` column.

Use `iat[]` to access the same bill length value for the penguin with ID 214 and set it to NA (`np.nan`). Confirm your update using `iloc[]`.

In [30]:
# Obtain the location of the bill_length_mm column
bill_length_index = penguins.columns.get_loc('bill_length_mm')

In [31]:
# Use `iat[]` to set bill_length_mm to NA for penguin with ID 214
# (ID 214 is the second row, so its integer position is 1)
penguins.iat[1, bill_length_index] = np.nan
penguins.iloc[1]

species                 Adelie
island               Torgersen
bill_length_mm             NaN
bill_depth_mm             17.4
flipper_length_mm        186.0
body_mass_g             3800.0
sex                     female
year                      2007
body_mass_kg               3.8
Name: 214, dtype: object
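
Hard-coding the row position is easy to get wrong by one. A possible alternative, sketched here on a made-up data frame (the `df` name, index labels, and values are assumptions for illustration), is to look up the row position from the index label with `Index.get_loc()`. This returns a single integer only when the label is unique in the index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data frame indexed by unique ID codes
df = pd.DataFrame({'species': ['Adelie', 'Adelie', 'Adelie'],
                   'bill_length_mm': [39.1, 39.5, 40.3]},
                  index=[754, 214, 125])

# Integer positions of the row (by index label) and column (by name)
row_pos = df.index.get_loc(214)  # an int here because the label is unique
col_pos = df.columns.get_loc('bill_length_mm')

# Update the single cell by position and confirm with iloc[]
df.iat[row_pos, col_pos] = np.nan
print(df.iloc[row_pos])
```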

In [22]:
?penguins.iat

Type:        property
String form: <property object at 0x7f01bd06ede0>
Docstring:
Access a single value for a row/column pair by integer position.

Similar to ``iloc``, in that both provide integer-based lookups. Use
``iat`` if you only need to get or set a single value in a DataFrame
or Series.

Raises
------
IndexError
    When integer position is out of bounds.

See Also
--------
DataFrame.at : Access a single value for a row/column label pair.
DataFrame.loc : Access a group of rows and columns by label(s).
DataFrame.iloc : Access a group of rows and columns by integer position(s).

Examples
--------
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30

Get value at specified row/column pair

>>> df.iat[1, 2]
1

Set value at specified row/column pair

>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10

Get value within a series

>>> df.loc[0]