# 05 - Updating Rows and Columns - Modifying Data Within DataFrames

https://youtu.be/DCDe29sIKcE?si=d4DUqMqXt7kSc3o-

Notes by [Innovinitylabs](https://github.com/innovinitylabs)

```python
import pandas as pd

# Setup for learning data
people = {
    "first": ["Corey", 'Jane', 'John'],
    "last": ["Schafer", 'Doe', 'Doe'],
    "email": ["CoreyMSchafer@gmail.com", 'JaneDoe@email.com', 'JohnDoe@email.com']
}
dft = pd.DataFrame(people)
```

---

Setup for real-world data:

```python
df = pd.read_csv('data/survey_results_public.csv', index_col='Respondent')
schema_df = pd.read_csv('data/survey_results_schema.csv', index_col='Column')
pd.set_option('display.max_columns', 85)  # show up to 85 columns when displaying a DataFrame
```


```python
# pd.set_option('display.max_rows', 85)
```

---

#### Update columns/column names


How to update data within rows and columns

```python
dft.columns
```

```
Index(['first', 'last', 'email'], dtype='object')
```

```python
dft.columns = ['f', 'l', 'email']
dft
```

|   | f | l | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |


This replaces every column name in the DataFrame at once.

```python
# dft.columns = ['f', 'l']
```

But assigning a list to `dft.columns` only works when the list contains a new name for every column. A shorter list, like the commented-out one above, raises a `ValueError` because the lengths don't match.
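A minimal sketch of that failure, rebuilding the three-column `dft` from the setup cell so it runs on its own:

```python
import pandas as pd

# Rebuild the original three-column DataFrame from the setup cell
dft = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

raised = False
try:
    # Only two names for three columns, so pandas rejects the assignment
    dft.columns = ["f", "l"]
except ValueError as err:
    raised = True
    print("ValueError:", err)

# The failed assignment leaves the existing names untouched
print(list(dft.columns))  # ['first', 'last', 'email']
```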

To rename only specific columns, pass a dictionary of `'old name': 'new name'` pairs to the `columns` argument of the `.rename()` method:

```python
dft.rename(columns={'f': 'first name', 'l': 'last name'})
```

|   | first name | last name | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |


But `.rename()` returns a new DataFrame and leaves `dft` itself unchanged, so to modify `dft` we have to pass `inplace=True`:

```python
dft.rename(columns={'f': 'first name', 'l': 'last name'}, inplace=True)
dft
```

|   | first name | last name | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |
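The sketch below, which starts from a fresh DataFrame with the short `f`/`l` names, shows that `.rename()` without `inplace=True` hands back a new DataFrame and leaves the original alone. Rebinding the name to the returned DataFrame is an equivalent alternative to `inplace=True`.

```python
import pandas as pd

dft = pd.DataFrame({
    "f": ["Corey", "Jane", "John"],
    "l": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

renamed = dft.rename(columns={"f": "first name", "l": "last name"})
# rename() returned a new DataFrame; dft itself still has the old names
print(list(dft.columns))      # ['f', 'l', 'email']
print(list(renamed.columns))  # ['first name', 'last name', 'email']

# Same effect as inplace=True: rebind dft to the returned DataFrame
dft = dft.rename(columns={"f": "first name", "l": "last name"})
print(list(dft.columns))      # ['first name', 'last name', 'email']
```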


String methods can also be applied to every column name with a list comprehension, for example to uppercase them:

```python
dft.columns = [x.upper() for x in dft.columns]
dft
```

|   | FIRST NAME | LAST NAME | EMAIL |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |


```python
dft.columns = dft.columns.str.lower()  # same idea with the vectorized .str accessor, here lowercasing
dft
```

|   | first name | last name | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |
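Besides the list comprehension and the `.str` accessor, `.rename()` also accepts a function and applies it to every column label. A minimal sketch comparing the three, using an empty DataFrame since only the column names matter here:

```python
import pandas as pd

# Empty DataFrame: only the column labels matter for this comparison
dft = pd.DataFrame(columns=["first name", "last name", "email"])

via_listcomp = [x.upper() for x in dft.columns]          # plain Python string method
via_str = list(dft.columns.str.upper())                  # vectorized .str accessor on the Index
via_rename = list(dft.rename(columns=str.upper).columns)  # function applied to each label

print(via_listcomp == via_str == via_rename)  # True
```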


Or replace the spaces in the column names with underscores, which makes them easier to work with:

```python
dft.columns = dft.columns.str.replace(' ', '_')
dft
```

|   | first_name | last_name | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |
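The `.str` methods can be chained, which is handy when column names arrive inconsistently formatted. The messy names below are hypothetical, made up for illustration:

```python
import pandas as pd

# Hypothetical messy column names: stray spaces and mixed case
dft = pd.DataFrame(columns=[" First Name", "Last Name ", "Email"])

# Trim whitespace, lowercase, then swap the inner spaces for underscores
dft.columns = dft.columns.str.strip().str.lower().str.replace(" ", "_")
print(list(dft.columns))  # ['first_name', 'last_name', 'email']
```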


```python
dft.columns = ['first', 'last', 'email']  # back to the original column names
```

---

#### Update data in rows
###### [Timestamp](https://youtu.be/DCDe29sIKcE?si=9Xbty9dqhF9it79K)

To replace every value in a single row, assign a list (one value per column, in column order) to that row with `.loc`:

```python
dft.loc[2] = ['John', 'Smith', 'JohnSMith@email.com']
dft
```

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Smith | JohnSMith@email.com |
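Assigning a list relies on the values being in the same order as the columns. A minimal sketch of an alternative: assign a `pd.Series` keyed by column name instead, which `.loc` aligns on the labels, so the order of the keys doesn't matter.

```python
import pandas as pd

dft = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Keys deliberately out of column order; .loc aligns the Series on its labels
new_row = pd.Series({"email": "JohnSmith@email.com", "last": "Smith", "first": "John"})
dft.loc[2] = new_row
print(dft.loc[2].tolist())  # ['John', 'Smith', 'JohnSmith@email.com']
```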


To change only some of the values in a row, pass the row label and a list of column names to `.loc`:

```python
dft.loc[1, ['last', 'email']] = ['Doe', 'JohnDoe@email.com']
dft
```

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JohnDoe@email.com |
| 2 | John | Smith | JohnSMith@email.com |
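`.loc` also accepts a list of row labels, and a single scalar on the right-hand side is broadcast to every selected cell. A minimal sketch on a fresh copy of the setup data:

```python
import pandas as pd

dft = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# One value written into the 'last' column of rows 0 and 2
dft.loc[[0, 2], "last"] = "Smith"
print(dft["last"].tolist())  # ['Smith', 'Doe', 'Smith']
```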


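pandas also has an indexer called `.at`, meant specifically for getting or setting a single value by row and column label.
###### For a single cell it gives the same result as `.loc`; the pandas docs recommend `.at` for fast scalar access, but any speed difference is unnoticeable in a small example like this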

```python
dft.at[2, 'last'] = 'Doe'
dft
```

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JohnDoe@email.com |
| 2 | John | Doe | JohnSMith@email.com |
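`.at` reads a single value as well as writing one, and for a single cell it lands on the same value as `.loc`. A minimal sketch on a fresh copy of the setup data:

```python
import pandas as pd

dft = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

dft.at[2, "last"] = "Smith"  # write one cell with .at
print(dft.at[2, "last"])     # Smith
print(dft.loc[2, "last"])    # Smith (same cell, read through .loc)
```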


<mark>Problems when</mark> indexers like `.loc` or `.at` are <mark>NOT</mark> used
###### [Timestamp](https://youtu.be/DCDe29sIKcE?si=KKqz5VgyaYQDaOui&t=722)

```python
# Boolean filter: True where the email matches
filt = dft['email'] == 'JohnDoe@email.com'
dft[filt]
```

|   | first | last | email |
|---|---|---|---|
| 1 | Jane | Doe | JohnDoe@email.com |
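The filter itself is just a boolean Series with one `True`/`False` per row, and `dft[filt]` keeps the rows marked `True`. A minimal sketch that rebuilds `dft` as it looks at this point in the notes:

```python
import pandas as pd

# dft as it stands after the row updates above
dft = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JohnDoe@email.com", "JohnSMith@email.com"],
})

filt = dft["email"] == "JohnDoe@email.com"
print(filt.tolist())             # [False, True, False]
print(dft[filt].index.tolist())  # [1]
```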


```python
dft[filt]['last'] = 'Smith' #if we try to assign like this
```

```
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dft[filt]['last'] = 'Smith' #if we try to assign like this
```


Instead of an error, pandas emits a `SettingWithCopyWarning` like the one above.

```python
dft
```

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JohnDoe@email.com |
| 2 | John | Doe | JohnSMith@email.com |


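But the last name is not actually changed: row 1 still shows `Doe`. The chained indexing in `dft[filt]['last']` first builds a new DataFrame from the filtered rows, so the assignment modifies that temporary copy instead of `dft`.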
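The fix the warning points to is to pass the filter and the column name to a single `.loc` call, so the assignment goes straight to `dft`. A minimal sketch on the same data as above:

```python
import pandas as pd

dft = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JohnDoe@email.com", "JohnSMith@email.com"],
})

filt = dft["email"] == "JohnDoe@email.com"
# Row filter and column label in one .loc call: writes into dft itself
dft.loc[filt, "last"] = "Smith"
print(dft.loc[filt, "last"].tolist())  # ['Smith']
print(dft["last"].tolist())            # ['Schafer', 'Smith', 'Doe']
```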