# 05 - Updating Rows and Columns - Modifying Data Within DataFrames

https://youtu.be/DCDe29sIKcE?si=d4DUqMqXt7kSc3o-

Notes by [Innovinitylabs](https://github.com/innovinitylabs)

In [48]:
import pandas as pd

#Setup for learning data

people = {
    "first": ["Corey", 'Jane', 'John'], 
    "last": ["Schafer", 'Doe', 'Doe'], 
    "email": ["CoreyMSchafer@gmail.com", 'JaneDoe@email.com', 'JohnDoe@email.com']
}
dft = pd.DataFrame(people)

---

#Setup for Real world data

In [49]:
df = pd.read_csv('data/survey_results_public.csv', index_col='Respondent' )
schema_df = pd.read_csv('data/survey_results_schema.csv', index_col='Column')
pd.set_option('display.max_columns', 85)


In [50]:
# pd.set_option('display.max_rows', 85)

---
---
---

#### Update columns/column names


How to update data within rows and columns

In [51]:
dft.columns

Index(['first', 'last', 'email'], dtype='object')

In [52]:
dft.columns = [ 'f', 'l', 'email']
dft

Unnamed: 0,f,l,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


Assigning a list to `dft.columns` renames all the columns in the DataFrame at once.

In [53]:
# dft.columns = [ 'f', 'l']

But that method only works when renaming every column: the list must contain one name per column, so the commented-out two-name assignment above would raise an error.

To rename specific columns, pass a dictionary of `'old name': 'new name'` pairs to the `.rename(columns={})` method.

In [54]:
dft.rename( columns = {'f': 'first name', 'l': 'last name'})

Unnamed: 0,first name,last name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


But the above does not change the DataFrame in place; it returns a renamed copy. To keep the change, pass `inplace=True`.

In [55]:
dft.rename( columns = {'f': 'first name', 'l': 'last name'}, inplace=True)
dft

Unnamed: 0,first name,last name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com
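
As a small sketch of the two `rename` calls above (the throwaway frame `demo` is an assumption made for illustration, not part of the tutorial data), the result can be previewed first and then kept by assigning it back, which has the same effect as `inplace=True`:

```python
import pandas as pd

# Hypothetical throwaway frame, used only for this illustration
demo = pd.DataFrame({
    "f": ["Corey", "Jane"],
    "l": ["Schafer", "Doe"],
    "email": ["a@x.com", "b@x.com"],
})

# rename returns a new DataFrame; demo itself is untouched
preview = demo.rename(columns={"f": "first name", "l": "last name"})
print(list(preview.columns))
print(list(demo.columns))

# Keep the change by assigning the result back to the same name
demo = demo.rename(columns={"f": "first name", "l": "last name"})
print(list(demo.columns))
```

Columns that are not named in the dictionary, such as `email` here, keep their names.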


We can use string methods to uppercase the column names, for example with a list comprehension.

In [56]:
dft.columns = [ x.upper() for x in dft.columns ]
dft

Unnamed: 0,FIRST NAME,LAST NAME,EMAIL
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [57]:
dft.columns = dft.columns.str.lower() #can also use this method
dft

Unnamed: 0,first name,last name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


Or replace the spaces in column names with underscores for ease of use.

In [58]:
dft.columns = dft.columns.str.replace( ' ', '_')
dft

Unnamed: 0,first_name,last_name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com
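
The `.str` methods on column names can also be chained. A minimal sketch, assuming a hypothetical frame `demo` with messy headers (the names are made up for illustration):

```python
import pandas as pd

# Hypothetical empty frame with messy column names
demo = pd.DataFrame(columns=[" First Name", "Last Name ", "EMAIL"])

# Strip outer whitespace, lowercase, then swap inner spaces for underscores
demo.columns = (
    demo.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
)
print(list(demo.columns))
```

Passing `regex=False` treats the space as a literal string rather than a pattern.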


In [59]:
dft.columns = ['first', 'last', 'email'] #Back to original template

---

# Update data in rows
###### [Timestamp](https://youtu.be/DCDe29sIKcE?si=9Xbty9dqhF9it79K)

We can replace all the values in a single row like this:

In [60]:
dft.loc[2] = ['John', 'Smith', 'JohnSMith@email.com']
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Smith,JohnSMith@email.com


If we only want to change a couple of columns in a row, pass a list of column names as the second argument to `.loc`.

In [61]:
dft.loc[1, ['last', 'email']] = ['Doe', 'JohnDoe@email.com']
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JohnDoe@email.com
2,John,Smith,JohnSMith@email.com


Pandas also has an indexer called `.at`, which is meant specifically for getting or setting a single value.  
###### The result is the same as with `.loc`; the pandas docs describe `.at` as the faster option for single values

In [63]:
dft.at[2, 'last'] = 'Doe'
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JohnDoe@email.com
2,John,Doe,JohnSMith@email.com
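
A short side-by-side sketch of the two indexers, using a hypothetical frame `demo` that is not part of the tutorial data:

```python
import pandas as pd

# Hypothetical throwaway frame, used only for this illustration
demo = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# .loc accepts a single label or a list of labels for the columns
demo.loc[0, ["first", "last"]] = ["Chris", "Smith"]

# .at takes exactly one row label and one column label
demo.at[1, "last"] = "Smith"

# Reading a single value with .at works the same way
print(demo.at[1, "last"])
```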


---

<mark>Errors when</mark> indexers like `.loc` or `.at` are <mark>NOT</mark> used
###### [Timestamp](https://youtu.be/DCDe29sIKcE?si=KKqz5VgyaYQDaOui&t=722)

In [65]:
#lets create a filter
filt = dft['email'] == 'JohnDoe@email.com'
dft[filt]

Unnamed: 0,first,last,email
1,Jane,Doe,JohnDoe@email.com


In [67]:
dft[filt]['last'] = 'Smith' #if we try to assign like this

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dft[filt]['last'] = 'Smith' #if we try to assign like this


We get a warning like above

In [68]:
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JohnDoe@email.com
2,John,Doe,JohnSMith@email.com


But the last name is not actually changed, because `dft[filt]` returns a copy and the assignment only modifies that temporary copy.

##### So always use `.loc` or `.at` when setting values

In [70]:
dft.loc[filt, 'last'] = 'Smith'
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Smith,JohnDoe@email.com
2,John,Doe,JohnSMith@email.com


Above works.
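
The difference can be sketched on a hypothetical frame `demo` (made-up data). The explicit `.copy()` is an assumption added here to make the "separate object" visible without triggering the warning:

```python
import pandas as pd

# Hypothetical throwaway frame, used only for this illustration
demo = pd.DataFrame({
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["c@x.com", "jane@x.com", "john@x.com"],
})
mask = demo["email"] == "john@x.com"

# Boolean indexing hands back a separate object, so edits to it stay there
subset = demo[mask].copy()
subset["last"] = "Smith"
print(demo.loc[mask, "last"].tolist())   # the original row is unchanged

# Setting through .loc with the same mask writes to the original frame
demo.loc[mask, "last"] = "Smith"
print(demo["last"].tolist())
```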

---
### Update Multiple rows of data 
###### [Timestamp](https://youtu.be/DCDe29sIKcE?si=CwAJCgjQhG_i8hLo&t=935)

In [73]:
dft['email'] = dft['email'].str.lower() #one way to change multiple rows (to lowercase)
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,coreymschafer@gmail.com
1,Jane,Smith,johndoe@email.com
2,John,Doe,johnsmith@email.com


---

#### There are 4 ways to change multiple rows at once
#####  1) Apply
            Applied to a DataFrame, it works on each Series; applied to a Series, it works on each value.
#####  2) ApplyMap
            Applies a function to every value in a DataFrame.  
            * Only works on DataFrame * 
<mark>FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.</mark>
#####  3) Map
            Substitutes each value in a Series; values without a match become NaN.
#####  4) Replace
            Substitutes only the listed values and leaves the rest unchanged.

---

### 1) APPLY

Used for calling a function on values. It works on a DataFrame or a Series object,  
but the behaviour differs depending on the type.

##### APPLY on SERIES

can apply a function to every value in the series

In [74]:
dft['email'].apply(len)

0    23
1    17
2    19
Name: email, dtype: int64

In [75]:
def to_upper(email):
    return email.upper()

In [77]:
dft['email'].apply(to_upper) 
#wont change in place

0    COREYMSCHAFER@GMAIL.COM
1          JOHNDOE@EMAIL.COM
2        JOHNSMITH@EMAIL.COM
Name: email, dtype: object

`to_upper` and not `to_upper()` since we are just <mark>passing</mark> it a function (as argument) and <mark> not calling</mark>  the function

In [79]:
dft['email'] = dft['email'].apply(to_upper) # Changes in place
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,COREYMSCHAFER@GMAIL.COM
1,Jane,Smith,JOHNDOE@EMAIL.COM
2,John,Doe,JOHNSMITH@EMAIL.COM


We can also pass a `lambda` function instead of a named function, like below.

In [80]:
dft['email'] = dft['email'].apply(lambda x: x.lower())
dft

Unnamed: 0,first,last,email
0,Corey,Schafer,coreymschafer@gmail.com
1,Jane,Smith,johndoe@email.com
2,John,Doe,johnsmith@email.com
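
`apply` on a Series is most useful when the function does something the built-in string methods do not cover directly. A minimal sketch, assuming a hypothetical helper `domain` and a made-up Series `emails`:

```python
import pandas as pd

# Hypothetical Series, used only for this illustration
emails = pd.Series(["coreymschafer@gmail.com", "johndoe@email.com"], name="email")

def domain(email):
    """Return the part of an email address after the '@' (hypothetical helper)."""
    return email.split("@")[1]

# The function is passed without parentheses and called once per value
domains = emails.apply(domain)
print(domains.tolist())

# The same result with the vectorised .str accessor
same = emails.str.split("@").str[1]
print(same.tolist())
```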


##### APPLY on DATAFRAMES

In [82]:
# dft['email'].apply(len) # --- This returns a series so it works

dft.apply(len)

first    3
last     3
email    3
dtype: int64

Instead of applying the function to each value in the DataFrame, `apply` passes each Series (column) to the function, so `len` returns the length of each column here, which is the number of rows.

In [85]:
len(dft['last']) # Like this

3

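By default the function receives each column. If we want it to receive each row instead, we can change the axis to `'columns'`, which applies the function across the columns of every row.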

In [86]:
dft.apply(len, axis = 'columns')

0    3
1    3
2    3
dtype: int64
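
Because `dft` has three rows and three columns, both calls return 3 and the axis is hard to tell apart. A sketch on a hypothetical 3-row, 2-column frame `demo` makes the direction visible:

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns
demo = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

per_column = demo.apply(len)                 # one result per column
per_row = demo.apply(len, axis="columns")    # one result per row

# With axis="columns" the function receives each row as a Series
full_name = demo.apply(lambda row: row["first"] + " " + row["last"], axis="columns")

print(per_column.tolist())
print(per_row.tolist())
print(full_name.tolist())
```

Here `per_column` holds the row count for each of the two columns, while `per_row` holds the column count for each of the three rows.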

We can take functions that work on a Series object and apply them to every column of a DataFrame,  
for example to find minimum or maximum values. Ex: the `min` method of the Series object.

In [87]:
dft.apply(pd.Series.min) #WORKS BETTER WITH NUMERICAL DATA

first                      Corey
last                         Doe
email    coreymschafer@gmail.com
dtype: object

We can also use `lambda` functions, but here the lambda receives a whole Series (one column at a time), not a single value.

In [88]:
dft.apply(lambda x: x.min()) #same result as above, applies the function on whole series

first                      Corey
last                         Doe
email    coreymschafer@gmail.com
dtype: object
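
With text columns `min` just returns the alphabetically first value. A sketch on numerical data, assuming a hypothetical frame `scores` with made-up numbers:

```python
import pandas as pd

# Hypothetical numerical data, used only for this illustration
scores = pd.DataFrame({"math": [70, 85, 90], "art": [60, 95, 80]})

# Each column is passed to the function as a Series
lowest = scores.apply(pd.Series.min)
spread = scores.apply(lambda col: col.max() - col.min())

print(lowest.to_dict())
print(spread.to_dict())
```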

#### Running apply to a Series applies function to every value in a Series
#### Running apply to a DataFrame applies function to every Series in a DataFrame

### 2) Apply Map
---

Applies function to every value in DataFrame
###### Instead of every Series in DataFrame

In [89]:
dft.applymap(len)



Unnamed: 0,first,last,email
0,5,7,23
1,4,5,17
2,4,3,19


<mark>FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.</mark>

In [90]:
dft.map(len) #not part of the YT video

Unnamed: 0,first,last,email
0,5,7,23
1,4,5,17
2,4,3,19
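
If the same notebook has to run on pandas versions both before and after the rename, one possible workaround is to pick the method at runtime. This is a sketch of that idea, not something from the video; `demo` and `elementwise` are hypothetical names:

```python
import pandas as pd

# Hypothetical throwaway frame, used only for this illustration
demo = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# Use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = demo.map if hasattr(pd.DataFrame, "map") else demo.applymap

lengths = elementwise(len)
print(lengths.values.tolist())
```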


In [91]:
dft.applymap(str.lower)



Unnamed: 0,first,last,email
0,corey,schafer,coreymschafer@gmail.com
1,jane,smith,johndoe@email.com
2,john,doe,johnsmith@email.com


#### 3) MAP
---

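`Series.map` substitutes each value in a Series, here using a dictionary. Since pandas 2.1 there is also `DataFrame.map`, the element-wise replacement for `applymap`, but it takes a function rather than a dictionary.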

In [92]:
dft['first'].map({'Corey': 'Chris', 'Jane': 'Mary'})

0    Chris
1     Mary
2      NaN
Name: first, dtype: object

Any value that is not a key in the dictionary is changed to `NaN`.
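
If `map` is still preferred, one way to keep the unmatched values is to fill the gaps from the original Series. A minimal sketch, assuming a made-up Series `first` and dictionary `mapping`:

```python
import pandas as pd

# Hypothetical Series, used only for this illustration
first = pd.Series(["Corey", "Jane", "John"], name="first")
mapping = {"Corey": "Chris", "Jane": "Mary"}

mapped = first.map(mapping)              # 'John' has no key, so it becomes NaN
kept = first.map(mapping).fillna(first)  # fill the gaps with the original values

print(mapped.isna().tolist())
print(kept.tolist())
```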

#### 4) Replace
---

But we can use `replace()` to keep the values we did not list intact.

In [93]:
dft['first'].replace({'Corey': 'Chris', 'Jane': 'Mary'})

0    Chris
1     Mary
2     John
Name: first, dtype: object

In [95]:
dft['first'] = dft['first'].replace({'Corey': 'Chris', 'Jane': 'Mary'}) #to apply changes
dft

Unnamed: 0,first,last,email
0,Chris,Schafer,coreymschafer@gmail.com
1,Mary,Smith,johndoe@email.com
2,John,Doe,johnsmith@email.com
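
`replace` also exists on the DataFrame itself. As a sketch beyond what the video covers, a nested dictionary of the form `{column: {old: new}}` limits each substitution to one column; the frame `demo` is made up for illustration:

```python
import pandas as pd

# Hypothetical throwaway frame, used only for this illustration
demo = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

# Outer keys are column names, inner dictionaries are old -> new values
updated = demo.replace({"first": {"Corey": "Chris"}, "last": {"Doe": "Smith"}})

print(updated["first"].tolist())
print(updated["last"].tolist())
```

Like `rename`, this returns a new DataFrame, so `demo` stays unchanged until the result is assigned back.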


---
---
#### Stack Overflow (SO) survey example
###### [Timestamp](https://youtu.be/DCDe29sIKcE?si=CNkM04w4H5l2Jwfb&t=1936)

Rename column

In [98]:
df.rename(columns= {'ConvertedComp': 'SalaryUSD'})
df.head(1)

Unnamed: 0_level_0,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
Respondent,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",,,4,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,HTML/CSS;Java;JavaScript;Python,C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL,SQLite,MySQL,MacOS;Windows,Android;Arduino;Windows,Django;Flask,Flask;jQuery,Node.js,Node.js,IntelliJ;Notepad++;PyCharm,Windows,I do not use containers,,,Yes,"Fortunately, someone else has that title",Yes,Twitter,Online,Username,2017,A few times per month or weekly,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,31-60 minutes,No,,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult


Because `rename` returns a new DataFrame by default, we can preview the result first and only pass `inplace=True` once we are happy with it, so we do not mess up our original data by accident.

In [99]:
df.rename(columns= {'ConvertedComp': 'SalaryUSD'}, inplace=True)
df.head(1)

Unnamed: 0_level_0,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,SalaryUSD,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
Respondent,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",,,4,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,HTML/CSS;Java;JavaScript;Python,C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL,SQLite,MySQL,MacOS;Windows,Android;Arduino;Windows,Django;Flask,Flask;jQuery,Node.js,Node.js,IntelliJ;Notepad++;PyCharm,Windows,I do not use containers,,,Yes,"Fortunately, someone else has that title",Yes,Twitter,Online,Username,2017,A few times per month or weekly,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,31-60 minutes,No,,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult


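We can use `map` on columns with Yes/No values to change them into `True` or `False`. Note that the cell below maps to the strings `'True'` and `'False'`; use `True` and `False` without quotes to get real booleans.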

In [None]:
df['Hobbyist']

In [100]:
df['Hobbyist'].map({'Yes': 'True', 'No': 'False'})

Respondent
1         True
2        False
3         True
4        False
5         True
         ...  
88377     True
88601    False
88802    False
88816    False
88863     True
Name: Hobbyist, Length: 88883, dtype: object

There is no `inplace` argument for `map`, so we have to assign the mapped result back to the column (`df['Hobbyist'] = ...`).

! `map` replaces any value that is not in the dictionary with `NaN`,

so we can use `replace` in case we do not want that.
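
A minimal sketch of the boolean version, assuming a small made-up Series `hobbyist` in place of the real survey column (the missing entry is added here only to show what `map` does with it):

```python
import pandas as pd

# Hypothetical stand-in for df['Hobbyist'], with one missing answer
hobbyist = pd.Series(["Yes", "No", "Yes", None], name="Hobbyist")

# Unquoted True/False gives real booleans instead of strings
as_bool = hobbyist.map({"Yes": True, "No": False})

print(as_bool.iloc[:3].tolist())
print(as_bool.isna().tolist())   # the missing answer stays missing
```

To keep the result, assign it back, for example `df['Hobbyist'] = df['Hobbyist'].map({'Yes': True, 'No': False})`.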