## Pandas Part 5: Updating Rows and Columns - Modifying Data within DataFrames

These notes follow Corey Schafer's Pandas Part 5 [tutorial](https://www.youtube.com/watch?v=DCDe29sIKcE&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&index=5).

In [60]:
people = {
    'first': ['Corey', 'Jane', 'John'],
    'Last': ['Schafer', 'Doe', 'Doe'],
    'email': ['CoreMSchafer@gmail.com', 'JaneDoe@email.com', 'JohnDoe@email.com']
}

In [61]:
import pandas as pd

In [62]:
df = pd.DataFrame(people)

In [63]:
df

Unnamed: 0,first,Last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [64]:
df.columns

Index(['first', 'Last', 'email'], dtype='object')

In [65]:
# rename all columns at once by assigning a list with one name per column, in order
df.columns = ['first_name', 'last_name', 'email']

In [66]:
df

Unnamed: 0,first_name,last_name,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [67]:
df.columns = [x.upper() for x in df.columns]
# to replace spaces with underscores in the column headings instead:
# df.columns = df.columns.str.replace(' ', '_')

In [68]:
df

Unnamed: 0,FIRST_NAME,LAST_NAME,EMAIL
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com
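
The commented-out `str.replace` line above can be chained with other string methods on the column index. Below is a minimal sketch, assuming a hypothetical frame `raw` whose headers contain spaces and capitals (it is not part of the tutorial data):

```python
import pandas as pd

# Hypothetical frame with untidy headers, used only for illustration
raw = pd.DataFrame({'First Name': ['Corey'], 'Last Name': ['Schafer']})

# Index.str exposes vectorized string methods, so the calls can be chained
raw.columns = raw.columns.str.lower().str.replace(' ', '_')

print(list(raw.columns))  # ['first_name', 'last_name']
```

This does the same job as the list comprehension in the cell above, without an explicit loop.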


In [69]:
df.columns = [x.lower() for x in df.columns]

In [70]:
df

Unnamed: 0,first_name,last_name,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [71]:
df.rename(columns={'first_name': 'first', 'last_name': 'last'}, inplace=True)

In [72]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com
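
`rename` only changes the columns named in the dictionary, and without `inplace=True` it returns a new dataframe and leaves the original alone. A small sketch of that behaviour, using a hypothetical one-row frame `people_df`:

```python
import pandas as pd

# Hypothetical one-row frame, used only for illustration
people_df = pd.DataFrame({
    'first_name': ['Corey'],
    'last_name': ['Schafer'],
    'email': ['corey@example.com'],
})

# Without inplace=True the result must be captured in a variable
renamed = people_df.rename(columns={'first_name': 'first', 'last_name': 'last'})

print(list(people_df.columns))  # ['first_name', 'last_name', 'email']
print(list(renamed.columns))    # ['first', 'last', 'email']
```

Reassigning the result (`people_df = people_df.rename(...)`) is an alternative to `inplace=True`.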


In [73]:
df.loc[2]

first                 John
last                   Doe
email    JohnDoe@email.com
Name: 2, dtype: object

In [74]:
df.loc[2] = ['John', 'Smith', 'john.smith@email.com']

In [75]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Smith,john.smith@email.com


In [76]:
df.loc[2, ['last', 'email']] = ['Doe', 'john.doe@email.com']

In [77]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,john.doe@email.com


In [78]:
df.loc[2, 'last'] = 'Smith'

In [79]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Smith,john.doe@email.com


In [80]:
# alternative to loc for single values
df.at[2, 'last'] = 'Doe'

In [81]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,john.doe@email.com
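
To put the two accessors side by side: `.loc` accepts a list of column labels and can update several values in a row, while `.at` addresses exactly one cell. The sketch below rebuilds a small frame so it runs on its own; the values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['Corey', 'Jane', 'John'],
    'last': ['Schafer', 'Doe', 'Doe'],
    'email': ['corey@example.com', 'jane@example.com', 'john@example.com'],
})

# .loc with a list of columns updates several cells of row 2 at once
df.loc[2, ['last', 'email']] = ['Smith', 'john.smith@example.com']

# .at takes a single row label and a single column label
df.at[2, 'last'] = 'Doe'

print(df.loc[2].tolist())  # ['John', 'Doe', 'john.smith@example.com']
```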


In [82]:
filt = (df['email'] == 'john.doe@email.com')
df[filt]

Unnamed: 0,first,last,email
2,John,Doe,john.doe@email.com


In [83]:
df[filt]['last']

2    Doe
Name: last, dtype: object

In [84]:
# do not do this: df[filt] returns a copy, so the assignment never reaches df (use loc instead)
df[filt]['last'] = 'Smith'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[filt]['last'] = 'Smith'


In [85]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,john.doe@email.com


In [86]:
df.loc[filt, 'last'] = 'Smith'

In [87]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Smith,john.doe@email.com
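
The same `.loc[filter, column]` pattern updates every row the filter matches, not just one. A minimal sketch with a self-contained frame, where the filter on `last` is chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['Corey', 'Jane', 'John'],
    'last': ['Schafer', 'Doe', 'Doe'],
})

# Boolean mask: True for every row whose last name is 'Doe'
filt = df['last'] == 'Doe'

# Row selection and column selection happen in one .loc call,
# so the assignment is applied to df itself rather than to a copy
df.loc[filt, 'last'] = 'Smith'

print(df['last'].tolist())  # ['Schafer', 'Smith', 'Smith']
```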


In [88]:
# returns the email values in lower case; the dataframe itself is not changed
df['email'].str.lower()

0    coremschafer@gmail.com
1         janedoe@email.com
2        john.doe@email.com
Name: email, dtype: object

In [89]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Smith,john.doe@email.com


In [90]:
# assign the lowercase values back to the 'email' column to make the change permanent
df['email'] = df['email'].str.lower()

In [91]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,coremschafer@gmail.com
1,Jane,Doe,janedoe@email.com
2,John,Smith,john.doe@email.com


In [92]:
# 'apply' calls a function on the values of a series or dataframe
# here, applying len to the email column returns the length of each email value
df['email'].apply(len)

0    22
1    17
2    18
Name: email, dtype: int64

In [93]:
def update_email(email):
    return email.upper()

In [94]:
# return values resulting from applying function update_email to email column values
# note dataframe has not been altered
df['email'].apply(update_email)

0    COREMSCHAFER@GMAIL.COM
1         JANEDOE@EMAIL.COM
2        JOHN.DOE@EMAIL.COM
Name: email, dtype: object

In [95]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,coremschafer@gmail.com
1,Jane,Doe,janedoe@email.com
2,John,Smith,john.doe@email.com


In [96]:
df['email'] = df['email'].apply(update_email)

In [97]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,COREMSCHAFER@GMAIL.COM
1,Jane,Doe,JANEDOE@EMAIL.COM
2,John,Smith,JOHN.DOE@EMAIL.COM


In [98]:
# example using lambda function
df['email'] = df['email'].apply(lambda x: x.lower())

# lambdas are small anonymous functions
# lambda parameters: expression
# is equivalent to
# def some_name(parameters):
#    return expression

In [99]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,coremschafer@gmail.com
1,Jane,Doe,janedoe@email.com
2,John,Smith,john.doe@email.com
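
The named function, the lambda, and the `.str` accessor shown earlier all give the same result. A short sketch to confirm this, using a standalone series `emails` and a hypothetical helper `to_lower`:

```python
import pandas as pd

emails = pd.Series(['COREMSCHAFER@GMAIL.COM', 'JANEDOE@EMAIL.COM'], name='email')

def to_lower(value):
    # Hypothetical helper: receives one string at a time from apply
    return value.lower()

via_def = emails.apply(to_lower)
via_lambda = emails.apply(lambda value: value.lower())
via_str = emails.str.lower()

print(via_def.equals(via_lambda) and via_def.equals(via_str))  # True
```

For simple string operations the `.str` form is usually the most direct; `apply` is more useful when the logic does not fit a built-in string method.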


In [100]:
# so far we've used "apply" on series. Let's now look at using "apply" on dataframes

In [101]:
df['email'].apply(len)

0    22
1    17
2    18
Name: email, dtype: int64

In [102]:
# on a dataframe, apply passes each column (a series) to len, so this returns the number of rows in each column
df.apply(len)

first    3
last     3
email    3
dtype: int64

In [103]:
len(df['email'])

3

In [104]:
# use the axis option to get the length of each row, i.e. the number of columns
df.apply(len, axis='columns')

0    3
1    3
2    3
dtype: int64
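
With `axis='columns'` the function receives one row at a time as a series indexed by column name, which makes it possible to combine values from several columns. A minimal sketch, where the `full_name` result is a hypothetical example and not part of the tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['Corey', 'Jane', 'John'],
    'last': ['Schafer', 'Doe', 'Smith'],
})

# Each call receives one row; row['first'] and row['last'] are single strings
full_name = df.apply(lambda row: f"{row['first']} {row['last']}", axis='columns')

print(full_name.tolist())  # ['Corey Schafer', 'Jane Doe', 'John Smith']
```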

In [105]:
# min applied to strings gives min in alphabetical order
# in dataframe 'df' apply series.min for each column
df.apply(pd.Series.min)

first                     Corey
last                        Doe
email    coremschafer@gmail.com
dtype: object

In [106]:
# equivalent to the previous example, using a lambda; here x is a series, i.e. one column
df.apply(lambda x: x.min())

first                     Corey
last                        Doe
email    coremschafer@gmail.com
dtype: object

In [107]:
# to recap:
# using 'apply' on a series, we apply a function to each of the values in the series
# using 'apply' on a dataframe, we apply a function to each of the series in the dataframe

# To apply a function to every individual value in a dataframe, we use 'applymap' as follows
# (note: 'applymap' is deprecated since pandas 2.1 in favour of 'DataFrame.map'):


In [108]:
df.applymap(len)
# this gives the number of characters for each element in the dataframe

Unnamed: 0,first,last,email
0,5,7,22
1,4,3,17
2,4,5,18


In [109]:
df.applymap(str.lower)

Unnamed: 0,first,last,email
0,corey,schafer,coremschafer@gmail.com
1,jane,doe,janedoe@email.com
2,john,smith,john.doe@email.com
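
Because `applymap` was renamed to `DataFrame.map` in pandas 2.1, code that has to run on several pandas versions can pick whichever method exists. This is one possible sketch; the `elementwise` name is an assumption made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['Corey', 'Jane'],
    'last': ['Schafer', 'Doe'],
})

# Prefer DataFrame.map where it exists, otherwise fall back to applymap
elementwise = df.map if hasattr(df, 'map') else df.applymap

# len is called once per cell, not once per column
lengths = elementwise(len)

print(lengths.values.tolist())  # [[5, 7], [4, 3]]
```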


In [110]:
# the 'map' method used here is the series method: it substitutes each value in a series with another value

In [111]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,coremschafer@gmail.com
1,Jane,Doe,janedoe@email.com
2,John,Smith,john.doe@email.com


In [112]:
# substitute first names with the map method by passing a dictionary of old: new values
df['first'].map({'Corey': 'Chris', 'Jane': 'Mary'})

0    Chris
1     Mary
2      NaN
Name: first, dtype: object

In [113]:
# However, notice that map converts any value missing from the dictionary to NaN, so the 3rd name is lost
# To keep values that are not in the dictionary, use the replace method, like this:

In [114]:
df['first'].replace({'Corey': 'Chris', 'Jane': 'Mary'})

0    Chris
1     Mary
2     John
Name: first, dtype: object

In [115]:
# To actually save the returned values in the above examples into the dataframe we do:
df['first'] = df['first'].replace({'Corey': 'Chris', 'Jane': 'Mary'})

In [116]:
df

Unnamed: 0,first,last,email
0,Chris,Schafer,coremschafer@gmail.com
1,Mary,Doe,janedoe@email.com
2,John,Smith,john.doe@email.com
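
If `map` is preferred for some reason, the unmatched values can be restored by filling the resulting NaNs from the original series. A minimal sketch comparing the three approaches on a standalone series `first`:

```python
import pandas as pd

first = pd.Series(['Corey', 'Jane', 'John'], name='first')
mapping = {'Corey': 'Chris', 'Jane': 'Mary'}

# map alone: 'John' is not a key in the dictionary, so it becomes NaN
mapped = first.map(mapping)

# map plus fillna: NaNs are filled with the original value at the same index
kept = first.map(mapping).fillna(first)

# replace: values that are not in the dictionary are left untouched
replaced = first.replace(mapping)

print(mapped.isna().tolist())  # [False, False, True]
print(kept.tolist())           # ['Chris', 'Mary', 'John']
print(replaced.tolist())       # ['Chris', 'Mary', 'John']
```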
