**05: Modifying Data - Updating Rows/Columns within a Dataframe**
- Updating Columns
- Updating Rows

In [4]:
import pandas as pd

In [6]:
students = {
    'names': ['Tom', 'Bob', 'Jane', 'May'],
    'age': [9, 10, 10, 9],
    'subjects': ['Science', 'Arts', 'Hybrid', 'Arts'],
    'award winner': [True, False, False, True]
}

In [8]:
df = pd.DataFrame(students)
df

Unnamed: 0,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True


***
Updating Columns (Labels):
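- df.columns returns a pandas Index, which does not support item assignment (df.columns[0] = 'x' raises a TypeError). To update the labels, assign a whole new list to df.columns. The list can be written by hand or built with a list comprehension or the Index's .str methods. Alternatively, use df.rename.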
- If columns are given new names, the new list must be the same length as the original one.
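
A minimal sketch of how assignment to the column labels behaves, using a small throwaway frame (`demo` is a hypothetical name, not part of this notebook). Replacing every label at once works, while item assignment on the Index or a list of the wrong length is rejected.

```python
import pandas as pd

# Throwaway frame for illustration only (not the students data)
demo = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Assigning to a single position of the Index raises TypeError
try:
    demo.columns[0] = 'x'
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

# Assigning a list with the wrong number of labels raises ValueError
try:
    demo.columns = ['x']
    length_mismatch_failed = False
except ValueError:
    length_mismatch_failed = True

# Replacing every label at once works
demo.columns = ['x', 'y']
print(list(demo.columns))  # ['x', 'y']
```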

In [13]:
df.columns = [x.upper() for x in df.columns]
df

Unnamed: 0,NAMES,AGE,SUBJECTS,AWARD WINNER
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True


In [23]:
df.columns = df.columns.str.replace(' ', '_')
df
#pandas .str.replace(x, y) replaces every occurrence of x with y in each column label

Unnamed: 0,NAMES,AGE,SUBJECTS,AWARD_WINNER
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True


In [27]:
df.rename(columns={'NAMES': 'FIRST_NAME', 'SUBJECTS': 'SUBJECT'}, inplace=True)
df
#pandas .rename method takes a columns argument (a dict) and renames each key to its value
#inplace=True modifies df directly; without it, rename returns a new dataframe and df stays unchanged

Unnamed: 0,FIRST_NAME,AGE,SUBJECT,AWARD_WINNER
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True
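
As a small sketch of what happens without inplace=True (the frame `demo` below is made up for illustration), rename returns a new dataframe and the original keeps its labels. Assigning the result back to a variable is an alternative to inplace=True.

```python
import pandas as pd

# Throwaway frame for illustration only
demo = pd.DataFrame({'NAMES': ['Tom', 'Bob'], 'AGE': [9, 10]})

# Without inplace=True, rename returns a new DataFrame and leaves demo untouched
renamed = demo.rename(columns={'NAMES': 'FIRST_NAME'})

print(list(demo.columns))     # ['NAMES', 'AGE']
print(list(renamed.columns))  # ['FIRST_NAME', 'AGE']
```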


***

In [59]:
df = pd.DataFrame(students)
df

Unnamed: 0,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True


***
Updating Rows:
- First recall that grabbing a row (e.g. df.loc[2]) returns a series. Like a Python list, a series is mutable, so new values can be assigned to its elements. To update the dataframe itself, assign through df.loc.
- Do not assign values to a filtered dataframe (e.g. df[df['age'] == 9]['age'] = 10). Filtering returns a temporary copy that is discarded after the statement, so the assignment never reaches the original dataframe, and pandas usually emits a warning. Assign through df.loc[row_filter, column] instead.
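
The difference between assigning to a filtered copy and assigning through loc can be sketched as follows. The frame `demo` is a made-up subset of the students data, and the warning is silenced here only so the example runs quietly.

```python
import warnings
import pandas as pd

# Throwaway frame for illustration only
demo = pd.DataFrame({
    'names': ['Tom', 'Bob', 'Jane', 'May'],
    'age': [9, 10, 10, 9],
    'award winner': [True, False, False, True],
})

# Assigning to a filtered dataframe: the filter returns a temporary copy,
# so the original is left unchanged (pandas normally warns about this)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    demo[demo['age'] == 10]['award winner'] = True
after_chained = demo['award winner'].tolist()
print(after_chained)  # [True, False, False, True]

# Assigning through loc with the same filter updates the original
demo.loc[demo['age'] == 10, 'award winner'] = True
after_loc = demo['award winner'].tolist()
print(after_loc)  # [True, True, True, True]
```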

In [61]:
df.loc[2, 'award winner'] = True
df
#rather than modifying the whole row, use loc[row label, column label] to change a specific value
#when setting several values at once with an iterable, the number of values must match the number of selected cells

Unnamed: 0,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,True
3,May,9,Arts,True


In [76]:
df['names'] = df['names'].str.lower()
df
#pandas .str accessor exposes vectorized string methods (here .lower()) that run on each value in the series

Unnamed: 0,names,age,subjects,award winner
0,tom,9,Science,True
1,bob,10,Arts,False
2,jane,10,Hybrid,True
3,may,9,Arts,True


For advanced row manipulation, there are generally 4 methods used:
- .apply
- .applymap
- .map
- .replace

1. Apply
   - applies a specified function on a dataframe/series
   - applying on series: applies function on each value in the series
   - applying on dataframe: applies function on each column (or each row with axis=1), passed in as a series

In [82]:
df['subjects'].apply(len)

0    7
1    4
2    6
3    4
Name: subjects, dtype: int64

In [108]:
df['names'].apply(lambda x: x.upper())
#does not modify original dataframe - assign the result back to the column (df['names'] = ...) to keep it

0     TOM
1     BOB
2    JANE
3     MAY
Name: names, dtype: object

In [132]:
df.apply(len, axis=0)
#axis = 0 -> applies function to each column (len gives the number of rows in it)

names           4
age             4
subjects        4
award winner    4
dtype: int64

In [134]:
df.apply(len, axis=1)
#axis = 1 -> applies function to each row (len gives the number of columns in it)

0    4
1    4
2    4
3    4
dtype: int64
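
Row-wise apply becomes more useful when the function reads several columns of the same row. The sketch below assumes a small made-up frame (`demo`) and builds one label per row.

```python
import pandas as pd

# Throwaway frame for illustration only
demo = pd.DataFrame({'names': ['tom', 'bob'], 'age': [9, 10]})

# With axis=1 the function receives one row at a time as a Series,
# so values can be looked up by column label
labels = demo.apply(lambda row: f"{row['names']} ({row['age']})", axis=1)
print(labels.tolist())  # ['tom (9)', 'bob (10)']
```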

In [136]:
df.apply(pd.Series.min)
#pandas Series.min returns the lowest value in a series; applied to the dataframe, it gives the lowest value of each column
#for strings, alphabetical order is used
#axis can be changed to 1, but the values in each row must then be comparable (mixing str and int typically fails with a TypeError)

names             bob
age                 9
subjects         Arts
award winner    False
dtype: object

2. Applymap
   - applies a specified function on ALL values in a dataframe
   - not available on series
   - *deprecated since pandas 2.1 in favor of DataFrame.map*
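
For code that may run on both older and newer pandas versions, one possible approach (a sketch, not an official recommendation) is to check whether DataFrame.map exists and fall back to applymap otherwise. The frame `demo` is made up for illustration.

```python
import pandas as pd

# Throwaway frame for illustration only
demo = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# DataFrame.map exists from pandas 2.1; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    doubled = demo.map(lambda x: x * 2)
else:
    doubled = demo.applymap(lambda x: x * 2)

print(doubled.values.tolist())  # [[2, 6], [4, 8]]
```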

3. Map
   - applying a function on every value of a dataframe/series (DataFrame.map requires pandas 2.1 or later)
   - substituting each value in a series with another value by passing a dict (Series.map only). however, values that are not keys of the dict become NaN. to avoid this, use .replace
   - its versatility makes it widely used over apply/applymap

In [197]:
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df2

Unnamed: 0,A,B,C
0,1,4,7
1,2,5,8
2,3,6,9


In [199]:
df2.map(lambda x: x + 1)
#mapping over a dataframe returns a dataframe
#to modify the original dataframe, assign to it

Unnamed: 0,A,B,C
0,2,5,8
1,3,6,9
2,4,7,10


In [212]:
df2['B'].map(lambda x: x + 1)
#mapping over a series also returns a series
#to modify the original series/dataframe, assign to it

0    5
1    6
2    7
Name: B, dtype: int64

In [214]:
df['subjects'].map({'Arts': 'History'})
#NaN values for non-substituted values
#to modify the original series/dataframe, assign to it

0        NaN
1    History
2        NaN
3    History
Name: subjects, dtype: object
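
If map is still preferred for the substitution, the NaN values can be avoided in a couple of ways. The sketch below rebuilds the subjects column as a standalone series and shows two options: filling the gaps from the original series, or passing a function that falls back to the original value.

```python
import pandas as pd

# Standalone copy of the subjects column, for illustration only
subjects = pd.Series(['Science', 'Arts', 'Hybrid', 'Arts'], name='subjects')
mapping = {'Arts': 'History'}

# Option 1: map with the dict, then fill the NaN gaps from the original series
filled = subjects.map(mapping).fillna(subjects)

# Option 2: map with a function that returns the original value when there is no match
looked_up = subjects.map(lambda s: mapping.get(s, s))

print(filled.tolist())     # ['Science', 'History', 'Hybrid', 'History']
print(looked_up.tolist())  # ['Science', 'History', 'Hybrid', 'History']
```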

4. Replace
   - substitutes values, while keeping unsubstituted values unchanged, unlike for .map

In [217]:
df['subjects'].replace({'Arts': 'History'})
#to modify the original series/dataframe, assign to it

0    Science
1    History
2     Hybrid
3    History
Name: subjects, dtype: object
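
One detail worth keeping in mind: with a dict like the one above, .replace matches whole values, whereas .str.replace (used earlier on the column labels) matches substrings. A small sketch on a made-up series:

```python
import pandas as pd

# Throwaway series for illustration only
s = pd.Series(['Arts', 'Fine Arts'])

# .replace substitutes only values that are exactly equal to the key
whole = s.replace({'Arts': 'History'})

# .str.replace substitutes the text wherever it occurs inside each value
partial = s.str.replace('Arts', 'History')

print(whole.tolist())    # ['History', 'Fine Arts']
print(partial.tolist())  # ['History', 'Fine History']
```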