# 2.6 Updating Rows and Columns
A large part of cleaning data with Pandas is adding and updating rows and columns. The Pandas functions below show how to do this.

### About the data
The data used in this notebook describes passengers on the *Titanic*, an ocean liner that set out from Southampton, U.K. to cross the Atlantic Ocean and tragically sank after colliding with an iceberg. For each passenger, the dataset records the passenger class, name, sex, age, number of siblings/spouses aboard, number of parents/children aboard, ticket number, ticket fare, cabin number, port of embarkation, and survival status. The dataset is extremely popular among data scientists and works well for demonstrating Pandas concepts.

First, we will import pandas and read the data into a dataframe.

In [2]:
import pandas as pd
df = pd.read_csv("./data/titanic.csv")

In [3]:
df.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## Renaming Columns
There are a few different ways that we can rename columns in a dataframe. One way to do so is to set the `columns` property of the dataframe to a list of strings equal in length to the number of columns in the dataframe.

***Changing the columns in this way, however, is not easy***. You can't just change one column name at a time, but instead have to pass the full list of column names. If the dataset has 80 columns, that would be very tiresome. Below we'll see how to do it anyway.

First, let's make a copy of the dataframe using the `.copy()` method. Working on a copy means the original dataframe isn't accidentally changed during exploration, which would require re-importing the data. Here, it keeps the original column names available in case we want them again later.

In [4]:
# Make a copy of the dataframe
copy_df = df.copy()
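
To see why `.copy()` matters, here is a minimal sketch that contrasts a copy with a plain assignment. It uses a tiny stand-in dataframe with made-up values rather than the Titanic file, and the names `original`, `alias`, and `clone` are only for illustration.

```python
import pandas as pd

# Tiny stand-in dataframe (illustrative values, not the Titanic file)
original = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})

alias = original         # a second name for the SAME dataframe
clone = original.copy()  # an independent copy

# Renaming the copy's columns leaves the original alone
clone.columns = ["ID", "LIVED?"]
print(list(original.columns))

# Renaming through the alias also renames the original
alias.columns = ["id", "survived"]
print(list(original.columns))
```

Plain assignment only creates another name for the same object, so `.copy()` is the safer choice whenever we plan to experiment.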

### Using the `.columns` property
You can use the `.columns` property to get the column labels of a dataframe as an `Index` object. Because `.columns` is a property, not a method, it is written without parentheses.

In [5]:
# View the `columns` property of the dataframe
copy_df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

We can then set the `.columns` property equal to a list of strings, which will be the new column names. Note that the list must contain exactly as many strings as there are columns; otherwise, Pandas raises a `ValueError`.

In [6]:
copy_df.columns = ['ID', 'LIVED?', 'CLASS', 'FULL NAME', 'SEX', 'HOW OLD?', 'NUMBER OF SIBLINGS', 'NUMBER OF PARENTS OR CHILDREN', 'TICKET NUMBER', 'HOW MUCH PAID', 'CABIN LOCATION', 'EMBARKED WHERE?']

Now when we print out the dataframe, we can see that the column names have changed.

In [7]:
copy_df.head()

,ID,LIVED?,CLASS,FULL NAME,SEX,HOW OLD?,NUMBER OF SIBLINGS,NUMBER OF PARENTS OR CHILDREN,TICKET NUMBER,HOW MUCH PAID,CABIN LOCATION,EMBARKED WHERE?
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
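
As a quick check of the length requirement, the sketch below assigns too few names to a small made-up dataframe (`small_df` is a stand-in, not the Titanic data) and catches the resulting error.

```python
import pandas as pd

# Stand-in dataframe with three columns (illustrative values)
small_df = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1], "Pclass": [3, 1]})

try:
    # Only two names for three columns, so the assignment is rejected
    small_df.columns = ["ID", "LIVED?"]
except ValueError as err:
    print("Rename failed:", err)

# The original column names are still in place
print(list(small_df.columns))
```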


### Using the `.rename()` method
The easiest way to rename a column in Pandas is by using the `.rename()` method. This method allows the programmer to change any number of column names, leaving the rest untouched. Let's give it a try on `copy_df`.

This method takes in a parameter called `columns`, which is then set equal to a dictionary. In this dictionary, the key (before the colon `:`) is the current name of the column and the value (after the colon `:`) is the new name of the column.

In [8]:
# Using the `.rename()` method
copy_df.rename(columns={'LIVED?': 'Survived'})
copy_df.head(2)

,ID,LIVED?,CLASS,FULL NAME,SEX,HOW OLD?,NUMBER OF SIBLINGS,NUMBER OF PARENTS OR CHILDREN,TICKET NUMBER,HOW MUCH PAID,CABIN LOCATION,EMBARKED WHERE?
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


What do you see? We used the `.rename()` method above and there were no errors, but the column name didn't change! That's because the `.rename()` method returns an entirely new dataframe without modifying the old one. In the code above, we created a renamed copy of the `copy_df` dataframe and then did nothing with it. It was thrown away, and then we looked at the unchanged `copy_df` dataframe.

To get around this, many methods on Pandas dataframes have an optional parameter called `inplace`, which means "modify the original dataframe instead of returning a new one". By default, it is set to `False`, but if we set it to `True`, the column name in the original dataframe is changed. (Note: We could also have saved the result of the `.rename()` method to a variable.)

In [9]:
copy_df.rename(columns={'LIVED?': 'Survived'}, inplace=True)
copy_df.head(2)

,ID,Survived,CLASS,FULL NAME,SEX,HOW OLD?,NUMBER OF SIBLINGS,NUMBER OF PARENTS OR CHILDREN,TICKET NUMBER,HOW MUCH PAID,CABIN LOCATION,EMBARKED WHERE?
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
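
The alternative mentioned in the note, saving the result of `.rename()` instead of using `inplace=True`, might look like the sketch below. The dataframe and the name `renamed_df` are stand-ins for illustration.

```python
import pandas as pd

# Stand-in dataframe (illustrative values)
small_df = pd.DataFrame({"ID": [1, 2], "LIVED?": [0, 1]})

# .rename() returns a new dataframe, so we keep it in a variable
renamed_df = small_df.rename(columns={"LIVED?": "Survived"})

print(list(small_df.columns))    # the original is unchanged
print(list(renamed_df.columns))  # the new dataframe has the new name
```

Assigning the result back to the same variable (`small_df = small_df.rename(...)`) would have the same visible effect as `inplace=True`.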


### Formatting Column Names
In some cases, you may want to change the format of column names. For example, notice how some of the column names above have spaces in them. Sometimes, these spaces get in the way of your analysis. If a column name has no spaces, you can select the column with dot notation, as in `df.column_name`. However, if there is a space in the column name, dot notation is not valid Python and you must use square brackets instead, as in `df['column name']`.

In [10]:
copy_df.Survived

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: Survived, Length: 891, dtype: int64

In [11]:
copy_df.FULL NAME

SyntaxError: invalid syntax (643309674.py, line 1)

You may also want all of your column names to be lowercase or uppercase, or to exclude non-alphanumeric characters. We can replace characters with the `.str.replace()` method on the `columns` property of the dataframe. We can then use the `.str.lower()` method to make all of the letters lowercase.

Notice that we set the `columns` property of the dataframe equal to the new list of columns in which we replaced the spaces. The `.str` methods do not have an `inplace` argument.

In [None]:
# Replace spaces of the column names with underscore `_`
copy_df.columns = copy_df.columns.str.replace(" ", "_")
copy_df.columns = copy_df.columns.str.lower()
copy_df.head()
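
The cell above leaves characters such as `?` in the column names. One possible way to drop them as well is a regular expression, as in this sketch. The empty stand-in dataframe and the choice of pattern are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Empty stand-in dataframe that only carries column names
small_df = pd.DataFrame(columns=["LIVED?", "HOW OLD?", "FULL NAME"])

small_df.columns = (
    small_df.columns
    .str.replace(" ", "_", regex=False)             # spaces -> underscores
    .str.replace(r"[^0-9a-zA-Z_]", "", regex=True)  # drop other punctuation
    .str.lower()                                    # lowercase everything
)
print(list(small_df.columns))
```

After this, every column can be reached with dot notation, for example `small_df.how_old`.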

## Cleaning Data
Now that we've cleaned up the column names, we can clean up the data as well. There are four different methods that we can use to clean up the data in our dataframe, each of which has its own particular use.

### The `.apply()` method
**Applies to dataframes and Series objects.**

You may have seen before the `.max()` or `.mean()` methods used on a dataframe or Series.

In [None]:
df['Age'].mean()

#### Defining a named function
These methods are built into dataframe and Series objects. However, there may be times that you want to apply your own custom function. The `.apply()` method allows you to pass in and run a custom function over a dataframe or Series. On a Series, the function runs once for each *value*. On a dataframe, it runs once for each *column* by default, or once for each *row* when `axis=1` is passed.

For example, perhaps I am looking at the "Embarked" column of the dataframe and keep getting confused as to what each letter stands for. I can create a named function (a regular function) that will be run for each value in the Series and return the proper location in a full word.

In [None]:
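def getFullEmbarkedLocation(letter):
    # Translate a one-letter port code into the full port name
    if letter == "S":
        return "Southampton"
    elif letter == "C":
        return "Cherbourg"
    elif letter == "Q":
        return "Queenstown"
    else:
        # Leave missing or unexpected values unchanged
        return letter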

Now I apply this function to the Series object. Note that I pass in the function's name without parentheses: we aren't calling the function ourselves, just handing it to `.apply()`.

In [None]:
df['Embarked'].apply(getFullEmbarkedLocation)

If I set the `Embarked` column equal to this new Series, I will see the changes reflected in the dataframe.

In [None]:
df['Embarked'] = df['Embarked'].apply(getFullEmbarkedLocation)

In [None]:
df.head()

Note: When using the `.apply()` method on a dataframe, it may be necessary to pass in the parameter `axis=1` to the method. The `axis` parameter tells the method to run the function on each column when equal to 0 or each row when equal to 1.
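
To make the `axis` note concrete, here is a small sketch on a stand-in dataframe. The helper `family_size` and the new `FamilySize` column are hypothetical names chosen for this example.

```python
import pandas as pd

# Stand-in dataframe with two of the Titanic columns (illustrative values)
small_df = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})

def family_size(row):
    # Hypothetical helper: relatives aboard plus the passenger
    return row["SibSp"] + row["Parch"] + 1

# axis=1: the function receives one row at a time
small_df["FamilySize"] = small_df.apply(family_size, axis=1)
print(small_df["FamilySize"].tolist())

# Default axis=0: the function receives one column at a time
column_maxima = small_df[["SibSp", "Parch"]].apply(lambda col: col.max())
print(column_maxima.tolist())
```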

#### Using an anonymous (non-named) function 
Rather than defining a named function to be passed in to `.apply()`, most data scientists use what are called *lambda functions*. These are essentially non-named functions that are passed directly into another function and are not meant to be reused.

You don't have to use lambda functions, but you will probably see a lot of code that does. The syntax looks something like this:

In [None]:
# This changes "male" to "M" and "female" to "F"
df['Sex'] = df['Sex'].apply(lambda x: "F" if x == "female" else "M")
df.head()
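
A lambda is just a shorter way to write a small function. The sketch below runs the same logic as a named function and as a lambda on a stand-in Series and checks that the results agree. The name `abbreviate_sex` is made up for this example.

```python
import pandas as pd

# Stand-in Series (illustrative values)
sex = pd.Series(["male", "female", "female"])

def abbreviate_sex(value):
    # Named version of the same logic
    return "F" if value == "female" else "M"

named_result = sex.apply(abbreviate_sex)
lambda_result = sex.apply(lambda value: "F" if value == "female" else "M")

# Both approaches produce identical Series
print(named_result.equals(lambda_result))
```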

### The `.applymap()` method
**Applies only to dataframes**

The `.applymap()` method allows you to apply a function to each individual value in the entire dataframe. This method is not commonly used but may be useful if you need to do mass conversions across several columns of data. The danger of using this method is that the function is applied to every column regardless of its data type. Thus, passing in a subtraction function, for example, would not work with our data since strings (as seen in the "Name" column) cannot be subtracted. Note that since pandas 2.1, `.applymap()` is deprecated in favor of the equivalent `DataFrame.map()` method.

Here, we get the length of each value after converting it to a string. In other words, we calculate the number of characters.

In [None]:
df.applymap(lambda x: len(str(x)))
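
One way to sidestep the data type problem is to select only the columns the function makes sense for before applying it. The sketch below does this on a stand-in dataframe. Because `DataFrame.map()` only exists from pandas 2.1 onward, it falls back to `.applymap()` on older versions; the variable names are assumptions for illustration.

```python
import pandas as pd

# Stand-in dataframe mixing text and numeric columns (illustrative values)
small_df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "Ticket": ["A/5 21171", "STON/O2. 3101282"],
    "Fare": [7.25, 7.925],
})

# Restrict the element-wise function to the text columns only
text_columns = small_df[["Name", "Ticket"]]

# Use DataFrame.map where available (pandas >= 2.1), else applymap
elementwise = getattr(text_columns, "map", None) or text_columns.applymap
lengths = elementwise(len)
print(lengths)
```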

### The `.map()` method
**Applies only to Series**

The `.map()` method allows us to replace specific values with other values. Any value not specified in the mapping is changed to a missing value (which Pandas calls NaN). In this example, we change "Queenstown" and "Southampton" back to "Q" and "S". We do not include "Cherbourg" in the map, so it becomes NaN.

We specify which values we want to change by passing in a dictionary where the key is the current value and the value is what we want to replace it with.

In [None]:
df['Embarked'].map({'Queenstown': 'Q', 'Southampton': 'S'}).head(6)
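
The sketch below shows the NaN behavior on a stand-in Series, along with one possible way to keep unmapped values: filling the gaps with the original Series. The `fillna` step is a suggestion added here, not part of the original notebook.

```python
import pandas as pd

# Stand-in Series (illustrative values)
embarked = pd.Series(["Southampton", "Cherbourg", "Queenstown"])
mapping = {"Queenstown": "Q", "Southampton": "S"}

# "Cherbourg" is not in the mapping, so it becomes NaN
mapped = embarked.map(mapping)
print(mapped.isna().tolist())

# Fill the gaps with the original values to keep unmapped entries
kept = embarked.map(mapping).fillna(embarked)
print(kept.tolist())
```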

### The `.replace()` method
**Applies to dataframes and Series**

The `.replace()` method works like the `.map()` method, except that values without a matching key are left unchanged rather than turned into NaN. It can also be used with an entire dataframe to replace a value anywhere it occurs.

In [None]:
df['Embarked'].replace({'Queenstown': 'Q', 'Southampton': 'S'}).head(6)

In [None]:
df.replace({'Queenstown': 'Q'})
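
Replacing a value everywhere can touch columns we did not intend to change. As a sketch, `.replace()` also accepts a nested dictionary of the form `{column: {old: new}}` to limit the replacement to one column. The stand-in dataframe below is contrived so that the same text appears in two columns.

```python
import pandas as pd

# Contrived stand-in dataframe: "Queenstown" appears in both columns
small_df = pd.DataFrame({
    "Name": ["Queenstown", "Allen, Mr. William Henry"],
    "Embarked": ["Queenstown", "Southampton"],
})

# Flat dictionary: replaces the value wherever it occurs
everywhere = small_df.replace({"Queenstown": "Q"})
print(everywhere)

# Nested dictionary: replaces the value in the "Embarked" column only
one_column = small_df.replace({"Embarked": {"Queenstown": "Q"}})
print(one_column)
```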