# 3.1 Adding and Removing Rows and Columns

When working with a dataframe, you will sometimes need to add or remove columns. Perhaps you need to perform a calculation using two columns and want to store the result in a new column, or you have too many irrelevant columns and need to reduce the dataframe's size. Pandas makes it easy to add and remove columns.

Adding and removing rows is easy too. Let's look at how it's done.

In [1]:
import pandas as pd

df = pd.read_csv("./data/titanic.csv")

In [2]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


### Adding Columns
Adding a column is as simple as indexing the existing dataframe with a new column name and assigning a Series object to it, which gives the column its name in the process. A list or array must have the same length as the number of rows in the dataframe; a Series is matched to the dataframe's rows by its index.

Usually, the new column is computed using an existing column, or several columns.

In [3]:
df['FareRounded'] = df['Fare'].round(2) # Round each fare to two decimal places.

In [4]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,FareRounded
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,7.25
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,71.28
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,7.92
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,53.1
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,8.05
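
A new column can just as easily combine several existing columns. The sketch below uses a small hand-made dataframe rather than the Titanic file; the `toy` variable and the `FamilySize` column are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical three-row dataframe that mimics two Titanic columns.
toy = pd.DataFrame({
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 2],
})

# Arithmetic on two columns is applied row by row and produces a Series,
# which is stored under the new column name.
toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1

print(toy)
```

Because the result of the arithmetic shares the dataframe's index, it lines up with the existing rows without any extra work.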


You can even add more than one column at once by specifying a list of new columns and assigning them a dataframe with the same number of columns.

In [5]:
# The `.str.split()` method splits each string in a Series on a delimiter; with expand=True, each piece goes into its own column.
df['Name'].str.split(",", expand=True)

Unnamed: 0,0,1
0,Braund,Mr. Owen Harris
1,Cumings,Mrs. John Bradley (Florence Briggs Thayer)
2,Heikkinen,Miss. Laina
3,Futrelle,Mrs. Jacques Heath (Lily May Peel)
4,Allen,Mr. William Henry
...,...,...
886,Montvila,Rev. Juozas
887,Graham,Miss. Margaret Edith
888,Johnston,"Miss. Catherine Helen ""Carrie"""
889,Behr,Mr. Karl Howell


In [6]:
# Add the two columns from the dataframe returned above to the current dataframe.
df[['LastName', 'FirstMiddleName']] = df['Name'].str.split(",", expand=True)

In [7]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,FareRounded,LastName,FirstMiddleName
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,7.25,Braund,Mr. Owen Harris
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,71.28,Cumings,Mrs. John Bradley (Florence Briggs Thayer)
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,7.92,Heikkinen,Miss. Laina
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,53.1,Futrelle,Mrs. Jacques Heath (Lily May Peel)
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,8.05,Allen,Mr. William Henry
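
Two details of this split are worth guarding against: a name containing more than one comma would produce more than two columns, and the text after the comma keeps its leading space. A minimal sketch of one way to handle both, using a hypothetical two-row dataframe called `names`:

```python
import pandas as pd

# Hypothetical sample with the same "Last, First" layout as the Name column.
names = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
})

# n=1 splits on the first comma only, so the result has at most two columns.
parts = names["Name"].str.split(",", n=1, expand=True)

# Assign both new columns at once, then trim the leading space left by the split.
names[["LastName", "FirstMiddleName"]] = parts
names["FirstMiddleName"] = names["FirstMiddleName"].str.strip()

print(names)
```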


### Removing Columns

If you want to get rid of a column, you can do so with the `.drop()` method. Note that this method does not change the original dataframe but instead returns a new dataframe. You can tell the method to modify the original dataframe by passing in `inplace=True`.

You can drop a single column by passing in its name or multiple columns by passing in a list of column names. Note the need for the `columns` argument.

In [8]:
df.drop(columns='Cabin') # The column is dropped, but the original dataframe isn't changed.
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,FareRounded,LastName,FirstMiddleName
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,7.25,Braund,Mr. Owen Harris
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,71.28,Cumings,Mrs. John Bradley (Florence Briggs Thayer)
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,7.92,Heikkinen,Miss. Laina
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,53.1,Futrelle,Mrs. Jacques Heath (Lily May Peel)
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,8.05,Allen,Mr. William Henry


In [9]:
df.drop(columns=['Cabin', 'Ticket'], inplace=True) # Multiple columns are dropped from the original dataframe.
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Fare,Embarked,FareRounded,LastName,FirstMiddleName
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,7.25,S,7.25,Braund,Mr. Owen Harris
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,71.2833,C,71.28,Cumings,Mrs. John Bradley (Florence Briggs Thayer)
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,7.925,S,7.92,Heikkinen,Miss. Laina
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,53.1,S,53.1,Futrelle,Mrs. Jacques Heath (Lily May Peel)
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,8.05,S,8.05,Allen,Mr. William Henry
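
The difference between the returned copy and the original dataframe is easier to see on a tiny example. The sketch below assumes a hypothetical dataframe `toy` with columns `A`, `B`, and `C`. It also shows reassignment as an alternative to `inplace=True`, and the `errors="ignore"` option, which skips labels that do not exist instead of raising an error.

```python
import pandas as pd

# Hypothetical dataframe with three columns.
toy = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# .drop() returns a new dataframe; `toy` itself still has all three columns.
trimmed = toy.drop(columns=["B", "C"])
columns_after_first_drop = list(toy.columns)

# Reassigning the result has the same effect as inplace=True.
toy = toy.drop(columns="C")

# "C" is already gone and "Z" never existed; errors="ignore" skips both.
toy = toy.drop(columns=["C", "Z"], errors="ignore")

print(list(trimmed.columns))
print(columns_after_first_drop)
print(list(toy.columns))
```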


### Adding Rows
Adding rows isn't something you will typically do in Pandas, since data will likely be provided for you in the data file or database. Sometimes, however, you may have data coming from sources that need to be combined into a single dataframe.

`pd.concat()` is a **Pandas function** (not a dataframe method) that takes in a list of dataframe or Series objects and stacks them on top of each other. If you need to add a single row of data, you can just turn it into a one-row dataframe first.

Pass `ignore_index=True` to `pd.concat()` to give each row a new, unique index instead of keeping its original index.

In [10]:
new_titanic_passenger = { # Notice that not all fields have to be defined.
    'PassengerId': [999],
    'Survived': [1],
    'Pclass': [3],
    'Name': ['Vespucci, Mr. Amerigo'],
    'Sex': ['male'],
    'Age': [57]
}

new_df = pd.DataFrame(new_titanic_passenger)

df = pd.concat([df, new_df], ignore_index=True) # ignore_index renumbers the rows from 0.
                                                # Remove it to see what happens!
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Fare,Embarked,FareRounded,LastName,FirstMiddleName
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0.0,0.0,30.0,S,30.0,Graham,Miss. Margaret Edith
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1.0,2.0,23.45,S,23.45,Johnston,"Miss. Catherine Helen ""Carrie"""
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0.0,0.0,30.0,C,30.0,Behr,Mr. Karl Howell
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0.0,0.0,7.75,Q,7.75,Dooley,Mr. Patrick
891,999,1,3,"Vespucci, Mr. Amerigo",male,57.0,,,,,,,
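
To see what `ignore_index` changes, compare the resulting indexes with and without it. This is a minimal sketch on two hypothetical dataframes, `first` and `second`; the second one deliberately lacks the `Age` column to show how missing fields are filled.

```python
import pandas as pd

# Two hypothetical dataframes; `second` has no Age column.
first = pd.DataFrame({"Name": ["A", "B"], "Age": [30, 40]})
second = pd.DataFrame({"Name": ["C"]})

# Without ignore_index, each row keeps its original label, so 0 appears twice.
kept = pd.concat([first, second])

# With ignore_index=True, the rows are renumbered from 0.
reset = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())
print(reset.index.tolist())

# Columns missing from one of the inputs are filled with NaN.
print(reset["Age"].isna().tolist())
```

Duplicate index labels are legal in Pandas, but they make label-based operations such as `.drop(index=...)` affect every row that shares the label, which is one reason to renumber when stacking.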


### Removing Rows

You can remove rows from a dataframe in the same way that you remove columns, with the `.drop()` method. This time, however, you pass a single row index or a list of row indexes to the `index` argument.

In [11]:
df.drop(index=891, inplace=False) # Dropping the recently added row, but not saving to original dataframe.

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Fare,Embarked,FareRounded,LastName,FirstMiddleName
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1.0,0.0,7.2500,S,7.25,Braund,Mr. Owen Harris
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1.0,0.0,71.2833,C,71.28,Cumings,Mrs. John Bradley (Florence Briggs Thayer)
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0.0,0.0,7.9250,S,7.92,Heikkinen,Miss. Laina
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1.0,0.0,53.1000,S,53.10,Futrelle,Mrs. Jacques Heath (Lily May Peel)
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0.0,0.0,8.0500,S,8.05,Allen,Mr. William Henry
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0.0,0.0,13.0000,S,13.00,Montvila,Rev. Juozas
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0.0,0.0,30.0000,S,30.00,Graham,Miss. Margaret Edith
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1.0,2.0,23.4500,S,23.45,Johnston,"Miss. Catherine Helen ""Carrie"""
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0.0,0.0,30.0000,C,30.00,Behr,Mr. Karl Howell


You can access the index labels of the rows in a dataframe with the `.index` property. Applied to a filtered dataframe, it returns the labels of only the matching rows, which is useful when dropping rows based on a condition or filter.

In [12]:
filt = df['Pclass'] == 2
df.drop(index=df[filt].index, inplace=True) # Modifying the original dataframe in place.
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Fare,Embarked,FareRounded,LastName,FirstMiddleName
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0.0,0.0,30.0,S,30.0,Graham,Miss. Margaret Edith
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1.0,2.0,23.45,S,23.45,Johnston,"Miss. Catherine Helen ""Carrie"""
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0.0,0.0,30.0,C,30.0,Behr,Mr. Karl Howell
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0.0,0.0,7.75,Q,7.75,Dooley,Mr. Patrick
891,999,1,3,"Vespucci, Mr. Amerigo",male,57.0,,,,,,,
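
Dropping by filtered index is one of two common ways to remove rows by condition. The other is to keep only the rows where the condition is false, using the `~` operator to invert the filter. The sketch below, on a hypothetical four-row dataframe `toy`, checks that both approaches give the same result.

```python
import pandas as pd

# Hypothetical dataframe with two second-class passengers.
toy = pd.DataFrame({
    "Pclass": [1, 2, 3, 2],
    "Fare": [30.0, 13.0, 7.75, 26.0],
})

filt = toy["Pclass"] == 2

# Approach 1: look up the labels of the matching rows and drop them.
dropped = toy.drop(index=toy[filt].index)

# Approach 2: invert the filter with ~ and keep everything that is left.
kept = toy[~filt]

print(dropped.equals(kept))
print(dropped.index.tolist())
```

Both results keep the original row labels, so the index has gaps where rows were removed, just as in the Titanic output above.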
