# 3.1 Adding and Removing Rows and Columns

When working with dataframes, we sometimes need to add or remove columns. Perhaps you need to perform a calculation using two columns and want to store the result in a new column, or you have too many irrelevant columns and need to reduce the size of the dataframe. Pandas makes it easy to add and remove columns.

Adding and removing rows is easy too. Let's look at how it's done.

### About the Data

The data used in this notebook describes the passengers on the *Titanic*, an ocean liner that set out from Southampton, U.K. to sail across the Atlantic Ocean and tragically sank after colliding with an iceberg. The dataset contains each passenger's passenger class, name, sex, age, number of siblings and spouses aboard, number of parents and children aboard, ticket number, ticket fare, cabin number, and port of embarkation. It also records whether each passenger survived. This dataset is extremely popular among data scientists and will facilitate demonstrations of Pandas concepts.

In [None]:
import pandas as pd
df = pd.read_csv("./data/titanic.csv")

In [None]:
df.head()

### Adding Columns
Adding a column is as simple as assigning values to a column name that doesn't exist yet in the dataframe. The values are usually a Series, which Pandas aligns to the dataframe by row index, so the Series should share the dataframe's index. A plain list or array must contain exactly one value per row. A new column can also be assigned a single constant value, which is repeated for every row.

In other words, you refer to the new column by name in the format `df['newColumn']`, even though it doesn't exist yet, and assign a Series to it.
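The cell below is a minimal sketch of these assignment rules. It uses a small made-up dataframe called `toy` instead of the Titanic file, and the column names `Source`, `FareDoubled`, and `Bad` are hypothetical names chosen only for illustration.

```python
import pandas as pd

# Toy dataframe standing in for the Titanic data (values are illustrative).
toy = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'],
    'Fare': [7.25, 26.0],
})

# A constant is repeated for every row.
toy['Source'] = 'toy'

# A Series computed from an existing column shares the dataframe's index,
# so each value lands on the matching row.
toy['FareDoubled'] = toy['Fare'] * 2

# A list with the wrong number of values is rejected with a ValueError.
length_error = None
try:
    toy['Bad'] = [1, 2, 3]  # three values for a two-row dataframe
except ValueError as err:
    length_error = str(err)

print(toy)
print(length_error)
```

The failed assignment is wrapped in `try`/`except` only so the cell runs to the end; in normal use you would fix the length mismatch instead.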

Usually, the new column is computed using an existing column, or several columns. In the code below, for example, a new column `FareRounded` is set to the values of the `Fare` column, after being rounded to two decimal places.

In [None]:
df['FareRounded'] = df['Fare'].round(2) # Round each fare to two decimal places.

In [None]:
df.head()
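A new column can also combine several existing columns. The sketch below assumes the sibling/spouse and parent/child counts are stored in columns named `SibSp` and `Parch`, as in the common Kaggle version of this dataset; check `df.columns` for the names in your file. The `FamilySize` column is a hypothetical example, and `toy` is a made-up dataframe so the cell runs on its own.

```python
import pandas as pd

# Toy rows with assumed column names (SibSp, Parch) and illustrative values.
toy = pd.DataFrame({
    'SibSp': [1, 0, 3],
    'Parch': [0, 0, 2],
    'Fare': [7.25, 71.2833, 8.05],
})

# Arithmetic on columns works row by row and returns a Series.
# Family size = siblings/spouses + parents/children + the passenger.
toy['FamilySize'] = toy['SibSp'] + toy['Parch'] + 1

# The Series .round() method rounds every value in the column.
toy['FareRounded'] = toy['Fare'].round(2)

print(toy)
```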

You can even add more than one column at once by specifying a list of new columns and assigning them a dataframe with the same number of columns.

In this example, we use the `.str` accessor and its `.split()` method to split each value in the `Name` column at the comma `,`. Passing `expand=True` to `.split()` turns the results into a dataframe, where everything to the left of the comma (the last name) is in the first column and everything to the right of the comma (the title, first name, and middle name) is in the second column.

In [None]:
# `.str.split()` splits each string into a list of pieces; expand=True spreads those pieces across one column per piece.
names_df = df['Name'].str.split(",", expand=True)
names_df

Notice that the code above didn't modify the original dataframe, but instead created a new dataframe and assigned it to the variable `names_df`. The `expand=True` argument forced the `.split()` method to return a dataframe instead of a Series. Only a dataframe with two columns can be assigned to two columns in a dataframe simultaneously.

Because `names_df` has two columns, we can create two new columns at once by passing a list of column names in the format `df[[col1, col2]]` and setting it equal to `names_df`.

In [None]:
# Add the two columns from the dataframe returned above to the current dataframe.
df[['LastName', 'FirstMiddleName']] = names_df

Run the code below to see how the new columns were created.

In [None]:
df.head()
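The two-column assignment only works if the split really yields two columns. The sketch below shows one way to guard against that, assuming some name could contain more than one comma (the second name here is made up for the demonstration). The `n=1` argument limits `.str.split()` to a single split, and `.str.strip()` removes the space left after the comma.

```python
import pandas as pd

names = pd.Series([
    'Braund, Mr. Owen Harris',
    'Doe, Mr. John, Jr.',  # hypothetical name containing two commas
])

# Without a limit, the second name produces a third column.
unbounded = names.str.split(',', expand=True)

# n=1 splits at the first comma only, so there are always two columns.
bounded = names.str.split(',', n=1, expand=True)

toy = pd.DataFrame({'Name': names})
toy[['LastName', 'FirstMiddleName']] = bounded

# Remove the leading space that followed the comma.
toy['FirstMiddleName'] = toy['FirstMiddleName'].str.strip()

print(unbounded.shape, bounded.shape)
print(toy)
```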

### Removing Columns

If you want to get rid of a column, you can do so with the `.drop()` method. Note that this method does not change the original dataframe but instead returns a new dataframe. You can tell the method to modify the original dataframe by passing in `inplace=True`.

You can drop a single column by passing in its name or multiple columns by passing in a list of column names. Note the need for the `columns` argument.

In [None]:
df.drop(columns='Cabin') # The column is dropped, but the original dataframe isn't changed.
df.head()

The `inplace=True` argument modifies the original dataframe.

In [None]:
df.drop(columns=['Cabin', 'Ticket'], inplace=True) # Multiple columns are dropped from the original dataframe.
df.head()
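An alternative to `inplace=True` is to assign the result of `.drop()` to a variable. The sketch below uses a made-up dataframe called `toy`. It also shows the `errors='ignore'` argument, which is useful when a cell may be run more than once: dropping a column that is already gone otherwise raises a `KeyError`.

```python
import pandas as pd

# Toy dataframe with illustrative values.
toy = pd.DataFrame({
    'Name': ['Allen, Mr. William Henry', 'Moran, Mr. James'],
    'Cabin': ['C85', None],
    'Ticket': ['113803', '373450'],
})

# .drop() returns a new dataframe; toy itself keeps all three columns.
trimmed = toy.drop(columns=['Cabin', 'Ticket'])

# 'Cabin' no longer exists in trimmed. errors='ignore' skips missing
# labels instead of raising a KeyError.
safe = trimmed.drop(columns='Cabin', errors='ignore')

print(list(toy.columns))
print(list(trimmed.columns))
```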

### Adding Rows
Adding rows isn't something you will typically do in Pandas, since data will likely be provided for you in the data file or database. Sometimes, however, you may have data coming from other sources that need to be combined into a single dataframe.

The `concat()` function is a **Pandas function** (not a dataframe method) that takes in a list of dataframe or Series objects and, by default, stacks them on top of each other. The data to be added *must* be either a dataframe or a Series. Because a Series usually represents a single column, I would advise that new data be converted to a *dataframe* before it is concatenated to the original dataframe.

This means that if new data is defined as a dictionary (as seen below), each value should be wrapped in a list so that it forms a one-row column, and the dictionary should then be converted to a dataframe with the `pd.DataFrame()` constructor.

Pass `ignore_index=True` to the `concat()` function to renumber the rows of the result from 0, giving each row a unique index label instead of keeping the labels from the original dataframes.

In [None]:
new_titanic_passenger = { # Notice that not all fields have to be defined.
    'PassengerId': [999],
    'Survived': [1],
    'Pclass': [3],
    'Name': ['Vespucci, Mr. Amerigo'],
    'Sex': ['male'],
    'Age': [57]
}

new_df = pd.DataFrame(new_titanic_passenger)

df = pd.concat([df, new_df], ignore_index=True) # ignore_index=True makes the data reset the named
                                                # index numbers-- remove it to see what happens!
df.tail()
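To see what `ignore_index=True` changes, the sketch below concatenates two small made-up dataframes both ways. The variable names `existing`, `new_row`, `kept`, and `reset` are chosen only for this illustration.

```python
import pandas as pd

# Toy stand-ins for the original data and the new passenger.
existing = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'],
    'Age': [22, 26],
    'Fare': [7.25, 7.925],
})
new_row = pd.DataFrame({
    'Name': ['Vespucci, Mr. Amerigo'],
    'Age': [57],
})  # No 'Fare' value is given for the new passenger.

# Default: each dataframe keeps its own labels, so label 0 appears twice.
kept = pd.concat([existing, new_row])

# ignore_index=True renumbers the rows 0, 1, 2.
reset = pd.concat([existing, new_row], ignore_index=True)

print(list(kept.index))
print(list(reset.index))
print(reset)
```

The missing `Fare` for the new row is filled with `NaN`. Duplicate index labels matter later on, because `.drop(index=...)` removes every row that carries the given label.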

### Removing Rows

You can remove rows from a dataframe in the same way that you remove columns, with the `.drop()` method. This time, however, you pass a single row label or a list of row labels to the `index` argument. Note again that the original dataframe isn't modified unless we pass in the argument `inplace=True`.

In [None]:
df.drop(index=891) # Dropping a single row whose named index is 891, but not saving to original dataframe.

In [None]:
df.drop(index=[0, 1, 2]) # Dropping several rows by passing in a list of indexes

You can access the indexes of each row in a dataframe with the `.index` property. This is useful when dropping rows based on a condition or filter.

In the code below, for example, a filter called `filt` is created, which is a Series of `True`/`False` values that is `True` wherever `Pclass` is 3. Then, a variable called `indexes_to_drop` is created that holds the index labels of the rows that pass the filter. This Index object can then be passed into the `index` argument of the `.drop()` method to drop only rows where `Pclass` is 3.

In [None]:
filt = df['Pclass'] == 3
indexes_to_drop = df[filt].index
df.drop(index=indexes_to_drop, inplace=True) # Modifies the original dataframe in place
df.head()

We can also combine the code above into a single line.

In [None]:
df.drop(index=df[ df['Pclass'] == 3 ].index, inplace=True)
df
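The same result can usually be reached without `.drop()` by keeping the rows you want with a boolean filter. The sketch below compares the two approaches on a made-up dataframe called `toy`. It is an alternative style, not a replacement for the method shown above.

```python
import pandas as pd

# Toy dataframe with illustrative values.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Pclass': [3, 1, 3, 2],
})

# Approach from this section: look up the labels, then drop them.
dropped = toy.drop(index=toy[toy['Pclass'] == 3].index)

# Alternative: keep only the rows where Pclass is not 3.
kept = toy[toy['Pclass'] != 3]

# Both leave the same rows, which keep their original labels 1 and 3.
print(dropped.equals(kept))

# Optionally renumber the remaining rows from 0.
renumbered = kept.reset_index(drop=True)
print(renumbered)
```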