# Add/Remove Rows and Columns from DataFrames

In [17]:
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"]
}

In [18]:
import pandas as pd

In [19]:
df = pd.DataFrame(people)

In [20]:
df

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |


Combine the first and last name columns:

In [21]:
df['first'] + ' ' + df['last']

0    Corey Schafer
1         Jane Doe
2         John Doe
dtype: object

In [22]:
df['full_name'] = df['first'] + ' ' + df['last']

In [23]:
df

|   | first | last | email | full_name |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com | Corey Schafer |
| 1 | Jane | Doe | JaneDoe@email.com | Jane Doe |
| 2 | John | Doe | JohnDoe@email.com | John Doe |


Note: Dot notation cannot be used to create a new column like this; brackets are required. With dot notation, Python sets an attribute on the DataFrame object instead of adding a column.
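The difference between the two notations can be checked directly. This is a minimal sketch on a small stand-in DataFrame (not the notebook's `df`); pandas emits a warning on the dot-notation assignment, which is silenced here only to keep the output clean.

```python
import warnings

import pandas as pd

# Small stand-in frame that has no full_name column yet
df = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# Dot notation: sets a plain attribute on the object, no column is created
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.full_name = df["first"] + " " + df["last"]

created_by_dot = "full_name" in df.columns

# Bracket notation: adds the column as intended
df["full_name"] = df["first"] + " " + df["last"]
created_by_brackets = "full_name" in df.columns

print(created_by_dot, created_by_brackets)  # False True
```

Dot notation still works for reading a column that already exists; it is only creating a new one that requires brackets.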

To delete columns:

In [24]:
df.drop(columns=['first', 'last'])

|   | email | full_name |
|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer |
| 1 | JaneDoe@email.com | Jane Doe |
| 2 | JohnDoe@email.com | John Doe |


This change has not been applied to df yet: drop returns a new DataFrame, so we need to pass "inplace=True" explicitly.

In [25]:
df

|   | first | last | email | full_name |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com | Corey Schafer |
| 1 | Jane | Doe | JaneDoe@email.com | Jane Doe |
| 2 | John | Doe | JohnDoe@email.com | John Doe |


In [26]:
df.drop(columns=['first', 'last'], inplace=True)

In [27]:
df

|   | email | full_name |
|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer |
| 1 | JaneDoe@email.com | Jane Doe |
| 2 | JohnDoe@email.com | John Doe |
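Reassigning the result of drop is another way to keep the change. The sketch below uses a reduced copy of the data and a hypothetical variable name `trimmed` to show that the original is untouched until the reassignment.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com"],
})

# drop returns a new DataFrame; df itself keeps all of its columns
trimmed = df.drop(columns=["first", "last"])
columns_before = list(df.columns)

# Reassigning df has the same end result as passing inplace=True
df = df.drop(columns=["first", "last"])

print(columns_before)    # ['first', 'last', 'email']
print(list(df.columns))  # ['email']
```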


To split full_name into separate columns for each part of the name:

In [28]:
df['full_name'].str.split(' ')

0    [Corey, Schafer]
1         [Jane, Doe]
2         [John, Doe]
Name: full_name, dtype: object

The result is a list holding the first and last name for each row.

To get separate columns instead, use the expand argument:

In [29]:
df['full_name'].str.split(' ', expand=True)

|   | 0 | 1 |
|---|---|---|
| 0 | Corey | Schafer |
| 1 | Jane | Doe |
| 2 | John | Doe |


Now set two columns in the data frame from these, by passing in a list of column names:

In [30]:
df[['first', 'last']] = df['full_name'].str.split(' ', expand=True)

In [31]:
df

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
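The two-column assignment works here because every name contains exactly one space. As a sketch of one way to handle names with more parts, the `n` argument of `str.split` limits the number of splits, so the result never has more than two columns. The sample names and the variable `parts` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the second name has three parts
names = pd.Series(["Corey Schafer", "Mary Ann Doe"])

# n=1 splits on the first space only, so there are always two columns
parts = names.str.split(" ", n=1, expand=True)
parts.columns = ["first", "last"]

print(parts.loc[1, "first"])  # Mary
print(parts.loc[1, "last"])   # Ann Doe
```

Whether "Ann" belongs with the first or the last name depends on the data; this only shows how to keep the column count fixed.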


On to adding and removing rows...

First, add a single row of data with append (this method exists only in pandas versions before 2.0):

In [32]:
df.append({'first': 'Tony'})

TypeError: Can only append a Series if ignore_index=True or if the Series has a name

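The error occurs because the dict is converted to a Series with no name, so pandas has no index label for the new row. If we pass "ignore_index=True", pandas ignores the labels and gives the new row the next default integer index.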

In [33]:
df.append({'first': 'Tony'}, ignore_index=True)

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
| 3 | NaN | NaN | Tony | NaN |


The new row was appended, but since we only supplied one value, the other cells are "NaN".
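`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the cell above fails on current versions. A minimal sketch of the same single-row addition with `pd.concat` follows; wrapping the dict in a one-row DataFrame is my own choice here, and `new_row` and `result` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["CoreyMSchafer@gmail.com"],
    "full_name": ["Corey Schafer"],
    "first": ["Corey"],
    "last": ["Schafer"],
})

# Wrap the dict in a list so it becomes a one-row DataFrame
new_row = pd.DataFrame([{"first": "Tony"}])

# ignore_index=True renumbers the rows 0..n-1, as with append
result = pd.concat([df, new_row], ignore_index=True)

print(result.loc[1, "first"])            # Tony
print(result.loc[1].isna().sum())        # 3 (email, full_name, last)
```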

We can also append a new dataframe to an existing dataframe. First create the second dataframe:

In [34]:
people = {
    "first": ["Tony", "Steve"],
    "last": ["Stark", "Rogers"],
    "email": ["ironman@avenge.com", "cap@avenge.com"]
}
df2 = pd.DataFrame(people)

In [35]:
df2

|   | first | last | email |
|---|---|---|---|
| 0 | Tony | Stark | ironman@avenge.com |
| 1 | Steve | Rogers | cap@avenge.com |


Now append, remembering to pass ignore_index=True:

In [36]:
df.append(df2, ignore_index=True)

FutureWarning: [...] of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'.


|   | email | first | full_name | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey | Corey Schafer | Schafer |
| 1 | JaneDoe@email.com | Jane | Jane Doe | Doe |
| 2 | JohnDoe@email.com | John | John Doe | Doe |
| 3 | ironman@avenge.com | Tony | NaN | Stark |
| 4 | cap@avenge.com | Steve | NaN | Rogers |


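The new rows were added. The warning appears because the two dataframes don't have the same columns in the same order, so pandas sorted the columns of the result. In a future version, sort will default to False.

Passing "sort=False" suppresses the warning and keeps the original column order.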
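The effect of the sort option on column order can be seen side by side. This sketch uses `pd.concat`, which accepts the same `sort` and `ignore_index` arguments and is available in current pandas versions, on a reduced copy of the data; `sorted_cols` and `unsorted_cols` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["JaneDoe@email.com"],
    "full_name": ["Jane Doe"],
    "first": ["Jane"],
    "last": ["Doe"],
})
df2 = pd.DataFrame({
    "first": ["Tony", "Steve"],
    "last": ["Stark", "Rogers"],
    "email": ["ironman@avenge.com", "cap@avenge.com"],
})

# sort=True orders the columns alphabetically
sorted_cols = list(pd.concat([df, df2], ignore_index=True, sort=True).columns)

# sort=False keeps the column order of the first frame
unsorted_cols = list(pd.concat([df, df2], ignore_index=True, sort=False).columns)

print(sorted_cols)    # ['email', 'first', 'full_name', 'last']
print(unsorted_cols)  # ['email', 'full_name', 'first', 'last']
```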

The append method doesn't have an "inplace" argument, so we have to reassign df to make the change permanent:

In [37]:
df = df.append(df2, ignore_index=True, sort=False)

In [38]:
df

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
| 3 | ironman@avenge.com | NaN | Tony | Stark |
| 4 | cap@avenge.com | NaN | Steve | Rogers |


Now let's remove rows. Instead of specifying columns to drop, specify index labels:

In [39]:
df.drop(index=4)

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
| 3 | ironman@avenge.com | NaN | Tony | Stark |


To apply the change permanently, use "inplace=True".
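The index argument takes labels rather than positions, and it also accepts a list of them. The sketch below uses a made-up frame with non-default labels to make that distinction visible; `remaining` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame(
    {"first": ["Corey", "Jane", "John", "Tony"]},
    index=[10, 20, 30, 40],  # hypothetical non-default labels
)

# 20 and 40 are index labels here, not row positions
remaining = df.drop(index=[20, 40])

print(list(remaining.index))     # [10, 30]
print(list(remaining["first"]))  # ['Corey', 'John']
```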

A filter can be combined with the drop method by passing in the index of the filtered rows:

In [40]:
df.drop(index=df[df['last'] == 'Doe'].index)

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 3 | ironman@avenge.com | NaN | Tony | Stark |
| 4 | cap@avenge.com | NaN | Steve | Rogers |


In [42]:
df

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
| 3 | ironman@avenge.com | NaN | Tony | Stark |
| 4 | cap@avenge.com | NaN | Steve | Rogers |


Corey thinks this is hard to read. Instead, store the filter in a variable first:

In [43]:
filt = df['last'] == 'Doe'
df.drop(index=df[filt].index)

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 3 | ironman@avenge.com | NaN | Tony | Stark |
| 4 | cap@avenge.com | NaN | Steve | Rogers |
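An alternative that is not part of the original notes is to keep the rows that don't match by negating the filter with `~`. With a unique index, as here, this should give the same rows as the drop call. The sketch also shows `reset_index(drop=True)` for closing the gaps left in the index; the names `dropped`, `kept`, and `renumbered` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John", "Tony"],
    "last": ["Schafer", "Doe", "Doe", "Stark"],
})

filt = df["last"] == "Doe"

# Approach from the notes: drop the labels of the matching rows
dropped = df.drop(index=df[filt].index)

# Alternative: select the rows where the filter is False
kept = df[~filt]

print(dropped.equals(kept))  # True

# The surviving rows keep labels 0 and 3; renumber them if gaps matter
renumbered = kept.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1]
```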
