# Add/Remove Rows and Columns from DataFrames

In [98]:
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"]
}

In [99]:
import pandas as pd

In [100]:
df = pd.DataFrame(people)

In [101]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


Combine the first and last name columns:

In [102]:
df['first'] + ' ' + df['last']

0    Corey Schafer
1         Jane Doe
2         John Doe
dtype: object

In [103]:
df['full_name'] = df['first'] + ' ' + df['last']  # make a new column out of two others

In [104]:
df

Unnamed: 0,first,last,email,full_name
0,Corey,Schafer,CoreyMSchafer@gmail.com,Corey Schafer
1,Jane,Doe,JaneDoe@email.com,Jane Doe
2,John,Doe,JohnDoe@email.com,John Doe


Note: dot notation cannot be used when assigning a new column like this; use brackets instead (Python will think you're trying to assign an attribute on the DataFrame object).
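
A minimal sketch of the difference, using a smaller stand-in DataFrame rather than the one above. The variable names attr_made_column and bracket_made_column are only for illustration; the warning filter is there because pandas emits a UserWarning on the attribute assignment.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# Dot notation sets a plain Python attribute on the object.
# pandas warns about it and does not add a column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    df.full_name = df["first"] + " " + df["last"]
attr_made_column = "full_name" in df.columns

# Bracket notation goes through column assignment and creates the column.
df["full_name"] = df["first"] + " " + df["last"]
bracket_made_column = "full_name" in df.columns

print(attr_made_column, bracket_made_column)
```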

To delete columns:

In [105]:
df.drop(columns=['first', 'last'])

Unnamed: 0,email,full_name
0,CoreyMSchafer@gmail.com,Corey Schafer
1,JaneDoe@email.com,Jane Doe
2,JohnDoe@email.com,John Doe


This change is not yet applied to df: drop returns a new DataFrame, so we need to explicitly pass "inplace=True" to modify df itself.

In [106]:
df

Unnamed: 0,first,last,email,full_name
0,Corey,Schafer,CoreyMSchafer@gmail.com,Corey Schafer
1,Jane,Doe,JaneDoe@email.com,Jane Doe
2,John,Doe,JohnDoe@email.com,John Doe


In [107]:
df.drop(columns=['first', 'last'], inplace=True)

In [108]:
df

Unnamed: 0,email,full_name
0,CoreyMSchafer@gmail.com,Corey Schafer
1,JaneDoe@email.com,Jane Doe
2,JohnDoe@email.com,John Doe
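
As an alternative to "inplace=True", the result of drop can be assigned back to the same name. A small sketch, assuming a one-row stand-in DataFrame; trimmed and columns_after_plain_drop are illustrative names, not part of the notebook above.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey"],
    "last": ["Schafer"],
    "email": ["CoreyMSchafer@gmail.com"],
})

# drop returns a new DataFrame and leaves df untouched.
trimmed = df.drop(columns=["first", "last"])
columns_after_plain_drop = list(df.columns)

# Reassigning the result has the same visible effect as inplace=True.
df = df.drop(columns=["first", "last"])

print(columns_after_plain_drop)
print(list(df.columns))
```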


To split full_name into separate columns, one for each part of the name:

In [109]:
df['full_name'].str.split(' ')

0    [Corey, Schafer]
1         [Jane, Doe]
2         [John, Doe]
Name: full_name, dtype: object

The result is a list of the first and last name for each row.

To assign to separate columns, use expand argument:

In [110]:
df['full_name'].str.split(' ', expand=True)

Unnamed: 0,0,1
0,Corey,Schafer
1,Jane,Doe
2,John,Doe


Now set two columns in the DataFrame for these by passing in a list of column names:

In [111]:
df[['first', 'last']] = df['full_name'].str.split(' ', expand=True)

In [112]:
df

Unnamed: 0,email,full_name,first,last
0,CoreyMSchafer@gmail.com,Corey Schafer,Corey,Schafer
1,JaneDoe@email.com,Jane Doe,Jane,Doe
2,JohnDoe@email.com,John Doe,John,Doe
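
The split above works because every name has exactly two words. With more words, expand=True produces more than two columns and the two-column assignment no longer lines up. One possible way around this, sketched here with made-up names, is to limit the number of splits with the n argument of str.split:

```python
import pandas as pd

# Hypothetical names; the second one has three words.
names = pd.Series(["Corey Schafer", "Mary Ann Smith"])

# n=1 splits only at the first space, so there are always two columns.
parts = names.str.split(" ", n=1, expand=True)

print(parts)
```

Whether the extra words belong with the first or the last name depends on the data, so this is only one choice.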


On to adding and removing rows...

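First, add a single row of data with append. The public DataFrame.append method was removed in pandas 2.0, so this notebook calls the private _append instead: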

In [113]:
df._append({'first': 'Tony'})

TypeError: Can only append a dict if ignore_index=True

This raises an error because a dict carries no index label for the new row. If we pass "ignore_index=True", pandas assigns a default integer index instead.

In [None]:
df._append({'first': 'Tony'}, ignore_index=True)

The new name was appended, but since we only supplied one value, the other cells are "NaN".
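
Because _append is a private method, a version that relies only on public API may be preferable. A minimal sketch using pd.concat, with a stand-in DataFrame and the illustrative names new_row and result:

```python
import pandas as pd

df = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# Wrap the dict in a list so it becomes a one-row DataFrame.
new_row = pd.DataFrame([{"first": "Tony"}])

# ignore_index=True renumbers the rows 0..n-1 in the result.
result = pd.concat([df, new_row], ignore_index=True)

print(result)
```

As with append, columns missing from the new row are filled with NaN, and df itself is not modified.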

We can also append a new dataframe to an existing dataframe. First create the second dataframe:

In [None]:
people = {
    "first": ["Tony", "Steve"],
    "last": ["Stark", "Rogers"],
    "email": ["ironman@avenge.com", "cap@avenge.com"]
}
df2 = pd.DataFrame(people)

In [131]:
df2

Unnamed: 0,first,last,email
0,Tony,Stark,ironman@avenge.com
1,Steve,Rogers,cap@avenge.com


Now append, remembering to pass "ignore_index=True":

In [133]:
df = df._append(df2, ignore_index=True)

In [134]:
df

Unnamed: 0,email,full_name,first,last
0,CoreyMSchafer@gmail.com,Corey Schafer,Corey,Schafer
1,JaneDoe@email.com,Jane Doe,Jane,Doe
2,JohnDoe@email.com,John Doe,John,Doe
3,ironman@avenge.com,,Tony,Stark
4,cap@avenge.com,,Steve,Rogers
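
To see why ignore_index matters when combining two DataFrames, the sketch below compares both settings. It uses the public pd.concat rather than _append, and cut-down stand-in frames; kept and renumbered are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"first": ["Corey", "Jane", "John"]})
df2 = pd.DataFrame({"first": ["Tony", "Steve"]})

# Without ignore_index, each frame keeps its own labels,
# so the result contains duplicate index values.
kept = pd.concat([df, df2])

# With ignore_index=True, the rows are renumbered from 0.
renumbered = pd.concat([df, df2], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```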


Now let's remove rows. Instead of specifying columns to drop, specify index labels:

In [135]:
df.drop(index=4)

Unnamed: 0,email,full_name,first,last
0,CoreyMSchafer@gmail.com,Corey Schafer,Corey,Schafer
1,JaneDoe@email.com,Jane Doe,Jane,Doe
2,JohnDoe@email.com,John Doe,John,Doe
3,ironman@avenge.com,,Tony,Stark


To apply the change permanently, use "inplace=True".

We can also combine a filter with the drop method by passing in the index of the filtered rows:

In [136]:
df.drop(index=df[df['last'] == 'Doe'].index)

Unnamed: 0,email,full_name,first,last
0,CoreyMSchafer@gmail.com,Corey Schafer,Corey,Schafer
3,ironman@avenge.com,,Tony,Stark
4,cap@avenge.com,,Steve,Rogers


Corey thinks this is hard to read. Instead, pull the filter out into its own variable:

In [138]:
filt = df['last'] == 'Doe'
df.drop(index=df[filt].index)

Unnamed: 0,email,full_name,first,last
0,CoreyMSchafer@gmail.com,Corey Schafer,Corey,Schafer
3,ironman@avenge.com,,Tony,Stark
4,cap@avenge.com,,Steve,Rogers
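
Another way to get the same rows, not shown in the notebook above, is to negate the filter with ~ and select with it directly. A small sketch, assuming the index labels are unique (with duplicate labels, drop removes every row sharing a label, so the two approaches can differ):

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John", "Tony", "Steve"],
    "last": ["Schafer", "Doe", "Doe", "Stark", "Rogers"],
})

filt = df["last"] == "Doe"

# Drop the rows whose labels match the filter...
dropped = df.drop(index=df[filt].index)

# ...or keep the rows where the filter is False.
kept = df[~filt]

print(dropped.equals(kept))
```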
