# Pandas - Add/remove rows and columns from dataframes

## Table of contents

* [Adding/removing columns](#Adding/removing-columns)
    * [Adding columns](#Adding-columns)
    * [Removing columns](#Removing-columns)
        * [Removing a SINGLE column: `DataFrame.drop(columns=<column_name>)` method](#Removing-a-SINGLE-column:-DataFrame.drop(columns=<column_name>)-method)
        * [Removing MULTIPLE columns: `DataFrame.drop(columns=[list])` method](#Removing-MULTIPLE-columns:-DataFrame.drop(columns=[list])-method)
    * [Splitting one column into two columns](#Splitting-one-column-into-two-columns)
* [Adding/removing rows of data](#Adding/removing-rows-of-data)
    * [Adding single row of data: `DataFrame.append({dict})`](#Adding-single-row-of-data:-DataFrame.append({dict}))
    * [Appending a dataframe with another dataframe](#Appending-a-dataframe-with-another-dataframe)
    * [Removing rows using indexes](#Removing-rows-using-indexes)
        * [Removing a SINGLE row: `DataFrame.drop(index=<row_index>)` method](#Removing-a-SINGLE-row:-DataFrame.drop(index=<row_index>)-method)
        * [Removing MULTIPLE rows: `DataFrame.drop(index=[list])` method](#Removing-MULTIPLE-rows:-DataFrame.drop(index=[list])-method)
    * [Removing rows using conditional filters](#Removing-rows-using-conditional-filters)

***

In [237]:
people = {
    "first": ["Nabeel", "Jane", "John"],
    "last": ["Malik", "Doe", "Doe"],
    "email": ["nabeel_malik@email.com", "jane_doe@email.com", "john_doe@email.com"]
}

In [238]:
import pandas as pd
df_ppl = pd.DataFrame(people)

In [239]:
df_ppl

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com


## Adding/removing columns

The cases we are going to look at here are:

1. Adding columns
2. Removing columns
    - Removing a SINGLE column
    - Removing MULTIPLE columns
3. Splitting one column into two columns

### Adding columns

Let's say we want to add a new column to our `DataFrame` with the full names:

In [240]:
df_ppl['first'] + ' ' + df_ppl['last']

0    Nabeel Malik
1        Jane Doe
2        John Doe
dtype: object

To assign the full names to a new *column*:

In [241]:
df_ppl['full'] = df_ppl['first'] + ' ' + df_ppl['last']

In [242]:
df_ppl

Unnamed: 0,first,last,email,full
0,Nabeel,Malik,nabeel_malik@email.com,Nabeel Malik
1,Jane,Doe,jane_doe@email.com,Jane Doe
2,John,Doe,john_doe@email.com,John Doe
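
As an alternative sketch, `DataFrame.assign()` builds the same column but returns a new `DataFrame` instead of modifying the original. The variable name `df_with_full` is made up for this illustration:

```python
import pandas as pd

df_ppl = pd.DataFrame({
    "first": ["Nabeel", "Jane", "John"],
    "last": ["Malik", "Doe", "Doe"],
})

# assign() returns a new DataFrame; df_ppl itself is left unchanged
df_with_full = df_ppl.assign(full=df_ppl["first"] + " " + df_ppl["last"])

print(df_with_full.columns.tolist())  # ['first', 'last', 'full']
print(df_ppl.columns.tolist())        # ['first', 'last']
```

This can be handy when the original `DataFrame` should stay as it is, for example in a chain of method calls.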


Note: We cannot use dot (`.`) notation to create a new column, because Python treats the assignment as setting an attribute on the `DataFrame` object rather than adding a column. Therefore, we have to use bracket (`[]`) notation.
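
A minimal sketch of the difference, using a shortened version of the data above. The warning filter is only there to keep the output quiet, since pandas warns about this kind of attribute assignment:

```python
import warnings

import pandas as pd

df_ppl = pd.DataFrame({"first": ["Nabeel", "Jane"], "last": ["Malik", "Doe"]})

with warnings.catch_warnings():
    # pandas warns that columns cannot be created via a new attribute name
    warnings.simplefilter("ignore")
    df_ppl.full = df_ppl["first"] + " " + df_ppl["last"]

# The attribute was set on the object, but no column was added
has_column_after_dot = "full" in df_ppl.columns
print(has_column_after_dot)  # False

# Bracket notation creates the column
df_ppl["full"] = df_ppl["first"] + " " + df_ppl["last"]
has_column_after_bracket = "full" in df_ppl.columns
print(has_column_after_bracket)  # True
```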

### Removing columns

#### Removing a SINGLE column: `DataFrame.drop(columns=<column_name>)` method

Let's say we want to remove the 'email' column:

In [243]:
df_ppl.drop(columns='email', inplace=True)

In [244]:
df_ppl

Unnamed: 0,first,last,full
0,Nabeel,Malik,Nabeel Malik
1,Jane,Doe,Jane Doe
2,John,Doe,John Doe
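
For comparison, here is a small sketch of the same call without `inplace=True`. In that case `.drop()` returns a new `DataFrame` and leaves the original untouched (the name `df_no_email` is made up for this example):

```python
import pandas as pd

df_ppl = pd.DataFrame({
    "first": ["Nabeel", "Jane", "John"],
    "last": ["Malik", "Doe", "Doe"],
    "email": ["nabeel_malik@email.com", "jane_doe@email.com", "john_doe@email.com"],
})

# Without inplace=True, drop() returns a modified copy
df_no_email = df_ppl.drop(columns="email")

print(df_no_email.columns.tolist())  # ['first', 'last']
print(df_ppl.columns.tolist())       # ['first', 'last', 'email']
```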


#### Removing MULTIPLE columns: `DataFrame.drop(columns=[list])` method

Now, since we have the *'full'* names, let's say that we want to remove the *'first'* and *'last'* names:

In [245]:
df_ppl

Unnamed: 0,first,last,full
0,Nabeel,Malik,Nabeel Malik
1,Jane,Doe,Jane Doe
2,John,Doe,John Doe


In [246]:
df_ppl.drop(columns=['first', 'last'], inplace=True)

In [247]:
df_ppl

Unnamed: 0,full
0,Nabeel Malik
1,Jane Doe
2,John Doe


Note: We could also remove a SINGLE column with the `del DataFrame[<column_name>]` statement.
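
A short sketch of the `del` form, rebuilding the original data so the example runs on its own:

```python
import pandas as pd

df_ppl = pd.DataFrame({
    "first": ["Nabeel", "Jane", "John"],
    "last": ["Malik", "Doe", "Doe"],
    "email": ["nabeel_malik@email.com", "jane_doe@email.com", "john_doe@email.com"],
})

# del removes the column from df_ppl directly; nothing is returned
del df_ppl["email"]

print(df_ppl.columns.tolist())  # ['first', 'last']
```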

### Splitting one column into two columns

In [248]:
df_ppl

Unnamed: 0,full
0,Nabeel Malik
1,Jane Doe
2,John Doe


Now let's say we want to split the *'full'* name column into 2 new columns for *'first'* and *'last'* names:

In [249]:
df_ppl['full'].str.split(' ')

0    [Nabeel, Malik]
1        [Jane, Doe]
2        [John, Doe]
Name: full, dtype: object

In [250]:
df_ppl['full'].str.split(' ', expand=True)

Unnamed: 0,0,1
0,Nabeel,Malik
1,Jane,Doe
2,John,Doe


In [251]:
df_ppl[['first', 'last']] = df_ppl['full'].str.split(' ', expand=True)

In [252]:
df_ppl

Unnamed: 0,full,first,last
0,Nabeel Malik,Nabeel,Malik
1,Jane Doe,Jane,Doe
2,John Doe,John,Doe
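
The split above works because every name contains exactly one space. As a hedged sketch for names with more than one space, the `n=1` argument limits the split to the first space, so each name yields at most two parts. The name "Mary Ann Smith" and the variable `df_names` are made up for this illustration:

```python
import pandas as pd

df_names = pd.DataFrame({"full": ["Nabeel Malik", "Mary Ann Smith"]})

# n=1 splits only at the first space
df_names[["first", "last"]] = df_names["full"].str.split(" ", n=1, expand=True)

print(df_names.loc[1, "first"])  # Mary
print(df_names.loc[1, "last"])   # Ann Smith
```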


## Adding/removing rows of data

### Adding single row of data: `DataFrame.append({dict})`

In [253]:
people = {
    "first": ["Nabeel", "Jane", "John"],
    "last": ["Malik", "Doe", "Doe"],
    "email": ["nabeel_malik@email.com", "jane_doe@email.com", "john_doe@email.com"]
}

In [254]:
import pandas as pd
df_a = pd.DataFrame(people)

In [255]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com


In [256]:
df_a = df_a.append({'first': 'Omar', 'last': 'Aziz', 'email': 'omar_aziz@email.com'}, ignore_index=True)

In [257]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com
3,Omar,Aziz,omar_aziz@email.com


Note: 
- The `ignore_index=True` flag is required when appending a dict. Without it, pandas raises a `TypeError`:<br> `TypeError: Can only append a Series if ignore_index=True or if the Series has a name`.

- When using the `.append()` method, if we do not provide ALL the values in a row, the missing values are set to `NaN`.

- `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0. On newer versions, use `pd.concat()` instead.
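
For pandas 2.0 and later, where `.append()` is not available, a minimal sketch of the same single-row addition with `pd.concat()` looks like this. The `email` value is left out on purpose to show the `NaN` fill, and `new_row` is a name chosen for this example:

```python
import pandas as pd

df_a = pd.DataFrame({
    "first": ["Nabeel", "Jane", "John"],
    "last": ["Malik", "Doe", "Doe"],
    "email": ["nabeel_malik@email.com", "jane_doe@email.com", "john_doe@email.com"],
})

# Wrap the dict in a list to build a one-row DataFrame
new_row = pd.DataFrame([{"first": "Omar", "last": "Aziz"}])

# ignore_index=True renumbers the combined rows 0..n-1
df_a = pd.concat([df_a, new_row], ignore_index=True)

print(len(df_a))                      # 4
print(pd.isna(df_a.loc[3, "email"]))  # True
```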

### Appending a dataframe with another dataframe

In [258]:
people_2 = {
    "first": ["Tony", "Steve", "Kathy"],
    "last": ["Stark", "Rogers", "Newmann"],
    "email": ["tony_stark@email.com", "steve_rogers@email.com", "kathy_newmann@email.com"]
}

In [259]:
df_b = pd.DataFrame(people_2)

In [260]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com
3,Omar,Aziz,omar_aziz@email.com


In [261]:
df_b

Unnamed: 0,first,last,email
0,Tony,Stark,tony_stark@email.com
1,Steve,Rogers,steve_rogers@email.com
2,Kathy,Newmann,kathy_newmann@email.com


In [262]:
df_a = df_a.append(df_b, ignore_index=True)

In [263]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com
3,Omar,Aziz,omar_aziz@email.com
4,Tony,Stark,tony_stark@email.com
5,Steve,Rogers,steve_rogers@email.com
6,Kathy,Newmann,kathy_newmann@email.com


Note: Unlike the `.drop()` method, the `.append()` method has no `inplace=` flag, so its result has to be assigned back to the variable.
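
On pandas versions without `.append()`, two dataframes can be stacked with `pd.concat()`. The sketch below uses shortened versions of the data above and also shows what `ignore_index=True` changes (`stacked` is a name chosen for this example):

```python
import pandas as pd

df_a = pd.DataFrame({"first": ["Nabeel", "Jane"], "last": ["Malik", "Doe"]})
df_b = pd.DataFrame({"first": ["Tony", "Steve"], "last": ["Stark", "Rogers"]})

# Without ignore_index=True the original labels are kept
stacked = pd.concat([df_a, df_b])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1
df_a = pd.concat([df_a, df_b], ignore_index=True)
print(df_a.index.tolist())  # [0, 1, 2, 3]
```

Like `.append()`, `pd.concat()` returns a new `DataFrame`, so the result is assigned back to `df_a`.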

### Removing rows using indexes

In [264]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com
3,Omar,Aziz,omar_aziz@email.com
4,Tony,Stark,tony_stark@email.com
5,Steve,Rogers,steve_rogers@email.com
6,Kathy,Newmann,kathy_newmann@email.com


#### Removing a SINGLE row: `DataFrame.drop(index=<row_index>)` method

In [265]:
df_a.drop(index=6, inplace=True)

In [266]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com
3,Omar,Aziz,omar_aziz@email.com
4,Tony,Stark,tony_stark@email.com
5,Steve,Rogers,steve_rogers@email.com


#### Removing MULTIPLE rows: `DataFrame.drop(index=[list])` method

In [267]:
df_a.drop(index=[4,5], inplace=True)

In [268]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com
3,Omar,Aziz,omar_aziz@email.com
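
One detail worth keeping in mind: `.drop(index=...)` matches index labels, not row positions. In the examples above the two coincide because the index is 0, 1, 2, and so on. A small sketch with a made-up index (10, 20, 30) shows the difference:

```python
import pandas as pd

df = pd.DataFrame({"first": ["Nabeel", "Jane", "John"]}, index=[10, 20, 30])

# drop(index=...) matches index labels, so 20 removes the second row
by_label = df.drop(index=20)
print(by_label["first"].tolist())  # ['Nabeel', 'John']

# To drop by position, look up the labels at those positions first
by_position = df.drop(index=df.index[[0, 1]])
print(by_position["first"].tolist())  # ['John']
```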


### Removing rows using conditional filters

Now let's say we want to remove all the rows with the *'last'* name *'Doe'*. We can filter for those rows and pass their index to `.drop()`:

In [272]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
1,Jane,Doe,jane_doe@email.com
2,John,Doe,john_doe@email.com
3,Omar,Aziz,omar_aziz@email.com


In [273]:
df_a.index

Int64Index([0, 1, 2, 3], dtype='int64')

In [274]:
type(df_a.index)

pandas.core.indexes.numeric.Int64Index

In [275]:
df_a = df_a.drop(index=df_a[df_a['last'] == 'Doe'].index)

In [276]:
df_a

Unnamed: 0,first,last,email
0,Nabeel,Malik,nabeel_malik@email.com
3,Omar,Aziz,omar_aziz@email.com
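
An equivalent approach, sketched here with shortened data, is to keep the rows that do not match the condition instead of dropping the ones that do. The remaining rows keep their original labels (0 and 3), which `reset_index(drop=True)` can renumber if needed. The name `df_kept` is made up for this example:

```python
import pandas as pd

df_a = pd.DataFrame({
    "first": ["Nabeel", "Jane", "John", "Omar"],
    "last": ["Malik", "Doe", "Doe", "Aziz"],
})

# Keep the rows that do NOT match, instead of dropping the ones that do
df_kept = df_a[df_a["last"] != "Doe"]
labels_before_reset = df_kept.index.tolist()
print(labels_before_reset)  # [0, 3]

# reset_index(drop=True) renumbers the rows and discards the old labels
df_kept = df_kept.reset_index(drop=True)
print(df_kept.index.tolist())  # [0, 1]
```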
