# 06 - Add/Remove Rows and Columns From DataFrames

https://youtu.be/HQ6XO9eT-fc?si=6STRvIlEn1b-zwp8

Notes by [Innovinitylabs](https://github.com/innovinitylabs)

In [283]:
import pandas as pd

#Setup for learning data

people = {
    "first": ["Corey", 'Jane', 'John'], 
    "last": ["Schafer", 'Doe', 'Doe'], 
    "email": ["CoreyMSchafer@gmail.com", 'JaneDoe@email.com', 'JohnDoe@email.com']
}
dft = pd.DataFrame(people)

---

#Setup for Real world data

In [284]:
# df = pd.read_csv('data/survey_results_public.csv', index_col='Respondent' )
# schema_df = pd.read_csv('data/survey_results_schema.csv', index_col='Column')
# pd.set_option('display.max_columns', 85)


In [285]:
# pd.set_option('display.max_rows', 85)

---
---
---

#### Adding and removing columns

Combining two columns into one

In [286]:
dft['first'] + ' ' + dft['last']

0    Corey Schafer
1         Jane Doe
2         John Doe
dtype: object

We can assign this result to a new column like this:

In [287]:
dft['first_name'] = dft['first'] + ' ' + dft['last']
dft

Unnamed: 0,first,last,email,first_name
0,Corey,Schafer,CoreyMSchafer@gmail.com,Corey Schafer
1,Jane,Doe,JaneDoe@email.com,Jane Doe
2,John,Doe,JohnDoe@email.com,John Doe


To <mark>remove</mark> columns, use the `drop()` method on the DataFrame.  
It returns a new DataFrame and leaves the original unchanged, as the next two cells show.  
###### [TimeStamp](https://youtu.be/HQ6XO9eT-fc?si=n4XTkr6gsG4xjlN9&t=184)

In [288]:
dft.drop(columns= ['first', 'last'])

Unnamed: 0,email,first_name
0,CoreyMSchafer@gmail.com,Corey Schafer
1,JaneDoe@email.com,Jane Doe
2,JohnDoe@email.com,John Doe


In [289]:
dft

Unnamed: 0,first,last,email,first_name
0,Corey,Schafer,CoreyMSchafer@gmail.com,Corey Schafer
1,Jane,Doe,JaneDoe@email.com,Jane Doe
2,John,Doe,JohnDoe@email.com,John Doe


To change the DataFrame itself, pass `inplace=True`:

In [290]:
dft.drop(columns= ['first', 'last'], inplace=True)
dft

Unnamed: 0,email,first_name
0,CoreyMSchafer@gmail.com,Corey Schafer
1,JaneDoe@email.com,Jane Doe
2,JohnDoe@email.com,John Doe
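
As a small sketch of the same idea, reassigning the result of `drop()` has the same effect as `inplace=True`. The names `demo`, `trimmed`, and `columns_before` are made up for this illustration and are not part of the notes above.

```python
import pandas as pd

# Hypothetical stand-in for dft, rebuilt here so the snippet runs on its own
demo = pd.DataFrame({
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com"],
})

# drop() returns a new DataFrame without the listed columns
trimmed = demo.drop(columns=["first", "last"])

# demo itself still has all three columns at this point
columns_before = list(demo.columns)

# Assigning the result back changes demo, just like inplace=True would
demo = demo.drop(columns=["first", "last"])
```

Which style to use is mostly a matter of taste; reassignment makes the change visible on the left-hand side of the statement.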


---
To <mark>create new columns by splitting an existing column</mark>, use `str.split()`:

In [291]:
dft['first_name'].str.split(' ')


0    [Corey, Schafer]
1         [Jane, Doe]
2         [John, Doe]
Name: first_name, dtype: object

Above, each row holds the first and last name together in one list. To put them into two separate columns, pass `expand=True`, like this:

In [292]:
dft['first_name'].str.split(' ', expand=True)

Unnamed: 0,0,1
0,Corey,Schafer
1,Jane,Doe
2,John,Doe


In [293]:
dft

Unnamed: 0,email,first_name
0,CoreyMSchafer@gmail.com,Corey Schafer
1,JaneDoe@email.com,Jane Doe
2,JohnDoe@email.com,John Doe


In [294]:
dft[['email', 'first_name']]  # double brackets select a list of columns and return a DataFrame

Unnamed: 0,email,first_name
0,CoreyMSchafer@gmail.com,Corey Schafer
1,JaneDoe@email.com,Jane Doe
2,JohnDoe@email.com,John Doe


So we use double brackets `[[ ]]` on the left-hand side to assign both new columns at once:

In [295]:
dft[['first', 'last']] = dft['first_name'].str.split(' ', expand=True)
dft  # note the double brackets on the left-hand side above

Unnamed: 0,email,first_name,first,last
0,CoreyMSchafer@gmail.com,Corey Schafer,Corey,Schafer
1,JaneDoe@email.com,Jane Doe,Jane,Doe
2,JohnDoe@email.com,John Doe,John,Doe
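
The split above works because every name has exactly two parts. As a hedged sketch, for names with more than one space, the `n` parameter of `str.split()` can limit the number of splits so the result always fits two columns. The `names` DataFrame and the name "Mary Ann Smith" are invented for this example.

```python
import pandas as pd

# Hypothetical data: the second name contains two spaces
names = pd.DataFrame({"full_name": ["Corey Schafer", "Mary Ann Smith"]})

# n=1 splits only at the first space, so expand=True yields exactly two columns
names[["first", "last"]] = names["full_name"].str.split(" ", n=1, expand=True)
```

Here everything after the first space ends up in `last`, which may or may not be what you want for real data.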


In [296]:
#dft.columns
# column_name = ('first', 'last')
# dft.drop(column_name, axis=1, inplace=True)
# dft
# I accidentally created a column whose name was the tuple ('first', 'last'), so I used the code above to remove it

###### Result #####
#Index(['email', 'first_name', ('first', 'last'), 'first', 'last'], dtype='object')
#                                column name is a tuple

In [297]:
dft.drop(columns= ['first_name'], inplace=True)

---
#### Adding or removing data
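1) Add a single row to a DataFrame
2) Combine two DataFrames

To add a single row to a DataFrame, these notes use the `._append()` method.  
<mark> `.append()` was deprecated in pandas 1.4 and removed in pandas 2.0. </mark> `._append()` is a private method, so the supported public alternative is `pd.concat()`.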

In [298]:
dft._append({'first': 'Tata'})

TypeError: Can only append a dict if ignore_index=True

It raises an error because a dict carries no name to use as the new row's index label.  
We can fix this with `ignore_index=True`:

In [None]:
dft._append({'first': 'Tata'}, ignore_index=True)

Unnamed: 0,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
1,JaneDoe@email.com,Jane,Doe
2,JohnDoe@email.com,John,Doe
3,,Tata,
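
Since `._append()` is a private method, here is a minimal sketch of the same single-row addition using the public `pd.concat()` function. The names `demo`, `new_row`, and `demo_plus` are hypothetical and used only for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for dft, rebuilt here so the snippet runs on its own
demo = pd.DataFrame({
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com"],
})

# Wrap the dict in a list to build a one-row DataFrame
new_row = pd.DataFrame([{"first": "Tata"}])

# ignore_index=True renumbers the rows; columns missing from new_row become NaN
demo_plus = pd.concat([demo, new_row], ignore_index=True)
```

Like `._append()`, `pd.concat()` returns a new DataFrame and leaves `demo` untouched.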


In [None]:
great_people = {
    "first": ["Ratan", 'Abdul'], 
    "last": ["Tata", 'Kalam'], 
    "email": ["ratan@tata.com", 'AbdulKalam@isro.com']
}

In [None]:
dfp = pd.DataFrame(great_people)
dfp

Unnamed: 0,first,last,email
0,Ratan,Tata,ratan@tata.com
1,Abdul,Kalam,AbdulKalam@isro.com


Now that this is a DataFrame, we can combine it with the existing one.  
The two have conflicting indexes (both start at 0),  
and their columns are not in the same order.

In [None]:
dft._append(dfp)

Unnamed: 0,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
1,JaneDoe@email.com,Jane,Doe
2,JohnDoe@email.com,John,Doe
0,ratan@tata.com,Ratan,Tata
1,AbdulKalam@isro.com,Abdul,Kalam


This may raise an error or warning in some pandas versions. It does in the video because the video uses an older pandas, where it can be fixed with `ignore_index=True` and `sort=False` (`sort` is `False` by default now).
![Alt text](06-image-01.png)

In [None]:
dft 

Unnamed: 0,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
1,JaneDoe@email.com,Jane,Doe
2,JohnDoe@email.com,John,Doe


But the data was not added to `dft`.  
Unlike `drop()`, there is no `inplace=True` option here,  
so we have to assign the result back.

We also need `ignore_index=True`; otherwise pandas keeps both original indexes and we get duplicate index labels, as in the append output above.

In [None]:
dft = dft._append(dfp, ignore_index=True)
dft

Unnamed: 0,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
1,JaneDoe@email.com,Jane,Doe
2,JohnDoe@email.com,John,Doe
3,ratan@tata.com,Ratan,Tata
4,AbdulKalam@isro.com,Abdul,Kalam


Now the index runs from 0 to 4 instead of 0, 1, 2, 0, 1.
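
For comparison, here is a hedged sketch of the same combination with the public `pd.concat()` function. The DataFrames `left` and `right` and their contents are made up for this example; they only mimic the situation above, where the columns are in a different order.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns in a different order
left = pd.DataFrame({"email": ["a@example.com"], "first": ["A"], "last": ["One"]})
right = pd.DataFrame({"first": ["B"], "last": ["Two"], "email": ["b@example.com"]})

# Without ignore_index, both rows keep their original label 0
stacked = pd.concat([left, right])

# With ignore_index=True, the rows are renumbered 0, 1
combined = pd.concat([left, right], ignore_index=True)
```

Columns are matched by name, not by position, so the differing column order does not mix up the data.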

If we want to <mark>drop a single row</mark>, we can pass the row's index label:

In [None]:
dft.drop(index=3, inplace=True)  # inplace=True modifies dft directly
dft

Unnamed: 0,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
1,JaneDoe@email.com,Jane,Doe
2,JohnDoe@email.com,John,Doe
4,AbdulKalam@isro.com,Abdul,Kalam


We can also use a filter to delete every row that matches a condition:

In [None]:
dft['last'] == 'Doe'

0    False
1     True
2     True
4    False
Name: last, dtype: bool

In [None]:
filt = dft[dft['last'] == 'Doe']  # the rows that match the condition
filt

Unnamed: 0,email,first,last
1,JaneDoe@email.com,Jane,Doe
2,JohnDoe@email.com,John,Doe


In [None]:
dft.drop(index=filt.index)  # returns a new DataFrame; dft itself is unchanged

Unnamed: 0,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
4,AbdulKalam@isro.com,Abdul,Kalam
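
An alternative worth knowing, sketched here as an assumption rather than something from the video: instead of dropping the matching rows by index, you can keep the rows that do not match by negating the boolean mask with `~`. The names `demo`, `mask`, `kept`, and `dropped` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for dft, rebuilt here so the snippet runs on its own
demo = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

# Boolean mask: True for every row whose last name is 'Doe'
mask = demo["last"] == "Doe"

# Keep only the rows where the mask is False
kept = demo[~mask]

# Same result using drop() with the index of the matching rows
dropped = demo.drop(index=demo[mask].index)
```

Both approaches return a new DataFrame and leave `demo` unchanged.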


<div class="alert alert-block alert-success">
<b>Info:</b> 
The video does it like this:  

![Alt text](06-image-02.png)


</div>

---
---
---