**06: Entering Data - Adding & Removing Rows/Columns**
- Combining Columns

In [22]:
import pandas as pd

In [24]:
students = {
    'names': ['Tom', 'Bob', 'Jane', 'May'],
    'age': [9, 10, 10, 9],
    'subjects': ['Science', 'Arts', 'Hybrid', 'Arts'],
    'award winner': [True, False, False, True]
}

In [117]:
df = pd.DataFrame(students)
df

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True


***
Combining & Splitting Columns:
- Combining is done through concatenation with +, just like with strings.
- Splitting is done through the .str.split method; passing expand=True returns the pieces as separate columns.

In [108]:
df['intro'] = df['names'] + ', ' + df['subjects']
#concatenation - only works if every operand is a string; convert other dtypes with .astype(str) first

In [110]:
df

,names,age,subjects,award winner,intro
0,Tom,9,Science,True,"Tom, Science"
1,Bob,10,Arts,False,"Bob, Arts"
2,Jane,10,Hybrid,False,"Jane, Hybrid"
3,May,9,Arts,True,"May, Arts"
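The string-only restriction on concatenation can be worked around by converting the other column first. A minimal sketch, assuming a cut-down copy of the students data and a hypothetical 'label' column that is not part of the original notes:

```python
import pandas as pd

df = pd.DataFrame({
    'names': ['Tom', 'Bob', 'Jane', 'May'],
    'age': [9, 10, 10, 9],
})

# 'age' holds integers, so it is converted to strings before concatenating.
# Adding the raw integer column to a string column would raise an error.
# 'label' is a hypothetical column name used only for this illustration.
df['label'] = df['names'] + ', ' + df['age'].astype(str)

print(df['label'].tolist())
```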


In [116]:
df['intro'].str.split(', ', expand=True)
#split on ', ' (comma plus space) so the second column has no leading space
#returns a dataframe (2 columns)
#to write the pieces back into the original dataframe, assign the result to df[['names', 'subjects']]

,0,1
0,Tom,Science
1,Bob,Arts
2,Jane,Hybrid
3,May,Arts
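Writing the split pieces back into named columns might look like the sketch below. The column labels 'first_name' and 'subject' are hypothetical, chosen so the example does not overwrite anything:

```python
import pandas as pd

df = pd.DataFrame({'intro': ['Tom, Science', 'Bob, Arts']})

# Split on the full ', ' separator, then assign the two resulting
# columns to two labels in a single statement.
df[['first_name', 'subject']] = df['intro'].str.split(', ', expand=True)

print(df)
```

The list of labels on the left must have the same length as the number of columns produced by the split.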


***
Adding Columns:
- Simply assign to a new column label. This adds the column in the last position. If the column already exists, it is updated instead.
- To specify a position, use .insert instead -- .insert(position, label, values)
- allow_duplicates is a parameter accepted by .insert that lets it create a column whose label already exists; without it, a duplicate label raises a ValueError.
- Both approaches modify the original dataframe

In [93]:
df['class'] = ['1A', '2A', '2B', '1B']
df

,names,age,subjects,award winner,class
0,Tom,9,Science,True,1A
1,Bob,10,Arts,False,2A
2,Jane,10,Hybrid,False,2B
3,May,9,Arts,True,1B


In [95]:
df.insert(2, 'class', ['1A', '2A', '2B', '1B'], allow_duplicates=True)
df

,names,age,class,subjects,award winner,class
0,Tom,9,1A,Science,True,1A
1,Bob,10,2A,Arts,False,2A
2,Jane,10,2B,Hybrid,False,2B
3,May,9,1B,Arts,True,1B
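For contrast, here is a small sketch of what .insert does when allow_duplicates is left at its default. The two-row dataframe is a made-up reduction of the students data:

```python
import pandas as pd

df = pd.DataFrame({'names': ['Tom', 'Bob'], 'age': [9, 10]})

# Insert a new column at position 1 (between 'names' and 'age').
df.insert(1, 'class', ['1A', '2A'])

duplicate_error = None
try:
    # The label 'class' now exists, so a second insert is rejected
    # unless allow_duplicates=True is passed.
    df.insert(0, 'class', ['1A', '2A'])
except ValueError as exc:
    duplicate_error = str(exc)

print(list(df.columns))
print(duplicate_error)
```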


***
Removing Columns:
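- .drop method - pass a columns parameter. If no such column exists, it raises a KeyError, unless errors='ignore' is passed.
- Removes every column that shares the given label.
- Returns a new dataframe by default; the original is left unchanged
- Accepts inplace parameter. With inplace=True, the original dataframe is modified and None is returned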

In [99]:
df.drop(columns=['class'], inplace=True)
df

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True
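The behaviour of .drop without inplace, and with a label that does not exist, can be checked with a short sketch. The two-column dataframe here is an assumption made for brevity:

```python
import pandas as pd

df = pd.DataFrame({'names': ['Tom', 'Bob'], 'age': [9, 10]})

# Without inplace=True, drop returns a new dataframe and df keeps both columns.
trimmed = df.drop(columns=['age'])

missing_raised = False
try:
    df.drop(columns=['class'])  # no column named 'class'
except KeyError:
    missing_raised = True

# errors='ignore' skips labels that are not present instead of raising.
same = df.drop(columns=['class'], errors='ignore')

print(list(trimmed.columns), list(df.columns), missing_raised)
```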


***
Adding Rows:
1. ._append
   - Similar to the list.append method. This works for adding single rows (using dictionaries) and for merging dataframes.
   - Note that ._append is a private method. The public DataFrame.append was removed in pandas 2.0, and .concat is the supported replacement.
   - Creates new columns if keys do not already exist. Conversely, missing data creates NaN values.
   - ignore_index=True discards the existing index labels and renumbers the result from 0, so new rows continue after the original ones. If set to False, the appended rows keep their own index labels. Appending a dictionary requires ignore_index=True.
2. .concat
   - Concatenates a list of dataframes.
   - Accepts axis parameter (0 - stack rows vertically, 1 - place columns side by side). This is called the concatenation axis
   - Accepts join parameter. If set to 'inner', returns a dataframe including only the columns shared by all inputs (so no NaN columns are created by missing labels)
   - Accepts verify_integrity parameter. If set to True, raises a ValueError when the result would contain duplicate index labels.
   - Similarly, ignore_index=True discards the existing indexes and renumbers from 0

Both methods return a new dataframe and leave the original unchanged. To update the original, assign the result back to it (e.g. df = df._append(...)).
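A single dictionary row can also be added with the public pd.concat function alone. This is one possible sketch, using a reduced two-column dataframe as an assumption:

```python
import pandas as pd

df = pd.DataFrame({'names': ['Tom', 'Bob'], 'age': [9, 10]})
new_row = {'names': 'Larry', 'age': 11}

# Wrap the dictionary in a list to build a one-row dataframe,
# then concatenate and assign the result back to df.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(df)
```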

In [64]:
df._append({'names': 'Larry', 'age': 11, 'subjects': 'Science', 'award winner': False}, ignore_index=True)

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True
4,Larry,11,Science,False


In [66]:
df2 = pd.DataFrame({'names': ['Shaun', 'Isaac'], 'age': [19, 16]})
df._append(df2, ignore_index=True)

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True
4,Shaun,19,,
5,Isaac,16,,


In [96]:
df3 = pd.concat([df, df2], ignore_index=True)
df3

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True
4,Shaun,19,,
5,Isaac,16,,


In [101]:
df4 = pd.concat([df, df2], ignore_index=True, join = 'inner')
df4

,names,age
0,Tom,9
1,Bob,10
2,Jane,10
3,May,9
4,Shaun,19
5,Isaac,16
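The axis and verify_integrity parameters are not exercised in the cells above. A minimal sketch, assuming two small hypothetical dataframes named left and right:

```python
import pandas as pd

left = pd.DataFrame({'names': ['Tom', 'Bob']})
right = pd.DataFrame({'age': [9, 10]})

# axis=1 places the dataframes side by side, aligning rows on the index.
side_by_side = pd.concat([left, right], axis=1)

overlap_raised = False
try:
    # Stacking left on itself repeats index labels 0 and 1,
    # which verify_integrity=True rejects.
    pd.concat([left, left], verify_integrity=True)
except ValueError:
    overlap_raised = True

print(side_by_side)
print(overlap_raised)
```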


***
Removing Rows:
- .drop - pass in an index parameter with the labels of the rows to remove

In [128]:
df.drop(index=[2, 3])
#uses index labels (like loc), not integer positions (like iloc)

,names,age,subjects,award winner
0,Tom,9,Science,True
1,Bob,10,Arts,False


Conditional dropping of rows:

In [141]:
df.drop(index=df[df['subjects']=='Science'].index)
#note df['subjects'] == 'Science' is the filter itself (can be placed in a filt variable if needed)
#.index returns the index labels of the rows that match the filter

,names,age,subjects,award winner
1,Bob,10,Arts,False
2,Jane,10,Hybrid,False
3,May,9,Arts,True
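The same rows can be removed by keeping everything that does not match the filter. The sketch below compares both approaches on a reduced copy of the data; the variable name is_science is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'names': ['Tom', 'Bob', 'Jane', 'May'],
    'subjects': ['Science', 'Arts', 'Hybrid', 'Arts'],
})

is_science = df['subjects'] == 'Science'

# Approach from the notes: look up the matching index labels and drop them.
dropped = df.drop(index=df[is_science].index)

# Alternative: invert the filter with ~ and keep the remaining rows.
kept = df[~is_science]

print(dropped.equals(kept))
```

Both results keep the original index labels (1, 2, 3), since neither approach renumbers the rows.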
