# Preparing Data: Adding Columns
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo25_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import seaborn as sns
import re
sns.set_style("darkgrid")
import warnings 
warnings.filterwarnings('ignore') 

## Adding independent columns

### Method 1: Add to the end

In [None]:
# Creating animals dataframe
animals_dct = {
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
}

animals = pd.DataFrame(animals_dct)
animals

In [None]:
# add a column on the right (broadcasting)

animals
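
The cell above is left blank for live coding. One possible way to fill it in is to assign a single scalar to a new column name, which pandas broadcasts to every row. The column name `Kingdom` and its value are hypothetical choices for illustration, not part of the original demo.

```python
import pandas as pd

# Rebuild the animals DataFrame so this example runs on its own
animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# A single scalar is broadcast to every row of the new column.
# 'Kingdom' is a hypothetical column name used only for illustration.
animals['Kingdom'] = 'Animalia'
animals
```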

In [None]:
# add a column on the right (list)
animals = pd.DataFrame(animals_dct)

animals
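
A sketch of the list version: the list must have exactly one value per row, in row order, or pandas raises a `ValueError`. The `Legs` column and its values are illustrative assumptions.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# One list element per row, matched to rows by position.
# 'Legs' is a hypothetical column used only for illustration.
animals['Legs'] = [4, 4, 2, 4]
animals
```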

### Method 2: Insert the column at a specific position

In [None]:
# Insert at a specific position with .insert(loc, column, value)
animals = pd.DataFrame(animals_dct)

animals
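
A possible fill-in using `DataFrame.insert(loc, column, value)`, which modifies the DataFrame in place (it returns `None`) and places the new column at integer position `loc`. The `Kingdom` column is a hypothetical example. Because `.insert()` raises a `ValueError` when the column name already exists, re-running it on the same DataFrame fails, which is presumably why each cell rebuilds `animals` first.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# Insert a broadcast scalar column at position 1 (between Animal and Sound).
# .insert() works in place, so there is no reassignment here.
animals.insert(1, 'Kingdom', 'Animalia')
animals
```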

In [None]:
# or do it with a list
animals = pd.DataFrame(animals_dct)

animals
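
The same method accepts a list (one value per row) as `value`. A minimal sketch, again with a hypothetical `Legs` column, inserted as the first column:

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# Position 0 makes 'Legs' the leftmost column
animals.insert(0, 'Legs', [4, 4, 2, 4])
animals
```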

### Method 3: Add more than one column at once

In [None]:
# create two columns at once


In [None]:
animals
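
One option for creating two columns at once (others exist) is `DataFrame.assign`, which takes one keyword argument per new column and returns a new DataFrame rather than modifying the original, so the result must be reassigned. The column names and values below are illustrative assumptions.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# Each keyword becomes a column; lists and scalars are both accepted.
# assign returns a copy, so reassign to keep the new columns.
animals = animals.assign(Legs=[4, 4, 2, 4], Kingdom='Animalia')
animals
```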

## Adding columns based on other columns

In [None]:
# Making a bool column based on condition

animals.head()
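
A sketch of a Boolean column: comparing a column to a value is evaluated element-wise and yields a Series of `True`/`False`, which can be stored directly. The `Is_Bird` name and the condition are hypothetical.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# Element-wise comparison produces a bool Series
animals['Is_Bird'] = animals['Animal'] == 'penguin'
animals.head()
```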

In [None]:
# Making a categorical column based on another categorical column with map

animals.head()
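
A possible `Series.map` version: each value is looked up in a dictionary. Keys are matched exactly (so `'Puppy'` needs its capital P), and any value missing from the dictionary becomes `NaN`. The `animal_class` dictionary and `Class` column are illustrative assumptions.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# Hypothetical lookup table from animal name to class
animal_class = {'cow': 'mammal', 'kitten': 'mammal',
                'penguin': 'bird', 'Puppy': 'mammal'}

# Values not found in the dict would become NaN
animals['Class'] = animals['Animal'].map(animal_class)
animals.head()
```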

In [None]:
# applying a function to everything in another column

animals.head()
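
A sketch with `Series.apply`, which calls a function once per value and collects the results into a new Series. The helper `shout` and the `Loud_Sound` column are hypothetical.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

def shout(sound):
    """Return the sound in upper case with an exclamation mark (hypothetical helper)."""
    return sound.upper() + '!'

# apply passes each element of 'Sound' to shout
animals['Loud_Sound'] = animals['Sound'].apply(shout)
animals.head()
```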

In [None]:
# Creating new columns with element-wise arithmetic

animals.head()
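
The original `animals` table has no numeric columns, so this sketch first adds two hypothetical ones (`Legs` and `Count`) and then multiplies them. Arithmetic between two columns is applied row by row, aligned on the index.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# Hypothetical numeric columns for illustration
animals['Legs'] = [4, 4, 2, 4]
animals['Count'] = [3, 2, 5, 1]

# Row-by-row product of two columns
animals['Total_Legs'] = animals['Legs'] * animals['Count']
animals.head()
```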

In [None]:
# With string methods

animals.head()
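
A sketch using the `.str` accessor, which applies string methods to every element of a text column. Lowercasing also fixes the inconsistent capitalization of `'Puppy'`. Both new column names are hypothetical.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# Vectorized string methods via .str
animals['Animal_Lower'] = animals['Animal'].str.lower()
animals['Sound_Length'] = animals['Sound'].str.len()
animals.head()
```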

In [None]:
# With list comprehension

animals.head()
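
A possible list-comprehension version: building a plain Python list with one entry per row and assigning it as a column. `zip` walks two columns in parallel; the `Says` column is hypothetical.

```python
import pandas as pd

animals = pd.DataFrame({
    'Animal': ['cow', 'kitten', 'penguin', 'Puppy'],
    'Sound': ['moo', 'purr', 'chirp', 'bark'],
})

# One string per row, combining two columns
animals['Says'] = [f'The {a} says {s}'
                   for a, s in zip(animals['Animal'], animals['Sound'])]
animals.head()
```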

## Activity

Let's revisit the Titanic data from Wednesday.

In [None]:
# load titanic data 
titanic = sns.load_dataset('titanic')
titanic.head()

**Activity 1:** Create a new column called `my_alive` based on the original `survived` column. It should be "no" when `survived` is 0, and "yes" when `survived` is 1. Your goal is to make it match the `alive` column which was added by the Seaborn creators. 

In [None]:
# run this to check your work
(titanic['my_alive'] == titanic['alive']).all()

**Activity 2:** Create a new column called `my_alone` based on the original `sibsp` and `parch` columns. It should be False if the passenger had any family members on board, True otherwise. Your goal is to make it match the `alone` column which was added by the Seaborn creators. 

In [None]:
# run this to check your work
(titanic['my_alone'] == titanic['alone']).all()

**Activity 3 *CHALLENGE*:** Create a new column called `my_who` based on the original `sex` and `age` columns. If the passenger is under 16, they should be labelled "child"; otherwise, they should be labelled "man" if their `sex` is "male" and "woman" if their `sex` is "female". Your goal is to make it match the `who` column which was added by the Seaborn creators. 

*HINT*: You might find it helpful to split this one up and solve in multiple lines of code. 

In [None]:
# run this to check your work
(titanic['my_who'] == titanic['who']).all()

### Code for Discussion Questions

In [None]:
flights = pd.read_csv('flight_delays.csv')
flights.head()

In [None]:
flights['Year'] = flights['Flight_Date'].str.split("-").str[0]
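
The split-based `Year` column above is a string taken from the text before the first hyphen. A hedged alternative, assuming `Flight_Date` holds dates in a `YYYY-MM-DD`-style format (the real file's format is not shown here), is to parse the dates with `pd.to_datetime` and take `.dt.year`, which gives integers. The three-row DataFrame below is made-up sample data, not the real flight file.

```python
import pandas as pd

# Made-up sample rows standing in for flight_delays.csv
flights = pd.DataFrame({
    'Flight_Date': ['2015-01-03', '2016-07-21', '2017-12-31'],
    'Departure_Delay_Minutes': [5, -3, 42],
})

# String approach from the notebook: text before the first '-'
flights['Year_Str'] = flights['Flight_Date'].str.split('-').str[0]

# Date-parsing approach: numeric year from a datetime column
flights['Year'] = pd.to_datetime(flights['Flight_Date']).dt.year
flights
```

Either column works as a facet label; the integer version also supports numeric comparisons such as `flights['Year'] >= 2016`.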

In [None]:
flights = flights.sort_values(by='Year')

In [None]:
sns.set(font_scale=1.8)
fig = sns.displot(data=flights, x='Departure_Delay_Minutes', col='Year', binwidth=5)
fig.set(xlim=(-20, 100))
plt.tight_layout()