# Add a column to a `DataFrame` for EDA

## Summary

* Objective - Add a column to a `DataFrame` without errors
* Dataset - A small practice dataset with 100 rows
* Instruction - A new column must have the same length as the `DataFrame`

## Objective

Do you always get stuck during EDA because an error appears when you add a column to a `DataFrame`?

If so, pay attention to the length and type of your data to avoid errors.

## Dataset

I created a dataset for this practice. It has two columns: one holds integers between 0 and 99, and the other holds lowercase letters.

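```python
import pandas as pd
import numpy as np

# Fix the random seed so the generated data is reproducible
np.random.seed(42)

letters = list('abcdefghijklmnopqrstuvwxyz')  # 'a' to 'z'
numbers = list(range(100))                    # 0 to 99

# Draw 100 random numbers and 100 random letters (with replacement)
practice = pd.DataFrame()
practice['num'] = [np.random.choice(numbers) for _ in range(100)]
practice['let'] = [np.random.choice(letters) for _ in range(100)]

# Save without the index; this assumes the ../data directory exists
practice.to_csv('../data/practice.csv', index=False)
```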

```python
practice = pd.read_csv('../data/practice.csv')
practice.head()
```

|   | num | let |
|---|-----|-----|
| 0 | 51 | z |
| 1 | 92 | y |
| 2 | 14 | m |
| 3 | 71 | i |
| 4 | 60 | o |


## Instruction

Make a list that has the same length as the `DataFrame`.

Whenever you add a column to a `DataFrame`, remember that the column must have the same length as the dataset.
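The sketch below illustrates this rule. It uses a small hypothetical three-row `DataFrame` named `df` in place of the practice data. A list with one value per row is accepted. A shorter list makes pandas raise a `ValueError`.

```python
import pandas as pd

# Hypothetical three-row DataFrame (not the practice dataset)
df = pd.DataFrame({'num': [10, 60, 80]})

# One value per row: the assignment succeeds
df['ok'] = ['x', 'y', 'z']

# Only two values for three rows: pandas rejects the assignment
try:
    df['bad'] = ['x', 'y']
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)

print(len(df), mismatch_raised)
```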

```python
# Check the length of the DataFrame
len(practice)
```

```
100
```

This `DataFrame` has 100 rows, so let's make lists whose length is 100.

---

### Challenge 1 (Success)

Now I want to make a column whose values are `a` followed by the value in the `let` column.

```python
# Make a list that starts with `a` followed by each value of `let`
a = ['a' + l for l in practice['let']]
print(len(a))
a[:5]
```

```
100
['az', 'ay', 'am', 'ai', 'ao']
```

List `a` has a length of 100, so you can add it to the `DataFrame`.

```python
practice['a'] = a
practice.head()
```

|   | num | let | a |
|---|-----|-----|---|
| 0 | 51 | z | az |
| 1 | 92 | y | ay |
| 2 | 14 | m | am |
| 3 | 71 | i | ai |
| 4 | 60 | o | ao |
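The list comprehension is one option. pandas can also concatenate a string with a string `Series` element by element, so the result has one value per row by construction. Here is a minimal sketch, using a hypothetical three-row stand-in for `practice`:

```python
import pandas as pd

# Hypothetical stand-in for the `let` column of the practice data
practice = pd.DataFrame({'let': ['z', 'y', 'm']})

# 'a' + Series concatenates element-wise and returns a Series
# with the same index (and therefore the same length) as practice
practice['a'] = 'a' + practice['let']
print(practice)
```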


---

### Challenge 2 (Fail)

I want to make a column that stores `high` where `num` is over `50`.

```python
high = ['high' for num in practice['num'] if num > 50]

print(len(high))
high[:5]
```

```
55
['high', 'high', 'high', 'high', 'high']
```

You cannot add `high` because its length is 55, not 100. To add it, you need to modify the list so that it has 100 elements.

```python
# This raises a ValueError: `high` has 55 elements, but the DataFrame has 100 rows
practice['high'] = high
```

```python
high = ['high' if num > 50 else 'low' for num in practice['num']]

print(len(high))
high[:5]
```

```
100
['high', 'high', 'low', 'high', 'high']
```

This list can be added. The first `high` contained values only for rows where `num` is over 50, so it was too short and the assignment failed. The second `high` contains a value for every row: `high` where `num` is over 50 and `low` otherwise (50 or below).
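The length rule applies to lists and NumPy arrays. If you assign a pandas `Series` instead, pandas aligns it on the index rather than checking the length. A filtered `Series` therefore does not raise an error. The rows it does not cover are silently filled with `NaN`. The following small sketch uses the first five `num` values shown above:

```python
import pandas as pd

practice = pd.DataFrame({'num': [51, 92, 14, 71, 60]})

# A Series holding 'high' only for the index labels where num > 50
high_only = pd.Series('high', index=practice.loc[practice['num'] > 50].index)

# Series assignment aligns on the index: row 2 (num == 14) has no
# matching label in high_only, so it becomes NaN instead of raising
practice['high'] = high_only
print(practice)
```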

```python
practice['high'] = high
practice.head()
```

|   | num | let | a | high |
|---|-----|-----|---|------|
| 0 | 51 | z | az | high |
| 1 | 92 | y | ay | high |
| 2 | 14 | m | am | low |
| 3 | 71 | i | ai | high |
| 4 | 60 | o | ao | high |
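Another way to build a full-length column is NumPy's `np.where`. It evaluates the condition on every row and returns `'high'` or `'low'` for each one. Here is a minimal sketch, assuming the first five `num` values shown above plus `50` as a boundary case:

```python
import numpy as np
import pandas as pd

# First five num values from the practice data, plus 50 as a boundary case
practice = pd.DataFrame({'num': [51, 92, 14, 71, 60, 50]})

# np.where picks 'high' where the condition is True and 'low' elsewhere,
# returning an array with exactly len(practice) elements
practice['high'] = np.where(practice['num'] > 50, 'high', 'low')
print(practice)
```

The value 50 maps to `low` because the condition is strictly greater than 50, which matches the list comprehension.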


### Challenge 3

I want to add a column that contains the capitalized value of `let`.

```python
# This raises an AttributeError: a Series has no upper() method
practice['cap'] = practice['let'].upper()
```

This raises an `AttributeError` because `upper()` is a method of Python strings, but `practice['let']` is a `pd.Series`.

So let's build a list with a list comprehension, which lets us apply `upper()` to each letter.

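```python
practice['cap'] = [l.upper() for l in practice['let']]
practice.head()
```

|   | num | let | a | high | cap |
|---|-----|-----|---|------|-----|
| 0 | 51 | z | az | high | Z |
| 1 | 92 | y | ay | high | Y |
| 2 | 14 | m | am | low | M |
| 3 | 71 | i | ai | high | I |
| 4 | 60 | o | ao | high | O |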
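pandas also provides the `.str` accessor. It applies string methods such as `upper()` to every element of a `Series` and returns a `Series` of the same length. Here is a minimal sketch, using a hypothetical five-row stand-in for `let` built from the values shown above:

```python
import pandas as pd

# Stand-in for the first five values of the `let` column
practice = pd.DataFrame({'let': ['z', 'y', 'm', 'i', 'o']})

# .str.upper() upper-cases each element; the result keeps the same index
practice['cap'] = practice['let'].str.upper()
print(practice)
```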
