# Adding a Column to a DataFrame

New columns can be added to a pandas dataframe using the `apply()` function or direct column operations.



## How to create a new column in an existing pandas dataframe

In this article, you will learn how to add a new column to an existing dataframe using:

*   Pandas apply function.
*   Direct operations on columns.

Let's create a Covid-19 dataframe. 

In [None]:
# Importing library
import pandas as pd

# Create a dataframe of Covid-19 symptoms
df = pd.DataFrame({'Name': ['A','B','C','D'],
                   'Gender': ['M','F','F','M'], 
                   'Cough': [0,1,0,1], 
                   'Fever': [1,0,0,0],
                   'Headache': [1,0,1,1],
                   'Dyspnea': [1,0,1,1],
                   'Contact_with_Covid_patient': [1,0,1,0] })

# Here M: Male, F: Female, 0: No, 1: Yes
df

Unnamed: 0,Name,Gender,Cough,Fever,Headache,Dyspnea,Contact_with_Covid_patient
0,A,M,0,1,1,1,1
1,B,F,1,0,0,0,0
2,C,F,0,0,1,1,1
3,D,M,1,0,1,1,0


## 1. Create a new column 'Risk Coefficient' using the pandas apply function

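Let's look at the syntax of the `apply()` function.

### DataFrame.apply()

`DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)`

__Purpose:__ apply a function along an axis of a dataframe, that is, to each column or to each row. Assigning the result to a new column name adds that column to the dataframe.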

__Parameters:__

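 __- func:__ function to apply to each column or row.

 __- axis:__  0 or ‘index’ applies the function to each column (the default). 1 or ‘columns’ applies the function to each row.

 __- raw:__ determines whether each row or column is passed as a Series (`False`, the default) or as an ndarray (`True`).

 __- result_type:__ ‘expand’, ‘reduce’, ‘broadcast’ or None. Only acts when axis = 1.

 __- args:__ positional arguments to pass to func in addition to the row or column.

 __- kwargs:__ additional keyword arguments to pass to func.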
 

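__Returns:__ a pandas Series or DataFrame, depending on what func returns. With axis = 1 and a function that returns a single value per row, the result is a Series.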
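Before applying this to the Covid-19 dataframe, here is a minimal sketch of how `axis` and `args` behave. The small frame `demo` and the helper `weighted_sum` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical two-column frame, used only to illustrate the parameters
demo = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives one column at a time,
# so the result has one entry per column
col_totals = demo.apply(lambda col: col.sum())

# axis=1: the function receives one row at a time,
# so the result has one entry per row
row_totals = demo.apply(lambda row: row.sum(), axis=1)

# Hypothetical helper: extra positional arguments are forwarded through `args`
def weighted_sum(row, w_a, w_b):
    return row['a'] * w_a + row['b'] * w_b

weighted = demo.apply(weighted_sum, axis=1, args=(2, 0.5))

print(col_totals.to_dict())   # {'a': 6, 'b': 60}
print(row_totals.tolist())    # [11, 22, 33]
print(weighted.tolist())      # [7.0, 14.0, 21.0]
```

A named function with `args` is often easier to read and test than a long lambda, although both are called in the same way by `apply()`.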

In [None]:
# Create new column using apply function

df['Risk_coef'] = df.apply(lambda row: (row.Cough*0.1) + (row.Fever*0.3) + (row.Headache*0.1) + (row.Dyspnea*0.1) + 
                  (row.Contact_with_Covid_patient*0.4), axis = 1)
df

Unnamed: 0,Name,Gender,Cough,Fever,Headache,Dyspnea,Contact_with_Covid_patient,Risk_coef
0,A,M,0,1,1,1,1,0.9
1,B,F,1,0,0,0,0,0.1
2,C,F,0,0,1,1,1,0.6
3,D,M,1,0,1,1,0,0.3


Because `axis = 1`, the lambda function is called once per row, and the values it returns form the new column.

## 2. Create a new column 'Risk Coefficient' using column operations

In [None]:
# Create new column using column operations
df['Risk_coef'] = df['Cough']*0.1 + df['Fever']*0.3 + df['Headache']*0.1 + df['Dyspnea']*0.1 + df['Contact_with_Covid_patient']*0.4
df

Unnamed: 0,Name,Gender,Cough,Fever,Headache,Dyspnea,Contact_with_Covid_patient,Risk_coef
0,A,M,0,1,1,1,1,0.9
1,B,F,1,0,0,0,0,0.1
2,C,F,0,0,1,1,1,0.6
3,D,M,1,0,1,1,0,0.3


The new column `Risk_coef` is added to the dataframe, with the same values that `apply()` produced.
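When the formula is a weighted sum like this one, the weights can also be kept in one place and combined with the columns in a single step. The sketch below is one possible variation, not the method used in this article: it assumes the weights are stored in a Series named `weights` (a hypothetical name) and uses `DataFrame.dot()` to compute the row-wise weighted sum.

```python
import pandas as pd

# Symptom columns from the Covid-19 dataframe above
df = pd.DataFrame({'Cough': [0, 1, 0, 1],
                   'Fever': [1, 0, 0, 0],
                   'Headache': [1, 0, 1, 1],
                   'Dyspnea': [1, 0, 1, 1],
                   'Contact_with_Covid_patient': [1, 0, 1, 0]})

# Same weights as in the formulas above, keyed by column name
weights = pd.Series({'Cough': 0.1, 'Fever': 0.3, 'Headache': 0.1,
                     'Dyspnea': 0.1, 'Contact_with_Covid_patient': 0.4})

# Dot product of the symptom columns with the weights: one value per row
df['Risk_coef'] = df[weights.index].dot(weights)

print(df['Risk_coef'].round(2).tolist())   # [0.9, 0.1, 0.6, 0.3]
```

Keeping the weights in a Series means that changing a weight or adding a symptom only touches one line, at the cost of a slightly less explicit formula.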

## 3. Practical Tips

* Avoid `apply()` with `axis = 1` on large dataframes where possible. It calls a Python function once per row and builds a Series for each row, which adds time and memory overhead compared with vectorized column operations.
* When dividing one column by another, watch for zeros in the denominator: a non-zero number divided by 0 gives `inf` (or `-inf` for a negative numerator), whereas `0/0` gives `NaN`.
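Both tips can be checked with a short experiment. The sketch below uses a hypothetical frame `big` and made-up column names `x` and `y`. The timings depend on the machine and the pandas version, so no specific numbers are claimed here.

```python
import time
import numpy as np
import pandas as pd

# Hypothetical larger frame, used only for a rough timing comparison
big = pd.DataFrame({'x': np.arange(20_000), 'y': np.arange(20_000)})

# Row-wise apply: one Python function call per row
start = time.perf_counter()
slow = big.apply(lambda row: row.x * 0.1 + row.y * 0.3, axis=1)
apply_seconds = time.perf_counter() - start

# Column operations: the arithmetic runs on whole columns at once
start = time.perf_counter()
fast = big['x'] * 0.1 + big['y'] * 0.3
vector_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.3f}s, column operations: {vector_seconds:.3f}s")

# Division by zero: non-zero / 0 gives inf, 0 / 0 gives NaN
ratios = pd.Series([5, 0, 6]) / pd.Series([0, 0, 3])
print(ratios.tolist())   # [inf, nan, 2.0]

# One possible cleanup: treat inf as a missing value as well
cleaned = ratios.replace([np.inf, -np.inf], np.nan)
print(cleaned.isna().tolist())   # [True, True, False]
```

Whether `inf` should be replaced with `NaN`, with 0, or left as it is depends on what the ratio means in the analysis, so the cleanup step is only one option.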

## 4. Test Your Knowledge

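__Q1.__ Is it necessary to use `axis = 1` in the `apply()` call that creates `Risk_coef`?

__Ans.__ Yes. The default, `axis = 0`, passes each column to the function. With `axis = 1` the function receives each row, so it can read values such as `row.Cough` and `row.Fever` from that row.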



__Q2.__ Which pandas function lets you manipulate data and create new variables?

`A. read_csv()` `B. merge()` `C. apply()` `D. pivot_table()`

__Ans.__ C. `apply()`



__Q3.__ What would be the value in the first row of column C3 (`NaN` or `inf`)?

In [None]:
import pandas as pd

df = pd.DataFrame({'C1': [0,20,30],'C2': [0,2,3]})
df['C3'] = df['C1']/df['C2']

__Ans.__ `NaN`, because the first row computes `0/0`.

In [None]:
df

Unnamed: 0,C1,C2,C3
0,0,0,NaN
1,20,2,10.0
2,30,3,10.0
