# SQL UNION ALL behaviour

The default behaviour of `pandas.concat` is not to remove duplicates, which matches SQL `UNION ALL`!
Use `ignore_index=True` to make sure the index gets reset in the new DataFrame.
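
To see the SQL side of the analogy, here is a minimal sketch. It loads two small tables into an in-memory SQLite database, using Python's built-in `sqlite3` module and `DataFrame.to_sql`, then runs both `UNION ALL` and `UNION`. The table names `t1` and `t2` are arbitrary choices for this example, and row order is not guaranteed without an `ORDER BY`.

```python
import sqlite3

import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'mary'], 'age': [24, 45]})
df2 = pd.DataFrame({'name': ['mary', 'john'], 'age': [45, 89]})

# In-memory SQLite database; t1 and t2 are arbitrary table names for this sketch
conn = sqlite3.connect(':memory:')
df1.to_sql('t1', conn, index=False)
df2.to_sql('t2', conn, index=False)

# UNION ALL keeps every row, like pd.concat
union_all = pd.read_sql_query(
    'SELECT name, age FROM t1 UNION ALL SELECT name, age FROM t2', conn)

# UNION removes duplicate rows, like pd.concat(...).drop_duplicates()
union = pd.read_sql_query(
    'SELECT name, age FROM t1 UNION SELECT name, age FROM t2', conn)

conn.close()

print(len(union_all))  # 4
print(len(union))      # 3
```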

In [3]:
import pandas as pd

In [13]:
df1 = pd.DataFrame({
    'name': ['john', 'mary'],
    'age': [24, 45]
})

df2 = pd.DataFrame({
    'name': ['mary', 'john'],
    'age': [45, 89]
})

In [5]:
df1

|   | name | age |
|---|------|-----|
| 0 | john | 24  |
| 1 | mary | 45  |


In [6]:
df2

|   | name | age |
|---|------|-----|
| 0 | mary | 45  |
| 1 | john | 89  |


In [8]:
pd.concat([df1, df2])
## Without ignore_index=True, the original indices are kept (note the repeated 0 and 1)

|   | name | age |
|---|------|-----|
| 0 | john | 24  |
| 1 | mary | 45  |
| 0 | mary | 45  |
| 1 | john | 89  |


In [9]:
pd.concat([df1, df2], ignore_index=True)

|   | name | age |
|---|------|-----|
| 0 | john | 24  |
| 1 | mary | 45  |
| 2 | mary | 45  |
| 3 | john | 89  |


Union of DataFrames 1 and 2: note that the index was reset and the duplicate row (`mary`, 45) was NOT removed.
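
To see exactly which rows a SQL `UNION` would drop, `DataFrame.duplicated()` marks every row that repeats an earlier one. By default it compares all columns. The short sketch below rebuilds the same two DataFrames so it runs on its own. `john` appears twice but with different ages, so only the second `(mary, 45)` row counts as a duplicate.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'mary'], 'age': [24, 45]})
df2 = pd.DataFrame({'name': ['mary', 'john'], 'age': [45, 89]})

union_all = pd.concat([df1, df2], ignore_index=True)

# True for each row that is an exact copy (all columns) of an earlier row
mask = union_all.duplicated()
print(mask.tolist())    # [False, False, True, False]

# Show only the repeated rows
print(union_all[mask])
```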

# SQL UNION behaviour
In SQL, `UNION` (without `ALL`) removes duplicate rows. To get the same result in pandas, chain `drop_duplicates()` after `concat`:

In [14]:
pd.concat([df1, df2], ignore_index=True).drop_duplicates()

|   | name | age |
|---|------|-----|
| 0 | john | 24  |
| 1 | mary | 45  |
| 3 | john | 89  |


The duplicate row is gone, but the index now has a gap (0, 1, 3), so it must be reset:

In [18]:
pd.concat([df1, df2], ignore_index=True).drop_duplicates().reset_index()

|   | index | name | age |
|---|-------|------|-----|
| 0 | 0     | john | 24  |
| 1 | 1     | mary | 45  |
| 2 | 3     | john | 89  |


As we can see, without `drop=True`, `reset_index()` keeps the old index as a new `index` column, so we end up with two index columns. Passing `drop=True` discards the old index instead:

In [19]:
pd.concat([df1, df2], ignore_index=True).drop_duplicates().reset_index(drop=True)

|   | name | age |
|---|------|-----|
| 0 | john | 24  |
| 1 | mary | 45  |
| 2 | john | 89  |
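
Since pandas 1.0, `drop_duplicates()` also accepts `ignore_index=True`. This relabels the result 0..n-1, which makes the separate `reset_index(drop=True)` step unnecessary. The sketch below wraps the whole SQL `UNION` pattern in a small helper. The name `sql_union` is hypothetical and not part of pandas. `drop_duplicates()` keeps the first occurrence of each row by default (`keep='first'`).

```python
import pandas as pd


def sql_union(*frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack frames, then drop duplicate rows like SQL UNION."""
    # ignore_index=True in both calls yields a clean 0..n-1 index
    return pd.concat(frames, ignore_index=True).drop_duplicates(ignore_index=True)


df1 = pd.DataFrame({'name': ['john', 'mary'], 'age': [24, 45]})
df2 = pd.DataFrame({'name': ['mary', 'john'], 'age': [45, 89]})

result = sql_union(df1, df2)
print(result.index.tolist())  # [0, 1, 2]
```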


# Concatenate side-by-side

In [20]:
pd.concat([df1, df2], axis=1)

|   | name | age | name | age |
|---|------|-----|------|-----|
| 0 | john | 24  | mary | 45  |
| 1 | mary | 45  | john | 89  |


Side-by-side concatenation of DataFrames 1 and 2:
pandas will not warn you if the two DataFrames
have columns with the same name, so the result
ends up with duplicate `name` and `age` columns!
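
The sketch below shows two ways to guard against duplicate column names. The first is to check the result with `columns.duplicated()`. The second is to pass `keys=` to `pd.concat`, so each input's columns sit under their own top-level label in a MultiIndex. The labels `'df1'` and `'df2'` are arbitrary choices for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'mary'], 'age': [24, 45]})
df2 = pd.DataFrame({'name': ['mary', 'john'], 'age': [45, 89]})

side_by_side = pd.concat([df1, df2], axis=1)
# Duplicate column labels are allowed, so check for them explicitly
print(side_by_side.columns.tolist())            # ['name', 'age', 'name', 'age']
print(side_by_side.columns.duplicated().any())  # True

# keys= builds MultiIndex columns, keeping the two sources apart
labelled = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
print(labelled.columns.tolist())
# [('df1', 'name'), ('df1', 'age'), ('df2', 'name'), ('df2', 'age')]

# Select one source's columns by its top-level key
print(labelled['df2']['age'].tolist())  # [45, 89]
```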