# DataFrame concatenation - Part 3 - Parameters

Concat parameters:

axis=0, <br>
ignore_index (bool) = False, <br>
join='outer', <br>
keys=None, <br>
names=None, <br>
verify_integrity (bool) = False, <br>
sort (bool) = False, <br>
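As a minimal sketch of two parameters from this list, `ignore_index` and `join`, the example below uses two small frames, `left` and `right`, that are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical frames, for illustration only
left = pd.DataFrame({'Name': ['Ann', 'Bob'], 'Score': [90, 80]})
right = pd.DataFrame({'Name': ['Cid'], 'Grades': ['AA']})

# ignore_index=True discards the original row labels and renumbers from 0
renumbered = pd.concat([left, right], axis=0, ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2]

# join='inner' keeps only the columns shared by every frame
shared = pd.concat([left, right], axis=0, join='inner')
print(shared.columns.tolist())     # ['Name']
```

With the default `join='outer'`, the result would instead keep all three columns and fill the gaps with NaN.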

In [1]:
import pandas as pd

In [2]:
data1 = {
    'Name' : ['Paul', 'Aaron', 'Krista', 'Veronica', 'Paxton', 'Madison', 'Aurora'],
    'Score': [98, 89, 99, 87, 90, 83, 82]
}

data2 = {
    'Name' : ['Robert', 'Craig', 'Frank', 'Jordyn', 'Isabel'],
    'Score': [98, 89, 99, 97, 97]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

In [3]:
df1

Unnamed: 0,Name,Score
0,Paul,98
1,Aaron,89
2,Krista,99
3,Veronica,87
4,Paxton,90
5,Madison,83
6,Aurora,82


In [4]:
df2

Unnamed: 0,Name,Score
0,Robert,98
1,Craig,89
2,Frank,99
3,Jordyn,97
4,Isabel,97


## Basic concat

In [5]:
pd.concat([df1, df2], axis=0)

Unnamed: 0,Name,Score
0,Paul,98
1,Aaron,89
2,Krista,99
3,Veronica,87
4,Paxton,90
5,Madison,83
6,Aurora,82
0,Robert,98
1,Craig,89
2,Frank,99
3,Jordyn,97
4,Isabel,97


## Using keys parameter

Vertically

In [6]:
pd.concat([df1, df2], axis=0, keys=['DF1', 'DF2'])

Unnamed: 0,Unnamed: 1,Name,Score
DF1,0,Paul,98
DF1,1,Aaron,89
DF1,2,Krista,99
DF1,3,Veronica,87
DF1,4,Paxton,90
DF1,5,Madison,83
DF1,6,Aurora,82
DF2,0,Robert,98
DF2,1,Craig,89
DF2,2,Frank,99
DF2,3,Jordyn,97
DF2,4,Isabel,97


Horizontally

In [7]:
pd.concat([df1, df2], axis=1, keys=['DF1', 'DF2'])

Unnamed: 0_level_0,DF1,DF1,DF2,DF2
Unnamed: 0_level_1,Name,Score,Name,Score
0,Paul,98,Robert,98.0
1,Aaron,89,Craig,89.0
2,Krista,99,Frank,99.0
3,Veronica,87,Jordyn,97.0
4,Paxton,90,Isabel,97.0
5,Madison,83,,
6,Aurora,82,,
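With `axis=1`, the keys become the outer level of the column index, so they can be used to pull one frame's columns back out. The sketch below assumes shortened versions of `df1` and `df2` and introduces the name `wide` purely for illustration.

```python
import pandas as pd

# Shortened versions of the frames above, so the block runs on its own
df1 = pd.DataFrame({'Name': ['Paul', 'Aaron', 'Krista'], 'Score': [98, 89, 99]})
df2 = pd.DataFrame({'Name': ['Robert', 'Craig'], 'Score': [98, 89]})

wide = pd.concat([df1, df2], axis=1, keys=['DF1', 'DF2'])

# Selecting a top-level key returns that frame's columns
print(wide['DF1'])

# A (key, column) tuple selects a single column
print(wide[('DF2', 'Name')])

# Rows missing from the shorter frame are filled with NaN
print(wide[('DF2', 'Score')].isna().sum())   # 1
```

The NaN filling also explains why the `DF2` scores are shown as floats: NaN cannot be stored in the default integer dtype, so the column is upcast to float.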


## Store concatenated dataframe to a variable and make hierarchical dataframe

In [9]:
data = pd.concat([df1, df2], axis=0, keys=['DF1', 'DF2'])

data

Unnamed: 0,Unnamed: 1,Name,Score
DF1,0,Paul,98
DF1,1,Aaron,89
DF1,2,Krista,99
DF1,3,Veronica,87
DF1,4,Paxton,90
DF1,5,Madison,83
DF1,6,Aurora,82
DF2,0,Robert,98
DF2,1,Craig,89
DF2,2,Frank,99
DF2,3,Jordyn,97
DF2,4,Isabel,97


## Accessing the values in the dataframe by rows/columns

In [None]:
data.loc['DF1']

Or

In [13]:
data.loc['DF1', :]

Unnamed: 0,Name,Score
0,Paul,98
1,Aaron,89
2,Krista,99
3,Veronica,87
4,Paxton,90
5,Madison,83
6,Aurora,82


The same works for the second key:

In [14]:
data.loc['DF2']

Unnamed: 0,Name,Score
0,Robert,98
1,Craig,89
2,Frank,99
3,Jordyn,97
4,Isabel,97


In [15]:
data.loc['DF2', :]

Unnamed: 0,Name,Score
0,Robert,98
1,Craig,89
2,Frank,99
3,Jordyn,97
4,Isabel,97
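Beyond selecting a whole block by its key, a single row or cell can be reached with a `(key, row label)` tuple. The following is a minimal sketch on shortened versions of `df1` and `df2`.

```python
import pandas as pd

# Shortened versions of the frames above, so the block runs on its own
df1 = pd.DataFrame({'Name': ['Paul', 'Aaron'], 'Score': [98, 89]})
df2 = pd.DataFrame({'Name': ['Robert', 'Craig'], 'Score': [98, 89]})
data = pd.concat([df1, df2], axis=0, keys=['DF1', 'DF2'])

# A (key, row label) tuple selects one row as a Series
print(data.loc[('DF2', 1)])

# Adding a column label selects a single cell
print(data.loc[('DF1', 0), 'Name'])   # Paul

# xs takes a cross-section on the inner level: row 0 of every frame
print(data.xs(0, level=1))
```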


## Add names to hierarchical index

In [16]:
pd.concat([df1, df2], keys=['DF1', 'DF2'], names=['Level_1', 'Level_2'])

Level_1,Level_2,Name,Score
DF1,0,Paul,98
DF1,1,Aaron,89
DF1,2,Krista,99
DF1,3,Veronica,87
DF1,4,Paxton,90
DF1,5,Madison,83
DF1,6,Aurora,82
DF2,0,Robert,98
DF2,1,Craig,89
DF2,2,Frank,99
DF2,3,Jordyn,97
DF2,4,Isabel,97
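Once the index levels have names, they can be referred to by name rather than by position. The sketch below assumes shortened versions of `df1` and `df2`; the variable names `named` and `flat` are introduced here for illustration.

```python
import pandas as pd

# Shortened versions of the frames above, so the block runs on its own
df1 = pd.DataFrame({'Name': ['Paul', 'Aaron'], 'Score': [98, 89]})
df2 = pd.DataFrame({'Name': ['Robert', 'Craig'], 'Score': [98, 89]})

named = pd.concat([df1, df2], keys=['DF1', 'DF2'], names=['Level_1', 'Level_2'])
print(list(named.index.names))      # ['Level_1', 'Level_2']

# Levels can be addressed by name instead of position
print(named.xs('DF2', level='Level_1'))

# reset_index turns the named levels into ordinary columns
flat = named.reset_index()
print(flat.columns.tolist())        # ['Level_1', 'Level_2', 'Name', 'Score']
```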


## Verify integrity parameter (avoid duplicate indices)

`verify_integrity` (bool, default `False`): when `True`, raise an error if the concatenated axis would contain duplicate labels.

In [18]:
data1 = {
    'Name' : ['Paul', 'Aaron', 'Krista', 'Veronica', 'Paxton'],
    'Score': [98, 89, 99, 87, 90]
}

data2 = {
    'Name' : ['Robert', 'Craig', 'Frank', 'Jordyn', 'Isabel'],
    'Score': [98, 89, 99, 97, 97]
}

df1 = pd.DataFrame(data1, index=list(range(5)))
df2 = pd.DataFrame(data2, index=list(range(5)))

In [19]:
df1

Unnamed: 0,Name,Score
0,Paul,98
1,Aaron,89
2,Krista,99
3,Veronica,87
4,Paxton,90


In [20]:
df2

Unnamed: 0,Name,Score
0,Robert,98
1,Craig,89
2,Frank,99
3,Jordyn,97
4,Isabel,97


`verify_integrity=True` checks whether the concatenated axis would contain duplicate labels. <br>

Here both dataframes use the same row labels: df1 has 0, 1, 2, 3, 4 <br>
and df2 also has 0, 1, 2, 3, 4. <br>

Because of this, `verify_integrity=True` raises an error:

In [21]:
pd.concat([df1, df2], axis=0, verify_integrity=True)

ValueError: Indexes have overlapping values: Index([0, 1, 2, 3, 4], dtype='int64')

To avoid the error, give the dataframes non-overlapping index labels before concatenating.
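A few ways to get past the overlap are sketched below on shortened versions of `df1` and `df2`. The names `df2_shifted`, `relabelled`, `keyed` and `renumbered` are introduced here for illustration, and which option fits depends on whether the original labels carry meaning.

```python
import pandas as pd

# Shortened versions of the frames above, so the block runs on its own
df1 = pd.DataFrame({'Name': ['Paul', 'Aaron'], 'Score': [98, 89]})
df2 = pd.DataFrame({'Name': ['Robert', 'Craig'], 'Score': [98, 89]})

# Both frames carry row labels 0 and 1, so the integrity check fails
try:
    pd.concat([df1, df2], axis=0, verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Option 1: give the second frame labels that continue after the first
df2_shifted = df2.copy()
df2_shifted.index = range(len(df1), len(df1) + len(df2))
relabelled = pd.concat([df1, df2_shifted], axis=0, verify_integrity=True)

# Option 2: prefix each frame with a key, so every (key, label) pair is unique
keyed = pd.concat([df1, df2], axis=0, keys=['DF1', 'DF2'], verify_integrity=True)

# Option 3: drop the original labels and renumber from 0
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

print(relabelled.index.tolist())   # [0, 1, 2, 3]
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```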

The same thing happens with `axis=1`, <br>
because both dataframes have exactly the same column labels ('Name' and 'Score').

In [22]:
pd.concat([df1, df2], axis=1, verify_integrity=True)

ValueError: Indexes have overlapping values: Index(['Name', 'Score'], dtype='object')

As expected, this also raises an error.

## Sort parameter

`sort=True` sorts the non-concatenation axis: the columns when we are adding rows (`axis=0`), and the rows when we are adding columns (`axis=1`).

In [24]:
data1 = {
    'Name' : ['Paul', 'Aaron', 'Krista', 'Veronica', 'Paxton'],
    'Score': [98, 89, 99, 87, 90]
}

data2 = {
    'Name' : ['Robert', 'Jordyn', 'Isabel'],
    'Grades': ['AA', 'AA', 'AA']
}

df1 = pd.DataFrame(data1, index=list(range(5)))
df2 = pd.DataFrame(data2, index=list(range(3)))

In [25]:
df1

Unnamed: 0,Name,Score
0,Paul,98
1,Aaron,89
2,Krista,99
3,Veronica,87
4,Paxton,90


In [26]:
df2

Unnamed: 0,Name,Grades
0,Robert,AA
1,Jordyn,AA
2,Isabel,AA


In [28]:
pd.concat([df1, df2], axis=0)

Unnamed: 0,Name,Score,Grades
0,Paul,98.0,
1,Aaron,89.0,
2,Krista,99.0,
3,Veronica,87.0,
4,Paxton,90.0,
0,Robert,,AA
1,Jordyn,,AA
2,Isabel,,AA


Again, we are sorting columns if we are adding rows, and vice versa

Using `axis=0`, we get:

In [29]:
pd.concat([df1, df2], axis=0, sort=True)

Unnamed: 0,Grades,Name,Score
0,,Paul,98.0
1,,Aaron,89.0
2,,Krista,99.0
3,,Veronica,87.0
4,,Paxton,90.0
0,AA,Robert,
1,AA,Jordyn,
2,AA,Isabel,


Using `axis=1`, we get:

In [30]:
pd.concat([df1, df2], axis=1, sort=True)

Unnamed: 0,Name,Score,Name,Grades
0,Paul,98,Robert,AA
1,Aaron,89,Jordyn,AA
2,Krista,99,Isabel,AA
3,Veronica,87,,
4,Paxton,90,,


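The rows are sorted by index values. Here the index was already in ascending order, so the result looks the same as it would without `sort=True`.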
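To make the row sorting visible, the index has to start out unordered. The sketch below uses two hypothetical frames, `left` and `right`, with deliberately shuffled string labels. They are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical frames whose row labels are deliberately out of order
left = pd.DataFrame({'Score': [98, 89]}, index=['d', 'b'])
right = pd.DataFrame({'Grades': ['AA', 'AB']}, index=['c', 'a'])

unsorted_rows = pd.concat([left, right], axis=1)
sorted_rows = pd.concat([left, right], axis=1, sort=True)

# Without sort=True the combined labels are not put in order
print(unsorted_rows.index.tolist())

# With sort=True the row labels come back in ascending order
print(sorted_rows.index.tolist())    # ['a', 'b', 'c', 'd']
```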