In [1]:
import pandas as pd
import numpy as np

In [9]:
def make_df(cols, ind):
    """Quickly make a DataFrame whose cells are '<column><index>' strings."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, ind)

make_df('ABC', range(3))

    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2


In [12]:
# Recall NumPy concatenation
x = [1,2,3]
y = [4,5,6]
z = [7,8,9]
np.concatenate([x,y,z])

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [13]:
x = [[1,2],
     [3,4]]
np.concatenate([x,x], axis = 1)

array([[1, 2, 1, 2],
       [3, 4, 3, 4]])

## Simple Concatenation with pd.concat

##### Signature in Pandas v0.18
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)

##### pd.concat() can be used for a simple concatenation of Series or DataFrame objects

In [15]:
ser1 = pd.Series([1,2,3], index=['a','b','c'])
ser2 = pd.Series([4,5,6], index=['d','e','f'])
pd.concat([ser1, ser2])

a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64

In [18]:
# Concatenate higher-dimensional objects, such as DataFrames
df1 = make_df('AB', [1,2])
df2 = make_df('AB', [3,4])
print(df1); print(df2); print(pd.concat([df1, df2]))  # axis=0 by default

    A   B
1  A1  B1
2  A2  B2
    A   B
3  A3  B3
4  A4  B4
    A   B
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4


In [36]:
df3 = make_df('AB', [0, 1])
df4 = make_df('CD', [0, 1])
print(df3); print(df4);
print(pd.concat([df1, df2], axis = 1))
print(pd.concat([df3,df4], axis = 1))

    A   B
0  A0  B0
1  A1  B1
    C   D
0  C0  D0
1  C1  D1
     A    B    A    B
1   A1   B1  NaN  NaN
2   A2   B2  NaN  NaN
3  NaN  NaN   A3   B3
4  NaN  NaN   A4   B4
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1


#### Duplicate Indices

- One important difference between np.concatenate and pd.concat is that pd.concat preserves index labels, even when the result ends up with duplicate indices.

In [38]:
x = make_df('AB', [0, 1])
y = make_df('AB', [2, 3])
y.index = x.index
print(x); print(y); print(pd.concat([x, y]))

    A   B
0  A0  B0
1  A1  B1
    A   B
0  A2  B2
1  A3  B3
    A   B
0  A0  B0
1  A1  B1
0  A2  B2
1  A3  B3
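
Before deciding how to handle repeated labels, it can help to check whether a concatenated result actually contains them. The following is a minimal sketch that rebuilds the x and y frames directly (rather than through make_df) so it runs on its own; the variable names has_unique_index, repeated_labels, and rows_for_zero are illustrative only.

```python
import pandas as pd

# Two frames that share the same index labels, mirroring x and y above.
x = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
y = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[0, 1])

combined = pd.concat([x, y])

# Index.is_unique reports whether every label occurs exactly once.
has_unique_index = combined.index.is_unique

# Index.duplicated() flags the second and later occurrences of a label.
repeated_labels = combined.index[combined.index.duplicated()].tolist()

# With a duplicated label, .loc returns every matching row.
rows_for_zero = combined.loc[0]

print(has_unique_index)
print(repeated_labels)
print(rows_for_zero)
```

Because label 0 appears twice, selecting it with .loc yields a two-row DataFrame instead of a single row, which is the usual reason duplicate indices are worth detecting early.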


#### Catching the repeat as an error

In [39]:
try:
    pd.concat([x,y], verify_integrity=True)
except ValueError as e:
    print("Value Error:", e)

Value Error: Indexes have overlapping values: Int64Index([0, 1], dtype='int64')


#### Ignoring Index

In [41]:
print(x); print(y); print(pd.concat([x,y], ignore_index=True))

    A   B
0  A0  B0
1  A1  B1
    A   B
0  A2  B2
1  A3  B3
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3


#### Adding MultiIndex Keys

Another alternative is to use the keys option to specify a label
for each data source; the result will be a hierarchically indexed object containing the
data:

In [45]:
print(pd.concat([x,y], keys=['x','y']))

      A   B
x 0  A0  B0
  1  A1  B1
y 0  A2  B2
  1  A3  B3
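
Once the sources are labeled with keys, each one can be pulled back out of the hierarchical index. This is a small sketch under the assumption that the frames look like x and y above; the level names "source" and "row" passed through the names parameter are made up for illustration.

```python
import pandas as pd

x = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
y = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[0, 1])

# keys labels each input; names labels the resulting index levels.
keyed = pd.concat([x, y], keys=["x", "y"], names=["source", "row"])

# Select everything that came from the second input.
from_y = keyed.loc["y"]

# Select a single cell by (source, row) and column.
single_value = keyed.loc[("x", 1), "A"]

print(keyed)
print(from_y)
print(single_value)
```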


#### Concatenation with Joins

In [48]:
df5 = make_df('ABC', [1, 2])
df6 = make_df('BCD', [3, 4])
print(df5)
print(df6)
print(pd.concat([df5, df6]))

    A   B   C
1  A1  B1  C1
2  A2  B2  C2
    B   C   D
3  B3  C3  D3
4  B4  C4  D4
     A   B   C    D
1   A1  B1  C1  NaN
2   A2  B2  C2  NaN
3  NaN  B3  C3   D3
4  NaN  B4  C4   D4



By default, the entries for which no data is available are filled with NA values. To
change this, we can specify one of several options for the join and join_axes parameters of pd.concat.

- By default, the join is a union of the input columns (join='outer'), but we can change this to an intersection of the columns using join='inner':

In [52]:
print(pd.concat([df5, df6], join='inner'))  # keep only the columns common to both inputs

    B   C
1  B1  C1
2  B2  C2
3  B3  C3
4  B4  C4


- Another option is to directly specify the index of the remaining columns using the join_axes argument, which takes a list of index objects.

In [55]:
print(df5); print(df6); print(pd.concat([df5, df6], join_axes=[df5.columns]))

    A   B   C
1  A1  B1  C1
2  A2  B2  C2
    B   C   D
3  B3  C3  D3
4  B4  C4  D4
     A   B   C
1   A1  B1  C1
2   A2  B2  C2
3  NaN  B3  C3
4  NaN  B4  C4
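
The join_axes argument belongs to the v0.18 signature shown earlier; it was deprecated in pandas 0.25 and removed in 1.0, so the cell above raises a TypeError on current releases. A hedged sketch of one way to get the same result on newer versions is to concatenate first and then reindex the columns. The frames are rebuilt inline here so the snippet is self-contained.

```python
import pandas as pd

df5 = pd.DataFrame(
    {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}, index=[1, 2]
)
df6 = pd.DataFrame(
    {"B": ["B3", "B4"], "C": ["C3", "C4"], "D": ["D3", "D4"]}, index=[3, 4]
)

# Outer-join concat keeps A, B, C and D; reindex then keeps only df5's columns.
result = pd.concat([df5, df6]).reindex(columns=df5.columns)

print(result)
```

Rows that came from df6 have no value for column A, so those cells are NaN, matching the join_axes output above.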


#### The append() method

- Rather than calling pd.concat([df1, df2]), you can simply call df1.append(df2):

In [56]:
df1.append(df2)

    A   B
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4


Keep in mind that unlike the append() and extend() methods of Python lists, the
append() method in Pandas does not modify the original object; instead, it creates a
new object with the combined data. It is also not a very efficient method, because it
involves creation of a new index and data buffer. Thus, if you plan to do multiple
append operations, it is generally better to build a list of DataFrames and pass them all
at once to the concat() function.
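
The advice above can be sketched as follows: collect the pieces in an ordinary Python list and call pd.concat once at the end. This pattern is also the usual replacement on recent pandas releases, since DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. The loop bounds and the frames list name are illustrative assumptions.

```python
import pandas as pd

frames = []  # plain Python list; list.append here is cheap
for start in (1, 3, 5):
    idx = [start, start + 1]
    piece = pd.DataFrame(
        {"A": [f"A{i}" for i in idx], "B": [f"B{i}" for i in idx]},
        index=idx,
    )
    frames.append(piece)

# A single concat builds the combined index and data once.
combined = pd.concat(frames)

print(combined)
```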