<center><em>Copyright Qalmaqihir</em></center>
<center><em>For more information, visit us at <a href='http://www.github.com/qalmaqihir/'>www.github.com/qalmaqihir/</a></em></center>

## Combining Datasets: Concat and Append
Some of the most interesting studies of data come from combining different data
sources. These operations can involve anything from very straightforward concatenation
of two different datasets, to more complicated database-style joins and merges
that correctly handle any overlaps between the datasets. Series and DataFrames are
built with this type of operation in mind, and Pandas includes functions and methods
that make this sort of data wrangling fast and straightforward.

In [None]:
import numpy as np
import pandas as pd


In [30]:
def make_df(cols, ind):
    """Quickly make a DataFrame whose cells are labeled column + index."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, index=ind)

In [31]:
make_df('ABC',range(3))

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2


In [32]:
# Recall NumPy concatenation
x=[1,2,3]
y=[4,3,6]
z=[9,8,0]
np.concatenate([x,y,z])

array([1, 2, 3, 4, 3, 6, 9, 8, 0])

In [33]:
x=[[2,4],[3,5]]
x=np.concatenate([x,x], axis=1)
x

array([[2, 4, 2, 4],
       [3, 5, 3, 5]])
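
The effect of the axis argument is easiest to see by comparing shapes. The following is a small sketch; the array name `a` and the result names `rows` and `cols` are chosen here for illustration only.

```python
import numpy as np

a = np.array([[2, 4], [3, 5]])

# axis=0 joins along rows: the inputs are stacked on top of each other
rows = np.concatenate([a, a], axis=0)

# axis=1 joins along columns: the inputs are placed side by side
cols = np.concatenate([a, a], axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
```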

## Simple Concatenation with pd.concat
Pandas has a function, pd.concat(), which has a similar syntax to np.concatenate
but contains a number of options that we’ll discuss momentarily.

`# Signature in Pandas v0.18
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False,
copy=True)`

Note that this signature is version specific: the join_axes parameter was removed in pandas 1.0.

In [34]:
ser1=pd.Series(['A','B','C'],index=[1,2,3])
ser2=pd.Series(['D','E','F'], index=[4,5,6])
pd.concat([ser1,ser2])

1    A
2    B
3    C
4    D
5    E
6    F
dtype: object

In [35]:
df3=make_df("AB",[0,1])
df4=make_df("CD",[0,1])

In [36]:
df3

Unnamed: 0,A,B
0,A0,B0
1,A1,B1


In [37]:
df4

Unnamed: 0,C,D
0,C0,D0
1,C1,D1


In [39]:
pd.concat([df3,df4])

Unnamed: 0,A,B,C,D
0,A0,B0,,
1,A1,B1,,
0,,,C0,D0
1,,,C1,D1


In [40]:
pd.concat([df3,df4],axis=1)

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1


### Duplicate indices
One important difference between np.concatenate and pd.concat is that Pandas
concatenation preserves indices, even if the result will have duplicate indices!

In [41]:
x=make_df("AB",[0,1])
y=make_df("AB",[2,3])
x

Unnamed: 0,A,B
0,A0,B0
1,A1,B1


In [42]:
y

Unnamed: 0,A,B
2,A2,B2
3,A3,B3


In [43]:
y.index

Int64Index([2, 3], dtype='int64')

In [45]:
x.index=y.index

In [46]:
x.index

Int64Index([2, 3], dtype='int64')

In [47]:
x

Unnamed: 0,A,B
2,A0,B0
3,A1,B1


In [48]:
y

Unnamed: 0,A,B
2,A2,B2
3,A3,B3


In [49]:
pd.concat([x,y])

Unnamed: 0,A,B
2,A0,B0
3,A1,B1
2,A2,B2
3,A3,B3


In [50]:
pd.concat([x,y],axis=1)

Unnamed: 0,A,B,A,B
2,A0,B0,A2,B2
3,A1,B1,A3,B3
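
A practical consequence of preserved, duplicated indices is that label-based selection can return more than one row. The sketch below rebuilds `x` and `y` directly (rather than through `make_df`) so that it runs on its own; the name `combined` is an illustrative choice.

```python
import pandas as pd

# Two frames that share the same index labels, as in the cells above
x = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[2, 3])
y = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

combined = pd.concat([x, y])

# Each label now appears twice in the index
print(combined.index.is_unique)  # False

# Selecting label 2 returns every row carrying that label
print(combined.loc[2])
```

If later code assumes one row per label, it is worth checking `index.is_unique` right after concatenating.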


### Catching the repeats as an error
If you’d like to simply verify that the indices in the
result of pd.concat() do not overlap, you can specify the verify_integrity flag.
With this set to True, the concatenation will raise an exception if there are duplicate
indices. Here is an example, where for clarity we’ll catch and print the error message:


In [51]:
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    print("ValueError:", e)


ValueError: Indexes have overlapping values: Int64Index([2, 3], dtype='int64')
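
If raising an exception is not desirable, one possible alternative is to inspect the overlap before concatenating. This sketch uses `Index.intersection`; the variable name `overlap` is an assumption made for illustration.

```python
import pandas as pd

x = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[2, 3])
y = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

# Labels present in both indices
overlap = x.index.intersection(y.index)

if len(overlap) > 0:
    print("Overlapping labels:", overlap.tolist())  # Overlapping labels: [2, 3]
else:
    result = pd.concat([x, y])
```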


### Ignoring the index
Sometimes the index itself does not matter, and you would prefer
it to simply be ignored. You can specify this option using the ignore_index flag. With
this set to True, the concatenation will create a new integer index for the result:

In [53]:
print(x);print(y)

    A   B
2  A0  B0
3  A1  B1
    A   B
2  A2  B2
3  A3  B3


In [54]:
pd.concat([x,y],ignore_index=True)

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3
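
The same flag applies to Series. As a short sketch, here are two Series with identical index labels (the names `ser1`, `ser2`, and `result` are used for illustration):

```python
import pandas as pd

ser1 = pd.Series(["A", "B", "C"], index=[1, 2, 3])
ser2 = pd.Series(["D", "E", "F"], index=[1, 2, 3])

# The original labels are discarded and replaced by 0..n-1
result = pd.concat([ser1, ser2], ignore_index=True)

print(result.index.tolist())  # [0, 1, 2, 3, 4, 5]
```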


### Adding MultiIndex keys
Another alternative is to use the keys option to specify a label
for the data sources; the result will be a hierarchically indexed object containing the
data:

In [57]:
pd.concat([x,y], keys=['x','y'])

Unnamed: 0,Unnamed: 1,A,B
x,2,A0,B0
x,3,A1,B1
y,2,A2,B2
y,3,A3,B3
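
Once the keys are in place, each source can be recovered by selecting on the outer index level. The sketch below also passes the names option to label the two levels; the level names "source" and "row" are arbitrary choices for this example.

```python
import pandas as pd

x = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[2, 3])
y = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

# keys labels each input; names labels the resulting index levels
combined = pd.concat([x, y], keys=["x", "y"], names=["source", "row"])

# Selecting on the outer level returns the rows that came from x
from_x = combined.loc["x"]

# A full (source, row) tuple addresses a single row
single = combined.loc[("y", 3), "A"]
print(single)  # A3
```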


## Concatenation with joins
In the simple examples we just looked at, we were mainly concatenating DataFrames
with shared column names. In practice, data from different sources might have different
sets of column names, and pd.concat offers several options in this case. Consider
the concatenation of the following two DataFrames, which have some (but not all!)
columns in common:

In [58]:
df5=make_df('ABC',[1,2])
df6=make_df('BCD',[3,4])
df6

Unnamed: 0,B,C,D
3,B3,C3,D3
4,B4,C4,D4


In [59]:
df5

Unnamed: 0,A,B,C
1,A1,B1,C1
2,A2,B2,C2


In [60]:
pd.concat([df5,df6])

Unnamed: 0,A,B,C,D
1,A1,B1,C1,
2,A2,B2,C2,
3,,B3,C3,D3
4,,B4,C4,D4


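___By default, the entries for which no data is available are filled with NA values,
because join='outer' takes the union of the input columns. To change this, we can
specify join='inner', which keeps only the columns common to all inputs. Older versions
of Pandas also offered a join_axes parameter, which was removed in pandas 1.0.___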

In [61]:
pd.concat([df5,df6],join='inner')

Unnamed: 0,B,C
1,B1,C1
2,B2,C2
3,B3,C3
4,B4,C4


In [62]:
pd.concat([df5,df6],join='outer')

Unnamed: 0,A,B,C,D
1,A1,B1,C1,
2,A2,B2,C2,
3,,B3,C3,D3
4,,B4,C4,D4
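
Since join_axes is no longer available in current Pandas, one possible way to keep exactly the columns of one input is to reindex the concatenated result. This is a sketch of that approach, not the only option; `make_df` is repeated so the block runs on its own.

```python
import pandas as pd

def make_df(cols, ind):
    """Build a small labeled DataFrame, as earlier in this notebook."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, index=ind)

df5 = make_df("ABC", [1, 2])
df6 = make_df("BCD", [3, 4])

# Outer-join concatenation, then keep only the columns of df5
result = pd.concat([df5, df6]).reindex(columns=df5.columns)

print(result.columns.tolist())  # ['A', 'B', 'C']
```

Rows that came from df6 have NA in column A, and column D is dropped.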


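### The append() method
Because direct array concatenation is so common, Series and DataFrame objects
have historically had an append method that can accomplish the same thing in fewer
keystrokes. For example, rather than calling pd.concat([df1, df2]), you can simply call
df1.append(df2). Note that append was deprecated in pandas 1.4 and removed in
pandas 2.0, so the cell below runs only on older versions: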

In [66]:
df1=make_df('AB',[1,2])
df2=make_df('AB',[3,4])
df1

Unnamed: 0,A,B
1,A1,B1
2,A2,B2


In [67]:
df2

Unnamed: 0,A,B
3,A3,B3
4,A4,B4


In [68]:
df1.append(df2)


Unnamed: 0,A,B
1,A1,B1
2,A2,B2
3,A3,B3
4,A4,B4
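
On Pandas versions without append, the same result can be obtained with pd.concat. The sketch below also shows a common pattern for combining many pieces: collect them in a list and concatenate once, rather than growing a DataFrame step by step. The names `pieces` and `stacked` are illustrative.

```python
import pandas as pd

def make_df(cols, ind):
    """Build a small labeled DataFrame, as earlier in this notebook."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, index=ind)

df1 = make_df("AB", [1, 2])
df2 = make_df("AB", [3, 4])

# Equivalent of df1.append(df2); df1 and df2 are left unchanged
result = pd.concat([df1, df2])
print(result.index.tolist())  # [1, 2, 3, 4]

# For many pieces, build a list first and concatenate a single time
pieces = [make_df("AB", [i]) for i in range(3)]
stacked = pd.concat(pieces)
print(stacked["A"].tolist())  # ['A0', 'A1', 'A2']
```

Each call to pd.concat creates a new object and copies data, which is why a single concatenation over a list is generally preferable to repeated pairwise calls.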
