In [1]:
import pandas as pd

#### Data Integration


Data integration is the process of combining data from different sources into a single, unified view. It involves merging data from multiple sources and ensuring that the data is consistent and accurate.



The need for data integration arises when organizations have data spread across multiple systems and applications, making it difficult to analyze and gain insights. For example, a company may have customer data stored in a CRM system, sales data stored in a separate database, and marketing data stored in yet another system. Integrating this data can provide a more complete picture of the business and help to identify trends and opportunities.



#### Concatenation


Concatenation combines two or more DataFrames by appending them along an axis: rows (axis=0) or columns (axis=1). Use the pd.concat() function from the pandas library to concatenate DataFrames.



In [11]:
df1 = pd.DataFrame(
    {"A" : [1,4,8,12],
     "B" : ['a','n','r','o']}
)

df2 = pd.DataFrame(
    {"A" : [0,2,3],
     "B" : ['a','e','i']}
)

In [12]:
df1

    A  B
0   1  a
1   4  n
2   8  r
3  12  o


In [13]:
df2


   A  B
0  0  a
1  2  e
2  3  i


In [14]:
# concatenate the DataFrames along rows (axis=0)
row_concat = pd.concat([df1, df2], axis=0)
row_concat

    A  B
0   1  a
1   4  n
2   8  r
3  12  o
0   0  a
1   2  e
2   3  i
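
The row-wise result keeps each frame's original index labels, so 0, 1 and 2 appear twice. A minimal sketch, rebuilding the same two frames, of how `ignore_index=True` renumbers the rows; the variable name `renumbered` is only illustrative and not part of the original notebook:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 4, 8, 12], "B": ["a", "n", "r", "o"]})
df2 = pd.DataFrame({"A": [0, 2, 3], "B": ["a", "e", "i"]})

# ignore_index=True discards the original labels and builds a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())
```

This is usually what you want when the original row labels carry no meaning of their own.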


In [15]:
# concatenate the DataFrames along columns (axis=1)
col_concat = pd.concat([df1, df2], axis=1)
col_concat

    A  B    A    B
0   1  a  0.0    a
1   4  n  2.0    e
2   8  r  3.0    i
3  12  o  NaN  NaN
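
Column-wise concatenation aligns rows on the index, so the missing values in the last row come from index label 3 existing only in the first frame. As a hedged sketch of one way to avoid them, `join="inner"` keeps only the index labels shared by every frame; the name `inner_cols` is an illustrative choice:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 4, 8, 12], "B": ["a", "n", "r", "o"]})
df2 = pd.DataFrame({"A": [0, 2, 3], "B": ["a", "e", "i"]})

# join="inner" keeps only index labels present in both frames (0, 1 and 2 here),
# so no row has to be padded with missing values
inner_cols = pd.concat([df1, df2], axis=1, join="inner")
print(inner_cols)
```

The default, `join="outer"`, keeps the union of index labels instead, which is what produced the padded row.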


#### Merging


Merging combines two DataFrames by matching rows on the values of one or more common columns (keys), much like a SQL join. Use the pd.merge() function from the pandas library to merge DataFrames; its how argument ('inner', 'outer', 'left' or 'right') controls which keys are kept.



In [16]:
df1 = pd.DataFrame(
    {"userid" : ['us1','us2','us3','us4'],
     "name" : ['a','n','r','o']}
)

df2 = pd.DataFrame(
    {"userid" : ['us3','us1','us5','us2'],
     "bank" : ['x','e','i','u']}
)

In [20]:
# how='outer' keeps every userid found in either DataFrame
merged = pd.merge(left=df1, right=df2, on='userid', how='outer')
merged

  userid name bank
0    us1    a    e
1    us2    n    u
2    us3    r    x
3    us4    o  NaN
4    us5  NaN    i
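
The outer merge is one of several options. The sketch below, which rebuilds the same two frames, compares `how="inner"` and `how="left"` and uses `indicator=True` to record where each row came from; the variable names `inner`, `left` and `outer` are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"userid": ["us1", "us2", "us3", "us4"],
                    "name": ["a", "n", "r", "o"]})
df2 = pd.DataFrame({"userid": ["us3", "us1", "us5", "us2"],
                    "bank": ["x", "e", "i", "u"]})

# inner: only userids present in both frames
inner = pd.merge(df1, df2, on="userid", how="inner")

# left: every row of df1; bank is missing where df2 has no matching userid
left = pd.merge(df1, df2, on="userid", how="left")

# indicator=True adds a "_merge" column with the values
# "both", "left_only" or "right_only"
outer = pd.merge(df1, df2, on="userid", how="outer", indicator=True)
print(outer[["userid", "_merge"]])
```

The `_merge` column is a quick way to audit unmatched keys before deciding which kind of merge to keep.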


#### Joining


Joining is similar to merging, but by default it matches rows on the DataFrames' indexes rather than on columns.

Use the DataFrame.join() method to join DataFrames.



In [21]:
df1 = pd.DataFrame(
    {"userid" : ['us1','us2','us3','us4'],
     "name" : ['a','n','r','o']},
    index = ["x", 'y' , 'z', 'w']
)

df2 = pd.DataFrame(
    {"customer" : ['us3','us1','us5','us2'],
     "bank" : ['x','e','i','u']},
    index = ["x", 'y' , 'a', 'b']
)

In [23]:
# DataFrame.join() aligns on the index; how='right' keeps every index label of df2
joined=df1.join(df2,how='right')
joined

  userid name customer bank
x    us1    a      us3    x
y    us2    n      us1    e
a    NaN  NaN      us5    i
b    NaN  NaN      us2    u
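
To make the relationship with merging concrete, here is a minimal sketch showing that an index join can also be written as `pd.merge()` with `left_index=True` and `right_index=True`. It uses the default `how="left"` of `join()` rather than the right join above, and the names `joined_left` and `same_as_merge` are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"userid": ["us1", "us2", "us3", "us4"],
                    "name": ["a", "n", "r", "o"]},
                   index=["x", "y", "z", "w"])
df2 = pd.DataFrame({"customer": ["us3", "us1", "us5", "us2"],
                    "bank": ["x", "e", "i", "u"]},
                   index=["x", "y", "a", "b"])

# join() defaults to how="left": every index label of df1 is kept
joined_left = df1.join(df2)

# the same result expressed as an index-on-index merge
same_as_merge = pd.merge(df1, df2, left_index=True, right_index=True, how="left")

print(joined_left)
```

Rows z and w have no counterpart in df2, so their customer and bank values are missing.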


#### Stacking


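Stacking reshapes a single DataFrame rather than combining several. The column labels are pivoted into an inner level of the row index, producing a Series with a MultiIndex in which each (row label, column label) pair holds one value. Use the DataFrame.stack() method to stack a DataFrame. (To combine datasets with the same columns vertically, use pd.concat() with axis=0 instead.)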



In [24]:
df1 = pd.DataFrame(
    {"userid" : [1,2,3,4],
     "name" : [0,9,8,7]},
    index = ["x", 'y' , 'z', 'w']
)
df1

   userid  name
x       1     0
y       2     9
z       3     8
w       4     7


In [26]:
# move the column labels into an inner index level
stacked = df1.stack()
stacked

x  userid    1
   name      0
y  userid    2
   name      9
z  userid    3
   name      8
w  userid    4
   name      7
dtype: int64
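
The stacked result is indexed by (row label, column label) pairs, and `unstack()` reverses the operation. A small sketch on the same frame, with `stacked` and `restored` as illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({"userid": [1, 2, 3, 4], "name": [0, 9, 8, 7]},
                   index=["x", "y", "z", "w"])

stacked = df1.stack()

# each value is addressed by a (row label, column label) pair
print(stacked.loc[("y", "name")])  # 9

# unstack() pivots the inner index level back into columns
restored = stacked.unstack()
print(restored)
```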