## Pandas Concatenation
When working with data, we often need to combine two or more pandas DataFrames into one. In this article, I will cover the ways of using `pd.concat()` that I rely on most in real projects: stacking DataFrames on top of each other, labeling each source with keys, placing DataFrames side by side, and attaching a Series as a new column.

In [1]:
import pandas as pd

Let's first create a couple of DataFrames:

In [5]:
# India weather DataFrame:
india_weather = pd.DataFrame({
    "City": ['Mumbai', 'Banglor', 'Delhi'],
    "Temperature": [23, 34, 33], 
    "Humidity": [70, 65, 78]
})
india_weather

Unnamed: 0,City,Temperature,Humidity
0,Mumbai,23,70
1,Banglor,34,65
2,Delhi,33,78


In [6]:
# US weather DataFrame:
us_weather = pd.DataFrame({
    "City": ['Newyork', 'Chicago', 'Orlando'],
    "Temperature": [30, 31, 35], 
    "Humidity": [78, 85, 75]
})
us_weather

Unnamed: 0,City,Temperature,Humidity
0,Newyork,30,78
1,Chicago,31,85
2,Orlando,35,75


In [9]:
# To concatenate these two datasets, we use the concat() function:
df = pd.concat([india_weather, us_weather])
df

Unnamed: 0,City,Temperature,Humidity
0,Mumbai,23,70
1,Banglor,34,65
2,Delhi,33,78
0,Newyork,30,78
1,Chicago,31,85
2,Orlando,35,75


In [16]:
# To get a continuous index, we pass ignore_index=True:
df = pd.concat([india_weather, us_weather], ignore_index=True)
df

Unnamed: 0,City,Temperature,Humidity
0,Mumbai,23,70
1,Banglor,34,65
2,Delhi,33,78
3,Newyork,30,78
4,Chicago,31,85
5,Orlando,35,75
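
To see why the renumbered index matters, here is a small sketch (the variable names `stacked`, `renumbered`, `rows_with_label_0` and `first_row` are mine, not part of the notebook above). Without `ignore_index=True`, each label appears once per input DataFrame, so a label lookup returns more than one row:

```python
import pandas as pd

india_weather = pd.DataFrame({
    "City": ["Mumbai", "Banglor", "Delhi"],
    "Temperature": [23, 34, 33],
    "Humidity": [70, 65, 78],
})
us_weather = pd.DataFrame({
    "City": ["Newyork", "Chicago", "Orlando"],
    "Temperature": [30, 31, 35],
    "Humidity": [78, 85, 75],
})

# Plain concat keeps each frame's own labels, so 0, 1, 2 appear twice.
stacked = pd.concat([india_weather, us_weather])
rows_with_label_0 = stacked.loc[0]   # a DataFrame with two rows

# With ignore_index=True the labels are unique again.
renumbered = pd.concat([india_weather, us_weather], ignore_index=True)
first_row = renumbered.loc[0]        # a single row, returned as a Series

print(len(rows_with_label_0))  # 2
print(first_row["City"])       # Mumbai
```

If the original labels carry no meaning, renumbering is usually the simpler choice.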


In [18]:
# We can create a sub-index for each DataFrame by passing keys:
df = pd.concat([india_weather, us_weather], keys=["india", "us"])
df

Unnamed: 0,Unnamed: 1,City,Temperature,Humidity
india,0,Mumbai,23,70
india,1,Banglor,34,65
india,2,Delhi,33,78
us,0,Newyork,30,78
us,1,Chicago,31,85
us,2,Orlando,35,75


In [19]:
# Now we can use .loc to retrieve one of the original DataFrames. This is most practical when the concatenated DataFrame is big and we want to pull out just one part of it.
df.loc['india']

Unnamed: 0,City,Temperature,Humidity
0,Mumbai,23,70
1,Banglor,34,65
2,Delhi,33,78


In [20]:
# The same works for the US:
df.loc["us"]

Unnamed: 0,City,Temperature,Humidity
0,Newyork,30,78
1,Chicago,31,85
2,Orlando,35,75


This is pretty useful when you have large concatenated DataFrames.
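
As a possible extension of the keys example, `pd.concat()` also accepts a `names` argument that labels the index levels. The level names `"country"` and `"row"` below are my own choice for illustration:

```python
import pandas as pd

india_weather = pd.DataFrame({
    "City": ["Mumbai", "Banglor", "Delhi"],
    "Temperature": [23, 34, 33],
    "Humidity": [70, 65, 78],
})
us_weather = pd.DataFrame({
    "City": ["Newyork", "Chicago", "Orlando"],
    "Temperature": [30, 31, 35],
    "Humidity": [78, 85, 75],
})

# keys builds the outer index level; names labels both levels.
labeled = pd.concat(
    [india_weather, us_weather],
    keys=["india", "us"],
    names=["country", "row"],
)

# Select one group by level name instead of relying on level position.
us_rows = labeled.xs("us", level="country")

# Turn the outer level into an ordinary column when a flat table is easier.
flat = labeled.reset_index(level="country")

print(us_rows["City"].tolist())   # ['Newyork', 'Chicago', 'Orlando']
print(flat["country"].tolist())
```

Named levels tend to make later selections easier to read than positional ones.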

In [21]:
# So far, all we have done is stack DataFrames on top of each other. Now let's take a DataFrame like this:
weather_data = pd.DataFrame({
    "City": ['Newyork', 'Chicago', 'Orlando'],
    "Temperature": [30, 31, 35],
})
weather_data

Unnamed: 0,City,Temperature
0,Newyork,30
1,Chicago,31
2,Orlando,35


In [22]:
# And let's have another DataFrame with wind speed:
windspeed = pd.DataFrame({
    "City": ['Newyork', 'Chicago', 'Orlando'],
    "windspeed": [8, 9, 7],
})
windspeed

Unnamed: 0,City,windspeed
0,Newyork,8
1,Chicago,9
2,Orlando,7


In [23]:
# If we simply concatenate these two DataFrames, the result looks like this:
df = pd.concat([weather_data, windspeed])
df

Unnamed: 0,City,Temperature,windspeed
0,Newyork,30.0,
1,Chicago,31.0,
2,Orlando,35.0,
0,Newyork,,8.0
1,Chicago,,9.0
2,Orlando,,7.0
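
The empty cells come from the default `join="outer"`, which keeps the union of all columns. As a small aside, the sketch below (variable names `outer` and `inner` are mine) shows that `join="inner"` keeps only the columns the DataFrames share:

```python
import pandas as pd

weather_data = pd.DataFrame({
    "City": ["Newyork", "Chicago", "Orlando"],
    "Temperature": [30, 31, 35],
})
windspeed = pd.DataFrame({
    "City": ["Newyork", "Chicago", "Orlando"],
    "windspeed": [8, 9, 7],
})

# Default (outer) join: union of columns, missing values become NaN.
outer = pd.concat([weather_data, windspeed])

# Inner join: only the columns present in every input survive.
inner = pd.concat([weather_data, windspeed], join="inner")

print(list(outer.columns))                  # ['City', 'Temperature', 'windspeed']
print(int(outer["Temperature"].isna().sum()))  # 3
print(list(inner.columns))                  # ['City']
```

Neither variant puts temperature and wind speed in the same row; that needs a column-wise concatenation.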


In [24]:
# This arrangement is not what we want: the Temperature and windspeed columns have empty (NaN) cells.
# To place the DataFrames side by side instead, we pass axis=1. The default is axis=0:
df = pd.concat([weather_data, windspeed], axis=1)
df

Unnamed: 0,City,Temperature,City.1,windspeed
0,Newyork,30,Newyork,8
1,Chicago,31,Chicago,9
2,Orlando,35,Orlando,7


Yes! Now each city's temperature and wind speed sit in the same row.
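
One detail worth knowing: `pd.concat()` does not rename clashing column labels, so in a live session the result holds two columns that are both labeled `City`. The sketch below shows one way to keep only the first of each duplicated label; the names `side_by_side` and `deduped` are mine:

```python
import pandas as pd

weather_data = pd.DataFrame({
    "City": ["Newyork", "Chicago", "Orlando"],
    "Temperature": [30, 31, 35],
})
windspeed = pd.DataFrame({
    "City": ["Newyork", "Chicago", "Orlando"],
    "windspeed": [8, 9, 7],
})

side_by_side = pd.concat([weather_data, windspeed], axis=1)
# Both inputs contribute a "City" column, and both keep that label.
print(list(side_by_side.columns))  # ['City', 'Temperature', 'City', 'windspeed']

# columns.duplicated() marks every repeat after the first occurrence.
deduped = side_by_side.loc[:, ~side_by_side.columns.duplicated()]
print(list(deduped.columns))       # ['City', 'Temperature', 'windspeed']
```

This is only safe when the duplicated columns really hold the same values row by row, as they do here.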

In [25]:
# Real data is rarely this clean and ordered. For example, if the rows are not in the same order as in the cells above, this procedure runs into a problem. See this:
weather_data = pd.DataFrame({
    "City": ['Newyork', 'Orlando', 'Chicogo'],
    "Temperature": [30, 31, 35],
})
weather_data

Unnamed: 0,City,Temperature
0,Newyork,30
1,Orlando,31
2,Chicogo,35


In [26]:
# In cell [25], the last two cities are in a different order than in the 'windspeed' DataFrame, so concatenating them with axis=1 does not produce the expected result:
df = pd.concat([weather_data, windspeed], axis=1)
df

Unnamed: 0,City,Temperature,City.1,windspeed
0,Newyork,30,Newyork,8
1,Orlando,31,Chicago,9
2,Chicogo,35,Orlando,7


In [27]:
# In the result above [26], row 1 pairs Orlando with Chicago, which is wrong.
# To avoid this, we set the index explicitly with index=[...], because concat aligns rows by index label:
weather_data = pd.DataFrame({
    "City": ['Newyork', 'Orlando', 'Chicogo'],
    "Temperature": [30, 31, 35],
}, index = [0,1,2])
weather_data

Unnamed: 0,City,Temperature
0,Newyork,30
1,Orlando,31
2,Chicogo,35


In [29]:
# And for windspeed, giving each city the same index label it has in weather_data:
windspeed = pd.DataFrame({
   "City": ['Newyork', 'Chicago', 'Orlando'],
   "windspeed": [8, 9, 7],
}, index = [0, 2, 1])
windspeed

Unnamed: 0,City,windspeed
0,Newyork,8
2,Chicago,9
1,Orlando,7


In [31]:
# Now when we concatenate them again, the rows line up correctly:
df = pd.concat([weather_data, windspeed], axis=1)
df

Unnamed: 0,City,Temperature,City.1,windspeed
0,Newyork,30,Newyork,8
1,Orlando,31,Orlando,7
2,Chicogo,35,Chicago,9
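
Assigning index numbers by hand gets tedious for larger data. A possible alternative, sketched below, is to use the city name itself as the index so that `pd.concat()` aligns on it. This is my own variation on the example, and it assumes the city names are spelled identically in both DataFrames (I use "Chicago" in both):

```python
import pandas as pd

weather_data = pd.DataFrame({
    "City": ["Newyork", "Orlando", "Chicago"],
    "Temperature": [30, 31, 35],
})
windspeed = pd.DataFrame({
    "City": ["Newyork", "Chicago", "Orlando"],
    "windspeed": [8, 9, 7],
})

# With City as the index, axis=1 concat matches rows by city name,
# regardless of the row order in each input.
aligned = pd.concat(
    [weather_data.set_index("City"), windspeed.set_index("City")],
    axis=1,
)

print(aligned.loc["Orlando", "windspeed"])    # 7
print(aligned.loc["Chicago", "Temperature"])  # 35

# reset_index() brings City back as a regular column if needed.
flat = aligned.reset_index()
```

A side benefit is that the result has a single City column instead of two.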


In [32]:
# We can also join a pandas Series with a DataFrame. Say we have the cities' weather data:
weather_data

Unnamed: 0,City,Temperature
0,Newyork,30
1,Orlando,31
2,Chicogo,35


In [34]:
# And we have a Series that holds a weather event for each city:
s = pd.Series(["Dry", "Humid", "Sunny"], name="event")
s

0      Dry
1    Humid
2    Sunny
Name: event, dtype: object

In [35]:
# To join this Series with the DataFrame, we again use the concat() function:
df = pd.concat([weather_data, s], axis=1)
df

Unnamed: 0,City,Temperature,event
0,Newyork,30,Dry
1,Orlando,31,Humid
2,Chicogo,35,Sunny


Excellent! We have the right result.
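
This works because the Series has the default index 0, 1, 2, the same as the DataFrame. The sketch below, with a hypothetical Series named `event` that carries a shuffled index, shows that a Series is aligned by label as well, and how to attach it by position instead:

```python
import pandas as pd

weather_data = pd.DataFrame({
    "City": ["Newyork", "Orlando", "Chicago"],
    "Temperature": [30, 31, 35],
})

# Hypothetical Series whose index labels are not in 0, 1, 2 order.
event = pd.Series(["Dry", "Humid", "Sunny"], index=[2, 0, 1], name="event")

# Aligned by label: row 0 receives the value stored under label 0.
by_label = pd.concat([weather_data, event], axis=1)
print(by_label.loc[0, "event"])      # Humid

# Dropping the Series index first attaches the values in their stored order.
by_position = pd.concat([weather_data, event.reset_index(drop=True)], axis=1)
print(by_position.loc[0, "event"])   # Dry
```

Which of the two is right depends on whether the Series labels actually refer to the DataFrame's rows.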

In [42]:
# If we have the temperatures in the same row order but without city names, we can simply join them again using the concat() function.
# windspeed DataFrame:
windspeed = pd.DataFrame({
   "City": ['Newyork', 'Chicago', 'Orlando'],
   "windspeed": [8, 9, 7],
})
windspeed

Unnamed: 0,City,windspeed
0,Newyork,8
1,Chicago,9
2,Orlando,7


In [39]:
# Temperature DataFrame without city names:
weather_data = pd.DataFrame({
    "Temperature": [30, 31, 35],
})
weather_data

Unnamed: 0,Temperature
0,30
1,31
2,35


In [43]:
# Concatenation using the concat() function:
df = pd.concat([windspeed, weather_data], axis=1)
df

Unnamed: 0,City,windspeed,Temperature
0,Newyork,8,30
1,Chicago,9,31
2,Orlando,7,35


That was all about the pandas concat() function.