---
title: "Concatenate DataFrames With The Concat Function"
description: "Concatenate multiple DataFrames with the pandas concat() function."
tags: Pandas
URL: https://github.com/ageron/handson-ml
Licence: Apache License 2.0
Creator: 
Meta: ""

---

 <div>
    	<img src="./coco.png" style="float: left;height: 55px">
    	<div style="height: 150px;text-align: center; padding-top:5px">
        <h1>
      	Concatenate DataFrames With The Concat Function
        </h1>
        <p>Concatenate multiple DataFrames with the pandas concat() function.</p>
    	</div>
		</div> 

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Key Code
    	</span>
		</div>
		</div>
			

```python
import pandas as pd
```

```python
# ignore_index=True relabels the result 0, 1, ..., n-1 (default: False)
# df1, df2, df3 stand for any DataFrames; the list can hold as many as you need
pd.concat([df1, df2, df3], ignore_index=True)
```

```python
# join="inner" keeps only the columns that exist in every DataFrame
# (default: join="outer", which keeps all columns and fills the gaps with NaN)
pd.concat([df1, df2, df3], join="inner")
```

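```python
# axis=1 concatenates horizontally (side by side), aligning rows on their index labels
# (default: axis=0, which stacks the DataFrames vertically)
pd.concat([df1, df2, df3], axis=1)
```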
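Because `pd.concat()` takes a list, a common pattern when the pieces arrive one at a time (one per file, batch, or loop iteration) is to collect them in a plain Python list and call `pd.concat()` once at the end. Concatenating inside the loop instead copies the growing result on every iteration. (`DataFrame.append()`, which older tutorials use for this, was removed in pandas 2.0.) A minimal sketch, with small hypothetical batches standing in for real input:

```python
import pandas as pd

# Hypothetical batches standing in for real input (e.g. one DataFrame per file)
batches = []
for i in range(3):
    batch = pd.DataFrame({"batch": [i, i], "value": [i * 10, i * 10 + 1]})
    batches.append(batch)  # list.append is cheap: no DataFrame is copied here

# One concat call over the whole list; ignore_index=True gives a clean 0..n-1 index
combined = pd.concat(batches, ignore_index=True)
print(combined)
```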

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Example
    	</span>
		</div>
		</div>
			

## Create example DataFrames

```python
city_loc = pd.DataFrame(
    [
        ["CA", "San Francisco", 37.781334, -122.416728],
        ["NY", "New York", 40.705649, -74.008344],
        ["FL", "Miami", 25.791100, -80.320733],
        ["OH", "Cleveland", 41.473508, -81.739791],
        ["UT", "Salt Lake City", 40.755851, -111.896657]
    ], columns=["state", "city", "lat", "lng"])
city_loc
```

|   | state | city | lat | lng |
|---|---|---|---|---|
| 0 | CA | San Francisco | 37.781334 | -122.416728 |
| 1 | NY | New York | 40.705649 | -74.008344 |
| 2 | FL | Miami | 25.7911 | -80.320733 |
| 3 | OH | Cleveland | 41.473508 | -81.739791 |
| 4 | UT | Salt Lake City | 40.755851 | -111.896657 |


```python
city_pop = pd.DataFrame(
    [
        [808976, "San Francisco", "California"],
        [8363710, "New York", "New-York"],
        [413201, "Miami", "Florida"],
        [2242193, "Houston", "Texas"]
    ], index=[3,4,5,6], columns=["population", "city", "state"])
city_pop
```

|   | population | city | state |
|---|---|---|---|
| 3 | 808976 | San Francisco | California |
| 4 | 8363710 | New York | New-York |
| 5 | 413201 | Miami | Florida |
| 6 | 2242193 | Houston | Texas |


## Concatenate them together

```python
result_concat = pd.concat([city_loc, city_pop])
result_concat
```

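|   | city | lat | lng | population | state |
|---|---|---|---|---|---|
| 0 | San Francisco | 37.781334 | -122.416728 | NaN | CA |
| 1 | New York | 40.705649 | -74.008344 | NaN | NY |
| 2 | Miami | 25.7911 | -80.320733 | NaN | FL |
| 3 | Cleveland | 41.473508 | -81.739791 | NaN | OH |
| 4 | Salt Lake City | 40.755851 | -111.896657 | NaN | UT |
| 3 | San Francisco | NaN | NaN | 808976.0 | California |
| 4 | New York | NaN | NaN | 8363710.0 | New-York |
| 5 | Miami | NaN | NaN | 413201.0 | Florida |
| 6 | Houston | NaN | NaN | 2242193.0 | Texas |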


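The columns appear in alphabetical order here, which was the behavior of older pandas versions; in pandas 1.0 and later, `sort` defaults to `False`, so the columns keep the order in which they first appear.

**Note:** `pd.concat()` stacked the rows and aligned the columns by name (filling in `NaN` where a `DataFrame` lacks a column), but each row kept its original index label. As a result, several rows share the same label (e.g. `3`). pandas handles this rather gracefully: selecting a duplicated label with `.loc` returns all the matching rows: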

```python
result_concat.loc[3]
```

|   | city | lat | lng | population | state |
|---|---|---|---|---|---|
| 3 | Cleveland | 41.473508 | -81.739791 | NaN | OH |
| 3 | San Francisco | NaN | NaN | 808976.0 | California |
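If duplicate index labels would indicate a bug in your workflow rather than something to live with, `pd.concat()` can check for them: with `verify_integrity=True` it raises a `ValueError` when the resulting index contains duplicates. A minimal sketch, using two small `DataFrame`s built from a few rows of the example data (the names `loc_part` and `pop_part` are just for illustration):

```python
import pandas as pd

# Two small frames whose index labels overlap at 3 (rows taken from the example data)
loc_part = pd.DataFrame({"city": ["Cleveland", "Salt Lake City"]}, index=[3, 4])
pop_part = pd.DataFrame({"city": ["San Francisco", "Houston"]}, index=[3, 6])

stacked = pd.concat([loc_part, pop_part])
print(stacked.index.is_unique)  # False: label 3 appears twice

# verify_integrity=True makes concat refuse to build an index with duplicates
try:
    pd.concat([loc_part, pop_part], verify_integrity=True)
    rejected = False
except ValueError as err:
    rejected = True
    print("Rejected:", err)
```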


 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Example
    	</span>
		</div>
		</div>
			

## Tell pandas to just ignore the index

```python
pd.concat([city_loc, city_pop], ignore_index=True)
```

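|   | city | lat | lng | population | state |
|---|---|---|---|---|---|
| 0 | San Francisco | 37.781334 | -122.416728 | NaN | CA |
| 1 | New York | 40.705649 | -74.008344 | NaN | NY |
| 2 | Miami | 25.7911 | -80.320733 | NaN | FL |
| 3 | Cleveland | 41.473508 | -81.739791 | NaN | OH |
| 4 | Salt Lake City | 40.755851 | -111.896657 | NaN | UT |
| 5 | San Francisco | NaN | NaN | 808976.0 | California |
| 6 | New York | NaN | NaN | 8363710.0 | New-York |
| 7 | Miami | NaN | NaN | 413201.0 | Florida |
| 8 | Houston | NaN | NaN | 2242193.0 | Texas |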


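**Note:** when a column does not exist in one of the `DataFrame`s, the rows coming from that `DataFrame` are filled with `NaN` in that column. Because `NaN` is a floating-point value, the integer `population` column is upcast to floats (e.g. `808976.0`).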
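If you need integers back after concatenating, one option is to convert the column to pandas' nullable integer dtype `"Int64"` (capital `I`), which represents missing values as `<NA>` instead of forcing a float dtype. A minimal sketch on a two-row subset of the example data:

```python
import pandas as pd

# Subset of the example data: city_pop has an integer population column, city_loc has none
city_loc = pd.DataFrame({"state": ["CA", "NY"], "city": ["San Francisco", "New York"]})
city_pop = pd.DataFrame(
    {"population": [808976, 8363710], "city": ["San Francisco", "New York"]},
    index=[3, 4],
)

result = pd.concat([city_loc, city_pop], ignore_index=True)
before = result["population"].dtype
print(before)  # float64: the NaN fill forced an upcast

# Convert back to integers; the missing entries become <NA>
result["population"] = result["population"].astype("Int64")
print(result["population"].dtype)  # Int64
```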

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Example
    	</span>
		</div>
		</div>
			

## Only columns that exist in *both* `DataFrame`s

```python
pd.concat([city_loc, city_pop], join="inner")
```

|   | state | city |
|---|---|---|
| 0 | CA | San Francisco |
| 1 | NY | New York |
| 2 | FL | Miami |
| 3 | OH | Cleveland |
| 4 | UT | Salt Lake City |
| 3 | California | San Francisco |
| 4 | New-York | New York |
| 5 | Florida | Miami |
| 6 | Texas | Houston |
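One way to see what `join="inner"` does is to compute the shared columns yourself: selecting only those columns from each `DataFrame` and concatenating them with the default outer join should give the same result. A sketch on a small subset of the example data (the names `shared` and `manual` are just for illustration):

```python
import pandas as pd

# Subset of the example data
city_loc = pd.DataFrame(
    {"state": ["CA", "OH"], "city": ["San Francisco", "Cleveland"],
     "lat": [37.781334, 41.473508], "lng": [-122.416728, -81.739791]})
city_pop = pd.DataFrame(
    {"population": [808976, 2242193], "city": ["San Francisco", "Houston"],
     "state": ["California", "Texas"]}, index=[3, 6])

# Columns present in both DataFrames, in city_loc's column order
shared = [col for col in city_loc.columns if col in city_pop.columns]
print(shared)  # ['state', 'city']

inner = pd.concat([city_loc, city_pop], join="inner")
manual = pd.concat([city_loc[shared], city_pop[shared]])
print(inner.equals(manual))  # True
```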


 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Example
    	</span>
		</div>
		</div>
			

## Concatenate `DataFrame`s horizontally

```python
pd.concat([city_loc, city_pop], axis=1)
```

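|   | state | city | lat | lng | population | city | state |
|---|---|---|---|---|---|---|---|
| 0 | CA | San Francisco | 37.781334 | -122.416728 | NaN | NaN | NaN |
| 1 | NY | New York | 40.705649 | -74.008344 | NaN | NaN | NaN |
| 2 | FL | Miami | 25.7911 | -80.320733 | NaN | NaN | NaN |
| 3 | OH | Cleveland | 41.473508 | -81.739791 | 808976.0 | San Francisco | California |
| 4 | UT | Salt Lake City | 40.755851 | -111.896657 | 8363710.0 | New York | New-York |
| 5 | NaN | NaN | NaN | NaN | 413201.0 | Miami | Florida |
| 6 | NaN | NaN | NaN | NaN | 2242193.0 | Houston | Texas |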


With `axis=1`, pandas aligns rows by their index labels, so here the result does not make much sense: Cleveland and San Francisco end up on the same row only because both had the index label `3`. The result also contains two `city` columns and two `state` columns. So let's make `city` the index of each `DataFrame` with `set_index()` before concatenating:

```python
pd.concat([city_loc.set_index("city"), city_pop.set_index("city")], axis=1)
```

| city | state | lat | lng | population | state |
|---|---|---|---|---|---|
| Cleveland | OH | 41.473508 | -81.739791 | NaN | NaN |
| Houston | NaN | NaN | NaN | 2242193.0 | Texas |
| Miami | FL | 25.7911 | -80.320733 | 413201.0 | Florida |
| New York | NY | 40.705649 | -74.008344 | 8363710.0 | New-York |
| Salt Lake City | UT | 40.755851 | -111.896657 | NaN | NaN |
| San Francisco | CA | 37.781334 | -122.416728 | 808976.0 | California |


This looks a lot like a SQL `FULL OUTER JOIN` on `city`, except that the two `state` columns were not renamed to `state_x` and `state_y` (as `pd.merge()` does by default), and `city` is now the index rather than a column.
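For comparison, the sketch below combines a subset of the example data in two other ways. `pd.merge()` with `how="outer"` performs an actual full outer join on `city` and applies its default suffixes `_x` and `_y` to the clashing `state` columns. `pd.concat()` with the `keys` argument keeps `city` as the index but adds an outer column level naming the source of each column, so the two `state` columns stay distinguishable. The labels `"loc"` and `"pop"` are arbitrary choices for illustration.

```python
import pandas as pd

# Subset of the example data
city_loc = pd.DataFrame(
    {"state": ["CA", "OH"], "city": ["San Francisco", "Cleveland"],
     "lat": [37.781334, 41.473508], "lng": [-122.416728, -81.739791]})
city_pop = pd.DataFrame(
    {"population": [808976, 2242193], "city": ["San Francisco", "Houston"],
     "state": ["California", "Texas"]}, index=[3, 6])

# 1) A real full outer join on the city column: clashing columns get suffixes
merged = pd.merge(city_loc, city_pop, on="city", how="outer")
print(merged)

# 2) concat along columns, with keys= adding an outer column level per source
side_by_side = pd.concat(
    [city_loc.set_index("city"), city_pop.set_index("city")],
    axis=1,
    keys=["loc", "pop"],
)
print(side_by_side[("pop", "state")])
```

When joining on columns, `merge` discards the original index labels and returns a fresh `0, 1, ..., n-1` index, whereas the `concat` version keeps `city` as the index.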

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Learn More
    	</span>
		</div>
		</div>
			

Check out [the documentation on pd.concat](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) for more details.