# Lecture 3: Pandas [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) 3

* [`pandas.concat(...)`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html)

## Imports

In [1]:
import pandas as pd

## DataFrames

In [2]:
df1 = pd.DataFrame(data=list(zip(range(5), range(100,105), range(1000, 1005))),
                   columns=['col0', 'col1', 'col2'])
df1

Unnamed: 0,col0,col1,col2
0,0,100,1000
1,1,101,1001
2,2,102,1002
3,3,103,1003
4,4,104,1004


In [3]:
df2 = pd.DataFrame(data=list(zip(range(5,10), range(105,110), range(1005, 1010))),
                   columns=['col0', 'col1', 'col2'])
df2

Unnamed: 0,col0,col1,col2
0,5,105,1005
1,6,106,1006
2,7,107,1007
3,8,108,1008
4,9,109,1009


In [4]:
df3 = pd.DataFrame(data=list(zip(range(5,10), range(105,110), range(2005, 2010))),
                   columns=['col0', 'col1', 'col3'])
df3

Unnamed: 0,col0,col1,col3
0,5,105,2005
1,6,106,2006
2,7,107,2007
3,8,108,2008
4,9,109,2009


## [`pandas.concat(...)`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html)

### Vertical (append rows)
#### Standard Case

In [5]:
pd.concat([df1, df2])

Unnamed: 0,col0,col1,col2
0,0,100,1000
1,1,101,1001
2,2,102,1002
3,3,103,1003
4,4,104,1004
0,5,105,1005
1,6,106,1006
2,7,107,1007
3,8,108,1008
4,9,109,1009


Note that each frame keeps its original row index, so the labels 0 to 4 appear twice in the result. Setting `ignore_index=True` discards the old labels and numbers the rows consecutively from 0.
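
As a minimal sketch of the `ignore_index=True` option, the cell below rebuilds `df1` and `df2` exactly as above and compares the row labels with and without the option. The variable names `stacked` and `renumbered` are only illustrative.

```python
import pandas as pd

# Same construction as df1 and df2 in the cells above.
df1 = pd.DataFrame(data=list(zip(range(5), range(100, 105), range(1000, 1005))),
                   columns=['col0', 'col1', 'col2'])
df2 = pd.DataFrame(data=list(zip(range(5, 10), range(105, 110), range(1005, 1010))),
                   columns=['col0', 'col1', 'col2'])

# Default: each frame keeps its own row labels, so 0..4 occurs twice.
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())

# ignore_index=True drops the old labels and numbers the rows 0..9.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())
```

Duplicate row labels are legal in pandas, but label-based selection such as `stacked.loc[0]` then returns two rows, which is often not what is wanted.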

#### With Non-overlapping Columns

In [6]:
pd.concat([df1, df3])

Unnamed: 0,col0,col1,col2,col3
0,0,100,1000.0,
1,1,101,1001.0,
2,2,102,1002.0,
3,3,103,1003.0,
4,4,104,1004.0,
0,5,105,,2005.0
1,6,106,,2006.0
2,7,107,,2007.0
3,8,108,,2008.0
4,9,109,,2009.0


All columns from both frames are preserved; cells for which a frame has no value are filled with `NaN`.
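
The `.0` suffixes in the output above hint at a side effect: a default integer column cannot hold `NaN`, so the partially filled columns are converted to floating point. The following sketch, which rebuilds `df1` and `df3` as above and uses the illustrative name `combined`, makes this visible.

```python
import pandas as pd

# Same construction as df1 and df3 in the cells above.
df1 = pd.DataFrame(data=list(zip(range(5), range(100, 105), range(1000, 1005))),
                   columns=['col0', 'col1', 'col2'])
df3 = pd.DataFrame(data=list(zip(range(5, 10), range(105, 110), range(2005, 2010))),
                   columns=['col0', 'col1', 'col3'])

combined = pd.concat([df1, df3])

# col0 and col1 exist in both frames and stay integer;
# col2 and col3 each exist in only one frame and become float.
print(combined.dtypes)

# Number of missing values per column.
print(combined.isna().sum())
```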

The result can also be restricted to the columns that all frames have in common using `join='inner'`:

In [7]:
pd.concat([df1, df3], join='inner')

Unnamed: 0,col0,col1
0,0,100
1,1,101
2,2,102
3,3,103
4,4,104
0,5,105
1,6,106
2,7,107
3,8,108
4,9,109


### Horizontal (append columns)

Set `axis=1` to concatenate horizontally: the frames are placed side by side and aligned on their row index.

#### Standard Case

In [8]:
pd.concat([df1, df2], axis=1)

Unnamed: 0,col0,col1,col2,col0,col1,col2
0,0,100,1000,5,105,1005
1,1,101,1001,6,106,1006
2,2,102,1002,7,107,1007
3,3,103,1003,8,108,1008
4,4,104,1004,9,109,1009


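`pd.concat(...)` also supports `ignore_index` with `axis=1`. In that case the column labels, rather than the row labels, are replaced by consecutive integers starting at 0.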
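
A short sketch of `ignore_index` along the column axis, using the same `df1` and `df2` as above; `side_by_side` and `relabelled` are illustrative names.

```python
import pandas as pd

# Same construction as df1 and df2 in the cells above.
df1 = pd.DataFrame(data=list(zip(range(5), range(100, 105), range(1000, 1005))),
                   columns=['col0', 'col1', 'col2'])
df2 = pd.DataFrame(data=list(zip(range(5, 10), range(105, 110), range(1005, 1010))),
                   columns=['col0', 'col1', 'col2'])

# Default: the column labels of both frames are kept, so each name occurs twice.
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.columns.tolist())

# ignore_index=True replaces the column labels with 0..5.
relabelled = pd.concat([df1, df2], axis=1, ignore_index=True)
print(relabelled.columns.tolist())
```

With duplicate column labels, `side_by_side['col0']` returns a two-column `DataFrame` instead of a `Series`, so relabelling or renaming is usually worthwhile before further processing.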

#### With Non-overlapping Rows

In [9]:
df4 = pd.DataFrame(data=list(zip(range(5,10), range(105,110), range(1005, 1010))),
                   index=range(3,8),
                   columns=['col0', 'col1', 'col2'])
df4

Unnamed: 0,col0,col1,col2
3,5,105,1005
4,6,106,1006
5,7,107,1007
6,8,108,1008
7,9,109,1009


In [10]:
pd.concat([df1, df4], axis=1)

Unnamed: 0,col0,col1,col2,col0,col1,col2
0,0.0,100.0,1000.0,,,
1,1.0,101.0,1001.0,,,
2,2.0,102.0,1002.0,,,
3,3.0,103.0,1003.0,5.0,105.0,1005.0
4,4.0,104.0,1004.0,6.0,106.0,1006.0
5,,,,7.0,107.0,1007.0
6,,,,8.0,108.0,1008.0
7,,,,9.0,109.0,1009.0


We can again restrict the result, this time to the row labels present in both frames (here 3 and 4), using `join='inner'`.
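
The inner join along `axis=1` can be sketched as follows, with `df1` and `df4` rebuilt as above and `inner` as an illustrative name.

```python
import pandas as pd

# Same construction as df1 and df4 in the cells above.
df1 = pd.DataFrame(data=list(zip(range(5), range(100, 105), range(1000, 1005))),
                   columns=['col0', 'col1', 'col2'])
df4 = pd.DataFrame(data=list(zip(range(5, 10), range(105, 110), range(1005, 1010))),
                   index=range(3, 8),
                   columns=['col0', 'col1', 'col2'])

# Only row labels found in both frames survive: 3 and 4.
inner = pd.concat([df1, df4], axis=1, join='inner')
print(inner)
```

Because no row is missing on either side, the result contains no `NaN` and the columns keep their integer dtype.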

#### With a `Series`

In [11]:
df1

Unnamed: 0,col0,col1,col2
0,0,100,1000
1,1,101,1001
2,2,102,1002
3,3,103,1003
4,4,104,1004


We give the `Series` a `name`, which becomes its column name after concatenation.

In [12]:
s2 = pd.Series(range(2000, 2005), name='col3')
s2

0    2000
1    2001
2    2002
3    2003
4    2004
Name: col3, dtype: int64

In [13]:
pd.concat([df1, s2], axis=1)

Unnamed: 0,col0,col1,col2,col3
0,0,100,1000,2000
1,1,101,1001,2001
2,2,102,1002,2002
3,3,103,1003,2003
4,4,104,1004,2004


The more common practice is to assign the `Series` as a new column; its values are aligned on the row index of the `DataFrame`:

In [14]:
df1['col3'] = s2
df1

Unnamed: 0,col0,col1,col2,col3
0,0,100,1000,2000
1,1,101,1001,2001
2,2,102,1002,2002
3,3,103,1003,2003
4,4,104,1004,2004


#### With Only `Series`

In [15]:
s3 = pd.Series(range(3000, 3005))
s3

0    3000
1    3001
2    3002
3    3003
4    3004
dtype: int64

In [16]:
pd.concat([s2, s3], axis=1)

Unnamed: 0,col3,0
0,2000,3000
1,2001,3001
2,2002,3002
3,2003,3003
4,2004,3004


© 2023 Philipp Cornelius