# Introduction

In our applications, we often need to chain multiple CSV data sources into a single one. If the data has headers, this complicates matters a little, because every file after the first repeats the header row. Here are two suggestions for dealing with this situation.

In [1]:
# Contents of the first data file
with open('header1.csv') as f:
    print(f.read())

uid,alias,shell
501,karen,bash
502,john,tcsh



In [2]:
# Contents of the second data file
with open('header2.csv') as f:
    print(f.read())

uid,alias,shell
601,peter,bash
602,paul,tcsh
603,mary,zsh



## Using a Homemade Solution

In this solution, we open one file after another and skip the header line for all but the first file.

In [4]:
import csv

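def open_files(*filenames):
    """Yield lines from each file, keeping only the first file's header."""
    for counter, filename in enumerate(filenames, 1):
        # newline='' is what the csv module recommends for files it parses
        with open(filename, newline='') as f:
            if counter > 1:
                next(f, None)  # Skip the header; the default avoids an error on an empty file
            yield from f

for record in csv.DictReader(open_files('header1.csv', 'header2.csv')):
    print(record)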

OrderedDict([('uid', '501'), ('alias', 'karen'), ('shell', 'bash')])
OrderedDict([('uid', '502'), ('alias', 'john'), ('shell', 'tcsh')])
OrderedDict([('uid', '601'), ('alias', 'peter'), ('shell', 'bash')])
OrderedDict([('uid', '602'), ('alias', 'paul'), ('shell', 'tcsh')])
OrderedDict([('uid', '603'), ('alias', 'mary'), ('shell', 'zsh')])
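
One edge case with chaining raw lines is a file whose last line has no trailing newline: that line would fuse with the first data line of the next file. The following is a minimal sketch of one way to guard against it, not a definitive implementation. The function name `chain_csv_lines` and the sample files `first.csv` and `second.csv` are made up for this illustration and are written to a temporary directory so the example runs on its own.

```python
import csv
import os
import tempfile


def chain_csv_lines(*filenames):
    """Yield lines from several CSV files, keeping only the first header."""
    for index, filename in enumerate(filenames):
        with open(filename, newline='') as f:
            if index > 0:
                next(f, None)  # Skip the header; tolerate an empty file
            for line in f:
                # Terminate every line so the last line of one file cannot
                # fuse with the first line of the next file.
                yield line if line.endswith('\n') else line + '\n'


# Hypothetical sample files for the demo. The first one deliberately
# lacks a trailing newline.
tmpdir = tempfile.mkdtemp()
first = os.path.join(tmpdir, 'first.csv')
second = os.path.join(tmpdir, 'second.csv')
with open(first, 'w', newline='') as f:
    f.write('uid,alias,shell\n501,karen,bash\n502,john,tcsh')
with open(second, 'w', newline='') as f:
    f.write('uid,alias,shell\n601,peter,bash\n')

records = list(csv.DictReader(chain_csv_lines(first, second)))
for record in records:
    print(dict(record))
```

This sketch still assumes that each header occupies exactly one physical line; a header containing a quoted field with an embedded newline would need a different way of skipping it.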


## Using Multiple Readers

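A different approach is to use one reader per file. Each csv.DictReader consumes the header of its own file, so nothing needs to be skipped by hand: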

In [9]:
import csv

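def open_readers(*filenames):
    """Yield records from each file, using a separate DictReader per file."""
    for filename in filenames:
        # newline='' is what the csv module recommends for files it parses
        with open(filename, newline='') as f:
            reader = csv.DictReader(f)  # Reads this file's own header
            yield from reader

for record in open_readers('header1.csv', 'header2.csv'):
    print(record)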

OrderedDict([('uid', '501'), ('alias', 'karen'), ('shell', 'bash')])
OrderedDict([('uid', '502'), ('alias', 'john'), ('shell', 'tcsh')])
OrderedDict([('uid', '601'), ('alias', 'peter'), ('shell', 'bash')])
OrderedDict([('uid', '602'), ('alias', 'paul'), ('shell', 'tcsh')])
OrderedDict([('uid', '603'), ('alias', 'mary'), ('shell', 'zsh')])
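
Because each reader parses its own header, this approach also makes it possible to check that all files share the same columns before their records are merged. Below is a hedged sketch of that idea. The function name `read_records` is hypothetical, and it takes open file-like objects rather than filenames, an assumption made here so the example can run on in-memory data.

```python
import csv
import io


def read_records(*sources):
    """Yield rows from several CSV sources, checking that headers agree.

    Each source is an open text file or file-like object.
    """
    expected = None
    for source in sources:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames  # Reads the header row on first access
        if fieldnames is None:
            continue  # Empty source: nothing to read
        if expected is None:
            expected = fieldnames
        elif fieldnames != expected:
            raise ValueError(
                f'header mismatch: expected {expected}, got {fieldnames}')
        yield from reader


# In-memory stand-ins for two CSV files with identical headers.
first = io.StringIO('uid,alias,shell\n501,karen,bash\n')
second = io.StringIO('uid,alias,shell\n601,peter,bash\n')
records = [dict(row) for row in read_records(first, second)]
print(records)
```

Raising an error on a mismatch is only one possible policy; depending on the data, it may be preferable to skip the offending file or to reorder its columns.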
