Skip to content
Use Stata's merge with csv files
Stata TeX
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
LICENSE
README.md
mergecsv.ado
mergecsv.pkg
mergecsv.sthlp
mmergecsv.ado
mmergecsv.sthlp
stata.toc

README.md

MERGECSV: Merge with csv in stata


Updates

  • 0.2.1 12apr2019:
    • adds sortusing option to mergecsv to allow for sorting of using dataset before the merge and thus the use of sorted option from merge
    • varnames(1) option called by default when importing csv
  • 0.2 22jan2019:
    • Introduces mmergecsv command
    • Allows passing of options to import delimited using csvopt
  • 0.1.1 04jan2019:
    • Improved cleanup of temporary datasets
  • 0.1 02jan2019:
    • First version of the command

Description

This package installs two twin commands: mergecsv and mmergecsv. These commands allow the use of the merge and mmerge commands in Stata when the using dataset is stored in a .csv file.

This command can be useful for the follwing cases:

  1. the using data is provided to you in .csv format
  2. you are using datasets that you want to store in .csv format for easier interface with other programming languages
  3. you are using datasets that you want to store in git or in another version control software optimized for text, avoiding Stata's .dta format

Main drawbacks:

  1. merging is slower than with directly with .dta, as the csv file is imported, saved in a temporary dataset, and merge or mmerge are called
  2. csv files do not keep useful information, such as:
    • variable labels and types
    • time (tset) and panel (xtset) information
    • labeled/encoded values

Install

cap ado uninstall mergecsv
net install mergecsv, from("https://raw.githubusercontent.com/luispfonseca/stata-mergecsv/master/")

Alternatively, if you have installed haghish's github package for stata, just execute:

github install luispfonseca/stata-mergecsv

Usage

mergecsv

The command:

mergecsv m:1 id using usingdataset, mergeopt(update force nolabel)

is equivalent to

merge m:1 id using usingdataset, update force nolabel

if usingdataset.csv and usingdataset.dta contain the same dataset. The same options that can be passed to merge can be passed to csvmerge using mergeopt. Similarly, options that can be passed to import delimited can be passed using csvopt.

In a more complete example, the following:

sysuse auto, clear
gen logpricesq = log(price)

mergecsv 1:1 id using usingdataset.csv, mergeopt(keepusing(discount)) csvopt(varnames(1))

is equivalent in practice to:

sysuse auto, clear
gen logpricesq = log(price)

preserve
import delimited usingdataset.csv, clear varnames(1)
save usingdataset, replace
restore
merge 1:1 id using usingdataset, keepusing(discount)
erase usingdataset.dta

The .csv extension is passed in the argument, but it is not necessary, as import delimited will recognize the file even without explicit mention.

mmergecsv

To use mmergecsv, we just need to adapt the syntax to that command.

In practice:

sysuse auto, clear
gen logpricesq = log(price)

mmergecsv id using usingdataset.csv, mergeopt(type(1:1) keepusing(discount)) csvopt(varnames(1))

is equivalent to:

sysuse auto, clear
gen logpricesq = log(price)

preserve
import delimited usingdataset.csv, clear varnames(1)
save usingdataset, replace
restore
mmerge id using usingdataset, type(1:1) keepusing(discount)
erase usingdataset.dta

To do:

  • Write tests

Author

Luís Fonseca
London Business School
lfonseca london edu
https://luispfonseca.com

You can’t perform that action at this time.