UCIMerge - a framework for harmonizing cross national time series data

Read Me

UCIMerge is a framework in Stata for standardizing the merging of international comparative datasets. The project defines conventions and a library of functions that make it easier and faster to merge time series datasets, incorporate updates, keep observations consistent across years, conserve N (the number of observations retained after merging), and encourage reproducible research.

This framework came about from conversations at the UC Irvine International Comparative Workshop.

Download the latest release. Join the announcement list to receive notifications of updates.

How to use UCIMerge

The first time you run the scripts, updating the datasets from the web will take an extremely long time. To jumpstart this, you can use this starter pack by dropping its files into the /source directory. To force the system to refresh a dataset, delete that dataset's file from /source.
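
A forced refresh can also be done from within Stata. The following is a minimal sketch, assuming the cached copy is named source/polity.dta (a hypothetical file name, chosen because /lib contains a polity.do) and that the UCIMerge folder is the working directory:

```stata
* Hypothetical file name: adjust to the dataset you want to refresh
local cached "source/polity.dta"

* erase fails if the file is missing, so capture the return code
capture erase "`cached'"
if _rc == 0 {
    display "Removed `cached'; it should be fetched again on the next run."
}
else {
    display "`cached' was not removed (it may not exist)."
}
```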

  1. Set the UCIMerge folder as the working directory for Stata ('cd ~/UCIMerge').

  2. Edit the Master.do file to select the configuration you want.

  3. Run 'do master'. Your new dataset will be opened and saved within the UCIMerge folder.
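
Put together, a session following the three steps might look like the sketch below. It assumes UCIMerge was unpacked into your home directory; the doedit line is only one optional way to open the file and is left commented out.

```stata
* Step 1: make the UCIMerge folder the working directory
cd ~/UCIMerge

* Step 2: edit Master.do in any editor, for example Stata's do-file editor
* doedit Master.do

* Step 3: run the merge; Stata appends the .do extension automatically
do master
```

On a case-sensitive file system, 'do master' may not find Master.do; in that case 'do Master' matches the file name exactly.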

UCIMerge requires Stata 13. The .csv files that link countries across datasets can be used independently.

Currently Supported Datasets

You can add your own datasets by using one of these examples as a template.

Structure and Philosophy

Intuitive directory structures and naming conventions mean writing less code!

Files

  • Master.do - Configure your merge by commenting/uncommenting the datasets that you want included.
  • UCIMergeList.csv - An index of dataset country codes and their corresponding standardized UCIMerge code.
  • UCICountries.csv - An index of UCIMerge codes for countries.
  • /private/MyMergeList.csv - If this file is present, it is automatically included in the UCIMergeList. Use this to add your own country codes.
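
To illustrate how an index like UCIMergeList.csv can be used on its own, here is a minimal sketch of a crosswalk merge. It does not reproduce the framework's own code. The toy rows stand in for the real file, the column name sourcecode is an assumption, and every code value is a placeholder rather than a real UCIMerge code.

```stata
clear
* Toy crosswalk standing in for UCIMergeList.csv (all values are placeholders)
input str10 source sourcecode ucinumeric
"polity" 2   1001
"polity" 20  1002
"other"  840 1001
"other"  124 1002
end

* Keep only the rows for the dataset being merged
keep if source == "polity"
tempfile crosswalk
save `crosswalk'

* Toy dataset keyed by the source's own country code
clear
input sourcecode year score
2  2000 10
2  2001 10
20 2000 10
end

* Many country-year rows map to one crosswalk row per country
merge m:1 sourcecode using `crosswalk', keep(match master) nogenerate
list
```

With the real file, the crosswalk could instead be loaded with 'import delimited using "UCIMergeList.csv", varnames(1) clear', which lowercases column names by default.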

Directories

  • /lib - Location for the project .do files. One .do file for each dataset.
  • /source - Location for cached versions of the original datasets. UCIMerge looks here for a .dta file with the same prefix as the merge. If it doesn't find a copy, and the dataset's merge file specifies a location on the internet, UCIMerge will try to download a copy.
  • /private - Location for your customized functions, datasets and MergeLists for datasets that are not part of the core UCIMerge project.
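
The cache lookup described for /source can be sketched roughly as follows. This is an illustration of the idea, not the framework's actual implementation; the prefix and the URL are hypothetical placeholders.

```stata
* Hypothetical names: a dataset prefix and a placeholder download URL
local prefix "polity"
local url "https://example.org/polity.dta"

* Look for a cached copy first
capture confirm file "source/`prefix'.dta"
if _rc != 0 {
    * No cached copy: try to download one into /source
    capture copy "`url'" "source/`prefix'.dta"
    if _rc != 0 {
        display as error "Could not download `prefix' from `url'"
    }
}
else {
    display "Using cached copy source/`prefix'.dta"
}
```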

About Contributions

Please contribute! If you discover an error, please submit an issue on GitHub or send a fix.

To contribute a new dataset:

  • create a new merge .do file in the /lib directory. polity.do is a well-documented template.
  • include the new file in Master.do
  • update the UCIMergeList.csv with the new country code entries. Sort the merge list by the Source and UCINumeric fields.
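
The requested sort order can be produced in Stata itself. The sketch below uses toy rows with placeholder values so that it runs on its own; only the column names Source and UCINumeric come from the instructions above.

```stata
clear
* Toy rows standing in for the merge list (placeholder values)
input str10 Source UCINumeric
"polity" 1002
"other"  1001
"polity" 1001
end

* Sort by Source first, then by UCINumeric within each source
sort Source UCINumeric
list
```

For the real file, one option is to load it with 'import delimited using "UCIMergeList.csv", varnames(1) case(preserve) clear', sort, and write it back with 'export delimited using "UCIMergeList.csv", replace'. Review the diff before submitting, since a round trip through Stata can change quoting or number formatting.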

Submit a pull request through GitHub, or email your changes to pearcem@uci.edu. We'd love to have your contributions!

If you use this framework, please consider citing it so that others can find it and contribute.

Pearce, Matthew. 2016. "UCIMerge - a framework for harmonizing cross national time series data." (https://github.com/mpearce/UCIMerge) doi:10.5281/zenodo.27933