UCIMerge - a framework for harmonizing cross national time series data
UCIMerge is a framework in Stata for standardizing the merging of international comparative datasets. The project provides conventions and a library of functions that make it easier and faster to merge time series datasets, incorporate updates, keep observations consistent across years, preserve sample size (N), and encourage reproducible research.
This framework came about from conversations at the UC Irvine International Comparative Workshop.
How to use UCIMerge
The first time you run the scripts, they will take an extremely long time to download the datasets from the web. To jumpstart this, you can use this starter pack by dropping its files into the /source directory. To force the system to refresh a dataset, delete that dataset's file from /source.
- Set the UCIMerge folder as the working directory for Stata ('cd ~/UCIMerge').
- Edit the Master.do file with the configuration that you would like.
- Run 'do master'. Your new dataset will be opened and saved within the UCIMerge folder.
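As a minimal sketch, the steps above look like this in the Stata command window once Master.do has been edited. The ~/UCIMerge path is the one used in the steps; adjust it if the folder lives elsewhere.

```stata
* Make the UCIMerge folder the working directory
cd ~/UCIMerge

* Run the configured merge (Stata appends the .do extension)
do master
```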
UCIMerge requires Stata 13. The .csv files that link countries across datasets can be used independently.
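For example, the country-code index can be loaded on its own, without running the merge. This is a sketch that assumes the working directory is the UCIMerge folder and that the first row of the file holds column names; it makes no assumption about what those columns are called.

```stata
* Load the country-code index by itself
import delimited using "UCIMergeList.csv", varnames(1) clear

* Inspect the columns and the first few rows
describe
list in 1/10
```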
Currently Supported Datasets
- Norris 2009
- Freedom House 2016
- Polity IV
- Polity IV Coups
- World Development Indicators
- KOF Index of Globalization
- The Lexical Index of Electoral Democracy (LIED)
- CIRI Human Rights Dataset
- Quality of Government Standard dataset
- Cross National Time Series
- Penn World Table version 8.1
You can add your own datasets by using one of these examples as a template.
Structure and Philosophy
Intuitive directory structures and naming conventions mean writing less code!
- Master.do - Configure your merge by commenting/uncommenting the datasets that you want included.
- UCIMergeList.csv - An index of dataset country codes and their corresponding standardized UCIMerge code.
- UCICountries.csv - An index of UCIMerge codes for countries.
- /private/MyMergeList.csv - If this file is present, it is automatically included in the UCIMergeList. Use this to add your own country codes.
- /lib - Location for the project .do files. One .do file for each dataset.
- /source - Location for cached versions of the original datasets. UCIMerge will look here for a .dta file with the same prefix as the merge. If it doesn't find a copy, and the dataset merge file specifies a location on the internet, then UCIMerge will try to download a copy.
- /private - Location for your customized functions, datasets and MergeLists for datasets that are not part of the core UCIMerge project.
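The look-in-/source-then-download behavior described above can be sketched roughly as follows. This is an illustration of the idea, not the project's actual code: the `prefix` and `url` locals and the example.com address are hypothetical placeholders.

```stata
* Hypothetical values for illustration only
local prefix "polity"
local url "http://example.com/polity.dta"   // placeholder, not a real location

* Check for a cached copy in /source
capture confirm file "source/`prefix'.dta"
if _rc != 0 {
    * No cached copy found: try to download one into /source
    copy "`url'" "source/`prefix'.dta"
}

* Load the cached copy
use "source/`prefix'.dta", clear
```

Under this pattern, deleting a file from /source is enough to trigger a fresh download on the next run.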
Please contribute! If you discover an error, please submit an issue on GitHub or send a fix.
To contribute a new dataset:
- Create a new merge .do file in the /lib directory. polity.do is a well-documented template.
- Include the new file in Master.do.
- Update UCIMergeList.csv with the new country code entries. Sort the merge list by the Source and UCINumeric fields.
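One way to do the sort from within Stata is sketched below. It assumes the column headers are spelled exactly Source and UCINumeric, as the field names above suggest. Because a round trip through Stata can change how values are written (for example, leading zeros in numeric codes), review the resulting diff before committing.

```stata
* Read the merge list, keeping the header capitalization as-is
import delimited using "UCIMergeList.csv", varnames(1) case(preserve) clear

* Sort by source dataset, then by UCIMerge numeric code
* (assumes these two column names exist exactly as spelled)
sort Source UCINumeric

* Write the sorted list back over the original file
export delimited using "UCIMergeList.csv", replace
```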
Submit a pull request through GitHub, or email your changes to email@example.com. We'd love to have your contributions!
If you use this framework, please consider citing it so that others can find it and contribute.
Pearce, Matthew. 2016. "UCIMerge - a framework for harmonizing cross national time series data." (https://github.com/mpearce/UCIMerge) doi:10.5281/zenodo.27933