Template structure for empirical papers.

This folder provides an all-encompassing working structure for empirical papers.

It organizes every step of the process: merging and cleaning (several) data sets, performing analyses (tables, figures, regressions), writing the article itself and also presentations.

This readme explains the folder structure in more detail and how you can get the most out of it.
For more information, see Gentzkow & Shapiro (2014), Code and Data for the Social Sciences. For a fancier (but harder to learn) version of a paper template, see here.

To use it: simply download it and adapt it to your own project!

Summary

  0. Requirements
  1. Folders
  2. Files
  3. Principles
  4. Further Reading

0. Requirements

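This workflow requires the tools behind the file types used in this template: Python (run_paper.py, get_input.py), Stata (.do and .dta files) and LaTeX (.tex files).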

Other great languages and software may also be used.

For now it is only adapted to OSX (Apple) environments, but feel free to adapt it to Windows (and please share it with me!).

1. Folders

/src
  • Any code that builds or manipulates data and performs analysis should be put here.

  • All output should be written to /output, ideally as a single data file called, say, data.dta.

  • Keep code clean and modularized.

    /sub

    • Holds modularized code to implement subroutines in build.do and analysis.do.
/input
  • Every original data source should be included here in clean and normalized form.
  • Only include cleaned files. Raw external files should be cleaned in the folder specific to each data source.
  • These data sets are then manipulated and merged by the files in /src.
/output
  • Holds the final data set, which is then used in /src/analysis.do.
  • Contains all analysis objects generated by files in /src.
  • These objects then serve as the source for generating the .tex files.
/tmp
  • Contains any temporary file created during the manipulation of input data sets or the analysis routine.
/extra
  • Contains any extra file relevant to the paper.
  • Examples: grant material, previous analyses.
/ref
  • Keeps the paper references.
  • Suggested file name format:
    • Author 1 & Author 2 (Journal, Year) Title with Capitalized First Letters.pdf
  • Recommended auxiliary program: Mendeley.
/tex
  • Where the juice is produced.

  • Contains all .tex files for preliminary results, the paper and presentations.

    /sub

    • Curated set of packages and shortcuts commonly used in Social Science papers and presentations.
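
The folder layout described above can be created in one go. The following is a minimal sketch, not part of the template itself: the function name `scaffold` and the `FOLDERS` list are illustrative assumptions, with folder names taken from this section.

```python
from pathlib import Path

# Folder names as listed in this section; "sub" is nested in /src and /tex.
FOLDERS = ["extra", "input", "output", "ref", "src/sub", "tex/sub", "tmp"]


def scaffold(root):
    """Create the folder skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        path = root / name
        # parents=True also creates /src and /tex; exist_ok avoids errors on reruns
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold("my_paper")` would create `my_paper/src/sub`, `my_paper/tex/sub` and the other folders, leaving any existing content untouched.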

2. Files

run_paper.py
  • Automates the whole paper construction.
  • Runs everything in a pre-specified order, from beginning (building data sets) to end (compiling .tex files).
  • Keeps clear what should be run when.
  • Also cleans /output and /tmp folders before running other code.
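
The behaviour described above (clean /output and /tmp, then run every step in a fixed order) could be sketched as follows. This is an illustration under assumptions, not the actual content of run_paper.py: the names `clean_folder` and `run_pipeline`, and the idea of passing each step as a command list, are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def clean_folder(folder):
    """Delete everything inside `folder` but keep the folder itself."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for item in folder.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)
        else:
            item.unlink()


def run_pipeline(root, steps, runner=subprocess.run):
    """Clean /output and /tmp, then run each command in `steps` in order."""
    root = Path(root)
    for name in ("output", "tmp"):
        clean_folder(root / name)
    for command in steps:
        # check=True stops the pipeline at the first failing step
        runner(command, cwd=root, check=True)
```

A call might look like `run_pipeline(".", [["python", "src/get_input.py"]])`, with the commands for the Stata and LaTeX stages appended in the order they should run. The exact commands depend on the local installation and are left out here.
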
/src/get_input.py
  • Erases any file inside /input and copies any original data set from outside sources.
  • Ensures consistency across original data generation and data building for paper.
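
The erase-then-copy step of get_input.py could look roughly like the sketch below. It is an assumption about how such a script might be written, not the script shipped with the template: `refresh_input` and its parameters are hypothetical names, and the list of source files has to be supplied by the user.

```python
import shutil
from pathlib import Path


def refresh_input(input_dir, sources):
    """Erase the files in `input_dir`, then copy each file in `sources` into it."""
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    # Remove stale files first so /input only ever reflects the current sources.
    # Subfolders, if any, are left untouched in this sketch.
    for item in input_dir.iterdir():
        if item.is_file():
            item.unlink()
    copied = []
    for source in sources:
        source = Path(source)
        target = input_dir / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

Erasing before copying is what gives the consistency mentioned above: a data set that is no longer listed as a source cannot linger in /input and silently enter the build.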

3. Principles

  • For each new project, start (i) a structured versioned folder, (ii) a task manager project, and (iii) a set of slides.
    1. Copy this folder and use a version control system (e.g. Git).
      • Keep track of multiple authors' edits.
      • No more report_final_v3.2b_ST_toDelete.tex.
      • Use branching to work simultaneously on the same code.
    2. Start a project within a task manager (see Asana, Trello, JIRA, etc.).
      • Your email inbox is not a task manager.
      • Tasks should be actionable atoms.
      • Set priorities, assignments, due dates, etc.
      • Only one person should be ultimately responsible for each task.
      • Do regular reviews and cleaning.
    3. Slides
      • They contain the current (summarized) version of the paper.
      • Update them continuously. It will discipline your work.
  • Keep two folders: /papers, and /data.
    1. Papers.
      • Each folder within /papers is a paper.
    2. Data.
      • Each folder within /data is a data set.
      • Use the same structure for cleaning these data sets (/input, /src, /output, /tmp).
      • Then use /main_paper/src/get_input.py to copy the original data sets.
  • Use a good text editor (I recommend vim, Sublime Text or Notepad++).
  • Use a modern and flexible communication tool (see Slack).
  • Keep documentation lean and clean.
  • Keep this folder organized. Your future self thanks your present effort.

4. Further Reading