Workflow.Details

jmshapir edited this page Jul 20, 2023 · 8 revisions

Github issues allow us to organize and track what is happening on a project, and also provide a durable, replicable record of the work for future reference.

Scope

An issue is a discrete, well-defined unit of work on a project. Normally this means that an issue should not be more than a couple of weeks' worth of work, and should not be open for more than a month or two.

Issues are too broad if they become open-ended and end up mixing multiple work threads. "Write the follow up paper" or "Do the analysis" are usually not good issues (unless the project is very small). Sometimes an issue that started with manageable scope will grow as the project expands or new questions arise. At this point it is a good idea to carve off subparts into separate issues and close the original issue with an interim summary.

Issues should not be opened until the assignee is ready to work on them actively (or will be soon). To-do items that we plan to work on in the future should be placed on a project outline in the repository's Github wiki.

If work stops on an issue for any reason and is not expected to resume soon, the issue should be closed with an interim summary. If we plan to continue work later this can be noted on the project outline in the repository's Github wiki along with a link to the original issue.

Title

The issue title should be descriptive enough that somebody looking back at it later will be able to understand what the purpose of the issue was and how it fits into the larger context. It should use the imperative mood and not end in a period ("Revise main figure", not "main figure."). This post by Chris Beams has an excellent discussion of what makes a good Git commit message; the same principles apply to good issue titles.

Description

The issue description should state the goals of the issue clearly. Like the title, it should usually be written in the imperative mood. The description should be precise enough that a third party can judge whether the issue was completed or not. It should include enough explanation and context that someone who is not intimately familiar with the other work going on at that moment can understand it clearly. Remember that we will often return to these issues many months or even years later and try to understand what was going on. If an issue relates directly to one or more other issues, this should be stated in the description with a link to the other issue(s) (e.g., "Follow-up to #5").

Good description:

Following #22, re-run the analysis on the Sherlock cluster to see if that improves performance.
* Run a minimal version of the base model on the Sherlock node `alpha`
* Test the subsampling procedure on the `alpha` node
* Run a minimal version of the base model on Sherlock's actual computing nodes
* Test the subsampling procedure on Sherlock's actual computing nodes

Document necessary code changes to implement politext code on Sherlock and potential bottlenecks. 
In the long term we want to migrate politext computing to Sherlock and this issue is a first step toward that.

Bad description:

Redoing everything on Sherlock including the subsampling. Remember we want alpha not only the regular nodes.

Comments

Comments in Github issue threads are the main way we communicate about our work. Our workflow outlines when to comment, but the assignee can feel free to add more comments (e.g. "note to self") as desired.

When asking a question, please be specific about the input you require in order to continue, e.g.

@gentzkow, Where would you like me to store the data files?

Users should keep email notifications for '@' references turned on. Anyone who is not the assignee of an issue will assume by default that comments not @-referencing them do not require their attention.

When asking for feedback on results, please take some time to confirm the results are correct and make sense.

When referencing other comments, issues or repositories please include a link and issue number. When referencing a file, directory, or other object, please include a permanent URL.
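One way to get a permanent URL for a file or directory is to pin the link to a specific commit rather than to a branch name. The sketch below assumes the repository is hosted on github.com and uses GitHub's `blob/<commit>/<path>` and `tree/<commit>/<path>` URL layout. The function name and the owner, repository, and commit values are placeholders for illustration, not part of our workflow.

```python
from urllib.parse import quote


def permanent_url(owner: str, repo: str, commit_sha: str, path: str,
                  is_directory: bool = False) -> str:
    """Build a GitHub link pinned to a commit, so it keeps pointing at the same content.

    Hypothetical helper: all argument values are supplied by the caller.
    """
    # GitHub serves files under /blob/ and directories under /tree/.
    kind = "tree" if is_directory else "blob"
    # quote() escapes spaces and similar characters but leaves "/" intact.
    return f"https://github.com/{owner}/{repo}/{kind}/{commit_sha}/{quote(path)}"


# Placeholder values for illustration only.
print(permanent_url("example-org", "example-repo", "0123abc", "source/analysis/estimate.py"))
```

The commit hash for the current checkout can be obtained with `git rev-parse HEAD`.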

Commits

Each commit should have a commit message whose first line has the form `#X Description of commit`, where X is the Github issue number (e.g., "#123 Add first appendix figure").

Crafting good commit messages is crucial to keeping the history of work on a project clear and readable. Commit messages should describe the purpose of the commit and not be redundant with what Git is already recording ("Update code" or "Modify slides.lyx" are redundant; "Refactor estimate() function" and "Add robustness figure to slides" are better). Commit messages should be written in sentence case, use the imperative mood, and not end in a period ("#123 Revise abstract", not "#123 abstract."). This post by Chris Beams has an excellent discussion of what makes a good commit message.
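The mechanical parts of these rules (the issue-number prefix, the leading capital, and the missing final period) can be checked automatically. The following is a minimal sketch, not an existing lab tool: the function name and the regular expression are assumptions made for illustration. Whether a message uses the imperative mood or describes the purpose of the commit still needs a human reader.

```python
import re

# Hypothetical pattern for "#X Description of commit", where X is an issue number.
FIRST_LINE = re.compile(r"^#(\d+) (\S.*)$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems found in the first line of a commit message."""
    first_line = message.splitlines()[0] if message.strip() else ""
    match = FIRST_LINE.match(first_line)
    if match is None:
        return ["first line should have the form '#X Description of commit'"]

    description = match.group(2)
    problems = []
    # Sentence case: the description starts with a capital letter.
    if not description[0].isupper():
        problems.append("description should start with a capital letter")
    if description.endswith("."):
        problems.append("description should not end in a period")
    return problems


print(check_commit_message("#123 Revise abstract"))
print(check_commit_message("#123 abstract."))
```

The first call reports no problems. The second reports both the missing capital and the trailing period.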

Any commit that is made to main, is merged to main, or defines an issue deliverable should follow a complete run of the build scripts (e.g., scons or make.py) for the relevant modules or directories.

Pull Request

The title of the pull request should be `PR for #X: original_issue_title`, where X is the Github issue number (e.g., "PR for #123: Update appendix figures").
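As a small illustration of the title format, the sketch below builds a pull request title from an issue number and issue title, and parses one back. Both function names are hypothetical and are not part of any Github tooling.

```python
import re
from typing import Optional

# Hypothetical pattern for "PR for #X: original_issue_title".
PR_TITLE = re.compile(r"^PR for #(\d+): (\S.*)$")


def pull_request_title(issue_number: int, issue_title: str) -> str:
    """Format a pull request title from the originating issue."""
    return f"PR for #{issue_number}: {issue_title.strip()}"


def parse_pull_request_title(title: str) -> Optional[tuple[int, str]]:
    """Return (issue_number, issue_title), or None if the title does not follow the format."""
    match = PR_TITLE.match(title)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(pull_request_title(123, "Update appendix figures"))
```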

If the work in the issue affects the main paper draft, the pull request should include a PDF diff comparing the draft before vs. after the work in the issue.

A pull request is not required if changes will not be merged to main.

Pull Review

The description / first comment of the pull request should provide instructions that define the scope of the review along with any information the reviewer will need to execute it efficiently.

Any issue that involves substantial changes to code should be reviewed by at least one other lab member, typically an RA.

The job of the reviewer is to verify that:

  • The deliverable is clear, complete, and conforms to the standards here
  • Files committed to the repository conform to our organizational and code style rules
  • Empirical and theoretical results are clear and appear correct
  • Saved data meet our integrity criteria

It is not typically the job of the reviewer to go over every detail of the output and every line of code. For example, occasional comments on fine points of code style are fine, but they should not be the primary content of the review. When requesting review, the assignee can ask for feedback beyond the above, e.g., a careful check of particular code or results.

Summary

When an issue is complete, you should post a final summary comment. All completed issues must have one and only one summary comment. If changes are made after the summary is written, you should edit the summary comment in place rather than creating a new one. To make the summary comment easy to find, please try to avoid commenting on an issue after it has been summarized.

The summary comment should begin with Summary. It should briefly recap what the issue accomplished. It should either contain or link to the deliverable.
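The rules for summary comments can be expressed as a simple check over the comment bodies of an issue thread. This is a sketch under stated assumptions: the thread is given as a plain list of comment texts in posting order, the function name is hypothetical, and any comment whose text starts with "Summary" counts as a summary comment, which may over-match in practice.

```python
def check_summary(comments: list[str]) -> list[str]:
    """Return a list of problems with the summary comment in an issue thread."""
    # Positions of comments that begin with "Summary" (ignoring leading whitespace).
    positions = [i for i, body in enumerate(comments)
                 if body.lstrip().startswith("Summary")]

    problems = []
    if len(positions) != 1:
        problems.append(f"expected exactly one summary comment, found {len(positions)}")
    elif positions[0] != len(comments) - 1:
        # The summary exists but is not the last comment in the thread.
        problems.append("comments were posted after the summary")
    return problems


# Made-up thread for illustration only.
thread = ["Started work on the cluster runs.", "Summary\nSee the linked deliverable."]
print(check_summary(thread))
```

An empty list means the thread has a single summary comment and nothing posted after it.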

Deliverable

The form of the deliverable may be any combination of:

  • Content added to the draft, slides, etc. in the repository
  • A PDF or markdown file
  • A summary in the final comment in the issue thread

The deliverable should be self-contained. It should usually begin with a concise summary of the issue goal and the conclusions (e.g., answer to an empirical question), followed by supporting detail. A user should be able to learn all relevant results from the deliverable without looking back at the comment thread or task description.

The deliverable should contain enough information that another user could replicate its results. For figures, tables, or other results produced by code, a user should be able to identify the relevant code and reproduce the output. This will usually be automatic when the output is produced inside the repository. For output produced by hand (e.g., literature reviews, manual calculations) the deliverable should include enough information about the steps performed that a user could have a decent shot at repeating them.