Introduction to Stan for New Developers

Mitzi Morris edited this page Jun 22, 2020 · 22 revisions

Welcome to Stan! We're excited that you're interested in contributing to the project. Before you're able to contribute, there are some processes and other information that are good to know.

The Stan project is hosted on GitHub, so you will need to create a GitHub account if you do not already have one. Developer discussions are hosted on Discourse, so you will also need an account there in order to ask questions or participate in discussions.

Most of the following discussion is aimed at people who want to contribute C++ code to Stan. But there are many other ways to contribute that don't involve C++!

GitHub repositories and submodule relationships

Development of the math library, the language and algorithms, and the interfaces is organized into the following repositories, with arrows indicating submodule inclusion.

math <- stan <- pystan
             <- rstan   <- rstanarm
             <- cmdstan <- statastan
                        <- matlabstan
                        <- stan.jl
                        <- MathematicaStan
     <- stanc3

Currently, the stan repo includes the algorithms and the service API for the interfaces. There are additional repos for tools such as the Emacs mode, R plotting, the R Shiny interface, web pages, etc. stanc3 hosts the language compiler. Some of the repos include others as submodules where there is a code dependency. Each repo also has its own wiki. Don't forget to check that wiki's homepage and search it for information that might be related to that subproject.

Contributing to Stan

The Lifecycle of a Pull Request

People use Stan in many contexts. When we're deciding how to add, remove, or modify Stan code, we need to understand what our goals are. This typically involves three steps: a discussion in which we try to elicit concrete use cases for the feature; a GitHub issue in the appropriate repo containing something resembling a spec, which the reviewer can use to evaluate the associated pull request; and the pull request itself. These three artifacts live in different locations, so each one should begin with links to the others and a summary of the results of the previous steps in the workflow. To summarize:

  1. Bring up your proposed feature for discussion on our forums. If you're trying to find a place to help out, you can skip this and find an existing issue on the appropriate GitHub repo.
  2. Summarize the discussion and write something approaching a high-level spec in a GitHub issue.
  3. Create a pull request that attempts to address the GitHub issue.

You can read more about the developer process here.

Misc

We have adopted the GitFlow process for incorporating new contributions into Stan. If you are not yet familiar with Git, we recommend that you check out some of the many great Git tutorials freely available online. Once you are comfortable with Git itself, you can read about our particular implementation of GitFlow here and here.

All new contributions are also tested with our continuous integration framework.

Every developer has their own local development setup, but we have compiled various helpful tricks that you might find useful.

Style

In order to ensure that we can quickly read and understand contributions, consistent style is incredibly important. We have adopted conventions for code quality and code style to which all contributions must conform. You can read more on these links, but we use an automated formatter for many of our conventions.

There is a list of supported compilers and language features here.

Testing

The robustness of Stan is only as good as our test coverage, and we require that all new contributions are adequately tested. We use the GoogleTest framework for writing tests, and GNU Make and Python for running those tests.

Documentation

We have two main sources of documentation: Doxygen doc comments and the Stan manual. You can read more about contributing to the former here. The latter typically has a GitHub issue associated with it for each Stan release on the Stan repo, but we also take pull requests to the .tex files.

There are other forms of documentation listed on the website here.

Contributing Core Code

Much of what you might consider to be the "core" of Stan actually exists in the Math repo. This document applies to that repo, but you can read more about how that repo is organized and any differences here.

The core code in Stan is written in heavily templated C++ for high performance. There are many great C++ tutorials available online, for example cplusplus.org, and once you are familiar with the basics of the language you can tackle the subtleties of templates. We highly recommend Vandevoorde and Josuttis and Alexandrescu.

There are many additional resources available for learning how to optimize C++ code, including Agner Fog's manuscript and the many books of, amongst others, Scott Meyers and Herb Sutter.

Contributing new densities

Having a comprehensive set of useful densities coded in the Stan math library is a benefit to users. Densities are also a maintenance burden both for testing and for understanding the code base. As a result we are somewhat cautious about including new densities. Guidelines for including densities:

  • The pdf, cdf, and rng should be available so users of the Stan language don't need to check the manual.
  • There should be a computational benefit to coding the density in C++. Some densities can easily and efficiently be specified in the Stan language and the benefits of coding them in C++ are limited. It helps to provide some evidence of the computational benefits.
  • The density should be applicable to a range of problems.
  • If the density's C++ code re-implements or improves on functions already present in the math library, the necessary improvements should be coded separately in the math library.
  • The code author should have an ongoing interest in maintaining the code.

Contributing to the Interfaces

The Stan interfaces wrap the core C++ code and expose its functionality to other languages, such as R and Python. Consequently, contributions to the interfaces may require knowledge of how to couple these languages together, for example with Rcpp and Cython, or they may be built entirely in the interface language. For details on a specific interface, please consult the corresponding GitHub repository.

Once you have familiarized yourself with our process take a look at the GitHub issue trackers for the many tasks that need to be tackled! We look forward to hearing from you on Discourse and seeing your pull requests!

Useful utilities

This section contains some tips for using developer-oriented tools and setting up a computing environment for development.

Git

Remove Untracked Files

Many files that match our .gitignore file pile up and don't get cleaned up. To remove every untracked file, including hidden ones, do this:

git clean -d -x -f

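Warning: this deletes everything that is not currently being tracked, including ignored files and new files you have not yet added. You probably want to run git status first, or preview what would be removed with git clean -d -x -n.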

Git completion

https://github.com/git/git/tree/master/contrib/completion

  • git-prompt.sh. This changes the prompt on the command line to show the current branch. Install by copying it to ~/ and following its install instructions. For a cleaner prompt, replace the suggested PS1 with:

    PS1='\w$(__git_ps1 " (%s)")> '

    The prompt will look like:

    ~/stan (master)>

    where "~/stan" is the current directory and "(master)" indicates that the current branch is the master branch.

  • git-completion.*sh. Install this for auto-completion from the command line. It auto-completes git commands and git branches. For example, type git checkout then hit tab twice. It should show the available branches.

Mac utilities

Aquamacs: single kill buffer

By default, Aquamacs has multiple kill buffers: one copy-and-paste buffer used by command-c/x/v and a separate one used by ctrl-w, alt-w, and ctrl-y. This gets really confusing. Here's how to have a single kill buffer, so that text copied from any Mac program can be pasted into Emacs using ctrl-y or command-v.

  1. Open ~/.emacs (or wherever else you're storing your preferences)
  2. Add: (setq x-select-enable-clipboard t)

Fixing Line Feeds and Tabs

Ant includes a handy task called FixCRLF that "Adjusts a text file to local conventions." You can set it to replace tabs with spaces and Windows line endings with Unix ones.

Emacs

Autoformatting space

To make sure you never have tabs or line-ending spaces in your code files, you can use this in your .emacs file:

;; Convert tabs to spaces, from the first tab to the end of the buffer.
;; Returns nil so that Emacs goes on to write the file as usual.
(defun java-mode-untabify ()
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "\t" nil t)
        (untabify (1- (point)) (point-max))))
  nil)

;; Run the function above before each save in these modes.
;; Note that the C++ hook is named c++-mode-hook, not cpp-mode-hook.
(dolist (hook '(java-mode-hook html-mode-hook c++-mode-hook stan-mode-hook))
  (add-hook hook
            (lambda ()
              ;; The final t makes the hook buffer-local.
              (add-hook 'write-contents-functions 'java-mode-untabify nil t))))

You can rename the function; the "java" in its name is a holdover from where I first got the macros.

You can also automatically remove line-final whitespace (this is just for C++, but it could be hooked elsewhere):

(add-hook 'c++-mode-hook
          ;; The final t keeps this local to C++ buffers instead of changing the global hook.
          (lambda () (add-hook 'write-file-functions 'delete-trailing-whitespace nil t)))

Cpp

OS Detection

Detecting the operating system with compiler-predefined macros (not just for Windows):

http://nadeausoftware.com/articles/2012/01/c_c_tip_how_use_compiler_predefined_macros_detect_operating_system

Some GCC specifics:

http://stackoverflow.com/questions/259248/how-do-i-test-the-current-version-of-gcc
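
As a rough illustration of the techniques these links describe, here is a minimal sketch that picks an operating system name and checks the GCC version using macros that compilers predefine. The function names detected_os and gcc_at_least are hypothetical and are not part of Stan. The macros themselves (_WIN32, __APPLE__, __MACH__, __linux__, __unix__, __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__) are standard compiler predefines. Note that Clang also defines the __GNUC__ family for compatibility, so the version check does not by itself prove that the compiler is GCC.

```cpp
#include <string>

// Hypothetical helper (not part of Stan): name of the operating system this
// translation unit is being compiled for, chosen at compile time.
inline std::string detected_os() {
#if defined(_WIN32)
  return "windows";  // defined for both 32-bit and 64-bit Windows targets
#elif defined(__APPLE__) && defined(__MACH__)
  return "apple";    // macOS and other Apple platforms
#elif defined(__linux__)
  return "linux";
#elif defined(__unix__)
  return "unix";     // other Unix-like systems, such as the BSDs
#else
  return "unknown";
#endif
}

// Hypothetical helper (not part of Stan): true when the compiler reports a
// GCC-style version of at least major.minor.patch. Returns false when the
// __GNUC__ macros are not defined at all.
inline bool gcc_at_least(int major, int minor, int patch) {
#if defined(__GNUC__)
  const long have =
      __GNUC__ * 10000L + __GNUC_MINOR__ * 100L + __GNUC_PATCHLEVEL__;
  const long want = major * 10000L + minor * 100L + patch;
  return have >= want;
#else
  (void)major;  // silence unused-parameter warnings on other compilers
  (void)minor;
  (void)patch;
  return false;
#endif
}
```

In real code the same #if tests are usually placed directly around the platform-specific or version-specific code, rather than wrapped in functions. The functions here only make the result easy to print or inspect.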
