Noninvasive source code formatting

Katrin Leinweber edited this page Jul 3, 2017 · 21 revisions

Background

As the tidyverse style guide puts it:

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.

Especially in collaborative projects, a consistent coding style helps communicate the code's intent. A style guide helps define what code should look like before it is accepted into the main development branch. It can be a document, or a tool that automatically formats the code according to that style guide; of course the latter is much more convenient.¹

However, existing solutions to pretty-printing R code do slightly too much: any intent the developer might have put in the formatting is usually lost after pretty-printing. This project aims at implementing a pretty-printing solution that only alters the formatting where this is absolutely necessary according to the style guide in effect.

The general idea is to use the information in the parse tree (obtained from utils::getParseData()) to add/remove whitespace and line breaks as dictated by the style guide, but leave everything else untouched. The result will be code formatted according to the style guide with minimal differences to the original formatting.
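To make this concrete, here is a minimal sketch of what the parse data looks like for a one-line snippet. The snippet and the variable names (`code`, `tokens`, `gaps`) are made up for illustration; only `parse()` and `utils::getParseData()` come from R itself. It assumes ASCII input without tabs, so that column numbers map directly to character positions.

```r
# Parse a small snippet; keep.source = TRUE is needed so that
# utils::getParseData() can report token positions.
code <- "x<-1+2  # sum"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Keep only terminal tokens (the leaves of the parse tree), in source order.
tokens <- pd[pd$terminal, c("line1", "col1", "col2", "token", "text")]
tokens <- tokens[order(tokens$line1, tokens$col1), ]
print(tokens, row.names = FALSE)

# Number of blanks between consecutive tokens (valid here because all
# tokens sit on one line): next token's start minus this token's end, minus 1.
gaps <- tokens$col1[-1] - tokens$col2[-nrow(tokens)] - 1
print(gaps)  # 0 0 0 0 2
```

The parse data records the comment as a token of its own and gives the exact position of every operator, so a formatter can decide where whitespace is missing without touching anything else on the line.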


1: Experience with dplyr's C++ code has shown that an automatic formatter (astyle) integrated in the package's test suite greatly simplifies the task of keeping the code in shape, and also makes it easy to revert poor style choices by changing the astyle configuration. However, good pretty-printing requires a notion of the language's syntax, so adapting existing pretty-printers for other languages to R seems difficult.

Related work

  • formatR: Currently powers pretty-printing in knitr. Parses code and uses R's deparsing mechanism to emit formatted source code. Applies base R's notion of source code formatting. Has problems with certain edge cases, and cannot reliably restrict the width of the formatted output.
  • Google's R formatter: Aims at producing "aesthetically appealing" formatting by using an optimization approach (technical report); requires Python.
  • "Reformat code" command in RStudio: Cannot be easily automated, implemented in Java. Applied style is just slightly different from the tidyverse style guide.
  • lintr: Only detects style violations, cannot currently fix them.

All of the above solutions operate with a hard-coded notion of style which cannot be changed easily. The code is edited too heavily, or (in the case of lintr) not at all.

Details of your coding project

A proof of concept is able to add/remove whitespace around operators, but currently cannot fix broken indentation. The project will aim at getting this draft ready for production, and show its utility by reformatting several existing mid-size R packages and analysis scripts. The package tests should be enhanced to cover all important use cases; test-driven development looks like a good strategy for this project anyway.
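The following sketch illustrates the kind of noninvasive edit described above; it is not the proof-of-concept code itself. The function name `add_operator_spaces()` is hypothetical. It inserts a space only where one is missing around a few operators that are always binary, and leaves existing spacing (including extra alignment spaces), comments, and strings untouched. It assumes ASCII input without tabs.

```r
# Hypothetical helper: insert missing spaces around a few binary operators,
# leaving all other formatting untouched. Assumes ASCII input without tabs.
add_operator_spaces <- function(code) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  pd <- utils::getParseData(parse(text = lines, keep.source = TRUE))
  # Tokens that are always binary, so unary minus etc. needs no special care.
  binary <- c("LEFT_ASSIGN", "EQ_ASSIGN", "EQ", "NE", "LT", "GT", "LE", "GE",
              "'*'", "'/'")
  ops <- pd[pd$terminal & pd$token %in% binary, ]
  # Edit from the end backwards so that earlier positions stay valid.
  ops <- ops[order(ops$line1, ops$col1, decreasing = TRUE), ]
  for (i in seq_len(nrow(ops))) {
    l <- ops$line1[i]
    line <- lines[l]
    before <- substr(line, 1, ops$col1[i] - 1)
    op <- substr(line, ops$col1[i], ops$col2[i])
    after <- substr(line, ops$col2[i] + 1, nchar(line))
    # Add a space only if the neighbouring character is not whitespace.
    if (nzchar(before) && !grepl("[[:space:]]$", before)) {
      before <- paste0(before, " ")
    }
    if (nzchar(after) && !grepl("^[[:space:]]", after)) {
      after <- paste0(" ", after)
    }
    lines[l] <- paste0(before, op, after)
  }
  paste(lines, collapse = "\n")
}

styled <- add_operator_spaces("x<-1\ny  <-  x*2 # keep\nok<-x==1")
cat(styled, "\n")
```

In this example the second line keeps its double spaces around `<-` and its comment, and only gains spaces around `*`. Because operators are located through the parse data rather than by pattern matching on the text, an operator character inside a string or comment is never touched.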

The package will support formatting entire R packages, formatting entire R source trees, and checking consistent formatting. Development will target the tidyverse style guide first, but the formatting rules should be implemented in a way that allows for replacement with rules that support other coding conventions.
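One way to keep the rules replaceable is to treat a style guide as a plain list of rule functions. The sketch below is only an illustration of that design under simplifying assumptions: every name in it is hypothetical, and the two rules are line-based for brevity, whereas real rules would operate on the parse data.

```r
# Hypothetical rules: each maps a character vector of lines to a new one.
strip_trailing_whitespace <- function(lines) sub("[ \t]+$", "", lines)

collapse_blank_runs <- function(lines) {
  blank <- !nzchar(lines)
  # Drop a blank line whenever the line before it is blank as well.
  lines[!(blank & c(FALSE, head(blank, -1)))]
}

# A "style guide" is a list of rules, applied in order.
apply_style <- function(lines, rules) {
  Reduce(function(acc, rule) rule(acc), rules, lines)
}

# Code conforms to a style guide if applying the guide changes nothing.
is_styled <- function(lines, rules) {
  identical(apply_style(lines, rules), lines)
}

strict <- list(strip_trailing_whitespace, collapse_blank_runs)
lenient <- list(strip_trailing_whitespace)

src <- c("x <- 1  ", "", "", "y <- 2")
print(apply_style(src, strict))
print(apply_style(src, lenient))
print(is_styled(src, strict))  # FALSE
```

With this layout, switching to another coding convention means passing a different list, and the formatting check reuses the same rules as the formatter.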

All functions will be documented, and a short vignette will demonstrate how to use the package and the relevant package internals.

Further extensions are possible:

  • Support for breaking lines wider than a given number of characters
  • Support for a second, third, ... style guide
  • A Shiny application that allows previewing formatted source code for different style guides
  • IDE integration (RStudio, emacs-ess, ...)
  • Support for scripts with wide characters (requiring two or more columns in a fixed-width font) or zero-width characters
  • Configuration: Selection of a particular style guide for a codebase
  • Extensibility: How can third parties define their own style guide?
  • Recognition: Can we automatically select a style guide that best fits a given codebase?

Expected impact

The resulting package is a tool that both package authors and R users can benefit from, especially in collaborative settings. Package authors gain a simple tool to define what their code should look like, and to check and enforce this style. R users can effortlessly format their scripts with a style guide of their choice, which will make the scripts easier to understand later.

Mentors

  1. Kirill Müller has a computer science background and has been using R since 2012. He maintains and has developed several CRAN packages and is an active contributor to the tidyverse.
  2. Yihui Xie has authored knitr, bookdown, formatR, highr, shiny, and a great many other R packages. He has been a GSOC mentor twice (2012 and 2014).

Tests

Please file separate pull requests to the styler repository for each test result.

  • Easy: Clone the styler package, run the tests, and expand them with a few examples not yet covered. You probably want to edit in.R and two other files.

  • Medium: One of:

    • Write a function style_src() that accepts a path (default: current working directory) and formats all R files in this directory. How do you support formatting files in subdirectories?
    • Write a function check_pkg_style() that merely checks if the source files in a package are formatted correctly. How does this function interact with testthat or other test suites?
    • Write a small Shiny app that has a multiline text box and formats R source code entered/pasted there to another control. What happens in case of incorrect code? Can you color-highlight the formatted source code? Can the app be made reactive to respond to each change of the input text?
  • Hard: One of:

    • What needs to be done to fix failing package checks (R CMD check)? How do we continuously ensure good code quality?
    • Describe in your own words how the output of utils::getParseData() can help with pretty-printing, and how this output is transformed by the proof-of-concept code. Please create a stub vignette.