R Code Optimizer
A brief web search suffices to show that R is slow compared to other popular programming languages. “The R interpreter is not fast and execution of large amounts of R code can be unacceptably slow”. The main reason is that “R was purposely designed to make data analysis and statistics easier for you to do. It was not designed to make life easier for your computer”. The most widely used R interpreter is currently GNU-R. Several alternative R interpreters attempt to improve execution speed [3–8], but “switching interpreters is something to consider carefully”.
“Beyond performance limitations due to design and implementation, it has to be said that a lot of R code is slow simply because it’s poorly written. Few R users have any formal training in programming or software development … This means that it’s relatively easy to make most R code much faster”.
“It is important to pursue efficiency issues, and in particular, speed”. “A good deal of work is going into making R more efficient. Much of this work consists of reimplementing interpreted R code”.
The main goal of this project is to provide a GNU-R package with functions that let users automatically apply different strategies to optimize their R code. These functions will take R code as input and return R code as output, so that users can see which modifications make their code more efficient.
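To illustrate what "R code in, R code out" could look like, here is a minimal sketch of a single strategy (constant folding) written as a code-to-code transformation. The names `fold_constants`, `unwrap_literal`, and `optimize_text` are hypothetical and are not part of any existing package. The sketch assumes the input contains no empty arguments (such as `x[1, ]`) and that the arithmetic operators have not been redefined.

```r
# Operators this sketch is willing to evaluate ahead of time.
foldable_ops <- c("+", "-", "*", "/", "^")

# Strip one level of parentheses around a numeric literal: (5) becomes 5.
unwrap_literal <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("(")) && is.numeric(e[[2]])) {
    e[[2]]
  } else {
    e
  }
}

# Hypothetical helper: fold arithmetic calls whose arguments are all
# numeric literals, working bottom-up through the expression tree.
# Limitation (assumption): calls with empty arguments are not handled.
fold_constants <- function(expr) {
  if (!is.call(expr)) return(expr)          # symbols and literals stay as-is
  parts <- lapply(as.list(expr), fold_constants)
  expr <- as.call(parts)
  fn <- expr[[1]]
  if (is.name(fn) && as.character(fn) %in% foldable_ops) {
    args <- lapply(parts[-1], unwrap_literal)
    if (length(args) > 0 && all(vapply(args, is.numeric, logical(1)))) {
      # Every operand is known, so compute the value now.
      return(do.call(as.character(fn), args))
    }
  }
  expr
}

# Text in, text out: parse, transform each expression, deparse.
# Note: deparse() prints doubles with limited precision, so folded
# floating-point values may not round-trip exactly.
optimize_text <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  folded <- lapply(exprs, fold_constants)
  vapply(folded, function(e) paste(deparse(e), collapse = "\n"), character(1))
}

optimize_text("secs_per_day <- 60 * 60 * 24")
# returns "secs_per_day <- 86400"
```

Because the result is ordinary R source text, the user can compare it line by line with the original, which is not possible when the output is byte code.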
To the best of our knowledge, the only existing tool to automatically optimize R code is the compiler package. The high impact of this library is demonstrated by its inclusion in GNU-R since version 2.13.0. Although the compiler library manages, in certain cases, to improve the execution time of R code, its main objective is to compile expressions into byte code. Since the main goal of the compiler package is not optimization, this library, as we show in the optimization strategies section, leaves aside several commonly known optimization strategies. In addition, because the result of applying the functions of the compiler library is byte code, it does not allow the user to easily understand which modifications make their code more efficient.
Other types of related work include blog posts, web pages, and books that provide tips and guides for optimizing R code [2,9,12–16]. Although these texts contain intuitive and easy-to-apply strategies, none of them provides an automatic way of optimizing code.
Automatic code optimization strategies were first implemented for compiled languages, the best-known example being the GNU Compiler Collection (GCC; formerly called the GNU C Compiler). This compiler was first developed more than 30 years ago and implements more than 100 different code optimization techniques. Although R is interpreted, and therefore certain optimization techniques for compiled code cannot be implemented, many of these ideas can be applied to interpreted languages. Precedents of interpreted languages with code optimization tools include PMD for Java, and Vulture and PyCC for Python.
To evaluate the feasibility of this project, a portion of the optimization strategies found in the references cited in this document were evaluated. For each strategy, we implemented a function f containing a non-optimized code chunk, and a function f_opt containing the modification that results from applying the optimization strategy. Additionally, both functions were compiled using the compiler package. Execution times were obtained with the microbenchmark R package by evaluating the resulting four functions on the same (or as similar as possible) inputs.
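The evaluation setup described above can be sketched as follows for one strategy, loop-invariant code motion. The function bodies and input sizes are illustrative assumptions, not the ones used in the original evaluation. The benchmark runs only if the microbenchmark package is installed.

```r
library(compiler)  # bundled with GNU-R

# f: non-optimized chunk. The product a * b does not depend on i,
# yet it is recomputed on every iteration.
f <- function(n, a, b) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    scale <- a * b
    out[i] <- i * scale
  }
  out
}

# f_opt: the same chunk after hoisting the invariant out of the loop.
f_opt <- function(n, a, b) {
  out <- numeric(n)
  scale <- a * b
  for (i in seq_len(n)) {
    out[i] <- i * scale
  }
  out
}

# Byte-compiled versions of both functions.
f_cmp <- cmpfun(f)
f_opt_cmp <- cmpfun(f_opt)

# The optimization must not change the result.
stopifnot(identical(f(100, 2, 3), f_opt(100, 2, 3)))

# Time the four functions on the same input.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    f(1e4, 2, 3),
    f_opt(1e4, 2, 3),
    f_cmp(1e4, 2, 3),
    f_opt_cmp(1e4, 2, 3),
    times = 20
  ))
}
```

On recent R versions the just-in-time compiler is enabled by default, so f and f_opt may be byte-compiled automatically after a few calls. The gap between the plain and the explicitly compiled versions can therefore be smaller than on older releases.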
Common optimization strategies:
- Loop-invariant code motion
- Dead code and dead store elimination
- Common subexpression elimination
- Constant folding and constant propagation
- Jump threading
- Inline expansion
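As a rough illustration of some of the strategies listed above, the following hand-written before/after pairs show what a source-level optimizer might produce. The functions are made-up examples. The transformations are only valid under the assumption that the removed or merged calls have no side effects and that the base functions involved have not been redefined.

```r
# Dead store and dead code elimination
g <- function(x) {
  tmp <- sum(x)     # dead store: overwritten before it is ever read
  tmp <- mean(x)
  if (FALSE) {      # dead code: this branch can never run
    tmp <- tmp + 1
  }
  tmp
}
g_opt <- function(x) {
  mean(x)
}

# Common subexpression elimination
h <- function(a, b) {
  y <- sqrt(a + b) * 2
  z <- sqrt(a + b) / 3   # sqrt(a + b) is computed a second time
  y + z
}
h_opt <- function(a, b) {
  s <- sqrt(a + b)       # computed once and reused
  y <- s * 2
  z <- s / 3
  y + z
}

# Constant propagation followed by constant folding
k <- function(x) {
  n <- 10
  m <- n * 2             # n is known to be 10, so m is known to be 20
  x + m
}
k_opt <- function(x) {
  x + 20
}

# Each optimized version must return the same value as the original.
stopifnot(
  identical(g(c(1, 2, 3)), g_opt(c(1, 2, 3))),
  isTRUE(all.equal(h(1, 2), h_opt(1, 2))),
  identical(k(1), k_opt(1))
)
```

In each pair the optimized function is still readable R code, so the difference from the original doubles as an explanation of the strategy.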
R-specific optimization strategies:
Details of your coding project
The tasks to be carried out during this Summer of Code project are:
- Study several code optimization strategies. Evaluate the complexity of implementing them in R, and their efficiency gains (mainly speed).
- Rank the optimization strategies based on efficiency gain against complexity.
- Analyze methods for parsing R code, e.g., the one used by the compiler package. Select an appropriate parsing method to use.
- Analyze alternatives for modeling R code (functions, chunks), e.g., the one used by the compiler package, execution trees, etc. Select an appropriate alternative to use.
- Create a GNU-R package (tests, docs, etc.) that contains the top-ranked optimization strategies. The package will be designed to be extensible, so that it can serve as the basis for collaboratively adding new optimization strategies.
- Since the output of the package functions will be R code, it is expected to be used to teach/learn efficient coding practices.
- The most ambitious impact of this project would be to replicate the success of the compiler package. Moreover, a pipeline of R Code Optimizer %>% compiler could generate great results. While this expectation is ambitious, it can become a reality provided that the correctness of each optimization strategy's implementation is verified.
- Dr. Nicolás Wolovick - is an expert in high-performance computing, optimizing compilers, low-level programming, etc. He has taught the “operating systems” course since 2002, and “parallel computing” since 2012.
- Dr. Yihui Xie - well, every R user knows him: he has authored knitr, bookdown, DT, formatR, highr, servr, testit, and many other high-impact R packages. He has been a GSoC mentor three times (2012, 2014 and 2017).
[1] R. Ihaka, R: Lessons learned, directions for the future, in: Joint Statistical Meetings, The Authors, 2010. https://www.stat.auckland.ac.nz/~ihaka/downloads/JSM-2010.pdf.
[2] H. Wickham, Advanced R, Chapman & Hall/CRC, 2014. http://adv-r.had.co.nz/.
[3] Microsoft R Open, 2018. https://mran.microsoft.com/open.
[4] pqR - a pretty quick version of R, 2018. http://www.pqr-project.org/.
[5] Renjin, 2018. http://www.renjin.org/.
[6] FastR, 2018. https://github.com/oracle/fastr.
[7] Riposte, a fast interpreter and JIT for R, 2015. https://github.com/jtalbot/riposte/tree/library.
[8] Rho, 2017. https://github.com/rho-devel/rho.
[9] C. Gillespie, R. Lovelace, Efficient R programming, O’Reilly Media, Incorporated, 2016. https://csgillespie.github.io/efficientR/.
[10] R. Ihaka, R: Past and future history, Computing Science and Statistics. 392–396 (1998). https://www.stat.auckland.ac.nz/~ihaka/downloads/Interface98.pdf.
[11] K. Cooper, L. Torczon, Engineering a compiler, Elsevier, 2011. https://www.elsevier.com/books/engineering-a-compiler/cooper/978-0-12-088478-0.
[12] P. Burns, The R inferno, 2011. https://www.burns-stat.com/pages/Tutor/R_inferno.pdf.
[13] Fast R code, n.d. http://www.dartistics.com/fast-r-code.html.
[14] Strategies to speedup R code, 2016. https://datascienceplus.com/strategies-to-speedup-r-code/.
[15] FasteR! HigheR! StrongeR! - a guide to speeding up R code for busy people, 2013. http://www.noamross.net/blog/2013/4/25/faster-talk.html.
[16] Making R code faster: A case study, 2017. https://robinsones.github.io/Making-R-Code-Faster-A-Case-Study/.