Publications

Nathan Pinnow edited this page Aug 14, 2019 · 3 revisions

This page lists some of the research papers associated with the ROSE project over the last several years. For numerous reasons, we feel that the latest papers are the best papers, which is likely typical of any ambitious project, but we have included everything for completeness. We hope that the underlying goal of each paper, supporting the use of high-level abstractions, will be clear, together with our attempts to address the performance issues that must be solved before high-level abstractions can be used within scientific computing.

2019

2018

2017

2016

2015

Enhancing domain specific language implementations through ontology

C. Liao, P. Lin, D. J. Quinlan, Y. Zhao, and X. Shen, “Enhancing domain specific language implementations through ontology,” in Proceedings of the 5th international workshop on domain-specific languages and high-level frameworks for high performance computing, New York, NY, USA, 2015, p. 3:1–3:9.

Experiences of using the openmp accelerator model to port DOE stencil applications

P. Lin, C. Liao, D. J. Quinlan, and S. Guzik, “Experiences of using the openmp accelerator model to port DOE stencil applications,” in Openmp: heterogenous execution and data movements – 11th international workshop on openmp, IWOMP 2015, aachen, germany, october 1-2, 2015, proceedings, 2015, pp. 45-59.

Supporting multiple accelerators in high-level programming models

Y. Yan, P. Lin, C. Liao, B. R. de Supinski, and D. J. Quinlan, “Supporting multiple accelerators in high-level programming models,” in Proceedings of the sixth international workshop on programming models and applications for multicores and manycores, New York, NY, USA, 2015, pp. 170-180.

2014

Verification of polyhedral optimizations with constant loop bounds in finite state space computations

M. Schordan, P. Lin, D. Quinlan, and L. Pouchet, “Verification of polyhedral optimizations with constant loop bounds in finite state space computations,” in Leveraging applications of formal methods, verification and validation. specialized techniques and applications, T. Margaria and B. Steffen, Eds., Springer Berlin Heidelberg, 2014, vol. 8803, pp. 493-508.

2013

Early experiences with the openmp accelerator model

In this paper, we examine the newly released accelerator directives and create an initial reference implementation, referred to as HOMP (Heterogeneous OpenMP). Focused on targeting NVIDIA GPUs, our work is based on an existing OpenMP implementation in the ROSE source-to-source compiler infrastructure. HOMP includes extensions to parse the new constructs and to represent them in the AST, along with other compiler translation details. Further, we provide initial runtime support. For our evaluation, we have adapted a few existing OpenMP codes to use the accelerator model directives and present preliminary performance results. Finally, we critique the accelerator model in terms of its impact on developers and compiler writers and suggest possible improvements.

C. Liao, Y. Yan, B. R. de Supinski, D. J. Quinlan, and B. Chapman, “Early experiences with the openmp accelerator model,” in Openmp in the era of low power devices and accelerators, Springer, 2013, pp. 84-98.

2012

Openmp-checker: detecting concurrency errors of openmp programs using hybrid program analysis

This paper presents a novel technique to detect data races and deadlocks of OpenMP programs, using hybrid program analysis. Specifically, we use an SMT-solver based static analysis to analyze OpenMP source code. Then we use a dynamic analysis to confirm, or rule out, the potential errors. The static analysis narrows down the code regions and events that need to be monitored, significantly reducing the overhead of the dynamic analysis. Our experiments show that OpenMP-Checker is more scalable and accurate at pinpointing concurrency errors within a set of chosen benchmarks, compared to the two commercial tools, Sun Thread Analyzer and Intel Thread Checker.

H. Ma, Q. Chen, L. Wang, C. Liao, and D. Quinlan, “Openmp-checker: detecting concurrency errors of openmp programs using hybrid program analysis,” in Poster paper icpp’12, the 41st international conference on parallel processing, 2012.

ROSE::FTTransform - A source-to-source translation framework for exascale fault-tolerance research

This paper presents a compiler-based transformation released in ROSE and demonstrates the use of Triple Modular Redundancy as an approach to provide HPC software with fault tolerance against transient faults, as we expect them to manifest themselves on future Exascale architectures. The paper presents performance results showing that for a randomly selected subset of benchmarks the overhead of this extra layer of support is about 20%. We expect that this may be competitive with future approaches to fault tolerance using checkpoint/restart, which may be much more expensive or even intractable for Exascale. This work is released as a framework within ROSE to support research in this area by ourselves and collaborators.

J. Lidman, D. J. Quinlan, C. Liao, and S. A. McKee, “Rose:: fttransform-a source-to-source translation framework for exascale fault-tolerance research,” in Dependable systems and networks workshops (dsn-w), 2012 ieee/ifip 42nd international conference on, 2012, pp. 1-6.
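
The Triple Modular Redundancy idea mentioned in the entry above can be illustrated with a minimal sketch: run a computation three times and take a majority vote, so that a single transient fault is outvoted. This is not the FTTransform implementation, which rewrites source code through ROSE; the Python below only shows the voting principle, and the names `tmr_call` and `make_faulty_square` are hypothetical.

```python
from collections import Counter


def tmr_call(compute, *args):
    """Run `compute` three times and return the value at least two runs agree on.

    Hypothetical helper for illustration; results must be hashable.
    """
    results = [compute(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        # All three replicas disagree: the fault cannot be masked.
        raise RuntimeError("no two replicas agree")
    return value


def make_faulty_square():
    """Build a squaring function whose second call suffers a simulated bit flip."""
    state = {"calls": 0}

    def square(x):
        state["calls"] += 1
        result = x * x
        if state["calls"] == 2:
            result ^= 1 << 3  # simulated transient fault: flip one bit
        return result

    return square


if __name__ == "__main__":
    # The corrupted second replica is outvoted by the other two.
    print(tmr_call(make_faulty_square(), 7))  # prints 49
```

The sketch triples the work for every protected call; the roughly 20% overhead reported in the abstract refers to the paper's own transformation and benchmarks, not to this sketch.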

Auto-scoping for openmp tasks

This paper presents an auto-scoping algorithm to work with OpenMP tasks. (Auto-scoping is the process of automatically determining the data-sharing dependencies of variables in OpenMP programs.) For tasks this is a much more complex challenge because of the uncertainty of when a task will be executed, which makes it harder to determine what parts of the program will run concurrently. We also introduce an implementation of the algorithm and results with several benchmarks showing that the algorithm is able to correctly scope a large percentage of the variables appearing in them.

S. Royuela, A. Duran, C. Liao, and D. J. Quinlan, “Auto-scoping for openmp tasks,” in Openmp in a heterogeneous world, Springer, 2012, pp. 29-43.

Studying the impact of application-level optimizations on the power consumption of multi-core architectures

This paper presents an extensive study of the impact of application level optimizations on both the performance and power efficiencies of applications from a wide range of scientific and embedded systems domains. We observe that application-level optimizations often have a much larger impact on performance than on power consumption. However, optimizing for performance does not necessarily lead to better power consumption, and vice versa. Compared to sequential applications, multithreaded applications give more room for performance and power improvements. Additionally, a number of optimizations, including loop and thread affinity optimizations, have shown great potential in supporting collective enhancement of both performance and power efficiency. Our experimental results provide several insights to help exploit these optimizations effectively.

S. M. F. Rahman, J. Guo, A. Bhat, C. Garcia, M. H. Sujon, Q. Yi, C. Liao, and D. Quinlan, “Studying the impact of application-level optimizations on the power consumption of multi-core architectures,” in Proceedings of the 9th conference on computing frontiers, 2012, pp. 123-132.

Bamboo: translating mpi applications to a latency-tolerant, data-driven form

T. Nguyen, P. Cicotti, E. Bylaska, D. Quinlan, and S. B. Baden, “Bamboo: translating mpi applications to a latency-tolerant, data-driven form,” in Proceedings of the international conference on high performance computing, networking, storage and analysis, 2012, p. 39.

2011

Rethinking hardware-software codesign for exascale systems

This paper presents work combining the LBL node simulator, the SNL network simulator, and the ROSE compiler to demonstrate analysis of software and the workflow required for such tools to analyze the power requirements of HPC code, using autotuning to define optimal points in the design space. The paper lays out an approach to co-design at the start of work that is part of the CoDEX project, led by LBL and including both SNL and LLNL.

J. Shalf, D. Quinlan, and C. Janssen, “Rethinking hardware-software codesign for exascale systems,” Computer, vol. 44, iss. 11, pp. 22-30, 2011.

The rose source-to-source compiler infrastructure

D. Quinlan and C. Liao, “The rose source-to-source compiler infrastructure,” in Cetus users and compiler infrastructure workshop, in conjunction with pact 2011, 2011.

Foropencl: transformations exploiting array syntax in fortran for accelerator programming

This paper presents an OpenCL code generator leveraging the semantics of the F90 array constructs. Such GPU work is expected to be an important part of future Exascale programming environments; this work demonstrates how ROSE is used to support the analysis of the input code, and the translation and code generation required to generate OpenCL code for GPUs.

M. J. Sottile, C. E. Rasmussen, W. N. Weseloh, R. W. Robey, D. Quinlan, and J. Overbey, “Foropencl: transformations exploiting array syntax in fortran for accelerator programming,” in 2nd international workshop on gpus and scientific applications (gpusca 2011), 2011, p. 23.

Runtime detection of c-style errors in upc code

This paper presents work to define a dynamic analysis for correctness of UPC usage and leverages the RTED test suite from Iowa State University. This work is released in ROSE and shows how to build a dynamic analysis level of support to catch errors as represented by test codes in the RTED test suite for UPC. The correctness of using programming models is an important aspect of the design of future programming models for Exascale. This paper shows how to design dynamic analysis-based tools to evaluate correctness of the UPC language's programming model.

P. Pirkelbauer, C. Liao, T. Panas, and D. Quinlan, “Runtime detection of c-style errors in upc code,” in Proceedings of fifth conference on partitioned global address space programming models, pgas, 2011.

2010

A rose-based openmp 3.0 research compiler supporting multiple runtime libraries

C. Liao, D. J. Quinlan, T. Panas, and B. R. de Supinski, “A rose-based openmp 3.0 research compiler supporting multiple runtime libraries,” in Beyond loop level parallelism in openmp: accelerators, tasking and more, Springer, 2010, pp. 15-28.

Semantic-aware automatic parallelization of modern applications using high-level abstractions

C. Liao, D. J. Quinlan, J. J. Willcock, and T. Panas, “Semantic-aware automatic parallelization of modern applications using high-level abstractions,” International journal of parallel programming, vol. 38, iss. 5-6, pp. 361-378, 2010.

2009

Towards an abstraction-friendly programming model for high productivity and high performance computing

C. Liao, D. Quinlan, and T. Panas, “Towards an abstraction-friendly programming model for high productivity and high performance computing,” Lawrence Livermore National Laboratory (LLNL), Livermore, CA 2009.

Effective source-to-source outlining to support whole program empirical optimization

This paper describes our work using ROSE to build an effective source-to-source outliner in order to support whole program empirical optimization (also called autotuning). The ROSE outliner addresses the problem of extracting tunable kernels out of large scale applications, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. In particular, the outliner can generate kernels that preserve the performance characteristics of the tuning targets and can be easily handled by other tools. This work also demonstrates how one can use ROSE’s compiler analyses to enhance the quality of source-to-source translation.

C. Liao, D. J. Quinlan, R. Vuduc, and T. Panas, “Effective source-to-source outlining to support whole program empirical optimization,” in Languages and compilers for parallel computing, Springer, 2010, pp. 308-322.
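
As a rough illustration of what outlining means in the entry above, the sketch below shows a hot loop embedded in an application routine and the same loop extracted into a standalone kernel that a tuning tool could time and vary in isolation. The ROSE outliner operates on the source languages ROSE supports, not on Python; this is only a conceptual sketch, and every function name here is hypothetical.

```python
def simulate_inline(values, scale, steps):
    """Application routine with the tunable loop embedded in it."""
    data = list(values)
    for _ in range(steps):
        # Hot loop: the candidate for tuning, entangled with the driver code.
        for i in range(len(data)):
            data[i] = data[i] * scale + 1.0
    return data


def outlined_kernel(data, scale):
    """The same loop, outlined: everything it reads or writes is a parameter."""
    for i in range(len(data)):
        data[i] = data[i] * scale + 1.0


def simulate_outlined(values, scale, steps):
    """Application routine after outlining; behavior is unchanged."""
    data = list(values)
    for _ in range(steps):
        outlined_kernel(data, scale)
    return data


if __name__ == "__main__":
    before = simulate_inline([1.0, 2.0, 3.0], 0.5, 4)
    after = simulate_outlined([1.0, 2.0, 3.0], 0.5, 4)
    print(before == after)  # the transformation preserves results
```

Passing the loop's inputs and outputs explicitly is what makes the kernel usable on its own, which is the property that turns whole-program tuning into a set of kernel tuning tasks.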

Detecting code clones in binary executables

A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su, “Detecting code clones in binary executables,” in Proceedings of the eighteenth international symposium on software testing and analysis, 2009, pp. 117-128.

Techniques for software quality analysis of binaries: applied to windows and linux

T. Panas and D. Quinlan, “Techniques for software quality analysis of binaries: applied to windows and linux,” Defects, vol. 9, pp. 6-10, 2009.

Extending automatic parallelization to optimize high-level abstractions for multicore

This paper describes an approach to extending automatic parallelization to optimize applications written using high-level abstractions. This work exemplifies a typical usage of ROSE and is our initial work on the general subject of how to leverage the semantics associated with high-level abstractions to enable more optimizations.

C. Liao, D. J. Quinlan, J. J. Willcock, and T. Panas, “Extending automatic parallelization to optimize high-level abstractions for multicore,” in Evolving openmp in an age of extreme parallelism, Springer, 2009, pp. 28-41.

2008

Signature visualization of software binaries

T. Panas, “Signature visualization of software binaries,” in Proceedings of the 4th acm symposium on software visualization, 2008, pp. 185-188.

Towards distributed memory parallel program analysis

D. J. Quinlan, G. Barany, and T. Panas, “Towards distributed memory parallel program analysis,” in Scalable program analysis, Dagstuhl, Germany, 2008.

2007

Shared and distributed memory parallel security analysis of large-scale source code and binary applications

D. Quinlan, G. Barany, and T. Panas, “Shared and distributed memory parallel security analysis of large-scale source code and binary applications,” Lawrence Livermore National Laboratory (LLNL), Livermore, CA 2007.

Communicating software architecture using a unified single-view visualization

T. Panas, T. Epperly, D. Quinlan, A. Saebjornsen, and R. Vuduc, “Communicating software architecture using a unified single-view visualization,” in Engineering complex computer systems, 2007. 12th ieee international conference on, 2007, pp. 217-228.

Techniques for specifying bug patterns

D. J. Quinlan, R. W. Vuduc, and G. Misherghi, “Techniques for specifying bug patterns,” in Proceedings of the 2007 acm workshop on parallel and distributed systems: testing and debugging, 2007, pp. 27-35.

Analyzing and visualizing whole program architectures

T. Panas, D. Quinlan, and R. Vuduc, “Analyzing and visualizing whole program architectures,” in Icse workshop on aerospace software engineering (aerose), minneapolis, mn, 2007.

Tool support for inspecting the code quality of hpc applications

T. Panas, D. Quinlan, and R. Vuduc, “Tool support for inspecting the code quality of hpc applications,” in Proceedings of the 3rd international workshop on software engineering for high performance computing applications, 2007, p. 2.

2006

Improving distributed memory applications testing by message perturbation

R. Vuduc, M. Schulz, D. Quinlan, B. De Supinski, and A. Sæbjørnsen, “Improving distributed memory applications testing by message perturbation,” in Proceedings of the 2006 workshop on parallel and distributed systems: testing and debugging, 2006, pp. 27-36.

Support for whole-program analysis and the verification of the one-definition rule in c++

D. Quinlan, R. Vuduc, T. Panas, J. Härdtlein, and A. Sæbjørnsen, “Support for whole-program analysis and the verification of the one-definition rule in c++,” Paul E. Black, Helen Gill, and W. Bradley Martin (co-chairs), vol. 500, p. 27, 2006.

2001-2005

Improving the computational intensity of unstructured mesh applications

This paper is about the optimization of unstructured grid applications and represents preparatory work for future automated transformations specific to unstructured grid applications within DOE using ROSE.

B. S. White, S. A. McKee, B. R. de Supinski, B. Miller, D. Quinlan, and M. Schulz, “Improving the computational intensity of unstructured mesh applications,” in Proceedings of the 19th annual international conference on supercomputing, 2005, pp. 341-350.

Applying loop optimizations to object-oriented abstractions through general classification of array semantics

This paper outlines an approach to the optimization of user-defined abstractions. This work represents a substantial goal for ROSE and is our initial work on the general subject of how to write code at a very high level of abstraction and have the lower-level code required to get good performance be automatically generated. This paper covers the details of optimizing object-oriented abstractions using ROSE. Unfortunately, ROSE is not mentioned anywhere in the paper, a ridiculous oversight, but oh well. The subject is the optimization, not the ROSE compiler infrastructure.

Q. Yi and D. Quinlan, “Applying loop optimizations to object-oriented abstractions through general classification of array semantics,” in Languages and compilers for high performance computing, Springer, 2005, pp. 253-267.

Classification and utilization of abstractions for optimization

This paper is a general introduction to recent work in the ROSE project.

D. Quinlan, M. Schordan, Q. Yi, and A. Saebjornsen, Classification and utilization of abstractions for optimization, Springer, 2006.

A Source-To-Source Architecture for User-Defined Optimizations

This paper covers the architecture of ROSE as a project.

Schordan M., Quinlan D., “A Source-To-Source Architecture for User-Defined Optimizations”, Joint Modular Languages Conference held in conjunction with EuroPar’03, Austria, August 2003

Semantic-Driven Parallelization of Loops Operating on User-Defined Containers

This paper is the informal proceedings version and demonstrates the optimization of generalized container abstractions and is related to Active Library research (or so I understand). It is also related to Telescoping Language research. The paper demonstrates a few of the newest features in ROSE and has served as an introduction for the authors into the optimization of the STL library more generally.

Daniel J. Quinlan, Markus Schordan, Qing Yi, Bronis R. de Supinski: Semantic-Driven Parallelization of Loops Operating on User-Defined Containers. LCPC 2003: 524-538

A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives

This paper demonstrates the use of ROSE to recognize OpenMP pragmas and, using the Nanos OpenMP runtime library, build a subset of an OpenMP-specific compiler for C++.

Daniel J. Quinlan, Markus Schordan, Qing Yi, Bronis R. de Supinski: A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives. WOMPAT 2003: 13-25

Treating a User-Defined Parallel Library as a Domain-Specific Language

This paper is specific to compile-time optimization of array classes. It demonstrates what was at the time the most current work on the compile-time optimization of an array class library. ROSE is more general, but this paper is very specific to the optimization of a single library.

Quinlan, D. J., Miller, B., Philip, B., and Schordan, M. 2002. Treating a User-Defined Parallel Library as a Domain-Specific Language. In Proceedings of the 16th international Parallel and Distributed Processing Symposium (April 15 – 19, 2002). IEEE Computer Society, Washington, DC, 324

Parallel Object-Oriented Framework Optimization

This is one of the first papers on ROSE, presented at CPC2001 and later updated for publication in the journal Concurrency: Practice and Experience.

Quinlan, D. Schordan, M. Philip, B. Kowarschik, M. “Parallel Object-Oriented Framework Optimization”, Special Issue of Concurrency: Practice and Experience (2003), also in Proceedings of Conference on Parallel Compilers (CPC2001), Edinburgh, Scotland, June 2001.

The Specification of Source-To-Source Transformations for the Compile-Time Optimization of Parallel Object-Oriented Scientific Applications

This paper specified some elements of what later became the string-based AST rewrite mechanism used in ROSE.

Quinlan, D., Schordan, M. Philip, B. Kowarschik, M. “The Specification of Source-To-Source Transformations for the Compile-Time Optimization of Parallel Object-Oriented Scientific Applications”, Submitted to Parallel Processing Letters, also in Proceedings of 14th Workshop on Languages and Compilers for Parallel Computing (LCPC2001), Cumberland Falls, KY, August 1-3 2001.

ROSETTA: The Compile-Time Recognition of Object-Oriented Library Abstractions and Their Use Within User Applications

This paper describes the development of a tool, ROSETTA, which builds object-oriented Intermediate Representations (IRs) for compilers. It is a tool used within ROSE to build the SAGE III IR, which we use internally with the EDG front-end. It is specific to details of the internal ROSE compiler infrastructure.

D. Quinlan and B. Philip, “ROSETTA: The Compile-Time Recognition of Object-Oriented Library Abstractions and Their Use Within User Applications”, in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001), 2001

2000 and Earlier

ROSE: Compiler Support for Object-Oriented Frameworks

This paper was an introduction to the work being done at the time on ROSE, complete with a more detailed motivation for compile-time optimization of specific libraries.

Quinlan, D., “ROSE: Compiler Support for Object-Oriented Frameworks” Proceedings of Conference on Parallel Compilers (CPC2000), Aussois, France, January 2000. Also published in special issue of Parallel Processing Letters, Vol. 10.

ROSE II: An Optimizing Code Transformer for C++ Object-Oriented Array Class Libraries

This paper presents preliminary work on the compile-time optimization of array class libraries.

Kei Davis and Dan Quinlan, ROSE II: An Optimizing Code Transformer for C++ Object-Oriented Array Class Libraries, World Multiconference on Systemics, Cybernetics and Informatics and 5th International Conference on Information Systems Analysis and Synthesis Vol.5: Computer Science and Engineering, Jul 31-Aug 4, 1999, Orlando, Florida

C++ Expression Templates Performance Issues in Scientific Computing

This paper discusses the different approaches to the optimization of array class libraries. Optimization of array class libraries led to the development of ROSE as a project, though ROSE is not at all specific to array class libraries and addresses the optimization of libraries generally. This paper can be helpful in understanding what work was done using language template features within C++ before attempting to address the optimization issues more generally at compile time. Prior work started on ROSE had been abandoned because of the perceived significant advantages of template meta-programming techniques for scientific computing. Several papers on the details of template use were written; this is the most complete of them. It is included with these papers to provide a bit of perspective (currently historical).

F. Bassetti, K. Davis, D. Quinlan, “C++ Expression Templates Performance Issues in Scientific Computing,” ipps, pp.0635, 12th. International Parallel Processing Symposium, 1998

Related Papers

A++/P++ array classes for architecture independent finite difference computations

R. Parsons and D. Quinlan, “A++/P++ array classes for architecture independent finite difference computations,” in Proceedings of the second annual object-oriented numerics conference (oonski’94), 1994.

P++, a c++ virtual shared grids based programming environment for architecture-independent development of structured grid applications

M. Lemke and D. Quinlan, “P++, a c++ virtual shared grids based programming environment for architecture-independent development of structured grid applications,” in Proceedings of the conpar/vapp v, 1992.

Overture: a framework for the complex geometries

D. Brown, W. Henshaw, and D. Quinlan, “Overture: a framework for the complex geometries,” in Proceedings of the iscope’99 conference, 1999.