Idea List for Google Summer of Code 2015

John Vilk edited this page Jan 14, 2016 · 55 revisions

GOOGLE SUMMER OF CODE 2015 IS OVER! The information below applies to Google Summer of Code 2015 only.

About PLASMA-UMass

The PLASMA Lab at the University of Massachusetts is a research lab that explores ways to make programming languages faster, safer, and easier to use. Our software also allows programmers to easily build fast and safe applications in emerging domains like crowdsourcing, web portability, and computational social science. All of our projects aim to positively impact people outside of the programming languages community, often with a distinct aim for social good.

Our work appears in top CS publications such as PLDI, OOPSLA, ASPLOS, ICSE, and ASE and has won numerous awards, most recently, best paper at OOPSLA 2014 for our SurveyMan project and distinguished artifact at PLDI 2014 for our Doppio project. Our paper on the AutoMan project was also recently selected to appear in the Research Highlights section of Communications of the ACM.

We are looking to recruit talented GSoC students to help us push the state-of-the-art in programming language design and implementation. We plan to merge all code contributed by GSoC students into our open source projects for dissemination to the general public. Whenever relevant, we gratefully acknowledge [1,2] the work of our GSoC students in peer-reviewed scientific reports.


Ideas

We've prepared the following list of projects that are of most interest to us. Students may also propose their own variations on these projects.

Please get in touch if you are interested in working on any of these projects. We'll be happy to give feedback on your application, and to get you started on small bugs. Also, take some time to read Google's advice for students.

Please use our Application Template for your project proposal.


Doppio / DoppioJVM

Doppio emulates threads, a file system, network sockets, and an unmanaged heap on top of JavaScript. With these resources, conventional multithreaded programming languages and their programs, such as Java, can be brought to the web. You can read the research paper on the ACM's website, or at this direct link.

DoppioJVM is a full-featured Java Virtual Machine implementation written in JavaScript. DoppioJVM is built upon the resources provided by Doppio, and adheres closely to the Java Virtual Machine specification in order to remain compatible with unmodified programs and languages that run on the JVM.

Because Doppio and DoppioJVM are written and maintained by a very small team, your Google Summer of Code project has the possibility of making a large impact on our users. You will also have the ability to poke at portions of the code base not directly relevant to your project, if you wish. Previous GSoC students have improved our build process and fixed bugs that weren't directly related to their projects.

Note

Currently, Doppio and DoppioJVM share the same GitHub repository, and components of the two are a bit intertwined. We are planning to decouple the two by breaking out components of Doppio into separate libraries.

For example, Doppio's file system is already broken out into the library BrowserFS. There's an idea below that does the same with Doppio's network sockets.

Ideal Skills

These skills apply to all of the ideas below.

  • Experience with JavaScript (we use TypeScript, which compiles directly to JavaScript. You'll quickly learn to love it. :) )
  • Experience with Java
  • The ability to recover from terrible unexpected surprises lurking in the deep dark of most complex systems (we'll be there for emotional support).

Ideas

We have a large number of ideas, but only a few developers on the project to implement them. Below, we list the most significant ideas that are currently on our wishlist. Outside of this list, there are a million different directions in which you can take your project; Doppio opens the door to many possibilities!

Improve a Python Interpreter with Doppio

DoppioJVM is a Java Virtual Machine implementation that uses Doppio. Now, we are working on a Python implementation that uses Doppio!

We have the beginnings of a Python interpreter called Ninia written in TypeScript, but it does not use all of Doppio's resources and is limited to trivial programs. This project would involve changing the interpreter to use Doppio's threads.

If the student finishes migrating the interpreter to use Doppio's threads, they will then work on general correctness improvements to increase the interpreter's compatibility with Python programs.

Throughout this process, the student will work directly with the main architect of Doppio, who will make any changes needed to support Python to core Doppio functionality.

Technical Challenges

You will need to become very familiar with the basic design of a language interpreter, and will need to investigate how Python bytecode works. Note that the bytecode is different from Python itself! Internally, Python compiles Python code into bytecode, and then it runs the bytecode.
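Since bytecode dispatch is the heart of this project, a toy example may help: below is a minimal stack-machine evaluator in TypeScript (the language Ninia is written in). The instruction encoding is invented for this sketch and Ninia's actual design differs, but the opcode names mirror CPython's, and the fetch/decode/execute loop is what any bytecode interpreter is built around.

```typescript
// A toy stack-machine interpreter in the style of CPython's bytecode
// evaluator. The instruction encoding here is invented for
// illustration; Ninia's real design differs.
type Instr =
  | { op: "LOAD_CONST"; arg: number }   // push a constant
  | { op: "BINARY_ADD" }                // pop two values, push their sum
  | { op: "BINARY_MULTIPLY" }           // pop two values, push their product
  | { op: "RETURN_VALUE" };             // pop and return the top of stack

function run(code: Instr[]): number {
  const stack: number[] = [];
  for (const instr of code) {           // fetch/decode/execute loop
    switch (instr.op) {
      case "LOAD_CONST":
        stack.push(instr.arg);
        break;
      case "BINARY_ADD": {
        const b = stack.pop()!, a = stack.pop()!;
        stack.push(a + b);
        break;
      }
      case "BINARY_MULTIPLY": {
        const b = stack.pop()!, a = stack.pop()!;
        stack.push(a * b);
        break;
      }
      case "RETURN_VALUE":
        return stack.pop()!;
    }
  }
  throw new Error("fell off the end of the code object");
}

// Roughly the shape of bytecode for the expression (a + b) * c,
// with the operands baked in as constants:
const program: Instr[] = [
  { op: "LOAD_CONST", arg: 2 },
  { op: "LOAD_CONST", arg: 3 },
  { op: "BINARY_ADD" },
  { op: "LOAD_CONST", arg: 4 },
  { op: "BINARY_MULTIPLY" },
  { op: "RETURN_VALUE" },
];
console.log(run(program)); // 20
```

You can print the real thing with CPython's dis module, e.g. `import dis; dis.dis(lambda a, b, c: (a + b) * c)`.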

Ideal Skills

  • Familiarity with Python
  • Familiarity with basic bytecode-based language interpreter design

Starter Task

If you are interested in this project, please complete the Ninia Starter Task, which will get you acquainted with the Python interpreter.

Generalize Doppio network socket support

During the summer of 2013, we had an excellent Google Summer of Code student implement Java's TCP socket API for DoppioJVM using WebSockets. As a result, DoppioJVM can communicate over the network to other programs.

However, this work is tightly coupled with DoppioJVM itself. As a result, it cannot easily be reused by other language implementations that use Doppio, or from plain JavaScript.

This work would take the existing socket support, and:

  • Move it into a separate library.
  • Implement a relevant portion of the Node Socket API using WebSockets.
  • Change DoppioJVM's networking implementation to use this library.

Once these changes are done, we can experiment with running various Node packages in the browser using the new socket support and other Doppio functionality! :)

If the student accomplishes the above quickly, we can also examine the possibility of implementing TCP over the new WebRTC standard to allow web browsers to talk directly to one another.
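A minimal sketch of the second step above, assuming a WebSocket-like transport is injected (the Transport interface and event set here are simplifications for illustration, not Doppio's or Node's actual APIs):

```typescript
// A Node-style Socket backed by a WebSocket-like transport. The
// Transport interface below is an assumption for this sketch; a real
// library would cover far more of Node's net.Socket surface.
interface Transport {
  send(data: string): void;
  onmessage: ((data: string) => void) | null;
  close(): void;
}

type Listener = (data: string) => void;

class Socket {
  private listeners: { [event: string]: Listener[] } = {};

  constructor(private transport: Transport) {
    // Translate transport messages into Node-style 'data' events.
    transport.onmessage = (data) => this.emit("data", data);
  }

  on(event: string, cb: Listener): this {
    (this.listeners[event] = this.listeners[event] || []).push(cb);
    return this;
  }

  write(data: string): boolean {
    this.transport.send(data);
    return true;
  }

  end(): void {
    this.transport.close();
    this.emit("close", "");
  }

  private emit(event: string, data: string): void {
    for (const cb of this.listeners[event] || []) cb(data);
  }
}
```

A real library would also handle connect/error/close events, buffering, and binary data, and DoppioJVM's native socket methods would then be rewritten in terms of it.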

Technical Challenges

You will need to become very familiar with Node's socket API, including any odd corner cases or design decisions. Ideally, your implementation will be able to reuse any existing unit tests present in the NodeJS source code.

You will also need to become familiar with how DoppioJVM's native methods work in order to reimplement the needed methods in terms of the new library.

Ideal Skills

  • Familiarity with TCP sockets.
  • Familiarity with WebSockets, and how they differ from vanilla TCP connections.
  • Experience using Node, especially the Socket API.

Starter Task

If you are interested in this idea, please check out the Doppio Starter Task, which will get you acquainted with the Doppio codebase.


SurveyMan

SurveyMan is a programming language and runtime system for designing, debugging, and deploying surveys on the web at scale. We provide support for deploying web surveys on crowdsourcing platforms (such as Mechanical Turk) and locally hosted servers.

Our target users are social scientists and market researchers. These populations have varying expected technical skill and preferred tools. SurveyMan reduces the burden of writing and deploying surveys for these users.

Ideas

R Interface

R is the lingua franca of programming languages for statistical analyses in the social sciences. Right now, to use SurveyMan results in an R analysis, a user must run SurveyMan, feed the results into R as a CSV, and perform the analysis separately. This process is particularly cumbersome when a user wants to monitor a survey's progress as it runs. By allowing users to encode SurveyMan surveys in the R programming language, we can improve the integration, making SurveyMan more appealing to the large base of R programmers in the social sciences.

Technical Challenges

  • The SurveyMan runtime system runs on the Java Virtual Machine, and external code needs to be called using the Java Native Interface. Likewise, R has its own runtime and native interface. These two runtimes will need to be glued together using C, and then hooked into SurveyMan using Java.
  • We will need to design an API for R users. We would like this interface to be more than just functional. We want it to be easy to use. We have a prototype interface in development for Python, which could be used as a model, but we are open to suggestions.

Ideal Skills

  • C/C++, Java, and R programming experience
  • Experience writing R libraries is a plus, but not required
  • An interest in designing easy-to-use programming abstractions

Starter Task

If you are interested in this idea, please check out the SurveyMan Starter Task, which will get you acquainted with the SurveyMan language and runtime system.

Blocks Language Front End

SurveyMan's type system allows for a surprising level of complexity when designing surveys. This complexity can be overwhelming to users when they construct surveys as CSVs. The control flow of the survey may be especially confusing to non-programmers.

A visual front end to SurveyMan would greatly improve adoption among non-programmers. We envision a drag-and-drop web interface where users generate instances of the types in our language, and combine them to form surveys. The system will use visual cues, like shape and color, to indicate types and legal operations. Once completed, the surveys will be exported to JSON. See this blog post on a graphical representation of a SurveyMan survey.

Technical Challenges

  • surveyman.js contains a JavaScript representation of the survey. It expects to be fed a JSON representation, generated by the SurveyMan compiler. surveyman.js will need to be augmented to perform some of the static checks that SurveyMan performs. The blocks language will differ in that it checks for correctness incrementally.
  • Small surveys can be displayed in their entirety. However, we will need to devise creative ways of displaying relevant information for larger surveys, without losing a sense of the overall structure.
  • A feature that is not present in SurveyMan, but would be critically important for a visual front end, is the ability to reuse pieces of data, like answer options. Consider a survey that contains all Yes/No questions. Dynamically adding block, question, or answer option templates as "types" would be very useful.
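To illustrate the incremental-checking idea from the first point above, here is a sketch over an invented miniature survey representation (the real surveyman.js schema differs): each edit is validated the moment the user makes it, rather than when the whole survey is compiled.

```typescript
// Miniature survey representation, invented for this sketch.
interface Question { id: string; branchTo?: string; }
interface Survey { questions: Question[]; }

// Returns the problems introduced by attaching branch target
// `branchTo` to question `qid` in the current survey state.
function checkBranchEdit(survey: Survey, qid: string, branchTo: string): string[] {
  const errors: string[] = [];
  const ids = survey.questions.map((q) => q.id);
  if (ids.indexOf(qid) === -1) {
    errors.push(`unknown question: ${qid}`);
  }
  if (ids.indexOf(branchTo) === -1) {
    errors.push(`branch target does not exist: ${branchTo}`);
  }
  if (qid === branchTo) {
    errors.push(`question ${qid} cannot branch to itself`);
  }
  return errors;
}

const survey: Survey = { questions: [{ id: "q1" }, { id: "q2" }] };
console.log(checkBranchEdit(survey, "q1", "q2")); // []  (edit is legal)
console.log(checkBranchEdit(survey, "q1", "q9")); // one error: missing target
```

A full front end would run checks like this on every drag-and-drop action and surface the errors next to the offending block.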

Ideal Skills

  • Some familiarity with functional languages and/or language design.
  • Experience with JavaScript and CSS. Interest in working with technologies like ReactJS.
  • Basic understanding of how to build a dynamic web page.

Starter Task

If you are interested in this idea, please check out the SurveyMan Starter Task, which will get you acquainted with the SurveyMan language and runtime system.


AutoMan

AutoMan is an embedded domain-specific language for Java/Scala that allows programmers to transparently combine human expertise with traditional computer programs. This hybrid computing platform provides a way to solve many hard AI problems, such as vision, motion planning, and natural language understanding using humans, right now. AutoMan integrates human-based computation into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. AutoMan abstracts away many of the issues that make working with human labor challenging, such as scheduling, payment, and quality control, letting the programmer focus on solving the problem at hand.

AutoMan is currently implemented as a Scala library. Scala is a relatively new OO/functional language which runs on top of the Java Virtual Machine (JVM).

Ideas

Hybrid human-computer programs are a new and exciting area of research, and we have lots of ideas for making the system easier to use, more accessible to non-Scala programmers, and more adaptable to the kind of workload desired by system builders.

AutoMan Monitoring and Debugging Plugin for IntelliJ IDEA

AutoMan allows programmers to write sophisticated human-computer programs that look very much like ordinary Scala programs. Nonetheless, humans behave very differently than computers, and sometimes defy a programmer's expectations. When a human-computer program is in the early stages of its development, it is often helpful to be able to monitor and debug that program as it executes so that further refinements can be made. While AutoMan produces copious status messages on STDERR, these messages are hard to understand, especially for programs that manage hundreds or thousands of workers.

We would like to develop a prototype plugin for IntelliJ IDEA that allows programmers to watch an AutoMan program as it runs, step into code for certain events, and even possibly intervene on running tasks (e.g., by manually supplying answers). This project combines traditionally very different aspects of computer science (programming languages, human factors, and software engineering) in a novel way, since "crowdprogramming" is a completely new field. We think this will be an exciting project to work on.

For reference, we have an initial prototype, built as a web interface instead of an IDE plugin, here.

Technical Challenges

  • We already have a basic information-reporting facility in AutoMan that allows information to be exported to external tools (e.g., a debugging/visualization plugin for IntelliJ). This facility will probably need refinement as we discover useful things to monitor/visualize, which will require some hacking in Scala.
  • IntelliJ plugins need to be thread-safe under IntelliJ's concurrency model, and AutoMan needs to be thread-safe under its own. Unfortunately, these models are totally different (threads vs. futures/actors).
  • We are very interested in adding predictive capabilities to the IntelliJ plugin. This will require either familiarity with, or an interest in learning, some statistics/machine learning.

Ideal Skills

  • Experience with the Scala programming language.
  • Experience with the Java programming language (for plugin development).
  • Familiarity with concurrent programming.
  • Experience developing IDE plugins is a major plus.
  • Experience using a debugger, like GDB or the one present in IntelliJ IDEA. Experience using a visual debugging tool like GNU DDD is a major plus.
  • An interest in creative information display.

Startup Tasks

If you can complete all of the following tasks, then you're probably capable of working on the AutoMan debugger plugin:

  • Clone the dbarowy/GSoCStarterTask repository, create a branch, commit a change to the README.md, and then send us a pull request.
  • Create a very simple web server in Scala that returns "Hello World", using something like the spray-http DSL.
  • Follow the IntelliJ IDEA Plugin Development Getting Started Guide.
  • Modify your plugin so that when you click on your custom IntelliJ button, the plugin communicates with your custom webserver and displays the result ("Hello World") in a text box.

Support for a Web API

AutoMan currently requires programmers to write their applications as persistent server daemons. While AutoMan makes this relatively easy, interested programmers may not have access to an always-on machine that can operate in this fashion. Instead, we would like AutoMan to be available to clients as a web service. Users would submit jobs to the AutoMan web service, which handles them asynchronously from the end-user's program; the program is then free to periodically collect completed jobs from the service.
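The intended submit-then-poll interaction might look like the following sketch, where AutoManService is an in-memory stand-in (all names here are hypothetical, not a real API): submission returns a job id immediately, and the client collects the result later.

```typescript
// An in-memory stand-in for the proposed AutoMan web service. The
// class and method names are invented for illustration only.
type JobStatus = { done: boolean; answer?: string };

class AutoManService {
  private jobs: { [id: string]: JobStatus } = {};
  private nextId = 0;

  // Returns a job id immediately; the work proceeds asynchronously.
  submit(question: string): string {
    const id = `job-${this.nextId++}`;
    // In the real service, crowd workers would answer over minutes or
    // hours; this fake marks the job complete right away.
    this.jobs[id] = { done: true, answer: `answer to: ${question}` };
    return id;
  }

  // Clients call this periodically to collect completed jobs.
  poll(id: string): JobStatus {
    return this.jobs[id] || { done: false };
  }
}

const svc = new AutoManService();
const id = svc.submit("Is this image a dog?");
// ...the client program can exit, restart, and come back later...
const status = svc.poll(id);
console.log(status.done, status.answer);
```

The same id-based pattern is what a RESTful design would expose as, say, a POST to create a job and a GET to poll it.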

Technical Challenges

  • The task scheduler and crowdsourcing backend infrastructure will need to be split from the client interface.
  • A RESTful AutoMan web API will need to be designed.
  • Users may be wary about sharing their AWS credentials with the party running the AutoMan web service. We will need to design a system that either abstracts these details away or securely handles authorizing the web service on the client's behalf.

Ideal Skills

  • Experience with Scala.
  • Experience with REST APIs.
  • Experience with Mechanical Turk.

Startup Tasks

If you can complete all of the following tasks, then you're probably capable of working on the AutoMan Web API:

  • Clone the dbarowy/GSoCStarterTask repository, create a branch, commit a change to the README.md, and then send us a pull request.
  • Write a simple Scala program that contacts one of the REST services listed here and prints the result to the console.
  • Create a very simple web server in Scala that returns "Hello World", using something like the spray-http DSL.
  • Modify your webserver so that it calls your Scala REST program and echoes the result instead of "Hello World".
  • Run one of the sample AutoMan programs against the MTurk sandbox environment, e.g., "simple_program".

Support for Additional Crowd-Labor Backends

AutoMan currently uses Mechanical Turk to recruit workers who perform tasks. We would like to expose some additional labor pools to AutoMan to allow the system to choose labor that may be more appropriate for the programmer's task. These services vary by labor latency, worker expertise, cost, and worker persistence. Additionally, some of these services attempt to do some of their own quality control, which may complicate interaction with AutoMan.

Technical Challenges

  • Labor recruitment APIs vary widely from service to service. While AutoMan was designed to make it easy to abstract away differences in these services, we may find that our view of the world is too focused on the way Mechanical Turk works.
  • Our scheduler currently assumes that the programmer only wants to use one service at a time. We would like AutoMan to be able to dynamically choose the backend without requiring the user to specify in advance.
  • Allowing multiple backends complicates the payment model, since users will need credentials on all of those systems, and the scheduler will need to be aware of possibly different budget constraints for each backend. Making it easy for a programmer to concisely specify these details will be important from a usability standpoint.

Startup Tasks

If you can complete all of the following tasks, then you're probably capable of working on additional backends for AutoMan. Note that this project will be more challenging than the other AutoMan-related projects.

  • Do the AutoMan Web API Startup Tasks (see above).
  • Create a simple Polygon Scala trait with an abstract area method and implement concrete classes for Rectangle, Triangle, and Rhombus. Implement two versions of a Square class: one that inherits from Rectangle and one that inherits from Rhombus. You will have to decide for yourself how your class constructors should work.
  • Modify your Scala REST program so that a requester can pass values to the constructors of the classes above and get the area in return.

Ideal Skills

  • Experience with Scala.
  • Experience with REST APIs.
  • Experience with Mechanical Turk. Experience with other services a BIG plus.


Coz

Coz is an implementation of a new profiling technique called causal profiling. Coz helps developers identify code that is important for performance, and predicts the potential impact of optimizing that code. Unlike conventional profilers, coz can determine when optimizing code will not improve performance (or could even hurt it), leaving developers to focus on code where performance tuning will have a positive effect. Coz carries out a series of performance experiments, which test the effect of speeding up a particular source line. We are interested in extending coz to support more languages (like JavaScript and Scala) and more platforms; adding support for multi-process applications (like Chrome/Chromium); and improving the usability of causal profiles.

Note: These projects will not require a detailed understanding of how causal profiling works, but you should be willing to learn to use coz and understand its output.

To learn more about what coz does and how it works, see our technical report.

Ideas

Identifying performance-critical code is a challenge across many software domains, but coz is currently limited to Linux applications with debug information available at program startup. We would like to extend coz's reach to support a wider variety of languages and platforms, and to improve the overall usability of the tool.

Support for JIT-compiled code

Coz uses debug information to identify source lines in the application it is profiling. Currently, coz locates and processes this debug information at program startup. As a result, coz does not produce useful profiles for just-in-time compiled code, even though many JIT runtimes generate debug information that can be used to map the JIT-compiled code back to source locations. We would like to extend coz to support JIT environments by locating and processing the source information generated by JIT compilers at runtime.

Technical Challenges

  • coz already uses DWARF debug information to map executable code to source locations, but JIT runtimes use a variety of formats. Selecting which format(s) to support and implementing that support will require learning at least one new API for runtime source information.
  • Because coz currently collects source information only at startup, the data structure used to map executable code to source locations is read-only. Support for JIT-compiled code will require refinement or replacement of this data structure to support concurrent accesses, some of which may be insertions of source information for JITted code.

Ideal Skills

  • Experience with a JIT-based language such as JavaScript
  • Experience using a debugger
  • C/C++ skills
  • Familiarity with binary formats such as DWARF or ELF (less important)

Starter Task

Please complete coz starter task #1 to get some familiarity with coz and causal profiles.

Support for multi-process applications

Coz operates within a single process, profiling each of the application's threads. Coz's profiler thread runs one performance experiment at a time, during which all threads emulate the effect of optimizing a specific line of code. For coz to support multi-process applications, the profiler will need to run in all of the application's processes and coordinate performance experiments across them.

Technical Challenges

  • We will need to develop a procedure for multiple processes to coordinate their experiments, likely through shared memory.
  • Cross-process communication through pipes, signals, or shared memory will need to be wrapped and handled in much the same way that coz currently handles pthread synchronization operations.
  • Processes will need to direct profile output through a single process, with the ability to migrate this responsibility to a new process if the original process exits during application execution.

Ideal Skills

  • Experience writing POSIX multi-process applications
  • Experience with C++

Starter Task

Please complete coz starter task #1 to get some familiarity with coz and causal profiles.

Cross-platform causal profiling

Coz relies on the Linux perf_event API to collect and respond to samples. Unlike conventional sampling-based profilers, a causal profiler must collect and respond to samples during the profiling run. Extending coz's sampling API (a thin wrapper on the perf_event API) to OSX or Windows sampling APIs would enable coz to support a much larger number of applications. A side benefit of this work will be a cross-platform API for program sampling, which could be useful for other projects.

Technical Challenges

  • Finding and using sampling-based profiler APIs will require learning new OS APIs, which are often complex and/or poorly-documented.
  • Coz's sampling API is tailored to Linux's perf_event API. Adding support for Windows or OSX APIs will likely require changes to the sampling API and coz itself.

Ideal Skills

  • Experience with sampling profilers or sampling APIs such as perf_event, PAPI, or others
  • Experience with C++

Starter Task

Please complete coz starter task #1 to get some familiarity with coz and causal profiles.

Live profile view

Coz produces a log of performance experiments during a profiling run. Processing this log and producing a causal profile can be done at any point during the run, but it can often take a long time to collect a detailed profile for large applications. We would like to add a live profile view to coz, so developers can see the profile as it is being collected, and potentially guide the profiler to focus on specific lines or source files that show up in the early profile results.
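To sketch the aggregation step such a view would re-run as the log grows, assume each experiment record pairs a source line, the virtual speedup applied to it, and the throughput observed during that experiment (this record shape is invented for illustration; coz's actual log format differs):

```typescript
// Invented experiment record shape; coz's real log format differs.
interface Experiment { line: string; speedup: number; throughput: number; }

// For each source line, estimate program speedup at each virtual line
// speedup, relative to that line's 0%-speedup baseline throughput.
function profilePoints(log: Experiment[]) {
  const byLine: { [line: string]: Experiment[] } = {};
  for (const e of log) (byLine[e.line] = byLine[e.line] || []).push(e);

  const result: { [line: string]: { speedup: number; programSpeedup: number }[] } = {};
  for (const line in byLine) {
    // Average throughput of this line's 0%-speedup experiments.
    const baseline = byLine[line]
      .filter((e) => e.speedup === 0)
      .reduce((sum, e, _, arr) => sum + e.throughput / arr.length, 0);
    result[line] = byLine[line]
      .filter((e) => e.speedup > 0)
      .map((e) => ({
        speedup: e.speedup,
        programSpeedup: (e.throughput - baseline) / baseline,
      }));
  }
  return result;
}

// In this fake log, a 20% virtual speedup of a.c:10 raised throughput
// from 100 to 110, i.e. roughly a 10% program speedup.
const log: Experiment[] = [
  { line: "a.c:10", speedup: 0, throughput: 100 },
  { line: "a.c:10", speedup: 0.2, throughput: 110 },
];
console.log(profilePoints(log)["a.c:10"]);
```

A live view would stream new records into this aggregation and redraw one line-speedup curve per source line, which is where the visualization and layout challenges below come in.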

Technical Challenges

  • A live, interactive profile view will require familiarity with JavaScript-based data visualization.
  • Enabling users to guide the profiler toward specific source locations will require some changes to coz itself, which is implemented in C++.
  • This work will require designing an interface to display a large number of causal profile graphs in an informative and uncluttered layout. This will likely require some understanding of what a causal profile presents, and how developers will use this information.

Ideal Skills

  • Experience with JavaScript
  • Experience with web-based data visualization (d3, svg, etc.)
  • Experience with web-based user interface design

Starter Task

Please complete coz starter task #1 to get some familiarity with coz and causal profiles.