PolicyBrain 2.0 Outline #906

Open · hdoupe (Collaborator) commented Jun 17, 2018 · 0 comments

PolicyBrain 2.0

Overview

One of our goals for PolicyBrain is for it to become a platform for computational modeling projects. To get started, we need to define a set of standards that will allow PolicyBrain to provide a dependable interface for the upstream projects. Clarifying how PolicyBrain interacts with upstream models will define the line between PolicyBrain’s responsibilities and those of the upstream projects. PolicyBrain should take a standard approach to all models and work to remove itself from implementing model-specific details. This gives upstream projects more responsibility and control over the modeling process, and it removes PolicyBrain developers, who often lack domain expertise, from that process as much as possible. Applying a standard approach to all models reduces PolicyBrain maintenance costs, gives PolicyBrain the ability to stand up new projects quickly and at negligible cost, and allows modeling projects to easily add new and varied capabilities for their users.

Steps

The PolicyBrain 2.0 platformization project requires that we complete several steps: redesign the PolicyBrain GUI to accommodate standardized project APIs; establish PolicyBrain-Standards for modeling project APIs; and redesign the PolicyBrain infrastructure to accommodate alternative environments for each project and scalability.

Step 1a: Redesign the PolicyBrain GUI to accommodate standardized project APIs

Model inputs

Our end goal for model inputs is to hybridize our current "file upload" and "GUI" capabilities into a single, sectioned, model-controller page. In each section, parameter values can either be edited manually via the GUI or uploaded via a JSON file upload. If edited manually, the resulting JSON file will be downloadable, and everything should be replicable locally. See low-fi mockup here!

By setting simple, flexible standards for the JSON files that define each section, we hope to be able to build each model's model-controller page in a mechanized fashion.
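
As a rough illustration, a section-definition file might look something like the following (shown as a Python dict). This is a hypothetical sketch, not an adopted standard; the schema and field names are assumptions, although `_BE_sub` is an actual Tax-Calculator behavioral parameter.

```python
# Hypothetical sketch of a section-definition JSON file, shown as a
# Python dict. The schema and field names are illustrative only; the
# idea is that each section carries enough metadata for the
# model-controller page to be generated mechanically.
behavior_section = {
    "section": "behavior",
    "title": "Behavioral Assumptions",
    "parameters": {
        "_BE_sub": {
            "title": "Substitution elasticity of taxable income",
            "type": "float",
            "default": 0.0,
            "validators": {"range": {"min": 0.0, "max": 10.0}},
        }
    },
}
```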

This standardization means that we will no longer have widgets (like the TaxBrain Data Source or Start Year widgets) that appear as one-off, stand-alone items, and that we will no longer have multi-step apps. Every type of input -- including behavioral parameters, start year, data source, and tax law parameters -- will appear as a standardized section of a single model-controller page.

We will move towards this goal incrementally.

To begin with, more emphasis will be placed on PolicyBrain’s file-upload capabilities. Intermediate steps include providing a link to the file upload page from the GUI page, providing a capability to download the JSON file that is created at the GUI page, and displaying the created JSON file on the outputs page.

The first step to moving from multi-step apps to a single, sectioned, model-controller page is to move the behavioral simulation and elastic parameters onto the TaxBrain static run parameter input page. Upon completing this step, it will be easier to generalize the approach and the necessary API for creating an input page with multiple parameter sections.

A consequence of removing multi-step apps is that there will be only one app for each package endpoint. For example, the TaxBrain static and behavioral apps submit to the same Tax-Calculator endpoint, so they would be combined into a single application. Its GUI builder page would have one section for the static parameters and another for the assumption parameters. A static run would be performed by specifying only the static parameters; a behavioral run would be performed by specifying the behavioral parameters in addition to the static parameters (see the sketch below). The main difference is that the user no longer has to go through multiple stages of the app to reach the model they want to use. This also draws clear lines of authority over which modeling project owns and operates each app: the OG-USA app is OG-USA's app, even though it depends on Tax-Calculator.
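
For concreteness, here is a hedged sketch of what the two kinds of submissions might look like. The parameter names follow Tax-Calculator's 2018-era conventions, but the payload layout itself is hypothetical.

```python
# Illustrative only: payloads for the combined static/behavioral app.
# _II_rt7 (top income tax rate) and _BE_sub (substitution elasticity)
# are Tax-Calculator parameter names; the surrounding layout is a
# hypothetical sketch, not the actual submission format.

# Static run: only the static (policy) parameters are specified.
static_run = {
    "policy": {"_II_rt7": {"2018": [0.35]}},
}

# Behavioral run: the same policy parameters plus behavioral ones.
behavioral_run = {
    "policy": {"_II_rt7": {"2018": [0.35]}},
    "behavior": {"_BE_sub": {"2018": [0.25]}},
}
```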

Model outputs

Model-specific table-building functionality will be removed from PolicyBrain. Instead, the upstream project will deliver content that is already renderable, such as an HTML table or a picture, as well as content that is only meant to be downloaded by the user, such as a CSV file. The user will have an option to download all content as a zip file, and the webapp will store all of this data in these deliverable formats. This should resolve many of the backwards-compatibility problems that have tripped us up in the past and give more power and responsibility to the upstream project for delivering results to the user. Example outputs are CSV files, JSON files, and tables or graphs, the latter being either static pictures or interactive widgets (see the CCC bubble plot) delivered as some type of HTML/JavaScript file(s). We would need to establish further rules about which formats are allowed and their maximum size. Projects can provide anything that generally fits on a grid, and we encourage them to make their tables and charts render well on mobile devices. See low-fi mockup here!
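
To make the division of labor concrete, a project's deliverables might be described by a manifest along these lines. This is a minimal sketch under assumed field names; nothing here is a settled format.

```python
# Hypothetical outputs manifest delivered by an upstream project.
# Field names are assumptions for illustration. Renderable content is
# embedded directly; downloadable-only content is listed so the webapp
# can store it and bundle everything into a zip file on request.
outputs = {
    "renderable": [
        {"media_type": "text/html",
         "title": "Difference Table",
         "body": "<table>...</table>"},
        {"media_type": "image/png",
         "title": "Revenue Chart",
         "body": "<base64-encoded PNG>"},
    ],
    "downloadable": [
        {"media_type": "text/csv",
         "filename": "results.csv",
         "body": "year,revenue\n2018,123.4\n"},
    ],
}
```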

How we could use help from modeling-project and other contributors during this step:

  • Front-end Feedback

    • Is there anything from the PB 1.0 layout that you will miss terribly because it will not be available in the new framework?

    • What did you really want to do in PB 1.0 that you are still unable to do?

    • Are there any other constraints that you foresee in the PB 2.0 layout?

  • PB Inputs and Outputs page

    • Make sure that all (sensible) types of parameters can be represented in some way on the inputs page.

    • Develop input sections for each type of parameter (for taxcalc: data source, static, behavior, growth, etc.).

    • Make sure that all (sensible) types of outputs created by the models can either be displayed or downloaded with ease.

    • Make sure that there is documentation for creating deliverable outputs.

Step 1b: Establish PolicyBrain-Standards for modeling project APIs

Next, we want to build a well-defined API such that, if a software package meets the API requirements, we can stand it up on PolicyBrain using existing functionality with negligible effort and without needing the PolicyBrain team to understand or implement model details. This API was inspired by, and will be modeled closely on, the Tax-Calculator interface to PolicyBrain. Like the standards, the API will evolve over time. We will get a better idea of what form it will take once we finish building it into OG-USA and reinstating that project on PolicyBrain.

[What will this API look like? Something like https://github.com/open-source-economics/OG-USA/issues/352 ]
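
Pending the answer to that question, one can imagine a functions-based interface along the lines of the Tax-Calculator integration. The sketch below is purely hypothetical; every name in it is an assumption, and the linked OG-USA issue is where the real shape will be worked out.

```python
# Purely hypothetical sketch of the package-level API the standards
# might require; all function names and signatures are illustrative.

def package_defaults():
    """Return section/parameter metadata used to build the inputs page."""

def parse_user_inputs(raw_params):
    """Validate raw user parameters.

    Returns (parsed_params, errors_warnings); a nonempty
    errors_warnings means the run should not proceed.
    """

def run_model(parsed_params):
    """Run the model and return renderable and downloadable outputs."""
```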

How we could use help from modeling-project and other contributors during this step:

  • Feedback + Implementation help

    • PB API -- We will need to work closely with the project maintainers/contributors to develop a good API, apply it to their projects, and implement bug fixes and enhancements once the API is established.

Step 2: Redesign the PolicyBrain infrastructure to accommodate alternative environments for each project and scalability

In PolicyBrain 1.5.0, the worker nodes were transitioned into Docker containers, and in the upcoming 1.6.0 release, the webapp will be deployed on Heroku as a Docker container. These first steps drastically increase the flexibility and reliability of PolicyBrain environments. The next step is to run each project in its own environment. One of the problems we have encountered in the past is trying to keep all of the projects in one environment: some projects may have to run with older software, or upgrade urgently, just to remain compatible with the others. Furthermore, it is probable that we will want to stand up projects written in Julia, Fortran, Stata, or some other language that isn’t Python. Thus, finding a way to run projects in their own environments is imperative.

The infrastructure for actually running the models in their own environments is essentially in place. However, the webapp needs access to certain endpoints within the projects to process and validate parameters, so the current setup requires that all projects be installed in the same environment as the webapp. One approach to solving the webapp environment problem is to move all interactions with the upstream project to the project's worker nodes. The webapp would then interact with the worker nodes more like a REST API client: the unvalidated parameters would be posted as JSON to the worker-node API and validated there; if they pass, the model runs, and if not, the worker returns the warning and error messages. This approach requires substantial refactoring to decouple the webapp from most of the parameter-processing code but requires little infrastructure work.
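
A minimal sketch of that flow from the webapp's side, assuming hypothetical /validate and /run routes (the real endpoints would be part of the standards), might look like this:

```python
# Minimal sketch of the webapp acting as a REST client to a worker
# node. The routes and response fields are assumptions for
# illustration, not an implemented API.
import requests

WORKER = "http://worker-node:5050"  # hypothetical worker address

def submit(raw_params):
    # Post the unvalidated parameters; the worker validates them in
    # the project's own environment.
    resp = requests.post(f"{WORKER}/validate", json=raw_params)
    result = resp.json()
    if result.get("errors_warnings"):
        # Hand the messages back to the user instead of running.
        return {"status": "invalid", "messages": result["errors_warnings"]}
    # Parameters are OK: kick off the model run.
    run = requests.post(f"{WORKER}/run", json=result["parsed_params"])
    return {"status": "submitted", "job_id": run.json()["job_id"]}
```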

We are also exploring other solutions for streamlining the deployment process, increasing the scalability of the worker nodes, and distributing work. An intermediate step is to use the AWS Elastic Container Service (ECS) for deploying the worker nodes. The worker nodes are already Docker-ized and deployed with Docker Compose, and ECS provides a nice interface for systems set up with Docker Compose and makes it easy to scale the number of EC2 instances up and down.

A reach goal is to deploy projects on highly flexible and scalable computing clusters. Systems built around Dask are very strong candidates right now. Dask is easy to set up and integrate into existing code, and it allows you to parallelize many NumPy and Pandas operations in addition to your own custom functions. Many existing projects, such as Pangeo, already use Dask on scientific computing clusters and have open-sourced their infrastructure code. At this point, once Docker and Kubernetes are set up, a local Kubernetes cluster can be stood up with only a few commands. The downside of Dask is that it really only works for projects written in Python. This is sufficient right now; however, we will have to find solutions for other languages in the future.
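
For a sense of what Dask integration buys, the snippet below uses only the documented dask.distributed API; the run_reform function is a hypothetical stand-in for real model work.

```python
# Small sketch of distributing per-year model runs with Dask.
# Client() with no arguments starts a local cluster; against a real
# deployment you would pass the scheduler's address instead.
from dask.distributed import Client

def run_reform(year):
    # Hypothetical stand-in for an expensive, per-year computation.
    return year * 2  # placeholder work

client = Client()  # or Client("scheduler:8786") on a deployed cluster
futures = client.map(run_reform, range(2018, 2028))
results = client.gather(futures)  # blocks until all tasks finish
client.close()
```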

How we could use help from model project contributors during this step:

  • Environment build -- has the environment for running your model been correctly configured?

    • Do we have your data sets in the worker nodes?

    • Are all packages installed?

    • Advice on setting up Dask (or some other parallelization package if absolutely necessary)

    • Advice and templates for creating a Dockerfile for your project

  • Are projects interested in setting up a cluster for testing/development purposes? This may be useful if projects are interested in testing their model on different sets of hardware or on existing hardware after changes that may affect resource usage.

Conclusion

We hope that this work on PolicyBrain 2.0 will be beneficial to both modeling project contributors and model users on the PolicyBrain platform. We would be very grateful for your help in achieving those outcomes.
