diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 21c2b6b43a..b4d0935685 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,34 +1,83 @@
 ## Interested in contributing to MMLSpark? We're excited to work with you.
 
-### You can contribute in many ways
+### You can contribute in many ways:
 
-* Use the library and give feedback
-* Report a bug
-* Request a feature
-* Fix a bug
-* Add examples and documentation
-* Code a new feature
-* Review pull requests
+* Use the library and give feedback: report bugs, request features.
+* Add sample Jupyter notebooks, Python or Scala code examples, documentation
+  pages.
+* Fix bugs and issues.
+* Add new features, such as data transformations or machine learning algorithms.
+* Review pull requests from other contributors.
 
 ### How to contribute?
 
-You can give feedback, report bugs and request new features anytime by
-opening an issue. Also, you can up-vote and comment on existing issues.
+You can give feedback, report bugs and request new features anytime by opening
+an issue. Also, you can up-vote or comment on existing issues.
 
-To make a pull request into the repo, such as bug fixes, documentation
-or new features, follow these steps:
+If you want to add code, examples or documentation to the repository, follow
+this process:
 
-* If it's a new feature, open an issue for preliminary discussion with
-  us, to ensure your contribution is a good fit and doesn't duplicate
+#### Propose a contribution
+
+* Preferably, get started by tackling existing issues to get yourself acquainted
+  with the library source and the process.
+* Open an issue, or comment on an existing issue to discuss your contribution
+  and design, to ensure your contribution is a good fit and doesn't duplicate
   on-going work.
-* Typically, you'll need to accept Microsoft Contributor Licence
-  Agreement (CLA).
-* Familiarize yourself with coding style and guidelines.
-* Fork the repository, code your contribution, and create a pull
-  request.
-* Wait for an MMMLSpark team member to review and accept it. Be patient
-  as we iron out the process for a new project.
-
-A good way to get started contributing is to look for issues with a "help
-wanted" label. These are issues that we do want to fix, but don't have
-resources to work on currently.
+* Any algorithm you're planning to contribute should be well known and accepted
+  for production use, and backed by research papers.
+* Algorithms should be highly scalable and suitable for very large datasets.
+* All contributions need to comply with the MIT License. Contributors external
+  to Microsoft need to sign the CLA.
+
+#### Implement your contribution
+
+* Fork the MMLSpark repository.
+* Implement your algorithm in Scala, using our wrapper generation mechanism to
+  produce PySpark bindings.
+* Use SparkML `PipelineStage`s so your algorithm can be used as part of a
+  pipeline.
+* For parameters use `MMLParam`s.
+* Implement model saving and loading by extending SparkML `MLReadable`.
+* Use good Scala style.
+* Binary dependencies should be on Maven Central.
+* See this [pull request](https://github.com/Azure/mmlspark/pull/22) for an
+  example contribution.
+
+#### Implement tests
+
+* Set up the build environment. Use a Linux machine or VM (we use Ubuntu, but
+  other distros should work too), and install the environment using the
+  [`runme` script](runme).
+* Test your code locally.
+* Add tests using ScalaTest; unit tests are required.
+* A sample notebook is required as an end-to-end test.
+
+#### Implement documentation
+
+* Add a [sample Jupyter notebook](notebooks/samples) that shows the intended use
+  case of your algorithm, with step-by-step instructions. (The same
+  notebook could be used for testing the code.)
+* Add in-line ScalaDoc comments to your source code, to generate the [API
+  reference documentation](https://mmlspark.azureedge.net/docs/pyspark/).
+
+#### Open a pull request
+
+* In most cases, you should squash your commits into one.
+* Open a pull request, and link it to the discussion issue you created earlier.
+* An MMLSpark core team member will trigger a build to test your changes.
+* Fix any build failures. (The pull request will have comments from the build
+  with useful links.)
+* Wait for code reviews from core team members and others.
+* Fix issues found in code review and re-iterate.
+
+#### Build and check-in
+
+* Wait for a core team member to merge your code in.
+* Your feature will be available through a Docker image and script installation
+  in the next release, which typically happens around once a month. You can try
+  out your features sooner by using build artifacts for the version that has
+  your changes merged in (such versions end with a `.devN`).
+
+If in doubt about how to do something, see how it was done in existing code or
+pull requests, and don't hesitate to ask.