RobotClassify lets non-data scientists, such as citizen developers and other operational people who analyze and report on business data, train and use machine learning models. The goal is to automate the entire ML process (feature engineering, training, prediction).
This version of the app is optimized for loading data files to train with and test files for predictions. Prediction files are formatted for submission to Kaggle competitions. Currently, only machine learning classification problems are supported. The machine learning component is based on mlLib, a library I created to put into code the techniques I learned during my ML studies.
My motivation for RobotClassify centers on my interest in making machine learning accessible to citizen developers: taking the complicated tasks of feature engineering, model selection, and training and turning them into a simple point-and-click exercise that requires no prior machine learning training.
Using RobotClassify requires four simple steps, all of which can be completed in the web app at RobotClassify.herokuapp.com:
- Load a CSV data file. This is done by creating a project and specifying the training and test files (examples are found in the examples folder).
- Create a run. The run record defines the file attributes and the nature of the training. For this, we need to specify:
  - The target variable that is to be predicted
  - Record key column
  - Predict set out. These are the columns that are used to create the predict file in a format that can be used to submit the test results in a Kaggle competition
  - Classification model to train
  - Scoring method
  - Algorithm type (there are two approaches used to automate feature engineering)
- Run the training
- Review the results
RobotClassify can be accessed from the URL: https://robotclassify.herokuapp.com/.
The web interface provides a four-step approach to completing training and getting a result:
- Load the training and test files by creating a project
- Create a run record. The run record describes the test attributes
- Run the training
- Download the results file from the predictions
For example, the Titanic Kaggle competition (https://www.kaggle.com/c/titanic) provides two data sets, the training set and a test set. Loading these into RobotClassify, we would set the run parameters as follows:
- Target Variable: Survived
- Record Key: PassengerID
- Predict set out: Survived, PassengerID
- Classification model: xgbc
- Scoring method: f1
- Use Algorithm I for feature engineering: True
Following these instructions will give a training result that would put you in the top 8% of competitors.
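The run parameters above can be pictured as a small record. The following is a minimal sketch, not RobotClassify's actual code: the `RunConfig` class, its field names, the validation rule, and the `write_predict_file` helper are all hypothetical, and the predictions are made-up values. It only illustrates how the "Predict set out" columns could shape a Kaggle-style predict file.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunConfig:
    """Hypothetical container for the run parameters listed above."""
    target: str
    record_key: str
    predict_set_out: List[str]
    model: str
    scoring: str
    use_algorithm_one: bool

    def validate(self) -> None:
        # Assumed rule for this sketch: the predict file must contain
        # both the record key and the predicted target.
        for column in (self.target, self.record_key):
            if column not in self.predict_set_out:
                raise ValueError(f"{column!r} missing from predict set out")


def write_predict_file(config: RunConfig, rows: List[Dict[str, object]]) -> str:
    """Return CSV text holding only the 'predict set out' columns."""
    config.validate()
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=config.predict_set_out,
        extrasaction="ignore",  # drop columns that are not in the predict set
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


titanic_run = RunConfig(
    target="Survived",
    record_key="PassengerID",
    predict_set_out=["Survived", "PassengerID"],
    model="xgbc",
    scoring="f1",
    use_algorithm_one=True,
)

# Made-up predictions, for illustration only.
sample_rows = [
    {"PassengerID": 892, "Survived": 0, "Age": 34.5},
    {"PassengerID": 893, "Survived": 1, "Age": 47.0},
]
submission_csv = write_predict_file(titanic_run, sample_rows)
```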
The application is written with Flask as the backend and Flask-WTF (WTForms) for the frontend forms.
DISABLED FOR NOW - ALL PERMISSIONS AVAILABLE FOR ALL USERS.
There are two roles:
- Viewer Role: Viewers can only view projects, runs, and their results.
- Editor Role: Editors can create projects, runs, and perform training
Permission | Editor | Viewer | Description |
---|---|---|---|
get:project | Yes | Yes | Get a single project or a list of projects |
post:project | Yes | No | Create a new project or search |
patch:project | Yes | No | Update a project's attributes |
delete:project | Yes | No | Delete a project and its runs |
get:run | Yes | Yes | Get a run or download run results |
post:run | Yes | No | Create a new run |
patch:run | Yes | No | Update a run's attributes |
delete:run | Yes | No | Delete a run |
get:train | Yes | No | Run ML training |
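
The role table above can be read as two permission sets, with the Editor set containing everything the Viewer set has. The sketch below is one way to express that in Python. The permission names come from the table, but the dictionary layout and the `has_permission` helper are assumptions made for illustration, not the project's code (and, as noted above, role checks are currently disabled in the app).

```python
from typing import Dict, FrozenSet

# Permission names are taken from the table above; everything else
# in this sketch is hypothetical.
VIEWER_PERMISSIONS: FrozenSet[str] = frozenset({"get:project", "get:run"})
EDITOR_PERMISSIONS: FrozenSet[str] = VIEWER_PERMISSIONS | frozenset({
    "post:project", "patch:project", "delete:project",
    "post:run", "patch:run", "delete:run",
    "get:train",
})

ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "Viewer": VIEWER_PERMISSIONS,
    "Editor": EDITOR_PERMISSIONS,
}


def has_permission(role: str, permission: str) -> bool:
    """Return True when the role's permission set includes the permission."""
    # Unknown roles get an empty permission set, so they are always denied.
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```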
The following API endpoints are available. Detailed HTML documentation on these endpoints, including this file, can be found at https://robotclassify.herokuapp.com/docs/index.html
These are the endpoints, each with a short description and the permission it requires.
-- Home Page --
- GET / (home)
-- Documentation Page --
- GET /docs/index.html
--- Projects ---
- GET /projects (List all projects) - get:project
- GET /projects/<int:project_id> (List a single project) - get:project
- POST/GET /projects/create (Create a new project) - post:project
- PATCH /projects/<int:project_id>/edit (Edit a project) - patch:project
- DELETE /projects/<project_id>/delete (Delete a project) - delete:project
--- Runs ---
- GET /runs/<int:run_id> (Display a run's results) - get:run
- GET/POST /runs/create/<int:project_id> (Create a run) - post:run
- DELETE /runs/<int:run_id>/delete (Delete a run) - delete:run
- PATCH /run/<int:run_id>/edit (Edit a run) - patch:run
--- Train ---
- GET /train/<int:run_id> (Run ML training for a run) - get:train
- GET /train/<int:run_id>/download (Download the testing results file, the Kaggle file) - get:run
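
As a rough illustration of calling these endpoints from a script, the sketch below builds requests with Python's standard `urllib.request` module. It is an assumption-laden example rather than a documented client: the `build_request` helper is hypothetical, the `Bearer` scheme in the `Authorization` header is assumed, and the token and the run id `1` are placeholders. The requests are built but not sent.

```python
import urllib.request

BASE_URL = "https://robotclassify.herokuapp.com"


def build_request(method: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries a token in its headers."""
    return urllib.request.Request(
        url=BASE_URL + path,
        method=method,
        # Assumption: the API expects a bearer token in the Authorization header.
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values: a real token comes from Auth0, and run 1 may not exist.
list_projects = build_request("GET", "/projects", token="YOUR_TOKEN")
delete_run = build_request("DELETE", "/runs/1/delete", token="YOUR_TOKEN")

# Sending is left commented out so the sketch has no network side effects:
# with urllib.request.urlopen(list_projects) as response:
#     body = response.read()
```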
RobotClassify source is located at: https://github.com/scottrsmith/RobotClassify
This project uses Python 3.7.
To Install Python
Once you have your virtual environment setup and running, install dependencies by navigating to the root directory and running:
pip install -r requirements.txt
This will install all of the required packages listed in the requirements.txt file.
- Flask: a lightweight backend microservices framework.
- SQLAlchemy: the Python SQL toolkit and ORM.
- Flask-CORS: the extension used to handle cross-origin requests from the frontend server.
- Auth0: provides authentication and authorization as a service.
- Postgres: PostgreSQL database.
- Heroku: app hosting.
- Flask-WTF: integration of Flask with WTForms.
- mlLib: machine learning training library, included in RobotClassify.
- UnitTest: test automation for Python.
- Flask-Migrate: manages SQLAlchemy database migrations for Flask applications using Alembic.
- scikit-learn: simple and efficient tools for predictive data analysis.
UnitTest runs against PostgreSQL as the local source database.
How to start and stop PostgreSQL: https://stackoverflow.com/questions/7975556/how-to-start-postgresql-server-on-mac-os-x
To run the server on a local machine, execute dev.sh from within the root directory.
Live documentation, including this readme, can be found at https://robotclassify.herokuapp.com/docs/index.html
The PDF version of the documentation, robotclassify.pdf, is located in the root project directory.
Documentation is generated with Sphinx.
To install Sphinx, reference the documents at https://www.sphinx-doc.org/en/master/usage/installation.html
Use docs.sh in the docs folder to generate the documentation.
Generated docs are located at https://robotclassify.herokuapp.com/docs/index.html
Errors are returned as JSON objects in the following format:
{
    "success": false,
    "error": 401,
    "message": "Permission Error",
    "description": "401: Authorization header is expected."
}
The API returns multiple error types when requests fail:
- 400: Bad Request
- 401: Permission Error
- 404: Resource Not Found
- 405: Method Not Allowed
- 422: Not Processable
- 500: Server Error
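
A client can turn this error payload into an exception. The following is a minimal sketch under stated assumptions: `ApiError` and `raise_for_error` are hypothetical names, and the sample body simply reuses the 401 example fields shown above instead of a real server response.

```python
import json


class ApiError(Exception):
    """Hypothetical exception carrying the fields of the error payload."""

    def __init__(self, status: int, message: str, description: str = "") -> None:
        super().__init__(f"{status} {message}: {description}")
        self.status = status
        self.message = message
        self.description = description


def raise_for_error(body: str) -> dict:
    """Parse a JSON response body and raise ApiError when success is false."""
    payload = json.loads(body)
    if payload.get("success") is False:
        raise ApiError(
            status=payload.get("error", 0),
            message=payload.get("message", ""),
            description=payload.get("description", ""),
        )
    return payload


# Sample body built from the documented error format (not a captured response).
sample_body = json.dumps({
    "success": False,
    "error": 401,
    "message": "Permission Error",
    "description": "401: Authorization header is expected.",
})

try:
    raise_for_error(sample_body)
except ApiError as err:
    caught = err  # keep a reference so the fields can be inspected afterwards
```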
Testing is done with UnitTest and curl. UnitTest is set up to create and use a local Postgres database while Curl is set up to run commands against the
- Flask Sessions are maintained between REST Calls for Web-based use of the API. The implementation is based upon Flask Sessions and the quickstart example app from Auth0 for Web applications.
- CSRF protection is disabled for certain REST calls to facilitate testing via curl.
- Patch and Delete functions are only available via API calls
- UnitTest uses a local Postgres database
- UnitTest uses Auth0 API app credentials (versus the Auth0 Web App quickstart code): Auth0 Management API (Test Application)
- Tokens in the headers are used for API authentication
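
To make the last point concrete, here is a sketch of how a server might pull a token out of the request headers before validating it. This is not the project's implementation: the `AuthError` class and `get_token_from_headers` function are hypothetical, the `Bearer` scheme is an assumption, and token verification against Auth0 is deliberately left out.

```python
from typing import Mapping


class AuthError(Exception):
    """Hypothetical error type for authentication failures (HTTP 401)."""

    def __init__(self, description: str, status: int = 401) -> None:
        super().__init__(description)
        self.description = description
        self.status = status


def get_token_from_headers(headers: Mapping[str, str]) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header."""
    auth = headers.get("Authorization")
    if not auth:
        raise AuthError("Authorization header is expected.")
    parts = auth.split()
    # Assumption: exactly two parts, the scheme name and the token itself.
    if len(parts) != 2 or parts[0].lower() != "bearer":
        raise AuthError("Authorization header must be 'Bearer <token>'.")
    return parts[1]
```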