Building ML Powered Applications
This repository consists of three parts:

- A set of Jupyter notebooks in the notebook folder, which illustrate concepts covered in the book.
- A library in the ml_editor folder, which contains core functions for the book's case study example, a Machine Learning driven writing assistant.
- A Flask app that demonstrates a simple way to serve results to users.

The images/bmlpa_figures folder contains reproductions of a few figures which were hard to read in the first print version.
Credit and thanks go to Bruno Guisard who conducted a thorough review of the code in this repository.
This repository has been tested on Python 3.6 and 3.7. It aims to support any Python 3 version.
To set up the project, start by cloning the repository:
git clone https://github.com/hundredblocks/ml-powered-applications.git
Then, navigate to the repository, create a Python virtual environment using virtualenv, and activate it.
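A minimal sketch of those two steps on a Unix-like system, using the standard-library venv module as an equivalent to virtualenv (the environment name ml_editor here is illustrative, not prescribed by the repository):

```shell
# Create a virtual environment named "ml_editor" (name is illustrative).
python3 -m venv ml_editor

# Activate it (on Windows, use ml_editor\Scripts\activate instead).
. ml_editor/bin/activate
```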
Then, install project requirements by using:
pip install -r requirements.txt
The library uses a few models from spaCy. To download the small and large English models (both required to run the app and the notebooks), run these commands from a terminal with your virtualenv activated:
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
Finally, the notebooks and library leverage the nltk package. The package comes with a set of resources that need to be individually downloaded. To do so, open a Python session in an activated virtual environment, import nltk, and download the required resource. Here is an example of how to do this for the punkt resource from an active virtual environment with nltk installed:
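A minimal session doing this for punkt might look like the following (note that nltk.download returns True on success and False on failure, e.g. without network access, rather than raising):

```python
# Download the "punkt" tokenizer resource used by nltk's tokenizers.
import nltk

nltk.download("punkt")
```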
The notebook folder contains usage examples for concepts covered in the book. Most of the examples only use one of the subfolders in the archive (the one that contains data for writers.stackexchange.com).
I've included a processed version of the data as a .csv file for convenience.

If you wanted to generate this data yourself, or generate it for another subfolder, you should:

- Download a subfolder from the stackoverflow archives
- Run parse_xml_to_csv to convert it to a DataFrame
- Run generate_model_text_features to generate a DataFrame with precomputed features
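As a rough illustration of the first conversion step (not the library's actual parse_xml_to_csv, whose signature may differ), a Stack Exchange Posts.xml dump stores each post as a row element with its fields as XML attributes, which maps naturally onto a DataFrame:

```python
import xml.etree.ElementTree as ElementTree

import pandas as pd


def posts_xml_to_df(xml_text):
    """Parse a Stack Exchange Posts.xml payload into a DataFrame.

    Each post is a <row> element whose fields are XML attributes.
    """
    root = ElementTree.fromstring(xml_text)
    return pd.DataFrame([dict(row.attrib) for row in root.iter("row")])


# Tiny inline sample mimicking the archive's Posts.xml layout.
sample = """<posts>
  <row Id="1" PostTypeId="1" Score="5" Body="How do I start writing?" />
  <row Id="2" PostTypeId="2" Score="3" Body="Just write every day." />
</posts>"""

df = posts_xml_to_df(sample)
```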
The notebooks belong to a few categories of concepts, described below.
Data Exploration and Transformation
- Dataset Exploration
- Splitting Data
- Vectorizing Text
- Clustering Data
- Tabular Data Vectorization
- Exploring Data To Generate Features
Initial Model Training and Performance Analysis
Improving the Model
Generating Suggestions from Models
You can train and save models using the notebooks in the notebook folder. For convenience, I've included three trained models and two vectorizers, serialized in the models folder. These models are loaded by notebooks demonstrating methods to compare model results, as well as by the Flask app.
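The file names below are hypothetical; this only sketches the usual scikit-learn pattern for serializing and reloading a vectorizer/model pair with joblib, which is how such pretrained artifacts are typically consumed (the repository's actual paths and formats may differ):

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Train a toy vectorizer/model pair on placeholder data.
texts = ["clear question", "vague question", "clear answer", "vague answer"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer().fit(texts)
model = LogisticRegression().fit(vectorizer.transform(texts), labels)

# Serialize both artifacts to disk (file names are illustrative).
joblib.dump(vectorizer, "vectorizer.pkl")
joblib.dump(model, "model.pkl")

# Reload them, as a notebook or the Flask app would at startup.
loaded_vectorizer = joblib.load("vectorizer.pkl")
loaded_model = joblib.load("model.pkl")
predictions = loaded_model.predict(loaded_vectorizer.transform(texts))
```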
Running the prototype Flask app
To run the app, simply navigate to the root of the repository and run:
FLASK_APP=app.py flask run
The above command should spin up a local web app you can access at http://127.0.0.1:5000/ (Flask's default local address).
If you have any questions or encounter any roadblocks, please feel free to open an issue or email me at firstname.lastname@example.org.
Project structure inspired by the great Cookiecutter Data Science.