Commit
Showing 23 changed files with 527 additions and 82 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
Beginner Tutorial: Fruit Stand | ||
**Note**: This tutorial was written for Pachyderm versions earlier than 1.4 and needs to be updated for the latest versions of Pachyderm. | ||
|
||
# Beginner Tutorial: Fruit Stand | ||
|
||
This directory contains assets for the [Beginner Tutorial on our documentation portal](http://pachyderm.readthedocs.io/en/stable/getting_started/beginner_tutorial.html). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# ML pipeline using Nervana Neon and Pachyderm | ||
|
||
![alt tag](pipeline.jpg) | ||
|
||
This machine learning pipeline integrates Nervana Neon training and inference into a production scale pipeline using Pachyderm. In particular, this pipeline trains and utilizes a model that predicts the sentiment of movie reviews, based on data from IMDB. | ||
|
||
## Deploying Pachyderm | ||
|
||
See the [Pachyderm docs](http://docs.pachyderm.io/en/latest/) for details. Note that this demo requires `pachctl` 1.4.0 or later. | ||
|
||
## Deploying the pipeline | ||
|
||
1. Create the necessary data "repositories": | ||
|
||
```sh | ||
$ pachctl create-repo training | ||
$ pachctl create-repo reviews | ||
``` | ||
|
||
2. Create the pipeline: | ||
|
||
```sh | ||
$ pachctl create-pipeline -f pipeline.json | ||
``` | ||
|
||
## Running model training | ||
|
||
Because we have already deployed the pipeline, the training portion of the pipeline will run as soon as data is committed to the training data repo. The training data in TSV format can be obtained [here](https://www.kaggle.com/c/word2vec-nlp-tutorial/data). | ||
|
||
```sh | ||
$ pachctl put-file training master labeledTrainData.tsv -c -f labeledTrainData.tsv | ||
``` | ||
|
||
## Running model inference | ||
|
||
Once the model is trained and a persisted version of it has been output to the `model` repo, sentiment prediction can be run by committing movie reviews to the `reviews` repository as text files. An example review file might be: | ||
|
||
``` | ||
Naturally in a film who's main themes are of mortality, nostalgia, and loss of innocence it is perhaps not surprising that it is rated more highly by older viewers than younger ones. However there is a craftsmanship and completeness to the film which anyone can enjoy. The pace is steady and constant, the characters full and engaging, the relationships and interactions natural showing that you do not need floods of tears to show emotion, screams to show fear, shouting to show dispute or violence to show anger. Naturally Joyce's short story lends the film a ready made structure as perfect as a polished diamond, but the small changes Huston makes such as the inclusion of the poem fit in neatly. It is truly a masterpiece of tact, subtlety and overwhelming beauty. | ||
``` | ||
|
||
Once this is committed to the `reviews` repo as `1.txt`: | ||
|
||
```sh | ||
$ pachctl put-file reviews master 1.txt -c -f 1.txt | ||
``` | ||
|
||
The inference stage of the pipeline will run and output results to the master branch of the `inference` repo. |
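
Each result file written by `auto_inference.py` contains a line of the form `Pred - [[a b]]`, produced by formatting a NumPy array. As a minimal sketch, assuming that line format and NumPy's default array printing, the two class scores could be read back as follows. The helper name `parse_prediction` is hypothetical, and the sketch deliberately does not assume which index corresponds to positive sentiment, since that depends on how the model was trained.

```python
import re

# Matches plain and scientific-notation floats as NumPy prints them.
_FLOAT = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_prediction(line):
    """Return the class scores from a 'Pred - [[a b]]' line as floats."""
    prefix = "Pred - "
    if not line.startswith(prefix):
        raise ValueError("unexpected result line: {!r}".format(line))
    return [float(m) for m in _FLOAT.findall(line[len(prefix):])]


scores = parse_prediction("Pred - [[ 0.25  0.75]]\n")
print(scores)
```

In practice the line would come from a file retrieved from the `inference` repo rather than from a string literal.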
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
{ | ||
"pipeline": { | ||
"name": "inference" | ||
}, | ||
"transform": { | ||
"image": "dwhitena/neon-inference", | ||
"cmd": [ | ||
"python", | ||
"examples/imdb/auto_inference.py", | ||
"--model_weights", | ||
"/pfs/model/imdb.p", | ||
"--vocab_file", | ||
"/pfs/model/imdb.vocab", | ||
"--review_files", | ||
"/pfs/reviews", | ||
"--output_dir", | ||
"/pfs/out" | ||
] | ||
}, | ||
"parallelism_spec": { | ||
"strategy": "CONSTANT", | ||
"constant": "1" | ||
}, | ||
"inputs": [ | ||
{ | ||
"repo": { | ||
"name": "reviews" | ||
}, | ||
"glob": "/*" | ||
}, | ||
{ | ||
"repo": { | ||
"name": "model" | ||
}, | ||
"glob": "/" | ||
} | ||
] | ||
} |
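
The two `glob` values in this spec are not interchangeable. Assuming Pachyderm's glob semantics for this era, `/` exposes the whole repo as one datum, `/*` makes each top-level file or directory its own datum, and datums from multiple inputs are paired with each other. The sketch below only models that idea with plain Python lists. `datums_for` and `cross_inputs` are hypothetical helpers, not part of Pachyderm, and the behaviour should be checked against the pipeline specification for the version in use.

```python
from itertools import product


def datums_for(glob, top_level_entries):
    """Model how a glob splits a repo's top-level entries into datums."""
    if glob == "/":
        # The whole repo is processed as one unit.
        return [tuple(sorted(top_level_entries))]
    if glob == "/*":
        # Each top-level file or directory is processed on its own.
        return [(entry,) for entry in sorted(top_level_entries)]
    raise ValueError("this sketch only models '/' and '/*'")


def cross_inputs(*per_input_datums):
    """Pair every datum of one input with every datum of the others."""
    return list(product(*per_input_datums))


reviews = datums_for("/*", ["1.txt", "2.txt"])
model = datums_for("/", ["imdb.p", "imdb.vocab"])
jobs = cross_inputs(reviews, model)
for job in jobs:
    print(job)
```

Under these assumptions each review is handled separately while always seeing the complete model output, which is why `reviews` uses `/*` and `model` uses `/`.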
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
FROM kaixhin/neon | ||
ADD auto_inference.py /root/neon/examples/imdb/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
#!/usr/bin/env python | ||
""" | ||
Example that does inference on an LSTM network for IMDB movie review sentiment analysis | ||
$ python examples/imdb/auto_inference.py --model_weights imdb.p --vocab_file imdb.vocab | ||
--review_files /pfs/reviews --output_dir /pfs/out | ||
""" | ||
|
||
from __future__ import print_function | ||
from future import standard_library | ||
standard_library.install_aliases() # triggers E402, hence noqa below | ||
from builtins import input # noqa | ||
import numpy as np # noqa | ||
from neon.backends import gen_backend # noqa | ||
from neon.initializers import Uniform, GlorotUniform # noqa | ||
from neon.layers import LSTM, Affine, Dropout, LookupTable, RecurrentSum # noqa | ||
from neon.models import Model # noqa | ||
from neon.transforms import Logistic, Tanh, Softmax # noqa | ||
from neon.util.argparser import NeonArgparser, extract_valid_args # noqa | ||
from neon.util.compat import pickle # noqa | ||
from neon.data.text_preprocessing import clean_string # noqa | ||
import os | ||
|
||
# parse the command line arguments | ||
parser = NeonArgparser(__doc__) | ||
parser.add_argument('--model_weights', required=True, | ||
help='pickle file of trained weights') | ||
parser.add_argument('--vocab_file', required=True, | ||
help='vocabulary file') | ||
parser.add_argument('--review_files', required=True, | ||
help='directory containing reviews in text files') | ||
parser.add_argument('--output_dir', required=True, | ||
help='directory where results will be saved') | ||
args = parser.parse_args() | ||
|
||
|
||
# hyperparameters from the reference | ||
batch_size = 1 | ||
clip_gradients = True | ||
gradient_limit = 5 | ||
vocab_size = 20000 | ||
sentence_length = 128 | ||
embedding_dim = 128 | ||
hidden_size = 128 | ||
reset_cells = True | ||
num_epochs = args.epochs | ||
|
||
# setup backend | ||
be = gen_backend(**extract_valid_args(args, gen_backend)) | ||
be.bsz = 1 | ||
|
||
|
||
# define same model as in train | ||
init_glorot = GlorotUniform() | ||
init_emb = Uniform(low=-0.1 / embedding_dim, high=0.1 / embedding_dim) | ||
nclass = 2 | ||
layers = [ | ||
LookupTable(vocab_size=vocab_size, embedding_dim=embedding_dim, init=init_emb, | ||
pad_idx=0, update=True), | ||
LSTM(hidden_size, init_glorot, activation=Tanh(), | ||
gate_activation=Logistic(), reset_cells=True), | ||
RecurrentSum(), | ||
Dropout(keep=0.5), | ||
Affine(nclass, init_glorot, bias=init_glorot, activation=Softmax()) | ||
] | ||
|
||
|
||
# load the weights | ||
print("Initialized the models - ") | ||
model_new = Model(layers=layers) | ||
print("Loading the weights from {0}".format(args.model_weights)) | ||
|
||
model_new.load_params(args.model_weights) | ||
model_new.initialize(dataset=(sentence_length, batch_size)) | ||
|
||
# setup buffers before accepting reviews | ||
xdev = be.zeros((sentence_length, 1), dtype=np.int32) # bsz is 1, feature size | ||
xbuf = np.zeros((1, sentence_length), dtype=np.int32) | ||
oov = 2 | ||
start = 1 | ||
index_from = 3 | ||
pad_char = 0 | ||
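# load the vocabulary, closing the file handle once it has been read | ||
with open(args.vocab_file, 'rb') as vocab_fh: | ||
    vocab, rev_vocab = pickle.load(vocab_fh) | ||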
|
||
# walk over the reviews in the text files, making inferences | ||
for dirpath, dirs, files in os.walk(args.review_files): | ||
    for file in files: | ||
        with open(os.path.join(dirpath, file), 'r') as myfile: | ||
            data = myfile.read() | ||
|
||
        # clean the input | ||
        tokens = clean_string(data).strip().split() | ||
|
||
        # check for oov and add start | ||
        sent = [len(vocab) + 1 if t not in vocab else vocab[t] for t in tokens] | ||
        sent = [start] + [w + index_from for w in sent] | ||
        sent = [oov if w >= vocab_size else w for w in sent] | ||
|
||
        # pad sentences | ||
        xbuf[:] = 0 | ||
        trunc = sent[-sentence_length:] | ||
        xbuf[0, -len(trunc):] = trunc | ||
        xdev[:] = xbuf.T.copy() | ||
        y_pred = model_new.fprop(xdev, inference=True) # inference flag dropout | ||
|
||
        with open(os.path.join(args.output_dir, file), "w") as output_file: | ||
            output_file.write("Pred - {0}\n".format(y_pred.get().T)) |
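
The token encoding and padding steps in the loop above can be tried in isolation, without Neon or a trained model. The following is a minimal sketch that repeats the same list comprehensions inside a hypothetical function, `encode_review`, using a toy vocabulary and a deliberately small `vocab_size` so that the out-of-vocabulary replacement is visible. The toy values are assumptions chosen for illustration only.

```python
import numpy as np


def encode_review(tokens, vocab, vocab_size, sentence_length,
                  start=1, oov=2, index_from=3):
    """Map tokens to a left-padded row of word indices."""
    # Unknown tokens get an index just past the end of the vocabulary.
    sent = [vocab[t] if t in vocab else len(vocab) + 1 for t in tokens]
    # Shift past the reserved ids, then prepend the start marker.
    sent = [start] + [w + index_from for w in sent]
    # Anything outside the embedding table becomes the out-of-vocabulary id.
    sent = [oov if w >= vocab_size else w for w in sent]
    # Keep the last sentence_length ids, right-aligned over zero padding.
    xbuf = np.zeros((1, sentence_length), dtype=np.int32)
    trunc = sent[-sentence_length:]
    xbuf[0, -len(trunc):] = trunc
    return xbuf


toy_vocab = {"good": 1, "film": 2}
row = encode_review(["good", "film", "zzz"], toy_vocab,
                    vocab_size=6, sentence_length=6)
print(row)
```

Because the start marker is always prepended, the encoded sequence is never empty, so the right-aligned slice assignment is safe even for an empty review.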
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
{ | ||
"pipeline": { | ||
"name": "model" | ||
}, | ||
"transform": { | ||
"image": "kaixhin/neon", | ||
"cmd": [ | ||
"python", | ||
"examples/imdb/train.py", | ||
"-f", | ||
"/pfs/training/labeledTrainData.tsv", | ||
"-e", | ||
"2", | ||
"-eval", | ||
"1", | ||
"-s", | ||
"/pfs/out/imdb.p", | ||
"--vocab_file", | ||
"/pfs/out/imdb.vocab" | ||
] | ||
}, | ||
"inputs": [ | ||
{ | ||
"repo": { | ||
"name": "training" | ||
}, | ||
"glob": "/" | ||
} | ||
] | ||
} | ||
{ | ||
"pipeline": { | ||
"name": "inference" | ||
}, | ||
"transform": { | ||
"image": "dwhitena/neon-inference", | ||
"cmd": [ | ||
"python", | ||
"examples/imdb/auto_inference.py", | ||
"--model_weights", | ||
"/pfs/model/imdb.p", | ||
"--vocab_file", | ||
"/pfs/model/imdb.vocab", | ||
"--review_files", | ||
"/pfs/reviews", | ||
"--output_dir", | ||
"/pfs/out" | ||
] | ||
}, | ||
"parallelism_spec": { | ||
"strategy": "CONSTANT", | ||
"constant": "1" | ||
}, | ||
"inputs": [ | ||
{ | ||
"repo": { | ||
"name": "reviews" | ||
}, | ||
"glob": "/*" | ||
}, | ||
{ | ||
"repo": { | ||
"name": "model" | ||
}, | ||
"glob": "/" | ||
} | ||
] | ||
} |
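
This file holds two pipeline specs back to back rather than a single JSON document, so a plain `json.loads` call on its contents would fail with an "Extra data" error. A minimal sketch of reading such a file with the Python standard library follows. `iter_pipeline_specs` is a hypothetical helper, and the `example` string is an abbreviated copy of the two specs above.

```python
import json


def iter_pipeline_specs(text):
    """Yield each JSON object from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between objects; raw_decode does not skip it.
        if text[pos].isspace():
            pos += 1
            continue
        spec, pos = decoder.raw_decode(text, pos)
        yield spec


example = '''
{"pipeline": {"name": "model"},
 "inputs": [{"repo": {"name": "training"}, "glob": "/"}]}
{"pipeline": {"name": "inference"},
 "inputs": [{"repo": {"name": "reviews"}, "glob": "/*"},
            {"repo": {"name": "model"}, "glob": "/"}]}
'''

for spec in iter_pipeline_specs(example):
    repos = [entry["repo"]["name"] for entry in spec["inputs"]]
    print(spec["pipeline"]["name"], "reads from", repos)
```

Listing the inputs this way makes the dependency explicit: the `inference` pipeline reads from the output repo of the `model` pipeline as well as from `reviews`.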
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
{ | ||
"pipeline": { | ||
"name": "test12" | ||
}, | ||
"transform": { | ||
"image": "dwhitena/neon-inference", | ||
"cmd": [ | ||
"/bin/bash" | ||
], | ||
"stdin": [ | ||
"echo $(ls /pfs/model/) > /pfs/out/model_contents.txt", | ||
"echo $(ls /pfs/reviews/) > /pfs/out/reviews_contents.txt" | ||
] | ||
}, | ||
"parallelism_spec": { | ||
"strategy": "CONSTANT", | ||
"constant": "1" | ||
}, | ||
"inputs": [ | ||
{ | ||
"repo": { | ||
"name": "reviews" | ||
}, | ||
"glob": "/*" | ||
}, | ||
{ | ||
"repo": { | ||
"name": "model" | ||
}, | ||
"glob": "/" | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
{ | ||
"pipeline": { | ||
"name": "model" | ||
}, | ||
"transform": { | ||
"image": "kaixhin/neon", | ||
"cmd": [ | ||
"python", | ||
"examples/imdb/train.py", | ||
"-f", | ||
"/pfs/training/labeledTrainData.tsv", | ||
"-e", | ||
"2", | ||
"-eval", | ||
"1", | ||
"-s", | ||
"/pfs/out/imdb.p", | ||
"--vocab_file", | ||
"/pfs/out/imdb.vocab" | ||
] | ||
}, | ||
"inputs": [ | ||
{ | ||
"repo": { | ||
"name": "training" | ||
}, | ||
"glob": "/" | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
FROM affinelayer/pix2pix-tensorflow | ||
RUN wget https://raw.githubusercontent.com/affinelayer/pix2pix-tensorflow/master/pix2pix.py | ||
RUN wget https://raw.githubusercontent.com/affinelayer/pix2pix-tensorflow/master/server/tools/process-local.py | ||
RUN wget https://raw.githubusercontent.com/affinelayer/pix2pix-tensorflow/master/tools/process.py | ||
RUN wget https://raw.githubusercontent.com/affinelayer/pix2pix-tensorflow/master/tools/tfimage.py |