This repository was archived by the owner on Aug 25, 2024. It is now read-only.

Commit 8a811e1

docs: Improve homepage and plugins
Signed-off-by: John Andersen <johnandersenpdx@gmail.com>
1 parent 374d058 commit 8a811e1

13 files changed

+436
-22
lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Skeleton for service creation was added
 - Simple Linear Regression model from scratch
 - Community link in CONTRIBUTING.md.
+- Explained three main parts of DFFML on docs homepage
+- Documentation on how to use ML models on docs Models plugin page.
 ### Changed
 - feature/codesec became its own branch, binsec
 - BaseOrchestratorContext `run_operations` strict is default to true. With

docs/index.rst

Lines changed: 37 additions & 10 deletions
@@ -10,18 +10,32 @@ Data Flow Facilitator for Machine Learning (DFFML) provides APIs for dataset
 generation and storage, and model definition using any machine learning
 framework, from high level down to low level use is supported.
 
-The goal of DFFML is to build a community driven library of plugins for dataset
-generation and model definition. So that we as developers and researchers can
-quickly and easily plug and play various pieces of data with various model
-implementations.
+The idea of DFFML is to abstract three main parts of the machine learning
+workflow, so as to reduce the amount of code that gets rewritten when applying
+machine learning to a new problem.
 
-The more we build up the library of plugins (which anyone can maintain, they
-don't have to be contributed upstream unless you want to) the more variations on
-model implementations, feature data generators, and database backend
-abstractions, we all have to work with.
+It's an object-oriented approach involving three main classes.
+
+- ``Source`` classes handle the storage of datasets, saving and loading them
+  from files, databases, remote APIs, etc.
+
+
+- ``Model`` classes handle implementations of machine learning algorithms. They
+  most likely implement them using a machine learning framework. DFFML is not a
+  machine learning library like PyTorch or TensorFlow. It's higher level than
+  those. Because of this, you most likely won't have to write any code to
+  start doing machine learning. If you want to fine tune a model or create your
+  own specific implementation, all you need to do is subclass from ``Model``.
+
+
+  - To get started with machine learning right away, head over to
+    :ref:`plugin_models`.
+
+
+- ``OperationImplementation`` classes are akin to micro services, wrapped in
+  a data flow architecture. More information on these can be found in the Data
+  Flow usage example. *The data flow portion of the API is less mature*.
 
-Right now we've released a wrapper around the Tensorflow DNN estimator, and a
-set of feature generators which gather data from git repositories.
 
 .. toctree::
     :glob:
@@ -34,6 +48,19 @@ set of feature generators which gather data from git repositories.
     plugins/index
     api/index
 
+The goal of DFFML is to build a community driven library of plugins for dataset
+generation and model definition. So that we as developers and researchers can
+quickly and easily plug and play various pieces of data with various model
+implementations.
+
+The more we build up the library of plugins (which anyone can maintain, they
+don't have to be contributed upstream unless you want to) the more variations on
+model implementations, feature data generators, and database backend
+abstractions, we all have to work with.
+
+Right now we've released a wrapper around the TensorFlow DNN classifier, a
+simple linear regression estimator, and a set of operations which gather data
+from git repositories.
 
 Indices and tables
 ==================
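
The split between ``Source`` and ``Model`` classes described in the homepage text added to docs/index.rst can be pictured with a small sketch. The classes below are hypothetical stand-ins written for illustration only; they are not DFFML's actual API, and the method names (`records`, `train`, `predict`) are assumptions. The point is only that a model consumes records from any source without knowing how they are stored.

```python
import csv
import io
from abc import ABC, abstractmethod


class Source(ABC):
    """Hypothetical stand-in: knows how to load records, nothing else."""

    @abstractmethod
    def records(self):
        """Yield one dict of feature name -> value per record."""


class Model(ABC):
    """Hypothetical stand-in: knows how to learn from records."""

    @abstractmethod
    def train(self, source):
        """Fit the model using every record the source yields."""

    @abstractmethod
    def predict(self, features):
        """Return a prediction for one record's features."""


class CSVTextSource(Source):
    """Reads records from CSV text held in memory (illustrative only)."""

    def __init__(self, text):
        self.text = text

    def records(self):
        for row in csv.DictReader(io.StringIO(self.text)):
            yield {name: float(value) for name, value in row.items()}


class MeanModel(Model):
    """Trivial model that always predicts the mean of the label column."""

    def __init__(self, label):
        self.label = label
        self.mean = None

    def train(self, source):
        values = [record[self.label] for record in source.records()]
        self.mean = sum(values) / len(values)

    def predict(self, features):
        return self.mean


source = CSVTextSource("Years,Salary\n1,40\n2,50\n3,60\n")
model = MeanModel("Salary")
model.train(source)
print(model.predict({"Years": 4.0}))
```

Swapping `CSVTextSource` for a database-backed source would not require touching `MeanModel`, which is the kind of reuse the homepage text is aiming at.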

docs/plugins/dffml_model.rst

Lines changed: 175 additions & 1 deletion
@@ -1,3 +1,5 @@
+.. _plugin_models:
+
 Models
 ======
 
@@ -7,6 +9,11 @@ abstract the usage of machine learning models.
 dffml_model_tensorflow
 ----------------------
 
+.. code-block:: console
+
+    pip install dffml-model-tensorflow
+
+
 tfdnnc
 ~~~~~~
 
@@ -15,6 +22,93 @@ tfdnnc
 Implemented using Tensorflow's DNNClassifier. Models are saved under the
 ``directory`` in subdirectories named after the hash of their feature names.
 
+.. code-block:: console
+
+    $ wget http://download.tensorflow.org/data/iris_training.csv
+    $ wget http://download.tensorflow.org/data/iris_test.csv
+    $ head iris_training.csv
+    $ sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' *.csv
+    $ head iris_training.csv
+    $ dffml train \
+        -model tfdnnc \
+        -model-epochs 3000 \
+        -model-steps 20000 \
+        -model-classification classification \
+        -model-classifications 0 1 2 \
+        -model-clstype int \
+        -sources iris=csv \
+        -source-filename iris_training.csv \
+        -features \
+          def:SepalLength:float:1 \
+          def:SepalWidth:float:1 \
+          def:PetalLength:float:1 \
+          def:PetalWidth:float:1 \
+        -log debug
+    ... lots of output ...
+    $ dffml accuracy \
+        -model tfdnnc \
+        -model-classification classification \
+        -model-classifications 0 1 2 \
+        -model-clstype int \
+        -sources iris=csv \
+        -source-filename iris_test.csv \
+        -features \
+          def:SepalLength:float:1 \
+          def:SepalWidth:float:1 \
+          def:PetalLength:float:1 \
+          def:PetalWidth:float:1 \
+        -log critical
+    0.99996233782
+    $ dffml predict all \
+        -model tfdnnc \
+        -model-classification classification \
+        -model-classifications 0 1 2 \
+        -model-clstype int \
+        -sources iris=csv \
+        -source-filename iris_test.csv \
+        -features \
+          def:SepalLength:float:1 \
+          def:SepalWidth:float:1 \
+          def:PetalLength:float:1 \
+          def:PetalWidth:float:1 \
+        -caching \
+        -log critical \
+        > results.json
+    $ head -n 33 results.json
+    [
+        {
+            "extra": {},
+            "features": {
+                "PetalLength": 4.2,
+                "PetalWidth": 1.5,
+                "SepalLength": 5.9,
+                "SepalWidth": 3.0,
+                "classification": 1
+            },
+            "last_updated": "2019-07-31T02:00:12Z",
+            "prediction": {
+                "confidence": 0.9999997615814209,
+                "value": 1
+            },
+            "src_url": "0"
+        },
+        {
+            "extra": {},
+            "features": {
+                "PetalLength": 5.4,
+                "PetalWidth": 2.1,
+                "SepalLength": 6.9,
+                "SepalWidth": 3.1,
+                "classification": 2
+            },
+            "last_updated": "2019-07-31T02:00:12Z",
+            "prediction": {
+                "confidence": 0.9999984502792358,
+                "value": 2
+            },
+            "src_url": "1"
+        },
+
 **Args**
 
 - directory: String
@@ -48,4 +142,84 @@ Implemented using Tensorflow's DNNClassifier. Models are saved under the
 - clstype: locate
 
   - default: <class 'str'>
-  - Data type of classifications values (default: str)
+  - Data type of classifications values (default: str)
+
+dffml_model_scratch
+-------------------
+
+.. code-block:: console
+
+    pip install dffml-model-scratch
+
+
+scratchslr
+~~~~~~~~~~
+
+*Core*
+
+Simple Linear Regression Model for 2 variables implemented from scratch.
+Models are saved under the ``directory`` in subdirectories named after the
+hash of their feature names.
+
+.. code-block:: console
+
+    $ cat > dataset.csv << EOF
+    Years,Salary
+    1,40
+    2,50
+    3,60
+    4,70
+    5,80
+    EOF
+    $ dffml train \
+        -model scratchslr \
+        -features def:Years:int:1 \
+        -model-predict Salary \
+        -sources f=csv \
+        -source-filename dataset.csv \
+        -source-readonly \
+        -log debug
+    $ dffml accuracy \
+        -model scratchslr \
+        -features def:Years:int:1 \
+        -model-predict Salary \
+        -sources f=csv \
+        -source-filename dataset.csv \
+        -source-readonly \
+        -log debug
+    1.0
+    $ echo -e 'Years,Salary\n6,0\n' | \
+      dffml predict all \
+        -model scratchslr \
+        -features def:Years:int:1 \
+        -model-predict Salary \
+        -sources f=csv \
+        -source-filename /dev/stdin \
+        -source-readonly \
+        -log debug
+    [
+        {
+            "extra": {},
+            "features": {
+                "Salary": 0,
+                "Years": 6
+            },
+            "last_updated": "2019-07-19T09:46:45Z",
+            "prediction": {
+                "confidence": 1.0,
+                "value": 90.0
+            },
+            "src_url": "0"
+        }
+    ]
+
+**Args**
+
+- directory: String
+
+  - default: /home/user/.cache/dffml/scratch
+  - Directory where state should be saved
+
+- predict: String
+
+  - Label or the value to be predicted
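
The ``scratchslr`` example in the diff above predicts a salary of 90.0 for 6 years from five training rows. As a sanity check on that figure, here is a minimal ordinary least squares sketch in plain Python. It is an independent illustration of simple linear regression on the same `Years`/`Salary` data, not the plugin's actual implementation; the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Same rows as dataset.csv in the console example.
years = [1, 2, 3, 4, 5]
salary = [40, 50, 60, 70, 80]

slope, intercept = fit_simple_linear_regression(years, salary)
prediction = slope * 6 + intercept
print(slope, intercept, prediction)  # 10.0 30.0 90.0
```

The data lie exactly on the line Salary = 10 * Years + 30, so the prediction for 6 years is 90.0, matching the ``value`` shown in the console output.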

docs/plugins/dffml_operation_implementation.rst

Lines changed: 22 additions & 2 deletions
@@ -8,6 +8,11 @@ which could do anything, make HTTP requests, do inference, etc.
 dffml
 -----
 
+.. code-block:: console
+
+    pip install dffml
+
+
 associate
 ~~~~~~~~~
 
@@ -72,6 +77,11 @@ No description
 dffml_feature_git
 -----------------
 
+.. code-block:: console
+
+    pip install dffml-feature-git
+
+
 check_if_valid_git_repository_URL
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -353,8 +363,13 @@ No description
 
 - work: work_spread(type: int)
 
-dffml_feature_codesec
----------------------
+dffml_operations_binsec
+-----------------------
+
+.. code-block:: console
+
+    pip install dffml-operations-binsec
+
 
 cleanup_rpm
 ~~~~~~~~~~~
@@ -470,6 +485,11 @@ No description
 dffml_feature_auth
 ------------------
 
+.. code-block:: console
+
+    pip install dffml-feature-auth
+
+
 scrypt
 ~~~~~~

docs/plugins/dffml_service_cli.rst

Lines changed: 28 additions & 0 deletions
@@ -14,9 +14,37 @@ Services. It also helps developers hack on DFFML itself.
 
     $ dffml service -h
 
+You can create a new Python package and start implementing a new plugin for
+DFFML right away with the ``create`` command of ``dev``.
+
+.. code-block:: console
+
+    $ dffml service dev create model cool-ml-model
+    $ cd cool-ml-model
+    $ python setup.py test
+
+When you're done you can upload it to PyPI and it'll be ``pip`` installable so
+that other DFFML users can use it in their code or via the CLI. If you don't
+want to mess with uploading to ``PyPI``, you can install it from your git repo
+(wherever it may be that you upload it to).
+
+.. code-block:: console
+
+    $ python -m pip install -U git+https://github.com/user/cool-ml-model
+
+Make sure to look in ``setup.py`` and edit the ``entry_points`` to match
+whatever you've edited. This way whatever you make will be usable by others
+within the DFFML CLI (eventually HTTP API and others) as soon as they ``pip``
+install your package, nothing else required.
+
 dffml
 -----
 
+.. code-block:: console
+
+    pip install dffml
+
+
 dev
 ~~~
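
The note about editing ``entry_points`` in ``setup.py`` is easier to follow with the shape of that setting in view. The sketch below is an assumption-laden illustration: the group name `dffml.model`, the module path `cool_ml_model.model`, and the class name `CoolModel` are all hypothetical here, so check the ``setup.py`` generated by ``dffml service dev create`` for the real values. Only the setuptools `name = module:attribute` syntax is standard.

```python
# Hypothetical entry_points mapping, of the kind passed to setuptools.setup().
# Group name, module path, and class name are illustrative assumptions.
ENTRY_POINTS = {
    "dffml.model": [
        "coolml = cool_ml_model.model:CoolModel",
    ],
}


def parse_entry_point(spec):
    """Split 'name = package.module:attribute' into its three parts."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module, _, attribute = target.partition(":")
    return name, module, attribute


# If the class or module is renamed, the right-hand side must be updated
# to match, otherwise the plugin cannot be located after installation.
for spec in ENTRY_POINTS["dffml.model"]:
    print(parse_entry_point(spec))

# In setup.py this mapping would be passed as:
#   setup(..., entry_points=ENTRY_POINTS)
```

The left-hand name is what a user would type to select the plugin, and the right-hand side is where the installed package exposes it.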

docs/plugins/dffml_source.rst

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,11 @@ abstract the loading and storage of data / datasets.
 dffml
 -----
 
+.. code-block:: console
+
+    pip install dffml
+
+
 csv
 ~~~
