This repository was archived by the owner on Jun 23, 2018. It is now read-only.

@lanthias
Contributor

What does this PR do?

I've added another section in quick_start to:

  • Build a more complex module (a classifier that uses some system libraries)
  • Introduce new sources and sinks
  • Introduce partial workflows
  • Introduce workflow modules

Notes

  • It doesn't yet include more complex types
  • It is badly organised and needs to be split into separate files

This was referenced Mar 1, 2017
Building NStack Container module irisclassify. Please wait. This may take some time.
Module irisclassify built successfully. Use `nstack list methods` to see all available methods.

We can see our method, ``irisclassify.predict``. Including our ``demo.numChars`` method from the previous tutorial, we should now have two:
Contributor

This line reads oddly - presumably you mean something like:

We can now see irisclassify.predict in the list of existing methods (along with previously built methods like demo.numChars) by running the suggested command nstack list methods

Contributor Author

Yes, thank you

.. code :: bash

    ~/irisclassify/ $ curl -X PUT -d '{ "params" : [4.7, 1.4, 6.1, 2.9] }' localhost:8080/irisendpoint
    Success
Contributor

Instead of Success we now see Msg Accepted here.

Contributor Author

Thank you
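For readers who would rather call the endpoint from Python than from curl, here is a minimal standard-library sketch. It assumes, as the curl call above does, that the HTTP source listens on `localhost:8080` with the path `/irisendpoint`. The helper names `build_request` and `send` are hypothetical. The request is only sent when `send` is called.

```python
import json
import urllib.request

# Assumption: the tutorial's HTTP source listens on localhost:8080 and the
# workflow was started with the path "/irisendpoint", as in the curl call above.
ENDPOINT = "http://localhost:8080/irisendpoint"


def build_request(params, url=ENDPOINT):
    """Build a PUT request carrying the same JSON body as the curl example."""
    body = json.dumps({"params": list(params)}).encode("utf-8")
    # Like `curl -d`, urllib sends Content-Type
    # application/x-www-form-urlencoded when a body is given and no header is set.
    return urllib.request.Request(url, data=body, method="PUT")


def send(params, url=ENDPOINT, timeout=5.0):
    """Send the request and return the server's reply as text."""
    request = build_request(params, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Against a running server, `send([4.7, 1.4, 6.1, 2.9])` should return the same acknowledgement that curl prints. According to the review above, that acknowledgement is `Msg Accepted` rather than `Success`.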

~/irisclassify/ $ nstack log 2
Feb 17 10:32:30 nostromo nstack-server[8925]: OUTPUT: "Iris-versicolor"

Great! Our classifiier is now productionised.
Contributor

typo in classifiier
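To collect predictions from the log rather than reading it by eye, a small filter can pick out the `OUTPUT:` lines. This is a sketch only. The regular expression is based on the single sample line shown above (`nstack-server[8925]: OUTPUT: "Iris-versicolor"`), and the real log format may differ between nstack versions. `extract_outputs` is a hypothetical helper name.

```python
import json
import re

# Pattern based on the single sample line shown above:
#   Feb 17 10:32:30 nostromo nstack-server[8925]: OUTPUT: "Iris-versicolor"
# The log format is an assumption and may differ between nstack versions.
OUTPUT_RE = re.compile(r"nstack-server\[\d+\]: OUTPUT: (?P<value>.*)$")


def extract_outputs(lines):
    """Yield the OUTPUT payload of every matching log line."""
    for line in lines:
        match = OUTPUT_RE.search(line.rstrip("\n"))
        if not match:
            continue
        raw = match.group("value")
        try:
            # The sample payload is a JSON-style quoted string.
            yield json.loads(raw)
        except json.JSONDecodeError:
            # Fall back to the raw text for payloads that are not valid JSON.
            yield raw
```

Feeding the text printed by `nstack log` into `extract_outputs` (for example, line by line from `sys.stdin`) yields one decoded prediction per matching line.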



Other Sources and Sinks
Contributor

I wonder if a better header here would be Connecting to a Database - it's more user-story focused, perhaps more compelling?


~/irisworkflow/ $ nstack start irisworkflow.completeWorkflow

This paradigm can be helpful when we apply it to sources and sinks. Oftentimes, you -- or someone else in your company -- will want to create sources and sinks which are combined with modules, for instance in the following fictional example:
Contributor

Maybe we should build a practical example here that someone can go through, for defining sources and sinks?

E.g. define a reusable http endpoint:

def irisEndpoint = source(http:///irisclassify : (Double,Double,Double,Double))

Which they can then use and test? Or we stick with Postgres but make sure we actually have a test db on the server they can use.

Contributor Author

I like the former; I think we should mention the latter as well, though.

@lanthias
Contributor Author

The last more complex example should probably be split into sections IMO

.. code :: python

    def __init__(self):
        train = pd.read_csv("train.csv")
Contributor

Even our examples escape our type system!

Not a criticism, as it seems like it's actually the main use case

Advanced Tutorial
******************

In this section, we're going to productionise a Random Forest classifier written with `sklearn`, deploy it to the cloud, and use it in a more sophisticated workflow.
Contributor

link to what sklearn is (and probably likewise with random forest)

Contributor

Do we really really have to use "productionise"?

Contributor Author

Good point on the former. On the latter, it's quite a common term but is a bit clunky. Do you prefer operationalise?

~/irisclassify/ $ nstack init python
python module 'irisclassify' successfully initialised at ~/irisclassify

Next, let's download our training data into this so we can use it in our module.
Contributor

into this -> into this directory

Contributor

(or just remove it)

Contributor Author

👍

Defining our API
****************

As we know what the input and output of our classifier is going to look like, let's edit the ``api`` section of ``nstack.yaml`` to define our API (i.e. the entry-point into our module). By default a new module contains a sample function ``numChars``, which we can replace with our definition. We're going to call the function we write in Python ``predict``, which means we can fill in the ``api`` section of ``nstack.yaml`` as follows:
Contributor

we can fill in -> we fill in

Contributor Author

👍


.. note :: Python modules must also import ``nstack``

Before we add our ``predict`` function, we're going to add ``__init__``, the Python contructor function which runs upon the creation of our module. It's going to load our data from ``train.csv``, and use it to train our Random Forest classifier:
Contributor

contructor -> constructor

Contributor Author

👍
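The constructor described above trains a Random Forest with pandas and scikit-learn. The block below is a dependency-free sketch of the same module shape: it trains once in `__init__` and answers each call in `predict`. To avoid the dependencies, it replaces the Random Forest with a much simpler nearest-centroid classifier. Several details are assumptions made for illustration: the class name `IrisModule`, the CSV layout (four numeric columns followed by the species, with no header row), and leaving out the `nstack` import. It is not the tutorial's actual module.

```python
import csv
import math
from collections import defaultdict


class IrisModule:
    """Hypothetical stand-in for the tutorial's module (nearest centroid, not a Random Forest)."""

    def __init__(self, source="train.csv"):
        # Assumed layout per row: four numeric features, then the species label,
        # with no header row. `source` is a file path or any iterable of CSV lines.
        if isinstance(source, str):
            with open(source, newline="") as f:
                rows = list(csv.reader(f))
        else:
            rows = list(csv.reader(source))

        sums = defaultdict(lambda: [0.0] * 4)
        counts = defaultdict(int)
        for row in rows:
            if not row:
                continue  # tolerate blank lines, e.g. a trailing newline
            if len(row) != 5:
                raise ValueError(f"expected 5 columns, got {len(row)}: {row!r}")
            *features, label = row
            for i, value in enumerate(features):
                sums[label][i] += float(value)
            counts[label] += 1

        # "Training" here is just one mean feature vector (centroid) per species,
        # computed once when the module is created.
        self.centroids = {
            label: [total / counts[label] for total in totals]
            for label, totals in sums.items()
        }

    def predict(self, msg):
        """(Double, Double, Double, Double) -> Text: the species with the closest centroid."""
        if len(msg) != 4:
            raise ValueError("expected four measurements")
        return min(self.centroids, key=lambda label: math.dist(msg, self.centroids[label]))
```

With real data, `IrisModule().predict((4.7, 1.4, 6.1, 2.9))` would read `train.csv` from the working directory once and then classify the tuple. The expensive work stays in the constructor, just as it does in the Random Forest version.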


.. code :: yaml

    packages: ['numpy', 'python3-scikit-learn.x86_64', 'scipy', 'python3-scikit-image.x86_64', 'python3-pandas.x86_64']
Contributor

it sucks that we have to specify architecture for some of these - is that definitely the case?

Contributor Author

Probably not - can it be left out typically with dnf?

Contributor

yup

Building NStack Container module irisclassify. Please wait. This may take some time.
Module irisclassify built successfully. Use `nstack list functions` to see all available functions.

We can now see ``irisclassify.predict`` in the list of existing functions (along with previously built functions like demo.numChars) by running the suggested command nstack list functions
Contributor

demo.numChars -> ``demo.numChars`` perhaps

Contributor

by running the suggested command

Contributor Author

👍

irisclassify.predict : (Double, Double, Double, Double) -> Text
demo.numChars : Text -> Integer

Our classifier is now published, but to use it we need to connect it to an event-source and sink. In the previous tutorial, we used HTTP as a source, and the NStack log as a sink. We can do the same here by starting the following workflow.
Contributor

an event source and a sink

Contributor Author

👍


.. code :: bash

    module irsiworkflow {
Contributor

irsi -> iris

Contributor Author

Good catch..

Partial workflows
*****************

All of the workflows that we have written so far have been `fully composed`, which means that they contain a source, one or more functions, and a sink. Many times, you want to split up sources, sinks, and functions into separate pieces you can share and reuse. In this case, we say that a workflow is `partially composed`, which just means it does not contain a source, one or more functions, and a sink. These workflows cannot be ``start``\ed by themselves, but can be shared and attached to other sources, sinks, or functions to become `fully composed`.
Contributor

Can I create a workflow that's just a source and a sink?

Contributor Author

Yes, it's a common theme for data integration.

Contributor

@jonboulle jonboulle Mar 23, 2017

OK, so "one or more functions" should be "zero or more functions"? (but would suggest tweaking the wording)

Contributor Author

Yes good catch
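The rule discussed in this thread can be modelled in a few lines of Python. Here a workflow is fully composed when it starts with a source and ends with a sink, with zero or more functions in between, which incorporates the review's correction. This is a hypothetical model of the rule and not how nstack represents workflows internally. `Stage`, `compose` and `is_fully_composed` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Stage:
    kind: str  # "source", "function" or "sink"
    name: str


def compose(*parts: Sequence[Stage]) -> List[Stage]:
    """Join partial workflows end to end, in the spirit of chaining with `|`."""
    return [stage for part in parts for stage in part]


def is_fully_composed(stages: Sequence[Stage]) -> bool:
    """True if the workflow runs from a source to a sink.

    Zero or more functions may sit in between, so a bare source | sink
    workflow (common for data integration) also counts.
    """
    if len(stages) < 2:
        return False
    first, *middle, last = stages
    return (
        first.kind == "source"
        and last.kind == "sink"
        and all(stage.kind == "function" for stage in middle)
    )
```

In this model, a partial workflow such as `sources.http | irisclassify.predict` cannot be started on its own. Composing it with a `sinks.log` stage makes it startable.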

Contributor

@jonboulle jonboulle left a comment

I think this is good for now (please squash)

@lanthias
Contributor Author

lanthias commented Mar 23, 2017

It's still probably incomplete at the end (it doesn't really have a conclusion), and it still needs to be changed to use the newest UX syntax for modules and the DSL.

Contributor

@rjmk rjmk left a comment

Just looked at the first file so far


.. code:: bash

    ~/ $ mkdir Irisclassify; cd Irisclassify
Contributor

If we want to follow the directory structure of the examples, this should be Iris.Classify

Contributor Author

Cool, I'll update everything to use this format.


.. code:: bash

    ~/Irisclassify/ $ curl -O https://raw.githubusercontent.com/nstackcom/nstack-examples/master/iris/irisclassify/train.csv
Contributor

api : |
  predict : (Double, Double, Double, Double) -> Text

This means we are exposing a single function ``predict``, which takes a record of four ``Double``\s (the measurements) and returns ``Text`` (the iris species).
Contributor

Would be kind of cool to show off sum types here. How many species are there?
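For reference, the Iris dataset contains three species. The labels used in the UCI version of the dataset match the `Iris-versicolor` output seen in the log earlier. Python has no sum types, but an `Enum` gives a rough idea of what a closed return type would add over plain `Text`. This is only a Python analogy and not nstack's type system. `Species` and `parse_species` are hypothetical names.

```python
from enum import Enum


class Species(Enum):
    """The three classes in the UCI Iris dataset, as a closed set of values."""
    SETOSA = "Iris-setosa"
    VERSICOLOR = "Iris-versicolor"
    VIRGINICA = "Iris-virginica"


def parse_species(text):
    """Convert the classifier's Text output into a Species, rejecting anything else."""
    try:
        return Species(text)
    except ValueError:
        raise ValueError(f"unknown species: {text!r}") from None
```

With a closed type, a typo such as `"Iris-versicolour"` fails immediately at the boundary. With plain text, it would pass through to downstream sinks unnoticed.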


.. code :: bash

    ~/Irisclassify/ $ nstack start 'sources.http<(Double, Double, Double, Double)> { http_path : "/irisendpoint" } | irisclassify.predict | sinks.log<Text>'
Contributor

This is sadly still not possible. You need a complete workflow for nstack start which could be referenced unqualified. The current example has a workflow and a project yaml, which could be recreated?

@lanthias lanthias changed the title [WIP] More complex quick start examples to introduce other concepts More complex quick start examples to introduce other concepts Mar 29, 2017
@lanthias
Contributor Author

This is now ready for review to merge.

@mands mands merged commit 83f9a18 into master Mar 31, 2017
@mands mands deleted the moreComplexQuickStart branch March 31, 2017 15:45
6 participants