Data Science Apps with Anaconda
=======================
**[PyCon 2017 Workshop](https://us.pycon.org/2017/schedule/presentation/792/), Portland OR** *2017-05-18*
<br>

<center>
<img src=http://ijstokes-public.s3.amazonaws.com/dspyr/img/AnacondaCIO_Logo width=400 />
</center>

This builds on a great blog post by the Anaconda product managers from March 2017:

**[Data Science Project Encapsulation and Deployment](https://www.continuum.io/blog/developer-blog/%E2%80%8Banaconda-project-data-science-project-encapsulation-deployment)**

Description
-----------
Anaconda provides a rich foundation of Python and R packages for data science. This tutorial will demonstrate how Anaconda can be used to turn simple models, scripts, or Jupyter notebooks into deployable applications. Participants should have Anaconda installed and have basic Python programming experience. We'll make use of machine learning and AI libraries such as pandas, scikit-learn, TensorFlow, and Keras. The tutorial will also demonstrate the app deployment capabilities of Anaconda Cloud.

Presenter
--------
Ian Stokes-Rees [ijstokes@continuum.io](mailto:ijstokes@continuum.io)
* Twitter: [@ijstokes](http://twitter.com/ijstokes)
* About.Me: [http://about.me/ijstokes](http://about.me/ijstokes)
* LinkedIn: [http://linkedin.com/in/ijstokes](http://linkedin.com/in/ijstokes)

Assets and Reference
-------------------
This presentation:
* Anaconda Cloud: https://anaconda.org/ijstokes/data-science-apps-with-anaconda/notebook
* GitHub: https://github.com/ijstokes/pycon2017-anaconda-project-data-science-apps

The material is based on the BSD-3 open source Anaconda Project, which is included in the Anaconda Distribution:
* Docs: http://anaconda-project.readthedocs.io/
* GitHub: https://github.com/Anaconda-Platform/anaconda-project
* Support Options: https://www.continuum.io/anaconda-support

Setup
-----
* [Download Anaconda 4.3 for Python 3.6](http://continuum.io/downloads)
* Start *Anaconda Navigator* from your desktop
    * green ouroboros ring -- yes, that's what it is
* Make sure it says *Python 3.6* in the top row
    * if not, click on *Environment* then *New Environment* and create one called `ana43py36` using Python 3.6
    * then pick *Not Installed* from the drop-down
    * then search for *Anaconda* and install that meta-package with that name
    * **but** note that this is a 300+ MB download, which will overwhelm the event Wi-Fi access points.

* Launch Notebook (may require "upgrade" or "install" first)
* Alternative for command line geeks:

```bash
conda create -n ana43py36 anaconda python=3.6
source activate ana43py36
mkdir python-pc17-dsapps
cd python-pc17-dsapps
```

Data Viz App: OHLC/MACD Financial Market Data
-------------------------------------------
We're going to start by looking at a simple stand-alone visualization app:

<center>
<img src=http://ijstokes-public.s3.amazonaws.com/cio/img/streaming_ohlc_animation.gif width=400/>
</center>

To run this on your own system you'll need to download it from [Anaconda Cloud](http://anaconda.org) (no account required):

[https://anaconda.org/ijstokes/project/streaming_ohlc_data](https://anaconda.org/ijstokes/project/streaming_ohlc_data)

**NOTE:** When you download this, make sure the file name retains the `.tar.bz2` extension (some browsers will fight you on this and you'll end up with **just** `.bz2` or `.tar`).
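If you're not sure whether the download survived intact, a quick check from Python can help. The sketch below uses only the standard library; `check_project_archive` and `with_tar_bz2_suffix` are hypothetical helper names made up for this example, not part of Anaconda Project.

```python
import tarfile
from pathlib import Path


def check_project_archive(path):
    """Return the member names of a bzip2-compressed tar archive.

    Raises ValueError if the file cannot be read as one, which is the
    typical symptom of a truncated or mangled download.
    """
    path = Path(path)
    try:
        # "r:bz2" insists on bzip2 compression rather than auto-detecting.
        with tarfile.open(path, mode="r:bz2") as archive:
            return archive.getnames()
    except (tarfile.TarError, OSError, EOFError) as err:
        raise ValueError(f"{path.name} is not a readable .tar.bz2 archive: {err}")


def with_tar_bz2_suffix(path):
    """Return `path` with its name adjusted to end in .tar.bz2.

    Hypothetical helper: it only computes the new name, it does not
    rename anything on disk.
    """
    path = Path(path)
    name = path.name
    for suffix in (".tar.bz2", ".bz2", ".tar"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return path.with_name(name + ".tar.bz2")
```

If the browser saved the file as `streaming_ohlc_data.bz2`, `with_tar_bz2_suffix` gives you the name to rename it to, and `check_project_archive` lists what is inside.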

Then start up your terminal, navigate to the directory where you put it (`python-pc17-dsapps` would be a good choice!) and execute:

```bash
anaconda-project unarchive streaming_ohlc_data.tar.bz2
cd streaming_ohlc_data
anaconda-project run
```

This mini data viz app simulates streaming Open, High, Low, Close (OHLC) data for a fictional market-traded asset (e.g. stock in a publicly traded company).  It includes a [candlestick chart](https://en.wikipedia.org/wiki/Candlestick_chart) with a moving average overlaid (orange line) and a separate plot that illustrates the moving average "momentum", known as moving average convergence/divergence (MACD).  If you're interested you can [read more about it here.](http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:macd-histogram)
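To make the OHLC and MACD terms concrete, here is a minimal standard-library sketch of the arithmetic. It is not the code from the app's `main.py`: the random-walk simulation, the function names, and the conventional 12/26/9 spans are all assumptions chosen for illustration.

```python
import random


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a list of closes."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    # MACD line: fast EMA minus slow EMA of the closing prices.
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: an EMA of the MACD line itself.
    signal_line = ema(macd_line, signal)
    # Histogram: distance between the MACD line and its signal line.
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


def simulate_ohlc(n_bars, start=100.0, ticks_per_bar=10, seed=0):
    """Random-walk price ticks grouped into (open, high, low, close) bars."""
    rng = random.Random(seed)
    price = start
    bars = []
    for _ in range(n_bars):
        ticks = []
        for _ in range(ticks_per_bar):
            price = max(0.01, price + rng.gauss(0, 0.5))
            ticks.append(price)
        bars.append((ticks[0], max(ticks), min(ticks), ticks[-1]))
    return bars


if __name__ == "__main__":
    bars = simulate_ohlc(60)
    closes = [bar[3] for bar in bars]
    macd_line, signal_line, histogram = macd(closes)
    print(f"last close {closes[-1]:.2f}, MACD histogram {histogram[-1]:+.3f}")
```

Each bar's tuple is what a candlestick draws (body from open to close, wicks to high and low), and the histogram values are what the separate MACD plot would show.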

So What?
--------
Why do you care about this?  Because in a single 2.6 KB file you got all the pieces necessary to run a simple data visualization app.

Dissecting the Streaming OHLC Data Viz App
---------------------------------------
Take a look at the content of the `streaming_ohlc_data` directory.

* **`anaconda-project.yml`** contains the Anaconda Project specification
* **`main.py`** fewer than 100 lines; this contains 100% of the code required to run the app
    * Any other app-specific code (packages, scripts, etc.) could be included in the project directory
    * `main.py` is the **required** name for the runnable Python script that creates a [Bokeh](http://bokeh.pydata.org) App
    * Anaconda Project supports other app types that we'll look at soon!    
* **`theme.yml`** defines [Bokeh plot style properties](http://bokeh.pydata.org/en/latest/docs/reference/themes.html)
    * it is picked up automatically by the [Bokeh App Server](http://bokeh.pydata.org/en/latest/docs/user_guide/server.html#directory-format) that Anaconda Project will start

* **`README.md`** provides some documentation on the app in [Markdown format](https://guides.github.com/features/mastering-markdown/)
    * this file is optional
    * `README.txt` would be fine as a non-Markdown version
* **`envs`** the directory that contains the [conda environments](https://conda.io/docs/using/envs.html) (software sandboxes) associated with the app
    * the `default` sub-directory inside this is the main sandbox that will be used to run the app
    * the software specification for the conda environment is given in the `packages` section of the `anaconda-project.yml` file
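As a quick sanity check on an unpacked project, the layout described above can be verified from Python. This is a minimal sketch using only the standard library; the helper names are hypothetical, and treating `theme.yml` as optional is an assumption of this example rather than something the project format states here.

```python
from pathlib import Path

# Files described above. For a Bokeh app the spec and main.py are needed;
# the rest are treated as optional in this sketch (an assumption).
REQUIRED = ("anaconda-project.yml", "main.py")
OPTIONAL = ("theme.yml", "README.md", "README.txt")


def describe_project(project_dir):
    """Map each expected file name to True/False depending on presence."""
    root = Path(project_dir)
    return {name: (root / name).is_file() for name in REQUIRED + OPTIONAL}


def missing_required(project_dir):
    """List the required files that are absent from the project directory."""
    found = describe_project(project_dir)
    return [name for name in REQUIRED if not found[name]]


if __name__ == "__main__":
    for name, present in describe_project("streaming_ohlc_data").items():
        print(f"{name:24s} {'found' if present else 'missing'}")
```

The `envs` directory is deliberately not checked: it holds the conda environments and may not exist until the project has been run at least once.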

More than Python: Shiny App in R
------------------------------
Anaconda Project supports more than just Python-based apps.  It provides a generic model for app creation, sharing, deployment, and execution.

**For now, just watch for this example.** It requires adding the R packages to Anaconda, and if everyone in the workshop does this it will sink the event wifi.

But First, Histograms in R
-------------------------

**server.R:**
```r
shinyServer(function(input, output) {

  output$distPlot <- renderPlot({

    # generate an rnorm distribution and plot it
    dist <- rnorm(input$obs)
    hist(dist, breaks=input$breaks)
  })
})
```

And Second, Slider Widgets in Shiny
---------------------------------
**ui.R:**
```r
shinyUI(pageWithSidebar(
  # Application title
  headerPanel("Interactive Histogram R Shiny App"),

  # Sidebar with a slider input for number of observations
  sidebarPanel(
    sliderInput("obs", "Number of observations:",
                 min=10, max=5000, value=500),
    sliderInput("breaks", "Bins:",
                 min=3, max =50, value=20)
  ),
  # Show a plot of the generated distribution
  mainPanel(
    plotOutput("distPlot")
  )
))
```

Vanilla Shiny App
----------------
The script **`hist_app_simple.R`** connects those two pieces (`server.R` and `ui.R`, both in the `hist` directory):

```r
library(shiny)

runApp(appDir         = "hist", 
       host           = '127.0.0.1',
       port           = 8086, 
       launch.browser = TRUE, 
       quiet          = FALSE)
```

And this is run from the command line with:

```bash
Rscript hist_app_simple.R
```

Anaconda Project Wrapper
------------------------
The `anaconda-project.yml` file provides the execution details and the conda environment specification:

**`anaconda-project.yml:`**
```yaml
name: hist_shiny_app
description: An example R Shiny histogram app

commands:
  default:
    unix: Rscript hist_app.R
    supports_http_options: true
```
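Setting `supports_http_options: true` tells Anaconda Project that the command accepts a set of `--anaconda-project-*` command line options describing how it should serve HTTP. The sketch below shows, in Python rather than R, roughly what accepting them could look like. The option names are taken from the Anaconda Project documentation and should be checked against the version you have installed; only a subset is handled here, and the defaults simply mirror `hist_app_simple.R`.

```python
import argparse


def parse_http_options(argv):
    """Parse the subset of HTTP options used in this sketch.

    Returns (options, remaining_args); unknown arguments are passed
    through untouched so the app can handle them itself.
    """
    parser = argparse.ArgumentParser(description="HTTP options sketch")
    parser.add_argument("--anaconda-project-host", action="append", default=[],
                        help="public host:port the app is served from (repeatable)")
    parser.add_argument("--anaconda-project-port", type=int, default=8086,
                        help="port to listen on")
    parser.add_argument("--anaconda-project-address", default="127.0.0.1",
                        help="IP address to bind to")
    parser.add_argument("--anaconda-project-url-prefix", default="",
                        help="path prefix for all routes")
    parser.add_argument("--anaconda-project-no-browser", action="store_true",
                        help="do not open a browser window")
    return parser.parse_known_args(argv)


if __name__ == "__main__":
    import sys

    options, remaining = parse_http_options(sys.argv[1:])
    print(f"would serve on {options.anaconda_project_address}:"
          f"{options.anaconda_project_port}, "
          f"browser={'no' if options.anaconda_project_no_browser else 'yes'}")
```

Using `parse_known_args` is a design choice for this sketch: if a newer Anaconda Project passes an option the script does not know about, the script keeps running instead of exiting with a usage error.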

Anaconda Project Wrapper (2)
--------------------------
Part deux: the conda environment specification that R Shiny needs in order to work.

**`anaconda-project.yml:`** *(cont)*
```yaml
channels:
  - r

packages:
  - r=3.3.2
  - r-base=3.3.2
  - r-shiny=0.14.2
  - r-proto=1.0.0
  - r-rjson=0.2.15
  - r-argparse=1.0.4
  - r-devtools=1.12.0
  - r-findpython=1.0.1
  - r-getopt=1.20.0
  - gettext=0.19.8
```

Putting It All Together *(and why conda matters)*
----------------------
```bash
anaconda-project run
```
This will create a conda environment software sandbox for R, Shiny, and the required R packages, start the Shiny server, and then open a browser window to the app instance.

<center>
<img src=http://ijstokes-public.s3.amazonaws.com/cio/img/hist_shiny_app.png width=400/>
</center>

**NOTICE:** This is why conda is important for data science: the data science world is bigger than Python, but in most cases a VM or container is overkill.

Create an Anaconda Project Package
--------------------------------
In order to share the Anaconda Project we need to create a project package using the `archive` sub-command:
```bash
$ anaconda-project archive     \
    --directory hist_shiny_app \
    hist_shiny_app.zip
    
  added hist_shiny_app/README.md
  added hist_shiny_app/anaconda-project.yml
  added hist_shiny_app/hist/server.R
  added hist_shiny_app/hist/ui.R
  added hist_shiny_app/hist_app.R
  added hist_shiny_app/hist_app_simple.R
Created project archive hist_shiny_app.zip
```
This is tiny: 1.4 KB.  This can now be shared with anyone: by email, on a web server, a network file system, or ...

Publish and Share Anaconda Projects via Anaconda Cloud
---------------------------------------------------
You can use Anaconda Cloud (after doing `anaconda login`) to publish and share your Anaconda Project by doing:
```bash
$ anaconda-project upload hist_shiny_app
```

This will create the project archive package and upload it to your account on Anaconda Cloud.  Anaconda Enterprise users will have that archive published to their organization-internal repository.

[https://anaconda.org/ijstokes/project/hist_shiny_app](https://anaconda.org/ijstokes/project/hist_shiny_app)

Jupyter Notebook Apps with Anaconda Project
-----------------------------------------
We've now seen two of the three application types currently supported by Anaconda Project:
* *Bokeh apps*
* *command line apps*

We've seen how Anaconda Project is not Python specific and can encapsulate the software, configuration, and details of the conda environment necessary to share and execute the project anywhere.  The third application type is:
* *Jupyter notebook*

Markowitz Portfolio Design via Anaconda Project Jupyter Notebook App
----------------------------------------------------------------
Ever used Jupyter to create a great Notebook that tells a compelling data science story?

And then struggled to find a good way to quickly get that to a colleague so they can reproduce your work?

[https://anaconda.org/ijstokes/markowitz](https://anaconda.org/ijstokes/markowitz)

Take 5 minutes and try to download this Anaconda Project Jupyter Notebook App:

[https://anaconda.org/ijstokes/project/markowitz_notebook](https://anaconda.org/ijstokes/project/markowitz_notebook)

Once you've got it, run:

```bash
anaconda-project unarchive markowitz_notebook.tar.bz2
cd markowitz_notebook
anaconda-project run
```