# JupyterLab Showcase: DevOps Intelligence

*DevOps Intelligence* turns data from software development and delivery processes into actionable insight, just as BI does for the business side. Jupyter is an ideal instrument for that, combining powerful coding environments with a user interface that supports experimentation with very short feedback cycles.

A Jupyter-based setup supports risk analysis and decision making within development and operations processes – typical business intelligence / data science procedures can be applied to the ‘business of making and running software’. The idea is to create feedback loops and to support human decision making by automatically providing reliable, up-to-date facts. After all, development is our business, so let's have KPIs for developing, releasing, and operating software.

## Typical Use-Cases

Here are some obvious application areas where data analysis can be helpful on the technical side.

* Migration processes of all kinds (current state, progress tracking, achievement of objectives)
* Inventory reporting for increased transparency and support of operational decisions
* Automation of internal reporting processes to free up scarce resources and human expertise

## Platform Architecture

A simple [JupyterHub](https://jupyter.org/hub) setup can enable you to do analysis on your already available but under-used and hardly understood data, without any great investment of effort or capital. By adding a single JupyterHub host, you can use the built-in Python3 kernel to access existing internal data sources.
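
As a rough illustration of what accessing an existing data source from the Python kernel can look like, the following sketch loads a query result into a pandas DataFrame. It uses an in-memory SQLite database purely as a stand-in for an internal system; the `hosts` table, its columns, and the sample rows are hypothetical and not part of the setup described here.

```python
import sqlite3

import pandas as pd

# Stand-in for an internal data source: an in-memory SQLite database.
# Table name, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (name TEXT, distribution TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [
        ("web-01", "Debian 9.8", "Team Red"),
        ("web-02", "Debian 8.11", "Team Red"),
        ("db-01", "Debian 9.8", "Team Blue"),
    ],
)

# Load the query result into a DataFrame for further analysis
hosts = pd.read_sql_query("SELECT * FROM hosts", conn)
conn.close()

# Simple inventory view: number of hosts per team
print(hosts.groupby("team").size())
```

In a real deployment the connection would point at whatever database, REST API, or file export already exists; the analysis steps after loading stay the same.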

The following diagram shows what role JupyterHub can play in an existing environment.

> ![DevOps Intelligence Architecture](https://github.com/1and1/debianized-jupyterhub/raw/master/docs/_static/img/devops-intelligence.png)

To make such a deployment easy, the [1and1/debianized-jupyterhub](https://github.com/1and1/debianized-jupyterhub#jupyterhub-debian-packaging) project provides a JupyterHub service including a fully equipped Python3 kernel as a single Debian package – only the Python3, Node.js, and Chromium (for visualization frameworks) packages must be installed in addition to the `jupyterhub` one.

Including an [NGINX-based SSL off-loader](https://github.com/1and1/debianized-jupyterhub#securing-your-jupyterhub-web-service-with-an-ssl-off-loader), the [complete setup](https://github.com/1and1/debianized-jupyterhub#how-to-set-up-a-simple-service-instance) can be done in under an hour.

## Use-Case: Migration Reporting

At the time of this writing (early 2019), a widespread challenge is migrating from Oracle Java to other vendors, and starting the move from Java 8 to newer versions (Java 11). If you do that at scale across many machines and teams, you definitely need some kind of governance, and constant feedback on the current status and the rate of progress.

What follows is an excerpt from a production notebook, with anonymized data about [AdoptOpenJDK](https://adoptopenjdk.net/) deployments. That data was originally retrieved from a system called *“Patch Management Reporting”*, which collects information about installed packages for all hosts in the data center. This puts us in the yellow *“Data Sources”* box of the figure above.

First off, we read the data and show the value sets of categorical columns, plus a sample.

In [1]:
import numpy as np
import pandas as pd

raw_data = pd.read_csv("../data/cmdb-aoj.csv", sep=',')

print('♯ of Records: {}\n'.format(len(raw_data)))

for name in raw_data.columns[1:]:
    if not name.startswith('Last '):
        print(name, '=', list(sorted(set(raw_data[name].fillna('')))))

print(); print(raw_data.head(3).transpose())

♯ of Records: 104

Distribution = ['Debian 8.10', 'Debian 8.11', 'Debian 8.6', 'Debian 8.9', 'Debian 9.6', 'Debian 9.7', 'Debian 9.8']
Architecture = ['amd64']
Environment = ['--', 'DEV', 'LIVE', 'QA']
Team = ['Team Blue', 'Team Green', 'Team Red', 'Team Yellow']
Installed version = ['11.0.2.9-83(amd64)', '11.0.2.9-85(amd64)', '8.202.b08-66(amd64)', '8.202.b08-83(amd64)', '8.202.b08-85(amd64)']

                                    0                   1                   2
CMDB_Id                     108380195           298205230           220678839
Distribution              Debian 8.11          Debian 9.6         Debian 8.11
Architecture                    amd64               amd64               amd64
Environment                       DEV                  --                 DEV
Team                         Team Red            Team Red            Team Red
Last seen            2019-03-18 06:42    2019-03-18 06:42    2019-03-18 06:42
Last modified        2019-03-18 06:42    2019-03-18 06:

Next comes the usual data cleanup. The `Distribution` column is a bit diverse, and not everyone has Debian codenames and associated major versions memorized. The `map_distro` function fixes that.

In [2]:
def map_distro(name):
    """Helper to create canonical OS names."""
    return (name.split('.', 1)[0]
        .replace('Debian 7', 'wheezy')
        .replace('Debian 8', 'jessie')
        .replace('Debian 9', 'stretch')
        .replace('Debian 10', 'buster')
        .replace('squeeze', 'Squeeze [6]')
        .replace('wheezy', 'Wheezy [7]')
        .replace('jessie', 'Jessie [8]')
        .replace('stretch', 'Stretch [9]')
        .replace('buster', 'Buster [10]')
    )
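
A chain of `replace` calls is compact but order-sensitive. As an illustrative alternative (not the notebook's code), the same mapping can be sketched with a lookup table. The names `DEBIAN_CODENAMES` and `map_distro_lookup` are made up for this example, and unlike `map_distro` it returns unrecognized values unchanged and does not handle inputs that are already codenames.

```python
# Lookup table: Debian major version -> codename (illustrative variant)
DEBIAN_CODENAMES = {
    "6": "Squeeze",
    "7": "Wheezy",
    "8": "Jessie",
    "9": "Stretch",
    "10": "Buster",
}


def map_distro_lookup(name):
    """Return 'Codename [major]' for 'Debian <major>[.<minor>]' strings."""
    prefix, _, version = name.partition(" ")
    major = version.split(".", 1)[0]
    if prefix != "Debian" or major not in DEBIAN_CODENAMES:
        return name  # leave unknown values untouched
    return "{} [{}]".format(DEBIAN_CODENAMES[major], major)


# Quick check against values like those in the Distribution column
for raw in ("Debian 8.11", "Debian 9.6", "Debian 10", "Ubuntu 18.04"):
    print(raw, "->", map_distro_lookup(raw))
```

The table keeps version numbers and codenames in one place, so adding a new release is a one-line change.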

Together with other cleanup steps, the mapper function is applied in a [dfply](https://towardsdatascience.com/dplyr-style-data-manipulation-with-pipes-in-python-380dcb137000) pipeline. The result is checked by showing a sample of data points with unique version numbers.

In [3]:
from dfply import *

piped = (raw_data
    >> mutate(Version=X['Installed version'].str.split('[()]', n=1, expand=True)[0])
    >> mutate(Environment=X.Environment
              .fillna('--').str.replace('--', 'N/A').str.upper())
    >> mutate(Distribution=X.Distribution.apply(map_distro))
    >> drop(X.CMDB_Id, X['Last seen'], X['Last modified'], X['Installed version'])
)

print((piped >> distinct(X.Version)).transpose())

                       0            13            14            62  \
Distribution   Jessie [8]  Stretch [9]    Jessie [8]   Stretch [9]   
Architecture        amd64        amd64         amd64         amd64   
Environment           DEV          N/A           DEV           DEV   
Team             Team Red    Team Blue      Team Red     Team Blue   
Version       11.0.2.9-83  11.0.2.9-85  8.202.b08-83  8.202.b08-85   

                        68  
Distribution    Jessie [8]  
Architecture         amd64  
Environment            DEV  
Team             Team Blue  
Version       8.202.b08-66  
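
To connect the cleaned data back to the goal of tracking migration status, here is a minimal sketch of one possible progress KPI: the share of Java 11 installations per team. It uses plain pandas and a handful of hypothetical rows shaped like the cleaned frame; the `sample` data, the `Java` column, and the choice of KPI are assumptions for illustration, not results from the original notebook.

```python
import pandas as pd

# Hypothetical rows shaped like the cleaned data (not the real data set)
sample = pd.DataFrame({
    "Team": ["Team Red", "Team Red", "Team Blue", "Team Blue", "Team Blue"],
    "Environment": ["DEV", "N/A", "DEV", "DEV", "LIVE"],
    "Version": [
        "11.0.2.9-83", "8.202.b08-83",
        "11.0.2.9-85", "8.202.b08-85", "8.202.b08-66",
    ],
})

# The major Java version is the part before the first dot
sample["Java"] = sample["Version"].str.split(".", n=1).str[0].astype(int)

# Count installations per team and major version
counts = pd.crosstab(sample["Team"], sample["Java"])

# Share of Java 11 installations per team as a simple progress indicator
progress = (counts[11] / counts.sum(axis=1)).rename("Java 11 share")
print(progress.round(2))
```

Recomputing such a figure on each data refresh gives the rate of progress over time; the same cross-tabulation works per environment or distribution by swapping the grouping column.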
