Merge pull request #51 from takacszs/devel
Add JupyterLab tutorial
maystery committed Jan 15, 2021
2 parents 76f3500 + e70122d commit 45954bd
Showing 7 changed files with 485 additions and 0 deletions.
115 changes: 115 additions & 0 deletions sphinx/source/tutorial-bigdata-ai.rst
@@ -726,3 +726,118 @@ The complete machine learning environment consists of the following components:
.. code:: bash

   occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa

JupyterLab
~~~~~~~~~~
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results.

The Jupyter Notebook combines two components:

- A web application: a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output.
- Notebook documents: a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects.

For more information on Jupyter Notebooks, visit `the official documentation of Jupyter Notebook <https://jupyter-notebook.readthedocs.io/en/latest/>`_.

JupyterLab is the next-generation web-based user interface for Project Jupyter. It is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning. JupyterLab is extensible and modular: write plugins that add new components and integrate with existing ones.

Compared to the classic web user interface for managing Jupyter Notebooks (available at ``http://<JupyterLabIP>:8888/tree``), JupyterLab (available at ``http://<JupyterLabIP>:8888/lab``) provides a more modern user interface in which users can install extensions through the Extension Manager to suit their needs and improve their productivity.
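
The two addresses differ only in their path. As a minimal sketch, the helper below builds both URLs from a host address. The function name ``jupyter_urls`` and the example address ``192.0.2.10`` are illustrative assumptions, not part of the tutorial package.

```python
from urllib.parse import urlunsplit


def jupyter_urls(host: str, port: int = 8888) -> dict:
    """Return the classic and JupyterLab URLs for a host (hypothetical helper)."""
    netloc = f"{host}:{port}"
    return {
        # Classic notebook file browser
        "classic": urlunsplit(("http", netloc, "/tree", "", "")),
        # JupyterLab interface
        "lab": urlunsplit(("http", netloc, "/lab", "", "")),
    }


# 192.0.2.10 is a documentation-only placeholder address (an assumption).
urls = jupyter_urls("192.0.2.10")
print(urls["classic"])
print(urls["lab"])
```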

For more information on how to use the JupyterLab web-based user interface, visit `the official documentation of JupyterLab <https://jupyterlab.readthedocs.io/en/stable/user/interface.html>`_.

**Features**

- creating a node through contextualisation
- utilising health check against a predefined port

**Prerequisites**

- accessing a cloud through an Occopus-compatible interface (e.g. EC2, Nova, Azure, etc.)
- target cloud contains an Ubuntu 18.04 image with cloud-init support

**Download**

You can download the example as `tutorials.examples.jupyterlab <https://raw.githubusercontent.com/occopus/docs/devel/tutorials/jupyterlab.tar.gz>`_.

.. note::

   In this tutorial, we use Nova cloud resources (based on the Nova tutorials in the basic tutorial section). Feel free to use any Occopus-compatible cloud resource for the nodes, but we suggest instantiating all nodes in the same cloud.

**Steps**

#. Open the file ``nodes/node_definitions.yaml`` and edit the resource section of the nodes labelled by ``node_def:``.

- you must select an :ref:`Occopus compatible resource plugin <user-doc-clouds>`
- you can find and specify the relevant :ref:`list of attributes for the plugin <userdefinitionresourcesection>`
- you may follow the help on :ref:`collecting the values of the attributes for the plugin <user-doc-collecting-resources>`
- you may find a resource template for the plugin in the :ref:`resource plugin tutorials <tutorial-resource-plugins>`

The downloadable package for this example contains a resource template for the Nova plugin.

.. important::
   For the JupyterLab extensions to work properly, the recommended resources are ``VCPU:2``, ``RAM:4GB``.

.. important::

   Do not modify the attribute values of the contextualisation and health_check sections!

.. note::

   If you want Occopus to monitor (health_check) your virtual machine and it is deployed in a different network, make sure you assign a public (floating) IP to the node.

#. Open the file ``infra-jupyterlab.yaml`` and edit the section labelled ``variables``. The default username is "jovyan" and the default password is "lpds". Change the value of ``pwd_jupyterlab`` to a safe password!

.. important::

   Make sure the default password is changed, because the JupyterLab environment is exposed publicly on the Internet and anyone with access to the password could execute arbitrary code on the underlying virtual machine with root privileges!
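
One way to pick a replacement value for ``pwd_jupyterlab`` is to generate it randomly. The sketch below uses Python's standard ``secrets`` module. The function name ``generate_password``, the length of 24 and the alphanumeric-only alphabet (chosen so the value needs no YAML quoting) are assumptions for illustration, not requirements of Occopus or JupyterLab.

```python
import secrets
import string


def generate_password(length: int = 24) -> str:
    """Return a random alphanumeric password (hypothetical helper)."""
    # Letters and digits only, so the value can be pasted into YAML unquoted.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


password = generate_password()
# Print a line in the same form as the entry in the variables section.
print(f"pwd_jupyterlab: {password}")
```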

#. Services on the virtual machine must be reachable from outside, so some ports must be opened for the VM executing the components. Clouds implement port opening in various ways (e.g. security groups in OpenStack). Make sure the following ports are open in your cloud:

=========== ============= ====================
Protocol    Port(s)       Service
=========== ============= ====================
TCP         22            SSH
TCP         8888          Jupyter Notebook
=========== ============= ====================
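
Once the node is running, you can verify from your own machine that both ports are reachable. The following is a minimal sketch using only the standard ``socket`` module. The helper names ``port_is_open`` and ``check_ports`` and the 3-second timeout are assumptions, and this check is separate from the health check that Occopus performs.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host: str, ports=(22, 8888)) -> dict:
    """Map each port from the table above to its reachability."""
    return {port: port_is_open(host, port) for port in ports}


# Example call, with the node address substituted for the placeholder:
# print(check_ports("<JupyterLabIP>"))
```

A result of ``False`` for a port usually points at a missing security group rule or a missing public IP, although a service that has not started yet gives the same result.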

#. Make sure your authentication information is set correctly in your authentication file. You must set your authentication data for the ``resource`` you would like to use. Setting authentication information is described :ref:`here <authentication>`.


#. Load the node definitions into the database. Make sure the proper virtualenv is activated!

.. important::

   Occopus takes node definitions from its database when it builds up the infrastructure, so importing is necessary whenever the node definition or any imported (e.g. contextualisation) file changes!

.. code:: bash

   occopus-import nodes/node_definitions.yaml

#. Start deploying the infrastructure.

.. code:: bash

   occopus-build infra-jupyterlab.yaml

#. After a successful build, the node is listed with its ``ip address`` and ``node id`` at the end of the log messages, and the identifier of the newly built infrastructure is printed. Store the infrastructure identifier to perform further operations on your infrastructure, or query it later using the **occopus-maintain** command.

.. code:: bash

   List of nodes/instances/addresses:
   jupyterlab:
   3116eaf5-89e7-405f-ab94-9550ba1d0a7c
   192.168.xxx.xxx
   14032858-d628-40a2-b611-71381bd463fa

#. You can start using the freshly installed JupyterLab in your web browser at the following URL:

- JupyterLab: ``http://<JupyterLabIP>:8888``

.. note::

   The JupyterLab web user interface is password protected. Enter the password that was set in ``infra-jupyterlab.yaml``.

#. Finally, you may destroy the infrastructure using the infrastructure id returned by ``occopus-build``.

.. code:: bash

   occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa
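
If you capture the output of ``occopus-build`` in a script, the infrastructure id can be extracted from it and passed on to ``occopus-destroy``. The sketch below assumes that the infrastructure id is the last UUID in the output, as in the sample listing in this tutorial. That assumption, and the helper names ``infra_id_from_output`` and ``destroy_command``, are illustrative and not part of Occopus.

```python
import re

# Matches lowercase UUIDs such as 14032858-d628-40a2-b611-71381bd463fa.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)


def infra_id_from_output(output: str) -> str:
    """Return the last UUID in the build output (assumed to be the infra id)."""
    matches = UUID_RE.findall(output)
    if not matches:
        raise ValueError("no infrastructure id found in output")
    return matches[-1]


def destroy_command(infra_id: str) -> list:
    """Build the argument list for the destroy step shown in this tutorial."""
    return ["occopus-destroy", "-i", infra_id]


# Sample text copied from the listing in this tutorial.
sample = """List of nodes/instances/addresses:
jupyterlab:
3116eaf5-89e7-405f-ab94-9550ba1d0a7c
192.168.xxx.xxx
14032858-d628-40a2-b611-71381bd463fa
"""
print(" ".join(destroy_command(infra_id_from_output(sample))))
```

The returned list can be handed to ``subprocess.run`` once you have confirmed that the extracted id is the one you intend to destroy.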
Binary file added tutorials/jupyterlab.tar.gz
Binary file not shown.
35 changes: 35 additions & 0 deletions tutorials/jupyterlab/examples/list-of-python-packages.ipynb
@@ -0,0 +1,35 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "leading-clock",
"metadata": {},
"outputs": [],
"source": [
"!pip list"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
180 changes: 180 additions & 0 deletions tutorials/jupyterlab/examples/what-is-the-jupyter-notebook.ipynb
@@ -0,0 +1,180 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# What is the Jupyter Notebook?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Jupyter Notebook is an **interactive computing environment** that enables users to author notebook documents that include: \n",
"- Live code\n",
"- Interactive widgets\n",
"- Plots\n",
"- Narrative text\n",
"- Equations\n",
"- Images\n",
"- Video\n",
"\n",
"These documents provide a **complete and self-contained record of a computation** that can be converted to various formats and shared with others using email, [Dropbox](https://www.dropbox.com/), version control systems (like git/[GitHub](https://github.com)) or [nbviewer.jupyter.org](https://nbviewer.jupyter.org)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Components"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Jupyter Notebook combines three components:\n",
"\n",
"* **The notebook web application**: An interactive web application for writing and running code interactively and authoring notebook documents.\n",
"* **Kernels**: Separate processes started by the notebook web application that run users' code in a given language and return output back to the notebook web application. The kernel also handles things like computations for interactive widgets, tab completion and introspection. \n",
"* **Notebook documents**: Self-contained documents that contain a representation of all content visible in the notebook web application, including inputs and outputs of the computations, narrative\n",
"text, equations, images, and rich media representations of objects. Each notebook document has its own kernel."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Notebook web application"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook web application enables users to:\n",
"\n",
"* **Edit code in the browser**, with automatic syntax highlighting, indentation, and tab completion/introspection.\n",
"* **Run code from the browser**, with the results of computations attached to the code which generated them.\n",
"* See the results of computations with **rich media representations**, such as HTML, LaTeX, PNG, SVG, PDF, etc.\n",
"* Create and use **interactive JavaScript widgets**, which bind interactive user interface controls and visualizations to reactive kernel side computations.\n",
"* Author **narrative text** using the [Markdown](https://daringfireball.net/projects/markdown/) markup language.\n",
"* Include mathematical equations using **LaTeX syntax in Markdown**, which are rendered in-browser by [MathJax](https://www.mathjax.org/)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Kernels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Through Jupyter's kernel and messaging architecture, the Notebook allows code to be run in a range of different programming languages. For each notebook document that a user opens, the web application starts a kernel that runs the code for that notebook. Each kernel is capable of running code in a single programming language and there are kernels available in the following languages:\n",
"\n",
"* Python (https://github.com/ipython/ipython)\n",
"* Julia (https://github.com/JuliaLang/IJulia.jl)\n",
"* R (https://github.com/IRkernel/IRkernel)\n",
"* Ruby (https://github.com/minrk/iruby)\n",
"* Haskell (https://github.com/gibiansky/IHaskell)\n",
"* Scala (https://github.com/Bridgewater/scala-notebook)\n",
"* node.js (https://gist.github.com/Carreau/4279371)\n",
"* Go (https://github.com/takluyver/igo)\n",
"\n",
"The default kernel runs Python code. The notebook provides a simple way for users to pick which of these kernels is used for a given notebook. \n",
"\n",
"Each of these kernels communicates with the notebook web application and web browser using a JSON over ZeroMQ/WebSockets message protocol that is described [here](https://jupyter-client.readthedocs.io/en/latest/messaging.html#messaging). Most users don't need to know about these details, but it helps to understand that \"kernels run code.\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Notebook documents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notebook documents contain the **inputs and outputs** of an interactive session as well as **narrative text** that accompanies the code but is not meant for execution. **Rich output** generated by running code, including HTML, images, video, and plots, is embedded in the notebook, which makes it a complete and self-contained record of a computation. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you run the notebook web application on your computer, notebook documents are just **files on your local filesystem with a `.ipynb` extension**. This allows you to use familiar workflows for organizing your notebooks into folders and sharing them with others."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notebooks consist of a **linear sequence of cells**. There are three basic cell types:\n",
"\n",
"* **Code cells:** Input and output of live code that is run in the kernel\n",
"* **Markdown cells:** Narrative text with embedded LaTeX equations\n",
"* **Raw cells:** Unformatted text that is included, without modification, when notebooks are converted to different formats using nbconvert\n",
"\n",
"Internally, notebook documents are **[JSON](https://en.wikipedia.org/wiki/JSON) data** with **binary values [base64](https://en.wikipedia.org/wiki/Base64)** encoded. This allows them to be **read and manipulated programmatically** by any programming language. Because JSON is a text format, notebook documents are version control friendly.\n",
"\n",
"**Notebooks can be exported** to different static formats including HTML, reStructuredText, LaTeX, PDF, and slide shows ([reveal.js](https://revealjs.com)) using Jupyter's `nbconvert` utility.\n",
"\n",
"Furthermore, any notebook document available from a **public URL or on GitHub can be shared** via [nbviewer](https://nbviewer.jupyter.org). This service loads the notebook document from the URL and renders it as a static web page. The resulting web page may thus be shared with others **without their needing to install the Jupyter Notebook**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
11 changes: 11 additions & 0 deletions tutorials/jupyterlab/infra-jupyterlab.yaml
@@ -0,0 +1,11 @@
infra_name: jupyterlab
user_id: somebody@somewhere

variables:
    username_jupyterlab: jovyan
    pwd_jupyterlab: lpds

nodes:
    - &M
        name: jupyterlab
        type: jupyterlab_node
