# Tutorial: Jupyter Notebook using C3 Python SDK

This tutorial discusses the basics of using C3 Notebook. 

You will learn how to:

1. Utilize **C3 Python SDK** in conjunction with **C3 Notebook** to develop software on an externally persisted Jupyter Notebook.
1. Use the **help** function to learn more about C3 types and methods. 
1. Manage your notebook **kernel**.
1. Utilize **file operations** to upload, download, and create files that will persist externally.

Some additional advanced topics at the end:

1. View, create, and use **action runtimes**. 
1. Use advanced legacy **file operations**.

## Setup
This section will walk you through the basic setup and verification steps.

First, let us verify that we are working with Python 3. To take full advantage of the C3 Type system, we strongly recommend Python 3 or later.

In [1]:
# NotVerify: result
import sys
sys.version

'3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 21:08:20) \n[GCC 9.3.0]'
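
If a notebook depends on Python 3 features, it can fail fast instead of relying on a visual check of the version string. The following is a minimal sketch using only the standard library; the helper name `require_python3` is made up for illustration and is not part of the C3 SDK.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise early if the interpreter is older than Python 3."""
    if version_info < (3,):
        raise RuntimeError(
            "Python 3 is required, found %d.%d" % (version_info[0], version_info[1])
        )
    return True


require_python3()
```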

The Notebook startup hook will automatically connect to your C3 environment, download the Python SDK, and instantiate a `c3` variable that contains the connection to your environment.

Let us verify that there exists a `c3` variable and that it represents the type system:

In [2]:
# NotVerify: result
c3

c3.TypeSystemBase(connection=ServerConnection(url='https://dev-dti.c3dti.ai', auth='XXXX', tenant='dti-jupyter', tag='tc01'))

Now that we have done a few sanity checks on the system setup, let us move on to the basics!

## C3 Jupyter Basics
This section covers the minimum you need to get up and running with the C3 Jupyter Service.

### Python SDK Basics
This section will walk you through a few basic examples illustrating the syntax for using the C3 Python SDK.

We represent Types and instances of Types as `TypeProxy` objects, which provide a handle to all of the remote object's functionality. To access a Type, we use dot notation, as in JavaScript. This unifies the syntax between Python and JavaScript, with the goal of increasing code portability. In the following, we access the `User` type:

In [4]:
c3.User

c3.User

To access the persisted instantiations of `User`:

In [5]:
# NotVerify: result
users = c3.User.fetch(c3.FetchSpec(limit=1))
users

c3.FetchResult<User>(
 objs=c3.Arry<User>([c3.User(
         typeIdent='MBR:USER',
         id='AnonymousUser',
         name='AnonymousUser',
         meta=c3.Meta(
                tenantTagId=153,
                tenant='dti-jupyter',
                tag='c3',
                created=datetime.datetime(2021, 7, 22, 13, 24, 42, tzinfo=datetime.timezone.utc),
                createdBy='babreu@illinois.edu',
                updated=datetime.datetime(2021, 7, 22, 13, 24, 42, tzinfo=datetime.timezone.utc),
                updatedBy='babreu@illinois.edu',
                timestamp=datetime.datetime(2021, 7, 22, 13, 24, 42, tzinfo=datetime.timezone.utc),
                fetchInclude='[]',
                fetchType='User'),
         version=1,
         user='AnonymousUser',
         groups=c3.Arry<AdminGroup>([c3.AdminGroup(id='C3.Group.AnonymousUser')]))]),
 count=1,
 hasMore=True)

You can now traverse the fields of the result as you would normally:

In [6]:
# NotVerify: result
user1 = users.objs[0]
user1

c3.User(
 typeIdent='MBR:USER',
 id='AnonymousUser',
 name='AnonymousUser',
 meta=c3.Meta(
        tenantTagId=153,
        tenant='dti-jupyter',
        tag='c3',
        created=datetime.datetime(2021, 7, 22, 13, 24, 42, tzinfo=datetime.timezone.utc),
        createdBy='babreu@illinois.edu',
        updated=datetime.datetime(2021, 7, 22, 13, 24, 42, tzinfo=datetime.timezone.utc),
        updatedBy='babreu@illinois.edu',
        timestamp=datetime.datetime(2021, 7, 22, 13, 24, 42, tzinfo=datetime.timezone.utc),
        fetchInclude='[]',
        fetchType='User'),
 version=1,
 user='AnonymousUser',
 groups=c3.Arry<AdminGroup>([c3.AdminGroup(id='C3.Group.AnonymousUser')]))

You can also access fields and subfields:

In [6]:
# NotVerify: all
print(user1.name)
print(user1.meta.tenant)
print(user1.meta.tag)

AnonymousUser
notebook
c3


Try accessing a field that does not exist:

In [7]:
# NotVerify: result
print(user1.doesnotexist)

AttributeError: 'User' object has no attribute 'doesnotexist'
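
Because a missing field raises a regular `AttributeError`, the usual Python idioms for optional attributes apply. The sketch below uses a stand-in object rather than a real C3 instance so that it runs anywhere; in the notebook, `user` would be `user1`.

```python
from types import SimpleNamespace

# Stand-in for a fetched object; in the notebook this would be `user1`.
user = SimpleNamespace(name="AnonymousUser")

# getattr with a default avoids the AttributeError for a missing field.
nickname = getattr(user, "doesnotexist", None)
print(nickname)

# Alternatively, catch the error explicitly when a missing field needs handling.
try:
    user.doesnotexist
except AttributeError as err:
    message = str(err)
    print("missing field:", message)
```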

### Help with C3
This section will walk you through the basics of using the `help` utility.

To learn more about any C3 Type or method, use the `help()` function. It extends the behavior of the built-in Python `help()` function to provide information about C3 Types. Try using `help()` below.

In [8]:
# NotVerify: all
help(c3.FFT)

As you can see, when used on a C3 Type, `help()` will provide information about the Type's fields, methods, and uses. The `help()` function can also be used on C3 Type methods.

In [9]:
# NotVerify: all
help(c3.FFT.forward)

When used on a C3 Type method, `help()` will return information regarding the method's parameters, return values, and uses.
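
The built-in `help()` prints its text directly. When you would rather search the documentation than read all of it, the standard library's `pydoc.render_doc` returns the same kind of text as a string. The sketch below applies it to a standard library function; whether it produces equally detailed output for C3 Types is an assumption that this tutorial does not confirm.

```python
import json
import pydoc

# render_doc returns the help text instead of printing it;
# the plaintext renderer avoids terminal bold markup.
doc_text = pydoc.render_doc(json.dumps, renderer=pydoc.plaintext)

# Show only the lines that mention a keyword of interest.
matches = [line for line in doc_text.splitlines() if "indent" in line]
print("\n".join(matches))
```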

### Jupyter Content Storage
This section will walk you through the C3 Jupyter Service content storage model.

You are currently using the C3 Jupyter Service, which is created on demand through C3. This means that, in addition to the vanilla Jupyter functionality, you also get the following:
- Dedicated memory/CPU resources (you can verify the usage in the toolbar)
- Automatic authentication/authorization using your C3 credentials
- Remote content storage backed by C3


#### Listing Remote Content
The C3 Jupyter Service automatically integrates the content list that is stored in C3 into the Jupyter `/tree` view. In the Jupyter `/tree` view you can see which files exist in your Jupyter container, and which files have pending changes to upload.

#### Download Remote Content
From the `/tree` view, you can download any content that is in C3 to your Jupyter container. To do so, simply check the files/directories you wish to download and click `C3 Download`.

#### Upload New/Changed Content to Remote
From the `/tree` view, you can upload any content from your Jupyter container to C3. To do so, simply check the files/directories you wish to upload to C3, and click `C3 Upload`.

#### Upload Notebook to Remote
From within your notebook, you can save your notebook as you normally would. Note that your notebook is auto-uploaded every 5 minutes in the background. You can also manually upload your progress to C3, either through `File -> Upload Notebook to C3` or with the shortcut `cmd/ctrl+shift+s`.

### Jupyter Kernel Management
This section will walk you through the steps of how to discover, install, and switch to new kernels.

All the kernel management can be done through the `Kernel -> Manage Kernels` widget in your Notebook. This widget detects and lists all runtime definitions from C3. You can select the kernels you would like to install. After successful installation, you can switch to use the new kernel through `Kernel -> Change kernel`.

NOTE: If at first you do not see your kernel, then you may need to refresh your page.

### C3 Connection Status
This section explains the C3 connection check that is integrated with Jupyter and what it means for you.

As you can see from the above sections, C3 is tightly integrated into the Jupyter Service. Everything from content management to authentication to kernel management is coupled to C3. 

As a result, it is important for the user to know when Jupyter's connection to C3 is disrupted. A user can check the periodically-updated connection status in the "C3 Connection Status" section located in the top toolbar. During a disruption time period, the Jupyter Service will be running in a degraded state, where features will be local-only (for example, content and kernel definitions).

The cause for disruption is usually not clear from the Jupyter Service's perspective. For an explanation on the disruption, contact your C3 cluster administrator. Some possible causes include:
* ongoing provisioning to your tenant/tag
* cluster unavailability

## Advanced Topics

### Understanding Jupyter Kernel Runtime Definitions
This section will walk you through how to understand the definitions of C3 runtimes and how they relate to your Jupyter kernel.

Before you can effectively develop software on the C3 notebook, you must select the right kernel for your job. Let's first discuss what a kernel consists of. To do this, we will use `TagMetadataStore`. Change the cell below to code and take a look at `TagMetadataStore`.

The `TagMetadataStore` type has many different methods for managing your current tag. Most information regarding packages, types, and runtimes on your current tag can be inspected and changed using the methods provided by `TagMetadataStore`. Using the `runtimes()` method, we can see the runtimes available on our tag. To build our understanding of runtimes, let's look at the contents of one.

In [10]:
# NotVerify: result
runtimes = c3.TagMetadataStore().runtimes()
py_sklearn_runtime = runtimes['py-sklearn_1_0_0']
py_sklearn_runtime

c3.ActionRuntime(
 name='py-sklearn_1_0_0',
 id='py-sklearn_1_0_0',
 language='Python',
 runtime='CPython',
 runtimeVersion='3.6',
 location='all',
 modules=c3.Mapp<string, string>({'conda.dill': '=0.2.8.2',
           'conda.numpy': '=1.15.2',
           'conda.pandas': '=0.23.4',
           'conda.scikit-learn': '=0.20.0',
           'conda.scipy': '=0.19'}),
 repositories=c3.Arry<string>(['https://repo.continuum.io/pkgs/main']))

As you can see, a runtime environment consists of:

 - language: The programming language for this runtime. Currently, only Python and R are supported.
 - runtime: The implementation of your language.
 - runtime version: The version of your runtime.
 - connector: As of 7.10, it is recommended not to specify the connector field (this will give you the "thick" client). That being said, there are other supported connectors. The "simple" connector is the old C3FastConnection connector used by default in C3 Server < 7.8; "remote" uses the C3 remote connector; "remote-types" uses a Type-Aware C3 remote connector, and is required for executing Python inline functions (discussed in `TutorialC3PythonSDKInlineFunctions.ipynb`).
 - modules: A list of libraries (and versions) to be installed to your environment.
 - repository: The conda package repository for the libraries.

When you select a kernel, you are picking an environment in which your code will execute. Your Jupyter kernel can come from two different locations: local runtimes generated from a `requirements.yaml` file, and Python runtimes for server-side Python C3 actions. You can use `TagMetadataStore` to look at the content of existing Python runtimes to help you pick the right one.

In [11]:
# NotVerify: result
c3.TagMetadataStore().runtimes()

c3.Mapp<string, ActionRuntime>({'js-client': c3.ActionRuntime(
               name='js-client',
               id='js-client',
               language='JavaScript',
               runtimeVersion='ES2015',
               location='client',
               modules=c3.Mapp<string, string>({'builtin.underscore': '*'})),
 'js-server': c3.ActionRuntime(
               name='js-server',
               id='js-server',
               language='JavaScript',
               runtimeVersion='ES5',
               location='server',
               modules=c3.Mapp<string, string>({'builtin.underscore': '*'})),
 'js-testrunner': c3.ActionRuntime(
                   name='js-testrunner',
                   id='js-testrunner',
                   language='JavaScript',
                   runtimeVersion='ES5',
                   location='server',
                   modules=c3.Mapp<string, string>({'builtin.jasmine': '*',
                             'builtin.jsverify': '*',
                             'bui

When you find a runtime with the correct language, packages, and versions, you can install it from "Kernel -> Manage Kernels". When selecting a kernel that has yet to be installed, expect the kernel to take a few minutes to download. After successful installation and refreshing the page, the new kernel should appear in "Kernel -> Change kernel" for you to switch to.
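
Scanning a long runtime listing by eye is error-prone, so it can help to filter it by language and required modules. The sketch below works on stand-in data shaped like the output above, so it runs without C3. It assumes the map returned by `runtimes()` can be iterated with `.items()` like a Python dict, which this tutorial does not confirm; the helper name `matching_runtimes` is hypothetical.

```python
from types import SimpleNamespace


def matching_runtimes(runtimes, language, required_modules):
    """Return names of runtimes with the given language and all required modules.

    runtimes: mapping of runtime name -> object with `language` and `modules`
    required_modules: module keys such as 'conda.pandas'
    """
    names = []
    for name, runtime in runtimes.items():
        modules = getattr(runtime, "modules", None) or {}
        has_modules = all(m in modules for m in required_modules)
        if runtime.language == language and has_modules:
            names.append(name)
    return sorted(names)


# Stand-in data shaped like the output above, so the sketch runs without C3.
sample = {
    "js-server": SimpleNamespace(
        language="JavaScript",
        modules={"builtin.underscore": "*"},
    ),
    "py-sklearn_1_0_0": SimpleNamespace(
        language="Python",
        modules={"conda.numpy": "=1.15.2", "conda.pandas": "=0.23.4"},
    ),
}

print(matching_runtimes(sample, "Python", ["conda.pandas"]))
```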

Suppose you want to try out an existing kernel with a slight modification (e.g., a newer version of sklearn), or you want to create your own kernel entirely (typically through `conda env export` of your local environment). To do this, we recommend creating a conda requirements file and placing it in the `requirements_files` folder in your root Jupyter directory. We can look at an existing runtime to see the content of its requirements file.

In [12]:
# NotVerify: all
reqFiles = c3.CondaActionRuntime.requirementsFilesForLanguage(language="Python")  #get the requirements file for all runtimes
print(reqFiles['py-sklearn_1_0_0'])  #display the requirements files for py-sklearn_1_0_0


#conda env create --file requirements.yaml
name: py-sklearn_1_0_0
channels:
- https://repo.continuum.io/pkgs/main
dependencies:
- dill=0.2.8.2
- scikit-learn=0.20.0
- numpy=1.15.2
- scipy=0.19
- pandas=0.23.4
- python=3.6


Now that we have an idea of what a requirement file consists of, let’s create our own. Place the following as a file named `py-test-env.yaml` in the `requirements_files` folder:

Note: Your new kernel's name will be the name of the `yaml` file, **not** the value stored in the name field within the file.
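
A requirements file for a new kernel follows the same layout as the `py-sklearn_1_0_0` file printed earlier. The sketch below builds that text in Python and can write it to disk. The helper names are hypothetical, the version pins are simply copied from the `py-sklearn_1_0_0` output for illustration, and whether a file written from a notebook cell is picked up the same way as one created through the Jupyter UI is an assumption.

```python
from pathlib import Path


def requirements_yaml(name, channels, dependencies):
    """Build conda requirements text in the layout printed above."""
    lines = ["name: " + name, "channels:"]
    lines += ["- " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["- " + dep for dep in dependencies]
    return "\n".join(lines) + "\n"


def write_requirements(folder, filename, text):
    """Write the requirements text into the given folder, creating it if needed."""
    target = Path(folder) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target


# Illustrative pins copied from the py-sklearn_1_0_0 file above; adjust as needed.
text = requirements_yaml(
    "py-test-env",
    ["https://repo.continuum.io/pkgs/main"],
    ["python=3.6", "numpy=1.15.2", "pandas=0.23.4", "scikit-learn=0.20.0"],
)
print(text)
```

Calling `write_requirements("requirements_files", "py-test-env.yaml", text)` would then place the file in a `requirements_files` folder relative to the current working directory.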

After refreshing your browser, you should be able to find your kernel in the drop down menu "Kernel -> Manage Kernels". After installing, this new kernel can be used to execute code locally, along with testing custom server-side methods, as described in the next section.

### Creating Action Runtimes
This section will cover creating and selecting runtimes for server-side methods.

Every Python server-side method has a runtime associated with it. To view a method's runtime, you can use the command below. This line will show you the runtime for `XgBoostPipe`'s `process` method.

In [13]:
# NotVerify: all
c3.XgBoostPipe.fieldType('process').declaredAnnotations.py.env

'xgboost_1_0_0'

When implementing custom methods, we have two options for runtime: we can choose an existing runtime, or we can create a new one. If none of the existing runtimes have your desired packages/versions, you may need to create your own.

Note: We recommend creating a `requirements.yaml` file and testing your methods locally before deploying your runtime for server-side use.

Once your runtime has been tested thoroughly locally, you can deploy it for server-side use by storing the action runtime as data in a seed folder. To do this, you must first create a JSON file to hold the data. Relative to your current package, the data to be deployed with the software package is stored in the `/seed/ActionRuntime/` directory (create this directory if it does not exist). In our case, the data will be stored at `c3server/repo/server/platform/seed/ActionRuntime/py-test-env2.json`. Create this file now, and insert the data below into this file.

By comparing our `py-test-env.yaml` to your data in the `seed` folder, you can see that many fields between these two files have one-to-one mappings. The major difference between these two files is the additional required `runtime` field and optional `connector` field in the json (see `TutorialC3PythonSDKInlineFunctions.ipynb` for more information regarding connectors).
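
To make the mapping concrete, the sketch below assembles a JSON document whose field names are taken from the `ActionRuntime` output shown earlier in this tutorial. The values are illustrative placeholders rather than the tutorial's original file contents, and whether the seed file should hold a single object or a list of objects is an assumption to verify against your C3 version.

```python
import json

# Field names follow the ActionRuntime output shown earlier; the values are
# illustrative placeholders, not the tutorial's original file contents.
action_runtime = {
    "id": "py-test-env2",
    "name": "py-test-env2",
    "language": "Python",
    "runtime": "CPython",
    "runtimeVersion": "3.6",
    "modules": {
        "conda.numpy": "=1.15.2",
        "conda.pandas": "=0.23.4",
    },
    "repositories": ["https://repo.continuum.io/pkgs/main"],
}

seed_text = json.dumps(action_runtime, indent=2)
print(seed_text)
```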

To use this runtime (either for your Notebook kernel or for a Python C3 server-side action), you must (re-)deploy your software package.

### Advanced Remote File Operations
In this section, we present a few file operations that allow you to read and write files that will be persisted through C3 for long-term storage.

#### `c3_open(relative_path, mode, force_refresh)`
relative_path: path in Jupyter `/tree` view of the file you wish to open  
mode: same as python `open()`  
force_refresh: overwrites any local caching you may have for the file

Use `c3_open` to create a new writable file and write to it:

In [14]:
with c3_open('test.txt', 'w') as f:
    f.write('Hello World!')

If you go back to your Jupyter home screen, you should be able to see this new file. Now let's read it back!

In [15]:
with c3_open('test.txt', 'r') as f:
    print(f.read())

Hello World!


Now let's try an example with a Pandas DataFrame. Recall that the notebook needs to run in a Jupyter kernel that has the pandas library (e.g., py-sklearn_1_0_0).

In [16]:
# Create a sample data frame
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data,columns=['Name', 'Age'])
df

|   | Name | Age |
|---|------|-----|
| 0 | Alex | 10 |
| 1 | Bob | 12 |
| 2 | Clarke | 13 |


In [17]:
# Save the data frame to a csv locally (not persisted beyond lifetime of this notebook session!)
df.to_csv('file_name.csv')

# Copy the content of the local data frame to persisted storage using c3_open
with c3_open('persist_file_name.csv', 'w') as f, open('file_name.csv', 'r') as f2:
    f.write(f2.read())

In [18]:
# Read out data frame from persisted storage
with c3_open('persist_file_name.csv', 'r') as f:
    df2 = pd.read_csv(f, index_col=0)
df2

|   | Name | Age |
|---|------|-----|
| 0 | Alex | 10 |
| 1 | Bob | 12 |
| 2 | Clarke | 13 |


#### `c3_download`
path: list of paths in Jupyter `/tree` view of the files and/or directories you wish to download (must be a list)  
overwrite: if true, forcibly overwrites any existing local content  
disable_directory_merge: if true, an error is thrown if the directory already exists  
  
e.g. 
```
num_downloaded = c3_download(["tutorials", "libs/helper.py"])
```

#### `c3_upload`
path: list of paths in Jupyter `/tree` view of the files and/or directories you wish to upload (must be a list)  
overwrite: if true, forcibly overwrites any existing remote content  
increment_version: if true, then all versionable content will have version number incremented  
  
e.g. 
```
num_uploaded = c3_upload(["tutorials", "libs/helper.py"])
```
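
Both `c3_download` and `c3_upload` expect a list, even for a single path. A small guard such as the hypothetical `as_path_list` helper below (not part of the C3 tooling) avoids accidentally passing a bare string.

```python
def as_path_list(paths):
    """Wrap a single path string in a list; return lists and tuples as lists."""
    if isinstance(paths, str):
        return [paths]
    return list(paths)


print(as_path_list("tutorials"))
print(as_path_list(["tutorials", "libs/helper.py"]))
```

In a notebook, this would be used as `c3_upload(as_path_list("tutorials"))`.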

#### `c3_pickle`

In [19]:
# Skip: `c3_pickle` is only supported for C3Notebook 3+
import numpy as np
np_array = np.arange(0,5)
c3_pickle.upload("numpy.pickle", np_array)

In [20]:
# Skip: `c3_pickle` is only supported for C3Notebook 3+
result = c3_pickle.download("numpy.pickle")
result

array([0, 1, 2, 3, 4])

## Next Steps
Now that you're familiar with using the C3 Python SDK with a Jupyter notebook, check out some of our other tutorial notebooks!