# Using the V3IO Frames Library for High-Performance Data Access 

- [Overview](#frames-overview)
- [Initialization](#frames-init)
- [Working with NoSQL Tables (kv Backend)](#frames-kv)
- [Working with Time-Series Databases (tsdb Backend)](#frames-tsdb)
- [Working with Streams (stream Backend)](#frames-stream)
- [Cleanup](#frames-cleanup)

<a id="frames-overview"></a>
## Overview

[V3IO Frames](https://github.com/v3io/frames) (**"Frames"**) is a multi-model open-source data-access library, developed by Iguazio, which provides a unified high-performance DataFrame API for working with data in the data store of the Iguazio Data Science Platform (**"the platform"**).
Frames currently supports the NoSQL (key/value), stream, and time-series (TSDB) data models via its `kv`, `stream`, and `tsdb` backends.

To use Frames, you first need to import the **v3io_frames** library and create and initialize a client object &mdash; an instance of the `Client` class.<br>
The `Client` class features the following object methods for supporting basic data operations; the type of data is derived from the backend type (`tsdb` &mdash; TSDB table / `kv` &mdash; NoSQL table / `stream` &mdash; data stream):

- `create` &mdash; creates a new TSDB table or a stream ("backend data").
- `delete` &mdash; deletes a table or stream or specific NoSQL ("KV") table items.
- `read` &mdash; reads data from a table or stream into pandas DataFrames.
- `write` &mdash; writes data from pandas DataFrames to a table or stream.
- `execute` &mdash; executes a command on a table or stream.
  Each backend may support multiple commands.

For a detailed description of the Frames API, see the [Frames documentation](https://github.com/v3io/frames/blob/development/README.md).<br>
For more help and usage details, use the internal API help &mdash; `<client object>.<command>?` in Jupyter Notebook or `print(<client object>.<command>.__doc__)`.<br>
For example, the following command returns information about the read operation for a client object named `client`:
```
client.read?
```

<a id="frames-init"></a>
## Initialization

To use V3IO Frames, first ensure that your platform tenant has a shared tenant-wide instance of the V3IO Frames service.
This can be done by a platform service administrator from the **Services** dashboard page.<br>
Then, import the required libraries and create a Frames client object (an instance of the `Client` class), as demonstrated in the following code, which creates a client object named `client`.

> **Note:**
> - The client constructor's `container` parameter is set to `"users"` for accessing data in the platform's "users" data container.
> - Because no authentication credentials are passed to the constructor, Frames will use the access key that's assigned to the `V3IO_ACCESS_KEY` environment variable.
>   The platform's Jupyter Notebook service defines this variable automatically and initializes it to a valid access key for the running user of the service.
>   You can pass different credentials by using the constructor's `token` parameter (platform access key) or `user` and `password` parameters (platform username and password).

In [1]:
import pandas as pd
import v3io_frames as v3f
import os

# Create a Frames client
client = v3f.Client("framesd:8081", container="users")
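
The credential options described in the preceding note can be sketched as a small helper that assembles the constructor arguments. This is only an illustration under stated assumptions: `frames_client_kwargs` is a hypothetical helper, not part of Frames, and the `address` keyword is assumed to be the name of the constructor's first parameter. The `container`, `token`, `user`, and `password` names are the ones mentioned in the note.

```python
def frames_client_kwargs(container="users", address="framesd:8081",
                         token=None, user=None, password=None):
    """Assemble keyword arguments for the Frames client constructor (hypothetical helper)."""
    if token and (user or password):
        raise ValueError("pass either a token or a user/password pair, not both")
    if bool(user) != bool(password):
        raise ValueError("user and password must be set together")
    kwargs = {"address": address, "container": container}
    if token:
        kwargs["token"] = token
    elif user:
        kwargs.update(user=user, password=password)
    # With no explicit credentials, Frames is expected to fall back to
    # the V3IO_ACCESS_KEY environment variable, as described in the note.
    return kwargs


kwargs = frames_client_kwargs(token="<access key>")
print(kwargs)
```

The resulting dictionary could then be unpacked into the constructor, for example `v3f.Client(**kwargs)`.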

<a id='frames-kv'></a>
## Working with NoSQL Tables (kv Backend)

This section demonstrates how to use the `kv` Frames backend to write and read NoSQL data in the platform.

- [Initialization](#frames-kv-init)
- [Load Data from Amazon S3](#frames-kv-load-data-s3)
- [Write to a NoSQL Table](#frames-kv-write)
- [Read from the Table Using an SQL Query](#frames-kv-read-sql-query)
- [Read from the Table Using the Frames API](#frames-kv-read-frames-api)
  - [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)
  - [Read Using a DataFrames Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)
- [Delete the NoSQL Table](#frames-kv-delete)

<a id="frames-kv-init"></a>
### Initialization

Start out by defining table-path variables that will be used in the tutorial's code examples.<br>
The table path (`table`) is relative to the configured parent data container; see [Write to a NoSQL Table](#frames-kv-write).

In [2]:
# Relative path to the NoSQL table within the parent platform data container
table = os.getenv("V3IO_USERNAME") + "/examples/bank"

# Full path to the NoSQL table for SQL queries (platform Presto data-path syntax);
# use the same data container as used for the Frames client ("users")
sql_table_path = 'v3io.users."' + table + '"'
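
The two path forms used above (the container-relative table path and the Presto data path) can be sketched with two small helpers. Both function names are hypothetical and exist only for this illustration. The explicit check reflects the fact that `os.getenv` returns `None` when `V3IO_USERNAME` is not defined, which would otherwise surface as a `TypeError` during string concatenation.

```python
import os


def user_table_path(name, username=None):
    """Return '<username>/examples/<name>', relative to the data container."""
    username = username or os.getenv("V3IO_USERNAME")
    if not username:
        raise RuntimeError("V3IO_USERNAME is not set; pass username explicitly")
    return f"{username}/examples/{name}"


def presto_table_path(container, table_path):
    """Return the Presto data path: v3io.<container name>."<table path>"."""
    return f'v3io.{container}."{table_path}"'


example_table = user_table_path("bank", username="iguazio")
print(presto_table_path("users", example_table))  # v3io.users."iguazio/examples/bank"
```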

<a id="frames-kv-load-data-s3"></a>
### Load Data from Amazon S3

Read a file from an Amazon Simple Storage Service (S3) bucket into a pandas DataFrame.

In [3]:
# Read an AWS S3 file into a DataFrame and show its first rows
df = pd.read_csv("https://s3.amazonaws.com/iguazio-sample-data/bank.csv", sep=";")
df.head()

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
1,33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
2,35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
3,30,management,married,tertiary,no,1476,yes,yes,unknown,3,jun,199,4,-1,0,unknown,no
4,59,blue-collar,married,secondary,no,0,yes,no,unknown,5,may,226,1,-1,0,unknown,no


<a id="frames-kv-write"></a>
### Write to a NoSQL Table

Use the `write` method of the Frames client with the `kv` backend to write the data that was read in the previous step to a NoSQL table.<br>
The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative table path is set by using the `table` variable that was defined in the [kv backend initialization](#frames-kv-init) step.<br>
The `dfs` parameter can be set either to a single DataFrame (as done in the following example) or to multiple DataFrames &mdash; either as a DataFrames iterator or as a list of DataFrames.

In [4]:
out = client.write("kv", table=table, dfs=df)
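
As noted above, `dfs` also accepts an iterator or a list of DataFrames. A minimal sketch of producing such an iterator from one large DataFrame follows. The `iter_chunks` helper and the sample data are illustrative assumptions, not part of Frames.

```python
import pandas as pd


def iter_chunks(df, chunk_size):
    """Yield consecutive row slices of df with at most chunk_size rows each."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]


# Small made-up sample, shaped like two columns of the bank data
sample = pd.DataFrame({"age": [30, 33, 35, 30, 59],
                       "balance": [1787, 4789, 1350, 1476, 0]})
chunks = list(iter_chunks(sample, 2))
print([len(chunk) for chunk in chunks])  # [2, 2, 1]
```

With a Frames client at hand, the generator could be passed directly, for example `client.write("kv", table=table, dfs=iter_chunks(df, 1000))`. Whether chunking is worthwhile depends on the size of the data.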

<a id="frames-kv-read-sql-query"></a>
### Read from the Table Using an SQL Query

You can run SQL queries on your NoSQL table (using Presto) to offload data filtering, grouping, joins, etc. to a scale-out high-speed database engine.

> **Note:** To query a table in a platform data container, the table path in the `from` section of the SQL query should be of the format `v3io.<container name>."/path/to/table"`.
> See [Presto Data Paths](https://www.iguazio.com/docs/tutorials/latest-release/getting-started/fundamentals/#data-paths-presto) in the platform documentation.
> In the following example, the path is set by using the `sql_table_path` variable that was defined in the [kv backend initialization](#frames-kv-init) step.
> Unless you changed the code, this variable translates to `v3io.users."<running user>/examples/bank"`; for example, `v3io.users."iguazio/examples/bank"` for user "iguazio".

In [5]:
%sql select * from $sql_table_path where balance > 10000

Done.


loan,education,previous,housing,poutcome,duration,marital,default,balance,month,contact,campaign,y,idx,job,day,age,pdays
no,secondary,0,yes,unknown,149,single,no,10218,nov,cellular,2,no,2916,admin.,19,32,-1
no,tertiary,1,no,failure,699,married,no,11219,aug,cellular,2,no,276,housemaid,12,35,79
no,secondary,0,yes,unknown,249,married,no,19317,aug,cellular,1,yes,3553,retired,4,68,-1
no,secondary,0,no,unknown,14,married,no,17555,aug,cellular,14,no,1776,management,26,43,-1
no,tertiary,0,no,unknown,215,married,no,16264,nov,telephone,3,no,3289,management,17,58,-1
yes,tertiary,0,yes,unknown,197,divorced,no,13204,nov,cellular,2,no,3329,management,20,34,-1
no,tertiary,0,yes,unknown,106,married,no,22370,may,unknown,1,no,2624,entrepreneur,15,53,-1
no,primary,0,no,unknown,205,married,no,71188,oct,cellular,1,no,3700,retired,6,60,-1
no,tertiary,0,no,unknown,29,married,no,13893,jun,unknown,2,no,3608,management,11,44,-1
no,tertiary,0,yes,unknown,288,married,no,10758,jun,cellular,1,no,1005,management,1,41,-1


<a id="frames-kv-read-frames-api"></a>
### Read from the Table Using the Frames API

Use the `read` method of the Frames client with the `kv` backend to read data from your NoSQL table.<br>
The `read` method can return a DataFrame or a DataFrames iterator (a stream), as demonstrated in the following examples.

- [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)
- [Read Using a DataFrames Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)

<a id="frames-kv-read-frames-api-single-df"></a>
#### Read Using a Single DataFrame

The following example uses a single command to read data from the NoSQL table into a DataFrame.

In [6]:
df = client.read(backend="kv", table=table, filter="balance > 20000")
df.head(8)

idx,age,balance,campaign,contact,day,default,duration,education,housing,job,loan,marital,month,pdays,poutcome,previous,y
871,31,26965,2,cellular,21,no,654,primary,no,housemaid,no,single,apr,-1,unknown,0,yes
2989,42,42045,2,cellular,8,no,205,tertiary,no,entrepreneur,no,married,aug,-1,unknown,0,no
3011,50,26394,4,cellular,25,no,206,secondary,no,services,no,married,aug,-1,unknown,0,no
4047,75,26452,2,telephone,15,no,219,secondary,no,retired,no,married,jul,-1,unknown,0,no
2624,53,22370,1,unknown,15,no,106,tertiary,yes,entrepreneur,no,married,may,-1,unknown,0,no
3700,60,71188,1,cellular,6,no,205,primary,no,retired,no,married,oct,-1,unknown,0,no
1881,36,27359,2,unknown,3,no,71,tertiary,yes,management,no,married,jun,-1,unknown,0,no
1483,43,27733,7,unknown,3,no,164,tertiary,yes,technician,no,single,jun,-1,unknown,0,no


<a id="frames-kv-read-frames-api-df-iterator"></a>
#### Read Using a DataFrames Iterator (Streaming)

The following example uses a DataFrames iterator to stream data from the NoSQL table into multiple DataFrames and allow concurrent data movement and processing.<br>
The example sets the `iterator` parameter to `True` to receive a DataFrames iterator (instead of the default single DataFrame), and then iterates over the DataFrames in the returned iterator; you can also use the pandas `concat` function instead of iterating over the DataFrames.

> **Note:** Iterators work with all Frames backends and can be used as input to write functions that support this, such as the `write` method of the Frames client.

In [7]:
dfs = client.read(backend="kv", table=table, filter="balance > 20000", iterator=True)
for df in dfs:
    print(df.head())

      age  balance  campaign    contact  day default  duration  education  \
idx                                                                         
4047   75    26452         2  telephone   15      no       219  secondary   
3011   50    26394         4   cellular   25      no       206  secondary   
2989   42    42045         2   cellular    8      no       205   tertiary   
1881   36    27359         2    unknown    3      no        71   tertiary   
871    31    26965         2   cellular   21      no       654    primary   

     housing           job loan  marital month  pdays poutcome  previous    y  
idx                                                                            
4047      no       retired   no  married   jul     -1  unknown         0   no  
3011      no      services   no  married   aug     -1  unknown         0   no  
2989      no  entrepreneur   no  married   aug     -1  unknown         0   no  
1881     yes    management   no  married   jun     -1  unkno
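
The `concat` alternative mentioned above can be sketched without a running Frames service by substituting a generator for the returned iterator. The `fake_frames_iterator` function and its values are stand-ins invented for this illustration; with a real client, the iterator returned by `client.read(..., iterator=True)` would take its place.

```python
import pandas as pd


def fake_frames_iterator():
    """Stand-in for the DataFrames iterator returned by a streaming read."""
    yield pd.DataFrame({"age": [75, 50], "balance": [26452, 26394]},
                       index=pd.Index([4047, 3011], name="idx"))
    yield pd.DataFrame({"age": [42], "balance": [42045]},
                       index=pd.Index([2989], name="idx"))


# pandas.concat accepts any iterable of DataFrames and stacks them row-wise
combined = pd.concat(fake_frames_iterator())
print(combined.shape)  # (3, 2)
```

Concatenating gives up the benefit of processing each DataFrame as it arrives, so it is mainly useful when the full result fits comfortably in memory.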

<a id="frames-kv-delete"></a>
### Delete the NoSQL Table

Use the `delete` method of the Frames client with the `kv` backend to delete the NoSQL table that was used in the previous steps.

In [8]:
# Delete the `table` NoSQL table
client.delete("kv", table)

<a id='frames-tsdb'></a>
## Working with Time-Series Databases (tsdb Backend)

This section demonstrates how to use the `tsdb` Frames backend to create a time-series database (TSDB) table in the platform, ingest data into the table, and read from the table (i.e., submit TSDB queries).

- [Initialization](#frames-tsdb-init)
- [Create a TSDB Table](#frames-tsdb-create)
- [Write to the TSDB Table](#frames-tsdb-write)
- [Read from the TSDB Table](#frames-tsdb-read)
- [Delete the TSDB Table](#frames-tsdb-delete)

<a id="frames-tsdb-init"></a>
### Initialization

Start out by defining a TSDB table-path variable that will be used in the tutorial's code examples.<br>
The table path (`tsdb_table`) is relative to the configured parent data container; see [Create a TSDB Table](#frames-tsdb-create).

In [9]:
# Relative path to the TSDB table within the parent platform data container
tsdb_table = os.getenv("V3IO_USERNAME") + "/examples/tsdb_tab"

<a id="frames-tsdb-create"></a>
### Create a TSDB Table

Use the `create` method of the Frames client with the `tsdb` backend to create a new TSDB table.<br>
The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative table path is set by using the `tsdb_table` variable that was defined in the [tsdb backend initialization](#frames-tsdb-init) step.<br>
The `attrs` parameter is used to set additional arguments.
You must set the `rate` argument to the ingestion rate of the TSDB metric-samples, as `"[0-9]+/[smh]"` (where `s` = seconds, `m` = minutes, and `h` = hours); for example, `1/s` (one sample per second).
The rate should be calculated according to the slowest expected ingestion rate.
You can also set additional optional arguments, such as `aggregates` or `aggregation-granularity`.

In [10]:
# Create a new TSDB table; ingestion rate = one sample per minute ("1/m")
client.create(backend="tsdb", table=tsdb_table, attrs={"rate": "1/m"})
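
The `rate` format described above can be checked locally before creating the table. The following sketch validates a rate string against the documented `[0-9]+/[smh]` pattern and converts it to samples per second so that candidate rates can be compared. The `samples_per_second` helper is a hypothetical name used only here.

```python
import re

_SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600}


def samples_per_second(rate):
    """Parse a '<count>/<s|m|h>' rate string into samples per second."""
    match = re.fullmatch(r"([0-9]+)/([smh])", rate)
    if match is None:
        raise ValueError(f"invalid rate {rate!r}; expected '[0-9]+/[smh]'")
    count, unit = match.groups()
    return int(count) / _SECONDS_PER_UNIT[unit]


print(samples_per_second("1/s"))  # 1.0
# One sample per minute is a slower rate than one sample per second
print(samples_per_second("1/m") < samples_per_second("1/s"))  # True
```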

<a id="frames-tsdb-write"></a>
### Write to the TSDB Table

Use the `write` method of the Frames client with the `tsdb` backend to ingest data from a pandas DataFrame into your TSDB table.<br>
The primary-key attribute of platform TSDB tables (i.e., the DataFrame index column) must hold the sample time of the data (displayed as `time` in read outputs).<br>
In addition, TSDB table items (rows) can optionally have sub-index columns (attributes) that are called labels.
You can add labels to TSDB table items in one of two ways; you can also combine these methods:

- Use the `labels` dictionary parameter of the `write` method to add labels to all the written metric-sample table items (DataFrame rows) &mdash; `{<label>: <value>[, <label>: <value>, ...]}`.<br>
  For example, `{"node": "11"}` in the following code example.
  Note that the values of the metric labels must be of type string.
- Define DataFrame index columns for the labels.
  All DataFrame index columns except for the sample-time index column are automatically converted into labels for the respective table items.
  > **Note:** If you wish to use regular columns in your DataFrames as TSDB labels, convert these columns to index columns.
  > The following example converts the `symbol` and `exchange` columns to index columns that will be used as TSDB labels (in addition to the `time` index column):<br>
  > ```python
  > df.index.name="time"                              # Ensure that the sample-time index column is named "time"
  > df.reset_index(level=0, inplace=True)             # Reset the DataFrame indexes
  > df = df.set_index(["time", "symbol", "exchange"]) # Convert the "time" column and additional TSDB-label columns to DataFrame indexes
  > ```

In [11]:
# Prepare metric samples to ingest into the TSDB table
import numpy as np
from datetime import datetime, timedelta

end = datetime.now().replace(minute=0, second=0, microsecond=0)
rng = pd.date_range(end=end, periods=60, freq="300s", tz="EST")
df = pd.DataFrame(np.random.randn(len(rng), 3), index=rng, columns=["cpu", "mem", "disk"])
df = df.cumsum()
print(df.info(), df.head())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 60 entries, 2019-09-15 10:05:00-05:00 to 2019-09-15 15:00:00-05:00
Freq: 300S
Data columns (total 3 columns):
cpu     60 non-null float64
mem     60 non-null float64
disk    60 non-null float64
dtypes: float64(3)
memory usage: 1.9 KB
None                                 cpu       mem      disk
2019-09-15 10:05:00-05:00  0.057680  0.864139 -0.844771
2019-09-15 10:10:00-05:00  0.174364 -0.566146 -0.780971
2019-09-15 10:15:00-05:00 -0.380715  1.346382 -1.492667
2019-09-15 10:20:00-05:00 -1.351383  3.514912 -1.476890
2019-09-15 10:25:00-05:00 -1.418901  3.645923 -1.368978


In [12]:
# Ingest data into the TSDB table
client.write(backend="tsdb", table=tsdb_table, dfs=df, labels={"node": "11"})
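
The index-column approach to labels described earlier in this section can be sketched as a reusable helper. `to_tsdb_frame` is a hypothetical function, and the sample data is made up. Casting label values to strings is an assumption carried over from the string requirement stated for the `labels` parameter.

```python
import pandas as pd


def to_tsdb_frame(df, time_column, label_columns):
    """Return a copy of df indexed by 'time' plus the given label columns."""
    out = df.copy()
    for column in label_columns:
        # Assumption: label values should be strings, as for the labels parameter
        out[column] = out[column].astype(str)
    out = out.rename(columns={time_column: "time"})
    return out.set_index(["time", *label_columns])


raw = pd.DataFrame({
    "ts": pd.to_datetime(["2019-09-15 10:05", "2019-09-15 10:10"]),
    "node": [11, 12],
    "cpu": [0.05, 0.17],
})
tsdb_df = to_tsdb_frame(raw, "ts", ["node"])
print(list(tsdb_df.index.names))  # ['time', 'node']
```

The helper returns a new DataFrame and leaves the input untouched, which avoids the in-place index reset used in the note above.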

<a id="frames-tsdb-read"></a>
### Read from the TSDB Table

Use the `read` method of the Frames client with the `tsdb` backend to read data from your TSDB table (i.e., query the database).<br>
You can define the target TSDB table either in the `table` parameter of the `read` method or within the query string set in the optional `query` parameter, as demonstrated in the following example.
When the query includes the target table, the `table` parameter (if set) is ignored.<br>
You can set the optional `multi_index` parameter to `True` to return labels as index columns, as demonstrated in the following example.
By default, only the metric sample-time primary-key attribute is returned as an index column.<br>
See the [Frames documentation](https://github.com/v3io/frames/blob/master/README.md) for more information about the supported parameters of the `read` method for the `tsdb` backend.

In [13]:
# Read time-series aggregates from the TSDB table into a single DataFrame
query_str = "select avg(*), max(*), min(*) from '" + tsdb_table + "'"
tsdf = client.read(backend="tsdb", query=query_str, step="60m", start="now-7d", end="now", multi_index=True)
tsdf.head()

time,node,avg(cpu),max(cpu),min(cpu),avg(mem),max(mem),min(mem),avg(disk),max(disk),min(disk)
2019-09-15 14:33:38,11,-0.415523,0.425815,-1.418901,2.148689,4.086927,-0.566146,-1.109473,-0.692562,-1.492667
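
When `multi_index=True` is used, the labels come back as index levels next to `time`. The following sketch shows two common pandas operations on a result of that shape. The DataFrame is built locally with made-up values to mimic the output above, so no Frames service is involved.

```python
import pandas as pd

# Locally built stand-in for a multi-index TSDB read result (values are invented)
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2019-09-15 14:00"), "11"),
     (pd.Timestamp("2019-09-15 14:00"), "12")],
    names=["time", "node"],
)
result = pd.DataFrame({"avg(cpu)": [-0.42, 0.10], "max(cpu)": [0.43, 0.55]},
                      index=index)

# Select the rows of a single label value; the "node" level is dropped
node_11 = result.xs("11", level="node")
print(list(node_11.index.names))  # ['time']

# Turn all index levels back into regular columns
flat = result.reset_index()
print(list(flat.columns))  # ['time', 'node', 'avg(cpu)', 'max(cpu)']
```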


<a id="frames-tsdb-delete"></a>
### Delete the TSDB Table

Use the `delete` method of the Frames client with the `tsdb` backend to delete the TSDB table that was used in the previous steps.

In [14]:
client.delete("tsdb", tsdb_table)

<a id='frames-stream'></a>
## Working with Streams (stream Backend)

The platform supports streams that have an AWS Kinesis-like API. For more information, see the [platform documentation](https://www.iguazio.com/docs/concepts/latest-release/streams/).
<br>
This section demonstrates how to use the `stream` Frames backend to work with streams in the platform.

- [Initialization](#frames-stream-init)
- [Create a Stream](#frames-stream-create)
- [Write to the Stream](#frames-stream-write)
  - [Use the Write Method to Perform a Batch Update](#frames-stream-write-batch-update)
  - [Use the Execute Method's Put Command to Update a Single Record](#frames-stream-execute-put)
- [Read from the Stream](#frames-stream-read)
- [Delete the Stream](#frames-tsdb-stream)

<a id="frames-stream-init"></a>
### Initialization

Start out by defining a stream-path variable that will be used in the tutorial's code examples.<br>
The stream path (`strm`) is relative to the configured parent data container; see [Create a Stream](#frames-stream-create).

In [15]:
# Relative path to the stream within the parent platform data container
strm = os.getenv("V3IO_USERNAME") + "/examples/somestream"

<a id="frames-stream-create"></a>
### Create a Stream

Use the `create` method of the Frames client with the `stream` backend to create a new data stream.<br>
The mandatory `table` parameter specifies the relative stream path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative stream path is set by using the `strm` variable that was defined in the [stream backend initialization](#frames-stream-init) step.<br>
You can optionally use the `attrs` parameter to provide additional arguments.
For example, you can set the `shards` argument to the number of shards in the stream, or you can set the `retention_hours` argument to the stream's retention period in hours.

In [16]:
# Create a new stream
client.create(backend="stream", table=strm, attrs={"retention_hours": 48, "shards": 1})

<a id="frames-stream-write"></a>
### Write to the Stream

You can use either of the following methods to ingest data into your stream:

- [Use the Write Method to Perform a Batch Update](#frames-stream-write-batch-update)
- [Use the Execute Method's Put Command to Update a Single Record](#frames-stream-execute-put)

<a id="frames-stream-write-batch-update"></a>
#### Use the Write Method to Perform a Batch Update

Use the `write` method of the Frames client with the `stream` backend to ingest multiple records into your stream (batch update), as demonstrated in the following example.<br>
The `dfs` parameter can be set either to a single DataFrame (as done in the following example) or to multiple DataFrames &mdash; either as a DataFrames iterator or as a list of DataFrames.

In [17]:
# Prepare the ingestion data
import numpy as np
from datetime import datetime, timedelta

end = datetime.now().replace(minute=0, second=0, microsecond=0)
rng = pd.date_range(end=end, periods=60, freq="300s", tz="Israel")
df = pd.DataFrame(np.random.randn(len(rng), 3), index=rng, columns=["cpu", "mem", "disk"])

# Ingest data into the stream
client.write("stream", table=strm, dfs=df)

<a id="frames-stream-execute-put"></a>
#### Use the Execute Method's Put Command to Update a Single Record

Use the `put` command of the `execute` method of the Frames client with the `stream` backend to add a single record to a stream.<br>
Use the `args` parameter of the `put` command to provide the necessary information:
set the mandatory `data` argument to the ingested record data.
You can optionally set the `clientinfo` argument to additional metadata and the `partition` argument to a partition key; records with the same partition key are assigned to the same shard.

In [18]:
client.execute("stream", strm, "put", args={'data': "abcd", "clientinfo": "123"})
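
The example above sets only `data` and `clientinfo`. A minimal sketch of assembling the `args` dictionary with the optional `partition` argument as well might look as follows. `put_args` is a hypothetical helper, and the partition-key value is made up.

```python
def put_args(data, clientinfo=None, partition=None):
    """Build the args dictionary for the stream backend's put command."""
    if not data:
        raise ValueError("the data argument is mandatory")
    args = {"data": data}
    if clientinfo is not None:
        args["clientinfo"] = clientinfo
    if partition is not None:
        # Records that share a partition key are assigned to the same shard
        args["partition"] = partition
    return args


print(put_args("abcd", clientinfo="123", partition="sensor-1"))
```

The result could then be passed as `client.execute("stream", strm, "put", args=put_args(...))`.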

<a id="frames-stream-read"></a>
### Read from the Stream

Use the `read` method of the Frames client with the `stream` backend to read data from your stream.<br>
The mandatory `seek` parameter specifies the seek method, which determines the location within the target stream shard from which to read; some methods require setting additional parameters:

- `"earliest"` &mdash; start from the earliest point in the shard (no additional parameters).
- `"latest"` &mdash; start from the latest location in the shard (i.e., consume only new records).
- `"time"` &mdash; start from a specific point in time, as specified in the `start` parameter (for example, `start="now-1d"`).
- `"sequence"` &mdash; start from a specific record sequence number, as specified in the `sequence` parameter (for example, `sequence=45`).

The `read` method can return a single DataFrame (default) or a DataFrames iterator (a stream) if the `iterator` parameter is set to `True`, as demonstrated in the following example.

In [19]:
# Read from the earliest available location (seek="earliest") in the first stream shard (shard_id=0);
# return the result as a DataFrames iterator (iterator=True) and iterate and print the returned data
dfs = client.read("stream", strm, seek="earliest", shard_id="0", iterator=True)
for df in dfs:
    print(df.head(4))

                 cpu      disk               index-0       mem raw_data  \
seq_number                                                                
1           0.617594 -0.904056  2019-09-15T07:05:00Z -0.100935            
2          -1.161206  0.305356  2019-09-15T07:10:00Z -0.010497            
3          -0.177504 -0.941991  2019-09-15T07:15:00Z  1.214677            
4           1.234641  0.279839  2019-09-15T07:20:00Z -0.521239            

                             stream_time  
seq_number                                
1          2019-09-15 15:30:07.145829941  
2          2019-09-15 15:30:07.145829941  
3          2019-09-15 15:30:07.145829941  
4          2019-09-15 15:30:07.145829941  
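
The seek methods listed above differ in which additional parameter they need. The following sketch checks that pairing before a read is issued. `stream_read_kwargs` is a hypothetical helper; the parameter names (`seek`, `shard_id`, `start`, `sequence`) are the ones used in this section.

```python
def stream_read_kwargs(seek, shard_id="0", start=None, sequence=None):
    """Build keyword arguments for a stream read, validating the seek method."""
    if seek not in ("earliest", "latest", "time", "sequence"):
        raise ValueError(f"unsupported seek method: {seek!r}")
    kwargs = {"seek": seek, "shard_id": shard_id}
    if seek == "time":
        if start is None:
            raise ValueError('seek="time" requires the start parameter')
        kwargs["start"] = start
    elif seek == "sequence":
        if sequence is None:
            raise ValueError('seek="sequence" requires the sequence parameter')
        kwargs["sequence"] = sequence
    return kwargs


print(stream_read_kwargs("time", start="now-1d"))
print(stream_read_kwargs("sequence", sequence=45))
```

The dictionary could be unpacked into the call, for example `client.read("stream", strm, **stream_read_kwargs("latest"))`.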


<a id="frames-tsdb-stream"></a>
### Delete the Stream

Use the `delete` method of the Frames client with the `stream` backend to delete the stream that was used in the previous steps.

In [20]:
client.delete("stream", strm)

<a id="frames-cleanup"></a>
## Cleanup

You can optionally delete any of the directories or files that you created.
See the instructions in the [Creating and Deleting Container Directories](https://www.iguazio.com/docs/tutorials/latest-release/getting-started/containers/#create-delete-container-dirs) tutorial.
For example, the following code uses a local file-system command to delete the entire **&lt;running user&gt;/examples/** directory in the "users" container.
Edit the path, as needed, then remove the comment mark (`#`) and run the code.

In [None]:
#!rm -rf /User/examples/