# Ingest Real-Time Stock Data to Iguazio NoSQL and Time-series DB
The following example function ingests real-time stock information from an internet service (the Yahoo Finance API) into the Iguazio platform.<br>
Every time the data is updated, the function updates a NoSQL table with the latest metadata and updates the time-series DB with the new metrics (price and volume).

The same code can run inside a Nuclio (serverless) function and be triggered automatically on a predefined schedule (cron) or through HTTP requests.<br>

The example demonstrates the use of `%nuclio` magic commands to specify environment variables, package dependencies,<br>and configurations (such as the cron schedule), and to deploy functions automatically onto a cluster.

In [4]:
# if the nuclio-jupyter package is not installed run !pip install nuclio-jupyter
import nuclio 
import os

## Environment

Copy the local credentials to the Nuclio function configuration (the `-c` option sets the variables only in the function configuration, not in the local environment).

In [18]:
%nuclio env -c V3IO_ACCESS_KEY=${V3IO_ACCESS_KEY}
%nuclio env -c V3IO_USERNAME=${V3IO_USERNAME}
%nuclio env -c V3IO_API=${V3IO_API}
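
The function code later in this notebook reads its settings with `os.getenv` and, for the table paths, falls back to a default derived from `V3IO_USERNAME`. The following is a minimal sketch of that fallback pattern under stated assumptions: `table_path` is a hypothetical helper that does not appear in the notebook, and the explicit error for a missing user name is an addition for illustration.

```python
import os


def table_path(env_name, suffix, environ=os.environ):
    """Return the path stored in `env_name`, or `<V3IO_USERNAME>/<suffix>` when it is unset.

    Hypothetical helper for illustration; the notebook inlines this logic in init_context.
    """
    override = environ.get(env_name)
    if override:
        return override
    username = environ.get("V3IO_USERNAME")
    if not username:
        # Fail early with a readable message instead of a TypeError on None + str.
        raise RuntimeError("V3IO_USERNAME must be set to derive a default table path")
    return f"{username}/{suffix}"


# Use a plain dict instead of the real environment so the example is reproducible.
env = {"V3IO_USERNAME": "admin"}
default_kv = table_path("STOCKS_KV_TABLE", "stocks/stocks_kv", env)
print(default_kv)
```

Passing the environment as a parameter keeps the helper easy to check in a notebook cell without changing the real process environment.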

### Set function configuration 
Define the function kind and the base image; the cron trigger with a 5-minute interval is added at deployment time.<br>
For more details, see the [nuclio function configuration reference](https://github.com/nuclio/nuclio/blob/master/docs/reference/function-configuration/function-configuration-reference.md).

In [19]:
%%nuclio config 
kind = "nuclio"
spec.build.baseImage = "mlrun/ml-models"

%nuclio: setting kind to 'nuclio'
%nuclio: setting spec.build.baseImage to 'mlrun/ml-models'


### Install required packages
`%nuclio cmd` lets you run image build instructions and install packages.<br>
Note: the `-c` option installs the packages only in the Nuclio function image, not locally.

In [20]:
%%nuclio cmd -c
pip install lxml
pip install yfinance
pip install requests
pip install v3io_frames

In [54]:
!pip install pandas==1.2.3

Collecting pandas==1.2.3
  Downloading pandas-1.2.3-cp37-cp37m-manylinux1_x86_64.whl (9.9 MB)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.0.3
    Uninstalling pandas-1.0.3:
      Successfully uninstalled pandas-1.0.3
Successfully installed pandas-1.2.3


## Nuclio function implementation
This function can run in Jupyter or in Nuclio (real-time serverless).

In [6]:
# nuclio: start-code

In [13]:
import yfinance as yf
import os
import pandas as pd
import v3io_frames as v3f
import ast
import mlrun.feature_store as fs
import mlrun

In [14]:
def contruct_dataframe(all_records):
    temp_df = pd.DataFrame(all_records)
    # When a column type is timestamp, there can't be any duplicates in that column, so convert it to str
    temp_df.last_updated = temp_df.last_updated.astype("str")
    return temp_df

In [15]:
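def update_tickers(context, period, interval):
    all_records=[]
    stocks_df = pd.DataFrame()
    for sym in context.stock_syms:
        # use the interval passed by the caller instead of a hard-coded value
        hist = yf.Ticker(sym).history(period=period, interval=interval)
        # skip symbols for which no candles were returned
        if hist.empty:
            context.logger.info(f'No data received for {sym} from yfinance')
            continue
        time = hist.index[-1]
        record = hist.loc[time]
        last = context.last_trade_times.get(sym)
        context.logger.info(f'Received {sym} data from yfinance, including {len(hist)} candles ending at {time}')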
        # update the stocks table and TSDB metrics in case of new data 
        if not last or time > last:
            
            # update NoSQL table with stock data
            stock = {'symbol': sym, 'price': record['Close'], 'volume': record['Volume'], 'last_updated': time}
            all_records.append(stock)
            expr = context.expr_template.format(**stock)
            context.logger.debug_with('update expression', symbol=sym, expr=expr)
            context.v3c.execute('kv', context.stocks_kv_table, 'update', args={'key': sym, 'expression': expr})
         
            context.logger.info(f'Updated records from {last} to {time}')
            # collect price and volume metrics for the time-series DB (pandas dataframe indexed by date and symbol)
            context.last_trade_times[sym] = time
            hist['symbol'] = sym
            hist = hist.reset_index()
            hist = hist.set_index(['Datetime', 'symbol'])
            hist = hist.loc[:, ['Close', 'Volume']]
            hist = hist.rename(columns={'Close': 'price', 'Volume': 'volume'})
            # pd.concat replaces DataFrame.append, which is deprecated in newer pandas releases
            stocks_df = pd.concat([stocks_df, hist])
            context.logger.info(f'Added {hist.shape[0]} records for {sym} to history')
        else:
            context.logger.info(f'No update was made, current TS: {last} vs. new data {time}')
    
    # infer the KV table schema
    context.v3c.execute('kv', table=context.stocks_kv_table, command='infer')
    # write to the feature store only if new records are available
    if all_records:
        stock_info = contruct_dataframe(all_records)
        context.logger.info(f"Writing new dataframe with shape {stock_info.shape} to feature store")
        fs.ingest(context.stock_info_feature_set, stock_info, infer_options=fs.InferOptions.default())
        
    # write price and volume metrics to the time-series DB
    if stocks_df.shape[0]>0:
        stocks_df = stocks_df.sort_index(level=0)
        context.logger.debug_with('writing data to TSDB', stocks=stocks_df)
        stocks_df.to_csv('history.csv')
        context.v3c.write(backend='tsdb', table=context.stocks_tsdb_table, dfs=stocks_df)
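
The two per-symbol decisions in `update_tickers` (whether the latest candle is new, and how the NoSQL update expression is built) can be tried in isolation, without yfinance or a V3IO client. The following is a minimal sketch that uses only the standard library. `is_new_candle` and `build_update_expr` are hypothetical helper names that do not appear in the notebook, the sample record is made up, and the template string is the default that `init_context` uses when `EXPRESSION_TEMPLATE` is not set.

```python
from datetime import datetime, timezone

# Default template from init_context (used when EXPRESSION_TEMPLATE is not set).
EXPR_TEMPLATE = "symbol='{symbol}';price={price};volume={volume};last_updated='{last_updated}'"


def is_new_candle(last, current):
    """Return True when no candle was seen yet or `current` is newer than `last`."""
    return last is None or current > last


def build_update_expr(template, record):
    """Fill the update-expression template with the fields of one stock record."""
    return template.format(**record)


# Made-up sample data for illustration only.
last_trade_times = {}
candle_time = datetime(2021, 3, 25, 11, 0, tzinfo=timezone.utc)
record = {"symbol": "MSFT", "price": 232.5, "volume": 1200, "last_updated": candle_time}

exprs = []
for _ in range(2):  # the second pass sees the same timestamp and is skipped
    if is_new_candle(last_trade_times.get("MSFT"), candle_time):
        exprs.append(build_update_expr(EXPR_TEMPLATE, record))
        last_trade_times["MSFT"] = candle_time

print(exprs)
```

Because the last trade time is stored per symbol, invoking the function again before a new candle arrives produces no duplicate update for that symbol.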

In [16]:
def init_context(context):
    context.logger.info("init stocks reader context")
    setattr(context, 'PROJECT_NAME', os.getenv('PROJECT_NAME', 'stocks-' + os.getenv('V3IO_USERNAME')))
    mlrun.set_environment(project = context.PROJECT_NAME)
    # Setup V3IO Client
    setattr(context,"V3IO_FRAMESD", os.getenv("V3IO_FRAMESD",'framesd:8081'))
    client = v3f.Client(context.V3IO_FRAMESD, container=os.getenv('V3IO_CONTAINER', 'users'))
    setattr(context, 'v3c', client)
    
    # Create V3IO Tables and add reference to context
    setattr(context, 'stocks_kv_table', os.getenv('STOCKS_KV_TABLE', os.getenv('V3IO_USERNAME') + '/stocks/stocks_kv'))
    setattr(context, 'stocks_tsdb_table', os.getenv('STOCKS_TSDB_TABLE', os.getenv('V3IO_USERNAME') + '/stocks/stocks_tsdb'))
    context.v3c.create(backend='tsdb', table=context.stocks_tsdb_table, rate='1/m', if_exists=1)
    
    # Supply the feature set to ingest data to.
    stocks_info_set = fs.FeatureSet("stocks", entities=[fs.Entity("symbol")])
    setattr(context,'stock_info_feature_set',stocks_info_set)
    
    # Adding aggregations
    context.stock_info_feature_set.add_aggregation("prices","price",["min","max"],["1h"],"10m")
    context.stock_info_feature_set.add_aggregation("volumes","volume",["min","max"],["1h"],"10m")
    
    # Initialize the feature set with dummy data that will be overwritten later on
    stock_dummy = pd.DataFrame({"symbol":['GOOGL','MSFT','AMZN','AAPL','INTC'],"price":[0,0,0,0,0],"volume":[0,0,0,0,0],"last_updated":[0,0,0,0,0]})
    fs.ingest(context.stock_info_feature_set, stock_dummy, infer_options=fs.InferOptions.default())
    
    stocks = os.getenv('STOCK_LIST','GOOGL,MSFT,AMZN,AAPL,INTC')
    if stocks.startswith('['):
        stock_syms = ast.literal_eval(stocks)
    else: 
        stock_syms = stocks.split(',')
    setattr(context, 'stock_syms', stock_syms)
    

    # v3io update expression template 
    expr_template = os.getenv('EXPRESSION_TEMPLATE', "symbol='{symbol}';price={price};volume={volume};last_updated='{last_updated}'")
    setattr(context, 'expr_template', expr_template)

    last_trade_times = {}
    setattr(context, 'last_trade_times', last_trade_times)
    
    # Run the initial data preparation
    update_tickers(context, '7d', '1m')
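
`init_context` accepts `STOCK_LIST` in two forms: a Python list literal (which is presumably what the function receives when a list is passed to `set_envs` at deployment) or a comma-separated string. The following is a minimal sketch of that parsing as a standalone function. `parse_stock_list` is a hypothetical name, and the whitespace stripping and empty-entry filtering are additions for illustration that the notebook code does not perform.

```python
import ast


def parse_stock_list(raw):
    """Parse STOCK_LIST given either as a list literal or as comma-separated text.

    Hypothetical helper; the notebook inlines the same branch in init_context.
    """
    raw = raw.strip()
    if raw.startswith("["):
        # literal_eval only evaluates Python literals, so it is safe for untrusted input
        symbols = ast.literal_eval(raw)
    else:
        symbols = raw.split(",")
    # Addition for illustration: normalize whitespace and drop empty entries.
    return [str(s).strip() for s in symbols if str(s).strip()]


from_csv = parse_stock_list("GOOGL,MSFT, AMZN")
from_literal = parse_stock_list("['GOOG', 'MSFT']")
print(from_csv, from_literal)
```

Using `ast.literal_eval` rather than `eval` keeps the environment variable from executing arbitrary code.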
    

In [17]:
def handler(context, event):
    update_tickers(context, '5m', '1m')
    return 'done'

In [18]:
# nuclio: end-code

## Function invocation
### Local test
The following section simulates a Nuclio function invocation and emits the function results.

In [None]:
# create a test event and invoke the function locally 
init_context(context)
event = nuclio.Event(body='')
handler(context, event)
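
If a `context` object is not available in the notebook session, a stand-in can be built for local experiments. The following is a minimal sketch, assuming the function code needs only a `logger` with `info` and `debug_with` methods plus the ability to attach attributes. `make_stub_context` is a hypothetical helper, not part of nuclio or of this notebook, and it does not reproduce the real Nuclio context.

```python
import logging
from types import SimpleNamespace


def make_stub_context(name="stocks-reader-test"):
    """Build a bare object with a `logger` attribute, standing in for the nuclio context.

    Hypothetical helper for local experiments only.
    """
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(name)
    # The function code calls logger.debug_with(message, **fields), which the standard
    # logging module does not provide, so map it to a plain debug call.
    logger.debug_with = lambda msg, **fields: logger.debug("%s %s", msg, fields)
    return SimpleNamespace(logger=logger)


stub = make_stub_context()
stub.stock_syms = ["MSFT"]  # attributes can be attached freely, as init_context does
stub.logger.info("stub context ready")
```

The stub could then be passed to `init_context` and `handler` in place of `context`, provided the V3IO and MLRun services that those functions call are reachable.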

## Deploy to cluster

In [7]:
from mlrun import code_to_function
project_name = "stocks-" + os.getenv('V3IO_USERNAME')
# Export bare function
fn = code_to_function('read-stocks',
                      handler='handler')
fn.export('01-read-stocks.yaml')

# Set parameters for current deployment
fn.add_trigger('cron', nuclio.triggers.CronTrigger('300s'))
fn.set_envs({'STOCK_LIST': ['GOOG', 'MSFT', 'AMZN', 'AAPL', 'INTC'],
             'V3IO_CONTAINER': 'users' ,
             'STOCKS_TSDB_TABLE': os.getenv('V3IO_USERNAME')  + '/stocks/stocks_tsdb',
             'STOCKS_KV_TABLE': os.getenv('V3IO_USERNAME')  + '/stocks/stocks_kv',
             'EXPRESSION_TEMPLATE': "symbol='{symbol}';price={price};volume={volume};last_updated='{last_updated}';sentiment='NI';last_reaction='NI'",
             'PROJECT_NAME' : project_name})
fn.spec.max_replicas = 1

> 2021-03-25 11:00:46,655 [info] function spec saved to path: 01-read-stocks.yaml


In [8]:
addr = fn.deploy(project=project_name)

> 2021-03-25 11:00:56,616 [info] Starting remote function deploy
2021-03-25 11:00:56  (info) Deploying function
2021-03-25 11:00:56  (info) Building
2021-03-25 11:00:57  (info) Staging files and preparing base images
2021-03-25 11:00:57  (info) Building processor image
2021-03-25 11:00:58  (info) Build complete
2021-03-25 11:01:21  (info) Function deploy complete
> 2021-03-25 11:01:21,724 [info] function deployed, address=default-tenant.app.dev8.lab.iguazeng.com:31840


In [10]:
!curl {addr}

done