# Stocks news ingestion

There are two ways to run the stocks demo, with or without a Kubeflow pipeline:
1. Without a pipeline: run the notebooks `01_ingest_news.ipynb`, `02_ingest_stocks.ipynb`, `03_model_training.ipynb`, `04_model_serving.ipynb`, and `06_grafana_view`.
2. With a pipeline: run `01_ingest_news.ipynb`, `02_ingest_stocks.ipynb`, and `05_stocks_pipeline`.

> <b> Steps </b>
> * [Project creation and prerequisites](#Project-creation-and-prerequisites)
> * [Deploying sentiment analysis serving function from the function marketplace](#Deploying-sentiment-analysis-serving-function-from-the-function-marketplace)
> * [Creating a feature set and declaring the graph](#Creating-a-feature-set-and-declaring-the-graph)
> * [Dummy ingestion, Deploying ingestion service and getting ingestion endpoint](#Dummy-ingestion,-Deploying-ingestion-service-and-getting-ingestion-endpoint)
> * [Testing ingestion service](#Testing-ingestion-service)
> * [Creating scheduled mlrun job to invoke our function every time delta](#Creating-scheduled-mlrun-job-to-invoke-our-function-every-time-delta)

## Project creation and prerequisites

In [1]:
# install prerequisites
# this notebook requires two packages, yfinance and yahoo_fin, for retrieving stock data
import importlib.util
import IPython

def install_missing_packages(notebook_packages):
    install_flag = False
    for package in notebook_packages:
        spec = importlib.util.find_spec(package)
        if spec is None:
            %pip install {package}
            install_flag = True
        else:
            print("package {} installed".format(package))
    # restart the kernel once, after all missing packages are installed,
    # so that the newly installed packages can be imported
    if install_flag:
        print("restarting kernel due to package install")
        IPython.Application.instance().kernel.do_shutdown(True)

packages = ['yfinance', 'yahoo_fin']
install_missing_packages(packages)

package yfinance installed
package yahoo_fin installed
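The install check above depends on `importlib.util.find_spec`, which returns `None` when no module with the given name can be imported. The sketch below isolates that logic without IPython. The helper name `find_missing_packages` is hypothetical. It checks *module* names. For `yfinance` and `yahoo_fin` these happen to match the pip package names, but in general they can differ.

```python
import importlib.util

def find_missing_packages(module_names):
    """Return the module names that cannot be imported in the current environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when no top-level module with this name is importable
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

print(find_missing_packages(["json", "surely_not_installed_pkg_123"]))
```

This approach avoids reinstalling packages that are already present. Only the returned names are passed to `%pip install`, and the kernel restarts once at the end.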


In [3]:
import mlrun
project = mlrun.get_or_create_project(name='stocks', user_project=True, context="src/")

> 2023-02-09 11:04:44,738 [info] loaded project moshe from MLRun DB


In [4]:
NUMBER_OF_STOCKS = 3

## Creating a feature set and declaring the graph

In [7]:
import mlrun.feature_store as fstore

# creating feature set
news_set = fstore.FeatureSet("news",
                             entities=[fstore.Entity("ticker")],
                             timestamp_key='Datetime',
                             description="stocks news feature set")

# setting up the graph
news_set.graph \
    .to(name='get_news', handler='get_news') \
    .to("storey.steps.Flatten", name="flatten_news") \
    .to(name='wrap_event', handler='wrap_event') \
    .to("HuggingSentimentAnalysis", handler="get_sentiment", full_event=True)

news_set.set_targets(with_defaults=True)
# optional: visualize the graph
# news_set.plot(rankdir="LR", with_targets=True)
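The graph chains four steps in order:

* `get_news`, which presumably fetches articles.
* `storey.steps.Flatten`, which splits an event containing a list into one event per element.
* `wrap_event`.
* The `get_sentiment` handler of `HuggingSentimentAnalysis`.

The real step code lives in `src/news.py`, which is not shown here. The plain-Python sketch below therefore uses hypothetical stand-ins (`fake_get_news`, `fake_wrap_event`, `fake_sentiment`) only to illustrate how one incoming event fans out into several scored events. It is not the storey or MLRun API.

```python
from typing import Iterator

def fake_get_news(event: dict) -> list:
    # Hypothetical stand-in: return a list of articles for the requested ticker
    return [{"ticker": event["ticker"], "title": f"headline {i}"} for i in range(event["n_items"])]

def flatten(items: list) -> Iterator[dict]:
    # Same idea as a flatten step: one list-valued event becomes several events
    for item in items:
        yield item

def fake_wrap_event(item: dict) -> dict:
    # Hypothetical stand-in: copy the item and add a derived field
    return {**item, "summary": item["title"].upper()}

def fake_sentiment(item: dict) -> dict:
    # Hypothetical stand-in: a toy rule instead of a real sentiment model
    return {**item, "prediction": [1 if "1" in item["title"] else 0]}

def run_graph(event: dict) -> list:
    """Push one event through the four stages in the same order as news_set.graph."""
    return [fake_sentiment(fake_wrap_event(e)) for e in flatten(fake_get_news(event))]

for row in run_graph({"ticker": "A", "n_items": 3}):
    print(row)
```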

## Dummy ingestion, Deploying ingestion service and getting ingestion endpoint

In [None]:
# ingest a dummy event first (required)
import os
import datetime
import pandas as pd
# because we're ingesting locally, the graph step code must be importable here
from src.news import *

name = os.environ['V3IO_USERNAME']
now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')

fstore.ingest(news_set,
              pd.DataFrame.from_dict({'ticker': [name],
                                      'Datetime': now,
                                      'n_stocks': NUMBER_OF_STOCKS}),
              overwrite=True)

2023-02-09 11:04:45.188514: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
2023-02-09 11:04:51.232808: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were in

prediction: [{'label': 'NEGATIVE', 'score': 0.9396888613700867}]
{'ticker': 'A', 'Datetime': '2023-02-08 23:15:11', 'published': '2023-02-08 23:15:11', 'summary': 'Agilent Technologies A closed at 15442 in the latest trading session marking a 006 move from the prior day', 'title': 'Agilent Technologies A Stock Moves 006 What You Should Know', 'prediction': [0]}
prediction: [{'label': 'POSITIVE', 'score': 0.9984098672866821}]
{'ticker': 'A', 'Datetime': '2023-02-08 14:18:02', 'published': '2023-02-08 14:18:02', 'summary': 'PayPals PYPL fourthquarter results are expected to reflect gains from strength across its robust product and services portfolio', 'title': 'PayPal PYPL Q4 Earnings to Benefit From Portfolio Strength', 'prediction': [1]}
prediction: [{'label': 'POSITIVE', 'score': 0.998507559299469}]
{'ticker': 'A', 'Datetime': '2023-02-08 13:32:01', 'published': '2023-02-08 13:32:01', 'summary': 'Jack Henrys JKHY secondquarter fiscal 2023 results benefit from strength across the Compl

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are 
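The printed output of the dummy ingestion shows a pattern. A `NEGATIVE` label from the Hugging Face pipeline appears alongside `'prediction': [0]`, and `POSITIVE` alongside `'prediction': [1]`. The sketch below assumes that this is the mapping `get_sentiment` applies; the code in `src/news.py` is not shown here. The mapping is inferred from the output, and the function name `labels_to_predictions` is hypothetical.

```python
# Hypothetical mapping inferred from the printed output, not taken from src/news.py
LABEL_TO_CLASS = {"NEGATIVE": 0, "POSITIVE": 1}

def labels_to_predictions(pipeline_output: list) -> list:
    """Convert [{'label': ..., 'score': ...}, ...] into a list of 0/1 classes."""
    return [LABEL_TO_CLASS[result["label"]] for result in pipeline_output]

print(labels_to_predictions([{"label": "NEGATIVE", "score": 0.9396888613700867}]))  # [0]
print(labels_to_predictions([{"label": "POSITIVE", "score": 0.9984098672866821}]))  # [1]
```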

In [None]:
# Deploying ingestion service
# Define an HTTP source to enable the HTTP trigger on the function and expose its endpoint.
# The HTTP source can also declare the key and timestamp fields; this is not needed here
# because the actual data is not sent through the HTTP request.
http_source = mlrun.datastore.sources.HttpSource()
news_set.spec.source = http_source

# Wrap src/news.py as an MLRun serving function to deploy the ingestion pipeline on;
# the serving runtime enables deployment of the feature set's computational graph
function = mlrun.code_to_function(name='get_news',
                                  kind='serving',
                                  image='mlrun/mlrun',
                                  requirements=['yahoo_fin', 'graphviz'],
                                  filename='src/news.py')

# allow up to 3600 seconds for the deployed function to become ready
function.spec.readiness_timeout = 3600

run_config = fstore.RunConfig(function=function, local=False).apply(mlrun.mount_v3io())

# Deploying
news_set_endpoint = fstore.deploy_ingestion_service(featureset=news_set, run_config=run_config)

## Testing ingestion service

In [None]:
import requests

now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')

t = requests.post(news_set_endpoint, json={'ticker': ['news'],
                                           'Datetime': now,
                                           'n_stocks': NUMBER_OF_STOCKS})
t.text
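The test call above uses `requests`. To avoid that dependency, you can build a roughly equivalent JSON POST with the standard library's `urllib.request`. The helper name `build_ingest_request` and the example URL are hypothetical. The sketch only constructs the request so it can be inspected; sending it with `urllib.request.urlopen` requires a reachable endpoint.

```python
import datetime
import json
import urllib.request

def build_ingest_request(endpoint: str, tickers: list, n_stocks: int) -> urllib.request.Request:
    """Build a JSON POST request with the same body shape as the test cell."""
    body = {
        "ticker": tickers,
        "Datetime": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "n_stocks": n_stocks,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ingest_request("http://example.invalid/ingest", ["news"], 3)
print(req.get_method(), req.full_url, json.loads(req.data)["n_stocks"])
# To actually send it (requires a reachable endpoint):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode())
```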

## Creating scheduled mlrun job to invoke our function every time delta

In [None]:
import datetime

now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
body = {'ticker':['news'],
        'Datetime': now,
        'n_stocks':4}

# '0 */1 * * *' is a cron expression that triggers the job at minute 0 of every hour;
# use '0 8 * * *' instead to trigger it every day at 08:00
fn = mlrun.code_to_function(name='ingestion_service_news',
                            kind='job',
                            image='mlrun/mlrun',
                            handler='ingestion_service_invoker',
                            filename='src/invoker.py')
fn.run(params={'endpoint': news_set_endpoint, 'body': body}, schedule='0 */1 * * *')
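The schedule is a standard five-field cron expression: minute, hour, day of month, month, day of week. The sketch below handles only the minute and hour fields. It supports `*`, `*/n` and plain numbers, and assumes the other three fields are `*`. It shows why `'0 */1 * * *'` fires at the top of every hour while `'0 8 * * *'` fires once a day at 08:00. The helper names are hypothetical, and this is not how MLRun evaluates schedules internally.

```python
import datetime

def field_matches(field: str, value: int) -> bool:
    # Supports '*', '*/n' (every n units, counting from 0) and a single number
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr: str, when: datetime.datetime) -> bool:
    """Check only the minute and hour fields; the remaining fields must be '*'."""
    minute, hour, dom, month, dow = expr.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only supports '*' for day, month and weekday")
    return field_matches(minute, when.minute) and field_matches(hour, when.hour)

for t in (datetime.datetime(2023, 2, 9, 8, 0), datetime.datetime(2023, 2, 9, 9, 0)):
    print(t.time(), cron_matches("0 */1 * * *", t), cron_matches("0 8 * * *", t))
```

The `body` dictionary is built once, when the cell runs. Every scheduled run therefore receives the same `Datetime` value unless `ingestion_service_invoker` replaces it.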

In [None]:
# Delete the scheduled job
mlrun.get_run_db().delete_schedule(project.name,'ingestion-service-news-ingestion_service_invoker')
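The schedule name passed to `delete_schedule` appears to combine two parts. The first is the function name, lowercased with underscores replaced by hyphens. The second is the unchanged handler name. This convention is inferred from the name used in this notebook, not from MLRun documentation. Assuming it holds, a hypothetical helper can derive the name instead of hard-coding the string.

```python
def schedule_name(function_name: str, handler: str) -> str:
    # Assumption: the function name is normalized (lowercase, '_' -> '-')
    # and the handler name is appended unchanged
    normalized = function_name.lower().replace("_", "-")
    return f"{normalized}-{handler}"

print(schedule_name("ingestion_service_news", "ingestion_service_invoker"))
# ingestion-service-news-ingestion_service_invoker
```

Before deleting, it is safer to check the actual name against the project's schedules, for example with `mlrun.get_run_db().list_schedules(project.name)`.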