# Sending data to an Amazon Kinesis Stream

In this notebook, we send real-time trade data from Finnhub.io into an Amazon Kinesis stream.

Run the following cell to make sure the `websocket-client` package (imported as `websocket`) is installed in the notebook's environment.

In [1]:
%pip install websocket-client



In [2]:
import boto3
from datetime import datetime
import json
import websocket
import random

### Parameters
Update the parameters below as needed, then run the cell. You must replace the API key with your own, because the one shown here is not valid.

In [3]:
api = 'cdgtatqad3i2r375ghfgcdgtatqad3i2r375ghg0' # *********ENTER YOUR OWN FINNHUB API KEY HERE***************
region_name = 'us-east-1' # AWS region where the stream is created; make sure it matches the region you are working in
stream_name = 'pyspark-kinesis-test'
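Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As an optional alternative, the key could be read from an environment variable. This is a minimal sketch; the variable name `FINNHUB_API_KEY` and the helper `load_api_key` are hypothetical choices, not something Finnhub or this lab defines.

```python
import os


def load_api_key(var_name="FINNHUB_API_KEY"):
    """Read the Finnhub API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of connecting with an empty token
        raise RuntimeError(f"Set the {var_name} environment variable to your Finnhub API key")
    return key
```

With this helper, `api = load_api_key()` would replace the hard-coded assignment above.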

## Creating a stream

In the following cells we create a stream on Amazon Kinesis. First, we connect to the Kinesis service through the boto3 package. In the next cell, we create the stream with a single shard. A shard is the unit of throughput capacity of a Kinesis stream: each shard accepts writes of up to 1,000 records per second (up to 1 MB per second in total) and serves reads of up to 2 MB per second. **Leave the code as is and execute the following two cells.**

In [4]:
client = boto3.client('kinesis', region_name = region_name)

In [5]:
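try:
    client.create_stream(
        StreamName=stream_name,
        ShardCount=1)
except client.exceptions.ResourceInUseException: # skip the error if a stream with this name already exists
    pass

# create_stream returns while the stream is still CREATING;
# block until it is ACTIVE so that later put_record calls succeed
client.get_waiter('stream_exists').wait(StreamName=stream_name)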
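The stream above is created with one shard. As a rough sizing aid, the per-shard limits can be turned into a shard count. This sketch assumes provisioned mode, per-shard limits of 1,000 records/s and 1 MB/s for writes and 2 MB/s for reads (1 MB treated as 10^6 bytes, which errs on the conservative side), and shared-throughput consumers that each read the whole stream (not enhanced fan-out). `shards_needed` is a hypothetical helper, not a boto3 call.

```python
import math

# Per-shard limits for a provisioned Kinesis stream (1 MB treated as 10**6 bytes)
MAX_WRITE_RECORDS_PER_SEC = 1_000
MAX_WRITE_BYTES_PER_SEC = 1_000_000
MAX_READ_BYTES_PER_SEC = 2_000_000


def shards_needed(records_per_sec: float, avg_record_bytes: float, consumers: int = 1) -> int:
    """Estimate the minimum shard count for a given ingest rate (hypothetical helper)."""
    bytes_in_per_sec = records_per_sec * avg_record_bytes
    # Writes are capped both by record count and by bytes
    by_record_count = math.ceil(records_per_sec / MAX_WRITE_RECORDS_PER_SEC)
    by_write_bytes = math.ceil(bytes_in_per_sec / MAX_WRITE_BYTES_PER_SEC)
    # Shared-throughput consumers that each read everything split the 2 MB/s read limit
    by_read_bytes = math.ceil(bytes_in_per_sec * consumers / MAX_READ_BYTES_PER_SEC)
    # A stream always has at least one shard
    return max(1, by_record_count, by_write_bytes, by_read_bytes)


print(shards_needed(300, 150))    # → 1  (45 KB/s of small records)
print(shards_needed(2_500, 100))  # → 3  (the record-count limit dominates)
```

The record-count limit tends to dominate for small records such as individual trades, while the byte limits dominate for large payloads.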

In the following cell we define the functions that process the messages from finnhub.io and "put" each trade record into the Amazon Kinesis stream. The code specifies which subscription message to send when the websocket opens, how to handle each incoming message (and what to do if there is an error), and what to do when the websocket closes. **Leave the code as is and execute the following cell.**

In [6]:
def process_trade_record(dict_in):
    """ Process an individual dictionary record """
    dict_out = {}
    
    try:
        dict_out['symbol'] = dict_in['s']
        dict_out['price_last'] = dict_in['p']
        dict_out['trade_dt'] = datetime.utcfromtimestamp(dict_in['t']/1000).strftime("%m/%d/%Y, %H:%M:%S.%f")
        dict_out['volume'] = dict_in['v']
        dict_out['conditions'] = dict_in['c']
    except Exception:  # a required field is missing or malformed
        print("malformed entry")
        return None
    return dict_out

def process_message(message):
    """ Process batch of data from finnhub """
    
    # convert string to json
    message = json.loads(message)
    
    # check type of message coming in
    if message['type'] == 'ping':
        return  # keep-alive message from finnhub; it carries no trade data
    if message['type'] != 'trade':
        print(f'new message type: {message}')
        raise TypeError("System does not know how to deal with this type")
    
    # process each record, dropping malformed ones
    data_list = [process_trade_record(data_i) for data_i in message['data']]
    data_list = [x for x in data_list if x is not None]
    
    # push each record to the AWS Kinesis stream (one PutRecord call per trade)
    for x in data_list:
        client.put_record(
            StreamName=stream_name,
            Data=json.dumps(x),
            PartitionKey='part_key')
    

def on_message(ws, message):
    """ Function on each websocket message """
    process_message(message)
    # print roughly 10% of the raw messages to keep the output readable
    if random.random() < 0.1:
        print(message)

def on_error(wsapp, err):
    """ Function on websocket error """
    print("Websocket error: ", err)

def on_close(ws, close_status_code, close_msg):
    """ Function on websdocket close """
    print("on_close args:")
    if close_status_code or close_msg:
        print("close status code: " + str(close_status_code))
        print("close message: " + str(close_msg))

def on_open(ws):
    """ Sending requests to the finnhub websocket """
    ws.send('{"type":"subscribe","symbol":"BINANCE:BTCUSDT"}')

Running this cell opens the websocket connection to `finnhub.io` and keeps it open, receiving data until we stop it. Once the cell starts, you will see new messages from `finnhub.io`. Each message contains a batch of Bitcoin trades (the `BINANCE:BTCUSDT` symbol)!

In [None]:
websocket.enableTrace(False)
ws = websocket.WebSocketApp(f"wss://ws.finnhub.io?token={api}",
                          on_message = on_message,
                          on_error = on_error,
                          on_close = on_close)
ws.on_open = on_open
ws.run_forever()

{"data":[{"c":null,"p":20870.18,"s":"BINANCE:BTCUSDT","t":1667798980969,"v":0.00072},{"c":null,"p":20870.32,"s":"BINANCE:BTCUSDT","t":1667798980980,"v":0.00242},{"c":null,"p":20870.32,"s":"BINANCE:BTCUSDT","t":1667798980981,"v":0.00758},{"c":null,"p":20870.18,"s":"BINANCE:BTCUSDT","t":1667798981169,"v":0.00167},{"c":null,"p":20870.18,"s":"BINANCE:BTCUSDT","t":1667798981169,"v":0.00239},{"c":null,"p":20870.17,"s":"BINANCE:BTCUSDT","t":1667798981169,"v":0.00021},{"c":null,"p":20870.17,"s":"BINANCE:BTCUSDT","t":1667798981273,"v":0.00539},{"c":null,"p":20870,"s":"BINANCE:BTCUSDT","t":1667798981584,"v":0.00131},{"c":null,"p":20870,"s":"BINANCE:BTCUSDT","t":1667798981584,"v":0.00021},{"c":null,"p":20870.05,"s":"BINANCE:BTCUSDT","t":1667798981828,"v":0.00608}],"type":"trade"}
{"data":[{"c":null,"p":20870.06,"s":"BINANCE:BTCUSDT","t":1667798981915,"v":0.00117},{"c":null,"p":20870.06,"s":"BINANCE:BTCUSDT","t":1667798982325,"v":0.00122},{"c":null,"p":20870.06,"s":"BINANCE:BTCUSDT","t":1667798982

Leave the cell above running. Its prompt shows `[*]` while it runs, and it continues to print newly received messages. This is the data going into your Amazon Kinesis stream.
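The cell above sends every trade with its own `put_record` call. Kinesis also has a batch call, `put_records`, which takes up to 500 records per request and reports rejected records through `FailedRecordCount` in the response instead of raising an error. The sketch below shows that alternative; it is not what this notebook runs. It keeps the record fields of `process_trade_record`, uses the trade symbol as the partition key instead of the constant `'part_key'`, only counts failed records rather than retrying them, and does not check the per-request size limit. `to_kinesis_entries`, `chunked`, and `send_batches` are hypothetical names.

```python
import json
from datetime import datetime, timezone

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_kinesis_entries(raw_message):
    """Convert one raw Finnhub websocket message into PutRecords entries (hypothetical helper)."""
    message = json.loads(raw_message)
    if message.get("type") != "trade":
        return []  # pings and other control messages carry no trades
    entries = []
    for trade in message.get("data", []):
        try:
            # Same string format as process_trade_record, built from a UTC-aware datetime
            trade_dt = datetime.fromtimestamp(trade["t"] / 1000, tz=timezone.utc)
            record = {
                "symbol": trade["s"],
                "price_last": trade["p"],
                "trade_dt": trade_dt.strftime("%m/%d/%Y, %H:%M:%S.%f"),
                "volume": trade["v"],
                "conditions": trade["c"],
            }
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed trades, as process_trade_record does
        entries.append({
            "Data": json.dumps(record).encode("utf-8"),
            "PartitionKey": record["symbol"],
        })
    return entries


def chunked(entries, size=MAX_RECORDS_PER_CALL):
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def send_batches(kinesis_client, stream, entries):
    """Send entries with PutRecords and return how many records Kinesis rejected."""
    failed = 0
    for batch in chunked(entries):
        response = kinesis_client.put_records(StreamName=stream, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

To try it, `on_message` could call `send_batches(client, stream_name, to_kinesis_entries(message))` instead of `process_message(message)`. Since the websocket cell blocks the kernel, it would have to be stopped and rerun after the change.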

**Go to the consumer notebook to start processing the data!**

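# AFTER COMPLETING THE LAB, MAKE SURE YOU DELETE YOUR KINESIS STREAM

Because `ws.run_forever()` blocks the kernel, first stop the websocket cell above by interrupting the kernel, then run the following cell.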

In [None]:
response = client.delete_stream(StreamName=stream_name)
print(f"Delete Success?... {response['ResponseMetadata']['HTTPStatusCode'] == 200}\n\n{response}")