# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span>

<span style="font-weight:bold; font-size: 1.4rem;">This notebook creates a data stream using the internal Kafka of Hopsworks.</span>

## 🗒️ This notebook is divided into the following sections:

1. Creating Simulated Data
2. Creating Kafka Topic and Schema in Hopsworks Feature Store
3. Sending Data to Kafka

## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
!pip install faker --quiet


In [10]:
import pandas as pd
from synthetic_data import synthetic_data
from confluent_kafka import Producer

## <span style="color:#ff5f27;"> ✏️ Creating Simulated Data </span>

In [11]:
data_simulator = synthetic_data.synthetic_data()

credit_cards, trans_df = data_simulator.create_simulated_transactions()

## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [12]:
import hopsworks

project = hopsworks.login()

kafka_api = project.get_kafka_api()

Connection closed.
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://staging.cloud.hopsworks.ai/p/124


## <span style="color:#ff5f27;"> ⚙️ Kafka Topic and Schema Creation </span>

In [13]:
# Names of the Kafka topic and of the Avro schema attached to it
KAFKA_TOPIC_NAME = "transactions_topic"
SCHEMA_NAME = "transactions_schema"

In [14]:
schema = {
    "type": "record",
    "name": SCHEMA_NAME,
    "namespace": "io.hops.examples.pyspark.example",
    "fields": [
        {
            "name": "tid",
            "type": [
                "null",
                "string"
            ]
        },
        {
            "name": "datetime",
            "type": [
                "null",
                {
                    "type": "long",
                    "logicalType": "timestamp-micros"
                }
            ]
        },
        {
            "name": "cc_num",
            "type": [
                "null",
                "long"
            ]
        },
        {
            "name": "category",
            "type": [
                "null",
                "string"
            ]
        },
        {
            "name": "amount",
            "type": [
                "null",
                "double"
            ]
        },
        {
            "name": "latitude",
            "type": [
                "null",
                "double"
            ]
        },
        {
            "name": "longitude",
            "type": [
                "null",
                "double"
            ]
        },
        {
            "name": "city",
            "type": [
                "null",
                "string"
            ]
        },
        {
            "name": "country",
            "type": [
                "null",
                "string"
            ]
        },
        {
            "name": "fraud_label",
            "type": [
                "null",
                "string"
            ]
        },
    ]
}
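
Every field in the schema above is a `["null", T]` union, so each value may be missing. As a minimal sketch (not part of the Hopsworks API), the helpers below summarize the non-null type of each field and list which fields a record lacks. The names `summarize_fields`, `missing_fields`, and `example_schema` are hypothetical, and the example schema is a shortened copy of the one above.

```python
def summarize_fields(avro_schema):
    """Map each field name to its non-null type (or logical type, if declared)."""
    summary = {}
    for field in avro_schema["fields"]:
        field_type = field["type"]
        # Unwrap ["null", T] unions to the single non-null branch.
        if isinstance(field_type, list):
            non_null = [t for t in field_type if t != "null"]
            field_type = non_null[0] if len(non_null) == 1 else non_null
        # Prefer the logical type when the branch is an annotated primitive.
        if isinstance(field_type, dict):
            field_type = field_type.get("logicalType", field_type["type"])
        summary[field["name"]] = field_type
    return summary


def missing_fields(record, avro_schema):
    """Return the schema field names that are absent from a record dict."""
    return sorted(set(summarize_fields(avro_schema)) - set(record))


# Shortened, illustrative copy of the schema defined in this notebook.
example_schema = {
    "type": "record",
    "name": "transactions_schema",
    "fields": [
        {"name": "tid", "type": ["null", "string"]},
        {"name": "datetime",
         "type": ["null", {"type": "long", "logicalType": "timestamp-micros"}]},
        {"name": "amount", "type": ["null", "double"]},
    ],
}

print(summarize_fields(example_schema))
print(missing_fields({"tid": "abc"}, example_schema))
```

Running the same helpers on the full `schema` dictionary gives a quick overview of the ten fields before the topic is created.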

In [15]:
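# Create the schema and the topic only if the topic does not exist yet,
# so that re-running this cell is safe.
if KAFKA_TOPIC_NAME not in [topic.name for topic in kafka_api.get_topics()]:
    kafka_api.create_schema(SCHEMA_NAME, schema)
    # Bind the topic to version 1 of the schema created above
    kafka_api.create_topic(KAFKA_TOPIC_NAME, SCHEMA_NAME, 1, replicas=1, partitions=1)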

## <span style="color:#ff5f27;"> 📡 Sending Data using created Kafka Topic </span>

In [16]:
trans_df["tid"] = trans_df["tid"].astype("string")
trans_df["datetime"] = pd.to_datetime(trans_df["datetime"])
trans_df["cc_num"] = trans_df["cc_num"].astype("int64")
trans_df["category"] = trans_df["category"].astype("string")
trans_df["amount"] = trans_df["amount"].astype("double")
trans_df["latitude"] = trans_df["latitude"].astype("double")
trans_df["longitude"] = trans_df["longitude"].astype("double")
trans_df["city"] = trans_df["city"].astype("string")
trans_df["country"] = trans_df["country"].astype("string")
trans_df["fraud_label"] = trans_df["fraud_label"].astype("string")
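
The schema declares `datetime` with the Avro logical type `timestamp-micros`, which is a `long` counting microseconds since the Unix epoch (UTC). The sketch below shows that conversion using only the standard library. The helper name `to_timestamp_micros` is hypothetical, and treating naive datetimes as UTC is an assumption. It illustrates what the logical type means and is not a step this notebook performs.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_micros(dt):
    """Convert a datetime to microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division by a one-microsecond timedelta avoids float rounding.
    return (dt - EPOCH) // timedelta(microseconds=1)


print(to_timestamp_micros(datetime(2022, 1, 1, tzinfo=timezone.utc)))
```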

In [None]:
kafka_config = kafka_api.get_default_config()

producer = Producer(kafka_config)

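# Flush every 5,000 messages instead of after each one, then once more at the end
for count, (_, transaction) in enumerate(trans_df.iterrows(), start=1):
    producer.produce(KAFKA_TOPIC_NAME, transaction.to_json())
    if count % 5000 == 0:
        producer.flush()

producer.flush()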
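
The loop above sends messages without checking whether they were delivered. `confluent_kafka` can report the outcome of each message through a delivery callback that receives `(err, msg)`. The sketch below is one possible callback that only counts successes and failures. The names `delivery_stats` and `delivery_report` are hypothetical, and the block deliberately avoids importing `confluent_kafka` so it can run on its own.

```python
# Running totals updated by the callback (hypothetical helper state).
delivery_stats = {"delivered": 0, "failed": 0}


def delivery_report(err, msg):
    """Delivery callback: err is None on success, otherwise the failure reason."""
    if err is not None:
        delivery_stats["failed"] += 1
        print(f"Delivery failed: {err}")
    else:
        delivery_stats["delivered"] += 1


# Possible usage with the producer from this notebook (assumes the same
# variables as the cell above); callbacks fire during poll() or flush():
#
#   producer.produce(KAFKA_TOPIC_NAME, transaction.to_json(),
#                    on_delivery=delivery_report)
#   producer.flush()
#   print(delivery_stats)
```

Comparing `delivery_stats["delivered"]` with `len(trans_df)` after the final flush is a simple way to confirm that every row reached the topic.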

---
## <span style="color:#ff5f27;">⏭️ **Next:** Part 01: Feature Pipeline</span>

In the following notebook, you will use the Kafka stream created here to insert data into a Feature Group.