# Working with Kafka data

During a simulation, the producer and the marketplace continuously log sales and market activity to Kafka. This information is organised in topics. To estimate customer demand and predict good prices, merchants can use the SDK to access this data.

Apart from the raw CSV formatting, every data operation is left to the merchant logic.

## Install requirements

Get the `merchant_sdk` folder and Python >= 3.5, then install the dependencies using pip:

```
cd merchant_sdk
pip3 install -r requirements.txt
```
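After installing, a quick sanity check can confirm that the interpreter and the modules imported later in this notebook are available. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the module list (`pandas`, `numpy`) is taken from the imports used below, not from `requirements.txt`.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The SDK instructions ask for Python >= 3.5.
print('Python version ok:', sys.version_info >= (3, 5))
# Modules this notebook imports later on (assumed list, see lead-in).
print('Missing modules:', missing_modules(['pandas', 'numpy']))
```

An empty list means both modules can be imported.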

## Import SDK

In [1]:
import sys
sys.path.append('../../')

import merchant_sdk

## Init API

In [2]:
from merchant_sdk.api import KafkaApi, PricewarsRequester

In [3]:
kafka_endpoint = 'http://vm-mpws2016hp1-05.eaalab.hpi.uni-potsdam.de:8001'
kafka_api = KafkaApi(host=kafka_endpoint)

## Request topic

The CSV export consists of two steps. First, you request the CSV export for a topic. The Kafka Reverse Proxy then fetches all messages in that topic, formats them as CSV and writes them to disk (for huge logs, the server could run out of memory). On success, it returns a URL for the downloadable CSV file. Depending on how active the simulation is and how much data has been logged, this can take some time.

This URL can be passed directly to pandas to create a DataFrame, so pandas is imported first.

In [4]:
import numpy as np
import pandas as pd

In [5]:
csv_url = kafka_api.request_csv_export_for_topic('buyOffer')
csv_url

'http://vm-mpws2016hp1-05.eaalab.hpi.uni-potsdam.de:8001/data/buyOffer_1487359838.csv'
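Requesting the export and loading the returned URL can be wrapped in a small helper. This is a minimal sketch, not part of the SDK: `load_topic` is a hypothetical name, and the only SDK call it relies on is `request_csv_export_for_topic`, as used above. Raising an error when no URL comes back is an assumption about how a failed export would surface.

```python
import pandas as pd


def load_topic(api, topic):
    """Request a CSV export for `topic` and load it into a DataFrame.

    `api` is any object with a `request_csv_export_for_topic(topic)` method,
    for example the `kafka_api` instance created above.
    """
    csv_url = api.request_csv_export_for_topic(topic)
    if not csv_url:
        # Assumption: a failed export yields an empty or missing URL.
        raise RuntimeError('no csv export available for topic %r' % topic)
    df = pd.read_csv(csv_url)
    # The exports shown in this notebook carry an unnamed first column,
    # which pandas labels 'Unnamed: 0'; drop it if present.
    return df.drop(columns=['Unnamed: 0'], errors='ignore')
```

With the objects defined above, a call would look like `buy_offer_df = load_topic(kafka_api, 'buyOffer')`.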

In [6]:
buy_offer_df = pd.read_csv(csv_url)
buy_offer_df[:5]

| Unnamed: 0 | amount | consumer_id | http_code | left_in_stock | merchant_id | offer_id | price | product_id | quality | timestamp | uid |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ZJ7HCG6HO1OUqeaLl4FGM3Sj8bT0tsZYdAa/3dUkHVo= | 200 | 1 | lsP4d66epeRdGEIB51N3sRN3Gy0R0b8qK+4rxc/EYqM= | 7584 | 6.2 | 4 | 1 | 2017-02-15T07:54:37.534Z | 41 |
| 1 | 1 | ZJ7HCG6HO1OUqeaLl4FGM3Sj8bT0tsZYdAa/3dUkHVo= | 200 | 0 | hPjEe9kUnPadEcs0jO1HLUL5maZPb6umcWgcbCxHzdo= | 7578 | 30.0 | 3 | 1 | 2017-02-15T07:54:38.675Z | 31 |
| 2 | 1 | ZJ7HCG6HO1OUqeaLl4FGM3Sj8bT0tsZYdAa/3dUkHVo= | 200 | 1 | sN7jrROVR1hljMZ5OHSLG6cKTwAxKmqDO0OAtWql7Ms= | 7581 | 38.0 | 4 | 4 | 2017-02-15T07:54:42.731Z | 44 |
| 3 | 1 | ZJ7HCG6HO1OUqeaLl4FGM3Sj8bT0tsZYdAa/3dUkHVo= | 200 | 0 | Is16KAXSx7U7whXnqhRVcJD+JVAFleqAeNdQN4WQoV8= | 7600 | 44.0 | 1 | 2 | 2017-02-15T07:54:44.011Z | 12 |
| 4 | 1 | ZJ7HCG6HO1OUqeaLl4FGM3Sj8bT0tsZYdAa/3dUkHVo= | 200 | 0 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7572 | 24.29 | 2 | 3 | 2017-02-15T07:54:45.355Z | 23 |


In [7]:
len(buy_offer_df)

12605
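As a starting point for estimating demand from the `buyOffer` log, the following sketch sums sold units and revenue per product and quality and derives the average paid price. It runs on a few values copied from the output above rather than on the live export. `summarise_sales` is a hypothetical helper name, and treating rows with `http_code` 200 as successful purchases is an assumption.

```python
import pandas as pd


def summarise_sales(buy_offer_df):
    """Aggregate successful purchases per (product_id, quality)."""
    # Assumption: http_code 200 marks a purchase that went through.
    sold = buy_offer_df[buy_offer_df['http_code'] == 200].copy()
    sold['revenue'] = sold['amount'] * sold['price']
    summary = sold.groupby(['product_id', 'quality'], as_index=False)[
        ['amount', 'revenue']
    ].sum()
    summary = summary.rename(columns={'amount': 'units_sold'})
    summary['avg_price'] = summary['revenue'] / summary['units_sold']
    return summary


# Sample rows taken from the first five buyOffer entries shown above.
sample = pd.DataFrame({
    'amount': [1, 1, 1, 1, 1],
    'http_code': [200, 200, 200, 200, 200],
    'price': [6.2, 30.0, 38.0, 44.0, 24.29],
    'product_id': [4, 3, 4, 1, 2],
    'quality': [1, 1, 4, 2, 3],
})
print(summarise_sales(sample))
```

On the full export, the same call would be `summarise_sales(buy_offer_df)`.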

In [10]:
ms_df = pd.read_csv(kafka_api.request_csv_export_for_topic('marketSituation'))
print(len(ms_df))
ms_df[:5]

7


| Unnamed: 0 | amount | merchant_id | offer_id | price | prime | product_id | quality | shipping_time_prime | shipping_time_standard | timestamp | triggering_merchant_id | uid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7567 | 19.53 | True | 1 | 3 | 1 | 5 | 2017-02-14T17:23:56.542Z | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 13 |
| 1 | 1 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7568 | 9.05 | True | 1 | 3 | 1 | 5 | 2017-02-14T17:24:09.233Z | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 13 |
| 2 | 1 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7569 | 19.76 | True | 2 | 3 | 1 | 5 | 2017-02-14T17:27:10.116Z | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 23 |
| 3 | 2 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7568 | 18.42 | True | 1 | 3 | 1 | 5 | 2017-02-14T17:27:20.748Z | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 13 |
| 4 | 1 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7570 | 17.57 | True | 1 | 2 | 1 | 5 | 2017-02-14T17:27:26.648Z | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 12 |
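In the `marketSituation` output above, the same `offer_id` can appear more than once with different prices. One way to work with this, sketched below, is to parse the timestamps and keep only the most recent row per offer. `latest_offer_state` is a hypothetical helper name, and reading each row as "the state of an offer at that point in time" is an assumption about the topic's semantics. The sample values are copied from the rows above.

```python
import pandas as pd


def latest_offer_state(ms_df):
    """Return the most recent logged row for every offer_id."""
    df = ms_df.copy()
    # Timestamps arrive as ISO 8601 strings; parse them so that
    # sorting is chronological rather than lexicographic.
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df = df.sort_values('timestamp')
    return df.groupby('offer_id', as_index=False).last()


sample = pd.DataFrame({
    'offer_id': [7567, 7568, 7569, 7568, 7570],
    'price': [19.53, 9.05, 19.76, 18.42, 17.57],
    'amount': [1, 1, 1, 2, 1],
    'timestamp': [
        '2017-02-14T17:23:56.542Z',
        '2017-02-14T17:24:09.233Z',
        '2017-02-14T17:27:10.116Z',
        '2017-02-14T17:27:20.748Z',
        '2017-02-14T17:27:26.648Z',
    ],
})
latest = latest_offer_state(sample)
print(latest[['offer_id', 'price', 'amount']])
```

Offer 7568 is logged twice in the sample, so only its later row is kept.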


### As merchant with token

The data is currently available without authentication. A token is optional; when set, it reduces the output to the data a merchant "may be allowed" to see (i.e. market situations triggered by this merchant's updates and its own sales).

In [8]:
PricewarsRequester.add_api_token('2ZnJAUNCcv8l2ILULiCwANo7LGEsHCRJlFdvj18MvG8yYTTtCfqN3fTOuhGCthWf')

In [9]:
buy_offer_df2 = pd.read_csv(kafka_api.request_csv_export_for_topic('buyOffer'))
print(len(buy_offer_df2))
buy_offer_df2[:5]

9


| Unnamed: 0 | amount | consumer_id | http_code | left_in_stock | merchant_id | offer_id | price | product_id | quality | timestamp | uid |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ZJ7HCG6HO1OUqeaLl4FGM3Sj8bT0tsZYdAa/3dUkHVo= | 200 | 0 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7572 | 24.29 | 2 | 3 | 2017-02-15T07:54:45.355Z | 23 |
| 1 | 1 | x0NLM0ZAinXuUxZczou3nOcY5k59JAcOK6XlBQQxdhs= | 200 | 0 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7566 | 38.83 | 1 | 1 | 2017-02-15T07:57:33.331Z | 11 |
| 2 | 1 | x0NLM0ZAinXuUxZczou3nOcY5k59JAcOK6XlBQQxdhs= | 200 | 0 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7571 | 12.53 | 1 | 4 | 2017-02-15T07:57:35.247Z | 14 |
| 3 | 1 | x0NLM0ZAinXuUxZczou3nOcY5k59JAcOK6XlBQQxdhs= | 200 | 1 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7568 | 18.42 | 1 | 3 | 2017-02-15T07:57:39.331Z | 13 |
| 4 | 1 | x0NLM0ZAinXuUxZczou3nOcY5k59JAcOK6XlBQQxdhs= | 200 | 0 | dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk= | 7567 | 19.53 | 1 | 3 | 2017-02-15T07:57:40.181Z | 13 |
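With a token set, the export should contain only this merchant's sales, so a quick check is to count the distinct `merchant_id` values and sum the revenue. The sketch below does this on values copied from the rows above; `revenue_by_merchant` is a hypothetical helper name, and computing revenue as `amount * price` is an assumption about how these columns relate.

```python
import pandas as pd


def revenue_by_merchant(buy_offer_df):
    """Sum amount * price per merchant_id."""
    revenue = buy_offer_df['amount'] * buy_offer_df['price']
    return revenue.groupby(buy_offer_df['merchant_id']).sum()


own_id = 'dgOqVxP1nkkncRhIoOTflL2zJ26X1r7xRNcvP6iqlIk='
sample = pd.DataFrame({
    'amount': [1, 1, 1, 1, 1],
    'merchant_id': [own_id] * 5,
    'price': [24.29, 38.83, 12.53, 18.42, 19.53],
})
totals = revenue_by_merchant(sample)
# Number of distinct merchants in the export and this merchant's revenue.
print(totals.index.nunique(), round(totals[own_id], 2))
```

Run against `buy_offer_df` (requested without a token) instead, the same function gives a per-merchant comparison.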
