# Unit K
# Streaming Databases

- Examples From Video Lecture 


## Starting Confluent Kafka

Lots of services!

```
PS> docker-compose start control-center rest-proxy ksql-datagen ksqldb-server ksqldb-cli zookeeper broker schema-registry
```

Kafka Services:

- zookeeper
- broker

Confluent KSQL Database Services:

- control-center
- rest-proxy
- ksqldb-server
- ksqldb-cli
- schema-registry

## Kafka
    
### Simple Kafka pub-sub example:

- Open two Windows PowerShell terminals
- In each terminal, connect to the broker: `docker-compose exec broker bash`
- In terminal 1, set up the producer: `kafka-console-producer --broker-list localhost:9092 --topic test`
- In terminal 2, set up the consumer: `kafka-console-consumer --bootstrap-server localhost:9092 --topic test`
- Type messages in the producer, see them in the consumer!
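- If you stop the consumer with `CTRL+C`, add more messages, and then restart the consumer under the same consumer group (for example by passing the same `--group` name each time), you will pick up where you left off. Without `--group`, the console consumer joins a new, randomly named group on each run and only shows messages produced after it starts.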
- If you restart the consumer with `--from-beginning` you will see all the messages again!
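
A toy model can make the offset behaviour in these steps easier to picture. The sketch below is plain Python, not the Kafka client API: `ToyTopic` and its methods are hypothetical names invented for illustration, and it assumes a single partition and a consumer that commits its position under a group name.

```python
class ToyTopic:
    """A toy, single-partition stand-in for a Kafka topic (not the real API)."""

    def __init__(self):
        self.log = []        # append-only list of messages
        self.committed = {}  # group name -> next offset to read

    def produce(self, message):
        self.log.append(message)

    def consume(self, group, from_beginning=False):
        """Return the unread messages for `group` and commit the new position."""
        if group in self.committed:
            start = self.committed[group]  # a known group resumes
        else:
            # A new group starts at the end unless asked to replay.
            start = 0 if from_beginning else len(self.log)
        messages = self.log[start:]
        self.committed[group] = len(self.log)
        return messages


topic = ToyTopic()
topic.consume("demo")                  # consumer starts; nothing to read yet
topic.produce("hello")
first = topic.consume("demo")          # sees "hello"
topic.produce("sent while the consumer was stopped")
resumed = topic.consume("demo")        # resumes from the committed offset
replayed = topic.consume("fresh-group", from_beginning=True)  # whole log again
```

The log itself never changes when it is read; only the per-group position moves, which is why a replay is always possible.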

## ksqlDB

### Streams in KSQL

- First, start the ATM stream, which can be found in `/work/examples/Kafka-producer.ipynb`
- Connect to KSQL: `PS> docker-compose exec ksqldb-cli ksql http://ksqldb-server:8088`

```
# Show the topics

KSQL> show topics;

# Make a stream

KSQL> CREATE STREAM USERS (user varchar) with (kafka_topic='atm', value_format='json');

# Explain this statement!!!

# RUN A PROGRAM USING THIS STREAM
KSQL> select * from USERS emit changes;

# Discuss what is seen in the output. But don’t explain it all at this time! 
# This is a non-persistent query. Press CTRL + C to break

KSQL> select * from USERS emit changes  limit 3; 

# You only get 3
```

### KSQL: Create Streams

- Streams allow us to tabularize a topic so that we can execute a SELECT over it.

```
CREATE STREAM atmwithdrawls (id varchar key, timestamp bigint,user varchar, amount double, location varchar, status varchar) with (kafka_topic='atm', value_format='json', timestamp='TimeStamp');

# Show output
SELECT * FROM atmwithdrawls EMIT CHANGES limit 5;

# Describe it 
DESCRIBE atmwithdrawls EXTENDED;

# show all streams
SHOW STREAMS;

## Maybe try Drilling it????
```
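
To make "tabularize a topic" concrete, here is a minimal Python sketch of what declaring columns over JSON messages amounts to. The sample messages are invented for illustration (the real ones come from the producer notebook), and `to_row` is a hypothetical helper, not part of ksqlDB. It assumes that a field missing from a message simply shows up as a null value.

```python
import json

# Hypothetical raw values on the 'atm' topic (contents invented for illustration).
raw_messages = [
    '{"id": "a1", "timestamp": 1700000000000, "user": "ann", '
    '"amount": 40.0, "location": "bank", "status": "ok"}',
    '{"id": "a2", "timestamp": 1700000005000, "user": "bob", '
    '"amount": 250.0, "location": "mall"}',
]

# The column names declared in the CREATE STREAM statement above.
COLUMNS = ["id", "timestamp", "user", "amount", "location", "status"]


def to_row(raw, columns=COLUMNS):
    """Project one JSON message onto the declared columns; absent fields become None."""
    record = json.loads(raw)
    return tuple(record.get(name) for name in columns)


rows = [to_row(message) for message in raw_messages]
for row in rows:
    print(row)
```

The topic still holds raw JSON; the stream definition only tells the query engine how to read each message as a row.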


### KSQL: Persistent Queries

- Persistent queries are SELECT statements that run in the background. 

```
CREATE STREAM big_withdrawls AS select * from atmwithdrawls where amount > 100;

# Look at the stream

SHOW STREAMS;

# View the running query
SHOW QUERIES;

# describe it to get information on what its doing 
DESCRIBE big_withdrawls EXTENDED;

# we can even query the stream
SELECT * FROM big_withdrawls  EMIT CHANGES limit 3;

# stop the persistent query; TERMINATE takes the query ID listed by SHOW QUERIES, not the stream name
TERMINATE <query_id>;

```
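
Conceptually, the persistent query behind `big_withdrawls` is a filter that is applied to every new event as it arrives. The following is a rough Python analogy only, assuming invented sample events; a generator stands in for the continuously running query, and `big_withdrawals` is a hypothetical function name.

```python
def big_withdrawals(events, threshold=100):
    """Lazily pass along only the events whose amount exceeds the threshold."""
    for event in events:
        if event["amount"] > threshold:
            yield event


# Hypothetical events; in ksqlDB the source never ends and results go to a new topic.
events = [
    {"user": "ann", "amount": 40.0},
    {"user": "bob", "amount": 250.0},
    {"user": "cat", "amount": 100.0},
    {"user": "dan", "amount": 120.5},
]

big = [event["user"] for event in big_withdrawals(events)]
print(big)
```

Unlike this list, the real source is unbounded, so the query keeps running in the background until it is terminated.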

### Aggregates don't work! Enter the persistent Table

```
# An aggregate must have a GROUP BY; on its own this query will never yield a result!

Select count(*) from atmwithdrawls;

# This works - what is happening here???

select user, count(*) as wd_count from atmwithdrawls  group by user emit changes;

# Let's try to persist this as a stream. No can do! An aggregating SELECT produces a table, not a stream.

Create stream user_wd_counts as select user, count(*) as wd_count from atmwithdrawls  group by user emit changes;

# Create a table instead
create table wd_by_user as select user, count(*) as wd_count  from atmwithdrawls group by user emit changes;

# query the table!
select * from wd_by_user emit changes;


# Check out what you did from metadata.

show tables;
describe wd_by_user extended;
show queries;
# EXPLAIN takes a query ID from SHOW QUERIES (or a SQL statement)
explain <query_id>;

```
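
Why does a grouped aggregate produce a table rather than a stream? One way to see it, sketched in Python with invented events: each incoming event updates the running count for its key, and the table is simply the latest value per key. `count_by_user` is a hypothetical helper, and real ksqlDB may batch some intermediate updates rather than emit one per event.

```python
from collections import defaultdict


def count_by_user(events):
    """Yield (user, running_count) after every event, similar to EMIT CHANGES."""
    counts = defaultdict(int)
    for event in events:
        counts[event["user"]] += 1
        yield event["user"], counts[event["user"]]


# Hypothetical withdrawal events, reduced to the grouping key.
events = [{"user": "ann"}, {"user": "bob"}, {"user": "ann"}]

changes = list(count_by_user(events))  # the changelog: one update per event
table = dict(changes)                  # the table: latest count per user
print(changes)
print(table)
```

The changelog grows with every event, while the table keeps one row per user, which is the difference between the push query output and `wd_by_user`.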

## KSQL Windows

- Aggregations often don't make sense to run forever. Instead, we would like to compute and persist them within windows of time.

```
# Tumbling window: total withdrawals per user in each 30-second window.

select user, sum(amount) as total_amount, count(*) as transaction_count 
    from atmwithdrawls  window tumbling (size 30 seconds) 
    group by user emit changes;

# Hopping window: activity by location every 15 seconds in 30 second windows

select location, count(*) 
    from atmwithdrawls window hopping (size 30 seconds advance by 15 seconds ) 
    group by location emit changes;
    
# Session window: like tumbling, but key dependent. A session starts when the first
# data arrives for the group key and closes after 5 seconds without new data for that key.
# Here: two or more withdrawals in one session at any location

select location, count(*) 
    from atmwithdrawls window session (5 seconds) 
    group by location 
    having count(*) >= 2 emit changes;
    
```
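
The three window types differ in how an event's timestamp is mapped to windows. The Python sketch below illustrates that mapping under simple assumptions: timestamps are in milliseconds, time-based windows are aligned to the epoch, and the function names are hypothetical rather than ksqlDB functions.

```python
def tumbling_window(ts_ms, size_ms):
    """Return the (start, end) of the single tumbling window containing ts_ms."""
    start = ts_ms - ts_ms % size_ms
    return (start, start + size_ms)


def hopping_windows(ts_ms, size_ms, advance_ms):
    """Return every hopping window [start, start + size) that contains ts_ms."""
    windows = []
    start = ts_ms - ts_ms % advance_ms  # latest window start at or before ts_ms
    while start > ts_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


def session_windows(timestamps_ms, gap_ms):
    """Group one key's timestamps into sessions split by gaps longer than gap_ms."""
    sessions = []
    for ts in sorted(timestamps_ms):
        if sessions and ts - sessions[-1][-1] <= gap_ms:
            sessions[-1].append(ts)  # still within the inactivity gap
        else:
            sessions.append([ts])    # gap exceeded: a new session begins
    return sessions


# An event at t = 47 s with the window sizes used in the queries above.
one = tumbling_window(47_000, 30_000)
many = hopping_windows(47_000, 30_000, 15_000)

# Hypothetical withdrawal times at one location, with a 5 second gap.
sessions = session_windows([1_000, 3_000, 4_000, 12_000, 20_000, 22_000], 5_000)
busy = [s for s in sessions if len(s) >= 2]  # mirrors HAVING count(*) >= 2
```

An event lands in exactly one tumbling window, in several overlapping hopping windows, and in a session whose boundaries depend only on the activity for its own key.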