# Unit H
# Key Value Database Model

- Examples From Video Lecture 

In [1]:
import pyspark
from pyspark.sql import SparkSession
# REDIS CONFIGURATION
redis_host = "redis"
redis_port = "6379"
spark = SparkSession \
    .builder \
    .master("local") \
    .appName('jupyter-pyspark') \
    .config("spark.redis.host", redis_host) \
    .config("spark.redis.port", redis_port) \
    .config("spark.jars.packages", "com.redislabs:spark-redis_2.12:3.1.0") \
    .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel("ERROR")

## Redis Commands

Open the Redis client with `docker-compose exec redis redis-cli`, then run the commands in the sections that follow.

### Basics after connecting

```
KEYS *

SET student:1 john
SET student:2 '{"name": "mary", "gpa": 4.0}'

GET student:2

KEYS student:*

```
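Redis string values are plain strings, so a structured record such as `student:2` has to be serialized by the client before `SET` and parsed again after `GET`. The following is a minimal sketch using only Python's standard `json` module. It makes no Redis connection; the `key` and `student` variables simply mirror the example above.

```python
import json

# The record we would like to store under the key student:2.
key = "student:2"
student = {"name": "mary", "gpa": 4.0}

# Serialize to one string; this is the value a client would pass to SET.
value = json.dumps(student)
print(f"SET {key} '{value}'")

# After GET returns that string, parse it back into a dictionary.
restored = json.loads(value)
print(restored["name"], restored["gpa"])
```

Storing JSON in a string keeps the whole record under one key, but individual fields cannot be read or updated on the server side; the hash type covered later is the usual alternative when that matters.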

### Redis Strings

```
Set name mike
Get name
Set name "Michael Fudge"
Get name
Mset name mike age 47

# namespacing is better
Set user:name mike
Set user:age 47
Keys user:*

# Example of how this might be used in page caching

set page:/about "This is the about page"

set page:/contact "This is the HTML of the contact us page"

# find all keys in the page: namespace

Keys page:*

# get two pages at a time

mget page:/about page:/contact

# help page is not cached

exists page:/help

# set this page to expire in 10 seconds

set page:/help "This is HTML for the help page" EX 10

# get the page; repeat until it expires

Get page:/help

# explicitly delete a key

Del page:/about

```
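The page-caching commands above rely on `SET ... EX` to make a key disappear after a number of seconds. As a rough illustration of that behavior, here is an in-memory Python sketch. It is not a Redis client: `ExpiringCache` and its injectable `clock` are hypothetical names used only so that expiry can be shown without waiting.

```python
import time


class ExpiringCache:
    """Tiny in-memory stand-in for SET ... EX, GET, EXISTS and DEL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be shown without sleeping
        self._data = {}       # key -> (value, expires_at or None)

    def set(self, key, value, ex=None):
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]   # evict the expired key on access
            return None
        return value

    def exists(self, key):
        return self.get(key) is not None

    def delete(self, key):
        return self._data.pop(key, None) is not None


# Replay the page-cache steps with a fake clock we can move forward.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
cache.set("page:/about", "This is the about page")
cache.set("page:/help", "This is HTML for the help page", ex=10)

before = cache.get("page:/help")   # still cached
now[0] = 11.0                      # 11 seconds later
after = cache.get("page:/help")    # expired, so nothing comes back
```

A cache miss (`None` here, `nil` in the Redis client) is the signal for the application to rebuild the page and store it again.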


### Redis Hashes

```
hset session:mafudge name "Michael Fudge"
hset session:mafudge credit_limit 500
hmset session:mafudge email "mafudge@syr.edu" twitter "@mafudge"

# get this user’s email
hget session:mafudge email
hmget session:mafudge name email twitter facebook

# check if fields exist
hexists session:mafudge email

hexists session:mafudge last_login

# get all fields and values
Hgetall session:mafudge

```
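A Redis hash behaves much like a small dictionary stored under one key. The sketch below models `session:mafudge` as a plain Python `dict` to show what the commands above return, in particular that `HMGET` yields an empty result for a field that does not exist. The `hmget` helper is a hypothetical stand-in, not part of any Redis library.

```python
# session:mafudge modeled as a dict of field -> value (Redis stores the values as strings).
session = {}
session["name"] = "Michael Fudge"                             # hset ... name
session["credit_limit"] = "500"                               # hset ... credit_limit
session.update(email="mafudge@syr.edu", twitter="@mafudge")   # hmset ... email ... twitter ...


def hmget(hash_fields, *fields):
    """Return values in request order, with None for missing fields (nil in Redis)."""
    return [hash_fields.get(field) for field in fields]


values = hmget(session, "name", "email", "twitter", "facebook")
has_email = "email" in session             # hexists ... email
has_last_login = "last_login" in session   # hexists ... last_login
print(values, has_email, has_last_login)
```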

### Redis Lists

```
# simple list example 
Lpush shopping apple
Lpush shopping pear 

Lrange shopping 0 -1

# add to the end 
Rpush shopping orange

Lindex shopping 1


# work queue example: a good fit when the work to be done cannot happen immediately

lpush app:sms "5551234|First TXT Message!"

lpush app:sms "5551235|First TXT Message!"

lpush app:sms "5551236|First TXT Message!"

Lrange app:sms 0 -1

# processing
rpop app:sms

# another client
Rpop app:sms

# here come more messages

lpush app:sms "5551234|Second TXT Message!"

lpush app:sms "5551235|Second TXT Message!"

lpush app:sms "5551236|Second TXT Message!"

Lrange app:sms 0 -1

Rpop app:sms

```
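Pushing on the left and popping on the right turns a list into a first-in, first-out work queue: the oldest message is always processed next, even while new ones keep arriving. The sketch below replays the `app:sms` example with Python's `collections.deque` as an in-memory stand-in for the Redis list. The `lpush`, `rpop`, and `parse_job` helpers are hypothetical and only mimic the commands above.

```python
from collections import deque

queue = deque()   # stands in for the Redis list app:sms


def lpush(q, item):
    """Add an item at the head of the list, like LPUSH."""
    q.appendleft(item)


def rpop(q):
    """Remove and return the item at the tail, like RPOP; None when empty."""
    return q.pop() if q else None


def parse_job(raw):
    """Split 'number|text' into its two parts."""
    number, text = raw.split("|", 1)
    return number, text


numbers = ("5551234", "5551235", "5551236")
for n in numbers:
    lpush(queue, f"{n}|First TXT Message!")

first = parse_job(rpop(queue))    # one worker takes the oldest job
second = parse_job(rpop(queue))   # another worker takes the next one

for n in numbers:                 # more messages arrive
    lpush(queue, f"{n}|Second TXT Message!")

third = parse_job(rpop(queue))    # still one of the "First" messages
remaining = list(queue)           # head-to-tail order, like LRANGE 0 -1
```

Because each pop removes the item, two workers popping from the same list never receive the same job.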

### Redis Sorted Sets

```
# sales leaderboard

# add sales
zadd app:leaderboard 100 dave 120 sally 90 bill 200 George

# view leaderboard
zrange app:leaderboard 0 -1
zrange app:leaderboard 0 -1 withscores

# sally sold 100 more
zadd app:leaderboard incr 100 sally

# sally now has the highest score (listed last, since zrange sorts ascending)
zrange app:leaderboard 0 -1 withscores

# give bill 100
zincrby app:leaderboard 100 bill
zrange app:leaderboard 0 -1 withscores


# fix the error: bill actually had 240
Zadd app:leaderboard 240 bill
zrange app:leaderboard 0 -1 withscores

# get the rank (zero-based index, lowest score first) of George
Zrank app:leaderboard George

#get the score of George 
Zscore app:leaderboard George
```
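To make the leaderboard steps easier to follow, the sketch below replays them against an in-memory Python dictionary and sorts by score on demand. It is only a model of the sorted-set behavior, not a Redis client; the helper functions are hypothetical and named after the commands they imitate.

```python
# In-memory stand-in for the sorted set app:leaderboard: member -> score.
leaderboard = {}


def zadd(member, score):
    """Set (or overwrite) a member's score, like ZADD."""
    leaderboard[member] = float(score)


def zincrby(amount, member):
    """Add to a member's score, like ZINCRBY or ZADD ... INCR; returns the new score."""
    leaderboard[member] = leaderboard.get(member, 0.0) + amount
    return leaderboard[member]


def zrange_withscores():
    """All members, lowest score first; ties break on the member name."""
    return sorted(leaderboard.items(), key=lambda item: (item[1], item[0]))


def zrank(member):
    """Zero-based position in ascending order, or None if the member is absent."""
    ordered = [name for name, _ in zrange_withscores()]
    return ordered.index(member) if member in leaderboard else None


for name, score in [("dave", 100), ("sally", 120), ("bill", 90), ("George", 200)]:
    zadd(name, score)

zincrby(100, "sally")   # sally sold 100 more
zincrby(100, "bill")    # give bill 100
zadd("bill", 240)       # overwrite bill's score with the corrected value

final = zrange_withscores()
george_rank = zrank("George")
george_score = leaderboard["George"]
print(final, george_rank, george_score)
```

Note the difference between the two update styles: incrementing adjusts the current score, while a plain `zadd` on an existing member replaces it.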


### Retwis

- sign up for retwis and make 2 tweets
- sign up as someone else
    - follow the first person
    - make another tweet
- Check out redis and see what's happening
    - list of keys to track keys!
    
### Redis Pub/Sub

```
# need two windows for this

# window 1
Subscribe chat

# window 2
Publish chat "hi"

Publish chat "are you there?"

# window 1: stop the chat subscription, then subscribe to leaderboard updates
subscribe app:leaderboard:update

# window 2
zadd app:leaderboard incr 5 bill
publish app:leaderboard:update "update"

# when you see the update message, re-read the leaderboard
zrange app:leaderboard 0 -1 withscores
```
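Pub/Sub delivers a message only to clients that are subscribed at the moment it is published; nothing is stored for later. The following in-memory Python sketch illustrates that delivery model. The `Broker` class is a hypothetical stand-in for the Redis server, and the `window1` list plays the role of the subscribed terminal.

```python
from collections import defaultdict


class Broker:
    """Minimal publish/subscribe model: channels map to lists of callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to current subscribers and return how many received the message."""
        receivers = self._subscribers.get(channel, [])
        for callback in receivers:
            callback(channel, message)
        return len(receivers)


broker = Broker()
window1 = []   # messages seen by the subscribing window

# Published before anyone subscribes: nobody receives it and it is not kept.
missed = broker.publish("chat", "anyone?")

broker.subscribe("chat", lambda channel, message: window1.append((channel, message)))
delivered = broker.publish("chat", "hi")
broker.publish("chat", "are you there?")
print(missed, delivered, window1)
```

This is why the leaderboard example publishes a short "update" notification and has the subscriber re-read the sorted set: the channel carries the signal, while the data itself lives in a regular key.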


## Redis with Spark

- Spark supports Redis through the package: `com.redislabs:spark-redis`
- Hash support only: each hash is a row in a table, and the hashes under one namespace share the same fields, which become the table's columns.
- Data written by Spark has a special `_spark` key in Redis that stores the metadata / schema.
- Data read from Redis that was not written by Spark has no stored schema, so the read must supply one with `schema()` or set the `"infer.schema"` option.
- Docs:  https://github.com/RedisLabs/spark-redis/blob/master/doc/dataframe.md 


In [2]:
# Read in Stocks
df = spark.read.option("multiline","true").json("/home/jovyan/datasets/json-samples/stocks.json")
df.toPandas()

| | price | symbol |
|---|---|---|
| 0 | 126.82 | AAPL |
| 1 | 3098.12 | AMZN |
| 2 | 251.11 | FB |
| 3 | 1725.05 | GOOG |
| 4 | 128.39 | IBM |
| 5 | 212.55 | MSFT |
| 6 | 78.0 | NET |
| 7 | 497.0 | NFLX |
| 8 | 823.8 | TSLA |
| 9 | 45.11 | TWTR |


In [3]:
# Write stocks to redis
df.write.format("org.apache.spark.sql.redis")\
  .mode("overwrite")\
  .option("table", "stocks")\
  .option("key.column","symbol")\
  .save()

### Check out the Redis client

- We have a `_spark` key that holds the `stocks` schema.
- There are a bunch of keys of the form `stocks:<symbol>`, one per row.
- Under each key is a hash!


### Loading Redis data into Spark without a schema.

- Example: Let's create some students:

```
HMSET student:mafudge name Mike gpa 3.0
HMSET student:leferger name Laurie gpa 4.0
HMSET student:ccaicedo name Carlos gpa 3.8
HMSET student:dlnosky name Deb gpa 2.6
```


In [None]:
# read this data in as a Spark DataFrame - ONLY WORKS for Hashes!
students = spark.read.format("org.apache.spark.sql.redis")\
  .option("keys.pattern", "student:*")\
  .option("key.column", "netid")\
  .option("infer.schema",True) \
  .load()
students.toPandas()

In [5]:
# read posts from retwis
retwis_posts = spark.read.format("org.apache.spark.sql.redis")\
  .option("keys.pattern", "post:*")\
  .option("key.column", "key")\
  .option("infer.schema",True) \
  .load()
retwis_posts.toPandas()

| | body | user_id | time | key |
|---|---|---|---|---|
| 0 | ccheese | 2 | 1734443261 | 3 |
| 1 | yo | 1 | 1734443231 | 1 |
| 2 | this | 1 | 1734443235 | 2 |


In [6]:
# read users
retwis_users =  spark.read.format("org.apache.spark.sql.redis")\
  .option("keys.pattern", "user:*")\
  .option("key.column", "key")\
  .option("infer.schema",True) \
  .load()
retwis_users.toPandas()

| | food | age | username | key |
|---|---|---|---|---|
| 0 | cheese | 99.0 | tony | 3 |
| 1 | | | chris | 2 |
| 2 | | | mike | 1 |
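The results above show two things about how hashes become rows: the `key` column holds the part of the Redis key after the namespace prefix, and fields missing from a hash come back empty. The pure-Python sketch below imitates that mapping so the idea can be tried without Spark or Redis. Everything in it is an assumption made for illustration: `hashes_to_rows` is a hypothetical helper, the `users` dictionary is sample data shaped like the output above, values stay as strings, and taking the union of all fields is a simplification that may differ from how the connector actually infers a schema.

```python
def hashes_to_rows(hashes, prefix, key_column):
    """Turn {redis_key: {field: value}} into row dicts that share one set of columns."""
    # Columns are the union of every field seen; sorted only for a stable order.
    columns = sorted({field for fields in hashes.values() for field in fields})
    rows = []
    for redis_key in sorted(hashes):
        fields = hashes[redis_key]
        row = {column: fields.get(column) for column in columns}   # missing -> None
        # Keep only the part of the key after the namespace prefix.
        suffix = redis_key[len(prefix):] if redis_key.startswith(prefix) else redis_key
        row[key_column] = suffix
        rows.append(row)
    return rows


# Hypothetical sample hashes, shaped like the users shown above.
users = {
    "user:1": {"username": "mike"},
    "user:2": {"username": "chris"},
    "user:3": {"username": "tony", "food": "cheese", "age": "99"},
}

rows = hashes_to_rows(users, prefix="user:", key_column="key")
for row in rows:
    print(row)
```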
