# Example: Reading from and Writing to Redis with Spark

Documentation: https://github.com/RedisLabs/spark-redis/

NOTE: The Spark DataFrame integration reads and writes Redis hashes only. Other Redis data structures are not supported through Spark DataFrames.

In [2]:
import pyspark
from pyspark.sql import SparkSession

In [3]:
# REDIS CONFIGURATION
redis_host = "redis"
redis_port = "6379"
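
The host and port above are hard-coded for this notebook's environment. A minimal sketch of reading them from the environment instead, falling back to the same defaults, could look like the following. The variable names `REDIS_HOST` and `REDIS_PORT` and the helper `redis_settings` are assumptions made for illustration, not part of spark-redis.

```python
import os


def redis_settings(env=None):
    """Return (host, port) as strings, using the notebook's values as defaults.

    REDIS_HOST and REDIS_PORT are hypothetical variable names chosen for this sketch.
    """
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "redis")
    port = env.get("REDIS_PORT", "6379")
    # Reject anything that is not a valid TCP port before handing it to Spark.
    if not port.isdigit() or not (0 < int(port) < 65536):
        raise ValueError(f"invalid Redis port: {port!r}")
    return host, port
```

Replacing the two assignments with `redis_host, redis_port = redis_settings()` would keep the rest of the notebook unchanged, since the port stays a string as before.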

In [4]:
# Spark init
spark = SparkSession \
    .builder \
    .master("local") \
    .appName("jupyter-pyspark") \
    .config("spark.redis.host", redis_host) \
    .config("spark.redis.port", redis_port) \
    .config("spark.jars.packages", "com.redislabs:spark-redis_2.12:3.0.0") \
    .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel("ERROR")

Ivy Default Cache set to: /home/jovyan/.ivy2/cache
The jars for the packages stored in: /home/jovyan/.ivy2/jars
com.redislabs#spark-redis_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-be76a4fd-6a40-4f1a-adf2-3e53a625997b;1.0
	confs: [default]


:: loading settings :: url = jar:file:/usr/local/spark-3.1.2-bin-hadoop3.2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml


	found com.redislabs#spark-redis_2.12;3.0.0 in central
	found org.apache.commons#commons-pool2;2.0 in central
	found redis.clients#jedis;3.4.1 in central
	found org.slf4j#slf4j-api;1.7.30 in central
downloading https://repo1.maven.org/maven2/com/redislabs/spark-redis_2.12/3.0.0/spark-redis_2.12-3.0.0.jar ...
	[SUCCESSFUL ] com.redislabs#spark-redis_2.12;3.0.0!spark-redis_2.12.jar (50ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.0/commons-pool2-2.0.jar ...
	[SUCCESSFUL ] org.apache.commons#commons-pool2;2.0!commons-pool2.jar (22ms)
downloading https://repo1.maven.org/maven2/redis/clients/jedis/3.4.1/jedis-3.4.1.jar ...
	[SUCCESSFUL ] redis.clients#jedis;3.4.1!jedis.jar (46ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.30/slf4j-api-1.7.30.jar ...
	[SUCCESSFUL ] org.slf4j#slf4j-api;1.7.30!slf4j-api.jar (18ms)
:: resolution report :: resolve 1457ms :: artifacts dl 140ms
	:: modules in use:
	com.redislabs#spark-redis_2.12;3.0.0 fro
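
The package string passed to `spark.jars.packages` is a Maven coordinate of the form `group:artifact_scalaVersion:version`. The `_2.12` suffix names the Scala version the connector was built for, which generally has to match the Scala version of the Spark build in use. The sketch below builds and splits such a coordinate; both helper names are hypothetical and exist only for this illustration.

```python
def spark_redis_package(scala_version="2.12", connector_version="3.0.0"):
    """Build the Maven coordinate used for spark.jars.packages in this notebook."""
    return f"com.redislabs:spark-redis_{scala_version}:{connector_version}"


def parse_coordinate(coordinate):
    """Split 'group:artifact_scala:version' into its parts."""
    group, artifact, version = coordinate.split(":")
    # The Scala version follows the last underscore of the artifact name.
    name, _, scala = artifact.rpartition("_")
    return {"group": group, "artifact": name, "scala": scala, "version": version}
```

Keeping the Scala and connector versions as separate arguments makes it easier to notice a mismatch when the Spark image is upgraded.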

In [5]:
# read local data
df = spark.read.option("multiline","true").json("/home/jovyan/datasets/json-samples/stocks.json")
df.toPandas()

index,price,symbol
0,126.82,AAPL
1,3098.12,AMZN
2,251.11,FB
3,1725.05,GOOG
4,128.39,IBM
5,212.55,MSFT
6,78.0,NET
7,497.0,NFLX
8,823.8,TSLA
9,45.11,TWTR


In [6]:
# Write back to Redis: each row becomes a hash keyed by its symbol, under the "stocks" table prefix
df.write.format("org.apache.spark.sql.redis") \
  .mode("overwrite") \
  .option("table", "stocks") \
  .option("key.column", "symbol") \
  .save()
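
To make the write above more concrete, the following pure-Python sketch mimics how a row might end up in Redis. It assumes the `table:key` naming that the spark-redis documentation describes (for example `stocks:AAPL`), with the key column left out of the hash fields. The helper `row_to_hash` is hypothetical and only illustrates the mapping; it does not talk to Redis.

```python
def row_to_hash(row, table="stocks", key_column="symbol"):
    """Return the (redis_key, hash_fields) pair a single row would map to.

    Assumes keys of the form '<table>:<key column value>' and string field values.
    """
    key = f"{table}:{row[key_column]}"
    # Every column except the key column becomes a field of the hash.
    fields = {name: str(value) for name, value in row.items() if name != key_column}
    return key, fields


key, fields = row_to_hash({"price": 126.82, "symbol": "AAPL"})
```

Under these assumptions, running `HGETALL stocks:AAPL` in `redis-cli` after the write should show a `price` field for that row.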

In [9]:
# Read back from Redis (row order is not preserved)
df1 = spark.read.format("org.apache.spark.sql.redis")\
  .option("table", "stocks")\
  .option("key.column", "symbol")\
  .load()
df1.toPandas()

index,price,symbol
0,212.55,MSFT
1,1725.05,GOOG
2,823.8,TSLA
3,497.0,NFLX
4,3098.12,AMZN
5,78.0,NET
6,126.82,AAPL
7,128.39,IBM
8,45.11,TWTR
9,251.11,FB
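
The rows come back in a different order than they were written, so checking the round trip needs an order-insensitive comparison. A minimal sketch in plain Python follows, using a few rows from the outputs above and assuming both sides carry the same column types. The helper `same_rows` is a hypothetical name for this example.

```python
def same_rows(left, right):
    """Return True when both lists contain the same rows, ignoring row order."""
    def canonical(rows):
        # Turn each row dict into a sorted tuple of items so rows can be sorted and compared.
        return sorted(tuple(sorted(row.items())) for row in rows)
    return canonical(left) == canonical(right)


written = [
    {"price": 126.82, "symbol": "AAPL"},
    {"price": 3098.12, "symbol": "AMZN"},
    {"price": 251.11, "symbol": "FB"},
]
read_back = [
    {"price": 3098.12, "symbol": "AMZN"},
    {"price": 126.82, "symbol": "AAPL"},
    {"price": 251.11, "symbol": "FB"},
]
round_trip_ok = same_rows(written, read_back)
```

In the notebook, the two lists could be obtained with `df.toPandas().to_dict("records")` and `df1.toPandas().to_dict("records")`.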
