# How to load JSON data from a socket?


Spark provides `pyspark.sql.streaming.DataStreamReader.json`
<a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.streaming.DataStreamReader.json.html#pyspark.sql.streaming.DataStreamReader.json">(reference)</a>.
However, it can only read files stored in a directory, not a socket, at least as far as I have found.

In [None]:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Load Streaming Data") \
    .getOrCreate()

## 1. Load the schema from static sample data

In [None]:
# Infer the schema from a static sample file; its field names are reused in step 3.
schema = spark.read.json("./sample.json").schema
schema

## 2. Load data from a socket

In [None]:
socketDF = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9999) \
    .load() 
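
The socket source connects as a client, so something has to be listening on `localhost:9999` and sending one JSON document per line. Below is a minimal sketch of such a sender using only the Python standard library. The function name `serve_json_lines` and the example records are hypothetical, not part of the original notebook, and a tool such as `nc -lk 9999` would serve the same purpose.

```python
import json
import socket
import threading


def serve_json_lines(records, host="localhost", port=9999):
    """Accept a single client and send each record as one JSON line.

    Hypothetical helper for local testing only: it serves one connection
    and then closes the listening socket.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)

    def handle():
        conn, _ = server.accept()
        with conn:
            for record in records:
                # One JSON object per line, UTF-8 encoded and newline terminated.
                conn.sendall((json.dumps(record) + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=handle, daemon=True)
    thread.start()
    # Return the actual port (useful when port=0 picks a free one) and the thread.
    return server.getsockname()[1], thread
```

For example, `serve_json_lines([{"id": 1, "name": "a"}])` could be called in an earlier cell so that the listener is already up when the stream starts. The keys should match the fields in `sample.json`.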

## 3. Transform the one-column DataFrame of JSON strings into a multi-column DataFrame

<a href="https://spark.apache.org/docs/3.2.0/sql-ref-functions-builtin.html#json-functions">Reference for json_tuple()</a>

In [None]:
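from pyspark.sql.functions import json_tuple

# Extract the top-level fields named in the schema from the "value" column,
# then rename the generated columns to the schema's field names.
df = socketDF.select(json_tuple("value", *schema.names)) \
    .toDF(*schema.names)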
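
To make clear what the resulting columns hold, here is a rough pure-Python model of how `json_tuple` treats one line. It is an approximation written for illustration, not Spark's implementation, and the name `json_tuple_like` is hypothetical. The point it shows is that every extracted value comes back as a string, and that a missing field or an unparsable line yields nulls.

```python
import json


def json_tuple_like(line, *fields):
    """Approximate json_tuple for a single JSON line (illustration only)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        # Unparsable input: every requested field is null.
        return tuple(None for _ in fields)
    if not isinstance(obj, dict):
        return tuple(None for _ in fields)

    values = []
    for field in fields:
        value = obj.get(field)
        if value is None:
            values.append(None)               # missing key or JSON null
        elif isinstance(value, str):
            values.append(value)              # strings are returned without quotes
        else:
            values.append(json.dumps(value))  # numbers, booleans, nested data as text
    return tuple(values)
```

Because the columns are strings, a cast is needed wherever numeric or other typed values are expected. `pyspark.sql.functions.from_json`, applied with the schema inferred in step 1, is one alternative that keeps the types.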

## 4. Debugging with writeStream

In [None]:
launch = df \
    .writeStream \
    .outputMode("append") \
    .queryName("df") \
    .format("memory") \
    .start()

In [None]:
# The memory sink registers an in-memory table under the query name ("df"),
# so you can check the current result at any time by querying that name.
spark.sql("select * from df").show()

In [None]:
# You can stop the running streaming query with stop()
# launch.stop()