## Read JSON Data using Spark Structured Streaming

Let us understand how to read files using Spark Structured Streaming.
* `spark.readStream` exposes several APIs to read data using different file formats.
  * `json`
  * `csv`
  * `parquet`
  * `orc`
* You can list them by typing `spark.readStream.` and then pressing Tab.
* We can also pass the file format as an argument to `spark.readStream.format`.
* Here are examples of reading files using the `json` file format:
  * Direct API: `spark.readStream.json('/mnt/itv-github-db/streaming/landing/ghactivity')`
  * Using format: `spark.readStream.format('json').load('/mnt/itv-github-db/streaming/landing/ghactivity')`

* When we read `json` files using `spark.readStream`, the schema is not inferred by default.
* The cell below will fail because a schema is mandatory for `spark.readStream.json` unless schema inference is enabled.

In [0]:
spark.readStream.json('/mnt/itv-github-db/streaming/landing/ghactivity')

* We can set `spark.sql.streaming.schemaInference` to `true` so that the schema can be inferred automatically when we use `spark.readStream.json`.
* However, you should use it with caution, because Spark has to scan the data to infer the schema each time the stream is defined.
* Let us go ahead and try reading `json` files after enabling the **schema inference**.

In [0]:
spark.conf.set('spark.sql.streaming.schemaInference', 'true')

In [0]:
ghactivity_df = spark.readStream.json('/mnt/itv-github-db/streaming/landing/ghactivity')

In [0]:
ghactivity_df.isStreaming

In [0]:
ghactivity_df.printSchema()

> Keep in mind that we typically do not infer the schema, because compute is wasted scanning the data just to infer it. Instead, we specify the schema explicitly.
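
As a minimal sketch of applying a schema instead of inferring it, the helper below passes a DDL-formatted schema string to `spark.readStream.schema` before calling `json`. The helper name `read_ghactivity_stream`, the constant `GHACTIVITY_SCHEMA_DDL`, and the listed columns (`id`, `type`, `public`, `created_at`) are assumptions made for illustration. They are not taken from the files above, so adjust them to match the actual data.

```python
# Hypothetical subset of columns; replace with the fields your JSON files contain.
GHACTIVITY_SCHEMA_DDL = ", ".join([
    "id STRING",
    "type STRING",
    "public BOOLEAN",
    "created_at TIMESTAMP",
])


def read_ghactivity_stream(spark, path, schema_ddl=GHACTIVITY_SCHEMA_DDL):
    """Return a streaming DataFrame over the JSON files under `path`.

    The schema is supplied explicitly as a DDL string, so Spark does not
    need to scan the files to infer it.
    """
    return spark.readStream.schema(schema_ddl).json(path)
```

In a notebook where `spark` is already defined, this could be used as `ghactivity_df = read_ghactivity_stream(spark, '/mnt/itv-github-db/streaming/landing/ghactivity')`, followed by `ghactivity_df.printSchema()` to confirm the applied schema. With an explicit schema, `spark.sql.streaming.schemaInference` can stay at its default.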