<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Extracting Data Directly from Files

In this notebook, **you'll learn to extract data directly from files using Spark SQL on Databricks**.

A number of file formats support this option, but it is most useful for self-describing data formats (such as Parquet and JSON).

## Learning Objectives
By the end of this lesson, you should be able to:
- Use Spark SQL to directly query data files
- Use the **`text`** and **`binaryFile`** formats to review raw file contents

## Run Setup

The setup script will create the data and declare necessary values for the rest of this notebook to execute.

In [0]:
%run ../Includes/Classroom-Setup-4.1

## Data Overview

In this example, we'll work with a sample of raw **Kafka** data written as JSON files.

#### Kafka definition
- Kafka is used to build real-time streaming data pipelines and streaming applications. A data pipeline reliably processes and moves data from one system to another, and a streaming application consumes streams of data.

**Each file contains all records consumed during a 5-second interval, stored with the full Kafka schema as a multiple-record JSON file.**

| field | type | description |
| --- | --- | --- |
| key | BINARY | The **`user_id`** field is used as the key; this is a unique alphanumeric field that corresponds to session/cookie information |
| value | BINARY | This is the full data payload (to be discussed later), sent as JSON |
| topic | STRING | While the Kafka service hosts multiple topics, only those records from the **`clickstream`** topic are included here |
| partition | INTEGER | Our current Kafka implementation uses only 2 partitions (0 and 1) |
| offset | LONG | This is a unique value, monotonically increasing for each partition |
| timestamp | LONG | This timestamp is recorded as milliseconds since epoch, and represents the time at which the producer appends a record to a partition |
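
The sample output later in this notebook shows **`key`** and **`value`** as base64 text and **`timestamp`** as an integer. As a minimal sketch in plain Python (outside Spark), and assuming that rendering, the following decodes one record copied from the `001.json` output; the variable names are illustrative only.

```python
import base64
import json
from datetime import datetime, timezone

# One record copied from the 001.json sample output shown in this notebook.
record = {
    "key": "VUEwMDAwMDAxMDczODAyOTY=",
    "timestamp": 1593880175268,
    "value": (
        "eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNhcmVlcnMi"
        "LCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAwMTM1MDY5NzksImV2ZW50X3RpbWVz"
        "dGFtcCI6MTU5Mzg4MDE3NTI1Mjk0OSwiZ2VvIjp7ImNpdHkiOiJZdW1hIiwic3RhdGUiOiJDTyJ9"
        "LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZ29vZ2xlIiwidXNlcl9maXJzdF90b3VjaF90"
        "aW1lc3RhbXAiOjE1OTM4Nzg5ODg2MjU5NTEsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM4MDI5NiJ9"
    ),
}

# key and value appear as base64 text; decode them to the underlying bytes.
user_id = base64.b64decode(record["key"]).decode("utf-8")
payload = json.loads(base64.b64decode(record["value"]))

# timestamp is milliseconds since the epoch, so divide by 1000 to get seconds.
produced_at = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)

print(user_id)
print(payload["event_name"], payload["geo"])
print(produced_at.isoformat())
```

The decoded key should match the **`user_id`** inside the decoded payload, which is consistent with the table's description of the key.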

Note that our source directory contains many JSON files.

In [0]:
%python
dataset_path = f"{DA.paths.datasets}/raw/events-kafka"
print(dataset_path)

files = dbutils.fs.ls(dataset_path)
display(files)

path,name,size,modificationTime
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/000.json,000.json,200116,1658828293000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/001.json,001.json,169907,1658828294000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/002.json,002.json,140680,1658828295000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/003.json,003.json,139280,1658828296000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/004.json,004.json,122411,1658828297000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/005.json,005.json,96034,1658828298000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/006.json,006.json,98332,1658828299000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/007.json,007.json,86452,1658828300000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/008.json,008.json,68052,1658828301000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/009.json,009.json,44729,1658828302000


Here, we'll be using relative file paths to data that's been written to the DBFS root. 

Most workflows will require users to access data from external cloud storage locations. 

In most companies, a workspace administrator will be responsible for configuring access to these storage locations.

Instructions for configuring and accessing these locations can be found in the cloud-vendor-specific self-paced courses titled "Cloud Architecture & Systems Integrations".

## Query a Single File

To query the data contained in a single file, execute the query with the following pattern:

<strong><code>SELECT * FROM file_format.&#x60;/path/to/file&#x60;</code></strong>

Make special note of the use of back-ticks (not single quotes) around the path.

In [0]:
%sql
SELECT * FROM json.`${da.paths.datasets}/raw/events-kafka/001.json`

key,offset,partition,timestamp,topic,value
VUEwMDAwMDAxMDczODAyOTY=,219246233,0,1593880175268,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNhcmVlcnMiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAwMTM1MDY5NzksImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDE3NTI1Mjk0OSwiZ2VvIjp7ImNpdHkiOiJZdW1hIiwic3RhdGUiOiJDTyJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZ29vZ2xlIiwidXNlcl9maXJzdF90b3VjaF90aW1lc3RhbXAiOjE1OTM4Nzg5ODg2MjU5NTEsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM4MDI5NiJ9
VUEwMDAwMDAxMDczOTEyODU=,219428744,1,1593880175652,clickstream,eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJtYWluIiwiZXZlbnRfdGltZXN0YW1wIjoxNTkzODgwMTc1NjQ3NDE4LCJnZW8iOnsiY2l0eSI6IlNwcmluZ2ZpZWxkIiwic3RhdGUiOiJNQSJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZ29vZ2xlIiwidXNlcl9maXJzdF90b3VjaF90aW1lc3RhbXAiOjE1OTM4ODAxNzU2NDc0MTgsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM5MTI4NSJ9
VUEwMDAwMDAxMDczOTEyODM=,219428745,1,1593880175674,clickstream,eyJkZXZpY2UiOiJXaW5kb3dzIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6Im1hdHRyZXNzZXMiLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODAxNzU2MzEwMTksImdlbyI6eyJjaXR5IjoiTWFydGluIiwic3RhdGUiOiJUTiJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDE3NTYzMTAxOSwidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzkxMjgzIn0=
VUEwMDAwMDAxMDczOTEzMTE=,219428830,1,1593880178176,clickstream,eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoibWF0dHJlc3NlcyIsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDE3ODEzOTQ4NiwiZ2VvIjp7ImNpdHkiOiJSZXlub2xkc2J1cmciLCJzdGF0ZSI6Ik9IIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDE3ODEzOTQ4NiwidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzkxMzExIn0=
VUEwMDAwMDAxMDczOTEyNzk=,219428903,1,1593880174869,clickstream,eyJkZXZpY2UiOiJDaHJvbWUgT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoibWF0dHJlc3NlcyIsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDE3NDgyMTE1OCwiZ2VvIjp7ImNpdHkiOiJUaG9tYXN2aWxsZSIsInN0YXRlIjoiTkMifSwiaXRlbXMiOltdLCJ0cmFmZmljX3NvdXJjZSI6ImZhY2Vib29rIiwidXNlcl9maXJzdF90b3VjaF90aW1lc3RhbXAiOjE1OTM4ODAxNzQ4MjExNTgsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM5MTI3OSJ9
VUEwMDAwMDAxMDczOTEzMDA=,219246249,0,1593880177078,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6Im1haW4iLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODAxNzcwNjU0NDMsImdlbyI6eyJjaXR5IjoiTG9zIEFuZ2VsZXMiLCJzdGF0ZSI6IkNBIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJnb29nbGUiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDE3NzA2NTQ0MywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzkxMzAwIn0=
VUEwMDAwMDAxMDczOTExNDk=,219428668,1,1593880160056,clickstream,eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoibWF0dHJlc3NlcyIsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDE2MDAzOTg4OSwiZ2VvIjp7ImNpdHkiOiJEb3duZXkiLCJzdGF0ZSI6IkNBIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJkaXJlY3QiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDE2MDAzOTg4OSwidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzkxMTQ5In0=
VUEwMDAwMDAxMDczODM0Mzc=,219428728,1,1593880163121,clickstream,eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJjYXJlZXJzIiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODc5NzE3NjgxODEyLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODAxNjMwOTc4MjAsImdlbyI6eyJjaXR5IjoiT3JhbmdlIiwic3RhdGUiOiJDQSJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoieW91dHViZSIsInVzZXJfZmlyc3RfdG91Y2hfdGltZXN0YW1wIjoxNTkzODc5MzI0Nzk1MjcwLCJ1c2VyX2lkIjoiVUEwMDAwMDAxMDczODM0MzcifQ==
VUEwMDAwMDAxMDczODE4NzM=,219428747,1,1593880170127,clickstream,eyJkZXZpY2UiOiJXaW5kb3dzIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6Im9yaWdpbmFsIiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODc5MTU2MTU0ODAxLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODAxNzAxMDc1NjQsImdlbyI6eyJjaXR5IjoiSGF3dGhvcm5lIiwic3RhdGUiOiJDQSJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZ29vZ2xlIiwidXNlcl9maXJzdF90b3VjaF90aW1lc3RhbXAiOjE1OTM4NzkxNTYxNTQ4MDEsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM4MTg3MyJ9
VUEwMDAwMDAxMDczODIyMzM=,219428785,1,1593880176955,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNhcnQiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4NzkyMDk1ODMyOTAsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDE3NjkzMjQ1MywiZ2VvIjp7ImNpdHkiOiJOZXcgWW9yayIsInN0YXRlIjoiTlkifSwiaXRlbXMiOlt7Iml0ZW1faWQiOiJNX1NUQU5fRiIsIml0ZW1fbmFtZSI6IlN0YW5kYXJkIEZ1bGwgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjo5NDUuMCwicHJpY2VfaW5fdXNkIjo5NDUuMCwicXVhbnRpdHkiOjF9XSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3OTE5NTI2MjA0NywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzgyMjMzIn0=


Note that our preview displays all 321 rows of our source file.
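
When the same pattern is needed from a Python cell, the query text can be assembled as a string. The helper below is a small sketch, not part of the course materials: `file_query` is a hypothetical name, and the path is a placeholder. It wraps the path in back-ticks, as the pattern above requires.

```python
def file_query(file_format: str, path: str) -> str:
    """Build a query string of the form SELECT * FROM format.`path`."""
    # A back-tick inside the path would end the quoted identifier early.
    if "`" in path:
        raise ValueError("path must not contain a back-tick")
    return f"SELECT * FROM {file_format}.`{path}`"


# Placeholder path, for illustration only.
query = file_query("json", "/path/to/file.json")
print(query)

# In a Databricks Python cell, the string could then be run with:
# display(spark.sql(query))
```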

## Query a Directory of Files

Assuming all of the files in a directory have the same format and schema, all files can be queried simultaneously by specifying the directory path rather than an individual file.

In [0]:
%sql
SELECT * FROM json.`${da.paths.datasets}/raw/events-kafka`

key,offset,partition,timestamp,topic,value
VUEwMDAwMDAxMDczOTgwNTQ=,219255030,0,1593880885085,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6Im1haW4iLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODUwMzYxMjksImdlbyI6eyJjaXR5IjoiTmV3IFlvcmsiLCJzdGF0ZSI6Ik5ZIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJnb29nbGUiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4NTAzNjEyOSwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDU0In0=
VUEwMDAwMDAxMDczOTI0NTg=,219255043,0,1593880892303,clickstream,eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoiYWRkX2l0ZW0iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzMDA2OTY3NTEsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg5MjI1MTMxMCwiZ2VvIjp7ImNpdHkiOiJXZXN0YnJvb2siLCJzdGF0ZSI6Ik1FIn0sIml0ZW1zIjpbeyJpdGVtX2lkIjoiTV9TVEFOX1QiLCJpdGVtX25hbWUiOiJTdGFuZGFyZCBUd2luIE1hdHRyZXNzIiwiaXRlbV9yZXZlbnVlX2luX3VzZCI6NTk1LjAsInByaWNlX2luX3VzZCI6NTk1LjAsInF1YW50aXR5IjoxfV0sInRyYWZmaWNfc291cmNlIjoiZ29vZ2xlIiwidXNlcl9maXJzdF90b3VjaF90aW1lc3RhbXAiOjE1OTM4ODAzMDA2OTY3NTEsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM5MjQ1OCJ9
VUEwMDAwMDAxMDczOTU5Njg=,219255108,0,1593880889174,clickstream,eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJwcmVtaXVtIiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODYxMDMwMjQxLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODkxMjY3NzgsImdlbyI6eyJjaXR5IjoiRmlzaGVycyIsInN0YXRlIjoiSU4ifSwiaXRlbXMiOltdLCJ0cmFmZmljX3NvdXJjZSI6InlvdXR1YmUiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDY2NDY1ODc3MywidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk1OTY4In0=
VUEwMDAwMDAxMDczOTgwMzA=,219255118,0,1593880889725,clickstream,eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoib3JpZ2luYWwiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODA4ODI0Mjk5ODAsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4OTY3Njg1NywiZ2VvIjp7ImNpdHkiOiJMb21pdGEiLCJzdGF0ZSI6IkNBIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJmYWNlYm9vayIsInVzZXJfZmlyc3RfdG91Y2hfdGltZXN0YW1wIjoxNTkzODgwODgyNDI5OTgwLCJ1c2VyX2lkIjoiVUEwMDAwMDAxMDczOTgwMzAifQ==
VUEwMDAwMDAxMDczODIyMzM=,219438025,1,1593880886106,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNjX2luZm8iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzNjQzMjEwODgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4NjA2NTEyNSwiZ2VvIjp7ImNpdHkiOiJOZXcgWW9yayIsInN0YXRlIjoiTlkifSwiaXRlbXMiOlt7Iml0ZW1faWQiOiJNX1NUQU5fRiIsIml0ZW1fbmFtZSI6IlN0YW5kYXJkIEZ1bGwgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjo5NDUuMCwicHJpY2VfaW5fdXNkIjo5NDUuMCwicXVhbnRpdHkiOjF9XSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3OTE5NTI2MjA0NywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzgyMjMzIn0=
VUEwMDAwMDAxMDczODIyMzM=,219438069,1,1593880886106,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNjX2luZm8iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzNjQzMjEwODgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4NjA2NTEyNSwiZ2VvIjp7ImNpdHkiOiJOZXcgWW9yayIsInN0YXRlIjoiTlkifSwiaXRlbXMiOlt7Iml0ZW1faWQiOiJNX1NUQU5fRiIsIml0ZW1fbmFtZSI6IlN0YW5kYXJkIEZ1bGwgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjo5NDUuMCwicHJpY2VfaW5fdXNkIjo5NDUuMCwicXVhbnRpdHkiOjF9XSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3OTE5NTI2MjA0NywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzgyMjMzIn0=
VUEwMDAwMDAxMDczOTgwMzc=,219438089,1,1593880887640,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImRlbGl2ZXJ5IiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODgyOTY0MjYyLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODc2MDUzMzcsImdlbyI6eyJjaXR5IjoiVmVybm9uIiwic3RhdGUiOiJUWCJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4Mjk2NDI2MiwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDM3In0=
VUEwMDAwMDAxMDczOTgxNTk=,219438114,1,1593880894803,clickstream,eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJtYWluIiwiZXZlbnRfdGltZXN0YW1wIjoxNTkzODgwODk0Nzg5NTc5LCJnZW8iOnsiY2l0eSI6Ikxha2V3b29kIiwic3RhdGUiOiJDTyJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoieW91dHViZSIsInVzZXJfZmlyc3RfdG91Y2hfdGltZXN0YW1wIjoxNTkzODgwODk0Nzg5NTc5LCJ1c2VyX2lkIjoiVUEwMDAwMDAxMDczOTgxNTkifQ==
VUEwMDAwMDAxMDczNzY0Njc=,219438126,1,1593880888445,clickstream,eyJkZXZpY2UiOiJXaW5kb3dzIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNhcnQiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4Nzk2MTk4NTI2NzgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4ODM5MjQ5OCwiZ2VvIjp7ImNpdHkiOiJEZW52ZXIiLCJzdGF0ZSI6IkNPIn0sIml0ZW1zIjpbeyJpdGVtX2lkIjoiTV9QUkVNX0siLCJpdGVtX25hbWUiOiJQcmVtaXVtIEtpbmcgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjoxOTk1LjAsInByaWNlX2luX3VzZCI6MTk5NS4wLCJxdWFudGl0eSI6MX0seyJpdGVtX2lkIjoiTV9TVEFOX1EiLCJpdGVtX25hbWUiOiJTdGFuZGFyZCBRdWVlbiBNYXR0cmVzcyIsIml0ZW1fcmV2ZW51ZV9pbl91c2QiOjIwOTAuMCwicHJpY2VfaW5fdXNkIjoxMDQ1LjAsInF1YW50aXR5IjoyfV0sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3ODU1OTgwMzQ3MCwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzc2NDY3In0=
VUEwMDAwMDAxMDczOTgwMzc=,219438135,1,1593880887640,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImRlbGl2ZXJ5IiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODgyOTY0MjYyLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODc2MDUzMzcsImdlbyI6eyJjaXR5IjoiVmVybm9uIiwic3RhdGUiOiJUWCJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4Mjk2NDI2MiwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDM3In0=


By default, this query will only show the first 1000 rows.

## Create References to Files
This ability to directly query files and directories means that additional Spark logic can be chained to queries against files.

When we create a view from a query against a path, we can reference this view in later queries. Here, we'll create a temporary view, but you can also create a permanent reference with a regular view.

In [0]:
%sql
CREATE OR REPLACE TEMP VIEW events_temp_view
AS SELECT * FROM json.`${da.paths.datasets}/raw/events-kafka/`;

SELECT * FROM events_temp_view

key,offset,partition,timestamp,topic,value
VUEwMDAwMDAxMDczOTgwNTQ=,219255030,0,1593880885085,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6Im1haW4iLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODUwMzYxMjksImdlbyI6eyJjaXR5IjoiTmV3IFlvcmsiLCJzdGF0ZSI6Ik5ZIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJnb29nbGUiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4NTAzNjEyOSwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDU0In0=
VUEwMDAwMDAxMDczOTI0NTg=,219255043,0,1593880892303,clickstream,eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoiYWRkX2l0ZW0iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzMDA2OTY3NTEsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg5MjI1MTMxMCwiZ2VvIjp7ImNpdHkiOiJXZXN0YnJvb2siLCJzdGF0ZSI6Ik1FIn0sIml0ZW1zIjpbeyJpdGVtX2lkIjoiTV9TVEFOX1QiLCJpdGVtX25hbWUiOiJTdGFuZGFyZCBUd2luIE1hdHRyZXNzIiwiaXRlbV9yZXZlbnVlX2luX3VzZCI6NTk1LjAsInByaWNlX2luX3VzZCI6NTk1LjAsInF1YW50aXR5IjoxfV0sInRyYWZmaWNfc291cmNlIjoiZ29vZ2xlIiwidXNlcl9maXJzdF90b3VjaF90aW1lc3RhbXAiOjE1OTM4ODAzMDA2OTY3NTEsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM5MjQ1OCJ9
VUEwMDAwMDAxMDczOTU5Njg=,219255108,0,1593880889174,clickstream,eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJwcmVtaXVtIiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODYxMDMwMjQxLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODkxMjY3NzgsImdlbyI6eyJjaXR5IjoiRmlzaGVycyIsInN0YXRlIjoiSU4ifSwiaXRlbXMiOltdLCJ0cmFmZmljX3NvdXJjZSI6InlvdXR1YmUiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDY2NDY1ODc3MywidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk1OTY4In0=
VUEwMDAwMDAxMDczOTgwMzA=,219255118,0,1593880889725,clickstream,eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoib3JpZ2luYWwiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODA4ODI0Mjk5ODAsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4OTY3Njg1NywiZ2VvIjp7ImNpdHkiOiJMb21pdGEiLCJzdGF0ZSI6IkNBIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJmYWNlYm9vayIsInVzZXJfZmlyc3RfdG91Y2hfdGltZXN0YW1wIjoxNTkzODgwODgyNDI5OTgwLCJ1c2VyX2lkIjoiVUEwMDAwMDAxMDczOTgwMzAifQ==
VUEwMDAwMDAxMDczODIyMzM=,219438025,1,1593880886106,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNjX2luZm8iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzNjQzMjEwODgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4NjA2NTEyNSwiZ2VvIjp7ImNpdHkiOiJOZXcgWW9yayIsInN0YXRlIjoiTlkifSwiaXRlbXMiOlt7Iml0ZW1faWQiOiJNX1NUQU5fRiIsIml0ZW1fbmFtZSI6IlN0YW5kYXJkIEZ1bGwgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjo5NDUuMCwicHJpY2VfaW5fdXNkIjo5NDUuMCwicXVhbnRpdHkiOjF9XSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3OTE5NTI2MjA0NywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzgyMjMzIn0=
VUEwMDAwMDAxMDczODIyMzM=,219438069,1,1593880886106,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNjX2luZm8iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzNjQzMjEwODgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4NjA2NTEyNSwiZ2VvIjp7ImNpdHkiOiJOZXcgWW9yayIsInN0YXRlIjoiTlkifSwiaXRlbXMiOlt7Iml0ZW1faWQiOiJNX1NUQU5fRiIsIml0ZW1fbmFtZSI6IlN0YW5kYXJkIEZ1bGwgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjo5NDUuMCwicHJpY2VfaW5fdXNkIjo5NDUuMCwicXVhbnRpdHkiOjF9XSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3OTE5NTI2MjA0NywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzgyMjMzIn0=
VUEwMDAwMDAxMDczOTgwMzc=,219438089,1,1593880887640,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImRlbGl2ZXJ5IiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODgyOTY0MjYyLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODc2MDUzMzcsImdlbyI6eyJjaXR5IjoiVmVybm9uIiwic3RhdGUiOiJUWCJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4Mjk2NDI2MiwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDM3In0=
VUEwMDAwMDAxMDczOTgxNTk=,219438114,1,1593880894803,clickstream,eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJtYWluIiwiZXZlbnRfdGltZXN0YW1wIjoxNTkzODgwODk0Nzg5NTc5LCJnZW8iOnsiY2l0eSI6Ikxha2V3b29kIiwic3RhdGUiOiJDTyJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoieW91dHViZSIsInVzZXJfZmlyc3RfdG91Y2hfdGltZXN0YW1wIjoxNTkzODgwODk0Nzg5NTc5LCJ1c2VyX2lkIjoiVUEwMDAwMDAxMDczOTgxNTkifQ==
VUEwMDAwMDAxMDczNzY0Njc=,219438126,1,1593880888445,clickstream,eyJkZXZpY2UiOiJXaW5kb3dzIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNhcnQiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4Nzk2MTk4NTI2NzgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4ODM5MjQ5OCwiZ2VvIjp7ImNpdHkiOiJEZW52ZXIiLCJzdGF0ZSI6IkNPIn0sIml0ZW1zIjpbeyJpdGVtX2lkIjoiTV9QUkVNX0siLCJpdGVtX25hbWUiOiJQcmVtaXVtIEtpbmcgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjoxOTk1LjAsInByaWNlX2luX3VzZCI6MTk5NS4wLCJxdWFudGl0eSI6MX0seyJpdGVtX2lkIjoiTV9TVEFOX1EiLCJpdGVtX25hbWUiOiJTdGFuZGFyZCBRdWVlbiBNYXR0cmVzcyIsIml0ZW1fcmV2ZW51ZV9pbl91c2QiOjIwOTAuMCwicHJpY2VfaW5fdXNkIjoxMDQ1LjAsInF1YW50aXR5IjoyfV0sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3ODU1OTgwMzQ3MCwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzc2NDY3In0=
VUEwMDAwMDAxMDczOTgwMzc=,219438135,1,1593880887640,clickstream,eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImRlbGl2ZXJ5IiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODgyOTY0MjYyLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODc2MDUzMzcsImdlbyI6eyJjaXR5IjoiVmVybm9uIiwic3RhdGUiOiJUWCJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4Mjk2NDI2MiwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDM3In0=


## Extract Text Files as Raw Strings

When working with text-based files (which include JSON, CSV, TSV, and TXT formats), you can use the **`text`** format to load each line of the file as a row with one string column named **`value`**. This can be useful when data sources are prone to corruption and custom text parsing functions will be used to extract values from text fields.
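
As a rough illustration of what a custom parsing function might look like, here is a plain-Python sketch that tolerates corrupt lines. The function name `parse_line` and the sample lines are assumptions made for this example (the second line is deliberately truncated). In Spark, logic like this would typically be applied to the **`value`** column; that wiring is not shown here.

```python
import json
from typing import Optional


def parse_line(line: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if the line is not a valid JSON object."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Hypothetical raw lines shaped like records in this dataset.
raw_lines = [
    '{"key":"VUEwMDAwMDAxMDczOTgwNTQ=","offset":219255030,"partition":0}',
    '{"key":"VUEwMDAwMDAxMDczOTI0NTg=","offset":2192',  # truncated on purpose
    '{"key":"VUEwMDAwMDAxMDczOTU5Njg=","offset":219255108,"partition":0}',
]

parsed = [parse_line(line) for line in raw_lines]
good = [row for row in parsed if row is not None]
bad_count = len(parsed) - len(good)
print(f"parsed {len(good)} rows, skipped {bad_count} corrupt line(s)")
```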

In [0]:
%sql
SELECT * FROM text.`${da.paths.datasets}/raw/events-kafka/`

value
"{""key"":""VUEwMDAwMDAxMDczOTgwNTQ="",""offset"":219255030,""partition"":0,""timestamp"":1593880885085,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6Im1haW4iLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODUwMzYxMjksImdlbyI6eyJjaXR5IjoiTmV3IFlvcmsiLCJzdGF0ZSI6Ik5ZIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJnb29nbGUiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4NTAzNjEyOSwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDU0In0=""}"
"{""key"":""VUEwMDAwMDAxMDczOTI0NTg="",""offset"":219255043,""partition"":0,""timestamp"":1593880892303,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoiYWRkX2l0ZW0iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzMDA2OTY3NTEsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg5MjI1MTMxMCwiZ2VvIjp7ImNpdHkiOiJXZXN0YnJvb2siLCJzdGF0ZSI6Ik1FIn0sIml0ZW1zIjpbeyJpdGVtX2lkIjoiTV9TVEFOX1QiLCJpdGVtX25hbWUiOiJTdGFuZGFyZCBUd2luIE1hdHRyZXNzIiwiaXRlbV9yZXZlbnVlX2luX3VzZCI6NTk1LjAsInByaWNlX2luX3VzZCI6NTk1LjAsInF1YW50aXR5IjoxfV0sInRyYWZmaWNfc291cmNlIjoiZ29vZ2xlIiwidXNlcl9maXJzdF90b3VjaF90aW1lc3RhbXAiOjE1OTM4ODAzMDA2OTY3NTEsInVzZXJfaWQiOiJVQTAwMDAwMDEwNzM5MjQ1OCJ9""}"
"{""key"":""VUEwMDAwMDAxMDczOTU5Njg="",""offset"":219255108,""partition"":0,""timestamp"":1593880889174,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJwcmVtaXVtIiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODYxMDMwMjQxLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODkxMjY3NzgsImdlbyI6eyJjaXR5IjoiRmlzaGVycyIsInN0YXRlIjoiSU4ifSwiaXRlbXMiOltdLCJ0cmFmZmljX3NvdXJjZSI6InlvdXR1YmUiLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDY2NDY1ODc3MywidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk1OTY4In0=""}"
"{""key"":""VUEwMDAwMDAxMDczOTgwMzA="",""offset"":219255118,""partition"":0,""timestamp"":1593880889725,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJpT1MiLCJlY29tbWVyY2UiOnt9LCJldmVudF9uYW1lIjoib3JpZ2luYWwiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODA4ODI0Mjk5ODAsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4OTY3Njg1NywiZ2VvIjp7ImNpdHkiOiJMb21pdGEiLCJzdGF0ZSI6IkNBIn0sIml0ZW1zIjpbXSwidHJhZmZpY19zb3VyY2UiOiJmYWNlYm9vayIsInVzZXJfZmlyc3RfdG91Y2hfdGltZXN0YW1wIjoxNTkzODgwODgyNDI5OTgwLCJ1c2VyX2lkIjoiVUEwMDAwMDAxMDczOTgwMzAifQ==""}"
"{""key"":""VUEwMDAwMDAxMDczODIyMzM="",""offset"":219438025,""partition"":1,""timestamp"":1593880886106,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNjX2luZm8iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzNjQzMjEwODgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4NjA2NTEyNSwiZ2VvIjp7ImNpdHkiOiJOZXcgWW9yayIsInN0YXRlIjoiTlkifSwiaXRlbXMiOlt7Iml0ZW1faWQiOiJNX1NUQU5fRiIsIml0ZW1fbmFtZSI6IlN0YW5kYXJkIEZ1bGwgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjo5NDUuMCwicHJpY2VfaW5fdXNkIjo5NDUuMCwicXVhbnRpdHkiOjF9XSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3OTE5NTI2MjA0NywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzgyMjMzIn0=""}"
"{""key"":""VUEwMDAwMDAxMDczODIyMzM="",""offset"":219438069,""partition"":1,""timestamp"":1593880886106,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNjX2luZm8iLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4ODAzNjQzMjEwODgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4NjA2NTEyNSwiZ2VvIjp7ImNpdHkiOiJOZXcgWW9yayIsInN0YXRlIjoiTlkifSwiaXRlbXMiOlt7Iml0ZW1faWQiOiJNX1NUQU5fRiIsIml0ZW1fbmFtZSI6IlN0YW5kYXJkIEZ1bGwgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjo5NDUuMCwicHJpY2VfaW5fdXNkIjo5NDUuMCwicXVhbnRpdHkiOjF9XSwidHJhZmZpY19zb3VyY2UiOiJpbnN0YWdyYW0iLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3OTE5NTI2MjA0NywidXNlcl9pZCI6IlVBMDAwMDAwMTA3MzgyMjMzIn0=""}"
"{""key"":""VUEwMDAwMDAxMDczOTgwMzc="",""offset"":219438089,""partition"":1,""timestamp"":1593880887640,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImRlbGl2ZXJ5IiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODgyOTY0MjYyLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODc2MDUzMzcsImdlbyI6eyJjaXR5IjoiVmVybm9uIiwic3RhdGUiOiJUWCJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4Mjk2NDI2MiwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDM3In0=""}"
"{""key"":""VUEwMDAwMDAxMDczOTgxNTk="",""offset"":219438114,""partition"":1,""timestamp"":1593880894803,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJtYWNPUyIsImVjb21tZXJjZSI6e30sImV2ZW50X25hbWUiOiJtYWluIiwiZXZlbnRfdGltZXN0YW1wIjoxNTkzODgwODk0Nzg5NTc5LCJnZW8iOnsiY2l0eSI6Ikxha2V3b29kIiwic3RhdGUiOiJDTyJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoieW91dHViZSIsInVzZXJfZmlyc3RfdG91Y2hfdGltZXN0YW1wIjoxNTkzODgwODk0Nzg5NTc5LCJ1c2VyX2lkIjoiVUEwMDAwMDAxMDczOTgxNTkifQ==""}"
"{""key"":""VUEwMDAwMDAxMDczNzY0Njc="",""offset"":219438126,""partition"":1,""timestamp"":1593880888445,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJXaW5kb3dzIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImNhcnQiLCJldmVudF9wcmV2aW91c190aW1lc3RhbXAiOjE1OTM4Nzk2MTk4NTI2NzgsImV2ZW50X3RpbWVzdGFtcCI6MTU5Mzg4MDg4ODM5MjQ5OCwiZ2VvIjp7ImNpdHkiOiJEZW52ZXIiLCJzdGF0ZSI6IkNPIn0sIml0ZW1zIjpbeyJpdGVtX2lkIjoiTV9QUkVNX0siLCJpdGVtX25hbWUiOiJQcmVtaXVtIEtpbmcgTWF0dHJlc3MiLCJpdGVtX3JldmVudWVfaW5fdXNkIjoxOTk1LjAsInByaWNlX2luX3VzZCI6MTk5NS4wLCJxdWFudGl0eSI6MX0seyJpdGVtX2lkIjoiTV9TVEFOX1EiLCJpdGVtX25hbWUiOiJTdGFuZGFyZCBRdWVlbiBNYXR0cmVzcyIsIml0ZW1fcmV2ZW51ZV9pbl91c2QiOjIwOTAuMCwicHJpY2VfaW5fdXNkIjoxMDQ1LjAsInF1YW50aXR5IjoyfV0sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg3ODU1OTgwMzQ3MCwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzc2NDY3In0=""}"
"{""key"":""VUEwMDAwMDAxMDczOTgwMzc="",""offset"":219438135,""partition"":1,""timestamp"":1593880887640,""topic"":""clickstream"",""value"":""eyJkZXZpY2UiOiJBbmRyb2lkIiwiZWNvbW1lcmNlIjp7fSwiZXZlbnRfbmFtZSI6ImRlbGl2ZXJ5IiwiZXZlbnRfcHJldmlvdXNfdGltZXN0YW1wIjoxNTkzODgwODgyOTY0MjYyLCJldmVudF90aW1lc3RhbXAiOjE1OTM4ODA4ODc2MDUzMzcsImdlbyI6eyJjaXR5IjoiVmVybm9uIiwic3RhdGUiOiJUWCJ9LCJpdGVtcyI6W10sInRyYWZmaWNfc291cmNlIjoiZmFjZWJvb2siLCJ1c2VyX2ZpcnN0X3RvdWNoX3RpbWVzdGFtcCI6MTU5Mzg4MDg4Mjk2NDI2MiwidXNlcl9pZCI6IlVBMDAwMDAwMTA3Mzk4MDM3In0=""}"


## Extract the Raw Bytes and Metadata of a File

Some workflows may require working with entire files, such as when dealing with images or unstructured data. Using **`binaryFile`** to query a directory will provide file metadata alongside the binary representation of the file contents.

Specifically, the fields created will indicate the **`path`**, **`modificationTime`**, **`length`**, and **`content`**.

In [0]:
%sql
SELECT * FROM binaryFile.`${da.paths.datasets}/raw/events-kafka/`

path,modificationTime,length,content
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/000.json,2022-07-26T09:38:13.000+0000,200116,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T1Rnd05UUT0iLCJvZmZzZXQiOjIxOTI1NTAzMCwicGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4ODA4ODUwODUsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/001.json,2022-07-26T09:38:14.000+0000,169907,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T0RBeU9UWT0iLCJvZmZzZXQiOjIxOTI0NjIzMywicGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4ODAxNzUyNjgsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/002.json,2022-07-26T09:38:15.000+0000,140680,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T1RJNE1EWT0iLCJvZmZzZXQiOjIxOTI0ODE3OSwicGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4ODAzMzQ4MzIsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/003.json,2022-07-26T09:38:16.000+0000,139280,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6TmpRM01Uaz0iLCJvZmZzZXQiOjIxOTQyMTUwOCwicGFydGl0aW9uIjoxLCJ0aW1lc3RhbXAiOjE1OTM4Nzk2MTYyMTIsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/004.json,2022-07-26T09:38:17.000+0000,122411,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T0RVMk16QT0iLCJvZmZzZXQiOjIxOTIzOTY3NiwicGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4Nzk2MjExMDEsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/006.json,2022-07-26T09:38:19.000+0000,98332,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T1RJeU5UUT0iLCJvZmZzZXQiOjIxOTI1MzYzMiwicGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4ODA3NzY1NjUsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/005.json,2022-07-26T09:38:18.000+0000,96034,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6TnpVeU1qYz0iLCJvZmZzZXQiOjIxOTQzMjk3OSwicGFydGl0aW9uIjoxLCJ0aW1lc3RhbXAiOjE1OTM4ODA1MDEwNDIsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/007.json,2022-07-26T09:38:20.000+0000,86452,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T1RJNU1qWT0iLCJvZmZzZXQiOjIxOTI0ODMzMywicGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4ODAzNTAwMDgsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/008.json,2022-07-26T09:38:21.000+0000,68052,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T0RZNE9UUT0iLCJvZmZzZXQiOjIxOTI0MDYxNSwicGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4Nzk3MDMwMzAsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/dewd/source/eltwss/raw/events-kafka/009.json,2022-07-26T09:38:22.000+0000,44729,eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6TnpjM09UTT0iLCJvZmZzZXQiOjIxOTQzNTIwNywicGFydGl0aW9uIjoxLCJ0aW1lc3RhbXAiOjE1OTM4ODA2ODA5MTYsInRvcGljIjoiY2xpY2tzdHJlYW0iLCJ2YWx1ZSI6ImV5Sms= (truncated)
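
The **`content`** column in the output above is shown as a truncated preview. Assuming the preview is base64-encoded, as it appears to be, a short plain-Python sketch can confirm that it holds the first bytes of the file. The string below is copied from the `000.json` row; the variable names are illustrative.

```python
import base64

# Truncated `content` preview for 000.json, copied from the output above.
preview_b64 = (
    "eyJrZXkiOiJWVUV3TURBd01EQXhNRGN6T1Rnd05UUT0iLCJvZmZzZXQiOjIxOTI1NTAzMCwi"
    "cGFydGl0aW9uIjowLCJ0aW1lc3RhbXAiOjE1OTM4ODA4ODUwODUsInRvcGljIjoiY2xpY2tz"
    "dHJlYW0iLCJ2YWx1ZSI6ImV5Sms="
)

# Decode the preview back to the raw bytes at the start of the file.
first_bytes = base64.b64decode(preview_b64)
print(len(first_bytes))
print(first_bytes.decode("utf-8"))
```

The decoded text is the beginning of a JSON record whose **`key`** and **`offset`** match the first row returned by the directory query earlier in this notebook.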


Run the following cell to delete the tables and files associated with this lesson.

In [0]:
%python 
DA.cleanup()

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>