
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Delta Review

There are a few key operations necessary to understand and make use of <a href="https://docs.delta.io/latest/quick-start.html#create-a-table" target="_blank">Delta Lake</a>.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you will:<br>
- Create a Delta Table
- Read data from your Delta Table
- Update data in your Delta Table
- Access previous versions of your Delta Table using <a href="https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html" target="_blank">time travel</a>
- <a href="https://databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html" target="_blank">Understand the Transaction Log</a>

In this notebook we will be using the SF Airbnb rental dataset from <a href="http://insideairbnb.com/get-the-data.html" target="_blank">Inside Airbnb</a>.

### Why Delta Lake?<br><br>

<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://user-images.githubusercontent.com/20408077/87175470-4d8e1580-c29e-11ea-8f33-0ee14348a2c1.png" width="500"/>
</div>

At a glance, Delta Lake is an open source storage layer that brings both **reliability and performance** to data lakes. Delta Lake provides **ACID transactions, scalable metadata handling, and unified streaming and batch data processing**.

**Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.** <a href="https://docs.databricks.com/delta/delta-intro.html" target="_blank">For more information </a>

In [0]:
%run "./Includes/Classroom-Setup"

### Creating a Delta Table
First, we need to read the Airbnb dataset into a **Spark DataFrame**.

In [0]:
file_path = f"{datasets_dir}/airbnb/sf-listings/sf-listings-2019-03-06-clean.parquet/"
airbnb_df = spark.read.format("parquet").load(file_path)

display(airbnb_df)

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na
t,moderate,t,1.0,Western Addition,37.76931,-122.43386,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,1.0,180.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,170.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Bernal Heights,37.74511,-122.42102,Apartment,Entire home/apt,5.0,1.0,2.0,3.0,Real Bed,30.0,111.0,98.0,10.0,10.0,10.0,10.0,10.0,9.0,235.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76669,-122.4525,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,17.0,85.0,8.0,8.0,9.0,9.0,9.0,8.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76487,-122.45183,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,8.0,93.0,9.0,9.0,10.0,10.0,9.0,9.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Western Addition,37.77525,-122.43637,House,Entire home/apt,5.0,1.5,2.0,2.0,Real Bed,7.0,27.0,97.0,10.0,10.0,10.0,10.0,10.0,9.0,785.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,moderate,f,1.0,Western Addition,37.78471,-122.44555,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,2.0,31.0,90.0,9.0,8.0,10.0,10.0,9.0,9.0,255.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,2.0,Mission,37.75919,-122.42237,Condominium,Private room,3.0,1.0,1.0,2.0,Real Bed,1.0,647.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,1.0,Potrero Hill,37.76259,-122.40543,House,Private room,2.0,1.0,1.0,1.0,Real Bed,1.0,453.0,94.0,10.0,10.0,10.0,10.0,10.0,10.0,135.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Mission,37.75874,-122.41327,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,3.0,320.0,96.0,10.0,10.0,10.0,10.0,10.0,9.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,44.0,Haight Ashbury,37.77187,-122.43859,Apartment,Entire home/apt,3.0,1.0,3.0,3.0,Real Bed,30.0,37.0,89.0,9.0,9.0,10.0,9.0,9.0,9.0,177.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


**The cell below converts the data to a Delta table using the schema provided by the Spark DataFrame.**

In [0]:
# Converting Spark DataFrame to Delta Table
dbutils.fs.rm(working_dir, True)
airbnb_df.write.format("delta").mode("overwrite").save(working_dir)

**A Delta table can also be registered in the metastore. Here, `saveAsTable` writes the DataFrame as a Delta table named `delta_review` in the current database.**

In [0]:
spark.sql(f"CREATE DATABASE IF NOT EXISTS {cleaned_username}")
spark.sql(f"USE {cleaned_username}")

airbnb_df.write.format("delta").mode("overwrite").saveAsTable("delta_review")

In [0]:
%sql
select * from delta_review limit 10

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na
t,moderate,t,1.0,Western Addition,37.76931,-122.43386,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,1.0,180.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,170.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Bernal Heights,37.74511,-122.42102,Apartment,Entire home/apt,5.0,1.0,2.0,3.0,Real Bed,30.0,111.0,98.0,10.0,10.0,10.0,10.0,10.0,9.0,235.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76669,-122.4525,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,17.0,85.0,8.0,8.0,9.0,9.0,9.0,8.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76487,-122.45183,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,8.0,93.0,9.0,9.0,10.0,10.0,9.0,9.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Western Addition,37.77525,-122.43637,House,Entire home/apt,5.0,1.5,2.0,2.0,Real Bed,7.0,27.0,97.0,10.0,10.0,10.0,10.0,10.0,9.0,785.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,moderate,f,1.0,Western Addition,37.78471,-122.44555,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,2.0,31.0,90.0,9.0,8.0,10.0,10.0,9.0,9.0,255.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,2.0,Mission,37.75919,-122.42237,Condominium,Private room,3.0,1.0,1.0,2.0,Real Bed,1.0,647.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,1.0,Potrero Hill,37.76259,-122.40543,House,Private room,2.0,1.0,1.0,1.0,Real Bed,1.0,453.0,94.0,10.0,10.0,10.0,10.0,10.0,10.0,135.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Mission,37.75874,-122.41327,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,3.0,320.0,96.0,10.0,10.0,10.0,10.0,10.0,9.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,44.0,Haight Ashbury,37.77187,-122.43859,Apartment,Entire home/apt,3.0,1.0,3.0,3.0,Real Bed,30.0,37.0,89.0,9.0,9.0,10.0,9.0,9.0,9.0,177.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


- Delta supports partitioning. Partitioning puts data with the same value for the partitioned column into its own directory.
- Operations with a filter on the partitioned column will only read directories that match the filter. This optimization is called **partition pruning**. 
- Choose partition columns based on the patterns in your data. This dataset, for example, might benefit from being partitioned by neighborhood.

In [0]:
airbnb_df.write.format("delta").mode("overwrite").partitionBy("neighbourhood_cleansed").option("overwriteSchema", "true").save(working_dir)
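
To make the directory layout and partition pruning concrete, here is a minimal pure-Python sketch. It is not how Spark or Delta implement partitioning; it only mimics the idea. The function names `partition_rows` and `prune` are hypothetical, and the sample rows reuse a few neighborhood and price values from the listing above.

```python
from collections import defaultdict

# A few (neighborhood, price) pairs taken from the listing above; all other columns are omitted.
rows = [
    {"neighbourhood_cleansed": "Mission", "price": 139.0},
    {"neighbourhood_cleansed": "Mission", "price": 265.0},
    {"neighbourhood_cleansed": "Bernal Heights", "price": 235.0},
    {"neighbourhood_cleansed": "Potrero Hill", "price": 135.0},
]

def partition_rows(rows, column):
    """Group rows into 'column=value' directories (hypothetical helper)."""
    layout = defaultdict(list)
    for row in rows:
        directory = f"{column}={row[column]}"
        # The partition value is carried by the directory name, so it is dropped from the stored row.
        layout[directory].append({k: v for k, v in row.items() if k != column})
    return dict(layout)

def prune(layout, column, wanted):
    """Keep only the directories whose partition value matches the filter (hypothetical helper)."""
    return {d: r for d, r in layout.items() if d == f"{column}={wanted}"}

layout = partition_rows(rows, "neighbourhood_cleansed")
scanned = prune(layout, "neighbourhood_cleansed", "Mission")

print(sorted(layout))  # every partition directory
print(scanned)         # only the directory a filter on "Mission" would need to read
```

A filter on the partition column touches one directory out of three here; a filter on `price` would still have to look inside every directory.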

### Understanding the <a href="https://databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html" target="_blank">Transaction Log </a>
Let's take a look at the Delta transaction log. We can see that Delta stores each neighborhood partition in its own directory. We can also see a directory called _delta_log.

In [0]:
display(dbutils.fs.ls(working_dir))

path,name,size,modificationTime
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/_delta_log/,_delta_log/,0,1661336806000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Bayview/,neighbourhood_cleansed=Bayview/,0,1661336795000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Bernal Heights/,neighbourhood_cleansed=Bernal Heights/,0,1661336796000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Castro%2FUpper Market/,neighbourhood_cleansed=Castro%2FUpper Market/,0,1661336796000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Chinatown/,neighbourhood_cleansed=Chinatown/,0,1661336796000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Crocker Amazon/,neighbourhood_cleansed=Crocker Amazon/,0,1661336797000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Diamond Heights/,neighbourhood_cleansed=Diamond Heights/,0,1661336797000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Downtown%2FCivic Center/,neighbourhood_cleansed=Downtown%2FCivic Center/,0,1661336797000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Excelsior/,neighbourhood_cleansed=Excelsior/,0,1661336797000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Financial District/,neighbourhood_cleansed=Financial District/,0,1661336798000


<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://user-images.githubusercontent.com/20408077/87174138-609fe600-c29c-11ea-90cc-84df0c1357f1.png" width="500"/>
</div>

- **When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory.**
- **As the user makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit is written out as a JSON file, starting with 00000000000000000000.json. Additional changes to the table generate more JSON files.**
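
The commit file naming can be sketched in a couple of lines. This assumes the 20-digit, zero-padded version number seen in this notebook's `_delta_log` listing; `commit_file_name` is a hypothetical helper, not part of any Delta API.

```python
def commit_file_name(version: int) -> str:
    """Return the JSON log file name for a table version, zero-padded to 20 digits."""
    return f"{version:020d}.json"

# Fixed-width padding means plain string sorting matches commit order.
names = [commit_file_name(v) for v in (0, 1, 10)]
print(names)
```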

In [0]:
display(dbutils.fs.ls(working_dir + "/_delta_log/"))

path,name,size,modificationTime
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/_delta_log/00000000000000000000.crc,00000000000000000000.crc,4691,1661335906000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/_delta_log/00000000000000000000.json,00000000000000000000.json,6655,1661335902000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/_delta_log/00000000000000000001.crc,00000000000000000001.crc,4723,1661336806000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/_delta_log/00000000000000000001.json,00000000000000000001.json,110482,1661336803000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/_delta_log/__tmp_path_dir/,__tmp_path_dir/,0,1661336806000


## Next, let's take a look at a Transaction Log File.

The <a href="https://docs.databricks.com/delta/delta-utility.html" target="_blank">four columns</a> each represent a different part of the very first commit to the Delta table, made when the table was created.<br><br>

- **The add column records each data file added by the commit, along with statistics about that file: its record count and the minimum value, maximum value, and null count of individual columns**.
- **The commitInfo column has useful information about what the operation was (for example, WRITE) and who executed the operation**.
- **The metaData column contains information about the column schema**.
- **The protocol column contains the minimum reader and writer protocol versions necessary to read from or write to this Delta table**.
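
As a rough illustration of why each row of the log has only one populated column, the sketch below parses a commit file with the standard `json` module instead of Spark. The `sample_commit` string is a hand-abbreviated stand-in for the first commit: it keeps only a handful of fields (with values taken from this notebook's output) and omits everything else, so treat it as an assumption about the shape of the file rather than a copy of it.

```python
import json

# Abbreviated stand-in for a commit file: one JSON object per line, one action per object.
sample_commit = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}),
    json.dumps({"metaData": {"format": {"provider": "parquet"}, "partitionColumns": []}}),
    json.dumps({"add": {
        "path": "part-00000-f0d5ae6b-d476-4449-895b-a11e2bd89bb9-c000.snappy.parquet",
        "size": 191755,
        # File-level statistics are stored as a JSON string nested inside the action.
        "stats": json.dumps({"numRecords": 7146}),
    }}),
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
])

actions = [json.loads(line) for line in sample_commit.splitlines()]

# The union of top-level keys becomes the column set when the file is read as a table.
columns = sorted({key for action in actions for key in action})

# Each action fills exactly one of those columns; the rest would show up as nulls.
populated = [[col for col in columns if col in action] for action in actions]

stats = json.loads(next(a["add"]["stats"] for a in actions if "add" in a))

print(columns)
print(populated)
print(stats["numRecords"])
```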

In [0]:
display(spark.read.json(working_dir + "/_delta_log/00000000000000000000.json"))

add,commitInfo,metaData,protocol
,,,"List(1, 2)"
,,"List(1661335895816, List(parquet), 6738c75d-5122-485e-9f20-d65d23bfba1c, List(), {""type"":""struct"",""fields"":[{""name"":""host_is_superhost"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""cancellation_policy"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""instant_bookable"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""host_total_listings_count"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""neighbourhood_cleansed"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""latitude"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""longitude"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""property_type"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""room_type"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""accommodates"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bathrooms"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bedrooms"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""beds"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bed_type"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""minimum_nights"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""number_of_reviews"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_rating"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_accuracy"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_cleanliness"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_checkin"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_communication"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_location"",""type"":""double"",""nullable"":true,""metadata"":
{}},{""name"":""review_scores_value"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""price"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bedrooms_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bathrooms_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""beds_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_rating_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_accuracy_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_cleanliness_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_checkin_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_communication_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_location_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_value_na"",""type"":""double"",""nullable"":true,""metadata"":{}}]})",
"List(true, 1661335899000, part-00000-f0d5ae6b-d476-4449-895b-a11e2bd89bb9-c000.snappy.parquet, 191755, {""numRecords"":7146,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":0.0,""neighbourhood_cleansed"":""Bayview"",""latitude"":37.70743,""longitude"":-122.51306,""property_type"":""Aparthotel"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":0.0,""bedrooms"":0.0,""beds"":0.0,""bed_type"":""Airbed"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":20.0,""review_scores_accuracy"":2.0,""review_scores_cleanliness"":2.0,""review_scores_checkin"":2.0,""review_scores_communication"":2.0,""review_scores_location"":2.0,""review_scores_value"":2.0,""price"":10.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""super_strict_60"",""instant_bookable"":""t"",""host_total_listings_count"":1199.0,""neighbourhood_cleansed"":""Western Addition"",""latitude"":37.81031,""longitude"":-122.36979,""property_type"":""Villa"",""room_type"":""Shared room"",""accommodates"":16.0,""bathrooms"":14.0,""bedrooms"":14.0,""beds"":14.0,""bed_type"":""Real 
Bed"",""minimum_nights"":365.0,""number_of_reviews"":677.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":10000.0,""bedrooms_na"":1.0,""bathrooms_na"":1.0,""beds_na"":1.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""neighbourhood_cleansed"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0}}, List(1661335899000000, 268435456))",,,
,"List(0822-094520-n0irdqef, Databricks-Runtime/10.4.x-scala2.12, false, WriteSerializable, List(2051889157726300), WRITE, List(1, 191755, 7146), List(Overwrite, []), 1661335902207, 11629fa1-53dc-4628-956d-4342c7cc85fb, 6997591375752473, manujkumar.joshi@celebaltech.com)",,


The second transaction log has 39 rows. This includes metadata for each partition.
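
One detail worth noticing in this commit is that the file paths stored in the add entries are URI-encoded, so they do not match the directory names on disk character for character. The sketch below uses three paths and record counts copied from this notebook's output for the second commit; `directory_on_disk` is a hypothetical helper, and decoding with `urllib.parse.unquote` is an assumption that happens to reproduce the directory names listed earlier.

```python
from urllib.parse import unquote

# (partition value, path as logged, numRecords) copied from the second commit's add entries.
adds = [
    ("Bayview",
     "neighbourhood_cleansed=Bayview/part-00000-54e44e99-a4f5-4311-a3ef-4068bdf0e775.c000.snappy.parquet",
     157),
    ("Bernal Heights",
     "neighbourhood_cleansed=Bernal%20Heights/part-00000-19405676-77d7-4172-a567-6cdf25f19457.c000.snappy.parquet",
     373),
    ("Castro/Upper Market",
     "neighbourhood_cleansed=Castro%252FUpper%20Market/part-00000-93735736-a2bf-4469-92e1-227795c7f2f8.c000.snappy.parquet",
     405),
]

def directory_on_disk(logged_path: str) -> str:
    """Decode the logged path once and return its leading partition directory."""
    return unquote(logged_path).split("/")[0]

for value, path, num_records in adds:
    print(f"{value!r}: {directory_on_disk(path)} ({num_records} records)")
```

The slash in "Castro/Upper Market" is escaped as `%2F` in the directory name, and that `%` is escaped again as `%25` in the log, which is why the logged path contains `%252F`.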

In [0]:
display(spark.read.json(working_dir + "/_delta_log/00000000000000000001.json"))

add,commitInfo,metaData,remove
,,"List(1661335895816, List(parquet), 6738c75d-5122-485e-9f20-d65d23bfba1c, List(neighbourhood_cleansed), {""type"":""struct"",""fields"":[{""name"":""host_is_superhost"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""cancellation_policy"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""instant_bookable"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""host_total_listings_count"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""neighbourhood_cleansed"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""latitude"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""longitude"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""property_type"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""room_type"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""accommodates"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bathrooms"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bedrooms"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""beds"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bed_type"",""type"":""string"",""nullable"":true,""metadata"":{}},{""name"":""minimum_nights"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""number_of_reviews"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_rating"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_accuracy"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_cleanliness"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_checkin"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_communication"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_location"",""type"":""double"",""nullabl
e"":true,""metadata"":{}},{""name"":""review_scores_value"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""price"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bedrooms_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""bathrooms_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""beds_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_rating_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_accuracy_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_cleanliness_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_checkin_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_communication_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_location_na"",""type"":""double"",""nullable"":true,""metadata"":{}},{""name"":""review_scores_value_na"",""type"":""double"",""nullable"":true,""metadata"":{}}]})",
"List(true, 1661336795000, List(Bayview), neighbourhood_cleansed=Bayview/part-00000-54e44e99-a4f5-4311-a3ef-4068bdf0e775.c000.snappy.parquet, 16372, {""numRecords"":157,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":1.0,""latitude"":37.7106,""longitude"":-122.40647,""property_type"":""Apartment"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":1.0,""bedrooms"":0.0,""beds"":0.0,""bed_type"":""Futon"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":30.0,""review_scores_accuracy"":2.0,""review_scores_cleanliness"":3.0,""review_scores_checkin"":8.0,""review_scores_communication"":7.0,""review_scores_location"":2.0,""review_scores_value"":2.0,""price"":36.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""super_strict_60"",""instant_bookable"":""t"",""host_total_listings_count"":35.0,""latitude"":37.74514,""longitude"":-122.36979,""property_type"":""Villa"",""room_type"":""Shared room"",""accommodates"":8.0,""bathrooms"":4.0,""bedrooms"":4.0,""beds"":7.0,""bed_type"":""Real 
Bed"",""minimum_nights"":180.0,""number_of_reviews"":300.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":550.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000000, 268435456))",,,
"List(true, 1661336796000, List(Bernal Heights), neighbourhood_cleansed=Bernal%20Heights/part-00000-19405676-77d7-4172-a567-6cdf25f19457.c000.snappy.parquet, 21983, {""numRecords"":373,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":1.0,""latitude"":37.73198,""longitude"":-122.43261,""property_type"":""Apartment"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":1.0,""bedrooms"":0.0,""beds"":0.0,""bed_type"":""Airbed"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":60.0,""review_scores_accuracy"":5.0,""review_scores_cleanliness"":6.0,""review_scores_checkin"":8.0,""review_scores_communication"":2.0,""review_scores_location"":8.0,""review_scores_value"":6.0,""price"":39.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""super_strict_30"",""instant_bookable"":""t"",""host_total_listings_count"":852.0,""latitude"":37.74845,""longitude"":-122.40495,""property_type"":""Townhouse"",""room_type"":""Shared room"",""accommodates"":10.0,""bathrooms"":4.0,""bedrooms"":4.0,""beds"":7.0,""bed_type"":""Real 
Bed"",""minimum_nights"":60.0,""number_of_reviews"":360.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":1000.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000001, 268435456))",,,
"List(true, 1661336796000, List(Castro/Upper Market), neighbourhood_cleansed=Castro%252FUpper%20Market/part-00000-93735736-a2bf-4469-92e1-227795c7f2f8.c000.snappy.parquet, 23318, {""numRecords"":405,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":0.0,""latitude"":37.75608,""longitude"":-122.44618,""property_type"":""Apartment"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":0.0,""bedrooms"":0.0,""beds"":0.0,""bed_type"":""Couch"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":40.0,""review_scores_accuracy"":2.0,""review_scores_cleanliness"":2.0,""review_scores_checkin"":2.0,""review_scores_communication"":2.0,""review_scores_location"":4.0,""review_scores_value"":2.0,""price"":10.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""super_strict_30"",""instant_bookable"":""t"",""host_total_listings_count"":852.0,""latitude"":37.76936,""longitude"":-122.4262,""property_type"":""Townhouse"",""room_type"":""Shared room"",""accommodates"":12.0,""bathrooms"":4.0,""bedrooms"":6.0,""beds"":7.0,""bed_type"":""Real 
Bed"",""minimum_nights"":120.0,""number_of_reviews"":540.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":1450.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000002, 268435456))",,,
"List(true, 1661336797000, List(Chinatown), neighbourhood_cleansed=Chinatown/part-00000-9f069f1c-b4b1-4825-b872-735264eb89f9.c000.snappy.parquet, 14689, {""numRecords"":118,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":1.0,""latitude"":37.79028,""longitude"":-122.40929,""property_type"":""Aparthotel"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":1.0,""bedrooms"":0.0,""beds"":1.0,""bed_type"":""Real Bed"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":60.0,""review_scores_accuracy"":7.0,""review_scores_cleanliness"":6.0,""review_scores_checkin"":7.0,""review_scores_communication"":7.0,""review_scores_location"":8.0,""review_scores_value"":7.0,""price"":30.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""strict_14_with_grace_period"",""instant_bookable"":""t"",""host_total_listings_count"":852.0,""latitude"":37.79781,""longitude"":-122.40436,""property_type"":""Serviced apartment"",""room_type"":""Shared room"",""accommodates"":7.0,""bathrooms"":2.5,""bedrooms"":3.0,""beds"":4.0,""bed_type"":""Real 
Bed"",""minimum_nights"":45.0,""number_of_reviews"":136.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":800.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":1.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000003, 268435456))",,,
"List(true, 1661336797000, List(Crocker Amazon), neighbourhood_cleansed=Crocker%20Amazon/part-00000-de5b03f2-a273-48d4-9c9c-34318a38a20c.c000.snappy.parquet, 12974, {""numRecords"":52,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":1.0,""latitude"":37.70743,""longitude"":-122.4508,""property_type"":""Apartment"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":1.0,""bedrooms"":0.0,""beds"":0.0,""bed_type"":""Real Bed"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":20.0,""review_scores_accuracy"":2.0,""review_scores_cleanliness"":2.0,""review_scores_checkin"":6.0,""review_scores_communication"":2.0,""review_scores_location"":6.0,""review_scores_value"":2.0,""price"":35.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""strict_14_with_grace_period"",""instant_bookable"":""t"",""host_total_listings_count"":118.0,""latitude"":37.7158,""longitude"":-122.43077,""property_type"":""House"",""room_type"":""Shared room"",""accommodates"":12.0,""bathrooms"":2.0,""bedrooms"":5.0,""beds"":8.0,""bed_type"":""Real 
Bed"",""minimum_nights"":30.0,""number_of_reviews"":399.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":699.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000004, 268435456))",,,
"List(true, 1661336797000, List(Diamond Heights), neighbourhood_cleansed=Diamond%20Heights/part-00000-189040e8-303e-4633-92c8-0ed24d1995f6.c000.snappy.parquet, 11878, {""numRecords"":19,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":0.0,""latitude"":37.73564,""longitude"":-122.44681,""property_type"":""Apartment"",""room_type"":""Entire home/apt"",""accommodates"":2.0,""bathrooms"":1.0,""bedrooms"":1.0,""beds"":1.0,""bed_type"":""Real Bed"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":92.0,""review_scores_accuracy"":9.0,""review_scores_cleanliness"":9.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":9.0,""review_scores_value"":9.0,""price"":80.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""strict_14_with_grace_period"",""instant_bookable"":""t"",""host_total_listings_count"":33.0,""latitude"":37.74699,""longitude"":-122.43596,""property_type"":""Townhouse"",""room_type"":""Private room"",""accommodates"":8.0,""bathrooms"":3.0,""bedrooms"":3.0,""beds"":4.0,""bed_type"":""Real 
Bed"",""minimum_nights"":30.0,""number_of_reviews"":252.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":495.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000005, 268435456))",,,
"List(true, 1661336797000, List(Downtown/Civic Center), neighbourhood_cleansed=Downtown%252FCivic%20Center/part-00000-020bfdf5-c7d5-49e5-9ef2-0b27938dc893.c000.snappy.parquet, 25849, {""numRecords"":538,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":1.0,""latitude"":37.77381,""longitude"":-122.42341,""property_type"":""Aparthotel"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":0.0,""bedrooms"":0.0,""beds"":0.0,""bed_type"":""Pull-out Sofa"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":20.0,""review_scores_accuracy"":2.0,""review_scores_cleanliness"":4.0,""review_scores_checkin"":6.0,""review_scores_communication"":4.0,""review_scores_location"":2.0,""review_scores_value"":2.0,""price"":19.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""super_strict_30"",""instant_bookable"":""t"",""host_total_listings_count"":852.0,""latitude"":37.79025,""longitude"":-122.40678,""property_type"":""Townhouse"",""room_type"":""Shared room"",""accommodates"":14.0,""bathrooms"":14.0,""bedrooms"":14.0,""beds"":14.0,""bed_type"":""Real 
Bed"",""minimum_nights"":180.0,""number_of_reviews"":431.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":1200.0,""bedrooms_na"":1.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000006, 268435456))",,,
"List(true, 1661336797000, List(Excelsior), neighbourhood_cleansed=Excelsior/part-00000-867f3bb8-4f90-417f-8081-7e8ce3938206.c000.snappy.parquet, 16336, {""numRecords"":162,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":1.0,""latitude"":37.71554,""longitude"":-122.43832,""property_type"":""Apartment"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":0.0,""bedrooms"":0.0,""beds"":0.0,""bed_type"":""Airbed"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":67.0,""review_scores_accuracy"":7.0,""review_scores_cleanliness"":7.0,""review_scores_checkin"":8.0,""review_scores_communication"":8.0,""review_scores_location"":7.0,""review_scores_value"":6.0,""price"":30.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""strict_14_with_grace_period"",""instant_bookable"":""t"",""host_total_listings_count"":41.0,""latitude"":37.7338,""longitude"":-122.40463,""property_type"":""Townhouse"",""room_type"":""Private room"",""accommodates"":16.0,""bathrooms"":5.0,""bedrooms"":5.0,""beds"":9.0,""bed_type"":""Real 
Bed"",""minimum_nights"":31.0,""number_of_reviews"":314.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":599.0,""bedrooms_na"":0.0,""bathrooms_na"":1.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000007, 268435456))",,,
"List(true, 1661336798000, List(Financial District), neighbourhood_cleansed=Financial%20District/part-00000-4005f7fe-170b-474f-8f6e-0277c0c10b8c.c000.snappy.parquet, 15492, {""numRecords"":134,""minValues"":{""host_is_superhost"":""f"",""cancellation_policy"":""flexible"",""instant_bookable"":""f"",""host_total_listings_count"":1.0,""latitude"":37.78288,""longitude"":-122.40693,""property_type"":""Apartment"",""room_type"":""Entire home/apt"",""accommodates"":1.0,""bathrooms"":0.0,""bedrooms"":0.0,""beds"":1.0,""bed_type"":""Couch"",""minimum_nights"":1.0,""number_of_reviews"":0.0,""review_scores_rating"":70.0,""review_scores_accuracy"":6.0,""review_scores_cleanliness"":6.0,""review_scores_checkin"":4.0,""review_scores_communication"":6.0,""review_scores_location"":8.0,""review_scores_value"":6.0,""price"":45.0,""bedrooms_na"":0.0,""bathrooms_na"":0.0,""beds_na"":0.0,""review_scores_rating_na"":0.0,""review_scores_accuracy_na"":0.0,""review_scores_cleanliness_na"":0.0,""review_scores_checkin_na"":0.0,""review_scores_communication_na"":0.0,""review_scores_location_na"":0.0},""maxValues"":{""host_is_superhost"":""t"",""cancellation_policy"":""super_strict_60"",""instant_bookable"":""t"",""host_total_listings_count"":1199.0,""latitude"":37.79881,""longitude"":-122.39103,""property_type"":""Timeshare"",""room_type"":""Shared room"",""accommodates"":8.0,""bathrooms"":3.5,""bedrooms"":3.0,""beds"":5.0,""bed_type"":""Real 
Bed"",""minimum_nights"":183.0,""number_of_reviews"":356.0,""review_scores_rating"":100.0,""review_scores_accuracy"":10.0,""review_scores_cleanliness"":10.0,""review_scores_checkin"":10.0,""review_scores_communication"":10.0,""review_scores_location"":10.0,""review_scores_value"":10.0,""price"":2010.0,""bedrooms_na"":0.0,""bathrooms_na"":1.0,""beds_na"":0.0,""review_scores_rating_na"":1.0,""review_scores_accuracy_na"":1.0,""review_scores_cleanliness_na"":1.0,""review_scores_checkin_na"":1.0,""review_scores_communication_na"":1.0,""review_scores_location_na"":1.0},""nullCount"":{""host_is_superhost"":0,""cancellation_policy"":0,""instant_bookable"":0,""host_total_listings_count"":0,""latitude"":0,""longitude"":0,""property_type"":0,""room_type"":0,""accommodates"":0,""bathrooms"":0,""bedrooms"":0,""beds"":0,""bed_type"":0,""minimum_nights"":0,""number_of_reviews"":0,""review_scores_rating"":0,""review_scores_accuracy"":0,""review_scores_cleanliness"":0,""review_scores_checkin"":0,""review_scores_communication"":0,""review_scores_location"":0,""review_scores_value"":0,""price"":0,""bedrooms_na"":0,""bathrooms_na"":0,""beds_na"":0,""review_scores_rating_na"":0,""review_scores_accuracy_na"":0,""review_scores_cleanliness_na"":0,""review_scores_checkin_na"":0,""review_scores_communication_na"":0,""review_scores_location_na"":0}}, List(1661336795000008, 268435456))",,,
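
Each `add` entry in the output above carries per-file statistics (`numRecords`, `minValues`, `maxValues`, `nullCount`). One use of these statistics is data skipping: a reader can rule out files whose min/max range cannot match a filter. The pure-Python sketch below illustrates the idea with the `price` ranges copied from six of the entries above. The helper name `files_to_scan` and the dictionary layout are assumptions made for illustration, not Delta Lake's API.

```python
# Per-file statistics copied from the `stats` field of the add actions above.
file_stats = {
    "Chinatown": {"numRecords": 118, "min_price": 30.0, "max_price": 800.0},
    "Crocker Amazon": {"numRecords": 52, "min_price": 35.0, "max_price": 699.0},
    "Diamond Heights": {"numRecords": 19, "min_price": 80.0, "max_price": 495.0},
    "Downtown/Civic Center": {"numRecords": 538, "min_price": 19.0, "max_price": 1200.0},
    "Excelsior": {"numRecords": 162, "min_price": 30.0, "max_price": 599.0},
    "Financial District": {"numRecords": 134, "min_price": 45.0, "max_price": 2010.0},
}


def files_to_scan(stats, price_floor):
    """Hypothetical helper: partitions whose file could hold rows with price > price_floor."""
    return sorted(name for name, s in stats.items() if s["max_price"] > price_floor)


# Only files whose maximum price exceeds 1000 can contain matching rows.
candidates = files_to_scan(file_stats, 1000.0)
skipped_rows = sum(
    s["numRecords"] for name, s in file_stats.items() if name not in candidates
)
print(candidates)
print(skipped_rows)
```

For a filter such as `price > 1000`, four of the six files can be ruled out from their statistics alone, without opening the Parquet data.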


Finally, let's take a look at the files inside one of the neighborhood partitions. The Parquet file inside was written by the partitioned commit (file 01) in the `_delta_log` directory.

In [0]:
display(dbutils.fs.ls(working_dir + "/neighbourhood_cleansed=Bayview/"))

path,name,size,modificationTime
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Bayview/part-00000-54e44e99-a4f5-4311-a3ef-4068bdf0e775.c000.snappy.parquet,part-00000-54e44e99-a4f5-4311-a3ef-4068bdf0e775.c000.snappy.parquet,16372,1661336795000


### Reading data from your Delta table

In [0]:
df = spark.read.format("delta").load(working_dir)
display(df)

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na
t,strict_14_with_grace_period,t,1.0,Castro/Upper Market,37.76075,-122.43032,Apartment,Private room,1.0,1.0,1.0,1.0,Real Bed,3.0,390.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,79.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,1.0,Castro/Upper Market,37.75963,-122.44143,House,Entire home/apt,2.0,1.0,1.0,1.0,Real Bed,3.0,353.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,155.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,moderate,f,1.0,Castro/Upper Market,37.76298,-122.43136,Apartment,Entire home/apt,2.0,1.0,1.0,1.0,Real Bed,30.0,129.0,100.0,10.0,10.0,10.0,10.0,10.0,10.0,165.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,flexible,f,4.0,Castro/Upper Market,37.76125,-122.43335,House,Entire home/apt,3.0,1.0,1.0,1.0,Real Bed,4.0,43.0,94.0,10.0,9.0,10.0,10.0,10.0,10.0,159.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,moderate,f,4.0,Castro/Upper Market,37.76068,-122.43331,Apartment,Entire home/apt,2.0,1.0,0.0,2.0,Real Bed,3.0,121.0,94.0,10.0,9.0,10.0,10.0,10.0,9.0,116.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,2.0,Castro/Upper Market,37.75969,-122.44445,Guest suite,Entire home/apt,2.0,1.0,0.0,1.0,Real Bed,4.0,222.0,95.0,10.0,10.0,9.0,10.0,10.0,10.0,85.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,1.0,Castro/Upper Market,37.76102,-122.43005,Condominium,Private room,1.0,1.0,1.0,1.0,Real Bed,1.0,501.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,125.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,1.0,Castro/Upper Market,37.75887,-122.43565,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,3.0,137.0,99.0,10.0,10.0,10.0,10.0,10.0,10.0,328.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,flexible,f,1.0,Castro/Upper Market,37.75823,-122.43273,House,Private room,2.0,1.0,1.0,1.0,Real Bed,5.0,57.0,99.0,10.0,10.0,10.0,10.0,10.0,10.0,150.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,flexible,f,1.0,Castro/Upper Market,37.76787,-122.43172,Apartment,Entire home/apt,6.0,1.0,2.0,4.0,Real Bed,5.0,19.0,99.0,10.0,9.0,10.0,10.0,10.0,10.0,275.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Updating your Delta Table

Let's filter for rows where the host is a superhost.

In [0]:
df_update = airbnb_df.filter(airbnb_df["host_is_superhost"] == "t")
display(df_update)

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na
t,moderate,t,1.0,Western Addition,37.76931,-122.43386,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,1.0,180.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,170.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,2.0,Mission,37.75919,-122.42237,Condominium,Private room,3.0,1.0,1.0,2.0,Real Bed,1.0,647.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Mission,37.75874,-122.41327,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,3.0,320.0,96.0,10.0,10.0,10.0,10.0,10.0,9.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,1.0,Castro/Upper Market,37.76075,-122.43032,Apartment,Private room,1.0,1.0,1.0,1.0,Real Bed,3.0,390.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,79.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,2.0,Inner Sunset,37.76203,-122.45455,Townhouse,Entire home/apt,3.0,1.0,2.0,3.0,Real Bed,30.0,16.0,95.0,9.0,9.0,9.0,9.0,9.0,9.0,136.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,1.0,Noe Valley,37.74888,-122.42982,Apartment,Entire home/apt,3.0,1.0,0.0,1.0,Real Bed,30.0,61.0,96.0,10.0,10.0,10.0,10.0,10.0,10.0,107.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,3.0,Western Addition,37.77252,-122.43216,Townhouse,Private room,2.0,1.0,1.0,1.0,Real Bed,2.0,363.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,110.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Mission,37.76349,-122.41517,Guest suite,Entire home/apt,2.0,1.0,0.0,1.0,Real Bed,5.0,227.0,96.0,10.0,10.0,10.0,10.0,9.0,9.0,198.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Nob Hill,37.7958,-122.41533,Loft,Entire home/apt,2.0,1.0,1.0,1.0,Real Bed,30.0,119.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,125.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,2.0,Bernal Heights,37.74556,-122.41207,Condominium,Private room,1.0,1.0,1.0,1.0,Real Bed,3.0,234.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [0]:
df_update.write.format("delta").mode("overwrite").save(working_dir)

In [0]:
df = spark.read.format("delta").load(working_dir)
display(df)

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na
t,strict_14_with_grace_period,t,1.0,Castro/Upper Market,37.76075,-122.43032,Apartment,Private room,1.0,1.0,1.0,1.0,Real Bed,3.0,390.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,79.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,1.0,Castro/Upper Market,37.75963,-122.44143,House,Entire home/apt,2.0,1.0,1.0,1.0,Real Bed,3.0,353.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,155.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,2.0,Castro/Upper Market,37.75969,-122.44445,Guest suite,Entire home/apt,2.0,1.0,0.0,1.0,Real Bed,4.0,222.0,95.0,10.0,10.0,9.0,10.0,10.0,10.0,85.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,1.0,Castro/Upper Market,37.76102,-122.43005,Condominium,Private room,1.0,1.0,1.0,1.0,Real Bed,1.0,501.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,125.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,1.0,Castro/Upper Market,37.75887,-122.43565,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,3.0,137.0,99.0,10.0,10.0,10.0,10.0,10.0,10.0,328.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,flexible,f,1.0,Castro/Upper Market,37.75823,-122.43273,House,Private room,2.0,1.0,1.0,1.0,Real Bed,5.0,57.0,99.0,10.0,10.0,10.0,10.0,10.0,10.0,150.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Castro/Upper Market,37.7684,-122.43086,Apartment,Private room,2.0,1.0,1.0,1.0,Real Bed,3.0,178.0,95.0,10.0,10.0,10.0,10.0,10.0,10.0,99.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,f,5.0,Castro/Upper Market,37.75967,-122.44257,Serviced apartment,Entire home/apt,5.0,1.0,2.0,3.0,Real Bed,120.0,16.0,99.0,10.0,10.0,10.0,10.0,10.0,9.0,175.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,1.0,Castro/Upper Market,37.76188,-122.44215,Condominium,Private room,2.0,1.0,1.0,1.0,Real Bed,1.0,540.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,1.0,Castro/Upper Market,37.76294,-122.43974,Apartment,Private room,2.0,1.5,1.0,1.0,Real Bed,2.0,162.0,94.0,10.0,10.0,10.0,10.0,10.0,10.0,115.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's look at the files in the Bayview partition post-update. Remember, the different files in this directory are snapshots of your DataFrame corresponding to different commits.

In [0]:
display(dbutils.fs.ls(working_dir + "/neighbourhood_cleansed=Bayview/"))

path,name,size,modificationTime
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Bayview/part-00000-16a3e174-2bd2-498b-a650-73419279ec19.c000.snappy.parquet,part-00000-16a3e174-2bd2-498b-a650-73419279ec19.c000.snappy.parquet,13009,1661340032000
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Bayview/part-00000-54e44e99-a4f5-4311-a3ef-4068bdf0e775.c000.snappy.parquet,part-00000-54e44e99-a4f5-4311-a3ef-4068bdf0e775.c000.snappy.parquet,16372,1661336795000
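
The directory now holds two Parquet files for Bayview, yet only the one referenced by the latest commit belongs to the current table. A minimal sketch of how that can be worked out by replaying `add` and `remove` actions from the transaction log is shown below. The commit contents here are simplified and hand-written for illustration (real commit files carry more fields and other actions such as `commitInfo`), and `active_files` is a hypothetical helper, not a Delta Lake function.

```python
import json

# File paths taken from the directory listing above.
OLD = "neighbourhood_cleansed=Bayview/part-00000-54e44e99-a4f5-4311-a3ef-4068bdf0e775.c000.snappy.parquet"
NEW = "neighbourhood_cleansed=Bayview/part-00000-16a3e174-2bd2-498b-a650-73419279ec19.c000.snappy.parquet"

# Simplified, illustrative commits: each is newline-delimited JSON, one action per line.
commits = [
    json.dumps({"add": {"path": OLD, "size": 16372, "dataChange": True}}),
    "\n".join([
        json.dumps({"remove": {"path": OLD, "dataChange": True}}),
        json.dumps({"add": {"path": NEW, "size": 13009, "dataChange": True}}),
    ]),
]


def active_files(commit_texts):
    """Replay commits in order and return the paths that make up the resulting snapshot."""
    live = set()
    for text in commit_texts:
        for line in text.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


print(active_files(commits))      # snapshot after the overwrite
print(active_files(commits[:1]))  # snapshot before the overwrite
```

Replaying every commit yields only the newer file, while stopping one commit earlier yields only the older one. The older file stays on disk so that earlier versions remain readable.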


### Delta Time Travel

Oops, actually we need the entire dataset! You can access a previous version of your Delta Table using <a href="https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html" target="_blank">Delta Time Travel</a>. Use the following two cells to access your version history. By default, Delta Lake keeps a 30-day version history, though it can retain that history for longer if needed.

In [0]:
spark.sql("DROP TABLE IF EXISTS train_delta")
spark.sql(f"CREATE TABLE train_delta USING DELTA LOCATION '{working_dir}'")

In [0]:
%sql
DESCRIBE HISTORY train_delta

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
2,2022-08-24T11:20:42.000+0000,6997591375752473,manujkumar.joshi@celebaltech.com,WRITE,"Map(mode -> Overwrite, partitionBy -> [])",,List(2051889157726300),0822-094520-n0irdqef,1.0,WriteSerializable,False,"Map(numFiles -> 36, numOutputRows -> 2931, numOutputBytes -> 492523)",,Databricks-Runtime/10.4.x-scala2.12
1,2022-08-24T10:26:43.000+0000,6997591375752473,manujkumar.joshi@celebaltech.com,WRITE,"Map(mode -> Overwrite, partitionBy -> [""neighbourhood_cleansed""])",,List(2051889157726300),0822-094520-n0irdqef,0.0,WriteSerializable,False,"Map(numFiles -> 36, numOutputRows -> 7146, numOutputBytes -> 618098)",,Databricks-Runtime/10.4.x-scala2.12
0,2022-08-24T10:11:42.000+0000,6997591375752473,manujkumar.joshi@celebaltech.com,WRITE,"Map(mode -> Overwrite, partitionBy -> [])",,List(2051889157726300),0822-094520-n0irdqef,,WriteSerializable,False,"Map(numFiles -> 1, numOutputRows -> 7146, numOutputBytes -> 191755)",,Databricks-Runtime/10.4.x-scala2.12


The **`versionAsOf`** option lets you read a previous version of your Delta Table.

In [0]:
df = spark.read.format("delta").option("versionAsOf", 0).load(working_dir)
display(df)

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na
t,moderate,t,1.0,Western Addition,37.76931,-122.43386,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,1.0,180.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,170.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Bernal Heights,37.74511,-122.42102,Apartment,Entire home/apt,5.0,1.0,2.0,3.0,Real Bed,30.0,111.0,98.0,10.0,10.0,10.0,10.0,10.0,9.0,235.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76669,-122.4525,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,17.0,85.0,8.0,8.0,9.0,9.0,9.0,8.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76487,-122.45183,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,8.0,93.0,9.0,9.0,10.0,10.0,9.0,9.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Western Addition,37.77525,-122.43637,House,Entire home/apt,5.0,1.5,2.0,2.0,Real Bed,7.0,27.0,97.0,10.0,10.0,10.0,10.0,10.0,9.0,785.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,moderate,f,1.0,Western Addition,37.78471,-122.44555,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,2.0,31.0,90.0,9.0,8.0,10.0,10.0,9.0,9.0,255.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,2.0,Mission,37.75919,-122.42237,Condominium,Private room,3.0,1.0,1.0,2.0,Real Bed,1.0,647.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,1.0,Potrero Hill,37.76259,-122.40543,House,Private room,2.0,1.0,1.0,1.0,Real Bed,1.0,453.0,94.0,10.0,10.0,10.0,10.0,10.0,10.0,135.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Mission,37.75874,-122.41327,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,3.0,320.0,96.0,10.0,10.0,10.0,10.0,10.0,9.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,44.0,Haight Ashbury,37.77187,-122.43859,Apartment,Entire home/apt,3.0,1.0,3.0,3.0,Real Bed,30.0,37.0,89.0,9.0,9.0,10.0,9.0,9.0,9.0,177.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


**You can also access older versions using a timestamp**.

Replace the timestamp string with the information from your version history. Note that you can use a date without the time information if necessary.

In [0]:
# Use your own timestamp 
# time_stamp_string = "FILL_IN"

# OR programmatically get the first version's timestamp value
time_stamp_string = str(spark.sql("DESCRIBE HISTORY train_delta").collect()[-1]["timestamp"])

df = spark.read.format("delta").option("timestampAsOf", time_stamp_string).load(working_dir)
display(df)

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na
t,moderate,t,1.0,Western Addition,37.76931,-122.43386,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,1.0,180.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,170.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Bernal Heights,37.74511,-122.42102,Apartment,Entire home/apt,5.0,1.0,2.0,3.0,Real Bed,30.0,111.0,98.0,10.0,10.0,10.0,10.0,10.0,9.0,235.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76669,-122.4525,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,17.0,85.0,8.0,8.0,9.0,9.0,9.0,8.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,10.0,Haight Ashbury,37.76487,-122.45183,Apartment,Private room,2.0,4.0,1.0,1.0,Real Bed,32.0,8.0,93.0,9.0,9.0,10.0,10.0,9.0,9.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,2.0,Western Addition,37.77525,-122.43637,House,Entire home/apt,5.0,1.5,2.0,2.0,Real Bed,7.0,27.0,97.0,10.0,10.0,10.0,10.0,10.0,9.0,785.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,moderate,f,1.0,Western Addition,37.78471,-122.44555,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,2.0,31.0,90.0,9.0,8.0,10.0,10.0,9.0,9.0,255.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,strict_14_with_grace_period,t,2.0,Mission,37.75919,-122.42237,Condominium,Private room,3.0,1.0,1.0,2.0,Real Bed,1.0,647.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,1.0,Potrero Hill,37.76259,-122.40543,House,Private room,2.0,1.0,1.0,1.0,Real Bed,1.0,453.0,94.0,10.0,10.0,10.0,10.0,10.0,10.0,135.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
t,moderate,f,1.0,Mission,37.75874,-122.41327,Apartment,Entire home/apt,6.0,1.0,2.0,3.0,Real Bed,3.0,320.0,96.0,10.0,10.0,10.0,10.0,10.0,9.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
f,strict_14_with_grace_period,f,44.0,Haight Ashbury,37.77187,-122.43859,Apartment,Entire home/apt,3.0,1.0,3.0,3.0,Real Bed,30.0,37.0,89.0,9.0,9.0,10.0,9.0,9.0,9.0,177.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
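
As a small sketch of the date-only option mentioned above, the hypothetical helper below (`date_only` is not part of the notebook or of Delta Lake) trims a timestamp string down to its date. It assumes the string looks like `str()` of a Python `datetime`, as produced in the previous cell. A date-only value is typically interpreted as midnight at the start of that day, so it may fall before the table's first commit; in that case the read is expected to fail, and a later date or the full timestamp is the safer choice.

```python
from datetime import datetime


def date_only(timestamp_string: str) -> str:
    """Return the YYYY-MM-DD part of an ISO-like timestamp string."""
    return datetime.fromisoformat(timestamp_string).date().isoformat()


# Example with a made-up timestamp (not taken from a real version history)
example_date = date_only("2022-08-24 11:20:32")
print(example_date)

# In the notebook, the result could be passed to the reader like so:
# df = spark.read.format("delta").option("timestampAsOf", example_date).load(working_dir)
```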


Now that we're happy with our Delta Table, we can clean up our directory using **`VACUUM`**. Vacuum accepts a retention period in hours as an input.

Uh-oh, our code doesn't run! By default, to prevent accidentally vacuuming recent commits, Delta Lake will not let users vacuum with a retention period shorter than 7 days (168 hours). Once vacuumed, you cannot return to a prior commit through time travel; only your most recent Delta Table version will be saved.

Try changing the vacuum parameter to different values.

In [0]:
# from delta.tables import DeltaTable

# delta_table = DeltaTable.forPath(spark, working_dir)
# delta_table.vacuum(0)
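
The 7-day limit is easier to reason about as arithmetic. The sketch below uses two hypothetical helpers (`days_to_hours` and `passes_default_check`, neither of which is part of Delta Lake) to convert days to hours and to mimic the default check described above, assuming the 168-hour threshold stated in this lesson.

```python
DEFAULT_MIN_RETENTION_HOURS = 7 * 24  # 7 days, as described above


def days_to_hours(days: float) -> float:
    """Convert a retention period in days to the hours that vacuum expects."""
    return days * 24


def passes_default_check(retention_hours: float) -> bool:
    """Mimic the default safety check: reject anything under 168 hours."""
    return retention_hours >= DEFAULT_MIN_RETENTION_HOURS


# Try a few retention periods and see which ones the default check would allow
for hours in (0, 24, days_to_hours(7), days_to_hours(30)):
    print(hours, passes_default_check(hours))
```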

We can work around this by setting a Spark configuration that bypasses the default retention period check.

In [0]:
from delta.tables import DeltaTable

spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
delta_table = DeltaTable.forPath(spark, working_dir)
delta_table.vacuum(0)
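
Disabling the check changes the behavior of every later `vacuum` call in the session. One possible way to limit that, sketched below, is to restore the setting afterwards. `retention_check_disabled` is a hypothetical helper written for this illustration; it only assumes that the object passed in offers `get(key, default)` and `set(key, value)`, as `spark.conf` does, and that the check is enabled (`"true"`) unless set otherwise.

```python
from contextlib import contextmanager

RETENTION_CHECK_KEY = "spark.databricks.delta.retentionDurationCheck.enabled"


@contextmanager
def retention_check_disabled(conf):
    """Temporarily turn off the retention check, then restore the old value."""
    previous = conf.get(RETENTION_CHECK_KEY, "true")  # assumed default: enabled
    conf.set(RETENTION_CHECK_KEY, "false")
    try:
        yield
    finally:
        # Runs even if the vacuum call raises, so the check is always restored
        conf.set(RETENTION_CHECK_KEY, previous)


# In the notebook, usage could look like this:
# with retention_check_disabled(spark.conf):
#     DeltaTable.forPath(spark, working_dir).vacuum(0)
```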

Let's take a look at our Delta Table files now. After vacuuming, each partition directory holds only the files from our most recent Delta Table commit.

In [0]:
display(dbutils.fs.ls(working_dir + "/neighbourhood_cleansed=Bayview/"))

path,name,size,modificationTime
dbfs:/user/manujkumar.joshi@celebaltech.com/dbacademy/machine_learning/ml_00c_delta_review/neighbourhood_cleansed=Bayview/part-00000-16a3e174-2bd2-498b-a650-73419279ec19.c000.snappy.parquet,part-00000-16a3e174-2bd2-498b-a650-73419279ec19.c000.snappy.parquet,13009,1661340032000


Since vacuuming deleted the data files that older versions of the Delta Table relied on, we can no longer access past versions. The code below should throw an error.

In [0]:
# df = spark.read.format("delta").option("versionAsOf", 0).load(working_dir)
# display(df)
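
To observe the failure without stopping the notebook, the read can be wrapped in a `try`/`except`. This is a sketch: `try_read_version` is a hypothetical helper, and it catches a broad `Exception` because the exact error type depends on the runtime. Because Spark evaluates lazily, the helper calls `count()` to force the data files to be read; the error is expected to appear at that point rather than at `load`.

```python
def try_read_version(spark, path, version):
    """Return (row_count, None) on success or (None, error_message) on failure."""
    try:
        df = spark.read.format("delta").option("versionAsOf", version).load(path)
        return df.count(), None  # count() is an action, so files are actually read
    except Exception as error:  # broad on purpose; the error type varies by runtime
        return None, str(error)


# In the notebook, usage could look like this:
# row_count, error = try_read_version(spark, working_dir, 0)
# print(error if error else f"Read {row_count} rows")
```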

-sandbox
&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>