#### CONVERT TO DELTA IN DELTA LAKE

- Suppose we have a Hive table stored in Parquet format and we need to convert it into a Delta table.
- The CONVERT TO DELTA command converts an existing Parquet table to a Delta table in place, without rewriting the data files.

In [0]:
%sql
create database if not exists my_db

- Preparing the data

In [0]:

data = [
    (1, "Alice", "2024-01-01", "NY"),
    (2, "Bob", "2024-01-02", "CA"),
    (3, "Charlie", "2024-01-03", "TX"),
    (4, "David", "2024-01-04", "FL"),
    (5, "Eve", "2024-01-05", "IL"),
    (6, "Frank", "2024-01-06", "NY"),
    (7, "Grace", "2024-01-07", "CA"),
    (8, "Hank", "2024-01-08", "TX"),
    (9, "Ivy", "2024-01-09", "FL"),
    (10, "Jack", "2024-01-10", "IL")
]

columns = ["id", "name", "date", "state"]

df = spark.createDataFrame(data, columns)


In [0]:
## Write the data as a Parquet table partitioned by state (not a Delta table)

(
    df.write
    .format('parquet')
    .mode('overwrite')
    .partitionBy("state")
    .saveAsTable('my_db.persons')
)

In [0]:
%sql
describe detail my_db.persons

format,id,name,description,location,createdAt,lastModified,partitionColumns,numFiles,sizeInBytes,properties,minReaderVersion,minWriterVersion,tableFeatures,statistics
parquet,,spark_catalog.my_db.persons,,dbfs:/user/hive/warehouse/my_db.db/persons,2025-01-29T12:20:27.000+0000,,List(state),,,Map(),,,,Map()


- Checking the data inside the underlying table path

In [0]:
%fs
ls dbfs:/user/hive/warehouse/my_db.db/persons

path,name,size,modificationTime
dbfs:/user/hive/warehouse/my_db.db/persons/_SUCCESS,_SUCCESS,0,1738153226000
dbfs:/user/hive/warehouse/my_db.db/persons/state=CA/,state=CA/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=FL/,state=FL/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=IL/,state=IL/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=NY/,state=NY/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=TX/,state=TX/,0,0
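
The state=... directories in the listing come from partitionBy("state"), which writes one Hive-style column=value directory per distinct partition value. The following is a minimal plain-Python sketch of that naming rule, not Spark's own implementation; the helper name partition_dirs is hypothetical and introduced only for illustration.

```python
# Same sample rows and column names as in the notebook.
data = [
    (1, "Alice", "2024-01-01", "NY"),
    (2, "Bob", "2024-01-02", "CA"),
    (3, "Charlie", "2024-01-03", "TX"),
    (4, "David", "2024-01-04", "FL"),
    (5, "Eve", "2024-01-05", "IL"),
    (6, "Frank", "2024-01-06", "NY"),
    (7, "Grace", "2024-01-07", "CA"),
    (8, "Hank", "2024-01-08", "TX"),
    (9, "Ivy", "2024-01-09", "FL"),
    (10, "Jack", "2024-01-10", "IL"),
]
columns = ["id", "name", "date", "state"]


def partition_dirs(rows, column_names, partition_column):
    """Return the sorted Hive-style directory names (hypothetical helper)."""
    idx = column_names.index(partition_column)
    # One directory per distinct value of the partition column.
    values = sorted({row[idx] for row in rows})
    return [f"{partition_column}={value}/" for value in values]


print(partition_dirs(data, columns, "state"))
# ['state=CA/', 'state=FL/', 'state=IL/', 'state=NY/', 'state=TX/']
```

Ten rows spread over five states yield five partition directories, which matches the five state=... entries in the listing.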


- The listing above contains no _delta_log folder, so this is not a Delta table.
- You can also confirm this by running DESCRIBE HISTORY {table_name}: the command succeeds only on a Delta table and throws an error otherwise.
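
The folder check described above can be sketched as a small function over the entry names of a directory listing. This is only an illustration: the helper name is_delta_listing is hypothetical, and the second listing is an assumed example of what a Delta table directory would contain. On Databricks the names could come from the name field of the entries returned by dbutils.fs.ls.

```python
def is_delta_listing(entry_names):
    """Return True if the listing contains a _delta_log entry (hypothetical helper)."""
    # Directory names may carry a trailing slash, as in the %fs ls output.
    return any(name.rstrip("/") == "_delta_log" for name in entry_names)


# Entry names taken from the listing above: no _delta_log, so not a Delta table.
parquet_listing = ["_SUCCESS", "state=CA/", "state=FL/", "state=IL/", "state=NY/", "state=TX/"]

# Assumed listing of the same directory if it were a Delta table.
delta_listing = parquet_listing + ["_delta_log/"]

print(is_delta_listing(parquet_listing))  # False
print(is_delta_listing(delta_listing))    # True
```

A name check like this only looks at the folder layout; DESCRIBE HISTORY remains the more direct confirmation because it reads the transaction log itself.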

- Converting the table to Delta format

In [0]:
%sql
CONVERT TO DELTA my_db.persons
PARTITIONED BY (state string)
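
When several Parquet tables need converting, the statement above can be generated from Python instead of typed by hand. The sketch below only builds the SQL text; the helper name build_convert_statement is hypothetical, and the partition columns and their types must match how the table was actually written.

```python
def build_convert_statement(table_name, partition_schema=None):
    """Build a CONVERT TO DELTA statement (hypothetical helper).

    partition_schema maps partition column names to Spark SQL types,
    e.g. {"state": "string"}; pass None for an unpartitioned table.
    """
    statement = f"CONVERT TO DELTA {table_name}"
    if partition_schema:
        cols = ", ".join(f"{name} {dtype}" for name, dtype in partition_schema.items())
        statement += f" PARTITIONED BY ({cols})"
    return statement


stmt = build_convert_statement("my_db.persons", {"state": "string"})
print(stmt)
# CONVERT TO DELTA my_db.persons PARTITIONED BY (state string)

# In a notebook with an active Spark session, the statement could then be
# executed with: spark.sql(stmt)
```

The delta-spark Python package also provides DeltaTable.convertToDelta for the same purpose; building the SQL string is used here only because it mirrors the %sql cell directly.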

In [0]:
%fs
ls dbfs:/user/hive/warehouse/my_db.db/persons

path,name,size,modificationTime
dbfs:/user/hive/warehouse/my_db.db/persons/_SUCCESS,_SUCCESS,0,1738153226000
dbfs:/user/hive/warehouse/my_db.db/persons/_delta_log/,_delta_log/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=CA/,state=CA/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=FL/,state=FL/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=IL/,state=IL/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=NY/,state=NY/,0,0
dbfs:/user/hive/warehouse/my_db.db/persons/state=TX/,state=TX/,0,0


In [0]:
%sql
describe history my_db.persons

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
0,2025-01-29T12:28:20.000+0000,6821501072026142,rockyrams1998@gmail.com,CONVERT,"Map(numFiles -> 10, partitionedBy -> [""state""], collectStats -> true, catalogTable -> `spark_catalog`.`my_db`.`persons`)",,List(4262887905780432),0129-120610-n6g9lwij,-1,Serializable,False,Map(numConvertedFiles -> 10),,Databricks-Runtime/12.2.x-scala2.12


- The two results above show that the _delta_log folder has been added and that the table now has a version history (version 0, operation CONVERT).
- So we have successfully converted a classic Parquet table to a Delta table.

#### HAPPY LEARNING