#### **How to delete files from the Databricks File System (DBFS)?**

- **Removes** a **file or directory** and optionally all of its contents.
- If a **file** is specified, the **recurse** parameter is **ignored**.
- If a **directory** is specified, an **error** occurs if **recurse is disabled** and the directory is not empty.
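
The rules above can be wrapped in a small helper. This is only a minimal sketch: `try_rm` is a hypothetical name (not part of dbutils), `fs` is assumed to be `dbutils.fs` or any object with the same `rm(path, recurse)` method, and the exact exception class raised for a non-empty directory is not assumed, so the helper catches `Exception` broadly.

```python
def try_rm(fs, path, recurse=False):
    """Try to delete `path`; return the boolean from rm, or None on error.

    `fs` is assumed to be `dbutils.fs`. `try_rm` is a hypothetical helper.
    """
    try:
        # For a file, `recurse` is ignored.
        # For a non-empty directory, recurse=False raises an error.
        return fs.rm(path, recurse)
    except Exception as err:
        # The concrete exception type is not assumed here.
        print(f"Could not remove {path}: {err}")
        return None
```

In a notebook this would be called as `try_rm(dbutils.fs, "/FileStore/tables/temp_data/")`; passing `True` as the third argument deletes the directory together with its contents.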

**dbutils.fs.rm()**

- How to remove files?
- How to remove folders?
- How to remove checkpoints?
- How to create and remove folders?

**dbutils.fs.mkdirs()**

**dbutils.fs.put()**

#### **Syntax**

     dbutils.fs.rm("/FileStore/tables/Show_truncate-1.csv")
                         (or)
     dbutils.fs.rm("/FileStore/tables/circuits.csv", recurse=False)
                         (or)
     %fs rm -r /FileStore/tables/Party_Relationship.csv
                         (or)
     %fs rm -r /FileStore/tables/Parquet_Userdata3

       where,
              %fs                                    magic command that calls dbutils.fs
              rm                                     remove command
              -r                                     recursive flag to delete a directory and all its contents
              /FileStore/tables/Parquet_Userdata3    path to the file or directory

In [0]:
dbutils.fs.help("rm")

#### **Three ways to check for the existence / deletion of a file**

- %fs ls /FileStore/tables/

- Read the CSV file (the read fails if the file no longer exists) --> df = spark.read.csv("dbfs:/FileStore/tables/MarketPrice-1.csv", header=True, inferSchema=True)

- Catalog --> DBFS --> /FileStore/tables/ --> MarketPrice-1.csv
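
The first check can also be done programmatically. A minimal sketch, assuming `fs` is `dbutils.fs` and that `ls` raises when the path is missing; `path_exists` is a hypothetical helper name, and because the exact exception class is not assumed, other failures (for example, missing permissions) are also reported as "not found".

```python
def path_exists(fs, path):
    """Return True if `path` can be listed, False otherwise.

    `fs` is assumed to be `dbutils.fs`. `path_exists` is a hypothetical helper.
    """
    try:
        fs.ls(path)
        return True
    except Exception:
        # Any listing failure is treated as "path not found".
        return False
```

In a notebook, `path_exists(dbutils.fs, "dbfs:/FileStore/tables/MarketPrice-1.csv")` can be run before and after the delete to confirm the file is gone.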

In [0]:
%fs ls /FileStore/tables/

path,name,size,modificationTime
dbfs:/FileStore/tables/Flatten Nested Array.json,Flatten Nested Array.json,3756,1718618620000
dbfs:/FileStore/tables/MarketPrice-1.csv,MarketPrice-1.csv,19528,1728118055000
dbfs:/FileStore/tables/MarketPrice.csv,MarketPrice.csv,19528,1719656208000
dbfs:/FileStore/tables/MultiLineJSON.json/,MultiLineJSON.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON01.json/,MultiLineJSON01.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON1.json/,MultiLineJSON1.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON123.json/,MultiLineJSON123.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON2.json/,MultiLineJSON2.json/,0,0
dbfs:/FileStore/tables/Question7.csv,Question7.csv,154,1725816645000
dbfs:/FileStore/tables/RunningData_Rev02.csv,RunningData_Rev02.csv,1222,1719810609000


In [0]:
# To check whether a folder has been deleted
display(dbutils.fs.ls("/FileStore/tables/"))

path,name,size,modificationTime
dbfs:/FileStore/tables/Flatten Nested Array.json,Flatten Nested Array.json,3756,1718618620000
dbfs:/FileStore/tables/MarketPrice-1.csv,MarketPrice-1.csv,19528,1728118055000
dbfs:/FileStore/tables/MarketPrice.csv,MarketPrice.csv,19528,1719656208000
dbfs:/FileStore/tables/MultiLineJSON.json/,MultiLineJSON.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON01.json/,MultiLineJSON01.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON1.json/,MultiLineJSON1.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON123.json/,MultiLineJSON123.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON2.json/,MultiLineJSON2.json/,0,0
dbfs:/FileStore/tables/Question7.csv,Question7.csv,154,1725816645000
dbfs:/FileStore/tables/RunningData_Rev02.csv,RunningData_Rev02.csv,1222,1719810609000


In [0]:
%fs ls dbfs:/FileStore/tables/temp_data/

path,name,size,modificationTime
dbfs:/FileStore/tables/temp_data/data_16062023/,data_16062023/,0,0
dbfs:/FileStore/tables/temp_data/data_16062024/,data_16062024/,0,0
dbfs:/FileStore/tables/temp_data/sales_01.csv,sales_01.csv,1423,1728143544000
dbfs:/FileStore/tables/temp_data/sales_02.csv,sales_02.csv,1423,1728143544000


#### **1) How to remove Files?**

     dbutils.fs.rm("/FileStore/tables/Show_truncate-1.csv")
                         (or)
     dbutils.fs.rm("/FileStore/tables/circuits.csv", recurse=False)
                         (or)
     %fs rm -r /FileStore/tables/Party_Relationship.csv

**How to remove single file?**

In [0]:
dbutils.fs.rm("dbfs:/FileStore/tables/MarketPrice-1.csv")

Out[1]: False

In [0]:
df = spark.read.csv("dbfs:/FileStore/tables/MarketPrice-1.csv", header=True, inferSchema=True)

**How to remove multiple files?**

In [0]:
dbutils.fs.rm("dbfs:/FileStore/tables/RunningData_Rev02.csv")
dbutils.fs.rm("dbfs:/FileStore/tables/Sales_Collect_Rev02.csv")
dbutils.fs.rm("dbfs:/FileStore/tables/StructType-1.csv")
dbutils.fs.rm("dbfs:/FileStore/tables/StructType-2.csv")
dbutils.fs.rm("dbfs:/FileStore/tables/StructType-3.csv")
dbutils.fs.rm("dbfs:/FileStore/tables/person-1.json")
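
The repeated calls above can be collapsed into a loop. A minimal sketch, assuming `fs` is `dbutils.fs`; `remove_files` is a hypothetical helper that records the boolean each `rm` call returns, so files that were not deleted are easy to spot.

```python
def remove_files(fs, paths):
    """Delete each file in `paths` and return {path: result of rm}.

    `fs` is assumed to be `dbutils.fs`. `remove_files` is a hypothetical helper.
    """
    results = {}
    for path in paths:
        # recurse is left at its default because these are single files.
        results[path] = fs.rm(path)
    return results
```

In a notebook it would be called as `remove_files(dbutils.fs, ["dbfs:/FileStore/tables/StructType-1.csv", "dbfs:/FileStore/tables/StructType-2.csv"])`.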

**How to check and delete files?**

In [0]:
# Check the folder and list the content
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/tbl_enriched_marketprice"))

path,name,size,modificationTime
dbfs:/user/hive/warehouse/tbl_enriched_marketprice/_delta_log/,_delta_log/,0,0
dbfs:/user/hive/warehouse/tbl_enriched_marketprice/part-00000-f100a20e-3ae2-4121-b294-bdbd582ca9c7-c000.snappy.parquet,part-00000-f100a20e-3ae2-4121-b294-bdbd582ca9c7-c000.snappy.parquet,7689,1719675399000


In [0]:
# Delete the folder and all of its contents (the Python boolean is True, not true)
dbutils.fs.rm("dbfs:/user/hive/warehouse/tbl_enriched_marketprice", recurse=True)

#### **2) How to remove a folder?**

     dbutils.fs.rm("/FileStore/tables/Parquet_Userdata1", recurse=True)
                                 (or)
     dbutils.fs.rm("/FileStore/tables/Parquet_Userdata1", True)
                                 (or)
     %fs rm -r /FileStore/tables/Parquet_Userdata3
     where,
           %fs                                    magic command that calls dbutils.fs
           rm                                     remove command
           -r                                     recursive flag to delete a directory and all its contents
           /FileStore/tables/Parquet_Userdata3    path to the directory

**EX 01**

In [0]:
# remove a top-level folder and all of its contents
dbutils.fs.rm("dbfs:/FileStore/tables/temp", recurse=True)

Out[1]: True

In [0]:
# remove sub folder
dbutils.fs.rm("dbfs:/FileStore/tables/temp_data/data_16062024/", recurse=True)

Out[5]: True

**EX 02**

In [0]:
data = "Name, Location, Domain, Country, Age\nSuresh, Bangalore, ADE, India, 25\nSampath, Bihar, Excel, India, 35\nKishore, Chennai, ADf, India, 28\nBharath, Hyderabad, Admin, India, 38\nBharani, Amaravathi, GITHUB, India, 45"

dbutils.fs.put("/Volumes/main/default/my-volume/hello.txt", data, True)

In [0]:
# Caution: this recursively deletes everything under dbfs:/Volumes, not just hello.txt
dbutils.fs.rm("dbfs:/Volumes", True)

#### **3) How to remove checkpoint?**

In [0]:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Define the schema based on the CSV structure
schema_csv = StructType([
    StructField("Id", IntegerType(), True),
    StructField("Name", StringType(), True),
    StructField("Age", IntegerType(), True)
])

In [0]:
df_stream_Single = spark.readStream\
                        .format("csv")\
                        .option("header", True)\
                        .schema(schema_csv)\
                        .csv("/FileStore/tables/Streaming/Stream_readStream/")
                        
display(df_stream_Single)

Id,Name,Age


In [0]:
checkpoint = "dbfs:/FileStore/tables/Streaming/Stream_checkpoint"

df_stream_Single.writeStream\
                .format('parquet')\
                .outputMode('append')\
                .option("path", "/FileStore/tables/Streaming/Stream_writeStream/")\
                .option("checkpointLocation", checkpoint)\
                .start()

display(df_stream_Single)

Id,Name,Age
1.0,Niroop,35.0
2.0,Nayani,25.0
3.0,Swaroop,33.0
4.0,Sam,29.0
5.0,Rashi,42.0
6.0,Tanvee,22.0
7.0,Shobha,34.0
8.0,Divya,37.0
9.0,Vivek,39.0
10.0,Narayan,28.0


In [0]:
dbutils.fs.rm(checkpoint, recurse=True)

Out[22]: True
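
The cells above start the stream without keeping a handle to it, so the query may still be running when its checkpoint is deleted, which can make the query fail. A minimal sketch of a safer order, assuming the object returned by `.start()` is kept in a variable (a PySpark `StreamingQuery`, which exposes `isActive` and `stop()`) and that `fs` is `dbutils.fs`; `stop_and_clear_checkpoint` is a hypothetical helper name.

```python
def stop_and_clear_checkpoint(query, fs, checkpoint_path):
    """Stop a streaming query, then delete its checkpoint folder.

    `query` is assumed to be the StreamingQuery returned by writeStream.start();
    `fs` is assumed to be `dbutils.fs`. The helper name is hypothetical.
    """
    if query.isActive:
        # Stop the query first so nothing is writing to the checkpoint.
        query.stop()
    # A checkpoint is a directory, so it must be removed recursively.
    return fs.rm(checkpoint_path, True)
```

In a notebook, the write cell would become `query = df_stream_Single.writeStream ... .start()`, followed by `stop_and_clear_checkpoint(query, dbutils.fs, checkpoint)`.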

#### **4) How to create and remove folders?**

#### **mkdirs command (dbutils.fs.mkdirs)**

- Creates the given **directory if it does not exist**.

In [0]:
dbutils.fs.help("mkdirs")

In [0]:
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_checkpoint/csv")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_checkpoint/json")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_checkpoint/parquet")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_checkpoint/orc")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_checkpoint/avro")

dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_readStream/csv/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_readStream/json/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_readStream/parquet/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_readStream/orc/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_readStream/avro/")

dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_writeStream/csv/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_writeStream/json/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_writeStream/parquet/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_writeStream/orc/")
dbutils.fs.mkdirs("/FileStore/tables/Streaming/Stream_writeStream/avro/")

In [0]:
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_checkpoint/csv", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_checkpoint/json", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_checkpoint/parquet", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_checkpoint/orc", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_checkpoint/avro", True)

dbutils.fs.rm("/FileStore/tables/Streaming/Stream_readStream/csv", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_readStream/json", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_readStream/parquet", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_readStream/orc", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_readStream/avro", True)

dbutils.fs.rm("/FileStore/tables/Streaming/Stream_writeStream/csv", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_writeStream/json", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_writeStream/parquet", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_writeStream/orc", True)
dbutils.fs.rm("/FileStore/tables/Streaming/Stream_writeStream/avro", True)
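
The fifteen `mkdirs` and `rm` calls above follow one pattern (three stage folders, five formats), so the paths can be generated instead of typed out. A minimal sketch, assuming `fs` is `dbutils.fs`; the constants and the helpers `stream_dirs`, `make_dirs`, and `remove_dirs` are hypothetical names introduced for illustration.

```python
BASE = "/FileStore/tables/Streaming"
STAGES = ["Stream_checkpoint", "Stream_readStream", "Stream_writeStream"]
FORMATS = ["csv", "json", "parquet", "orc", "avro"]


def stream_dirs(base=BASE, stages=STAGES, formats=FORMATS):
    """Build every <base>/<stage>/<format> path used above."""
    return [f"{base}/{stage}/{fmt}" for stage in stages for fmt in formats]


def make_dirs(fs, paths):
    """Create each directory (mkdirs creates it only if it does not exist)."""
    for path in paths:
        fs.mkdirs(path)


def remove_dirs(fs, paths):
    """Recursively delete each directory."""
    for path in paths:
        fs.rm(path, True)
```

In a notebook, `make_dirs(dbutils.fs, stream_dirs())` creates the folders and `remove_dirs(dbutils.fs, stream_dirs())` removes them again.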