#### **dbutils.fs.mkdirs**: Create Directories and Files

- **dbutils.fs.mkdirs** creates **new directories**, including any missing parent directories; **new files/scripts** can then be added to them with **dbutils.fs.put**.

- The example below uses **dbutils.fs.mkdirs()** to create a **new directory** called **scripts** under **dbfs:/databricks/** in the DBFS file system.

In [0]:
dbutils.fs.help("mkdirs")

In [0]:
dbutils.fs.help("put")

In [0]:
# create the directory "scripts" under "dbfs:/databricks"
# (any missing parent directories are created as well)
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")

Out[4]: True

In [0]:
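# optional clean-up, run before the mkdirs above:
# recursively delete "dbfs:/databricks" and everything under it (recurse=True)
dbutils.fs.rm("dbfs:/databricks", True)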

Out[3]: True

In [0]:
%fs ls

path,name,size,modificationTime
dbfs:/FileStore/,FileStore/,0,0
dbfs:/content/,content/,0,0
dbfs:/data/,data/,0,0
dbfs:/databricks/,databricks/,0,0
dbfs:/databricks-datasets/,databricks-datasets/,0,0
dbfs:/databricks-results/,databricks-results/,0,0
dbfs:/local_disk0/,local_disk0/,0,0
dbfs:/user/,user/,0,0


In [0]:
%fs ls dbfs:/databricks/

path,name,size,modificationTime
dbfs:/databricks/mlflow-registry/,mlflow-registry/,0,0
dbfs:/databricks/mlflow-tracking/,mlflow-tracking/,0,0
dbfs:/databricks/scripts/,scripts/,0,0
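
The create-then-list check above can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: `ensure_directory` is a hypothetical name, and `fs` is assumed to behave like `dbutils.fs` (a `mkdirs` method, plus an `ls` method whose entries expose a `name` attribute).

```python
def ensure_directory(fs, path):
    """Create `path` with fs.mkdirs and confirm it appears in its parent's listing.

    `fs` is assumed to behave like `dbutils.fs` (hypothetical helper, for illustration).
    """
    fs.mkdirs(path)  # also creates any missing parent directories
    parent, _, name = path.rstrip("/").rpartition("/")
    # Directory entries returned by ls carry a trailing slash, e.g. "scripts/"
    return any(entry.name.rstrip("/") == name for entry in fs.ls(parent + "/"))


# In a Databricks notebook, where `dbutils` is predefined:
# ensure_directory(dbutils.fs, "dbfs:/databricks/scripts/")
```

Passing the file-system object in as an argument keeps the helper usable outside a notebook, for example with a stand-in object in a unit test.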


#### **dbutils.fs.put**

#### **1) Format: json**

In [0]:
dbutils.fs.put("/databricks/scripts/multiLine_nested.json", """[
  {
    "source": "catalog",
    "description": "bravia",
    "input_timestamp": 1124256609,
    "last_update_timestamp": 1524256609,
    "Address":
             {
               "country": "IND",
               "user": "Hari",
               "Location":"Bangalore",
               "Zipcode":"560103"
             }
  },
  {
    "source": "SAP",
    "description": "sony",
    "input_timestamp": 1224256609,
    "last_update_timestamp": 1424256609,
    "Address":
             {
               "country": "US",
               "user": "Rajesh",
               "Location":"Chennai",
               "Zipcode":"860103"
             }
  },
  {
    "source": "ADLS",
    "description": "bse",
    "input_timestamp": 1324256609,
    "last_update_timestamp": 1524256609,
    "Address":
             {
               "country": "CANADA",
               "user": "Lokesh",
               "Location":"Hyderabad",
               "Zipcode":"755103"
             }
  },
  {
    "source": "Blob",
    "description": "exchange",
    "input_timestamp": 1424256609,
    "last_update_timestamp": 1724256609,
    "Address":
             {
               "country": "US",
               "user": "Sharath",
               "Location":"Kochin",
               "Zipcode":"120103"
             }
  },
  {
    "source": "SQL",
    "description": "Stock",
    "input_timestamp": 1524256609,
    "last_update_timestamp": 1664256609,
    "Address":
             {
               "country": "SWEDEN",
               "user": "Sheetal",
               "Location":"Delhi",
               "Zipcode":"875103"
             }
  },
  {
    "source": "datawarehouse",
    "description": "azure",
    "input_timestamp": 1624256609,
    "last_update_timestamp": 1874256609,
    "Address":
             {
               "country": "UK",
               "user": "Raj",
               "Location":"Mumbai",
               "Zipcode":"123403"
             }
  },
  {
    "source": "oracle",
    "description": "ADF",
    "input_timestamp": 1779256609,
    "last_update_timestamp": 188256609,
    "Address":
             {
               "country": "Norway",
               "user": "Synapse",
               "Location":"Nasik",
               "Zipcode":"456103"
             }
  }
]""", True)

Wrote 2233 bytes.
Out[7]: True
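
Because **dbutils.fs.put** takes the file contents as a plain string, the JSON text does not have to be typed by hand. A minimal sketch, assuming the records are available as Python dictionaries, builds the string with the standard `json` module. The names `records` and `payload` are illustrative, and only the first two records are repeated here.

```python
import json

# First two records from the example above, as Python objects
records = [
    {
        "source": "catalog",
        "description": "bravia",
        "input_timestamp": 1124256609,
        "last_update_timestamp": 1524256609,
        "Address": {"country": "IND", "user": "Hari",
                    "Location": "Bangalore", "Zipcode": "560103"},
    },
    {
        "source": "SAP",
        "description": "sony",
        "input_timestamp": 1224256609,
        "last_update_timestamp": 1424256609,
        "Address": {"country": "US", "user": "Rajesh",
                    "Location": "Chennai", "Zipcode": "860103"},
    },
]

# Serialise to a multi-line JSON array; json.dumps handles quoting and commas
payload = json.dumps(records, indent=2)

# In a Databricks notebook, where `dbutils` is predefined:
# dbutils.fs.put("/databricks/scripts/multiLine_nested.json", payload, True)
```

Since the output spans several lines, the file still needs `multiLine=True` when it is read back with `spark.read.json`.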

In [0]:
%fs head dbfs:/databricks/scripts/multiLine_nested.json

In [0]:
df = spark.read.json("dbfs:/databricks/scripts/multiLine_nested.json", multiLine=True)
display(df)

Address,description,input_timestamp,last_update_timestamp,source
"List(Bangalore, 560103, IND, Hari)",bravia,1124256609,1524256609,catalog
"List(Chennai, 860103, US, Rajesh)",sony,1224256609,1424256609,SAP
"List(Hyderabad, 755103, CANADA, Lokesh)",bse,1324256609,1524256609,ADLS
"List(Kochin, 120103, US, Sharath)",exchange,1424256609,1724256609,Blob
"List(Delhi, 875103, SWEDEN, Sheetal)",Stock,1524256609,1664256609,SQL
"List(Mumbai, 123403, UK, Raj)",azure,1624256609,1874256609,datawarehouse
"List(Nasik, 456103, Norway, Synapse)",ADF,1779256609,188256609,oracle


#### **2) Format: txt**

In [0]:
data = "Name, Location, Domain, Country, Age\nSuresh, Bangalore, ADE, India, 25\nSampath, Bihar, Excel, India, 35\nKishore, Chennai, ADf, India, 28\nBharath, Hyderabad, Admin, India, 38\nBharani, Amaravathi, GITHUB, India, 45\nNiroop, Tituchi, Devops, India, 365\nSardar, Bangalore, JAVA, India, 32\nSwapnil, Bangalore, Automotive, India, 28\nRavi, Madurai, Python, India, 35"

# third argument True = overwrite the file if it already exists
dbutils.fs.put("/data/main_branch/Repo/project/sales.txt", data, True)

Wrote 358 bytes.
Out[9]: True
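
The delimited text passed to **dbutils.fs.put** can also be generated rather than concatenated by hand with `\n`. A minimal sketch, assuming the rows are held as Python lists, uses the standard `csv` module. The names `header`, `rows`, and `csv_text` are illustrative, only two data rows are repeated, and unlike the hand-written string this version puts no space after each comma.

```python
import csv
import io

header = ["Name", "Location", "Domain", "Country", "Age"]
rows = [
    ["Suresh", "Bangalore", "ADE", "India", 25],
    ["Sampath", "Bihar", "Excel", "India", 35],
]

# Write into an in-memory buffer; lineterminator="\n" avoids the default "\r\n"
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()

# In a Databricks notebook, where `dbutils` is predefined:
# dbutils.fs.put("/data/main_branch/Repo/project/sales.txt", csv_text, True)
```

Letting `csv.writer` produce the text also takes care of quoting if a field ever contains a comma or a line break.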

In [0]:
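# optional clean-up, run before the put above:
# recursively delete "dbfs:/data" and everything under it (recurse=True)
dbutils.fs.rm("dbfs:/data", True)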

Out[8]: True

In [0]:
%fs head /data/main_branch/Repo/project/sales.txt

In [0]:
dft = spark.read.csv("dbfs:/data/main_branch/Repo/project/sales.txt", header=True, inferSchema=True)
display(dft)

Name,Location,Domain,Country,Age
Suresh,Bangalore,ADE,India,25.0
Sampath,Bihar,Excel,India,35.0
Kishore,Chennai,ADf,India,28.0
Bharath,Hyderabad,Admin,India,38.0
Bharani,Amaravathi,GITHUB,India,45.0
Niroop,Tituchi,Devops,India,365.0
Sardar,Bangalore,JAVA,India,32.0
Swapnil,Bangalore,Automotive,India,28.0
Ravi,Madurai,Python,India,35.0


#### **3) Format: csv**

In [0]:
data_csv = "Index, Effective_Date, Start_Date	End_Date, Income, Delta_Value, Target_Id, Input_Timestamp_UTC, Update_Timestamp_UTC\n123, 6-Feb-23, 14-Jan-23, 6-Feb-23, 1500, 10, 1068, 1724256609000, 1724256609000\n124, 8-Jan-24, 7-Oct-23, 8-Jan-24, 1500, 10, 1068, 1724256609000, 1724256609000\n125, 6-Mar-23, 7-Feb-23, 6-Mar-23, 1500, 10, 1068, 1724256609000, 1724256609000\n126, 6-Jan-25, 9-Jan-24, 6-Jan-25, 1500, 10, 1068, 1724256609000, 1724256609000\n127, 31-Jan-24, 1-Jan-24, 31-Jan-24, 74, 12, 1065, 1724256609000, 1724256609000\n128, 31-Oct-24, 1-Oct-24, 31-Oct-24, 83, 12, 1065, 1724256609000, 1724256609000\n129, 30-Jun-24, 1-Jun-24, 30-Jun-24, 79, 11, 1065, 1724256609000, 1724256609000\n130, 9-Feb-23, 9-Feb-23, 9-Feb-23, 38, 17, 1071, 1724256609000, 1724256609000\n131, 23-Deb-23, 11-Deb-23, 25-Nov-23, 38, 17, 1071, 1724256609000, 1724256609000\n131, 23-Deb-23, 11-Deb-23, 25-Nov-23, 38, 17, 1071, 1724256609000, 1724256609000"

dbutils.fs.put("/data/main_branch/Repo/project/Marketing.csv", data_csv, True)

Wrote 918 bytes.
Out[11]: True

In [0]:
%fs head /data/main_branch/Repo/project/Marketing.csv

In [0]:
dfm = spark.read.csv("/data/main_branch/Repo/project/Marketing.csv", header=True, inferSchema=True)
display(dfm)

Index,Effective_Date,Start_Date	End_Date,Income,Delta_Value,Target_Id,Input_Timestamp_UTC,Update_Timestamp_UTC
123,6-Feb-23,14-Jan-23,6-Feb-23,1500.0,10.0,1068.0,1724256609000.0
124,8-Jan-24,7-Oct-23,8-Jan-24,1500.0,10.0,1068.0,1724256609000.0
125,6-Mar-23,7-Feb-23,6-Mar-23,1500.0,10.0,1068.0,1724256609000.0
126,6-Jan-25,9-Jan-24,6-Jan-25,1500.0,10.0,1068.0,1724256609000.0
127,31-Jan-24,1-Jan-24,31-Jan-24,74.0,12.0,1065.0,1724256609000.0
128,31-Oct-24,1-Oct-24,31-Oct-24,83.0,12.0,1065.0,1724256609000.0
129,30-Jun-24,1-Jun-24,30-Jun-24,79.0,11.0,1065.0,1724256609000.0
130,9-Feb-23,9-Feb-23,9-Feb-23,38.0,17.0,1071.0,1724256609000.0
131,23-Deb-23,11-Deb-23,25-Nov-23,38.0,17.0,1071.0,1724256609000.0


#### **4) Format: avro / parquet / orc**

- **dbutils.fs.put** is primarily for writing **string (text)** data to a file in DBFS (Databricks File System).
- It **does not** directly support writing data in **binary formats** such as **Avro, Parquet, and ORC**.
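
For these binary formats, the usual route is the Spark DataFrame writer rather than **dbutils.fs.put**. The following is a minimal sketch, not part of the original notebook: `write_dataframe` is a hypothetical helper, the output path in the usage comment is made up for illustration, and the `avro` format is assumed to be available on the cluster (outside Databricks it may require the separate spark-avro package).

```python
SUPPORTED_FORMATS = {"parquet", "orc", "avro"}


def write_dataframe(df, path, fmt="parquet", mode="overwrite"):
    """Write a Spark DataFrame to `path` in a binary format (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # DataFrameWriter chain: save mode, then output format, then target path
    df.write.mode(mode).format(fmt).save(path)
    return path


# In a Databricks notebook, reusing `df` from the JSON example
# (the target path below is an illustrative assumption):
# write_dataframe(df, "dbfs:/databricks/scripts/multiLine_nested_parquet", "parquet")
```

Spark writes a directory of part files at the target path rather than a single file, so the result is read back with `spark.read.format(...).load(path)` instead of `%fs head`.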