# Ames Housing Dataset Ingestion Notebook

This notebook downloads the Ames Housing dataset from Kaggle and copies it to a Unity Catalog volume in Databricks (AWS), where it is available for further analysis. Widgets select the target catalog, schema, and volume, so the destination can be changed without editing the code.

**Workflow Steps:**
1. Set up widgets for catalog, schema, and volume selection.
2. Retrieve the widget values.
3. Construct the path to the target volume.
4. Import `kagglehub`.
5. Download the dataset using `kagglehub`.
6. Copy the dataset files to the specified Unity Catalog volume.


In [0]:
# Step 1: Set up widgets for catalog, schema, and volume selection
# These widgets allow you to easily change the target Unity Catalog location for your data

dbutils.widgets.text("catalog_use", "main", "Catalog")
dbutils.widgets.text("schema_use", "default", "Schema")
dbutils.widgets.text("volumes_use", "landing", "Volume")

In [0]:
# Step 2: Retrieve widget values
catalog_use = dbutils.widgets.get("catalog_use")  # Retrieve the value of the catalog_use widget
schema_use = dbutils.widgets.get("schema_use")    # Retrieve the value of the schema_use widget
volumes_use = dbutils.widgets.get("volumes_use")  # Retrieve the value of the volumes_use widget

In [0]:
# Step 3: Construct the path to the target Unity Catalog volume
volume_path = f"/Volumes/{catalog_use}/{schema_use}/{volumes_use}/"
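
Widget values are free text, so an empty value or a stray slash would produce a malformed path. A minimal sketch of a guard, assuming a hypothetical helper `build_volume_path` (not part of the original notebook) that rejects such values before building the same `/Volumes/<catalog>/<schema>/<volume>/` path:

```python
def build_volume_path(catalog: str, schema: str, volume: str) -> str:
    """Hypothetical helper: validate the three name parts and build the volume path."""
    parts = {"catalog": catalog, "schema": schema, "volume": volume}
    cleaned = []
    for label, value in parts.items():
        value = value.strip()
        if not value:
            raise ValueError(f"{label} must not be empty")
        if "/" in value:
            raise ValueError(f"{label} must not contain '/': {value!r}")
        cleaned.append(value)
    return "/Volumes/" + "/".join(cleaned) + "/"


# Same result as the f-string above for the default widget values
print(build_volume_path("main", "default", "landing"))
```

In the notebook this could be called as `build_volume_path(catalog_use, schema_use, volumes_use)`; whether the catalog, schema, and volume actually exist is not checked here.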

In [0]:
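# Step 4: Import required libraries
# kagglehub is a third-party package; if the import fails, install it first
# (for example with `%pip install kagglehub` in its own cell)
import kagglehub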

In [0]:
# Step 5: Download the latest version of the Ames Housing dataset from Kaggle
# The dataset will be downloaded to a local path on the driver
path = kagglehub.dataset_download("shashanknecrothapa/ames-housing-dataset")

print("Path to dataset files:", path)
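
Before copying, it can help to see which files were actually downloaded. A small sketch using only the standard library, assuming the returned path is a directory; `list_downloaded_files` is a hypothetical helper, and no particular file names in the dataset are assumed:

```python
from pathlib import Path


def list_downloaded_files(root: str) -> list[tuple[str, int]]:
    """Hypothetical helper: return (relative path, size in bytes) for every file under root."""
    root_path = Path(root)
    return sorted(
        (p.relative_to(root_path).as_posix(), p.stat().st_size)
        for p in root_path.rglob("*")
        if p.is_file()
    )
```

In the notebook, `for name, size in list_downloaded_files(path): print(name, size)` would print one line per downloaded file.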

In [0]:
# Step 6: Copy the downloaded dataset files to the Unity Catalog volume
# This makes the data available for distributed processing in Databricks
try:
    # The download lives on the driver's local disk, hence the "file:" scheme
    dbutils.fs.cp(f"file:{path}", volume_path, recurse=True)
    print(f"Dataset files successfully copied to {volume_path}")
except Exception as e:
    print(f"Error copying files: {e}")
    raise  # Re-raise so a failed copy stops the notebook instead of passing silently
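
After the copy, a quick check can confirm that every source file arrived. This is a sketch under two assumptions: that the volume is reachable through its `/Volumes/...` path with ordinary Python file APIs, and that the copy places the files directly under `volume_path`. `find_missing_files` is a hypothetical helper, not part of the original notebook:

```python
from pathlib import Path


def find_missing_files(src_dir: str, dst_dir: str) -> list[str]:
    """Hypothetical helper: relative paths that exist under src_dir but are
    absent, or differ in size, under dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    missing = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        # Assumption: matching name and size is a good enough signal; contents are not compared
        if not dst_file.is_file() or dst_file.stat().st_size != src_file.stat().st_size:
            missing.append(rel.as_posix())
    return missing
```

An empty list from `find_missing_files(path, volume_path)` would suggest the copy is complete by name and size. A non-empty list names the files to investigate.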