# Ames Housing Dataset Ingestion Notebook

This notebook downloads the Ames Housing dataset from Kaggle with `kagglehub`, copies it into a Unity Catalog volume in Databricks (AWS), and prepares it for further analysis. Widgets set the target catalog, schema, and volume path.

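**Workflow Steps:**
1. Set up widgets for the catalog, schema, and volume path.
2. Retrieve the widget values. The target volume path is read directly from its widget and should have the form `/Volumes/<catalog>/<schema>/<volume>`.
3. Import `kagglehub`.
4. Download the dataset with `kagglehub` to the driver's local cache.
5. Copy the dataset files to the specified Unity Catalog volume.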


In [0]:
# Step 1: Set up widgets to specify the target Unity Catalog location for your data
# These text widgets let you enter the catalog, schema, and volume path where the dataset will be stored.
# The volume path should have the form /Volumes/<catalog>/<schema>/<volume>

dbutils.widgets.text("catalog_use", "main", "Catalog")
dbutils.widgets.text("schema_use", "default", "Schema")
dbutils.widgets.text("volumes_path_use", "/Volumes/main/default/landing", "Volume Path")

In [0]:
# Step 2: Retrieve the values entered in the widgets above
catalog_use = dbutils.widgets.get("catalog_use")  # Retrieve catalog_use widget value
schema_use = dbutils.widgets.get("schema_use")    # Retrieve schema_use widget value
volumes_path_use = dbutils.widgets.get("volumes_path_use")  # Retrieve volumes_path_use widget value
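The workflow describes constructing the volume path. However, the notebook reads the full path from its own widget, which leaves `catalog_use` and `schema_use` unused. The following is a minimal sketch of building the path from its parts instead. It assumes the standard `/Volumes/<catalog>/<schema>/<volume>` layout. `build_volume_path` is a hypothetical helper, and the volume name `landing` is only borrowed from the widget's default value.

```python
def build_volume_path(catalog: str, schema: str, volume: str, subdir: str = "") -> str:
    """Build a Unity Catalog volume path: /Volumes/<catalog>/<schema>/<volume>[/<subdir>]."""
    # Reject empty names and names containing "/", which would shift the path levels
    for name, value in (("catalog", catalog), ("schema", schema), ("volume", volume)):
        if not value or "/" in value:
            raise ValueError(f"Invalid {name} name: {value!r}")
    path = f"/Volumes/{catalog}/{schema}/{volume}"
    if subdir:
        # Optional folder inside the volume; surrounding slashes are trimmed
        path += "/" + subdir.strip("/")
    return path


# In the notebook (hypothetical volume name "landing"):
# volumes_path_use = build_volume_path(catalog_use, schema_use, "landing")
```

With the default widget values, `build_volume_path("main", "default", "landing")` returns the same path as the `volumes_path_use` default, `/Volumes/main/default/landing`.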

In [0]:
# Step 3: Import the required library for downloading datasets from Kaggle
# If kagglehub is not installed on the cluster, install it first with: %pip install kagglehub
import kagglehub

In [0]:
# Optional: inspect the signature and parameters of dataset_download before calling it
help(kagglehub.dataset_download)

In [0]:
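# Step 4: Download the latest version of the Ames Housing dataset from Kaggle
# kagglehub downloads to its local cache on the driver (by default under ~/.cache/kagglehub),
# not directly to the Unity Catalog volume.
# Note: the `path` argument of dataset_download selects a single file *within* the dataset.
# It is not an output directory, so the volume path must not be passed here.
path = kagglehub.dataset_download("shashanknecrothapa/ames-housing-dataset")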

print("Path to dataset files:", path)
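The workflow ends with copying the files into the volume, but the notebook stops after the download. The following is a minimal sketch of that copy step. It assumes the volume is reachable on the driver through its `/Volumes/...` path, as Unity Catalog volumes are in Databricks. `copy_dataset_to_volume` is a hypothetical helper, not part of `kagglehub` or `dbutils`. It copies file contents only, using `shutil.copyfile`, and does not carry over permission or timestamp metadata, so it does not depend on the volume mount supporting those.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_dataset_to_volume(src_dir: str, dest_dir: str) -> list[str]:
    """Copy every file under src_dir into dest_dir, preserving subfolders.

    Returns the copied files' paths relative to src_dir, in sorted order.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Source directory not found: {src}")

    copied = []
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders in the destination
        shutil.copyfile(file, target)  # contents only; overwrites a file with the same name
        copied.append(str(rel))
    return copied


# In the notebook, after the download cell:
# copied_files = copy_dataset_to_volume(path, volumes_path_use)
# print(f"Copied {len(copied_files)} file(s) to {volumes_path_use}:", copied_files)
```

Staying within Databricks utilities is an alternative. `dbutils.fs.cp(f"file:{path}", volumes_path_use, recurse=True)` copies the same directory, where the `file:` prefix refers to the driver's local filesystem.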