# Cafe Rewards Offer Dataset
This notebook downloads the Cafe Rewards Offer dataset from Kaggle and loads its CSV files into tables in the `bronze` schema.

In [0]:
!pip install kaggle --quiet

Set up the Kaggle API credentials. Upload your `kaggle.json`, which contains your Kaggle username and API key, to the parent directory; the cell below reads it from `../kaggle.json`.

In [0]:
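import json
import os
from pathlib import Path

# The credentials file is expected one level above the notebook's working directory.
creds_path = Path('../kaggle.json')
if creds_path.is_file():
    creds = json.loads(creds_path.read_text())
    # The Kaggle CLI reads these two environment variables for authentication.
    os.environ['KAGGLE_USERNAME'] = creds['username']
    os.environ['KAGGLE_KEY'] = creds['key']
else:
    print('kaggle.json not found. Please upload it.')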
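
The credential step above can also be wrapped in a small helper that stops with an error instead of printing a message, so later cells do not run without credentials. This is only a minimal sketch: the function name `load_kaggle_credentials` is hypothetical and is not part of the Kaggle package, and it assumes the file holds the same `username` and `key` fields used in the cell above.

```python
import json
import os
from pathlib import Path


def load_kaggle_credentials(path):
    """Read a kaggle.json file and export its values as environment variables.

    Hypothetical helper, not part of the Kaggle package. Assumes the file
    holds 'username' and 'key' entries, as in the cell above.
    Returns the username that was loaded.
    """
    creds_path = Path(path)
    if not creds_path.is_file():
        raise FileNotFoundError(f"{creds_path} not found. Please upload it.")
    creds = json.loads(creds_path.read_text())
    # Report every missing field at once rather than failing on the first one.
    missing = [field for field in ("username", "key") if field not in creds]
    if missing:
        raise KeyError(f"kaggle.json is missing fields: {missing}")
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]
```

In this notebook it would be called as `load_kaggle_credentials('../kaggle.json')`.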

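Download the dataset from Kaggle into the `dados` directory and unzip it. The `--force` flag downloads the files again even if a local copy already exists.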

In [0]:
!kaggle datasets download arshmankhalid/caf-rewards-offer-dataset -p dados --unzip --force
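
Before loading anything, it may help to confirm that the download produced the files the rest of the notebook reads. The sketch below assumes the archive unzips to `customers.csv`, `offers.csv`, and `events.csv` directly under `dados`, which is what the load cell expects; the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the load cell in this notebook (assumed archive contents).
EXPECTED_FILES = ("customers.csv", "offers.csv", "events.csv")


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in expected if not (base / name).is_file()]


# Report anything that did not arrive in the download directory.
missing = missing_files("dados")
if missing:
    print(f"Missing from dados/: {missing}")
```

An empty list means all three files are present and the load cell can run.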

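Create Delta tables in the `bronze` schema using PySpark. `saveAsTable` is called without an explicit format, so the tables are written as Delta only where Delta is the default table format, as on Databricks.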

In [0]:
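import pandas as pd
from pyspark.sql import SparkSession

# Reuse the active Spark session if one exists, otherwise start one.
spark = SparkSession.builder.getOrCreate()
# Create the target schema on the first run.
spark.sql("CREATE DATABASE IF NOT EXISTS bronze")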

## Load Customers
customers_pdf = pd.read_csv('dados/customers.csv')
customers_sdf = spark.createDataFrame(customers_pdf)
customers_sdf.write.mode('overwrite').saveAsTable('bronze.customers')

## Load Offers
offers_pdf = pd.read_csv('dados/offers.csv')
offers_sdf = spark.createDataFrame(offers_pdf)
offers_sdf.write.mode('overwrite').saveAsTable('bronze.offers')

## Load Events
events_pdf = pd.read_csv('dados/events.csv')
events_sdf = spark.createDataFrame(events_pdf)
events_sdf.write.mode('overwrite').saveAsTable('bronze.events')
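
After the three writes, one simple sanity check is to compare each saved table's row count with the length of the pandas DataFrame it came from. The helper below is a sketch under that assumption; `row_count_mismatches` is a hypothetical name, and it relies only on `spark.table(name).count()`.

```python
def row_count_mismatches(spark, expected_counts):
    """Compare saved table row counts against expected values.

    spark: an active SparkSession (any object exposing .table(name).count()).
    expected_counts: dict mapping a table name to its expected row count.
    Returns {table: (expected, actual)} for every table whose count differs.
    """
    mismatches = {}
    for table, expected in expected_counts.items():
        actual = spark.table(table).count()
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches


# Example call inside this notebook, using the names from the cell above:
# row_count_mismatches(spark, {
#     "bronze.customers": len(customers_pdf),
#     "bronze.offers": len(offers_pdf),
#     "bronze.events": len(events_pdf),
# })
```

An empty dict means every table holds as many rows as its source CSV.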
