**Prerequisites**
1. Use `dwh` environment: `conda activate dwh`
2. Install required package: `conda install kagglehub`
3. Initialize the `dbt` project: `dbt init brazilecom`
4. Go to the `brazilecom` folder and run the code below
5. For data analysis, install the following packages if they are not already installed:
   
   5.1. google-cloud-bigquery-storage
   
   5.2. google-cloud-bigquery
   
   5.3. dagster-gcp
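
A quick way to see which of the packages above still need installing is to ask Python whether their modules can be found. This is a minimal sketch: the import names in `REQUIRED_MODULES` are assumptions inferred from the package names, and `is_importable` and `missing_packages` are hypothetical helpers, not part of any library.

```python
from importlib.util import find_spec

# Package name -> import name. The import names are assumptions
# inferred from the package names listed in the prerequisites.
REQUIRED_MODULES = {
    "kagglehub": "kagglehub",
    "google-cloud-bigquery": "google.cloud.bigquery",
    "google-cloud-bigquery-storage": "google.cloud.bigquery_storage",
    "dagster-gcp": "dagster_gcp",
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, a missing parent package (e.g. `google`)
        # raises instead of returning None.
        return False


def missing_packages(required: dict = REQUIRED_MODULES) -> list:
    """List the package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if not is_importable(module)]
```

Calling `missing_packages()` inside the `dwh` environment returns the names that still need to be installed; an empty list means everything was found.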
   

**Part 1: Download Data**

The following code uses the `kagglehub` API to download the brazilian-ecommerce dataset from Kaggle so that its files can be placed in the project's `seeds` folder.

In [1]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("olistbr/brazilian-ecommerce", force_download=True)
print("Path to dataset files:", path)



Downloading from https://www.kaggle.com/api/v1/datasets/download/olistbr/brazilian-ecommerce?dataset_version_number=2...


100%|██████████| 42.6M/42.6M [00:19<00:00, 2.28MB/s]

Extracting files...





Path to dataset files: C:\Users\seeke\.cache\kagglehub\datasets\olistbr\brazilian-ecommerce\versions\2


In [3]:
import shutil
import os
import pandas as pd

# Preview the first file in the download folder: load it and keep the first 100 rows.
# The commented-out lines show how each file would be renamed and copied into "seeds".
for file_name in os.listdir(path):
    full_file_name = os.path.join(path, file_name)
    if os.path.isfile(full_file_name):
        #fname=full_file_name.replace("olist_", "")
        #fname=fname.replace("_dataset", "")
        #fname = os.path.basename(fname)
        #shutil.copy(full_file_name, os.path.join("seeds", fname))
        df = pd.read_csv(full_file_name)
        dff = df.iloc[0:100]
        print(dff.shape)
        break  # stop after the first file

(100, 5)
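
The cell above only previews one file; the copy into `seeds` is left commented out. The sketch below shows one way that step could look, following the same renaming idea (dropping `olist_` and `_dataset` from the file names). `seed_name`, `copy_to_seeds`, and the `skip` parameter are hypothetical names introduced for illustration, and restricting the copy to `.csv` files is an assumption.

```python
import shutil
from pathlib import Path


def seed_name(file_name: str) -> str:
    """Drop the `olist_` and `_dataset` parts, as in the commented-out lines."""
    return file_name.replace("olist_", "").replace("_dataset", "")


def copy_to_seeds(src_dir, seeds_dir="seeds", skip=()):
    """Copy every CSV in src_dir into seeds_dir under its shortened name.

    `skip` is a hypothetical parameter: shortened file names to leave out.
    Returns the list of file names written.
    """
    target = Path(seeds_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).iterdir()):
        # Assumption: only CSV files are wanted as dbt seeds.
        if not src.is_file() or src.suffix.lower() != ".csv":
            continue
        name = seed_name(src.name)
        if name in skip:
            continue
        shutil.copy(src, target / name)
        copied.append(name)
    return copied
```

With the `path` returned by `kagglehub.dataset_download`, `copy_to_seeds(path)` would fill the `seeds` folder from inside the `brazilecom` project directory.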


**Part 2: Extract data from seeds (CSV files) and load into BigQuery tables**
1. Configure `dbt_project.yml` and `profiles.yml` for seeds
2. Run seeds: `dbt seed`
3. Note: exclude `order_reviews.csv` and `geolocation.csv`, as they are not used in this project.
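
If the two unused CSV files are present in `seeds`, one way to leave them out is dbt's `--exclude` selection flag. The sketch below builds that command from the notebook; `build_seed_command` and `run_seed` are hypothetical helpers, and the seed names assume the files are stored under the shortened names mentioned in the note above.

```python
import subprocess


def build_seed_command(excluded=("order_reviews", "geolocation")):
    """Build the argument list for `dbt seed`, skipping the named seeds."""
    cmd = ["dbt", "seed"]
    if excluded:
        # Seed names are the CSV file names without the .csv extension.
        cmd += ["--exclude", *excluded]
    return cmd


def run_seed(excluded=("order_reviews", "geolocation")):
    """Run the command from the dbt project folder; raises if dbt exits non-zero."""
    return subprocess.run(build_seed_command(excluded), check=True)
```

`build_seed_command()` corresponds to typing `dbt seed --exclude order_reviews geolocation` in a terminal. Simply not copying the two files into `seeds` has the same effect.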

**Part 3: Test raw data quality**

Run raw data testing: `dbt test`.

Note: the test cases are defined in `properties.yml` under the `seeds` folder.
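
The contents of `properties.yml` are not shown here, so the exact tests are unknown. As an illustration only, `not_null` and `unique` are two of dbt's built-in generic tests, and the plain-Python sketch below mirrors what they check on a handful of made-up rows. The function names, the `customer_id` column, and the sample values are all hypothetical.

```python
from collections import Counter


def not_null_failures(rows, column):
    """Rows where the column is missing or empty (a CSV has no real NULL)."""
    return [r for r in rows if r.get(column) in (None, "")]


def unique_failures(rows, column):
    """Non-empty values that occur more than once in the column."""
    counts = Counter(r[column] for r in rows if r.get(column) not in (None, ""))
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up rows for illustration; not taken from the dataset.
sample_rows = [
    {"customer_id": "a1", "city": "sao paulo"},
    {"customer_id": "a2", "city": "curitiba"},
    {"customer_id": "a1", "city": "recife"},
    {"customer_id": "", "city": "manaus"},
]

null_rows = not_null_failures(sample_rows, "customer_id")
duplicate_ids = unique_failures(sample_rows, "customer_id")
```

A dbt test passes when its query returns no failing rows, so both results would need to be empty for the corresponding test to pass.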

**Part 4: Transform the raw data into Dimension Tables and a Fact Table**

Run dbt: `dbt run`.

**Part 5: Generate documentation for this project**
1. Generate documents: `dbt docs generate`
2. View documents locally: `dbt docs serve`