# Hugging Face Integration
* This notebook integrates Hugging Face into the repository, mainly so the datasets and models can be shared easily.
* It simply uploads all the datasets and models to the Hugging Face Hub; we'll rerun it occasionally to keep them up to date.

## Install Dependencies


In [None]:
%pip install huggingface_hub
%pip install python-dotenv

Collecting python-dotenv
  Using cached python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading python_dotenv-1.1.0-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.1.0
Note: you may need to restart the kernel to use updated packages.


In [9]:
from huggingface_hub import login, HfApi
import os
import sys
import dotenv
from pathlib import Path

In [15]:

dotenv.load_dotenv()
HF_TOKEN = os.getenv("HF_TOKEN")

## log in the Hugging Face user
if HF_TOKEN is None:
    print("Please set the HF_TOKEN environment variable. This is your Hugging Face token.")
else:
    print("Logging in...")
    login(token=HF_TOKEN)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


Logging in...
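The login cell above only prints a message when `HF_TOKEN` is missing, so later cells would still run and fail at the first API call. A minimal sketch of a fail-fast variant follows; the helper name `require_token` is hypothetical and not part of `huggingface_hub`.

```python
import os
from typing import Mapping


def require_token(env: Mapping[str, str] = os.environ, name: str = "HF_TOKEN") -> str:
    """Return the token stored under `name`, or raise if it is missing or empty."""
    token = env.get(name)
    if not token:
        # Stop here instead of letting a later upload cell fail.
        raise RuntimeError(f"Please set the {name} environment variable.")
    return token
```

Assuming this helper, the cell could call `login(token=require_token())` after `dotenv.load_dotenv()`.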


In [16]:
## verify login
api = HfApi()
user = api.whoami()
user_name = user['name']
print(f"Logged in as {user_name}")

Logged in as gaurangdave


In [18]:
## create a model repository on huggingface
model_name = "mental_health_ml"
repo_id = f"{user_name}/{model_name}"

## create a model repository
model_repo = api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
print(f"Created repository: {model_repo}")

## create a data repository
data_repo = api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
print(f"Created repository: {data_repo}")

Created repository: https://huggingface.co/gaurangdave/mental_health_ml
Created repository: https://huggingface.co/datasets/gaurangdave/mental_health_ml


In [19]:
data_root_dir = Path("..", "data")
models_root_dir = Path("..", "models")


## Upload Models

In [None]:
## upload all the models to the repository

def upload_models_in_dir(model_dir):
    for model in model_dir.iterdir():
        if model.is_dir():
            ## recurse into subdirectories
            upload_models_in_dir(model)
        else:
            filename = model.name
            ## path relative to the models directory, with forward slashes on every OS
            path_in_repo = model.relative_to(models_root_dir).as_posix()
            ## reuse the `api` client created in the login cell
            api.upload_file(path_or_fileobj=model, repo_id=repo_id, path_in_repo=path_in_repo, repo_type="model")
            print(f"Uploaded {filename} to {path_in_repo}")

In [7]:
upload_models_in_dir(models_root_dir)
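Before uploading, it can help to preview which files the recursive walk would send and under which repository path. The sketch below is a dry run built only on `pathlib`; `plan_uploads` is a hypothetical helper name, and it assumes the same layout as the notebook (everything under one root directory).

```python
from pathlib import Path


def plan_uploads(root: Path) -> list[tuple[Path, str]]:
    """Return (local_file, path_in_repo) pairs for every file under `root`."""
    pairs = []
    for local_file in sorted(root.rglob("*")):
        if local_file.is_file():
            # as_posix() keeps forward slashes in the repo path on every OS.
            pairs.append((local_file, local_file.relative_to(root).as_posix()))
    return pairs
```

For example, `plan_uploads(models_root_dir)` lists the pairs without touching the network, so the result can be inspected before calling the upload function.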

## Upload Data

In [None]:
## upload all the datasets to the repository

def upload_data_in_dir(data_dir):
    for dataset in data_dir.iterdir():
        if dataset.is_dir():
            ## recurse into subdirectories
            upload_data_in_dir(dataset)
        else:
            filename = dataset.name
            ## path relative to the data directory, with forward slashes on every OS
            path_in_repo = dataset.relative_to(data_root_dir).as_posix()
            ## reuse the `api` client created in the login cell
            api.upload_file(path_or_fileobj=dataset, repo_id=repo_id, path_in_repo=path_in_repo, repo_type="dataset")
            print(f"Uploaded {filename} to {path_in_repo}")

In [9]:
upload_data_in_dir(data_root_dir)

IN.txt: 100%|██████████| 69.3M/69.3M [00:04<00:00, 16.1MB/s]


Uploaded IN.txt to IN.txt


detailed_in.csv: 100%|██████████| 17.6M/17.6M [00:01<00:00, 17.3MB/s]


Uploaded detailed_in.csv to detailed_in.csv
Uploaded student_depression_dataset.csv to student_depression_dataset.csv
Uploaded in.csv to in.csv
Uploaded X_train.csv to X_train.csv
Uploaded y_train.csv to y_train.csv
Uploaded y_test.csv to y_test.csv
Uploaded X_test.csv to X_test.csv
Uploaded processed_column_names.csv to processed_column_names.csv
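The per-file loop above creates one commit per file. As a possible alternative, `HfApi.upload_folder` uploads a whole directory tree in a single commit. The wrapper below is a sketch, not the notebook's method: `upload_tree` and the commit message are made-up names, and `api` is assumed to be an `HfApi` instance such as the one created in the login cell.

```python
from pathlib import Path


def upload_tree(api, folder: Path, repo_id: str, repo_type: str) -> None:
    """Upload every file under `folder` to the repo in one commit.

    `api` is assumed to be a huggingface_hub.HfApi instance.
    """
    api.upload_folder(
        folder_path=str(folder),
        repo_id=repo_id,
        repo_type=repo_type,
        commit_message=f"Sync {folder.name} from notebook",
    )
```

Under these assumptions, `upload_tree(api, data_root_dir, repo_id, "dataset")` would take the place of `upload_data_in_dir(data_root_dir)`.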


## Download Data

In [22]:
from huggingface_hub import list_repo_files, hf_hub_download

os.makedirs(data_root_dir, exist_ok=True)

# List all files in the dataset repo
dataset_files = list_repo_files(repo_id=repo_id, repo_type="dataset")

# Download each file into data_root_dir. hf_hub_download recreates the file's
# repo subdirectories under local_dir, so pass the root itself; passing the
# file's parent directory would nest subdirectories twice.
for file_path in dataset_files:
    local_path = data_root_dir / file_path
    hf_hub_download(
        repo_id=repo_id,
        repo_type="dataset",
        filename=file_path,
        local_dir=data_root_dir,
    )
    print(f"Downloaded {file_path} to {local_path}")



Downloaded .gitattributes to ../data/.gitattributes
Downloaded IN.txt to ../data/IN.txt
Downloaded X_test.csv to ../data/X_test.csv
Downloaded X_train.csv to ../data/X_train.csv
Downloaded detailed_in.csv to ../data/detailed_in.csv
Downloaded in.csv to ../data/in.csv
Downloaded processed_column_names.csv to ../data/processed_column_names.csv
Downloaded student_depression_dataset.csv to ../data/student_depression_dataset.csv
Downloaded y_test.csv to ../data/y_test.csv
Downloaded y_train.csv to ../data/y_train.csv
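After a download pass, a quick check that every file listed in the repository exists locally can catch partial syncs. The sketch below compares a list of repo paths (for instance the `dataset_files` list from the cell above) against a local root; `missing_files` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable


def missing_files(repo_files: Iterable[str], local_root: Path) -> list[str]:
    """Return the repo paths that have no matching file under `local_root`."""
    return [name for name in repo_files if not (local_root / name).is_file()]
```

An empty result from `missing_files(dataset_files, data_root_dir)` would suggest the local copy is complete, at least by file name; it does not compare file contents.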
