## S3 trained model downloader

**This notebook downloads a trained model archive from S3 to your local machine, extracts it, and then uploads the extracted model to the Hugging Face Hub.**

In [10]:
!pip3 install boto3 huggingface_hub tqdm --upgrade

Collecting boto3
  Downloading boto3-1.36.25-py3-none-any.whl.metadata (6.7 kB)
Collecting huggingface_hub
  Downloading huggingface_hub-0.29.1-py3-none-any.whl.metadata (13 kB)
Collecting botocore<1.37.0,>=1.36.25 (from boto3)
  Downloading botocore-1.36.25-py3-none-any.whl.metadata (5.7 kB)
Collecting s3transfer<0.12.0,>=0.11.0 (from boto3)
  Using cached s3transfer-0.11.2-py3-none-any.whl.metadata (1.7 kB)
Downloading boto3-1.36.25-py3-none-any.whl (139 kB)
Downloading huggingface_hub-0.29.1-py3-none-any.whl (468 kB)
Downloading botocore-1.36.25-py3-none-any.whl (13.4 MB)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
doccano 1.8.4 requires chardet<5.0.0,>=4.0.0, which is not installed.
doccano 1.8.4 requires SQLAlchemy<2.0.0,>=1.4.31, which is not installed.
awscli 1.37.24 requires botocore==1.36.24, but you have botocore 1.36.25 which is incompatible.
doccano 1.8.4 requires pandas<2.0.0,>=1.4.2, but you have pandas 2.2.3 which is incompatible.

[notice] A new release of pip is available: 24.2 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [8]:
# S3 bucket and object key
bucket_name = 'llama-training-s3'
# For trained model path
# s3_key = 'llama-training-s3/model/pytorch-training-2025-03-06-00-05-14-843/output/model.tar.gz'
# For AWQ quantized model path
s3_key = 'llama-training-s3/quantized/model/pytorch-training-2025-03-06-00-05-14-843/output/model.tar.gz'
hf_token = "YOUR_HUGGINGFACE_KEYS"

In [9]:
# Hugging Face username and model name
username = "javiagu"  # Replace with your Hugging Face username
model_name = "KULLM3_pCliNER_bf16_Official_Official-AWQ"  # Replace with your desired model name
# Name of the local folder (under the home directory) that receives the extracted files
local_model_name = "kullm_official_official_awq"
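
The configuration above stores the Hugging Face token as a string literal, which is easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token has been exported in an environment variable: the helper name `get_hf_token` and the default variable name `HF_TOKEN` are choices made for this example, not part of the original notebook.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or blank.

    Both the function name and the default variable name are assumptions
    made for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return token


# Possible usage in the configuration cell:
# hf_token = get_hf_token()
```

With this in place, the later cells can keep using `hf_token` unchanged.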

**Model Download from S3 and Upload to Hugging Face**

**Model Downloading:**

In [10]:
import boto3
import tarfile
import os
import sys
from huggingface_hub import HfApi, HfFolder, upload_folder
from tqdm import tqdm  # For the progress bar

# Initialize S3 client
s3 = boto3.client('s3')

# Use the home directory for storage
home_dir = os.path.expanduser('~')
print("Home directory is:", home_dir)

# Define local paths using the home directory
local_model_tar = os.path.join(home_dir, 'model.tar.gz')
model_dir = os.path.join(home_dir, local_model_name)

# Get the size of the file to download
try:
    response = s3.head_object(Bucket=bucket_name, Key=s3_key)
    total_length = response.get('ContentLength')
except Exception as e:
    print(f"Error retrieving object metadata from S3: {e}")
    sys.exit(1)

# Download model.tar.gz from S3 with progress bar
print("Downloading model from S3...")
try:
    with tqdm(total=total_length, unit='B', unit_scale=True, desc='Downloading model.tar.gz') as pbar:
        def progress_hook(bytes_amount):
            pbar.update(bytes_amount)

        s3.download_file(
            Bucket=bucket_name,
            Key=s3_key,
            Filename=local_model_tar,
            Callback=progress_hook
        )
    print(f"Downloaded model.tar.gz to {local_model_tar}")
except Exception as e:
    print(f"Error downloading file from S3: {e}")
    sys.exit(1)

# Extract the tar.gz file
os.makedirs(model_dir, exist_ok=True)

print("Extracting model files...")
try:
    with tarfile.open(local_model_tar, 'r:gz') as tar:
        tar.extractall(path=model_dir)
    print(f"Extracted model to {model_dir}")
except Exception as e:
    print(f"Error extracting model files: {e}")
    sys.exit(1)


Home directory is: C:\Users\javia
Downloading model from S3...


Downloading model.tar.gz: 100%|███████████████████████████████████████████████████| 5.37G/5.37G [21:14<00:00, 4.21MB/s]


Downloaded model.tar.gz to C:\Users\javia\model.tar.gz
Extracting model files...
Extracted model to C:\Users\javia\kullm_official_official_awq
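
The extraction step above calls `tar.extractall` directly, which trusts every path stored in the archive. If the archive could come from a source you do not fully control, a more defensive variant may be worth considering. The sketch below is one possible approach, not the notebook's method: `safe_extract` is a hypothetical helper that rejects members resolving outside the destination folder and, as a deliberately strict assumption, rejects links altogether.

```python
import os
import tarfile


def safe_extract(tar_path: str, dest_dir: str) -> list[str]:
    """Extract tar_path into dest_dir, refusing members that would escape it.

    Hypothetical helper for illustration. Returns the extracted member names.
    """
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    with tarfile.open(tar_path, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # Resolve where this member would land on disk.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
            # Strict assumption for this sketch: no symlinks or hard links.
            if member.issym() or member.islnk():
                raise ValueError(f"Links are not allowed: {member.name}")
        tar.extractall(path=dest_root, members=members)
    return [m.name for m in members]


# Possible usage in place of the extractall call:
# safe_extract(local_model_tar, model_dir)
```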


**Model Uploading**

In [12]:
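# The archive extracts into a nested folder, so point model_dir at the folder that holds the model files
model_dir = "C:/Users/javia/kullm_official_official_awq/KULLM3_pCliNER_bf16_Official_Official"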
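
The cell above sets the nested model folder by hand. As a rough sketch, the same folder could be located automatically by searching the extraction directory for a marker file. The helper `find_model_root` is hypothetical, and it assumes the model folder contains a `config.json`, which is typical for Hugging Face model folders but is not confirmed by this notebook.

```python
import os


def find_model_root(base_dir: str, marker: str = "config.json") -> str:
    """Return the shallowest directory under base_dir that contains marker.

    Hypothetical helper; the marker file name is an assumption.
    """
    candidates = [
        root for root, _dirs, files in os.walk(base_dir) if marker in files
    ]
    if not candidates:
        raise FileNotFoundError(f"No {marker} found under {base_dir}")
    # Prefer the path with the fewest separators; break ties alphabetically.
    return min(candidates, key=lambda path: (path.count(os.sep), path))


# Possible usage after extraction:
# model_dir = find_model_root(os.path.join(os.path.expanduser("~"), local_model_name))
```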

In [None]:
# Set up Hugging Face API
HfFolder.save_token(hf_token)
model_repo_id = f"{username}/{model_name}"

api = HfApi()

# Create the repository (if it doesn't exist)
try:
    api.create_repo(
        repo_id=model_repo_id,
        repo_type="model",
        exist_ok=True,
        token=hf_token
    )
    print(f"Repository {model_repo_id} is ready.")
except Exception as e:
    print(f"Error creating repository on Hugging Face Hub: {e}")
    sys.exit(1)

# Upload the model files
print("Uploading model to Hugging Face Hub...")
try:
    upload_folder(
        folder_path=model_dir,
        repo_id=model_repo_id,
        repo_type="model",
        token=hf_token,
        ignore_patterns=["*.ipynb_checkpoints", "*.lock"],
    )
    print("Model uploaded successfully!")
except Exception as e:
    print(f"Error uploading model to Hugging Face Hub: {e}")
    sys.exit(1)


Repository javiagu/KULLM3_pCliNER_bf16_Official_Official-AWQ is ready.
Uploading model to Hugging Face Hub...
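
Because the upload can take a long time, it may help to preview which files would be sent before starting it. The sketch below lists files under a folder and skips those matching the ignore patterns. `files_to_upload` is a hypothetical helper, and its use of `fnmatch` on relative paths only approximates how `upload_folder` applies `ignore_patterns`; treat the result as a preview, not a guarantee.

```python
import fnmatch
import os


def files_to_upload(folder: str, ignore_patterns: list[str]) -> list[tuple[str, int]]:
    """Return sorted (relative_path, size_in_bytes) pairs for files that
    do not match any ignore pattern. Hypothetical helper for illustration."""
    selected = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full_path = os.path.join(root, name)
            # Use forward slashes so patterns behave the same on Windows.
            rel_path = os.path.relpath(full_path, folder).replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel_path, pat) for pat in ignore_patterns):
                continue
            selected.append((rel_path, os.path.getsize(full_path)))
    return sorted(selected)


# Possible usage before calling upload_folder:
# for path, size in files_to_upload(model_dir, ["*.ipynb_checkpoints", "*.lock"]):
#     print(f"{path:<60} {size:>15}")
```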


[Upload progress bars truncated in this export. 3 LFS files: model-00001-of-00002.safetensors (4.99G), a second shard ending in 00002-of-00002.safetensors (973M), and tokenizer.model (493k).]

## Alternative: Model uploader

**Use this section when the model you want to upload to Hugging Face is already on your local machine.**

In [None]:
!explorer .

**Content check**

In [1]:
import os

def list_extracted_files(directory):
    """
    List all files and their sizes in a given directory.
    """
    print(f"{'File/Folder':<80} {'Size (Bytes)':>15}")
    print("-" * 95)

    for root, dirs, files in os.walk(directory):
        for name in files:
            file_path = os.path.join(root, name)
            file_size = os.path.getsize(file_path)
            print(f"{file_path:<80} {file_size:>15}")

    print("\nListing completed.")

# Directory containing the extracted files
model_dir = os.path.join(os.path.expanduser('~'), 'model')

# Check if the directory exists before listing files
if os.path.exists(model_dir):
    print(f"Listing contents of extracted directory: {model_dir}")
    list_extracted_files(model_dir)
else:
    print(f"Directory {model_dir} does not exist. Extraction might have failed.")


Directory C:\Users\javia\model does not exist. Extraction might have failed.
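
Listing file sizes shows what is on disk, but it does not tell you whether a sharded checkpoint is complete. A minimal sketch of one extra check, assuming the model folder follows the common Hugging Face layout in which `model.safetensors.index.json` holds a `weight_map` from parameter names to shard file names. The helper `missing_shards` is hypothetical, and the check only applies to sharded safetensors checkpoints.

```python
import json
import os


def missing_shards(
    model_dir: str, index_name: str = "model.safetensors.index.json"
) -> list[str]:
    """Return shard file names listed in the index but absent from model_dir.

    Hypothetical helper; assumes the index file has a "weight_map" entry.
    """
    index_path = os.path.join(model_dir, index_name)
    with open(index_path, "r", encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    expected = sorted(set(weight_map.values()))
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


# Possible usage before uploading:
# print("Missing shards:", missing_shards(model_dir))
```

An empty list means every shard named in the index is present.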


In [2]:
username = "javiagu"  # Replace with your Hugging Face username
model_name = "Exaone_pCliNER_bf16_Official"  # Replace with your desired model name
model_dir = "C:/Users/javia/exaone-no-rope-scaling"

In [3]:
!explorer .

In [None]:
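# Set up Hugging Face API
# hf_token must already be defined (see the configuration cell in the S3 section)
import sys  # needed for sys.exit below
from huggingface_hub import HfApi, HfFolder, upload_folder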
HfFolder.save_token(hf_token)
model_repo_id = f"{username}/{model_name}"

api = HfApi()

# Create the repository (if it doesn't exist)
try:
    api.create_repo(
        repo_id=model_repo_id,
        repo_type="model",
        exist_ok=True,
        token=hf_token
    )
    print(f"Repository {model_repo_id} is ready.")
except Exception as e:
    print(f"Error creating repository on Hugging Face Hub: {e}")
    sys.exit(1)

# Upload the model files
print("Uploading model to Hugging Face Hub...")
try:
    upload_folder(
        folder_path=model_dir,
        repo_id=model_repo_id,
        repo_type="model",
        token=hf_token,
        ignore_patterns=["*.ipynb_checkpoints", "*.lock"],
    )
    print("Model uploaded successfully!")
except Exception as e:
    print(f"Error uploading model to Hugging Face Hub: {e}")
    sys.exit(1)

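# Clean up local files
# WARNING: this permanently deletes model_dir. Skip this step to keep the local copy.
# local_model_tar only exists if the S3 download cell ran in this session,
# so the archive is removed only when that variable is defined.
import shutil

print("Cleaning up local files...")
try:
    tar_path = globals().get("local_model_tar")
    if tar_path and os.path.exists(tar_path):
        os.remove(tar_path)
    shutil.rmtree(model_dir)
    print("Clean up completed.")
except Exception as e:
    print(f"Error during clean up: {e}")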


Repository javiagu/KULLM3_pCliNER_bf16_Official is ready.
Uploading model to Hugging Face Hub...


[Upload progress bars truncated in this export. 5 LFS files, including four safetensors shards of 4.97G, 4.92G, 4.92G, and 839M.]