Google Colab:

HuggingFace: https://huggingface.co/facebook/bart-large-mnli

## Libraries Explained

- **dotenv**: Loads environment variables from a `.env` file into the application's environment, helping manage configuration separately from code.

- **huggingface_hub**:
  - **HfApi**: Provides programmatic access to the Hugging Face model hub for uploading, downloading, and managing models.
  - **hf_hub_download**: Simplifies downloading model files from the Hugging Face hub to your local environment.

- **transformers**: Offers pre-trained models for natural language processing tasks. The `pipeline` function specifically provides an easy-to-use interface for common NLP tasks like text generation, sentiment analysis, and question answering.


In [None]:
import os
import json
from datetime import datetime  # the class, used by serialize_object() below
# from dotenv import load_dotenv  # optional: enable if HF_TOKEN lives in a .env file

from huggingface_hub import HfApi
from huggingface_hub import hf_hub_download

from transformers import pipeline


# Loading Environment Variables for Hugging Face


This code snippet performs two essential operations:

1. `load_dotenv()` - Loads environment variables from a `.env` file into the application's environment. This is a common pattern for keeping configuration and secrets out of the source code. (The `dotenv` import is commented out in the import cell, so enable it if you use a `.env` file.)

2. `hf_key = os.getenv("HF_TOKEN")` - Retrieves the Hugging Face API token from the environment variables and assigns it to the variable `hf_key`. This token is required for authenticated access to the Hugging Face Hub services, including downloading private models or models with gated access.
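The two steps above can be sketched as follows. This is a minimal illustration, not the notebook's own cell: `get_hf_token` is a hypothetical helper name, and the block assumes `python-dotenv` may or may not be installed, so the import is guarded.

```python
import os

try:
    # python-dotenv is optional; it may not be installed in every environment.
    from dotenv import load_dotenv
    load_dotenv(".env")  # does nothing useful if the file is missing
except ImportError:
    pass


def get_hf_token(env=os.environ):
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


hf_key = get_hf_token()
if hf_key is None:
    print("HF_TOKEN is not set; only public models will be accessible.")
```

Passing the environment as a parameter keeps the helper easy to test with a plain dictionary.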


In [None]:
hf_key = os.getenv("HF_TOKEN")  # read the token from the environment; never hardcode it


# Hugging Face Model Reference

[facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli)

# Facebook BART Large MNLI Model

## Model Overview
This reference points to Facebook AI's BART large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset.

## Key Specifications
- **Architecture**: BART (Bidirectional and Auto-Regressive Transformers)
- **Size**: Large (400M parameters)
- **Fine-tuning**: MNLI (Multi-Genre Natural Language Inference)
- **Developer**: Facebook AI Research (Meta AI)
- **Use Case**: Zero-shot classification and natural language inference

## Capabilities
- Determines textual entailment between premise and hypothesis
- Classifies relationship as entailment, contradiction, or neutral
- Enables zero-shot classification by framing categories as hypotheses
- Generalizes to unseen tasks and domains without additional training



This model serves as a powerful foundation for flexible text classification tasks where predefined categories or training examples may not be available.
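To make the "categories as hypotheses" idea concrete, here is a small pure-Python sketch. It assumes the template `"This example is {}."`, which to my knowledge is the default `hypothesis_template` of the `transformers` zero-shot pipeline. The entailment logits are invented for illustration and the helper names are hypothetical. In single-label mode, the label scores are roughly a softmax over each hypothesis's entailment logit.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["education", "politics", "technology"]
hypotheses = build_hypotheses(labels)

# Invented entailment logits, one per (premise, hypothesis) pair.
entailment_logits = [-1.2, -2.0, 3.5]
scores = softmax(entailment_logits)

best_score, best_label = max(zip(scores, labels))
print(hypotheses)
print(best_label)
```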

In [None]:
hf_reference='facebook/bart-large-mnli'


# Downloading Specific Model Files from Hugging Face Hub


This code snippet demonstrates how to selectively download specific files from a Hugging Face model repository:

1. **File Definition**: First, a list of commonly required files for transformer models is defined, with comments explaining each file's purpose:
   - Vocabulary files for tokenization
   - Configuration files for model architecture
   - Tokenizer files for text preprocessing
   - Model weights in different formats (PyTorch and SafeTensors)

2. **Selective Download**: The code iterates through each file in the list and:
   - Attempts to download it using `hf_hub_download()`
   - Specifies the model repository via `repo_id=hf_reference`
   - Saves files to a local directory structure based on the model name
   - Prints the local path where each file is saved

3. **Error Handling**: The try-except block catches and reports any download failures, allowing the process to continue even if certain files aren't available for the specific model.
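Because some files may be missing for a given model, it can help to check afterwards which ones actually landed on disk. The sketch below uses only the standard library. `local_model_dir` and `split_present_missing` are hypothetical helper names, and the directory layout mirrors the `models/<model name>` convention described above.

```python
from pathlib import Path


def local_model_dir(repo_id, base_dir="models"):
    """Map 'owner/name' to '<base_dir>/name', mirroring the download step."""
    return Path(base_dir) / repo_id.split("/")[1]


def split_present_missing(directory, file_names):
    """Return (present, missing) lists for the given file names."""
    directory = Path(directory)
    present = [n for n in file_names if (directory / n).is_file()]
    missing = [n for n in file_names if n not in present]
    return present, missing


model_dir = local_model_dir("facebook/bart-large-mnli")
present, missing = split_present_missing(model_dir, ["config.json", "vocab.txt"])
print(model_dir, present, missing)
```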


In [None]:
# Candidate files; not every model ships all of them
required_files = [
    "vocab.txt",          # WordPiece vocabulary (BERT-style models)
    "vocab.json",         # BPE vocabulary (token -> id map)
    "config.json",        # Model configuration
    "tokenizer.json",     # Serialized fast tokenizer (if applicable)
    "merges.txt",         # BPE merge rules (if applicable)
    "pytorch_model.bin",  # Model weights (PyTorch format)
    "model.safetensors",  # Same weights in safetensors format
]


# Download only the required files into models/<model name>
local_dir = f"models/{hf_reference.split('/')[1]}"
for file_name in required_files:
    try:
        print()
        print(f"Attempting to download: {file_name}")
        local_path = hf_hub_download(repo_id=hf_reference, filename=file_name, local_dir=local_dir)
        print(f"Saved to: {local_path}")
    except Exception as e:
        # e.g. a 404 when the model does not ship this file
        print(f"Could not download {file_name}: {e}")


Attempting to download: vocab.txt
Could not download vocab.txt: 404 Client Error. (Request ID: Root=1-68e0e6b4-3ef06e965702f6136e606017;d885ccd5-f212-44a6-a3e6-042aa67678f1)

Entry Not Found for url: https://huggingface.co/facebook/bart-large-mnli/resolve/main/vocab.txt.

Attempting to download: vocab.json


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


vocab.json: 0.00B [00:00, ?B/s]

Saved to: models/bart-large-mnli/vocab.json

Attempting to download: config.json


config.json: 0.00B [00:00, ?B/s]

Saved to: models/bart-large-mnli/config.json

Attempting to download: tokenizer.json


tokenizer.json: 0.00B [00:00, ?B/s]

Saved to: models/bart-large-mnli/tokenizer.json

Attempting to download: merges.txt


merges.txt: 0.00B [00:00, ?B/s]

Saved to: models/bart-large-mnli/merges.txt

Attempting to download: pytorch_model.bin


pytorch_model.bin:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

Saved to: models/bart-large-mnli/pytorch_model.bin

Attempting to download: model.safetensors


model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

Saved to: models/bart-large-mnli/model.safetensors



# Creating Zero-shot Classification Pipelines


This code initializes a zero-shot classification pipeline using Hugging Face's `transformers` library and includes, commented out, a second variant that loads from local files:

1. **Cached Model Pipeline**:
   - `hf_model_cache` uses the model identifier directly (`hf_reference`)
   - When this pipeline runs, it will first check the default Hugging Face cache directory on your system
   - If not found in cache, it automatically downloads the model from Hugging Face Hub

2. **Local Model Pipeline**:
   - `hf_model_local` uses the previously downloaded model files
   - Points to the local directory where model files were saved earlier
   - Loads the model from the local files rather than downloading or using cache
   - Path is constructed by extracting just the model name from the reference

Both pipelines provide the same zero-shot classification functionality but differ in where they source the model files, allowing flexibility between network-dependent and offline usage.
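One way to pick between the two sources automatically is sketched below. `resolve_model_source` is a hypothetical helper, and treating the presence of `config.json` as the sign of a usable local copy is an assumption rather than a rule from the library.

```python
from pathlib import Path


def resolve_model_source(repo_id, base_dir="models"):
    """Prefer a local copy that has a config.json; otherwise return the Hub id."""
    local_dir = Path(base_dir) / repo_id.split("/")[-1]
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    return repo_id


source = resolve_model_source("facebook/bart-large-mnli")
print(source)
```

The returned value could then be passed as the `model` argument, for example `pipeline("zero-shot-classification", model=source)`.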


In [None]:
hf_model_cache = pipeline("zero-shot-classification", model=hf_reference)
# hf_model_local = pipeline("zero-shot-classification", model=f"models/{hf_reference.split('/')[1]}")

config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

Device set to use cpu


# Zero-Shot Classification of Technology Text


This code performs zero-shot classification on a technology-related statement using a cached Hugging Face model:

### Input Components:
- **Text**: A statement about organizations using "ubiquitous computing fabric" for mission-critical applications
- **Candidate Labels**: Five potential categories for classification
  - education
  - politics
  - technology
  - science
  - cosmology
- **Model**: The `hf_model_cache` pipeline created earlier, backed by `facebook/bart-large-mnli`

### Process:
1. The text describes cloud-to-edge computing infrastructure
2. The model analyzes this text against each potential category
3. No training examples are provided (zero-shot approach)
4. The model determines how likely the text belongs to each category

### Expected Output:
The `result` variable will contain a dictionary with three keys:
- `sequence`: the original input text
- `labels`: the candidate labels, sorted from most to least likely
- `scores`: the confidence score for each label, in the same order

Given keywords like "computing fabric," "cloud," "edge," and "applications," the model should rank "technology" first, with "science" a distant second. In the recorded run below, "technology" scores about 0.98.

This demonstrates how zero-shot classification can identify content categories without specific training examples for each domain.

In [None]:
text = "organizations continue to choose our ubiquitous computing fabric—from cloud to edge—to run their missioncritical applications"
candidate_labels = ["education", "politics", "technology", "science", "cosmology"]
result = hf_model_cache(text, candidate_labels=candidate_labels)

print(result)

{'sequence': 'organizations continue to choose our ubiquitous computing fabric—from cloud to edge—to run their missioncritical applications', 'labels': ['technology', 'science', 'cosmology', 'education', 'politics'], 'scores': [0.9831323027610779, 0.00678890710696578, 0.00354014546610415, 0.0035089629236608744, 0.0030295816250145435]}


In [None]:
text = "Embeddings encapulsate meaning for models"
candidate_labels = ["data", "machinelearning", "technology", "science", "cosmology"]
result = hf_model_cache(text, candidate_labels=candidate_labels)

print(result)

{'sequence': 'Embeddings encapulsate meaning for models', 'labels': ['data', 'machinelearning', 'technology', 'science', 'cosmology'], 'scores': [0.28148943185806274, 0.27388668060302734, 0.24066248536109924, 0.12597326934337616, 0.0779881626367569]}
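The two results above differ sharply in how decisive they are: the first has one dominant label, while the second spreads its scores across several. A small sketch for flagging that difference follows. The scores are rounded from the outputs above, `summarize` is a hypothetical helper, and the 0.10 margin is an arbitrary illustrative threshold.

```python
def summarize(result, min_margin=0.10):
    """Return (top_label, top_score, is_confident) for a zero-shot result dict."""
    ranked = sorted(zip(result["scores"], result["labels"]), reverse=True)
    (top_score, top_label), (runner_up_score, _) = ranked[0], ranked[1]
    return top_label, top_score, (top_score - runner_up_score) >= min_margin


# Scores rounded from the two recorded outputs.
first = {
    "labels": ["technology", "science", "cosmology", "education", "politics"],
    "scores": [0.9831, 0.0068, 0.0035, 0.0035, 0.0030],
}
second = {
    "labels": ["data", "machinelearning", "technology", "science", "cosmology"],
    "scores": [0.2815, 0.2739, 0.2407, 0.1260, 0.0780],
}

print(summarize(first))
print(summarize(second))
```

With this threshold the first result counts as confident and the second does not, since "data" leads "machinelearning" by less than 0.01.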



# Serialize and Save Model Information from Hugging Face Hub


This code demonstrates how to retrieve, serialize, and save detailed model information from the Hugging Face Hub:

1. **Serialization Function**: The `serialize_object()` function handles complex objects recursively:
   - Converts datetime objects to ISO format strings
   - Transforms objects with `__dict__` attributes into dictionaries
   - Processes nested lists and dictionaries
   - Preserves primitive data types

2. **API Interaction**: Creates an instance of the Hugging Face API client

3. **Model Information**: Fetches comprehensive metadata about the specified model using `api.model_info()`

4. **File Operations**:
   - Extracts the model name from the reference path
   - Creates a JSON file named after the model
   - Serializes the model information and writes it to the file

This allows for local storage of model metadata for later reference or analysis, particularly useful for model governance, versioning, and documentation purposes.


In [None]:
def serialize_object(obj):
    """
    Helper function to serialize custom objects like EvalResult.
    Converts objects with __dict__ attribute to dictionaries and handles datetime objects.
    """
    if isinstance(obj, datetime):
        return obj.isoformat()  # Convert datetime to ISO 8601 string
    elif hasattr(obj, "__dict__"):
        return {key: serialize_object(value) for key, value in obj.__dict__.items()}
    elif isinstance(obj, list):
        return [serialize_object(item) for item in obj]
    elif isinstance(obj, dict):
        return {key: serialize_object(value) for key, value in obj.items()}
    else:
        return obj  # Return the value as-is for primitive types

api = HfApi()
os.makedirs("models", exist_ok=True)  # make sure the output directory exists
with open(f"models/{hf_reference.split('/')[1]}.json", "w") as json_file:
    json.dump(serialize_object(api.model_info(hf_reference)), json_file)
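As an alternative to walking the object tree by hand, the standard library's `json.dumps` accepts a `default` hook that is called for any value it cannot encode. The sketch below is a different technique from `serialize_object()`, shown for comparison only. `FakeModelInfo` is a hypothetical stand-in for the object returned by `api.model_info()`, and its field values are invented.

```python
import json
from datetime import datetime


def to_jsonable(obj):
    """Fallback for json.dumps: called only for objects it cannot encode itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if hasattr(obj, "__dict__"):
        return vars(obj)  # json.dumps then encodes this dict recursively
    return str(obj)  # last resort: fall back to the string form


class FakeModelInfo:
    """Hypothetical stand-in for the real model info object."""

    def __init__(self):
        self.id = "facebook/bart-large-mnli"
        self.last_modified = datetime(2024, 1, 1, 12, 0, 0)  # invented value
        self.tags = ["zero-shot-classification"]


text = json.dumps(FakeModelInfo(), default=to_jsonable, indent=2)
restored = json.loads(text)
print(restored["last_modified"])
```

The hook keeps the traversal inside `json`, at the cost of less control over how nested containers are handled.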
