
## Libraries Explained

- **dotenv**: Loads environment variables from a `.env` file into the application's environment, helping manage configuration separately from code.

- **huggingface_hub**: 
  - **HfApi**: Provides programmatic access to the Hugging Face model hub for uploading, downloading, and managing models.
  - **hf_hub_download**: Simplifies downloading model files from the Hugging Face hub to your local environment.

- **transformers**: Offers pre-trained models for natural language processing tasks. The `pipeline` function specifically provides an easy-to-use interface for common NLP tasks like text generation, sentiment analysis, and question answering.


In [1]:
import json
import os
from datetime import datetime
from dotenv import load_dotenv

from huggingface_hub import HfApi
from huggingface_hub import hf_hub_download



from transformers import pipeline


# Loading Environment Variables for Hugging Face


This code snippet performs two essential operations:

1. `load_dotenv()` - Loads environment variables from a `.env` file into the application's environment. This is a common pattern for securely storing configuration and sensitive information outside of the source code.

2. `hf_key = os.getenv("HF_TOKEN")` - Retrieves the Hugging Face API token from the environment variables and assigns it to the variable `hf_key`. This token is required for authenticated access to the Hugging Face Hub services, including downloading private models or models with gated access.


In [2]:
load_dotenv()
hf_key = os.getenv("HF_TOKEN")
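
`os.getenv` returns `None` when the variable is missing, so a failed lookup would only surface later as an authentication error. A minimal sketch of a stricter lookup is shown below. The helper name `require_env` is hypothetical and is not part of `dotenv` or `huggingface_hub`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper, not part of dotenv or huggingface_hub.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value
```

After `load_dotenv()`, it could be called as `hf_key = require_env("HF_TOKEN")`.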


# Hugging Face Model Reference

[facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)



# Facebook BART Large CNN Model

## Model Overview
This reference points to Facebook AI's BART large model that has been fine-tuned specifically for summarization tasks using the CNN/Daily Mail dataset.

## Key Specifications
- **Architecture**: BART (Bidirectional and Auto-Regressive Transformers)
- **Size**: Large variant (approximately 400M parameters)
- **Fine-tuning**: CNN/Daily Mail news article summarization dataset
- **Developer**: Facebook AI Research (now Meta AI)
- **Primary Task**: Text summarization

## Model Capabilities
- Generates concise and coherent summaries of longer texts
- Particularly effective for news article summarization
- Maintains key information while reducing text length
- Produces abstractive summaries (not just extractive)
- Accepts inputs of up to about 1024 tokens; longer documents must be truncated or split before summarization


This model serves as an excellent option for applications requiring high-quality text summarization capabilities.

In [3]:
hf_reference='facebook/bart-large-cnn'
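
Because the model's input is limited to about 1024 tokens, longer documents have to be split before summarization. The sketch below splits on whitespace and uses the word count as a rough stand-in for the token count. That is an assumption: real token counts come from the model's tokenizer and are usually higher than the word count. The function name `chunk_words` and the 700-word default are illustrative choices, not values from the model card.

```python
def chunk_words(text: str, max_words: int = 700) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk can then be summarized separately and the partial summaries joined.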


# Downloading Specific Model Files from Hugging Face Hub


This code snippet demonstrates how to selectively download specific files from a Hugging Face model repository:

1. **File Definition**: First, a list of commonly required files for transformer models is defined, with comments explaining each file's purpose:
   - Vocabulary files for tokenization
   - Configuration files for model architecture
   - Tokenizer files for text preprocessing
   - Model weights in different formats (PyTorch and SafeTensors)

2. **Selective Download**: The code iterates through each file in the list and:
   - Attempts to download it using `hf_hub_download()`
   - Specifies the model repository via `repo_id=hf_reference`
   - Saves files to a local directory structure based on the model name
   - Prints the local path where each file is saved

3. **Error Handling**: The try-except block catches and reports any download failures, allowing the process to continue even if certain files aren't available for the specific model.


In [None]:
# # List of required files
# required_files = [
#     "vocab.txt",          # Vocabulary file (if applicable)
#     "vocab.json",          # Vocabulary file (if applicable)       
#     "config.json",        # Model configuration
#     "tokenizer.json",     # Tokenizer configuration (if applicable)
#     "merges.txt",         # BPE merge rules file (if applicable)
#     "pytorch_model.bin",  # Model weights
#     "model.safetensors",  # Alternative model weights format
# ]


# # Download only the required files
# for file_name in required_files:
#     try:
#         print()
#         print(f"Attempting to download: {file_name}")
#         local_path = hf_hub_download(repo_id=hf_reference, filename=file_name, local_dir=f"models/{hf_reference.split('/')[1]}")
#         print(f"Saved to: {local_path}")
#     except Exception as e:
#         print(f"Could not download {file_name}: {e}")
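
As an alternative to requesting files one at a time, `huggingface_hub.snapshot_download` accepts glob patterns through `allow_patterns` and fetches only the matching files. The sketch below is one possible arrangement, not the notebook's method. The helper names `local_model_dir` and `download_selected` are hypothetical.

```python
from pathlib import Path


def local_model_dir(repo_id: str, root: str = "models") -> Path:
    """Map 'org/name' to '<root>/name', like the split('/') logic above."""
    # Using [-1] also works for repo ids that have no organization prefix.
    return Path(root) / repo_id.split("/")[-1]


def download_selected(repo_id: str, patterns: list[str]) -> str:
    """Download only files matching the glob patterns; returns the local folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=patterns,
        local_dir=str(local_model_dir(repo_id)),
    )
```

A call such as `download_selected(hf_reference, ["*.json", "merges.txt", "model.safetensors"])` would skip any pattern that matches nothing in the repository, so no per-file error handling is needed.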
        
    


# Setting Up BART for Text Summarization

This code initializes a text summarization pipeline using Facebook's BART model fine-tuned on the CNN/Daily Mail dataset.

## Pipeline Configuration

- **Task**: `"summarization"` - Creates concise summaries of longer texts
- **Model**: Uses the model specified in `hf_reference` (facebook/bart-large-cnn)
- **Storage**: The resulting pipeline object is assigned to `hf_model_cache`; the downloaded weights are kept in the local Hugging Face cache and reused on later runs

## Implementation Details

- The `pipeline()` function from Hugging Face's Transformers library provides a streamlined API
- Automatically handles tokenization, model inference, and text generation
- Downloads and caches the model weights on first use
- Makes summarization accessible with minimal code

## Alternative Implementation (Commented)

The commented line shows an alternative approach:
- Would use a locally downloaded version of the model
- Extracts the model name from the reference using string splitting
- Assumes the model exists in a local `models/` directory
- Useful for offline use or environments with limited connectivity

## Example Usage

```python
# Article to summarize
long_text = """
Researchers have discovered a new species of deep-sea fish living at depths of over 
7,000 meters in the Mariana Trench. The discovery was made during an expedition that 
used specialized submersibles to explore the ocean's deepest regions. The fish exhibits 
unique adaptations to the extreme pressure, including a gelatinous body structure and 
specialized cellular components. Scientists believe studying these adaptations could 
provide insights into pressure-resistant biomaterials. The findings were published in 
the journal Nature and represent the deepest living vertebrate species documented to date.
"""

# Generate a summary
summary = hf_model_cache(long_text, max_length=100, min_length=30, do_sample=False)

# Print the summarized text
print(summary[0]['summary_text'])
```

This pipeline enables efficient text summarization for applications such as news digests, document analysis, and content curation.
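
The choice between the Hub reference and a local copy can also be made at runtime. The sketch below returns the local directory when it appears to contain a downloaded model and falls back to the Hub id otherwise. The helper name `resolve_model_source` is hypothetical, and treating `config.json` as the marker of a usable download is an assumption.

```python
from pathlib import Path


def resolve_model_source(repo_id: str, root: str = "models") -> str:
    """Return a local model directory if it looks usable, else the Hub id."""
    local_dir = Path(root) / repo_id.split("/")[-1]
    # config.json is used here as a cheap marker that a download completed;
    # this is a heuristic and does not verify that the weights are present.
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    return repo_id
```

The result can be passed straight to the pipeline, for example `pipeline("summarization", model=resolve_model_source(hf_reference))`.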

In [4]:
hf_model_cache = pipeline("summarization", model=hf_reference)
# hf_model_local = pipeline("summarization", model=f"models/{hf_reference.split('/')[1]}")


# Summarizing Cricket News with BART

This code uses the BART-large-CNN model to generate a concise summary of a cricket news article about Team India and captain Rohit Sharma.

## Input Article Content

The text describes:
- Team India's poor performance in the Border-Gavaskar Trophy against Australia
- Potential leadership changes with Rohit Sharma's captaincy under scrutiny
- Reports of a senior player positioning himself as an interim captain
- Rohit's struggling batting form and potential career decisions

## Summarization Configuration

- **Model**: facebook/bart-large-cnn (via the hf_model_cache pipeline)
- **Parameters**:
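  - `do_sample=False`: Disables sampling, so the output is deterministic (greedy or beam search, depending on the model's configured `num_beams`)
  - No `max_length` is passed, so the model's configured default applies (typically 142 tokens, or roughly 100 words)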

## Additional Notes

- The commented line shows an alternative call that caps the summary length explicitly
  - Passes `max_length=100` to limit the summary to 100 tokens
  - Prints the pipeline output directly instead of storing it

- The summarized result is stored in the `result` variable but not printed in this code snippet

This demonstrates how the BART model can condense news articles while preserving the key information about team performance and leadership issues.
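
One hypothetical way to scale the summary length with the input, rather than relying on a fixed value, is sketched below. `max_length` and `min_length` are measured in tokens, so the word count used here is only an approximation. The helper name `summary_length_bounds`, the 25% ratio, and the floor of 30 are illustrative assumptions; the cap of 142 matches the default mentioned above.

```python
def summary_length_bounds(text: str, ratio: float = 0.25,
                          floor: int = 30, cap: int = 142) -> tuple[int, int]:
    """Return (min_length, max_length) scaled to the input's word count."""
    n_words = len(text.split())
    max_length = max(floor, min(cap, int(n_words * ratio)))
    # Keep min_length strictly below max_length so generation stays valid.
    min_length = min(floor, max_length - 1)
    return min_length, max_length
```

The pair could then be unpacked into the call, for example `lo, hi = summary_length_bounds(text)` followed by `hf_model_cache(text, min_length=lo, max_length=hi, do_sample=False)`.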

In [None]:
text = '''
Team India's below-par performance in the Border-Gavaskar Trophy could see big changes in the team and the leadership group. Rohit Sharma's captaincy is under the scanner and the selectors could take a call on him if India fail to reach the World Test Championship final. He has also struggled with the bat and only managed 31 runs in the ongoing series.
Amid India's poor performance in Australia, the Indian Express has reported that a senior player is portraying to be 'Mr Fix-it." The report states that the senior player is ready to project himself as an interim option for captaincy as he isn't convinced about the young players. The report doesn't mention the name of the senior player.
The report adds that Rohit may take a call about his career after the Border-Gavaskar Trophy. He made his ODI and T20I captaincy debut in 2007. Rohit made his Test debut in 2013.
'''

# Summarize the text
# print(hf_model_cache(text, max_length=100, do_sample=False))
result = hf_model_cache(text, do_sample=False)



In [None]:
len("Rohit Sharma's captaincy is under the scanner and the selectors could take a call on him if India fail to reach the World Test Championship final. He has also struggled with the bat and only managed 31 runs in the ongoing series. The report doesn't mention the name of the senior player.")

In [None]:
print(len(text), len(result[0]['summary_text']))
print(result)
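
The two lengths printed above can be reduced to a single figure. A small sketch follows; the helper name `compression_ratio` is hypothetical, and it measures characters rather than tokens.

```python
def compression_ratio(original: str, summary: str) -> float:
    """Fraction of the original's character length taken up by the summary."""
    original_len = len(original.strip())
    if original_len == 0:
        raise ValueError("original text is empty")
    return len(summary.strip()) / original_len
```

For this notebook it could be called as `compression_ratio(text, result[0]['summary_text'])`.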


# Serialize and Save Model Information from Hugging Face Hub


This code demonstrates how to retrieve, serialize, and save detailed model information from the Hugging Face Hub:

1. **Serialization Function**: The `serialize_object()` function handles complex objects recursively:
   - Converts datetime objects to ISO format strings
   - Transforms objects with `__dict__` attributes into dictionaries
   - Processes nested lists and dictionaries
   - Preserves primitive data types

2. **API Interaction**: Creates an instance of the Hugging Face API client

3. **Model Information**: Fetches comprehensive metadata about the specified model using `api.model_info()`

4. **File Operations**: 
   - Extracts the model name from the reference path
   - Creates a JSON file named after the model
   - Serializes the model information and writes it to the file

This allows for local storage of model metadata for later reference or analysis, particularly useful for model governance, versioning, and documentation purposes.


In [None]:
def serialize_object(obj):
    """
    Helper function to serialize custom objects like EvalResult.
    Converts objects with __dict__ attribute to dictionaries and handles datetime objects.
    """
    if isinstance(obj, datetime):
        return obj.isoformat()  # Convert datetime to ISO 8601 string
    elif hasattr(obj, "__dict__"):
        return {key: serialize_object(value) for key, value in obj.__dict__.items()}
    elif isinstance(obj, list):
        return [serialize_object(item) for item in obj]
    elif isinstance(obj, dict):
        return {key: serialize_object(value) for key, value in obj.items()}
    else:
        return obj  # Return the value as-is for primitive types

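api = HfApi()
model_info = api.model_info(hf_reference)

# Create the output directory first; open() fails if "models/" does not exist
os.makedirs("models", exist_ok=True)
with open(f"models/{hf_reference.split('/')[1]}.json", "w") as json_file:
    json_file.write(json.dumps(serialize_object(model_info)))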
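
The same recursion can also be delegated to `json.dumps` through its `default` hook, which is called only for values the encoder cannot handle on its own. The sketch below uses a stand-in class, `DemoInfo`, instead of a real Hub response, so whether every field of the actual model metadata serializes this way is an assumption. The names `json_default` and `DemoInfo` are hypothetical.

```python
import json
from datetime import date, datetime


def json_default(obj):
    """Fallback for json.dumps: called only for objects it cannot encode."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if hasattr(obj, "__dict__"):
        return vars(obj)  # json.dumps recurses into the returned dict
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


class DemoInfo:
    """Stand-in for a metadata object; not a huggingface_hub class."""

    def __init__(self):
        self.model_id = "demo/model"
        self.created_at = datetime(2024, 1, 2, 3, 4, 5)
        self.tags = ["summarization"]


encoded = json.dumps(DemoInfo(), default=json_default, indent=2)
```

Raising `TypeError` for unknown types makes unsupported values fail loudly instead of being written out unchanged.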
