Introduction to Hugging Face API

What is Hugging Face?


A platform for pre-trained AI models (NLP, Vision, Audio).
Provides a hosted inference API, so you can run models without local GPUs.
Model Hub: https://huggingface.co/models


Why Use Hugging Face API?

✅ Access to thousands of models (GPT-2, BERT, T5, Whisper, etc.)

✅ No need for model training or GPUs

✅ Simple REST API for quick prototyping

## Setting Up Hugging Face API Key

 - Steps to get an API key (access token)
    - Go to Hugging Face → Settings → Access Tokens
    - Click New Token and set its permissions to "Read"
    - Copy the token
    - Save it in a .env file as HUGGINGFACE_API_KEY=<your token>


In [1]:
!pip install --upgrade transformers torch requests python-dotenv



In [None]:
import os
import requests
from transformers import pipeline
from dotenv import load_dotenv

# Read variables from a local .env file into the process environment
load_dotenv()

API_KEY = os.getenv("HUGGINGFACE_API_KEY")

if not API_KEY:
    raise ValueError("API key not found! Make sure you have a .env file with HUGGINGFACE_API_KEY.")
print("API key loaded successfully!")

# Authorization header sent with every Inference API request
headers = {"Authorization": f"Bearer {API_KEY}"}



## 1. Summarization

### Type 1: Loading the model directly into the runtime

In [8]:
input_text = (
    "Artificial intelligence (AI) refers to the simulation of human intelligence in machines "
    "that are designed to think and learn."
)

ARTICLE = """Generative AI (GenAI) is a transformative branch of artificial intelligence that creates new content—including text, images, code, and audio—by learning patterns from existing data. Utilizing technologies like Large Language Models (LLMs) and diffusion models, GenAI enables natural language interactions, automating complex tasks and enhancing creativity across industries. 
Key Aspects of Generative AI:
How it Works: Generative AI models are trained on massive datasets to learn underlying structures and predict new data instances.
Key Applications:
Content Generation: Drafting articles, marketing copy, and creative writing.
Design & Art: Creating images, videos, and 3D designs, such as using Stable Diffusion.
Software Development: Coding, debugging, and software development, reducing manual effort.
Personalization: Delivering customized, AI-driven experiences in e-commerce and customer service.
Benefits & Impact: GenAI enhances efficiency, drives innovation, and acts as a collaborative tool for professional tasks.
Risks & Challenges: Key concerns include "hallucinations" (generating inaccurate info), data privacy, inherent biases, and potential copyright issues. 
As GenAI matures, it is moving from a novelty tool to a core component in enterprise workflows, fundamentally altering how humans interact with technology. """


In [4]:
# Model IDs on the Hugging Face Hub used for summarization
SUMMARIZATION_MODEL_1 = "facebook/bart-large-cnn"
SUMMARIZATION_MODEL_2 = "sshleifer/distilbart-cnn-12-6"

In [10]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def summarize_text(text, model_name):
    # BART is an encoder-decoder model, so load it as seq2seq rather than through a "text-generation" pipeline
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    summary_ids = model.generate(**inputs, max_length=20, min_length=5)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In [6]:
summary = summarize_text(input_text, SUMMARIZATION_MODEL_1)
print("Summary:", summary)


In [None]:
summary = summarize_text(input_text, "philschmid/bart-large-cnn-samsum")
print("Summary:", summary)

### Type 2: Inference via the hosted API

In [None]:
input_text = (
    "Artificial intelligence (AI) refers to the simulation of human intelligence in machines "
    "that are designed to think and learn."
)

# Model IDs on the Hugging Face Hub used for summarization
SUMMARIZATION_MODEL_1 = "facebook/bart-large-cnn"
SUMMARIZATION_MODEL_2 = "sshleifer/distilbart-cnn-12-6"

In [None]:
def summarize_text_1(text):
    api_url = f"https://router.huggingface.co/hf-inference/models/{SUMMARIZATION_MODEL_1}"
    payload = {"inputs": text}
    response = requests.post(api_url, headers=headers, json=payload, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors (bad token, model unavailable, ...)
    # A successful response is a list with one dict holding the summary
    return response.json()[0]["summary_text"]

summary = summarize_text_1(input_text)
print("Summary:", summary)

Summary: Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are designed to think and learn. AI is a form of computer science that aims to simulate human intelligence. It is used to create machines that can think, learn and act on their own.
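The call above assumes a successful response shaped like `[{"summary_text": ...}]`. A slightly more defensive version could look like the sketch below. The helper names `build_summarization_payload` and `extract_summary` are hypothetical, and both the `parameters` keys (`max_length`, `min_length`) and the `{"error": ...}` failure shape are assumptions that should be checked against the current Inference API reference.

```python
def build_summarization_payload(text, max_length=None, min_length=None):
    """Build the JSON body for a summarization request (hypothetical helper)."""
    payload = {"inputs": text}
    # Optional generation settings go under "parameters" (assumed key names).
    parameters = {}
    if max_length is not None:
        parameters["max_length"] = max_length
    if min_length is not None:
        parameters["min_length"] = min_length
    if parameters:
        payload["parameters"] = parameters
    return payload


def extract_summary(data):
    """Pull the summary string out of a decoded JSON response."""
    # Assumed failure shape: a dict carrying an "error" message.
    if isinstance(data, dict) and "error" in data:
        raise RuntimeError(f"Inference API error: {data['error']}")
    # Success shape used in this notebook: a list with one dict.
    if isinstance(data, list) and data and "summary_text" in data[0]:
        return data[0]["summary_text"]
    raise ValueError(f"Unexpected response shape: {data!r}")


print(build_summarization_payload("Some long text.", max_length=60, min_length=10))
print(extract_summary([{"summary_text": "A short summary."}]))
```

In `summarize_text_1`, the payload helper would replace the hand-built `payload` dict and `extract_summary(response.json())` would replace the direct `[0]['summary_text']` lookup.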


## 2. Named Entity Recognition (NER)


In [7]:
# Loading model
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Elon Musk is the CEO of Tesla and SpaceX, based in the USA."

ner_results = nlp(example)
print(ner_results)


[{'entity': 'B-PER', 'score': np.float32(0.9895768), 'index': 1, 'word': 'El', 'start': 0, 'end': 2}, {'entity': 'B-PER', 'score': np.float32(0.8199849), 'index': 2, 'word': '##on', 'start': 2, 'end': 4}, {'entity': 'I-PER', 'score': np.float32(0.9979075), 'index': 3, 'word': 'Mu', 'start': 5, 'end': 7}, {'entity': 'I-PER', 'score': np.float32(0.96267045), 'index': 4, 'word': '##sk', 'start': 7, 'end': 9}, {'entity': 'B-ORG', 'score': np.float32(0.9987417), 'index': 9, 'word': 'Te', 'start': 24, 'end': 26}, {'entity': 'I-ORG', 'score': np.float32(0.9958287), 'index': 10, 'word': '##sla', 'start': 26, 'end': 29}, {'entity': 'I-ORG', 'score': np.float32(0.7535397), 'index': 11, 'word': 'and', 'start': 30, 'end': 33}, {'entity': 'B-ORG', 'score': np.float32(0.9826819), 'index': 12, 'word': 'Space', 'start': 34, 'end': 39}, {'entity': 'I-ORG', 'score': np.float32(0.99901974), 'index': 13, 'word': '##X', 'start': 39, 'end': 40}, {'entity': 'B-LOC', 'score': np.float32(0.9995152), 'index': 1
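The raw output above is per WordPiece token ("El", "##on", "Mu", "##sk"), which is hard to read. The token-classification pipeline can merge pieces itself when created with `aggregation_strategy="simple"`. The sketch below does the merge by hand to show the idea. `merge_wordpieces` is a hypothetical helper, its merge rule is a simplification of what the pipeline does, and the token list is copied from the output above with scores omitted.

```python
def merge_wordpieces(text, tokens):
    """Merge token-level NER output into whole entities using character offsets."""
    entities = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]      # "B-PER" -> "PER"
        is_subword = tok["word"].startswith("##")    # WordPiece continuation
        continues = tok["entity"].startswith("I-")   # inside-entity tag
        same_type = bool(entities) and entities[-1]["entity_group"] == label
        if entities and (is_subword or (continues and same_type)):
            # Extend the previous entity to cover this token.
            entities[-1]["end"] = tok["end"]
        else:
            entities.append({"entity_group": label, "start": tok["start"], "end": tok["end"]})
    # Recover the surface text from the original string.
    for ent in entities:
        ent["word"] = text[ent["start"]:ent["end"]]
    return entities


example = "Elon Musk is the CEO of Tesla and SpaceX, based in the USA."
tokens = [
    {"entity": "B-PER", "word": "El", "start": 0, "end": 2},
    {"entity": "B-PER", "word": "##on", "start": 2, "end": 4},
    {"entity": "I-PER", "word": "Mu", "start": 5, "end": 7},
    {"entity": "I-PER", "word": "##sk", "start": 7, "end": 9},
    {"entity": "B-ORG", "word": "Te", "start": 24, "end": 26},
    {"entity": "I-ORG", "word": "##sla", "start": 26, "end": 29},
    {"entity": "I-ORG", "word": "and", "start": 30, "end": 33},
    {"entity": "B-ORG", "word": "Space", "start": 34, "end": 39},
    {"entity": "I-ORG", "word": "##X", "start": 39, "end": 40},
]
for ent in merge_wordpieces(example, tokens):
    print(ent["entity_group"], "->", ent["word"])
# PER -> Elon Musk
# ORG -> Tesla and
# ORG -> SpaceX
```

The "Tesla and" entity is not a bug in the merge: the model tagged "and" as `I-ORG` (with a comparatively low score of about 0.75), so filtering tokens by score before merging would be a reasonable refinement.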

In [11]:
# Calling the model through the Inference API
def named_entity_recognition(text):
    API_URL = "https://router.huggingface.co/hf-inference/models/dbmdz/bert-large-cased-finetuned-conll03-english"
    payload = {"inputs": text}
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    # The API returns a list of grouped entities; only the first one is returned here
    return response.json()[0]

print("Named Entities:", named_entity_recognition("Elon Musk is the CEO of Tesla and SpaceX, based in the USA."))

Named Entities: {'entity_group': 'PER', 'score': 0.999356, 'word': 'Elon Musk', 'start': 0, 'end': 9}
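The function above returns only the first entity from the response. To work with the full list, the entities can be grouped by type. A minimal sketch follows, assuming every item has the `entity_group`, `score`, and `word` keys seen in the output above. `group_by_entity_type` is a hypothetical helper, and every sample item except the first is made up for illustration.

```python
from collections import defaultdict


def group_by_entity_type(entities, min_score=0.0):
    """Group entity words by their type, dropping low-confidence items."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


sample = [
    # Taken from the output above.
    {"entity_group": "PER", "score": 0.999356, "word": "Elon Musk", "start": 0, "end": 9},
    # Hypothetical items, for illustration only.
    {"entity_group": "ORG", "score": 0.99, "word": "Tesla", "start": 24, "end": 29},
    {"entity_group": "ORG", "score": 0.98, "word": "SpaceX", "start": 34, "end": 40},
    {"entity_group": "LOC", "score": 0.40, "word": "USA", "start": 55, "end": 58},
]
print(group_by_entity_type(sample, min_score=0.5))
# {'PER': ['Elon Musk'], 'ORG': ['Tesla', 'SpaceX']}
```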


## 3. Question Answering (DistilBERT and RoBERTa)


### Loading model

In [15]:
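from transformers import DistilBertTokenizer, DistilBertModel
import torch

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
# DistilBertModel is the bare encoder: the checkpoint's question-answering head (qa_outputs)
# is not loaded, so the forward pass returns hidden states rather than an answer
model = DistilBertModel.from_pretrained('distilbert-base-cased-distilled-squad')

question, text = "Who is Elon Musk?", "ELON MUSK IS THE CEO OF TESLA"

# Encode the question and the context together as one sequence pair
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)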



In [16]:
print(outputs)

BaseModelOutput(last_hidden_state=tensor([[[ 1.1711, -0.5700,  1.2161,  ..., -0.6587, -0.1305, -0.5470],
         [ 1.3225, -0.6659,  1.7093,  ..., -0.7418, -0.3244, -0.6318],
         [ 1.8805, -0.5053,  1.4130,  ..., -0.5677,  0.1197, -0.3915],
         ...,
         [ 1.3974,  0.3189,  1.4370,  ..., -0.3530,  0.0278,  0.2146],
         [ 1.0326, -0.4906,  1.1371,  ..., -0.3295, -0.8474, -0.2820],
         [ 0.9740, -0.8145,  1.0698,  ..., -1.0709,  0.8596, -0.2683]]]), hidden_states=None, attentions=None)


In [17]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
qa_result = qa_pipeline(question=question, context=text)
print("Answer:", qa_result['answer'])
print("Score:", qa_result['score'])



Answer: CEO
Score: 0.895262598991394


### Inference via the API

In [12]:
def question_answering(question, context):
    API_URL = "https://router.huggingface.co/hf-inference/models/deepset/roberta-base-squad2"
    # Question-answering models take the question and its context as a pair
    payload = {"inputs": {"question": question, "context": context}}
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

context_text = "Hugging Face is a company that specializes in natural language processing and AI models. It provides a platform for model training and deployment."
question_text = "What does Hugging Face specialize in?"
print("Answer:", question_answering(question_text, context_text))

Answer: {'score': 0.9522801041603088, 'start': 46, 'end': 87, 'answer': 'natural language processing and AI models'}
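The `start` and `end` values in the response are character offsets into the context, so the answer can be recovered, or highlighted, by slicing. A short sketch using the exact response shown above follows; `answer_from_offsets` and `highlight_answer` are hypothetical helper names.

```python
context_text = (
    "Hugging Face is a company that specializes in natural language processing and AI models. "
    "It provides a platform for model training and deployment."
)
# Response copied from the output above.
result = {
    "score": 0.9522801041603088,
    "start": 46,
    "end": 87,
    "answer": "natural language processing and AI models",
}


def answer_from_offsets(context, result):
    """Slice the answer out of the context using the returned offsets."""
    return context[result["start"]:result["end"]]


def highlight_answer(context, result, marker="**"):
    """Wrap the answer span in markers, leaving the rest of the context intact."""
    s, e = result["start"], result["end"]
    return context[:s] + marker + context[s:e] + marker + context[e:]


print(answer_from_offsets(context_text, result) == result["answer"])
# True
print(highlight_answer(context_text, result))
```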


## 4. Experimenting with Different Models

How to Find a Model on the Hugging Face Hub


Go to the Hugging Face Model Hub: https://huggingface.co/models
Search for the model you want (e.g., GPT-Neo, Whisper, T5).
Click on the model name to open its page.
Copy the Model ID (usually in the format organization/model-name).


Construct the API URL
The general format for Hugging Face Inference API calls is:

https://router.huggingface.co/hf-inference/models/{MODEL_ID}
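The URL pattern used by the requests in this notebook can be wrapped in a small helper, so that switching models only means changing the Model ID. This is a minimal sketch: `build_inference_url` is a hypothetical helper, and the base URL is the one the other cells in this notebook call.

```python
from urllib.parse import quote

# Base URL used by the API calls in this notebook.
INFERENCE_BASE_URL = "https://router.huggingface.co/hf-inference/models"


def build_inference_url(model_id):
    """Return the inference URL for a Model ID such as 'facebook/bart-large-cnn'."""
    model_id = model_id.strip().strip("/")
    # A Model ID is either 'model-name' or 'organization/model-name'.
    if not model_id or model_id.count("/") > 1:
        raise ValueError(f"Not a valid Model ID: {model_id!r}")
    # Percent-encode unusual characters but keep the organization separator.
    return f"{INFERENCE_BASE_URL}/{quote(model_id, safe='/')}"


print(build_inference_url("facebook/bart-large-cnn"))
# https://router.huggingface.co/hf-inference/models/facebook/bart-large-cnn
```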


In [2]:
import os
import requests
from transformers import pipeline
from dotenv import load_dotenv
load_dotenv()

API_KEY = os.getenv("HUGGINGFACE_API_KEY")

if not API_KEY:
    raise ValueError("API key not found! Make sure you have a .env file with HUGGINGFACE_API_KEY.")
else:
    print("API key loaded successfully!")
headers = {"Authorization": f"Bearer {API_KEY}"}


API key loaded successfully!


In [None]:
API_URL = "https://router.huggingface.co/hf-inference/models/facebook/bart-large-cnn"

payload = {"inputs": "AI is revolutionizing industries by automating tasks and improving efficiency. It can also be used to create new products and services to improve quality of life and reduce costs. For more information on how to use AI in your business, visit CNN.com/ArtificialIntelligence."}

response = requests.post(API_URL, headers=headers, json=payload)
print("Summarized Text:", response.json()[0]["summary_text"])


Summarized Text: AI is revolutionizing industries by automating tasks and improving efficiency. It can also be used to create new products and services to improve quality of life and reduce costs. For more information on how to use AI in your business, visit CNN.com/ArtificialIntelligence.
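Hosted endpoints can fail temporarily, for example while a model is loading or when a rate limit is hit. A retry with exponential backoff is a common way to handle this. The sketch below assumes such conditions are reported with HTTP status 503 or 429, which should be verified for the endpoint in use. `call_with_retry` is a hypothetical helper that takes any zero-argument function returning `(status_code, body)`, so it can be tried here without network access.

```python
import time


def call_with_retry(send, retries=3, base_delay=1.0, retry_statuses=(429, 503), sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out."""
    for attempt in range(retries + 1):
        status, body = send()
        if status not in retry_statuses or attempt == retries:
            return status, body
        # Wait 1x, 2x, 4x, ... the base delay before trying again.
        sleep(base_delay * 2 ** attempt)


# Offline demonstration with canned responses instead of a real request.
fake_responses = iter([
    (503, {"error": "model is loading"}),   # hypothetical temporary failure
    (200, [{"summary_text": "ok"}]),
])
delays = []
status, body = call_with_retry(lambda: next(fake_responses), sleep=delays.append)
print(status, body, delays)
# 200 [{'summary_text': 'ok'}] [1.0]
```

With `requests`, `send` could be a small function that posts to `API_URL` with `headers` and `payload` and returns `(response.status_code, response.json())`.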


In [4]:
import requests

# The endpoint is paginated, so this counts the models returned in one page, not the whole Hub
response = requests.get("https://huggingface.co/api/models?full=true")
models = response.json()
print("Total Models Available:", len(models))

# Print the first 10 model IDs
for model in models[:10]:
    print(model["id"])


Total Models Available: 1000
zai-org/GLM-5
MiniMaxAI/MiniMax-M2.5
openbmb/MiniCPM-SALA
moonshotai/Kimi-K2.5
Qwen/Qwen3-Coder-Next
Nanbeige/Nanbeige4.1-3B
zai-org/GLM-OCR
openbmb/MiniCPM-o-4_5
inclusionAI/Ming-flash-omni-2.0
mistralai/Voxtral-Mini-4B-Realtime-2602
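The listing endpoint also accepts query parameters, so a request can ask for a narrower, sorted page instead of the default one. The sketch below assumes the `search`, `filter`, `sort`, `direction`, and `limit` parameters behave as described in the Hub API documentation. `build_model_search_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode

HUB_MODELS_API = "https://huggingface.co/api/models"


def build_model_search_url(search=None, task=None, sort="downloads", limit=5):
    """Build a Hub listing URL filtered by free-text search and/or task tag."""
    # direction=-1 requests descending order (assumed parameter semantics).
    params = {"sort": sort, "direction": -1, "limit": limit}
    if search:
        params["search"] = search
    if task:
        params["filter"] = task
    return f"{HUB_MODELS_API}?{urlencode(params)}"


print(build_model_search_url(task="summarization"))
# https://huggingface.co/api/models?sort=downloads&direction=-1&limit=5&filter=summarization
```

The resulting URL can be passed to `requests.get(...)` exactly like the unfiltered URL above.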
