# Embeddings for Classification Tasks

## Assigning Labels to Items

* Categorization
    * Example: Sorting headlines into topics

* Sentiment Analysis
    * Example: Classifying reviews as positive or negative


> **Embeddings** capture *semantic* meaning

## Classification with Embeddings

* Zero-shot classification:
    * Assigns labels without using any labeled training data


Process:

1. Embed the class descriptions
    - e.g. Tech, Science, Sport, Business
2. Embed the item to classify
    - e.g. the article
3. Compute the cosine distance between the item embedding and each class embedding
4. Assign the label of the closest (most similar) class
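
Steps 3 and 4 can be written compactly. The notation here is introduced only for illustration: $\mathbf{e}_x$ is assumed to be the embedding of the item and $\mathbf{e}_c$ the embedding of the description of class $c$.

$$
d(\mathbf{e}_x, \mathbf{e}_c) = 1 - \frac{\mathbf{e}_x \cdot \mathbf{e}_c}{\lVert \mathbf{e}_x \rVert \, \lVert \mathbf{e}_c \rVert},
\qquad
\hat{c} = \arg\min_{c} \; d(\mathbf{e}_x, \mathbf{e}_c)
$$

A distance of 0 means the two vectors point in the same direction, so the predicted label $\hat{c}$ is the class whose description embedding is best aligned with the item embedding.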

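```python
from utils import *

# Candidate topics for zero-shot classification
topics = [
    {'label':'Tech'},
    {'label':'Science'},
    {'label':'Sport'},
    {'label':'Business'},
]

# Use the bare label as each class description, then embed all of them
class_descriptions = [topic['label'] for topic in topics]
class_embeddings = create_embeddings(class_descriptions)
```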

```python
article = {
    "headline": "How NVIDIA GPUs could decide who wins the AI race",
    "keywords": ["ai","business","computers"]
}

# Combine the headline and keywords into one string, then embed it
article_text = create_article_text_without_topic(article)
article_embeddings = create_embeddings(article_text)[0]
print(article_text)
```

    Headline: How NVIDIA GPUs could decide who wins the AI race
    keywords: ai, business, computers
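
The helper `create_article_text_without_topic` comes from `utils` and its source is not shown here. As a rough sketch only, a function that reproduces the printed output above might look like the following. The name `sketch_article_text` is hypothetical, and the real helper may differ.

```python
def sketch_article_text(article):
    """Hypothetical stand-in: join the headline and keywords into one string.

    The layout mirrors the printed output shown above; the actual
    helper in utils may be implemented differently.
    """
    keywords = ", ".join(article["keywords"])
    return f"Headline: {article['headline']}\nkeywords: {keywords}"


example_article = {
    "headline": "How NVIDIA GPUs could decide who wins the AI race",
    "keywords": ["ai", "business", "computers"],
}

example_text = sketch_article_text(example_article)
print(example_text)
```

Leaving the topic out of the text matters for this task: the topic is the label being predicted, so it should not appear in the input that gets embedded.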
    


```python
# Find the class embedding nearest to the article embedding, then look up its label
closest = find_closest(article_embeddings, class_embeddings)
label = topics[closest['index']]['label']
print(label)
```

    Tech
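
`find_closest` is also imported from `utils`, and only its return shape (a dictionary with an `'index'` key) is visible in the code above. A minimal sketch of how such a function could work, assuming it ranks candidates by cosine distance, is shown below. The names `cosine_distance` and `find_closest_sketch` are hypothetical, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_closest_sketch(query_vector, embeddings):
    """Hypothetical stand-in: return index and distance of the nearest embedding."""
    distances = [
        {"index": i, "distance": cosine_distance(query_vector, emb)}
        for i, emb in enumerate(embeddings)
    ]
    return min(distances, key=lambda d: d["distance"])


# Toy vectors standing in for real class and article embeddings (assumption)
toy_class_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_article_embedding = [0.9, 0.1, 0.2]

toy_closest = find_closest_sketch(toy_article_embedding, toy_class_embeddings)
print(toy_closest["index"])  # prints 0, the first toy class is nearest
```

Returning the index rather than the label keeps the function independent of how the classes are stored, which is why the calling code looks the label up in `topics` afterwards.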


### Limitation:

* The class descriptions were bare one-word labels (e.g. `Tech`), which lack sufficient detail


## More Detailed Descriptions

```python
# Give each topic a short natural-language description alongside its label
topics = [
    {'label':'Tech', "description":"A news article about technology"},
    {'label':'Science', "description":"A news article about science"},
    {'label':'Sport', "description":"A news article about sports"},
    {'label':'Business', "description":"A news article about business"},
]
```

```python
# Embed the descriptions instead of the bare labels
class_descriptions = [topic['description'] for topic in topics]
class_embeddings = create_embeddings(class_descriptions)
```

```python
# The article embedding from the earlier cell is reused unchanged

closest = find_closest(article_embeddings, class_embeddings)
label = topics[closest['index']]['label']
print(label)
```

    Tech
