# Semantic Classification

Semantic classification is a machine learning technique that categorizes text based on its meaning using vector embeddings and similarity matching. This approach offers a compelling alternative to using large language models (LLMs) for classification tasks.

## Semantic Classification vs. LLM Classification

**LLM-based classification** involves sending text to models like GPT-4 or Claude with prompts asking them to categorize content. While powerful, this approach has several limitations:
- **Cost**: API calls for every classification can be expensive at scale
- **Latency**: Network requests and model inference add delay

**Semantic classification** uses vector embeddings to represent text meaning numerically, then applies similarity thresholds to determine categories:
- **Speed**: A vector similarity lookup is typically much faster than a round trip to an LLM
- **Cost-effective**: No per-request API fees when embeddings are computed with a local model

## How It Works

Text is converted into high-dimensional vectors that capture semantic meaning. Reference examples for each category are embedded, and new content is classified by measuring vector similarity. Content that falls within a defined distance threshold of category references gets classified accordingly.
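
The idea can be sketched in plain Kotlin with toy three-dimensional vectors standing in for real embeddings. This is a minimal illustration, not RedisVL's implementation: it assumes cosine distance defined as 1 minus cosine similarity, compares the query against its nearest reference, and treats a distance equal to the threshold as a match. The names `cosineDistance` and `classify` are made up for this sketch.

```kotlin
import kotlin.math.sqrt

// Cosine distance = 1 - cosine similarity.
// 0 means the vectors point the same way, 1 means orthogonal, 2 means opposite.
fun cosineDistance(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return 1.0 - dot / (sqrt(normA) * sqrt(normB))
}

// Returns the distance to the nearest reference when it is within the threshold,
// or null when no reference is close enough (assumption: "within" includes equality).
fun classify(query: DoubleArray, references: List<DoubleArray>, threshold: Double): Double? {
    val nearest = references.minOfOrNull { cosineDistance(query, it) } ?: return null
    return if (nearest <= threshold) nearest else null
}

// Toy "embeddings": the first axis loosely stands for AI-related meaning.
val toyReferences = listOf(
    doubleArrayOf(0.9, 0.1, 0.0),
    doubleArrayOf(0.8, 0.2, 0.1)
)

// Close to the references, so a distance is returned.
val relatedResult = classify(doubleArrayOf(0.85, 0.15, 0.05), toyReferences, 0.7)

// Far from every reference, so the result is null.
val unrelatedResult = classify(doubleArrayOf(0.0, 0.1, 0.9), toyReferences, 0.7)
```

With this definition of distance, a threshold of 0.7 accepts any text whose cosine similarity to a reference is at least 0.3.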

In this notebook, we'll build a semantic classifier using RedisVL to identify AI-related content, demonstrating an efficient alternative to LLM-based classification.

## Installing dependencies

We'll need one main library for this semantic classification example:

**RedisVL** - A library that provides vector database capabilities and semantic routing functionality on top of Redis. RedisVL handles the vector storage, similarity search, and routing logic that powers our semantic classifier.

In [1]:
@file:DependsOn("com.redis:redisvl:0.0.1")

## Setting up the Text Vectorizer

The vectorizer converts text into numerical vectors that capture semantic meaning. RedisVL provides several vectorizer options, such as OpenAI and VertexAI. For this example, we use the SentenceTransformersVectorizer with the Xenova/all-MiniLM-L6-v2 model hosted on Hugging Face.

In [2]:
import com.redis.vl.utils.vectorize.SentenceTransformersVectorizer

val vectorizer = SentenceTransformersVectorizer("Xenova/all-MiniLM-L6-v2")

val embedding = vectorizer.embed("What is the capital city of Italy?")

println(embedding.joinToString())

-0.009056281, 0.09096523, -0.051762886, 0.08848378, -0.12719342, -0.0703391, 0.029510844, 0.013291523, ...

## Loading references

In this notebook, we want to classify posts that are related to artificial intelligence. To do so, we will vectorize a couple of hundred reference examples that were generated for us in advance.

In [4]:
import java.io.File

val artificialIntelligenceReferences = File("../data/1_references.txt")
    .readLines()
    .map { it.trim() }
    .filter { it.isNotEmpty() } // skip blank lines so they are not embedded as references

println(artificialIntelligenceReferences.take(10).joinToString("\n\n"))

Just realized that attention mechanisms in transformers are basically learning to focus like humans do when reading

My grandmother can now video call her grandkids thanks to real-time translation AI and honestly it makes me emotional

The fact that GPT can write code but still struggles with basic arithmetic tells you everything about how these models work

Spent three hours debugging my PyTorch model only to realize I forgot to set it to training mode

Is it just me or does every AI ethics paper end with "more research is needed" without proposing actual solutions

OpenAI's latest model can generate images from text but my autocorrect still thinks "definately" is a word

The attention weights in my transformer model look like abstract art and I'm not sure if that's good or bad

Teaching my kids about AI feels like preparing them for a world I can't even imagine

Diffusion models are basically learning to remove noise step by step which is oddly therapeutic to think about

Why do we c

## Creating a route

Now, let's create a route. A route is a classification category that defines what types of content should be grouped together based on semantic similarity. Each route contains:

- **Reference examples**: Sample text that represents the category you want to classify
- **Distance threshold**: The maximum vector distance allowed between new text and the references for the text to match the route (lower values are stricter)
- **Route name**: An identifier for this classification category

In [5]:
import com.redis.vl.extensions.router.Route
import com.redis.vl.extensions.router.SemanticRouter

val artificialIntelligenceRoute = Route.builder()
    .name("artificial_intelligence_references")
    .references(artificialIntelligenceReferences)
    .distanceThreshold(0.7)
    .build()

## Creating the Semantic Router

The SemanticRouter is the central component that orchestrates the classification process. It combines your routes, vectorizer, and Redis storage to provide fast semantic classification capabilities.


In [7]:
import redis.clients.jedis.UnifiedJedis

val jedis = UnifiedJedis()

val router = SemanticRouter.builder()
    .name("ai-router")
    .routes(listOf(artificialIntelligenceRoute))
    .vectorizer(vectorizer)
    .jedis(jedis)
    .build()
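
To make the routing step more concrete, here is an in-memory sketch of what a router does conceptually when several routes are defined. It is not RedisVL code: `ToyRoute`, `ToyMatch`, and `routeQuery` are hypothetical names, the vectors are hand-written stand-ins for embeddings, and each route is scored by its nearest reference, which is a simplifying assumption (a real router may aggregate reference distances differently).

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-ins for illustration only; these are not RedisVL classes.
data class ToyRoute(
    val name: String,
    val references: List<DoubleArray>,
    val distanceThreshold: Double
)

data class ToyMatch(val name: String?, val distance: Double?)

// Cosine distance = 1 - cosine similarity.
fun cosineDistance(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return 1.0 - dot / (sqrt(normA) * sqrt(normB))
}

// Scores every route by its nearest reference, discards routes whose distance
// exceeds their own threshold, and returns the closest remaining route.
fun routeQuery(query: DoubleArray, routes: List<ToyRoute>): ToyMatch {
    val candidates = routes.mapNotNull { route ->
        val nearest = route.references.minOfOrNull { cosineDistance(query, it) }
        if (nearest != null && nearest <= route.distanceThreshold) route.name to nearest else null
    }
    val best = candidates.minByOrNull { it.second } ?: return ToyMatch(null, null)
    return ToyMatch(best.first, best.second)
}

val toyRoutes = listOf(
    ToyRoute("ai", listOf(doubleArrayOf(0.9, 0.1, 0.0), doubleArrayOf(0.8, 0.2, 0.1)), 0.5),
    ToyRoute("travel", listOf(doubleArrayOf(0.0, 0.2, 0.9)), 0.5)
)

// Lands near the "ai" references.
val aiMatch = routeQuery(doubleArrayOf(0.85, 0.15, 0.05), toyRoutes)

// Lands near the "travel" reference.
val travelMatch = routeQuery(doubleArrayOf(0.1, 0.1, 0.9), toyRoutes)

// Too far from every reference, so both fields are null.
val noMatch = routeQuery(doubleArrayOf(0.0, 1.0, 0.0), toyRoutes)
```

In this sketch, returning an empty match when nothing is within the threshold is what lets the classifier reject off-topic text instead of forcing it into the closest category.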

## Testing our semantic classification solution

In [8]:
val userQuery = "Redis is a great tool for building applied AI systems because it works well as agent memory"

val match = router.route(userQuery)

println(match)

RouteMatch(name=artificial_intelligence_references, distance=0.589918046313)

The query matches the route because its distance of about 0.59 is below the 0.7 threshold we configured.


In [9]:
val userQuery = "Salerno is a nice place to visit"

val match = router.route(userQuery)

println(match)

RouteMatch(name=null, distance=null)

This query is not within the distance threshold of the AI references, so the router returns an empty match.
