# Enriching Filtered Events

In this notebook, we'll enrich the filtered events from the previous notebook with additional information. We'll do this in three steps:

1. Topic modeling using a Large Language Model (LLM) to extract topics from the posts
2. Top-K counting in Redis to track the most frequent topics per hour
3. A Redis Search index to make the enriched posts searchable


## Topic Modeling with Large Language Models
Topic modeling is a technique used to discover abstract topics in a collection of documents. In this notebook, we'll use a Large Language Model to extract topics from posts. This will allow us to categorize posts and make them more searchable.

### Setting Up the Ollama API Client
We'll use the Spring AI Ollama client to interact with the Ollama API.

Ollama is a tool that allows us to run large language models locally.

In [2]:
%use coroutines
@file:DependsOn("org.springframework.ai:spring-ai-ollama:1.0.0-RC1")

The prompt we'll use for the LLM is designed to extract software-related topics from posts. The prompt includes examples of how to format the output and what types of topics to include.

In [31]:
import java.io.File

val topicModelingSystemPrompt = File("resources/topic-extractor-prompt.txt").readText()

Create the Ollama Chat Model

In [3]:
import org.springframework.ai.ollama.OllamaChatModel
import org.springframework.ai.ollama.api.OllamaApi
import org.springframework.ai.ollama.api.OllamaApi.ChatRequest
import org.springframework.ai.ollama.api.OllamaApi.Message
import org.springframework.ai.ollama.api.OllamaApi.Message.Role
import org.springframework.ai.ollama.api.OllamaOptions

val ollamaApi = OllamaApi.builder()
    .baseUrl("http://localhost:11434")
    .build()

val ollamaOptions = OllamaOptions.builder().model("deepseek-coder-v2").build()

val ollamaChatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(ollamaOptions)
    .build()

### Creating a Topic Modeling Function
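This function takes a post and a comma-separated list of already known topics, and uses the Ollama chat model to extract topics from the post. It returns the model's reply as a string of comma-separated topics, or an empty string when the model returns no text.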

In [32]:
import org.springframework.ai.chat.messages.SystemMessage
import org.springframework.ai.chat.messages.UserMessage
import org.springframework.ai.chat.prompt.Prompt
import dev.raphaeldelio.*

fun extractTopics(post: String, existingTopics: String): String {
    val messages = listOf(
        SystemMessage(topicModelingSystemPrompt),
        UserMessage("Existing topics: $existingTopics"),
        UserMessage("Post: $post")
    )

    val response = ollamaChatModel.call(Prompt(messages))
    return response.result.output.text ?: ""
}

In [5]:
extractTopics("Kotlin is a great programming language for beginners who wants to build Agentic AI apps", "")

 "Kotlin, Programming Languages, AI Applications"

In [8]:
extractTopics("Brazilian samba is a great music genre for dancing", "")

 ""

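### Counting How Many Times a Topic Appears

Top-K is a probabilistic data structure that tracks the k most frequent items in a stream. It keeps approximate counts in a fixed amount of memory, so we can find the most popular topics in a large dataset without storing every item explicitly. Below, we reserve one Top-K per hour that tracks the 15 most frequent topics; the remaining arguments to `topkReserve` are the width (3000), depth (10), and decay (0.9) of the underlying sketch.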
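For intuition, here is a minimal sketch of the exact computation that Top-K approximates. It keeps one counter per distinct topic, which is precisely the memory cost the probabilistic structure avoids. The function name `exactTopK` and the sample data are illustrative assumptions, not part of the notebook's pipeline.

```kotlin
// Exact top-k counting: one counter per distinct topic.
// Redis Top-K approximates this ranking in a fixed amount of memory.
fun exactTopK(topics: List<String>, k: Int): List<Pair<String, Int>> =
    topics.groupingBy { it }
        .eachCount() // topic -> exact number of occurrences
        .entries
        // Highest count first; ties are broken alphabetically for a stable result
        .sortedWith(compareByDescending<Map.Entry<String, Int>> { it.value }.thenBy { it.key })
        .take(k)
        .map { it.key to it.value }

// Hypothetical sample stream of topics
val sampleTopics = listOf("AI", "Kotlin", "AI", "Redis", "Kotlin", "AI")
println(exactTopK(sampleTopics, 2)) // [(AI, 3), (Kotlin, 2)]
```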

In [10]:
import redis.clients.jedis.exceptions.JedisDataException
import java.time.LocalDateTime

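// Creates the Top-K for the current hour (if it doesn't exist yet) and returns its key.
fun createTopK(): String {
    // Truncate the current time to the hour so all events in the same hour share one key
    val windowBucket = LocalDateTime.now().withMinute(0).withSecond(0).withNano(0)
    val key = "topics-topk:$windowBucket"
    try {
        // Arguments: topk = 15, width = 3000, depth = 10, decay = 0.9
        jedisPooled.topkReserve(key, 15, 3000, 10, 0.9)
    } catch (_: JedisDataException) {
        // TOPK.RESERVE fails when the key already exists
        println("TopK already exists")
    }

    return key
}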
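The hourly bucketing can be checked in isolation. The sketch below uses a hypothetical helper, `topKKeyFor`, that takes the timestamp as a parameter instead of calling `LocalDateTime.now()`, and uses `truncatedTo(ChronoUnit.HOURS)` as an equivalent way of zeroing minutes, seconds, and nanoseconds.

```kotlin
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Hypothetical helper (not part of the notebook): builds the per-hour Top-K key
// for an arbitrary timestamp so the bucketing is easy to verify.
fun topKKeyFor(timestamp: LocalDateTime): String =
    "topics-topk:${timestamp.truncatedTo(ChronoUnit.HOURS)}"

// Timestamps within the same hour map to the same key
println(topKKeyFor(LocalDateTime.of(2025, 5, 23, 12, 0, 5)))   // topics-topk:2025-05-23T12:00
println(topKKeyFor(LocalDateTime.of(2025, 5, 23, 12, 59, 59))) // topics-topk:2025-05-23T12:00
// The next hour starts a new Top-K
println(topKKeyFor(LocalDateTime.of(2025, 5, 23, 13, 0, 0)))   // topics-topk:2025-05-23T13:00
```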

### Creating a Topic Extraction Handler
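This handler extracts topics from an event's text and stores them in Redis. The topics are added to the current hour's Top-K, stored as a pipe-separated string in the "topics" field of the post's hash, and added to the "topics" set so that later calls can pass them to the model as existing topics.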

In [13]:
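val extractTopics: (Event) -> Pair<Boolean, String> = { event ->
    // Pass the topics seen so far so the model can reuse existing names
    val existingTopics = jedisPooled.smembers("topics")
    val topics = extractTopics(event.text, existingTopics.joinToString(", "))
        // Strip straight and curly quotes from the model's reply
        .replace("\"", "")
        .replace("“", "")
        .replace("”", "")
        // Turn the comma-separated reply into a clean list of topics
        .split(",")
        .map { it.trim() }
        .filter { it.isNotBlank() }

    val topKKey = createTopK()
    if (topics.isNotEmpty()) {
        // Count the topics in the current hour's Top-K
        jedisPooled.topkAdd(topKKey, *topics.toTypedArray())
        // Store the topics on the post's hash, pipe-separated
        jedisPooled.hset("post:" + event.uri.replace("at://did:plc:", ""), mapOf("topics" to topics.joinToString("|")))
        // Remember the topics for future prompts
        jedisPooled.sadd("topics", *topics.toTypedArray())
    }
    Pair(true, "OK")
}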
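The string handling in the handler can be tried without Redis or Ollama. The sketch below pulls it into two hypothetical helpers, `parseTopics` and `postKey`, which mirror the cleanup steps and the key derivation used above. The sample reply and URI are taken from earlier outputs in this notebook.

```kotlin
// Hypothetical helper: turns the raw model reply into a clean list of topics,
// using the same steps as the handler (strip quotes, split on commas, trim, drop blanks).
fun parseTopics(raw: String): List<String> =
    raw.replace("\"", "")
        .replace("“", "")
        .replace("”", "")
        .split(",")
        .map { it.trim() }
        .filter { it.isNotBlank() }

// Hypothetical helper: derives the hash key for a post from its AT URI
fun postKey(uri: String): String = "post:" + uri.replace("at://did:plc:", "")

val parsed = parseTopics("\"Kotlin, Programming Languages, AI Applications\"")
println(parsed)                   // [Kotlin, Programming Languages, AI Applications]
println(parsed.joinToString("|")) // Kotlin|Programming Languages|AI Applications
println(parseTopics(""))          // []
println(postKey("at://did:plc:p7ulrp4foqwo3clni7c6le4j/app.bsky.feed.post/3lptgszmsb22e"))
// post:p7ulrp4foqwo3clni7c6le4j/app.bsky.feed.post/3lptgszmsb22e
```

An empty reply, such as the one for the samba post, yields an empty list, so nothing is written to Redis for that post.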

In [14]:
createConsumerGroup("filtered-events", "topic-extraction-example")

In [15]:
runBlocking {
    consumeStream(
        streamName = "filtered-events",
        consumerGroup = "topic-extraction-example",
        consumer = "topic-extraction-1",
        handlers = listOf(printUri, extractTopics),
        ackFunction = ackFn(),
        count = 1,
        limit = 100
    )
}

Got event from at://did:plc:p7ulrp4foqwo3clni7c6le4j/app.bsky.feed.post/3lptgszmsb22e
Got event from at://did:plc:rdawdbanqn3rewsoqwwgn64f/app.bsky.feed.post/3lptgtlav2224
TopK already exists
Got event from at://did:plc:4qx5w4ydgxfqjjulu7iaqcb6/app.bsky.feed.post/3lptgtpegys2g
TopK already exists
Got event from at://did:plc:4aqdizbvkawdfz3yo4yatync/app.bsky.feed.post/3lptgubcl322e
TopK already exists
Got event from at://did:plc:qxtnhnigpszcx6hhl3pwzbyk/app.bsky.feed.post/3lptgukodts2z
TopK already exists
Got event from at://did:plc:zppl6erazdke37t467jo7vru/app.bsky.feed.post/3lptguh54sc2v
TopK already exists
Got event from at://did:plc:zef4ypw24j2yicvwpkr6pza7/app.bsky.feed.post/3lptguecjf22n
TopK already exists
Got event from at://did:plc:zef4ypw24j2yicvwpkr6pza7/app.bsky.feed.post/3lptgukn6qc2n
TopK already exists
Got event from at://did:plc:ajzerqtewuljmsh2ehh7qfqj/app.bsky.feed.post/3lptgunwbzc2t
TopK already exists
Got event from at://did:plc:2k27drz6ks6t7zlkzdpobwyw/app.bsky.feed

## Creating a Redis Search Index
In this section, we'll create a Redis Search index to make the enriched events searchable. Redis Search is a module that adds full-text search capabilities to Redis. It allows us to search for events based on their text, topics, and other fields.

### Creating the Index Schema in Code
Now we'll create the index schema in code. We'll use the Jedis client to create the schema and the index.

The following schema defines the fields that will be indexed. The schema includes:
- Text fields for full-text search
- Tag fields for exact matching
- A numeric field for range queries

```
FT.CREATE postIdx ON HASH PREFIX 1 post: SCHEMA
        parentUri     TEXT
        topics        TAG SEPARATOR "|"
        time_us       TEXT
        langs         TAG
        uri           TEXT
        operation     TAG
        did           TAG
        timeUs        NUMERIC
        rkey          TAG
        rootUri       TEXT
        text          TEXT
```

In [16]:
import redis.clients.jedis.search.IndexDefinition
import redis.clients.jedis.search.IndexOptions
import redis.clients.jedis.search.Schema

val schema = Schema()
    .addTextField("parentUri", 1.0)
    .addTagField("topics", "|")
    .addTextField("time_us", 1.0)
    .addTagField("langs")
    .addTextField("uri", 1.0)
    .addTagField("operation")
    .addTagField("did")
    .addNumericField("timeUs")
    .addTagField("rkey")
    .addTextField("rootUri", 1.0)
    .addTextField("text", 1.0)

// Define index options (e.g., prefix)
val rule = IndexDefinition()
    .setPrefixes("post:")

// Create the index
try {
    jedisPooled.ftCreate("postIdx", IndexOptions.defaultOptions().setDefinition(rule), schema)
} catch (e: JedisDataException) {
    println("Index already exists")
}

### Searching the Index
Now that we have created the index, we can search for events based on their topics, text, and other fields. In this example, we'll search for events with the topic "OpenAI".

Redis Search has its own query syntax. For example, to search for events with the topic "machine_learning", we would use the query `@topics:{machine_learning}`, where `@topics` selects the field and the braces request an exact tag match.

Exact Matching Search

In [38]:
//FT.SEARCH postIdx "@topics:{OpenAI}"
val result = jedisPooled.ftSearch(
    "postIdx",
    "@topics:{OpenAI}"
)

result.documents.forEach { post ->
    println(post.get("topics"))
    println(post.get("text"))
    println("\n")
}
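Many extracted topics contain spaces or hyphens (for example "Machine Learning" or "Text-to-Image"). In tag queries, such characters generally need to be escaped with a backslash. The sketch below uses a hypothetical helper, `tagQuery`, that escapes every character other than letters, digits, and underscores. Escaping this broadly is an assumption that should be harmless, but it is worth checking against the query dialect of your Redis version.

```kotlin
// Hypothetical helper: builds an exact-match tag query and backslash-escapes
// every character that is not a letter, digit, or underscore.
fun tagQuery(field: String, value: String): String {
    val escaped = value.replace(Regex("[^\\p{L}\\p{N}_]")) { "\\" + it.value }
    return "@$field:{$escaped}"
}

println(tagQuery("topics", "OpenAI"))           // @topics:{OpenAI}
println(tagQuery("topics", "Machine Learning")) // @topics:{Machine\ Learning}
println(tagQuery("topics", "Text-to-Image"))    // @topics:{Text\-to\-Image}
```

The resulting string can be passed as the query argument of `jedisPooled.ftSearch("postIdx", ...)`.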

Full Text Search

In [35]:
//FT.SEARCH postIdx "@text:from general robots"
val result = jedisPooled.ftSearch(
    "postIdx",
    "@text:from general robots"
)

result.documents.forEach { post ->
    println(post.get("text"))
    println("\n")
}

I should develop some weird technology again, it’s been a bit… 

aside from general robots, androids, the colony ships, gene splicing and etc, there’s Maxwellian Energy, AIRs/AIRtanks, Bullet Colonies, AS (Artificial Soul)s… I should think of more wacky fake science terms too




Querying the TopK

In [34]:
import redis.clients.jedis.params.ScanParams

val jedisScanFn = { cursor: String ->
    jedisPooled.scan(cursor, ScanParams().match("topics-topk:*"), "TopK-TYPE")
}

val keys = mutableListOf<String>()
var lastCursor = "0"
do {
    val result = jedisScanFn.invoke(lastCursor)
    lastCursor = result.cursor
    keys.addAll(result.result)
} while (lastCursor != "0")

keys.forEach {
    println(it)
    val count = jedisPooled.topkListWithCount(it)
    println(count)
}

topics-topk:2025-05-23T12:00
{AI=25, Generative Models=21, Machine Learning=18, Prompt Engineering=12, Artificial Intelligence=10, AI Data Security=5, OpenAI=5, Text-to-Image=3, Long-Term Strategy=2, Writing Tools=2, Mathematics=2, History=1, Wave Equations=1, Infinite Energy Hunger=1, Commercial Dependency=1}
topics-topk:2025-05-23T13:00
{Generative Models=6, Prompt Engineering=4, AI=4, OpenAI=2, AI Tooling=2, Azure Cloud=1, HPC=1, Web Application Development=1, Supercomputers=1, AI Model=1}
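Because each hour has its own Top-K, a ranking across several hours has to be assembled on the client. The sketch below sums the per-window counts with a hypothetical helper, `mergeWindows`, using a few of the values printed above. Keep in mind that the counts are approximate and that a topic which dropped out of one window's top 15 contributes nothing for that hour.

```kotlin
// Hypothetical helper: sums the counts from several hourly windows into one ranking.
fun mergeWindows(windows: List<Map<String, Long>>): List<Pair<String, Long>> {
    val totals = mutableMapOf<String, Long>()
    for (window in windows) {
        for ((topic, count) in window) {
            totals[topic] = (totals[topic] ?: 0L) + count
        }
    }
    // Highest total first
    return totals.entries
        .sortedByDescending { it.value }
        .map { it.key to it.value }
}

// A subset of the counts shown in the output above
val noonWindow = mapOf("AI" to 25L, "Generative Models" to 21L, "OpenAI" to 5L)
val onePmWindow = mapOf("Generative Models" to 6L, "AI" to 4L, "OpenAI" to 2L)
println(mergeWindows(listOf(noonWindow, onePmWindow)))
// [(AI, 29), (Generative Models, 27), (OpenAI, 7)]
```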
