# Build with Spring AI: Chat Memory and Data Integration

**Spring AI Workshop - Session #3**
*November 27, 2025*

**Speaker:** Levani Gasviani

## Workshop Agenda

1. **Introduction** - Why LLMs forget context, Chat Memory vs Chat History
2. **Architecture** - ChatMemory & ChatMemoryRepository interfaces, Built-in implementations
3. **Memory in ChatClient** - Built-in advisors, Usage examples
4. **Advisors API** - Core components, flow, and ordering
5. **Custom Advisors** - Logging and Re-Reading (Re2) examples
6. **Q&A**
---

## Discussion Questions
 
**Let's start with these questions:**
 
* Have you ever chatted with an AI that forgot what you just said?

* Why does that happen?

* What makes a conversation feel natural?

--- 

## The Core Problem
 
### LLMs are Stateless
 
**Key Point:** Large Language Models don't remember previous conversations.
 
**Each request is independent** - like starting fresh every time.
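
A minimal sketch of what "stateless" means in practice. `StatelessModel` is a hypothetical stand-in for an LLM, not a Spring AI class: it can only answer from the messages included in the current call.

```java
import java.util.List;

// Hypothetical stand-in for an LLM: it only sees the messages passed in this one call.
final class StatelessModel {

    static String reply(List<String> messages) {
        String prefix = "My name is ";
        // Search only the messages of the current request; nothing is kept between calls.
        for (String message : messages) {
            if (message.startsWith(prefix)) {
                return "Your name is " + message.substring(prefix.length());
            }
        }
        return "I don't know your name";
    }

    public static void main(String[] args) {
        // Two independent requests: the second carries no trace of the first.
        System.out.println(reply(List.of("My name is Bond")));
        System.out.println(reply(List.of("What's my name?")));
        // Resending the earlier message restores the context.
        System.out.println(reply(List.of("My name is Bond", "What's my name?")));
    }
}
```

The only way to get a context-aware answer is to resend the earlier messages with every request.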
 
**Spring AI provides Chat Memory to solve this.**
 
---

## Chat Memory vs Chat History
 
### Understanding the Difference
 
**Chat Memory**

* Information the LLM uses to maintain context

* Keeps what's **relevant** for the current conversation

* Smart and focused
 
**Chat History**

* Complete record of all messages

* Stores **everything** that was said

* For audit and review purposes (the front end also needs it to display the conversation)

### Example

**Your Conversation**
```
├─ Message 1: "Hi, I'm planning a trip to Tbilisi"
├─ Message 2: "What's the weather like?"
├─ Message 3: "Should I bring an umbrella?"
├─ Message 4: "What about restaurants?"
└─ Message 5: "Any vegetarian options?"

              ↓                                    ↓

      CHAT MEMORY                          CHAT HISTORY
      ───────────                          ────────────
   Keeps relevant context                Stores everything
         for AI                            permanently

   • Messages 1, 2, 3                    • All 5 messages
   • Trip + weather context              • Complete record
   • Fed to LLM                          • Audit/analytics
   • Managed by strategy                 • Historical reference
     (last N, token limit, etc.)
```
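
The split can be sketched in plain Java. This is an illustration only, not Spring AI code: `ConversationLog`, its methods, and the "last N" window are assumptions made for the example. The history keeps every message, while the memory view returns only the most recent ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: one full record, one bounded view derived from it.
final class ConversationLog {

    private final List<String> history = new ArrayList<>(); // everything ever said
    private final int maxMemoryMessages;                     // assumed "last N" window size

    ConversationLog(int maxMemoryMessages) {
        this.maxMemoryMessages = maxMemoryMessages;
    }

    void record(String message) {
        history.add(message);
    }

    // Chat history: the complete record, e.g. for audit or for rendering the UI.
    List<String> history() {
        return List.copyOf(history);
    }

    // Chat memory: only the most recent messages, which would be sent to the model.
    List<String> memory() {
        int from = Math.max(0, history.size() - maxMemoryMessages);
        return List.copyOf(history.subList(from, history.size()));
    }
}
```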
---

## The Architecture
 
### Two Main Components
 
**1. ChatMemory**

* Decides what to remember

* Chooses which messages matter

* Implements strategies:

  * Keep last N messages

  * Keep messages for time period

  * Keep messages within token limit
 
**2. ChatMemoryRepository**

* Stores messages

* Retrieves messages

* Simple responsibility: save and fetch
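
As an illustration of the "token limit" strategy, here is a minimal sketch in plain Java. It is not a Spring AI implementation: `TokenBudgetMemory` is a hypothetical name, and counting one token per whitespace-separated word is a crude assumption (real tokenizers differ).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TokenBudgetMemory {

    // Crude assumption: one token per whitespace-separated word.
    static int estimateTokens(String message) {
        String trimmed = message.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Keeps the newest messages whose combined estimate fits within maxTokens.
    static List<String> select(List<String> messages, int maxTokens) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        // Walk from newest to oldest and stop at the first message that does not fit.
        for (int i = messages.size() - 1; i >= 0; i--) {
            int cost = estimateTokens(messages.get(i));
            if (used + cost > maxTokens) {
                break;
            }
            kept.add(messages.get(i));
            used += cost;
        }
        Collections.reverse(kept); // restore chronological order
        return kept;
    }
}
```

Walking backwards from the newest message means the oldest context is dropped first, which matches how a sliding window behaves.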
 
---

## Interfaces

### 1. ChatMemory Interface

```java
package org.springframework.ai.chat.memory;

public interface ChatMemory {
    
    /**
     * Save the specified message in the chat memory for the specified conversation.
     */
    default void add(String conversationId, Message message) {
        this.add(conversationId, List.of(message));
    }
    
    /**
     * Save the specified messages in the chat memory for the specified conversation.
     */
    void add(String conversationId, List<Message> messages);
    
    /**
     * Get the messages in the chat memory for the specified conversation.
     */
    List<Message> get(String conversationId);
    
    /**
     * Clear the chat memory for the specified conversation.
     */
    void clear(String conversationId);
}
```

### 2. ChatMemoryRepository Interface
```java
package org.springframework.ai.chat.memory;

public interface ChatMemoryRepository {
    
    /**
     * List the IDs of all conversations stored in the repository.
     */
    List<String> findConversationIds();

    /**
     * Retrieve all messages stored for a specific conversation.
     */
    List<Message> findByConversationId(String conversationId);

    /**
     * Replace all existing messages for a specific conversation with the provided messages.
     */
    void saveAll(String conversationId, List<Message> messages);

    /**
     * Delete all messages for a specific conversation.
     */
    void deleteByConversationId(String conversationId);
}
```

## How They Work Together

```

User Message

    ↓

ChatMemory (decides what to keep)

    ↓

ChatMemoryRepository (stores it)

```
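
The division of labour can be sketched without the framework. All names below (`SketchRepository`, `MapRepository`, `WindowMemory`) are hypothetical and use plain strings instead of Spring AI's `Message` type: the repository only stores and fetches, while the memory applies the "keep last N" policy before saving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Storage only: save and fetch, no policy.
interface SketchRepository {
    List<String> find(String conversationId);
    void save(String conversationId, List<String> messages);
}

final class MapRepository implements SketchRepository {
    private final Map<String, List<String>> store = new ConcurrentHashMap<>();

    public List<String> find(String conversationId) {
        return store.getOrDefault(conversationId, List.of());
    }

    public void save(String conversationId, List<String> messages) {
        store.put(conversationId, List.copyOf(messages));
    }
}

// Policy only: decides what to keep, then delegates storage to the repository.
final class WindowMemory {
    private final SketchRepository repository;
    private final int maxMessages;

    WindowMemory(SketchRepository repository, int maxMessages) {
        this.repository = repository;
        this.maxMessages = maxMessages;
    }

    void add(String conversationId, String message) {
        List<String> all = new ArrayList<>(repository.find(conversationId));
        all.add(message);
        // Trim to the newest maxMessages entries before saving.
        int from = Math.max(0, all.size() - maxMessages);
        repository.save(conversationId, all.subList(from, all.size()));
    }

    List<String> get(String conversationId) {
        return repository.find(conversationId);
    }
}
```

Because the policy lives in one class and the storage in another, swapping the map for a database would not change the windowing logic.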
 
---

# Built-in ChatMemoryRepository Implementations in Spring AI

Spring AI provides the following built-in repository implementations for storing chat memory:

## 1. InMemoryChatMemoryRepository
- **Storage**: In-memory using `ConcurrentHashMap`
- **Use Case**: Development, testing, or temporary storage
- **Persistence**: No (data lost on restart)
- **Dependency**: Built-in (no additional dependency required)

## 2. JdbcChatMemoryRepository
- **Storage**: Relational databases via JDBC
- **Use Case**: Applications requiring persistent storage in SQL databases
- **Persistence**: Yes
- **Supported Databases**: 
  - PostgreSQL
  - MySQL / MariaDB
  - SQL Server
  - HSQLDB
  - Oracle Database
- **Dependency**:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-jdbc</artifactId>
</dependency>
```

## 3. CassandraChatMemoryRepository
- **Storage**: Apache Cassandra
- **Use Case**: High availability, durability, scale, with TTL support
- **Persistence**: Yes (time-series schema with auditing capabilities)
- **Special Features**: Time-to-live (TTL) for automatic message expiration
- **Dependency**:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-cassandra</artifactId>
</dependency>
```

## 4. Neo4jChatMemoryRepository
- **Storage**: Neo4j graph database
- **Use Case**: Applications leveraging graph capabilities for chat memory
- **Persistence**: Yes
- **Special Features**: Messages stored as nodes and relationships
- **Dependency**:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-neo4j</artifactId>
</dependency>
```

## 5. CosmosDBChatMemoryRepository
- **Storage**: Azure Cosmos DB NoSQL API
- **Use Case**: Globally distributed, highly scalable applications
- **Persistence**: Yes
- **Special Features**: Uses conversation ID as partition key for efficiency
- **Dependency**:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-cosmos-db</artifactId>
</dependency>
```

## 6. MongoChatMemoryRepository
- **Storage**: MongoDB
- **Use Case**: Applications requiring flexible, document-oriented storage
- **Persistence**: Yes
- **Special Features**: Optional TTL for automatic message expiration
- **Dependency**:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-mongodb</artifactId>
</dependency>
```

---

**Note**: By default, Spring AI auto-configures `InMemoryChatMemoryRepository` if no other repository is configured. If you add a dependency for any of the persistent repositories, Spring AI will automatically use that instead.

# Memory in Chat Client

Spring AI provides built-in Advisors to configure memory behavior in ChatClient:

## Spring AI Built-in Advisors

### 1. MessageChatMemoryAdvisor
- Retrieves conversation history from memory and adds it as a collection of messages to the prompt
- Maintains conversation history structure
- **Note:** Not all AI Models support this approach

### 2. PromptChatMemoryAdvisor
- Retrieves conversation history from memory
- Appends history to the system prompt as plain text

### 3. VectorStoreChatMemoryAdvisor
- Retrieves conversation history from vector store
- Appends history to the system message as plain text
- Useful for efficiently searching and retrieving relevant information from large datasets
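
To make "appends history to the system prompt as plain text" concrete, here is a minimal sketch. The `ChatLine` record, the `HistoryAsText` class, and the exact text layout are assumptions for illustration; the real advisors use their own prompt template.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical message type: a role label plus the message text.
record ChatLine(String role, String text) {}

final class HistoryAsText {

    // Flattens the history into text and appends it to the system prompt.
    static String systemPromptWithHistory(String systemText, List<ChatLine> history) {
        String rendered = history.stream()
            .map(line -> line.role() + ":" + line.text())
            .collect(Collectors.joining("\n"));
        return systemText + "\n\nConversation so far:\n" + rendered;
    }
}
```

With the message-based approach the history stays a list of typed messages; with the text-based approach the model receives a single system message that contains the flattened conversation.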


### Usage Example
```java
ChatMemory chatMemory = MessageWindowChatMemory.builder().build();

ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
    .build();

String conversationId = "007";
chatClient.prompt()
    .user("Do I have license to code?")
    .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
    .call()
    .content();
```

## MessageChatMemoryAdvisor: Step-by-Step Process
 
**Step 1: Get Conversation ID**

- Extracts the conversation ID from the request context

- Falls back to the default conversation ID if none is provided
 
**Step 2: Retrieve Chat History**

- Calls `chatMemory.get(conversationId)`

- Returns a list of previous messages (UserMessage, AssistantMessage, etc.)
 
**Step 3: Merge Messages**

- Creates a new list starting with **memory messages** (history)

- Adds **current prompt instructions** (new user message, system message, etc.)
 
**Step 4: Create New Request**

- Mutates the original request

- Replaces the prompt's messages with the merged list

- Returns the processed request
 
**Step 5: Update Memory**

- The new user message is saved to memory at the end of `before()`

- After the LLM responds, the assistant's response is added to memory in `after()`

  

 
## Visual Flow

```

┌─────────────────────────────────────────┐
│  User sends: "What's my name?"          │
└───────────────┬─────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────┐
│  MessageChatMemoryAdvisor.before()      │
├─────────────────────────────────────────┤
│ 1. Get conversationId = "007"           │
│ 2. Retrieve from ChatMemory:            │
│    [                                    │
│      UserMessage("My name is Bond"),    │
│      AssistantMessage("Hello Bond!")    │
│    ]                                    │
│ 3. Merge with current prompt:           │
│    [                                    │
│      UserMessage("My name is Bond"),  ← │ From memory
│      AssistantMessage("Hello Bond!"), ← │ From memory
│      UserMessage("What's my name?")   ← │ Current prompt
│    ]                                    │
└───────────────┬─────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────┐
│  Sent to LLM with full context          │
└───────────────┬─────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────┐
│  LLM Response: "Your name is Bond"      │
└───────────────┬─────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────┐
│  MessageChatMemoryAdvisor.after()       │
├─────────────────────────────────────────┤
│  Add AssistantMessage to memory         │
│  chatMemory.add(conversationId,         │
│    AssistantMessage("Your name..."))    │
└─────────────────────────────────────────┘

```
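
The round trip in the diagram can be approximated in plain Java. This is a simplified sketch, not the advisor's source: `MemoryRoundTrip` is a hypothetical class, messages are plain strings, and the model is any function from a prompt to an answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class MemoryRoundTrip {

    private final Map<String, List<String>> memory = new HashMap<>();
    private final Function<List<String>, String> model; // hypothetical model call

    MemoryRoundTrip(Function<List<String>, String> model) {
        this.model = model;
    }

    String chat(String conversationId, String userText) {
        List<String> stored = memory.computeIfAbsent(conversationId, id -> new ArrayList<>());

        // "before": history first, then the current user message.
        List<String> prompt = new ArrayList<>(stored);
        prompt.add("USER: " + userText);
        stored.add("USER: " + userText);

        String answer = model.apply(prompt);

        // "after": remember the assistant's answer for the next turn.
        stored.add("ASSISTANT: " + answer);
        return answer;
    }

    List<String> memoryOf(String conversationId) {
        return List.copyOf(memory.getOrDefault(conversationId, List.of()));
    }
}
```

Each conversation ID gets its own list, so two users never see each other's context.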


## Code Implementation

```java

public ChatClientRequest before(ChatClientRequest chatClientRequest, AdvisorChain advisorChain) {
    String conversationId = getConversationId(chatClientRequest.context(), this.defaultConversationId);

    // 1. Retrieve the chat memory for the current conversation
    List<Message> memoryMessages = this.chatMemory.get(conversationId);

    // 2. Advise the request messages list: history first, then the current instructions
    List<Message> processedMessages = new ArrayList<>(memoryMessages);
    processedMessages.addAll(chatClientRequest.prompt().getInstructions());

    // 3. Create a new request with the advised messages
    ChatClientRequest processedChatClientRequest = chatClientRequest.mutate()
        .prompt(chatClientRequest.prompt().mutate().messages(processedMessages).build())
        .build();

    // 4. Add the new user message to the conversation memory
    UserMessage userMessage = processedChatClientRequest.prompt().getUserMessage();
    this.chatMemory.add(conversationId, userMessage);

    return processedChatClientRequest;
}

```
 

# Advisors API

The Advisors API provides a way to intercept, modify, and enhance AI interactions in Spring applications.

## Key Benefits
- Encapsulate recurring Generative AI patterns
- Transform data sent to and from LLMs
- Provide portability across various models and use cases

## Usage Example
```java
ChatMemory chatMemory = ... // Initialize your chat memory store
VectorStore vectorStore = ... // Initialize your vector store

var chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(
        MessageChatMemoryAdvisor.builder(chatMemory).build(), // chat-memory advisor
        QuestionAnswerAdvisor.builder(vectorStore).build()    // RAG advisor
    )
    .build();

var conversationId = "678";

String response = chatClient.prompt()
    .advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, conversationId))
    .user(userText)
    .call()
    .content();
```

## Core Components

<div style="text-align: center;">
    <img src="https://docs.spring.io/spring-ai/reference/_images/advisors-api-classes.jpg" alt="Advisor Architecture Diagram" width="800">
</div>

## Advisor Chain Flow
<div style="text-align: center;">
    <img src="https://docs.spring.io/spring-ai/reference/_images/advisors-flow.jpg" alt="Advisor Architecture Diagram" width="800">
</div>


1. Spring AI creates a `ChatClientRequest` from the user's `Prompt`, along with an empty advisor `context` object

2. Each advisor processes the request and can:
   - Modify the request and pass it to the next advisor
   - Block the request by not calling the next entity (must fill out the response)

3. The final advisor (provided by framework) sends the request to the `Chat Model`

4. The Chat Model's response passes back through the advisor chain and is converted to a `ChatClientResponse` (which includes the shared advisor `context` instance)

5. Each advisor can process or modify the response

6. The final `ChatClientResponse` is returned to the client by extracting the `ChatCompletion`
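
The flow above can be sketched with a tiny hand-rolled chain. Everything here (`Req`, `Res`, `SketchAdvisor`, `SketchChain`, `ChainDemo`) is hypothetical and far simpler than the Spring AI types: one advisor modifies the request, one may block it, and the terminal step stands in for the model call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

record Req(String text, Map<String, Object> context) {}
record Res(String text, Map<String, Object> context) {}

interface SketchAdvisor {
    Res advise(Req request, SketchChain chain);
}

final class SketchChain {
    private final List<SketchAdvisor> advisors;
    private final int index;

    SketchChain(List<SketchAdvisor> advisors) {
        this(advisors, 0);
    }

    private SketchChain(List<SketchAdvisor> advisors, int index) {
        this.advisors = advisors;
        this.index = index;
    }

    Res next(Req request) {
        if (index == advisors.size()) {
            // Terminal step: stands in for the framework advisor that calls the model.
            return new Res("echo: " + request.text(), request.context());
        }
        return advisors.get(index).advise(request, new SketchChain(advisors, index + 1));
    }
}

final class ChainDemo {
    static Res run(String userText) {
        // Modifies the request, then passes it on.
        SketchAdvisor upperCase = (req, chain) ->
            chain.next(new Req(req.text().toUpperCase(Locale.ROOT), req.context()));
        // Blocks the request by not calling the chain; it must build the response itself.
        SketchAdvisor guard = (req, chain) -> {
            if (req.text().contains("SECRET")) {
                req.context().put("blocked", true);
                return new Res("blocked", req.context());
            }
            return chain.next(req);
        };
        return new SketchChain(List.of(upperCase, guard)).next(new Req(userText, new HashMap<>()));
    }
}
```

The same `context` map travels with the request and comes back on the response, which is how advisors can share state.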

# Advisor Order

Execution order is determined by the `getOrder()` method.

## Key Points

- Advisors with lower order values are executed first
- The advisor chain operates as a stack:
  - First advisor processes the request first
  - Same advisor processes the response last
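
A small sketch of the stack behaviour, using hypothetical names (`OrderedStep`, `OrderDemo`) rather than real advisors. Steps are sorted by order value, each one logs on the way in, and the recursion unwinds in reverse on the way out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

record OrderedStep(String name, int order) {}

final class OrderDemo {

    static List<String> trace(List<OrderedStep> steps) {
        List<OrderedStep> sorted = new ArrayList<>(steps);
        sorted.sort(Comparator.comparingInt(OrderedStep::order)); // lower value runs first
        List<String> log = new ArrayList<>();
        invoke(sorted, 0, log);
        return log;
    }

    private static void invoke(List<OrderedStep> sorted, int i, List<String> log) {
        if (i == sorted.size()) {
            log.add("model"); // innermost point of the chain
            return;
        }
        log.add(sorted.get(i).name() + ":request");
        invoke(sorted, i + 1, log);
        log.add(sorted.get(i).name() + ":response"); // runs while unwinding
    }
}
```

For steps `a` (order 0) and `b` (order 10), the trace is `a:request`, `b:request`, `model`, `b:response`, `a:response`.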

## Spring Ordered Interface
```java
public interface Ordered {

    /**
     * Constant for the highest precedence value.
     * @see java.lang.Integer#MIN_VALUE
     */
    int HIGHEST_PRECEDENCE = Integer.MIN_VALUE;

    /**
     * Constant for the lowest precedence value.
     * @see java.lang.Integer#MAX_VALUE
     */
    int LOWEST_PRECEDENCE = Integer.MAX_VALUE;

    /**
     * Get the order value of this object.
     * <p>Higher values are interpreted as lower priority. As a consequence,
     * the object with the lowest value has the highest priority (somewhat
     * analogous to Servlet {@code load-on-startup} values).
     * <p>Same order values will result in arbitrary sort positions for the
     * affected objects.
     * @return the order value
     * @see #HIGHEST_PRECEDENCE
     * @see #LOWEST_PRECEDENCE
     */
    int getOrder();
}
```

## Use Case: First on Both Input and Output

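Because the chain is a stack, a single advisor cannot be first on both the request and the response. To get that effect:

1. Use separate advisors for each side
2. Configure them with different order values (lowest value for the request side, highest value for the response side)
3. Share state between them using the advisor context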
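
A framework-free sketch of this pattern. `Hook` and `FirstOnBothSides` are hypothetical names, and the shared `Map` stands in for the advisor context: the request-side hook gets the lowest order value, the response-side hook the highest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical advisor shape: optional hooks on each side, one shared context map.
abstract class Hook {
    final int order;
    Hook(int order) { this.order = order; }
    void onRequest(Map<String, Object> context, List<String> log) { }
    void onResponse(Map<String, Object> context, List<String> log) { }
}

final class FirstOnBothSides {

    static List<String> run() {
        Hook input = new Hook(Integer.MIN_VALUE) {   // lowest value: first on the request
            @Override void onRequest(Map<String, Object> context, List<String> log) {
                context.put("startedBy", "input");
                log.add("input:request");
            }
        };
        Hook middle = new Hook(0) {
            @Override void onRequest(Map<String, Object> context, List<String> log) {
                log.add("middle:request");
            }
            @Override void onResponse(Map<String, Object> context, List<String> log) {
                log.add("middle:response");
            }
        };
        Hook output = new Hook(Integer.MAX_VALUE) {  // highest value: first on the response
            @Override void onResponse(Map<String, Object> context, List<String> log) {
                log.add("output:response startedBy=" + context.get("startedBy"));
            }
        };

        List<Hook> hooks = new ArrayList<>(List.of(middle, output, input));
        hooks.sort(Comparator.comparingInt((Hook hook) -> hook.order));

        Map<String, Object> context = new HashMap<>();
        List<String> log = new ArrayList<>();
        for (Hook hook : hooks) {
            hook.onRequest(context, log);            // ascending order on the way in
        }
        log.add("model");
        for (int i = hooks.size() - 1; i >= 0; i--) {
            hooks.get(i).onResponse(context, log);   // descending order on the way out
        }
        return log;
    }
}
```

The response-side hook reads the value that the request-side hook stored, even though the two sit at opposite ends of the chain.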

# API Overview

Main interfaces in `org.springframework.ai.chat.client.advisor.api`:

## Base Interface
```java
public interface Advisor extends Ordered {
    String getName();
}
```

## Advisor Types

### CallAdvisor (Synchronous)
```java
public interface CallAdvisor extends Advisor {
    ChatClientResponse adviseCall(
        ChatClientRequest chatClientRequest, CallAdvisorChain callAdvisorChain);
}
```

### StreamAdvisor (Reactive)
```java
public interface StreamAdvisor extends Advisor {
    Flux<ChatClientResponse> adviseStream(
        ChatClientRequest chatClientRequest, StreamAdvisorChain streamAdvisorChain);
}
```

## Advisor Chains

### CallAdvisorChain
```java
public interface CallAdvisorChain extends AdvisorChain {
    // Invokes the next CallAdvisor in the chain
    ChatClientResponse nextCall(ChatClientRequest chatClientRequest);
    
    // Returns all CallAdvisor instances in this chain
    List<CallAdvisor> getCallAdvisors();
}
```

### StreamAdvisorChain
```java
public interface StreamAdvisorChain extends AdvisorChain {
    // Invokes the next StreamAdvisor in the chain
    Flux<ChatClientResponse> nextStream(ChatClientRequest chatClientRequest);
    
    // Returns all StreamAdvisor instances in this chain
    List<StreamAdvisor> getStreamAdvisors();
}
```

# Implementing an Advisor

Implement either `CallAdvisor` or `StreamAdvisor` (or both):
- `adviseCall()` for non-streaming; call `nextCall()` on the chain to continue
- `adviseStream()` for streaming; call `nextStream()` on the chain to continue

## Example: Logging Advisor

A simple advisor that logs requests and responses without modifying them. Supports both non-streaming and streaming.
```java
public class SimpleLoggerAdvisor implements CallAdvisor, StreamAdvisor {

    private static final Logger logger = LoggerFactory.getLogger(SimpleLoggerAdvisor.class);

    @Override
    public String getName() {
        return this.getClass().getSimpleName();
    }

    @Override
    public int getOrder() {
        return 0;
    }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest chatClientRequest, 
                                         CallAdvisorChain callAdvisorChain) {
        logRequest(chatClientRequest);
        
        ChatClientResponse chatClientResponse = callAdvisorChain.nextCall(chatClientRequest);
        
        logResponse(chatClientResponse);
        
        return chatClientResponse;
    }

    @Override
    public Flux<ChatClientResponse> adviseStream(ChatClientRequest chatClientRequest,
                                                  StreamAdvisorChain streamAdvisorChain) {
        logRequest(chatClientRequest);
        
        Flux<ChatClientResponse> chatClientResponses = streamAdvisorChain.nextStream(chatClientRequest);
        
        return new ChatClientMessageAggregator()
            .aggregateChatClientResponse(chatClientResponses, this::logResponse);
    }

    private void logRequest(ChatClientRequest request) {
        logger.debug("request: {}", request);
    }

    private void logResponse(ChatClientResponse chatClientResponse) {
        logger.debug("response: {}", chatClientResponse);
    }
}
```

**Notes:**
- `getName()` provides a unique name for the advisor
- `getOrder()` controls execution order (lower values execute first)
- `ChatClientMessageAggregator` aggregates the Flux responses so the entire response can be logged once (a read-only operation that does not alter the stream)

## Re-Reading (Re2) Advisor

Based on the paper "Re-Reading Improves Reasoning in Large Language Models", this technique augments the input prompt by repeating the query:
```
{Input_Query}
Read the question again: {Input_Query}
```

### Implementation
```java
public class ReReadingAdvisor implements BaseAdvisor {

    private static final String DEFAULT_RE2_ADVISE_TEMPLATE = """
            {re2_input_query}
            Read the question again: {re2_input_query}
            """;

    private final String re2AdviseTemplate;
    private int order = 0;

    public ReReadingAdvisor() {
        this(DEFAULT_RE2_ADVISE_TEMPLATE);
    }

    public ReReadingAdvisor(String re2AdviseTemplate) {
        this.re2AdviseTemplate = re2AdviseTemplate;
    }

    @Override
    public ChatClientRequest before(ChatClientRequest chatClientRequest, 
                                     AdvisorChain advisorChain) {
        String augmentedUserText = PromptTemplate.builder()
            .template(this.re2AdviseTemplate)
            .variables(Map.of("re2_input_query", 
                chatClientRequest.prompt().getUserMessage().getText()))
            .build()
            .render();

        return chatClientRequest.mutate()
            .prompt(chatClientRequest.prompt().augmentUserMessage(augmentedUserText))
            .build();
    }

    @Override
    public ChatClientResponse after(ChatClientResponse chatClientResponse, 
                                     AdvisorChain advisorChain) {
        return chatClientResponse;
    }

    @Override
    public int getOrder() {
        return this.order;
    }

    public ReReadingAdvisor withOrder(int order) {
        this.order = order;
        return this;
    }
}
```

**Notes:**
- `before()` augments the user's input query with the Re-Reading technique
- `getOrder()` controls execution order (lower values execute first)
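
To see what the augmented prompt looks like, here is a minimal sketch that renders the same template with plain string replacement. `Re2Template` is a hypothetical helper; the advisor itself uses `PromptTemplate`.

```java
final class Re2Template {

    static final String TEMPLATE = """
            {re2_input_query}
            Read the question again: {re2_input_query}
            """;

    // Substitutes the user's text for every occurrence of the placeholder.
    static String render(String template, String userText) {
        return template.replace("{re2_input_query}", userText);
    }

    public static void main(String[] args) {
        System.out.print(render(TEMPLATE, "What is 2+2?"));
    }
}
```

The user's question appears twice in the rendered text, which is the whole point of the technique.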

# Summary

## What We Learned

**Chat Memory solves the stateless problem** - LLMs don't remember previous messages, so Spring AI manages the conversation context for you

**Two simple components:**
- `ChatMemory` - decides what to keep
- `ChatMemoryRepository` - stores and retrieves

**Ready-to-use solutions:**
- Multiple storage options (InMemory, JDBC, MongoDB, etc.)
- Built-in advisors for chat clients
- Easy integration with ChatClient API

**Extensible design:**
- Create custom advisors for your needs
- Chain multiple advisors together
- Control execution order

## Key Takeaway

Spring AI makes conversational AI simple - focus on your application logic, let the framework handle the memory.

---

**Thank you!**