# Exploring GenAI in Research and Evaluation

## Using Large Language Models (LLMs) to make sense of large volumes of unstructured qualitative data: building a local RAG

[Simone Lombardini](https://www.simonelombardini.com/home-page) - November 2025


As artificial intelligence continues to evolve, researchers and evaluators are increasingly asking how these tools can enhance their work without compromising rigour and ethics. Over the past months, I've been exploring how generative AI tools can be applied to evaluation practice.

In this short post, I’ll share my experience in using Large Language Models (LLMs) to make sense of large volumes of unstructured qualitative data.

In particular, I’ve asked myself:
1) How can I use Large Language Models (LLMs) to interpret large amounts of unstructured qualitative data? 
2) How can I do this while maintaining data security and confidentiality of the information in the text?
   
Here's what I've learned.


<b><u> What is a Large Language Model?

Large Language Models (LLMs), like GPT-4, Claude, and Gemini, are AI systems that can read, summarise, analyse, and write text across a wide range of topics and formats. LLMs can answer questions, summarise documents, analyse and explain complex topics; basically, they work with language in surprisingly human-like ways. 

These models are trained on vast amounts of text, from which they learn patterns in how language works: grammar, facts, reasoning, and writing styles. Their knowledge depends on the information they’ve been trained on or the information they access from the web.

In my example, I wanted to explore the use of LLMs to query large amounts of qualitative data collected from Focus Group Discussions (FGDs). In particular, I wanted to make sure the responses were based on the actual information contained in the FGD transcripts.



<b><u> How do I add my own data to these LLMs? 

This can be done in different ways. You can:
- Retrain a model;  
- Provide documents as a context;
- Build a Retrieval-Augmented Generation (RAG) system.

Retraining a model involves deep, permanent learning of the new material. However, it also requires significant computing power and time, and the retrained model becomes outdated as soon as new information is added.

Providing documents as context requires sharing information with an external system. It therefore requires access to an AI assistant that guarantees data privacy and integrity. I will cover this in a separate post.

I will focus here on the third option, building a RAG system. 

<b><u> What is a RAG?

RAG stands for <b>Retrieval-Augmented Generation</b>. It's a technique that combines information retrieval with AI text generation to enhance the accuracy and knowledge of language models.

Instead of relying solely on the knowledge baked into a language model during training, RAG works in two steps. First, when you ask a question, the system searches through a database or collection of documents to find relevant information (<b>retrieval</b>). Second, the LLM uses both your question and the retrieved information to generate a response (<b>generation</b>).

RAG addresses several limitations of standard language models. It provides <b>up-to-date information</b> by connecting the model to specific documents rather than limiting it to its training data. It <b>reduces hallucinations</b> by grounding responses in actual retrieved documents. It makes answers <b>domain-specific</b> by giving the model access to specialised information included in documents, technical manuals, or research papers.
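To make the two steps concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: retrieval is done by counting shared words rather than by embeddings, the sample sentences are invented, and `fake_llm` is a hypothetical stand-in that returns the prompt instead of calling a real model.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Step 1 (retrieval): rank documents by the number of words shared with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def answer(question: str, documents: List[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Step 2 (generation): send the question plus the retrieved text to the model."""
    context = "\n".join(retrieve(question, documents, k))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)


# Invented example sentences, standing in for transcript excerpts.
documents = [
    "Participants said the water point is too far from the village.",
    "The harvest festival takes place every October.",
    "Several women reported that the water point is often broken.",
]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a language model: it simply returns the prompt."""
    return prompt


question = "What problems did participants report with the water point?"
print(answer(question, documents, fake_llm))
```

Running this prints the prompt that a real model would receive: the two water-point sentences followed by the question, with the unrelated festival sentence left out.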



<b><u> How do I ensure that no information in my transcript documents is ever shared with external service providers?

While a RAG can be implemented by calling an API to external providers, I wanted to make sure that <b>no information was ever shared with external service providers</b>. This is to <b>ensure the security and integrity</b> of the information contained in the transcripts. 

I therefore set up a <b>local RAG system</b>. The main advantage of a local RAG is that all the documents and data stay on my machine, and nothing is sent to external servers. Once set up, it even works offline and incurs no API costs or usage fees. This approach is particularly popular with businesses handling confidential data and individuals who prioritise privacy.

The main disadvantage, however, is that the size of the model you can run is limited by the hardware available on your machine. Without any hardware investment, I was able to run only small models, which may be less capable than cloud-based options like GPT-4 or Claude.

<b><u> How does a local RAG system work? 

This usually involves: 
1) <b>Local language model</b>: Open-source models that you run on your own hardware (like Llama, Mistral, or Phi); 
2) <b>Vector database</b>: Tools to store your document embeddings locally (like ChromaDB, FAISS, or Qdrant); 
3) <b>Document processing</b>: Scripts to chunk and embed your documents into the vector database; 
4) <b>Orchestration</b>: Frameworks like LangChain or LlamaIndex to connect everything together.


<b> How do I run an LLM on my own machine?</b>

To do this, I used [Ollama](https://ollama.com/). Ollama is a free and open-source AI platform that allows users to run LLMs directly on their devices. It allows you to download and install several different LLMs. Each model has its own advantages and disadvantages. The choice of which model to use depends upon several factors. I will explore in a separate post how the different models I used performed. 
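As an illustration, once Ollama is running it exposes a local HTTP API (by default on port 11434), so a model can be queried from a few lines of Python. The sketch below is not the setup used in this post: the model name `llama3.2` is only a placeholder for whichever model you have pulled, and the helper names are my own.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build the HTTP request for Ollama's generate endpoint.

    The model name is a placeholder: use any model you have downloaded.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llm(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


# Example call (requires Ollama to be running with the model available):
# print(ask_local_llm("Summarise in one sentence what a focus group discussion is."))
```

Because the request goes to `localhost`, the prompt and the answer never leave the machine.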



<b> What is a vector database?

A vector database is a specialized database designed to store and search embeddings. While traditional databases are great at exact matches (for example, find all users where name = 'John'), they can't efficiently answer questions such as "<i>Find examples of challenges raised by the participants in the focus group discussion</i>". Vector databases solve this problem.

When you ask a question, it is first converted into an <b>embedding</b> by the embedding model. The vector database then calculates the distance between that vector and the stored vectors and returns the closest matches. Think of it like: "<i>Find the five documents whose meaning is closest to my question</i>".

Popular vector databases for local/self-hosted RAGs include <b>ChromaDB</b> (simple, popular for local RAG), <b>FAISS</b> (a similarity-search library from Facebook/Meta, very fast), <b>Qdrant</b> (feature-rich with good performance) and <b>Weaviate</b> (open-source with lots of features).

In my example, I have used ChromaDB. 
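To show the idea behind "find the closest matches", here is a toy in-memory vector store in Python. It is a sketch of the principle, not of how ChromaDB is implemented: the class name `TinyVectorStore`, the two-dimensional vectors and the example sentences are all invented for illustration, and real vector databases typically use specialised indexes rather than comparing the query with every stored vector.

```python
import math
from typing import List, Tuple


class TinyVectorStore:
    """Toy vector store: keeps (text, vector) pairs in a plain Python list."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: List[float]) -> None:
        """Store a piece of text together with its embedding vector."""
        self._items.append((text, vector))

    def query(self, vector: List[float], k: int = 5) -> List[str]:
        """Return the k texts whose vectors are closest (Euclidean distance) to the query."""
        ranked = sorted(self._items, key=lambda item: math.dist(item[1], vector))
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
# Hand-made 2-d vectors; a real embedding model would produce hundreds of dimensions.
store.add("Transport costs are too high", [0.9, 0.1])
store.add("The training was well organised", [0.1, 0.9])
store.add("Roads flood in the rainy season", [0.8, 0.3])

# Pretend this vector is the embedding of a question about challenges.
print(store.query([1.0, 0.0], k=2))
```

With these made-up vectors, the query returns the two sentences about transport and roads, because their vectors sit closest to the query vector.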



<b> What are the embeddings?

Embeddings are a way to convert text (or images, audio, etc.) into lists of numbers that capture the <i>meaning</i> of that content. These numbers allow computers to understand and compare semantic similarity. Instead of just seeing words as text, embeddings represent them as points in a multi-dimensional space. Words or phrases with similar meanings end up close together in this space. Embeddings let AI understand that "happy" is closer to "joyful" than to "sad", something keyword matching could never do.
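A common way to measure how close two embeddings are is cosine similarity. The sketch below uses invented three-dimensional vectors purely for illustration; real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: near 1 means similar direction, near -1 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, not the output of any real embedding model.
toy_embeddings = {
    "happy": [0.9, 0.8, 0.1],
    "joyful": [0.85, 0.75, 0.2],
    "sad": [-0.8, -0.6, 0.1],
}

happy_vs_joyful = cosine_similarity(toy_embeddings["happy"], toy_embeddings["joyful"])
happy_vs_sad = cosine_similarity(toy_embeddings["happy"], toy_embeddings["sad"])

print(f"happy vs joyful: {happy_vs_joyful:.2f}")
print(f"happy vs sad:    {happy_vs_sad:.2f}")
```

With these toy values, "happy" scores much higher against "joyful" than against "sad", which is the behaviour a good embedding model aims to reproduce for real text.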

Actual embeddings typically have:
- <b>384 dimensions</b>: smaller, faster models;
- <b>768 dimensions</b>: a common size (BERT, sentence-transformers);
- <b>1536 dimensions</b>: OpenAI's embeddings.

In my example, I have chosen <b>nomic-embed-text</b>, which is an open-source embedding model created by Nomic AI. It's become quite popular for local RAG systems because it offers excellent performance while being fully open and runnable on your own hardware.



<b> What is LangChain?</b>

LangChain is an open-source framework that helps developers build applications powered by large language models (LLMs). Think of it as a toolkit that simplifies connecting AI models to other data sources and tools.

For this step I actually used [LangFlow](https://www.langflow.org/). LangFlow is essentially a graphical interface for LangChain. It lets you drag components around on a visual canvas and generates the LangChain code behind the scenes.


<b><u> What does a local RAG actually look like in LangFlow?</u></b>

Once I had downloaded LangFlow, I built the local RAG, which has two main components: a document ingestion component and a query/response component.

The <u>document ingestion</u> component includes:
- <b>File loader</b>: Points to text file(s);
- <b>Text splitter</b>: Breaks documents into smaller chunks of 1000 words;
- <b>Embedding model</b>: Converts text to vectors (nomic-embed-text);
- <b>Vector store</b>: Stores embeddings (ChromaDB).


![rag_2](image/rag_2.png)
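The text-splitter step in the ingestion flow can be sketched in plain Python. This is an illustration of the idea rather than LangFlow's own splitter: the function name is hypothetical, and the 100-word overlap between consecutive chunks is my assumption (a small overlap is a common way to avoid cutting a thought in half at a chunk boundary).

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words (an assumed setting, not taken from the post).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


# Example with a synthetic 2,500-word "transcript".
transcript = " ".join(f"word{i}" for i in range(2500))
chunks = split_into_chunks(transcript)
print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

Each chunk would then be passed to the embedding model and stored in the vector database alongside its vector.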

The <u>query/response</u> component includes:
- <b>Chat input</b>: Allows the user to ask questions;
- <b>Retriever</b>: Searches your local vector database (ChromaDB);
- <b>Prompt</b>: Structures how the question and the retrieved text are sent to the LLM;
- <b>Ollama</b>: Runs the LLM locally;
- <b>Output node</b>: Displays the answers.


![rag_1](image/rag_1.png)
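The prompt step is where the retrieved chunks and the user's question are combined into a single piece of text for the LLM. The template below is a hypothetical example, not the exact prompt used in my flow; the wording and the function name are assumptions for illustration.

```python
from typing import List

# Hypothetical prompt template; adapt the wording to your own data and questions.
PROMPT_TEMPLATE = """You are analysing focus group discussion transcripts.
Answer the question using only the excerpts below. If the excerpts do not
contain the answer, say that the transcripts do not cover it.

Excerpts:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, chunks: List[str]) -> str:
    """Number each retrieved chunk and insert it, with the question, into the template."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Invented excerpts, standing in for chunks returned by the retriever.
example_chunks = [
    "Participants said the clinic is a two-hour walk away.",
    "Several people mentioned that medicines are often out of stock.",
]
print(build_prompt("What challenges did participants raise?", example_chunks))
```

Instructing the model to rely only on the supplied excerpts is one way to keep answers tied to what was actually said in the transcripts.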

<b><u> How do you run LangFlow?

It is advised to run LangFlow in a virtual environment. To do this, open a Terminal and run: