RAGvis: Reliable and Cost-Effective Exploratory Data Analysis via Retrieval-Augmented Generation

This repository contains the official implementation of our EMNLP'25 paper: "Reliable and Cost-Effective Exploratory Data Analysis via Graph-Guided RAG".

RAGvis is a novel, two-stage Retrieval-Augmented Generation (RAG) framework designed to automate Exploratory Data Analysis (EDA). It is built to address the limitations of Large Language Model (LLM)-only approaches, which can struggle with accuracy and reliability, particularly on private or less-common datasets.

How It Works

RAGvis operates in two primary stages:

Offline Knowledge Graph Semantic Enrichment: A knowledge graph is first built from a large collection of EDA notebooks. An LLM, guided by an empirically developed taxonomy of EDA operations, then enriches the graph with structured EDA semantics.
Online EDA Notebook Generation: When presented with a new, unseen dataset, RAGvis performs the following steps:
- Retrieves relevant EDA operations from the knowledge graph.
- Aligns these retrieved operations with the structure of the new dataset.
- Refines the aligned operations through LLM reasoning.
- Generates and verifies executable Python code using a self-correcting agent.
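
The four online steps above can be illustrated with a minimal, self-contained sketch. This is not the RAGvis implementation (the code has not been released yet). Every name here (`Operation`, `GRAPH`, `retrieve`, `align`, `refine`, `run_with_repair`, `toy_repair`) is hypothetical. The knowledge graph is reduced to a two-entry list, and the LLM calls are replaced by simple stand-in functions. One template contains a deliberate bug so that the self-correction loop has something to fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical stand-in for one EDA operation stored in the knowledge graph."""
    name: str
    kind: str      # column kind the operation applies to
    template: str  # code template with a {col} slot; must assign `result`


# Toy "knowledge graph". The first template calls the undefined name
# `length` on purpose, to exercise the repair loop below.
GRAPH = [
    Operation("mean", "numeric",
              "result = sum(data[{col!r}]) / length(data[{col!r}])"),
    Operation("n_unique", "categorical",
              "result = len(set(data[{col!r}]))"),
]


def column_kinds(data):
    """Classify each column as 'numeric' or 'categorical' from its values."""
    return {
        col: "numeric" if all(isinstance(v, (int, float)) for v in vals)
        else "categorical"
        for col, vals in data.items()
    }


def retrieve(graph, kinds):
    """Step 1: keep operations whose column kind occurs in the dataset."""
    present = set(kinds.values())
    return [op for op in graph if op.kind in present]


def align(ops, kinds):
    """Step 2: bind each operation to every column of the matching kind."""
    return [(op, col) for op in ops
            for col, kind in kinds.items() if kind == op.kind]


def refine(pairs):
    """Step 3: placeholder for LLM reasoning; here it passes pairs through."""
    return list(pairs)


def toy_repair(code, error):
    """Stand-in for an LLM repair call: fixes the one known mistake."""
    return code.replace("length(", "len(")


def run_with_repair(code, data, repair, max_attempts=3):
    """Step 4: execute the code; on failure, ask `repair` for a fix and retry."""
    for _ in range(max_attempts):
        scope = {"data": data}
        try:
            exec(code, scope)
            return code, scope["result"]
        except Exception as err:
            code = repair(code, err)
    raise RuntimeError("could not produce working code")


def generate_notebook(data, graph=GRAPH, repair=toy_repair):
    """Run all four steps and return (operation, column, code, result) cells."""
    kinds = column_kinds(data)
    pairs = refine(align(retrieve(graph, kinds), kinds))
    cells = []
    for op, col in pairs:
        code, result = run_with_repair(op.template.format(col=col), data, repair)
        cells.append((op.name, col, code, result))
    return cells


data = {"age": [30, 40, 50], "city": ["Oslo", "Rome", "Oslo"]}
cells = generate_notebook(data)
for name, col, code, result in cells:
    print(f"{name}({col}) = {result}    # {code}")
```

In this sketch the `mean` cell fails on its first attempt with a `NameError`, is rewritten by `toy_repair`, and succeeds on the second attempt. A real system would presumably pass the failing code and error message to an LLM instead of applying a fixed string replacement.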

Code and Datasets Coming Soon!

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

