google/ragvis

RAGvis: Reliable and Cost-Effective Exploratory Data Analysis via Retrieval-Augmented Generation

This repository contains the official implementation of our EMNLP'25 paper: "Reliable and Cost-Effective Exploratory Data Analysis via Graph-Guided RAG".

RAGvis is a novel, two-stage Retrieval-Augmented Generation (RAG) framework designed to automate Exploratory Data Analysis (EDA). It addresses the limitations of approaches that rely on a Large Language Model (LLM) alone, which can struggle with accuracy and reliability, particularly on private or less common datasets.

RAGvis Framework Diagram


How It Works

RAGvis operates in two primary stages:

  1. Offline Knowledge Graph Semantic Enrichment: A knowledge graph is first built from a large collection of EDA notebooks and then enriched with structured EDA semantics. This enrichment is guided by an LLM using an empirically developed taxonomy of EDA operations.

  2. Online EDA Notebook Generation: When presented with a new, unseen dataset, RAGvis performs the following steps:

    • Retrieves relevant EDA operations from the knowledge graph.
    • Aligns these retrieved operations with the structure of the new dataset.
    • Refines the aligned operations through LLM reasoning.
    • Generates and verifies executable Python code using a self-correcting agent.
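
The two stages can be illustrated with a toy, stdlib-only Python sketch. It is not the RAGvis implementation, which has not been released. The taxonomy, the code templates and every function name (`build_graph`, `retrieve`, `align`, `fix`, `run_with_repair`, `generate_notebook`) are hypothetical.

Several components are replaced by simple stand-ins:

  • A frequency-counted dictionary stands in for the knowledge graph.
  • A keyword lookup stands in for LLM-guided enrichment.
  • Column-type matching stands in for retrieval and alignment.
  • The LLM refinement step is omitted.
  • A rule-based `fix` function stands in for the agent's repair step.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical toy taxonomy; the paper's empirically developed taxonomy is not reproduced here.
TAXONOMY = {"describe": "summary_statistics", "value_counts": "distribution",
            "null_count": "data_quality"}

# Hypothetical code templates; RAGvis generates code with an LLM instead.
TEMPLATES = {
    "describe": "result = (min(df[{col!r}]), max(df[{col!r}]))",
    "value_counts": "result = Counter(df[{col!r}]).most_common(3)",
    "null_count": "result = sum(v is None for v in df[{col!r}])",
}

@dataclass(frozen=True)
class Operation:
    call: str          # operation name observed in a notebook
    column_type: str   # "numeric" or "categorical"
    category: str      # taxonomy label attached during enrichment

def build_graph(notebooks):
    """Stage 1 stand-in: count enriched operation nodes across notebooks.

    A dictionary lookup replaces the LLM-guided semantic labeling.
    """
    graph = defaultdict(int)
    for notebook in notebooks:
        for call, col_type in notebook:
            graph[Operation(call, col_type, TAXONOMY.get(call, "other"))] += 1
    return graph

def column_type(values):
    """Classify a column as numeric or categorical, ignoring missing values."""
    present = [v for v in values if v is not None]
    return "numeric" if all(isinstance(v, (int, float)) for v in present) else "categorical"

def retrieve(graph, col_type, k=2):
    """Return the k most frequent operations recorded for this column type."""
    matches = [op for op in graph if op.column_type == col_type]
    return sorted(matches, key=lambda op: graph[op], reverse=True)[:k]

def align(graph, df):
    """Bind retrieved operations to concrete columns of the new dataset."""
    return [(op.call, col) for col, values in df.items()
            for op in retrieve(graph, column_type(values))]

def fix(code, error):
    """Stand-in for the agent's LLM repair step: drop missing values on a TypeError."""
    if isinstance(error, TypeError):
        drop_missing = "df = {k: [v for v in vals if v is not None] for k, vals in df.items()}\n"
        return drop_missing + code
    return None  # no repair known for this error

def run_with_repair(code, df, max_attempts=2):
    """Execute generated code; if it raises, request a revision and retry."""
    for _ in range(max_attempts):
        namespace = {"df": df, "Counter": Counter}
        try:
            exec(code, namespace)
            return code, namespace["result"]
        except Exception as error:
            code = fix(code, error)
            if code is None:
                break
    return None, None  # verification failed; the cell is discarded

def generate_notebook(graph, df):
    """Stage 2 stand-in: retrieve, align, then generate and verify each cell."""
    cells = []
    for call, col in align(graph, df):
        if call not in TEMPLATES:
            continue
        code, result = run_with_repair(TEMPLATES[call].format(col=col), df)
        if code is not None:
            cells.append((code, result))
    return cells

# Usage: two tiny "notebooks" and a new dataset with a missing value.
notebooks = [
    [("describe", "numeric"), ("null_count", "numeric"), ("value_counts", "categorical")],
    [("describe", "numeric"), ("value_counts", "categorical")],
]
graph = build_graph(notebooks)
df = {"age": [34, None, 51, 29], "city": ["Paris", "Oslo", "Paris", "Rome"]}
for code, result in generate_notebook(graph, df):
    print(code.splitlines()[-1], "->", result)
```

In this example, the first generated cell raises a `TypeError` because `age` contains a missing value. The stand-in fixer prepends a line that drops missing values, and the retry succeeds. This is a greatly simplified version of the generate-execute-repair loop that a self-correcting agent would run.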

Code and Datasets Coming Soon!


This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
