Using large language models for exploratory data analysis

rawar/ix-eda-llm

This repository contains examples of exploratory data analysis using large language models (LLMs). Open source tools such as LangChain, Vanna.ai, Pandas.ai and Cube are used to query data structures in natural language, largely removing the need to write SQL or pandas DataFrame syntax by hand.

The paper "LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models" from Microsoft Research shows how metadata and a schema description, combined with efficient prompting, can be used to query data and even visualize it. It makes the basic procedure easy to follow: the data is put in context (schema, DDL, samples, etc.) and that context is passed to the language model in the prompt.
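The "data in context" step can be illustrated with a small sketch. It is not taken from LIDA or from the notebooks in this repository; it only assumes the general idea of collecting the DDL and a few sample rows and placing them in front of the question. The `products` table is a hypothetical miniature stand-in rather than the real Northwind schema, `build_prompt` is an illustrative helper name, and an in-memory SQLite database is used so the example runs without any setup.

```python
import sqlite3

# Hypothetical miniature stand-in for one table (not the real Northwind schema).
DDL = """CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    unit_price REAL
)"""


def build_prompt(conn, question, sample_rows=2):
    """Assemble schema (DDL), a few sample rows and the question into one prompt."""
    parts = ["You translate questions into SQLite SQL.", "", "Schema:"]
    # SQLite keeps the original CREATE statements in the sqlite_master table.
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        parts.append(ddl)
        cur = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,))
        columns = [d[0] for d in cur.description]
        parts.append(f"Sample rows from {name} ({', '.join(columns)}):")
        parts.extend(repr(row) for row in cur.fetchall())
    parts += ["", f"Question: {question}", "SQL:"]
    return "\n".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Chai", 18.0), (2, "Chang", 19.0), (3, "Aniseed Syrup", 10.0)],
)

prompt = build_prompt(conn, "Which product is the most expensive?")
print(prompt)
```

The resulting string would then be sent to a language model of your choice, and the SQL it returns executed against the database. That call is left out here because it depends on the provider and library used.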

All examples use the Northwind database as a basis.

The following files can be found in this repository:

The Cube notebook can only be run locally. It requires Docker and the docker-compose file, which builds a local PostgreSQL instance with the Northwind database and the Cube application.
