# LIDA - Automatic Generation of Visualizations and Infographics using Large Language Models

LIDA is a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. matplotlib, seaborn, altair, d3) and works with multiple large language model providers (OpenAI, PaLM, Cohere, Hugging Face). The components of LIDA are described in the [paper](https://arxiv.org/abs/2303.02927). See the [project page](https://microsoft.github.io/lida/) for updates.


## LIDA with Hugging Face models
This tutorial focuses on using LIDA with open-source models from Hugging Face.


## Getting Started | Installation
If you intend to use LIDA with local Hugging Face models, you will need to install the `transformers` library.
```bash
pip install "lida[transformers]"
```
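
Before loading a local model, it can help to confirm that the optional dependencies are importable. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical and not part of LIDA, and the package list is an assumption based on what this tutorial imports.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Packages this tutorial relies on; adjust the list to your setup.
required = ["lida", "transformers", "torch"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```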

## The LIDA Python API

LIDA offers a manager class that exposes the core functionality of the LIDA system. This tutorial shows how to use the manager class to create visualizations from a dataset using an open-source model from Hugging Face.


### Open source LLM Backends
By default, LIDA uses the `openai` backend, but in this tutorial we use an LLM from Hugging Face as the backend. First, we set up the backend by passing a text generator to the `text_gen` parameter of the `Manager` class. For a list of supported models and how to configure them, see the [llmx documentation](https://github.com/victordibia/llmx).

In [1]:
import warnings

import torch
from lida import Manager, TextGenerationConfig, llm

# Ignore warnings for cleaner output
warnings.filterwarnings('ignore')

# Clear the CUDA cache to release unused GPU memory
torch.cuda.empty_cache()

In [4]:
# Initialize the LIDA Manager with a Hugging Face model as the text generator
text_gen = llm(provider="hf",
               model="mistralai/Mistral-7B-v0.1",
               device_map="auto")

lida_manager = Manager(text_gen=text_gen)

Got to HFTextGenerator


Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:57<00:00, 28.83s/it]
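
Loading a model of this size requires a fair amount of device memory. As a rough back-of-the-envelope check, the sketch below estimates the memory needed for the weights alone at different numeric precisions. The parameter count of about 7e9 is an assumption inferred from the "7B" in the model name, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: roughly 7e9 parameters, inferred from the "7B" in the model name.
n_params = 7e9
for label, width in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gib(n_params, width):.1f} GiB")
```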


In [7]:
# Generate summary from a CSV file
summary = lida_manager.summarize("https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv")
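
To give a feel for the kind of per-column information a dataset summary can capture, here is a minimal standard-library sketch. It is not LIDA's implementation: the helper names, the output keys, and the sample rows are all made up for illustration.

```python
import csv
import io

# Made-up sample rows for illustration only; not taken from cars.csv.
SAMPLE_CSV = """Name,Cyl,Weight
car_a,4,2500
car_b,6,3200
car_c,4,2100
"""


def infer_type(values):
    """Classify a column as 'number' if every value parses as float, else 'string'."""
    try:
        for value in values:
            float(value)
        return "number"
    except ValueError:
        return "string"


def summarize_columns(csv_text):
    """Return one dict per column with its name, inferred type, and unique count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    result = []
    for col in columns:
        values = [row[col] for row in rows]
        result.append({
            "column": col,
            "dtype": infer_type(values),
            "num_unique_values": len(set(values)),
        })
    return result


for field in summarize_columns(SAMPLE_CSV):
    print(field)
```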

In [8]:
# Configure text generation settings for goals
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="Mistral-7B-v0.1", use_cache=True)


In [10]:
# Generate goals based on the summary
goals = lida_manager.goals(summary, n=2, textgen_config=textgen_config)

In [12]:
# Display the generated goals
for goal in goals:
    display(goal)


### Goal 0
---
**Question:** What is the distribution of Cyl?

**Visualization:** `histogram of Cyl`

**Rationale:** This tells about the distribution of Cyl



### Goal 1
---
**Question:** What is the distribution of Weight?

**Visualization:** `histogram of Weight`

**Rationale:** This tells about the distribution of Weight
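
Each goal shown above has a question, a visualization, and a rationale. When several goals are generated, selecting one by keyword can be less brittle than hard-coding an index. The sketch below uses a hypothetical `GoalSketch` stand-in that mirrors the displayed fields; it assumes, without confirmation from the text above, that real LIDA goal objects expose a `question` attribute in the same way.

```python
from dataclasses import dataclass


@dataclass
class GoalSketch:
    """Hypothetical stand-in mirroring the fields shown in the goal output."""
    question: str
    visualization: str
    rationale: str


def find_goal(goals, keyword):
    """Return the index of the first goal whose question mentions keyword, or None."""
    needle = keyword.lower()
    for index, goal in enumerate(goals):
        if needle in goal.question.lower():
            return index
    return None


goals_sketch = [
    GoalSketch("What is the distribution of Cyl?", "histogram of Cyl",
               "This tells about the distribution of Cyl"),
    GoalSketch("What is the distribution of Weight?", "histogram of Weight",
               "This tells about the distribution of Weight"),
]
print(find_goal(goals_sketch, "weight"))  # prints 1
```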


In [None]:
# Visualize data using a specific library and goal
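i = 1  # index of the goal to visualize (Goal 1: distribution of Weight)
library = "seaborn"  # plotting library the generated code should use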
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)

# Generate visualization based on the selected goal
charts = lida_manager.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)

# Display the generated visualization
charts[0]
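
Indexing `charts[0]` raises an `IndexError` when no chart was produced, so a small guard can be useful when saving results. The sketch below assumes each returned chart exposes a base64-encoded image in a `raster` attribute; that attribute name is an assumption about LIDA's chart objects, and `save_first_chart` is a hypothetical helper. A stand-in object is used so the example runs without a model.

```python
import base64
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_first_chart(charts, path):
    """Decode the first chart's base64 raster to a file; return the path or None."""
    if not charts:
        return None  # nothing was generated or executed successfully
    raster = getattr(charts[0], "raster", None)  # assumed attribute name
    if not raster:
        return None
    out = Path(path)
    out.write_bytes(base64.b64decode(raster))
    return out


# Stand-in object for demonstration; a real run would pass the `charts` list.
fake_charts = [SimpleNamespace(raster=base64.b64encode(b"not a real image").decode("ascii"))]
demo_path = Path(tempfile.gettempdir()) / "lida_chart_demo.png"
saved = save_first_chart(fake_charts, demo_path)
print(saved)
```

In the notebook, the equivalent call would be `save_first_chart(charts, "chart.png")`, which returns `None` instead of failing when the list is empty.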