# 1. Project LIDA

LIDA is a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. matplotlib, seaborn, altair, d3) and supports multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Huggingface). Details on the components of LIDA are described in [this paper](https://arxiv.org/abs/2303.02927). Star [this project](https://aka.ms/lida/github) for updates.

LIDA _treats visualizations as code_ and provides a clean API for generating, executing, editing, explaining, evaluating, and repairing visualization code. Here are some tasks you can perform with LIDA:

- ✅ Data Summarization
- ✅ Goal Generation
- ✅ Visualization Generation
- ⬜️ Visualization Editing
- ✅ Visualization Explanation
- ⬜️ Visualization Evaluation and Repair
- ✅ Visualization Recommendation
- ⬜️ Infographic Generation (beta; requires `pip install lida[infographics]`)

![LIDA Modules illustrated](https://github.com/microsoft/lida/raw/main/docs/images/lidamodules.jpg)
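
To make "visualizations as code" concrete, the sketch below treats a chart as a plain Python string that can be executed and, when execution fails, yields an error message that a repair step could use. This is an illustration only, not LIDA's internals: `run_chart_code`, the `plot(data)` convention, and the shape of the returned dictionary are all assumptions, and the "chart" is a text bar chart so the example needs no plotting library.

```python
import traceback

def run_chart_code(code: str, data: list[dict]) -> dict:
    """Execute chart code that defines plot(data); report success or the error.

    Hypothetical helper for illustration; not part of the LIDA API.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)            # define plot() in an isolated namespace
        chart = namespace["plot"](data)  # run it against the dataset
        return {"status": True, "chart": chart, "error": None}
    except Exception:
        # The traceback text is what a repair step could hand back to a model.
        return {"status": False, "chart": None, "error": traceback.format_exc()}

# Chart code as it might arrive from a model: just a string.
GOOD_CODE = '''
def plot(data):
    lines = [f"{row['team']:<8}{'#' * row['wins']}" for row in data]
    return "\\n".join(lines)
'''

# The same code with a wrong column name, to show the failure path.
BAD_CODE = GOOD_CODE.replace("row['wins']", "row['victories']")

data = [{"team": "A", "wins": 3}, {"team": "B", "wins": 5}]  # made-up rows
ok = run_chart_code(GOOD_CODE, data)
failed = run_chart_code(BAD_CODE, data)
print(ok["chart"])
print(failed["status"], "KeyError" in failed["error"])
```

Because the chart is code, the same string can be stored, diffed, edited by instruction, or re-run after a fix, which is what makes the generate, execute, and repair operations possible on one artifact.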

## 1. Data Summarization
Given a dataset, generate a compact natural language summary that serves as context for subsequent tasks. The goal of the summarizer is to _produce a dense-but-compact information summary for a given dataset that is useful as grounding context for visualization tasks_. Grounding context here means the information an analyst would need to understand the dataset and the tasks that can be performed on it.

See the [paper](https://arxiv.org/pdf/2303.02927.pdf) for details.
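
As a rough picture of what such a grounding summary can contain, the sketch below builds a small per-column summary using only the standard library. It is not LIDA's summarizer: `summarize_csv`, the field names, and the inline sample rows are assumptions made for illustration, and the actual output of `lida.summarize` may be structured differently.

```python
import csv
import io

def summarize_csv(text: str, n_samples: int = 2) -> dict:
    """Build a compact per-column summary of CSV text (illustrative only)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    fields = []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        try:
            # Treat the column as numeric only if every value parses as a float.
            numbers = [float(v) for v in values]
            props = {"dtype": "number", "min": min(numbers), "max": max(numbers)}
        except ValueError:
            props = {"dtype": "string"}
        props["num_unique_values"] = len(set(values))
        props["samples"] = values[:n_samples]
        fields.append({"column": name, "properties": props})
    return {"row_count": len(rows), "fields": fields}

# Made-up rows; not taken from the IPL-2022 dataset.
SAMPLE = """team,toss_decision,runs
A,bat,180
B,field,165
A,field,172
"""

toy_summary = summarize_csv(SAMPLE)
for field in toy_summary["fields"]:
    print(field["column"], ":", field["properties"])
```

A summary like this stays small regardless of how many rows the dataset has, which is what lets it fit in a model prompt as context for the later steps.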

In [None]:
# Setup
from lida import Manager, TextGenerationConfig, llm

csvfile = "./../data/kaggle/IPL-2022.csv"
lida = Manager(text_gen=llm("openai"))  # other providers include "palm" and "cohere"
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-3.5-turbo-0301", use_cache=True)

In [None]:
# Summarize the dataset and print each entry of the summary
summary = lida.summarize(csvfile)
for key, value in summary.items():
    print(key, ":", value)

## 2. Goal Generation

Given the dataset "context" generated by the summarizer, the LLM must now _generate a question (hypothesis), a visualization (that addresses the question) and a rationale (for that visualization)_. The research found that requiring the LLM to produce a rationale led to more semantically meaningful goals.

The goal generation API takes the summary, the number of goals to generate (`n`), an optional persona that influences the tone or context of the generated goals, and a `textgen_config` that sets parameters for the given model.

See the [paper](https://arxiv.org/pdf/2303.02927.pdf) for details.
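
The sketch below shows one way the parameters described above could be combined into a prompt, and how a returned goal can be represented. The `question`, `visualization`, and `rationale` fields match the attributes read from each goal in the cells that follow; everything else (`GoalSketch`, `build_goal_prompt`, the prompt wording, and the default persona text) is hypothetical and is not LIDA's actual prompt or class.

```python
import json
from dataclasses import dataclass

@dataclass
class GoalSketch:
    """Illustrative stand-in for a generated goal (not LIDA's own class)."""
    question: str
    visualization: str
    rationale: str

def build_goal_prompt(summary: dict, n: int = 5, persona: str = "") -> str:
    """Assemble a goal-generation prompt from the summary, n, and persona."""
    persona = persona or "a data analyst exploring the dataset"  # assumed default
    return (
        f"Generate {n} goals for the persona: {persona}.\n"
        "Each goal needs a question, a visualization, and a rationale.\n"
        f"Dataset summary:\n{json.dumps(summary, indent=2)}"
    )

toy_summary = {"name": "matches.csv", "fields": ["team", "toss_decision"]}  # made up
prompt = build_goal_prompt(toy_summary, n=2, persona="a fan of team A")
print(prompt)

# What a parsed response could look like once the model replies.
goal = GoalSketch(
    question="How often does each team choose to bat after winning the toss?",
    visualization="bar chart of toss_decision counts by team",
    rationale="Comparing counts by team shows each team's toss preference.",
)
print(goal.question)
```

Asking for the rationale alongside the question and visualization mirrors the observation above that a required rationale leads to more semantically meaningful goals.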

In [None]:
# Generate 5 goals from the summary, using the persona of a Mumbai team fan
goals = lida.goals(summary, n=5, textgen_config=textgen_config, persona="fan of the Mumbai team who wants to see their stats")

# Display each generated goal
for goal in goals:
    display(goal)

In [None]:
# Generate 10 goals from the summary with the default persona
goals = lida.goals(summary, n=10, textgen_config=textgen_config)

# create a list of dictionaries containing the goal information
import pandas as pd
goal_list = []
for goal in goals:
    goal_dict = {'Question': goal.question, 'Visualization': goal.visualization, 'Rationale': goal.rationale}
    goal_list.append(goal_dict)
df = pd.DataFrame(goal_list)
display(df)

## 3. Visualization Generation

In [None]:
# Visualize a goal
charts = lida.visualize(summary=summary, goal=goals[0])
print("Charts length:", len(charts))
charts[0]

In [None]:
# Visualize a goal and specify a library
target = goals[2]
library = "matplotlib"
charts = lida.visualize(summary=summary, goal=target, library=library)
charts[0]

In [None]:
# Visualize it again with a different library and textgen_config (lower temperature)
target = goals[2]
library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=target, library=library, textgen_config=textgen_config)
charts[0]

In [None]:
# Use a natural language user query instead of a pre-formulated goal
user_query = "What is the frequency of toss decisions based on team?"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=user_query, textgen_config=textgen_config)
charts[0]

## 4. Visualization Explanation

In [None]:
# Explain visualization
explanation = lida.explain(code=charts[0].code)
for obj in explanation[0]:
    display(obj)


In [None]:
# Edit visualization: modify the chart using natural language instructions
# (left commented out because the model's token limit is insufficient to run this)
# instructions = ["change the color to green", "translate the title to french"]
# edited_charts = lida.edit(code=charts[0].code, summary=summary, instructions=instructions)

## 5. Visualization Recommendation

In [None]:
# Recommend 3 visualizations based on the current chart's code
recommendations = lida.recommend(code=charts[0].code, summary=summary, n=3, textgen_config=textgen_config)

for chart in recommendations:
    display(chart) 

In [None]:
# Visualize a user query that also carries styling instructions
user_query = "Who won the most cricket games? Use a colorful palette. Show the x-axis labels in a vertical orientation. Increase height of chart by 15%"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=user_query, textgen_config=textgen_config, library="matplotlib")
charts[0]