<div id="colab_button">
    <h1>LaVague: Quick-tour guide</h1>
    <a target="_blank" href="https://colab.research.google.com/github/lavague-ai/lavague/blob/main/docs/docs/get-started/quick-tour.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
    </div>

## Introduction

LaVague is an open-source framework that uses AI to turn natural language instructions into executable code that automates UI actions, such as filling in a form.

In this quick tour, we will show you step by step how to set up and use LaVague to perform a few example actions on webpages. At the end of the notebook, we will create and launch a Gradio demo where you can try LaVague interactively.

> Pre-requisites: If you are running the notebook locally, you will need Python (tested on Python >= 3.8) and pip installed.

> Note, this notebook uses remote inference with the HuggingFace API. For local inference, see the [local quick-tour](./local-quick-tour.ipynb) (coming soon).

> If you prefer to run LaVague as a Python script, you can do so by executing the `huggingface_api.py` script in the `gradio_demos` folder. However, you will still need to install the necessary webdriver for Selenium - instructions to do so are detailed in the following step.
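
If you are running locally, a small check like the one below can confirm the prerequisites listed above. It is only a sketch: `MIN_PYTHON`, `python_is_supported`, and `pip_is_available` are illustrative names, not part of LaVague.

```python
import importlib.util
import sys

# Minimum version the notebook was tested on (from the prerequisites above).
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pip_is_available():
    """Return True if the pip module can be found in this environment."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("pip available:", pip_is_available())
```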

## Initial set-up

### Installing driver for Selenium

In this example, we will generate code using [Selenium](https://www.selenium.dev/) to perform user interface actions.

Selenium requires a driver to interface with the chosen browser (Chrome, Firefox, etc.).

We therefore first need to download the Chrome driver.

⚠️ For instructions on how to install a driver on a different OS, [see the Selenium documentation](https://selenium-python.readthedocs.io/installation.html#drivers)

> Note that while we use Selenium for this example, we hope to integrate other automation tools, such as Playwright, at a later date.

In [1]:
# If you are missing any apt packages uncomment and run this command first:
# !sudo apt update

!sudo apt install -y ca-certificates fonts-liberation unzip \
libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 \
libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 \
libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 \
libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 \
libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'libgcc-s1' instead of 'libgcc1'
fonts-liberation is already the newest version (1:1.07.4-11).
libasound2 is already the newest version (1.2.6.1-1ubuntu1).
libasound2 set to manually installed.
libatk-bridge2.0-0 is already the newest version (2.38.0-3).
libatk-bridge2.0-0 set to manually installed.
libatk1.0-0 is already the newest version (2.36.0-3build1).
libatk1.0-0 set to manually installed.
libcairo2 is already the newest version (1.16.0-5ubuntu2).
libcairo2 set to manually installed.
libfontconfig1 is already the newest version (2.13.1-4.2ubuntu5).
libfontconfig1 set to manually installed.
libnspr4 is already the newest version (2:4.32-3build1).
libnspr4 set to manually installed.
libxcb1 is already the newest version (1.14-3ubuntu3).
libxcb1 set to manually installed.
libxcomposite1 is already the newest version (1:0.4.5-1build2).
libxcomposite1 set to manually insta

In [2]:
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip
!wget https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip
!unzip chrome-linux64.zip
!unzip chromedriver-linux64.zip
!rm chrome-linux64.zip chromedriver-linux64.zip

--2024-03-20 12:38:04--  https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chrome-linux64.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.207, 74.125.137.207, 2607:f8b0:4023:c0d::cf
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 149157879 (142M) [application/zip]
Saving to: ‘chrome-linux64.zip’


2024-03-20 12:38:05 (200 MB/s) - ‘chrome-linux64.zip’ saved [149157879/149157879]

--2024-03-20 12:38:05--  https://storage.googleapis.com/chrome-for-testing-public/122.0.6261.94/linux64/chromedriver-linux64.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.207, 74.125.137.207, 2607:f8b0:4023:c0d::cf
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8597995 (8.2M) [application/zip]
Saving to: ‘chromedriver-linux64.
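
Once the download cell has finished, it can be worth confirming that both binaries are where the later cells expect them. The sketch below assumes the archives were unzipped into the current working directory, as in the commands above. `missing_binaries` is a hypothetical helper, not a LaVague function.

```python
import os
from pathlib import Path

# Paths produced by unzipping the two archives in the current directory.
CHROME_PATH = Path("chrome-linux64/chrome")
CHROMEDRIVER_PATH = Path("chromedriver-linux64/chromedriver")


def missing_binaries(paths):
    """Return the paths that are not existing, executable files."""
    return [
        p for p in map(Path, paths)
        if not (p.is_file() and os.access(p, os.X_OK))
    ]


if __name__ == "__main__":
    problems = missing_binaries([CHROME_PATH, CHROMEDRIVER_PATH])
    if problems:
        print("Missing or not executable:", ", ".join(map(str, problems)))
    else:
        print("Chrome and chromedriver look ready.")
```

An empty result means both files exist and are executable. It does not prove that the two versions match, so keep the version numbers in the two download URLs identical.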

### Installing LaVague

We now need to install the LaVague PyPI package. It contains the `ActionEngine` module, which handles the key AI operations, and the `CommandCenter` module, which orchestrates the whole workflow.

In [3]:
!pip install lavague

Collecting lavague
  Downloading lavague-1.0.4.post2-py3-none-any.whl (19 kB)
Collecting llama-index==0.10.19 (from lavague)
  Downloading llama_index-0.10.19-py3-none-any.whl (5.6 kB)
Collecting llama-index-agent-openai==0.1.5 (from lavague)
  Downloading llama_index_agent_openai-0.1.5-py3-none-any.whl (12 kB)
Collecting llama-index-cli==0.1.9 (from lavague)
  Downloading llama_index_cli-0.1.9-py3-none-any.whl (25 kB)
Collecting llama-index-core==0.10.19 (from lavague)
  Downloading llama_index_core-0.10.19-py3-none-any.whl (15.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.3/15.3 MB 59.4 MB/s eta 0:00:00
Collecting llama-index-embeddings-azure-openai==0.1.5 (from lavague)
  Downloading llama_index_embeddings_azure_openai-0.1.5-py3-none-any.whl (3.0 kB)
Collecting llama-index-embeddings-huggingface==0.1.4 (from lavague)
  Downloading llama_index_embeddings_huggingface-0.1.4-py3-none-any.whl (7.7 kB)
Collecting llama-index-embedding
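
After the install completes, you can check which version of the package ended up in your environment. The sketch below uses only the standard library's `importlib.metadata`; `installed_version` is an illustrative helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("lavague")
    print("lavague version:", version or "not installed")
```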

### HuggingFace set-up

⚠️ For remote inference with the Hugging Face Inference API, you will need to provide a HuggingFace user access token with `read` access in the code block below!

> If you don't have a HuggingFace user access token, you can get one for free by creating a HuggingFace account and following the instructions [here](https://huggingface.co/docs/hub/en/security-tokens).

> Alternatively, you can run the notebook in local inference mode (the model will be downloaded and run locally instead of via an API) with our [local quick-tour](./local-quick-tour.ipynb) (coming soon).

In [5]:
# Add your HuggingFace token below. Keep real tokens out of shared or committed notebooks.
HF_TOKEN = "<your-huggingface-token>"

# If you prefer, you can first set HF_TOKEN as an environment variable, or as a secret in Google Colab, and run this code instead
# try:
#   from google.colab import userdata
#   HF_TOKEN = userdata.get('HF_TOKEN')
# except Exception:
#   import os
#   HF_TOKEN = os.environ["HF_TOKEN"]
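
To avoid typing a token into the notebook at all, one option is to read it from the environment and fall back to a hidden prompt. This is a minimal sketch under that assumption. `resolve_hf_token` is a hypothetical helper, and the `env` and `prompt` parameters exist only to make it easy to test.

```python
import getpass
import os


def resolve_hf_token(env=None, prompt=getpass.getpass):
    """Return a token from HF_TOKEN in `env`, else ask for one interactively."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # getpass hides the input, so the token is not echoed or saved in output.
        token = prompt("Hugging Face token: ").strip()
    if not token:
        raise ValueError("No Hugging Face token provided")
    return token
```

You could then write `HF_TOKEN = resolve_hf_token()` in place of the hard-coded assignment.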

## Running LaVague

### Initial config

Now we are ready to initialize our `CommandCenter` class with the following arguments:

- An instance of `ActionEngine` with a LlamaIndex LLM, embedding model and prompt template. For this example, we will use the default HuggingFace API `LLM` (Nous-Hermes-2-Mixtral-8x7B-DPO) supplied with our HF token, the default `embedding` (bge-small-en-v1.5) and the default prompt template.
- The path to the Chrome binary, `chrome-linux64/chrome`
- The path to the chromedriver executable, `chromedriver-linux64/chromedriver`
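
To make the role of the two paths concrete, the sketch below shows how a Chrome binary and a chromedriver executable are typically handed to Selenium 4. It illustrates Selenium's own API only. Whether `CommandCenter` does something comparable internally is an assumption, and `make_driver` and `CHROME_ARGS` are hypothetical names.

```python
# Flags commonly used when running Chrome without a display (e.g. in Colab).
CHROME_ARGS = ("--headless=new", "--no-sandbox", "--disable-dev-shm-usage")


def make_driver(chrome_path, chromedriver_path, args=CHROME_ARGS):
    """Start Chrome via Selenium using explicit binary and driver paths."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = str(chrome_path)  # which browser to launch
    for arg in args:
        options.add_argument(arg)

    service = Service(executable_path=str(chromedriver_path))  # which driver to use
    return webdriver.Chrome(service=service, options=options)
```

With the files downloaded earlier, a call would look like `make_driver("chrome-linux64/chrome", "chromedriver-linux64/chromedriver")`. Remember to call `quit()` on the returned driver when you are done.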

In [6]:
from lavague import ActionEngine, CommandCenter
from lavague.defaults import HuggingfaceApiLLM, DefaultEmbedder
from lavague.prompts import DEFAULT_PROMPT


commandCenter = CommandCenter(
    ActionEngine(HuggingfaceApiLLM(token=HF_TOKEN), DefaultEmbedder(), DEFAULT_PROMPT),
    chromePath="chrome-linux64/chrome",
    chromedriverPath="chromedriver-linux64/chromedriver",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



### Launching LaVague

We are now ready to launch an interactive Gradio demo which will allow us to execute natural language instructions on a site of our choice.

To do this, we use the `commandCenter.run()` method, passing it the URL of the website we wish to perform actions on and three default instructions, which will appear in the interactive Gradio page it generates.

In [7]:
commandCenter.run(
    "https://huggingface.co",
    [
        "Click on the Datasets item on the menu, between Models and Spaces",
        "Click on the search bar 'Filter by name', type 'The Stack', and press 'Enter'",
        "Scroll by 500 pixels",
    ],
)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://910e91ce4208c6b368.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


  value_counts = pd.Series(tokens).value_counts()


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://910e91ce4208c6b368.gradio.live


⚠️ You will need to interact with the generated Gradio demo to perform automated actions.

First, click in the URL textbox and press Enter. Then select one of the default natural language instructions or write your own, click within the instruction textbox, and press Enter again.

At this point, our LLM generates Python Selenium code, which is then executed to perform the desired action on the website.

The action is then carried out visibly in the interface, and the code LaVague executed to perform it is shown on the right-hand side of the Gradio page.
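
The generate-then-execute step described above can be pictured with a toy example. Everything here is an assumption for illustration: `GENERATED_CODE` stands in for what an LLM might return for "Scroll by 500 pixels", `RecordingDriver` replaces a real Selenium driver so the snippet runs without a browser, and `run_generated_code` is not LaVague's actual implementation.

```python
class RecordingDriver:
    """Minimal stand-in for a Selenium WebDriver that records script calls."""

    def __init__(self):
        self.scripts = []

    def execute_script(self, script):
        self.scripts.append(script)


# Stand-in for LLM output for the instruction "Scroll by 500 pixels".
GENERATED_CODE = 'driver.execute_script("window.scrollBy(0, 500)")'


def run_generated_code(code, driver):
    """Execute generated code with `driver` exposed as a global name."""
    exec(code, {"driver": driver})


if __name__ == "__main__":
    driver = RecordingDriver()
    run_generated_code(GENERATED_CODE, driver)
    print(driver.scripts)
```

Because `exec` runs whatever the model produced, this pattern is best kept to environments you are comfortable letting generated code control, such as a throwaway Colab session.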

> Note that you can open the Gradio interface in your browser using the public URL displayed in the cell output above.


That brings us to the end of this quick-tour. If you have any questions, join us on the LaVague Discord [here](https://discord.com/invite/SDxn9KpqX9).