# LLooM: Getting Started - Template Notebook

Last Updated: April 2024

### Installation
First, install the LLooM Python package, available on PyPI as [`text_lloom`](https://pypi.org/project/text_lloom/). We recommend setting up a virtual environment with [venv](https://docs.python.org/3/library/venv.html#creating-virtual-environments) or [conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands).

In [None]:
!pip install text_lloom
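
To confirm the install landed in the environment your notebook kernel is actually using, a quick check like the one below can help. This is a minimal sketch using only the standard library; the helper names `is_installed` and `installed_version` are our own and not part of LLooM.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if is_installed("text_lloom"):
    print("text_lloom version:", installed_version("text_lloom"))
else:
    print("text_lloom not found; run the pip install cell above first.")
```

If the package is reported as missing right after installing, the kernel may be bound to a different environment than the one `pip` wrote to.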

### Imports

In [None]:
import os
import pandas as pd
import numpy as np
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

LLooM uses the OpenAI API under the hood to support its core operators (using GPT-3.5 and GPT-4). Before running any LLooM functions, set the `OPENAI_API_KEY` environment variable to a key from your own OpenAI account.

In [None]:
# Enter your OpenAI API key (e.g., "sk-123xyz") below.
os.environ["OPENAI_API_KEY"] = "TODO: Insert your key here!"
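
Pasting a key directly into a cell makes it easy to leak when the notebook is shared. One alternative, sketched here under our own assumptions, is to reuse a key that is already in the environment and prompt for it only when it is missing. The helper `ensure_api_key` and its `prompt_fn` and `env` parameters are hypothetical names introduced for illustration; they are not part of LLooM or the OpenAI client.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY", prompt_fn=getpass, env=os.environ):
    """Return the key stored under `var_name`, prompting only if it is unset."""
    key = env.get(var_name, "").strip()
    if not key:
        # getpass hides the typed value, so the key never appears in cell output
        key = prompt_fn(f"Enter {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} was left empty")
        env[var_name] = key  # Make it visible to libraries that read the environment
    return key
```

Calling `ensure_api_key()` in a cell would then replace the hardcoded assignment, and the key stays out of the saved notebook.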

In [None]:
# Import the LLooM package:
import text_lloom.workbench as wb

### Load data
For this example, we'll be using a sample dataset of 100 **Facebook posts** from **political** pages, gathered via CrowdTangle. The main columns we'll be using in our analysis are the following:
- `doc_id`: Unique ID for each post
- `text`: The text of the Facebook post
- `Page Category`: The category of the Facebook page
- `Likes`: The number of "likes" that the post received

In [None]:
# We'll load data from an existing CSV
data_link = "https://michelle123lam.github.io/lloom/data/political_fb_posts_100.csv"
df = pd.read_csv(data_link)

In [None]:
# Preview of dataframe
display(df[["doc_id", "text"]].head())
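
When you swap in your own dataset, it can be worth checking up front that the columns used later in this notebook are present. The sketch below uses a tiny stand-in frame with made-up rows so that it runs without a download; `missing_columns` and `REQUIRED_COLS` are names we introduce here, not LLooM functions.

```python
import pandas as pd

# Columns this notebook relies on, taken from the list above
REQUIRED_COLS = ["doc_id", "text", "Page Category", "Likes"]


def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]


# Stand-in frame (made-up rows) that deliberately lacks "Page Category"
sample = pd.DataFrame({
    "doc_id": [0, 1],
    "text": ["First post", "Second post"],
    "Likes": [10, 250],
})
print(missing_columns(sample))  # prints ['Page Category']
```

With the real data, `missing_columns(df)` should return an empty list.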

## v1: Manual mode

This notebook shows two example workflows: **v1: Manual mode** and **v2: Auto mode**. We recommend starting with **v1: Manual mode** to survey the LLooM concepts and get a sense for the underlying functions.

### Create a LLooM instance
After loading your data as a pandas DataFrame, create a new LLooM instance. You will need to specify the name of the column that contains your input text documents (`text_col`). The ID column (`id_col`) is optional.

In [None]:
# Set up the LLooM instance with the specified dataset
l = wb.lloom(
    df=df,
    text_col="text",
    id_col="doc_id",  # Optional
)
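
This notebook does not describe how documents are identified when `id_col` is omitted. If you would rather control the IDs yourself, one option is to add a unique ID column before creating the instance. The helper `with_doc_ids` below is a hypothetical sketch of that step, not part of LLooM.

```python
import pandas as pd


def with_doc_ids(frame: pd.DataFrame, id_col: str = "doc_id") -> pd.DataFrame:
    """Return a copy of `frame` with a unique ID column, adding one if absent."""
    out = frame.reset_index(drop=True).copy()
    if id_col not in out.columns:
        # Use the row position as a simple, stable identifier
        out.insert(0, id_col, list(range(len(out))))
    elif out[id_col].duplicated().any():
        raise ValueError(f"Column {id_col!r} contains duplicate IDs")
    return out


# Made-up rows for illustration
posts = pd.DataFrame({"text": ["First post", "Second post", "Third post"]})
print(with_doc_ids(posts))
```

The original frame is left unchanged, so the helper is safe to call more than once.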

### Run concept generation
Next, start the concept induction process by generating concepts with `gen()`. The `seed` parameter is optional: pass a string to use a seed, or omit it (or leave it as `None`) to run without one.

In [None]:
cur_seed = None  # Optionally replace with string
await l.gen(seed=cur_seed)
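
The bare `await` above works because Jupyter (IPython) supports top-level `await` in cells. In a plain Python script, the same call has to sit inside a coroutine that is started with `asyncio.run`. The sketch below shows that pattern with a stand-in coroutine, `fake_gen`, which is a hypothetical placeholder and not LLooM code.

```python
import asyncio


async def fake_gen(seed=None):
    """Stand-in for an async call such as `l.gen(seed=...)`; not LLooM code."""
    await asyncio.sleep(0)  # Yield to the event loop, as a real network call would
    return f"generated with seed={seed!r}"


async def main():
    # Inside a coroutine, `await` is valid in any Python script
    return await fake_gen(seed=None)


result = asyncio.run(main())
print(result)  # prints: generated with seed=None
```

Run this as a script rather than in a notebook cell: `asyncio.run` raises a `RuntimeError` when an event loop is already running, which is the case inside Jupyter.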

In [None]:
# View cost/time summary
l.summary()

### Review concepts

Review the generated concepts and select concepts to inspect further:

In [None]:
l.select()

In [None]:
# Double-check which concepts you selected
l.show_selected()

### Score concepts
Then, apply these concepts to the full dataset with `score()`. This function will score all documents with respect to each concept to indicate the extent to which the document matches the concept inclusion criteria.

In [None]:
# Run concept scoring
score_df = await l.score()

In [None]:
# View cost/time summary
l.summary()

### Visualize results
Now, you can visualize the results in the main **LLooM Workbench** view. An interactive widget will appear when you run the `vis` function:
![LLooM Workbench UI](../media/lloom_workbench_ui.png)

The **Concept Overview (A)** provides a high-level summary. Click on a concept row in the **Concept Matrix (B)** to see its **Detail View (C)**, or click on a slice column to see its corresponding Detail View.

In [None]:
# Visualize concept results
# Group data by the number of likes (automatically binned) with slice_col
l.vis(slice_col="Likes")
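
"Automatically binned" means that a numeric column such as `Likes` is grouped into ranges before it is used as a slice. LLooM's exact binning strategy is not described here, so the following is only an illustration of the general idea, using quantile bins from `pandas.qcut` on made-up values.

```python
import pandas as pd

# Made-up like counts for eight posts
likes = pd.Series([3, 12, 45, 80, 150, 400, 900, 2500], name="Likes")

# Split into 4 quantile-based bins so each bin holds roughly the same number of posts
bins = pd.qcut(likes, q=4)

# Count posts per bin, ordered from the lowest range to the highest
counts = bins.value_counts().sort_index()
print(counts)
```

Quantile bins keep slices similarly sized even when a few posts have very high like counts; equal-width bins (`pandas.cut`) would instead put most posts in the lowest range.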

In [None]:
# Visualize concept results
# Group data by page category with slice_col
l.vis(slice_col="Page Category")

### (Optional) Try normalizing by slice or by concept


In [None]:
l.vis(slice_col="Likes", norm_by="slice")

In [None]:
l.vis(slice_col="Likes", norm_by="concept")
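
To build intuition for the two options, the sketch below normalizes a small concept-by-slice table of made-up counts in both directions. It assumes that `norm_by="slice"` corresponds to normalizing within each slice and `norm_by="concept"` to normalizing within each concept; that reading is our assumption, and the exact computation inside `vis()` may differ.

```python
import pandas as pd

# Made-up counts of matching documents: rows are concepts, columns are slices
counts = pd.DataFrame(
    {"low": [2, 6], "high": [8, 4]},
    index=["Concept A", "Concept B"],
)

# Normalize within each slice: every column sums to 1
by_slice = counts / counts.sum(axis=0)

# Normalize within each concept: every row sums to 1
by_concept = counts.div(counts.sum(axis=1), axis=0)

print(by_slice)
print(by_concept)
```

Normalizing within a slice helps compare concepts inside one group of documents, while normalizing within a concept shows how that concept is distributed across groups.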

### (Optional) Add manual concept
You may also manually add your own custom concepts by providing a name and prompt. This will automatically score the data by that concept. Re-run the `vis()` function to see the new concept results.

In [None]:
# Add a custom concept with the given name and prompt
await l.add(
    name="Your new concept name",
    prompt="Your new concept criteria prompt",  # Ex: "Does the text include [...]?"
)

In [None]:
# Visualize concept results
l.vis(slice_col="Likes")

### (Optional) Submit your results
**🖼️ ✨ Submit your work for a chance to be featured on our site!**

If you'd like to share what you've done with LLooM or would like your work featured in a gallery of results, please submit your LLooM instance with the `submit()` function! If your submission is selected, we'll reach out to you to follow up and hear more about your work with LLooM.

In [None]:
l.submit()  # You will be prompted to provide a few details about your analysis

### (Optional) Export and/or save results

In [None]:
# Export the results to a dataframe
export_df = l.export_df()

In [None]:
export_df.head()
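
Because the export is an ordinary pandas DataFrame, it can be written to disk with standard pandas I/O. The sketch below round-trips a stand-in frame through CSV; the column names `concept` and `n_matches` are placeholders we made up, not the actual schema of `export_df`.

```python
import os
import tempfile

import pandas as pd

# Stand-in for `export_df`; these columns are placeholders, not LLooM's schema
export_like = pd.DataFrame({"concept": ["Concept A", "Concept B"], "n_matches": [12, 7]})

out_dir = tempfile.mkdtemp()  # Swap in your own results folder
out_path = os.path.join(out_dir, "lloom_export.csv")
export_like.to_csv(out_path, index=False)  # index=False avoids an extra unnamed column

# Read the file back to confirm the table survived the round trip
reloaded = pd.read_csv(out_path)
print(reloaded.shape)
```

With the real export, `export_df.to_csv("your_results.csv", index=False)` would be the equivalent call.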

In [None]:
# Save the LLooM instance to a pickle file
l.save(folder="your/path/here", file_name="your_file_name")
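
Whether `save()` creates a missing folder is not stated here, so it may be safest to create the target folder first. The helper `ensure_folder` below is a hypothetical convenience function using only the standard library, demonstrated under a temporary directory.

```python
import os
import tempfile


def ensure_folder(path: str) -> str:
    """Create `path` (and any missing parent folders) if needed, then return it."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrated under a temporary directory; use your own path in practice
base = tempfile.mkdtemp()
folder = ensure_folder(os.path.join(base, "lloom_runs", "run_01"))
print(os.path.isdir(folder))  # prints True
```

You could then pass `folder` as the `folder` argument to `l.save()`. Since the saved file is a pickle, only load pickle files that come from a source you trust.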

## v2: Auto mode

LLooM also provides a one-function **auto** mode that offers less control but combines the generation and scoring steps into a single function call. You can try out this version with the functions below.

### Create a LLooM instance
As in manual mode, after loading your data as a pandas DataFrame, create a new LLooM instance. You will need to specify the name of the column that contains your input text documents (`text_col`). The ID column (`id_col`) is optional.

In [None]:
# Set up the LLooM instance with the specified dataset
l = wb.lloom(
    df=df,
    text_col="text",
    id_col="doc_id",  # Optional
)

### Run concept generation
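Next, start the concept induction process with `gen_auto()`, which handles both concept generation and scoring in one call and returns the score dataframe. The `seed` parameter is optional: pass a string to use a seed, or omit it (or leave it as `None`) to run without one. The `max_concepts` argument (set to 5 below) limits the number of concepts.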

In [None]:
cur_seed = None  # Optionally replace with string
score_df = await l.gen_auto(seed=cur_seed, max_concepts=5)

In [None]:
# View cost/time summary
l.summary()

### Visualize results
Now, you can visualize the results in the main **LLooM Workbench** view. An interactive widget will appear when you run the `vis` function:
![LLooM Workbench UI](../media/lloom_workbench_ui.png)

The **Concept Overview (A)** provides a high-level summary. Click on a concept row in the **Concept Matrix (B)** to see its **Detail View (C)**, or click on a slice column to see its corresponding Detail View.

In [None]:
# Visualize concept results
# Group data by the number of likes (automatically binned) with slice_col
l.vis(slice_col="Likes")

In [None]:
# Visualize concept results
# Group data by page category with slice_col
l.vis(slice_col="Page Category")

### (Optional) Try normalizing by slice or by concept


In [None]:
l.vis(slice_col="Likes", norm_by="slice")

In [None]:
l.vis(slice_col="Likes", norm_by="concept")

### (Optional) Add manual concept
You may also manually add your own custom concepts by providing a name and prompt. This will automatically score the data by that concept. Re-run the `vis()` function to see the new concept results.

In [None]:
# Add a custom concept with the given name and prompt
await l.add(
    name="Your new concept name",
    prompt="Your new concept criteria prompt",  # Ex: "Does the text include [...]?"
)

In [None]:
# Visualize concept results
l.vis(slice_col="Likes")

### (Optional) Submit your results
**🖼️ ✨ Submit your work for a chance to be featured on our site!**

If you'd like to share what you've done with LLooM or would like your work featured in a gallery of results, please submit your LLooM instance with the `submit()` function! If your submission is selected, we'll reach out to you to follow up and hear more about your work with LLooM.

In [None]:
l.submit()  # You will be prompted to provide a few details about your analysis

### (Optional) Export and/or save results

In [None]:
# Export the results to a dataframe
export_df = l.export_df()

In [None]:
export_df.head()

In [None]:
# Save the LLooM instance to a pickle file
l.save(folder="your/path/here", file_name="your_file_name")