<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-assets/phoenix/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>
<h1 align="center">Phoenix Quickstart</h1>

In this quickstart, you will:

- Download curated training and production datasets and load them into pandas DataFrames
- Compute embeddings using a pre-trained model
- Define a schema to describe the format of your data
- Launch Phoenix and explore the app

**Note: You can run this quickstart with or without a GPU.**

Let's get started!

## 1. Install Dependencies and Import Libraries

In [None]:
%pip install -q arize-phoenix "arize[AutoEmbeddings]"

In [None]:
from arize.pandas.embeddings import EmbeddingGenerator, UseCases
import pandas as pd
import phoenix as px
import torch

## 2. Download the Data

Download the curated datasets.

In [None]:
train_df = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/nlp/sentiment-classification-language-drift/sentiment_classification_language_drift_training.parquet",
)
prod_df = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/nlp/sentiment-classification-language-drift/sentiment_classification_language_drift_production.parquet",
)

## 3. Compute Embeddings

If CUDA is available, compute embeddings using a pre-trained model. Otherwise, use the pre-computed embeddings.

In [None]:
if torch.cuda.is_available():
    print("CUDA is available, computing embeddings for text.")
    generator = EmbeddingGenerator.from_use_case(
        use_case=UseCases.NLP.SEQUENCE_CLASSIFICATION,
        model_name="distilbert-base-uncased",
    )
    train_df["text_vector"] = generator.generate_embeddings(text_col=train_df["text"])
    prod_df["text_vector"] = generator.generate_embeddings(text_col=prod_df["text"])
else:
    # No GPU: leave the DataFrames unchanged and rely on the pre-computed embeddings.
    print("CUDA is unavailable, using pre-computed embeddings.")
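
Whichever branch runs, it can be worth confirming that every embedding vector has the same length before handing the data to Phoenix. The sketch below is a minimal check on toy data; the helper name `embedding_dimensions` and the toy vectors are hypothetical and are not part of Phoenix or the Arize SDK.

```python
from collections import Counter


def embedding_dimensions(vectors):
    """Count how many vectors have each length.

    Entries without a length (for example None or NaN) are counted under None.
    """
    return Counter(len(v) if hasattr(v, "__len__") else None for v in vectors)


# Toy stand-in for a column such as train_df["text_vector"].
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], None]
dims = embedding_dimensions(toy_vectors)
print(dims)
```

On the real data you would call `embedding_dimensions(train_df["text_vector"])`. A result with more than one key suggests missing or inconsistently sized vectors.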

## 4. Launch Phoenix

### a) Define Your Schema
To launch Phoenix with your data, you first need to define a schema that tells Phoenix which columns of your DataFrames correspond to features, predictions, actuals (i.e., ground truth), embeddings, etc.

The trickiest part is defining embedding features. In this case, the single embedding feature combines two pieces of information: the embedding vectors in the "text_vector" column and the corresponding raw text in the "text" column.

Define a schema for your data.

In [None]:
schema = px.Schema(
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="pred_label",
    actual_label_column_name="label",
    embedding_feature_column_names={
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="text_vector",
            raw_data_column_name="text",
        ),
    },
)
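
A schema only names columns, so a typo in a column name is easy to miss. The sketch below shows one way to check that every column the schema refers to is present. The helper `missing_schema_columns` and the toy column list are hypothetical illustrations, not part of the Phoenix API.

```python
def missing_schema_columns(available_columns, required_columns):
    """Return the required column names that are absent, preserving order."""
    available = set(available_columns)
    return [name for name in required_columns if name not in available]


# Column names referenced by the schema in this quickstart.
required = ["prediction_ts", "pred_label", "label", "text_vector", "text"]

# Toy stand-in for list(train_df.columns); "text_vector" is deliberately left out.
toy_columns = ["prediction_ts", "pred_label", "label", "text"]

missing = missing_schema_columns(toy_columns, required)
print(missing)  # ['text_vector']
```

With the real DataFrames, passing `train_df.columns` and `prod_df.columns` as the first argument should return an empty list for both.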

### b) Define Your Datasets
Next, define your primary and reference datasets. In this case, your reference dataset contains training data and your primary dataset contains production data.

In [None]:
prod_ds = px.Dataset(prod_df, schema)
train_ds = px.Dataset(train_df, schema)
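
To build intuition for why a reference dataset is useful, the sketch below compares two toy sets of embedding vectors by the Euclidean distance between their centroids. This is an illustrative metric under stated assumptions, not a claim about how Phoenix computes drift internally. The function names and toy vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def centroid_distance(primary_vectors, reference_vectors):
    """Euclidean distance between the centroids of two sets of vectors."""
    return math.dist(centroid(primary_vectors), centroid(reference_vectors))


toy_reference = [[0.0, 0.0], [2.0, 0.0]]  # centroid is (1.0, 0.0)
toy_primary = [[1.0, 3.0], [1.0, 5.0]]    # centroid is (1.0, 4.0)

distance = centroid_distance(toy_primary, toy_reference)
print(distance)  # 4.0
```

A larger distance means the primary (production) vectors sit further, on average, from the reference (training) vectors.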

### c) Create a Phoenix Session

In [None]:
session = px.launch_app(prod_ds, train_ds)

### d) Launch the Phoenix UI

You can open Phoenix by copying and pasting the output of `session.url` into a new browser tab.

In [None]:
session.url

Alternatively, you can open the Phoenix UI directly in your notebook:

In [None]:
session.view()

## 5. Explore the App

Click on "text_embedding" in the "Embeddings" section to visualize your embedding data. What insights can you uncover from this page?

## 6. Close the App

When you're done, don't forget to close the app.

In [None]:
px.close_app()