# FiftyOne: Tabular Datasets

## 0. General Guidelines for FiftyOne

FiftyOne provides a powerful UI that enables you to perform visual error analysis on your datasets. Here are some concrete steps you can take using the FiftyOne UI to analyze the errors in the iris dataset after having trained a logistic regression model:

1. **Sort by Prediction Confidence**:
   - If your model outputs confidence scores for predictions, you can sort samples based on this score to identify those that the model was least sure about. This can give insight into borderline cases or potential areas of confusion.

2. **Filter by Label Mismatches**:
   - Use the FiftyOne UI to create a view where the ground truth labels don't match the predicted labels. This will allow you to quickly spot the instances where your model made mistakes.
   - In the sidebar, you'll see options to filter by fields. Use the ground truth and prediction fields to find mismatches.

3. **Visualize Confusion Matrix**:
   - While the FiftyOne UI doesn't provide a built-in confusion matrix, you can quickly generate one using `sklearn` and then use it to guide your error analysis in the UI.
   - After creating a confusion matrix, focus on the largest off-diagonal values. These represent the most common mistakes your model is making. Use the FiftyOne UI to filter samples that fall into these categories and inspect them.

4. **Examine Feature Values**:
   - The iris dataset has four features for each sample: sepal length, sepal width, petal length, and petal width. For any mistakes the model makes, examine these feature values in the FiftyOne UI. Are there patterns to the mistakes? For instance, are most errors happening when the petal length is within a certain range?

5. **Use the Plots Feature**:
   - FiftyOne has a powerful plots feature that allows you to visualize the distribution of various fields. Use this to visualize the distribution of confidence scores, or perhaps to visualize how the four features of the iris dataset relate to prediction errors.

6. **Tag Samples of Interest**:
   - As you identify interesting samples or patterns of errors, you can tag these samples directly in the FiftyOne UI. This will allow you to easily return to them later or to pull them up programmatically in your notebook for further analysis.

7. **Session Views**:
   - As you work, the FiftyOne app tracks the views you create. This means you can always go back to a previous view or state. It's like having a history or breadcrumbs feature for your data exploration.
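
Step 3 above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the label lists `y_true_demo` and `y_pred_demo` are hypothetical stand-ins for the real test labels and predictions, and the code only shows how to locate the largest off-diagonal cell of a confusion matrix built with `sklearn`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only; in the notebook these would
# come from class_names[y_test] and class_names[y_pred]
labels = ["setosa", "versicolor", "virginica"]
y_true_demo = ["setosa", "versicolor", "versicolor", "virginica", "virginica", "virginica"]
y_pred_demo = ["setosa", "versicolor", "virginica", "virginica", "versicolor", "versicolor"]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

# Zero the diagonal so only the mistakes remain, then find the largest cell
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_idx, pred_idx = np.unravel_index(np.argmax(errors), errors.shape)
worst_pair = (labels[true_idx], labels[pred_idx], int(errors[true_idx, pred_idx]))

print(cm)
print(worst_pair)
```

The resulting (true class, predicted class) pair is the combination worth filtering for first in the UI.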


## 1. Load the Iris Dataset & Split

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

## 2. Train a Logistic Regression Model

In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

clf = LogisticRegression(max_iter=200)
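# Fit on the standardized features computed above
clf.fit(X_train_scaled, y_train)

In [3]:
# Make the predictions (apply the same scaling used for training)
y_pred = clf.predict(X_test_scaled)
y_probs = clf.predict_proba(X_test_scaled)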
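
Before moving to the UI, the predictions can be summarized directly in code. The following is a self-contained sketch that repeats the split and training steps under the assumption that the standardized features are used for both fitting and prediction; the names `confidence`, `wrong` and `least_sure` are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Standardize with statistics from the training split only
scaler = StandardScaler()
clf = LogisticRegression(max_iter=200)
clf.fit(scaler.fit_transform(X_train), y_train)

X_test_scaled = scaler.transform(X_test)
y_pred = clf.predict(X_test_scaled)
y_probs = clf.predict_proba(X_test_scaled)

# Confidence of each prediction = probability of the predicted class
confidence = y_probs.max(axis=1)

# Indices of misclassified test samples (may be empty on an easy split)
wrong = np.flatnonzero(y_pred != y_test)

# The five test samples the model was least sure about
least_sure = np.argsort(confidence)[:5]

print(f"accuracy: {np.mean(y_pred == y_test):.3f}")
print("misclassified indices:", wrong.tolist())
for i in least_sure:
    # Raw (unscaled) feature values are easier to interpret
    print(i, iris.target_names[y_pred[i]], round(float(confidence[i]), 3), X_test[i])
```

Even when no sample is misclassified, the least confident predictions point to the borderline cases worth inspecting first.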


## 3. Error Analysis with FiftyOne

In [4]:
#!pip install fiftyone

In [5]:
import numpy as np
import fiftyone as fo
import fiftyone.brain as fob
from fiftyone import ViewField as F
from sklearn.decomposition import PCA

In [6]:
DATASET_NAME = "Iris_Error_Analysis"

In [15]:
# Delete the dataset if it was already created in a previous run
# Make sure to close/stop the browser app first (CLI: Ctrl+C)
print(fo.list_datasets())
if DATASET_NAME in fo.list_datasets():
    try:
        fo.delete_dataset(DATASET_NAME)
    except Exception:
        # Ignore failures (e.g., the dataset is still in use by the app)
        pass
print(fo.list_datasets())

['Iris_Error_Analysis']
[]


In [16]:
# Optional: Compute embeddings: PCA
pca = PCA(n_components=2)
pca = pca.fit(X_train)
pca_result = pca.transform(X_test)
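
A quick check of how much variance the two components retain helps decide whether the 2D projection is a faithful picture of the data. This is a self-contained sketch that repeats the split; the names `pca_model`, `points` and `retained` are introduced here for illustration and are not part of the notebook.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, _, _ = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Fit on the training split, project the test split
pca_model = PCA(n_components=2).fit(X_train)
points = pca_model.transform(X_test)

# Fraction of the training-set variance captured by the two components
retained = pca_model.explained_variance_ratio_.sum()

print(points.shape)
print(f"variance retained by 2 components: {retained:.1%}")
```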

In [17]:
# Create a FiftyOne sample collection
samples = []
# The loop variable is named pca_point so that it does not overwrite the fitted `pca` object
for true_label, predicted_label, probs, features, pca_point in zip(y_test, y_pred, y_probs, X_test, pca_result):
    sample = fo.Sample(
        filepath="none",  # No image paths for the iris dataset
        predictions=fo.Classification(
            label=class_names[predicted_label],
            confidence=float(max(probs)),
            logits=probs.tolist()  # Class probabilities, stored in the logits field
        ),
        ground_truth=fo.Classification(label=class_names[true_label]),
        # Optional: features
        sepal_length=float(features[0]),
        sepal_width=float(features[1]),
        petal_length=float(features[2]),
        petal_width=float(features[3]),
        # Optional: PCA embeddings
        #pca_1=pca_point[0],
        #pca_2=pca_point[1]
        pca_embedding=pca_point.tolist()
    )
    samples.append(sample)

In [18]:
# Create a FiftyOne dataset
dataset = fo.Dataset(name=DATASET_NAME)
dataset.add_samples(samples)  # Returns the new sample IDs, so do not reassign `dataset`
# After adding the samples to the dataset
# we need to launch the UI -- see next section
# http://localhost:5151

 100% |███████████████████| 45/45 [165.8ms elapsed, 0s remaining, 271.5 samples/s] 


In [26]:
# Optional: Compute and visualize the PCA embeddings
# `points` passes the precomputed 2D PCA coordinates, in the same order as the samples
dataset = fo.load_dataset(DATASET_NAME)
fob.compute_visualization(dataset,
                          points=pca_result,
                          brain_key="pca_viz")

<fiftyone.brain.visualization.VisualizationResults at 0x232105660a0>

### Launch the App

We can launch the FiftyOne UI in several ways; the two most common are:

1. With code in our environment:

    ```python
    session = fo.launch_app(dataset, desktop=True) # Desktop App (requires the optional fiftyone-desktop package)
    session = fo.launch_app(dataset, desktop=False) # Browser at http://localhost:5151, or embedded when run in Jupyter
    ```

2. In the CLI:

    ```bash
    # fiftyone app launch <dataset_name>
    (label) fiftyone app launch "Iris_Error_Analysis"
    # Browser: http://localhost:5151
    ```

In [32]:
# Launch the FiftyOne app
#session = fo.launch_app(dataset, desktop=True) # Desktop App (requires the optional fiftyone-desktop package)
#session = fo.launch_app(dataset, desktop=False) # Browser at http://localhost:5151, or embedded when run in Jupyter

### Basic Usage of the App

- Left panel: select tags / labels / primitives (features added in code)
- Main frame: we can visualize
  - Samples
  - Histograms
  - Embeddings

![Web UI](../assets/web_ui.png)