<div id="singlestore-header" style="display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/browser.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Demonstrate common AI Functions use cases</h1>
    </div>
</div>

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Note</b></p>
        <p>You can use your existing Standard or Premium workspace with this Notebook.</p>
    </div>
</div>


This feature is currently in **Private Preview**. Please reach out to support@singlestore.com to confirm whether this feature can be enabled for your organization.

This Jupyter notebook shows you how to:

1. Load the Amazon Fine Food Reviews dataset from Kaggle
2. Store the data in SingleStore
3. Apply AI Functions to text processing and analysis

**Prerequisites**: Ensure AI Functions are installed on your deployment (AI Services > AI & ML Functions).

## Create the database and table

This setup creates a `temp` database and a single `reviews` table to hold the food product reviews. Ensure you have permissions to create and drop databases and tables.

In [1]:
%%sql
CREATE DATABASE IF NOT EXISTS temp;
USE temp;

In [2]:
%%sql
DROP TABLE IF EXISTS reviews;

CREATE TABLE IF NOT EXISTS reviews (
    Id INT PRIMARY KEY,
    ProductId VARCHAR(20),
    UserId VARCHAR(50),
    ProfileName VARCHAR(255),
    HelpfulnessNumerator INT,
    HelpfulnessDenominator INT,
    Score INT,
    Time BIGINT,
    Summary TEXT,
    Text TEXT
);
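
The `Time` column is declared as `BIGINT`. A minimal sketch, assuming the dataset stores review times as Unix epoch seconds (an assumption about the Kaggle data, not something this notebook verifies), of converting such a value to a readable UTC timestamp. The helper name `review_time_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def review_time_to_utc(epoch_seconds: int) -> datetime:
    """Convert a Time value (assumed Unix epoch seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Illustrative value only, not tied to a specific row in the table
print(review_time_to_utc(1303862400).date())  # 2011-04-27
```

If the values turn out to be in milliseconds instead, divide by 1000 before converting.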

## Install the required packages

In [3]:
!pip install -q httplib2 kagglehub pandas

## Download and Load Dataset

In [4]:
import kagglehub
import pandas as pd

# Download the Amazon Fine Food Reviews dataset from Kaggle
print("Downloading dataset from Kaggle...")
path = kagglehub.dataset_download("snap/amazon-fine-food-reviews")
print(f"Dataset downloaded to: {path}")

# Read the CSV file
df = pd.read_csv(f"{path}/Reviews.csv")

# Display dataset info
print(f"\nDataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print("\nFirst few rows:")
df.head()
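
Before writing to SingleStore, it can help to confirm that the CSV columns line up with the `reviews` table definition, since `to_sql` with `if_exists='append'` inserts by column name. The following is a small sketch using plain Python lists; `EXPECTED_COLUMNS` is copied from the `CREATE TABLE` statement above, and `compare_columns` is a hypothetical helper.

```python
# Column names copied from the CREATE TABLE statement for `reviews`
EXPECTED_COLUMNS = [
    "Id", "ProductId", "UserId", "ProfileName",
    "HelpfulnessNumerator", "HelpfulnessDenominator",
    "Score", "Time", "Summary", "Text",
]


def compare_columns(actual, expected=EXPECTED_COLUMNS):
    """Return (missing, extra): expected columns not found, and unexpected columns."""
    actual = list(actual)
    actual_set, expected_set = set(actual), set(expected)
    missing = [name for name in expected if name not in actual_set]
    extra = [name for name in actual if name not in expected_set]
    return missing, extra
```

In the notebook this could be called as `missing, extra = compare_columns(df.columns)`. Both lists should be empty before the load step.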

## Load Data into SingleStore

In [5]:
import singlestoredb as s2

# Create SQLAlchemy engine instead of regular connection
engine = s2.create_engine(database='temp')

# Take the first 10,000 reviews for demo purposes (not a random sample)
sample_df = df.head(10000).copy()

print(f"Loading {len(sample_df)} reviews into SingleStore...")

# Write dataframe to SingleStore table using SQLAlchemy engine
sample_df.to_sql(
    'reviews',
    con=engine,  # Use engine instead of connection
    if_exists='append',
    index=False,
    chunksize=1000
)

print("Data loaded successfully!")

## Verify Data Load

In [6]:
%%sql
-- Check the number of reviews loaded
SELECT COUNT(*) as total_reviews FROM reviews;

## Sample Data Preview

In [7]:
%%sql
-- View sample reviews
SELECT Id, ProductId, Score, Summary, LEFT(Text, 100) as Review_Preview
FROM reviews
LIMIT 10;

## AI Functions Demonstrations

Now let's explore SingleStore AI Functions for text analysis and processing.
Ensure that AI Functions are enabled for your organization and that you can list the available AI Functions.

In [8]:
%%sql
SHOW FUNCTIONS IN cluster;

In [9]:
%%sql
-- AI_COMPLETE: Ask general questions and get LLM-powered completions
SELECT cluster.AI_COMPLETE(
    'What is SingleStore?'
) AS completion;

In [10]:
%%sql
-- AI_SENTIMENT: Analyze sentiment of customer reviews for a specific product
-- Replace the ProductId in the WHERE clause with one of your choice
-- Remember to qualify the table with the database name; in this example the database is 'temp'
SELECT
    Id,
    ProductId,
    Score,
    LEFT(Text, 80) as Review_Snippet,
    cluster.AI_SENTIMENT(Text) AS sentiment
FROM temp.reviews
WHERE ProductId = 'B000NY8ODS'
LIMIT 10;
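
One way to sanity-check the sentiment output is to compare it with the star rating. The sketch below assumes the sentiment value can be normalized to `positive`, `neutral`, or `negative`; the exact output format of `AI_SENTIMENT` is not shown in this notebook, so treat that as an assumption. The thresholds mirror the later cells, which treat `Score >= 4` as positive and `Score <= 2` as low-rated. Both helper names are hypothetical.

```python
def score_to_bucket(score: int) -> str:
    """Map a 1-5 star score to a coarse sentiment bucket."""
    if score >= 4:
        return "positive"
    if score == 3:
        return "neutral"
    return "negative"


def agrees(score: int, sentiment_label: str) -> bool:
    """True when the star-based bucket matches the (assumed) sentiment label."""
    return score_to_bucket(score) == sentiment_label.strip().lower()
```

Rows where `agrees` returns `False` are worth reading by hand, since a five-star review with negative text often points to sarcasm or a mis-click.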

In [11]:
%%sql
-- Aggregate sentiment analysis across products
-- Using CTE to filter and prepare data first
WITH filtered_reviews AS (
    SELECT
        ProductId,
        Text
    FROM temp.reviews
    WHERE ProductId IN (
        SELECT ProductId
        FROM temp.reviews
        GROUP BY ProductId
        HAVING COUNT(*) >= 5
    )
    LIMIT 100
),
grouped_reviews AS (
    SELECT
        ProductId,
        COUNT(*) as review_count,
        GROUP_CONCAT(Text SEPARATOR '. ') as combined_text
    FROM filtered_reviews
    GROUP BY ProductId
    LIMIT 5
)
SELECT
    ProductId,
    review_count,
    cluster.AI_SENTIMENT(combined_text) as overall_sentiment
FROM grouped_reviews;
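
The grouping step in the query above can be easier to follow in plain Python. This is a rough sketch of the same idea (group by product, keep products with at least five reviews, join their text with `'. '`); it does not reproduce the `LIMIT` clauses, and `combine_reviews` is a hypothetical helper, not part of any SingleStore API.

```python
from collections import defaultdict


def combine_reviews(rows, min_reviews=5, separator=". "):
    """Group (product_id, text) pairs and join the text for products with enough reviews."""
    by_product = defaultdict(list)
    for product_id, text in rows:
        by_product[product_id].append(text)
    return {
        product_id: {
            "review_count": len(texts),
            "combined_text": separator.join(texts),
        }
        for product_id, texts in by_product.items()
        if len(texts) >= min_reviews
    }
```

Note that in the SQL version the `LIMIT 100` is applied after the `HAVING COUNT(*) >= 5` filter, so a product in the final result can show a `review_count` below five.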

In [12]:
%%sql
-- AI_SUMMARIZE: Create concise summaries of lengthy reviews
-- Filter long reviews first using CTE
WITH long_reviews AS (
    SELECT
        Id,
        ProductId,
        Text,
        LEFT(Text, 150) as Original_Review
    FROM temp.reviews
    WHERE LENGTH(Text) > 200
    LIMIT 5
)
SELECT
    Id,
    ProductId,
    Original_Review,
    cluster.AI_SUMMARIZE(
        Text,
        'aifunctions_chat_default',
        15
    ) AS summary
FROM long_reviews;

In [13]:
%%sql
-- AI_CLASSIFY: Classify customer feedback into categories
-- Filter negative and neutral reviews (Score <= 3) first using CTE
WITH negative_reviews AS (
    SELECT
        Id,
        ProductId,
        Text,
        LEFT(Text, 100) as Review_Text
    FROM temp.reviews
    WHERE Score <= 3
    LIMIT 10
)
SELECT
    Id,
    ProductId,
    Review_Text,
    cluster.AI_CLASSIFY(
        Text,
        '[quality, price, shipping, taste]'
    ) AS classification
FROM negative_reviews;
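
If the category list is maintained in Python, the bracketed string passed to `AI_CLASSIFY` can be built from a list. This sketch simply reproduces the format used in the query above (`'[quality, price, shipping, taste]'`); whether the function accepts other formats is not covered here, and `format_labels` is a hypothetical helper.

```python
def format_labels(labels):
    """Build a bracketed, comma-separated label string such as '[quality, price]'."""
    cleaned = [label.strip() for label in labels if label.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    return "[" + ", ".join(cleaned) + "]"


print(format_labels(["quality", "price", "shipping", "taste"]))
# [quality, price, shipping, taste]
```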

In [14]:
%%sql
-- AI_EXTRACT: Extract specific information from reviews
-- Filter positive reviews first using CTE
WITH positive_reviews AS (
    SELECT
        Id,
        ProductId,
        Text,
        LEFT(Text, 100) as Review_Text
    FROM temp.reviews
    WHERE Score >= 4
    LIMIT 10
)
SELECT
    Id,
    ProductId,
    Review_Text,
    cluster.AI_EXTRACT(
        Text,
        'Does this customer indicate they will buy this product again? Answer with yes, no, or unclear only'
    ) AS repeat_purchase_intent
FROM positive_reviews;
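
Even when the prompt asks for "yes, no, or unclear only", a model's answer may come back with different casing or trailing punctuation. A minimal sketch of normalizing such answers after fetching the results into Python follows; `normalize_answer` is a hypothetical helper, and the fallback value is a design choice rather than documented behavior.

```python
def normalize_answer(raw, allowed=("yes", "no", "unclear"), default="unclear"):
    """Lowercase and trim an answer; fall back to `default` if it is not an allowed value."""
    if raw is None:
        return default
    cleaned = raw.strip().strip(".!\"'").lower()
    return cleaned if cleaned in allowed else default
```

The same helper could be reused for other constrained prompts by passing a different `allowed` tuple, for example `("high", "medium", "low")`.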

In [15]:
%%sql
-- AI_EXTRACT: Identify reviews with high churn risk
-- Filter low-rated reviews first using CTE
WITH low_rated_reviews AS (
    SELECT
        Id,
        ProductId,
        Score,
        Text,
        LEFT(Text, 120) as Review_Text
    FROM temp.reviews
    WHERE Score <= 2
    LIMIT 10
)
SELECT
    Id,
    ProductId,
    Score,
    Review_Text,
    cluster.AI_EXTRACT(
        Text,
        'Is this customer at high risk of not purchasing again? Answer with high, medium, or low only'
    ) AS churn_risk
FROM low_rated_reviews;

In [16]:
%%sql
-- AI_TRANSLATE: Translate text between languages
-- Filter reviews with substantial summaries first using CTE
WITH translatable_reviews AS (
    SELECT
        Id,
        Summary as Original_English
    FROM temp.reviews
    WHERE Score = 5
    AND Summary IS NOT NULL
    AND LENGTH(Summary) > 20
    LIMIT 5
)
SELECT
    Id,
    Original_English,
    cluster.AI_TRANSLATE(
        Original_English,
        'english',
        'spanish'
    ) AS spanish_translation
FROM translatable_reviews;

In [17]:
%%sql
-- Combined AI Functions: Comprehensive product analysis
-- Filter to products with multiple reviews first
WITH popular_products AS (
    SELECT ProductId
    FROM temp.reviews
    GROUP BY ProductId
    HAVING COUNT(*) >= 10
    LIMIT 5
),
product_reviews AS (
    SELECT
        r.ProductId,
        r.Text,
        r.Score,
        LEFT(r.Text, 80) as Review_Sample
    FROM temp.reviews r
    INNER JOIN popular_products p ON r.ProductId = p.ProductId
    LIMIT 10
)
SELECT
    ProductId,
    Score,
    Review_Sample,
    cluster.AI_SENTIMENT(Text) as sentiment,
    cluster.AI_CLASSIFY(Text, '[quality, value, taste, packaging]') as category,
    cluster.AI_SUMMARIZE(Text, 'aifunctions_chat_default', 10) as brief_summary
FROM product_reviews;

## Cleanup

In [18]:
%%sql
DROP TABLE IF EXISTS reviews;
DROP DATABASE IF EXISTS temp;

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>