# Analyzing Customer Reviews with Gemini 1.0 Pro

## Data Preparation

Start by using BigQuery's integration with Google Cloud's ML and AI APIs to prepare your data for submission to the Gemini model.

### Translate

Use the following query to translate your reviews from their raw text into a single language (in this case, English) using the [Cloud Translation API](https://cloud.google.com/translate). You don't need to know the source language: you only define the target language, and the detected source language is returned with the translation. Check out the documentation to [learn more](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-translate) about using the `ML.TRANSLATE` function in BigQuery, including additional use cases.

In [None]:
%%bigquery results
CREATE OR REPLACE TABLE `data-quality-demo-next-24.cymbal_sports.reviews_joined_translated`
AS
SELECT
  reviews.* EXCEPT (review_text, text_content,ml_translate_result, ml_translate_status),
  review_text AS review_text_raw,
  REGEXP_REPLACE(TRIM(JSON_VALUE(ml_translate_result, '$.translations[0].translated_text')), r'([a-zA-Z0-9\s]*)&#39;([a-zA-Z0-9\s]*)', "\\1'\\2") AS review_text,
  language_name_en AS review_language,
FROM
  ML.TRANSLATE(MODEL `data-quality-demo-next-24.cymbal_sports.translate`,
    (
    SELECT
      *,
      review_text AS text_content
    FROM
     `data-quality-demo-next-24.cymbal_sports.raw_reviews_joined`),
    STRUCT('translate_text' AS translate_mode, 'en' AS target_language_code)
  ) reviews
LEFT JOIN
  `data-quality-demo-next-24.cymbal_sports.iso_639_codes` iso ON TRIM(JSON_VALUE(reviews.ml_translate_result, '$.translations[0].detected_language_code')) = iso.iso_639_1

In [None]:
results
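
The `REGEXP_REPLACE` in the query above exists to turn the HTML entity `&#39;` in the translated text back into an apostrophe. As a rough illustration of that cleanup step outside of SQL, here is a minimal Python sketch. The sample string and both helper names are hypothetical, and `html.unescape` is shown only as a broader alternative that decodes every entity, which is not what the query does.

```python
import html
import re

# Matches the numeric entity that the query above rewrites to an apostrophe.
APOSTROPHE_ENTITY = re.compile(r"&#39;")


def clean_apostrophes(text: str) -> str:
    """Replace only the &#39; entity and trim whitespace, like the SQL cleanup."""
    return APOSTROPHE_ENTITY.sub("'", text).strip()


def clean_all_entities(text: str) -> str:
    """Broader alternative: decode every HTML entity, not just apostrophes."""
    return html.unescape(text).strip()


# Hypothetical translated review text, for illustration only.
sample = " I don&#39;t like the fit &amp; the colour "

print(clean_apostrophes(sample))
print(clean_all_entities(sample))
```

If other entities such as `&amp;` or `&quot;` show up in your translated reviews, the second approach may be the better fit.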

### Sentiment analysis

Run the following query to analyze the sentiment of each review using the [Cloud Natural Language API](https://cloud.google.com/natural-language). This returns a sentiment score for each review, as well as the magnitude of that sentiment. The result can be broken down further to see, for example, which sentences express the sentiment most strongly. Check out the documentation to [learn more](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-understand-text) about using the `ML.UNDERSTAND_TEXT` function in BigQuery, including additional use cases.

In [None]:
%%bigquery results
CREATE OR REPLACE TABLE `data-quality-demo-next-24.cymbal_sports.cleaned_reviews`
AS
SELECT
  * EXCEPT (text_content, ml_understand_text_result, ml_understand_text_status),
  CASE
    WHEN CAST(JSON_VALUE(ml_understand_text_result, '$.document_sentiment.score') AS FLOAT64) > 0 THEN "positive"
    WHEN CAST(JSON_VALUE(ml_understand_text_result, '$.document_sentiment.score') AS FLOAT64) < 0 THEN "negative"
    WHEN CAST(JSON_VALUE(ml_understand_text_result, '$.document_sentiment.score') AS FLOAT64) = 0 THEN "neutral"
    ELSE "unknown"
  END AS sentiment,
  CAST(JSON_VALUE(ml_understand_text_result, '$.document_sentiment.magnitude') AS FLOAT64) AS sentiment_magnitude,
  CAST(JSON_VALUE(ml_understand_text_result, '$.document_sentiment.score') AS FLOAT64) AS sentiment_score,
FROM
  ML.UNDERSTAND_TEXT(MODEL `data-quality-demo-next-24.cymbal_sports.nlp`,
    (
    SELECT
      *,
      review_text AS text_content
    FROM
      `data-quality-demo-next-24.cymbal_sports.reviews_joined_translated`),
    STRUCT('ANALYZE_SENTIMENT' AS nlu_option)
  )

In [None]:
results
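
The `CASE` expression in the query above maps the document sentiment score to a label. The following Python sketch mirrors that rule so it can be tried on individual values. The `neutral_band` parameter is a hypothetical addition that is not part of the query; with its default of `0.0` the function reproduces the SQL behavior, including the `"unknown"` fallback for missing scores.

```python
from typing import Optional


def sentiment_label(score: Optional[float], neutral_band: float = 0.0) -> str:
    """Map a sentiment score to a label, following the CASE expression above.

    neutral_band is an illustrative assumption: scores whose absolute value
    is at most neutral_band are labeled "neutral". The default of 0.0 matches
    the query, where only a score of exactly 0 is neutral.
    """
    if score is None:
        return "unknown"  # NULL scores fall through to ELSE in the SQL
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    if abs(score) <= neutral_band:
        return "neutral"
    return "unknown"  # reached only for NaN, which fails every comparison


for value in (0.4, -0.1, 0.0, None):
    print(value, sentiment_label(value))
```

Because the query treats any nonzero score as positive or negative, weakly opinionated reviews still receive a polarity label; `sentiment_magnitude` can help separate those from strongly worded ones.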

### Text Parsing

Parse the text from your customer service policy (a PDF exported to PNGs) into BigQuery using the [Cloud Vision API](https://cloud.google.com/vision). This performs OCR (Optical Character Recognition) on each page, and the query then combines the recognized text into a single cell that we can pass to Gemini later. This can also be done using [Document AI](https://cloud.google.com/document-ai), which is particularly useful for bringing data from highly structured documents (like tax forms and driver's licenses) into BigQuery. Check out the documentation to [learn more](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-annotate-image) about using the `ML.ANNOTATE_IMAGE` function in BigQuery, including additional use cases.

In [None]:
%%bigquery results
CREATE OR REPLACE TABLE `data-quality-demo-next-24.cymbal_sports.complete_service_policy`
AS
WITH hold AS (
  SELECT
    REGEXP_REPLACE(REGEXP_REPLACE(JSON_VALUE(ml_annotate_image_result.full_text_annotation.text), r'\n', ' '), r'•|●', '') AS text_content,
    CAST(REGEXP_EXTRACT(uri, r'^.*Page_([0-9]{2})\.png$') AS INT64) AS page_number,
    *
  FROM
    ML.ANNOTATE_IMAGE(MODEL `data-quality-demo-next-24.cymbal_sports.vision_ai`,
      TABLE `data-quality-demo-next-24.cymbal_sports.service_policy`,
      STRUCT(['DOCUMENT_TEXT_DETECTION'] AS vision_features)) reviews)

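-- REPLACE removes the banner as a literal phrase; TRIM(value, chars) would
-- instead strip any leading or trailing characters found in that string.
SELECT
  ARRAY_TO_STRING(
    ARRAY(
      SELECT TRIM(REPLACE(text_content, "Internal Only For use by Cymbal Sports employees only", ""))
      FROM hold
      WHERE page_number > 2
      ORDER BY page_number),
    " ") AS service_policy_text,
  1.0 AS version_number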
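
To make the page-assembly logic easier to follow, here is a small Python sketch of the same steps: extract the page number from each image URI, skip the first two pages, remove the "Internal Only" banner, and join the rest in page order. The URIs and page text below are hypothetical placeholders. The sketch removes the banner with a literal substring replacement, because stripping by a character set (which is how Python's `str.strip(chars)` and BigQuery's two-argument `TRIM` behave) can also remove legitimate leading or trailing letters.

```python
import re
from typing import Dict, Optional

BANNER = "Internal Only For use by Cymbal Sports employees only"
# Same pattern as the REGEXP_EXTRACT in the query above.
PAGE_PATTERN = re.compile(r"^.*Page_([0-9]{2})\.png$")


def page_number(uri: str) -> Optional[int]:
    """Return the two-digit page number embedded in the URI, or None."""
    match = PAGE_PATTERN.match(uri)
    return int(match.group(1)) if match else None


def assemble_policy(pages: Dict[str, str]) -> str:
    """Join OCR text for pages after page 2, in page order, without the banner.

    pages maps an image URI to the OCR text recognized on that page.
    """
    numbered = [(page_number(uri), text) for uri, text in pages.items()]
    kept = sorted(
        ((n, text) for n, text in numbered if n is not None and n > 2),
        key=lambda item: item[0],
    )
    return " ".join(text.replace(BANNER, "").strip() for _, text in kept)


# Hypothetical URIs and page text, for illustration only.
pages = {
    "gs://example-bucket/policy/Page_04.png": BANNER + " Second section text.",
    "gs://example-bucket/policy/Page_01.png": "Cover page",
    "gs://example-bucket/policy/Page_03.png": BANNER + " First section text.",
}

print(assemble_policy(pages))
```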

## Resolving customer issues

Use the following query to call the Python [remote function](https://cloud.google.com/bigquery/docs/remote-functions), which passes the appropriate information to the Gemini model and returns a recommended action to resolve the issue, plus a recommended message to the customer in both the review language and English.

In [None]:
%%bigquery results

WITH
  hold AS (
  SELECT
    PARSE_JSON(`data-quality-demo-next-24.cymbal_sports_lineage.gemini_analysis` (review_id)) AS response
  FROM
    `data-quality-demo-next-24.cymbal_sports.cleaned_reviews`
  WHERE
    sentiment = "negative"
    AND uri IS NOT NULL
    AND review_id = "142")

SELECT
  STRING(response.issue_resolution) AS issue_resolution,
  STRING(response.response_user_language) AS email_user_language,
  STRING(response.response_translated) AS email_translated
FROM
  hold

In [None]:
results
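
The remote function's source is not shown in this notebook, so the following is only a sketch of the general shape such a function might take. BigQuery sends a remote function a JSON body whose `calls` field holds one argument list per row, and expects a JSON object with a `replies` array of the same length. Everything else here is an assumption: `build_replies` and `fake_analyze` are hypothetical names, and the Gemini call is replaced by a stub that returns placeholder values for the three fields the query reads.

```python
import json
from typing import Any, Callable, Dict, List


def build_replies(
    request_json: Dict[str, Any],
    analyze: Callable[[str], Dict[str, str]],
) -> Dict[str, List[str]]:
    """Turn a BigQuery remote function request into a response body.

    Each entry in "calls" is the argument list for one row; the query above
    passes a single argument, review_id. Each reply is a JSON string so the
    caller can apply PARSE_JSON to it.
    """
    replies = []
    for call in request_json["calls"]:
        review_id = call[0]
        replies.append(json.dumps(analyze(review_id)))
    return {"replies": replies}


def fake_analyze(review_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for the Gemini call; returns placeholder text."""
    return {
        "issue_resolution": f"(placeholder resolution for review {review_id})",
        "response_user_language": "(placeholder email in the review language)",
        "response_translated": "(placeholder email in English)",
    }


response = build_replies({"calls": [["142"]]}, fake_analyze)
parsed = json.loads(response["replies"][0])
print(parsed["issue_resolution"])
```

In a deployed function, `build_replies` would be called from the HTTP handler with the parsed request body, and `analyze` would look up the review and policy text and call the model.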

## Review usage lineage

This demo also highlights how BigQuery and Google Cloud's Generative AI tooling can help practitioners capture and review their team's AI usage over time. [Cloud Pub/Sub](https://cloud.google.com/pubsub) captures metadata from Gemini invocations, including the prompt and model response, the versions of the model and customer service policy used, and embeddings of key inputs. These embeddings support multimodal similarity search in BigQuery, which lets us identify patterns over time.

### Review prompt data

Let's start by looking at the information for prompts that were sent to Gemini through Cloud Functions. Cloud Pub/Sub captures this data and writes it directly to BigQuery. You can query this data to get a sense of model performance, or to examine long-term trends in issues by comparing embeddings for images and text included in reviews.

In [None]:
%%bigquery results

SELECT * FROM `data-quality-demo-next-24.cymbal_sports_lineage.reviews_prompts`

In [None]:
results
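
Comparing embeddings usually comes down to a similarity measure between vectors. As a minimal sketch of that idea, the function below computes cosine similarity in plain Python. The vectors are toy values, and how embeddings are stored in `reviews_prompts` is not shown in this notebook, so extracting them from `results` is left out.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; real embeddings have many more dimensions.
review_a = [1.0, 2.0, 3.0]
review_b = [2.0, 4.0, 6.0]
review_c = [-3.0, 0.0, 1.0]

print(cosine_similarity(review_a, review_b))
print(cosine_similarity(review_a, review_c))
```

Values near 1 indicate inputs that point in nearly the same direction in embedding space, which is one way to group reviews that describe similar issues.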

### Clean and review response data

You can review your response data to get a sense of the model's performance. This includes the text generated by the model, as well as associated [safety attributes](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes). These attributes can be configured to block responses that exceed the defined parameters, and you can also evaluate the safety attributes for each response using [BigQuery's JSON support](https://cloud.google.com/bigquery/docs/json-data). Here, the safety attributes output is written to a column with a JSON data type, and the query uses array subscripts together with [BigQuery's dot notation](https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#JSONPath_format) to parse out results for each individual category.

In [None]:
%%bigquery results

SELECT
  * EXCEPT (safety_attributes),
  STRUCT(safety_attributes[0].blocked AS blocked,
    safety_attributes[0].probability_score AS probability_score,
    safety_attributes[0].severity_score AS severity_score) AS hate_speech,
  STRUCT(safety_attributes[1].blocked AS blocked,
    safety_attributes[1].probability_score AS probability_score,
    safety_attributes[1].severity_score AS severity_score) AS dangerous_content,
  STRUCT(safety_attributes[2].blocked AS blocked,
    safety_attributes[2].probability_score AS probability_score,
    safety_attributes[2].severity_score AS severity_score) AS harassment,
  STRUCT(safety_attributes[3].blocked AS blocked,
    safety_attributes[3].probability_score AS probability_score,
    safety_attributes[3].severity_score AS severity_score) AS sexually_explicit,
  safety_attributes
FROM
  `data-quality-demo-next-24.cymbal_sports_lineage.reviews_responses`


In [None]:
results
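
The query above identifies each safety category by its position in the `safety_attributes` array. The Python sketch below applies the same positional mapping to a single JSON value and then lists the categories that look worth a second look. It assumes the array order stays fixed, as the query does. The sample payload, the `flagged_categories` helper, and the `0.5` threshold are hypothetical and are not taken from the demo data.

```python
import json
from typing import Any, Dict, List

# Position-to-name mapping used by the query above.
CATEGORY_ORDER = ["hate_speech", "dangerous_content", "harassment", "sexually_explicit"]
FIELDS = ("blocked", "probability_score", "severity_score")


def label_safety_attributes(raw_json: str) -> Dict[str, Dict[str, Any]]:
    """Name each array element by position and keep the three known fields."""
    attributes = json.loads(raw_json)
    return {
        name: {field: attribute.get(field) for field in FIELDS}
        for name, attribute in zip(CATEGORY_ORDER, attributes)
    }


def flagged_categories(
    labeled: Dict[str, Dict[str, Any]], threshold: float = 0.5
) -> List[str]:
    """Return categories that were blocked or meet the probability threshold."""
    return [
        name
        for name, attribute in labeled.items()
        if attribute["blocked"] or (attribute["probability_score"] or 0) >= threshold
    ]


# Hypothetical payload, for illustration only.
sample = json.dumps([
    {"blocked": False, "probability_score": 0.1, "severity_score": 0.1},
    {"blocked": False, "probability_score": 0.7, "severity_score": 0.3},
    {"blocked": False, "probability_score": 0.05, "severity_score": 0.0},
    {"blocked": True, "probability_score": 0.9, "severity_score": 0.8},
])

labeled = label_safety_attributes(sample)
print(flagged_categories(labeled))
```

If the stored JSON also carries a category name per element, keying on that name would be more robust than relying on array order.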