# Generate a Synthetic Text Data Quality Report with Gretel Evaluate

This notebook walks through generating a Synthetic Text Data Quality Score (Text SQS) report with Gretel Evaluate.
To run it, you need an API key from the Gretel console at https://console.gretel.cloud.

<a target="_blank" href="https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/evaluate/generate_text_quality_report_with_evaluate.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Getting started


In [None]:
%%capture
!pip install -U gretel-client

In [None]:
import IPython
import pandas as pd

from gretel_client.evaluation.text_quality_report import TextQualityReport
from gretel_client import configure_session


In [None]:
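# Show full text in DataFrame previews instead of truncating long cells
pd.set_option("max_colwidth", None)

# Specify your Gretel API key (you will be prompted to enter it)
configure_session(api_key="prompt", cache="yes", validate=True)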
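
For non-interactive runs, the key could be read from an environment variable instead of typed at a prompt. The sketch below shows one possible way to do that. The variable name `GRETEL_API_KEY` and the helper `resolve_api_key` are assumptions made for illustration; only the `"prompt"` value comes from the cell above.

```python
import os


def resolve_api_key(env_var: str = "GRETEL_API_KEY") -> str:
    """Return the key stored in `env_var`, or "prompt" when it is unset or blank.

    The variable name and this helper are hypothetical. "prompt" is the value
    the cell above passes so that the key is requested interactively.
    """
    value = os.environ.get(env_var, "").strip()
    return value if value else "prompt"


# Intended use (left as a comment because it contacts the Gretel API):
# configure_session(api_key=resolve_api_key(), cache="yes", validate=True)
```

With this approach the same cell works both in an interactive session and in a scheduled job, since it falls back to the prompt when no variable is set.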

# Load and preview the datasets

Specify a real-world dataset and a synthetic dataset to evaluate. The synthetic data was generated from the real-world data. These can be local files or web locations.

For demonstration purposes, we'll use an ecommerce review dataset as our real-world data. Our synthetic data is the corresponding data generated by Gretel's gpt-x model.

In [None]:
# Load and preview real-world data

real_data = "https://gretel-datasets.s3.us-west-2.amazonaws.com/Text-dataset/ecommerce/ecommerce_train.csv"

real_df = pd.read_csv(real_data)
real_df



In [None]:
# Load and preview synthetic data

synthetic_data = "https://gretel-datasets.s3.us-west-2.amazonaws.com/Text-dataset/ecommerce/ecommerce_synthetic.csv"

synth_df = pd.read_csv(synthetic_data)
synth_df
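
Before submitting the two frames, it can help to confirm that both contain the text column to be compared and to count empty values in it. The following is a minimal sketch, assuming the column is named `review_text` as in the report cell of this notebook. The helper `check_text_column` and the small stand-in frames are hypothetical.

```python
import pandas as pd


def check_text_column(df: pd.DataFrame, column: str) -> dict:
    """Summarize one text column: row count, empty values, mean character length."""
    if column not in df.columns:
        raise KeyError(f"column {column!r} not found; available: {list(df.columns)}")
    text = df[column]
    # Treat missing values and whitespace-only strings as empty.
    empty = text.fillna("").astype(str).str.strip().eq("")
    non_empty = text[~empty].astype(str)
    return {
        "rows": len(df),
        "empty": int(empty.sum()),
        "mean_chars": float(non_empty.str.len().mean()) if len(non_empty) else 0.0,
    }


# Stand-in frames for illustration; in the notebook, pass real_df and synth_df.
demo_real = pd.DataFrame({"review_text": ["Great fit", "Too small", None]})
demo_synth = pd.DataFrame({"review_text": ["Nice fabric", "  "]})

real_summary = check_text_column(demo_real, "review_text")
synth_summary = check_text_column(demo_synth, "review_text")
print(real_summary, synth_summary)
```

A large gap in mean length or a high share of empty rows in either frame is worth investigating before interpreting the report.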

# Create a Text Quality Report

Now, we will task a worker running in the Gretel cloud to generate a Text Quality Report using a temporary project.

In [None]:
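# Compare the synthetic text against the real-world reference text

report = TextQualityReport(
    data_source=synth_df,        # synthetic data to evaluate
    ref_data=real_df,            # real-world data used as the reference
    target="review_text",        # text column to compare
    record_count=len(synth_df),  # number of records to use
)
report.run()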

# View results
Preview the Synthetic Text Data Quality Score (Text SQS).

In [None]:
report.peek()

# Quality Report as HTML

In [None]:
IPython.display.HTML(report.as_html, metadata=dict(isolated=True))
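
To keep a copy of the report outside the notebook, the HTML can be written to disk. This is a minimal sketch, assuming `report.as_html` is a plain string, as its use in the cell above suggests. The helper `save_html` and the file name are hypothetical.

```python
from pathlib import Path


def save_html(html: str, path: str) -> Path:
    """Write an HTML string to `path` as UTF-8 and return the resolved path."""
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target.resolve()


# Stand-in markup for illustration; in the notebook, pass report.as_html.
demo_html = "<html><body><h1>Text SQS</h1></body></html>"
saved = save_html(demo_html, "reports/text_quality_report.html")
print(saved)
```

The saved file can then be opened in a browser or attached to a review without rerunning the evaluation.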

# Quality Report as Dictionary

In [None]:
report.as_dict
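
The dictionary form is convenient for logging or for comparing runs. Below is a minimal sketch that flattens a nested dictionary into dotted keys. The keys in `demo_report` are invented placeholders, not the actual structure returned by `report.as_dict`, and `flatten` is a hypothetical helper.

```python
import json


def flatten(nested: dict, prefix: str = "") -> dict:
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Placeholder structure for illustration only; in the notebook, pass report.as_dict.
demo_report = {"summary": {"score": 80, "grade": "Good"}, "rows": 100}
flat_report = flatten(demo_report)
print(json.dumps(flat_report, indent=2))
```

Flat keys are easy to store as one row per run, which makes it simpler to track scores across several synthetic datasets.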