<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Working%20With/Embeddings/BQML%20Autoencoder%20As%20Table%20Embedding.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FWorking%2520With%2FEmbeddings%2FBQML%2520Autoencoder%2520As%2520Table%2520Embedding.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Working%20With/Embeddings/BQML%20Autoencoder%20As%20Table%20Embedding.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Working%20With/Embeddings/BQML%20Autoencoder%20As%20Table%20Embedding.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# BigQuery ML Autoencoder As Table Embedding

Autoencoders are a type of neural network designed for unsupervised learning. They learn efficient representations (encodings) of input data by compressing it into a smaller representation called the "latent space." This latent space captures the most essential features of the input.

> The autoencoder also learns to reconstruct the original input data from this compressed representation using a decoder. The training process involves minimizing the difference (loss) between the original input and the reconstructed output. This comparison between input and reconstructed output serves as a form of supervision, even though the task itself is considered unsupervised.
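
To make the encode, decode, and compare loop concrete, here is a minimal sketch of one forward pass through a toy autoencoder in plain Python. The layer sizes, the hand-picked weights, and the helper names (`dense`, `encode`, `decode`, `reconstruction_mse`) are illustrative assumptions; a real model learns its weights, and this is not how BigQuery ML builds its network internally.

```python
# Toy autoencoder forward pass: 4 input features -> 2 latent values -> 4 outputs.
# All weights are hand-picked for illustration; a trained model learns them.

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights):
    """Apply a weight matrix given as a list of rows (one row per output unit)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

ENCODER = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
DECODER = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]

def encode(row):
    # Compress the input into the latent space
    return relu(dense(row, ENCODER))

def decode(latent):
    # Rebuild an approximation of the input from the latent space
    return dense(latent, DECODER)

def reconstruction_mse(row):
    # The training signal: how far the rebuilt row is from the original
    rebuilt = decode(encode(row))
    return sum((a - b) ** 2 for a, b in zip(row, rebuilt)) / len(row)

row = [1.0, 3.0, 2.0, 2.0]
latent = encode(row)
rebuilt = decode(latent)
loss = reconstruction_mse(row)
print(latent, rebuilt, loss)
```

The two-value `latent` list plays the role of the embedding: it is smaller than the input, and the loss measures how much information was lost by squeezing the row through it.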

With [BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction) you can train an [autoencoder](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-autoencoder) on tabular data.  Prediction with the [`ML.PREDICT`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict#autoencoder_models) function returns the latent space (the encoder output) from the trained model.  These representations can be used as embeddings for tasks like finding the rows most similar to a query row.

For a detailed review of BigQuery ML Autoencoders, check out the end-to-end workflow in this repository: [BQML Autoencoder with Anomaly Detection](../../03%20-%20BigQuery%20ML%20%28BQML%29/03i%20-%20BQML%20Autoencoder%20with%20Anomaly%20Detection.ipynb)

**Prerequisites:**
-  [01 - BigQuery - Table Data Source](../../01%20-%20Data%20Sources/01%20-%20BigQuery%20-%20Table%20Data%20Source.ipynb)

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user(project_id = PROJECT_ID)
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names.  If the import name is not found, the install name is used to install the package quietly for the current user.
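
The check described above can be sketched without the `!pip` step. The helper name `is_importable` is hypothetical; it wraps `importlib.util.find_spec` and treats a missing parent package (which raises `ModuleNotFoundError` for dotted names such as `google.cloud.bigquery`) as "not installed".

```python
import importlib.util

def is_importable(import_name):
    """Return True if `import_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(import_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package is missing
        return False

# tuples of (import name, install name)
packages = [
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('bigframes', 'bigframes'),
]

# Install names for every package whose import name could not be found
missing = [install_name for import_name, install_name in packages if not is_importable(import_name)]
print(missing)
```

Only the names collected in `missing` would need to be passed to `pip install`.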

In [3]:
# tuples of (import name, install name)
packages = [
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('bigframes', 'bigframes'),
]

import importlib.util
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user

### Restart Kernel (If Installs Occurred)

After a kernel restart, code submission can resume with the cell after this one.

In [4]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

inputs:

In [12]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [26]:
REGION = 'us-central1'
EXPERIMENT = 'bqml-autoencoder'
SERIES = 'working-with-embeddings'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

BQ_MODEL = f'{SERIES}-{EXPERIMENT}'

packages:

In [18]:
import numpy as np
from google.cloud import bigquery
import bigframes.pandas as bpd

clients:

In [19]:
bq = bigquery.Client(project = PROJECT_ID)

---
## Review Source Data

The data source here was prepared in [01 - BigQuery - Table Data Source](../../01%20-%20Data%20Sources/01%20-%20BigQuery%20-%20Table%20Data%20Source.ipynb).

This is a table of 284,807 credit card transactions classified as fraudulent or normal in the column `Class`.  To protect confidentiality, the original features have been transformed using [principal component analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis) into 28 features named `V1, V2, ... V28` (float).  Two descriptive features are provided without transformation by PCA:
- `Time` (integer) is the seconds elapsed between the transaction and the earliest transaction in the table
- `Amount` (float) is the value of the transaction

Data preparation added a column named `splits` for machine learning, assigning about 80% of rows to training (`TRAIN`), 10% to validation (`VALIDATE`), and 10% to testing (`TEST`).  Additionally, a unique identifier, `transaction_id`, was added to each transaction.

### Review BigQuery table:

In [23]:
source_data = bq.query(f'SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` LIMIT 5').to_dataframe()
source_data

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,35337,1.092844,-0.01323,1.359829,2.731537,-0.707357,0.873837,-0.79613,0.437707,0.39677,...,-0.167647,0.027557,0.592115,0.219695,0.03697,0.010984,0.0,0,a1b10547-d270-48c0-b902-7a0f735dadc7,TEST
1,60481,1.238973,0.035226,0.063003,0.641406,-0.260893,-0.580097,0.049938,-0.034733,0.405932,...,-0.057718,0.104983,0.537987,0.589563,-0.046207,-0.006212,0.0,0,814c62c8-ade4-47d5-bf83-313b0aafdee5,TEST
2,139587,1.870539,0.211079,0.224457,3.889486,-0.380177,0.249799,-0.577133,0.179189,-0.120462,...,0.180776,-0.060226,-0.228979,0.080827,0.009868,-0.036997,0.0,0,d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0,TEST
3,162908,-3.368339,-1.980442,0.153645,-0.159795,3.847169,-3.516873,-1.209398,-0.292122,0.760543,...,-1.171627,0.214333,-0.159652,-0.060883,1.294977,0.120503,0.0,0,802f3307-8e5a-4475-b795-5d5d8d7d0120,TEST
4,165236,2.180149,0.218732,-2.637726,0.348776,1.063546,-1.249197,0.942021,-0.547652,-0.087823,...,-0.176957,0.563779,0.730183,0.707494,-0.131066,-0.090428,0.0,0,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,TEST


In [24]:
source_data.dtypes

Time                Int64
V1                float64
V2                float64
V3                float64
V4                float64
V5                float64
V6                float64
V7                float64
V8                float64
V9                float64
V10               float64
V11               float64
V12               float64
V13               float64
V14               float64
V15               float64
V16               float64
V17               float64
V18               float64
V19               float64
V20               float64
V21               float64
V22               float64
V23               float64
V24               float64
V25               float64
V26               float64
V27               float64
V28               float64
Amount            float64
Class               Int64
transaction_id     object
splits             object
dtype: object

### Review the number of records for each level of `Class` in each of the data splits:

In [25]:
bq.query(f'SELECT splits, class, count(*) as count FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` GROUP BY splits, class').to_dataframe()

Unnamed: 0,splits,class,count
0,TEST,0,28455
1,TEST,1,47
2,TRAIN,0,227664
3,TRAIN,1,397
4,VALIDATE,0,28196
5,VALIDATE,1,48
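
As a quick check on these counts, the sketch below recomputes each split's share of the table and its fraud rate. The numbers are copied from the output above; the variable names are only for illustration.

```python
# Counts copied from the query output above: (split, Class) -> rows
counts = {
    ('TEST', 0): 28455, ('TEST', 1): 47,
    ('TRAIN', 0): 227664, ('TRAIN', 1): 397,
    ('VALIDATE', 0): 28196, ('VALIDATE', 1): 48,
}

total = sum(counts.values())

# Rows per split, regardless of Class
split_totals = {}
for (split, _), n in counts.items():
    split_totals[split] = split_totals.get(split, 0) + n

split_share = {split: n / total for split, n in split_totals.items()}
fraud_rate = {split: counts[(split, 1)] / split_totals[split] for split in split_totals}

for split in ('TRAIN', 'VALIDATE', 'TEST'):
    print(f'{split:8s} share={split_share[split]:.3f} fraud_rate={fraud_rate[split]:.4%}')
```

The totals add up to the 284,807 rows noted earlier, the shares land close to 80/10/10, and fraudulent transactions make up well under 1% of every split.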


---
## Train Model

Use BigQuery ML to train an unsupervised autoencoder model:
- [Autoencoder](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-autoencoder) with BigQuery ML (BQML)
- The `splits` column that notebook `01` created is not passed to the `AUTOENCODER` training itself
    - it is only used in the `WHERE` clause to subset to the `splits = 'TRAIN'` data for training
    
BigQuery ML also has [training options](https://cloud.google.com/bigquery-ml/docs/create_vertex) to register the resulting model in the [Vertex AI Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction); the statement below does not set them.

In [41]:
query = f"""
CREATE OR REPLACE MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL}`
OPTIONS (
        model_type = 'AUTOENCODER',
        activation_fn = 'RELU',
        batch_size = 30,
        dropout = .5,
        early_stop = TRUE,
        hidden_units = [128, 64, 8, 64, 128],
        max_iterations = 30,
        min_rel_progress = 0.001,
        optimizer = 'ADAM'
    ) AS
SELECT * EXCEPT(Class, splits, transaction_id),
FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
WHERE splits = 'TRAIN'
"""
print(query)


CREATE OR REPLACE MODEL `statmike-mlops-349915.fraud.working-with-embeddings-bqml-autoencoder`
OPTIONS (
        model_type = 'AUTOENCODER',
        activation_fn = 'RELU',
        batch_size = 30,
        dropout = .5,
        early_stop = TRUE,
        hidden_units = [128, 64, 8, 64, 128],
        max_iterations = 30,
        min_rel_progress = 0.001,
        optimizer = 'ADAM'
    ) AS
SELECT * EXCEPT(Class, splits, transaction_id),
FROM `statmike-mlops-349915.fraud.fraud_prepped`
WHERE splits = 'TRAIN'



In [42]:
job = bq.query(query = query)
job.result()
(job.ended-job.started).total_seconds()

1497.002

In [45]:
job.total_bytes_processed/1e6 # megabytes (MB)

36225.650368

### Evaluate Model

Calculate evaluation statistics with [ML.EVALUATE](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-evaluate):

In [47]:
bq.query(
    query = f"""
        SELECT *
        FROM ML.EVALUATE(
            MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL}`,
            (SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits = 'TRAIN')
        )
        """
).to_dataframe()

Unnamed: 0,mean_absolute_error,mean_squared_error,mean_squared_log_error
0,0.446546,0.613318,0.029254


In [50]:
bq.query(
    query = f"""
        SELECT *
        FROM ML.EVALUATE(
            MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL}`,
            (SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits = 'VALIDATE')
        )
        """
).to_dataframe()

Unnamed: 0,mean_absolute_error,mean_squared_error,mean_squared_log_error
0,0.443835,0.611037,0.028819


In [49]:
bq.query(
    query = f"""
        SELECT *
        FROM ML.EVALUATE(
            MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL}`,
            (SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits = 'TEST')
        )
        """
).to_dataframe()

Unnamed: 0,mean_absolute_error,mean_squared_error,mean_squared_log_error
0,0.4462,0.59642,0.029364
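
For reference, the three reported metrics compare original values with reconstructed values. The sketch below uses the textbook definitions on a made-up three-value example; how BigQuery ML scales the features and aggregates across columns is not shown in this notebook, so treat this as an illustration of the formulas rather than a reproduction of the numbers above.

```python
import math

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_log_error(actual, predicted):
    # Textbook form; only defined when every value is greater than -1
    return sum((math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical original and reconstructed values for one row
original = [1.0, 2.0, 3.0]
reconstructed = [1.5, 2.0, 2.0]

mae = mean_absolute_error(original, reconstructed)
mse = mean_squared_error(original, reconstructed)
msle = mean_squared_log_error(original, reconstructed)
print(mae, mse, msle)
```

Lower values mean the decoder rebuilds its input more faithfully. The similar values across `TRAIN`, `VALIDATE`, and `TEST` above indicate the reconstruction quality holds up on rows the model did not train on.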


### Predictions

Retrieve the latent space prediction with [ML.PREDICT](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-predict).

In [54]:
query = f"""
SELECT [latent_col_1, latent_col_2, latent_col_3, latent_col_4, latent_col_5, latent_col_6, latent_col_7, latent_col_8] as embedding, transaction_id
FROM ML.PREDICT (
    MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL}`,
    (SELECT *
    FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
    WHERE splits = 'TEST'
    LIMIT 5)
)
"""
pred = bq.query(query = query).to_dataframe()
pred

Unnamed: 0,embedding,transaction_id
0,"[4.121513843536377, 0.0, 3.663341999053955, 1....",a1b10547-d270-48c0-b902-7a0f735dadc7
1,"[2.5058867931365967, 0.0, 5.8338775634765625, ...",814c62c8-ade4-47d5-bf83-313b0aafdee5
2,"[3.6094603538513184, 0.0, 1.5513304471969604, ...",d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0
3,"[9.577651977539062, 0.0, 4.330973148345947, 7....",802f3307-8e5a-4475-b795-5d5d8d7d0120
4,"[1.8276915550231934, 0.0, 3.39656925201416, 3....",c8a5b93a-1598-4689-80be-4f9f5df0b8ce
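
The eight `latent_col_*` names in the query above line up with the `8` in the middle of `hidden_units`. Rather than typing the column list by hand, it could be generated. This is a sketch that assumes the middle entry of `hidden_units` sets the latent width, which is consistent with the output here; the variable names are illustrative.

```python
hidden_units = [128, 64, 8, 64, 128]  # same value as in the CREATE MODEL statement

# Assumption: the middle layer is the bottleneck that defines the latent space
latent_dim = hidden_units[len(hidden_units) // 2]

# Build the array expression used to pack the latent columns into one embedding
latent_cols = [f'latent_col_{i}' for i in range(1, latent_dim + 1)]
embedding_expr = f"[{', '.join(latent_cols)}] AS embedding"
print(embedding_expr)
```

The resulting string can be placed in the `SELECT` list of the f-string query, so a change to the bottleneck size only has to be made in one place.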


---
## Embeddings

Use the predicted latent space as embeddings and add them back to the source data for use in vector search. Use the [`ML.NORMALIZER`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-normalizer) function to normalize the embeddings.
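
As a rough picture of what normalization does to each embedding, the sketch below scales a vector to unit length. It assumes the 2-norm (Euclidean length), which is the usual default for this kind of normalizer; the helper name and the zero-vector handling are choices made for this example, not documented behavior of `ML.NORMALIZER`.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]

# Hypothetical 3-value embedding with length 5
unit = l2_normalize([3.0, 0.0, 4.0])
length = math.sqrt(sum(v * v for v in unit))
print(unit, length)
```

After normalization every embedding has length 1, so comparisons between rows depend only on direction and not on magnitude. Entries that were `0.0` stay `0.0`.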

BigQuery can [create vector indexes](https://cloud.google.com/bigquery/docs/vector-index) on embedding columns and search them with the [VECTOR_SEARCH](https://cloud.google.com/bigquery/docs/vector-search#use_the_vector_search_function_with_an_index) function.

In [105]:
query = f'''
CREATE OR REPLACE TABLE `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_embedding` AS
WITH embed AS (
    SELECT ML.NORMALIZER([latent_col_1, latent_col_2, latent_col_3, latent_col_4, latent_col_5, latent_col_6, latent_col_7, latent_col_8]) as embedding, transaction_id
    FROM ML.PREDICT (
        MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL}`,
        (SELECT *
        FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`)
    )
)
SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
LEFT JOIN embed
USING (transaction_id)
'''
job = bq.query(query)
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7f391f7c66b0>

In [106]:
bq.query(f'SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_embedding` LIMIT 5').to_dataframe()

Unnamed: 0,transaction_id,Time,V1,V2,V3,V4,V5,V6,V7,V8,...,V23,V24,V25,V26,V27,V28,Amount,Class,splits,embedding
0,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,165236,2.180149,0.218732,-2.637726,0.348776,1.063546,-1.249197,0.942021,-0.547652,...,-0.176957,0.563779,0.730183,0.707494,-0.131066,-0.090428,0.0,0,TEST,"[0.1768796040377733, 0.0, 0.32871182379322017,..."
1,a92e2777-3576-49ef-93c4-039d8dae0b5c,87579,-0.785344,1.741169,1.346336,3.179615,0.844314,1.683495,-0.260267,-1.369504,...,0.059226,0.190115,-0.505865,0.169853,0.179458,0.227019,0.0,0,TEST,"[0.5155690327565788, 0.0, 0.15862668183034437,..."
2,1259d8f5-8b63-49c1-96e3-a7c4acc11d0d,82965,-2.873991,0.246882,0.09617,0.840097,-0.349486,-0.862593,-0.676568,1.253418,...,-0.444221,0.138491,0.339719,0.694393,-0.447539,-0.490653,0.0,0,TEST,"[0.2800952872260901, 0.0, 0.5907896504687579, ..."
3,1bba1447-cda1-4112-8489-9fee0e43f35e,121735,1.828342,0.291645,0.211882,4.272862,-0.54454,-0.356532,-0.214252,0.007353,...,0.202234,0.91402,-0.04482,0.080333,-0.007181,-0.040065,0.0,0,TEST,"[0.360851194137567, 0.0, 0.17919098323425908, ..."
4,c14fad8a-6f28-475f-8bb7-902d63d44f50,63268,-0.845755,1.339517,1.930412,1.170314,0.865342,-0.067552,0.953945,0.073735,...,-0.432091,0.013305,0.61058,-0.047337,-0.00252,0.049496,0.0,0,TEST,"[0.5457825525180773, 0.0, 0.41189690122358213,..."


### Create The Vector Index

A vector index on the embedding column can make vector search more efficient.  It is not required, though, because a [brute force search](https://cloud.google.com/bigquery/docs/vector-search#use_the_vector_search_function_with_brute_force) is also possible.

In [None]:
query = f'''
CREATE VECTOR INDEX row_index ON `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_embedding`(embedding)
OPTIONS(index_type = 'IVF', distance_type = 'COSINE')
'''
job = bq.query(query)
job.result()

### Check The Vector Index Status


In [119]:
query = f'''
SELECT *
FROM `{BQ_PROJECT}.{BQ_DATASET}.INFORMATION_SCHEMA.VECTOR_INDEXES`
WHERE index_status = 'ACTIVE'
    AND table_name = '{BQ_TABLE}_embedding'
'''
bq.query(query).to_dataframe()

Unnamed: 0,index_catalog,index_schema,table_name,index_name,index_status,creation_time,last_modification_time,last_refresh_time,disable_time,disable_reason,ddl,coverage_percentage,unindexed_row_count,total_logical_bytes,total_storage_bytes
0,statmike-mlops-349915,fraud,fraud_prepped_embedding,row_index,ACTIVE,2024-05-02 19:23:59.785000+00:00,2024-05-02 19:23:59.785000+00:00,2024-05-04 01:43:46.419000+00:00,NaT,,CREATE VECTOR INDEX `row_index` ON `statmike-m...,100,0,27921310,25426651


### Get Matches With Vector Search

In [135]:
query = f'''
SELECT
    query.transaction_id AS transaction_id,
    base.transaction_id AS match,
    distance
FROM
    VECTOR_SEARCH(
        TABLE `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_embedding`,
        'embedding',
        (SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_embedding` WHERE splits = 'TEST' LIMIT 1),
        top_k => 10,
        distance_type => 'COSINE',
        options => '{{"fraction_lists_to_search": 0.03}}'
    )
ORDER BY distance
'''
result = bq.query(query).to_dataframe()
result

Unnamed: 0,transaction_id,match,distance
0,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,0.0
1,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,f0854157-86dc-469e-9100-7b0298776b82,4.2e-05
2,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,88e42fbb-2a4d-42aa-8122-ff8718856ff4,9e-05
3,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,8e369bc9-5778-4ad6-b24b-d7d88863058d,0.000108
4,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,bf09fbb1-3f49-4717-8011-7df94d2b6845,0.000118
5,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,f104c52d-0e7b-4247-923d-4b0cae01ba32,0.000139
6,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,f8b1c30d-e862-4cb0-90eb-c66186bdfa3a,0.000154
7,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,19f50eec-3812-4fc0-8499-b840168683ca,0.000185
8,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,787bcbf5-3bbb-41ce-910f-0f62d341fbe8,0.000202
9,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,a62a3b93-0380-4c59-b74e-1349cba08012,0.000207


In [136]:
result['transaction_id'][0]

'c8a5b93a-1598-4689-80be-4f9f5df0b8ce'

In [137]:
result['match'][0]

'c8a5b93a-1598-4689-80be-4f9f5df0b8ce'
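
The index above was created with `index_type = 'IVF'`, and the search passed `fraction_lists_to_search`. As a mental model only, an inverted file (IVF) index groups vectors into lists around centroids and, at query time, scans just the lists whose centroids are closest to the query. The sketch below is a toy version with hand-picked centroids and Euclidean distance; the function names, the data, and the way the fraction is rounded to a number of lists are all assumptions, not BigQuery's implementation.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: euclidean(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, top_k, fraction_lists_to_search):
    """Scan only the closest fraction of lists, then rank those candidates exactly."""
    n_probe = max(1, math.ceil(fraction_lists_to_search * len(centroids)))
    probe = sorted(range(len(centroids)), key=lambda c: euclidean(query, centroids[c]))[:n_probe]
    candidates = [vec_id for c in probe for vec_id in lists[c]]
    return sorted(candidates, key=lambda vec_id: euclidean(query, vectors[vec_id]))[:top_k]

# Hypothetical 2-D data with two hand-picked centroids (a real index learns them)
vectors = {'a': (0.0, 0.1), 'b': (0.2, 0.0), 'c': (5.0, 5.1), 'd': (5.2, 4.9)}
centroids = [(0.0, 0.0), (5.0, 5.0)]
lists = build_ivf(vectors, centroids)

matches = ivf_search((0.1, 0.1), vectors, centroids, lists, top_k=2, fraction_lists_to_search=0.5)
print(lists, matches)
```

Searching fewer lists is faster but can miss a true neighbor that landed in an unsearched list, which is why an indexed search is approximate and a brute-force search is exact.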

### Get Matches With Vector Search: Brute Force

Rather than using the index, find the exact nearest neighbors by searching all of the embeddings.

In [138]:
query = f'''
SELECT
    query.transaction_id AS transaction_id,
    ARRAY_AGG(base.transaction_id ORDER BY distance) AS matches
FROM
    VECTOR_SEARCH(
        TABLE `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_embedding`,
        'embedding',
        (SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_embedding` WHERE splits = 'TEST' LIMIT 5),
        top_k => 5,
        options => '{{"use_brute_force":true}}'
    )
GROUP BY transaction_id
'''
result = bq.query(query).to_dataframe()
result

Unnamed: 0,transaction_id,matches
0,c14fad8a-6f28-475f-8bb7-902d63d44f50,"[c14fad8a-6f28-475f-8bb7-902d63d44f50, 747e106..."
1,a92e2777-3576-49ef-93c4-039d8dae0b5c,"[a92e2777-3576-49ef-93c4-039d8dae0b5c, 3e32ef3..."
2,1259d8f5-8b63-49c1-96e3-a7c4acc11d0d,"[1259d8f5-8b63-49c1-96e3-a7c4acc11d0d, 466a84c..."
3,1bba1447-cda1-4112-8489-9fee0e43f35e,"[1bba1447-cda1-4112-8489-9fee0e43f35e, 427d9d2..."
4,c8a5b93a-1598-4689-80be-4f9f5df0b8ce,"[c8a5b93a-1598-4689-80be-4f9f5df0b8ce, f085415..."
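
Both searches pass `options` as a JSON string, and the doubled braces (`{{` and `}}`) are there only to escape the f-string. One alternative is to build the JSON with `json.dumps` and interpolate the result; the helper name `search_options` is hypothetical.

```python
import json

def search_options(**options):
    """Serialize VECTOR_SEARCH options to a JSON string."""
    return json.dumps(options)

indexed = search_options(fraction_lists_to_search=0.03)
brute_force = search_options(use_brute_force=True)

# The JSON text can then be dropped into the f-string query inside single quotes
clause = f"options => '{indexed}'"
print(clause)
print(brute_force)
```

This keeps Python values such as `True` correctly converted to JSON `true` and avoids hand-escaping braces.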


In [139]:
result['transaction_id'][0]

'c14fad8a-6f28-475f-8bb7-902d63d44f50'

In [140]:
result['matches'][0]

array(['c14fad8a-6f28-475f-8bb7-902d63d44f50',
       '747e1065-5f62-47ae-ba40-1cdac12e935f',
       'ba0741b4-22ea-4dcb-bcec-3556ba75c812',
       'd289f368-1dae-4e98-b289-0b7010362bcd',
       '5042b5cc-6cf6-43e6-a244-aee5718522de'], dtype=object)

In [141]:
result['matches'][0][0]

'c14fad8a-6f28-475f-8bb7-902d63d44f50'
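
The brute-force query above does not pass `distance_type`, while the indexed search used `'COSINE'`. Assuming the function then falls back to Euclidean distance, the ranking should still agree with cosine distance here, because the embeddings were normalized to unit length: for unit vectors, the squared Euclidean distance equals twice the cosine distance. The sketch below checks this on made-up vectors with a plain-Python brute-force search; all names and data are illustrative.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_top_k(query, table, top_k, distance):
    """Score every row against the query and keep the top_k closest ids."""
    return sorted(table, key=lambda row_id: distance(query, table[row_id]))[:top_k]

# Hypothetical embeddings, normalized to unit length like the `embedding` column
table = {
    'r1': l2_normalize([1.0, 0.0, 2.0]),
    'r2': l2_normalize([1.0, 0.1, 2.1]),
    'r3': l2_normalize([0.0, 3.0, 1.0]),
    'r4': l2_normalize([2.0, 2.0, 0.0]),
}
query = table['r1']

by_euclidean = brute_force_top_k(query, table, 3, euclidean_distance)
by_cosine = brute_force_top_k(query, table, 3, cosine_distance)
print(by_euclidean, by_cosine)
```

As in the results above, the query row is its own closest match, so the first entry of each list is the query itself.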