# Semantic Search & RAG with Amazon OpenSearch Service & SageMaker

Notes:
run on ml.t3.2xlarge

## Trail Guide for Document Search & Summarization

### Suggested Trail Marker Times:

1. 0:30 - Deploy LLMs (one for embeddings and one or more for text summarization)

2. 0:50 - Download, extract, and ingest the arXiv metadata dataset into a pandas DataFrame

3. 1:15 - OpenSearch domain created, index defined and created, and connection made

4. 1:45 - Vectorize the abstracts of 10,000 documents using the GPT-J embedding model

5. 2:00 - Load 10,000 articles into the OpenSearch index, including abstract_vector, title, categories, update_date, and abstract

6. 2:15 - Return relevant semantic search results in a scored table

7. 2:30 - Return the high-level bullet points of the 3 most relevant articles for a search

(Of course, feel free to ask one of us if you need a hint.)

---
### What are embeddings and vectors?

![word_vector2](word2vec.png)
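
An embedding maps a piece of text to a vector of numbers so that texts with similar meaning end up pointing in similar directions. As a rough sketch of how that closeness can be measured, the snippet below computes cosine similarity, the same `cosinesimil` space used for the index in section 3. The three-dimensional vectors and the words attached to them are made-up toy values, not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values for illustration only)
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.9, 0.15]
banana = [0.1, 0.05, 0.95]

print(cosine_similarity(king, queen))   # close to 1: similar direction
print(cosine_similarity(king, banana))  # much smaller: different direction
```

Real embedding models work the same way, only with thousands of dimensions instead of three.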
    

# Setup

### Install OpenSearch ML Python library

In [3]:
!pip install -q boto3
!pip install -q requests
# !pip install -q requests-aws4auth
!pip install -q opensearch-py
!pip install -q tqdm
# !pip install -q boto3
!pip install -q "transformers[torch]"
# !pip install -q sentence-transformers rank_bm25


# 1. Deploy LLMs (embedding and text summarization)

## Quick code scripts to deploy LLMs to SageMaker inference endpoints
## First, check available JumpStart models and use the dropdown selector
### LLMs in JumpStart (all models from Hugging Face)


In [None]:
import IPython
from ipywidgets import Dropdown
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker.jumpstart.filters import And

# Mistral 7B text generation model as default
# llm_model_id, llm_model_version = "huggingface-textgeneration-open-llama", "*"
llm_model_id, llm_model_version = "huggingface-llm-mistral-7b", "*"

# Note: this filter lists every Hugging Face model in JumpStart, not only text generation models
filter_value = And("framework == huggingface")
text_generation_models = list_jumpstart_models(filter=filter_value)

llm_dropdown = Dropdown(
    value=llm_model_id,
    options=text_generation_models,
    description="Sagemaker Pre-Trained Text Generation Models:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(IPython.display.Markdown("## Select a pre-trained Text Generation model from the dropdown below"))
display(llm_dropdown)

### Deploy LLM Model

In [None]:
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.serializers import JSONSerializer

llm_model = JumpStartModel(model_id=llm_dropdown.value, model_version=llm_model_version)#, instance_type='ml.g5.12xlarge')
llm_predictor = llm_model.deploy()

### Jumpstart Embedding Model Dropdown Selector (task == textembedding)

In [None]:
import IPython
from ipywidgets import Dropdown
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker.jumpstart.filters import And

# GPT-J-6b embedding model as default
embed_model_id, embed_model_version = "huggingface-textembedding-gpt-j-6b-fp16", "*"

filter_value = And("task == textembedding", "framework == huggingface")
embedding_models = list_jumpstart_models(filter=filter_value)
embedding_models

embed_dropdown = Dropdown(
    value=embed_model_id,
    options=embedding_models,
    description="Sagemaker Pre-Trained Text Embedding Models:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(IPython.display.Markdown("## Select a pre-trained embedding model from the dropdown below"))
display(embed_dropdown)

### Deploy Model

In [None]:
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.serializers import JSONSerializer

embedding_model = JumpStartModel(model_id=embed_dropdown.value, model_version=embed_model_version)
embedding_predictor = embedding_model.deploy()

### Deploy meta-llama/Llama-2-13b-hf from JumpStart

In [None]:
# SageMaker JumpStart provides APIs as part of SageMaker SDK that allow you to deploy and fine-tune models in network isolation using scripts that SageMaker maintains.

from sagemaker.jumpstart.model import JumpStartModel


model = JumpStartModel(model_id="meta-textgeneration-llama-2-13b")
example_payloads = model.retrieve_all_examples()

# You must manually accept the end-user license agreement (EULA) to deploy the model.
# accept_eula = False
accept_eula = True

if accept_eula:
    predictor = model.deploy(accept_eula=accept_eula)

    for payload in example_payloads:
        response = predictor.predict(payload.body)
        prompt = payload.body[payload.prompt_key]
        generated_text = response[0]["generated_text"]
        print("\nInput\n", prompt, "\n\nOutput\n\n", generated_text, "\n\n===============")


## Select desired LLM from deployed endpoints

In [27]:
import boto3
from botocore.exceptions import NoCredentialsError, ClientError

def list_sagemaker_endpoints():
    try:
        # Create a SageMaker client
        sagemaker_client = boto3.client('sagemaker')

        # List SageMaker endpoints
        response = sagemaker_client.list_endpoints(
            SortBy='CreationTime',
            SortOrder='Descending'
        )

        return [endpoint['EndpointName'] for endpoint in response.get('Endpoints', [])]

    except NoCredentialsError:
        print("No AWS credentials found. Please configure your AWS credentials.")
        return []
    except ClientError as e:
        print(f"An error occurred: {e}")
        return []

def select_endpoints(all_endpoints):
    print("Available SageMaker Endpoints:")
    for i, endpoint in enumerate(all_endpoints, 1):
        print(f"{i}. {endpoint}")

    llm_index = int(input("Select the number for the LLM model endpoint: ")) - 1
    embedding_index = int(input("Select the number for the Embedding model endpoint: ")) - 1

    return all_endpoints[llm_index], all_endpoints[embedding_index]

# List all available endpoints
all_endpoints = list_sagemaker_endpoints()

# Let the user select the LLM and Embedding endpoints
if all_endpoints:
    llm_endpoint_name, embedding_endpoint_name = select_endpoints(all_endpoints)
    print(f"Selected LLM Endpoint: {llm_endpoint_name}")
    print(f"Selected Embedding Endpoint: {embedding_endpoint_name}")
else:
    print("No endpoints available to select.")

Available SageMaker Endpoints:
1. meta-textgeneration-llama-2-13b-2024-01-15-04-27-30-881
2. huggingface-pytorch-tgi-inference-2024-01-15-03-52-46-512
3. hf-llm-mistral-7b-2024-01-12-22-18-38-063
4. opensearch-gen-ai-llm-falcon-7b-bf16-acf36e80
5. opensearch-gen-ai-embedding-gpt-j-6b-acf36e80
6. RagEnginesSageMakerModelMultiAB24AEndpoint6DA7D681-7QzzdmVCz76E


Select the number for the LLM model endpoint:  3
Select the number for the Embedding model endpoint:  5


Selected LLM Endpoint: hf-llm-mistral-7b-2024-01-12-22-18-38-063
Selected Embedding Endpoint: opensearch-gen-ai-embedding-gpt-j-6b-acf36e80


# 2. Download and explore the arXiv metadata dataset

Links:

- https://huggingface.co/datasets/arxiv_dataset

- https://www.kaggle.com/datasets/Cornell-University/arxiv?resource=download

### Download the arXiv dataset from S3 (using CloudFront as a CDN)

In [3]:
import os
import requests

def download_large_file(url, destination):
    """
    Downloads a file from a given URL in chunks and saves it to a destination file.
    """
    
    # Create directory if it does not exist
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(destination, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192): 
                # If you have a chunk, write it to file
                if chunk:
                    file.write(chunk)
    return destination

# URL of the large file
url = 'https://d3bd8cre3dxl8.cloudfront.net/archive.zip'

# Destination file path
destination = 'arxiv/archive.zip'

# Download the file
try:
    download_large_file(url, destination)
    print(f'File successfully downloaded to {destination}')
except Exception as e:
    print(f'An error occurred: {e}')


File successfully downloaded to arxiv/archive.zip


### Unzip

In [8]:
import zipfile
import os
import time

# start time
start_time = time.time()

# Path to the downloaded zip file
zip_file_path = 'arxiv/archive.zip'

# Directory to save the extracted file
output_dir = 'arxiv'
output_file = 'arxiv-metadata-oai-snapshot.json'

# Unzip the file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extract(output_file, output_dir)

print(f'File extracted to {output_dir}/{output_file}')

end_time = time.time()
duration = end_time - start_time
print("Time taken: {:.2f} seconds".format(duration))


File extracted to arxiv/arxiv-metadata-oai-snapshot.json
Time taken: 49.61 seconds


### Alternate Download (Optional - No need to repeat if above worked)

In [9]:
# Uses a public S3 bucket hosted in a private AWS account (no credentials needed)
!aws s3 ls --no-sign-request s3://0-arxiv-dataset/ --human-readable --summarize

2024-01-05 21:01:59    3.7 GiB arxiv-metadata-oai-snapshot.json

Total Objects: 1
   Total Size: 3.7 GiB


In [10]:
!aws s3 cp --no-sign-request s3://0-arxiv-dataset/arxiv-metadata-oai-snapshot.json ./arxiv/arxiv-metadata-oai-snapshot.json

download: s3://0-arxiv-dataset/arxiv-metadata-oai-snapshot.json to arxiv/arxiv-metadata-oai-snapshot.json


### Explore the dataset and load into pandas dataframe

In [3]:
# takes about 120 seconds with ml.t3.2xlarge
import pandas as pd
import time

def process_arxiv_data_pandas(file_path):
    start_time = time.time()

    # Read the JSON Lines file (one JSON record per line) into a DataFrame
    df = pd.read_json(file_path, lines=True)

    # Show a sample entry
    sample_entry = df.iloc[0]
    print("Sample Entry:\n", sample_entry)

    # Compute the number of entries
    num_entries = df.shape[0]
    print("Total Entries:", num_entries)

    # Compute the maximum length of the abstracts
    max_abstract_length = df['abstract'].str.len().max()
    print("Max Abstract Length:", max_abstract_length)

    end_time = time.time()
    duration = end_time - start_time
    print("Time taken: {:.2f} seconds".format(duration))
    
    return df

# Usage
df = process_arxiv_data_pandas('arxiv/arxiv-metadata-oai-snapshot.json')


Sample Entry:
 id                                                        0704.0001
submitter                                            Pavel Nadolsky
authors           C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-...
title             Calculation of prompt diphoton production cros...
comments                    37 pages, 15 figures; published version
journal-ref                                Phys.Rev.D76:013009,2007
doi                                      10.1103/PhysRevD.76.013009
report-no                                          ANL-HEP-PR-07-12
categories                                                   hep-ph
license                                                        None
abstract            A fully differential calculation in perturba...
versions          [{'version': 'v1', 'created': 'Mon, 2 Apr 2007...
update_date                                              2008-11-26
authors_parsed    [[Balázs, C., ], [Berger, E. L., ], [Nadolsky,...
Name: 0, dtype: object
Total Entr
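
Loading the full 3.7 GiB snapshot is slow, and the later steps only vectorize a subset of articles. As a hedged alternative, the sketch below parses just the first `n` lines of the JSON Lines file using only the standard library. The helper name `load_first_records` is hypothetical and not part of the original notebook.

```python
import json
from itertools import islice

def load_first_records(file_path, n):
    """Parse only the first n lines of a JSON Lines file into a list of dicts."""
    with open(file_path, "r", encoding="utf-8") as f:
        # islice stops reading after n lines, so the rest of the file is never parsed
        return [json.loads(line) for line in islice(f, n) if line.strip()]
```

The result can be wrapped for the rest of the notebook with `pd.DataFrame(load_first_records('arxiv/arxiv-metadata-oai-snapshot.json', 10000))`.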

# 3. Create & Deploy OpenSearch Service and Vector Index

OpenSearch is a fork of Elasticsearch that can be used as a vector database with support for k-nearest-neighbor (k-NN) search.

https://opensearch.org/docs/latest/search-plugins/knn/index/
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html

## Create Domain

In [4]:
import boto3
from botocore.exceptions import ClientError
import json
from datetime import datetime

class DateTimeEncoder(json.JSONEncoder):
    """ Custom encoder for encoding datetime objects for JSON serialization. """
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj)

def create_opensearch_domain(domain_name, instance_type, instance_count, volume_size):
    client = boto3.client('opensearch')
    sts_client = boto3.client('sts')
    region = boto3.session.Session().region_name

    # Get AWS account ID and user ARN
    identity_info = sts_client.get_caller_identity()
    account_id = identity_info["Account"]
    user_arn = identity_info["Arn"]

    # Define an access policy with the current user's ARN
    access_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": user_arn
                },
                "Action": "es:*",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
            },
            # Additional statements for other services like SageMaker
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "sagemaker.amazonaws.com"
                },
                "Action": "es:ESHttp*",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
            }
            # ... additional statements as needed ...
        ]
    }

    try:
        response = client.create_domain(
            DomainName=domain_name,
            EngineVersion='OpenSearch_2.11',  # Specify your desired version
            ClusterConfig={
                'InstanceType': instance_type,
                'InstanceCount': instance_count,
                'DedicatedMasterEnabled': False,
                'ZoneAwarenessEnabled': False,
            },
            EBSOptions={
                'EBSEnabled': True,
                'VolumeType': 'gp2',
                'VolumeSize': volume_size,
            },
            AccessPolicies=json.dumps(access_policy),
            NodeToNodeEncryptionOptions={
                'Enabled': True
            },
            EncryptionAtRestOptions={
                'Enabled': True
            }
        )
        print(f"Domain {domain_name} created.")
        return response
    except ClientError as e:
        print(f"An error occurred: {e}")
        return None

# Example usage
domain_name = 'my-opensearch-domain'
instance_type = 'c5.large.search'  # Example instance type
instance_count = 1
volume_size = 10  # in GB

response = create_opensearch_domain(domain_name, instance_type, instance_count, volume_size)

if response:
    # Use the custom DateTimeEncoder for json.dumps
    pretty_response = json.dumps(response, indent=4, cls=DateTimeEncoder)
    print("Domain creation initiated:\n", pretty_response)


Domain my-opensearch-domain3 created.
Domain creation initiated:
 {
    "ResponseMetadata": {
        "RequestId": "2bc5be72-7e97-4ce2-b6c1-68f4587ed67f",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amzn-requestid": "2bc5be72-7e97-4ce2-b6c1-68f4587ed67f",
            "content-type": "application/json",
            "content-length": "2730",
            "date": "Tue, 16 Jan 2024 21:26:04 GMT"
        },
        "RetryAttempts": 0
    },
    "DomainStatus": {
        "DomainId": "730335522976/my-opensearch-domain3",
        "DomainName": "my-opensearch-domain3",
        "ARN": "arn:aws:es:us-east-1:730335522976:domain/my-opensearch-domain3",
        "Created": true,
        "Deleted": false,
        "Processing": true,
        "UpgradeProcessing": false,
        "EngineVersion": "OpenSearch_2.11",
        "ClusterConfig": {
            "InstanceType": "c5.large.search",
            "InstanceCount": 1,
            "DedicatedMasterEnabled": false,
            "Zon

## Check OpenSearch Domain Status

In [6]:
import boto3
import time

def list_opensearch_domains(client):
    response = client.list_domain_names()
    domain_names = [domain['DomainName'] for domain in response['DomainNames']]
    return domain_names

def check_opensearch_cluster_status(client, cluster_name):
    start_time = time.time()

    while True:
        response = client.describe_domain(DomainName=cluster_name)
        status = response['DomainStatus']['Processing']

        if not status:
            end_time = time.time()
            elapsed_time = end_time - start_time
            print(f"Cluster '{cluster_name}' is ready!")
            print(f"Total time elapsed: {elapsed_time:.2f} seconds")
            break
        else:
            print(f"Cluster '{cluster_name}' is still being set up. Checking again in 60 seconds.")
            time.sleep(60)

if __name__ == "__main__":
    client = boto3.client('opensearch')

    print("Fetching list of OpenSearch domains...")
    domains = list_opensearch_domains(client)

    if domains:
        print("Available OpenSearch Domains:")
        for idx, domain in enumerate(domains, start=1):
            print(f"{idx}. {domain}")

        choice = int(input("Enter the number of the domain you want to check: ")) - 1

        if 0 <= choice < len(domains):
            selected_domain = domains[choice]
            check_opensearch_cluster_status(client, selected_domain)
        else:
            print("Invalid selection.")
    else:
        print("No OpenSearch domains available.")

Fetching list of OpenSearch domains...
Available OpenSearch Domains:
1. opensearchservi-28hd2an8kq9o
2. my-opensearch-domain3
3. my-opensearch-domain


Enter the number of the domain you want to check:  2


Cluster 'my-opensearch-domain3' is still being set up. Checking again in 60 seconds.

## Retrieve OpenSearch Endpoints

In [None]:
import boto3
from botocore.exceptions import ClientError

def list_opensearch_domains():
    client = boto3.client('opensearch')

    domain_summary = []

    try:
        # List all domain names
        domain_names_response = client.list_domain_names()
        domain_names = [domain_info['DomainName'] for domain_info in domain_names_response['DomainNames']]

        for domain_name in domain_names:
            # Describe each domain
            domain_response = client.describe_domain(DomainName=domain_name)
            domain_status = domain_response['DomainStatus']

            # Determine domain operational status
            if domain_status.get('Created', False) and not domain_status.get('Deleted', False):
                if domain_status.get('Processing', False):
                    operational_status = 'Waiting'
                else:
                    operational_status = 'Ready'
            else:
                operational_status = 'Not available or deleted'

            # Get endpoint information
            endpoint = domain_status.get('Endpoint') or domain_status.get('Endpoints', 'Not available')

            # Store domain summary information
            domain_summary.append({
                "Domain Name": domain_name,
                "Status": operational_status,
                "Endpoint": endpoint
            })

    except ClientError as e:
        print(f"An error occurred: {e}")
        return

    # Display the summary as a formatted table
    print(f"{'Domain Name':30s} {'Status':20s} {'Endpoint'}")
    print('-' * 80)  # Print a separator line
    aos_host = None  # Stays None when no domains are found
    for domain in domain_summary:
        print(f"{domain['Domain Name']:30s} {domain['Status']:20s} {domain['Endpoint']}")
        aos_host = domain['Endpoint']  # Ends up as the last listed domain's endpoint
    return aos_host

# Example usage
aos_host = list_opensearch_domains()
print("aos_host:", aos_host)


## Create an OpenSearch cluster connection.
Next, we'll use the Python API to set up a connection to the Amazon OpenSearch Service domain.

Note: if you're using a region other than us-east-1, please update the region in the code below.

In [4]:
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import boto3

print (aos_host)

def create_opensearch_connection(aos_host, region='us-east-1'):
    """
    Create a connection to an OpenSearch cluster from AWS SageMaker Studio.

    :param aos_host: The hostname of the OpenSearch cluster.
    :param region: AWS region where the OpenSearch cluster is located. Default is 'us-east-1'.
    :return: An OpenSearch client object.
    """
    # Retrieve AWS credentials automatically
    credentials = boto3.Session().get_credentials()
    auth = AWSV4SignerAuth(credentials, region)

    aos_client = OpenSearch(
        hosts=[{'host': aos_host, 'port': 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

    return aos_client

# Example usage
# aos_host = 'your-opensearch-cluster-endpoint'
aos_client = create_opensearch_connection(aos_host)
print(aos_client)

search-my-opensearch-domain-nnbajtgulf76kasehvbmxzzjbu.us-east-1.es.amazonaws.com
<OpenSearch([{'host': 'search-my-opensearch-domain-nnbajtgulf76kasehvbmxzzjbu.us-east-1.es.amazonaws.com', 'port': 443}])>
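
Rather than editing the hard-coded region by hand, the region could be read from the endpoint host name. The helper below is a hypothetical sketch that assumes the standard `<prefix>.<region>.es.amazonaws.com` host layout seen in the output above. Custom endpoints or other partitions would need different handling.

```python
def region_from_aos_host(aos_host):
    """Extract the AWS region from a standard OpenSearch Service host name."""
    parts = aos_host.split(".")
    # Expected layout: [<prefix>, <region>, 'es', 'amazonaws', 'com']
    if len(parts) < 5 or parts[-3:] != ["es", "amazonaws", "com"]:
        raise ValueError(f"Unrecognized OpenSearch Service host: {aos_host}")
    return parts[-4]
```

With this in place, the connection could be created as `create_opensearch_connection(aos_host, region=region_from_aos_host(aos_host))`.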


## Define and create an index in Amazon OpenSearch Service
We define the index with the `standard` analyzer configured with English stopwords, which strips common words such as `the`, `is`, `a`, and `an`.

The GPT-J index has five fields: `abstract_vector` (of type `knn_vector`) holds the vector representation of the abstract, and the remaining fields are `title`, `categories`, `update_date`, and `abstract`.

To create the index, we first define it in JSON, then use the `aos_client` connection we initiated earlier to create it in Amazon OpenSearch Service.

### Define index with GPT-J-6b embeddings (4096 dimensions in vector)

In [6]:
knn_index = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "abstract_vector": {
                "type": "knn_vector",
                "dimension": 4096,
                "store": True
            },
            "title": {
                "type": "text",
                "store": True
            },
            "categories": {
                "type": "text",
                "store": True
            },
            "update_date": {
                "type": "date",
                "store": True
            },
            "abstract": {
                "type": "text",
                "store": True
            }
        }
    }
}

### (Optional - skip if using different Embedding) Define index with BERT embeddings (768 dimensions in vector)

In [15]:
knn_index = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "abstract_vector": {
                "type": "knn_vector",
                "dimension": 768,
                "store": True
            },
            "title": {
                "type": "text",
                "store": True
            },
            "categories": {
                "type": "text",
                "store": True
            },
            "update_date": {
                "type": "date",
                "store": True
            }
        }
    }
}

### (Optional - skip if using different Embedding) Define Index with Titan Embeddings (1536 dimensions in vector)

In [5]:
# Titan embeddings use 1536-dimensional vectors
knn_index = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "abstract_vector": {
                "type": "knn_vector",
                "dimension": 1536,
                "store": True
            },
            "title": {
                "type": "text",
                "store": True
            },
            "categories": {
                "type": "text",
                "store": True
            },
            "update_date": {
                "type": "date",
                "store": True
            }
        }
    }
}

## Create index in OpenSearch

If for any reason you need to recreate your dataset, you can uncomment and execute the following to delete any previously created indexes. If this is the first time you're running this, you can skip this step.

In [7]:
# aos_client.indices.delete(index="metadata_arxiv")
# df = pd.DataFrame()

{'acknowledged': True}

Using the above index definition, we now create the index in Amazon OpenSearch Service. Passing `ignore=400` suppresses the error raised when the index already exists.

In [8]:
index_name = "metadata_arxiv"
aos_client.indices.create(index=index_name,body=knn_index,ignore=400)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'metadata_arxiv'}
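
When switching between embedding models, the vector dimension in the mapping changes, so the index has to be dropped and created again. A minimal sketch of combining the delete and create steps above into one call follows. The helper name `recreate_index` is hypothetical, and it relies only on the `indices.exists`, `indices.delete`, and `indices.create` methods of the opensearch-py client.

```python
def recreate_index(client, index_name, index_body):
    """Delete index_name if it already exists, then create it from index_body."""
    if client.indices.exists(index=index_name):
        # Dropping the index also discards every document stored in it
        client.indices.delete(index=index_name)
    return client.indices.create(index=index_name, body=index_body)
```

Example use: `recreate_index(aos_client, "metadata_arxiv", knn_index)`. Because this deletes data, it is best run only when a fresh index is really intended.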

Let's verify the created index information

In [9]:
aos_client.indices.get(index=index_name)

{'metadata_arxiv': {'aliases': {},
  'mappings': {'properties': {'abstract': {'type': 'text', 'store': True},
    'abstract_vector': {'type': 'knn_vector',
     'store': True,
     'dimension': 4096},
    'categories': {'type': 'text', 'store': True},
    'title': {'type': 'text', 'store': True},
    'update_date': {'type': 'date', 'store': True}}},
  'settings': {'index': {'replication': {'type': 'DOCUMENT'},
    'number_of_shards': '5',
    'provided_name': 'metadata_arxiv',
    'knn.space_type': 'cosinesimil',
    'knn': 'true',
    'creation_date': '1705293837108',
    'analysis': {'analyzer': {'default': {'type': 'standard',
       'stopwords': '_english_'}}},
    'number_of_replicas': '1',
    'uuid': 'q-CSBsHAQqaRS7uf_Nv2yQ',
    'version': {'created': '136327827'}}}}}
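
With the `knn_vector` field in place, a semantic search amounts to sending a query vector and asking for its nearest neighbors. The sketch below builds such a request body using the OpenSearch k-NN query syntax. The helper name `build_knn_query` and the default `k=3` are illustrative assumptions, and the query vector must have the same dimension as the index mapping.

```python
def build_knn_query(query_vector, k=3, vector_field="abstract_vector"):
    """Build an OpenSearch k-NN search body for the given query vector."""
    return {
        "size": k,
        # Return only the readable fields, not the large stored vector
        "_source": ["title", "categories", "update_date", "abstract"],
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                }
            }
        },
    }
```

The body could then be passed to the client, for example `aos_client.search(index=index_name, body=build_knn_query(vector))`, where `vector` is the embedding of the search text.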

# 4. Generate embeddings of the Article's abstract

Let's try GPT-J-6b embeddings to generate a vector representation of each article's abstract.

Please note that GPT-J-6b accepts at most 2048 tokens, and our maximum abstract length was 6091 characters. The two units are not the same, because a token can be:
1. Whole words: Especially for common words.
2. Subwords or parts of words: For less common words, which might be broken down into smaller, more frequent subwords.
3. Individual characters: Particularly for rare words or unusual character combinations that aren't captured by whole words or subwords in the model's vocabulary.
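
To get a feel for whether a 6091-character abstract fits in a 2048-token window, a back-of-the-envelope estimate can help. The sketch below assumes roughly 4 characters per token, a common rule of thumb for English prose and not a property of the GPT-J tokenizer. Abstracts heavy in math or LaTeX may tokenize less efficiently, so an exact count would require the model's own tokenizer.

```python
import math

def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; chars_per_token is an assumed heuristic."""
    return math.ceil(len(text) / chars_per_token)

def truncate_to_token_budget(text, max_tokens=2048, chars_per_token=4.0):
    """Cut text so its estimated token count stays within max_tokens."""
    max_chars = int(max_tokens * chars_per_token)
    return text[:max_chars]

longest_abstract = "x" * 6091  # stand-in with the maximum abstract length
print(estimate_tokens(longest_abstract))  # 1523 under the 4-chars-per-token assumption
```

Under this assumption even the longest abstract stays below the limit, but the margin is not large enough to rule out truncation for unusually dense text.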

## Employ Embedding Model

### List SageMaker Endpoints

In [11]:
import boto3
from botocore.exceptions import NoCredentialsError, ClientError

# Get the current AWS region
aws_region = boto3.session.Session().region_name
print(f"Current AWS Region: {aws_region}")

def list_sagemaker_endpoints():
    try:
        # Create a SageMaker client
        sagemaker_client = boto3.client('sagemaker')

        # List SageMaker endpoints
        response = sagemaker_client.list_endpoints(
            SortBy='CreationTime',
            SortOrder='Descending'
        )

        return [endpoint['EndpointName'] for endpoint in response.get('Endpoints', [])]

    except NoCredentialsError:
        print("No AWS credentials found. Please configure your AWS credentials.")
        return []
    except ClientError as e:
        print(f"An error occurred: {e}")
        return []

def select_endpoints(all_endpoints):
    print("Available SageMaker Endpoints:")
    for i, endpoint in enumerate(all_endpoints, 1):
        print(f"{i}. {endpoint}")

    llm_index = int(input("Select the number for the LLM model endpoint: ")) - 1
    embedding_index = int(input("Select the number for the Embedding model endpoint: ")) - 1

    return all_endpoints[llm_index], all_endpoints[embedding_index]

# List all available endpoints
all_endpoints = list_sagemaker_endpoints()

# Let the user select the LLM and Embedding endpoints
if all_endpoints:
    llm_endpoint_name, embedding_endpoint_name = select_endpoints(all_endpoints)
    print(f"Selected LLM Endpoint: {llm_endpoint_name}")
    print(f"Selected Embedding Endpoint: {embedding_endpoint_name}")
else:
    print("No endpoints available to select.")

Current AWS Region: us-east-1
Available SageMaker Endpoints:
1. meta-textgeneration-llama-2-13b-2024-01-15-04-27-30-881
2. huggingface-pytorch-tgi-inference-2024-01-15-03-52-46-512
3. hf-llm-mistral-7b-2024-01-12-22-18-38-063
4. opensearch-gen-ai-llm-falcon-7b-bf16-acf36e80
5. opensearch-gen-ai-embedding-gpt-j-6b-acf36e80
6. RagEnginesSageMakerModelMultiAB24AEndpoint6DA7D681-7QzzdmVCz76E


Select the number for the LLM model endpoint:  1
Select the number for the Embedding model endpoint:  5


Selected LLM Endpoint: meta-textgeneration-llama-2-13b-2024-01-15-04-27-30-881
Selected Embedding Endpoint: opensearch-gen-ai-embedding-gpt-j-6b-acf36e80


## Let's try vectorizing the abstract on a subset of 1000 articles

### Using GPT-J-6B (max 2048 tokens)

In [12]:
# 1000 articles in ~92 seconds
import boto3
import pandas as pd
import time
import json

no_of_articles = 1000
start_time = time.time()

# Name of your SageMaker endpoint
# Load or define env variables
if embedding_endpoint_name:
    print(embedding_endpoint_name)
else:
    embedding_endpoint_name = "your-sagemaker-endpoint-name"
    
if aws_region:
    print(aws_region)
else:
    aws_region = "your-aws-region"

# Establish a connection to the SageMaker runtime
client = boto3.client('runtime.sagemaker', region_name=aws_region)

def vectorize_text_sagemaker(text):
    # Prepare the payload with the expected parameter name; here text is a single abstract string
    payload = json.dumps({"text_inputs": text})

    # Sending the text to the SageMaker endpoint
    response = client.invoke_endpoint(EndpointName=embedding_endpoint_name,
                                      ContentType='application/json',
                                      Body=payload)
    # Receiving the response and converting it
    response_body = json.loads(response['Body'].read().decode())
    # 'embedding' holds one vector per input, so take the first (and only) one
    return response_body['embedding'][0]

# Select the subset of your DataFrame
df_subset = df.head(no_of_articles)

# Apply the vectorization function
vectorized_abstracts = df_subset['abstract'].apply(vectorize_text_sagemaker)

print("Vectorized Abstracts Successfully")

end_time = time.time()
time_vectorization = end_time - start_time
print(f"Time taken for vectorization: {time_vectorization} seconds")
print(vectorized_abstracts)

opensearch-gen-ai-embedding-gpt-j-6b-acf36e80
us-east-1
Vectorized Abstracts Successfully
Time taken for vectorization: 92.34969973564148 seconds
0      [0.007603085599839687, 0.004107543732970953, 0...
1      [0.006115400698035955, 0.0024362001568078995, ...
2      [0.016317538917064667, -0.003693857230246067, ...
3      [0.009021581150591373, -0.011483951471745968, ...
4      [0.00858279038220644, -0.0020114851649850607, ...
                             ...                        
995    [-0.0017040536040440202, 0.01018516719341278, ...
996    [0.020026983693242073, 0.008944433182477951, -...
997    [0.0017824446549639106, 0.0012971345568075776,...
998    [-0.014983518049120903, 0.00807292852550745, 0...
999    [0.014853671193122864, 0.010220940224826336, 0...
Name: abstract, Length: 1000, dtype: object
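
The cell above sends one request per abstract, so the per-request overhead is paid 1,000 times. Below is a minimal sketch of batching, assuming the endpoint accepts a list of strings under `text_inputs` and returns one vector per input under `embedding` (worth verifying against your deployed model). `chunked` and `embed_in_batches` are hypothetical helper names, and `embed_batch` stands in for any function that sends a list of texts to the endpoint and returns a list of vectors.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 16,
) -> List[List[float]]:
    """Embed `texts` batch by batch, preserving the input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        result = embed_batch(batch)
        # Assumption: the endpoint returns exactly one vector per input text.
        if len(result) != len(batch):
            raise ValueError("embedding count does not match batch size")
        vectors.extend(result)
    return vectors
```

With the notebook's names, `embed_batch` could be a variant of `vectorize_text_sagemaker` that returns `response_body['embedding']` rather than only its first element.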


### Using BERT (max 512 tokens)

In [88]:
# 100 articles takes ~16 seconds
from transformers import BertTokenizer, BertModel
import torch

no_of_articles = 100
start_time = time.time()

# Initialize the BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def vectorize_text(text):
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().numpy()

# Vectorize the first `no_of_articles` abstracts
df_subset = df.head(no_of_articles)
vectorized_abstracts = df_subset['abstract'].apply(vectorize_text)
bert_vectors = vectorized_abstracts
print("Vectorized Abstracts Successfully")

end_time = time.time()
time_vectorization = end_time - start_time
print(f"Time taken for vectorization: {time_vectorization} seconds")
print(vectorized_abstracts)

Vectorized Abstracts Successfully
Time taken for vectorization: 16.427900075912476 seconds
0     [-0.3771481, -0.048133407, 0.34441498, -0.1936...
1     [-0.40378582, 0.014585821, 0.15648036, -0.2119...
2     [-0.41898176, 0.16085128, 0.643277, -0.0904780...
3     [-0.19868158, -0.043408114, 0.02741245, -0.195...
4     [-0.12638369, 0.19934435, 0.40047324, -0.39668...
                            ...                        
95    [-0.40263885, 0.017360615, 0.21860191, -0.2216...
96    [-0.35702378, -0.053870752, -0.10470553, -0.47...
97    [-0.24074006, -0.13507345, 0.1704656, -0.20295...
98    [-0.18630911, 0.1520963, 0.47631258, -0.330984...
99    [-0.36923856, -0.0116866045, 0.2750462, -0.330...
Name: abstract, Length: 100, dtype: object
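
`vectorize_text` averages the last hidden state over every token position. That is fine here because each call encodes a single, unpadded abstract. If several texts are batched together, the padding tokens would be averaged in as well. The pure-Python sketch below shows the masked mean that avoids this. `masked_mean_pool` is a hypothetical helper; in practice the same arithmetic would be done on tensors using the tokenizer's `attention_mask`.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average token vectors, ignoring positions whose mask value is 0."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask value is required per token vector")
    # Keep only the real (non-padding) tokens.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```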


### Using Titan Embeddings and Bedrock (max 8192 tokens)

In [9]:
# 100 articles takes ~16 seconds
import os
import json
import boto3
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import time

no_of_articles = 100
start_time = time.time()

print(boto3.__version__)

# Initialize AWS Session for Titan Embeddings
session = boto3.Session(
    profile_name=os.environ.get("BWB_PROFILE_NAME")
)
# Create the client from the session so the selected profile is actually used
bedrock = session.client(service_name='bedrock-runtime')

def get_titan_embedding(text):
    try:
        body = json.dumps({"inputText": text})
        model_id = 'amazon.titan-embed-text-v1'
        mime_type = 'application/json'
        response = bedrock.invoke_model(body=body, modelId=model_id, accept=mime_type, contentType=mime_type)
        response_body = json.loads(response.get('body').read())
        
        # Check if embedding is in the response
        if 'embedding' in response_body:
            embedding = response_body.get('embedding')
            return np.array(embedding)
        else:
            print("Embedding not found in the response")
            return None
    except Exception as e:
        print(f"Error in getting Titan embedding: {e}")
        return None

# Function to vectorize text using Titan Embeddings
def vectorize_text(text):
    # get_titan_embedding already returns a NumPy array, or None on failure
    return get_titan_embedding(text)

# Assuming 'df' is a DataFrame with an 'abstract' column
df_subset = df.head(no_of_articles)
vectorized_abstracts = df_subset['abstract'].apply(get_titan_embedding)
titan_vectors = vectorized_abstracts
print("Vectorized Abstracts Successfully")

end_time = time.time()
time_vectorization = end_time - start_time
print(f"Time taken for vectorization: {time_vectorization} seconds")
print(vectorized_abstracts)

1.33.9
Vectorized Abstracts Successfully
Time taken for vectorization: 16.373608589172363 seconds
0     [0.064941406, 0.77734375, -0.08203125, -0.1796...
1     [0.18457031, 0.5234375, 0.080078125, 0.0363769...
2     [0.63671875, -0.18457031, 0.17773438, -0.41210...
3     [-0.2578125, 0.032226562, -0.16113281, -0.1337...
4     [0.390625, 0.171875, -0.09716797, -0.12207031,...
                            ...                        
95    [-0.010864258, 0.032226562, -0.1171875, -0.218...
96    [-0.33203125, 0.22363281, -0.20117188, 0.03857...
97    [-0.73828125, -0.06542969, 0.30859375, 0.19042...
98    [0.38085938, 0.27539062, -0.39648438, 0.294921...
99    [0.5546875, 0.08544922, -0.27148438, -0.057617...
Name: abstract, Length: 100, dtype: object
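
`get_titan_embedding` catches every exception and returns `None`, so a transient error such as throttling silently becomes a missing vector. A minimal retry sketch with exponential backoff is shown below. `call_with_retries` is a hypothetical helper, it retries on any exception for simplicity, and the delay values are illustrative. The wrapped function must let errors propagate, so `get_titan_embedding` as written would need its `try`/`except` removed before this helps.

```python
import time


def call_with_retries(func, *args, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff when it raises."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the waiting can be replaced in tests.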


### Create a new DataFrame with the vectorized abstract

In [13]:
start_time = time.time()

new_df = pd.DataFrame({
    'vectorized_abstract': vectorized_abstracts,
    'title': df_subset['title'],
    'categories': df_subset['categories'],
    'update_date': df_subset['update_date'],
    'abstract': df_subset['abstract']
})

print(new_df)

end_time = time.time()
time_dataframe_creation = end_time - start_time
print(f"Time taken to create new DataFrame: {time_dataframe_creation} seconds")


                                   vectorized_abstract  \
0    [0.007603085599839687, 0.004107543732970953, 0...   
1    [0.006115400698035955, 0.0024362001568078995, ...   
2    [0.016317538917064667, -0.003693857230246067, ...   
3    [0.009021581150591373, -0.011483951471745968, ...   
4    [0.00858279038220644, -0.0020114851649850607, ...   
..                                                 ...   
995  [-0.0017040536040440202, 0.01018516719341278, ...   
996  [0.020026983693242073, 0.008944433182477951, -...   
997  [0.0017824446549639106, 0.0012971345568075776,...   
998  [-0.014983518049120903, 0.00807292852550745, 0...   
999  [0.014853671193122864, 0.010220940224826336, 0...   

                                                 title       categories  \
0    Calculation of prompt diphoton production cros...           hep-ph   
1             Sparsity-certifying Graph Decompositions    math.CO cs.CG   
2    The evolution of the Earth-Moon system based o...   physics.gen-ph   
3  

### Optional: Clean rows with missing or null values

In [14]:
import pandas as pd
import time


start_time = time.time()

# Number of rows before removing missing/null values
initial_row_count = new_df.shape[0]

# Remove rows with missing or null values
cleaned_df = new_df.dropna()

# Number of rows after removing missing/null values
final_row_count = cleaned_df.shape[0]

# Calculate the number of rows removed
rows_removed = initial_row_count - final_row_count

print(cleaned_df)

end_time = time.time()
time_dataframe_creation = end_time - start_time

print(f"Number of rows removed: {rows_removed}")
print(f"Total remaining rows: {final_row_count}")
print(f"Time taken to create and clean new DataFrame: {time_dataframe_creation} seconds")



                                   vectorized_abstract  \
0    [0.007603085599839687, 0.004107543732970953, 0...   
1    [0.006115400698035955, 0.0024362001568078995, ...   
2    [0.016317538917064667, -0.003693857230246067, ...   
3    [0.009021581150591373, -0.011483951471745968, ...   
4    [0.00858279038220644, -0.0020114851649850607, ...   
..                                                 ...   
995  [-0.0017040536040440202, 0.01018516719341278, ...   
996  [0.020026983693242073, 0.008944433182477951, -...   
997  [0.0017824446549639106, 0.0012971345568075776,...   
998  [-0.014983518049120903, 0.00807292852550745, 0...   
999  [0.014853671193122864, 0.010220940224826336, 0...   

                                                 title       categories  \
0    Calculation of prompt diphoton production cros...           hep-ph   
1             Sparsity-certifying Graph Decompositions    math.CO cs.CG   
2    The evolution of the Earth-Moon system based o...   physics.gen-ph   
3  
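
`dropna()` removes rows whose vector is `None`, but it does not catch vectors of the wrong length, which OpenSearch rejects at index time when they do not match the dimension of the `knn_vector` field. A small check is sketched below, assuming each vector is a list or array-like object that supports `len`. `find_bad_vectors` is a hypothetical helper, and `expected_dim` must match the dimension your index was created with.

```python
def find_bad_vectors(vectors, expected_dim):
    """Return the positions of vectors that are missing or have the wrong length."""
    bad_positions = []
    for position, vector in enumerate(vectors):
        if vector is None or len(vector) != expected_dim:
            bad_positions.append(position)
    return bad_positions
```

For example, `find_bad_vectors(new_df['vectorized_abstract'].tolist(), expected_dim)` lists the rows to inspect before loading.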

# 5. Load the Data into OpenSearch

### Works with any of the embedding models above (GPT-J, BERT, Titan), as long as the index's vector dimension matches the model

In [15]:
# 1000 records in ~23 seconds
start_time = time.time()
import numpy as np

def index_dataframe(aos_client, index_name, dataframe):
    for index, row in dataframe.iterrows():
        # Convert to list if it's a numpy array
        vector = row['vectorized_abstract']
        if isinstance(vector, np.ndarray):
            vector = vector.tolist()

        # Print the document for the first few rows to debug
        # if index < 1:
            # print(f"Document at index {index}: {vector}")

        # Skip if the vector is null or empty
        if vector is None or len(vector) == 0:
            print(f"Skipping index {index} due to null or empty vector")
            continue

        document = {
            "abstract_vector": vector,
            "title": row['title'],
            "categories": row['categories'],
            "update_date": row['update_date'],
            "abstract": row['abstract']
        }

        try:
            aos_client.index(index=index_name, body=document)
            if index % 100 == 0:
                print(f"Indexed {index} documents")
        except Exception as e:
            print(f"Error indexing document at index {index}: {e}")

# Example usage
index_dataframe(aos_client, index_name, new_df)

# timing
end_time = time.time()
print(f"Time taken to index data in OpenSearch: {end_time - start_time} seconds")

Indexed 0 documents
Indexed 100 documents
Indexed 200 documents
Indexed 300 documents
Indexed 400 documents
Indexed 500 documents
Indexed 600 documents
Indexed 700 documents
Indexed 800 documents
Indexed 900 documents
Time taken to index data in OpenSearch: 23.23329448699951 seconds
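
The loop above issues one HTTP request per document. A sketch of the same load using the bulk helper from `opensearch-py` follows. `iter_actions` and `bulk_index` are hypothetical names. The document fields mirror the cell above, and the rows are assumed to be plain dicts (for example from `new_df.to_dict('records')`) whose vectors are already plain lists.

```python
def iter_actions(index_name, rows):
    """Yield one bulk action per row, skipping rows without a usable vector."""
    for row in rows:
        vector = row.get("vectorized_abstract")
        if vector is None or len(vector) == 0:
            continue
        yield {
            "_index": index_name,
            "_source": {
                "abstract_vector": list(vector),
                "title": row["title"],
                "categories": row["categories"],
                "update_date": row["update_date"],
                "abstract": row["abstract"],
            },
        }


def bulk_index(aos_client, index_name, rows):
    """Send all actions through opensearch-py's bulk helper."""
    # Imported here so iter_actions can be used without the library installed.
    from opensearchpy import helpers

    success_count, errors = helpers.bulk(
        aos_client, iter_actions(index_name, rows), raise_on_error=False
    )
    return success_count, errors
```

With `raise_on_error=False` the helper collects failures instead of raising on the first one, which keeps the behavior close to the per-document `try`/`except` in the original loop.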


### To validate the load, we'll query the number of documents in the index. We should have 1000 hits.

In [16]:
res = aos_client.search(index=index_name, body={"query": {"match_all": {}}})
print("Records found: %d." % res['hits']['total']['value'])

Records found: 1000.
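
`match_all` works for this check, but newly indexed documents only become searchable after a refresh, and the reported hit total can be capped on large indexes. A sketch using the refresh and count APIs is shown below. `count_documents` is a hypothetical helper; `aos_client` and `index_name` are the notebook's existing names.

```python
def count_documents(aos_client, index_name):
    """Refresh the index so recent writes are visible, then return its document count."""
    aos_client.indices.refresh(index=index_name)
    return aos_client.count(index=index_name)["count"]
```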


# 6. Return Relevant Semantic Search Results in a scored table

## Generate vector for user input query
Next, we'll use the same embedding helper that vectorized the abstracts to translate our input question into a vector. The query must be embedded with the same model as the indexed documents.

### GPT-J-6b vectorized Search Query

In [None]:
query_raw_sentences = ['We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.']
# looking for this article: https://arxiv.org/pdf/1706.03762.pdf
search_vector = vectorize_text_sagemaker(query_raw_sentences)
# search_vector = vectorize_text(query_raw_sentences)[0].tolist()
search_vector

### BERT vectorized Search Query (Optional - skip if not using)

In [None]:
query_raw_sentences = ['We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.']
# looking for this article: https://arxiv.org/pdf/1706.03762.pdf
search_vector = vectorize_text(query_raw_sentences)
# search_vector = vectorize_text(query_raw_sentences)[0].tolist()
search_vector

### Titan Embedding Search Query (Optional - skip if not using)

In [14]:
# Single sentence as a string
query_sentence = 'We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.'
# Get embedding for the single sentence
search_vector = get_titan_embedding(query_sentence)

print(search_vector)

[0.17871094 0.01312256 0.55078125 ... 0.2890625  0.75       0.2890625 ]


## Search vector with "Semantic Search"
Now that we have vectors in OpenSearch and a vector for our query, let's perform a k-NN search in OpenSearch.

In [18]:
query = {
    "size": 30,
    "query": {
        "knn": {
            "abstract_vector": {
                "vector": search_vector,
                "k": 30
            }
        }
    }
}

res = aos_client.search(index=index_name, body=query, stored_fields=["title", "categories", "abstract"])
print(f"Got {res['hits']['total']['value']} Hits:")
query_result = []

for hit in res['hits']['hits']:
    row = [hit['_id'], hit['_score'], hit['fields']['title'][0], hit['fields']['categories'][0],  hit['fields']['abstract'][0]]
    query_result.append(row)

query_result_df = pd.DataFrame(data=query_result, columns=["_id", "_score", "title", "categories", "abstract"])
display(query_result_df)

Got 150 Hits:


| | _id | _score | title | categories | abstract |
|---|---|---|---|---|---|
| 0 | odp_C40B7yV8sqPy8FM_ | 0.774213 | Architecture for Pseudo Acausal Evolvable Embe... | cs.NE cs.AI | Advances in semiconductor technology are con... |
| 1 | N9p_C40B7yV8sqPy01Ix | 0.77005 | Algorithm for anisotropic diffusion in hydroge... | cond-mat.soft | In this paper I describe a specialized algor... |
| 2 | Btp_C40B7yV8sqPypFDs | 0.769617 | On-line Viterbi Algorithm and Its Relationship... | cs.DS | In this paper, we introduce the on-line Vite... |
| 3 | ttp_C40B7yV8sqPy3FK9 | 0.769492 | Some combinatorial aspects of differential ope... | math.DG math.CA | In this paper we present a recurrent relatio... |
| 4 | UNp_C40B7yV8sqPy1VIJ | 0.769154 | Behavioral response to strong aversive stimuli... | q-bio.NC | In this paper a theoretical model of functio... |
| 5 | s9p_C40B7yV8sqPy3FKI | 0.768879 | A note on higher-order differential operations | math.DG math.CA | In this paper we consider successive iterati... |
| 6 | odp_C40B7yV8sqPyyFFT | 0.768208 | Contrasting Two Transformation-Based Methods f... | math.OC | In this note we contrast two transformation-... |
| 7 | 1Np_C40B7yV8sqPytlDW | 0.763659 | Automated Generation of Layout and Control for... | quant-ph | We present a computer-aided design flow for ... |
| 8 | -9p_C40B7yV8sqPypE8o | 0.763191 | Visualizing Teleportation | physics.ed-ph quant-ph | A novel way of picturing the processing of q... |
| 9 | Htp_C40B7yV8sqPy0VJX | 0.761483 | Evolutionary Neural Gas (ENG): A Model of Self... | physics.gen-ph q-bio.PE | Despite their claimed biological plausibilit... |
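
To make the ranking concrete, here is a brute-force version of what a k-NN query approximates: score every stored vector against the query and keep the top `k`. This is a sketch that assumes cosine similarity. OpenSearch uses approximate indexes and whichever space type the index was created with, so its `_score` values will generally not equal the raw cosine values. `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors_by_id, k):
    """Return the k (doc_id, score) pairs most similar to `query`, best first."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in vectors_by_id.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

An exact scan like this is linear in the number of documents, which is why vector indexes trade a little accuracy for speed on larger collections.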


# 7. Summarize and pull out key information with an LLM

## Select desired LLM from deployed endpoints

In [27]:
import boto3
from botocore.exceptions import NoCredentialsError, ClientError

def list_sagemaker_endpoints():
    try:
        # Create a SageMaker client
        sagemaker_client = boto3.client('sagemaker')

        # List SageMaker endpoints
        response = sagemaker_client.list_endpoints(
            SortBy='CreationTime',
            SortOrder='Descending'
        )

        return [endpoint['EndpointName'] for endpoint in response.get('Endpoints', [])]

    except NoCredentialsError:
        print("No AWS credentials found. Please configure your AWS credentials.")
        return []
    except ClientError as e:
        print(f"An error occurred: {e}")
        return []

def select_endpoints(all_endpoints):
    print("Available SageMaker Endpoints:")
    for i, endpoint in enumerate(all_endpoints, 1):
        print(f"{i}. {endpoint}")

    llm_index = int(input("Select the number for the LLM model endpoint: ")) - 1
    embedding_index = int(input("Select the number for the Embedding model endpoint: ")) - 1

    return all_endpoints[llm_index], all_endpoints[embedding_index]

# List all available endpoints
all_endpoints = list_sagemaker_endpoints()

# Let the user select the LLM and Embedding endpoints
if all_endpoints:
    llm_endpoint_name, embedding_endpoint_name = select_endpoints(all_endpoints)
    print(f"Selected LLM Endpoint: {llm_endpoint_name}")
    print(f"Selected Embedding Endpoint: {embedding_endpoint_name}")
else:
    print("No endpoints available to select.")

Available SageMaker Endpoints:
1. meta-textgeneration-llama-2-13b-2024-01-15-04-27-30-881
2. huggingface-pytorch-tgi-inference-2024-01-15-03-52-46-512
3. hf-llm-mistral-7b-2024-01-12-22-18-38-063
4. opensearch-gen-ai-llm-falcon-7b-bf16-acf36e80
5. opensearch-gen-ai-embedding-gpt-j-6b-acf36e80
6. RagEnginesSageMakerModelMultiAB24AEndpoint6DA7D681-7QzzdmVCz76E


Select the number for the LLM model endpoint:  3
Select the number for the Embedding model endpoint:  5


Selected LLM Endpoint: hf-llm-mistral-7b-2024-01-12-22-18-38-063
Selected Embedding Endpoint: opensearch-gen-ai-embedding-gpt-j-6b-acf36e80


In [51]:
import boto3
import json
import re

# Create a SageMaker runtime client
sagemaker_runtime = boto3.client('sagemaker-runtime')

# Assuming 'llm_endpoint_name' contains the name of your endpoint
# llm_endpoint_name is defined when endpoints are queried

# Now let's generate new queries for each of the top 3 articles
for index, row in query_result_df.iterrows():
    # Break the loop after processing 3 rows
    if index >= 3:
        break

    question = 'Please provide executive bullet points of the previous text'
    # question = 'Please provide one sentence answers for who, what, when, where, and why of the previous text?'
    doc = row['abstract']  # Assuming the 'abstract' field contains the text of the article
    prompt = f"""Answer based on context:\n\n{doc}\n\n{question}"""
    
    # default payload
    # payload = {
    # "inputs": prompt,
    # "parameters": {
    #     "max_new_tokens": 1024,
    #     "top_k":50,
    #     "top_p":0.95,
    #     "do_sample": True,
    #     "stop": ['A:']
    # }
    # }
    
    # Prepare the payload for the language model
    payload = json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 128, 
            "do_sample": True, 
            "temperature": 0.2
        }
    })

    # Make the prediction request
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=llm_endpoint_name,
        ContentType='application/json',
        Body=payload
    )
            
    # Decode the response
    response_body = json.loads(response['Body'].read().decode())

    # Iterate through the response and print each item
    print(f"Response for Article {index + 1}:")
    for item in response_body:
        for key, value in item.items():
            # Print the key
            print(f"{key}:")

            # Clean up the value by removing redundant new lines and spaces
            cleaned_value = re.sub(r'\n\s*\n', '\n', value.strip())

            # Print the cleaned value
            print(cleaned_value, "\n\n")



Response for Article 1:
generated_text:
:
  - Advances in semiconductor technology are contributing to the increasing
complexity in the design of embedded systems.
  - Architectures with novel techniques such as evolvable nature and
autonomous behavior have engrossed lot of attention.
  - This paper demonstrates conceptually evolvable embedded systems can be
characterized basing on acausal nature.
  - It is noted that in acausal systems, future input needs to be known, here we
make a mechanism such that the system predicts the future inputs and exhibits
pseudo acausal nature 


Response for Article 2:
generated_text:
:
  * In this paper I describe a specialized algorithm for anisotropic diffusion
determined by a field of transition rates.
  * The algorithm can be used to describe some interesting forms of diffusion
that occur in the study of proton motion in a network of hydrogen bonds.
  * The algorithm produces data that require a nonstandard method of spectral
analysis which is also
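
The parsing loop above assumes the endpoint returns a list of dictionaries such as `[{"generated_text": "..."}]`, which matches the output shown. Other model containers may return a different shape, so a defensive extractor can help when switching endpoints. The sketch below is an assumption-laden helper, not a documented contract: `extract_generated_text` is hypothetical and only handles the two shapes named in its docstring.

```python
import re


def extract_generated_text(response_body):
    """Return cleaned text from a list of {'generated_text': ...} dicts or a single such dict."""
    items = response_body if isinstance(response_body, list) else [response_body]
    texts = []
    for item in items:
        if not isinstance(item, dict) or "generated_text" not in item:
            raise ValueError(f"Unexpected response item: {item!r}")
        # Collapse runs of blank lines, as the notebook's loop does.
        texts.append(re.sub(r"\n\s*\n", "\n", item["generated_text"].strip()))
    return "\n".join(texts)
```

Raising on an unknown shape makes a mismatched endpoint fail loudly instead of printing nothing.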

# Bonus: Bedrock

## List all available Bedrock models

In [None]:
!aws bedrock list-foundation-models --query "modelSummaries[*].modelId"

## Bedrock AI21 J2 Ultra v1

### Sample Query

In [78]:
import os
import json
import boto3

session = boto3.Session(
    profile_name=os.environ.get("BWB_PROFILE_NAME")
) #sets the profile name to use for AWS credentials

bedrock = session.client(
    service_name='bedrock-runtime', #creates a Bedrock client
    region_name=os.environ.get("BWB_REGION_NAME"),
    endpoint_url=os.environ.get("BWB_ENDPOINT_URL")
) 

bedrock_model_id = "ai21.j2-ultra-v1" #set the foundation model

prompt = "What is the largest city in New Hampshire?" #the prompt to send to the model

body = json.dumps({
    "prompt": prompt, #AI21
    "maxTokens": 1024, 
    "temperature": 0, 
    "topP": 0.5, 
    "stopSequences": [], 
    "countPenalty": {"scale": 0 }, 
    "presencePenalty": {"scale": 0 }, 
    "frequencyPenalty": {"scale": 0 }
}) #build the request payload


response = bedrock.invoke_model(body=body, modelId=bedrock_model_id, accept='application/json', contentType='application/json') #send the payload to Bedrock

response_body = json.loads(response.get('body').read()) # read the response

response_text = response_body.get("completions")[0].get("data").get("text") #extract the text from the JSON response

print(response_text)

## Process articles with AI21 J2

In [52]:
import os
import json
import boto3

# Create a session with AWS using the provided profile and region details
session = boto3.Session(
    profile_name=os.environ.get("BWB_PROFILE_NAME")
)

# Create a Bedrock runtime client
bedrock = session.client(
    service_name='bedrock-runtime',
    region_name=os.environ.get("BWB_REGION_NAME"),
    endpoint_url=os.environ.get("BWB_ENDPOINT_URL")
)

# Set the foundation model
bedrock_model_id = "ai21.j2-ultra-v1"

# Process each of the top 3 articles

for index, row in query_result_df.iterrows():
    # Break the loop after processing 3 rows
    if index >= 3:
        break
        
    question = 'Please provide executive bullet points of the previous text'
    # question = 'Please provide one sentence answers for who, what, when, where, and why of the previous text?'
    doc = row['abstract']  # Assuming the 'abstract' field contains the text of the article
    prompt = f"""Answer based on context:\n\n{doc}\n\n{question}"""
    
    # Prepare the payload for the Bedrock model
    body = json.dumps({
        "prompt": prompt,
        "maxTokens": 1024,
        "temperature": 0.2,
        "topP": 0.5,
        "stopSequences": [],
        "countPenalty": {"scale": 0},
        "presencePenalty": {"scale": 0},
        "frequencyPenalty": {"scale": 0}
    })

    # Make the prediction request
    response = bedrock.invoke_model(
        body=body,
        modelId=bedrock_model_id,
        accept='application/json',
        contentType='application/json'
    )

    # Decode the response and print it
    response_body = json.loads(response.get('body').read())
    response_text = response_body.get("completions")[0].get("data").get("text")
    print(f"Response for Article {index + 1}:\n{response_text}\n")


Response for Article 1:

Advances in semiconductor technology are contributing to the increasing complexity in the design of embedded systems. Architectures with novel techniques such as evolvable nature and autonomous behavior have engrossed a lot of attention. This paper demonstrates conceptually evolvable embedded systems can be characterized based on acausal nature. It is noted that in acausal systems, future input needs to be known, here we make a mechanism such that the system predicts the future inputs and exhibits pseudo acausal nature. An embedded system that uses theoretical framework of acausality is proposed. Our method aims at a novel architecture that features the hardware evolability and autonomous behavior alongside pseudo acausality. Various aspects of this architecture are discussed in detail along with the limitations.

* Advances in semiconductor technology are contributing to the increasing complexity in the design of embedded systems.
* Architectures with novel te
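
Both Bedrock cells build the same AI21 request body and unpack the same response path. A small pair of helpers can keep that in one place. The field names are taken from the cells above; `build_j2_body` and `parse_j2_text` are hypothetical names, and the defaults simply mirror the values used in the article loop.

```python
import json


def build_j2_body(prompt, max_tokens=1024, temperature=0.2, top_p=0.5):
    """Serialize a request body with the fields used in the cells above."""
    return json.dumps({
        "prompt": prompt,
        "maxTokens": max_tokens,
        "temperature": temperature,
        "topP": top_p,
        "stopSequences": [],
        "countPenalty": {"scale": 0},
        "presencePenalty": {"scale": 0},
        "frequencyPenalty": {"scale": 0},
    })


def parse_j2_text(response_body):
    """Return the first completion's text, or an empty string if none is present."""
    completions = response_body.get("completions") or []
    if not completions:
        return ""
    return completions[0].get("data", {}).get("text", "")
```

Returning an empty string for a missing completion avoids the `TypeError` that the chained `.get(...)[0]` calls would raise on an unexpected response.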

# Cleanup Resources (Don't waste money if you don't have to)

## Cleanup SageMaker Inference Endpoints

In [7]:
import boto3

def list_endpoints(sagemaker_client):
    # List all active endpoints
    response = sagemaker_client.list_endpoints(SortBy='CreationTime', SortOrder='Descending')
    endpoints = response['Endpoints']
    while 'NextToken' in response:
        response = sagemaker_client.list_endpoints(NextToken=response['NextToken'], SortBy='CreationTime', SortOrder='Descending')
        endpoints.extend(response['Endpoints'])
    return endpoints

def delete_selected_endpoints(sagemaker_client, selected_endpoint_names):
    for endpoint_name in selected_endpoint_names:
        sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
        print(f"Deleted endpoint: {endpoint_name}")

def main():
    sagemaker_client = boto3.client('sagemaker')
    endpoints = list_endpoints(sagemaker_client)

    # Print endpoint names
    print("Active Endpoints:")
    for i, endpoint in enumerate(endpoints):
        print(f"{i+1}. {endpoint['EndpointName']}")

    # User input to select endpoints
    selected_indices = input("Enter the endpoint numbers to delete (comma-separated): ")
    selected_indices = [int(x.strip()) for x in selected_indices.split(',')]

    # Get endpoint names based on indices
    selected_endpoint_names = [endpoints[i-1]['EndpointName'] for i in selected_indices]

    # Delete selected endpoints
    delete_selected_endpoints(sagemaker_client, selected_endpoint_names)

if __name__ == "__main__":
    main()


Active Endpoints:
1. huggingface-pytorch-tgi-inference-2024-01-15-03-52-46-512
2. huggingface-pytorch-tgi-inference-2024-01-15-01-25-42-790
3. huggingface-pytorch-tgi-inference-2024-01-15-01-15-53-997
4. huggingface-pytorch-tgi-inference-2024-01-15-00-32-07-025
5. hf-llm-mistral-7b-2024-01-12-22-18-38-063
6. opensearch-gen-ai-llm-falcon-7b-bf16-acf36e80
7. opensearch-gen-ai-embedding-gpt-j-6b-acf36e80
8. RagEnginesSageMakerModelMultiAB24AEndpoint6DA7D681-7QzzdmVCz76E


Enter the endpoint numbers to delete (comma-separated):  3,4


Deleted endpoint: huggingface-pytorch-tgi-inference-2024-01-15-01-15-53-997
Deleted endpoint: huggingface-pytorch-tgi-inference-2024-01-15-00-32-07-025
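
Deleting an endpoint leaves its endpoint configuration and the models that configuration references in the account. A sketch that removes all three is shown below. `delete_endpoint_and_dependencies` is a hypothetical helper built on the SageMaker client's `describe_endpoint`, `describe_endpoint_config`, `delete_endpoint`, `delete_endpoint_config`, and `delete_model` calls, and it assumes no other endpoint shares the same configuration or models.

```python
def delete_endpoint_and_dependencies(sagemaker_client, endpoint_name):
    """Delete an endpoint plus its endpoint config and the models that config references."""
    # Look up the dependencies before anything is deleted.
    endpoint = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sagemaker_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config.get("ProductionVariants", [])
        if variant.get("ModelName")
    ]

    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sagemaker_client.delete_model(ModelName=model_name)
    return config_name, model_names
```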


## Cleanup OpenSearch Domains

In [None]:
import boto3

def list_domains(opensearch_client):
    response = opensearch_client.list_domain_names()
    return [domain['DomainName'] for domain in response['DomainNames']]

def delete_domain(opensearch_client, domain_name):
    opensearch_client.delete_domain(DomainName=domain_name)
    print(f"Deleted domain: {domain_name}")

def main():
    opensearch_client = boto3.client('opensearch')

    # List and select domain
    domains = list_domains(opensearch_client)
    print("Available Domains:")
    for i, domain in enumerate(domains):
        print(f"{i+1}. {domain}")

    selected_domain_indices = input("Enter the domain numbers to delete (comma-separated): ")
    selected_domain_indices = [int(x.strip()) for x in selected_domain_indices.split(',')]

    # Delete selected domains
    for index in selected_domain_indices:
        delete_domain(opensearch_client, domains[index-1])

if __name__ == "__main__":
    main()
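
Both cleanup cells turn a comma-separated string into list positions without validation. Entering `0` wraps around to the last item (index `-1`) and deletes the wrong resource, while a number past the end raises an `IndexError`. A sketch of a stricter parser follows; `parse_selection` is a hypothetical helper.

```python
def parse_selection(raw, item_count):
    """Convert '1, 3' style input into zero-based positions, rejecting out-of-range numbers."""
    positions = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas and surrounding spaces
        number = int(token)  # raises ValueError for non-numeric input
        if not 1 <= number <= item_count:
            raise ValueError(f"{number} is not between 1 and {item_count}")
        if number - 1 not in positions:
            positions.append(number - 1)
    return positions
```

In the cells above this would replace the list comprehension over `split(',')`, with `item_count` set to `len(endpoints)` or `len(domains)`.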
