# buKnowledge Bases for Amazon Bedrock
## Access Control Filtering - End to end notebook

This notebook will guide the users on creating access controls for Knowledge Bases on Amazon Bedrock.

To demonstrate the access control capabilities enabled by metadata filtering in Knowledge Bases, let's consider a use case where you work at a large enterprise, AcmeCorp. At AcmeCorp we want to create a Knowledge Base containing content from various s3 buckets.  However, each user does not have access to all data. A RAG architecture is perfect for this use case since we can restrict the retrieval to only the documents we have access to.  

To complete this notebook you should have a role with access to the following services: Amazon S3, AWS STS, AWS Lambda, AWS CloudFormation, Amazon Bedrock, Amazon Cognito and Amazon Opensearch Serverless. 

This notebook contains the following sections:

0. **Base Infrastructure Deployment**: In this section you will deploy an Amazon Cloudformation Template which will create and configure some of the services used for the solution. 
1. **Amazon Cognito:** You are going to populate an Amazon Cognito pool with three users. We will use the unique identifiers generated by Cognito for each user to associate document corpus with the respective users.
2. **User-corpus association in Amazon DynamoDB:** You will populate an Amazon DynamoDB table which will store user-corpus associations. 
3. **Dataset download:** For this notebook you will use documents provided in an s3 bucket and stored in 3 different folders.
4. **Metadata association:** You will use the user identifiers generated by Cognito to create metadata files associated to each corpus.
5. **Create OpensearchServerless** You will create an OpenSearch Serverless collection to be used by Amazon Knowledge Base.
6. **Create a Knowledge Base for Amazon Bedrock**: You will create and sync the Knowledge Base with the documents and associated metadata.
7. **Update AWS Lambda:** Until Boto3/Lambda is updated -- Create a Lambda Layer to include the latest SDK.
8. **Create and run a Streamlit Application:** You will create a simple interface to showcase access control with metadata filtering using a Streamlit application
9. **Clean up:** Delete all the resources created during this notebook to avoid unnecessary costs. 

In [None]:
!pip install -qU opensearch-py streamlit streamlit-cognito-auth retrying boto3 botocore

Let's import necessary Python modules and libraries, and initialize AWS service clients required for the notebook.

In [209]:
import os
import json
import time
import uuid
import boto3
import requests
import random
from utilsmod import create_base_infrastructure, create_kb_infrastructure, updateDataAccessPolicy, createAOSSIndex, replace_vars
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from botocore.exceptions import ClientError


s3_client = boto3.client('s3')
sts_client = boto3.client('sts')
session = boto3.session.Session()
region = session.region_name
lambda_client = boto3.client('lambda')
dynamodb_resource = boto3.resource('dynamodb')
cloudformation = boto3.client('cloudformation')
opensearch = boto3.client('opensearchserverless')
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock = boto3.client("bedrock",region_name=region)
account_id = sts_client.get_caller_identity()["Account"]
cognito_client = boto3.client('cognito-idp', region_name=region)
identity_arn = session.client('sts').get_caller_identity()['Arn']
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')
bucket_name = 'namer' + account_id + '-bucket'

### 0. Base Infrastructure 
The following has already been created for you by the workshop: 

- Amazon Cognito User Pool and App Client. (user_pool_id, cognito_arn, client_id, client_secret)
- Amazon DynamoDB Table
- Amazon S3 Bucket
- AWS Lambda Function

<div class="alert alert-block alert-warning">
The deployment of the Amazon Cloudformation template should take around <b>1-2 minutes</b>.
    
You can also follow the deployment status in the Amazon Cloudformation console. 
</div>

In [None]:
#WE NO LONGER NEED THIS
def short_uuid():
    uuid_str = str(uuid.uuid4())
    return uuid_str[:8]

solution_id = 'KBS{}'.format(short_uuid()).lower()
user_pool_id, user_pool_arn, cognito_arn, client_id, client_secret, dynamo_table, s3_bucket, lambda_function_arn, collection_id = create_base_infrastructure(solution_id)

In [None]:
#WE NO LONGER NEED THIS
%store user_pool_id user_pool_arn cognito_arn client_id client_secret dynamo_table s3_bucket lambda_function_arn collection_id solution_id

In [None]:
%store

### 1. Amazon Cognito User Pool: Users and Corpus
#### Create users and corpus into the user pool
We will create users and corpus to test out the use case. User ids are stored for later use when retrieving information.
For the notebook to work you will need to replace the placeholder for 2 doctors and 3 patients. This users will be created in the Amazon Cognito user pool and you will later need them to log into the web application. While this is a dummy user creation for test purposes, in production use cases you will need to follow you organization best practices and guidelines to create users. 

**For this example, the first doctor will have associated the first two patients, and the second doctor will have associated the third patient.** 

<div class="alert alert-block alert-warning">
<b>Warning:</b> 
<br><b>Password minimum length:</b>8 character(s)
<br><b>Password requirements</b>
<br>Contains at least 1 number
<br>Contains at least 1 special character
<br>Contains at least 1 uppercase letter
<br>Contains at least 1 lowercase letter
</div>

In [None]:
users = [
    {
        'name': 'Highway Harry',
        'email': 'highway.harry@acmecorp.com',
        'password': 'Highway.Harry.123$',
        'corpus': ['highway']
    },
    {
        'name': 'Wildlife Walter',
        'email': 'wildlife.walter@acmecorp.com',
        'password': 'Wildlife.Walter.123$',
        'corpus': ['wildlife']
    },
    {
        'name': 'Admin Amy',
        'email': 'admin.amy@acmecorp.com',
        'password': 'Admin.Amy.123$',
        'corpus': ['highway', 'wildlife']
    },
]

corpus = [
    {
        'name': 'highway',
        'description': 'document regarding highway and roadsign regulations',
        's3path': f"s3://{s3_bucket}/highway/"
    },
    {
        'name': 'wildlife',
        'description': 'documents regarding fishing and hunting regulations',
        's3path': f's3://{s3_bucket}/wildlife/'
    },

]

In [None]:

user_ids = []
corpus_ids = []

def create_user(user_data, user_type):
    user_ids = []
    for user in user_data:
        response = cognito_client.admin_create_user(
            UserPoolId=user_pool_id,
            Username=user['email'],
            UserAttributes=[
                {'Name': 'name', 'Value': user['name']},
                {'Name': 'email', 'Value': user['email']},
                {'Name': 'email_verified', 'Value': 'true'}
            ],
            ForceAliasCreation=False,
            MessageAction='SUPPRESS'
        )
        cognito_client.admin_set_user_password(
            UserPoolId=user_pool_id,
            Username=user['email'],
            Password=user['password'],
            Permanent=True
        )
        print(f"{user_type.capitalize()} created:", response['User']['Username'])
        print(f"{user_type.capitalize()} id:", response['User']['Attributes'][3]['Value'])
        user_ids.append(response['User']['Attributes'][3]['Value'])
    return user_ids

user_ids = create_user(users, 'user')
corpus_ids = [str(uuid.uuid4()) for c in corpus]

print("User IDs:", user_ids)
print("Corpus IDs:", corpus_ids)

%store user_ids corpus_ids

### 2. User-corpus association in DynamoDB
In this section we will populate the already created DynamoDB table with the user-corpus associations. This will be useful later on to retrieve the list of corpus ids a user is allowed to filter by. *

In [None]:
table = dynamodb_resource.Table(dynamo_table)
corpus_mapping = [entry['name'] for entry in corpus]
with table.batch_writer() as batch:
    for corpus,user in enumerate(users):
        temp = []
        for corpus_id,corpuses in enumerate(corpus_mapping):
            if corpuses in user['corpus']:
                temp.append(corpus_ids[corpus_id])

        batch.put_item(
            Item={
                'user_id': user_ids[corpus],
                'corpus_id_list': temp
            }
        )

print('Data inserted successfully!')

In [176]:
!aws s3 cp ./source_transcripts/ s3://bucket_name/ --recursive

upload failed: source_transcripts/highway/23 CFR Part 655 (up to date as of 8-21-2024).pdf to s3://bucket_name/highway/23 CFR Part 655 (up to date as of 8-21-2024).pdf An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist
upload failed: source_transcripts/wildlife/50 CFR Part 13 (up to date as of 8-21-2024).pdf to s3://bucket_name/wildlife/50 CFR Part 13 (up to date as of 8-21-2024).pdf An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist
upload failed: source_transcripts/wildlife/Loon - Wikipedia.pdf to s3://bucket_name/wildlife/Loon - Wikipedia.pdf An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist


In [None]:
# Loop through the corpus and their corresponding IDs
for corpuses, corpus_entry in enumerate(corpus):
    corpus_id = corpus_ids[corpuses]
    s3path = corpus_entry['s3path']
    
    # Get bucket and prefix
    # Remove 's3://' and split bucket and prefix
    path_parts = s3path.replace('s3://', '').split('/', 1)
    bucket = path_parts[0]
    prefix = path_parts[1] if len(path_parts) > 1 else ''
    
    # List all files in the S3 folder
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    if 'Contents' in response:
        files = [obj['Key'] for obj in response['Contents'] if obj['Key'] != prefix]
    else:
        files = []
    
    for file in files:
        metadata = {
            "metadataAttributes": {
                "corpus_id": corpus_id
            }
        }

        # Upload metadata file to S3
        s3_client.put_object(
            Bucket=bucket,
            Key=f"{file}.metadata.json",
            Body=json.dumps(metadata, indent=4),
            ContentType='application/json'
        )

### 5. Upload to Amazon S3
Knowledge Bases for Amazon Bedrock, currently require data to reside in an Amazon S3 bucket. We will upload both files and metadata files.

### 6. Create OpensearchServerless

In this section we will create all the policies for the OpenSearch Serverless Collection and then create the collection itself.

First we will create an encryption policy for the collection.

In [188]:
policy = '{{"Rules":[{{"ResourceType": "collection", "Resource":["collection/namer-{0}-kbcollection"]}}], "AWSOwnedKey": true}}'.format(account_id)
print(policy)

results = opensearch.create_security_policy(
    description='Public encryption access namer workshop collection',
    name='namer-' + account_id + '-kbenc',
    policy=policy,
    type='encryption'
)

print(results)

{"Rules":[{"ResourceType": "collection", "Resource":["collection/namer-431615879134-kbcollection"]}], "AWSOwnedKey": true}
{'securityPolicyDetail': {'createdDate': 1725552314705, 'description': 'Public encryption access namer workshop collection', 'lastModifiedDate': 1725552314705, 'name': 'namer-431615879134-kbenc', 'policy': {'Rules': [{'Resource': ['collection/namer-431615879134-kbcollection'], 'ResourceType': 'collection'}], 'AWSOwnedKey': True}, 'policyVersion': 'MTcyNTU1MjMxNDcwNV8x', 'type': 'encryption'}, 'ResponseMetadata': {'RequestId': '08c8dd23-67d8-4889-b740-a393a0e3ed18', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '08c8dd23-67d8-4889-b740-a393a0e3ed18', 'date': 'Thu, 05 Sep 2024 16:05:14 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '375', 'connection': 'keep-alive'}, 'RetryAttempts': 0}}


Now we will create a Network Policy

In [193]:
policy = '''[{{"Rules": [{{"ResourceType": "dashboard", 
    "Resource": ["collection/namer-{0}-kbcollection"]}}, 
    {{"ResourceType": "collection", "Resource": ["collection/namer{0}-kbcollection"]}}], 
    "AllowFromPublic": true}}]'''.format(account_id)
print(policy)
results = opensearch.create_security_policy(
    description='Public network access namer workshop collection',
    name='namer-{0}-kbnet'.format(account_id),
    policy=policy,
    type='network'
)

print(results)

[{"Rules": [{"ResourceType": "dashboard", 
    "Resource": ["collection/namer-431615879134-kbcollection"]}, 
    {"ResourceType": "collection", "Resource": ["collection/namer431615879134-kbcollection"]}], 
    "AllowFromPublic": true}]
{'securityPolicyDetail': {'createdDate': 1725554665546, 'description': 'Public network access namer workshop collection', 'lastModifiedDate': 1725554665546, 'name': 'namer-431615879134-kbnet', 'policy': [{'Rules': [{'Resource': ['collection/namer-431615879134-kbcollection'], 'ResourceType': 'dashboard'}, {'Resource': ['collection/namer431615879134-kbcollection'], 'ResourceType': 'collection'}], 'AllowFromPublic': True}], 'policyVersion': 'MTcyNTU1NDY2NTU0Nl8x', 'type': 'network'}, 'ResponseMetadata': {'RequestId': '3d5037df-ae23-448c-8be9-ab57bb0a1525', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3d5037df-ae23-448c-8be9-ab57bb0a1525', 'date': 'Thu, 05 Sep 2024 16:44:25 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': 

Next we need to create the Data Access Policy

In [196]:
policy = '''[{{"Rules": [{{"Resource": ["collection/namer-{0}-kbcollection"], 
                           "Permission": ["aoss:CreateCollectionItems", "aoss:UpdateCollectionItems", "aoss:DescribeCollectionItems"], 
                           "ResourceType": "collection"}}, 
                          {{"ResourceType": "index", "Resource": ["index/namer-{0}-kbcollection/*"], 
                           "Permission": ["aoss:CreateIndex", "aoss:DescribeIndex", "aoss:ReadDocument", "aoss:WriteDocument", "aoss:UpdateIndex", "aoss:DeleteIndex"]}}], 
                "Principal": ["arn:aws:iam::{0}:role/namer-{0}-kbrole"]}}]'''.format(account_id)
print(policy)
results = opensearch.create_access_policy(
    description='Data access policy for the NAMER summit',
    name='namer-{0}-kbaccess'.format(account_id),
    policy=policy,
    type='data'
)

[{"Rules": [{"Resource": ["collection/namer-431615879134-kbcollection"], 
                           "Permission": ["aoss:CreateCollectionItems", "aoss:UpdateCollectionItems", "aoss:DescribeCollectionItems"], 
                           "ResourceType": "collection"}, 
                          {"ResourceType": "index", "Resource": ["index/namer-431615879134-kbcollection/*"], 
                           "Permission": ["aoss:CreateIndex", "aoss:DescribeIndex", "aoss:ReadDocument", "aoss:WriteDocument", "aoss:UpdateIndex", "aoss:DeleteIndex"]}], 
                "Principal": ["arn:aws:iam::431615879134:role/namer-431615879134-kbrole"]}]


Now that we have our policies we can create the OpenSearch Serverless Collection

In [197]:
results = opensearch.create_collection(
    description='KB AOSS Collection',
    name='namer-{0}-kbcollection'.format(account_id),
    type='VECTORSEARCH'
)

print(results)

Creating the collection takes some time so we will check to see if it has been created yet

In [213]:
response = opensearch.list_collections(collectionFilters={'name':'namer-{0}-kbcollection'.format(account_id)})
collection_id = response["collectionSummaries"][0]["id"]
collection_arn = response["collectionSummaries"][0]["arn"]
while response["collectionSummaries"][0]["status"] != "ACTIVE":
    wait(10)

print("Collection created")


Collection created


### 7. Create a Knowledge Base for Amazon Bedrock

In this section we will go through all the steps to create and test a Knowledge Base. 

In [204]:
indexName = "namer-{0}-kb-acl-index".format(account_id)
print("Index name:",indexName)
%store indexName

Index name: namer-431615879134-kb-acl-index
Stored 'indexName' (str)


In [207]:
# Adding the current role to the collection's data access policy
data_access_policy_name = 'namer-{0}-kbaccess'.format(account_id)
current_role_arn = sts_client.get_caller_identity()['Arn']
response = opensearch.get_access_policy(
    name=data_access_policy_name,
    type='data'
)
policy_version = response["accessPolicyDetail"]["policyVersion"]
existing_policy = response['accessPolicyDetail']['policy']
updated_policy = existing_policy.copy()
updated_policy[0]['Principal'].append(current_role_arn)
updated_policy = str(updated_policy).replace("'", '"')

response = opensearch.update_access_policy(
    description='dataAccessPolicy',
    name=data_access_policy_name,
    policy=updated_policy,
    policyVersion=policy_version,
    type='data'
)
print(response)

time.sleep(60) # Changes to the data access policy might take a bit to update
print("Finished adding the role")

{'accessPolicyDetail': {'createdDate': 1725554790042, 'description': 'dataAccessPolicy', 'lastModifiedDate': 1725557089030, 'name': 'namer-431615879134-kbaccess', 'policy': [{'Rules': [{'Resource': ['collection/namer-431615879134-kbcollection'], 'Permission': ['aoss:CreateCollectionItems', 'aoss:UpdateCollectionItems', 'aoss:DescribeCollectionItems'], 'ResourceType': 'collection'}, {'Resource': ['index/namer-431615879134-kbcollection/*'], 'Permission': ['aoss:CreateIndex', 'aoss:DescribeIndex', 'aoss:ReadDocument', 'aoss:WriteDocument', 'aoss:UpdateIndex', 'aoss:DeleteIndex'], 'ResourceType': 'index'}], 'Principal': ['arn:aws:iam::431615879134:role/namer-431615879134-kbrole', 'arn:aws:sts::431615879134:assumed-role/namer-summit-2024/SageMaker']}], 'policyVersion': 'MTcyNTU1NzA4OTAzMF8y', 'type': 'data'}, 'ResponseMetadata': {'RequestId': '6ffa318b-9783-400b-bf1e-d4854c066b89', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '6ffa318b-9783-400b-bf1e-d4854c066b89', 'date': 'Th

In [210]:
# Set up AWS authentication
service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region, service)

# Define index settings and mappings
index_settings = {
    "settings": {
        "index.knn": "true"
    },
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1024,
                 "method": {
                     "name": "hnsw",
                     "engine": "faiss",
                     "space_type": "innerproduct",
                     "parameters": {
                         "ef_construction": 512,
                         "m": 16
                     },
                 },
             },
            "text": {
                "type": "text"
            },
            "text-metadata": {
                "type": "text"
            }
        }
    }
}

# Build the OpenSearch client
host = f"{collection_id}.{region}.aoss.amazonaws.com"
oss_client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)

# Create index
response = oss_client.indices.create(index=indexName, body=json.dumps(index_settings))
print(response)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'namer-431615879134-kb-acl-index'}


#### Create the Knowledge Base
In this section you will create the Knowledge Base. Before creating a new KB we need to define which embeddings model we want it to use. In this case we will be using Amazon Titan Embeddings V2. 

<div class="alert alert-block alert-warning">
<b>Warning:</b> Make sure you have enabled Amazon Titan Embeddings V2 access in the Amazon Bedrock Console (model access). 
</div>

In [211]:
embeddingModelArn = "arn:aws:bedrock:{}::foundation-model/amazon.titan-embed-text-v2:0".format(region)

Now we can create our Knowledge Base for Amazon Bedrock. We have created an Amazon CloudFormation template which takes care of the configuration needed.

<div class="alert alert-block alert-warning">
The deployment of the Amazon Cloudformation template should take around <b>1-2 minutes</b>.
    
You can also follow the deployment status in the Amazon Cloudformation console. 
</div>

In [None]:
#kb_id, datasource_id = create_kb_infrastructure(solution_id, s3_bucket, embeddingModelArn, indexName, region, account_id, collection_id)

results = bedrock_agent_client.create_knowledge_base(
    description='Test KB Deployment',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': embeddingModelArn
        }
    },
    name='namer-{0}-knowledge-base'.format(account_id),
    roleArn='arn:aws:iam::{0}:role/Namer-{0}-SageMaker-Execution-KBRole'.format(account_id),
    storageConfiguration={
        'opensearchServerlessConfiguration': {
            'collectionArn': collection_arn,
            'fieldMapping': {
                'metadataField': 'text-metadata',
                'textField': 'text',
                'vectorField': 'vector'
            },
            'vectorIndexName': indexName
        },
        'type': 'OPENSEARCH_SERVERLESS'
    }
)

print(results)

In [None]:
%store kb_id datasource_id

In [None]:
results = bedrock_agent_client.create_data_source(
    dataSourceConfiguration={
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::{0}'.format(bucket_name),
        },
        'type': 'S3',
    },
    description='KB Data Source',
    knowledgeBaseId='string',
    name='string',
    serverSideEncryptionConfiguration={
        'kmsKeyArn': 'string'
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 123,
                'overlapPercentage': 123
            },
            'hierarchicalChunkingConfiguration': {
                'levelConfigurations': [
                    {
                        'maxTokens': 123
                    },
                ],
                'overlapTokens': 123
            },
            'semanticChunkingConfiguration': {
                'breakpointPercentileThreshold': 123,
                'bufferSize': 123,
                'maxTokens': 123
            }
        },
        'customTransformationConfiguration': {
            'intermediateStorage': {
                's3Location': {
                    'uri': 'string'
                }
            },
            'transformations': [
                {
                    'stepToApply': 'POST_CHUNKING',
                    'transformationFunction': {
                        'transformationLambdaConfiguration': {
                            'lambdaArn': 'string'
                        }
                    }
                },
            ]
        },
        'parsingConfiguration': {
            'bedrockFoundationModelConfiguration': {
                'modelArn': 'string',
                'parsingPrompt': {
                    'parsingPromptText': 'string'
                }
            },
            'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'
        }
    }
)

#### Sync the Knowledge Base
As we have created and associated the data source to the Knowledge Base, we can proceed to Sync the data. 


Each time you add, modify, or remove files from the S3 bucket for a data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes the objects in your S3 bucket that have been added, modified, or deleted since the last sync.

In [None]:
ingestion_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=datasource_id,
    description='Initial Ingestion'
)

In [None]:
status = bedrock_agent_client.get_ingestion_job(
    knowledgeBaseId=ingestion_job_response["ingestionJob"]["knowledgeBaseId"],
    dataSourceId=ingestion_job_response["ingestionJob"]["dataSourceId"],
    ingestionJobId=ingestion_job_response["ingestionJob"]["ingestionJobId"]
)["ingestionJob"]["status"]
print(status)
while status not in ["COMPLETE", "FAILED", "STOPPED"]:
    status = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId=ingestion_job_response["ingestionJob"]["knowledgeBaseId"],
        dataSourceId=ingestion_job_response["ingestionJob"]["dataSourceId"],
        ingestionJobId=ingestion_job_response["ingestionJob"]["ingestionJobId"]
    )["ingestionJob"]["status"]
    print(status)
    time.sleep(30)
print("Waiting for changes to take place in the vector database")
time.sleep(30) # Wait for all changes to take place

#### Test the Knowledge Base

Now the Knowlegde Base is available we can test it out using the **retrieve** and **retrieve_and_generate** APIs.

Let's examine a test case with patient 0's transcript, where they mention a cat named Kelly. We'll query the knowledge base using the metadata filter for patient 0 to retrieve information about Kelly. Changing the patient_id will prevent the model from responding accurately. Read through the PDFs for other questions you might want to ask. 

In this first example we are going to use the **retrieve and generate API**. This API queries a knowledge base and generates responses based on the retrieved results, using an LLM.

<div class="alert alert-block alert-warning">
<b>Warning:</b> Make sure you have enabled Anthropic Claude 3 Sonnet access in the Amazon Bedrock Console (model access). 
</div>

In [None]:
# retrieve and generate API
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": "Which office do I submit for golden eagle permits?"
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0".format(region),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5,
                    "filter": {
                        "equals": {
                            "key": "corpus_id",
                            "value": corpus_ids[1]
                        }
                    }
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

In this second example we are going to use the **retrieve API**. This API queries the knowledge base and retrieves relavant information from it, it does not generate the response.

In [None]:
response_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id, 
    nextToken='string',
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults":3,
            "filter": {
                 "equals": {
                    "key": "corpus_id",
                    "value": corpus_ids[1]
                        }
                    }
                } 
            },
    retrievalQuery={
        'text': "Which office do I submit for golden eagle permits?"
            
        }
)

def response_print(retrieve_resp):
#structure 'retrievalResults': list of contents
# each list has content,location,score,metadata
    for num,chunk in enumerate(response_ret['retrievalResults'],1):
        print(f'Chunk {num}: ',chunk['content']['text'],end='\n'*2)
        print(f'Chunk {num} Location: ',chunk['location'],end='\n'*2)
        print(f'Chunk {num} Score: ',chunk['score'],end='\n'*2)
        print(f'Chunk {num} Metadata: ',chunk['metadata'],end='\n'*2)

response_print(response_ret)

### 7. Add Lambda Layer
At the time of developing this notebook, the latest Boto3 version available in Lambda with Python 3.12 does not include metadata filtering capabilities. To solve this, we will create and attach an AWS Lambda Layer with the latest Boto3 version.

For this section to run you will need the **zip** package to by installed at the system level.

You can check if zip is installed running the following command: !zip

If it is not installed you will need to install it using the appropriate package manager (apt-get for Debian-based systems or yum for RHEL-based systems for example).

In [None]:
# can we have the lambda layer already attached to the lambda function?

In [None]:
#!zip
!sudo apt-get install zip -y # Debian-based systems 
#!sudo yum install zip -y # RHEL-based systems

In [None]:
!mkdir latest-sdk-layer
%cd latest-sdk-layer
!pip install -qU boto3 botocore -t python/lib/python3.12/site-packages/
!zip -rq latest-sdk-layer.zip .
%cd ..

In [None]:
def publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes):
    with open(zip_file_path, 'rb') as f:
        response = lambda_client.publish_layer_version(
            LayerName=layer_name,
            Description=description,
            Content={
                'ZipFile': f.read(),
            },
            CompatibleRuntimes=compatible_runtimes
        )
    return response['LayerVersionArn']

In [None]:
layer_name = 'latest-sdk-layer'
description = 'Layer with the latest boto3 version.'
zip_file_path = 'latest-sdk-layer/latest-sdk-layer.zip'
compatible_runtimes = ['python3.12']

In [None]:
layer_version_arn = publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes)
print("Layer version ARN:", layer_version_arn)

In [None]:
try:
    # Add the layer to the Lambda function
    lambda_client.update_function_configuration(
        FunctionName=lambda_function_arn,
        Layers=[layer_version_arn]
    )
    print("Layer added to the Lambda function successfully.")

except ClientError as e:
    print(f"Error adding layer to Lambda function: {e.response['Error']['Message']}")
    
except Exception as e:
    print(f"An unexpected error occurred: {e}")

### 8. Create Streamlit Application
To showcase the interaction between doctors and the Knowledge Bases, we can develop a user-friendly web application using Streamlit for testing purposes, a popular open-source Python library for building interactive data apps. Streamlit provides a simple and intuitive way to create custom interfaces that can seamlessly integrate with the various AWS services involved in this solution.

Here is the application, **don't modify the placeholders, we will replace them in the next cell.** 

In [None]:
%%writefile app.py
import os
import boto3
import json
import requests
import streamlit as st
from streamlit_cognito_auth import CognitoAuthenticator

pool_id = "<<replace_pool_id>>"
app_client_id = "<<replace_app_client_id>>"
app_client_secret = "<<replace_app_client_secret>>"
kb_id = "<<replace_kb_id>>"
lambda_function_arn = '<<replace_lambda_function_arn>>'
dynamo_table = '<<replace_dynamo_table_name>>'

authenticator = CognitoAuthenticator(
    pool_id=pool_id,
    app_client_id=app_client_id,
    app_client_secret= app_client_secret,
    use_cookies=False
)

is_logged_in = authenticator.login()

if not is_logged_in:
    st.stop()

def logout():
    authenticator.logout()

def get_user_sub(user_pool_id, username):
    cognito_client = boto3.client('cognito-idp')
    try:
        response = cognito_client.admin_get_user(
            UserPoolId=pool_id,
            Username=authenticator.get_username()
        )
        sub = None
        for attr in response['UserAttributes']:
            if attr['Name'] == 'sub':
                sub = attr['Value']
                break
        return sub
    except cognito_client.exceptions.UserNotFoundException:
        print("User not found.")
        return None

def get_corpus_ids(user_id):
    dynamodb = boto3.client('dynamodb')
    response = dynamodb.query(
        TableName=dynamo_table,
        KeyConditionExpression='user_id = :user_id',
        ExpressionAttributeValues={
            ':user_id': {'S': user_id}
        }
    )
    print(response)
    corpus_id_list = []  # Initialize the list
    for item in response['Items']:
        corpus_ids = item.get('corpus_id_list', {}).get('L', [])
        corpus_id_list.extend([corpus_id['S'] for corpus_id in corpus_ids])
    return corpus_id_list

def search_transcript(user_id, kb_id, text, corpus_ids):
    # Initialize the Lambda client
    lambda_client = boto3.client('lambda')

    # Payload for the Lambda function
    payload = json.dumps({
        "userId": sub,
        "knowledgeBaseId": kb_id,
        "text": text, 
        "corpusIds": corpus_ids
    }).encode('utf-8')

    try:
        # Invoke the Lambda function
        response = lambda_client.invoke(
            FunctionName=lambda_function_arn,
            InvocationType='RequestResponse',
            Payload=payload
        )

        # Process the response
        if response['StatusCode'] == 200:
            response_payload = json.loads(response['Payload'].read().decode('utf-8'))
            return response_payload
        else:
            # Handle error response
            return {'error': 'Failed to fetch data'}

    except Exception as e:
        # Handle exception
        return {'error': str(e)}

sub = get_user_sub(pool_id, authenticator.get_username())
print(sub)
corpus_ids = get_corpus_ids(sub)
print(corpus_ids)

# Application Front

with st.sidebar:
    st.header("User Information")
    st.markdown("## User")
    st.text(authenticator.get_username())
    st.markdown("## User Id")
    st.text(sub)
    # selected_patient = st.selectbox("Select a patient (or 'All' for all patients)", ['All'] + patient_ids)
    st.button("Logout", "logout_btn", on_click=logout)

st.header("Corpus Search Tool")

# Text input for the search query
query = st.text_input("Enter your search query:")

if st.button("Search"):
    if query:
        # Perform search
        corpus_ids_filter = corpus_ids
        results = search_transcript(sub, kb_id, query, corpus_ids_filter)
        print(results)
        if results:
            st.subheader("Search Results:")
            st.markdown(results["body"], unsafe_allow_html=True)
        else:
            st.write("No matching results found in corpus.")
    else:
        st.write("Please enter a search query.")

In [None]:
replace_vars("app.py", user_pool_id, client_id, client_secret, kb_id, lambda_function_arn, dynamo_table)

#### Execute the streamlit locally
Execute the cell below to run the Streamlit application.

**Use the email and password of the doctors you defined at the top of the notebook to access the application.**

Once you have logged in, you can filter by specific patients you have assigned (dropdown in the left panel), or all to query the knowledge base. 

In [None]:
!streamlit run app.py

If you are executing this notebook on SageMaker Studio you can access the Streamlit application in the following url. 

```
https://<<STUDIOID>>.studio.<<REGION>>.sagemaker.aws/jupyterlab/default/proxy/8501/
```

If you are executing this notebook on a SageMaker Notebook you can access the Streamlit application in the following url. 

```
https://<<NOTEBOOKID>>.notebook.<<REGION>>.sagemaker.aws/proxy/8501/
```

In [None]:
https://rxhmevhu84g12we.studio.us-west-2.sagemaker.aws/jupyterlab/default/proxy/8501/

### 9. Clean up
**Before running this cell you will need to stop the cell above where the app is runnning!**

Run the following cell to delete the created resources and avoid unnecesary costs. This should take about 2-3 minutes to complete. 

In [None]:
# Delete all objects in the bucket
try:
    response = s3_client.list_objects_v2(Bucket=s3_bucket)
    if 'Contents' in response:
        for obj in response['Contents']:
            s3_client.delete_object(Bucket=s3_bucket, Key=obj['Key'])
        print(f"All objects in {s3_bucket} have been deleted.")
except Exception as e:
    print(f"Error deleting objects from {s3_bucket}: {e}")

# Define the stack names to delete
stack_names = ["KB-E2E-KB-{}".format(solution_id),"KB-E2E-Base-{}".format(solution_id)]

# Iterate over the stack names and delete each stack
for stack_name in stack_names:
    try:
        # Retrieve the stack information
        stack_info = cloudformation.describe_stacks(StackName=stack_name)
        stack_status = stack_info['Stacks'][0]['StackStatus']

        # Check if the stack exists and is in a deletable state
        if stack_status != 'DELETE_COMPLETE':
            # Delete the stack
            cloudformation.delete_stack(StackName=stack_name)
            print(f'Deleting stack: {stack_name}')

            # Wait for the stack deletion to complete
            waiter = cloudformation.get_waiter('stack_delete_complete')
            waiter.wait(StackName=stack_name)
            print(f'Stack {stack_name} deleted successfully.')
        else:
            print(f'Stack {stack_name} does not exist or has already been deleted.')

    except cloudformation.exceptions.ClientError as e:
        print(f'Error deleting stack {stack_name}: {e.response["Error"]["Message"]}')

In [None]:
results = opensearch.create_security_policy(
    description='Public network access namer workshop collection',
    name='namer-' + account_id + '-kbnet',
    policy='''"Rules": [{["ResourceType": "dashboard", 
    "Resource": ["collection/namer-{0}-kbcollection"]]}, 
    {{"ResourceType": "collection", "Resource": ["collection/namer{0}-kbcollection"]}}], 
    "AllowFromPublic": true}}'''.format(account_id),
    type='network'
)

print(results)

Next we need to create the Data Access Policy

In [None]:
results = opensearch.create_access_policy(
    description='Data access policy for the NAMER summit',
    name='namer-' + account_id + '-kbaccess',
    policy='''[{"Rules": [{"Resource": ["collection/namer-{0}-kbcollection"], 
                           "Permission": ["aoss:CreateCollectionItems", "aoss:UpdateCollectionItems", "aoss:DescribeCollectionItems"], 
                           "ResourceType": "collection"}, 
                          {"ResourceType": "index", "Resource": ["index/namer-{0}-kbcollection/*"], 
                           "Permission": ["aoss:CreateIndex", "aoss:DescribeIndex", "aoss:ReadDocument", "aoss:WriteDocument", "aoss:UpdateIndex", "aoss:DeleteIndex"]}], 
                "Principal": ["arn:aws:iam::' + account_id + ':role/namer-{0}-kbrole"]}]'''.format(account_id),
    type='data'
)

Now that we have our policies we can create the OpenSearch Serverless Collection

In [None]:
results = opensearch.create_collection(
    description='KB AOSS Collection',
    name='namer-{0}-kbcollection'.format(account_id),
    type='VECTORSEARCH'
)

print(results)

Creating the collection takes some time so we will check to see if it has been created yet

In [None]:
response = opensearch.list_collections(collectionFilters={'name':'namer-{0}-kbcollection'.format(account_id)})
collection_id = response["collectionSummaries"][0]["id"]
while response["collectionSummaries"][0]["status"] != "ACTIVE":
    wait(10)

print("Collection created")

### 7. Create a Knowledge Base for Amazon Bedrock

In this section we will go through all the steps to create and test a Knowledge Base. 

In [None]:
indexName = "kb-acl-index-" + solution_id
print("Index name:",indexName)
%store indexName

In [None]:
# Adding the current role to the collection's data access policy
data_access_policy_name = 'namer-{0}-kbaccess'.format(account_id)
current_role_arn = sts_client.get_caller_identity()['Arn']
response = opensearch.get_access_policy(
    name=data_access_policy_name,
    type='data'
)
policy_version = response["accessPolicyDetail"]["policyVersion"]
existing_policy = response['accessPolicyDetail']['policy']
updated_policy = existing_policy.copy()
updated_policy[0]['Principal'].append(current_role_arn)
updated_policy = str(updated_policy).replace("'", '"')

response = opensearch.update_access_policy(
    description='dataAccessPolicy',
    name=data_access_policy_name,
    policy=updated_policy,
    policyVersion=policy_version,
    type='data'
)
print(response)

time.sleep(60) # Changes to the data access policy might take a bit to update
print("Finished adding the role")

In [None]:
# Set up AWS authentication
service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region, service)

# Define index settings and mappings
index_settings = {
    "settings": {
        "index.knn": "true"
    },
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1024,
                 "method": {
                     "name": "hnsw",
                     "engine": "faiss",
                     "space_type": "innerproduct",
                     "parameters": {
                         "ef_construction": 512,
                         "m": 16
                     },
                 },
             },
            "text": {
                "type": "text"
            },
            "text-metadata": {
                "type": "text"
            }
        }
    }
}

# Build the OpenSearch client
host = f"{collection_id}.{region}.aoss.amazonaws.com"
oss_client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)

# Create index
response = oss_client.indices.create(index=indexName, body=json.dumps(index_settings))
print(response)

#### Create the Knowledge Base
In this section you will create the Knowledge Base. Before creating a new KB we need to define which embeddings model we want it to use. In this case we will be using Amazon Titan Embeddings V2. 

<div class="alert alert-block alert-warning">
<b>Warning:</b> Make sure you have enabled Amazon Titan Embeddings V2 access in the Amazon Bedrock Console (model access). 
</div>

In [None]:
embeddingModelArn = "arn:aws:bedrock:{}::foundation-model/amazon.titan-embed-text-v2:0".format(region)

Now we can create our Knowledge Base for Amazon Bedrock. We have created an Amazon CloudFormation template which takes care of the configuration needed.

<div class="alert alert-block alert-warning">
The deployment of the Amazon Cloudformation template should take around <b>1-2 minutes</b>.
    
You can also follow the deployment status in the Amazon Cloudformation console. 
</div>

In [None]:
kb_id, datasource_id = create_kb_infrastructure(solution_id, s3_bucket, embeddingModelArn, indexName, region, account_id, collection_id)

In [None]:
%store kb_id datasource_id

#### Sync the Knowledge Base
As we have created and associated the data source to the Knowledge Base, we can proceed to Sync the data. 


Each time you add, modify, or remove files from the S3 bucket for a data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes the objects in your S3 bucket that have been added, modified, or deleted since the last sync.

In [None]:
ingestion_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=datasource_id,
    description='Initial Ingestion'
)

In [None]:
status = bedrock_agent_client.get_ingestion_job(
    knowledgeBaseId=ingestion_job_response["ingestionJob"]["knowledgeBaseId"],
    dataSourceId=ingestion_job_response["ingestionJob"]["dataSourceId"],
    ingestionJobId=ingestion_job_response["ingestionJob"]["ingestionJobId"]
)["ingestionJob"]["status"]
print(status)
while status not in ["COMPLETE", "FAILED", "STOPPED"]:
    status = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId=ingestion_job_response["ingestionJob"]["knowledgeBaseId"],
        dataSourceId=ingestion_job_response["ingestionJob"]["dataSourceId"],
        ingestionJobId=ingestion_job_response["ingestionJob"]["ingestionJobId"]
    )["ingestionJob"]["status"]
    print(status)
    time.sleep(30)
print("Waiting for changes to take place in the vector database")
time.sleep(30) # Wait for all changes to take place

#### Test the Knowledge Base

Now the Knowlegde Base is available we can test it out using the **retrieve** and **retrieve_and_generate** APIs.

Let's examine a test case with patient 0's transcript, where they mention a cat named Kelly. We'll query the knowledge base using the metadata filter for patient 0 to retrieve information about Kelly. Changing the patient_id will prevent the model from responding accurately. Read through the PDFs for other questions you might want to ask. 

In this first example we are going to use the **retrieve and generate API**. This API queries a knowledge base and generates responses based on the retrieved results, using an LLM.

<div class="alert alert-block alert-warning">
<b>Warning:</b> Make sure you have enabled Anthropic Claude 3 Sonnet access in the Amazon Bedrock Console (model access). 
</div>

In [None]:
# retrieve and generate API
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": "Which office do I submit for golden eagle permits?"
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0".format(region),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5,
                    "filter": {
                        "equals": {
                            "key": "corpus_id",
                            "value": corpus_ids[1]
                        }
                    }
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

In this second example we are going to use the **retrieve API**. This API queries the knowledge base and retrieves relavant information from it, it does not generate the response.

In [None]:
response_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id, 
    nextToken='string',
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults":3,
            "filter": {
                 "equals": {
                    "key": "corpus_id",
                    "value": corpus_ids[1]
                        }
                    }
                } 
            },
    retrievalQuery={
        'text': "Which office do I submit for golden eagle permits?"
            
        }
)

def response_print(retrieve_resp):
#structure 'retrievalResults': list of contents
# each list has content,location,score,metadata
    for num,chunk in enumerate(response_ret['retrievalResults'],1):
        print(f'Chunk {num}: ',chunk['content']['text'],end='\n'*2)
        print(f'Chunk {num} Location: ',chunk['location'],end='\n'*2)
        print(f'Chunk {num} Score: ',chunk['score'],end='\n'*2)
        print(f'Chunk {num} Metadata: ',chunk['metadata'],end='\n'*2)

response_print(response_ret)

### 7. Add Lambda Layer
At the time of developing this notebook, the latest Boto3 version available in Lambda with Python 3.12 does not include metadata filtering capabilities. To solve this, we will create and attach an AWS Lambda Layer with the latest Boto3 version.

For this section to run you will need the **zip** package to by installed at the system level.

You can check if zip is installed running the following command: !zip

If it is not installed you will need to install it using the appropriate package manager (apt-get for Debian-based systems or yum for RHEL-based systems for example).

In [None]:
# can we have the lambda layer already attached to the lambda function?

In [None]:
#!zip
!sudo apt-get install zip -y # Debian-based systems 
#!sudo yum install zip -y # RHEL-based systems

In [None]:
!mkdir latest-sdk-layer
%cd latest-sdk-layer
!pip install -qU boto3 botocore -t python/lib/python3.12/site-packages/
!zip -rq latest-sdk-layer.zip .
%cd ..

In [None]:
def publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes):
    with open(zip_file_path, 'rb') as f:
        response = lambda_client.publish_layer_version(
            LayerName=layer_name,
            Description=description,
            Content={
                'ZipFile': f.read(),
            },
            CompatibleRuntimes=compatible_runtimes
        )
    return response['LayerVersionArn']

In [None]:
layer_name = 'latest-sdk-layer'
description = 'Layer with the latest boto3 version.'
zip_file_path = 'latest-sdk-layer/latest-sdk-layer.zip'
compatible_runtimes = ['python3.12']

In [None]:
layer_version_arn = publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes)
print("Layer version ARN:", layer_version_arn)

In [None]:
try:
    # Add the layer to the Lambda function
    lambda_client.update_function_configuration(
        FunctionName=lambda_function_arn,
        Layers=[layer_version_arn]
    )
    print("Layer added to the Lambda function successfully.")

except ClientError as e:
    print(f"Error adding layer to Lambda function: {e.response['Error']['Message']}")
    
except Exception as e:
    print(f"An unexpected error occurred: {e}")

### 8. Create Streamlit Application
To showcase the interaction between doctors and the Knowledge Bases, we can develop a user-friendly web application using Streamlit for testing purposes, a popular open-source Python library for building interactive data apps. Streamlit provides a simple and intuitive way to create custom interfaces that can seamlessly integrate with the various AWS services involved in this solution.

Here is the application, **don't modify the placeholders, we will replace them in the next cell.** 

In [None]:
%%writefile app.py
import os
import boto3
import json
import requests
import streamlit as st
from streamlit_cognito_auth import CognitoAuthenticator

pool_id = "<<replace_pool_id>>"
app_client_id = "<<replace_app_client_id>>"
app_client_secret = "<<replace_app_client_secret>>"
kb_id = "<<replace_kb_id>>"
lambda_function_arn = '<<replace_lambda_function_arn>>'
dynamo_table = '<<replace_dynamo_table_name>>'

authenticator = CognitoAuthenticator(
    pool_id=pool_id,
    app_client_id=app_client_id,
    app_client_secret= app_client_secret,
    use_cookies=False
)

is_logged_in = authenticator.login()

if not is_logged_in:
    st.stop()

def logout():
    authenticator.logout()

def get_user_sub(user_pool_id, username):
    cognito_client = boto3.client('cognito-idp')
    try:
        response = cognito_client.admin_get_user(
            UserPoolId=pool_id,
            Username=authenticator.get_username()
        )
        sub = None
        for attr in response['UserAttributes']:
            if attr['Name'] == 'sub':
                sub = attr['Value']
                break
        return sub
    except cognito_client.exceptions.UserNotFoundException:
        print("User not found.")
        return None

def get_corpus_ids(user_id):
    dynamodb = boto3.client('dynamodb')
    response = dynamodb.query(
        TableName=dynamo_table,
        KeyConditionExpression='user_id = :user_id',
        ExpressionAttributeValues={
            ':user_id': {'S': user_id}
        }
    )
    print(response)
    corpus_id_list = []  # Initialize the list
    for item in response['Items']:
        corpus_ids = item.get('corpus_id_list', {}).get('L', [])
        corpus_id_list.extend([corpus_id['S'] for corpus_id in corpus_ids])
    return corpus_id_list

def search_transcript(user_id, kb_id, text, corpus_ids):
    # Initialize the Lambda client
    lambda_client = boto3.client('lambda')

    # Payload for the Lambda function
    payload = json.dumps({
        "userId": sub,
        "knowledgeBaseId": kb_id,
        "text": text, 
        "corpusIds": corpus_ids
    }).encode('utf-8')

    try:
        # Invoke the Lambda function
        response = lambda_client.invoke(
            FunctionName=lambda_function_arn,
            InvocationType='RequestResponse',
            Payload=payload
        )

        # Process the response
        if response['StatusCode'] == 200:
            response_payload = json.loads(response['Payload'].read().decode('utf-8'))
            return response_payload
        else:
            # Handle error response
            return {'error': 'Failed to fetch data'}

    except Exception as e:
        # Handle exception
        return {'error': str(e)}

sub = get_user_sub(pool_id, authenticator.get_username())
print(sub)
corpus_ids = get_corpus_ids(sub)
print(corpus_ids)

# Application Front

with st.sidebar:
    st.header("User Information")
    st.markdown("## User")
    st.text(authenticator.get_username())
    st.markdown("## User Id")
    st.text(sub)
    # selected_patient = st.selectbox("Select a patient (or 'All' for all patients)", ['All'] + patient_ids)
    st.button("Logout", "logout_btn", on_click=logout)

st.header("Corpus Search Tool")

# Text input for the search query
query = st.text_input("Enter your search query:")

if st.button("Search"):
    if query:
        # Perform search
        corpus_ids_filter = corpus_ids
        results = search_transcript(sub, kb_id, query, corpus_ids_filter)
        print(results)
        if results:
            st.subheader("Search Results:")
            st.markdown(results["body"], unsafe_allow_html=True)
        else:
            st.write("No matching results found in corpus.")
    else:
        st.write("Please enter a search query.")

In [None]:
replace_vars("app.py", user_pool_id, client_id, client_secret, kb_id, lambda_function_arn, dynamo_table)

#### Execute the streamlit locally
Execute the cell below to run the Streamlit application.

**Use the email and password of the doctors you defined at the top of the notebook to access the application.**

Once you have logged in, you can filter by specific patients you have assigned (dropdown in the left panel), or all to query the knowledge base. 

In [None]:
!streamlit run app.py

If you are executing this notebook on SageMaker Studio you can access the Streamlit application in the following url. 

```
https://<<STUDIOID>>.studio.<<REGION>>.sagemaker.aws/jupyterlab/default/proxy/8501/
```

If you are executing this notebook on a SageMaker Notebook you can access the Streamlit application in the following url. 

```
https://<<NOTEBOOKID>>.notebook.<<REGION>>.sagemaker.aws/proxy/8501/
```

In [None]:
https://rxhmevhu84g12we.studio.us-west-2.sagemaker.aws/jupyterlab/default/proxy/8501/

### 9. Clean up
**Before running this cell you will need to stop the cell above where the app is runnning!**

Run the following cell to delete the created resources and avoid unnecesary costs. This should take about 2-3 minutes to complete. 

In [None]:
# Delete all objects in the bucket
try:
    response = s3_client.list_objects_v2(Bucket=s3_bucket)
    if 'Contents' in response:
        for obj in response['Contents']:
            s3_client.delete_object(Bucket=s3_bucket, Key=obj['Key'])
        print(f"All objects in {s3_bucket} have been deleted.")
except Exception as e:
    print(f"Error deleting objects from {s3_bucket}: {e}")

# Define the stack names to delete
stack_names = ["KB-E2E-KB-{}".format(solution_id),"KB-E2E-Base-{}".format(solution_id)]

# Iterate over the stack names and delete each stack
for stack_name in stack_names:
    try:
        # Retrieve the stack information
        stack_info = cloudformation.describe_stacks(StackName=stack_name)
        stack_status = stack_info['Stacks'][0]['StackStatus']

        # Check if the stack exists and is in a deletable state
        if stack_status != 'DELETE_COMPLETE':
            # Delete the stack
            cloudformation.delete_stack(StackName=stack_name)
            print(f'Deleting stack: {stack_name}')

            # Wait for the stack deletion to complete
            waiter = cloudformation.get_waiter('stack_delete_complete')
            waiter.wait(StackName=stack_name)
            print(f'Stack {stack_name} deleted successfully.')
        else:
            print(f'Stack {stack_name} does not exist or has already been deleted.')

    except cloudformation.exceptions.ClientError as e:
        print(f'Error deleting stack {stack_name}: {e.response["Error"]["Message"]}')