# Knowledge Bases for Amazon Bedrock
## Access Control Filtering - End to end notebook

This notebook guides you through creating access controls for Knowledge Bases for Amazon Bedrock using metadata filtering.

To demonstrate the access control capabilities enabled by metadata filtering in Knowledge Bases, consider a use case where you work at a large enterprise, AcmeCorp. AcmeCorp wants to create a Knowledge Base containing content from various S3 buckets, but not every user has access to all of the data. A RAG architecture fits this use case because retrieval can be restricted to only the documents a given user is allowed to access.

To complete this notebook you should have a role with access to the following services: Amazon S3, AWS STS, AWS Lambda, AWS CloudFormation, Amazon Bedrock, Amazon Cognito, Amazon DynamoDB and Amazon OpenSearch Serverless.

This notebook contains the following sections:

0. **Base Infrastructure Deployment**: In this section you will deploy an AWS CloudFormation template which will create and configure some of the services used for the solution.
1. **Amazon Cognito:** You will populate an Amazon Cognito user pool with three users. We will use the unique identifiers generated by Cognito for each user to associate document corpora with the respective users.
2. **User-corpus association in Amazon DynamoDB:** You will populate an Amazon DynamoDB table which will store user-corpus associations.
3. **Dataset:** For this notebook you will use documents provided in the local `source_transcripts` directory, organized in two folders (`highway` and `wildlife`).
4. **Metadata association:** You will create a metadata file for each document that tags it with the identifier of its corpus.
5. **Upload to Amazon S3:** The documents and their metadata files are stored in the Amazon S3 bucket used as the data source.
6. **Create a Knowledge Base for Amazon Bedrock**: You will create and sync the Knowledge Base with the documents and associated metadata.
7. **Add Lambda Layer:** At the time of writing, the Boto3 version bundled with AWS Lambda does not include metadata filtering, so you will create a Lambda Layer that includes the latest SDK.
8. **Create and run a Streamlit Application:** You will create a simple interface to showcase access control with metadata filtering using a Streamlit application.
9. **Clean up:** Delete all the resources created during this notebook to avoid unnecessary costs.

In [1]:
!pip install -qU opensearch-py streamlit streamlit-cognito-auth retrying boto3 botocore

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiobotocore 2.13.1 requires botocore<1.34.132,>=1.34.70, but you have botocore 1.35.9 which is incompatible.
amazon-sagemaker-sql-magic 0.1.3 requires sqlparse==0.5.0, but you have sqlparse 0.5.1 which is incompatible.
autogluon-common 0.8.3 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-core 0.8.3 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-core 0.8.3 requires scikit-learn<1.4.1,>=1.1, but you have scikit-learn 1.4.2 which is incompatible.
autogluon-features 0.8.3 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-features 0.8.3 requires scikit-learn<1.4.1,>=1.1, but you have scikit-learn 1.4.2 which is incompatible.
autogluon-multimodal 0.8.3 requires pandas<1.6,>=1.4.1, but you have pan

Let's import necessary Python modules and libraries, and initialize AWS service clients required for the notebook.

In [1]:
import os
import json
import time
import uuid
import boto3
import requests
import random
from utilsmod import create_base_infrastructure, create_kb_infrastructure, updateDataAccessPolicy, createAOSSIndex, replace_vars
from botocore.exceptions import ClientError


s3_client = boto3.client('s3')
sts_client = boto3.client('sts')
session = boto3.session.Session()
region = session.region_name
lambda_client = boto3.client('lambda')
dynamodb_resource = boto3.resource('dynamodb')
cloudformation = boto3.client('cloudformation')
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock = boto3.client("bedrock",region_name=region)
account_id = sts_client.get_caller_identity()["Account"]
cognito_client = boto3.client('cognito-idp', region_name=region)
identity_arn = session.client('sts').get_caller_identity()['Arn']
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

### 0. Base Infrastructure Deployment 
We have created an AWS CloudFormation template for you which will automatically set up some of the services needed for this notebook.

This template will automatically create:
- Amazon Cognito User Pool and App Client. (user_pool_id, cognito_arn, client_id, client_secret)
- Amazon DynamoDB Table
- Amazon S3 Bucket
- AWS Lambda Function
- Amazon OpenSearch Serverless collection (collection_id)

<div class="alert alert-block alert-warning">
The deployment of the AWS CloudFormation template should take around <b>1-2 minutes</b>.
    
You can also follow the deployment status in the AWS CloudFormation console. 
</div>

In [2]:
def short_uuid():
    uuid_str = str(uuid.uuid4())
    return uuid_str[:8]

solution_id = 'KBS{}'.format(short_uuid()).lower()
user_pool_id, user_pool_arn, cognito_arn, client_id, client_secret, dynamo_table, s3_bucket, lambda_function_arn, collection_id = create_base_infrastructure(solution_id)

Creating stack KB-E2E-Base-kbse1d68b57 (arn:aws:cloudformation:us-west-2:431615879134:stack/KB-E2E-Base-kbse1d68b57/e1409800-6648-11ef-8162-02dad0048ad3)
Stack outputs:
User Pool ID: us-west-2_zidUml6yG
User Pool ARN: arn:aws:cognito-idp:us-west-2:431615879134:userpool/us-west-2_zidUml6yG
Cognito ARN: arn:aws:cognito-idp:us-west-2:431615879134:userpool/us-west-2_zidUml6yG
Client ID: 4l78s47crsfjiq7dsvl6m5ghvg
Client Secret: 1u8jcu923foi8rbvvjilbe4li1l36827buoojid9ota6uuc98485
DynamoDB Table: kbse1d68b57_user_corpus_list_association
S3 Bucket: kbse1d68b57-bucket
Lambda Arn: arn:aws:lambda:us-west-2:431615879134:function:kbse1d68b57-lambda-function
OpenSearchCollectionId: obdbwdsvnyky5elupqck


In [3]:
%store user_pool_id user_pool_arn cognito_arn client_id client_secret dynamo_table s3_bucket lambda_function_arn collection_id solution_id

Stored 'user_pool_id' (str)
Stored 'user_pool_arn' (str)
Stored 'cognito_arn' (str)
Stored 'client_id' (str)
Stored 'client_secret' (str)
Stored 'dynamo_table' (str)
Stored 's3_bucket' (str)
Stored 'lambda_function_arn' (str)
Stored 'collection_id' (str)
Stored 'solution_id' (str)


In [4]:
%store

Stored variables and their in-db values:
client_id                       -> '4l78s47crsfjiq7dsvl6m5ghvg'
client_secret                   -> '1u8jcu923foi8rbvvjilbe4li1l36827buoojid9ota6uuc98
cognito_arn                     -> 'arn:aws:cognito-idp:us-west-2:431615879134:userpo
collection_id                   -> 'obdbwdsvnyky5elupqck'
corpus_ids                      -> ['fa1c3635-fc7e-4249-91da-d828a20f83bc', '6394b9f6
datasource_id                   -> '5LCFZHJZRJ'
dynamo_table                    -> 'kbse1d68b57_user_corpus_list_association'
indexName                       -> 'kb-acl-index-kbs27be3efe'
kb_id                           -> '5GFND4H4X4'
lambda_function_arn             -> 'arn:aws:lambda:us-west-2:431615879134:function:kb
s3_bucket                       -> 'kbse1d68b57-bucket'
solution_id                     -> 'kbse1d68b57'
user_ids                        -> ['58f1e390-8071-705e-f71b-7406907b1d56', '08b1b3e0
user_pool_arn                   -> 'arn:aws:cognito-idp:us-west-2:

### 1. Amazon Cognito User Pool: Users and Corpus
#### Create users and corpora
We will create users and corpora to test out the use case. User IDs are stored for later use when retrieving information.
The cell below defines three sample users and two corpora; replace the names, emails, and passwords if you want to use your own. These users will be created in the Amazon Cognito user pool, and you will later need their credentials to log into the web application. While this is a dummy user creation for test purposes, in production use cases you will need to follow your organization's best practices and guidelines to create users.

**For this example, the first user has access to the `highway` corpus, the second user to the `wildlife` corpus, and the third user to both.**

<div class="alert alert-block alert-warning">
<b>Warning:</b> 
<br><b>Password minimum length:</b>8 character(s)
<br><b>Password requirements</b>
<br>Contains at least 1 number
<br>Contains at least 1 special character
<br>Contains at least 1 uppercase letter
<br>Contains at least 1 lowercase letter
</div>
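
Before creating the users, it can help to check candidate passwords locally against the four requirements listed above. The following is a minimal sketch; the helper name `meets_password_policy` is hypothetical, and treating any non-alphanumeric character as "special" is an assumption that approximates, but may not exactly match, the character set Cognito accepts.

```python
import re

def meets_password_policy(password: str) -> bool:
    """Check a password against the requirements listed in the warning above."""
    return (
        len(password) >= 8                                      # minimum length
        and re.search(r"\d", password) is not None              # at least 1 number
        # Assumption: any non-alphanumeric character counts as "special"
        and re.search(r"[^A-Za-z0-9]", password) is not None
        and re.search(r"[A-Z]", password) is not None           # at least 1 uppercase letter
        and re.search(r"[a-z]", password) is not None           # at least 1 lowercase letter
    )
```

For example, `meets_password_policy(user['password'])` could be evaluated for each entry of the `users` list before calling Cognito.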

In [5]:
users = [
    {
        'name': 'Highway Harry',
        'email': 'highway.harry@acmecorp.com',
        'password': 'Highway.Harry.123$',
        'corpus': ['highway']
    },
    {
        'name': 'Wildlife Walter',
        'email': 'wildlife.walter@acmecorp.com',
        'password': 'Wildlife.Walter.123$',
        'corpus': ['wildlife']
    },
    {
        'name': 'Admin Amy',
        'email': 'admin.amy@acmecorp.com',
        'password': 'Admin.Amy.123$',
        'corpus': ['highway', 'wildlife']
    },
]

corpus = [
    {
        'name': 'highway',
        'description': 'document regarding highway and roadsign regulations',
        's3path': f"s3://{s3_bucket}/highway/"
    },
    {
        'name': 'wildlife',
        'description': 'documents regarding fishing and hunting regulations',
        's3path': f's3://{s3_bucket}/wildlife/'
    },

]

In [6]:

user_ids = []
corpus_ids = []

def create_user(user_data, user_type):
    user_ids = []
    for user in user_data:
        response = cognito_client.admin_create_user(
            UserPoolId=user_pool_id,
            Username=user['email'],
            UserAttributes=[
                {'Name': 'name', 'Value': user['name']},
                {'Name': 'email', 'Value': user['email']},
                {'Name': 'email_verified', 'Value': 'true'}
            ],
            ForceAliasCreation=False,
            MessageAction='SUPPRESS'
        )
        cognito_client.admin_set_user_password(
            UserPoolId=user_pool_id,
            Username=user['email'],
            Password=user['password'],
            Permanent=True
        )
        # Look up the 'sub' attribute by name instead of relying on its position in the list
        sub = next(attr['Value'] for attr in response['User']['Attributes'] if attr['Name'] == 'sub')
        print(f"{user_type.capitalize()} created:", response['User']['Username'])
        print(f"{user_type.capitalize()} id:", sub)
        user_ids.append(sub)
    return user_ids

user_ids = create_user(users, 'user')
corpus_ids = [str(uuid.uuid4()) for c in corpus]

print("User IDs:", user_ids)
print("Corpus IDs:", corpus_ids)

%store user_ids corpus_ids

User created: highway.harry@acmecorp.com
User id: f801f330-1031-7092-9be5-1ae68bb86d28
User created: wildlife.walter@acmecorp.com
User id: b8811320-9011-7085-6116-3918c3844836
User created: admin.amy@acmecorp.com
User id: d8d193a0-b051-7025-4c2e-901e68575ab1
User IDs: ['f801f330-1031-7092-9be5-1ae68bb86d28', 'b8811320-9011-7085-6116-3918c3844836', 'd8d193a0-b051-7025-4c2e-901e68575ab1']
Corpus IDs: ['12ce2ffe-16bc-4de0-8ee0-a8e527a97729', '2a0b6654-2ac9-4fd2-b782-f8f190ecb3d4']
Stored 'user_ids' (list)
Stored 'corpus_ids' (list)


### 2. User-corpus association in DynamoDB
In this section we will populate the already created DynamoDB table with the user-corpus associations. This will be useful later on to retrieve the list of corpus IDs a user is allowed to filter by.

In [7]:
table = dynamodb_resource.Table(dynamo_table)
corpus_mapping = [entry['name'] for entry in corpus]
with table.batch_writer() as batch:
    # Use a distinct loop variable so the `users` list is not overwritten
    for j, user in enumerate(users):
        temp = []
        for i, c in enumerate(corpus_mapping):
            if c in user['corpus']:
                temp.append(corpus_ids[i])

        batch.put_item(
            Item={
                'user_id': user_ids[j],
                'corpus_id_list': temp
            }
        )

print('Data inserted successfully!')

Data inserted successfully!
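
To confirm what was written, the association for a single user can be read back from the table. The sketch below is illustrative only: the helper name `get_allowed_corpus_ids` is hypothetical, and it assumes `user_id` is the table's only key attribute (a partition key with no sort key), which the notebook does not state explicitly.

```python
def get_allowed_corpus_ids(table, user_id):
    """Return the corpus ids stored for user_id, or an empty list if there is no entry.

    `table` is expected to behave like a boto3 DynamoDB Table resource.
    Assumption: 'user_id' is the table's only key attribute.
    """
    response = table.get_item(Key={"user_id": user_id})
    item = response.get("Item")
    if item is None:
        return []
    return list(item.get("corpus_id_list", []))
```

In this notebook it could be called as `get_allowed_corpus_ids(dynamodb_resource.Table(dynamo_table), user_ids[0])`.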


In [8]:
!aws s3 cp ./source_transcripts/ s3://{s3_bucket}/ --recursive

upload: source_transcripts/highway/23 CFR Part 655 (up to date as of 8-21-2024).pdf to s3://kbse1d68b57-bucket/highway/23 CFR Part 655 (up to date as of 8-21-2024).pdf
upload: source_transcripts/wildlife/Loon - Wikipedia.pdf to s3://kbse1d68b57-bucket/wildlife/Loon - Wikipedia.pdf
upload: source_transcripts/wildlife/50 CFR Part 13 (up to date as of 8-21-2024).pdf to s3://kbse1d68b57-bucket/wildlife/50 CFR Part 13 (up to date as of 8-21-2024).pdf


In [9]:
# Loop through the corpus and their corresponding IDs
for i, corpus_entry in enumerate(corpus):
    corpus_id = corpus_ids[i]
    s3path = corpus_entry['s3path']
    
    # Get bucket and prefix
    # Remove 's3://' and split bucket and prefix
    path_parts = s3path.replace('s3://', '').split('/', 1)
    bucket = path_parts[0]
    prefix = path_parts[1] if len(path_parts) > 1 else ''
    
    # List all files in the S3 folder
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    if 'Contents' in response:
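        # Skip the folder placeholder and any metadata files left over from a previous run
        files = [obj['Key'] for obj in response['Contents']
                 if obj['Key'] != prefix and not obj['Key'].endswith('.metadata.json')]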
    else:
        files = []
    
    for file in files:
        metadata = {
            "metadataAttributes": {
                "corpus_id": corpus_id
            }
        }

        # Upload metadata file to S3
        s3_client.put_object(
            Bucket=bucket,
            Key=f"{file}.metadata.json",
            Body=json.dumps(metadata, indent=4),
            ContentType='application/json'
        )

### 5. Upload to Amazon S3
Knowledge Bases for Amazon Bedrock currently requires data to reside in an Amazon S3 bucket. The cells above uploaded both the documents and their metadata files to the bucket.
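
Each document is paired with a sidecar object whose key is the document key plus `.metadata.json` and whose body carries the `corpus_id` under `metadataAttributes`. As a rough sketch of that convention (the helper names `build_metadata_sidecar` and `select_document_keys` are hypothetical, not part of the notebook or of any AWS SDK):

```python
import json

METADATA_SUFFIX = ".metadata.json"

def build_metadata_sidecar(document_key, corpus_id):
    """Return the (key, body) of the metadata object that accompanies a document."""
    key = f"{document_key}{METADATA_SUFFIX}"
    body = json.dumps({"metadataAttributes": {"corpus_id": corpus_id}}, indent=4)
    return key, body

def select_document_keys(keys, prefix):
    """Keep only real documents: drop the folder placeholder and existing sidecar files."""
    return [k for k in keys if k != prefix and not k.endswith(METADATA_SUFFIX)]
```

Filtering out keys that already end in `.metadata.json` keeps the step safe to re-run, since otherwise a second run would create metadata files for the metadata files.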

### 6. Create a Knowledge Base for Amazon Bedrock

In this section we will go through all the steps to create and test a Knowledge Base. 

In [10]:
indexName = "kb-acl-index-" + solution_id
print("Index name:",indexName)
%store indexName

Index name: kb-acl-index-kbse1d68b57
Stored 'indexName' (str)


In [11]:
updateDataAccessPolicy(solution_id) # Adding the current role to the collection's data access policy
time.sleep(60) # Changes to the data access policy might take a bit to update
createAOSSIndex(indexName, region, collection_id) # Create the AOSS index

{'accessPolicyDetail': {'createdDate': 1724964874088, 'description': 'dataAccessPolicy', 'lastModifiedDate': 1724964931661, 'name': 'kbse1d68b57-kbcollection-access', 'policy': [{'Rules': [{'Resource': ['collection/kbse1d68b57-kbcollection'], 'Permission': ['aoss:CreateCollectionItems', 'aoss:UpdateCollectionItems', 'aoss:DescribeCollectionItems'], 'ResourceType': 'collection'}, {'Resource': ['index/kbse1d68b57-kbcollection/*'], 'Permission': ['aoss:CreateIndex', 'aoss:DescribeIndex', 'aoss:ReadDocument', 'aoss:WriteDocument', 'aoss:UpdateIndex', 'aoss:DeleteIndex'], 'ResourceType': 'index'}], 'Principal': ['arn:aws:iam::431615879134:role/kbse1d68b57-kbrole', 'arn:aws:sts::431615879134:assumed-role/namer-summit-2024/SageMaker']}], 'policyVersion': 'MTcyNDk2NDkzMTY2MV8y', 'type': 'data'}, 'ResponseMetadata': {'RequestId': '54a2a811-5539-44da-a838-bd743c3bcc6f', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '54a2a811-5539-44da-a838-bd743c3bcc6f', 'date': 'Thu, 29 Aug 2024 20

#### Create the Knowledge Base
In this section you will create the Knowledge Base. Before creating a new KB we need to define which embeddings model we want it to use. In this case we will be using Amazon Titan Embeddings V2. 

<div class="alert alert-block alert-warning">
<b>Warning:</b> Make sure you have enabled Amazon Titan Embeddings V2 access in the Amazon Bedrock Console (model access). 
</div>

In [12]:
embeddingModelArn = "arn:aws:bedrock:{}::foundation-model/amazon.titan-embed-text-v2:0".format(region)

Now we can create our Knowledge Base for Amazon Bedrock. We have created an AWS CloudFormation template which takes care of the configuration needed.

<div class="alert alert-block alert-warning">
The deployment of the AWS CloudFormation template should take around <b>1-2 minutes</b>.
    
You can also follow the deployment status in the AWS CloudFormation console. 
</div>

In [13]:
kb_id, datasource_id = create_kb_infrastructure(solution_id, s3_bucket, embeddingModelArn, indexName, region, account_id, collection_id)

Stack creation initiated: arn:aws:cloudformation:us-west-2:431615879134:stack/KB-E2E-KB-kbse1d68b57/2ba11f00-6649-11ef-ac3a-06b5122181c5
KBID: ELSQWLXZWS
DS: ELSQWLXZWS|ERK6UW1AKX


In [14]:
%store kb_id datasource_id

Stored 'kb_id' (str)
Stored 'datasource_id' (str)


#### Sync the Knowledge Base
Now that we have created the data source and associated it with the Knowledge Base, we can sync the data.


Each time you add, modify, or remove files from the S3 bucket for a data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes the objects in your S3 bucket that have been added, modified, or deleted since the last sync.

In [15]:
ingestion_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=datasource_id,
    description='Initial Ingestion'
)

In [16]:
status = bedrock_agent_client.get_ingestion_job(
    knowledgeBaseId=ingestion_job_response["ingestionJob"]["knowledgeBaseId"],
    dataSourceId=ingestion_job_response["ingestionJob"]["dataSourceId"],
    ingestionJobId=ingestion_job_response["ingestionJob"]["ingestionJobId"]
)["ingestionJob"]["status"]
print(status)
while status not in ["COMPLETE", "FAILED", "STOPPED"]:
    status = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId=ingestion_job_response["ingestionJob"]["knowledgeBaseId"],
        dataSourceId=ingestion_job_response["ingestionJob"]["dataSourceId"],
        ingestionJobId=ingestion_job_response["ingestionJob"]["ingestionJobId"]
    )["ingestionJob"]["status"]
    print(status)
    time.sleep(30)
print("Waiting for changes to take place in the vector database")
time.sleep(30) # Wait for all changes to take place

STARTING
STARTING
COMPLETE
Waiting for changes to take place in the vector database
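
The polling loop above waits indefinitely if a job never reaches a terminal state. One possible variation, shown here as a sketch rather than the notebook's method, wraps the same `get_ingestion_job` call in a helper with a timeout. The helper name `wait_for_ingestion_job`, its default values, and the injectable `sleep` parameter (there only to make it easy to test) are all assumptions.

```python
import time

TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}

def wait_for_ingestion_job(client, knowledge_base_id, data_source_id, ingestion_job_id,
                           poll_seconds=30, timeout_seconds=1800, sleep=time.sleep):
    """Poll an ingestion job until it reaches a terminal status or the timeout elapses."""
    waited = 0
    while True:
        status = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=ingestion_job_id,
        )["ingestionJob"]["status"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Ingestion job still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the variables from this notebook, the call would look like `wait_for_ingestion_job(bedrock_agent_client, kb_id, datasource_id, ingestion_job_response["ingestionJob"]["ingestionJobId"])`.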


#### Test the Knowledge Base

Now that the Knowledge Base is available, we can test it out using the **retrieve** and **retrieve_and_generate** APIs.

Let's examine a test case using the wildlife corpus, which contains the permit regulations in 50 CFR Part 13. We'll query the knowledge base with a metadata filter on the wildlife `corpus_id` and ask about golden eagle permits. Changing the filter to the highway `corpus_id` will prevent the model from responding accurately. Read through the PDFs for other questions you might want to ask.

In this first example we are going to use the **retrieve and generate API**. This API queries a knowledge base and generates responses based on the retrieved results, using an LLM.

<div class="alert alert-block alert-warning">
<b>Warning:</b> Make sure you have enabled Anthropic Claude 3 Sonnet access in the Amazon Bedrock Console (model access). 
</div>

In [17]:
# retrieve and generate API
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": "Which office do I submit for golden eagle permits?"
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0".format(region),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5,
                    "filter": {
                        "equals": {
                            "key": "corpus_id",
                            "value": corpus_ids[1]
                        }
                    }
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

To obtain permits for golden eagle activities such as scientific collecting, exhibition, religious use, depredation, nest take, and incidental take, you should submit your application to the "Migratory Bird Permit Program Office" in the region where you reside. The addresses for the regional offices can be found at 50 CFR 2.2 or on the U.S. Fish and Wildlife Service website.



In this second example we are going to use the **retrieve API**. This API queries the knowledge base and retrieves relevant information from it, but it does not generate a response.

In [18]:
# nextToken is only needed to page through further results, so it is omitted here
response_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id,
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 3,
            "filter": {
                "equals": {
                    "key": "corpus_id",
                    "value": corpus_ids[1]
                }
            }
        }
    },
    retrievalQuery={
        "text": "Which office do I submit for golden eagle permits?"
    }
)

def response_print(retrieve_resp):
    # 'retrievalResults' is a list of chunks;
    # each chunk has content, location, score and metadata
    for num, chunk in enumerate(retrieve_resp['retrievalResults'], 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Score: ', chunk['score'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk['metadata'], end='\n'*2)

response_print(response_ret)

Chunk 1:  Endangered Species Act permit applications for the import or export of native endangered and threatened species may be obtained from the Division of Management Authority in accordance with paragraph (b)(3) of this section.   (5) You may obtain applications for bald and golden eagle permits (50 CFR part 22) and migratory bird permits (50 CFR part 21), except for banding and marking permits, from, and you may submit completed applications to, the “Migratory Bird Permit Program Office” in the Region in which you reside. For addresses of the regional offices, see 50 CFR 2.2, or go to: http://www.fws.gov/ migratorybirds/mbpermits/Addresses.html.   (c) Time notice. The Service will process all applications as quickly as possible. However, we cannot guarantee final action within the time limit you request. You should ensure that applications for permits for marine mammals and/or endangered and threatened species are postmarked at least 90 calendar days prior to the requested effecti
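
Both examples above filter on a single `corpus_id` with `equals`. A user such as Admin Amy is associated with more than one corpus, so the filter would need to match any of several IDs. The sketch below shows one way this could be expressed with the `in` operator of the retrieval filter; the helper name `build_corpus_filter` is hypothetical, and whether the deployed Lambda function builds its filter this way is not shown in this notebook.

```python
def build_corpus_filter(corpus_ids):
    """Build a vector-search metadata filter that matches any of the given corpus ids."""
    ids = list(corpus_ids)
    if not ids:
        # Refuse to build a filter rather than fall back to an unfiltered query
        raise ValueError("The user is not associated with any corpus")
    if len(ids) == 1:
        return {"equals": {"key": "corpus_id", "value": ids[0]}}
    return {"in": {"key": "corpus_id", "value": ids}}
```

The result would be passed as the `"filter"` entry of `vectorSearchConfiguration`. Raising on an empty list is a deliberate choice in this sketch: omitting the filter for a user with no corpora would return results from every corpus.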

### 7. Add Lambda Layer
At the time of developing this notebook, the latest Boto3 version available in Lambda with Python 3.12 does not include metadata filtering capabilities. To solve this, we will create and attach an AWS Lambda Layer with the latest Boto3 version.

For this section to run you will need the **zip** package to be installed at the system level.

You can check if zip is installed by running the following command: !zip

If it is not installed you will need to install it using the appropriate package manager (apt-get for Debian-based systems or yum for RHEL-based systems for example).
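
If installing `zip` at the system level is not an option, the archive could also be built with Python's standard-library `zipfile` module. This is a sketch of an alternative, not the approach used in the cells below; the helper name `zip_directory` is hypothetical.

```python
import os
import zipfile

def zip_directory(source_dir, zip_path):
    """Zip the contents of source_dir into zip_path, with paths relative to source_dir."""
    source_dir = os.path.abspath(source_dir)
    zip_path = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                if full_path == zip_path:
                    continue  # do not add the archive to itself
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

To mirror the layout used in this section, it could be called as `zip_directory("latest-sdk-layer", "latest-sdk-layer/latest-sdk-layer.zip")` after the `pip install -t` step.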

In [19]:
# can we have the lambda layer already attached to the lambda function?

In [20]:
#!zip
!sudo apt-get install zip -y # Debian-based systems 
#!sudo yum install zip -y # RHEL-based systems

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zip is already the newest version (3.0-12build2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.


In [21]:
!mkdir -p latest-sdk-layer
%cd latest-sdk-layer
!pip install -qU boto3 botocore -t python/lib/python3.12/site-packages/
!zip -rq latest-sdk-layer.zip .
%cd ..

/home/sagemaker-user/namer-summit-2024-genAI-privacy/latest-sdk-layer
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiobotocore 2.13.1 requires botocore<1.34.132,>=1.34.70, but you have botocore 1.35.9 which is incompatible.
amazon-sagemaker-sql-magic 0.1.3 requires sqlparse==0.5.0, but you have sqlparse 0.5.1 which is incompatible.
autogluon-common 0.8.3 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-core 0.8.3 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-core 0.8.3 requires scikit-learn<1.4.1,>=1.1, but you have scikit-learn 1.4.2 which is incompatible.
autogluon-features 0.8.3 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-features 0.8.3 requires scikit-learn<1.4.1,>=1.1, but you have scikit-learn 1.4.2 which is incompatible.
au

In [22]:
def publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes):
    with open(zip_file_path, 'rb') as f:
        response = lambda_client.publish_layer_version(
            LayerName=layer_name,
            Description=description,
            Content={
                'ZipFile': f.read(),
            },
            CompatibleRuntimes=compatible_runtimes
        )
    return response['LayerVersionArn']

In [23]:
layer_name = 'latest-sdk-layer'
description = 'Layer with the latest boto3 version.'
zip_file_path = 'latest-sdk-layer/latest-sdk-layer.zip'
compatible_runtimes = ['python3.12']

In [24]:
layer_version_arn = publish_lambda_layer(layer_name, description, zip_file_path, compatible_runtimes)
print("Layer version ARN:", layer_version_arn)

Layer version ARN: arn:aws:lambda:us-west-2:431615879134:layer:latest-sdk-layer:5


In [25]:
try:
    # Add the layer to the Lambda function
    lambda_client.update_function_configuration(
        FunctionName=lambda_function_arn,
        Layers=[layer_version_arn]
    )
    print("Layer added to the Lambda function successfully.")

except ClientError as e:
    print(f"Error adding layer to Lambda function: {e.response['Error']['Message']}")
    
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Layer added to the Lambda function successfully.


### 8. Create Streamlit Application
To showcase the interaction between users and the Knowledge Base, we can build a simple web application for testing purposes using Streamlit, a popular open-source Python library for building interactive data apps. Streamlit provides a simple way to create custom interfaces that integrate with the various AWS services involved in this solution.

Here is the application, **don't modify the placeholders, we will replace them in the next cell.** 

In [26]:
%%writefile app.py
import os
import boto3
import json
import requests
import streamlit as st
from streamlit_cognito_auth import CognitoAuthenticator

pool_id = "<<replace_pool_id>>"
app_client_id = "<<replace_app_client_id>>"
app_client_secret = "<<replace_app_client_secret>>"
kb_id = "<<replace_kb_id>>"
lambda_function_arn = '<<replace_lambda_function_arn>>'
dynamo_table = '<<replace_dynamo_table_name>>'

authenticator = CognitoAuthenticator(
    pool_id=pool_id,
    app_client_id=app_client_id,
    app_client_secret= app_client_secret,
    use_cookies=False
)

is_logged_in = authenticator.login()

if not is_logged_in:
    st.stop()

def logout():
    authenticator.logout()

def get_user_sub(user_pool_id, username):
    # Look up the Cognito 'sub' (unique user id) for the given username
    cognito_client = boto3.client('cognito-idp')
    try:
        response = cognito_client.admin_get_user(
            UserPoolId=user_pool_id,
            Username=username
        )
        for attr in response['UserAttributes']:
            if attr['Name'] == 'sub':
                return attr['Value']
        return None
    except cognito_client.exceptions.UserNotFoundException:
        print("User not found.")
        return None

def get_corpus_ids(user_id):
    dynamodb = boto3.client('dynamodb')
    response = dynamodb.query(
        TableName=dynamo_table,
        KeyConditionExpression='user_id = :user_id',
        ExpressionAttributeValues={
            ':user_id': {'S': user_id}
        }
    )
    print(response)
    corpus_id_list = []  # Initialize the list
    for item in response['Items']:
        corpus_ids = item.get('corpus_id_list', {}).get('L', [])
        corpus_id_list.extend([corpus_id['S'] for corpus_id in corpus_ids])
    return corpus_id_list

def search_transcript(user_id, kb_id, text, corpus_ids):
    # Initialize the Lambda client
    lambda_client = boto3.client('lambda')

    # Payload for the Lambda function
    payload = json.dumps({
        "userId": user_id,
        "knowledgeBaseId": kb_id,
        "text": text, 
        "corpusIds": corpus_ids
    }).encode('utf-8')

    try:
        # Invoke the Lambda function
        response = lambda_client.invoke(
            FunctionName=lambda_function_arn,
            InvocationType='RequestResponse',
            Payload=payload
        )

        # Process the response
        if response['StatusCode'] == 200:
            response_payload = json.loads(response['Payload'].read().decode('utf-8'))
            return response_payload
        else:
            # Handle error response
            return {'error': 'Failed to fetch data'}

    except Exception as e:
        # Handle exception
        return {'error': str(e)}

sub = get_user_sub(pool_id, authenticator.get_username())
print(sub)
corpus_ids = get_corpus_ids(sub)
print(corpus_ids)

# Application Front

with st.sidebar:
    st.header("User Information")
    st.markdown("## User")
    st.text(authenticator.get_username())
    st.markdown("## User Id")
    st.text(sub)
    st.button("Logout", "logout_btn", on_click=logout)

st.header("Corpus Search Tool")

# Text input for the search query
query = st.text_input("Enter your search query:")

if st.button("Search"):
    if query:
        # Perform search
        corpus_ids_filter = corpus_ids
        results = search_transcript(sub, kb_id, query, corpus_ids_filter)
        print(results)
        if results and "body" in results:
            st.subheader("Search Results:")
            st.markdown(results["body"], unsafe_allow_html=True)
        else:
            st.write("No matching results found in corpus.")
    else:
        st.write("Please enter a search query.")

Overwriting app.py


In [27]:
replace_vars("app.py", user_pool_id, client_id, client_secret, kb_id, lambda_function_arn, dynamo_table)
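
The implementation of `replace_vars` lives in `utilsmod` and is not shown in this notebook. Purely as an illustration of what filling the `<<replace_...>>` placeholders in `app.py` might involve, here is a hypothetical equivalent; the function name `fill_placeholders` and its dictionary-based signature are assumptions, not the actual helper.

```python
def fill_placeholders(path, values):
    """Replace each placeholder string in the file at `path` with its value, in place."""
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    for placeholder, value in values.items():
        content = content.replace(placeholder, value)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return content
```

A call could look like `fill_placeholders("app.py", {"<<replace_pool_id>>": user_pool_id, "<<replace_kb_id>>": kb_id})`, with one entry per placeholder defined at the top of the application.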

#### Run the Streamlit application locally
Execute the cell below to run the Streamlit application.

**Use the email and password of the users you defined at the top of the notebook to access the application.**

Once you have logged in, the application passes the corpus IDs associated with your user along with each query, so results are limited to the corpora you are allowed to access.

In [28]:
!streamlit run app.py


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://169.255.255.2:8501
  External URL: http://34.213.171.250:8501

^C
  Stopping...


If you are executing this notebook on SageMaker Studio, you can access the Streamlit application at the following URL:

```
https://<<STUDIOID>>.studio.<<REGION>>.sagemaker.aws/jupyterlab/default/proxy/8501/
```

If you are executing this notebook on a SageMaker Notebook, you can access the Streamlit application at the following URL:

```
https://<<NOTEBOOKID>>.notebook.<<REGION>>.sagemaker.aws/proxy/8501/
```


### 9. Clean up
**Before running this cell you will need to stop the cell above where the app is running!**

Run the following cell to delete the created resources and avoid unnecessary costs. This should take about 2-3 minutes to complete.
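
The cleanup cell below lists objects with a single `list_objects_v2` call, which returns at most 1,000 keys. That is enough for this notebook's handful of files. For a larger bucket, a paginated variant along these lines could be used instead; the helper name `empty_bucket` is hypothetical, and the sketch assumes an unversioned bucket and does not inspect per-object errors returned by `delete_objects`.

```python
def empty_bucket(s3_client, bucket):
    """Delete every object in an unversioned bucket, one listing page at a time."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not objects:
            continue
        # Each page holds at most 1,000 keys, which is also the delete_objects limit
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": objects, "Quiet": True})
        deleted += len(objects)
    return deleted  # number of delete requests issued, assuming none failed
```

With the clients defined at the top of the notebook, the call would be `empty_bucket(s3_client, s3_bucket)`.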

In [29]:
# Delete all objects in the bucket
try:
    response = s3_client.list_objects_v2(Bucket=s3_bucket)
    if 'Contents' in response:
        for obj in response['Contents']:
            s3_client.delete_object(Bucket=s3_bucket, Key=obj['Key'])
        print(f"All objects in {s3_bucket} have been deleted.")
except Exception as e:
    print(f"Error deleting objects from {s3_bucket}: {e}")

# Define the stack names to delete
stack_names = ["KB-E2E-KB-{}".format(solution_id),"KB-E2E-Base-{}".format(solution_id)]

# Iterate over the stack names and delete each stack
for stack_name in stack_names:
    try:
        # Retrieve the stack information
        stack_info = cloudformation.describe_stacks(StackName=stack_name)
        stack_status = stack_info['Stacks'][0]['StackStatus']

        # Check if the stack exists and is in a deletable state
        if stack_status != 'DELETE_COMPLETE':
            # Delete the stack
            cloudformation.delete_stack(StackName=stack_name)
            print(f'Deleting stack: {stack_name}')

            # Wait for the stack deletion to complete
            waiter = cloudformation.get_waiter('stack_delete_complete')
            waiter.wait(StackName=stack_name)
            print(f'Stack {stack_name} deleted successfully.')
        else:
            print(f'Stack {stack_name} does not exist or has already been deleted.')

    except cloudformation.exceptions.ClientError as e:
        print(f'Error deleting stack {stack_name}: {e.response["Error"]["Message"]}')

All objects in kbse1d68b57-bucket have been deleted.
Deleting stack: KB-E2E-KB-kbse1d68b57


KeyboardInterrupt: 