<center>
    <p style="text-align:center">
        <img width="500" height="120" src="https://privacera.com/wp-content/uploads/2022/11/logo.png" alt="Privacera" class="site-logo__image entered lazyloaded" data-lazy-src="https://privacera.com/wp-content/uploads/2022/11/logo.png" data-ll-status="loaded">
        <br>
        <a href="https://privacera.com/products/ai-governance/">Visit Us</a>
    </p>
</center>

# Privacera AI Governance - Quick Start using OpenSearch VectorDB and Bedrock

This notebook shows how to use the Privacera Shield library with a LangChain application that uses an OpenSearch vector database. To run this notebook, complete the following steps in order.




# 1. First sign up for a free account at [Privacera AI Governance (PAIG)](https://privacera.ai)

# 2. Set the required variables

In [None]:
# Vector database raw data location details
S3_BUCKET = "<SET YOUR BUCKET>"  # E.g. your-vectordb-bucket
S3_OBJECT_PREFIX = "vectordb-raw-data" # E.g. vectordb-raw-data

# Bedrock model and aws region. PLEASE CHANGE these according to your preference
AWS_REGION = "us-east-1"
EMBEDDING_MODEL_NAME = "amazon.titan-embed-text-v1"
LLM_MODEL_NAME = "amazon.titan-tg1-large"

# 3. Install the Python packages
This will take several seconds, up to a minute.

In [None]:
!pip -q install  \
  privacera_securechat==1.0.2 \
  unstructured==0.14.4 --no-warn-conflicts

# 4. Create sample documents in a folder

The next cell creates some sample documents in a folder named `raw_data`. They will later be copied to S3.

- `x10.txt` - Contains the specification of the existing product.
- `x11.txt` - Contains the specification of the product that is under development. This is highly classified data.
- `x10-salesdata.txt` - Sales numbers for the product X10. Only the Sales team has access to it.
- `customer-feedback.txt` - Customer feedback that contains PII data. Only a few people can see the PII data.


In [None]:
import os
import warnings
warnings.filterwarnings('ignore')

def create_raw_data():
    raw_data_dir = "raw_data"

    file_contents = {
        "x10.txt": """
Product Specification Sheet of x10
Display: Size and resolution - 6.5" AMOLED, 120Hz refresh rate
Processor: Model name  Snapdragon 8 Gen 1
RAM: Options 8GB/12GB
Storage: Options 128GB/256GB
Camera: rear camera system with multiple lenses, front-facing camera
Battery: Capacity 5000mAh
Operating System: Version Android 13
Key Features: long battery life, fast performance, high-quality camera
        """
        , "x11.txt": """
Product Specification Sheet of x11
Display: Size and resolution - 7.5" AMOLED, 360Hz refresh rate
Processor: Model name  Snapdragon 10 Gen 3
RAM: Options 16GB/24GB
Storage: Options 256GB/512GB
Camera: 360 camera system with multiple lenses, front-facing camera
Battery: Capacity 10000mAh
Operating System: Version Android 13
Key Features: super long battery life, ultra fast performance, 360 camera
        """
        , "x10-salesdata.txt": """
Sales Data for X10 Model:
Monthly Sales Report (Internal)
Region	Units Sold	Revenue
North America	20,000	$10,000,000
Europe	15,000	$7,500,000
Asia Pacific	10,000	$5,000,000
Total	45,000	$22,500,000
    """
        , "customer-feedback.txt": """
Customer Feedback Analysis - X10 Model

Positive Feedback for X10 Model:

"The X10's battery life is amazing! I can finally ditch the portable charger."

Sarah Jones, Busy Professional
Email: sarah.jones@samplemail.com
Phone: (123) 456-7890
"The camera takes crystal-clear pictures, even in low-light conditions. Perfect for capturing memories on the go!"

David Lee, Travel Blogger
Email: david.lee@travelblogger.com
Phone: (234) 567-8901
"The phone's design is sleek and feels luxurious in hand. The user interface is user-friendly and easy to navigate, even for non-tech-savvy users like me."

Emily Garcia, Teacher
Email: emily.garcia@schoolmail.com
Phone: (345) 678-9012

Areas for Improvement for X10 Model:

"The phone is a bit bulky for one-handed use. It can be challenging to reach the top of the screen comfortably."

Michael Chen, Gamer
Email: michael.chen@gamermail.com
Phone: (456) 789-0123
"I've encountered a few minor software bugs that require restarting the phone. Hopefully, future updates will address these."

Olivia Rodriguez, Social Media Manager
Email: olivia.rodriguez@socialhub.com
Phone: (567) 890-1234
"The current storage options are a bit limiting for someone who stores a lot of photos and videos. A higher storage tier or microSD card support would be ideal."

William Smith, Content Creator
Email: william.smith@creatorhub.com
Phone: (678) 901-2345

Feature Requests for X10 Model:

"Wireless charging would be a fantastic addition for convenience. No more fumbling with cables!" (Multiple Users)
"A built-in fingerprint sensor would be a welcome security feature for added peace of mind." (Several Users)
"The ability to expand storage with a microSD card would be incredibly helpful for users who need more space." (Content Creators & Photographers)
"""
    }

    os.makedirs(raw_data_dir, exist_ok=True)

    for file_path, content in file_contents.items():
        file_path_with_dir = raw_data_dir + "/" + file_path
        with open(file_path_with_dir, 'w') as file:
            file.write(content)

    print("Raw data created successfully.")


create_raw_data()

# 5. Associate metadata with the documents

The access permissions and classifications normally come from the source data repository. For this exercise, we will add them via a script to the S3 object metadata.

Here, we describe the additional metadata for each *document* in the collection; a custom loader class defined in a later step reads it back from S3. For each document, we have a list of users who are allowed to access the document, a list of groups that are allowed to access the document, and additional metadata such as the location (country) associated with the document.

We will use the users, groups and country attribute to filter the documents based upon the user querying the vector database.
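
To make the filtering rule concrete, here is a minimal sketch of the kind of check a user/group filter performs. It is an illustration only and not PAIG's implementation: the `can_access` and `visible_files` helpers and the `user_groups` mapping (sally as a member of Sales) are assumptions made for this example.

```python
# Illustrative only: a simplified model of user/group filtering.
# `can_access`, `visible_files` and `user_groups` are hypothetical names, not part of PAIG.

def can_access(doc, user, groups):
    """Return True if the user, or any of the user's groups, is listed on the document."""
    return user in doc["users"] or any(group in doc["groups"] for group in groups)


sample_docs = [
    {"file": "x10.txt", "users": ["sally", "peter", "emily", "mark"], "groups": []},
    {"file": "x11.txt", "users": ["mark", "peter"], "groups": []},
    {"file": "x10-salesdata.txt", "users": ["sally"], "groups": ["Sales"]},
]

# Assumed group membership for this example
user_groups = {"sally": ["Sales"], "peter": []}


def visible_files(user):
    """List the files a user could retrieve under the simplified rule above."""
    groups = user_groups.get(user, [])
    return [doc["file"] for doc in sample_docs if can_access(doc, user, groups)]


print(visible_files("sally"))  # ['x10.txt', 'x10-salesdata.txt']
print(visible_files("peter"))  # ['x10.txt', 'x11.txt']
```

Under this rule a document is visible when either the user name or one of the user's groups matches, which is why sally sees the sales data but not `x11.txt`.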

In [None]:
import json

# Define raw data metadata information
file_metadata = [
    {
        "file": "x10.txt",
        "users": ["sally", "peter", "emily", "mark"],
        "groups": [],
        "metadata": {"file_name": "x10.txt"}
    },
    {
        "file": "x11.txt",
        "users": ["mark", "peter"],
        "groups": [],
        "metadata": {"SECURITY_LEVEL": "CONFIDENTIAL", "file_name": "x11.txt"}
    },
    {
        "file": "x10-salesdata.txt",
        "users": ["sally"],
        "groups": ["Sales"],
        "metadata": {"file_name": "x10-salesdata.txt"}
    },
    {
        "file": "customer-feedback.txt",
        "users": ["emily", "sally", "peter", "mark"],
        "groups": ["Sales"],
        "metadata": {"file_name": "customer-feedback.txt"}
    }
]

# Define output JSON file path
output_file_path = "raw_data_metadata_details.json"

# Write file metadata to JSON file
with open(output_file_path, "w") as json_file:
    json.dump(file_metadata, json_file, indent=4)

print(f"Raw data metadata written to {output_file_path}")

# 6. Set the details about OpenSearch, Bedrock and AWS S3

In [None]:
# OpenSearch vector database details
OPENSEARCH_HOST = "opensearch-node1"
OPENSEARCH_PORT = 9200
OPENSEARCH_USE_SSL = True
OPENSEARCH_VERIFY_CERTS = False
OPENSEARCH_USERNAME = "admin"
OPENSEARCH_PASSWORD = "admin"
OPENSEARCH_INDEX = "paig_quickstart_vector_index"

OPENSEARCH_PROTOCOL = "https://" if OPENSEARCH_USE_SSL else "http://"
OPENSEARCH_ENDPOINT = OPENSEARCH_PROTOCOL + OPENSEARCH_HOST + ":" + str(OPENSEARCH_PORT)



# 7. Upload raw data along with metadata to S3 location
Ensure your system has the necessary permissions to access the S3 bucket where you're uploading the files.
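
S3 stores user-defined metadata as string key/value pairs, so the user lists, group lists and extra attributes have to be serialized before upload and parsed again when read back. The sketch below shows that round trip without calling AWS. The helper names `encode_metadata` and `decode_metadata` are assumptions for this example, and the size check is only an approximation of the 2 KB limit that S3 places on user-defined metadata.

```python
import json

S3_USER_METADATA_LIMIT_BYTES = 2048  # S3 limits user-defined metadata to 2 KB per object


def encode_metadata(item):
    """Serialize access lists and extra attributes into string values (hypothetical helper)."""
    encoded = {
        "users": json.dumps(item["users"]),
        "groups": json.dumps(item["groups"]),
        "metadata": json.dumps(item["metadata"]),
    }
    # Approximate size check: UTF-8 bytes of all keys and values
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in encoded.items())
    if size > S3_USER_METADATA_LIMIT_BYTES:
        raise ValueError(f"Metadata is {size} bytes, above the 2 KB limit")
    return encoded


def decode_metadata(s3_metadata):
    """Parse the JSON strings back into Python lists and dictionaries."""
    return {key: json.loads(value) for key, value in s3_metadata.items()}


item = {
    "file": "x11.txt",
    "users": ["mark", "peter"],
    "groups": [],
    "metadata": {"SECURITY_LEVEL": "CONFIDENTIAL", "file_name": "x11.txt"},
}

encoded = encode_metadata(item)
decoded = decode_metadata(encoded)
print(encoded["users"])   # a JSON string
print(decoded["users"])   # a Python list again
```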

In [None]:
import boto3
import json


def upload_raw_data_to_s3(bucket_name: str, object_prefix: str, raw_data_location: str, raw_data_details_file_path: str):
    """
    Upload local raw data files to S3 and attach access metadata to each object.

    :param bucket_name: The name of the S3 bucket.
    :param object_prefix: The prefix for S3 object keys.
    :param raw_data_location: The local directory path where raw data files are located.
    :param raw_data_details_file_path: Path to the JSON file that lists users, groups and metadata per file.
    """
    # Load the per-file metadata from the JSON file
    with open(raw_data_details_file_path) as raw_data_details_file:
        data = json.load(raw_data_details_file)

        # Initialize S3 client
        s3 = boto3.client('s3')

        # Iterate through each item in the data
        for item in data:
            file_name = item['file']
            users = item['users']
            groups = item['groups']
            metadata = item['metadata']

            # Read existing file content from local directory
            file_content = ""
            with open(raw_data_location + "/" + file_name, 'r') as file:
                file_content = file.read()

            # Upload the file to S3 with the access metadata attached
            s3.put_object(
                Bucket=bucket_name,
                Key=object_prefix + "/" + file_name,
                Body=file_content.encode('utf-8'),
                Metadata={
                    'users': json.dumps(users),
                    'groups': json.dumps(groups),
                    'metadata': json.dumps(metadata)
                },
                ContentType='text/plain'
            )

            print(f"Uploaded file: {file_name}")

        print("All files uploaded with metadata successfully.")

upload_raw_data_to_s3(S3_BUCKET, S3_OBJECT_PREFIX, "raw_data", "raw_data_metadata_details.json")
print(f"https://{AWS_REGION}.console.aws.amazon.com/s3/object/{S3_BUCKET}?region={AWS_REGION}&bucketType=general&prefix={S3_OBJECT_PREFIX}/x10.txt")

# 8. Load the sample documents into OpenSearch vector database
Now the sample documents are loaded into OpenSearch vector database using LangChain and Bedrock embedding API.
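
For the access filter to work, the `users` and `groups` metadata read from S3 has to stay attached to every chunk that is embedded, not only to the original document. The sketch below is a simplified stand-in for a text splitter, written in plain Python to show that propagation; it is not LangChain's code, and `split_with_metadata` and the `example-bucket` name are assumptions for this example.

```python
import copy


def split_with_metadata(text, metadata, chunk_size=1024, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks and copy the metadata onto each chunk."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # The current chunk is full: close it and start a new one
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Each chunk gets its own copy of the parent document's metadata
    return [{"page_content": c, "metadata": copy.deepcopy(metadata)} for c in chunks]


doc_metadata = {
    "source": "s3://example-bucket/vectordb-raw-data/x11.txt",  # hypothetical bucket name
    "users": ["mark", "peter"],
    "groups": [],
}
text = "first paragraph\n\nsecond paragraph\n\nthird paragraph"

chunks = split_with_metadata(text, doc_metadata, chunk_size=35)
for chunk in chunks:
    print(repr(chunk["page_content"]), chunk["metadata"]["users"])
```

With a `chunk_size` of 35 the text is split into two chunks, and both carry the same `users` list as the parent document.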

In [None]:
import os
import tempfile
import json
import botocore.client

from langchain.schema import Document
from typing import List, Optional, Union, Callable, Any

from langchain_community.document_loaders.s3_directory import BaseLoader
from langchain_community.document_loaders.unstructured import UnstructuredBaseLoader


class PrivaceraS3FileLoader(UnstructuredBaseLoader):
    """Load from `Amazon AWS S3` file."""

    def __init__(
        self,
        bucket: str,
        key: str,
        *,
        region_name: Optional[str] = None,
        api_version: Optional[str] = None,
        use_ssl: Optional[bool] = True,
        verify: Union[str, bool, None] = None,
        endpoint_url: Optional[str] = None,
        aws_access_key_id: Optional[str] = None,
        aws_secret_access_key: Optional[str] = None,
        aws_session_token: Optional[str] = None,
        boto_config: Optional[botocore.client.Config] = None,
        mode: str = "single",
        post_processors: Optional[List[Callable]] = None,
        **unstructured_kwargs: Any,
    ):
        """Initialize with bucket and key name.

        :param bucket: The name of the S3 bucket.
        :param key: The key of the S3 object.

        :param region_name: The name of the region associated with the client.
            A client is associated with a single region.

        :param api_version: The API version to use.  By default, botocore will
            use the latest API version when creating a client.  You only need
            to specify this parameter if you want to use a previous API version
            of the client.

        :param use_ssl: Whether or not to use SSL.  By default, SSL is used.
            Note that not all services support non-ssl connections.

        :param verify: Whether or not to verify SSL certificates.
            By default SSL certificates are verified.  You can provide the
            following values:

            * False - do not validate SSL certificates.  SSL will still be
              used (unless use_ssl is False), but SSL certificates
              will not be verified.
            * path/to/cert/bundle.pem - A filename of the CA cert bundle to
              uses.  You can specify this argument if you want to use a
              different CA cert bundle than the one used by botocore.

        :param endpoint_url: The complete URL to use for the constructed
            client.  Normally, botocore will automatically construct the
            appropriate URL to use when communicating with a service.  You can
            specify a complete URL (including the "http/https" scheme) to
            override this behavior.  If this value is provided, then
            ``use_ssl`` is ignored.

        :param aws_access_key_id: The access key to use when creating
            the client.  This is entirely optional, and if not provided,
            the credentials configured for the session will automatically
            be used.  You only need to provide this argument if you want
            to override the credentials used for this specific client.

        :param aws_secret_access_key: The secret key to use when creating
            the client.  Same semantics as aws_access_key_id above.

        :param aws_session_token: The session token to use when creating
            the client.  Same semantics as aws_access_key_id above.

        :type boto_config: botocore.client.Config
        :param boto_config: Advanced boto3 client configuration options. If a value
            is specified in the client config, its value will take precedence
            over environment variables and configuration values, but not over
            a value passed explicitly to the method. If a default config
            object is set on the session, the config object used when creating
            the client will be the result of calling ``merge()`` on the
            default config with the config provided to this call.
        :param mode: Mode in which to read the file. Valid options are: single,
            paged and elements.
        :param post_processors: Post processing functions to be applied to
            extracted elements.
        :param **unstructured_kwargs: Arbitrary additional kwargs to pass in when
            calling `partition`
        """
        super().__init__(mode, post_processors, **unstructured_kwargs)
        self.bucket = bucket
        self.key = key
        self.region_name = region_name
        self.api_version = api_version
        self.use_ssl = use_ssl
        self.verify = verify
        self.endpoint_url = endpoint_url
        self.aws_access_key_id = aws_access_key_id
        self.aws_secret_access_key = aws_secret_access_key
        self.aws_session_token = aws_session_token
        self.boto_config = boto_config

        try:
            import boto3
        except ImportError:
            raise ImportError(
                "Could not import `boto3` python package. "
                "Please install it with `pip install boto3`."
            )
        self.ak_s3 = boto3.client(
            "s3",
            region_name=self.region_name,
            api_version=self.api_version,
            use_ssl=self.use_ssl,
            verify=self.verify,
            endpoint_url=self.endpoint_url,
            aws_access_key_id=self.aws_access_key_id,
            aws_secret_access_key=self.aws_secret_access_key,
            aws_session_token=self.aws_session_token,
            config=self.boto_config,
        )

    def _get_elements(self) -> List:
        """Get elements."""
        from unstructured.partition.auto import partition

        try:
            import boto3
        except ImportError:
            raise ImportError(
                "Could not import `boto3` python package. "
                "Please install it with `pip install boto3`."
            )
        s3 = boto3.client(
            "s3",
            region_name=self.region_name,
            api_version=self.api_version,
            use_ssl=self.use_ssl,
            verify=self.verify,
            endpoint_url=self.endpoint_url,
            aws_access_key_id=self.aws_access_key_id,
            aws_secret_access_key=self.aws_secret_access_key,
            aws_session_token=self.aws_session_token,
            config=self.boto_config,
        )
        with tempfile.TemporaryDirectory() as temp_dir:
            file_path = f"{temp_dir}/{self.key}"
            os.makedirs(os.path.dirname(file_path), exist_ok=True)
            s3.download_file(self.bucket, self.key, file_path)
            return partition(filename=file_path, **self.unstructured_kwargs)

    def _get_metadata(self) -> dict:
        metadata = {"source": f"s3://{self.bucket}/{self.key}"}

        try:
            import boto3
        except ImportError:
            raise ImportError(
                "Could not import `boto3` python package. "
                "Please install it with `pip install boto3`."
            )
        s3 = boto3.client(
            "s3",
            region_name=self.region_name,
            api_version=self.api_version,
            use_ssl=self.use_ssl,
            verify=self.verify,
            endpoint_url=self.endpoint_url,
            aws_access_key_id=self.aws_access_key_id,
            aws_secret_access_key=self.aws_secret_access_key,
            aws_session_token=self.aws_session_token,
            config=self.boto_config,
        )

        # Fetch the object so that its user-defined metadata can be read
        s3_file_object = s3.get_object(Bucket=self.bucket, Key=self.key)
        s3_object_metadata = s3_file_object["Metadata"]

        # Convert metadata values from string to their respective types
        if "users" in s3_object_metadata:
            metadata["users"] = json.loads(s3_object_metadata["users"])
        if "groups" in s3_object_metadata:
            metadata["groups"] = json.loads(s3_object_metadata["groups"])
        if "metadata" in s3_object_metadata:
            metadata["metadata"] = json.loads(s3_object_metadata["metadata"])

        return metadata


class PrivaceraS3DirectoryLoader(BaseLoader):
    """Load from `Amazon AWS S3` directory."""

    def __init__(
            self,
            bucket: str,
            prefix: str = "",
            *,
            region_name: Optional[str] = None,
            api_version: Optional[str] = None,
            use_ssl: Optional[bool] = True,
            verify: Union[str, bool, None] = None,
            endpoint_url: Optional[str] = None,
            aws_access_key_id: Optional[str] = None,
            aws_secret_access_key: Optional[str] = None,
            aws_session_token: Optional[str] = None,
            boto_config: Optional[botocore.client.Config] = None,
    ):
        """Initialize with bucket and prefix.

        :param bucket: The name of the S3 bucket.
        :param prefix: The prefix of the S3 key. Defaults to "".

        :param region_name: The name of the region associated with the client.
            A client is associated with a single region.

        :param api_version: The API version to use.  By default, botocore will
            use the latest API version when creating a client.  You only need
            to specify this parameter if you want to use a previous API version
            of the client.

        :param use_ssl: Whether to use SSL.  By default, SSL is used.
            Note that not all services support non-ssl connections.

        :param verify: Whether to verify SSL certificates.
            By default SSL certificates are verified.  You can provide the
            following values:

            * False - do not validate SSL certificates.  SSL will still be
              used (unless use_ssl is False), but SSL certificates
              will not be verified.
            * path/to/cert/bundle.pem - A filename of the CA cert bundle to
              uses.  You can specify this argument if you want to use a
              different CA cert bundle than the one used by botocore.

        :param endpoint_url: The complete URL to use for the constructed
            client.  Normally, botocore will automatically construct the
            appropriate URL to use when communicating with a service.  You can
            specify a complete URL (including the "http/https" scheme) to
            override this behavior.  If this value is provided, then
            ``use_ssl`` is ignored.

        :param aws_access_key_id: The access key to use when creating
            the client.  This is entirely optional, and if not provided,
            the credentials configured for the session will automatically
            be used.  You only need to provide this argument if you want
            to override the credentials used for this specific client.

        :param aws_secret_access_key: The secret key to use when creating
            the client.  Same semantics as aws_access_key_id above.

        :param aws_session_token: The session token to use when creating
            the client.  Same semantics as aws_access_key_id above.

        :type boto_config: botocore.client.Config
        :param boto_config: Advanced boto3 client configuration options. If a value
            is specified in the client config, its value will take precedence
            over environment variables and configuration values, but not over
            a value passed explicitly to the method. If a default config
            object is set on the session, the config object used when creating
            the client will be the result of calling ``merge()`` on the
            default config with the config provided to this call.
        """
        self.bucket = bucket
        self.prefix = prefix
        self.region_name = region_name
        self.api_version = api_version
        self.use_ssl = use_ssl
        self.verify = verify
        self.endpoint_url = endpoint_url
        self.aws_access_key_id = aws_access_key_id
        self.aws_secret_access_key = aws_secret_access_key
        self.aws_session_token = aws_session_token
        self.boto_config = boto_config

    def load(self) -> List[Document]:
        """Load documents."""
        try:
            import boto3
        except ImportError:
            raise ImportError(
                "Could not import boto3 python package. "
                "Please install it with `pip install boto3`."
            )
        s3 = boto3.resource(
            "s3",
            region_name=self.region_name,
            api_version=self.api_version,
            use_ssl=self.use_ssl,
            verify=self.verify,
            endpoint_url=self.endpoint_url,
            aws_access_key_id=self.aws_access_key_id,
            aws_secret_access_key=self.aws_secret_access_key,
            aws_session_token=self.aws_session_token,
            config=self.boto_config,
        )
        bucket = s3.Bucket(self.bucket)
        docs = []
        for obj in bucket.objects.filter(Prefix=self.prefix):
            # Skip directories
            if obj.size == 0 and obj.key.endswith("/"):
                continue
            loader = PrivaceraS3FileLoader(
                self.bucket,
                obj.key,
                region_name=self.region_name,
                api_version=self.api_version,
                use_ssl=self.use_ssl,
                verify=self.verify,
                endpoint_url=self.endpoint_url,
                aws_access_key_id=self.aws_access_key_id,
                aws_secret_access_key=self.aws_secret_access_key,
                aws_session_token=self.aws_session_token,
                boto_config=self.boto_config,
            )
            docs.extend(loader.load())
        return docs

#-------------------------------------------------------------------------------------------------------------------------------------------------#

from opensearchpy import OpenSearch, RequestsHttpConnection


def get_opensearch_cluster_client(opensearch_host, opensearch_port, opensearch_use_ssl, opensearch_verify_certs,
                                  opensearch_username, opensearch_password, opensearch_index):
    opensearch_client = OpenSearch(
        hosts=[{
            'host': opensearch_host,
            'port': opensearch_port
        }],
        http_auth=(opensearch_username, opensearch_password),
        index_name=opensearch_index,
        use_ssl=opensearch_use_ssl,
        verify_certs=opensearch_verify_certs,
        connection_class=RequestsHttpConnection,
        timeout=30
    )
    return opensearch_client


def create_index_with_embeddings(opensearch_endpoint, opensearch_use_ssl, opensearch_verify_certs, opensearch_username,
                                 opensearch_password, opensearch_index, s3_bucket, s3_object_prefix, embeddings):

    from langchain.text_splitter import CharacterTextSplitter
    from langchain_community.vectorstores import OpenSearchVectorSearch

    loader = PrivaceraS3DirectoryLoader(
        bucket=s3_bucket,
        prefix=s3_object_prefix
    )
    docs = loader.load()

    text_splitter = CharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
    docs = text_splitter.split_documents(docs)

    vector_store = OpenSearchVectorSearch.from_documents(
        docs,
        embedding=embeddings,
        opensearch_url=opensearch_endpoint,
        http_auth=(opensearch_username, opensearch_password),
        index_name=opensearch_index,
        use_ssl=opensearch_use_ssl,
        verify_certs=opensearch_verify_certs
    )

    return vector_store


def create_vector_store(opensearch_host, opensearch_port, opensearch_use_ssl, opensearch_verify_certs,
                               opensearch_username, opensearch_password, opensearch_index, s3_bucket, s3_object_prefix, embeddings):
    opensearch_client = get_opensearch_cluster_client(opensearch_host, opensearch_port, opensearch_use_ssl,
                                                      opensearch_verify_certs, opensearch_username, opensearch_password,
                                                      opensearch_index)

    print(f"Checking if index {opensearch_index} exists in OpenSearch cluster")
    exists = opensearch_client.indices.exists(index=opensearch_index)

    opensearch_protocol = "https://" if opensearch_use_ssl else "http://"
    opensearch_endpoint = opensearch_protocol + opensearch_host + ":" + str(opensearch_port)

    if exists:
        # Drop the existing index so that it is rebuilt from the current S3 data
        try:
            opensearch_client.indices.delete(index=opensearch_index)
            print(f"Index {opensearch_index} deleted")
        except Exception as e:
            # Deletion failed, so keep the existing index instead of writing duplicates into it
            print(f"Error deleting index {opensearch_index} exception={e}")
            print("OpenSearch index with embeddings already exists")
            return None

    print("Creating OpenSearch index with embeddings")
    return create_index_with_embeddings(
        opensearch_endpoint, opensearch_use_ssl, opensearch_verify_certs, opensearch_username,
        opensearch_password, opensearch_index, s3_bucket, s3_object_prefix, embeddings)

#-------------------------------------------------------------------------------------------------------------------------------------------------#

from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.llms.bedrock import Bedrock

# Create bedrock embeddings
boto3_bedrock_client = boto3.client('bedrock-runtime', region_name=AWS_REGION)
bedrock_embeddings = BedrockEmbeddings(model_id=EMBEDDING_MODEL_NAME, client=boto3_bedrock_client)

# Get/Create Vector Store
create_vector_store(OPENSEARCH_HOST, OPENSEARCH_PORT,
                       OPENSEARCH_USE_SSL, OPENSEARCH_VERIFY_CERTS,
                       OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD,
                       OPENSEARCH_INDEX, S3_BUCKET, S3_OBJECT_PREFIX,
                       bedrock_embeddings)

print("Loaded data into index successfully.")

# 9. Create Privacera AI Application and the VectorDB configuration

In this step, we will create an AI Application configuration in PAIG that associates PAIG with the sample RAG LangChain application.

1. Log in to your PAIG account.
1. Click on [Paig Navigator -> Vector DB](https://na.privacera.ai/#/vector_db), create a Vector DB named **Product Catalog - OpenSearch**, and save it.
1. Navigate back to [Paig Navigator -> AI Application](https://na.privacera.ai/#/ai_applications) and create a new application named **Product Catalog - OpenSearch**.
1. Click **DOWNLOAD APP CONFIG** to download your application configuration file to your local disk.
1. Click on the pencil icon in the Information panel, turn on the Enabled toggle, select the **Product Catalog - OpenSearch** vector database from the Associated VectorDB drop-down, and then click Save in the application panel.


# 10. Upload the PAIG Application Config file to Jupyter

This is a two-step process:
1. In the top-left corner, click the up arrow to the left of the refresh button and upload the configuration file.
2. Update the file name below and run the cell.
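
The check in the cell below only prints a message when the file is missing, so a later cell can still fail on an undefined variable. A stricter variant, sketched here, stops immediately and also confirms that the file parses as JSON. This assumes the downloaded configuration is a JSON document, as its `.json` extension suggests; `load_app_config` is a hypothetical helper, and the demo file content is a placeholder rather than a real PAIG configuration.

```python
import json
import os
import tempfile


def load_app_config(path):
    """Read the application config file and fail early if it is missing or not valid JSON."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Application config file not found: {path}")
    with open(path, "r") as config_file:
        content = config_file.read()
    json.loads(content)  # raises json.JSONDecodeError if the content is malformed
    return content


# Demo with a placeholder file; the key below is not a real PAIG setting
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo-config.json")
    with open(demo_path, "w") as demo_file:
        demo_file.write('{"example_key": "example_value"}')
    demo_content = load_app_config(demo_path)

print(demo_content)
```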


In [None]:
file_name = "<UPDATE FILE NAME>"  # e.g. privacera-shield-Product-Catalog---OpenSearch-config.json
import os.path
if not os.path.isfile(file_name):
    print("Please give correct file_name")
else:
    with open(file_name, 'r') as file:
        app_config_file_content = file.read()

# 11. LangChain RAG bot
We have implemented a RAG bot using LangChain that will use the OpenSearch vector database to provide the context.


In [None]:
import privacera_shield
from privacera_shield import client as privacera_shield_client
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.vectorstores import OpenSearchVectorSearch

memory = ConversationBufferWindowMemory(memory_key="chat_history", return_messages=True, k=0)

# Create OpenSearch vector store
vector_store = OpenSearchVectorSearch(
    embedding_function=bedrock_embeddings,
    opensearch_url=OPENSEARCH_ENDPOINT,
    http_auth=(OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD),
    index_name=OPENSEARCH_INDEX,
    use_ssl=OPENSEARCH_USE_SSL,
    verify_certs=OPENSEARCH_VERIFY_CERTS
)

# expose this index in a retriever interface
opensearch_retriever = vector_store.as_retriever(
    search_type="similarity", search_kwargs={"k": 5}
)

# Initialize Privacera Shield
#privacera_shield_client.setup(frameworks=["opensearch", "langchain"])
privacera_shield_client.setup(frameworks=["opensearch", "langchain"], application_config=app_config_file_content)

llm = Bedrock(model_id=LLM_MODEL_NAME, client=boto3_bedrock_client, model_kwargs={'temperature': 0.1})
template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

# Run a query through the RAG chain on behalf of the given PAIG user
def query_as_user(user, query):
    print(f"Prompt: {query}")
    print()

    llm_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=opensearch_retriever, memory=memory, verbose=False)
    try:
        # Calls made inside this block are evaluated by PAIG as the given user
        with privacera_shield_client.create_shield_context(username=user):
            response = llm_chain.invoke({"question": query})
            print()
            wrap_text(f"LLM Response: {response.get('answer')}")
    except privacera_shield.exception.AccessControlException as e:
        # Raised when access is denied. Handle it as appropriate for your application.
        print(f"AccessControlException: {e}")

# utility function to wrap the output
def wrap_text(text, width=80):
    words = text.split()
    character_count = 0
    for word in words:
        if character_count + len(word) + 1 > width:  # Check if adding the word would exceed the width
            print("\n", end="")  # Start a new line
            character_count = 0  # Reset the character count for the new line
        print(word, end=" ")  # Print the word followed by a space
        character_count += len(word) + 1  # Update the character count

# 12. Create users in PAIG portal

1. Click on Settings > Users and click on the Add User button.
1. Enter First Name as mark, Last Name as mark, User Name as mark, select Role as User, and save the user.
1. Similarly, create the users sally, emily and peter.


# 13. Filter vector database by users and groups

We will enable the user/group filter on the vector database so that retrieval uses only those documents that the querying user has access to. Follow these steps:
1. Click on Paig Navigator -> Vector DB and select the **Product Catalog - OpenSearch** vector database.
1. Click on the Permissions tab and click on the pencil icon in the top panel.
1. Click on the User/Group Access-Limited Retrieval toggle to enable it and then click on the Save button.
1. Wait a few seconds, then run the queries in the steps that follow. Each response is now built only from documents that the querying user is allowed to access. You can check the metadata for each document in step [5. Associate metadata with the documents](#scrollTo=mPwRbVwo4hKJ&line=1&uniqifier=1) above.

# 14. Ask a question about the product X11, which is under development
Peter belongs to the R&D team and has access to the details of the unreleased product X11, so he should be able to compare all the phone models.

Sally belongs to the Sales team and doesn't have access to the details of X11, so she shouldn't be able to compare the phone models.

Since the X11 product specification is marked as CONFIDENTIAL, only certain users have access to it.

In [None]:
query_as_user("peter", "Compare the product specifications for X10 and X11")
# Peter has access to both documents, so this compares the specifications of both products

In [None]:
query_as_user("sally", "Compare the product specifications for X10 and X11")
# since Sally doesn't have access to new development, she won't be able to compare the models

# 15. Ask for sales details as members of Sales and other teams

Sally belongs to the Sales team and has access to the sales numbers.

Peter, who belongs to the R&D team, doesn't have access to sales data.

Only the Sales team has access to sales documents. These permissions are carried forward into the VectorDB and enforced there.

In [None]:
query_as_user("sally", "Give me the monthly sales data for X10?")

In [None]:
query_as_user("peter", "Give me the monthly sales data for X10?")

# 16. Let's redact PII data based on policy

Sally belongs to the Sales team and can see customer details.

Peter, who belongs to the R&D team, can't see customer PII data, but he can see the feedback.

1. Go to [Paig Navigator -> AI Applications](https://na.privacera.ai/#/ai_applications) and select the AI Application you created
2. Now select the **PERMISSIONS** tab
3. Click the pencil for the **Personal Identifier Redaction** policy
4. Remove **Everyone** and add **peter**
5. On the right side, for **Prompt**, select the dropdown value **Allow**
6. Leave the **Reply** as **Redact**
7. Save the policy
8. Now **Enable** the policy by switching on the **Status** toggle
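
To give a feel for what a redacted reply looks like, the sketch below masks email addresses and US-style phone numbers with regular expressions. It is only an illustration of the idea: PAIG performs its own detection and redaction, and the `redact_pii` helper, the patterns and the `<<EMAIL>>` / `<<PHONE>>` placeholders are assumptions made for this example.

```python
import re

# Simple patterns for illustration; real PII detection is considerably more involved
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


def redact_pii(text):
    """Replace email addresses and phone numbers with placeholder tokens (hypothetical helper)."""
    text = EMAIL_PATTERN.sub("<<EMAIL>>", text)
    return PHONE_PATTERN.sub("<<PHONE>>", text)


sample = "Sarah Jones, Email: sarah.jones@samplemail.com, Phone: (123) 456-7890"
redacted = redact_pii(sample)
print(redacted)  # Sarah Jones, Email: <<EMAIL>>, Phone: <<PHONE>>
```

Note that this sketch leaves the customer name untouched; a policy-driven redaction such as the one configured above may cover more kinds of personal identifiers.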

In [None]:
query_as_user("sally", "Give me the customer feedbacks and their contact information")

In [None]:
query_as_user("peter", "Give me the customer feedbacks and their contact information")