# Environment Setup for Labs

In this notebook we will prepare the environment for the labs that follow:
1. Install Python dependencies.
2. Download sample documents for a knowledge base and upload them to S3.
3. Update the knowledge base with the documents.

We will use Amazon's return policies, as published on its websites, as sample documents. The original documents are available at:
* https://www.amazon.in/gp/help/customer/display.html?nodeId=202111910 (India)
* https://www.amazon.co.uk/gp/help/customer/display.html?nodeId=GKM69DUUYKQWKWX7 (UK)
* https://www.amazon.com/gp/help/customer/display.html/?nodeId=GKM69DUUYKQWKWX7 (US)

The metadata files for these documents are pre-created in the "metadata" folder.
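
For readers curious what such a metadata file might look like, here is a minimal sketch. It assumes the knowledge base is an Amazon Bedrock knowledge base, which reads sidecar files named `<document>.metadata.json` containing a `metadataAttributes` object. The helper name `write_metadata_sidecar`, the `.txt` document name, and the `country` attribute are hypothetical and are not taken from the workshop's actual files.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_sidecar(doc_path: Path, attributes: dict) -> Path:
    """Write `<document>.metadata.json` next to the document and return its path."""
    sidecar = doc_path.with_name(doc_path.name + ".metadata.json")
    sidecar.write_text(json.dumps({"metadataAttributes": attributes}, indent=2))
    return sidecar


# Demo in a temporary folder; the file name and the "country" attribute are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "Amazon-return-policy-in.txt"
    doc.write_text("sample policy text")
    sidecar_path = write_metadata_sidecar(doc, {"country": "in"})
    sidecar_name = sidecar_path.name
    sidecar_body = json.loads(sidecar_path.read_text())

print(sidecar_name)
print(sidecar_body)
```

Attributes stored this way can typically be used later to filter retrieval results, for example to restrict answers to one country's policy.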

In [None]:
!pip install -r requirements.txt -Uq

In [None]:
import boto3
from utils import get_param_value
# Get AWS Account ID and Region
session = boto3.Session()

sts = session.client('sts')
identity = sts.get_caller_identity()
account_id = identity['Account']
# Reuse the session created above; fall back to us-west-2 if no region is configured
region = session.region_name or 'us-west-2'

print(f"Account ID: {account_id}")
print(f"Region: {region}")

In [None]:
from utils.web_scraper import process_urls
urls = [
    ("https://www.amazon.in/gp/help/customer/display.html?nodeId=202111910", "Amazon-return-policy-in"),
    ("https://www.amazon.co.uk/gp/help/customer/display.html?nodeId=GKM69DUUYKQWKWX7","Amazon-return-policy-uk"),
    ("https://www.amazon.com/gp/help/customer/display.html/?nodeId=GKM69DUUYKQWKWX7", "Amazon-return-policy-us")
]
print("Processing URLs...")
process_urls(urls)
print("Done!")
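
The source of `utils.web_scraper` is not shown in this notebook. As a rough idea of what a helper like `process_urls` may do after downloading each page, the sketch below strips markup from an HTML string using only the standard library. The names `_TextExtractor` and `html_to_text` are hypothetical, and the real helper may work quite differently (for example, it may save files in another format).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Inline sample so the sketch runs without network access
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Returns</h1><p>Items can be returned.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```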

In [None]:
!aws s3 sync ./kb_docs s3://{account_id}-{region}-kb-data-bucket

In [None]:
# Look up the knowledge base and data source IDs created for the workshop
kb_id = get_param_value("/app/workshop/kb/knowledge-base-id")
ds_id = get_param_value("/app/workshop/kb/data-source-id")
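
The implementation of `get_param_value` is not shown here. The parameter names look like AWS Systems Manager Parameter Store paths, so a helper of this kind is probably a thin wrapper around the SSM `get_parameter` call. The sketch below is written under that assumption; `get_param_value_sketch` and `_StubSSM` are hypothetical names, and the stub exists only so the example runs without AWS credentials.

```python
def get_param_value_sketch(name, ssm_client=None):
    """Return the value of an SSM parameter (hypothetical stand-in for utils.get_param_value)."""
    if ssm_client is None:
        import boto3  # imported lazily so the function can be exercised with a stub client

        ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class _StubSSM:
    """Offline stand-in that mimics the shape of the SSM get_parameter response."""

    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name, WithDecryption=False):
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


# "KB-EXAMPLE" is a made-up value used only for this offline demo
stub = _StubSSM({"/app/workshop/kb/knowledge-base-id": "KB-EXAMPLE"})
print(get_param_value_sketch("/app/workshop/kb/knowledge-base-id", ssm_client=stub))
```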

In [None]:
%%time
from utils.knowledgebase import ingest_documents_to_kb
ingest_documents_to_kb(session, kb_id, ds_id, region)