# Welcome to the ComSum GenAI tutorial

## Hosted by Andrew Larssen @ PA Consulting

If you find this useful and want to read my AWS related blog posts or connect: https://www.linkedin.com/in/andrewlarssen/
Email Andrew dot Larssen at PAConsulting dot com

## NOTE
Before starting, make sure you have completed the setup steps:
* Set up SageMaker Studio (you have probably done that if you have got here)
* Set up an S3 bucket
* Set up Bedrock model access
* Set up a Bedrock knowledge base

## GOAL
The goal is to try a real-world setup and learn some skills that you can take away and use in your own products:
* Understanding of RAG
* Some basic prompt engineering
* Setting up AWS Bedrock knowledge bases and using them

### Always start by defining the business problem

You are an IT industry analyst. You need to write a report on "The legal and ethical use of AI in Europe". Your paper must be around 2 pages (750-1000 words). Your answer must include citations.

You plan to use GenAI to produce the first draft of your paper

Steps:
1. Create a knowledge base of possible reference texts
2. Create a prompt defining your problem
3. Let Bedrock build your answer
4. PUB!

What AWS tools can we use to make this as easy as possible?
* Bedrock
* SageMaker
  * Jupyter
  * Python
* Bedrock knowledge bases
  * OpenSearch
  * S3

### Secret sauce!
First, start by installing the *latest* official AWS Python library, boto3.
Bedrock is constantly changing, so you may not need this step. But if you experience any strange behaviour from boto3, the best first step is to reinstall boto3 and then restart the Jupyter kernel!

I have learnt this the hard way...

In [None]:
!pip install boto3 --upgrade

### Set some constants

In [1]:
# London
region = 'eu-west-2'
# Virginia
#region='us-east-1'

claude_model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
claude_model_arn = "arn:aws:bedrock:{}::foundation-model/{}".format(region, claude_model_id)
anthropic_version = 'bedrock-2023-05-31'
max_return_tokens = 4096  # Claude 3 Sonnet caps output at 4096 tokens; larger values are rejected
temperature = 0.1

### Include libs and configure clients

In [2]:
import boto3
import json
from botocore.config import Config

my_config = Config(
    region_name = region,
)

bedrock_client = boto3.client('bedrock', config=my_config)
bedrock_runtime_client = boto3.client('bedrock-runtime', config=my_config)
bedrock_agent_client = boto3.client('bedrock-agent', config=my_config)
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime', config=my_config)

### We will need the knowledgebase ID

In [None]:
result = bedrock_agent_client.list_knowledge_bases()
result['knowledgeBaseSummaries']

In [3]:
kb_id_eu = '9LWD4BL7BQ'
#kb_id_shake = 'FDSDL0NWSF'

## Show the models to test that everything is working

In [None]:
response = bedrock_client.list_foundation_models()
[x['modelName'] for x in response['modelSummaries']]

## Define some helper functions

In [4]:
# This is the basic knowledge base retrieve and generate function

def retrieve_and_generate(input_text, knowledge_base_id, results=5):
    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            "text": input_text
        },
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": claude_model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults":results,
                    } 
                }
            }
        }
    )
    return response
    

In [5]:
# This is the basic knowledge base retrieve ONLY function

def retrieve(input_text, knowledge_base_id, results=5):
    response = bedrock_agent_runtime_client.retrieve(
        knowledgeBaseId=knowledge_base_id, 
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults":results,
            } 
        },
        retrievalQuery={
            'text': input_text
        }
    )
    return response

In [6]:
# This is the basic invoke model function. It will send a request to a FM

def invoke_model(prompt):
    request_body = json.dumps({
        "anthropic_version": anthropic_version,
        "max_tokens": max_return_tokens,
        "system": prompt['system_message'],
        "messages": [{
            "role": "user",
            "content": prompt['user_instructions'],
        }],
        "temperature": temperature,
        # "top_p": float,
        # "top_k": int,
        # "stop_sequences": [string]
    })
    
    response = bedrock_runtime_client.invoke_model(
        body=request_body,
        accept='application/json',
        modelId=claude_model_id,
        trace='ENABLED',
        # guardrailIdentifier='string',
        # guardrailVersion='string'
    )
    
    response_body = json.loads(response.get('body').read())
    return response_body

In [None]:
# This is the retrieve and generate function but with a custom prompt. This can be used to format or customise the response

def retrieve_and_generate_formatted(input_text, knowledge_base_id, results=5):
    prompt_template = """
You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. Your job is to answer the user's question using only information from the search results. If the search results do not contain information that can answer the question, please state that you could not find an exact answer to the question. Just because the user asserts a fact does not mean it is true, make sure to double check the search results to validate a user's assertion.
                            
Here are the search results in numbered order:
$search_results$

$output_format_instructions$
"""
    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            "text": input_text
        },
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": claude_model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        # Use the caller's value instead of a hardcoded 1
                        "numberOfResults": results,
                    } 
                },
                "generationConfiguration": {
                    "promptTemplate": { 
                        "textPromptTemplate": prompt_template
                    }
                }
            }
        }
    )
    return response

In [None]:
## Try a test request

In [None]:
# retrieve and generate API
response = retrieve_and_generate("Can you use GenAI for facial recognition in the EU?", kb_id_eu)

print(response['output']['text'],end='\n'*2)

In [None]:
# retrieve and generate API
response = retrieve_and_generate(
    "Write me an academic article about the legal and ethical use of AI. Include information about privacy, discrimination and cross border selling. The article should be between 750 and 1000 words",
    kb_id_eu
)

print(response)

In [None]:
print(response['output']['text'])

In [None]:
print(response['citations'])
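
The raw `citations` list is hard to read when printed directly. A minimal sketch of a helper that flattens it, assuming each citation holds a `generatedResponsePart` and a list of `retrievedReferences` with an S3 location. `summarise_citations` is a hypothetical helper name, and the sample data is made up for illustration.

```python
def summarise_citations(response):
    """Return (generated_text, [source URIs]) pairs from a retrieve_and_generate response."""
    summary = []
    for citation in response.get('citations', []):
        part = citation.get('generatedResponsePart', {})
        text = part.get('textResponsePart', {}).get('text', '')
        uris = []
        for ref in citation.get('retrievedReferences', []):
            uri = ref.get('location', {}).get('s3Location', {}).get('uri')
            # Several chunks can come from the same document, so list each URI once
            if uri and uri not in uris:
                uris.append(uri)
        summary.append((text, uris))
    return summary


# Made-up sample in the assumed response shape, for illustration only
sample_response = {
    'citations': [
        {
            'generatedResponsePart': {'textResponsePart': {'text': 'GDPR still applies.'}},
            'retrievedReferences': [
                {'content': {'text': 'chunk one'},
                 'location': {'type': 'S3', 's3Location': {'uri': 's3://example-bucket/artificial_intelligence_act.pdf'}}},
                {'content': {'text': 'chunk two'},
                 'location': {'type': 'S3', 's3Location': {'uri': 's3://example-bucket/artificial_intelligence_act.pdf'}}},
            ],
        }
    ]
}

for text, uris in summarise_citations(sample_response):
    print(text, '<-', uris)
```

In the notebook you would pass the real `response` from the cell above instead of `sample_response`.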

## Formatted response
Full disclosure - I have not worked out exactly how the formatting works. It seems to only work in the console.

There are a few macros defined for formatting etc.

There is not much documentation on custom prompts!

In [None]:
response = retrieve_and_generate_formatted(
    "Tell me about the legal framework for using GenAI in the EU. Include information on privacy",
    kb_id_eu
)
print(response)
print(response['output']['text'])

## Retrieve only

It is possible to use the retrieve API separately. This uses Bedrock to manage document ingestion, storage, vector creation, search etc. but does not do any inference.

In [None]:
response = retrieve("Tell me about AI regulation", kb_id_eu, 20)
for num, chunk in enumerate(response['retrievalResults'],1):
    print(f'Chunk {num}: ',chunk['content']['text'],end='\n'*2)
    print(f'Chunk {num} Location: ',chunk['location'],end='\n'*2)
    print(f'Chunk {num} Score: ',chunk['score'],end='\n'*2)

One issue is that there is no score threshold: the search returns the requested number of chunks even when they are not very relevant!

If you manage your own retrieval you can eliminate less relevant documents.

One thing worth considering, though, is that the generate phase can ignore content that it does not deem relevant.
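
If you do manage retrieval yourself, a score cut-off can be a simple list filter. This is a minimal sketch: `filter_by_score` is a hypothetical helper, the default threshold of 0.5 is an arbitrary assumption you would need to tune against your own knowledge base, and the sample chunks are made up.

```python
def filter_by_score(retrieval_results, min_score=0.5):
    """Keep only the chunks whose relevance score is at least min_score."""
    return [chunk for chunk in retrieval_results if chunk.get('score', 0.0) >= min_score]


# Made-up chunks in the same shape as response['retrievalResults']
sample_results = [
    {'content': {'text': 'Rules on high-risk AI systems'}, 'score': 0.71},
    {'content': {'text': 'Footnote about a Council meeting'}, 'score': 0.32},
]

for chunk in filter_by_score(sample_results, min_score=0.5):
    print(chunk['score'], chunk['content']['text'])
```

In the notebook this would be called as `filter_by_score(response['retrievalResults'], min_score=...)`.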

In [None]:
response = retrieve("Tell me about holidays in the EU", kb_id_eu, 20)
for num, chunk in enumerate(response['retrievalResults'],1):
    print(f'Chunk {num}: ',chunk['content']['text'],end='\n'*2)
    print(f'Chunk {num} Location: ',chunk['location'],end='\n'*2)
    print(f'Chunk {num} Score: ',chunk['score'],end='\n'*2)

## How to improve?

### Tuning the model

* Temperature: Controls randomness, higher values increase diversity. Lower can create more analytical answers and higher can be more creative. *Float 0.0-1.0*
* Top-p (nucleus): The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Lower top-p values reduce diversity and focus on more probable tokens. *Float 0.0-1.0*
* Top-k: Sample from the k most likely next tokens at each step. Lower k focuses on higher probability tokens. *Int 1+*
* max_tokens: The maximum tokens in response. A hard limit. It can be better to ask the model for an answer size as well as a hard limit. *Int 1+*
* stop_sequences: A sequence in the output that will cause the model to stop generating. *String*

Recommendations:
* Start with defaults
* For most situations, only use temperature and leave top-k and top-p alone!
* Read the model docs
* Be scientific: don't change lots of parameters at once. Change one at a time and make small changes.

A little technical detail:
* Temperature is the amount of randomness injected into the response. Even with 0.0, though, answers are not completely deterministic.
* Top-k is how many candidate tokens to consider. Instead of just choosing the single most likely next token, consider the k most likely options
* Top-p is the cumulative probability of answers to consider. It eliminates a long tail of low probability answers
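
To make the three knobs concrete, here is a toy sketch of how they reshape a next-token distribution. This is not how Bedrock or Claude implements sampling (real samplers differ in ordering and detail). The function name, the tokens and the logits are all made up for illustration, and `temperature` must be greater than 0 here.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the renormalised probabilities a toy sampler would draw from."""
    # Temperature rescales the logits before the softmax: lower = sharper
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    biggest = max(scaled.values())
    exps = {tok: math.exp(value - biggest) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda item: item[1], reverse=True)

    # Top-k keeps only the k most likely candidates
    if top_k is not None:
        probs = probs[:top_k]

    # Top-p keeps the smallest prefix whose cumulative probability reaches top_p
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept

    # Renormalise whatever survived so the probabilities sum to 1 again
    total = sum(p for _, p in probs)
    return {tok: p / total for tok, p in probs}


toy_logits = {'law': 2.0, 'rule': 1.0, 'banana': -1.0}
print(next_token_distribution(toy_logits, temperature=1.0))
print(next_token_distribution(toy_logits, temperature=0.5))
print(next_token_distribution(toy_logits, top_k=1))
print(next_token_distribution(toy_logits, top_p=0.9))
```

Lowering the temperature moves probability towards the most likely token, while top-k and top-p both cut off the unlikely tail before sampling.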

## Take more control
The easiest way to take more control is to split up the retrieval and generation operations. This has many advantages:
* Better control the search query
* Query multiple knowledgebases
* Manage relevance yourself
* Can be easier to format and quality control
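
Once retrieval is separate, the generation step is just a prompt that you assemble yourself. A minimal sketch, assuming chunks in the shape returned by the `retrieve` helper and a prompt dictionary with the `system_message` and `user_instructions` keys that the `invoke_model` helper expects. `build_grounded_prompt`, the XML-style tags and the wording of the instructions are illustrative choices, not a Bedrock convention.

```python
def build_grounded_prompt(question, retrieval_results):
    """Build a prompt dict that asks the model to answer only from numbered sources."""
    sources = []
    for num, chunk in enumerate(retrieval_results, 1):
        sources.append(f'<source id="{num}">\n{chunk["content"]["text"]}\n</source>')
    joined = "\n".join(sources)
    return {
        'system_message': (
            "You answer questions using only the numbered sources provided. "
            "Cite the sources you use as [n]. If the sources do not contain the answer, say so."
        ),
        'user_instructions': f"<sources>\n{joined}\n</sources>\n\n<question>{question}</question>",
    }


# Made-up chunks in the same shape as response['retrievalResults']
sample_chunks = [
    {'content': {'text': 'Union law on personal data applies.'}, 'score': 0.7},
    {'content': {'text': 'The Regulation applies to providers and deployers.'}, 'score': 0.6},
]

prompt = build_grounded_prompt("Does GDPR still apply to AI systems?", sample_chunks)
print(prompt['user_instructions'])
# In the notebook: answer = invoke_model(prompt)
```

Because you build the source list yourself, you can filter, reorder or merge chunks from several knowledge bases before the model ever sees them.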


## Customise query decomposition/improvement

There are multiple techniques to improve the search query 

* step-back query (get a more general overview)
* make query more specific
* query decomposition - split into separate queries

Another option is to add some problem specific logic. Adding an overall theme, excluding certain items...

In [None]:
user_query = "What is the law on the use of GenAI in the EU? Include information on export controls and user privacy"
DECOMPOSE_PROMPT = {
    'system_message': """
You are an AI assistant that prepares queries that will be sent to a search component.
Your job is to reformulate user queries to improve retrieval in a RAG system.
""",
    'user_instructions': f"""
Perform the following steps:
 - If the query is narrow in focus, generate an additional step-back query that is more general and can help retrieve relevant background information.
 - Rewrite it to be more specific, detailed, and likely to retrieve relevant information.
 - Perform query decomposition. Given a user question, break it down into distinct sub questions that you need to answer in order to answer the original question.


You should produce 1-10 subqueries that, when answered together, would provide a comprehensive response to the original query.
If there are acronyms or words you are not familiar with, do not try to rephrase them.
If the query is already well formed, do not try to decompose it further.

Your output should be a JSON document with an array of subqueries

<examples>
<example1>
<user_input>Did Microsoft or Google make more money last year?</user_input>
<output>
{{
  "query": "Did Microsoft or Google make more money last year?",
  "subqueries": [
    "How much profit did Microsoft make last year?",
    "How much profit did Google make last year?"
  ]
}}
</output>
</example1>

<example2>
<user_input>What is the capital of France?</user_input>
<output>
{{ 
  "query": "What is the capital of France?",
  "subqueries": [
    "What is the capital of France?"
  ]
}}
</output>
</example2>

<example3>
<user_input>What are the impacts of climate change on the environment?</user_input>
<output>
{{ 
  "query": "What are the impacts of climate change on the environment?",
  "subqueries": [
    "What are the impacts of climate change on biodiversity and ecosystems?",
    "How does climate change affect the oceans and sea levels?",
    "What are the effects of climate change on agriculture?",
    "What are the impacts of climate change on human health?",
    "What are the impacts of climate change on weather patterns?"
  ]
}}
</output>
</example3>

</examples>

<user_input>{user_query}</user_input>
"""
}
decompose_response = invoke_model(DECOMPOSE_PROMPT)
print(decompose_response['content'][0]['text'])
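
Models sometimes wrap the JSON in a sentence or two of prose, which makes a direct `json.loads` fail. A minimal sketch of a more forgiving parser: `extract_json` is a hypothetical helper that simply takes everything from the first `{` to the last `}`, so it assumes the output contains a single JSON object. The sample model output is made up.

```python
import json


def extract_json(text):
    """Parse the span from the first '{' to the last '}' in text as JSON."""
    start = text.find('{')
    end = text.rfind('}')
    if start == -1 or end <= start:
        raise ValueError('no JSON object found in model output')
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type
    return json.loads(text[start:end + 1])


# Made-up model output with prose around the JSON
sample_output = """Here are the subqueries:
{
  "query": "What is the law on GenAI in the EU?",
  "subqueries": ["What does the AI Act say about privacy?", "Are there export controls on AI?"]
}
I hope this helps."""

parsed = extract_json(sample_output)
for subquery in parsed['subqueries']:
    print(subquery)
```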

## After decomposing a query you can do multiple knowledge base requests
* For keyword-based search you can use a single search
* For semantic search you can use parallel searches

It is also possible to add routing if you have multiple knowledge bases. Categorise the queries to send to different knowledge bases.

One issue with complex decomposition is re-scoring. That is beyond the scope of this workshop.
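
Parallel searches can be sketched with a thread pool from the standard library. `search_all` is a hypothetical helper that takes the search function as an argument, so the example below runs against a fake search rather than a real knowledge base. The worker count of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(queries, search_fn, max_workers=4):
    """Run search_fn(query) for every query in parallel, returning results in query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_fn, queries))


def fake_search(query):
    """Stand-in for a knowledge base call, for illustration only."""
    return {'retrievalResults': [{'content': {'text': f'result for {query}'}, 'score': 0.5}]}


all_results = search_all(['privacy', 'export controls'], fake_search)
for result in all_results:
    print(result['retrievalResults'][0]['content']['text'])
# In the notebook: search_all(subqueries, lambda q: retrieve(q, kb_id_eu))
```

Routing fits the same pattern: the function you pass in can pick a knowledge base ID per query before calling `retrieve`.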

In [None]:
parse_response = json.loads(decompose_response['content'][0]['text'])
responses = []
for query in parse_response['subqueries']:
    print(query)
    # Pass the subquery variable, not the literal string "query"
    response = retrieve(query, kb_id_eu)
    results = response['retrievalResults']
    responses.append(results)

In [None]:
responses

Note: This is very simplistic. You probably want to remove duplicates as a search like this will create duplicates - especially with a small knowledge base
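
A minimal sketch of de-duplication, assuming `responses` is a list of `retrievalResults` lists as built in the loop above and that two chunks are duplicates when their text is identical. `dedupe_chunks` is a hypothetical helper and the sample data is made up.

```python
def dedupe_chunks(responses):
    """Flatten lists of chunks, keeping the best-scoring copy of each distinct text."""
    best = {}
    for results in responses:
        for chunk in results:
            key = chunk['content']['text']
            if key not in best or chunk.get('score', 0.0) > best[key].get('score', 0.0):
                best[key] = chunk
    # Highest score first, so the most relevant chunks lead the list
    return sorted(best.values(), key=lambda chunk: chunk.get('score', 0.0), reverse=True)


# Made-up results for two subqueries that both hit the same chunk
sample_responses = [
    [{'content': {'text': 'Article 1'}, 'score': 0.60}, {'content': {'text': 'Article 2'}, 'score': 0.55}],
    [{'content': {'text': 'Article 1'}, 'score': 0.72}],
]

for chunk in dedupe_chunks(sample_responses):
    print(chunk['score'], chunk['content']['text'])
```

Exact text matching is the simplest possible key. With overlapping chunks you may need something fuzzier, such as comparing source location as well.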

## Improving your output

There are various steps you may want to take to improve your output. You may want a summary or to change the tone and style

In [None]:
# Summarise
user_query = """The European Union has proposed a legal framework called the AI Act to regulate the development, use, and deployment of artificial intelligence (AI) systems,
including general-purpose AI models like GenAI. The key legal considerations for using GenAI in the EU include: Privacy and data protection: The EU's data protection laws
like GDPR apply to personal data processed by AI systems. GenAI models trained on personal data must comply with these laws regarding data collection, processing, and privacy rights.
Fundamental rights: The AI Act aims to ensure AI systems respect fundamental rights like non-discrimination, human dignity, and freedom of expression. GenAI models must be developed
and used in line with these principles. Commercial use: The AI Act will impose obligations on providers of general-purpose AI models like GenAI when used for commercial purposes,
such as ensuring transparency, human oversight, and risk management measures. Government use: The use of GenAI by governments or for law enforcement purposes may face additional
restrictions or requirements under the AI Act, such as prohibitions on certain high-risk applications like social scoring. Export controls: There may be restrictions on exporting
certain AI technologies, including GenAI models, outside of the EU to prevent potential misuse or risks to security.
"""

character_limit = 1000

SUMMERIZE_PROMPT = {
    'system_message': """You are an AI assistant tasked with summarising and extracting key points from text.""",
    'user_instructions': f"""
Character limit: {character_limit}
Summarise the contents of input_text below, making it concise in bullet points
Make sure the length of the answer is not exceeding the character limit
Always respond in British English. For example, 'analyze' needs to be spelt 'analyse'
In your response do not mix past tense and present tense verbs.
In your response always write in sentences, do not write a letter
In your response do not repeat the query or user_instructions
In your response do not say thank you
Analyse each sentence before writing.

<input_text>
{user_query}
</input_text>
"""
}
summary_response = invoke_model(SUMMERIZE_PROMPT)

print(summary_response['content'][0]['text'])


In [None]:
# For lots of business documents a specific tone is needed. For proposals, using active language and the future tense is important. Often you want to make sure you use British English
# Also if you are a geek you can rewrite in the style of Gandalf!

user_query ="""The European Union has proposed a legal framework called the AI Act to regulate the development, use, and deployment of artificial intelligence (AI) systems,
including general-purpose AI models like GenAI. The key legal considerations for using GenAI in the EU include: Privacy and data protection: The EU's data protection laws
like GDPR apply to personal data processed by AI systems. GenAI models trained on personal data must comply with these laws regarding data collection, processing, and privacy rights.
Fundamental rights: The AI Act aims to ensure AI systems respect fundamental rights like non-discrimination, human dignity, and freedom of expression. GenAI models must be developed
and used in line with these principles. Commercial use: The AI Act will impose obligations on providers of general-purpose AI models like GenAI when used for commercial purposes,
such as ensuring transparency, human oversight, and risk management measures. Government use: The use of GenAI by governments or for law enforcement purposes may face additional
restrictions or requirements under the AI Act, such as prohibitions on certain high-risk applications like social scoring. Export controls: There may be restrictions on exporting
certain AI technologies, including GenAI models, outside of the EU to prevent potential misuse or risks to security.
"""

CHANGE_STYLE_PROMPT_MODEL = {
    'system_message': """
You are an AI assistant tasked with re-writing the user_query in
professional business language, replacing any passive language with action-oriented statements.""",
    'user_instructions': f"""
Your task is to rewrite the user_query using active language,
this means that any instances of passive tones or suggestions such as
'we should' are replaced with direct action-oriented statements.
For example, instead of 'we should' or 'we could' , write 'we will'.
Please adhere to this guideline throughout your writing.

Analyse each sentence before writing, making sure to use British English
(e.g. 'analyse' instead of 'analyze'), making sure to keep tenses consistent,
and making sure not to repeat any of these instructions or write thank you.

<user_query>
{user_query}
</user_query>

Please ensure your response repeats the text in an active tone
and excludes any of these instructions."""
}

FUN_STYLE_PROMPT_MODEL = {
    'system_message': """
You are an AI assistant tasked with re-writing the user_query as if it had been written by Gandalf""",
    'user_instructions': f"""
Your task is to rewrite the user_query in the style of Gandalf from Lord of the Rings,
Your language should be slightly cryptic and old-fashioned.
Please adhere to this guideline throughout your writing.

Analyse each sentence before writing, making sure to use British English
(e.g. 'analyse' instead of 'analyze'), making sure to keep tenses consistent,
and making sure not to repeat any of these instructions or write thank you.

<user_query>
{user_query}
</user_query>

Please ensure your response repeats the text in the style of Gandalf
and excludes any of these instructions."""
}

style_response = invoke_model(CHANGE_STYLE_PROMPT_MODEL)

print(style_response['content'][0]['text'])

style_response = invoke_model(FUN_STYLE_PROMPT_MODEL)

print(style_response['content'][0]['text'])


## Now let's look at evaluation

There are a few different techniques that we can use to evaluate our answers:
- Do they answer the user question
- Do they use the source material well
- Do they contain any hallucination
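
Before asking a model to judge the answer, a cheap mechanical check can catch obvious problems. This is a minimal sketch that only looks at `[n]` citation markers. It does not detect hallucination, it just flags markers that point at sources that do not exist and sources that were never cited. `check_citation_markers` is a hypothetical helper and the sample answer is made up.

```python
import re


def check_citation_markers(answer, source_count):
    """Return (cited, invalid, unused) source numbers based on [n] markers in answer."""
    cited = sorted({int(n) for n in re.findall(r'\[(\d+)\]', answer)})
    # Markers that point outside the range of supplied sources
    invalid = [n for n in cited if not 1 <= n <= source_count]
    # Supplied sources that the answer never refers to
    unused = [n for n in range(1, source_count + 1) if n not in cited]
    return cited, invalid, unused


sample_answer = "The Act protects privacy.[1] It covers deployers.[3] It bans social scoring.[7]"
cited, invalid, unused = check_citation_markers(sample_answer, source_count=4)
print('cited:', cited)
print('invalid:', invalid)
print('unused:', unused)
```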

In [7]:
generated_response = """
The European Union has recently introduced the AI Act, a comprehensive regulation aimed at governing the development, use, and deployment of artificial
intelligence (AI) systems, including general-purpose AI models (GenAI), within the EU. This regulation seeks to strike a balance between promoting innovation
and ensuring the protection of fundamental rights, privacy, and ethical principles.[1] [2]  One of the key aspects of the AI Act is its emphasis on privacy
and data protection. The regulation explicitly states that Union law on the protection of personal data, privacy, and the confidentiality of communications
applies to personal data processed in connection with the rights and obligations laid down in the AI Act (Article 7). This means that the existing data
protection frameworks, such as the General Data Protection Regulation (GDPR), will continue to govern the processing of personal data in the context of
AI systems, including GenAI.[3] Regarding the use of AI for weapons or military purposes, the AI Act does not explicitly address this issue. However,
it is likely that such applications would fall under the category of "high-risk AI systems," which are subject to strict requirements and obligations. The
regulation aims to ensure a high level of protection against the harmful effects of AI systems, including potential risks to fundamental rights, democracy,
and the rule of law.[4]  The AI Act applies to both commercial and government use of AI systems, including GenAI models. It covers providers placing AI systems
or GenAI models on the market or putting them into service in the EU, as well as deployers of AI systems that have their place of establishment or are located
within the EU (Article 2). The regulation aims to create a harmonized legal framework for the development, marketing, and use of AI systems across the EU,
ensuring the free movement of AI-based goods and services while preventing Member States from imposing restrictions unless explicitly authorized by the AI
Act.[5]  Ethical considerations are a central pillar of the AI Act. The regulation emphasizes the importance of promoting a human-centric and trustworthy
approach to AI, ensuring the protection of fundamental rights, democracy, the rule of law, and environmental protection against the harmful effects of AI
systems (Article 1). It also aims to support innovation while upholding ethical principles, as requested by the European Parliament.[6]  Regarding export
controls, the AI Act does not explicitly address this issue. However, it is likely that the export of certain AI systems or GenAI models, particularly those
classified as high-risk or with potential dual-use applications, may be subject to existing export control regulations and international agreements. The
regulation aims to create a harmonized legal framework within the EU, but it may also have implications for the cross-border transfer of AI technologies
and models to third countries.
"""

source_material = """
 - 1 artificial_intelligence_act.pdf
Whereas: (1) The purpose of this Regulation is to improve the functioning of the internal market by laying down a uniform legal framework in particular for the
development, the placing on the market, the putting into service and the use of artificial intelligence systems (AI systems) in the Union, in accordance with
Union values, to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection of health, safety,
fundamental rights as enshrined in the Charter of fundamental rights of the European Union (the 'Charter'), including democracy, the rule of law and environmental
protection, against the harmful effects of AI systems in the Union, and to support innovation. This Regulation ensures the free movement, cross-border, of
AI-based goods and services, thus preventing Member States from imposing restrictions on the development, marketing and use of AI systems, unless explicitly
authorised by this Regulation. (2) This Regulation should be applied in accordance with the values of the Union enshrined as in the Charter, facilitating the
protection of natural persons, undertakings, democracy, the rule of law and environmental protection, while boosting innovation and employment and making the
Union a leader in the uptake of trustworthy AI.

 - 2 artificial_intelligence_act.pdf
# CHAPTER I ## GENERAL PROVISIONS ### Article 1 ### Subject matter 1. The purpose of this Regulation is to improve the functioning of the internal market and
promote the uptake of human-centric and trustworthy artificial intelligence (AI), while ensuring a high level of protection of health, safety, fundamental
rights enshrined in the Charter of Fundamental Rights, including democracy, the rule of law and environmental protection, against the harmful effects of
artificial intelligence systems (AI systems) in the Union, and to support innovation. 2. This Regulation lays down: (a) harmonised rules for the placing on
the market, the putting into service, and the use of AI systems in the Union; (b) prohibitions of certain AI practices; (c) specific requirements for
high-risk AI systems and obligations for operators of such systems;

 - 3 artificial_intelligence_act.pdf
5. This Regulation shall not affect the application of the provisions on the liability of providers of intermediary services as set out in Chapter II of
Regulation (EU) 2022/2065. 6. This Regulation does not apply to AI systems or AI models, including their output, specifically developed and put into service
for the sole purpose of scientific research and development. 7. Union law on the protection of personal data, privacy and the confidentiality of communications
applies to personal data processed in connection with the rights and obligations laid down in this Regulation. This Regulation shall not affect Regulation (EU)
2016/679 or (EU) 2018/1725, or Directive 2002/58/EC or (EU) 2016/680, without prejudice to the arrangements provided for in Article 10(5) and Article 59 of
this Regulation. 8. This Regulation does not apply to any research, testing or development activity regarding AI systems or models prior to their being placed
on the market or put into service. Such activities shall be conducted in accordance with applicable Union law. Testing in real world conditions shall not be
covered by that exclusion.

 - 4 artificial_intelligence_act.pdf
# CHAPTER I ## GENERAL PROVISIONS ### Article 1 ### Subject matter 1. The purpose of this Regulation is to improve the functioning of the internal market and
promote the uptake of human-centric and trustworthy artificial intelligence (AI), while ensuring a high level of protection of health, safety, fundamental
rights enshrined in the Charter of Fundamental Rights, including democracy, the rule of law and environmental protection, against the harmful effects of
artificial intelligence systems (AI systems) in the Union, and to support innovation. 2. This Regulation lays down: (a) harmonised rules for the placing on
the market, the putting into service, and the use of AI systems in the Union; (b) prohibitions of certain AI practices; (c) specific requirements for
high-risk AI systems and obligations for operators of such systems;

 - 5 artificial_intelligence_act.pdf
(d) harmonised transparency rules for certain AI systems; (e) harmonised rules for the placing on the market of general-purpose AI models; (f) rules on
market monitoring, market surveillance governance and enforcement; (g) measures to support innovation, with a particular focus on SMEs, including start-ups.
## Article 2 ### Scope 1. This Regulation applies to: (a) providers placing on the market or putting into service AI systems or placing on the market
general-purpose AI models in the Union, irrespective of whether those providers are established or located within the Union or in a third country; (b)
deployers of AI systems that have their place of establishment or are located within the Union; (c) providers and deployers of AI systems that have their
place of establishment or are located in a third country, where the output produced by the AI system is used in the Union;

 - 6 artificial_intelligence_act.pdf
(8) A Union legal framework laying down harmonised rules on AI is therefore needed to foster the development, use and uptake of AI in the internal market
that at the same time meets a high level of protection of public interests, such as health and safety and the protection of fundamental rights, including
democracy, the rule of law and environmental protection as recognised and protected by Union law. To achieve that objective, rules regulating the placing on
the market, the putting into service and the use of certain AI systems should be laid down, thus ensuring the smooth functioning of the internal market and
allowing those systems to benefit from the principle of free movement of goods and services. Those rules should be clear and robust in protecting fundamental
rights, supportive of new innovative solutions, enabling a European ecosystem of public and private actors creating AI systems in line with Union values and
unlocking the potential of the digital transformation across all regions of the Union. By laying down those rules as well as measures in support of innovation
with a particular focus on small and medium enterprises (SMEs), including startups, this Regulation supports the objective of promoting the European
human-centric approach to AI and being a global leader in the development of secure, trustworthy and ethical AI ▌ as stated by the European Council^5,
and it ensures the protection of ethical principles, as specifically requested by the European Parliament^6. ^5 European Council, Special meeting of the
European Council (1 and 2 October 2020) – Conclusions, EUCO 13/20, 2020, p. 6. ^6 European Parliament resolution of 20 October 2020 with recommendations
to the Commission on a framework of ethical aspects of artificial intelligence, robotics and related technologies, 2020/2012(INL).

 - 7 artificial_intelligence_act.pdf
Whereas: (1) The purpose of this Regulation is to improve the functioning of the internal market by laying down a uniform legal framework in particular for
the development, the placing on the market, the putting into service and the use of artificial intelligence systems (AI systems) in the Union, in accordance
with Union values, to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection of health,
safety, fundamental rights as enshrined in the Charter of fundamental rights of the European Union (the 'Charter'), including democracy, the rule of law
and environmental protection, against the harmful effects of AI systems in the Union, and to support innovation. This Regulation ensures the free movement,
cross-border, of AI-based goods and services, thus preventing Member States from imposing restrictions on the development, marketing and use of AI systems,
unless explicitly authorised by this Regulation. (2) This Regulation should be applied in accordance with the values of the Union enshrined as in the Charter,
facilitating the protection of natural persons, undertakings, democracy, the rule of law and environmental protection, while boosting innovation and employment
and making the Union a leader in the uptake of trustworthy AI.

 - 8 artificial_intelligence_act.pdf
# CHAPTER I ## GENERAL PROVISIONS ### Article 1 ### Subject matter 1. The purpose of this Regulation is to improve the functioning of the internal market
and promote the uptake of human-centric and trustworthy artificial intelligence (AI), while ensuring a high level of protection of health, safety, fundamental
rights enshrined in the Charter of Fundamental Rights, including democracy, the rule of law and environmental protection, against the harmful effects of
artificial intelligence systems (AI systems) in the Union, and to support innovation. 2. This Regulation lays down: (a) harmonised rules for the placing on
the market, the putting into service, and the use of AI systems in the Union; (b) prohibitions of certain AI practices; (c) specific requirements for
high-risk AI systems and obligations for operators of such systems;
"""

user_query = """
Write 500-750 words on the regulation of GenAI in the EU. Include information on privacy, weapons, commercial vs government use, ethics and any export controls
"""

In [None]:
ANSWER_EVALUATION_PROMPT = {
    'system_message': """
You are an analytical AI assistant able to make quality checks
on 'response' and 'user_query'""",
    
    'user_instructions': f"""
Your tasks are to make the following quality checks on 'response'
and 'user_query'. Write your answer to each check as evidence in your output.
Let's do it step by step.

Check 1: Credibility
Does each point in the response have evidence to justify the point?
Evidence should include the sources listed in the source_material.

Check 2: Meets requirements
Does the response fulfill the requirements outlined in 'user_query'?

Check 3: Detail balance
Make an assessment on how generic the user_query is and judge
whether the response is equally as generic or detailed.

Using all the information from the checks,
give a score reflecting how well the response meets all these checks
on an integer scale between 0 and 100.

Please penalise the score if any of the checks show areas of improvement.
Please penalise the score if the response states any missing information.

<response>
{generated_response}
</response>

<user_query>
{user_query}
</user_query>

<source_material>
{source_material}
</source_material>

Analyse each sentence before writing
"""
}

evaluation_response = invoke_model(ANSWER_EVALUATION_PROMPT)

print(evaluation_response['content'][0]['text'])
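
The evaluation prompt asks for an integer score between 0 and 100, but the model returns it inside free text. Here is a minimal sketch of pulling that number out so it can be compared between runs. The helper name `extract_score` is my own, and it assumes the model writes the word "score" shortly before the number, which is not guaranteed.

```python
import re
from typing import Optional


def extract_score(evaluation_text: str) -> Optional[int]:
    """Return the last 0-100 integer that follows the word 'score', or None.

    Assumption: the model writes something like 'Score: 85' or
    'a score of 85 out of 100'. Other phrasings will not be picked up.
    """
    # Allow up to 40 non-digit characters between 'score' and the number.
    matches = re.findall(r"score\D{0,40}?(\d{1,3})\b", evaluation_text, flags=re.IGNORECASE)
    valid = [int(m) for m in matches if 0 <= int(m) <= 100]
    # The final mention is usually the overall verdict rather than a sub-check.
    return valid[-1] if valid else None


sample_text = "Check 1: mostly supported.\nCheck 2: met.\nOverall score: 85"
print(extract_score(sample_text))
```

In the notebook you would pass `evaluation_response['content'][0]['text']` instead of `sample_text`. If it returns `None`, read the raw text rather than assuming a score of zero.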

In [9]:
IDENTIFY_REFERENCES_PROMPT = {
    'system_message': """
    You are an analytical AI assistant tasked with
    finding important material to answer a user_query
    """,
    'user_instructions': f"""
    Your task is to find parts of text in 'source'
    that are useful to answer 'user_query'.
    Let's do it step by step.
    
    Find text in 'source' that matches with the 'user_query'.
    Identify the chunk of text around the text that you find, this is source_material.
    For each chunk of text that you find, identify the corresponding filename.
    The filename is preceded by an index number and a space, which should be ignored. The filename ends with '.pdf'.
    Format your response in the following way:
    Reference(references = list[source_material:..., source_filename:...,])
    where the ellipses are the parts you need to fill with the tasks above.

    <source>
    {source_material}
    </source>

    <user_query>
    {user_query}
    </user_query>

    Please ensure your response is in the correct format.
    Make sure to use double string quotations in your
    response and use valid escapes.
    """
}

references_response = invoke_model(IDENTIFY_REFERENCES_PROMPT)

print(references_response['content'][0]['text'])

Reference(references = [
    source_material:"(1) The purpose of this Regulation is to improve the functioning of the internal market by laying down a uniform legal framework in particular for the development, the placing on the market, the putting into service and the use of artificial intelligence systems (AI systems) in the Union, in accordance with Union values, to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection of health, safety, fundamental rights as enshrined in the Charter of fundamental rights of the European Union (the 'Charter'), including democracy, the rule of law and environmental protection, against the harmful effects of AI systems in the Union, and to support innovation. This Regulation ensures the free movement, cross-border, of AI-based goods and services, thus preventing Member States from imposing restrictions on the development, marketing and use of AI systems, unless explicitly authorised 
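
The filename lookup that the prompt asks the model to do can also be done without a model call, because each chunk in `source_material` starts with a header line such as ` - 7 artificial_intelligence_act.pdf`. The sketch below splits the text on those headers. The function name `split_source_material` is hypothetical, and it assumes filenames contain no spaces and that no body line happens to look like a header.

```python
import re
from typing import Dict, List

# Header lines look like " - 7 artificial_intelligence_act.pdf".
CHUNK_HEADER = re.compile(r"^[ \t]*-[ \t]+(\d+)[ \t]+(\S+\.pdf)[ \t]*$", flags=re.MULTILINE)


def split_source_material(source_material: str) -> List[Dict[str, object]]:
    """Split source_material into dicts with 'index', 'filename' and 'text'."""
    headers = list(CHUNK_HEADER.finditer(source_material))
    chunks = []
    for i, header in enumerate(headers):
        # A chunk runs from the end of its header to the start of the next one.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(source_material)
        text = " ".join(source_material[header.end():end].split())
        chunks.append({"index": int(header.group(1)), "filename": header.group(2), "text": text})
    return chunks


sample = """
 - 7 artificial_intelligence_act.pdf
Whereas: (1) The purpose of this Regulation
is to improve the internal market.

 - 8 artificial_intelligence_act.pdf
# CHAPTER I ## GENERAL PROVISIONS
"""
for chunk in split_source_material(sample):
    print(chunk["index"], chunk["filename"], chunk["text"][:40])
```

Having the chunks as a list also makes it easy to send one reference at a time to the later prompts instead of the whole block.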

In [11]:
CLASSIFY_REFERENCE_PROMPT = {
    'system_message': """
    You are an analytical AI assistant tasked with classifying
    'reference' based on whether the text
    in source_material features in 'response'
    """,
    'user_instructions': f"""
    Your task is to classify 'reference' based
    on whether it is used in 'response' or unused.
    Let's do it step by step.
    
    Determine if there are chunks of words in 'response' that
    match the source_material of 'reference'.
    If you find chunks, then write these chunks in response_sentences of your output,
    and ensure reference_status is 'used'.
    Then analyse 'response' and find all chunks in
    'response' that match or have a similar meaning
    to the source_material of 'reference'. Add all the similar chunks to response_sentences
    in your output.
    If you do not find chunks, then response_sentences
    is an empty string in your output
    and ensure reference_status is 'unused'.

    Write 'reference' as your output with the two extra parts of information,
    'response_sentences' and 'reference_status'.

    <reference>
    {source_material}
    </reference>
    
    <response>
    {generated_response}
    </response>
    """
}

classify_response = invoke_model(CLASSIFY_REFERENCE_PROMPT)

print(classify_response['content'][0]['text'])

<reference>
 - 1 artificial_intelligence_act.pdf
Whereas: (1) The purpose of this Regulation is to improve the functioning of the internal market by laying down a uniform legal framework in particular for the development, the placing on the market, the putting into service and the use of artificial intelligence systems (AI systems) in the Union, in accordance with Union values, to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection of health, safety, fundamental rights as enshrined in the Charter of fundamental rights of the European Union (the 'Charter'), including democracy, the rule of law and environmental protection, against the harmful effects of AI systems in the Union, and to support innovation.
 - 2 artificial_intelligence_act.pdf 
# CHAPTER I ## GENERAL PROVISIONS ### Article 1 ### Subject matter 1. The purpose of this Regulation is to improve the functioning of the internal market and promote the uptake o

In [13]:
# Detect hallucinations
DETECT_HALLUCINATION_PROMPT = {
    'system_message': """
You are an intelligent AI assistant able to determine if texts of different lengths agree with each other
""",
    'user_instructions': f"""
Your task is to determine whether 'response' aligns with
the text in the 'source_material' or not.

Read 'source_material' and determine whether any of the information within it
agrees with 'response', has a similar meaning to 'response'
or can be used to validate 'response'.

If so, then your output is a json object with the key as 'status' and the value as 'True'.

If not, then check this is the case by repeating your analysis.
If you have repeated the analysis one more time and the results are that they
still do not agree, then your output is a json object with the key as
'status' and the value as 'False'.

If you cannot answer, then repeat your analysis until you can.

<response>
{generated_response}
</response>

<source_material>
{source_material}
</source_material>

Make sure your output is a json object with the key as 'status' and
the value as either 'True' if 'response' agrees with any text
within 'source_material' or 'False' if it does not.
"""
}

hallucination_response = invoke_model(DETECT_HALLUCINATION_PROMPT)

print(hallucination_response['content'][0]['text'])

{
    "status": True
}
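
Note that the output above is not strictly valid JSON: JSON booleans are lowercase (`true`), so `json.loads` rejects `True`. Here is a minimal sketch of a tolerant parser that tries JSON first and falls back to a Python literal. The name `parse_status` is my own, and it assumes the model returns a single object with a `status` key.

```python
import ast
import json


def parse_status(model_text: str) -> bool:
    """Extract the 'status' flag from the model output as a Python bool."""
    start, end = model_text.find("{"), model_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON-like object found in model output")
    candidate = model_text[start:end + 1]
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        # Handles Python-style literals such as {"status": True}.
        parsed = ast.literal_eval(candidate)
    status = parsed["status"]
    if isinstance(status, str):
        # The prompt asks for the strings 'True' / 'False'.
        return status.strip().lower() == "true"
    return bool(status)


print(parse_status('{\n    "status": True\n}'))
```

`ast.literal_eval` only evaluates literals, so it is a safer fallback than `eval` for text that came from a model.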


In [16]:
FORMAT_CITATIONS_PROMPT = {
    'system_message': """You are an expert research assistant. You must format the output with citations used correctly""",
    'user_instructions': f"""
First, find the quotes from the source_material that are most relevant to answering the question, and then print them in numbered order.
Quotes should be relatively short. Each quote should start with the filename, which ends with '.pdf'.

If there are no relevant quotes, write “No relevant quotes” instead.

Then, answer the question, starting with “Answer:“. Do not include or reference quoted content verbatim in the answer. Don’t say “According to Quote [1]”
when answering. Instead make references to quotes relevant to each section of the answer solely by adding their bracketed numbers at the end of relevant sentences.

Thus, the format of your overall response should look like what’s shown between the <example></example> tags. Make sure to follow the formatting and spacing exactly.

<example>
Answer:
Company X earned $12 million. [1] Almost 90% of it was from widget sales. [2]

Quotes:
[1] document1.pdf “Company X reported revenue of $12 million in 2021.”
[2] document2.pdf “Almost 90% of revenue came from widget sales, with gadget sales making up the remaining 10%.”
</example>


If the question cannot be answered by the document, say so.

<response>
{generated_response}
</response>

<source_material>
{source_material}
</source_material>
"""
}

citations_response = invoke_model(FORMAT_CITATIONS_PROMPT)

print(citations_response['content'][0]['text'])

Quotes:

[1] artificial_intelligence_act.pdf "The purpose of this Regulation is to improve the functioning of the internal market by laying down a uniform legal framework in particular for the development, the placing on the market, the putting into service and the use of artificial intelligence systems (AI systems) in the Union, in accordance with Union values, to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection of health, safety, fundamental rights as enshrined in the Charter of fundamental rights of the European Union (the 'Charter'), including democracy, the rule of law and environmental protection, against the harmful effects of AI systems in the Union, and to support innovation."

[2] artificial_intelligence_act.pdf "Union law on the protection of personal data, privacy and the confidentiality of communications applies to personal data processed in connection with the rights and obligations laid down in thi
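
Because the quotes are meant to be verbatim, they can be checked against `source_material` with plain string matching, as a cheap extra guard alongside the hallucination prompt. The sketch below assumes each quote sits on one line in the form `[n] filename.pdf "quote"`. The names `check_quotes` and `normalise` are hypothetical helpers, not part of the notebook.

```python
import re
from typing import List, Tuple

# Matches lines such as: [1] document1.pdf "Some quoted text."
QUOTE_LINE = re.compile(r'^\[(\d+)\][ \t]+(\S+\.pdf)[ \t]+["“](.+?)["”][ \t]*$', flags=re.MULTILINE)


def normalise(text: str) -> str:
    """Collapse whitespace and lowercase so PDF line wraps do not break matching."""
    return " ".join(text.split()).lower()


def check_quotes(citations_text: str, source_material: str) -> List[Tuple[int, str, bool]]:
    """Return (quote number, filename, found_in_source) for each quote line."""
    haystack = normalise(source_material)
    results = []
    for number, filename, quote in QUOTE_LINE.findall(citations_text):
        results.append((int(number), filename, normalise(quote) in haystack))
    return results


sample_source = "The purpose of this Regulation is to improve the\nfunctioning of the internal market."
sample_citations = (
    "Quotes:\n\n"
    '[1] act.pdf "The purpose of this Regulation is to improve the functioning of the internal market."\n'
    '[2] act.pdf "AI systems are banned outright."\n'
)
print(check_quotes(sample_citations, sample_source))
```

A `False` flag does not prove a hallucination, since the model may have shortened or lightly reworded a quote, but it tells you which citations to read by hand first.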

# Challenge
Using the components above, what can you produce? Can you create a great article about the use of GenAI in the EU?

How about looking at one of your own business problems? The contents of this knowledge base were chosen because they are in the public domain, but the approach is much better applied to real-world business problems.

If you want to talk more about the GenAI products we are developing at PA Consulting, please get in touch via email or LinkedIn.