Applied machine learning
Final assignment

Elena Skrtic
Simao Ferreira
Laura Kéri

In [1]:
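import locale

# Force UTF-8 as the preferred encoding. This is a common workaround when shell
# commands such as !pip fail in Colab because the locale is not UTF-8.
def getpreferredencoding(do_setlocale=True):
    return "UTF-8"

locale.getpreferredencoding = getpreferredencoding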

In [2]:
!pip install -r requirements.txt



In [3]:
from langchain.document_loaders import UnstructuredMarkdownLoader
from langchain.document_loaders import PyPDFLoader
from langchain.llms import HuggingFaceHub

loader = PyPDFLoader("/content/EU_Artificial_Inteligence_Act copy.pdf")

docs = loader.load()
len(docs)

10

In [4]:
# Cut the loaded PDF into smaller chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
texts = text_splitter.split_documents(docs)
len(texts)

# Create embeddings for the chunks with HuggingFaceEmbeddings
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="thenlper/gte-large",
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True},
)

query_result = embeddings.embed_query(texts[0].page_content)
print(len(query_result))

1024
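
The chunking step in this cell can be pictured with a much simpler splitter. The sketch below is only an illustration: `split_with_overlap` is a hypothetical helper that cuts fixed-width character windows, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and lines before falling back to raw characters. It shows what `chunk_overlap` means: each chunk starts a few characters before the previous one ended, so text near a boundary appears in both chunks.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows that share chunk_overlap characters.

    Hypothetical helper for illustration only; it ignores separators.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

With `chunk_size=1024` and `chunk_overlap=64`, as in the notebook, the shared region is small relative to the chunk, which limits duplication while still keeping sentences that straddle a boundary retrievable.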


In [5]:
# Store the embeddings in a vector database so that they are easier to search
from langchain.vectorstores import Chroma

db = Chroma.from_documents(texts, embeddings, persist_directory="db")
results = db.similarity_search("Transformer models", k=2)
print(results[0].page_content)

model (i.e., “red-teaming”). • Ensure that an adequate level of both cybersecurity and physical protections are in place. • Document and report the estimated energy consumption of the model.
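
To make the idea behind `similarity_search` more concrete, here is a minimal sketch of nearest-neighbour retrieval over embedding vectors. It is not Chroma's implementation: `normalize` and `top_k` are hypothetical helpers, and the three-dimensional vectors are toy values. Because the notebook sets `normalize_embeddings=True`, the vectors have unit length, and for unit-length vectors ranking by cosine similarity and ranking by Euclidean distance give the same order.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(doc_vecs):
        d = normalize(vec)
        # The dot product of unit vectors is their cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy vectors standing in for real 1024-dimensional embeddings.
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
toy_query = [1.0, 0.05, 0.0]
best = top_k(toy_query, toy_docs, k=2)
print(best)
```

The search returns the nearest chunks even when none of them is truly relevant, which is why the query "Transformer models" above retrieves a passage that only loosely matches.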


In [6]:
import torch
from langchain import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline


MODEL_NAME = "TheBloke/Llama-2-7B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)


model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True, device_map="cuda"
)

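# Load the default generation settings published with the model
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)

# Upper bound on the number of newly generated tokens
generation_config.max_new_tokens = 1024

# Near-zero temperature: sampling stays enabled but is almost deterministic
generation_config.temperature = 0.0001

# Top-p (nucleus) sampling threshold
generation_config.top_p = 0.95

# Sample from the distribution instead of using greedy decoding
generation_config.do_sample = True

# Penalize repeated tokens (values above 1 discourage repetition)
generation_config.repetition_penalty = 1.15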


#creating a text generation pipeline using the initialized model, tokenizer, and generation configuration
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

# Wrap the text generation pipeline in a LangChain LLM object and pass a temperature of 0 for generation
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})
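
The effect of the very small temperature set above can be checked with a few lines of plain Python. The sketch below is an illustration, not the code path used by `transformers`: `softmax_with_temperature` is a hypothetical helper and the logits are made-up values. It divides the logits by the temperature before applying softmax, which is why a temperature close to zero concentrates nearly all probability on the highest-scoring token even though `do_sample` is enabled.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.0001):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 4) for p in probs])
```

At a temperature of 1.0 the three tokens keep noticeable probabilities, while at 0.0001 the distribution collapses onto the first token, so sampling behaves almost like greedy decoding.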



In [8]:
from langchain.chains import RetrievalQA
from langchain import PromptTemplate

template = """
<s>[INST] <<SYS>>
Act as a lawyer with expertise in AI legislation and data protection in the European Union. Use the following information to answer the question at the end.
<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

result = qa_chain(
    "How does attention solve the deep learning problem? Explain like I am five."
)
print(result["result"].strip())

Hello there! *adjusts glasses* So, you want to know about attention in deep learning? Well, imagine you have a big box full of toys, and you want to find a specific toy inside. You could dig through the whole box, but that would take forever! Instead, you can use something called "attention" to help you find what you're looking for faster.
Attention is like a special tool that helps your computer focus on the most important parts of the data it's working with. It looks at all the different pieces of data and decides which ones are the most important, so it can use them to make predictions or do other things. Think of it like a flashlight shining on a dark room – it helps your computer see what's important and ignore the rest.
Now, when we train these deep learning models, they need to look at lots and lots of examples to learn how to do their job well. And sometimes, those examples can be really long or complicated, like a big ol' box of toys! That's where attention comes in – it helps
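
The "stuff" chain type used above places all retrieved chunks into the single `{context}` slot of the prompt. The sketch below imitates that step with plain string formatting so the final prompt can be inspected. It is a simplification: `build_prompt`, the `{system}` placeholder, the blank-line separator, and the sample chunks are assumptions made for this illustration and are not taken from LangChain.

```python
TEMPLATE = """
<s>[INST] <<SYS>>
{system}
<</SYS>>

{context}

{question} [/INST]
"""


def build_prompt(system, chunks, question):
    """Join the retrieved chunks and fill the Llama-2 chat template."""
    # Assumption: chunks are separated by a blank line.
    context = "\n\n".join(chunks)
    return TEMPLATE.format(system=system, context=context, question=question)


# Placeholder chunks standing in for the k=2 documents returned by the retriever.
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
prompt_text = build_prompt(
    system="Act as a lawyer with expertise in AI legislation.",
    chunks=retrieved,
    question="Explain the main implications of the European AI Act.",
)
print(prompt_text)
```

Printing the assembled prompt makes it easy to see that a question unrelated to the PDF, such as the one about attention, still receives two chunks of the AI Act as context.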

In [None]:
from textwrap import fill

result = qa_chain(
    "Explain the main implications of the European AI Act."
)
print(fill(result["result"].strip(), width=80))

As a lawyer specializing in AI legislation and data protection in the European
Union, I can explain the main implications of the European AI Act (AI Act). The
AI Act is a comprehensive framework aimed at advancing the use of artificial
intelligence (AI) while ensuring its safety and compliance with fundamental
rights. Here are the main implications of the AI Act: 1. Extraterritorial
Jurisdiction: The AI Act applies to any business or organization that offers an
AI system that impacts people within the European Union, regardless of where the
organization is headquartered. This means that non-EU companies will be subject
to the AI Act if they sell or use AI systems in the EU. 2. Safe and Respectful
AI Systems: The AI Act requires AI systems placed on the EU market to be safe
and respect fundamental rights, such as privacy, dignity, and non-
discrimination. Organizations must demonstrate compliance with these
requirements before placing their AI systems on the market. 3. Legal Certainty
a

In [7]:

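model_name = "TheBloke/Llama-2-7B-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPTQ weights with the ExLlama backend must sit entirely on the GPU. Loading
# without device_map leaves modules on the CPU and raises the ValueError
# recorded in this cell's output.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda")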

text_gen_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "Explain in simple terms the requirements for high-risk AI systems as outlined in the EU AI Act."

simplified_explanation = text_gen_pipeline(prompt, max_length=150)[0]['generated_text']

print(simplified_explanation)

ValueError: Found modules on cpu/disk. Using Exllama or Exllamav2 backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object

In [None]:

# Each prompt is a separate list element; the commas prevent Python from
# concatenating adjacent string literals into one long prompt.
prompts = [
    "Explain what 'high-risk AI systems' means in the EU AI Act in simple terms for a non-technical audience.",
    "Describe the obligations of AI system providers under the EU AI Act in easy-to-understand terms.",
    "Summarize the main regulations for AI developers in the EU AI Act in plain language.",
]

for prompt in prompts:
    result = text_gen_pipeline(prompt, max_length=512, clean_up_tokenization_spaces=True)[0]['generated_text']
    print(f"Prompt: {prompt}\nResult: {result}\n")
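
When a list of prompts is written across several lines, it is worth checking its length before looping over it. Python joins adjacent string literals that are not separated by a comma, so a missing comma silently merges two prompts into one. The short example below uses made-up strings to show the difference.

```python
# Adjacent string literals without a comma are concatenated by Python.
without_commas = [
    "first prompt"
    "second prompt"
]

# With commas, each string is its own list element.
with_commas = [
    "first prompt",
    "second prompt",
]

print(len(without_commas), len(with_commas))  # prints: 1 2
print(without_commas[0])  # prints: first promptsecond prompt
```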

Gradio App

In [8]:
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, TextStreamer

MODEL_ID = "TheBloke/Llama-2-7B-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

config = AutoConfig.from_pretrained(MODEL_ID)
#config.quantization_config["use_exllama"] = True
config.quantization_config["disable_exllama"] = False
config.quantization_config["exllama_config"] = {"version":2}

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="cuda", config=config)

prompt = "hi there"
prompt_template=f'''
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]
'''

streamer = TextStreamer(tokenizer)
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
_ = model.generate(inputs=input_ids, streamer=streamer, max_new_tokens=512)

Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`.The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.


<s> 
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
hi there[/INST]




Hello! I'm here to help you with any questions or concerns you may have. Please feel free to ask me anything, and I will do my best to provide you with accurate and helpful information. If a question doesn't make sense or is not factually coherent, I will let you know and explain why. And if I don't know the answer to a question, I will not provide false information. I'm here to help you in any way I can. How can I assist you today?</s>


In [11]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_name = "TheBloke/Llama-2-7B-Chat-GPTQ"


device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)

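# Load directly onto the target device. Calling .to(device) after a plain
# from_pretrained() is too late for GPTQ weights: the ExLlama backend raises the
# ValueError recorded in this cell's output while the model is still on the CPU.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)

# The model is already placed on its device, so the pipeline needs no device argument.
text_gen_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)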

def generate_simplified_explanation(prompt):
    simplified_explanation = text_gen_pipeline(prompt, max_length=150, clean_up_tokenization_spaces=True)[0]['generated_text']
    return simplified_explanation

prompt = "Explain in simple terms the requirements for high-risk AI systems as outlined in the EU AI Act."
print(generate_simplified_explanation(prompt))


ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object

In [24]:
import gradio as gr

# Pass the pasted legislation text to the text generation pipeline
def simplify_legislation(legislation_text):
    simplified_text = text_gen_pipeline(legislation_text, max_length=150, clean_up_tokenization_spaces=True)[0]['generated_text']
    return simplified_text

interface = gr.Interface(fn=simplify_legislation,
                         inputs=gr.Textbox(lines=2),
                         outputs=gr.Textbox())

interface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://c5a5b316aeba06b98f.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


