<a href="https://colab.research.google.com/github/nicteal/NewsletterAlbert/blob/main/LLama_Version_Personal_Chatbot_from_ST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to the ST Chatbot Colab!

This Colab notebook contains all of the code you need to make a basic chatbot that will answer questions about a corpus of text. Colab is a cloud-based programming environment which will let you run all of this code from your browser.

At each step, follow the written instructions and press the "play" button next to the code sample in order to run it.

**Important Note:** This is a basic chatbot running on a limited selection of articles. It's only a starting point to show you what's possible!

# 1. Download our text corpus

The first thing we need to do is download the text our chatbot is going to use as reference material for answering questions.

In the full chatbot, I used a much larger collection of articles as the text corpus. For this public codebase, I've collected two sources that we can use as a starting point.


```
!git clone https://github.com/nicteal/NewsletterAlbert
```

```text
Cloning into 'NewsletterAlbert'...
remote: Enumerating objects: 13, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 13 (delta 6), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (13/13), 86.09 KiB | 1.12 MiB/s, done.
```


# 2. Install our dependencies and define our functions

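In this section we'll install LlamaIndex (formerly called GPT Index) and LangChain. We'll also define the functions that we'll use later to construct our index and query it.

First, let's install our dependencies. The code in this notebook uses the early LlamaIndex API (`GPTSimpleVectorIndex`, `save_to_disk`), which later releases replaced, so the installed versions matter.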

```
!pip install llama_index==0.4.35.post1 langchain==0.0.119 openai==0.27.2
```

```text
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting llama_index
  Downloading llama_index-0.4.35.post1.tar.gz (149 kB)
  Preparing metadata (setup.py) ... done
Collecting dataclasses_json
  Downloading dataclasses_json-0.5.7-py3-none-any.whl (25 kB)
Collecting langchain
  Downloading langchain-0.0.119-py3-none-any.whl (420 kB)
Collecting openai>=0.26.4
  Downloading openai-0.27.2-py3-none-any.whl (70 kB)
Collecting tiktoken
  Downloading tiktoken-0.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
```

Now, we'll define the functions we're going to use later in order to construct our index and query it.

```python
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # Maximum number of tokens the LLM accepts in one prompt
    max_input_size = 4096
    # Number of tokens reserved for the model's answer
    num_outputs = 256
    # Maximum number of tokens shared between neighboring chunks
    max_chunk_overlap = 20
    # Maximum size of each text chunk, in tokens
    chunk_size_limit = 600

    # Define the LLM that writes answers (temperature=0 keeps answers as consistent as possible)
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_outputs))
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # Load every file in the folder as a document
    documents = SimpleDirectoryReader(directory_path).load_data()

    # Split the documents into chunks and embed each chunk with OpenAI's embeddings API
    index = GPTSimpleVectorIndex(
        documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )

    # Save the index so it can be queried later without re-embedding the corpus
    index.save_to_disk('index.json')

    return index

def ask_ST():
    # Load the index saved by construct_index()
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True:
        query = input("What do you want to ask ST Engineering? ")
        # Submit an empty question to stop the loop
        if not query.strip():
            break
        # Retrieve relevant chunks and have the LLM answer from them
        response = index.query(query, response_mode="compact")
        display(Markdown(f"ST Bot says: <b>{response.response}</b>"))
```
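The `PromptHelper` settings above share one context window between the prompt and the answer. As a rough back-of-the-envelope sketch (a simplification, not LlamaIndex's actual accounting), here is how many 600-token chunks could fit alongside a 256-token answer in a 4096-token window. `max_chunks_per_prompt` is a hypothetical helper, and `template_tokens` is an assumed stand-in for the prompt template and the question, whose real size isn't given in this notebook.

```python
def max_chunks_per_prompt(
    max_input_size: int,
    num_outputs: int,
    chunk_size_limit: int,
    template_tokens: int = 0,
) -> int:
    """Rough count of whole chunks that fit in one prompt (hypothetical helper).

    The context window must hold the prompt template, the retrieved chunks,
    and the tokens reserved for the model's answer.
    """
    available = max_input_size - num_outputs - template_tokens
    if available <= 0:
        return 0
    return available // chunk_size_limit


# Values from construct_index(); template_tokens=300 is an assumed placeholder.
print(max_chunks_per_prompt(4096, 256, 600))                       # → 6
print(max_chunks_per_prompt(4096, 256, 600, template_tokens=300))  # → 5
```

In `compact` mode, LlamaIndex packs as many retrieved chunks as fit into each LLM call, so a budget like this roughly determines how much text each call can see.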

# 3. Set OpenAI API Key
In order to run this notebook you'll need an API key from OpenAI. 

If you don't have one already, you can get one by [signing up](https://platform.openai.com/overview). Then click your account icon in the top right of the screen, select "View API Keys", and create a new API key.

Then run the code cell below and paste your key into the text box that appears.



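```python
from getpass import getpass
import os

# getpass hides what you type, so the key isn't displayed or saved in the notebook output
os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI API key here and hit enter: ")
```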

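Because the API key is a secret, it shouldn't be printed or left in a saved notebook's output. If you want to confirm the key was stored, a minimal sketch like the one below shows only its last four characters. `mask_secret` is a hypothetical helper, not part of any library.

```python
import os


def mask_secret(value: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters (masks everything if the value is short)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Read the key back from the environment without revealing it.
key = os.environ.get("OPENAI_API_KEY", "")
print("OPENAI_API_KEY:", mask_secret(key) if key else "not set")
```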

# 4. Construct Index

Now we're going to construct our index. This will read every file in the 'NewsletterAlbert' folder, split the text into chunks, and embed each chunk with OpenAI's embeddings API.
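The chunking step can be illustrated with a simplified splitter. This sketch counts whitespace-separated words instead of tokens (LlamaIndex's own splitter works in tokens), and `split_into_chunks` is a hypothetical name; its defaults mirror the notebook's `chunk_size_limit = 600` and `max_chunk_overlap = 20`.

```python
def split_into_chunks(text: str, chunk_size: int = 600, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` words, overlapping by `overlap` words.

    Words stand in for tokens here; the overlap means text near a chunk
    boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk's start advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Usage: 1,500 dummy "words" with the notebook's 600 / 20 settings.
sample = " ".join(f"w{i}" for i in range(1500))
chunks = split_into_chunks(sample, chunk_size=600, overlap=20)
print(len(chunks))           # → 3
print(chunks[1].split()[0])  # → w580
```

The second chunk starts 20 words before the first one ends. That shared context helps a passage that straddles a boundary stay understandable, at the cost of embedding those words twice.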

**Important Note:** This step costs money. Running it on the default text corpus should cost only about $0.03 in total, but the cost grows with the amount of text you embed, so be careful if you index other, very long documents.
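Since the cost scales with the number of tokens embedded, it can help to estimate a corpus's size before indexing it. The sketch below uses OpenAI's rule of thumb of roughly 4 characters per token for English text (for an exact count you could use the `tiktoken` package). It only looks at the non-hidden files directly inside the folder. `estimate_tokens` and `estimate_corpus_cost` are hypothetical helpers, and the price per 1,000 tokens is an assumption you should replace with OpenAI's current pricing.

```python
import math
from pathlib import Path

# OpenAI's rule of thumb: ~4 characters per token for common English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (not an exact tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_corpus_cost(directory: str, price_per_1k_tokens: float) -> tuple[int, float]:
    """Estimate tokens and embedding cost for the non-hidden files in `directory`."""
    total_tokens = 0
    for path in sorted(Path(directory).iterdir()):
        # Only top-level regular files; skip hidden entries such as .git.
        if not path.is_file() or path.name.startswith("."):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        total_tokens += estimate_tokens(text)
    return total_tokens, total_tokens / 1000 * price_per_1k_tokens


CORPUS_DIR = "/content/NewsletterAlbert"
PRICE_PER_1K_TOKENS = 0.0004  # assumption: launch price of text-embedding-ada-002; check current pricing

if Path(CORPUS_DIR).is_dir():
    tokens, cost = estimate_corpus_cost(CORPUS_DIR, PRICE_PER_1K_TOKENS)
    print(f"~{tokens:,} tokens, ~${cost:.4f} to embed")
```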


```python
construct_index('/content/NewsletterAlbert')
```

```text
<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7f32b41a26d0>
```

# 5. Ask Questions!

Now we'll run the "ask_ST" function we defined above. 

This will prompt you to enter a question, find chunks of text that might answer it, and then use GPT-3 to summarize an answer from those chunks.
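In simplified form, the retrieval step compares the question's embedding with each chunk's embedding and keeps the closest matches, using a similarity measure such as cosine similarity. The sketch below runs on made-up 3-dimensional vectors rather than real OpenAI embeddings, and `top_k_chunks` is a hypothetical helper, not LlamaIndex's API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float], chunk_vecs: list[list[float]],
                 chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


# Toy "embeddings", made up for illustration; real ones come from an embeddings API.
chunks = ["ST Engineering leadership", "Investor FAQ: dividends", "Company history"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.3, 0.0, 0.9]]
query_vec = [1.0, 0.2, 0.1]  # pretend embedding of "Who is the CEO?"
print(top_k_chunks(query_vec, chunk_vecs, chunks, k=1))  # → ['ST Engineering leadership']
```

Real embedding vectors are much longer, but the ranking logic is the same: the best-scoring chunks are what the LLM then reads to write its answer.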

Remember, this public Colab file only uses two web pages, so it can only answer questions using:

- https://www.stengg.com/en/investor-relations/faq/ 
- https://en.wikipedia.org/wiki/ST_Engineering 



Again, this step costs money. So be aware!

```python
ask_ST()
```

```text
What do you want to ask ST Engineering? Who is the CEO?
ST Bot says: The CEO is Tan Pheng Hock.

What do you want to ask ST Engineering? Who is the CEO of ST Engineering?
ST Bot says: The CEO of ST Engineering is Vincent Chong.

What do you want to ask ST Engineering? Who is ST Engineering?
ST Bot says: ST Engineering is a major player in the defence and military industries. It is an Asia-based engineering group that provides commercial and defence services to multiple industries. It was founded on 8 December 1997 and has since grown to become one of the largest defence and engineering groups in Asia. In 2021, it was ranked 61st in the Stockholm International Peace Research Institute's list of the world's top 100 defence manufacturers. It has sold defence products to over 100 countries.
```