<a href="https://colab.research.google.com/github/ernestmucheru/Admission-system/blob/master/ICKB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
This notebook is a research prototype that explores whether Cellulant could have its own internal chatbot, powered by OpenAI's GPT-3.5 models and answering from our own internal knowledge. The prototype below uses the `text-davinci-003` model.

# Linking the data that feeds the model
For demonstration purposes, we use a temporary GitHub repository as our knowledge base.

This research is still at an early stage, and there is room to package it in a more consumable form that makes CRUD operations and access permissions easy to manage.

In [2]:
! git clone https://github.com/ernestmucheru/context_data.git

fatal: destination path 'context_data' already exists and is not an empty directory.
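The clone cell fails with the "destination path already exists" error above whenever it is re-run. A minimal sketch of an idempotent alternative, assuming `git` is on the PATH (as it is in Colab); `clone_if_missing` is a hypothetical helper, not part of any library:

```python
import os
import subprocess


def clone_if_missing(repo_url: str, dest: str) -> bool:
    """Clone repo_url into dest unless dest already exists and is non-empty.

    Returns True if git clone ran, False if the existing folder was reused.
    """
    if os.path.isdir(dest) and os.listdir(dest):
        # re-running `git clone` here would fail with "destination path ... already exists"
        return False
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return True
```

Calling `clone_if_missing("https://github.com/ernestmucheru/context_data.git", "context_data")` clones the repo on the first run and does nothing afterwards. It does not fetch new commits, so run `git pull` inside the folder (or delete it) to refresh the knowledge base.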


# Install the dependencies
Run the code below to install the dependencies our functions need.

In [3]:
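# pin the versions this notebook was run with; newer llama-index releases no longer provide GPTSimpleVectorIndex
!pip install llama-index==0.5.16
!pip install langchain==0.0.141 openai==0.27.4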

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting llama-index
  Downloading llama_index-0.5.16.tar.gz (175 kB)
  Preparing metadata (setup.py) ... done
Collecting dataclasses_json
  Downloading dataclasses_json-0.5.7-py3-none-any.whl (25 kB)
Collecting langchain>=0.0.123
  Downloading langchain-0.0.141-py3-none-any.whl (540 kB)
Collecting openai>=0.26.4
  Downloading openai-0.27.4-py3-none-any.whl (70 kB)
Collecting tiktoken
  Downloading tiktoken-0.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
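The functions below use `GPTSimpleVectorIndex`, `save_to_disk` and `load_from_disk` from the 0.5.x llama-index line shown in this log. To our knowledge, later releases (0.6 onward) replaced these APIs, so it helps to confirm which versions the runtime actually has. A small sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# This notebook was run with llama-index 0.5.16, langchain 0.0.141 and openai 0.27.4.
for pkg in ("llama-index", "langchain", "openai"):
    print(pkg, installed_version(pkg))
```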

# Define the functions
The following code defines the two functions we need: `construct_index` builds the index from a folder of documents, and `ask_ai` loads the saved index and queries it.

In [4]:
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    """Build a vector index over every file in directory_path and save it to index.json."""
    # maximum number of tokens the model can handle (prompt + answer)
    max_input_size = 4096
    # number of tokens reserved for the model's answer
    num_outputs = 2000
    # maximum number of tokens shared by neighbouring chunks
    max_chunk_overlap = 20
    # maximum chunk size, in tokens
    chunk_size_limit = 600

    # the prompt helper fits retrieved text into the model's context window
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # the LLM that writes answers from the retrieved text
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))

    # load every file in the folder as a document
    documents = SimpleDirectoryReader(directory_path).load_data()

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    # split the documents into chunks and embed each chunk with OpenAI's embeddings API
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

    # save the index so it can be reloaded without re-embedding the documents
    index.save_to_disk('index.json')

    return index

def ask_ai():
    """Load the saved index and answer questions until an empty question is entered."""
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True:
        query = input("What do you want to ask? ")
        # submit an empty line to stop the loop
        if not query.strip():
            break
        response = index.query(query)
        display(Markdown(f"Response: <b>{response.response}</b>"))
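`GPTSimpleVectorIndex` stores an embedding for each chunk and, at query time, embeds the question and retrieves the most similar chunks before asking the LLM to answer from them. A minimal sketch of that retrieval step, assuming cosine similarity as the metric and using toy vectors in place of real OpenAI embeddings; `cosine_similarity` and `top_k_chunks` are hypothetical helpers, not llama-index functions:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 1,
) -> List[Tuple[float, str]]:
    """Return the k (score, chunk) pairs whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values; real OpenAI embeddings are much longer).
chunks = ["annual leave chunk", "maternity leave chunk", "expense claims chunk"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.7, 0.7, 0.0], [0.0, 0.1, 0.9]]
query_vec = [0.8, 0.3, 0.0]  # stands in for the embedded question

for score, text in top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

Only the retrieved chunks reach the LLM, which is why the quality of the files in the knowledge base matters more than the model's general knowledge.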

# Set OpenAI API Key
You need an OpenAI API key to run this code.

If you don't have one yet, [sign up](https://platform.openai.com/overview), click your account icon in the top-right corner of the screen, select "View API Keys", and create a new API key.

Then run the code below and paste your API key into the text input.

In [7]:
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key here and hit enter:")  # getpass hides the typed key

Paste your OpenAI key here and hit enter:sk-...
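Environment variables set this way last only for the current runtime, so the prompt reappears after every restart. A sketch of a helper that reuses a key already present in `OPENAI_API_KEY` and prompts only when it is missing; `ensure_openai_key` is a hypothetical name, and `getpass` (from Python's standard library) keeps the key out of the saved notebook output:

```python
import os
from getpass import getpass


def ensure_openai_key() -> str:
    """Return the OpenAI API key, prompting for it only if OPENAI_API_KEY is not already set."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass does not echo what you type, so the key never appears in the cell output
        key = getpass("Paste your OpenAI key here and hit enter: ").strip()
        os.environ["OPENAI_API_KEY"] = key
    return key
```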


# Construct an index
Now we are ready to construct the index. This takes every file in the `context_data/data` folder, splits it into chunks, and embeds each chunk with OpenAI's embeddings API.

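**Notice:** running this code costs credits on your OpenAI account. Building the index only calls the embeddings API, which is billed at its own (lower) rate, while every question you ask also calls `text-davinci-003`, which cost $0.02 for every 1,000 tokens when this notebook was written. If you have just set up your account, your free credits should be more than enough for this experiment.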
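Because billing is linear in tokens, a rough budget is the number of tokens divided by 1,000, times the price. A sketch, assuming a flat per-1,000-token rate (defaulting to the $0.02 figure in the notice above) and a hypothetical 50,000-token knowledge base; `estimate_cost_usd` is an illustrative helper, and the `tiktoken` package installed earlier can count the actual tokens:

```python
def estimate_cost_usd(num_tokens: int, price_per_1k_tokens: float = 0.02) -> float:
    """Estimated charge in USD for num_tokens at a flat price per 1,000 tokens."""
    return num_tokens / 1000 * price_per_1k_tokens


# Hypothetical example: a 50,000-token knowledge base at $0.02 per 1,000 tokens
print(f"${estimate_cost_usd(50_000):.2f}")  # prints $1.00
```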

In [8]:
construct_index("context_data/data")

<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7f91ac38fa30>
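The splitting step works on tokens rather than characters, and neighbouring chunks overlap slightly so that a sentence cut at a boundary still appears whole in at least one chunk. A minimal sketch of fixed-size chunking with overlap, using whitespace-separated words as stand-in tokens and the chunk size and overlap values from `construct_index` as defaults; `chunk_tokens` is a hypothetical helper, and llama-index's own splitter differs in its details:

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 600, overlap: int = 20) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size, each repeating the last `overlap` tokens of the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Whitespace-separated words stand in for real model tokens here.
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk is then embedded separately, so a larger overlap improves continuity at the cost of embedding (and paying for) more tokens.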

# Ask questions
It's time to have fun and test our AI. Run `ask_ai()`, which queries GPT, and type your question into the input box.

Our test knowledge base currently contains our Group People Policies. Ask the model any question about these policies, for example:
1. What kinds of leave does Cellulant offer?
2. How many leave days am I entitled to?

Reach out to ernest.mucheru@cellulant.io for any further queries.

In [None]:
ask_ai()

What do you want to ask? How many leave days I'm I entiltled to?


Response: <b>

The number of leave days you are entitled to depends on the country of residence of the employee and the Employment Act of that country. Generally, you are entitled to a maximum of 30 days of sick leave at full pay in the case of illness or other incapacity and to an additional 30 days at half pay in a period of 12 months. You are also entitled to five consecutive paid family responsibility leave days per annual leave cycle, three months' maternity leave, and time off for pre-natal appointments. Additionally, you may be eligible for an annual increase process and performance bonus while on maternity leave, and you may not work for six weeks after the birth of the child unless a medical practitioner certifies that you are fit to do so. On return to work, you must return on no less favourable terms and conditions of employment as those you had enjoyed before commencing maternity leave.</b>
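The answer above is generated from the retrieved policy chunks rather than from the model's general knowledge: llama-index inserts the chunks into a question-answering prompt and sends it to `text-davinci-003`. A minimal sketch of that prompt assembly, modelled loosely on llama-index's default QA template (the wording here is an approximation, and `build_qa_prompt` is a hypothetical helper):

```python
from typing import Sequence


def build_qa_prompt(context_chunks: Sequence[str], question: str) -> str:
    """Combine retrieved chunks and the user's question into a single prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, "
        f"answer the question: {question}\n"
    )


prompt = build_qa_prompt(["(retrieved policy text)"], "How many leave days am I entitled to?")
print(prompt)
```

Instructing the model to ignore prior knowledge is what keeps answers tied to Cellulant's own policies, although it does not guarantee the model will never go beyond the supplied context.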