# Python notebook to run GPT-J (Dolly) evaluation

This notebook constitutes a test of the GPT-J Dolly version from Databricks, with no additional pre-training applied.

No weights are specifically context-configured, and the training set is smaller than that used for chatGPT(GPT3.5), so results may be less stable.


## Disclaimer
The training set(s) used may also contain abusive material ("The Pile") so the response may be abusive as a consequence of the model. Filtering should be applied to this output if deployed in a commercial/"human facing" context.


In [1]:
%pip install ipywidgets --quiet
%pip install ipyfilechooser --quiet


Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [7]:
import sys
sys.path.append("training/")
import training
import ipyfilechooser
from ipyfilechooser import FileChooser
from IPython.display import display, Markdown


In [3]:


# COMMAND ----------

# Examples from https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
test_instructions = [
    "Explain to me the difference between nuclear fission and fusion.",
    "Give me a list of 5 science fiction books I should read next.",
]
print("Model ready to use")

Model ready to use


In [13]:
import markdown
def process_prompt(prompt,query_gpt=False):


    default_model="databricks/dolly-v2-3b"
    suggested_models = [
        "databricks/dolly-v1-6b",
        "databricks/dolly-v2-3b",
        "databricks/dolly-v2-7b",
        "databricks/dolly-v2-12b",
    ]



    from training.generate import generate_response, load_model_tokenizer_for_generate

    input_model = default_model #dbutils.widgets.get("input_model")

    model, tokenizer = load_model_tokenizer_for_generate(input_model)
    print("Loaded_Tokenizer")
    # Use the model to generate responses for each of the instructions above.
    import time
    instruction=prompt
    print("instruction",instruction)
    from datetime import datetime
    print("START TIME: {}".format(datetime.utcnow()))
    start_time=time.perf_counter()
    if(query_gpt==True):
        response = generate_response(instruction, model=model, tokenizer=tokenizer)
    else:
        response=prompt
    if response:
        print(f"Instruction: {instruction}\n\n{response}\n\n-----------\n")
    print("query finished at UTC time {}".format(datetime.utcnow()))
    end_time=time.perf_counter()
    print("Query response took {} seconds".format(end_time-start_time))
    display(Markdown(response))
    
from ipywidgets import Textarea,interact_manual

interact_manual(process_prompt,
                prompt=Textarea(
                    default="Hello there what is your name?",
                    disabled=False,
                    description="GPT-J prompt"
                    )
                )

interactive(children=(Textarea(value='', description='GPT-J prompt'), Checkbox(value=False, description='query…

<function __main__.process_prompt(prompt, query_gpt=False)>

In [19]:
path_PDF="C:\\Users\\a902722\\Downloads\\atos-building-tomorrows-city-grenoble-case-study.pdf"

import pypdf
npages=10
def concatenate_pdf_text(pdf_document,max_npages=99999):
    #@param pdf to read
    #@param max number of pages (to respect GPT token limit)"
    reader=pypdf.PdfReader(pdf_document)
    if(len(reader.pages)==0):
        print("EMPTY PDF")
        return ""
    else:
        if(len(reader.pages)>8):
            print("PDF is longer than 8 pages, GPT might hit a character limit of 4096 tokens ~ 4kish words - use at your own risk")
        
        text=""
        ctr=0
        for page in reader.pages:
            if(ctr>max_npages):
                break
            text+=page.extract_text()
            ctr+=1
        return text
    
text_converted=concatenate_pdf_text(path_PDF)

prompt="Tell me one fact about the following text: {}".format(text_converted)
print(prompt)

process_prompt(prompt,query_gpt=True)

Tell me one fact about the following text: Case study
Y our business technologists. Powering progress Atos Worldgrid builds a smart grid for a large sustainable 
urban development project in Grenoble
A flagship project in France’s ÉcoCité urban planning program, Grenoble Presqu’île aims to  
create a new, sustainable living space between the science campus and the historic Grenoble city. 
The smart grid, which is being developed by Atos Worldgrid in partnership with Gaz Électricité 
de Grenoble, will enable better energy management, which is one of the key challenges for this trailblazing ÉcoCité program.building
tomorrow’s city
© J.M. Francillon / Ville de Grenoble
with better energy management and consumptionatos.netAtos, the Atos logo, Atos Consulting, Atos Worldgrid, Worldline, BlueKiwi, Bull, Canopy the Open Cloud Company, Yunano, Zero Email, Zero Email Certified and The Zero Email Company are 
registered trademarks of the Atos group. July 2015. © 2015 AtosFor more information:Ple

Case study: Y your business technologists