## Skeleton of Thought method of generation

LLM decoding is normally sequential. In this method of decoding, the LLM first generates a skeleton of the response. Then it elaborates on each point in the skeleton concurrently.

This resembles how humans often approach a problem: first outline a solution, then fill in the individual parts, which can be worked on in parallel.
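The two-stage flow can be sketched without any API calls. In the minimal sketch below, `fake_skeleton` and `fake_expand` are hypothetical stand-ins for the two LLM calls made later in this notebook; only the control flow is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_skeleton(question: str) -> list[str]:
    # Hypothetical stand-in for the LLM call that returns a short outline
    return ["Set goals", "Prioritize tasks", "Take breaks"]

def fake_expand(point: str) -> str:
    # Hypothetical stand-in for the LLM call that elaborates one point
    return f"{point}: (2-3 sentences elaborating on this point)"

def skeleton_of_thought(question: str) -> list[str]:
    points = fake_skeleton(question)  # stage 1: one sequential call
    with ThreadPoolExecutor(max_workers=len(points)) as pool:
        # stage 2: one call per point, run concurrently; map() keeps input order
        return list(pool.map(fake_expand, points))

answer = skeleton_of_thought("How can I improve my time management skills?")
print("\n".join(answer))
```

The speed-up comes from stage 2: each point is expanded by its own call, so the wall time is roughly that of the slowest expansion rather than the sum of all of them.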

You can read more in my blog here: https://generativeai.pub/skeleton-of-thought-processing-0980d9b75f52

![skeleton-of-thought-process.png](images/sot.jpg)

## Skeleton of Thought step by step

In [4]:
import os
import json
from dotenv import load_dotenv
load_dotenv()

True

In [5]:
from openai import OpenAI

# OpenAI() reads OPENAI_API_KEY from the environment (loaded from .env above)
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY in your environment or .env file")

In [6]:
class Gpt4Turbo:
    def __init__(self):
        self.MODEL = 'gpt-3.5-turbo'  # note: despite the class name, this calls GPT-3.5 Turbo
        self.TOKEN_LIMIT=4000
        self.client = OpenAI()

    def gptCall_json(self, temperature, messages: list):
        try:
            response = self.client.chat.completions.create(model=self.MODEL,
                                                    messages=messages,
                                                    temperature=temperature,
                                                    max_tokens=self.TOKEN_LIMIT,
                                                    stream=False,
                                                    response_format={"type": "json_object"}) ## Enforce output format

            output = response.choices[0].message.content
            return output

        except Exception as e:
            print(e)
            return ""

### Prompt to generate the skeleton outline

In [7]:
question = "How can I improve my time management skills?"
outline_prompt = f'''
You're an organizer responsible for only giving the skeleton (not the full content) for answering the question.
Provide the skeleton as a JSON to answer the question. Instead of writing a full sentence, each skeleton point should
be very short with only 2~5 words. Generally, the skeleton should have 3~10 points. The skeleton is an outline that would be expanded later.
Don't elaborate on the point in the skeleton.
Example:
\n\nQuestion:\nWhat are the typical types of Chinese dishes?: \n Response: {{"answer" : ["Dumplings" , "Noodles" , "Dim Sum" , "Hot Pot" , "Wonton", "Ma Po Tofu", "Char Siu", " Fried Rice"]}}.
\n\nQuestion:\nWhat are some practical tips for individuals to reduce their carbon emissions?\n Response: {{ "answer" :["Energy Conservation", "Efficient transportation", "Home Energy Efficiency", "Reduce Water Consumption", "Sustainable Diet", "Sustainable Travel"]}}

 \n\nNow, please provide the skeleton for the following question.\n{question}\n Response: {{"answer": [...]}}
'''
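The prompt relies on f-string brace escaping: doubled braces become literal braces, while `{question}` is substituted. A minimal sketch of that behaviour, using a shortened, hypothetical template rather than the full prompt:

```python
question = "How can I improve my time management skills?"

# Inside an f-string, {{ and }} produce literal braces; {question} is substituted
template = f'Question: {question} Response: {{"answer": [...]}}'
print(template)
```

Forgetting to double the braces around the JSON example would make Python try to evaluate `"answer"` as an expression inside the f-string.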

In [8]:
TEMPERATURE=0.5
message=[]

message.append({"role": "system", "content": "You are a helpful assistant. You respond in JSON format."})
message.append({"role": "user", "content": outline_prompt})


final_output = []
gpt4_turbo = Gpt4Turbo()
result = gpt4_turbo.gptCall_json(TEMPERATURE,message)
result = json.loads(result)['answer']
print(result)

['Set goals', 'Prioritize tasks', 'Create a schedule', 'Limit distractions', 'Delegate when possible', 'Take breaks', 'Use time management tools']


Nice! So we got the model to give us a skeleton of the output.
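Because `gptCall_json` returns an empty string when the API call fails, passing its result straight to `json.loads` would raise an exception. The sketch below shows one way to parse the skeleton defensively; `parse_skeleton` is a hypothetical helper, not part of the notebook's class.

```python
import json

def parse_skeleton(raw: str) -> list[str]:
    """Return the skeleton points, or an empty list if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # e.g. the "" that gptCall_json returns after an exception
    points = data.get("answer") if isinstance(data, dict) else None
    if not isinstance(points, list):
        return []
    # Keep only non-empty strings and trim stray whitespace
    return [p.strip() for p in points if isinstance(p, str) and p.strip()]

print(parse_skeleton('{"answer": ["Set goals", " Take breaks "]}'))
print(parse_skeleton(""))
```

An empty list is a convenient failure value here because the caller can simply skip the elaboration stage when there is nothing to expand.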

### Prompt to elaborate on a point

In [9]:
point = result[0]
point_prompt = f'''
You help elaborate on the point user wants. Your input is a question and one possible answer from the question, also called <point>. You will elaborate on the <point> and give a 2-3 sentence response
on how the <point> helps answer the question. Start your response by mentioning the <point> and then colon like point: and then your response
Your response will be in JSON format. Example: {{"answer": "{point}: your response"}}
\n\nNow, please elaborate on the following point. Question: {question}\n <Point> : {point} \n Response: {{"answer": "..."}}
'''

In [10]:
TEMPERATURE=0.3
message=[]

message.append({"role": "system", "content": "You are a helpful assistant. You respond in JSON format."})
message.append({"role": "user", "content": point_prompt})


gpt4_turbo = Gpt4Turbo()
result = gpt4_turbo.gptCall_json(TEMPERATURE,message)
print(result)

{"answer": "Set goals: Setting goals is crucial for improving time management skills as it provides a clear direction and purpose for your tasks. By setting specific, measurable, achievable, relevant, and time-bound (SMART) goals, you can prioritize your activities, stay focused, and track your progress effectively."}


## Putting both together including concurrent calls

In [11]:
import concurrent.futures
import json

In [13]:
class Gpt4Turbo:
    def __init__(self):
        self.MODEL = 'gpt-3.5-turbo-1106'  # note: despite the class name, this calls GPT-3.5 Turbo
        self.TOKEN_LIMIT=4000
        self.client = OpenAI()
        self.temperature =0.3
        self.streaming = False

    def gptCall_json(self, temperature, messages: list):
        try:
            response = self.client.chat.completions.create(model=self.MODEL,
                                                    messages=messages,
                                                    temperature=temperature,
                                                    max_tokens=self.TOKEN_LIMIT,
                                                    stream=False,
                                                    response_format={"type": "json_object"}) ## Enforce output format


            return response.choices[0].message.content

        except Exception as e:
            print(e)
            return ""

    def generate_skeleton(self):
        question = self.question
        outline_prompt = f'''
        You're an organizer responsible for only giving the skeleton (not the full content) for answering the question.
        Provide the skeleton as a JSON to answer the question. Instead of writing a full sentence, each skeleton point should
        be very short with only 2~5 words. Generally, the skeleton should have 3~10 points. The skeleton is an outline that would be expanded later.
        Don't elaborate on the point in the skeleton.
        Example:
        \n\nQuestion:\nWhat are the typical types of Chinese dishes?: \n Response: {{"answer" : ["Dumplings" , "Noodles" , "Dim Sum" , "Hot Pot" , "Wonton", "Ma Po Tofu", "Char Siu", " Fried Rice"]}}.
        \n\nQuestion:\nWhat are some practical tips for individuals to reduce their carbon emissions?\n Response: {{ "answer" :["Energy Conservation", "Efficient transportation", "Home Energy Efficiency", "Reduce Water Consumption", "Sustainable Diet", "Sustainable Travel"]}}

        \n\nNow, please provide the skeleton for the following question.\n{question}\n Response: {{"answer": [...]}}
        '''

        ## Make the message
        message=[]
        message.append({"role": "system", "content": "You are a helpful assistant. You respond in JSON format."})
        message.append({"role": "user", "content": outline_prompt})

        result = self.gptCall_json(self.temperature, message)
        result = json.loads(result)
        self.result = result['answer']


    def elaborate_point(self, point):

        question = self.question

        point_prompt = f'''
        You help elaborate on the point user wants. Your input is a question and one possible answer from the question, also called <point>. You will elaborate on the <point> and give a 2-3 sentence response
        on how the <point> helps answer the question. Start your response by mentioning the <point> and then colon like point: and then your response
        Your response will be in JSON format. Example: {{"answer": "{point}: your response"}}
        \n\nNow, please elaborate on the following point. Question: {question}\n <Point> : {point} \n Response: {{"answer": "..."}}
        '''

        ## Make the message
        message=[]
        message.append({"role": "system", "content": "You are a helpful assistant. You respond in JSON format."})
        message.append({"role": "user", "content": point_prompt})

        result = self.gptCall_json(self.temperature, message)
        point_elaborate = json.loads(result)
        return point_elaborate['answer']


    def concurrent_results(self, question):
        self.question = question
        self.generate_skeleton()
        num_points = len(self.result)
        # Create a thread pool with one worker per skeleton point
        with concurrent.futures.ThreadPoolExecutor(max_workers=num_points) as executor:
            # Submit one elaboration call per point to the executor
            outputs = [executor.submit(self.elaborate_point, point) for point in self.result]
            # Collect results as each call finishes (completion order, which may differ from skeleton order)
            results = [future.result() for future in concurrent.futures.as_completed(outputs)]

        # Use list comprehension to add enumeration and "\n" each record
        string_list = [f"{i+1}. {record}\n" for i, record in enumerate(results)]

        # Join the string_list elements into a single string
        final_output = ''.join(string_list)
        return final_output
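`as_completed` yields futures in the order they finish, so the numbered list built in `concurrent_results` may not follow the skeleton's order. If the original order matters, one option is to read the futures in submission order instead. The sketch below illustrates this with `slow_echo`, a hypothetical stand-in for `elaborate_point` in which later points deliberately finish first.

```python
import concurrent.futures
import time

def slow_echo(item):
    # Hypothetical stand-in for elaborate_point: later points finish sooner
    index, point = item
    time.sleep(0.05 * (3 - index))
    return point

points = ["Set goals", "Prioritize tasks", "Take breaks"]
with concurrent.futures.ThreadPoolExecutor(max_workers=len(points)) as executor:
    futures = [executor.submit(slow_echo, item) for item in enumerate(points)]
    # Iterating the futures list (not as_completed) keeps submission order
    ordered = [future.result() for future in futures]

print(ordered)
```

The calls still run concurrently; only the order in which results are read changes, so the total wall time stays close to that of the slowest call.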

In [14]:
%%time
gpt4_turbo = Gpt4Turbo()
question = "How do I best manage my time?"
result_sot = gpt4_turbo.concurrent_results(question)
print(result_sot)

1. Using productivity tools can help you manage your time more effectively by providing features such as task lists, reminders, and time tracking. These tools can help you prioritize tasks, set deadlines, and track your progress, ultimately leading to better time management and increased productivity.
2. Set goals: Setting goals helps you prioritize your tasks and allocate time effectively. By defining clear objectives, you can focus on activities that align with your goals, leading to better time management and productivity.
3. Take breaks: Taking breaks can actually improve productivity and time management. By giving yourself short breaks, you can recharge and maintain focus, ultimately leading to better time management and efficiency in completing tasks.
4. Prioritize tasks: Prioritizing tasks helps you focus on the most important and urgent activities, allowing you to allocate your time and resources effectively. By identifying and tackling high-priority tasks first, you can ensure

In [15]:
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
num_tokens = len(encoding.encode(result_sot))
print(f"Number of tokens: {num_tokens}")

Number of tokens: 420


In one run, it took 4.7s to generate 360 tokens; the run captured above produced 420 tokens.



### Baseline: a single ChatGPT call

In [18]:
%%time
gpt4_turbo = Gpt4Turbo()

message=[]
message.append({"role": "system", "content": "You are a helpful assistant. Respond in JSON format"})
message.append({"role": "user", "content": f'Answer the user question below as a LONG answer of at least 8 sentences. Give the answer in bullets.  Question: {question}. Answer: {{"answer" : ...}}'})

single_result = gpt4_turbo.gptCall_json(temperature=0.3, messages=message)
single_result = json.loads(single_result)
print(single_result['answer'])

['Set clear goals and prioritize tasks based on their importance and deadlines', 'Create a daily schedule or to-do list to stay organized and focused', 'Use time management tools such as calendars, planners, or apps to track your activities', 'Break large tasks into smaller, manageable chunks to avoid feeling overwhelmed', 'Minimize distractions by setting specific times for checking emails and social media', 'Delegate tasks when possible to free up time for more important responsibilities', 'Take regular breaks to avoid burnout and maintain productivity', 'Evaluate your time management regularly and make adjustments as needed to improve efficiency']
CPU times: user 13.4 ms, sys: 4.86 ms, total: 18.2 ms
Wall time: 2.91 s


In [19]:
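import tiktoken
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
# The answer came back as a list of bullets, so join it into one string before encoding
answer = single_result['answer']
answer_text = "\n".join(map(str, answer)) if isinstance(answer, list) else str(answer)
num_tokens = len(encoding.encode(answer_text))
print(f"Number of tokens: {num_tokens}")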

In one run, the single call took 3.6s to generate 356 tokens; the run captured above finished in 2.91 s of wall time.
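To compare the two approaches on equal terms, it can help to convert each timing into tokens per second. The sketch below does that arithmetic using only the figures quoted in the text (4.7s for 360 tokens and 3.6s for 356 tokens); `tokens_per_second` is a hypothetical helper.

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    # Simple throughput: generated tokens divided by wall-clock time
    return num_tokens / seconds

sot = tokens_per_second(360, 4.7)     # Skeleton of Thought figures from the text
single = tokens_per_second(356, 3.6)  # single-call figures from the text
print(f"SoT: {sot:.1f} tokens/s, single call: {single:.1f} tokens/s")
# prints "SoT: 76.6 tokens/s, single call: 98.9 tokens/s"
```

One timing per method is a small sample, and the Skeleton of Thought figure includes the extra skeleton call, so repeated runs with longer answers would give a fairer picture.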