# Summary

Toying around with a custom pdb class for language model-assisted debugging.

TODO

- [x] test prompt in playground (maybe exclude the "full source" kwarg?)
- [x] port prompt to yaml file
- [x] enable load_prompt/kwargs etc in LMdb init
- [x] consider how we filter locals and globals (currently filter out everything w/ a leading underscore and also do some rather clumsy filtering to make sure global is used in script. But might be able to do better here.)
- [x] consider whether to rm some fields (header, globals, code_full) from get_prompt_kwargs method OR include them in prompt
- consider if there's a good way to make this more conversational in case we need to ask multiple questions. If we just print gpt's response, this won't work so well. Could try to revise this to fit into ConvManager paradigm.
- consider tweaking prompt to use proxy/authority (e.g. "Answer Key")
- consider adding option for "I don't know"
    - Or maybe something like "If you don't know what's causing the bug, say "I don't know". Then write a list of 5 plausible causes that the developer can check for when debugging." (take advantage of its strength at generating list, thinking of possibilities we might not)
- consider how to handle huge data structures (big df, long list, etc.)
~ - See if we can get this to work like ipdb where you can call it only AFTER an error occurs.
- hide user warning about using codex model name.
- debug slowness when using magic (is it calling query multiple times?)
~ - add option to add new cell w/ gpt-fixed function below (may need to adjust prompt a bit to encourage it to provide this)
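
One possible approach to the "huge data structures" item above is to cap the repr of each variable before it goes into the prompt. The sketch below uses the standard library's `reprlib`; the helper names `truncated_repr` and `summarize_vars` and the default limits are made up for illustration and are not part of this notebook's code.

```python
import reprlib


def truncated_repr(obj, max_chars=200, max_items=6):
    """Return a repr of obj that is capped in both item count and length."""
    short = reprlib.Repr()
    # Limit how many elements of common containers are shown.
    short.maxlist = short.maxtuple = short.maxset = short.maxdict = max_items
    # Limit the length of strings and of any other object's repr.
    short.maxstring = short.maxother = max_chars
    text = short.repr(obj)
    # Nested containers can still exceed the budget, so apply a hard cap too.
    if len(text) > max_chars:
        text = text[:max_chars] + '...'
    return text


def summarize_vars(variables, **kwargs):
    """Map each variable name to a truncated repr (hypothetical helper)."""
    return {name: truncated_repr(val, **kwargs)
            for name, val in variables.items()}


print(summarize_vars({'nums': list(range(1000)), 'z': 100}))
```

Objects like dataframes fall under `maxother`, so their repr is cut down to its start and end. A dedicated summary (shape, dtypes, head) would probably be more useful there, but that is left open here.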

UPDATE: Something weird going on here. The OpenAI response sometimes looks normal, sometimes very weird (like the function was called many times repeatedly - maybe some multiproc/multithreaded thing happening?). When I tried hardcoding other backends (search "partial" or see DebugMagic.lmdb method), the reply appears to be empty. However, the global var `_roboduck_last_completion` gets updated with the expected response. Might be related to the sys.displayhook usage in the self.shell.debugger call (uncomment the source.getlines calls in the DebugMagic.lmdb method).

UPDATE 2: sometimes just need to restart kernel. mock/repeat backends now work as expected.

- maybe update prompt(s) to indicate that we are inside a debugger? Otherwise it might be confusing -  if all locals are params, it might seem like we're just telling gpt3 the args.
    - should we be passing in 1 code snippet but a whole sequence of states? That might be better.
- Think more about whether main use case is error explanation (in which case customized stack trace like pretty_errors might make more sense), natural language debugging (in which case we want to focus more on the conversational/sequential nature, maintain series of states, etc.), or static analysis (in which case a jupyter extension or magic that lets us type questions might be ideal).

NOTES

Considerations on how to enter qa mode:

Option 1: Launch some sort of REPL here, then let the user type
natural language questions until they want to exit. This would be
nice but maybe a bit tricky - seems like pdb may already use
prompt_toolkit, because calling prompt here throws an error
indicating we're already in an event loop.

Option 2: Prefix every question with "chat" or some command like "Q:".
Have to check if that's possible.

Option 3: try to override default action selection so that if we
type something that looks like natural language rather than a couple
variable names (maybe something ending in or containing a question 
mark) we query gpt instead of trying to eval vars.
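
Options 2 and 3 can be sketched on a plain `cmd.Cmd` subclass (the base class pdb builds on). This is a toy loop, not the debugger itself: `QuestionRouter`, the `ask` callable, and the `(would eval)` placeholder are all hypothetical names for illustration. Lines starting with `?` are deliberately left alone, since `cmd.Cmd` treats a leading `?` as shorthand for `help`.

```python
import cmd
import io


class QuestionRouter(cmd.Cmd):
    """Toy command loop showing two ways to route questions to a model."""

    def __init__(self, ask, **kwargs):
        super().__init__(**kwargs)
        # Hypothetical callable: takes a question str, returns an answer str.
        self.ask = ask

    def do_ask(self, arg):
        """Option 2: explicit prefix, e.g. `ask why is x None`."""
        self.stdout.write(self.ask(arg) + '\n')

    def onecmd(self, line):
        """Option 3: treat lines that end in '?' as questions."""
        stripped = line.strip()
        # Leave a leading '?' alone so the built-in help shortcut still works.
        if stripped.endswith('?') and not stripped.startswith('?'):
            return self.do_ask(stripped)
        return super().onecmd(line)

    def default(self, line):
        # Stand-in for pdb's behavior of evaluating unrecognized input.
        self.stdout.write(f'(would eval) {line}\n')


buf = io.StringIO()
router = QuestionRouter(lambda q: f'ANSWER: {q}', stdout=buf)
router.onecmd('why is x None?')   # option 3: question mark heuristic
router.onecmd('ask what is j')    # option 2: explicit command
router.onecmd('nums')             # falls through to normal handling
print(buf.getvalue())
```

Checking only the end of the line is a bit stricter than a plain `'?' in line` test, which would also catch expressions such as string literals containing a question mark.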

In [1]:
import cmd
from colorama import Fore, Style
from contextlib import redirect_stdout
from functools import partial
import hashlib
import inspect
from IPython.display import display, Javascript
from IPython.core.magics import NamespaceMagics
from IPython.core.magic import cell_magic, line_cell_magic, line_magic, \
    magics_class, Magics, no_var_expand
from IPython.core.magic_arguments import argument, magic_arguments, \
    parse_argstring
import ipynbname
import json
import pandas as pd
from pdb import Pdb
from prompt_toolkit import prompt
import pyperclip
import sys
import time
import warnings

from htools import *
from jabberwocky.openai_utils import GPT, load_prompt, GPTBackend

Object loaded from /Users/hmamin/jabberwocky/data/misc/gooseai_sample_responses.pkl.


In [2]:
def save_notebook(file_path):
    """Adapted from
    https://stackoverflow.com/questions/32237275/save-an-ipython-notebook-programmatically-from-within-itself/57814673#57814673
    """
    def file_md5(path):
        with open(path, 'rb') as f:
            text = f.read()
        return hashlib.md5(text).hexdigest()
    
    start_md5 = file_md5(file_path)
    display(Javascript('IPython.notebook.save_checkpoint();'))
    current_md5 = start_md5
    
    while start_md5 == current_md5:
        time.sleep(1)
        current_md5 = file_md5(file_path)

In [3]:
# Adapted from cli.ReadmeUpdater method.
def load_ipynb(path, save_if_self=True):
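    """Loads ipynb and formats cells into 1 big string.

    Parameters
    ----------
    path: Path
        Path to the notebook file.
    save_if_self: bool
        If True and `path` is the notebook currently running, save it first
        so the loaded source reflects the latest edits.

    Returns
    -------
    str
        Source of all non-empty cells, with code cells wrapped in triple
        backticks.
    """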
    if save_if_self:
        try:
            self_path = ipynbname.path()
        except FileNotFoundError:
            pass
        else:
            if self_path == path:
                save_notebook(path)

    with open(path, 'r') as f:
        cells = json.load(f)['cells']
        
    cell_str = ''
    for cell in cells:
        if not cell['source']: continue
        source = '\n' + ''.join(cell['source']) + '\n'
        if cell['cell_type'] == 'code':
            source = '\n```' + source + '```\n'
        cell_str += source
    return cell_str

In [4]:
def colored(text, color):
    color = getattr(Fore, color.upper())
    return f'{color}{text}{Style.RESET_ALL}'

In [5]:
{name: is_ipy_name(name)
 for name in ('_1', '_99', '_', '__', '_1_', '_a', '__1')}

{'_1': True,
 '_99': True,
 '_': True,
 '__': True,
 '_1_': False,
 '_a': False,
 '__1': True}

In [6]:
# Set new var on line below and do NOT save.
qqq = 'xcz,vl lzvjxc'
tmp = load_ipynb(ipynbname.path())
assert qqq in tmp

<IPython.core.display.Javascript object>

In [7]:
# Set new var on line below and do NOT save. If you don't change the var, the
# test will generally fail bc a previous version of the nb will have had the
# var value.
qqq = 'zazzzzzzzz eoiqur wqopasdfasferu'
tmp = load_ipynb(ipynbname.path(), save_if_self=False)
assert qqq not in tmp

In [48]:
class RoboDuckDB(Pdb):
    
    def __init__(self, *args, backend='openai', model=None, 
                 full_context=False, log=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.prompt = '>>> '
        self.duck_prompt = '[RoboDuck] '
        self.gpt = GPTBackend(log_stdout=False)
        # TODO: this does seem to remove the handler from handlers but there
        # must be some other trace of it because we still log to stdout.
        self.gpt.handlers = [handler for handler in self.gpt.logger.handlers 
                             if 'stdout' not in str(handler)]
        self.query_kwargs = load_prompt(
            'debug_full' if full_context else 'debug', 
            verbose=False
        )
        self.prompt_template = self.query_kwargs.pop('prompt')
        if model is not None:
            self.query_kwargs['model'] = model
        self.backend = backend
        self.full_context = full_context
        self.log = log
        self._last_completion = ''
    
    def _get_prompt_kwargs(self):
        res = {}
        
        # Get current code snippet.
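        try:
            res['code'] = inspect.getsource(self.curframe)
        except OSError as err:
            # Fall back to an empty snippet so the res['code'] lookup below
            # doesn't raise a KeyError when the source is unavailable.
            res['code'] = ''
            self.error(err)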
        res['local_vars'] = {k: v for k, v in self.curframe_locals.items() 
                             if not is_ipy_name(k)}
            
        # Get full source code if necessary.
        if self.full_context:            
            # File is a string, either a file name or something like
            # <ipython-input-50-e97ed612f523>.
            file = inspect.getsourcefile(self.curframe.f_code)
            if file.startswith('<ipython'):
                res['full_code'] = load_ipynb(ipynbname.path())
                res['file_type'] = 'jupyter notebook'
            else:
                res['full_code'] = load(file, verbose=False)
                res['file_type'] = 'python script'
            used_tokens = set(res['full_code'].split())
        else:   
            # This is intentionally different from the used_tokens line in the
            # if clause - we only want to consider local code here.
            used_tokens = set(res['code'].split())
            
        # TODO: code.split() might not work so well in some cases.
        # Namespace is often polluted with lots of unused globals (htools is
        # very much guilty of this 😬) and we don't want to clutter up the 
        # prompt with these.
        res['global_vars'] = {k: v for k, v in self.curframe.f_globals.items() 
                              if k in used_tokens and not is_ipy_name(k)}
        return res

    def onecmd(self, line):
        """Interpret the argument as though it had been typed in response
        to the prompt.
        Checks whether this line is typed at the normal prompt or in
        a breakpoint command list definition.
        """
        if not self.commands_defining:
            if '?' in line:
                return self.ask_language_model(line)
            return cmd.Cmd.onecmd(self, line)
        else:
            return self.handle_command_def(line)
        
    def ask_language_model(self, question):
        # TODO: maybe should reconstruct each time q is asked? State changes,
        # that's the whole point of this debugger.
        prompt_kwargs = self._get_prompt_kwargs()
        prompt = self.prompt_template.format(question=question, 
                                             **prompt_kwargs)
        if len(prompt.split()) > 1_000:
            warnings.warn(
                'Prompt is very long (>1k words). You\'re approaching a risky'
                ' zone where your prompt + completion might exceed the max '
                'sequence length.'
            )
        # TODO rm
        print(colored(prompt, 'red'))
        
        # TODO: could we somehow use convmanager here? Given that I envisioned
        # this as a conversation with the kernel/interpreter/script/something.
        # TODO: maybe add option in gpt.query to avoid printing to stdout. For
        # now, just use redirect_stdout here to see what result will look 
        # like.
        # TODO: temporarily disabled logging.
        print(colored('Typing...', 'green'), end='\r')
        with self.gpt(self.backend, verbose=False):
            res, full = self.gpt.query(prompt, **self.query_kwargs, 
                                       log=self.log)
        answer = res[0].strip() or 'Sorry, I don\'t know. Can you try '\
            'rephrasing your question?'
        print(colored(f'{self.duck_prompt} {answer}', 'green'))
        
        # TODO: when called from magic, ipython seems to delete reference to 
        # this obj so for now store it as a global var so we can try inserting
        # a new cell.
        self._last_completion = answer
        global _roboduck_last_completion
        _roboduck_last_completion = answer
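
Regarding the TODO about `code.split()` in `_get_prompt_kwargs`: whitespace splitting misses names that touch punctuation (e.g. `print(z)` yields the single token `print(z)`). One alternative is to parse the snippet and collect identifier nodes. This is a sketch under the assumption that the snippet is valid Python once dedented; `names_used` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
import ast
import textwrap


def names_used(source):
    """Return the set of variable names referenced in a code snippet."""
    try:
        # Dedent first because source pulled from a method is indented.
        tree = ast.parse(textwrap.dedent(source))
    except SyntaxError:
        # Fall back to plain whitespace splitting if parsing fails.
        return set(source.split())
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


snippet = '''
    def scale(nums):
        return [n * z for n in nums]
'''
print(names_used(snippet))
print('z' in set('print(z)'.split()), 'z' in names_used('print(z)'))
```

Attribute names (the `b` in `a.b`) and function definition names are not `ast.Name` nodes, so only the base objects show up, which seems like the right granularity for deciding which globals to include.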

In [49]:
@magics_class
class DebugMagic(Magics):

    @magic_arguments()
    @argument('-i', action='store_true', 
              help='Boolean flag: if provided, INSERT a new code cell with '
                   'the suggested code fix.')
    @line_magic
    def duck(self, line='', cell=None):
        """Open the RoboDuckDB debugger on the most recent error. The -i
        flag inserts a new code cell containing the model's last completion.
        """
        args = parse_argstring(self.duck, line)
        cls = self.shell.debugger_cls
        # TODO: change partial back to just RoboDuckDB
        self.shell.debugger_cls = self.shell.InteractiveTB.debugger_cls = partial(
            RoboDuckDB, backend='openai', log=True)
#         print(inspect.getsource(self.shell.debugger))
#         hr()
#         print(inspect.getsource(self.shell.InteractiveTB.debugger))
#         hr()
#         print(self.shell.InteractiveTB.debugger_cls)
#         print(self.shell.InteractiveTB.pdb)

        print('pdb:', self.shell.pdb)
        self.shell.debugger(force=True)
        print('pdb:', self.shell.pdb)
        if args.i:
#             self.shell.set_next_input(self.shell.pdb._last_completion, 
#                                       replace=False)
            self.shell.set_next_input(_roboduck_last_completion, 
                                      replace=False)
        self.shell.debugger_cls = self.shell.InteractiveTB.debugger_cls = cls
        
get_ipython().register_magics(DebugMagic)

In [50]:
def roboduck(backend='openai', model=None):
    # Equivalent of native breakpoint().
    RoboDuckDB(backend=backend, model=model).set_trace(sys._getframe().f_back)

In [51]:
def foo(x):
    for i in range(x):
        roboduck()
        print(2 / (i - 3))

In [52]:
# def bubble_sort(nums):
#     for i in range(len(nums)):
#         for j in range(len(nums)):
#             if nums[j] > nums[j + 1]:
#                 nums[j], nums[j + 1] = nums[j + 1], nums[j]
#             roboduck()
#     return nums

In [53]:
def bubble_sort(nums):
    for i in range(len(nums)):
        for j in range(len(nums) - 1):
            if nums[j] > nums[j + 1]:
                nums[j + 1] = nums[j]
                nums[j] = nums[j + 1]
            roboduck()
    return nums

In [54]:
nums_ = [9, 9, 9]

In [55]:
# def bubble_sort(nums):
#     for i in range(len(nums)):
#         for j in range(len(nums) - 1):
#             if nums[j] > nums[j + 1]:
#                 nums[j], nums[j + 1] = nums[j + 1], nums[j]
# #             roboduck()
#     return nums_

In [56]:
# Set some globals.
z = 100
a = ['a', 'b', 'c']

In [57]:
print('This is some output.')

This is some output.


In [None]:
bubble_sort([5, 2, 4, 4, 3, 1, 9, 17, 7])

[27] > <ipython-input-53-eaceea408aaf>(3)bubble_sort()
-> for j in range(len(nums) - 1):
   1 frame hidden (try 'help hidden_frames')
>>> nums
[5, 5, 4, 4, 3, 1, 9, 17, 7]
>>> Why does nums contain multiple 5s when the input has 1?
"""
This code snippet is not working as expected. Help me debug it. First read my question, then examine the snippet of code that is causing the issue and look at the values of the local and global variables. Ignore the roboduck() function call - it is merely for debugging. Finally, explain what the problem is and how to fix it. If you don't know what the problem is, list a few possible causes or things I could try in order to identify the issue. Use simple language because I am a beginning programmer.

QUESTION:
Why does nums contain multiple 5s when the input has 1?

CURRENT CODE SNIPPET:
def bubble_sort(nums):
    for i in range(len(nums)):
        for j in range(len(nums) - 1):
            if nums[j] > nums[j + 1]:
        

In [69]:
# Uncomment roboduck() line in func def cell before running this one.
buggy_sort([5, 2, 4, 4, 3, 1, 9, 17, 7])

IndexError: list index out of range

In [42]:
# Re-comment the chat_db() line.
buggy_sort([5, 2, 4, 4, 3, 1, 9, 17, 7])

IndexError: list index out of range

In [23]:
# Note: couldn't get cell magic version working so far. Says:
# "UsageError: %%lmdb is a cell magic, but the cell body is empty. Did you
# mean the line magic %lmdb (single %)?"
# Even when I try to define the method with all the same settings as the 
# default class.
%duck -i

pdb: False
[1] > <ipython-input-16-2c5183ef161b>(4)buggy_sort()
-> if nums[j] > nums[j + 1]:
[RoboDuck]j
8
[RoboDuck]Why did this code work for the first 8 iterations but only failed on the 9th?
This code snippet is not working as expected. Help the developer debug it. First read their question, then examine the snippet of code that is causing the issue and look at the values of the local and global variables. Ignore the roboduck() function call - it is merely for debugging. Finally, explain what the problem is and how to fix it. If you don't know what the problem is, list a few possible causes or things the developer could try in order to narrow in on the issue. Use simple language a beginning programmer could understand.

QUESTION:
Why did this code work for the first 8 iterations but only failed on the 9th?

CURRENT CODE SNIPPET:
def buggy_sort(nums):
    for i in range(len(nums)):
        for j in range(len(nums)):
            if nums[j] > nums[j + 1]:
   

  f'Allowing model "{model}" to pass through because '


[RoboDuck] The problem is that the second for loop is iterating over the entire list, including the last element. This causes an error because the last element has no element after it to compare to. The fix is to change the second for loop to iterate over the list up to the second to last element.

def buggy_sort(nums):
    for i in range(len(nums)):
        for j in range(len(nums) - 1):
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
#             roboduck()
    return nums

QUESTION:
Why did this code work for the first 8 iterations but only failed on the 9th?

CURRENT CODE SNIPPET:
def buggy_sort(nums):
[RoboDuck]q
pdb: False


In [None]:
The problem is that the second for loop is iterating over the entire list, including the last element. This causes an error because the last element has no element after it to compare to. The fix is to change the second for loop to iterate over the list up to the second to last element.

def buggy_sort(nums):
    for i in range(len(nums)):
        for j in range(len(nums) - 1):
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
#             roboduck()
    return nums

QUESTION:
Why did this code work for the first 8 iterations but only failed on the 9th?

CURRENT CODE SNIPPET:
def buggy_sort(nums):

In [None]:
The code is trying to access an index that doesn't exist. The index error is raised on the line with the if statement. The problem is that the code is trying to access nums[j + 1] when j is equal to 8. The last index in the list is 7, so there is no index 8.

The code should be fixed by changing the range of the inner for loop to range(len(nums) - 1). This will prevent the code from trying to access an index that doesn't exist.

In [None]:
The problem is that the range of the second for loop is len(nums), which is 9. The last iteration of the loop will be when j is 8, which means that nums[j + 1] will be nums[9], which is out of range.

The fix is to change the range of the second for loop to range(len(nums) - 1).