# Summary

Trying to figure out how ConversationManager will work. This has been hard to plan up front, so it may be best to try one approach, see what issues arise, and fix them from there. Start by subclassing PromptManager.

In [1]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [212]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from pathlib import Path
import re

from jabberwocky.config import C
from jabberwocky.external_data import wiki_data, _wiki_text_cleanup
from jabberwocky.openai_utils import load_prompt, load_openai_api_key, \
    PromptManager, print_response, query_gpt3
from htools import *

In [3]:
cd_root()

Current directory: /Users/hmamin/jabberwocky


In [4]:
@auto_repr
class ConversationManager(PromptManager):
    
    def __init__(self, verbose=True, log_dir='data/logs'):
        super().__init__('conversation', verbose=verbose, log_dir=log_dir)
        
    def query(self, text, debug=False, extra_kwargs=None, **kwargs):
        # TODO: reconsider later but trying to simplify things for now.
        assert not extra_kwargs or \
            all(arg not in extra_kwargs for arg in ('stream', 'return_full'))
        assert all(arg not in kwargs for arg in ('stream', 'return_full'))
        
        prompt, resp = super().query('conversation', text, debug=debug, 
                                     extra_kwargs=extra_kwargs, **kwargs)
        return prompt, resp

In [5]:
conv = ConversationManager()
conv

conversation: Your message must start with 'Hi {name}.'. Might want to try tweaking frequency penalty.
-------------------------------------------------------------------------------



ConversationManager(verbose=True, log_dir='data/logs')

In [6]:
conv.prompts

{'conversation': {'engine_i': 3,
  'frequency_penalty': 0.1,
  'max_tokens': 250,
  'prompt': 'This is a conversation with {name}. {summary}\n\nMe: {message}\n\n{name}:',
  'stop': ['Me:', 'This is a conversation with'],
  'temperature': 0.5}}

In [7]:
txt = 'Hi Michael Jordan. Who are some athletes from other sports who you ' \
      'most respect, and why?'
prompt, resp = conv.query(txt)

Writing data to data/logs/query_kwargs.json.


In [10]:
print_response(prompt, resp)

This is a conversation with Michael Jordan. Michael Jeffrey Jordan (born February 17, 1963), also known by his initials MJ, is an American businessman and former professional basketball player. He is the principal owner and chairman of the Charlotte Hornets of the National Basketball Association (NBA) and of 23XI Racing in the NASCAR Cup Series. He played 15 seasons in the NBA, winning six championships with the Chicago Bulls. His biography on the official NBA website states: "By acclamation, Michael Jordan is the greatest basketball player of all time." He was integral in helping to popularize the NBA around the world in the 1980s and 1990s, becoming a global cultural icon in the process.Jordan played college basketball for three seasons under coach Dean Smith with the North Carolina Tar Heels. As a freshman, he was a member of the Tar Heels' national championship team in 1982. Jordan joined the Bulls in 1984 as the third overall draft pick, and quickly emerged as a league star, e

In [14]:
conv.kwargs('conversation')

{'engine_i': 3,
 'frequency_penalty': 0.1,
 'logprobs': None,
 'max_tokens': 250,
 'mock': False,
 'mock_func': None,
 'mock_mode': 'raise',
 'return_full': False,
 'stop': ['Me:', 'This is a conversation with'],
 'stream': False,
 'strip_output': True,
 'temperature': 0.5}

In [13]:
# Minor Issue: kwargs method expects 'task' param but I'd like that to be 
# automatic. But if we create some sort of partial method there then 
# PromptManager.query() will break.
inspect.signature(conv.kwargs)

<Signature (task, fully_resolved=True, return_prompt=False, extra_kwargs=None, **kwargs)>
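One possible workaround, sketched with stand-in classes because PromptManager's internals are not shown here: override `kwargs` with a default value for `task` rather than binding it with a partial. The parent can still pass `task` positionally, so its `query()` keeps working. `BasePromptManager`, its `prompts` contents, and `ConversationKwargsDemo` are hypothetical names used only for illustration.

```python
class BasePromptManager:
    """Hypothetical stand-in for PromptManager; not the real class."""

    def __init__(self):
        self.prompts = {'conversation': {'temperature': 0.5}}

    def kwargs(self, task, **kwargs):
        return {**self.prompts[task], **kwargs}

    def query(self, task, text, **kwargs):
        # The parent passes `task` positionally, so any override must still
        # accept it as the first argument.
        return text, self.kwargs(task, **kwargs)


class ConversationKwargsDemo(BasePromptManager):

    def kwargs(self, task='conversation', **kwargs):
        # The default makes `task` optional for callers while staying
        # compatible with the parent's positional call.
        return super().kwargs(task, **kwargs)


demo = ConversationKwargsDemo()
no_task = demo.kwargs(temperature=0.9)
_, via_query = demo.query('conversation', 'Hi.')
print(no_task, via_query)
```

Whether this is enough depends on how the real `PromptManager.query()` calls `kwargs()`, which would need checking.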

First task  
- construct person-specific prompt  
base_prompt = "This is a conversation with {Michael Jordan}. {wiki bio}"

Second task  
- query w/ user text  
GET gpt3_response  
UPDATE running_prompt := base_prompt + 'Me: {user_input}' + gpt3_response  
RETURN running_prompt
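A minimal sketch of the two tasks above, assuming a placeholder `fake_gpt3` function in place of the real API call. `build_base_prompt`, `add_turn`, and `fake_gpt3` are hypothetical helpers, and the bio string is a placeholder rather than real wiki text.

```python
BASE_TEMPLATE = 'This is a conversation with {name}. {summary}'


def build_base_prompt(name, summary):
    # First task: construct the person-specific prompt.
    return BASE_TEMPLATE.format(name=name, summary=summary)


def add_turn(running_prompt, name, user_input, respond):
    # Second task: append the user's message, query the model, and return
    # the updated running prompt including the response.
    prompt = f'{running_prompt}\n\nMe: {user_input.strip()}\n\n{name}:'
    response = respond(prompt)
    return prompt + ' ' + response.strip()


def fake_gpt3(prompt):
    # Placeholder for the real model call (assumption for illustration).
    return 'Placeholder reply.'


base = build_base_prompt('Michael Jordan', '(wiki bio goes here)')
running = add_turn(base, 'Michael Jordan', 'Hi Michael Jordan. ', fake_gpt3)
running = add_turn(running, 'Michael Jordan', 'What are you working on?',
                   fake_gpt3)
print(running)
```

Each call feeds the previous running prompt back in, so the conversation history accumulates turn by turn.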

In [104]:
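def _wiki_text_cleanup(text):
    # Strip citation markers such as "[1]".
    text = re.sub(r'\[\d*\]', '', text)
    # Non-greedy so we only inspect the first parenthetical (usually the
    # pronunciation/birth info) rather than spanning to the last ")".
    match = re.search(r'\(.*?\)', text)
    if match: 
        match = match.group()
        # Drop everything up to the first semicolon (the pronunciation).
        match_parts = [x for x in match.partition(';') if x]
        if len(match_parts) > 1:
            text = text.replace(match, '(' + match_parts[-1].strip())
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r'\s{2,}', ' ', text)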

In [105]:
txts = [
"Barack Hussein Obama II (bə-RAHK hoo-SAYN oh-BAH-mə;[1] born August 4, 1961) is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, Obama was the first African-American president of the United States. He previously served as a U.S. senator from Illinois from 2005 to 2008 and as an Illinois state senator from 1997 to 2004.",
"Joanne Rowling ( ROH-ling;[1] born 31 July 1965), better known by her pen name J. K. Rowling, is a British author, philanthropist, film producer, television producer, and screenwriter. She is best known for writing the Harry Potter fantasy series, which has won multiple awards and sold more than 500 million copies,[2][3] becoming the best-selling book series in history.[4] The books are the basis of a popular film series, over which Rowling had overall approval on the scripts[5] and was a producer on the final films.[6] She also writes crime fiction under the pen name Robert Galbraith.",
"Michael Herbert Schur[1] (born c.  1975/1976)[1] is an American television producer, writer, and character actor. He was a producer and writer for the comedy series The Office, and co-created Parks and Recreation with Office producer Greg Daniels. He created The Good Place, co-created the comedy series Brooklyn Nine-Nine and was a producer on the series Master of None. He also played Mose Schrute in The Office. In 2021, he co-created a comedy series Rutherford Falls."
]

In [107]:
for t in txts:
    print(_wiki_text_cleanup(t))
    print()

Barack Hussein Obama II (born August 4, 1961) is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, Obama was the first African-American president of the United States. He previously served as a U.S. senator from Illinois from 2005 to 2008 and as an Illinois state senator from 1997 to 2004.

Joanne Rowling (born 31 July 1965), better known by her pen name J. K. Rowling, is a British author, philanthropist, film producer, television producer, and screenwriter. She is best known for writing the Harry Potter fantasy series, which has won multiple awards and sold more than 500 million copies, becoming the best-selling book series in history. The books are the basis of a popular film series, over which Rowling had overall approval on the scripts and was a producer on the final films. She also writes crime fiction under the pen name Robert Galbraith.

Michael Herbert Schur (born c. 1975/1976) is an Am

In [73]:
tmp = 'before (pronounced abc; born 1892; died 1900). Still'
match = re.search(r'\(.*\)', tmp)
if match: 
    match = match.group()
    print(tmp.replace(match, '(' + match.partition(';')[-1].strip()))

before (born 1892; died 1900). Still
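One caveat worth checking: a greedy `\(.*\)` spans from the first `(` to the last `)` in the text. If the first parenthetical has no semicolon but a later one does, the partition-and-replace step deletes everything in between. The sentence below is synthetic, made up only to trigger that case.

```python
import re

# Synthetic example text (not real wiki data).
text = 'A (born 1955) was chairman (CEO; until 2000). Still'

greedy = re.search(r'\(.*\)', text).group()
lazy = re.search(r'\(.*?\)', text).group()

# Same replace step as in the cell above, applied to the greedy match.
greedy_result = text.replace(greedy,
                             '(' + greedy.partition(';')[-1].strip())

print(repr(greedy))
print(repr(lazy))
print(repr(greedy_result))
```

The non-greedy match stops at the first closing parenthesis, so a parenthetical without a semicolon is left alone.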


In [273]:
class ConversationManager:
    img_exts = {'.jpg', '.jpeg', '.png'}
    
    def __init__(self, verbose=True, data_dir='./data'):
        # Set directories for data storage, logging, etc.
        self.verbose = verbose
        self.data_dir = Path(data_dir)
        self.persona_dir = self.data_dir/'conversation_personas'
        self.log_dir = self.data_dir/'logs'
        os.makedirs(self.log_dir, exist_ok=True)
        self.log_path = Path(self.log_dir)/'conversation_query_kwargs.json'
        
        # Load prompt, default query kwargs, and existing personas.
        self._kwargs = load_prompt('conv_proto')
        self._base_prompt = self._kwargs.pop('prompt')
        self.name2summary, self.name2img_path = self.load_personas()
        self.name2base = {}
        self.name2wiki_name = {}
        self.running_prompt = ''
        for k, v in self.name2summary.items():
            self.update_persona_dicts(k, v, self.name2img_path[k])

    def load_personas(self):
        name2summary = {}
        name2img_path = {}
        for path in self.persona_dir.iterdir():
            if not path.is_dir(): continue
            name2summary[path.stem] = load(path/'summary.txt')
            name2img_path[path.stem] = [p for p in path.iterdir() 
                                        if p.suffix in self.img_exts][0]
        return name2summary, name2img_path
        
    def load_persona(self, name, download_if_necessary=False):
        processed_name = self.process_name(name)
        try:
            summary = self.name2summary[processed_name]
            path = self.name2img_path[processed_name]
        except KeyError as e:
            if not download_if_necessary:
                raise e
            summary, path = self.add_persona(name, return_data=True)
        self.running_prompt = self.name2base[processed_name]
        return summary, path
        
    def add_persona(self, name, download=True, return_data=False,
                    strict=True):
        processed_name = self.process_name(name)
        summary, _, img_path = wiki_data(
            name, img_dir=self.persona_dir/processed_name, fname='profile'
        )
        if strict and '(born' not in summary:
            raise ValueError('Expected summary to contain "(born". Summary '
                             f'may be malformed: {summary}')
        save(summary, self.persona_dir/processed_name/'summary.txt')
        self.update_persona_dicts(processed_name, summary, img_path)
        if return_data: return summary, img_path
        
    def update_persona_dicts(self, processed_name, summary, img_path):
        """Helper to update our various name2{something} dicts.
        """
        self.name2wiki_name[processed_name] = self._name_from_summary(summary)
        self.name2summary[processed_name] = summary
        self.name2img_path[processed_name] = img_path
        self.name2base[processed_name] = self._base_prompt.format(
            name=self.name2wiki_name[processed_name],
            summary=summary
        )
        
    def _name_from_summary(self, summary):
        return summary.partition('(born')[0].strip()
        
    def process_name(self, name, inverse=False):
        if inverse:
            return name.replace('_', ' ').title()
        return name.lower().replace(' ', '_')
    
    def personas(self, pretty=True):
        names = list(self.name2base)
        if pretty: return [self.process_name(name, True) for name in names]
        return names
    
    def kwargs(self, name='', fully_resolved=True, return_prompt=False, 
               extra_kwargs=None, **kwargs):
        # Name param should be pretty version, i.e. no underscores. Only 
        # needed when return_prompt is True.
        if 'prompt' in kwargs:
            raise RuntimeError(
                'Arg "prompt" should not be in query kwargs. It will be ' 
                'constructed within this method and passing it in will '
                'override the new version.'
            )
        kwargs = {**self._kwargs, **kwargs}
        for k, v in (extra_kwargs or {}).items():
            v_cls = type(v)
            # Make a new object instead of just using get() or setdefault
            # since the latter two methods both mutate our default kwargs.
            curr_val = v_cls(kwargs.get(k, v_cls()))
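            # Check Mapping first: every Mapping is also Iterable, so the
            # reverse order would call extend() on dicts and fail.
            if isinstance(v, Mapping):
                curr_val.update(v)
            elif isinstance(v, Iterable):
                curr_val.extend(v)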
            else:
                raise TypeError(f'Key {k} has unrecognized type {v_cls} in '
                                '`extra_kwargs`.')
            kwargs[k] = curr_val

        if fully_resolved: kwargs = dict(bound_args(query_gpt3, [], kwargs))
        if name and return_prompt:
            kwargs['prompt'] = self.name2base[self.process_name(name)]
        return kwargs
    
    def query(self, name, text, debug=False, extra_kwargs=None, **kwargs):
        kwargs = self.kwargs(name, fully_resolved=False, return_prompt=True,
                             extra_kwargs=extra_kwargs, **kwargs)
        prompt = self.format_prompt(name, kwargs.pop('prompt'), 
                                    user_text=text)
        if debug:
            print('prompt:\n' + prompt)
            print(spacer())
            print('kwargs:\n', kwargs)
            print(spacer())
            print('fully resolved kwargs:\n',
                  dict(bound_args(query_gpt3, [], kwargs)))
            return
        save({'prompt': prompt, **kwargs}, self.log_path)
        prompt, resp = query_gpt3(prompt, **kwargs)
        # GPT3 prefers prompts that don't end with spaces and query_gpt3()
        # strips output, but we want a space after the colon.
        self.running_prompt = prompt + ' ' + resp
        return prompt, resp
    
    def end_conversation(self):
        self.running_prompt = ''
    
    def format_prompt(self, name, prompt, user_text):
#         return prompt + f'\n\nMe: {user_text.strip()}\n\n'\
#                         f'{self.name2wiki_name[name]}:'
        return prompt + f'\n\nMe: {user_text.strip()}\n\n'\
                        f'{self.process_name(name, inverse=True)}:'
    
    def __contains__(self, name):
        return self.process_name(name) in self.name2base
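The `extra_kwargs` merging inside `kwargs()` can be exercised on its own. The sketch below is a standalone approximation, not the method itself: `merge_extra_kwargs` and the `meta` key are hypothetical, and it only handles lists and mappings. Note that every `Mapping` is also an `Iterable`, so the mapping check has to come first.

```python
from collections.abc import Mapping


def merge_extra_kwargs(defaults, extra_kwargs=None):
    """Extend list/dict defaults with extra values without mutating them."""
    merged = dict(defaults)
    for k, v in (extra_kwargs or {}).items():
        v_cls = type(v)
        # Build a new object so the defaults are never mutated.
        curr_val = v_cls(merged.get(k, v_cls()))
        if isinstance(v, Mapping):
            curr_val.update(v)
        elif isinstance(v, list):
            curr_val.extend(v)
        else:
            raise TypeError(f'Key {k} has unrecognized type {v_cls}.')
        merged[k] = curr_val
    return merged


defaults = {'stop': ['Me:'], 'meta': {'a': 1}}
merged = merge_extra_kwargs(defaults, {'stop': ['\n\n'], 'meta': {'b': 2}})
print(merged)
```

After the call, `defaults` is unchanged and `merged` holds the extended list and the combined dict.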

In [274]:
conv = ConversationManager()
conv

conv_proto: Your message must start with 'Hi {name}.'. Might want to try tweaking frequency penalty.
-------------------------------------------------------------------------------



<__main__.ConversationManager at 0x12a7cdd68>

In [266]:
print_response(
    *conv.query('barack obama', 'What\'s your favorite flavor of coffee?')
)

Writing data to data/logs/conversation_query_kwargs.json.
This is a conversation with Barack Hussein Obama II. Barack Hussein Obama II (born August 4, 1961) is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, Obama was the first African-American president of the United States. He previously served as a U.S. senator from Illinois from 2005 to 2008 and as an Illinois state senator from 1997 to 2004.

Me: What's your favorite flavor of coffee?

Barack Obama:I don't drink coffee.


In [275]:
print_response(
    *conv.query('Robert Sapolsky', 'What\'s your favorite type of music?')
)

Writing data to data/logs/conversation_query_kwargs.json.
This is a conversation with Robert Morris Sapolsky. Robert Morris Sapolsky (born April 6, 1957) is an American neuroendocrinology researcher and author. He is currently a professor of biology, and professor of neurology and neurological sciences and, by courtesy, neurosurgery, at Stanford University. In addition, he is a research associate at the National Museums of Kenya.

Me: What's your favorite type of music?

Robert Sapolsky:I love all kinds of music. I love music that is about the moment, like jazz, and I love music that is about the past, like classical music.


In [276]:
print(conv.running_prompt)

This is a conversation with Robert Morris Sapolsky. Robert Morris Sapolsky (born April 6, 1957) is an American neuroendocrinology researcher and author. He is currently a professor of biology, and professor of neurology and neurological sciences and, by courtesy, neurosurgery, at Stanford University. In addition, he is a research associate at the National Museums of Kenya.

Me: What's your favorite type of music?

Robert Sapolsky: I love all kinds of music. I love music that is about the moment, like jazz, and I love music that is about the past, like classical music.


In [226]:
conv.personas()

['Jk Rowling', 'Barack Obama', 'Robert Sapolsky', 'Bill Gates']

In [169]:
conv.personas(pretty=False)

['jk_rowling', 'barack_obama', 'robert_sapolsky', 'bill_gates']
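The 'Jk Rowling' entry above shows that `process_name` is lossy: lowercasing discards the original capitalization, and `title()` cannot restore it. The sketch below reproduces the round trip with a standalone copy of the function and shows one possible workaround, a hypothetical `pretty_names` dict that remembers the name as first typed.

```python
def process_name(name, inverse=False):
    # Standalone copy of the method above, for illustration.
    if inverse:
        return name.replace('_', ' ').title()
    return name.lower().replace(' ', '_')


round_trip = process_name(process_name('JK Rowling'), inverse=True)

# Hypothetical workaround: store the user-facing name keyed by processed name.
pretty_names = {}


def register(name):
    key = process_name(name)
    pretty_names.setdefault(key, name)
    return key


key = register('JK Rowling')
print(round_trip, '|', pretty_names[key])
```

A lookup table like this would also make it possible to keep `personas(pretty=True)` consistent with what the user entered.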

In [152]:
conv.load_persona('Bill Gates', download_if_necessary=True)

('William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, author, and philanthropist. He is the co-founder of Microsoft Corporation. During his career at Microsoft, Gates held the positions of chairman, chief executive officer (CEO), president and chief software architect, while also being the largest individual shareholder until May 2014. He is considered one of the best known entrepreneurs of the microcomputer revolution of the 1970s and 1980s.',
 PosixPath('data/conversation_personas/bill_gates/profile.jpg'))

In [153]:
conv.load_persona('Barack Obama', download_if_necessary=True)

('Barack Hussein Obama II (born August 4, 1961) is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, Obama was the first African-American president of the United States. He previously served as a U.S. senator from Illinois from 2005 to 2008 and as an Illinois state senator from 1997 to 2004.',
 PosixPath('data/conversation_personas/barack_obama/profile.jpg'))

In [154]:
conv.load_persona('JK Rowling', download_if_necessary=True)

('Joanne Rowling (born 31 July 1965), better known by her pen name J. K. Rowling, is a British author, philanthropist, film producer, television producer, and screenwriter. She is best known for writing the Harry Potter fantasy series, which has won multiple awards and sold more than 500 million copies, becoming the best-selling book series in history. The books are the basis of a popular film series, over which Rowling had overall approval on the scripts and was a producer on the final films. She also writes crime fiction under the pen name Robert Galbraith.',
 PosixPath('data/conversation_personas/jk_rowling/profile.jpg'))

In [156]:
conv.load_persona('Robert Sapolsky', download_if_necessary=True)

Writing data to data/conversation_personas/robert_sapolsky/summary.txt.


('Robert Morris Sapolsky (born April 6, 1957) is an American neuroendocrinology researcher and author. He is currently a professor of biology, and professor of neurology and neurological sciences and, by courtesy, neurosurgery, at Stanford University. In addition, he is a research associate at the National Museums of Kenya.',
 'data/conversation_personas/robert_sapolsky/profile.jpg')

In [159]:
with assert_raises(KeyError):
    _ = conv.load_persona('Michael Phelps', download_if_necessary=False)

As expected, got KeyError('michael_phelps').


'Barack Hussein Obama II'

## OwnerAccess metaclass

Unrelated to the above: prototyping this for the htools.meta module.

In [171]:
import warnings

In [280]:
class OwnerAccess(type):
    
    def __new__(cls, name, bases, methods, **meta_kwargs):
        print('meta kwargs', meta_kwargs)
        class_ = type.__new__(cls, name, bases, methods)
        class_._meta_attrs = meta_kwargs.get('attrs', [])
        return class_
        
    def __call__(cls, *args, **kwargs):
        inst = cls.__new__(cls, *args, **kwargs)
        inst.__init__(*args, **kwargs)
        
        # If user specified some subset of 'attrs' to give owner access, only
        # give those access. Otherwise, try to give all instance attrs access.
        inst_dict = vars(inst)
        inst._meta_attrs = inst._meta_attrs or [k for k in inst_dict
                                                if k != '_meta_attrs']
        
        for k in inst._meta_attrs:
            v = inst_dict[k]
            try:
                v.owner = inst
            except Exception as e:
                # Eventually revert to warning, but print is cleaner for now.
                print(f'Failed to give owner access to {k}.')
#                 warnings.warn(f'Failed to give owner access to {k}.')
        return inst

In [281]:
class Bar:
    def __init__(self, x):
        self.x = x

In [282]:
class Foo(metaclass=OwnerAccess):
    
    def __init__(self, a, *args, b=3, c=None, **kwargs):
        self.a = a
        self.b = b
        self.new_name = c
        self.nums = list(args)
        self.kwargs = dict(kwargs)

meta kwargs {}


In [283]:
f = Foo('abc', z=False, c=Bar(444))

Failed to give owner access to a.
Failed to give owner access to b.
Failed to give owner access to nums.
Failed to give owner access to kwargs.
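The failures above are expected: instances of built-in types such as `str`, `int`, `list`, and `dict` have no instance `__dict__`, so assigning a new attribute raises `AttributeError`. Instances of ordinary user-defined classes accept it. A small check, where `can_set_owner` is a hypothetical helper:

```python
class Bar:
    def __init__(self, x):
        self.x = x


def can_set_owner(obj, owner='owner'):
    # Try the same assignment the metaclass performs.
    try:
        obj.owner = owner
    except AttributeError:
        return False
    return True


results = {type(o).__name__: can_set_owner(o)
           for o in ('abc', 3, [], {}, Bar(1))}
print(results)
```

This matches the output above, where only `new_name` (a `Bar` instance) did not produce a failure message.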


In [284]:
f.new_name.x, f.new_name.owner

(444, <__main__.Foo at 0x11cc1bcc0>)

In [286]:
class Foo(metaclass=OwnerAccess, attrs=['a']):
    
    def __init__(self, a, *args, b=3, c=None, **kwargs):
        self.a = a
        self.b = b
        self.new_name = c
        self.nums = list(args)
        self.kwargs = dict(kwargs)

meta kwargs {'attrs': ['a']}


In [287]:
f = Foo(a=Bar(99), z=False, c=Bar(444))

In [288]:
f._meta_attrs

['a']

In [289]:
f.a, f.a.owner

(<__main__.Bar at 0x11cc29e80>, <__main__.Foo at 0x11cc29b00>)

In [290]:
f.new_name.x, hasattr(f.new_name, 'owner')

(444, False)

In [293]:
class Foo(metaclass=OwnerAccess, attrs=['a', 'new_name']):
    
    def __init__(self, a, *args, b=3, c=None, **kwargs):
        self.a = a
        self.b = b
        self.new_name = c
        self.nums = list(args)
        self.kwargs = dict(kwargs)

meta kwargs {'attrs': ['a', 'new_name']}


In [294]:
f = Foo(a=Bar(99), z=False, c=Bar(444))

In [295]:
f._meta_attrs

['a', 'new_name']

In [296]:
f.a, f.a.owner

(<__main__.Bar at 0x11cc11630>, <__main__.Foo at 0x11cc116a0>)

In [298]:
f.new_name.x, f.new_name.owner

(444, <__main__.Foo at 0x11cc116a0>)

In [300]:
f.a.owner.new_name.x

444

In [324]:
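from functools import partial, wraps


def owner_access(cls=None, *, attrs=()):
    # Support both `@owner_access` and `@owner_access(attrs=[...])`.
    if cls is None: return partial(owner_access, attrs=attrs)
    old_init = cls.__init__

    # Preserve the original __init__'s name, docstring, and signature info.
    @wraps(old_init)
    def _init(self, *args, **kwargs):
        old_init(self, *args, **kwargs)
        # Default to all instance attrs when no subset is specified.
        for k in attrs or vars(self):
            v = getattr(self, k)
            try:
                v.owner = self
            except Exception as e:
                # Eventually revert to warning, but print is cleaner for now.
                print(f'Failed to give owner access to {k}.')
#                 warnings.warn(f'Failed to give owner access to {k}.')
        
    cls.__init__ = _init
    return cls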

In [325]:
@owner_access(attrs=['a', 'new_name'])
class Foo:
    
    def __init__(self, a, *args, b=3, c=None, **kwargs):
        self.a = a
        self.b = b
        self.new_name = c
        self.nums = list(args)
        self.kwargs = dict(kwargs)

In [326]:
f = Foo(Bar(999), b=Bar(77), c=Bar(3333333))

In [329]:
f.a.owner, f.new_name.owner

(<__main__.Foo at 0x11cd3dda0>, <__main__.Foo at 0x11cd3dda0>)

In [331]:
assert not hasattr(f.b, 'owner')
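One difference between the two approaches shows up with subclasses that override `__init__`. The metaclass hooks `__call__`, which runs after the whole `__init__` chain finishes. The decorator wraps only the decorated class's `__init__`, so attributes a subclass sets after `super().__init__()` are missed. The sketch below uses condensed re-implementations, not the exact code above: they cover all attributes, skip failures silently, and the metaclass uses `super().__call__`. The class names are hypothetical.

```python
from functools import wraps


class Child:
    def __init__(self, x):
        self.x = x


def _give_access(inst):
    for v in vars(inst).values():
        try:
            v.owner = inst
        except AttributeError:
            pass  # e.g. ints and strs can't take new attributes


class OwnerAccessMeta(type):
    def __call__(cls, *args, **kwargs):
        # Runs after the full __init__ chain, including subclasses.
        inst = super().__call__(*args, **kwargs)
        _give_access(inst)
        return inst


def owner_access(cls):
    old_init = cls.__init__

    @wraps(old_init)
    def _init(self, *args, **kwargs):
        old_init(self, *args, **kwargs)
        _give_access(self)

    cls.__init__ = _init
    return cls


class MetaBase(metaclass=OwnerAccessMeta):
    def __init__(self):
        self.a = Child(1)


class MetaSub(MetaBase):
    def __init__(self):
        super().__init__()
        self.late = Child(2)  # set after the parent's __init__ returns


@owner_access
class DecoBase:
    def __init__(self):
        self.a = Child(1)


class DecoSub(DecoBase):
    def __init__(self):
        super().__init__()
        self.late = Child(2)


meta_sub, deco_sub = MetaSub(), DecoSub()
print(hasattr(meta_sub.late, 'owner'), hasattr(deco_sub.late, 'owner'))
```

Decorating the subclass as well would close the gap, at the cost of having to remember to do so on every subclass.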