# Hallucination Encounter

## Levenshtein distance

In [1]:
import Levenshtein
import jedi

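### Similarity between two strings

Given two inputs, rank the correct attributes by their Levenshtein distance to the wrong one (a smaller distance means the strings are more similar):

1. The wrong (hallucinated) attribute
2. The list of correct attributes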
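For reference, the metric that `Levenshtein.distance` computes (the minimum number of single-character insertions, deletions and substitutions turning one string into the other) can be sketched in pure Python with the classic Wagner-Fischer dynamic program. This is an illustrative re-implementation, not the library's own code; the helper name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (Wagner-Fischer dynamic program)."""
    # previous[j] = distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (char_a != char_b)  # free if chars match
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


print(levenshtein("intarray", "array"))  # → 3 (delete "i", "n", "t")
```

Only two rows of the table are kept, so memory is O(len(b)) while time stays O(len(a) · len(b)).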

#### Example

- The list of attributes is **small**
- Compare
    - the string `retrieve_system_prompt`
    - with the names of type `function` that Jedi lists for the source code

In [2]:
curr_source_code = """import openai
from llmcoder.utils import get_conversations_dir, get_openai_key, get_system_prompt, get_system_prompt_dir

client = openai.OpenAI(api_key=get_openai_key())
client.chat.completions.create(messages=self.messages, model=model, temperature=temperature, n=n)
"""

In [3]:
script = jedi.Script(curr_source_code)
attributes = script.get_names()

In [4]:
%%timeit -n 1000
sorted([(Levenshtein.distance("retrieve_system_prompt", attribute.name), attribute.name) for attribute in attributes if attribute.type == "function"], key=lambda similarity: similarity[0])[0:10]

68.9 µs ± 29.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
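The one-liner timed above can be read as a small function: score every candidate, sort by distance, keep the first ten. This is a minimal sketch with a pure-Python stand-in for `Levenshtein.distance`; `closest_names` and the candidate list (a subset of the functions imported in the snippet above, typed in by hand rather than taken from Jedi) are hypothetical.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Compact Wagner-Fischer DP (stand-in for Levenshtein.distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,                     # insertion
                               previous[j] + 1,                        # deletion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def closest_names(wrong_name: str, candidates: Iterable[str], k: int = 10) -> List[Tuple[int, str]]:
    """Return up to k (distance, name) pairs, most similar first."""
    scored = [(levenshtein(wrong_name, name), name) for name in candidates]
    return sorted(scored, key=lambda pair: pair[0])[:k]


# Hypothetical candidates: some of the functions imported in the snippet
names = ["get_conversations_dir", "get_openai_key", "get_system_prompt"]
print(closest_names("retrieve_system_prompt", names, k=1))  # → [(6, 'get_system_prompt')]
```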


#### Example: NumPy

- The list of attributes exported by `numpy` is **huge**
- Compare
    - the string `intarray`
    - with the function completions Jedi offers after `np.`

In [5]:
source = """import numpy as np
np.
"""

In [6]:
script = jedi.Script(source)
attributes = script.complete(line=2, column=3)
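`Script.complete` takes the cursor position as a 1-based `line` and a 0-based `column`, so the position right after `np.` is `line=2, column=3`. When the cursor is known as a character offset instead, a small helper can convert it. This is a sketch assuming Unix line endings and one column per character; `offset_to_jedi_position` is a hypothetical name, not part of Jedi.

```python
from typing import Tuple


def offset_to_jedi_position(source: str, offset: int) -> Tuple[int, int]:
    """Convert a character offset into Jedi's (line, column) convention.

    Assumes Unix (LF) line endings: lines are counted from 1, columns from 0.
    """
    before_cursor = source[:offset]
    line = before_cursor.count("\n") + 1
    # Column = number of characters between the last newline (or the start) and the cursor
    column = offset - (before_cursor.rfind("\n") + 1)
    return line, column


source = "import numpy as np\nnp.\n"
# Cursor right after "np." on the second line
offset = source.index("np.\n") + len("np.")
print(offset_to_jedi_position(source, offset))  # → (2, 3)
```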

In [7]:
%%timeit -n 1000
sorted([(Levenshtein.distance("intarray", attribute.name), attribute.name) for attribute in attributes if attribute.type == "function"], key=lambda similarity: similarity[0])[0:10]

The slowest run took 5.97 times longer than the fastest. This could mean that an intermediate result is being cached.
1.07 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
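Sorting every scored attribute and then slicing `[0:10]` does more work than a top-10 query needs. The standard library's `heapq.nsmallest(k, iterable, key=...)` is documented as equivalent to `sorted(iterable, key=...)[:k]` but only maintains a k-element heap. This is a sketch, not a measured improvement: with a huge attribute list the n distance computations likely dominate the runtime, so the gain may be small. The helper names and candidate list are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Compact Wagner-Fischer DP (stand-in for Levenshtein.distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,                     # insertion
                               previous[j] + 1,                        # deletion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def closest_names_heap(wrong_name: str, candidates: Iterable[str], k: int = 10) -> List[Tuple[int, str]]:
    """Top-k (distance, name) pairs without sorting the full candidate list."""
    scored = ((levenshtein(wrong_name, name), name) for name in candidates)
    # nsmallest is stable: ties keep their input order, exactly like sorted(...)[:k]
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


names = ["array", "interp", "asarray", "geterr", "asfarray", "inner"]
print(closest_names_heap("intarray", names, k=3))  # → [(3, 'array'), (3, 'asarray'), (3, 'asfarray')]
```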


In [8]:
sorted([(Levenshtein.distance("intarray", attribute.name), attribute.name) for attribute in attributes if attribute.type == "function"], key=lambda similarity: similarity[0])[0:10]

[(3, 'array'),
 (3, 'asarray'),
 (3, 'asfarray'),
 (4, 'asanyarray'),
 (4, 'interp'),
 (5, 'geterr'),
 (5, 'histogram'),
 (5, 'inner'),
 (5, 'insert'),
 (5, 'isfortran')]
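Several candidates tie (three names at distance 3, five at distance 5). Because the sort key only looks at the distance and Python's sort is stable, tied names keep the order in which Jedi returned them, and the cut at ten can fall in the middle of a tie. Sorting the `(distance, name)` tuples directly breaks ties alphabetically, which makes the ranking independent of Jedi's ordering. A minimal sketch with a pure-Python distance; `rank_names` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Compact Wagner-Fischer DP (stand-in for Levenshtein.distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,                     # insertion
                               previous[j] + 1,                        # deletion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def rank_names(wrong_name: str, candidates: Iterable[str], k: int = 10) -> List[Tuple[int, str]]:
    scored = [(levenshtein(wrong_name, name), name) for name in candidates]
    # Tuples compare element-wise: by distance first, then alphabetically by name
    return sorted(scored)[:k]


names = ["asfarray", "interp", "asarray", "array"]
print(rank_names("intarray", names))
# → [(3, 'array'), (3, 'asarray'), (3, 'asfarray'), (4, 'interp')]
```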