
phi-2 token out of bounds error #682

Closed
nking-1 opened this issue Mar 7, 2024 · 0 comments
Comments

nking-1 (Contributor) commented Mar 7, 2024

The bug
When generating text with phi-2, the model sometimes attempts to sample a token index that is out of bounds of the tokenizer's token list.
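A likely cause (an assumption, not confirmed in this issue): phi-2's output head covers more logit positions than the tokenizer defines tokens, so a sampled index in the padded tail falls outside the token list. The sizes below are taken directly from the error message; a minimal NumPy sketch of the failure mode:

```python
import numpy as np

# Hypothetical sizes matching the error message: the model appears to emit
# logits over 51200 positions, while the tokenizer's token list holds only
# 50295 entries (an assumption about a padded model vocabulary).
tokenizer_size = 50295
tokens = np.array(["tok%d" % i for i in range(tokenizer_size)])

sampled_token_ind = 51164  # the index from the traceback: valid for the
                           # model's logits, invalid for the token list
try:
    sampled_token = tokens[sampled_token_ind]  # same lookup as in the trace
except IndexError as err:
    print(err)  # index 51164 is out of bounds for axis 0 with size 50295
```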

To Reproduce
This might take multiple attempts due to the random sampling.

from guidance import models
# with_temperature and any_char_but are guidance library helpers; in
# guidance 0.1.x they should be importable from the top-level package
from guidance import any_char_but, with_temperature

phi2 = models.Transformers("microsoft/phi-2", device_map="mps")

msg_no_e = phi2 + "Hello, " + with_temperature(any_char_but('e'), temperature=0.9)
for i in range(250):
    msg_no_e += with_temperature(any_char_but('e'), temperature=0.9)

msg_no_e

Error trace:

{
	"name": "IndexError",
	"message": "index 51164 is out of bounds for axis 0 with size 50295",
	"stack": "---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[12], line 3
      1 msg_no_e = phi2 + \"Hello, \" + with_temperature(any_char_but('e'), temperature=0.9)
      2 for i in range(250):
----> 3     msg_no_e += with_temperature(any_char_but('e'), temperature=0.9)
      5 msg_no_e

File ~/code/ms/guidance/guidance/models/_model.py:915, in Model.__add__(self, value)
    913 # run stateless functions (grammar nodes)
    914 elif isinstance(value, GrammarFunction):
--> 915     out = lm._run_stateless(value)
    917 # run stateful functions
    918 else:
    919     out = value(lm)

File ~/code/ms/guidance/guidance/models/_model.py:1111, in Model._run_stateless(self, stateless_function, temperature, top_p, n)
   1109 delayed_bytes = b\"\"
   1110 # last_is_generated = False
-> 1111 for chunk in gen_obj:
   1112 
   1113     # we make everything full probability if we are not computing uncertainty
   1114     # if not self.engine.compute_log_probs:
   1115     #     chunk.new_bytes_prob = 1.0
   1116     
   1117     # convert the bytes to a string (delaying if we don't yet have a valid unicode string)
   1118     lm.token_count += chunk.new_token_count
   1119     chunk.new_bytes = delayed_bytes + chunk.new_bytes

File ~/code/ms/guidance/guidance/models/_model.py:373, in Engine.__call__(self, parser, grammar, ensure_bos_token)
    371 # loop over the tokens looking for a valid one
    372 for i,sampled_token_ind in enumerate(sampling_order):
--> 373     sampled_token = self.tokenizer.tokens[sampled_token_ind]
    375     # break out if we have reach impossible tokens
    376     if logits[sampled_token_ind] <= -np.inf:

IndexError: index 51164 is out of bounds for axis 0 with size 50295"
}
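One way a crash like this could be avoided (a sketch of a possible guard, not guidance's actual fix; `masked_sampling_order` is a hypothetical helper): mask logit positions past the end of the tokenizer's token list before computing the sampling order, so an out-of-range index can never be drawn.

```python
import numpy as np

def masked_sampling_order(logits, n_tokens):
    # Hypothetical guard: disable logit positions past the end of the
    # tokenizer's token list, then order indices by descending logit.
    # Masked positions get -inf and therefore sort to the very end.
    masked = logits.copy()
    masked[n_tokens:] = -np.inf
    return np.argsort(-masked, kind="stable")

# Sizes from the error message: 51200 logits, 50295 tokenizer entries
logits = np.random.default_rng(0).normal(size=51200)
order = masked_sampling_order(logits, 50295)

# Every position that can actually be selected lies within the tokenizer range
assert (order[:50295] < 50295).all()
```

This only prevents the `IndexError`; whether the padded positions should instead be mapped to real tokens is a separate design question for the library.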

System info (please complete the following information):

  • OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Mac OS, using mps device
  • Guidance Version (guidance.__version__): 0.1.11