
[Good First Issue]: Verify mpt-7b-chat with GenAI text_generation #267

Open · p-wysocki opened this issue Mar 1, 2024 · 17 comments

Labels: good first issue (Good for newcomers)

@p-wysocki (Collaborator)

Context

This task involves enabling tests for mpt-7b-chat. You can find more details in the openvino_notebooks LLM chatbot README.md.

Please ask general questions in the main issue at #259.

What needs to be done?

Described in the main Discussion issue at: #259

Example Pull Requests

Described in the main Discussion issue at: #259

Resources

Contact points

Described in the main Discussion issue at: #259

Ticket

No response

@qxprakash (Contributor)

.take

github-actions bot commented Mar 5, 2024

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

@Utkarsh-2002

.take

github-actions bot commented Mar 5, 2024

Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue.

@p-wysocki (Collaborator, Author)

Hello @qxprakash, are you still working on this? Is there anything we could help you with?

@qxprakash (Contributor)

Hello @p-wysocki, yes, I am working on it. I faced an error while trying to convert mpt-7b-chat:

```
python3 ../../../llm_bench/python/convert.py --model_id mosaicml/mpt-7b-chat --output_dir ./MPT_CHAT --precision FP16
```

```
[ INFO ] Removing bias from module=LPLayerNorm((4096,), eps=1e-05, elementwise_affine=True).
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:02<00:00, 31.27s/it]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 121/121 [00:00<00:00, 781kB/s]
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:87: UserWarning: Propagating key_padding_mask to the attention module and applying it within the attention module can cause unnecessary computation/memory usage. Consider integrating into attn_bias once and passing that to each attention module instead.
  warnings.warn('Propagating key_padding_mask to the attention module ' + 'and applying it within the attention module can cause ' + 'unnecessary computation/memory usage. Consider integrating ' + 'into attn_bias once and passing that to each attention ' + 'module instead.')
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/modeling_mpt.py:311: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert S <= self.config.max_seq_len, f'Cannot forward input with seq_len={S}, this model only supports seq_len<={self.config.max_seq_len}'
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/modeling_mpt.py:253: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  _s_k = max(0, attn_bias.size(-1) - s_k)
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:78: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  _s_q = max(0, attn_bias.size(2) - s_q)
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:79: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  _s_k = max(0, attn_bias.size(3) - s_k)
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:81: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_bias.size(-1) != 1 and attn_bias.size(-1) != s_k or (attn_bias.size(-2) != 1 and attn_bias.size(-2) != s_q):
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:89: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if is_causal and (not q.size(2) == 1):
/home/prakash/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/attention.py:90: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  s = max(s_q, s_k)
[ WARNING ] Failed to send event with the following error: <urlopen error EOF occurred in violation of protocol (_ssl.c:2426)>
[ WARNING ] Failed to send event with the following error: <urlopen error EOF occurred in violation of protocol (_ssl.c:2426)>
```

@p-wysocki (Collaborator, Author)

cc @pavel-esir

@qxprakash (Contributor)

@pavel-esir I hope this is not a memory issue?

@pavel-esir (Contributor)

@qxprakash thanks for your update. I see only connection warnings in your logs. Did you get the converted IR? If so, did you run it with the C++ sample?
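
For reference, a minimal Python-side smoke test along these lines can confirm the converted IR is usable (a sketch only; the model directory below is an assumption based on convert.py's --output_dir layout, so adjust it to your actual output path, and the canonical check remains the C++ sample):

```python
# Hedged smoke test for the converted OpenVINO model.
# Assumption: convert.py wrote the IR plus tokenizer files to this directory.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "./MPT_CHAT/pytorch/dldt/FP16"  # assumed path, adjust as needed
model = OVModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# Generate a short continuation to verify the model loads and runs.
inputs = tokenizer("Why is the Sun yellow?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```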

@tushar-thoriya

@qxprakash what is the progress?

@p-wysocki (Collaborator, Author)

Hello @qxprakash, are you still working on that issue? Do you need any help?

@qxprakash (Contributor)

Hi @p-wysocki, I was stuck; I'm currently not working on it.

@rk119 commented Jul 18, 2024

.take

github-actions bot

Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue.

@rk119 commented Jul 18, 2024

@p-wysocki I'd like to take on this issue as well.

@Wovchena Wovchena assigned rk119 and unassigned qxprakash Jul 19, 2024
@Wovchena (Collaborator)

I reassigned this to @rk119 because I'm not sure if @Utkarsh-2002 is still here. If you still want to work on it, let us know. If @rk119 confirms that they hadn't started working on it by the time @Utkarsh-2002 replies, @Utkarsh-2002 can take the task.

@rk119 commented Jul 23, 2024

Hi @Wovchena,

Before raising a PR, I want to note that for the prompt_lookup_decoding_lm and speculative_decoding_lm samples I am getting the following error:

```
Exception from src/inference/src/cpp/infer_request.cpp:193:
Check '::getPort(port, name, {_impl->get_inputs(), _impl->get_outputs()})' failed at src/inference/src/cpp/infer_request.cpp:193:
Port for tensor name position_ids was not found.
```
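
That exception means the sample requests an input tensor named position_ids that the exported IR does not expose. A minimal sketch for checking which input names the IR actually provides, assuming the same converted-model path used earlier in this thread:

```python
# Hedged diagnostic: list the IR's input tensor names to see whether
# "position_ids" (which these samples look up by name) is present.
# The model path is an assumption; point it at your converted IR.
import openvino as ov

model = ov.Core().read_model("./MPT_CHAT/pytorch/dldt/FP16/openvino_model.xml")
names = sorted({name for port in model.inputs for name in port.get_names()})
print(names)
print("position_ids present:", "position_ids" in names)
```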

Projects: Status: Assigned
Development: No branches or pull requests
7 participants