AttributeError: 'LlamaLikeModel' object has no attribute 'layers' #5778

Open
VinPre opened this issue Mar 31, 2024 · 43 comments
Labels
bug Something isn't working

Comments

@VinPre

VinPre commented Mar 31, 2024

Describe the bug

I am not able to generate text using AWQ models since I updated.
I am able to load the model, but once I write something, no text gets returned and the console displays an AttributeError.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Update oobabooga / Or reinstall it

Load a model like TheBloke_Mythalion-Kimiko-v2-AWQ
Go to chat
Write something to get an answer
An empty answer gets returned

Screenshot

No response

Logs

19:36:40-544335 INFO     Starting Text generation web UI
19:36:40-549404 INFO     Loading the extension "gallery"

Running on local URL:  http://127.0.0.1:7860

19:37:27-118196 INFO     Loading "TheBloke_Mythalion-Kimiko-v2-AWQ"
Replacing layers...: 100%|█████████████████████████████████████████████████████████████| 40/40 [00:02<00:00, 14.02it/s]
Fusing layers...: 100%|████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 49.76it/s]
19:37:33-908362 INFO     LOADER: "AutoAWQ"
19:37:33-909891 INFO     TRUNCATION LENGTH: 4096
19:37:33-910910 INFO     INSTRUCTION TEMPLATE: "Metharme"
19:37:33-911947 INFO     Loaded the model in 6.79 seconds.
Traceback (most recent call last):
  File "E:\gpt\text-generation-webui\modules\callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\gpt\text-generation-webui\modules\text_generation.py", line 389, in generate_with_callback
    shared.model.generate(**kwargs)
  File "E:\gpt\text-generation-webui\installer_files\env\Lib\site-packages\awq\models\base.py", line 110, in generate
    return self.model.generate(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\gpt\text-generation-webui\installer_files\env\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\gpt\text-generation-webui\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 1575, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "E:\gpt\text-generation-webui\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 2694, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\gpt\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1250, in prepare_inputs_for_generation
    past_key_values = getattr(getattr(self.model.layers[0], "self_attn", {}), "past_key_value", None)
                                      ^^^^^^^^^^^^^^^^^
  File "E:\gpt\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LlamaLikeModel' object has no attribute 'layers'
Output generated in 0.26 seconds (0.00 tokens/s, 0 tokens, context 72, seed 234193679)

System Info

NVIDIA GeForce RTX 3090 Ti
32 GB RAM
AMD Ryzen 7 7800X3D
Windows 10
@VinPre VinPre added the bug Something isn't working label Mar 31, 2024
@realcoloride

realcoloride commented Apr 1, 2024

Bumping this, happens to all the AWQ (thebloke) models I've tried

EDIT: try ticking no_inject_fused_attention
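
(For reference, the no_inject_fused_attention checkbox appears to map onto AutoAWQ's fuse_layers option. A minimal sketch of the same workaround outside the webui, with the model path, device, and generation settings as placeholders:)

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = "TheBloke/Mythalion-Kimiko-v2-AWQ"  # placeholder: any AWQ repo
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # fuse_layers=False skips the fused-layer injection, so the LlamaLikeModel
    # code path (and this AttributeError) is never reached
    model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=False)

    # assumes a CUDA GPU is available
    input_ids = tokenizer("Hello", return_tensors="pt").input_ids.to("cuda")
    print(tokenizer.decode(model.generate(input_ids, max_new_tokens=32)[0]))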

@VinPre
Author

VinPre commented Apr 1, 2024

Bumping this, happens to all the AWQ (thebloke) models I've tried

EDIT: try ticking no_inject_fused_attention

Thanks, ticking no_inject_fused_attention works. But as far as I understand it, this slows things down a bit. The speed is still high, but I think this needs fixing, especially because it worked on the older versions.

(Should have done a backup before updating.) Maybe I will look into how to go back to an older version using git, if that is possible.

Sadly this is still just a workaround, not a real fix.

@realcoloride

Agreed, it seems to ruin the output quality as well. Bumping

@VinPre
Author

VinPre commented Apr 2, 2024

Agreed, it seems to ruin the output quality as well. Bumping

As far as I understand it, it just slows the process, but my understanding is very limited.
Do you mean the quality also suffers?

@realcoloride

Yes.

@calmtortoise

I also have the exact same problem. The quality of model output is noticeably worse for me. Speed is actually about the same after checking the no_inject_fused_attention option.

@JaiQiBoi

JaiQiBoi commented Apr 4, 2024

Same issue here. Are there solid alternatives to using AWQ (thebloke) models?

@realcoloride

For quality, use gguf, for speed, use exl2.

@VinPre
Author

VinPre commented Apr 6, 2024

For quality, use gguf, for speed, use exl2.

Any recommended gguf models for a 24gb 3090 ti + 32gb of RAM? Focused on comprehensive roleplay. Direct sources would be great.

@Kitterss

Kitterss commented Apr 6, 2024

You mentioned the bug appeared after an update. Did you try installing an older snapshot? What version were you on before updating?
Hard to believe that this bug really appears for everyone and only 5 people reacted. Maybe we just did something wrong.

@VinPre
Author

VinPre commented Apr 6, 2024

You mentioned the bug appeared after an update. Did you try installing an older snapshot? What version were you on before updating? Hard to believe that this bug really appears for everyone and only 5 people reacted. Maybe we just did something wrong.

I have not tested it with an older version. Installing it fresh does not work.

I am not that well versed in using GitHub and have not looked into how to install an older version. It would be great if you could give me a short tutorial on how to do that.

@feldgendler

Had the same problem, switched to GPTQ.

@VinPre
Author

VinPre commented Apr 6, 2024

You mentioned the bug appeared after an update. Did you try installing an older snapshot? What version were you on before updating? Hard to believe that this bug really appears for everyone and only 5 people reacted. Maybe we just did something wrong.

I just think there are not that many people who
- use TheBloke's models
- did an update
- searched for the solution and found this

I think we are not doing something wrong. It seems to be happening to everybody who tries TheBloke's models on an updated installation.

You can try to replicate it yourself to confirm.

@VinPre VinPre closed this as completed Apr 6, 2024
@VinPre VinPre reopened this Apr 6, 2024
@VinPre
Author

VinPre commented Apr 6, 2024

Sorry, accidentally tapped the close issue button

@Kitterss

Kitterss commented Apr 6, 2024

I deleted ROCm itself from my system like this (uninstalling chapter):
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/native-install/ubuntu.html#
And yet models are loading and text is generating with no_inject_fused_attention ticked.
So maybe the issue is connected to a broken ROCm installation.

Oh, you have NVIDIA. But still, it seems like some necessary parts of the system are not detected.

@VinPre
Author

VinPre commented Apr 6, 2024

I deleted ROCm itself from my system like this (uninstalling chapter): https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/native-install/ubuntu.html# And yet models are loading and text is generating with no_inject_fused_attention ticked. So maybe the issue is connected to a broken ROCm installation.

Oh, you have NVIDIA. But still, it seems like some necessary parts of the system are not detected.

Ticking no_inject_fused_attention is a workaround that makes it work, but it also makes the output worse.

We get the error message on default settings.

If I understand your message right, then you thought we only get the error with it ticked, but it is the other way around.

@calmtortoise

calmtortoise commented Apr 6, 2024 via email

@VinPre
Author

VinPre commented Apr 6, 2024

Can you try going back to a version from January? That's what I had before I updated.

I agree with the above. With it unticked/not checked it throws the error and does not work. Ticking it (i.e. turning the fused attention off) makes it work, but the output is very low quality. I tried rolling back to the previous version but it did not work. Same error.

@calmtortoise

calmtortoise commented Apr 6, 2024 via email

@calmtortoise

calmtortoise commented Apr 6, 2024 via email

@VinPre
Author

VinPre commented Apr 6, 2024

My previous rollback was to the version right before the most recent update.

Yeah, that's too recent I would say.

Yes. I will roll back to a January version and report back.

Ok great

@Kitterss

Kitterss commented Apr 6, 2024

I deleted ROCm itself from my system like this (uninstalling chapter): https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/native-install/ubuntu.html# And yet models are loading and text is generating with no_inject_fused_attention ticked. So maybe the issue is connected to a broken ROCm installation.
Oh, you have NVIDIA. But still, it seems like some necessary parts of the system are not detected.

Ticking no_inject_fused_attention is a workaround that makes it work, but it also makes the output worse.

We get the error message on default settings.

If I understand your message right, then you thought we only get the error with it ticked, but it is the other way around.

No, my problem is the same as yours: ticking that flag lets it generate text, but I don't want to tick it. And what is strange is that ticking that box allowed me to generate text even without ROCm.

I tried to roll back and got nothing. Well, actually, I got a bunch of different errors. Maybe I just couldn't do it right.

@Kitterss

Kitterss commented Apr 7, 2024

Switched to trying a GGUF model. There is another current bug with those (#5812).
But that thread gave me an idea about why we can't roll back: #5812 (comment)
one_click.py (which is loaded after running start_*****.sh) has an update function in it. So one does not simply roll back; some tricks are needed.
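
A rough sketch of one way to pin an existing install to an older version despite the updater (the commit hash is a placeholder, and the git pull call in one_click.py still has to be commented out or the launcher will fast-forward back to main):

    cd text-generation-webui
    git log --oneline                 # find the snapshot/commit you want
    git checkout <older-commit-hash>  # placeholder, e.g. a January snapshot
    # then comment out the "git pull --autostash" call in one_click.py so that
    # start_windows.bat / start_linux.sh does not pull the latest code again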

@ruizcrp

ruizcrp commented Apr 8, 2024

You mentioned the bug appeared after an update. Did you try installing an older snapshot? What version were you on before updating? Hard to believe that this bug really appears for everyone and only 5 people reacted. Maybe we just did something wrong.

I'm having the exact same issue. Turning the mentioned option off also works, as do other formats such as GGUF. Freshly installed it last week.

@realcoloride

GGUF is premium-tier quality depending on the model and what you're looking for (roleplay in OP's case), but it is sadly painfully slow (especially evaluation).

@EvgeneKuklin

Same here, with AWQ models

@realcoloride

A small trick I found for GGUF, though it lowers quality, is changing the context size to something lower.

@jgreiner1024

I also have this issue. I'm not sure if it's an issue with this itself or a bug in one of the libraries that was also updated when I ran the update_windows.bat file. Unfortunately I also didn't make a backup before running the update (lesson learned ><), but I'm fairly proficient in Git, so I will try to disable the Git/code update, reset my version back to an older one, and report back.

@jgreiner1024

I downloaded "text-generation-webui-snapshot-2024-02-11" (that was roughly when I first installed the tool). I commented out line 271 in onclick.py to prevent it from updating
(for non-programmers just add a pound sign in front of this linelike below)
#run_cmd("git pull --autostash", assert_success=True, environment=True)

Then I ran the normal start_windows.bat and let it download/install all the libraries and such as expected.

After that I was able to load the same AWQ model that threw the error on the later versions and chat without errors.

The model used was "TheBloke_Wizard-Vicuna-7B-Uncensored-AWQ".

In the latest version, this model will load but when you attempt to chat it will give the error "AttributeError: 'LlamaLikeModel' object has no attribute 'layers'" as others mentioned above. When loading the exact same model (just copied from models folder)

I know there are a lot of snapshots between the 2024-02-11 and latest, but I wanted to go back to a version that I was super confident worked.

@VinPre
Author

VinPre commented Apr 12, 2024

I downloaded "text-generation-webui-snapshot-2024-02-11" (that was roughly when I first installed the tool). I commented out line 271 in onclick.py to prevent it from updating (for non-programmers just add a pound sign in front of this linelike below) #run_cmd("git pull --autostash", assert_success=True, environment=True)

Then I ran the normal start_windows.bat and let it download/install all the libraries and such as expected.

After that I was able to load the same AWQ model that threw the error on the later versions and chat without errors.

The model used was "TheBloke_Wizard-Vicuna-7B-Uncensored-AWQ".

In the latest version, this model will load but when you attempt to chat it will give the error "AttributeError: 'LlamaLikeModel' object has no attribute 'layers'" as others mentioned above. When loading the exact same model (just copied from models folder)

I know there are a lot of snapshots between the 2024-02-11 and latest, but I wanted to go back to a version that I was super confident worked.

So @jgreiner1024 has found a good workaround people can try if they want to revert and use AWQ on an older version. I will keep the issue open and hope that somebody fixes this in future versions so people don't need to go through reverting to older versions.

@realcoloride

What even is the difference that causes the error?

@VinPre
Author

VinPre commented Apr 12, 2024

What even is the difference that causes the error?

Looks like nobody here knows. I was hoping somebody able to identify the problem would arrive here, but that has not happened yet.

@jgreiner1024

@VinPre I'm going to find out exactly which snapshot breaks it this weekend. I'll work my way through them until it breaks, then do a diff between that snapshot and the previous one; that should help identify at least where to start looking.

@EvgeneKuklin

EvgeneKuklin commented Apr 12, 2024

@jgreiner1024 I'll work my way through them until it breaks

That will be inefficient, use binary search instead
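
(git can run this search automatically; a rough sketch, where the known-good commit is a placeholder and the one_click.py auto-update needs to stay disabled between runs:)

    cd text-generation-webui
    git bisect start
    git bisect bad                        # current HEAD reproduces the AWQ error
    git bisect good <known-good-commit>   # placeholder, e.g. the 2024-02-11 snapshot
    # git checks out a midpoint; relaunch the webui, try an AWQ model, then mark
    # the result and repeat until git reports the first bad commit
    git bisect good    # or: git bisect bad
    git bisect reset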

@jgreiner1024

@jgreiner1024 I'll work my way through them until it breaks

That will be inefficient, use binary search instead

yes that was my plan, working through them now, wish me luck! lol

@VinPre
Author

VinPre commented Apr 12, 2024

yes that was my plan, working through them now, wish me luck! lol

Ok

@architectdrone

@jgreiner1024 I'll work my way through them until it breaks

That will be inefficient, use binary search instead

yes that was my plan, working through them now, wish me luck! lol

Same issue here. Godspeed o7

@jgreiner1024

I have been working on snapshot 2024-03-10. There's a second git pull somewhere that only happens during the conda download and install; if I copy over an existing conda install, it won't update the code.

With my other testing so far, I believe it may be a dependency issue and not actually an issue with the code inside this git repo, but I haven't confirmed that yet, as I ran into this second git pull that updates the code on me.

I was hoping to have an answer, but the second git pull threw me for a loop; it was a while before I noticed it was happening and it messed up a lot of my testing. I'm taking a break for the night and will try to look into it more tomorrow.

PS: if someone knows where the git pull that happens during the conda download/install is, a comment would be very helpful so I don't have to chase it down tomorrow.

@clrabbit

Chiming in to say I'm also seeing this behavior. I did a fresh install of Windows today (unrelated), so I cloned the latest repository of OB, and most things work as expected; however, AWQ does throw 'LlamaLikeModel' object has no attribute 'layers' when trying to generate a response with no_inject_fused_attention not ticked. This was working on a previous build, but I believe that one was originally from around February 2024.

@gilith1

gilith1 commented Apr 18, 2024

I've fixed this locally by changing line 1250 in modeling_llama.py (see the traceback above for the full path) to this:
past_key_values = getattr(getattr(self.model.blocks[0], "self_attn", {}), "past_key_value", None)

I doubt it's the proper fix; I'm not familiar with this code, it's just what I found by looking at the LlamaLikeModel class. It's probably some incompatibility between the transformers and awq module versions. Anyway, with this change I can get responses from AWQ models. Maybe it'll be useful to someone for getting this fixed properly.
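
(If editing the installed transformers package is undesirable, an equivalent runtime alias might work. This is an untested sketch; the attribute path to the fused LlamaLikeModel is only a guess based on the traceback above:)

    # run once after the AWQ model has loaded (e.g. in modules/models.py)
    inner = shared.model.model.model   # AWQ wrapper -> causal LM -> LlamaLikeModel (assumed path)
    if not hasattr(inner, "layers") and hasattr(inner, "blocks"):
        inner.layers = inner.blocks    # alias so prepare_inputs_for_generation finds the blocks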

@HaysianSmelt

Bumping this, happens to all the AWQ (thebloke) models I've tried

EDIT: try ticking no_inject_fused_attention

I ticked no_inject_fused_attention and it worked for me. I was only getting one model from the "Bloke" collection to work; it was a Mistral one.

@alok-abhishek

I think this is an AutoAWQ issue and needs to be reported to AutoAWQ github.com/casper-hansen/AutoAWQ

I did post a discussion thread in Q&A casper-hansen/AutoAWQ#462

@jllrr

jllrr commented May 30, 2024

Also having this issue.
But commenting out line 271 as suggested doesn't work for me; it just freezes on the miniconda install. How do I roll back to an older version without it updating back to the main branch?
