Replies: 8 comments 26 replies
-
Try updating the web UI; you are at an older commit. For me it worked after installing einops.
I could only get to 7000 input tokens with 24 GB of VRAM, and the generation was super slow.
Using triton + bf16, it gets a bit faster.
These changes to the model loading code were necessary:

```diff
+ config = transformers.AutoConfig.from_pretrained(
+     Path(f"{shared.args.model_dir}/{model_name}"),
+     trust_remote_code=True
+ )
+ config.attn_config['attn_impl'] = 'triton'
+
  # Load the model in simple 16-bit mode by default
  if not any([shared.args.cpu, shared.args.load_in_8bit, shared.args.wbits, shared.args.auto_devices, shared.args.disk, shared.args.gpu_memory is not None, shared.args.cpu_memory is not None, shared.args.deepspeed, shared.args.flexgen, shared.model_type in ['rwkv', 'llamacpp']]):
-     model = LoaderClass.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}"), low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if shared.args.bf16 else torch.float16, trust_remote_code=trust_remote_code)
+     model = LoaderClass.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}"), low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if shared.args.bf16 else torch.float16, trust_remote_code=trust_remote_code, config=config)
+     print("yes")
```
Then start with
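For reference, the same triton + bf16 path can be exercised directly with transformers outside the web UI. A minimal sketch, assuming einops and the triton attention kernels are installed; the Hub model id, device, prompt, and generation settings below are illustrative assumptions, not from the post above:

```python
import torch
import transformers

name = "mosaicml/mpt-7b-storywriter"  # assumed Hub id for the 65K StoryWriter

# Same switch as in the diff above: build the config first, then flip the
# attention implementation to the triton kernels before loading weights.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```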
-
How much memory is needed to load the model?
INFO:Loading mosaicml_mpt-7b-storywriter...
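As a very rough estimate of the weight memory alone (assuming the 7B parameter count; KV cache and activations for long contexts come on top of this):

```python
# Back-of-the-envelope weight memory for a 7B-parameter model at common precisions.
params = 7e9
for label, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{label}: ~{params * bytes_per_param / 2**30:.1f} GiB of weights")
```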
-
Amazing stuff!
-
I ran an update this morning and then attempted to run with the following, slightly different model (still a Mosaic model that requires the remote-code flag).
-
A 4-bit version is now available: https://huggingface.co/OccamRazor/mpt-7b-storywriter-4bit-128g

EDIT:
ERROR: Can't determine model type from model name. Please specify it manually using --model_type argument

The model_type in the config.json file is listed as "mpt". Is that something that would need to be added to the base webui toolkit?
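For what it's worth, the value the loader is complaining about can be checked straight from the downloaded config; a small sketch (the local path is an assumption, adjust it to wherever the 4-bit model was placed):

```python
import json
from pathlib import Path

# Hypothetical local path; substitute your own models/ directory layout.
cfg_path = Path("models/OccamRazor_mpt-7b-storywriter-4bit-128g/config.json")
print(json.loads(cfg_path.read_text())["model_type"])  # -> "mpt"
```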
-
I tried to run the
Does it need to be changed anywhere else?
-
Has anyone gotten this to load into the UI?
There's a new 65K-token StoryWriter that was just released:
https://huggingface.co/mosaicml/mpt-7b-storywriter
I added --trust-remote-code to webui.py
But I still get an error:
ValueError: Loading models\mosaicml_mpt-7b-storywriter requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.
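For context, the flag the ValueError is asking for is a transformers-level option; a minimal direct-load sketch, independent of the web UI's --trust-remote-code plumbing (the local path is taken from the error message):

```python
from transformers import AutoModelForCausalLM

# trust_remote_code=True executes the custom MPT configuration/modeling code
# shipped with the checkpoint, which is exactly what the ValueError asks for.
# As the message warns, read that code before enabling it.
model = AutoModelForCausalLM.from_pretrained(
    "models/mosaicml_mpt-7b-storywriter",
    trust_remote_code=True,
)
```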