Integrate text-generation pipeline from inference.py to TSModelForCausalLM #300
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks a lot for this integration @jiqing-feng
optimum/intel/generation/modeling.py (Outdated)

```diff
@@ -81,9 +81,34 @@ def load_model(file_name: Union[str, Path]):
     torch.jit.freeze(model.eval())
     return model

+    @staticmethod
+    def jit_trace(model: PreTrainedModel, task: str, config: PretrainedConfig, use_cache: bool = True):
```
I think it would make sense to create a separate function `trace` (working for all architectures, not only causal LM) and to use it in `TSModelForCausalLM._from_transformers`, instead of having `jit_trace` and `export_model` methods.
Hi @echarlaix, I have created a separate function `trace` in a new file. Could you please help review it? Thanks! BTW, the failed check seems unrelated to my changes.
optimum/intel/generation/tracing.py (Outdated)

```python
from optimum.exporters import TasksManager


def prepare_jit_inputs(model: PreTrainedModel, task: str, use_cache: bool = True):
```
Could we keep `prepare_jit_inputs` and `jit_trace` in `modeling.py`?
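
For readers following along, here is a minimal sketch of what these two helpers might look like, assuming dummy inputs are derived from the task's ONNX export config via `TasksManager`; the exact constructor arguments, the `example_kwarg_inputs` tracing path, and the handling of past key values for `use_cache` are simplifications rather than the PR's actual code:

```python
import inspect

import torch
from transformers import PreTrainedModel

from optimum.exporters import TasksManager


def prepare_jit_inputs(model: PreTrainedModel, task: str, use_cache: bool = True):
    # Derive dummy inputs from the ONNX export config registered for this task,
    # so the same helper works for any architecture, not only causal LM.
    # NOTE: past key values for use_cache=True are omitted in this sketch.
    onnx_config_class = TasksManager.get_exporter_config_constructor(
        exporter="onnx", model=model, task=task
    )
    onnx_config = onnx_config_class(model.config)
    dummy_inputs = onnx_config.generate_dummy_inputs(framework="pt")
    # Keep only the inputs that the model's forward signature actually accepts.
    forward_params = inspect.signature(model.forward).parameters
    return {k: v for k, v in dummy_inputs.items() if k in forward_params}


def jit_trace(model: PreTrainedModel, task: str, use_cache: bool = True):
    model_inputs = prepare_jit_inputs(model, task, use_cache)
    model.config.return_dict = False  # TorchScript traces tuples, not ModelOutput dicts
    # example_kwarg_inputs requires torch >= 2.0; older versions need positional inputs.
    traced = torch.jit.trace(model.eval(), example_kwarg_inputs=dict(model_inputs), strict=False)
    traced = torch.jit.freeze(traced.eval())
    # Two warm-up calls so the fused graph is optimized before first real use.
    traced(**model_inputs)
    traced(**model_inputs)
    return traced
```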
optimum/intel/ipex/inference.py (Outdated)

```python
if self._model.task == "text-generation":
    jit_model = jit_trace(
        model=model,
        task=self._model.task,
        use_cache=self._original.config.use_cache,
    )
    model = TSModelForCausalLM(
        model=jit_model,
        config=self._original.config,
        use_cache=self._original.config.use_cache,
    )
else:
    jit_inputs = []
    dummy_input = self._model.tokenizer("")
    for key in dummy_input:
        jit_inputs.append(
            torch.ones((1, len(dummy_input[key])), dtype=torch.long)
        )
    model = torch.jit.trace(model, jit_inputs, strict=False)
    model = torch.jit.freeze(model)
    model(*jit_inputs)
    model(*jit_inputs)
```
Why not:

```diff
-if self._model.task == "text-generation":
-    jit_model = jit_trace(
-        model=model,
-        task=self._model.task,
-        use_cache=self._original.config.use_cache,
-    )
-    model = TSModelForCausalLM(
-        model=jit_model,
-        config=self._original.config,
-        use_cache=self._original.config.use_cache,
-    )
-else:
-    jit_inputs = []
-    dummy_input = self._model.tokenizer("")
-    for key in dummy_input:
-        jit_inputs.append(
-            torch.ones((1, len(dummy_input[key])), dtype=torch.long)
-        )
-    model = torch.jit.trace(model, jit_inputs, strict=False)
-    model = torch.jit.freeze(model)
-    model(*jit_inputs)
-    model(*jit_inputs)
+model = jit_trace(
+    model=model,
+    task=self._model.task,
+    use_cache=self._original.config.use_cache,
+)
+if self._model.task == "text-generation":
+    model = TSModelForCausalLM(
+        model=model,
+        config=self._original.config,
+        use_cache=self._original.config.use_cache,
+    )
```
Thanks for integrating `TSModelForCausalLM` into `inference_mode` @jiqing-feng!!
Hi @echarlaix, thanks for your advice. I have updated the code. Could you please review it? Thanks!
optimum/intel/ipex/inference.py (Outdated)

```python
model = torch.jit.freeze(model)
model(*jit_inputs)
model(*jit_inputs)
jit_model = jit_trace(
```
Shouldn't it be:

```diff
-jit_model = jit_trace(
+model = jit_trace(
```
optimum/intel/ipex/inference.py (Outdated)

```diff
-if self._model.task == "text-generation":
-    self._model.model = _ModelGenerationWrapper(model, self._original)
+if self._model.task == "text-generation" and self._jit:
+    self._model.model = model
```
Shouldn't it be, for all cases: `self._model.model = _ModelFallbackWrapper(model, self._original)`?
Hi @echarlaix. If I use `_ModelFallbackWrapper`, then when I execute `model.generate` it goes to `__getattr__(self, item)`, which returns `getattr(self._default, item)`, so I actually execute `self._default.generate`. I didn't use `_ModelFallbackWrapper` because then I cannot use the generation of my optimized model.
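
For illustration, a rough sketch of the delegation being described, with the `_default` attribute taken from the comment above and the rest assumed; the real `_ModelFallbackWrapper` may differ in details:

```python
class _ModelFallbackWrapper:
    """Dispatch __call__ to the optimized model, everything else to the original."""

    __slots__ = ("_optimized", "_default")

    def __init__(self, optimized, default):
        self._optimized = optimized
        self._default = default

    def __call__(self, *args, **kwargs):
        # Plain forward calls go to the traced/optimized model.
        return self._optimized(*args, **kwargs)

    def __getattr__(self, item):
        # Any other attribute (including .generate) falls back to the original model.
        return getattr(self._default, item)
```

With this shape, `wrapper.generate(...)` resolves through `__getattr__` and runs on the original eager model, never on the optimized graph, which is the concern raised above.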
Why not subclass from `_ModelFallbackWrapper` to enable it, then?
@echarlaix, I think we should enhance TSModelForCausalLM for all text-generation tasks; then, for the text-generation task, we don't need to use `_ModelFallbackWrapper`. Is that OK?
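
As a hedged sketch of the subclassing alternative suggested above (the routed attribute names and the reuse of the `_ModelGenerationWrapper` name are illustrative assumptions, not the PR's final code):

```python
class _ModelGenerationWrapper(_ModelFallbackWrapper):
    """Like _ModelFallbackWrapper, but keeps text generation on the optimized model."""

    def __getattr__(self, item):
        # Route generation entry points to the optimized TSModelForCausalLM;
        # everything else still falls back to the original eager model.
        if item in ("generate", "config", "generation_config", "can_generate"):
            return getattr(self._optimized, item)
        return getattr(self._default, item)
```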
optimum/intel/ipex/inference.py (Outdated)

```diff
@@ -188,16 +101,23 @@ def __enter__(self):
     with torch.cpu.amp.autocast(enabled=(self._dtype == torch.bfloat16)), torch.no_grad():
         if self._model.tokenizer is not None and self._jit:
```
Shouldn't it be:

```diff
-if self._model.tokenizer is not None and self._jit:
+if self._jit:
```
Hi @echarlaix. I subclassed a new class from `_ModelFallbackWrapper`. BTW, could we merge it if there are no big issues? I have a follow-up PR for enabling text2text-generation and other generation tasks. Some issues like variable naming could be fixed in the next PR.
Hi @echarlaix, I have updated the test cases of 310. Could you help me have a look at the failed check? Thanks!
Force-pushed from feb5ac7 to 2395ea2
@mfuntowicz please help review. This PR utilizes TSModelForCausalLM and fixes the JIT issue in Llama text generation.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Hi @echarlaix. I have integrated my changes from inference.py into TSModelForCausalLM, refer to #227. I also use the function `export_model` to enable the IPEX model in TSModelForCausalLM. I would like your opinion. Thanks!
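
For context, a usage sketch of the resulting flow; the checkpoint name, generation arguments, and the import path of `jit_trace` are assumptions, while the `TSModelForCausalLM(...)` constructor call mirrors the snippet earlier in this thread:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Import path assumed per the discussion above (helpers kept in modeling.py).
from optimum.intel.generation.modeling import TSModelForCausalLM, jit_trace

model_id = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Trace the eager model, then wrap it so generate() runs on the TorchScript graph.
traced = jit_trace(model=model, task="text-generation", use_cache=model.config.use_cache)
ts_model = TSModelForCausalLM(model=traced, config=model.config, use_cache=model.config.use_cache)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    output_ids = ts_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```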