
Enable IPEXModel on XPU #663

Open
jiqing-feng opened this issue Apr 16, 2024 · 4 comments

jiqing-feng (Contributor) commented Apr 16, 2024

Hi @echarlaix. I want to enable all the model utils in ipex (modeling_utils) on XPU. This may need some changes, such as adding another if-branch in forward, or providing two forward functions (one for CPU and one for GPU); the KV cache handling is also different.

Are there any XPU-related issues in optimum-intel that could block this work, such as the required XPU version or CI tests? I would also appreciate your advice on the integration. Thanks!
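
For illustration only, a minimal sketch (not optimum-intel or ipex code) of the "if-branch in forward" option, where the tensor's device picks the path at run time; both branches are stubbed with PyTorch's stock SDPA so the snippet runs without IPEX or an XPU:

    import torch
    import torch.nn.functional as F

    def attn_forward(query, key, value, attention_mask=None):
        # Device-based branch: one forward, two (or more) kernel paths.
        if query.device.type == "cpu":
            # CPU path: the IPEX fused CPU kernel would be called here.
            return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
        elif query.device.type == "xpu":
            # XPU path: the XPU kernel (and its different KV-cache handling) would go here.
            return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
        # Fallback for any other device: stock PyTorch SDPA.
        return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)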

echarlaix (Collaborator) commented:

Hi @jiqing-feng, would it be a similar integration to the one in ipex-llm?

jiqing-feng (Contributor, Author) commented Apr 17, 2024

> Hi @jiqing-feng, would it be a similar integration to the one in ipex-llm?

Not exactly the same. We plan to keep only one attention forward, but split it into different parts and let the tensor's device choose which op should be used, for example:

llama_attn_forward:
    key_cache, value_cache = preprocess_for_optimize(hidden_states, past_key_value, kwargs)
    query, key, value = self.qkv_gemm(hidden_states, key_cache, value_cache, kwargs)
    key, value = self.rope(key, value, position_ids, past_key_value, kwargs)
    present = get_present(key, value, past_key_value)
    attn_output, attn_weights, past_key_value = self.sdpa(query, key, value, attention_mask, past_key_value, kwargs)
    attn_output = attn_output.transpose(1, 2)
    attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
    if not output_attentions:
        attn_weights = None
    return attn_output, attn_weights, past_key_value

self.sdpa:
    if cpu:
        sdpa = self.ipex_scale_dot_product
    elif xpu:
        sdpa = self.sdpa_xpu

    attn_output, attn_weights, past_key_value = sdpa(
        query,
        key,
        value,
        math.sqrt(self.head_dim),
        past_key_value,
        None,
        attention_mask,
    )

    return attn_output, attn_weights, past_key_value
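
To make the op-selection step concrete, here is a minimal runnable sketch of the self.sdpa dispatch above; `ipex_scale_dot_product` and `sdpa_xpu` keep the names from the snippet but are stubbed with PyTorch's stock SDPA so the example runs without IPEX or an XPU:

    import math
    import torch
    import torch.nn.functional as F

    def ipex_scale_dot_product(query, key, value, scale, past_key_value, alibi, attention_mask):
        # Stand-in for the IPEX CPU fused kernel.
        attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
        return attn_output, None, past_key_value

    def sdpa_xpu(query, key, value, scale, past_key_value, alibi, attention_mask):
        # Stand-in for the XPU-specific kernel.
        attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
        return attn_output, None, past_key_value

    def sdpa(query, key, value, attention_mask, past_key_value, head_dim):
        # The tensor's device decides which op is used.
        op = ipex_scale_dot_product if query.device.type == "cpu" else sdpa_xpu
        return op(query, key, value, math.sqrt(head_dim), past_key_value, None, attention_mask)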

echarlaix (Collaborator) commented:

For me it would make sense to keep this integration in ipex-llm and to only enable loading of exported models in optimum-intel (through IPEXModel); what do you think?

echarlaix (Collaborator) commented May 14, 2024

Hi @jiqing-feng, I see that modified llama modeling code (and code for other architectures) was introduced in both ipex and ipex-llm to add the ipex optimizations. I don't think redefining the transformers modeling (for different architectures and different optimizations) is something we want to introduce in optimum-intel: it would result in significant code additions that would be difficult to maintain, and more importantly it might cause issues with future transformers releases (this happened, for example, after the transformers v4.40.0 release for the OpenVINO export, since the model is patched before export; see #682). Such additions could also result in much more constrained transformers or even torch version requirements.

For these reasons I'd be in favor of keeping modeling_utils only for changes that are required for the export (as done in https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/utils/modeling_utils.py#L25) and moving the rest to another repo (itrex or ipex-llm could be good candidates, for example). optimum-intel could then use that repo by checking for a specific, compatible transformers version, in which case we could overwrite the modeling. What do you think?
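
As a rough illustration of the version check mentioned above (the version bounds and helper name are hypothetical, not an existing optimum-intel API):

    import transformers
    from packaging import version

    # Hypothetical bounds: the transformers releases the external patch was validated against.
    _MIN_SUPPORTED = version.parse("4.38.0")
    _MAX_SUPPORTED = version.parse("4.40.0")

    def transformers_version_is_patchable() -> bool:
        # Only allow overwriting the modeling for releases the patch was written for.
        current = version.parse(transformers.__version__)
        return _MIN_SUPPORTED <= current < _MAX_SUPPORTED

    # Caller side (e.g. in the external repo): overwrite the modeling only when safe.
    if transformers_version_is_patchable():
        pass  # apply the modeling overwrite here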
