
Support sp #2

Merged
merged 11 commits into pppppM:refactor-llm on Apr 19, 2024

Conversation


HIT-cwh commented on Apr 16, 2024

No description provided.

KooSung and others added 11 commits April 8, 2024 18:25
* support mixtral varlen attn

* support mixtral sp

* add the flash_attn_wo_mask, flash_attn_w_mask and varlen_flash_attn APIs

* update mixtral readme

* fix position_ids bug

* support qwen2 varlen attn and sp; accept pytorch==2.2 now that the bugs in triton 2.2 are fixed
* refine the split_for_sequence_parallel API: the tensor to be split need not have shape (bs, seq_len, dim); see the first sketch below

* Expose the two interfaces, pre_process_for_sequence_parallel_attn and post_process_for_sequence_parallel_attn, to the user

* Assert pytorch version != 2.1

* Remove the PyTorch version restriction when using sequence parallel

* refine all_to_all op

* split the sequence in sft rather than in data_collate_fn

* add sequence communications

* fix lint

* move all_to_all to communications

* make compute_sequence_parallel_loss method private

* fix sp docs

* rename

* add an explanation of why grad_scale is needed (see the loss sketch below)

* refine

* add docstring

* rename communications to comm
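
The sp commits above name several pieces of plumbing: split_for_sequence_parallel, an all_to_all op, and the pre_process_for_sequence_parallel_attn / post_process_for_sequence_parallel_attn pair. Those identifiers come from the commit messages; everything else below (signatures, shapes, bodies) is an assumption, a minimal Ulysses-style sketch of how such pieces typically fit together, not this PR's actual implementation:

```python
import torch
import torch.distributed as dist


def split_for_sequence_parallel(x, dim, sp_group):
    # Keep only this rank's shard of `x` along `dim`. Passing `dim`
    # explicitly is what lets the input have any layout, not just
    # (bs, seq_len, dim). Assumes x.size(dim) divides evenly.
    rank = dist.get_rank(sp_group)
    world_size = dist.get_world_size(sp_group)
    return x.chunk(world_size, dim=dim)[rank].contiguous()


def all_to_all(x, sp_group, scatter_dim, gather_dim):
    # Scatter chunks of `x` along `scatter_dim` to every rank in the group
    # and concatenate the received chunks along `gather_dim`. A production
    # op wraps this in an autograd.Function whose backward is the same
    # exchange with scatter_dim and gather_dim swapped.
    world_size = dist.get_world_size(sp_group)
    inputs = [t.contiguous() for t in x.chunk(world_size, dim=scatter_dim)]
    outputs = [torch.empty_like(t) for t in inputs]
    dist.all_to_all(outputs, inputs, group=sp_group)
    return torch.cat(outputs, dim=gather_dim)


def pre_process_for_sequence_parallel_attn(q, k, v, sp_group):
    # (bs, seq/sp, heads, head_dim) -> (bs, seq, heads/sp, head_dim):
    # after the exchange every rank holds the full sequence for a slice
    # of the heads, so attention itself runs unmodified.
    return [all_to_all(t, sp_group, scatter_dim=2, gather_dim=1)
            for t in (q, k, v)]


def post_process_for_sequence_parallel_attn(attn_out, sp_group):
    # Inverse transform: back to the sequence-sharded layout.
    return all_to_all(attn_out, sp_group, scatter_dim=1, gather_dim=2)
```

The point of the two all_to_all transposes is that each rank attends over the full sequence with a slice of the heads, which is presumably why the flash_attn_* wrappers above need no sp-specific changes.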
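The loss commits ("make compute_sequence_parallel_loss method private", "add explanation about why grad_scale is needed") concern how per-shard losses are combined. A hedged sketch of the usual pattern; the class and argument names are invented for illustration:

```python
import torch
import torch.distributed as dist


class _ReduceSequenceParallelLoss(torch.autograd.Function):
    """Token-weighted mean of per-shard losses over the sp group."""

    @staticmethod
    def forward(ctx, mean_loss, num_tokens, sp_group):
        # `num_tokens` is a tensor (it is all_reduced below) counting the
        # loss-contributing tokens on this rank's sequence shard.
        ctx.sp_size = dist.get_world_size(sp_group)
        loss_sum = mean_loss * num_tokens           # undo the local mean
        dist.all_reduce(loss_sum, group=sp_group)   # sum over the sp group
        dist.all_reduce(num_tokens, group=sp_group)
        return loss_sum / num_tokens                # global mean loss

    @staticmethod
    def backward(ctx, grad_output):
        # grad_scale: the all_reduces above are invisible to autograd, and
        # the gradient synchronization that follows (e.g. DDP) *averages*
        # over the sp ranks, which all hold identical weights. Scaling by
        # sp_size keeps the effective gradient the same as training the
        # full sequence on a single rank.
        return grad_output * ctx.sp_size, None, None


def _compute_sequence_parallel_loss(mean_loss, num_tokens, sp_group):
    # Private wrapper, mirroring the renamed method in the commit list.
    return _ReduceSequenceParallelLoss.apply(mean_loss, num_tokens, sp_group)
```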

* add cohere prompt template (see the template sketch below)

* support cohere

* add cohere readme

* fix readme

* fix readme

* fix rotary_seq_len in varlen attn

* fix lint
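
On the cohere prompt template: a sketch of what the new entry plausibly looks like, following xtuner's SYSTEM/INSTRUCTION template convention. The token strings are Cohere's Command R chat-turn tokens, but the exact field values are assumptions, not necessarily the entry this PR added:

```python
# Illustrative Command R-style template in xtuner's dict-of-format-strings
# convention; {system} and {input} are filled in per conversation turn.
cohere_chat = dict(
    SYSTEM=('<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}'
            '<|END_OF_TURN_TOKEN|>'),
    INSTRUCTION=('<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{input}'
                 '<|END_OF_TURN_TOKEN|>'
                 '<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>'),
    SUFFIX='<|END_OF_TURN_TOKEN|>',
    SUFFIX_AS_EOS=True,
    STOP_WORDS=['<|END_OF_TURN_TOKEN|>'],
)
```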
pppppM merged commit b658d76 into pppppM:refactor-llm on Apr 19, 2024
pppppM added a commit that referenced this pull request on May 8, 2024
* add custom sft dataset docs

* add custom dataset template configs

* add openai data format (see the example below)

* refine doc

* update (#2)

* replace md with rst

---------

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>
Co-authored-by: pppppM <67539920+pppppM@users.noreply.github.com>
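
The "add openai data format" commit refers to the standard OpenAI messages schema (role/content pairs). One sample, with placeholder contents:

```python
# One training sample in the OpenAI "messages" schema (stored as
# JSON/JSONL on disk in practice); contents are placeholders.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me three tips for writing docs."},
        {"role": "assistant", "content": "1. Know your audience. ..."},
    ]
}
```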