Efficient inference on CPU [[efficient-inference-on-cpu]]

This guide focuses on running inference with large models efficiently on CPU.

BetterTransformer for faster inference [[bettertransformer-for-faster-inference]]

We recently integrated BetterTransformer for faster inference of text, image, and audio models on CPU. See the documentation for this integration for more details.

PyTorch JIT mode (TorchScript) [[pytorch-jitmode-torchscript]]

TorchScript is used to create serializable and optimizable models from PyTorch code. A TorchScript program can be saved from an existing Python process and then loaded into a new process that has no Python dependency. Compared with eager mode, PyTorch's default, jit mode usually delivers better model inference performance through optimizations such as operator fusion.

For a gentle introduction to TorchScript, see the PyTorch TorchScript tutorial.
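As a minimal sketch of the save-and-reload workflow described above, the snippet below traces a small module, serializes it, and loads it back. `TinyClassifier` is a hypothetical stand-in and not a model from this guide; the in-memory buffer is used only to keep the example self-contained.

```python
import io

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a real model: Linear -> GELU -> Linear."""

    def __init__(self) -> None:
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x)))


torch.manual_seed(0)
model = TinyClassifier().eval()
example = torch.randn(2, 16)

with torch.no_grad():
    # Record the operations executed on the example input as a TorchScript graph.
    traced = torch.jit.trace(model, example)
    # Freezing inlines the weights as constants so the graph can be optimized further.
    traced = torch.jit.freeze(traced)

# Serialize to an in-memory buffer; passing a file path works the same way.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)

# Loading does not need the TinyClassifier class definition.
reloaded = torch.jit.load(buffer)

with torch.no_grad():
    eager_out = model(example)
    jit_out = reloaded(example)

outputs_match = torch.allclose(eager_out, jit_out, atol=1e-5)
print("eager and TorchScript outputs match:", outputs_match)
```

Tracing only records the path taken for the example input, so models with data-dependent control flow may need `torch.jit.script` instead.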

IPEX graph optimization with JIT mode [[ipex-graph-optimization-with-jitmode]]

Intel® Extension for PyTorch (IPEX) provides additional optimizations in jit mode for Transformers-family models. We strongly recommend using IPEX together with jit mode. Some operator patterns that are frequently used in Transformers models are already supported in IPEX in the form of jit mode operator fusion. Fusion patterns such as Multi-head-attention, Concat Linear, Linear+Add, Linear+Gelu, and Add+LayerNorm are available and perform well. The benefit of operator fusion is delivered to users transparently. According to the analysis, about 70% of the most popular NLP tasks, such as question answering, text classification, and token classification, can gain performance from these fusion patterns in both Float32 precision and BFloat16 mixed precision.

See IPEX Graph Optimization for more detailed information.
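The combination of IPEX and jit mode can be sketched roughly as follows. This is an illustration under stated assumptions, not the code path Trainer uses: `Block` is a hypothetical module that happens to contain Linear+Gelu and Add+LayerNorm patterns, and the snippet falls back to plain TorchScript when `intel_extension_for_pytorch` is not installed. Whether a given pattern is actually fused depends on the installed IPEX version.

```python
import torch
from torch import nn

try:
    import intel_extension_for_pytorch as ipex  # optional dependency
except ImportError:
    ipex = None  # the sketch then runs with plain TorchScript only


class Block(nn.Module):
    """Hypothetical block with Linear+Gelu and Add+LayerNorm patterns."""

    def __init__(self, dim: int = 64) -> None:
        super().__init__()
        self.fc1 = nn.Linear(dim, dim * 2)
        self.fc2 = nn.Linear(dim * 2, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = torch.nn.functional.gelu(self.fc1(x))
        return self.norm(x + self.fc2(hidden))


torch.manual_seed(0)
model = Block().eval()
example = torch.randn(4, 64)

with torch.no_grad():
    reference = model(example)

# ipex.optimize returns an optimized copy of the module; skipped if IPEX is absent.
optimized = ipex.optimize(model) if ipex is not None else model

with torch.no_grad():
    traced = torch.jit.trace(optimized, example)
    traced = torch.jit.freeze(traced)
    # The first calls let the JIT profile and rewrite the graph.
    for _ in range(2):
        traced(example)
    result = traced(example)

max_abs_diff = (reference - result).abs().max().item()
print("IPEX available:", ipex is not None)
print("max abs difference vs eager:", max_abs_diff)
```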

IPEX installation: [[ipex-installation]]

IPEX releases follow the PyTorch release cycle. See the IPEX installation instructions for details.

Usage of JIT mode [[usage-of-jitmode]]

To use JIT mode in Trainer for evaluation or prediction, add jit_mode_eval to the Trainer command arguments.

With PyTorch 1.14.0 or later, jit mode can benefit prediction and evaluation for any model, because jit.trace supports dict input.

With PyTorch earlier than 1.14.0, jit mode can benefit models whose forward parameter order matches the tuple input order in jit.trace, such as question-answering models. When the forward parameter order differs from the tuple input order in jit.trace, as with text-classification models, jit.trace fails and raises an exception. The exception is caught so that evaluation can fall back to the untraced model, and logging is used to notify the user.
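The difference between dict input and tuple input can be illustrated with a small sketch. It assumes a PyTorch version in which `torch.jit.trace` accepts `example_kwarg_inputs`; `TinyQA` and the batch contents are hypothetical and only mimic the shape of a tokenizer batch.

```python
import torch
from torch import nn


class TinyQA(nn.Module):
    """Hypothetical model whose forward takes named tensors, like a tokenizer batch."""

    def __init__(self) -> None:
        super().__init__()
        self.embed = nn.Embedding(100, 8)
        self.out = nn.Linear(8, 2)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.embed(input_ids)
        hidden = hidden * attention_mask.unsqueeze(-1).to(hidden.dtype)
        return self.out(hidden)


torch.manual_seed(0)
model = TinyQA().eval()

# Key order deliberately differs from the forward signature.
batch = {
    "attention_mask": torch.ones(1, 6, dtype=torch.long),
    "input_ids": torch.randint(0, 100, (1, 6)),
}

with torch.no_grad():
    # Dict input: tensors are matched to forward parameters by name.
    traced_kwargs = torch.jit.trace(model, example_kwarg_inputs=batch, strict=False)

    # Tuple input: tensors are matched by position, so the caller has to
    # arrange them in the same order as the forward signature.
    ordered = (batch["input_ids"], batch["attention_mask"])
    traced_tuple = torch.jit.trace(model, ordered, strict=False)

    eager_out = model(**batch)
    kwargs_out = traced_kwargs(**batch)
    tuple_out = traced_tuple(*ordered)

kwargs_match = torch.allclose(eager_out, kwargs_out, atol=1e-6)
tuple_match = torch.allclose(eager_out, tuple_out, atol=1e-6)
print("dict-traced output matches eager:", kwargs_match)
print("tuple-traced output matches eager:", tuple_match)
```

With tuple input, a batch whose order does not line up with the forward signature would bind tensors to the wrong parameters, which is why models with a different parameter order are affected on older PyTorch versions.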

See the Transformers question-answering example for a use case.

  • Inference using jit mode on CPU:
python run_qa.py \
--model_name_or_path csarron/bert-base-uncased-squad-v1 \
--dataset_name squad \
--do_eval \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/ \
--no_cuda \
--jit_mode_eval 
  • Inference using jit mode with IPEX on CPU:
python run_qa.py \
--model_name_or_path csarron/bert-base-uncased-squad-v1 \
--dataset_name squad \
--do_eval \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/ \
--no_cuda \
--use_ipex \
--jit_mode_eval