This repository was archived by the owner on Oct 25, 2024. It is now read-only.


@zhentaoyu
Contributor

@zhentaoyu zhentaoyu commented Sep 15, 2023

Type of Change

BUG FIX && Enhancement
Align cpp beam_search with HF transformers repo implementation

Description

detail description
JIRA ticket: 835

TODO:

  • generation length reduction (early stopping), to stop generating more repeated text
  • handle EOS sentences and test more examples
  • accelerate beam score computations (log_softmax reductions) while keeping the same values as transformers
  • add beam search verbose output (for debugging; usage: cmake -DBEAM_SEARCH_VERBOSE=ON .., example: log.txt)
    - [ ] CI (will be added in another PR using pybind)
  • pass MLPerf accuracy test (Q4_0 offline passed)
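
For readers unfamiliar with the "log_softmax reductions" item, here is a minimal Python sketch of one beam search scoring step. It is an illustration only, not the C++ code in this PR: the names `log_softmax` and `beam_step` are hypothetical, and keeping `2 * num_beams` candidates is an assumption taken from the 8-entry `next_tokens` tensors (for 4 beams) in the comparison logs later in this thread.

```python
import math


def log_softmax(logits):
    """Stable log-softmax built from two reductions: a max and a log-sum-exp."""
    m = max(logits)  # max reduction, subtracted to avoid overflow in exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))  # sum reduction
    return [x - log_sum_exp for x in logits]


def beam_step(beam_scores, logits_per_beam, num_beams):
    """Score every (beam, token) pair and keep the best 2 * num_beams candidates.

    beam_scores: running sum of log-probabilities for each live beam.
    logits_per_beam: one list of vocabulary logits per live beam.
    Returns (scores, source beam indices, token ids), best candidate first.
    """
    candidates = []
    for beam_idx, (score, logits) in enumerate(zip(beam_scores, logits_per_beam)):
        for token_id, logprob in enumerate(log_softmax(logits)):
            candidates.append((score + logprob, beam_idx, token_id))
    # Stable sort: candidates with equal scores keep their (beam, token) order.
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[: 2 * num_beams]
    return [c[0] for c in top], [c[1] for c in top], [c[2] for c in top]


if __name__ == "__main__":
    # Toy input: 2 beams over a 3-token vocabulary.
    scores, indices, tokens = beam_step([0.0, -1.0], [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]], 2)
    for s, i, t in zip(scores, indices, tokens):
        print(f"token {t}, score {s:.6f}, beam_idx {i}")
```

The two passes over the logits in `log_softmax` (max, then sum of exponentials) are the reductions a native implementation would likely vectorize; the arithmetic has to stay equivalent for scores to match a reference implementation.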

Expected Behavior & Potential Risk

numactl -l -C 0-55 ./build/bin/pybind_gptj fp32.bin and the MLPerf offline accuracy test
Accuracy reference: "ROUGE1", 42.9865 * 0.99, "ROUGE2", 20.1235 * 0.99, "ROUGEL", 29.9881 * 0.99, "GEN_LEN", 4016878 * 0.9
All tests turn off kv_cache_jblas by setting memory_type=KV_MEM_TYPE_F16.

How has this PR been tested?

cpp: numactl -l -C 0-55 ./build/bin/pybind_gptj fp32.bin

hf:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Local directory holding the fine-tuned GPT-J checkpoint
model_dir = "finetuned-gptj"
tokenizer = AutoTokenizer.from_pretrained(model_dir)

model = AutoModelForCausalLM.from_pretrained(model_dir)
model.eval()

prompt = "Tell me 10 things about jazz music"
inputs = tokenizer(prompt, return_tensors="pt")

print("inputs", inputs)
# Beam search with 4 beams, generating between 30 and 128 new tokens
generate_ids = model.generate(inputs.input_ids, max_new_tokens=128, min_new_tokens=30, early_stopping=True, num_beams=4)
# Decode the best beam back to text
ans = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(ans)

Dependency Change?

None

@zhentaoyu
Contributor Author

zhentaoyu commented Sep 15, 2023

reference: "ROUGE1", 42.9865 * 0.99, "ROUGE2", 20.1235 * 0.99, "ROUGEL", 29.9881 * 0.99, "GEN_LEN", 4016878*0.9
model: finetuned-gpt-j-6b

MLPerf offline acc test:

  1. GPT-J Q4_0 (sample=500):
    {'rouge1': 43.5646, 'rouge2': 20.1762, 'rougeL': 29.8881, 'rougeLsum': 40.409, 'gen_len': 157276, 'gen_num': 500}
  2. GPT-J Q4_0 (all samples):
    {'rouge1': 43.4769, 'rouge2': 20.3621, 'rougeL': 30.118, 'rougeLsum': 40.5646, 'gen_len': 4277987, 'gen_num': 13368}
  3. GPT-J Q4_J_b128_f32_fp32_sym (500 sample): (Q4_J_group-size_compute-dtype_scale-dtype, same below)
    {'rouge1': 42.9608, 'rouge2': 19.6846, 'rougeL': 29.5861, 'rougeLsum': 40.0314, 'gen_len': 160590, 'gen_num': 500}
  4. GPT-J Q4_J_b128_int8_fp32_sym (500 sample):
    {'rouge1': 42.8281, 'rouge2': 19.6468, 'rougeL': 29.5079, 'rougeLsum': 39.6784, 'gen_len': 159330, 'gen_num': 500}
  5. GPT-J Q4_J_b32_f32_f32_sym (500 sample):
    {'rouge1': 43.6254, 'rouge2': 20.2235, 'rougeL': 29.9019, 'rougeLsum': 40.5376, 'gen_len': 159653, 'gen_num': 500}
  6. GPT-J Q4_J_b32_int8_f32_sym (500 sample):
    {'rouge1': 43.4207, 'rouge2': 20.0488, 'rougeL': 29.6654, 'rougeLsum': 40.3611, 'gen_len': 159351, 'gen_num': 500}

MLPerf server acc test:

  1. Q4_0 (500 sample, random):
    {'rouge1': 43.4207, 'rouge2': 20.0488, 'rougeL': 29.6654, 'rougeLsum': 40.3611, 'gen_len': 159351, 'gen_num': 500}
  2. Q4_0 (all samples):
    {'rouge1': 43.3847, 'rouge2': 20.3591, 'rougeL': 30.0988, 'rougeLsum': 40.4652, 'gen_len': 4299633, 'gen_num': 13368}

Newly added:
Q4J_per_channel_asym (all samples, compute_type: int8) offline acc: {'rouge1': 41.9995, 'rouge2': 18.9793, 'rougeL': 28.6274, 'rougeLsum': 39.1194, 'gen_len': 4485004, 'gen_num': 13368}
Q4J_per_channel_asym (all samples, compute_type: bf16) offline acc: {'rouge1': 43.3352, 'rouge2': 20.1831, 'rougeL': 29.8832, 'rougeLsum': 40.4381, 'gen_len': 4414947, 'gen_num': 13368}
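
As a reading aid, the reference line above can be turned into concrete thresholds and applied to the full-dataset results. This is a small sketch under two assumptions that are not stated in the thread: a run passes when every metric is greater than or equal to its threshold, and the GEN_LEN threshold only applies to all-sample runs (13368 samples). The helper names are hypothetical.

```python
# Reference values copied from the "reference:" line in this comment.
REFERENCE = {"rouge1": 42.9865, "rouge2": 20.1235, "rougeL": 29.9881}
REFERENCE_GEN_LEN = 4016878


def thresholds():
    """ROUGE thresholds are 99% of the reference, GEN_LEN is 90% of the reference."""
    limits = {name: value * 0.99 for name, value in REFERENCE.items()}
    limits["gen_len"] = REFERENCE_GEN_LEN * 0.9
    return limits


def failed_metrics(result):
    """Return the metric names that fall below their threshold (assumed rule: >= passes)."""
    return [name for name, limit in thresholds().items() if result[name] < limit]


# All-sample offline results copied from this comment.
q4_0_all = {"rouge1": 43.4769, "rouge2": 20.3621, "rougeL": 30.118, "gen_len": 4277987}
per_channel_int8 = {"rouge1": 41.9995, "rouge2": 18.9793, "rougeL": 28.6274, "gen_len": 4485004}
per_channel_bf16 = {"rouge1": 43.3352, "rouge2": 20.1831, "rougeL": 29.8832, "gen_len": 4414947}

if __name__ == "__main__":
    for name, result in [("Q4_0", q4_0_all),
                         ("per-channel asym int8", per_channel_int8),
                         ("per-channel asym bf16", per_channel_bf16)]:
        print(name, "failed:", failed_metrics(result))
```

Under these assumptions the Q4_0 and bf16 per-channel runs clear every threshold, while the int8 per-channel run falls below the three ROUGE thresholds.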

@zhentaoyu zhentaoyu added the WIP, ITREX.cpp, and enhancement (New feature or request) labels Sep 15, 2023
@zhentaoyu zhentaoyu changed the title from "Align beam search" to "[Cpp Graph] Align Cpp Beam Search" Sep 15, 2023
@zhentaoyu
Contributor Author

zhentaoyu commented Sep 19, 2023

Comparison of scores and top_k tokens between cpp and hf transformers (fp32 dtype, prompt = "A spaceship lands on the moon"):

  • hf transformers:
next_tokens: tensor([[  198,   198,   317,   475,   383, 38306,   319,   262]])
next_token_scores: tensor([[-1.4747, -2.2295, -4.0196, -4.0743, -4.5361, -4.9157, -4.9327, -5.1006]])
next_indices: tensor([[1, 0, 0, 2, 0, 2, 3, 2]])
beam_scores:  tensor([-1.4747, -2.2295, -4.0196, -4.0743])
beam_next_tokens: tensor([198, 198, 317, 475])
beam_idx: tensor([1, 0, 0, 2])
.....
......
=========== final==========
best: [tensor([   32, 40663,  8604,   319,   262,  8824,   764,   198,    32, 40663,
          468, 11406,   319,   262,  8824,   764,   198,    32, 40663,   468,
        11406,   319,   262,  8824,   764,   198,    32, 40663,   468, 11406,
          319,   262,  8824,   764,   198,    32, 40663,   468, 11406,   319,
          262,  8824,   764])]
best_score: tensor([-0.1742])
best_indices: [None]
A spaceship lands on the moon .
A spaceship has landed on the moon .
A spaceship has landed on the moon .
A spaceship has landed on the moon .
A spaceship has landed on the moon .
  • cpp:
13: . 
764:  . 
11: , 
1377:  -- 
====================== 
198: 
, score:  -1.474517, beam_idx: 1 
198: 
, score:  -2.230055, beam_idx: 0 
317:  A, score:  -4.019489, beam_idx: 0 
475:  but, score:  -4.074911, beam_idx: 2 
383:  The, score:  -4.536015, beam_idx: 0 
38306:  sparks, score:  -4.914916, beam_idx: 2 
319:  on, score:  -4.932207, beam_idx: 3 
262:  the, score:  -5.100197, beam_idx: 2 


Current beams:
beams[0]: length: 2, score:    -2.230055, eos: 0, tokens:
13: ., 198: 
, 
beams[1]: length: 2, score:    -4.019489, eos: 0, tokens:
13: ., 317:  A, 
beams[2]: length: 2, score:    -1.474517, eos: 0, tokens:
764:  ., 198: 
, 
beams[3]: length: 2, score:    -4.074911, eos: 0, tokens:
11: ,, 475:  but, 
.....
.....
Final beam:
length: 37, score:    -0.174138, eos: 0, tokens:
764:  ., 198: 
, 32: A, 40663:  spaceship, 468:  has, 11406:  landed, 319:  on, 262:  the, 8824:  moon, 764:  ., 198: 
, 32: A, 40663:  spaceship, 468:  has, 11406:  landed, 319:  on, 262:  the, 8824:  moon, 764:  ., 198: 
, 32: A, 40663:  spaceship, 468:  has, 11406:  landed, 319:  on, 262:  the, 8824:  moon, 764:  ., 198: 
, 32: A, 40663:  spaceship, 468:  has, 11406:  landed, 319:  on, 262:  the, 8824:  moon, 764:  ., 
A spaceship lands on the moon .
A spaceship has landed on the moon .
A spaceship has landed on the moon .
A spaceship has landed on the moon .
A spaceship has landed on the moon .
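
To make the log easier to follow, the sketch below replays the first logged step in Python: it picks the next beams from the 8 logged candidates and measures how far the cpp scores are from the hf scores. It is a simplified illustration, not the PR's implementation. The function names are hypothetical, the EOS id 50256 is an assumption about the GPT-J tokenizer, and EOS candidates are simply skipped here rather than stored as finished hypotheses.

```python
EOS_TOKEN_ID = 50256  # assumption: end-of-text id of the GPT-J tokenizer


def select_next_beams(next_tokens, next_scores, next_indices, num_beams,
                      eos_token_id=EOS_TOKEN_ID):
    """Keep the first num_beams non-EOS candidates (candidates are sorted best first)."""
    beam_scores, beam_next_tokens, beam_idx = [], [], []
    for token, score, idx in zip(next_tokens, next_scores, next_indices):
        if token == eos_token_id:
            continue  # simplified: a real search would record a finished hypothesis
        beam_scores.append(score)
        beam_next_tokens.append(token)
        beam_idx.append(idx)
        if len(beam_scores) == num_beams:
            break
    return beam_scores, beam_next_tokens, beam_idx


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two score lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Values copied from the hf and cpp logs above.
hf_tokens = [198, 198, 317, 475, 383, 38306, 319, 262]
hf_scores = [-1.4747, -2.2295, -4.0196, -4.0743, -4.5361, -4.9157, -4.9327, -5.1006]
hf_indices = [1, 0, 0, 2, 0, 2, 3, 2]
cpp_scores = [-1.474517, -2.230055, -4.019489, -4.074911,
              -4.536015, -4.914916, -4.932207, -5.100197]

if __name__ == "__main__":
    print(select_next_beams(hf_tokens, hf_scores, hf_indices, num_beams=4))
    print("max |hf - cpp| =", max_abs_diff(hf_scores, cpp_scores))
```

With the logged values, the selection reproduces the `beam_scores`, `beam_next_tokens`, and `beam_idx` lines of the hf log, and the per-candidate score difference stays below 1e-3 (the hf values are printed with only four decimals).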

@zhentaoyu
Contributor Author

offline test cases for reference (fp32 dtype, max_new_tokens=128, min_new_tokens=30)

  1. prompt = "Tell me 10 things about jazz music"

    • early_stopping=true
      hf: Tell me 10 things about jazz music and I will tell you 10 things about jazz music.” Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in a big way . Jazz music is all over the radio and television, but few people understand it .

      cpp: Tell me 10 things about jazz music and I will tell you 10 things about jazz music.” Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in a big way . Jazz music is all over the radio and television, but few people understand it .

    • early_stopping=false
      hf: Tell me 10 things about jazz music and I will tell you 10 things about jazz music.” Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in the U.S. Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in the U.S. Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in the U.S. Jazz music

      cpp: Tell me 10 things about jazz music and I will tell you 10 things about jazz music.” Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in the U.S. Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in the U.S. Jazz music is all over the radio and television, but few people understand it . This week, jazz music is back on the radio and television in the U.S. Jazz music

  2. prompt = "Once upon a time"

    • early_stopping=true
      hf: Once upon a time... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope...

      cpp: Once upon a time... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope...

    • early_stopping=false
      hf: Once upon a time... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young

      cpp: Once upon a time... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young and full of hope... ... when the world was young

  3. long prompt =

    "2017: It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing "
    "on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. "
    "There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went "
    "right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. "
    "Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of "
    "enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I "
    "could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that "
    "evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt "
    "permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a "
    "species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. "
    "That control goes against what evolution is supposed to be like. If you allow the user to pick how to evolve "
    "something, it's not evolution anymore - it's the equivalent of intelligent design, the fable invented by "
    "creationists to combat the idea of evolution. Being agnostic and a Pastafarian, that's not something that "
    "rubbed me the right way. Hence, my biggest dillema when deciding what to create was not with what I wanted to "
    "create, but with what I did not. I didn't want to create an 'intelligent design' simulator and wrongly call "
    "it evolution. This is a problem, of course, every other contestant also had to face. And judging by the "
    "entries submitted, not many managed to work around it. I'd say the only real solution was through the use of "
    "artificial selection, somehow. So far, I haven't seen any entry using this at its core gameplay. Alas, this "
    "is just a fun competition and after a while I decided not to be as strict with the game idea, and allowed "
    "myself to pick whatever I thought would work out. My initial idea was to create something where humanity "
    "tried to evolve to a next level, but had some kind of foe trying to stop them from doing so. I kind of had "
    "this image of human souls flying in space towards a monolith or a space baby (all based in 2001: A Space "
    "Odyssey of course) but I couldn't think of compelling (read: serious) mechanics for that. Borgs were my next "
    "inspiration, as their whole hypothesis fit pretty well into the evolution theme. But how to make it work? Are "
    "you the borg, or fighting the Borg? The third and final idea came to me through my girlfriend, who somehow "
    "gave me the idea of making something about the evolution of Pasta. The more I thought about it the more it "
    "sounded like it would work, so I decided to go with it. Conversations with my inspiring co-worker Roushey "
    "(who also created the 'Mechanical Underdogs' signature logo for my intros) further matured the concept, as it "
    "involved into the idea of having individual pieces of pasta flying around and trying to evolve until they "
    "became all-powerful. A secondary idea here was that the game would work to explain how the Flying Spaghetti "
    "Monster came to exist - by evolving from a normal dinner table. So the idea evolved more or less into this: "
    "you are sitting a table. You have your own plate, with is your 'base'. There are 5 other guests at the table, "
    "each with their own plate. Your plate can spawn little pieces of pasta. You do so by 'ordering' them through "
    "a menu. Some pastas are better than others; some are faster, some are stronger. They have varying 'costs', "
    "which are debited from your credits (you start with a number of credits). Once spawned, your pastas start "
    "flying around. Their instinct is to fly to other plates, in order to conquer them (the objective of the game "
    "is having your pasta conquer all the plates on the table). But they are really autonomous, so after being "
    "spawned, you have no control over your pasta (think DotA or LoL creeps). Your pasta doesn't like other "
    "people's pasta, so if they meet, they shoot sauce at each other until one dies. You get credits for other "
    "pastas your own pasta kill. Once a pasta is in the vicinity of a plate, it starts conquering it for its team. "
    "It takes around 10 seconds for a plate to be conquered; less if more pasta from the same team are around. If "
    "pasta from other team are around, though, they get locked down in their attempt, unable to conquer the plate, "
    "until one of them die (think Battlefield's standard 'Conquest' mode). You get points every second for every "
    "plate you own. Over time, the concept also evolved to use an Italian bistro as its main scenario. Carlos, "
    "Carlos' Bistro's founder and owner Setup No major changes were made from my work setup. I used FDT and "
    "Starling creating an Adobe AIR (ActionScript) project, all tools or frameworks I already had some knowledge "
    "with. One big change for me was that I livestreamed my work through a twitch.tv account. This was a new thing "
    "for me. As recommended by Roushey, I used a program called XSplit and I got to say, it is pretty amazing. It "
    "made the livestream pretty effortless and the features are awesome, even for the free version. It was great "
    "to have some of my friends watch me, and then interact with them and random people through chat. It was also "
    "good knowing that I was also recording a local version of the files, so I could make a timelapse video later. "
    "Knowing the video was being recorded also made me a lot more self-conscious about my computer use, as if "
    "someone was watching over my shoulder. It made me realize that sometimes I spend too much time in seemingly "
    "inane tasks (I ended up wasting the longest time just to get some text alignment the way I wanted - it'll "
    "probably drive someone crazy if they watch it) and that I do way too many typos where writing code. I pretty "
    "much spend half of the time writing a line and the other half fixing the crazy characters in it. My own "
    "stream was probably boring to watch since I was coding for the most time. But livestreaming is one of the "
    "cool things to do as a spectator too. It was great seeing other people working - I had a few tabs opened on "
    "my second monitor all the time. It's actually a bit sad, because if I could, I could have spent the whole "
    "weekend just watching other people working! But I had to do my own work, so I'd only do it once in a while, "
    "when resting for a bit. Design Although I wanted some simple, low-fi, high-contrast kind of design, I ended "
    "up going with somewhat realistic (vector) art. I think it worked very well, fitting the mood of the game, but "
    "I also went overboard. For example: to know the state of a plate (who owns it, who's conquering it and how "
    "much time they have left before conquering it, which pasta units are in the queue, etc), you have to look at "
    "the plate's bill. The problem I realized when doing some tests is that people never look at the bill! They "
    "think it's some kind of prop, so they never actually read its details. Plus, if you're zoomed out too much, "
    "you can't actually read it, so it's hard to know what's going on with the game until you zoom in to the area "
    "of a specific plate. One other solution that didn't turn out to be as perfect as I thought was how to "
    "indicate who a plate base belongs to. In the game, that's indicated by the plate's decoration - its color "
    "denotes the team owner. But it's something that fits so well into the design that people never realized it, "
    "until they were told about it. In the end, the idea of going with a full physical metaphor is one that should "
    "be done with care. Things that are very important risk becoming background noise, unless the player knows its "
    "importance. Originally, I wanted to avoid any kind of heads-up display in my game. In the end, I ended up "
    "adding it at the bottom to indicate your credits and bases owned, as well as the hideous "
    "out-of-place-and-still-not-obvious 'Call Waiter' button. But in hindsight, I should have gone with a simple "
    "HUD from the start, especially one that indicated each team's colors and general state of the game without "
    "the need for zooming in and out. Development Development went fast.",

    • early_stopping=true or false
      hf: I mean, I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the

      cpp: I mean, I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the only thing that really mattered was getting it done. I had a lot of talk about what to do, but in the end, the
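
The early_stopping=true and early_stopping=false cases above differ in when the search is allowed to stop. The sketch below illustrates that difference, loosely following the behaviour the transformers documentation describes for the `early_stopping` option (stop once `num_beams` finished candidates exist, versus a heuristic that stops when better candidates are unlikely). It is a simplified assumption-laden illustration, not the code from this PR or from transformers; `is_done` and its parameters are hypothetical names.

```python
def is_done(finished_scores, num_beams, early_stopping,
            best_sum_logprobs, cur_len, length_penalty=1.0):
    """Decide whether beam search can stop.

    finished_scores: length-normalized scores of hypotheses that already ended.
    best_sum_logprobs: best running sum of log-probabilities among live beams.
    cur_len: current generated length used for normalization (assumed > 0).
    """
    if len(finished_scores) < num_beams:
        return False  # not enough finished hypotheses yet
    if early_stopping:
        return True  # stop as soon as num_beams hypotheses are finished
    # Heuristic: the best score a live beam could get if it ended right now.
    highest_attainable = best_sum_logprobs / (cur_len ** length_penalty)
    # Stop only if even the worst finished hypothesis is at least that good.
    return min(finished_scores) >= highest_attainable


if __name__ == "__main__":
    finished = [-0.5, -0.6, -0.7, -0.8]
    print(is_done(finished, 4, True, best_sum_logprobs=-4.0, cur_len=10))
    print(is_done(finished, 4, False, best_sum_logprobs=-4.0, cur_len=10))
```

With early stopping off, a live beam that still looks better than the worst finished hypothesis keeps the search going, which is consistent with the longer outputs in the early_stopping=false cases.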

@zhentaoyu zhentaoyu force-pushed the align_beam_search branch 2 times, most recently from dd144c0 to 2abccb6 Compare September 21, 2023 06:52
@a32543254
Contributor

Consider using an extension test to make sure the accuracy of this beam search feature stays good.

@zhentaoyu
Contributor Author

zhentaoyu commented Sep 22, 2023

> Consider using an extension test to make sure the accuracy of this beam search feature stays good.

We should. How about running the pybind_gptj binary on the CI machine, the same way we are testing it now, @VincyZhang? With fp32 dtype the output text should not change unless someone modifies beam_search incorrectly.
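
One possible shape for such a CI check, sketched in Python: run a command and assert that a stored reference text appears in its output. This is only an illustration of the idea. The helper name `output_contains` is hypothetical, and it assumes the binary prints the generated text to stdout, which this thread does not confirm.

```python
import subprocess
import sys


def output_contains(cmd, expected_text, timeout=600):
    """Run cmd and report whether expected_text appears in its stdout.

    Raises subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True,
                               timeout=timeout, check=True)
    return expected_text in completed.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. In CI the command would be
    # the one from the PR description (assumption: it prints the generated text):
    # cmd = ["numactl", "-l", "-C", "0-55", "./build/bin/pybind_gptj", "fp32.bin"]
    cmd = [sys.executable, "-c", "print('Once upon a time... ... when the world was young')"]
    assert output_contains(cmd, "when the world was young")
    print("output matches the stored reference")
```

Because the fp32 output is expected to be deterministic, an exact substring match against the reference texts listed earlier in this thread would be enough to flag an accidental change in beam_search.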

@zhentaoyu zhentaoyu force-pushed the align_beam_search branch 3 times, most recently from 4704024 to 27184af Compare September 25, 2023 08:27
@zhentaoyu zhentaoyu marked this pull request as ready for review September 25, 2023 08:28
@zhentaoyu zhentaoyu requested a review from airMeng as a code owner September 25, 2023 08:28
@zhentaoyu zhentaoyu force-pushed the align_beam_search branch 3 times, most recently from fdd061b to d262675 Compare September 26, 2023 08:10
Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>
Contributor

@a32543254 a32543254 left a comment


LGTM


Labels

enhancement New feature or request


6 participants