Recently I have been studying your code. However, it seems to me that your implementation does not expand the KV cache during the decoding phase. The following code is excerpted from the function `_concatenate_to_cache` in llama.py.
In this function, during decoding we only overwrite `cached_key` and `cached_value` in place with the newly generated key/value, instead of appending the new entries to them. However, it seems to me that a correct KV-cache implementation should let the cache grow longer with each decoding step.
Maybe I do not fully understand your code, but I am looking forward to your reply.
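To make my question concrete, here is a minimal NumPy sketch of the update-in-place behavior I believe I am seeing, where the cache is pre-allocated at a maximum length and new key/value entries are written into successive slots rather than concatenated on. All names here (`cached_key`, `cache_index`, shapes) are illustrative, not the repo's actual code:

```python
import numpy as np

# Pre-allocated KV cache (illustrative): allocated once at max_length,
# so the array shape never changes during decoding.
max_length, num_heads, head_dim = 8, 2, 4
cached_key = np.zeros((max_length, num_heads, head_dim))
cached_value = np.zeros((max_length, num_heads, head_dim))
cache_index = 0  # number of positions written so far

def concatenate_to_cache(key, value):
    """Write one decoding step's key/value into the fixed-size cache."""
    global cache_index
    cached_key[cache_index] = key      # in-place update, no concatenation
    cached_value[cache_index] = value
    cache_index += 1
    # Only the first cache_index rows are valid; attention would mask the rest.
    return cached_key[:cache_index], cached_value[:cache_index]

# Two decoding steps: the backing arrays keep shape (max_length, ...),
# while the valid prefix grows from 1 to 2.
k1, v1 = concatenate_to_cache(np.ones((num_heads, head_dim)),
                              np.ones((num_heads, head_dim)))
k2, v2 = concatenate_to_cache(2 * np.ones((num_heads, head_dim)),
                              2 * np.ones((num_heads, head_dim)))
assert cached_key.shape == (max_length, num_heads, head_dim)  # shape fixed
assert k2.shape[0] == 2  # but the valid prefix did grow
```

If the implementation follows this pattern, then the cache does not need to grow: its size is fixed at `max_length`, and a write index tracks how much of it is valid. Is that what is happening here, or is the new key/value really discarding the old entries?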