Adding FIM fine-tuned model hosted on Fireworks #4245
Conversation
LGTM!
@@ -83,8 +95,10 @@ const MODEL_MAP = {
    'llama-code-13b': 'fireworks/accounts/fireworks/models/llama-v2-13b-code',

    // Fine-tuned model mapping
    'fireworks-completions-fine-tuned':
        'fireworks/accounts/sourcegraph/models/codecompletion-mixtral-rust-152k-005e',
    'fim-fine-tuned-model-variant-1': FIREWORKS_FIM_FINE_TUNED_MODEL_1,
Since we have constants already, we can:
- 'fim-fine-tuned-model-variant-1': FIREWORKS_FIM_FINE_TUNED_MODEL_1,
+ [FIREWORKS_FIM_FINE_TUNED_MODEL_1]: FIREWORKS_FIM_FINE_TUNED_MODEL_1,
changed
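For anyone new to the pattern, here is a minimal standalone sketch of the computed-property-key trick the suggestion uses (the constant's value below is hypothetical, for illustration only):

    // Hypothetical constant value, for illustration only.
    const FIREWORKS_FIM_FINE_TUNED_MODEL_1 =
        'fireworks/accounts/fireworks/models/fim-fine-tuned-variant-1'

    const MODEL_MAP = {
        // Computed property key: the constant supplies both the key and the value,
        // so renaming the model identifier can never leave the map out of sync.
        [FIREWORKS_FIM_FINE_TUNED_MODEL_1]: FIREWORKS_FIM_FINE_TUNED_MODEL_1,
    }

    // Lookup and entry now share a single source of truth:
    console.log(MODEL_MAP[FIREWORKS_FIM_FINE_TUNED_MODEL_1])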
async function resolveFinetunedModelProviderFromFeatureFlags(): Promise<{
    provider: string
    model?: FireworksOptions['model'] | AnthropicOptions['model']
} | null> {
To keep return types in sync automatically:
- async function resolveFinetunedModelProviderFromFeatureFlags(): Promise<{
-     provider: string
-     model?: FireworksOptions['model'] | AnthropicOptions['model']
- } | null> {
+ async function resolveFinetunedModelProviderFromFeatureFlags(): ReturnType<
+     typeof resolveDefaultProviderFromVSCodeConfigOrFeatureFlags
+ > {
magic
Wow, nice. I didn't know about this.
Changed as per the suggestion.
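For background, a minimal standalone sketch of the ReturnType<typeof …> trick from the suggestion above; the stand-in resolver below is hypothetical, not the real function:

    // Hypothetical stand-in for the real resolver, for illustration only.
    async function resolveDefaultProviderFromVSCodeConfigOrFeatureFlags(): Promise<{
        provider: string
        model?: string
    } | null> {
        return { provider: 'fireworks' }
    }

    // ReturnType<typeof fn> derives this signature from the function above, so
    // if its return type ever changes, this declaration follows automatically.
    async function resolveFinetunedModelProviderFromFeatureFlags(): ReturnType<
        typeof resolveDefaultProviderFromVSCodeConfigOrFeatureFlags
    > {
        return { provider: 'fireworks', model: 'fim-fine-tuned-model' }
    }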
if (finetunedFIMModelExperiment) {
    // The traffic in this feature flag is interpreted as the traffic allocated to the fine-tuned experiment.
    return await resolveFinetunedModelProviderFromFeatureFlags()
- return await resolveFinetunedModelProviderFromFeatureFlags()
+ return resolveFinetunedModelProviderFromFeatureFlags()
done
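For background on why the await was redundant: outside a try/catch, returning the promise directly is observably equivalent to awaiting it first. A small sketch of the one case where return await does matter (all names here are hypothetical):

    async function mightFail(): Promise<string> {
        throw new Error('boom')
    }

    async function withCatch(): Promise<string> {
        try {
            // With `await`, the rejection surfaces inside this function
            // and the catch block below runs.
            return await mightFail()
        } catch {
            return 'fallback'
        }
    }

    async function withoutTryCatch(): Promise<string> {
        // No surrounding try/catch, so `return await` would add nothing:
        // the caller sees the same resolved or rejected promise either way.
        return mightFail()
    }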
// Enable various feature flags to experiment with FIM trained fine-tuned models via Fireworks
CodyAutocompleteFIMFineTunedModelBaseFeatureFlag = 'cody-autocomplete-fim-finetuned-model-base-flag',
CodyAutocompleteFIMFineTunedModelControl = 'cody-autocomplete-fim-finetuned-model-control',
CodyAutocompleteFIMFineTunedModelVariant1 = 'cody-autocomplete-fim-finetuned-model-variant1',
Can we standardize the naming (prefix, variantX, spelling of fine-tuned) between this and internal/completions/client/fireworks/fireworks.go so we don't have to redeploy due to typos?
Totally makes sense; I changed the string names so that they are in line with the names used elsewhere.
I have followed this convention: _<variant_name>, e.g. cody-autocomplete-fim-fine-tuned-model-variant-1.
Does this look alright?
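If it helps to review, this is presumably what the renamed flags look like under that convention (the base and control spellings here are my assumption, extrapolated from the variant example above):

    // Assumed renamed values, following the convention from the comment above.
    CodyAutocompleteFIMFineTunedModelBaseFeatureFlag = 'cody-autocomplete-fim-fine-tuned-model-base-flag',
    CodyAutocompleteFIMFineTunedModelControl = 'cody-autocomplete-fim-fine-tuned-model-control',
    CodyAutocompleteFIMFineTunedModelVariant1 = 'cody-autocomplete-fim-fine-tuned-model-variant-1',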
@@ -69,6 +68,19 @@ const PROVIDER_IDENTIFIER = 'fireworks'
const EOT_STARCODER = '<|endoftext|>'
const EOT_LLAMA_CODE = ' <EOT>'

// Fireworks hosted model identifier strings
Link to the actual models backing those "virtual" identifiers?
I think we can add a link to the sourcegraph repo where the actual mapping is defined?
case FIREWORKS_FIM_FINE_TUNED_MODEL_4: {
    // We use the llama3 8b and mixtral 8x7b variants for the fine-tuned model, which support 8_192 and 32_768 tokens respectively.
    // Take a buffer of 1000 tokens.
    return 7192
Note that this is going to increase the amount of context we send by a lot, which might have an impact on the comparison and on latency.
Thanks for noticing this, Philipp. Earlier I ran a load test with different context token lengths from a GCP VM in us-central1-a and got a ~100ms delta at p75, so I went ahead with this context length. But I realise the user is going to query this from their own machine, and a bigger context can lead to increased latency when the client is local.
Changing this back to 2048, same as the others.
I wouldn't be too concerned about the user/CPU overhead. 100ms for a 4x increase in tokens could be a great trade-off, though. The problem we ran into last time was that it reduced throughput a lot as well. Not sure how this affects the setup.
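To make the numbers in this thread concrete, a hypothetical sketch of the budget arithmetic (the helper and model identifiers are illustrative, not the PR's actual code):

    // Reserve a fixed buffer for the generated completion, then spend the rest
    // of the model's context window on the prompt.
    const OUTPUT_TOKEN_BUFFER = 1_000

    const CONTEXT_WINDOW: Record<string, number> = {
        'llama-v3-8b-fim': 8_192, // assumed identifiers, for illustration
        'mixtral-8x7b-fim': 32_768,
    }

    function promptTokenBudget(model: string): number {
        const window = CONTEXT_WINDOW[model]
        // Fall back to the 2_048-token budget the other providers use.
        if (window === undefined) {
            return 2_048
        }
        return window - OUTPUT_TOKEN_BUFFER // e.g. 8_192 - 1_000 = 7_192
    }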
Context
cody-autocomplete-fim-finetuned-model-base-flag added in the setup is treated as the base feature flag, and the traffic on this flag decides the traffic of the whole experiment. The traffic on this flag is divided further into the variant and control feature flags, as sketched below.
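A minimal sketch of that layering (the evaluate callback and arm names are hypothetical):

    // Base flag gates the whole experiment; its traffic is then split further.
    async function resolveExperimentArm(
        evaluate: (flag: string) => Promise<boolean>
    ): Promise<'variant-1' | 'control' | null> {
        if (!(await evaluate('cody-autocomplete-fim-finetuned-model-base-flag'))) {
            // User is outside the experiment entirely.
            return null
        }
        if (await evaluate('cody-autocomplete-fim-finetuned-model-variant1')) {
            return 'variant-1'
        }
        return 'control'
    }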
Test plan
Traffic split sanity: I have tested the feature-flag approach introduced in this PR on 100k randomly generated user IDs offline and verified that it produces a near-equal traffic split among the variants and control. Anyone who wants to reproduce these results can do so via this script; a hypothetical simulation along the same lines is sketched below.
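For illustration, a hypothetical offline simulation in the same spirit as that script (the hashing scheme and the 50% split are assumptions, not the script's actual logic):

    import { createHash } from 'node:crypto'

    // Stable pseudo-random assignment: hash (flag, userId) into [0, 1).
    function isInBucket(flag: string, userId: string, rollout: number): boolean {
        const digest = createHash('sha256').update(`${flag}:${userId}`).digest()
        return digest.readUInt32BE(0) / 0x100000000 < rollout
    }

    const counts = { control: 0, variant: 0 }
    for (let i = 0; i < 100_000; i++) {
        const userId = `user-${i}`
        if (isInBucket('fim-variant-1', userId, 0.5)) {
            counts.variant++
        } else {
            counts.control++
        }
    }
    // Expect a near 50/50 split across the 100k generated user IDs.
    console.log(counts)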
New modeling changes:
Local testing