
adding FIM finetuned model hosted on fireworks #4245

Merged: 5 commits into main from hitesh/finetuned-fim-model-integration on May 23, 2024

Conversation

hitesh-1997 (Contributor) commented May 21, 2024

Context

  1. This PR adds the fine-tuned model setup, trained on context-aware FIM examples on top of bigcode/the-stack. More details about the training dataset, modeling, offline evaluation, etc. can be found in this RFC.
  2. We want to iterate quickly on modeling changes, i.e. not have to replace the model string in the client every time we change a model in the backend. So this PR adds four variant feature flags and a control feature flag; the variant feature-flag names are sent to cody-gateway as-is, where the routing to the actual fine-tuned model happens.
  3. The feature flag cody-autocomplete-fim-finetuned-model-base-flag added in this setup is treated as the base feature flag: the traffic on it determines the traffic of the whole experiment, and that traffic is divided further between the variant and control feature flags. (A rough sketch of this routing follows the list.)
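
For illustration, a rough sketch of how the base, variant, and control flags could interact (hypothetical helper names; the flag and model strings are taken from the diff below, and the real logic lives in the autocomplete provider configuration touched by this PR):

```typescript
// Sketch only: the base flag gates the experiment; within it, a variant (or the
// control) flag picks the model string that is forwarded to cody-gateway as-is.
async function resolveFimExperimentModel(
    evaluateFeatureFlag: (flag: string) => Promise<boolean>
): Promise<string | null> {
    const inExperiment = await evaluateFeatureFlag('cody-autocomplete-fim-finetuned-model-base-flag')
    if (!inExperiment) {
        return null // Not part of the experiment: fall back to the default provider.
    }
    if (await evaluateFeatureFlag('cody-autocomplete-fim-finetuned-model-variant1')) {
        return 'fim-fine-tuned-model-variant-1' // Sent to cody-gateway verbatim.
    }
    // ...the remaining variant flags are checked the same way...
    return 'fim-fine-tuned-model-control' // Hypothetical control identifier.
}
```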

Test plan

Traffic split sanity:

I tested the feature-flag approach introduced in this PR offline on 100k randomly generated user IDs and verified that it produces a near-equal traffic split among the variants and control. Anyone who wants to reproduce these results can do so via this script; the sketch below illustrates the idea.
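
A minimal sketch of that kind of offline sanity check (hypothetical bucketing via a hash of the user ID; this is not the referenced script or the production flag-evaluation logic):

```typescript
import { createHash } from 'node:crypto'

// Assign each user ID to one of five buckets by hashing it, then count how the
// 100k IDs are distributed. A near-equal split is roughly 20k per bucket.
const BUCKETS = ['control', 'variant-1', 'variant-2', 'variant-3', 'variant-4']

function bucketFor(userId: string): string {
    const digest = createHash('sha256').update(userId).digest()
    return BUCKETS[digest.readUInt32BE(0) % BUCKETS.length]
}

const counts = new Map<string, number>()
for (let i = 0; i < 100_000; i++) {
    const bucket = bucketFor(`user-${i}`)
    counts.set(bucket, (counts.get(bucket) ?? 0) + 1)
}
console.log(Object.fromEntries(counts))
```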

New modeling changes:

Local testing

@valerybugakov (Member) left a comment:

LGTM!

@@ -83,8 +95,10 @@ const MODEL_MAP = {
'llama-code-13b': 'fireworks/accounts/fireworks/models/llama-v2-13b-code',

// Fine-tuned model mapping
'fireworks-completions-fine-tuned':
'fireworks/accounts/sourcegraph/models/codecompletion-mixtral-rust-152k-005e',
'fim-fine-tuned-model-variant-1': FIREWORKS_FIM_FINE_TUNED_MODEL_1,
Member:

Since we have constants already, we can:

Suggested change
'fim-fine-tuned-model-variant-1': FIREWORKS_FIM_FINE_TUNED_MODEL_1,
[FIREWORKS_FIM_FINE_TUNED_MODEL_1]: FIREWORKS_FIM_FINE_TUNED_MODEL_1,

hitesh-1997 (Author): changed
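
For context, a minimal sketch of the computed-key pattern the suggestion uses (hypothetical constant value and Fireworks path): deriving the MODEL_MAP key from the same constant used elsewhere keeps the two from drifting apart.

```typescript
// Hypothetical values for illustration; the real constants live in the Fireworks provider.
const FIREWORKS_FIM_FINE_TUNED_MODEL_1 = 'fim-fine-tuned-model-variant-1'

const MODEL_MAP = {
    // Computed property key: changing the constant's value automatically updates the map key.
    [FIREWORKS_FIM_FINE_TUNED_MODEL_1]: 'fireworks/accounts/sourcegraph/models/<fine-tuned-model-path>',
}
```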

Comment on lines 157 to 160
async function resolveFinetunedModelProviderFromFeatureFlags(): Promise<{
provider: string
model?: FireworksOptions['model'] | AnthropicOptions['model']
} | null> {
Member:

To keep return types in sync automatically:

Suggested change
async function resolveFinetunedModelProviderFromFeatureFlags(): Promise<{
provider: string
model?: FireworksOptions['model'] | AnthropicOptions['model']
} | null> {
async function resolveFinetunedModelProviderFromFeatureFlags(): ReturnType<
typeof resolveDefaultProviderFromVSCodeConfigOrFeatureFlags
> {

Contributor: magic

hitesh-1997 (Author): wow nice, didn't know about this.
Changed as per suggestion
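
For illustration, a minimal sketch of the ReturnType<typeof ...> pattern used in the suggestion (hypothetical function and type names): the second resolver's signature follows the first automatically, so the two never have to be updated in tandem.

```typescript
interface ProviderConfig {
    provider: string
    model?: string
}

async function resolveDefaultProvider(): Promise<ProviderConfig | null> {
    return { provider: 'fireworks' }
}

// No hand-written Promise<...> type to keep in sync with resolveDefaultProvider.
function resolveExperimentalProvider(): ReturnType<typeof resolveDefaultProvider> {
    return resolveDefaultProvider()
}
```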


if (finetunedFIMModelExperiment) {
// The traffic in this feature flag is interpreted as a traffic allocated to the fine-tuned experiment.
return await resolveFinetunedModelProviderFromFeatureFlags()
Member:

Suggested change
return await resolveFinetunedModelProviderFromFeatureFlags()
return resolveFinetunedModelProviderFromFeatureFlags()

hitesh-1997 (Author): done


// Enable various feature flags to experiment with FIM trained fine-tuned models via Fireworks
CodyAutocompleteFIMFineTunedModelBaseFeatureFlag = 'cody-autocomplete-fim-finetuned-model-base-flag',
CodyAutocompleteFIMFineTunedModelControl = 'cody-autocomplete-fim-finetuned-model-control',
CodyAutocompleteFIMFineTunedModelVariant1 = 'cody-autocomplete-fim-finetuned-model-variant1',
Contributor:

Can we standardize the naming (prefix, variantX, spelling of fine-tuned) between this and internal/completions/client/fireworks/fireworks.go so we don't have to redeploy due to typos?

hitesh-1997 (Author):

Totally makes sense; I changed the string names so that they are in line with the names used elsewhere.
I followed this convention: _<variant_name>, e.g. cody-autocomplete-fim-fine-tuned-model-variant-1. Does this look alright?
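
As an illustration of that convention applied to the enum from the diff above (hypothetical final strings; only the variant-1 spelling is confirmed in this thread):

```typescript
enum FeatureFlag {
    // Base flag that gates the whole experiment.
    CodyAutocompleteFIMFineTunedModelBaseFeatureFlag = 'cody-autocomplete-fim-fine-tuned-model-base-flag',
    CodyAutocompleteFIMFineTunedModelControl = 'cody-autocomplete-fim-fine-tuned-model-control',
    CodyAutocompleteFIMFineTunedModelVariant1 = 'cody-autocomplete-fim-fine-tuned-model-variant-1',
}
```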

@@ -69,6 +68,19 @@ const PROVIDER_IDENTIFIER = 'fireworks'
const EOT_STARCODER = '<|endoftext|>'
const EOT_LLAMA_CODE = ' <EOT>'

// Fireworks hosted model identifier strings
Contributor:

Link to the actual models backing those "virtual" identifiers?

hitesh-1997 (Author):

I think we can add a link to the sourcegraph repo where the actual mapping is defined?

case FIREWORKS_FIM_FINE_TUNED_MODEL_4: {
// We use llama3 8b and mixtral 8x7b variants for the fine-tuning model which support 8_192, 32_768 tokens respectively.
// Take a buffer of 1000 tokens
return 7192
Contributor:

Note that this is going to increase the amount of context we send by a lot, which might have an impact on the comparison and the latency.

hitesh-1997 (Author):

Thanks for noticing this, Philipp. Earlier I did a load test on different context token lengths from a GCP VM in us-central-1a and got a ~100ms delta for p75, so I went ahead with this context length. But I realize the user is going to query this from their own machine, and a bigger context can lead to increased latency when the client is local.
Changing this back to 2048, same as the others.

Contributor:

I wouldn't be too concerned about the user/CPU overhead. 100ms for a 4x increase in tokens could be a great trade-off, though. The problem we ran into last time was that it reduced throughput a lot as well. Not sure how this affects the setup.
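
For reference, a small sketch of the token-budget arithmetic discussed in this thread (the numbers come from the comments above; the constant names are hypothetical):

```typescript
// llama3 8b supports an 8_192-token context window; keeping a ~1_000-token buffer
// for the completion itself gives the 7_192 limit originally proposed in the diff.
const LLAMA3_8B_CONTEXT_WINDOW = 8_192
const COMPLETION_BUFFER = 1_000
const proposedLimit = LLAMA3_8B_CONTEXT_WINDOW - COMPLETION_BUFFER // 7_192 tokens

// The PR settled on the same limit as the other providers to keep latency comparable.
const experimentLimit = 2_048
```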

hitesh-1997 merged commit 4b9fb4c into main on May 23, 2024. 19 checks passed.
hitesh-1997 deleted the hitesh/finetuned-fim-model-integration branch on May 23, 2024 at 11:35.