
[Feature] Support for Azure AI Studio #779

Merged

Conversation

riedgar-ms
Collaborator

@riedgar-ms riedgar-ms commented Apr 26, 2024

Add support for models deployed in Azure AI Studio. This has been done by combining the code for OpenAI models with the sample code provided by Azure AI Studio.

Since there are a bunch of common test cases which need to run with multiple models, start refactoring those a bit as well (and hook the Azure OpenAI tests into this). This isn't using the same mechanism as the testing of local models, since we won't be running into trouble with fitting multiple LLMs on a single machine.

@riedgar-ms riedgar-ms requested review from slundberg and Harsha-Nori and removed request for slundberg April 26, 2024 15:06
Collaborator Author

@riedgar-ms riedgar-ms left a comment


Some questions about how best to implement this

def _generator(self, prompt, temperature: float):
# Initial parts of this straight up copied from OpenAIChatEngine

# The next loop (or one like it) appears in several places,
Collaborator Author


Thoughts on this?

This is a straight-up copy of what is in _open_ai.py, but there are almost-but-not-quite-identical versions elsewhere. Chopping up the internal 'guidance state string' into individual chat messages feels like a task that all chat models will need - and hence should be centrally provided. Individual engines can then turn that format into whatever they precisely need.
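
To make the suggestion concrete, here is a minimal sketch of what a shared helper could look like; the function name, the ChatML-style markers, and the message format are assumptions for illustration, not the actual guidance internals:

# Hypothetical shared helper: slice guidance's internal prompt string into
# role-tagged chat messages, so each remote chat engine only has to convert
# this generic list into its own request format. Marker strings are assumed.
ROLE_STARTS = {
    "system": "<|im_start|>system\n",
    "user": "<|im_start|>user\n",
    "assistant": "<|im_start|>assistant\n",
}
ROLE_END = "<|im_end|>\n"

def split_chat_messages(prompt: str) -> list:
    messages = []
    pos = 0
    while pos < len(prompt):
        for role, start in ROLE_STARTS.items():
            if prompt.startswith(start, pos):
                end = prompt.find(ROLE_END, pos + len(start))
                stop = end if end != -1 else len(prompt)
                messages.append({"role": role, "content": prompt[pos + len(start):stop]})
                pos = stop + len(ROLE_END) if end != -1 else len(prompt)
                break
        else:
            pos += 1  # skip any characters outside a recognised role block
    return messages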

def __init__(
self,
*,
tokenizer,
Collaborator Author


I never try setting the tokeniser, and it appears that it eventually defaults to GPT2. I don't quite see why a remote model like this would even need it.

Collaborator Author


OK, so theoretically for token healing. However, I have a feeling that trying to figure out what tokeniser to use will be an exercise in fragility.
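
For context, token healing is roughly the following idea, sketched here with a hypothetical tokenizer object rather than the guidance implementation:

def heal_prompt(tokenizer, prompt: str):
    # Re-tokenize the prompt, drop its final token, and remember that token's
    # text: generation is then constrained so the continuation starts with it.
    # This is why even a remote engine would need some tokenizer on hand.
    token_ids = tokenizer.encode(prompt)
    if not token_ids:
        return prompt, ""
    healed_prompt = tokenizer.decode(token_ids[:-1])
    required_prefix = tokenizer.decode(token_ids[-1:])
    return healed_prompt, required_prefix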

self._reset_shared_data(prompt[:pos], temperature)

# Use cache only when temperature is 0
if temperature == 0:
Collaborator Author


This caching logic is a bit of a concern - at least for models where T=0 doesn't actually get determinism. And in general, it means that a bunch of our tests might not quite be doing what we think they're doing, because they may just be hitting the cache.

Collaborator Author


The more I think about it, the less I like the idea of a disk-based cache. In some ways it's worse on the OpenAI side, where both AzureOpenAI and OpenAI will wind up sharing the same cache.

How much speed up does it really give, compared to the Heisenbug potential it represents?

Collaborator


I think most models have more reliable temp=0 determinism now, but agree that perhaps sharing between AzureOpenAI and OpenAI is problematic (though there shouldn't be differences between the two APIs in theory?). I do think caching is a nice feature to have in general, as production workflows often have shared inputs coming in, where reusing responses saves time and money.

Collaborator Author


I have added a clear_cache argument to the constructor. I think I should add this to the OpenAI side as well, in a separate PR.
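
A rough sketch of the shape such an argument could take; the diskcache usage, attribute names, and cache path here are assumptions rather than the exact PR code:

import pathlib

import diskcache
import platformdirs

class AzureAIStudioChatEngine:
    def __init__(self, *, deployment_name: str, clear_cache: bool = False):
        # One on-disk cache per deployment, so different endpoints do not
        # share entries (path and attribute names are illustrative).
        cache_dir = (
            pathlib.Path(platformdirs.user_cache_dir("guidance"))
            / f"azureai_studio_{deployment_name}"
        )
        self.cache = diskcache.Cache(str(cache_dir))
        if clear_cache:
            # Drop any previously cached responses before first use.
            self.cache.clear()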

req = urllib.request.Request(self._endpoint, body, headers)

response = urllib.request.urlopen(req)
result = json.loads(response.read())
Collaborator Author


The logic around result is another thing which might be model specific.
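
As an illustration of the model-specific part, a hedged sketch of shape-dependent extraction; the field names are assumptions about possible endpoint response formats, not a documented contract:

def extract_chat_content(result: dict) -> str:
    # OpenAI-compatible deployments nest the text under "choices", while some
    # other Azure AI Studio deployments return a flat "output" field.
    if "choices" in result:
        return result["choices"][0]["message"]["content"]
    if "output" in result:
        return result["output"]
    raise ValueError(f"Unrecognised response shape: {sorted(result.keys())}")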

@codecov-commenter

codecov-commenter commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 22.89157%, with 64 lines in your changes missing coverage. Please review.

Project coverage is 62.05%. Comparing base (8caf911) to head (2327c8a).

Files Patch % Lines
guidance/models/_azureai_studio.py 21.95% 64 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
+ Coverage   55.42%   62.05%   +6.63%     
==========================================
  Files          55       56       +1     
  Lines        4076     4159      +83     
==========================================
+ Hits         2259     2581     +322     
+ Misses       1817     1578     -239     


assert isinstance(lm, models.AzureAIStudioChat)
lm.engine.cache.clear()

# No "system" role for Mistral?
Collaborator Author


This makes me unhappy
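
One common workaround, not necessarily what the test settles on, is to fold the system instructions into the first user message when a deployment rejects the "system" role; purely as an illustration:

def fold_system_into_user(messages: list) -> list:
    # Collect "system" content and prepend it to the next "user" message so
    # the request only contains roles the deployment accepts (illustrative).
    folded = []
    pending_system = ""
    for msg in messages:
        if msg["role"] == "system":
            pending_system += msg["content"] + "\n"
        elif msg["role"] == "user" and pending_system:
            folded.append({"role": "user", "content": pending_system + msg["content"]})
            pending_system = ""
        else:
            folded.append(msg)
    return folded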

@@ -58,13 +58,23 @@ jobs:
python -c "import torch; assert torch.cuda.is_available()"
- name: Test with pytest
env:
# Configure endpoints
# Configure endpoints for Azure OpenAI
AZUREAI_CHAT_ENDPOINT: ${{ secrets.AZUREAI_CHAT_ENDPOINT }}
Collaborator Author


We should probably change the 'ENDPOINT' variables here to be vars rather than secrets

AZUREAI_CHAT_ENDPOINT: ${{ secrets.AZUREAI_CHAT_ENDPOINT }}
AZUREAI_CHAT_KEY: ${{ secrets.AZUREAI_CHAT_KEY }}
AZUREAI_CHAT_MODEL: ${{ secrets.AZUREAI_CHAT_MODEL }}
AZUREAI_COMPLETION_ENDPOINT: ${{ secrets.AZUREAI_COMPLETION_ENDPOINT }}
AZUREAI_COMPLETION_KEY: ${{ secrets.AZUREAI_COMPLETION_KEY }}
AZUREAI_COMPLETION_MODEL: ${{ secrets.AZUREAI_COMPLETION_MODEL }}
# Configure endpoints for Azure AI Studio
AZURE_AI_STUDIO_PHI3_ENDPOINT: ${{ vars.AZURE_AI_STUDIO_PHI3_ENDPOINT }}
Collaborator Author


@Harsha-Nori we will need to work together on getting these into the repository

Comment on lines +59 to +61
("system", b"<|im_start|>system\n"),
("user", b"<|im_start|>user\n"),
("assistant", b"<|im_start|>assistant\n"),
Collaborator


Do AzureAI models uniformly use the same role tags across their models? I don't think we can hard code a check for these start_bytes in this class

Collaborator Author


These aren't coming from the model, surely? They're coming from guidance as it sends its current prompt back into this class, to be formatted and forwarded on to the model. Or.... have I managed to miss another action-at-a-distance part of guidance?
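
For reference, those markers are produced on the client side by guidance's role context managers, roughly as below; the constructor keyword arguments are placeholders rather than the real deployment details:

from guidance import assistant, gen, models, system, user

# Placeholder endpoint, deployment, and key (keyword names are assumptions).
lm = models.AzureAIStudioChat(
    azureai_studio_endpoint="https://example.azureml.net/score",
    azureai_studio_deployment="my-deployment",
    azureai_studio_key="...",
)

# Each role block inserts a "<|im_start|>role\n" style marker into guidance's
# prompt state; this engine later slices that state back into chat messages
# before calling the endpoint.
with system():
    lm += "You are a helpful assistant."
with user():
    lm += "What is the capital of France?"
with assistant():
    lm += gen("answer", max_tokens=10)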

Comment on lines 9 to 10
print(f"{dir(lm.engine.tokenizer)=}")
assert hasattr(lm.engine.tokenizer,"sp_model")
Collaborator


We don't want to have this assert for every model -- not every tokenizer is sentencepiece based.
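
A hedged sketch of a more model-agnostic version of that check; the encode/decode interface assumed here is illustrative and may not match the tokenizer wrapper exactly:

def test_tokenizer_roundtrip(lm):
    # Rather than asserting a sentencepiece-specific attribute such as
    # sp_model, exercise a generic encode/decode round trip instead.
    tokenizer = lm.engine.tokenizer
    text = "hello world"
    tokens = tokenizer.encode(text)
    assert tokenizer.decode(tokens) == text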

@riedgar-ms riedgar-ms changed the title [WIP][Feature] Support for Azure AI Studio [Feature] Support for Azure AI Studio May 2, 2024
@riedgar-ms riedgar-ms merged commit 631ff1a into guidance-ai:main May 6, 2024
104 checks passed
@riedgar-ms riedgar-ms deleted the riedgar-ms/azure-ai-studio-support-01 branch May 6, 2024 17:15