Add integration and/or system tests #144
I'm going to try implementing a simple https://cucumber.io/ test. It might work well here, but if it doesn't add value we don't have to use it.
@mattlindsey Do you envision that this would actually run in CI? I'm also struggling a bit to figure out what value these feature tests would bring to this library.
If you run them in CI I think you'd catch errors sooner. For example, I think there's a gem dependency error right now. (Might be wrong.)
I hope @technicalpickles doesn't mind that I pull him in. There was a mention of executing Jupyter notebooks or README code snippets in Discord. Do you have any thoughts here?
Also see where I implemented a couple of tests to give a better idea: #145

And for a wider range of testing it would be good if someone implemented Langchain::LLM::HuggingFace#complete.
I was doing a course on deeplearning.ai that talked about how, if you set the temperature to 0, the model's responses become mostly deterministic.
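To illustrate the temperature idea, here is a hedged sketch of the request parameters involved. The parameter names follow the OpenAI chat completions API; the model name and prompt are just examples, not anything from this repo:

```ruby
# Sketch only: request parameters for a (mostly) repeatable completion.
# temperature: 0.0 makes sampling greedy, so repeated runs of a README
# example should return nearly identical text.
request = {
  model: "gpt-3.5-turbo",  # example model name
  temperature: 0.0,
  messages: [{ role: "user", content: "What is 2 + 2?" }]
}

puts request[:temperature]
```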
Yep! Here is what I suggested: I've been thinking about getting the code in the README and in examples to run as part of CI. I did something like that for openfeature-sdk (open-feature/ruby-sdk#40).

I think the challenge for the README is making sure the fragment is complete enough to run, as well as having the right environment variables to make the call. In both cases, I'm starting to think we could get pretty far by stubbing the response from the LLM. That could help cover everything leading up to the request.

The most common way I've done this is with VCR and/or webmock. The main downside there is that it doesn't capture changes that happen on the remote end, obviously. If we are using existing libraries to do those interactions, though, it's probably a pretty good tradeoff.
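The VCR approach mentioned above is commonly wired up along these lines (a sketch, assuming the `vcr` and `webmock` gems; the cassette directory is illustrative):

```ruby
# spec/support/vcr.rb (path is an example)
# Record real LLM responses once, replay them in CI on later runs.
require "vcr"

VCR.configure do |config|
  config.cassette_library_dir = "spec/fixtures/vcr_cassettes"
  config.hook_into :webmock
  # Keep real API keys out of the recorded cassettes.
  config.filter_sensitive_data("<OPENAI_API_KEY>") { ENV["OPENAI_API_KEY"] }
end
```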
Thanks @technicalpickles. I'm going to try the method you used in open-feature to run our README examples with temperature=0. It will still have to be an optional script or spec, since it would require env variables, like you said. When you say stubbing the response from the LLM, do you mean like below? Or recording responses with VCR for every example? Because the idea was to run everything against live services. https://github.com/andreibondarev/langchainrb/blob/9dd8add0703c8cc9f5d250ee7a3559f45053d7e3/spec/langchain/llm/openai_spec.rb#L68
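For what "stubbing the response" could look like in plain Ruby, here is a minimal sketch. The class and method names are hypothetical stand-ins, not the actual langchainrb API:

```ruby
# Hypothetical sketch: a stand-in for an LLM client, so README-style
# examples can run in CI without hitting a real API.
class StubbedLLM
  CANNED_RESPONSE = { "completion" => "Hello from the stub!" }.freeze

  # Returns a canned payload instead of making a network call.
  def complete(prompt:)
    CANNED_RESPONSE
  end
end

llm = StubbedLLM.new
puts llm.complete(prompt: "Say hello")["completion"]
```

Everything leading up to the request (prompt building, argument handling) still gets exercised; only the network call is faked.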
We need to figure out whether we'd like these tests to run against real (non-mocked) services, with actual API keys/creds.

If yes, then let's go with Jupyter notebooks. These would need to be run locally by a dev; we can't run them in CI because it costs $$$ to run.

If not, then these tests/scripts should be in RSpec. We have a pretty large testing matrix: think "num of vectorsearch DBs x num of LLMs", i.e. we're saying that any LLM in the project (that supports …

@mattlindsey @technicalpickles Thoughts?
That is what I meant, yeah. I think we can still get some value out of having everything but the LLM response, since there are plenty of other moving parts.
If that is going to require providing an API key anyway, we may as well do it in plain Ruby. We could even have an RSpec tag to indicate that something uses the API, and have that automatically included/excluded when `ENV['OPENAI_API_KEY']` is present:

```ruby
describe Whatever, :openai_integration => true do
  it "works" do
    # ...
  end
end
```

Then run:

```shell
bundle exec rspec --tag openai_integration
```
To exclude by default, we can add
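Something like the following RSpec configuration would handle the exclusion (a sketch; the tag name matches the `openai_integration` example in this thread, and the file path is an assumption):

```ruby
# spec/spec_helper.rb (illustrative location)
RSpec.configure do |config|
  # Skip API-hitting specs unless a key is available in the environment.
  unless ENV["OPENAI_API_KEY"]
    config.filter_run_excluding openai_integration: true
  end
end
```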
That said, it makes me wonder if they have any policies for open source development. OpenAI is also on Azure, and Azure has open source credits we could apply for: https://opensource.microsoft.com/azure-credits/
@andreibondarev Can Jupyter notebooks run Ruby? I'm thinking RSpec in a separate 'integration' directory with the tags described by Josh sounds good. Looks like Azure takes 3-4 weeks to reply in case you want to request access to their 'OpenAI Azure' offering (https://learn.microsoft.com/en-us/azure/cognitive-services/openai/overview). But would that mean a new LLM class in langchainrb? I don't see any Ruby examples in the documentation, so I'm not sure.
I saw it in the boxcars gem, which is in the same space as this gem.
@technicalpickles I added a similar 'getting started' Jupyter notebook in #185, but it was somewhat difficult to get working and seems to give errors sometimes. Take a look if you want, but I don't want to waste your time!
I did get a notebook working, but it's very picky and may not be worth the effort to maintain. |
I think we need the ability to add and run some 'integration' tests that exercise interactions between high-level components and use actual APIs and keys. They would be run only on request, and could be run before each release.

Start with a simple question to ChainOfThought with OpenAI, like in the README, with the expectation that the result should be similar but not exactly equal to the result given in the README, since I assume the AI can respond slightly differently each time the test is called.
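A "similar but not exactly equal" assertion could be sketched as a loose word-overlap check. This is only an illustration of the idea; the helper name and threshold are assumptions, not anything from the repo:

```ruby
require "set"

# Jaccard similarity over words: 1.0 means identical vocabulary,
# 0.0 means no words in common. Useful for fuzzy README comparisons.
def word_overlap(a, b)
  wa = a.downcase.scan(/\w+/).to_set
  wb = b.downcase.scan(/\w+/).to_set
  return 1.0 if wa.empty? && wb.empty?
  (wa & wb).size.to_f / (wa | wb).size
end

expected = "The answer is 42."
actual   = "42 is the answer."
# Same words, different order, so the overlap is high.
puts word_overlap(expected, actual) >= 0.5
```

An integration spec could then assert `word_overlap(readme_answer, live_answer)` stays above some threshold instead of demanding exact equality.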