
Challenges of Large Language Models #48

Closed
jan-janssen opened this issue Jun 22, 2024 · 6 comments
@jan-janssen
Owner

Open Source
Unfortunately, most Llama-based and other free models fail to work with the tools defined by LangChain. They work for single functions, but they already struggle with the current complexity of LangSim.
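For context, a minimal sketch of the kind of tool binding involved, using the @tool decorator from langchain-core; the function below is purely illustrative, not one of LangSim's actual tools:

import os
from langchain_core.tools import tool

@tool
def get_bulk_modulus(element: str) -> float:
    """Return the bulk modulus of a chemical element in GPa."""
    return 0.0  # placeholder for the actual simulation call

# Binding a single tool like this usually works, but smaller open models
# often break down once several interdependent tools have to be chained:
# llm_with_tools = llm.bind_tools([get_bulk_modulus])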

ChatGPT

  • ChatGPT 3.5 turbo can execute the calculation for one noble metal but fails to execute a loop over all noble metals. It seems the abstract structure of an implicitly defined loop is not clear to ChatGPT 3.5.
  • ChatGPT 4 works fine with the state available in branch working_with_chatgpt4 but fails with the current main branch with a JSONDecodeError.
  • ChatGPT 4o works fine with the latest changes, in particular the state in branch working_with_chatgpt4o. The interesting part is the handling of the implicit loop: ChatGPT 4 executes all three steps (generate the crystal structure, equilibrate it and calculate the bulk modulus) for one noble metal and then moves on to the next, whereas ChatGPT 4o first generates the crystal structures for all elements, then equilibrates all resulting structures and finally calculates the bulk modulus for all equilibrated structures (see the sketch below).

The behaviour seems to be somewhat reproducible, so I wanted to quickly summarise it here.
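For illustration, the two execution orders look roughly like this (the three step functions are hypothetical placeholders, not LangSim's actual API):

# hypothetical placeholders for the three simulation steps
def generate_structure(element): return f"{element}-fcc"
def equilibrate(structure): return structure + "-relaxed"
def calculate_bulk_modulus(structure): return 100.0

noble_metals = ["Cu", "Ag", "Au"]

# ChatGPT 4: depth-first, all three steps for one element before moving on
for element in noble_metals:
    structure = generate_structure(element)
    relaxed = equilibrate(structure)
    bulk_modulus = calculate_bulk_modulus(relaxed)

# ChatGPT 4o: breadth-first, each step for all elements before the next step
structures = [generate_structure(element) for element in noble_metals]
relaxed_structures = [equilibrate(structure) for structure in structures]
bulk_moduli = [calculate_bulk_modulus(structure) for structure in relaxed_structures]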

@jan-janssen
Owner Author

Different large language models can be tested by setting environment variables:

Anthropic:

import os
os.environ["LANGSIM_PROVIDER"] = "anthropic"
os.environ["LANGSIM_API_KEY"] = os.environ["ANTHROPIC_API_KEY"]
os.environ["LANGSIM_MODEL"] = "claude-3-5-sonnet-20240620"

OpenAI:

import os
os.environ["LANGSIM_API_KEY"] = os.environ["OPENAI_API_KEY"]
os.environ["LANGSIM_MODEL"] = "gpt-4o"

KISSKI:

import os
os.environ["LANGSIM_API_KEY"] = os.environ["KISSKI_API"]
os.environ["LANGSIM_API_URL"] = "https://chat-ai.academiccloud.de/v1"
os.environ["LANGSIM_MODEL"] = "meta-llama-3-8b-instruct"

@ltalirz mentioned this issue Jul 8, 2024
@fraricci
Collaborator

Hi there! I was looking at the llm.py code and I'm not sure the above-mentioned way of changing the model can actually work.
Looking at get_executor(), it seems to me that the model is hard-coded and set depending on the provider.

@jan-janssen
Owner Author

As far as I can see, the model is only hard-coded for the case when it is None.
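The pattern being described would look roughly like this; DEFAULT_MODELS and the exact signature are assumptions for illustration, not the literal contents of llm.py:

# hypothetical sketch of the fallback logic, not LangSim's actual code
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-3-5-sonnet-20240620",
}

def get_executor(provider="openai", model=None):
    if model is None:  # only the None case falls back to the hard-coded default
        model = DEFAULT_MODELS[provider]
    ...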

@fraricci
Collaborator

fraricci commented Jul 19, 2024

I see, right, I can specify a model in the executor. But how can it read it from the environment variable then?

@fraricci
Collaborator

OK, I think the answer to my question is in magics.py ;-)
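In that case the magic would forward the environment variables to the executor, roughly along these lines (a hypothetical sketch reusing the get_executor() signature assumed above, not the literal contents of magics.py):

import os

# read the configuration from the LANGSIM_* environment variables
def executor_from_env():
    return get_executor(
        provider=os.environ.get("LANGSIM_PROVIDER", "openai"),
        model=os.environ.get("LANGSIM_MODEL"),  # provider default is used when unset
    )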

@jan-janssen
Owner Author

The benchmarking of LLMs is now also discussed in the developer section of the website.
