
Challenges of Large Language Models #48

Closed
jan-janssen opened this issue Jun 22, 2024 · 6 comments
@jan-janssen
Owner

Open Source
Unfortunately, most Llama-based and other free models fail to work with the tools defined by LangChain. They work for single functions, but they already struggle with the current complexity of LangSim.
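For context, a minimal sketch of the kind of tool binding involved, using the @tool decorator from langchain-core; the function below is purely illustrative, not one of LangSim's actual tools:

import os
from langchain_core.tools import tool

@tool
def get_bulk_modulus(element: str) -> float:
    """Return the bulk modulus of a chemical element in GPa."""
    return 0.0  # placeholder for the actual simulation call

# Binding a single tool like this usually works, but smaller open models
# often break down once several interdependent tools have to be chained:
# llm_with_tools = llm.bind_tools([get_bulk_modulus])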

ChatGPT

  • ChatGPT 3.5 turbo can execute the calculation for one noble metal but fails to execute a loop over all noble metals. It seems the abstract structure of an implicitly defined loop is not clear to ChatGPT 3.5.
  • ChatGPT 4 works fine with the state available in branch working_with_chatgpt4 but fails with the current main branch with a JSONDecodeError.
  • ChatGPT 4o works fine with the latest changes, in particular the state in branch working_with_chatgpt4o. The interesting part is the handling of the implicit loop: ChatGPT 4 executes all three steps (generate the crystal structure, equilibrate it and calculate the bulk modulus) for one noble metal and then moves on to the next, whereas ChatGPT 4o first generates the crystal structures for all elements, then equilibrates all resulting structures and finally calculates the bulk modulus for all equilibrated structures (see the sketch below).

The behaviour seems to be somewhat reproducible, so I wanted to quickly summarise it here.
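For illustration, the two execution orders look roughly like this (the three step functions are hypothetical placeholders, not LangSim's actual API):

# hypothetical placeholders for the three simulation steps
def generate_structure(element): return f"{element}-fcc"
def equilibrate(structure): return structure + "-relaxed"
def calculate_bulk_modulus(structure): return 100.0

noble_metals = ["Cu", "Ag", "Au"]

# ChatGPT 4: depth-first, all three steps for one element before moving on
for element in noble_metals:
    structure = generate_structure(element)
    relaxed = equilibrate(structure)
    bulk_modulus = calculate_bulk_modulus(relaxed)

# ChatGPT 4o: breadth-first, each step for all elements before the next step
structures = [generate_structure(element) for element in noble_metals]
relaxed_structures = [equilibrate(structure) for structure in structures]
bulk_moduli = [calculate_bulk_modulus(structure) for structure in relaxed_structures]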

@jan-janssen
Owner Author

Different large language models can be tested by setting environment variables:

Anthropic:

import os
os.environ["LANGSIM_PROVIDER"] = "anthropic"
os.environ["LANGSIM_API_KEY"] = os.environ["ANTHROPIC_API_KEY"]
os.environ["LANGSIM_MODEL"] = "claude-3-5-sonnet-20240620"

OpenAI:

import os
os.environ["LANGSIM_API_KEY"] = os.environ["OPENAI_API_KEY"]
os.environ["LANGSIM_MODEL"] = "gpt-4o"

KISSKI:

import os
os.environ["LANGSIM_API_KEY"] = os.environ["KISSKI_API"]
os.environ["LANGSIM_API_URL"] = "https://chat-ai.academiccloud.de/v1"
os.environ["LANGSIM_MODEL"] = "meta-llama-3-8b-instruct"

@ltalirz mentioned this issue Jul 8, 2024
@fraricci
Collaborator

Hi there! I was looking at the llm.py code and I'm not sure the above-mentioned way of changing the model can actually work.
Looking at get_executor(), it seems to me that the model is hard-coded and set depending on the provider.

@jan-janssen
Owner Author

As far as I can see, the model is only hard-coded for the case when it is None.
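The pattern being described would look roughly like this; DEFAULT_MODELS and the exact signature are assumptions for illustration, not the literal contents of llm.py:

# hypothetical sketch of the fallback logic, not LangSim's actual code
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-3-5-sonnet-20240620",
}

def get_executor(provider="openai", model=None):
    if model is None:  # only the None case falls back to the hard-coded default
        model = DEFAULT_MODELS[provider]
    ...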

@fraricci
Collaborator

fraricci commented Jul 19, 2024

I see, right, I can specify a model in the executor. But how can it read it from the environment variable then?

@fraricci
Collaborator

OK, I think the answer to my question is in magics.py ;-)
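In that case the magic would forward the environment variables to the executor, roughly along these lines (a hypothetical sketch reusing the get_executor() signature assumed above, not the literal contents of magics.py):

import os

# read the configuration from the LANGSIM_* environment variables
def executor_from_env():
    return get_executor(
        provider=os.environ.get("LANGSIM_PROVIDER", "openai"),
        model=os.environ.get("LANGSIM_MODEL"),  # provider default is used when unset
    )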

@jan-janssen
Owner Author

The benchmarking of LLMs is now also discussed in the developer section of the website.
