## Degree of autonomy of an LLM application
1. Have an LLM decide the output of a step.
2. Have an LLM decide the next step to take.
3. Have an LLM decide what steps are available to take.

## Cognitive architecture levels
An LLM cognitive architecture can be defined as a recipe for the steps to be taken by an LLM application.

* 0-3: Human driven
* 4-7: Agent

0. Code: Not an LLM cognitive architecture. Just regular software.
1. LLM call: Makes use of an LLM for achieving a specific task, say translating or summarizing a piece of text.
2. Chain: Multiple LLM calls in a predefined, static sequence. Example: text-to-SQL, where a first LLM call generates a SQL query from the natural-language query and the database contents provided by the developer, and a second LLM call writes an explanation of the query appropriate for a nontechnical user.
3. Router: The LLM decides which step to take next, choosing from steps predefined by the developer. Examples: RAG, tool calling. For instance, an LLM evaluates each incoming query and decides which index (tool, document) to use for that particular query. Before the advent of LLMs, the usual way of solving this problem was to build a classifier model.
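
As a rough illustration of levels 2 and 3, the sketch below uses a canned stand-in for the model. Every name here (`LLM`, `fake_llm`, `text_to_sql_chain`, `route`, the index names) is hypothetical and chosen for the example; a real application would replace `fake_llm` with an actual model call.

```python
from typing import Callable

# Hypothetical stand-in for a real model call: any function from prompt to text.
LLM = Callable[[str], str]

def text_to_sql_chain(llm: LLM, question: str, schema: str) -> dict:
    """Level 2 (chain): two LLM calls in a fixed, developer-defined order."""
    sql = llm(f"Schema:\n{schema}\nWrite a SQL query for: {question}")
    explanation = llm(f"Explain this query to a nontechnical user:\n{sql}")
    return {"sql": sql, "explanation": explanation}

def route(llm: LLM, query: str, indexes: dict) -> str:
    """Level 3 (router): the LLM picks one of the developer-defined indexes."""
    names = ", ".join(indexes)
    choice = llm(f"Pick one index from [{names}] for the query: {query}").strip()
    if choice not in indexes:         # guard against an answer outside the options
        choice = next(iter(indexes))  # fall back to the first index
    return indexes[choice](query)

def fake_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without a model."""
    if prompt.startswith("Schema:"):
        return "SELECT COUNT(*) FROM users;"
    if prompt.startswith("Explain"):
        return "This counts the rows in the users table."
    return "medical" if "symptom" in prompt else "general"

# Assumed indexes; each one is just a function from query to result text.
indexes = {
    "general": lambda q: f"general index searched for: {q}",
    "medical": lambda q: f"medical index searched for: {q}",
}
```

In the chain the order of calls is fixed in code; in the router the code still defines every possible step, but the LLM chooses among them.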



## Agent architectures
An agent is "something that acts". Acting implies the following:
* Acting requires some capacity for deciding what to do.
* Deciding what to do implies having access to more than one possible course of action. After all, a decision without options is no decision at all.
* In order to decide, the agent also needs access to information about the external environment (anything outside of the agent itself).

So an agentic LLM application must be one that uses an LLM to pick from one or more possible courses of action, given some context about the current state of the world or some desired next state. These attributes are usually implemented by mixing two prompting techniques: tool calling and chain-of-thought.

What makes the agent architecture different from the architectures discussed above is the concept of the LLM-driven loop: planning one or more actions, then executing them. This is called ReAct (see more in graph.ipynb).
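
The plan-then-execute loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation in graph.ipynb: `fake_llm`, `calculator`, `TOOLS`, and `run_agent` are hypothetical names, and the "LLM" is a canned planner that calls one tool and then finishes.

```python
from typing import Callable

def calculator(expression: str) -> str:
    """Toy tool: adds whitespace-separated integers, e.g. '2 3' -> '5'."""
    return str(sum(int(part) for part in expression.split()))

# Tools the agent may choose from (assumed for this example).
TOOLS: dict = {"calculator": calculator}

def fake_llm(history: list) -> dict:
    """Canned planner: call the calculator once, then answer with its result."""
    observations = [h for h in history if h.startswith("observation:")]
    if not observations:
        return {"action": "calculator", "input": "2 3"}
    result = observations[-1].split(":", 1)[1].strip()
    return {"action": "finish", "input": f"The sum is {result}."}

def run_agent(llm: Callable, question: str, max_steps: int = 5) -> str:
    history = [f"question: {question}"]
    for _ in range(max_steps):
        decision = llm(history)               # plan: the LLM picks the next action
        if decision["action"] == "finish":
            return decision["input"]
        tool = TOOLS[decision["action"]]      # execute: run the chosen tool
        history.append(f"observation: {tool(decision['input'])}")
    return "Stopped: step limit reached."

print(run_agent(fake_llm, "What is 2 + 3?"))
```

The `max_steps` cap is a design choice for the sketch: because the LLM controls the loop, some bound is needed so a confused planner cannot run forever.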






### Always Calling a Tool First
In the standard agent architecture, the LLM is always called upon to decide what tool to call next. This gives the LLM ultimate flexibility to adapt the behavior of the application to each incoming user query, but it comes at the cost of unpredictability.

Sometimes the developer knows that the search tool should always be called first. Hardcoding that first call skips the first LLM call and prevents the LLM from erroneously deciding it doesn't need to call any tool. Consider doing this if your prompt would otherwise always contain an instruction such as "call the search tool first, then answer".
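
A minimal sketch of the idea, assuming a hypothetical `search` tool and a stub `fake_llm` that records each prompt it receives; none of these names come from a real library.

```python
calls = []  # records every prompt sent to the stub model

def search(query: str) -> str:
    """Hypothetical search tool; a real one would call a search API."""
    return f"results for '{query}'"

def fake_llm(prompt: str) -> str:
    """Stub model that echoes the context it was given."""
    calls.append(prompt)
    return f"answer based on: {prompt}"

def answer_with_search_first(llm, query: str) -> str:
    # Step 1 is hardcoded: no LLM call is spent deciding whether to search.
    results = search(query)
    # Step 2: the LLM already sees the search results on its first call.
    return llm(f"Search results: {results}\nQuestion: {query}")

print(answer_with_search_first(fake_llm, "langgraph"))
```

Compared with the standard loop, this version makes one fewer LLM call per query and always grounds the answer in search results, at the price of searching even when it is not needed.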

### Dealing with many tools
When given many tools (say, more than 10) the planning performance (that is, choosing the right tool) starts to suffer.
* Solution: Use a RAG step to preselect the most relevant tools, so the LLM only has to choose among a short list.
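
The preselection step might look like the sketch below. As a loud assumption, word overlap (Jaccard similarity) stands in for the embedding similarity a real RAG step would use, and the tool names and descriptions are invented for the example.

```python
def tokenize(text: str) -> set:
    return set(text.lower().split())

# Hypothetical tool catalog: name -> natural-language description.
TOOL_DESCRIPTIONS = {
    "search_web": "search the web for current news and facts",
    "run_sql": "run a sql query against the sales database",
    "send_email": "send an email message to a recipient",
    "get_weather": "get the weather forecast for a city",
}

def select_tools(query: str, k: int = 2) -> list:
    """Return the k tools whose descriptions best match the query."""
    query_words = tokenize(query)

    def score(name: str) -> float:
        description_words = tokenize(TOOL_DESCRIPTIONS[name])
        # Jaccard similarity: shared words divided by all distinct words.
        return len(query_words & description_words) / len(query_words | description_words)

    ranked = sorted(TOOL_DESCRIPTIONS, key=score, reverse=True)
    return ranked[:k]

print(select_tools("what is the weather forecast for paris", k=1))
```

Only the selected tools would then be passed to the LLM, so the planning step chooses among `k` options instead of the full catalog.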




### Reflection aka self-critique
Reflection creates a loop between a creator prompt and a reviser prompt. We create two nodes, generate and reflect. The loop can run a fixed number of times, or the reflect node can decide when to finish.
> If you were writing a code-generation agent, you could have a step before reflect that would run the code through a linter or compiler and report any errors as input to reflect.
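
The generate/reflect loop can be sketched as follows, combining both stopping rules: the reflect step may accept the draft, and a fixed cap bounds the number of rounds. The `generate` and `reflect` functions are hypothetical stubs standing in for the creator and reviser prompts.

```python
from typing import Optional, Tuple

def generate(task: str, feedback: Optional[str]) -> str:
    """Stub creator: a buggy draft first, a corrected one once it gets feedback."""
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"

def reflect(draft: str) -> Optional[str]:
    """Stub reviser: returns a critique, or None when the draft is accepted."""
    if "a - b" in draft:
        return "add() subtracts; it should use +."
    return None

def reflection_loop(task: str, max_rounds: int = 3) -> Tuple[str, int]:
    """Return the final draft and the number of rounds used."""
    feedback = None
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        feedback = reflect(draft)
        if feedback is None:          # the reflect step decides to finish
            return draft, round_number
    return draft, max_rounds          # fixed cap reached without acceptance

print(reflection_loop("write an add function"))
```

A linter or compiler step, as mentioned in the note above, would slot in between `generate` and `reflect`, with its error output passed along as extra input to `reflect`.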


### Multi-agents
Architectures:
1. Network: Each agent can communicate with every other agent. Any agent can decide which other agent is to be executed next.
2. Supervisor: Each agent communicates with a single agent, called the supervisor. The supervisor agent decides which agent (or agents) should be called next.
3. Hierarchical: A supervisor of supervisors.
4. Custom
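
As one illustration, the supervisor architecture can be sketched with plain functions. The agents (`researcher`, `writer`) and the rule-based `supervisor` are hypothetical stand-ins; in a real system each would be backed by an LLM, and the supervisor's choice would come from a model call.

```python
def researcher(state: dict) -> dict:
    state["notes"] = f"notes on {state['task']}"
    return state

def writer(state: dict) -> dict:
    state["draft"] = f"report using {state['notes']}"
    return state

# Worker agents only ever hand control back to the supervisor.
AGENTS = {"researcher": researcher, "writer": writer}

def supervisor(state: dict) -> str:
    """Stub for an LLM that names the next agent, or FINISH when done."""
    if "notes" not in state:
        return "researcher"
    if "draft" not in state:
        return "writer"
    return "FINISH"

def run_supervisor(task: str, max_turns: int = 10) -> dict:
    state = {"task": task, "trace": []}
    for _ in range(max_turns):
        next_agent = supervisor(state)        # the supervisor decides who runs next
        if next_agent == "FINISH":
            break
        state["trace"].append(next_agent)
        state = AGENTS[next_agent](state)     # the chosen agent updates shared state
    return state

print(run_supervisor("solar power")["draft"])
```

A network architecture would instead let each agent return the name of the next agent itself, and a hierarchical one would make some entries in `AGENTS` supervisors of their own teams.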

## Design keys
1. Streaming/intermediate output
2. Structured output: Low temperature is usually a good fit for that.
3. Human in the loop
4. Double texting modes