BEAMEngineUpdates_1.0 #496

keithclift24 · 2024-04-09T05:42:04Z

Refines the BEAM "Compare" and "Fusion" features for improved results.

…factories.ts

vercel · 2024-04-09T05:42:08Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
big-agi-open	✅ Ready (Inspect)	Visit Preview	Apr 15, 2024 1:26am

enricoros

(pardon the spelling errors, didn't use AI for this important stuff) Thanks, feedback inlined. I need to test side-by-side in my installs, on a dozen cases, and then make the subjective call. I tested the existing prompts for days in the weeks prior to release, so naturally I'm more familiar with them.

Review (see inline comments):

Highlights: increased precision, improved table, improved ranges, and better commands.
I can attest that some of the changes will make a positive impact and will be merged for sure.

Important: some prompts are more commanding (larger) than the baseline today in Big-AGI. The tradeoff with giving more instructions is the following: we get a more predictable and controllable output, but often at the expense of degrees of freedom in the response space. Meaning we get what we ask for, but we take away the chance of LLMs performing some operation that would be surprising and useful to us.

This happens in:

the Table has 2 columns that are fixed, and set a "foundation" to build the other columns upon
the Fusion's user prompt is larger. This will make for better fusions but not all the time. Example below:

Fusion.userPrompt contains ".. ensuring the narrative is comprehensive, nuanced.. ". Now, imagine the first prompt of the user was to summarize a PDF. The Fusion user prompt will "undo" the summarization request, and instead fuse the elements in a comprehensive and nuanced way (and predictably will be "less" of the summary that the user expects).

Sorry for the length, but I owe you some explanation of why I'm not merging right away, as you have some very good work here. I have been thinking about this space at length, and I am glad I can have this sort of discussion with a peer.

Future:

1. (likely) selective merge, but need to get to a few more important issues first
2. another approach would be to have a query compiler, which takes a good instruction (like yours!) and the initial user prompts, and reformulates the fusion instruction to tailor it to the query of the user. I have the feeling that 2 could work.
3. I will need to hire someone to work on this, so we can have the best fusion, compilers, and stores, etc.
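To make the "query compiler" idea concrete, here is a minimal sketch of the input side only: it builds the meta-prompt that would be sent to a model to tailor a generic fusion instruction to the user's query. The function name `buildCompilerPrompt` and the wording of the meta-prompt are assumptions for illustration; nothing like this exists in the PR.

```typescript
// Hypothetical "query compiler" step: builds the meta-prompt that would be
// sent to an LLM to rewrite a generic fusion instruction for one user query.
// The meta-prompt wording is an assumption, not text from the PR.
function buildCompilerPrompt(baseInstruction: string, userQuery: string): string {
  return [
    'Rewrite the fusion instruction below so that it preserves the intent of the user query.',
    'If the query asks for a specific form (for example, a summary), the rewritten instruction must keep that form.',
    '',
    'User query:',
    userQuery.trim(),
    '',
    'Fusion instruction:',
    baseInstruction.trim(),
  ].join('\n');
}

const compilerPrompt = buildCompilerPrompt(
  'Synthesize the alternatives, ensuring the narrative is comprehensive and nuanced.',
  'Summarize this PDF in three bullet points.',
);
```

The output of the model call (not shown) would then replace the static Fusion user prompt, so a summarization request is not "undone" by a request for comprehensiveness.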

Now I'll go merge it into one of my private builds to test.

enricoros · 2024-04-09T06:42:51Z

src/modules/beam/gather/instructions/beam.gather.factories.ts

+| R2       | ...           | ...             | ...                         | ...                         | ... | ...   |
+| ...      | ...           | ...             | ...                         | ...                         | ... | ...   |
+| RN       | ...           | ...             | ...                         | ...                         | ... | ...   |
+Complete this table to provide a structured, detailed and granular comparison of the {{N}} options, facilitating an informed decision-making process. Finally, are careful review of the results, 


Are -> After?

Yes, "after"

enricoros · 2024-04-09T06:46:47Z

src/modules/beam/gather/instructions/beam.gather.factories.ts

-Synthesize the perfect response that merges the key insights and provides clear guidance or answers based on the collective intelligence of the alternatives.`.trim(),
+        Your task is to orchestrate a synthesis of elements from {{N}} response alternatives, derived from separate LLMs, each powered by unique architectures and training paradigms. Your role involves:
+
+        Analyzing the diverse array of responses to unearth common themes, address contradictions, exclude inaccuracies, and spotlight unique insights and content. 


Note, I'll need to remove spacing in the lines as the ` ... ` blocks keep literal indentation (all the spaces at the start). I've also just found this package, "https://www.npmjs.com/package/dedent" that can do it. I'll take care of this.
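For reference, the indentation issue can be illustrated with a small hand-rolled helper. This is only a sketch of the idea and is not the `dedent` package's API; the name `stripCommonIndent` is hypothetical.

```typescript
// Hypothetical helper (not the `dedent` npm package): removes the common
// leading indentation that a template literal keeps from the source file.
function stripCommonIndent(text: string): string {
  const lines = text.split('\n');
  // Measure indentation on non-blank lines only.
  const indents = lines
    .filter((line) => line.trim().length > 0)
    .map((line) => line.search(/\S/));
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  return lines
    .map((line) => (line.trim().length > 0 ? line.slice(common) : ''))
    .join('\n')
    .trim();
}

const indentedPrompt = `
        Your task is to orchestrate a synthesis.

        Analyzing the diverse array of responses.
`;

const cleanedPrompt = stripCommonIndent(indentedPrompt);
```

Without a step like this, the eight leading spaces on each line would be sent to the model as part of the prompt.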

enricoros · 2024-04-09T06:47:47Z

src/modules/beam/gather/instructions/beam.gather.factories.ts

-Your response should integrate the most relevant insights from these inputs into a cohesive and actionable answer.
-
-Synthesize the perfect response that merges the key insights and provides clear guidance or answers based on the collective intelligence of the alternatives.`.trim(),
+        Your task is to orchestrate a synthesis of elements from {{N}} response alternatives, derived from separate LLMs, each powered by unique architectures and training paradigms. Your role involves:


👍 like the improved precision of your commands

enricoros · 2024-04-09T06:49:44Z

src/modules/beam/gather/instructions/beam.gather.factories.ts


-The checklist should contain no more than 3-9 items orthogonal items, especially points of difference, in a single brief line each (no end period).
+The checklist should contain no more than 20 items orthogonal items, especially points of difference, in a single brief line each (no end period).


3-9 was too low and vague, 20 is possibly too much; it depends on the scope of the answer. It would be good to give a "sizing" of the checklist that's commensurate to the input, so for an easy job (a simple joke) you get 5 options, and for a legal doc you get 15.

Agree, after testing 20 is a bit much. Could have it decide number based on its own given assessment. Did you try no limit?

Yes, tried it, and the models usually don't have a "scale" to refer to. Usually you get ~10 options, whether for a "hello" fusion or a legal document.
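One way the "sizing" could be done outside the model is to compute the limit before building the prompt. The sketch below is an assumption throughout: it uses total input length in characters as a rough proxy for scope, and the thresholds (500 and 20000 characters) are made up for illustration. Only the 5 and 15 endpoints come from the comment above.

```typescript
// Hypothetical sizing heuristic. Character count as a proxy for "scope" and
// both thresholds are assumptions, not values from the PR.
function checklistLimit(totalInputChars: number): number {
  const minItems = 5; // e.g. a simple joke
  const maxItems = 15; // e.g. a legal document
  const smallInput = 500; // assumed: at or below this, use minItems
  const largeInput = 20000; // assumed: at or above this, use maxItems
  if (totalInputChars <= smallInput) return minItems;
  if (totalInputChars >= largeInput) return maxItems;
  // Linear interpolation between the two bounds.
  const t = (totalInputChars - smallInput) / (largeInput - smallInput);
  return Math.round(minItems + t * (maxItems - minItems));
}

// Injects the computed limit into the checklist instruction.
function checklistInstruction(totalInputChars: number): string {
  const limit = checklistLimit(totalInputChars);
  return `The checklist should contain no more than ${limit} orthogonal items, especially points of difference, in a single brief line each (no end period).`;
}
```

This gives the model the "scale" it lacks, at the cost of a crude measure of complexity; a short but dense legal clause would still be sized like a joke.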

enricoros · 2024-04-09T06:50:01Z

src/modules/beam/gather/instructions/beam.gather.factories.ts

@@ -122,44 +129,50 @@ The final output should reflect a deep understanding of the user's preferences a
    addLabel: 'Add Breakdown',
    cardTitle: 'Evaluation Table',
    Icon: TableViewRoundedIcon,
-    description: 'Analyzes and compares AI responses, offering a structured framework to support your response choice.',
+    description: 'Analyzes and compares AI responses, offering a structured framework to support your response choice. Model names are hidden and coded (R1, R2, etc.) to remove potential bias.',


Love the explanation of coding of the model names.
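As an illustration of the coding described in that text, a mapping from codes to model names could look like the sketch below. The function name `codeModelNames` and the example model names are hypothetical, and the actual Big-AGI code may do this differently.

```typescript
// Hypothetical sketch: assigns the codes R1..RN to the responses in order,
// so the evaluating model sees only the codes and not the model names.
function codeModelNames(modelNames: string[]): Record<string, string> {
  const codes: Record<string, string> = {};
  modelNames.forEach((name, index) => {
    codes[`R${index + 1}`] = name;
  });
  return codes;
}

// Placeholder names, for illustration only.
const responseCodes = codeModelNames(['model-a', 'model-b', 'model-c']);
```

Keeping the map keyed by code makes it easy to translate the table rows back to model names in the UI after the evaluation.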

enricoros · 2024-04-09T06:53:03Z

src/modules/beam/gather/instructions/beam.gather.factories.ts

 Now that you have reviewed the {{N}} alternatives, proceed with the following steps:

-1. **Identify Criteria:** Define the most important orthogonal criteria for evaluating the responses. Identify up to 2 criteria for simple evaluations, or up to 6 for more complex evaluations. Ensure these criteria are distinct and relevant to the responses provided.
+1. **Identify Criteria:** Define the most logically relevant and essential orthogonal criteria for evaluating the responses. Always include Accuracy and Pertinence as primary criteria. 


Do you think Accuracy and Pertinence are a must? It's a good idea, but have to see if adding this constraint removes degrees of freedom in the other criteria.

Selecting Accuracy and Pertinence means defining those 2 as the most important vectors in any message decomposition. It's possible that they are, and it's important to set those 2 vectors to get a reliable and repeatable framework and not leave too much room to the RNG.

There's some brilliance to this - need to test.

(Accuracy may need to be defined further; Pertinence probably has a narrower definition, good)

I spent a lot of time debating with AI itself over what really matters to get a net higher quality fusion response. Relevancy never quite fit, and I think pertinence nails it. Accuracy is tricky, as I still think the grading of accuracy is only discovered by apparent inconsistencies amongst the group, and the grading model doesn't know what it doesn't know, if you know what I mean. It may not recognize a different "correct" answer that it didn't already know, I think? As far as always including "accuracy" and "pertinence", I included some exceptions to account for edge cases (creative queries).

Accuracy is tricky also because it can mean different things to different models. I'm almost leaning towards preferring Pertinence over accuracy.

Yeah I need to think more. "correctness"? Here's some we can consider:

• Precision: Exactness in measurement or performance.
• Exactness: The degree of conformity to a standard or truth.
• Fidelity: Faithfulness to the original or to a standard.
• Veracity: Conformity to facts; accuracy.
• Validity: The extent to which a concept, conclusion, or measurement is well-founded and corresponds accurately to the real world.
• Credibility: Worthy of belief or confidence.
• Accuracy: Correctness or precision of information or measurement.
• Conformity: Agreement or compliance with standards, rules, or laws.
• Consistency: Uniformity or steadiness in quality or performance.
• Exactitude: The quality of being exact or precise in detail.
• Meticulousness: Extreme or excessive care in the consideration or treatment of details.
• Rigor: Strictness, severity, or thoroughness in maintaining standards.
• Faithfulness: Accuracy in reproducing a sound or image.
• Correctness: Free from error; in accordance with fact or truth.
• Adherence: The quality of sticking strictly to standards, rules, or practices.
• Exactness: The quality of being very accurate and precise.
• Precision: Exactness in the language model’s responses to queries.
• Fidelity: Faithfulness of the model’s output to the facts or source material.
• Veracity: Adherence to truth and accuracy of factual information provided.
• Relevance: The degree to which the model’s responses pertain to the given tasks or questions.
• Adaptability: The model’s ability to adjust its responses based on new information or feedback.
• Critical Thinking: The model’s ability to analyze, evaluate, and synthesize information in its responses.
• Innovation: Originality and creativity in generating solutions or responses.
• Factual Accuracy: Correctness of factual statements, essential for trustworthiness.
• Logical Reasoning: Clear and sound reasoning in constructing arguments or explanations.
• Problem-Solving Skills: Effectiveness in identifying and proposing solutions.
• Detail Orientation: Attention to and incorporation of significant details in responses.
• Coherence: Logical consistency and clarity in responses, ensuring they are understandable and follow a logical flow.
• User Engagement: The ability to maintain the user’s interest and promote further interaction through engaging and relevant content.

• Comprehensiveness: The extent to which the model can cover all relevant topics or knowledge areas for a given task.
• Synthesis Ability: The model’s capacity to integrate and combine information from various sources into a coherent whole, showing depth of understanding.
• Clarity: The ease with which users can understand the model’s responses, emphasizing clear and accessible language.
• Insightfulness: The depth of understanding and the ability to provide novel insights or perspectives in responses.
• Responsiveness: The precision with which the model addresses and adapts to the specific parts of a prompt or question, including the nuanced understanding of user intent.
• Emotional Intelligence: The model’s ability to recognize and appropriately respond to emotional cues in text, demonstrating sensitivity to the emotional context.
• Technical Proficiency: The accuracy and depth of knowledge in responses to queries requiring specialized understanding or technical expertise.

enricoros · 2024-04-09T06:56:07Z

src/modules/beam/gather/instructions/beam.gather.factories.ts

-3. **Generate Table:** Organize your analysis into a table. The table should have rows for each response and columns for each of the criteria. Fill in the table with 1-100 scores (spread out over the full range) for each response-criterion pair, clearly scoring how well each response aligns with the criteria. 
+3. **Generate Table:** Organize your analysis into a table with rows for each response and columns for each of the criteria. Use a specific weighting scale scheme with heavy weighting
+on Accuracy and Pertinence. Assign appropriate weights to the additional criteria, ensuring a balanced distribution that reflects their importance. Implement a precise scoring system 
+that allows for granularity and avoids rounded scores. Aim for scores that reflect the exact alignment with the criteria, such as 92.3 or 87.6, rather than rounded figures like 90 or 85. 


Good job in better defining the distribution.

This is another area where it could be tightened up lengthwise (and elsewhere). I don't know if "don't round" is really that important; I was just trying to yield more exact, differentiated results.

I love this one.
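For discussion, the "heavy weighting on Accuracy and Pertinence" could be pinned down numerically along these lines. The 3:1 weight ratio is an assumption chosen for illustration; the prompt itself leaves the exact weights to the model.

```typescript
type CriterionScores = Record<string, number>;

const PRIMARY_CRITERIA = ['Accuracy', 'Pertinence'];

// Assumed weighting: primary criteria count 3x, every other criterion 1x.
// The result is the weighted average, kept to one decimal place so that
// scores stay granular (e.g. 86.2 rather than 85).
function weightedTotal(scores: CriterionScores): number {
  let weightSum = 0;
  let total = 0;
  for (const criterion of Object.keys(scores)) {
    const weight = PRIMARY_CRITERIA.indexOf(criterion) >= 0 ? 3 : 1;
    weightSum += weight;
    total += weight * scores[criterion];
  }
  return weightSum > 0 ? Math.round((total / weightSum) * 10) / 10 : 0;
}

// Example row for one response (R1); the scores are made up.
const r1Total = weightedTotal({ Accuracy: 92.3, Pertinence: 87.6, Clarity: 80, Depth: 70 });
```

With these example scores the two primary criteria carry 75% of the total, which is one possible reading of "heavy weighting".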

keithclift24 · 2024-04-09T08:00:02Z

I imagined you'd have comments 😃, and you spent time massaging the language already, since some parameters were very particular. I agree the length of some parts is an issue. Notice I tried to account for creative and brevity query scenarios at the expense of tokens. From my testing it actually does a pretty good job of handling both.

enricoros · 2024-04-09T08:01:15Z

Yes, I can tell that you really did a lot of work. I need to BEAM this thread, to prioritize the changes :) I may just cherry-pick the safe parts in for now, while I test the more extensive parts (Fusion, Table)

enricoros · 2024-04-09T08:57:24Z

Merged to my personal branch.

keithclift24 added 11 commits April 8, 2024 22:34

Update BeamFusionGrid.tsx

01fa26d

Update BeamFusionGrid.tsx 1.01

703bfa7

Update table format in beam.gather.factories.ts

0e2dbf7

Update description in beam.gather.factories.ts

596469d

Update table format in beam.gather.factories.ts

7ed3475

Update table format and description in beam.gather.factories.ts

cd57a1e

Refactor table format and description in beam.gather.factories.ts

b1f2292

Update table format and description in beam.gather.factories.ts

a329e4e

Update table format and description in beam.gather.factories.ts

29f68aa

Update criteria for evaluating AI-generated responses in beam.gather.…

46dfa11

…factories.ts

Update table format and description in beam.gather.factories.ts

ce88e3b

vercel bot deployed to Preview April 9, 2024 05:43 View deployment

enricoros reviewed Apr 9, 2024

Merge branch 'enricoros:main' into BEAMEngineUpdates_1.0

66f5a64

vercel bot deployed to Preview April 15, 2024 01:26 View deployment
