How to design a system to provide long answers #102

Closed
drawal1 opened this issue Sep 1, 2023 · 6 comments

@drawal1 (Contributor) commented Sep 1, 2023

I am not sure how to train DSPy to synthesize long, detailed answers, such as those required for "how to" questions. So far, I have tried training on long examples with RAG and SimplifiedBaleen, with the answer_exact_match frac set to different values ranging from 0.5 to 0.8.

In all cases, the system truncates the answers or gives incomplete answers, even when the answers are retrieved fully formed in the context.

What is needed is a way to retrieve the answers to the various sub-questions and then, instead of selecting one answer (which is what Predict does), build/assemble the final answer from the answers to all the sub-questions.

I tried doing this using manual prompt engineering and LangChain, and it works great! Unfortunately, it's complicated and requires special handling for questions that are better suited to "factoid" answers.

Are there any examples to demonstrate synthesizing long, detailed answers?
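
For concreteness, here is a minimal sketch of the kind of program described above: answer each sub-question against its own retrieved context, then assemble the final answer from the pieces. It assumes DSPy's `Signature`, `Retrieve`, and `ChainOfThought` building blocks; the `SynthesizeAnswer` signature, the `LongFormQA` module, and taking pre-written sub-questions as an input are illustrative assumptions, not an official recipe.

```python
import dspy

class SynthesizeAnswer(dspy.Signature):
    """Assemble a long, detailed answer from the answers to several sub-questions."""
    question = dspy.InputField()
    sub_answers = dspy.InputField(desc="answers to the sub-questions, one per line")
    answer = dspy.OutputField(desc="a long, detailed answer covering every sub-question")

class LongFormQA(dspy.Module):
    def __init__(self, passages_per_hop=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.answer_sub = dspy.ChainOfThought("context, question -> answer")
        self.synthesize = dspy.ChainOfThought(SynthesizeAnswer)

    def forward(self, question, sub_questions):
        # Answer each sub-question against its own retrieved context ...
        sub_answers = [
            self.answer_sub(context=self.retrieve(sq).passages, question=sq).answer
            for sq in sub_questions
        ]
        # ... then assemble the final long-form answer from the pieces.
        return self.synthesize(question=question, sub_answers="\n".join(sub_answers))
```

Generating the sub-questions themselves (e.g., with another ChainOfThought step) is omitted here to keep the sketch short.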

@okhat (Collaborator) commented Sep 1, 2023

Thanks @drawal1. We have that as a planned tutorial in the README (see section 4, within Advanced Demos).

It's pretty simple actually, but I agree with you that having a concrete example will go a long way. I'll see who on our end has cycles to do this quickly.

@okhat (Collaborator) commented Sep 1, 2023

Btw I've been thinking of long-form answers for factoid questions, like "David Gregory inherited Kinnairdy castle. Kinnairdy castle has 5 floors. Therefore, the castle that David Gregory inherited has 5 floors."

You mention "how to" questions though. If you think that will look different from what I have in mind, do you have one full example question + answer?

@drawal1 (Contributor, Author) commented Sep 1, 2023

Here is a how-to question from my specific domain with the exact answer and the truncated answer from DSPy:

Question: How do I upgrade my RadiantOne Platform?

DSPy truncated answer:
To upgrade your RadiantOne platform, please follow these steps:

  1. Determine your current version: Check which version of RadiantOne you are currently using. This information will help you determine the appropriate upgrade path.

  2. Access the upgrade guide: You can find the upgrade guide for upgrading from version 7.2 to 7.4 at this link: [RadiantOne Upgrade Guide v7.2 to v7.4](https://support.radiantlogic.com/hc/en-us/articles/790844406261

Exact answer:
To upgrade your RadiantOne platform, please follow these steps:
1. Determine your current version: Check which version of RadiantOne you are currently using. This information will help you determine the appropriate upgrade path.
2. Access the upgrade guide: You can find the upgrade guide for upgrading from version 7.2 to 7.4 at this link: RadiantOne Upgrade Guide v7.2 to v7.4. This guide provides step-by-step instructions on how to perform the upgrade.
4. Contact your Radiant Logic Account Representative: If you are using version 7.2 or 7.3 and require upgrade assistance, please reach out to your Radiant Logic Account Representative. They will be able to provide guidance and support for your specific upgrade needs.

Remember to always backup your data before proceeding with any upgrades to ensure the safety of your information.
If you have any further questions or need additional assistance, feel free to ask.

Examples of an upgrade are moving from v7.2 to v7.3, v7.2 to v7.4, or v7.3 to v7.4. The upgrade process is not applied with an update installer and may require a professional services engagement to assess the current usage/configuration of RadiantOne to determine an appropriate upgrade strategy and time requirement. All customers currently running v7.2 and v7.3 where ICS/Global Sync is deployed should contact their Radiant Logic Account Representative to have their configuration reviewed and the scope/complexity of their upgrade path assessed. If a customer has only RadiantOne FID deployed (no ICS/Global Sync) and can upgrade their own deployment, they can leverage the upgrade guides available in the Radiant Logic Knowledge Base: [Radiant Logic Knowledge Base](https://support.radiantlogic.com).

Here is a general QA example using ChatGPT (3.5). The challenge is how to replicate this kind of answer quality for our own documents.

Question: How do you build and train a deep neural network for playing video game pong from scratch?

Answer:
Building and training a deep neural network to play the video game Pong from scratch is a great way to learn about reinforcement learning and deep reinforcement learning. Pong is a simple but suitable environment for this task. Here's a step-by-step guide on how to do it using Python and popular libraries like TensorFlow and OpenAI Gym:

Environment Setup:

- Install Python and necessary libraries like TensorFlow, NumPy, and Gym.
- Install Gym with Atari support: `pip install gym[atari]`.

Create the Environment:

- Import the necessary libraries.
- Create the Pong environment using Gym:

```python
import gym
env = gym.make('Pong-v0')
```

Preprocessing:

- Preprocess the game frames to reduce the complexity of the input data. This usually involves resizing, converting to grayscale, and normalizing pixel values.
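
A minimal sketch of this preprocessing step, assuming OpenCV for the image operations and an 84x84 grayscale target to match the network input shape used below; the helper name is illustrative:

```python
import cv2
import numpy as np

def preprocess(frame):
    # Convert the raw RGB Atari frame to grayscale, resize it to 84x84,
    # and scale the pixel values into [0, 1].
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0
```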

Neural Network Architecture:

- Design the neural network architecture. Convolutional Neural Networks (CNNs) are commonly used for processing game frames.
- Create a model that maps game frames to actions (move paddle up, down, or stay still).

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (8, 8), strides=(4, 4), activation='relu', input_shape=(84, 84, 4)),
    tf.keras.layers.Conv2D(64, (4, 4), strides=(2, 2), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])
```

Define the Training Loop:

- Define a training loop that uses reinforcement learning algorithms like Proximal Policy Optimization (PPO) or Deep Q-Networks (DQN). Here's a simplified example using PPO:

```python
from stable_baselines3 import PPO

model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=100000)
```

Training:

- Train the neural network by interacting with the environment. The agent collects experience, updates the policy, and improves its performance over time.

Evaluation:

- Evaluate the trained model by running it in the environment and measuring its performance.
- Track metrics such as the average reward, win rate, and other relevant statistics.
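
A minimal sketch of such an evaluation loop, assuming the stable-baselines3 model and the classic Gym step API (four return values) from the earlier snippets:

```python
def evaluate(model, env, episodes=10):
    # Roll out the trained policy for a few episodes and report the average reward.
    rewards = []
    for _ in range(episodes):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total += reward
        rewards.append(total)
    return sum(rewards) / len(rewards)

print("Average reward:", evaluate(model, env))
```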

Hyperparameter Tuning:

- Experiment with different hyperparameters (e.g., learning rate, network architecture, training duration) to optimize performance.

Save and Load Model:

- Save the trained model weights to disk so that you can use them later for inference without retraining.

```python
model.save("pong_model")
```

You can load the model later with `model.load("pong_model")`.

Deployment:

- If desired, integrate the trained model into a game-playing application or use it to control a virtual game of Pong.

Iterate and Refine:

- Iterate on the training process, making improvements based on evaluation results and insights.
- Experiment with different algorithms and architectures for better performance.

Monitoring and Maintenance:

- Continuously monitor the model's performance and behavior.
- Revisit the model to retrain it with new data or improvements as needed.

Please note that training a deep reinforcement learning agent for Pong from scratch can be computationally intensive and time-consuming. Consider using GPU acceleration and distributed training if available. Additionally, it's helpful to have a solid understanding of reinforcement learning concepts and algorithms before embarking on this project.

@detaos (Collaborator) commented Sep 1, 2023

DSPy has a small default for max tokens. Override it to get a longer response:

```python
lm = dspy.OpenAI(
    [...]
    max_tokens=4096,
)
```
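
If it helps, the overridden LM is then usually set as the default for all modules; `dspy.settings.configure` is, I believe, the call DSPy uses for this:

```python
dspy.settings.configure(lm=lm)
```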

@drawal1 (Contributor, Author) commented Sep 1, 2023

@detaos that worked!

I will now test scenarios where the long-form answer is assembled from multiple short answers gathered from different contexts. I am not sure whether they fit within the DSP paradigm.
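
When you do, a rough sketch of compiling such a program with a partial-credit metric might look like the following, assuming DSPy's `BootstrapFewShot` teleprompter and the `answer_exact_match` metric with a `frac` threshold mentioned in the first comment; treat the import paths and signatures as assumptions, and `LongFormQA` and `trainset` as placeholders for your own program and training examples:

```python
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import answer_exact_match

def partial_match(example, pred, trace=None):
    # Accept the prediction under a relaxed match threshold rather than an exact match.
    return answer_exact_match(example, pred, frac=0.6)

teleprompter = BootstrapFewShot(metric=partial_match)
compiled_qa = teleprompter.compile(LongFormQA(), trainset=trainset)
```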

@DmitriyG228 commented

I guess the problem comes as a tradeoff of the long prompts that DSPy compiles. I switched to a 16k model to deal with this.
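
For reference, a hedged sketch of what that switch might look like, combined with the max_tokens override from above; the model name is just the 16k variant available at the time:

```python
lm = dspy.OpenAI(model='gpt-3.5-turbo-16k', max_tokens=4096)
dspy.settings.configure(lm=lm)
```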

okhat closed this as completed on Feb 3, 2024.