
How do you calculate the reward for Bittle robot? #2

Closed
metalwhale opened this issue May 15, 2024 · 7 comments
Labels
question Further information is requested

Comments

@metalwhale

Hello!
First of all, thank you so much for sharing your work. This is awesome and will help me a lot as I learn RL for real-life robots.
I have some questions. Could you please answer them when you have time?
(I also asked on Petoi forum. I'm sorry for any inconvenience, but I think asking on GitHub can help more people who have the same question as me.)

How does your model make the robot move forward? As I understand it, you use IMU acceleration data along with yaw, pitch, roll, and the current joint angles as inputs to the neural network. However, this data alone doesn't indicate whether the robot is moving forward, since velocity, not just acceleration, is needed to calculate the reward.

Am I missing something?
Thank you in advance!

@ger01d
Owner

ger01d commented May 15, 2024

Hello metalwhale,

During training in simulation, the reward function receives the x-position (lateral direction) of the robot as an input and calculates the reward for each step. Later, the speed or movement of the robot is unknown to the robot itself, but it is still able to move forward based on the body angles and the joint angles (and the joint angle history). Please note that the speed and x-position are not part of the observation space of the agent.

Check lines 74-75, 181, and also 196:

# The observation space are the torso roll, pitch and the 
# angular velocities and a history of the last 30 joint angles.

current_position = p.getBasePositionAndOrientation(self.robot_id)[0][0] 
movement_forward = current_position - last_position
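
Just as an illustration (not a quote from the repository), a per-step reward of this kind could be computed roughly as below; forward_reward and self.last_position are placeholder names, and the actual code in opencat_gym_env.py may differ:

import pybullet as p

def forward_reward(self):
    # x-component of the base position in world coordinates
    current_position = p.getBasePositionAndOrientation(self.robot_id)[0][0]
    # Reward: distance covered along x since the previous simulation step.
    # The position is only used here to compute the reward; it is not part
    # of the observation handed to the agent.
    movement_forward = current_position - self.last_position
    self.last_position = current_position
    return movement_forward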

@ger01d ger01d added the question Further information is requested label May 15, 2024
@metalwhale
Author

metalwhale commented May 16, 2024

@ger01d

Please note that the speed and x-position are not part of the observation space of the agent.

Thank you for sharing! This is interesting. I will dig deeper into your code.

I have two other questions:

  • Could you please show me the code where you define the observation of the robot?
    I guess the observation space is the state of the robot itself (current angles and gyro data), not the position in the environment. Am I right?
    (I searched in the opencat_gym_env.py file, but I'm not quite sure whether I found it.)
  • I also understand that your explanation above is for training the robot in simulation.
    With the real Bittle demo (not in simulation), as you showed in this comment in the forum, how is it possible to retrieve the speed and x-position?

@metalwhale
Author

@ger01d

I think I get it.
The observation space includes joint angles and gyro data, which represent the state of the robot itself and don't include information about the surrounding environment.
When you trained the model in simulation, you calculated the reward using the speed and x-position. The model is later deployed to a real robot, but then we don't need to retrieve the speed and x-position, since they were only needed during training.

Am I correct? :D

If this is true, I wonder whether it is possible, and how much the performance could be improved, if we trained the model on real data rather than data from simulation.

@ger01d
Owner

ger01d commented May 16, 2024

Yes, that's correct. ;)

I already tried training Bittle on the real hardware with the RL algorithm SAC (Soft Actor-Critic), which is said to be more sample-efficient. In this case I used a cheap high-speed camera to track a QR code (ArUco library) to retrieve the movement in the x-direction. But the training progress was not successful, and it was quite time-consuming to set the robot back each time an episode restarted. Training in simulation has the benefit that you can generate a lot of training data in a reasonable amount of time, and it's safer for the robot. The downside is the sim2real transfer, which is quite challenging because of the differences between simulation and reality.
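
As a rough sketch of that camera-based tracking idea (not the exact setup used back then), one could detect an ArUco marker with OpenCV and take the change of the marker centre's x-coordinate between frames as the movement signal. This assumes OpenCV >= 4.7 with the aruco module and a fixed camera, and the pixel displacement still needs to be calibrated to metres:

import cv2

# ArUco detector (OpenCV >= 4.7 API)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def marker_x(frame):
    # Return the marker centre's x-coordinate in pixels, or None if not detected.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return None
    return float(corners[0][0][:, 0].mean())  # mean x of the 4 corner points

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
last_x = marker_x(frame) if ok else None
ok, frame = cap.read()
x = marker_x(frame) if ok else None
if x is not None and last_x is not None:
    movement_forward = x - last_x  # pixel displacement per step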

@metalwhale
Author

metalwhale commented May 17, 2024

@ger01d

That makes sense!

I'm wondering if it is possible to use acceleration data from the IMU to determine which direction the robot is moving. Theoretically (IMHO), at the moment we send a command to the motors to make the robot move forward, the acceleration data on the x-axis becomes positive. If we can capture this data at exactly that time, we can use it to compute the reward: the larger the cumulative positive x-axis acceleration, the further the robot has moved forward.

What do you think? TBH I'm not sure, but I want to try this.

@ger01d
Owner

ger01d commented May 17, 2024

Theoretically you can of course use the IMU accelerations to determine the movement in space, since velocity is the integral of acceleration and position is the integral of velocity. On the practical side, you will have to find methods to reduce noise in the signal, because the RL algorithm will be very sensitive to the data. For instance, if the sensor reports a negative acceleration at the moment you collect the data, your reward will be negative. If this reading was just noise and the real movement in space is in the positive direction, the RL algorithm will draw the wrong conclusions.
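
A minimal sketch of that double integration with a simple smoothing step (the sampling rate, bias handling, and window size are assumptions and would need tuning on real IMU data; integration drift also grows quickly, so this is only usable over short reward windows):

import numpy as np

def forward_displacement(accel_x, dt=0.01, bias=0.0, window=5):
    # accel_x: raw x-axis accelerometer samples in m/s^2
    # dt:      sampling period in seconds (100 Hz assumed)
    # bias:    constant offset measured while the robot stands still
    accel_x = np.asarray(accel_x, dtype=float) - bias
    # Moving-average filter to suppress high-frequency sensor noise
    kernel = np.ones(window) / window
    accel_smooth = np.convolve(accel_x, kernel, mode="same")
    velocity = np.cumsum(accel_smooth) * dt      # v = integral of a
    displacement = np.cumsum(velocity) * dt      # x = integral of v
    return displacement[-1]                      # net movement over the window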

@metalwhale
Author

metalwhale commented May 17, 2024

@ger01d

Thank you so much for your kind support!
I'm still a newbie to reinforcement learning and really appreciate your help.

I have no more questions, but if you don't mind, please let me ask again later when I have other ideas to discuss.
Closing this issue with faith ;)
