WSU Postgraduate Project Project A & B Reinforcement Learning with Robot Arm
This github directory contains both the Python files to train the RL algorithms (Training) and the Unity projects where different variations of the environments are stored.
robot_subset_joint2_102.mp4
There are so far 4 variations of the Unity project environments used for training the robot (in chronological order of development):
- ArmRobotOriginal - The original environment with:
- all 7 joints fully operational
- Discrete actions (0,1,2)
- The target spawning space as a sphere around the base
- ArmRobotSubset - Modified version (corresponding to "robot_subset.py"):
- Can control where target spawn depending on which joint is active
- Discrete actions (0,1,2) for 7 joints
- ArmRobotContinuous - Similar to ArmRobotSubset but with continuous actions:
- Continuous actions [-1,1] for all 7 joints
- Target spawn controllable by which joint is active
- ArmRobotExpanded - Redefined reward function and observation space (still in development)
For environments that accept active joints to spawn target, the target spawns in a circle surrounding the center position of the active joint. The orientation of the circle depends on the axis of rotation of the joint (for example, if the active joint rotates horizontally -- around the y-axis -- the circle is orientated in a horizontal plane).
The active joint is defined as the lowest joint that is operational. Any joint above that joint is assumed to be operational, while lower joints are assumed fixed. Joints are ordered from the base to the tip, with joint 0
being the bottom-most joint, and joint 6
being the tip joint (we assume this joint is useless because it only rotates the pincher).
You can move the robot around manually using the following keyboard commands:
A/D - rotate base joint
S/W - rotate shoulder joint
Q/E - rotate elbow joint
O/P - rotate wrist1
K/L - rotate wrist2
N/M - rotate wrist3
V/B - rotate hand
X - close pincher
Z - open pincher
This is controlled in the Heuristic
method of the RobotAgent.cs
script of each environment