Docs: toc everywhere
erdnaxe committed Jul 17, 2020
1 parent a0426fe commit 64f445a
Showing 12 changed files with 101 additions and 16 deletions.
17 changes: 14 additions & 3 deletions docs/build_the_electronics.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# PCB

PCBs were designed using Eagle 9; the demo version is sufficient.
@@ -11,6 +17,11 @@ PCB design files are available here:

![Boards](img/boards.png)

!!! Warning "Only one serial port"

These PCBs connect all servomotor serial ports together.
This may bottleneck your communication speed later, so if your board has more serial ports you may remix these files.

* * *

## Battery
@@ -27,10 +38,10 @@ but you may go with a Lithium battery
that will have a better capacity-to-mass ratio, at the cost of being more dangerous.
With the NiMH cells, we measured around 2 hours of autonomy using the robot.

### Battery protection and connection
!!! Danger "Battery protection and connection"

For the battery, **you should use a fuse** and a switch in series
and then connect onto the large battery connector of the power board.
For the battery, **you should use a fuse** and a switch in series
and then connect onto the large battery connector of the power board.

### Powering the embedded computer

6 changes: 6 additions & 0 deletions docs/build_the_structure.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# Build the structure

This section focuses on building the base structure of the robot.
6 changes: 6 additions & 0 deletions docs/gym_environments.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# Introduction to reinforcement learning

**Reinforcement learning** (RL) consists in using machine learning techniques to
12 changes: 7 additions & 5 deletions docs/implementations_ppo.md
@@ -1,8 +1,10 @@
# Learning algorithm
**Table of contents**:

[TOC]

## Choosing a learning algorithm
* * *

# Choosing a learning algorithm

This project chooses to use **Proximal Policy Optimization**, which is an **on-policy** policy gradient method.
Other popular algorithms are:
@@ -11,7 +13,7 @@ Other popular algorithm are:
- **"Vanilla" Policy Gradient** methods which have poor data efficiency and robustness.
- **Trust Region / natural Policy Gradient** methods (such as TRPO) which have similar data efficiency and performance compared to PPO, while being more complicated [[Schulman et al., 2017](references.md#schulman2017ppo)].

### On-policy vs Off-policy
## On-policy vs Off-policy

An **on-policy** algorithm does not use old data.
In our case it means that one batch of simulation episodes will only be used to train the next policy,
@@ -23,7 +25,7 @@ Nonetheless it increases learning stability.
> These algorithms directly optimize [...] policy performance and it works out mathematically that you need on-policy data to calculate the updates. So, this family of algorithms trades off sample efficiency in favor of stability—but you can see the progression of techniques (from VPG to TRPO to PPO) working to make up the deficit on sample efficiency. <br/>
> \-- [Algorithms, OpenAI SpinningUp](https://spinningup.openai.com/en/latest/user/algorithms.html#the-on-policy-algorithms)
### Policy Gradient Methods
## Policy Gradient Methods

Let's define some common notations in reinforcement learning:

@@ -42,7 +44,7 @@ Then this gradient estimate is used in a stochastic gradient ascent algorithm. T
When training we will alternate between running simulation environments to generate a batch of samples (CPU/RAM bottlenecked)
and optimization (GPU).
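
For reference, the gradient estimator commonly used by these methods, written in the notation of [Schulman et al., 2017](references.md#schulman2017ppo), is:

$$
\hat{g} = \hat{\mathbb{E}}_t \left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, \hat{A}_t \right]
$$

where $\pi_\theta$ is the stochastic policy and $\hat{A}_t$ is an estimator of the advantage function at timestep $t$.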

## Proximal Policy Optimization
# Proximal Policy Optimization

Proximal Policy Optimization is a policy gradient method for reinforcement
learning developed by OpenAI [[Schulman et al., 2017](references.md#schulman2017ppo)].
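
At the heart of PPO is the clipped surrogate objective, which bounds how far a single update can move the policy away from the one that collected the data:

$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta) \hat{A}_t,\ \operatorname{clip}\left(r_t(\theta), 1 - \epsilon, 1 + \epsilon\right) \hat{A}_t \right) \right]
\quad \text{with} \quad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}
$$

where $\epsilon$ is a small hyperparameter (0.2 in the original paper) controlling the size of this trust region.
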
10 changes: 8 additions & 2 deletions docs/index.md
@@ -31,13 +31,19 @@ serial communications capability.
These servomotors are able to measure motor torque and speed, and are easy to
use with a serial port.

Even though the embedded computer has two free serial ports, getting all sensor values from the 18 servomotors takes an average of 40 ms (about 25 Hz), see [more details there](use_motors.md#how-slow-is-it).
This means Kraby may be too slow for proper torque control, which is why we use the servomotors in position control.

These servomotors are similar to the more popular **Dynamixel AX-12**.

### Simulation environment included

An [OpenAI Gym](https://gym.openai.com/) environment is available.
[OpenAI Gym](https://gym.openai.com/) environments are available and packaged as [gym-kraby](https://pypi.org/project/gym-kraby/) on PyPI.
It uses [BulletPhysics](https://github.com/bulletphysics/bullet3) simulator
with [an URDF description](https://github.com/erdnaxe/kraby/blob/master/gym_kraby/data/hexapod.urdf) of the robot.
with [an URDF description](https://github.com/erdnaxe/kraby/blob/master/gym_kraby/data/hexapod.urdf) of the robot,
see [more details there](gym_environments.md).

![OpenAI Gym environment](img/env_demo.jpg)
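
A minimal usage sketch is shown below; the environment id is an assumption taken from the gym-kraby package naming, check its README if it differs:

```Python
import gym

# Environment id assumed to be registered by gym-kraby, adjust it if needed
env = gym.make("gym_kraby:HexapodBulletEnv-v0")

observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # Random action, replace with a trained policy
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()
```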

* * *

8 changes: 7 additions & 1 deletion docs/install_nanopi_linux.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# Compile NanoPi Linux kernel with MPU9250 module

As of May 2020, only the official NanoPi kernel supports the NanoPi NEO4 GPU and VPU.
@@ -103,7 +109,7 @@ then `ssh pi@10.42.0.1`.
!!! Note

You can plug wired Internet access
and NetworkManager will automatically use and share this connection.
and NetworkManager will automatically use and share this connection.

## Test the MPU9250

6 changes: 6 additions & 0 deletions docs/training_one_leg.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# Training one robot leg

This section gives some examples and draws some conclusions about
6 changes: 6 additions & 0 deletions docs/transfer_real_world.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# Preparing transfer to real world

## Using bigger timesteps
32 changes: 28 additions & 4 deletions docs/urdf_description.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# URDF description

The **Unified Robot Description Format** (URDF) is an XML format
@@ -7,9 +13,13 @@ by the Robot Operating System project (ROS).
Kraby URDF description is available at
<https://github.com/erdnaxe/kraby/blob/master/gym_kraby/data/hexapod.urdf>.

* * *
!!! Note "Do not use SDF"

This project started with an SDFormat description of the robot in Gazebo.
Despite PyBullet being able to load SDF files, the support is not as good as for URDF files
and some simulation parameters were impossible to input.

## Editing and building the URDF
## Editing and building the URDF with Jinja templating

To simplify URDF editing and to avoid input errors,
the project uses Jinja2[^xacro] templates to generate the URDF.
@@ -18,8 +28,6 @@ You may edit files under
then execute
[gym_kraby/data/generate_urdf.py](https://github.com/erdnaxe/kraby/tree/master/gym_kraby/data).
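
As a rough sketch of what such a generation step may look like (the template name and parameters below are hypothetical, the real logic lives in `generate_urdf.py`):

```Python
from jinja2 import Environment, FileSystemLoader

# Hypothetical template name: the real templates live in gym_kraby/data/
env = Environment(loader=FileSystemLoader("."))
template = env.get_template("hexapod.urdf.j2")

# Render the template with some robot parameters and write the final URDF
with open("hexapod.urdf", "w") as f:
    f.write(template.render(leg_count=6, servo_mass=0.056))
```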

* * *

## Computing 3D-printed part inertia

[Meshlab](http://www.meshlab.net/) is able to compute an inertia tensor from
@@ -62,3 +70,19 @@ For more information, see
<http://gazebosim.org/tutorials?tut=inertia&cat=build_robot>.

[^xacro]: [Xacro](http://wiki.ros.org/xacro) could also be an option, but it requires installing the ROS toolchain.

# PyBullet integration

Now that our robot is fully described in URDF, we may load it into BulletPhysics using the PyBullet Python bindings.

```Python
import pybullet as p
import pybullet_data

p.connect(p.GUI)  # Open a new physics server with a GUI
p.setGravity(0, 0, -9.81)  # No, we are still on Earth
p.setAdditionalSearchPath(pybullet_data.getDataPath())  # Make plane.urdf from pybullet_data findable
p.loadURDF("plane.urdf")  # Load a ground plane
robot_id = p.loadURDF("hexapod.urdf")  # Load the full robot (run from the directory containing hexapod.urdf)
```

You may then use `p.setJointMotorControlArray` to control motors.
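
Continuing from the snippet above, a minimal position-control sketch could look like this (all joints are commanded here for simplicity; query `p.getJointInfo` to pick only the actuated ones):

```Python
n_joints = p.getNumJoints(robot_id)

# Command every joint to a zero target position in position-control mode
p.setJointMotorControlArray(
    bodyUniqueId=robot_id,
    jointIndices=list(range(n_joints)),
    controlMode=p.POSITION_CONTROL,
    targetPositions=[0.0] * n_joints,
)
p.stepSimulation()  # Advance the simulation by one step
```
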
6 changes: 6 additions & 0 deletions docs/use_motors.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# Send and receive data from servomotors

The NanoPi NEO4 has five 3.3 V UARTs that can go up to 1.5 Mbaud/s.
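
As a rough sketch, one of these UARTs can be opened with [pySerial](https://pyserial.readthedocs.io/); the device path, baud rate and bytes below are placeholders, adapt them to your wiring and servomotor protocol:

```Python
import serial

# /dev/ttyS1 and 115200 bauds are placeholders, use the UART and speed you wired
port = serial.Serial("/dev/ttyS1", baudrate=115200, timeout=0.1)

port.write(b"\xff\xff")  # Send a placeholder frame on the servomotor bus
answer = port.read(16)   # Read back up to 16 bytes of reply
port.close()
```
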
6 changes: 6 additions & 0 deletions docs/use_mpu9250.md
@@ -1,3 +1,9 @@
**Table of contents**:

[TOC]

* * *

# Fetch raw data from MPU9250 sensors

Download, install and enable the `iiod` server on the NanoPi,
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -13,7 +13,7 @@ nav:
- Use the internal motion unit: use_mpu9250.md
- Use the servomotors: use_motors.md
- Simulation:
- URDF description: urdf_description.md
- Robot simulation: urdf_description.md
- Learning environments: gym_environments.md
- Learning algorithm: implementations_ppo.md
- Training one leg: training_one_leg.md
