An ongoing reinforcement-learning (RL) implementation of a logistic problem. Still in a draft state.
Consider a list of items characterized by their volumes and their masses. Given a bag of a fixed volume, the goal is to maximize the total mass of the bag once filled.
The approach followed is reinforcement learning. The observation space is the list of items placed in the bag (represented by an array of shape (2, number_of_items)), and an action consists in taking one item from the list of items not yet placed in the bag and placing it in the bag. Note that this is problematic, since it means the action space is dynamic, which is not implemented yet.
A step consists in picking an item not placed in the bag yet. The output of a step is:
- State: the content of the bag, i.e. the list of items placed in the bag.
- Reward: the mass of the item just added to the bag.
The procedure stops, i.e. no further steps can be taken, when:
- There is no item left.
- The bag is not full, but none of the remaining items fits in it.
- The bag is at maximum capacity.
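The step and termination logic described above can be sketched as follows. This is a minimal illustrative sketch without the Gym boilerplate; the class and method names (`BagSketch`, `used_volume`, `is_done`) are hypothetical, not the actual API of the repository.

```python
# Hypothetical sketch of the step/termination logic; not the actual Bag API.
class BagSketch:
    def __init__(self, bag_volume, items):
        self.bag_volume = bag_volume  # fixed capacity of the bag
        self.items = items            # list of (volume, mass) pairs
        self.in_bag = []              # indices of items placed so far

    def used_volume(self):
        return sum(self.items[i][0] for i in self.in_bag)

    def step(self, action):
        # action: index of an item not yet placed in the bag
        assert action not in self.in_bag
        _volume, mass = self.items[action]
        self.in_bag.append(action)
        state = [self.items[i] for i in self.in_bag]
        reward = mass  # reward = mass of the item just added
        return state, reward, self.is_done()

    def is_done(self):
        free = self.bag_volume - self.used_volume()
        remaining = [v for i, (v, _m) in enumerate(self.items)
                     if i not in self.in_bag]
        # done if no items left, nothing fits, or the bag is exactly full
        return not remaining or min(remaining) > free or free == 0
```

For instance, with a bag of volume 1.0 and items [(0.6, 2.0), (0.5, 1.0)], placing item 0 yields reward 2.0 and terminates the episode, since the remaining item no longer fits.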
- Requirements can be installed with
```
pip install -r requirements.txt
```
- The custom Gym environment can be instantiated as
```
bag = Bag(config)
```
where `config = {"bag_volume": bag_volume, "items": items}`, `bag_volume` is a float, and `items` is a list of pairs of floats. See `run_env_logistic.py` for a basic example.
- The Python tests of the environment can be run as
```
pytest -v tests/test_logistic_env.py
```
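A config for the environment could look as follows. The numeric values here are made up for illustration; the commented loop assumes `Bag` follows the standard Gym step signature, which may differ in the actual implementation.

```python
# Illustrative config for the Bag environment (values are made up):
bag_volume = 10.0
items = [(3.0, 4.0), (5.0, 2.5), (4.0, 6.0)]  # (volume, mass) pairs
config = {"bag_volume": bag_volume, "items": items}

# The environment would then be created and stepped until done, e.g.:
# bag = Bag(config)
# done = False
# while not done:
#     action = bag.items_sampler()
#     state, reward, done, info = bag.step(action)
```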
An action consists in taking an item not yet placed in the bag and placing it in the bag. Hence, the action space is dynamic, which is not readily supported in Gym. The list of allowed actions can be accessed as
```
bag = Bag()
bag.allowed_actions()
```
Since the standard method `bag.action_space.sample()` does not distinguish between allowed and disallowed actions, a new sampling method has been introduced as `bag.items_sampler()`. This should be changed.
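The masking idea behind these two methods can be sketched as standalone functions. This is a hypothetical sketch: the real methods live on the `Bag` environment and their exact signatures may differ.

```python
import random

# Hypothetical sketch of the masking behind bag.allowed_actions()
# and bag.items_sampler(); not the actual environment methods.
def allowed_actions(items, in_bag, free_volume):
    """Indices of items not yet in the bag that still fit."""
    return [i for i, (volume, _mass) in enumerate(items)
            if i not in in_bag and volume <= free_volume]

def items_sampler(items, in_bag, free_volume, rng=random):
    """Sample uniformly among the currently allowed actions."""
    allowed = allowed_actions(items, in_bag, free_volume)
    return rng.choice(allowed) if allowed else None
```

Restricting sampling to the allowed set avoids wasting steps on invalid actions, at the cost of diverging from the standard `action_space.sample()` interface.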