Incentivizes vehicles to optimize the sensing distribution in a crowd sensing system
The standard core code for reinforcement learning
The agent for RL; a sketch of this interface follows the method list below
calc_q_value
- Given a state (or batch of states), calculate the Q-values.
update_policy
- Update your policy.
fit
- Fit your model to the provided environment.
evaluate
- Test your agent with a provided environment.
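A minimal sketch of how these pieces might fit together, assuming a Keras-style Q-network, a replay memory with append/sample/len (sketched under the core classes below), and an env.step that returns (next_state, reward, done); the constructor arguments and batch size are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

class DQNAgent:
    """Sketch of the agent interface listed above (details assumed)."""

    def __init__(self, q_network, policy, memory, gamma=0.99, batch_size=32):
        self.q_network = q_network    # e.g. a Keras model: state -> Q-values
        self.policy = policy          # action selection (see policy classes below)
        self.memory = memory          # replay memory (see core classes below)
        self.gamma = gamma            # discount factor
        self.batch_size = batch_size

    def calc_q_value(self, states):
        # Given a state (or batch of states), return the Q-values.
        return self.q_network.predict(np.atleast_2d(states))

    def update_policy(self):
        # One Q-learning step on a minibatch drawn from replay memory.
        states, actions, rewards, next_states, terminals = \
            self.memory.sample(self.batch_size)
        targets = self.calc_q_value(states)
        next_q = self.calc_q_value(next_states).max(axis=1)
        targets[np.arange(len(actions)), actions] = \
            rewards + self.gamma * next_q * (1.0 - terminals)
        self.q_network.train_on_batch(states, targets)

    def fit(self, env, num_steps):
        # Interact with the environment, store samples, and train.
        state = env.reset()
        for _ in range(num_steps):
            action = self.policy.select_action(self.calc_q_value(state)[0])
            next_state, reward, done = env.step(action)
            self.memory.append(state, action, reward, next_state, done)
            if len(self.memory) >= self.batch_size:
                self.update_policy()
            state = env.reset() if done else next_state

    def evaluate(self, env, num_episodes):
        # Run greedy episodes and report the mean total reward.
        totals = []
        for _ in range(num_episodes):
            state, done, total = env.reset(), False, 0.0
            while not done:
                action = int(self.calc_q_value(state)[0].argmax())
                state, reward, done = env.step(action)
                total += reward
            totals.append(total)
        return float(np.mean(totals))
```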
The core classes needed for RL
Sample
- Represents a reinforcement learning sample: a standard (s, a, r, s', terminal) tuple used to store observed experience from an MDP.
ReplayMemory
- Interface for replay memories; Sample and ReplayMemory are sketched together below.
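A possible shape for these two classes, assuming a NumPy-backed ring buffer with uniform sampling; the field names and method signatures are assumptions chosen to match the agent sketch above.

```python
import random
from collections import namedtuple

import numpy as np

# A standard (s, a, r, s', terminal) experience tuple.
Sample = namedtuple('Sample', ['state', 'action', 'reward', 'next_state', 'terminal'])

class ReplayMemory:
    """Fixed-size ring buffer of Samples (interface assumed)."""

    def __init__(self, max_size=100000):
        self.max_size = max_size
        self.buffer = []
        self.position = 0

    def __len__(self):
        return len(self.buffer)

    def append(self, state, action, reward, next_state, terminal):
        sample = Sample(state, action, reward, next_state, terminal)
        if len(self.buffer) < self.max_size:
            self.buffer.append(sample)
        else:
            self.buffer[self.position] = sample  # overwrite the oldest entry
        self.position = (self.position + 1) % self.max_size

    def sample(self, batch_size):
        # Uniformly sample a minibatch and stack each field into an array.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, terminals = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, terminals.astype(np.float32)
```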
The environment for training
step
- Given an action, compute the next state and reward, i.e. advance the state.
_compute_reward
- Compute the reward from the current distribution of taxis and the desired distribution (KL divergence); a sketch follows.
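Since the reward is based on the KL divergence between the two distributions, the computation might look like the sketch below; the smoothing constant, the direction of the divergence, and the negation (so that a closer match earns a higher reward) are all assumptions.

```python
import numpy as np

def _compute_reward(taxi_counts, desired_dist, eps=1e-8):
    """Reward as negative KL divergence between current and desired distributions.

    taxi_counts:  array of taxi counts per grid cell
    desired_dist: target probability distribution over grid cells (sums to 1)
    """
    current = taxi_counts / max(taxi_counts.sum(), eps)  # counts -> distribution
    current = np.clip(current, eps, 1.0)                 # avoid log(0)
    desired = np.clip(desired_dist, eps, 1.0)
    kl = np.sum(desired * np.log(desired / current))     # KL(desired || current)
    return -kl  # smaller divergence -> larger reward
```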
Standard CNN, with ReLU activation functions
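A minimal Keras sketch of such a network, assuming the state is a small 2D grid rendered as a single-channel image; the input shape, layer sizes, and action count are illustrative assumptions.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(20, 20, 1), num_actions=5):
    """Standard CNN with ReLU activations mapping a grid state to Q-values."""
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dense(num_actions),  # linear output: one Q-value per action
    ])
```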
The loss functions, computing the mean Huber loss
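The mean Huber loss is quadratic for small errors and linear for large ones, which keeps Q-learning updates robust to outliers; a TensorFlow sketch, with the usual delta threshold of 1.0 assumed:

```python
import tensorflow as tf

def mean_huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic within +/- delta, linear outside."""
    error = tf.abs(y_true - y_pred)
    quadratic = 0.5 * tf.square(error)
    linear = delta * (error - 0.5 * delta)
    return tf.reduce_mean(tf.where(error <= delta, quadratic, linear))
```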
RL policy classes; we use LinearDecayGreedyEpsilonPolicy
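This policy acts randomly with probability epsilon and greedily otherwise, annealing epsilon linearly over a fixed number of steps; a sketch with assumed start/end values and step count:

```python
import numpy as np

class LinearDecayGreedyEpsilonPolicy:
    """Epsilon-greedy policy whose epsilon decays linearly over num_steps."""

    def __init__(self, start_eps=1.0, end_eps=0.05, num_steps=100000):
        self.start_eps = start_eps
        self.end_eps = end_eps
        self.num_steps = num_steps
        self.step = 0

    def select_action(self, q_values):
        # Interpolate epsilon linearly, then hold it at end_eps.
        frac = min(self.step / self.num_steps, 1.0)
        epsilon = self.start_eps + frac * (self.end_eps - self.start_eps)
        self.step += 1
        if np.random.rand() < epsilon:
            return np.random.randint(len(q_values))  # explore
        return int(np.argmax(q_values))              # exploit
```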
Utility class to calculate distance and paths from index values
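If grid cells are identified by flat indices, distance computations first convert an index back to (row, col) coordinates; a sketch assuming row-major indexing and Manhattan distance:

```python
def index_to_coord(index, grid_width):
    """Convert a flat row-major grid index to (row, col)."""
    return divmod(index, grid_width)

def grid_distance(index_a, index_b, grid_width):
    """Manhattan distance between two cells given as flat indices."""
    row_a, col_a = index_to_coord(index_a, grid_width)
    row_b, col_b = index_to_coord(index_b, grid_width)
    return abs(row_a - row_b) + abs(col_a - col_b)
```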
Simulator for driver reactions
step
- Lottery pick to match requests with drivers; if not assigned, a driver moves to the best adjacent grid cell or remains unmoved (sketched below).
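The matching step could be sketched as below: within each grid cell, requests are raffled among the drivers present, and unmatched drivers relocate; the dict-based data layout and return values are assumptions.

```python
import random

def step(requests, drivers, best_adjacent):
    """One simulator step: lottery-match requests to drivers per grid cell.

    requests:      {cell: number of ride requests in that cell}
    drivers:       {cell: list of driver ids currently in that cell}
    best_adjacent: {cell: most attractive neighboring cell}
    Returns (assignments, new_positions) for matched and unmatched drivers.
    """
    assignments, new_positions = {}, {}
    for cell, driver_ids in drivers.items():
        pool = list(driver_ids)
        random.shuffle(pool)  # lottery: random order decides who is matched
        matched = pool[:requests.get(cell, 0)]
        for d in matched:
            assignments[d] = cell  # serve a request in the current cell
        for d in pool[len(matched):]:
            # Unassigned: move to the best adjacent cell, or remain unmoved.
            new_positions[d] = best_adjacent.get(cell, cell)
    return assignments, new_positions
```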
Basic structures and utilities to visualize data
The ultimate decision-making agent (currently a placeholder)
Environment simulator; provides simulated data derived from real data
Visualizes data on a canvas; helpful for seeing distributions