The challenge for participants combines order dispatching (order matching) and vehicle repositioning (fleet management) on a mobility-on-demand (MoD) platform.

1. The teams are to develop algorithms for either or both of these tasks.
  The algorithms are evaluated in a simulation environment that simulates the dynamics of an MoD platform. 
  
2. Teams are encouraged (but not required) to use reinforcement learning methods to solve the problems.

Basic requirements:

(1) The algorithms are run and evaluated in a simulation of the MoD system

(2) Reinforcement learning is encouraged

The test environment maintains the states of all the vehicles and trip orders.

1. Each vehicle can serve only one trip at a time, i.e. carpooling is not considered. 


2. The order dispatching algorithm is given the states of the vehicles and orders at the time of invocation, and it is to return an assignment between the vacant vehicles and open orders. It is possible for either a vehicle or an order to be unassigned. 


3. The environment invokes the order dispatching algorithm every two seconds and executes the returned assignment. The assigned vehicles are dispatched to pick up the passengers and transport them to their destinations. If an order is not matched within its matching window, it is considered lost. A passenger may cancel the order if the pick-up time for the matched driver is too long.


4. After dropping off the passenger, the driver becomes idle and can thus move around with a vacant vehicle. During the course of any such movement, the driver is still eligible for order matching. 



5. The participants control the repositioning of a small group (five vehicles) in the environment, whose identities are unknown to the participants. For any of those vehicles, if the idle time exceeds a threshold of L = 5 minutes, the vehicle becomes eligible for repositioning. The environment periodically sends the state information of all eligible vehicles within the selected group to the repositioning algorithm, which instructs each driver to cruise to a specific destination. If a driver is instructed to stay at its current location, it stays there for L minutes before another repositioning can be triggered. Drivers outside the selected group perform idle movement according to a set of generic transition probabilities. Repositioning vehicles travel at three meters per second along the spherical distance (a.k.a. great-circle distance).
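The dispatching step in item 2 can be illustrated with a simple greedy matcher. This is a minimal sketch, not the competition's sample code or a recommended solution: it assumes the environment supplies candidate order-driver pairs as dicts with the hypothetical keys `order_id`, `driver_id` and `order_driver_distance` (pick-up distance in meters). Pairs are taken shortest-distance first, so each vehicle serves at most one order (no carpooling), and leftover vehicles or orders simply stay unassigned.

```python
def greedy_dispatch(candidates):
    """Match open orders to vacant drivers, nearest pairs first.

    candidates: list of dicts with the hypothetical keys "order_id",
    "driver_id" and "order_driver_distance" (meters).
    Returns a list of {"order_id", "driver_id"} assignments.
    """
    used_orders, used_drivers = set(), set()
    assignment = []
    for pair in sorted(candidates, key=lambda p: p["order_driver_distance"]):
        # Skip pairs whose order or driver has already been matched.
        if pair["order_id"] in used_orders or pair["driver_id"] in used_drivers:
            continue
        used_orders.add(pair["order_id"])
        used_drivers.add(pair["driver_id"])
        assignment.append({"order_id": pair["order_id"], "driver_id": pair["driver_id"]})
    return assignment


if __name__ == "__main__":
    candidates = [
        {"order_id": "o1", "driver_id": "d1", "order_driver_distance": 800.0},
        {"order_id": "o1", "driver_id": "d2", "order_driver_distance": 300.0},
        {"order_id": "o2", "driver_id": "d2", "order_driver_distance": 500.0},
        {"order_id": "o3", "driver_id": "d1", "order_driver_distance": 1200.0},
    ]
    # o1 goes to d2, o3 goes to d1, and o2 is left unassigned.
    print(greedy_dispatch(candidates))
```

Greedy matching is cheap enough to run every two seconds but is not globally optimal; a bipartite assignment method such as the Hungarian algorithm, or an objective weighted by order reward or learned state value, is a common refinement.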

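Eligibility conditions for repositioning:

(1) Repositioning applies only to a pre-selected group of five vehicles

(2) The driver's idle time exceeds $L=5$ minutes

(3) The algorithm issues a repositioning instruction (destination) to each eligible driver in the group; if the destination is the driver's current location, the driver waits another $L=5$ minutes

(4) Drivers not selected for repositioning move according to "given transition probabilities". Repositioning vehicles travel at 3 meters per second, and the distance used throughout is the spherical distance.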

The spherical distance is computed with the haversine formula; reference: https://www.cnblogs.com/andylhc/p/9481636.html
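A haversine helper for the spherical distance, together with the resulting repositioning travel time at 3 m/s, might look like the sketch below. The Earth radius of 6,371 km is an assumed mean value; the simulator may use a slightly different constant, and the function names are illustrative only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius; the simulator may differ
REPOSITION_SPEED_MPS = 3.0    # repositioning speed given in the challenge description


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reposition_travel_time_s(lat1, lng1, lat2, lng2):
    """Seconds needed to cruise between two points at the repositioning speed."""
    return haversine_m(lat1, lng1, lat2, lng2) / REPOSITION_SPEED_MPS


if __name__ == "__main__":
    # 0.01 degrees of latitude is roughly 1.11 km, i.e. about 6 minutes at 3 m/s.
    print(round(haversine_m(30.0, 104.0, 30.01, 104.0), 1))
    print(round(reposition_travel_time_s(30.0, 104.0, 30.01, 104.0)))
```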

![image.png](attachment:image.png)

# Task 2: Vehicle repositioning

The team is to develop a repositioning algorithm for a pre-selected small group of vehicles.

1. The identity of the vehicles for algorithmic repositioning is unknown to the teams, requiring that the algorithms be agnostic to the identity of the vehicles.  


2. For any of those vehicles, if the continuous idle time exceeds a threshold of L=5 minutes, the vehicle becomes eligible for repositioning. 


3. The environment periodically sends the state information of all eligible vehicles within the selected group to the repositioning algorithm, which instructs the drivers to cruise to a specific destination. 


4. The mean individual income rate (defined in Evaluation) for the group over the simulation period is computed as the score for this algorithm.


5. The algorithms will be evaluated in a simulation environment which the teams do not have access to, except the scores generated by the environment. The participating teams can choose to develop either or both algorithms. (One can always use the sample code for one of the algorithms, although that obviously will not produce a competitive score.)


6. The two policies are related (i.e., they run in the same environment), but they are sufficiently separate that skipping one does not necessarily jeopardize the chance of winning on the other.
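The eligibility rule in item 2 and an identity-agnostic policy can be sketched as follows. Everything here is hypothetical: the driver-state keys (`driver_id`, `grid_id`, `idle_seconds`), the `cell_value` map (for example, an estimate of future income per grid cell learned from historical orders), and the output format. The decision depends only on each driver's state, never on which vehicle it is, and a driver whose current cell is already the best is told to stay put.

```python
IDLE_THRESHOLD_S = 5 * 60  # L = 5 minutes, from the challenge description


def reposition(drivers, cell_value):
    """Return one instruction per eligible driver.

    drivers:    list of dicts with hypothetical keys "driver_id", "grid_id",
                "idle_seconds"; driver_id is used only as an opaque handle.
    cell_value: hypothetical map grid_id -> estimated value of being there.
    """
    # The best cell is the same for every driver, so compute it once.
    best = max(cell_value, key=cell_value.get, default=None)
    instructions = []
    for d in drivers:
        if d["idle_seconds"] <= IDLE_THRESHOLD_S:
            continue  # idle time has not exceeded L yet: not eligible
        here = d["grid_id"]
        # Move only if the best cell looks strictly better than staying put.
        if best is not None and cell_value[best] > cell_value.get(here, 0.0):
            dest = best
        else:
            dest = here  # staying put starts another L-minute wait
        instructions.append({"driver_id": d["driver_id"], "destination": dest})
    return instructions


if __name__ == "__main__":
    drivers = [
        {"driver_id": "a", "grid_id": "g1", "idle_seconds": 400},
        {"driver_id": "b", "grid_id": "g2", "idle_seconds": 100},
        {"driver_id": "c", "grid_id": "g3", "idle_seconds": 600},
    ]
    cell_value = {"g1": 1.0, "g2": 5.0, "g3": 5.0}
    # "a" moves to g2, "b" is not yet eligible, "c" stays in g3.
    print(reposition(drivers, cell_value))
```

Sending every eligible driver to the single best cell ignores competition among the five drivers for the same orders; a learned value function that accounts for this, or a spread-out assignment across cells, would be a natural next step.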


The set-up of the competition resembles a real industrial setting, where it is typically not possible to train and explore directly on the production system because of operational and financial risks. On the other hand, there is usually an abundance of data from historical operations. A simulator can be built for evaluating candidate algorithms, but it is hard to simulate every detail of the production system faithfully.

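# Repositioning task overview

1. Vehicle IDs are unknown, so the algorithm must not rely on identifying individual vehicles.


2. A vehicle triggers a repositioning task when its idle time exceeds a threshold.


3. The simulator sends repositioning instructions to the eligible vehicles, and drivers who receive an instruction cruise to the specified destination.


4. The mean individual income rate is used as the evaluation metric for the algorithm.


5. Teams may design either or both policies; the two run in the same environment but are scored separately.


6. The submitted policies are evaluated in DiDi's own simulation environment.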


![image.png](attachment:image.png)


# Rules


- All participants need to sign up in the management system (DiDi employees are not allowed to participate in this competition).

Challenge Platform: https://biendata.com/competition/kdd_didi

- Participants form teams inside the management system. Each team must consist of no more than ten members and must appoint a leader. Team names must not exceed 15 characters.

- Each participant can join only one single team. Registering more than one account to join more than one team will lead to disqualification of all teams involved.

- Teams are allowed to combine before the team merge deadline but teams may not split. Combined teams must consist of no more than ten members.

- Using open-source code or tools is allowed, but code or tools that require authorization are not.

- Apart from the datasets provided by the competition organizer, no external data may be used.

- Privately sharing code or data outside of teams is not allowed.


- Each team can submit test code through Quick Test Submission. Submissions are limited to 20 per day per team. The file size of each submission must not exceed 1GB, and the run time is limited to fifteen minutes.



- Each team may submit its solution for evaluation in the test environment at most once per day. The file size of each submission must not exceed 1GB, and the run time is limited to two hours.



- The competition organizers reserve the right to update the competition timeline and rules if they deem it necessary.

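# Participation rules

1. All participants must register on the platform


2. Each team has at most ten members


3. Teams may merge


4. Open-source tools and packages may be used


5. Code may not be shared outside the team


6. Quick Test Submission: at most 20 submissions per day, file size under 1GB, run time under 15 minutes


7. Regular Submission: at most 1 submission per day, file size under 1GB, run time under 2 hours


8. The organizers reserve the right to adjust the competition timeline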


# Competition Process

The competition is divided into two phases, described in detail below.

# Development

The teams will be given a development kit consisting of data sets, sample source code and other information to develop the required algorithms. 

The development of a solution does not require building a simulator, although a description of the important environment dynamics will be provided, and teams may build a simulator at their discretion.

The solution is in the form of source code that conforms to the specified module interfaces. 

Upon invocation, the vehicle repositioning algorithm receives the state information of the current driver and of all other idle drivers in the selected repositioning group, whose identities are unknown to the teams.

The teams will be able to discuss any questions that they have with the organizers through the forum on the competition platform.

# Validation

The solution is to be submitted in a zip file with a specified file directory structure through the competition platform. 

Each team may submit its solution once per day for evaluation in our simulation environment. The evaluation runs offline, and each successful submission receives two scores, one for each metric. The scores are posted on the leaderboard.

Evaluation of each submission will additionally report other relevant metrics, e.g., fulfillment rate, response rate, as feedback to the teams.

# Testing

Before entering the final Testing Phase, the teams will be asked to submit an explanation on how their solution is related to RL in the broad sense (e.g., MDP, Approximate DP, bandits, adaptive control, optimization with long horizon). 

The union of the top 25 teams for each task (or all teams, whichever is fewer) from the Development Phase enters the final Testing Phase.

The last submission of each selected team from the Development Phase will be evaluated in a separate test environment (never seen by the teams during the Development Phase, but with the same environment dynamics).

The last score from the Development Phase carries 40% weight into the final score, while the score from the Testing Phase carries 60%. The teams will be ranked on the two final scores separately.
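The weighting of the final score is a simple weighted sum, applied separately to each task. A sketch with made-up scores (the task labels and numbers are purely illustrative):

```python
DEV_WEIGHT = 0.4   # weight of the last Development Phase score
TEST_WEIGHT = 0.6  # weight of the Testing Phase score


def final_score(dev_score: float, test_score: float) -> float:
    """Combine the two phase scores for one task."""
    return DEV_WEIGHT * dev_score + TEST_WEIGHT * test_score


if __name__ == "__main__":
    # Hypothetical (development, testing) scores; each task is ranked on its own.
    scores = {"dispatching": (10.0, 12.0), "repositioning": (5.0, 4.0)}
    for task, (dev, test) in scores.items():
        print(task, round(final_score(dev, test), 4))
```

Because the Testing Phase carries the larger weight, a solution that generalizes to the unseen test environment can overtake one that was tuned closely to the development environment.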


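# Development process


1. Data sets, sample code, and other information for developing the algorithms are provided


2. Building a simulator is not required, but some parameters of the simulator are given; interested teams may build their own simulator


3. The submitted code must connect to the simulator through the given interfaces


4. The repositioning algorithm receives the state information of the current driver and of the idle drivers; driver IDs are unknown


5. Teams may ask the organizers any questions.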


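# Validation process

1. Files are packaged as a zip archive


2. The number of submissions is limited


3. Evaluation metrics are reported after each submission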


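# Testing process

1. A written explanation of the method (its relation to RL)


2. The top 25 teams advance to the final round


3. Total score: score of the last submission * 0.4 + score from DiDi's in-house test * 0.6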





# Evaluation

Each trip order in the released data and the evaluation environment is assigned a number of reward units representing the potential income for the driver. Evaluation runs over multiple simulated days for the same city as the released data set, but over a different time interval.

Submissions are evaluated on two metrics corresponding to the two tasks.


![image.png](attachment:image.png)




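# Evaluation metrics

1. A driver earns a reward for each order served

2. Algorithms are validated on data from the same city, but from different dates and time periods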

![image.png](attachment:image.png)