connection closed by SUMO #295

BBDrive · 2020-12-10T14:56:20Z

When I run multiple environment instances with Ray, I get the following errors:

(pid=26758) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26765) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26767) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26759) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26750) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26763) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26753) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26766) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26752) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26759) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26758) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26765) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26750) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26767) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26752) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26766) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26753) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=26760) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=26760) ERROR:Zoo Worker:Failure while handling connection EOFError()

Despite these errors, it still runs for a few rounds. After running for a while, it crashes:

(pid=26763) ERROR:SMARTS:Simulation crashed with exception. Attempting to cleanly shutdown.
(pid=26763) ERROR:SMARTS:connection closed by SUMO
(pid=26763) Traceback (most recent call last):
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 170, in step
(pid=26763)     return self._step(agent_actions)
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 212, in _step
(pid=26763)     provider_state = self._step_providers(all_agent_actions, dt)
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 684, in _step_providers
(pid=26763)     provider, actions, dt, self._elapsed_sim_time
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 723, in _step_provider
(pid=26763)     provider_state = provider.step(provider_actions, dt, elapsed_sim_time)
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/sumo_traffic_simulation.py", line 305, in step
(pid=26763)     self._traci_conn.simulationStep(self._cumulative_sim_seconds)
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 302, in simulationStep
(pid=26763)     result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 180, in _sendCmd
(pid=26763)     return self._sendExact()
(pid=26763)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 90, in _sendExact
(pid=26763)     raise FatalTraCIError("connection closed by SUMO")
(pid=26763) traci.exceptions.FatalTraCIError: connection closed by SUMO
Traceback (most recent call last):
  File "main.py", line 166, in <module>
    main(args)
  File "main.py", line 65, in main
    memory = sampler.sample(network)
  File "/home/hp/PycharmProjects/kaylen/ppo_highway/sampler_asyn.py", line 84, in sample
    for epi in ray.get(episode):
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FatalTraCIError): ray::Environment.one_episode() (pid=26763, ip=172.31.73.204)
  File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
  File "/home/hp/PycharmProjects/kaylen/ppo_highway/sampler_asyn.py", line 61, in one_episode
    new_observation, reward, done, _ = self.env.step(action[0])
  File "/home/hp/PycharmProjects/kaylen/ppo_highway/ENV/smartsEnv.py", line 41, in step
    observation, reward, done, info = self.env.step({self.AGENT_ID: action})
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/env/hiway_env.py", line 155, in step
    observations, rewards, agent_dones, extras = self._smarts.step(agent_actions)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 170, in step
    return self._step(agent_actions)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 212, in _step
    provider_state = self._step_providers(all_agent_actions, dt)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 684, in _step_providers
    provider, actions, dt, self._elapsed_sim_time
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 723, in _step_provider
    provider_state = provider.step(provider_actions, dt, elapsed_sim_time)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/sumo_traffic_simulation.py", line 305, in step
    self._traci_conn.simulationStep(self._cumulative_sim_seconds)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 302, in simulationStep
    result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 180, in _sendCmd
    return self._sendExact()
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 90, in _sendExact
    raise FatalTraCIError("connection closed by SUMO")
traci.exceptions.FatalTraCIError: connection closed by SUMO
/home/hp/anaconda3/envs/smarts/lib/python3.7/subprocess.py:883: ResourceWarning: subprocess 26716 is still running
  ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback


Gamenot · 2020-12-11T20:03:49Z

Hello, could you give a few more details on what you were running to cause this? Does this occur for you with the examples/rllib.py example?

BBDrive · 2020-12-14T01:40:55Z

I ran the following code:

import ray
import gym

from smarts.core.agent_interface import AgentInterface, AgentType
from smarts.core.agent import AgentSpec, Agent


class SimpleAgent(Agent):
    def act(self, obs):
        return "keep_lane"

@ray.remote
class Environment:
    def __init__(self):
        self.AGENT_ID = "Agent-007"
        agent_spec = AgentSpec(
            interface=AgentInterface.from_type(AgentType.Laner, max_episode_steps=1000),
            agent_builder=SimpleAgent,
        )
        self.env = gym.make(
            "smarts.env:hiway-v0",
            scenarios=["/home/hp/SMARTS/scenarios/loop"],
            agent_specs={self.AGENT_ID: agent_spec},
        )
        self.agent = agent_spec.build_agent()

    def sample(self):
        observations = self.env.reset()

        while True:
            agent_action = self.agent.act(observations[self.AGENT_ID])
            observations, reward, done, _ = self.env.step({self.AGENT_ID:agent_action})
            if done[self.AGENT_ID]:
                break
        return 1  # return sampled trajectory


def train(trajectory):
    return 0


if __name__ == '__main__':
    ray.init()
    cpu = 8
    environment = [Environment.remote() for _ in range(cpu)]
    for i in range(100000):
        trajectory = ray.get([env.sample.remote() for env in environment])
        train(trajectory)
        print("Episode:%d" % i)
    ray.shutdown()

The console output is:

(pid=20551) pybullet build time: Nov 26 2020 23:08:25
(pid=20549) pybullet build time: Nov 26 2020 23:08:25
(pid=20545) pybullet build time: Nov 26 2020 23:08:25
(pid=20539) pybullet build time: Nov 26 2020 23:08:25
(pid=20552) pybullet build time: Nov 26 2020 23:08:25
(pid=20559) pybullet build time: Nov 26 2020 23:08:25
(pid=20553) pybullet build time: Nov 26 2020 23:08:25
(pid=20562) pybullet build time: Nov 26 2020 23:08:25
(pid=20539) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20545) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20552) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20553) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20549) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20551) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20562) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20559) ERROR:RemoteAgentBuffer:Waiting for local zoo worker to start up, retrying 0 / 3
(pid=20539) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=20545) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=20552) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=20553) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=20549) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=20551) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=20562) ERROR:Zoo Worker:Failure while handling connection EOFError()
(pid=20559) ERROR:Zoo Worker:Failure while handling connection EOFError()
Episode:0
Episode:1
...
Episode:60
Episode:61
(pid=20553) 2000 -> Problem solution failed (solver error)
(pid=20549) 2000 -> Problem solution failed (solver error)
(pid=20545) 2000 -> Problem solution failed (solver error)
(pid=20551) 2000 -> Problem solution failed (solver error)
(pid=20559) 2000 -> Problem solution failed (solver error)
(pid=20562) 2000 -> Problem solution failed (solver error)
(pid=20539) 2000 -> Problem solution failed (solver error)
(pid=20552) 2000 -> Problem solution failed (solver error)
Episode:62
Episode:63
...
Episode:80
Episode:81
(pid=20553) 2000 -> Problem solution failed (solver error)
(pid=20545) 2000 -> Problem solution failed (solver error)
(pid=20552) 2000 -> Problem solution failed (solver error)
(pid=20553) 2000 -> Problem solution failed (solver error)
(pid=20562) 2000 -> Problem solution failed (solver error)
(pid=20545) 2000 -> Problem solution failed (solver error)
(pid=20552) 2000 -> Problem solution failed (solver error)
(pid=20562) 2000 -> Problem solution failed (solver error)
Episode:82
Episode:83
...
Episode:366
Episode:367
(pid=20552) ERROR:SMARTS:Simulation crashed with exception. Attempting to cleanly shutdown.
(pid=20552) ERROR:SMARTS:connection closed by SUMO
(pid=20552) Traceback (most recent call last):
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 170, in step
(pid=20552)     return self._step(agent_actions)
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 212, in _step
(pid=20552)     provider_state = self._step_providers(all_agent_actions, dt)
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 684, in _step_providers
(pid=20552)     provider, actions, dt, self._elapsed_sim_time
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 723, in _step_provider
(pid=20552)     provider_state = provider.step(provider_actions, dt, elapsed_sim_time)
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/sumo_traffic_simulation.py", line 305, in step
(pid=20552)     self._traci_conn.simulationStep(self._cumulative_sim_seconds)
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 302, in simulationStep
(pid=20552)     result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 180, in _sendCmd
(pid=20552)     return self._sendExact()
(pid=20552)   File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 90, in _sendExact
(pid=20552)     raise FatalTraCIError("connection closed by SUMO")
(pid=20552) traci.exceptions.FatalTraCIError: connection closed by SUMO
Traceback (most recent call last):
  File "test.py", line 47, in <module>
    trajectory = ray.get([env.sample.remote() for env in environment])
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FatalTraCIError): ray::Environment.sample() (pid=20552, ip=172.31.73.204)
  File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
  File "test.py", line 32, in sample
    observations, reward, done, _ = self.env.step({self.AGENT_ID:agent_action})
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/env/hiway_env.py", line 155, in step
    observations, rewards, agent_dones, extras = self._smarts.step(agent_actions)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 170, in step
    return self._step(agent_actions)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 212, in _step
    provider_state = self._step_providers(all_agent_actions, dt)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 684, in _step_providers
    provider, actions, dt, self._elapsed_sim_time
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/smarts.py", line 723, in _step_provider
    provider_state = provider.step(provider_actions, dt, elapsed_sim_time)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/smarts/core/sumo_traffic_simulation.py", line 305, in step
    self._traci_conn.simulationStep(self._cumulative_sim_seconds)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 302, in simulationStep
    result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 180, in _sendCmd
    return self._sendExact()
  File "/home/hp/anaconda3/envs/smarts/lib/python3.7/site-packages/traci/connection.py", line 90, in _sendExact
    raise FatalTraCIError("connection closed by SUMO")
traci.exceptions.FatalTraCIError: connection closed by SUMO
/home/hp/anaconda3/envs/smarts/lib/python3.7/subprocess.py:883: ResourceWarning: subprocess 20493 is still running
  ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback

BBDrive · 2020-12-14T01:43:34Z

I'm looking forward to your answer. Thanks.

Gamenot · 2020-12-15T18:59:00Z

Hello, sorry for the late reply. I have done some testing on this error: the issue is reproducible, but the solution is still unclear.

Traceback (most recent call last):
  File "examples/rllib_problem.py", line 47, in <module>
    trajectory = ray.get([env.sample.remote() for env in environment])
  File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/worker.py", line 1506, in get
    values = worker.get_objects(object_ids, timeout=timeout)
  File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/worker.py", line 312, in get_objects
    return self.deserialize_objects(data_metadata_pairs, object_ids)
  File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/worker.py", line 280, in deserialize_objects
    return context.deserialize_objects(data_metadata_pairs, object_ids)
  File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/serialization.py", line 323, in deserialize_objects
    self._deserialize_object(data, metadata, object_id))
  File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/serialization.py", line 284, in _deserialize_object
    obj = self._deserialize_pickle5_data(data)
  File ".../SMARTS/.venv/lib/python3.7/site-packages/ray/serialization.py", line 262, in _deserialize_pickle5_data
    obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'traci'

It looks like one of the Ray workers was unable to import traci, and shortly afterwards the TraCI connection also failed. We have not seen this issue before, so it might take some time to resolve.
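Because the import failure happened inside a Ray worker rather than in the driver, one quick diagnostic is to check which modules each worker process can actually find. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and running it inside every worker (for example by wrapping the call in a `@ray.remote` function) is an assumption about how you would wire it into your own setup.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that this interpreter cannot find."""
    # find_spec returns None when a top-level module is not importable on sys.path
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # In the Ray setup from this thread, this check would need to run inside each
    # worker process, since a worker's environment can differ from the driver's.
    print(missing_modules(["traci", "gym"]))
```

An empty list means every listed module is importable in that process; any name printed is missing from that worker's environment.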

Gamenot · 2020-12-15T19:48:52Z

I may have found the potential cause: #313

JianmingTONG · 2020-12-22T02:34:52Z

Hi Gamenot. I also face the same problem. Is there any progress on this problem?

Gamenot · 2020-12-22T15:59:21Z

Hello @JianmingTONG, there is good progress on #331. It is a fairly critical fix and should solve both the error messages and the SUMO connection issue.

JianmingTONG · 2020-12-23T15:50:26Z

Thanks for the reply, @Gamenot. I have tried the version that seems to solve the SUMO issue, i.e. commit 2a45972. However, I still get the "connection closed by SUMO" error when I run the following commands. Note: I changed the number of episodes from 10 to 1000000 to test the scenario without launching the training process.

# terminal 1
$ scl envision start -s ./scenarios -p 8081

# terminal 2
$ python examples/single_agent.py benchmark/scenarios/two_ways/bid

# terminal 3
$ python examples/single_agent.py benchmark/scenarios/two_ways/bid_sv

PS: the processes in both terminal 2 and terminal 3 died at iteration 8185.

Adaickalavan · 2020-12-23T16:42:47Z

Hi @JianmingTONG, we are aware of two separate shortcomings in our implementation of (i) Ray support and (ii) remote agents. We are currently actively looking into both of them. Unfortunately, #331 is not ready for use yet.

JianmingTONG · 2021-01-04T11:10:47Z

Hi @Adaickalavan @Gamenot, I see that #366 has been closed. Has the issue been solved?

Thanks, and I wish you a happy new year.

Gamenot · 2021-01-04T19:40:17Z

@JianmingTONG Happy new year, and thank you! It looks like the problem is addressed; however, we are still testing to make sure it is, in fact, solved.

Gamenot · 2021-01-04T19:44:46Z

As for using SMARTS with Ray, we have found that it is important to call env.close() explicitly; otherwise some resources may be left over, which can prevent some Ray workers from exiting properly.

A modified example is as follows:

import gym
import ray

from smarts.core.agent import Agent, AgentSpec
from smarts.core.agent_interface import AgentInterface, AgentType


class SimpleAgent(Agent):
    def act(self, obs):
        return "keep_lane"


@ray.remote
class Environment:
    def __init__(self):
        self.AGENT_ID = "Agent-007"
        agent_spec = AgentSpec(
            interface=AgentInterface.from_type(AgentType.Laner, max_episode_steps=1000),
            agent_builder=SimpleAgent,
        )
        self.env = gym.make(
            "smarts.env:hiway-v0",
            scenarios=["scenarios/loop"],
            agent_specs={self.AGENT_ID: agent_spec},
            headless=True,
        )
        self.agent = agent_spec.build_agent()

    def sample(self):
        observations = self.env.reset()

        while True:
            agent_action = self.agent.act(observations[self.AGENT_ID])
            observations, reward, done, _ = self.env.step({self.AGENT_ID: agent_action})
            if done[self.AGENT_ID]:
                break

        return 1  # return sampled trajectory

    # Should be called when the environment is no longer needed
    def close(self):
        self.env.close()


def train(trajectories):
    # Placeholder for the training step, as in the original reproduction script
    return 0


if __name__ == "__main__":
    num_cpus = 2
    ray.init(num_cpus=num_cpus)
    environments = [Environment.remote() for _ in range(num_cpus)]
    try:
        for i in range(10000):
            futures = [env.sample.remote() for env in environments]
            # Block until every actor has finished its episode
            trajectories = ray.get(futures)
            train(trajectories)
            print("Episode:%d" % i)
    finally:
        # Explicitly close each environment so SMARTS/SUMO resources are released
        close_futures = [env.close.remote() for env in environments]
        ray.get(close_futures)
        ray.shutdown()

Specifically, this means the underlying SMARTS instance needs to be disposed of, which is what close() does:

SMARTS/smarts/env/hiway_env.py, lines 202 to 204 in d006ace:

def close(self):
    if self._smarts is not None:
        self._smarts.destroy()
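For single-process loops, the same "always close" rule can be enforced with a context manager, so close() still runs when an episode raises (as FatalTraCIError does in the tracebacks above). This is a minimal sketch, not part of SMARTS: `managed_env` and `DummyEnv` are hypothetical names, and `DummyEnv` only stands in for a gym environment so the pattern can be exercised without SMARTS installed. In the Ray example, the driver's try/finally plays the same role for the actors.

```python
from contextlib import contextmanager


@contextmanager
def managed_env(make_env):
    """Create an environment and guarantee env.close() runs, even if the body raises."""
    env = make_env()
    try:
        yield env
    finally:
        # Mirrors the explicit close() in smarts/env/hiway_env.py, which destroys SMARTS
        env.close()


class DummyEnv:
    """Hypothetical stand-in for a gym environment, used only to exercise the pattern."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


if __name__ == "__main__":
    with managed_env(DummyEnv) as env:
        pass  # reset/step loop would go here
    print(env.closed)  # True
```

With a real environment, `make_env` would be something like a `lambda` around the `gym.make("smarts.env:hiway-v0", ...)` call from the example.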

Adaickalavan · 2021-01-04T20:11:07Z

Hi @JianmingTONG , happy new year.

It appears that we have fixed this issue alongside other distributed computing issues (#331).

I have verified that, with the commands below, the code runs successfully to completion.

Run in terminal 1:

$ cd /path/to/repository/SMARTS/
$ scl scenario build --clean ./benchmark/scenarios/two_ways/bid
$ scl scenario build --clean ./benchmark/scenarios/two_ways/bid_sv
$ scl envision start -s ./scenarios -p 8081

See the visualization in a browser at http://localhost:8081/.
Run in terminal 2:

$ python3.7 ./examples/single_agent.py benchmark/scenarios/two_ways/bid_sv --episodes 10000

Run in terminal 3:

$ python3.7 ./examples/single_agent.py benchmark/scenarios/two_ways/bid --episodes 10000

Going forward, please:

- pull the latest SMARTS code from the main branch,
- set up your Python virtual environment, and
- re-run pip install -r requirements.txt.

Adaickalavan · 2021-01-04T20:53:23Z

To summarize, the problem breaks down into two parts:

@BBDrive
- Problem:
    raise FatalTraCIError("connection closed by SUMO")
    traci.exceptions.FatalTraCIError: connection closed by SUMO
    No module named 'traci'
- Issue: continued at the bottom

@JianmingTONG
- Problem: connection closed by SUMO #295 (comment)
- Solved: connection closed by SUMO #295 (comment)

JianmingTONG · 2021-01-05T06:38:11Z

Hi, I followed the instructions above to launch the example evaluation. However, it reports the following errors:

│       4088/1000000 │               3.58 │                 25 │              35.79 │             bid_sv │          1.rou.xml │ 351907917703150455 │  60.96 - Agent-007 │
│       4089/1000000 │               3.72 │                 39 │              37.19 │             bid_sv │          2.rou.xml │ 351907917703150455 │  93.60 - Agent-007 │
╰────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────╯
Traceback (most recent call last):
  File "./examples/single_agent.py", line 82, in <module>
    seed=args.seed,
  File "./examples/single_agent.py", line 60, in main
    observations = env.reset()
  File "/media/nics/Data/SMARTS/smarts/env/hiway_env.py", line 189, in reset
    env_observations = self._smarts.reset(scenario)
  File "/media/nics/Data/SMARTS/smarts/core/smarts.py", line 306, in reset
    self.setup(scenario)
  File "/media/nics/Data/SMARTS/smarts/core/smarts.py", line 353, in setup
    provider_state = self._setup_providers(self._scenario)
  File "/media/nics/Data/SMARTS/smarts/core/smarts.py", line 643, in _setup_providers
    provider_state.merge(provider.setup(scenario))
  File "/media/nics/Data/SMARTS/smarts/core/sumo_traffic_simulation.py", line 249, in setup
    [tc.VAR_DEPARTED_VEHICLES_IDS, tc.VAR_ARRIVED_VEHICLES_IDS]
  File "/home/nics/Package/sumo/tools/traci/_simulation.py", line 440, in subscribe
    Domain.subscribe(self, "", varIDs, begin, end)
  File "/home/nics/Package/sumo/tools/traci/domain.py", line 208, in subscribe
    self._connection._subscribe(self._subscribeID, begin, end, objectID, varIDs)
  File "/home/nics/Package/sumo/tools/traci/connection.py", line 231, in _subscribe
    result = self._sendCmd(cmdID, (begin, end), objID, format, *args)
  File "/home/nics/Package/sumo/tools/traci/connection.py", line 178, in _sendCmd
    return self._sendExact()
  File "/home/nics/Package/sumo/tools/traci/connection.py", line 88, in _sendExact
    raise FatalTraCIError("connection closed by SUMO")
traci.exceptions.FatalTraCIError: connection closed by SUMO
ERROR:RemoteAgentBuffer:Exception while tearing down buffered remote agent. ValueError('Cannot invoke RPC on closed channel!')
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/media/nics/Data/SMARTS/smarts/core/remote_agent_buffer.py", line 109, in destroy
    raise e
  File "/media/nics/Data/SMARTS/smarts/core/remote_agent_buffer.py", line 104, in destroy
    remote_agent.terminate()
  File "/media/nics/Data/SMARTS/smarts/core/remote_agent.py", line 88, in terminate
    manager_pb2.Port(num=self._worker_address[1])
  File "/home/nics/venv/python37_smarts_1_5/lib/python3.7/site-packages/grpc/_channel.py", line 825, in __call__
    wait_for_ready, compression)
  File "/home/nics/venv/python37_smarts_1_5/lib/python3.7/site-packages/grpc/_channel.py", line 812, in _blocking
    ),), self._context)
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 498, in grpc._cython.cygrpc.Channel.segregated_call
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 353, in grpc._cython.cygrpc._segregated_call
  File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 357, in grpc._cython.cygrpc._segregated_call
ValueError: Cannot invoke RPC on closed channel!

I have also tested some other scenarios, as follows:

Could you please help me fix this?

Gamenot · 2021-01-06T02:16:00Z

I have found another potential source of the crash while going through the crash report from running the example I provided. I hope we can address this without modifying SUMO's code.

StacktraceTop:
 MSLCHelper::getRoundaboutDistBonus(MSVehicle const&, double, MSVehicle::LaneQ const&, MSVehicle::LaneQ const&, MSVehicle::LaneQ const&) ()
 MSLCM_LC2013::_wantsChange(int, MSAbstractLaneChangeModel::MSLCMessager&, int, std::pair<MSVehicle*, double> const&, std::pair<MSVehicle*, double> const&, std::pair<MSVehicle*, double> const&, MSLane const&, std::vector<MSVehicle::LaneQ, std::allocator<MSVehicle::LaneQ> > const&, MSVehicle**, MSVehicle**) ()
 MSLaneChanger::checkChange(int, MSLane const*, std::pair<MSVehicle* const, double> const&, std::pair<MSVehicle* const, double> const&, std::pair<MSVehicle* const, double> const&, std::vector<MSVehicle::LaneQ, std::allocator<MSVehicle::LaneQ> > const&) const ()
 MSLaneChanger::checkChangeWithinEdge(int, std::pair<MSVehicle* const, double> const&, std::vector<MSVehicle::LaneQ, std::allocator<MSVehicle::LaneQ> > const&) const ()
 MSLaneChanger::change() ()
Tags: bionic third-party-packages
ThreadStacktrace:
 .
 Thread 1 (Thread 0x7fdb69499780 (LWP 14606)):
 #0  0x0000557a1be90151 in MSLCHelper::getRoundaboutDistBonus(MSVehicle const&, double, MSVehicle::LaneQ const&, MSVehicle::LaneQ const&, MSVehicle::LaneQ const&) ()
 No symbol table info available.
 #1  0x0000557a1be7d29e in MSLCM_LC2013::_wantsChange(int, MSAbstractLaneChangeModel::MSLCMessager&, int, std::pair<MSVehicle*, double> const&, std::pair<MSVehicle*, double> const&, std::pair<MSVehicle*, double> const&, MSLane const&, std::vector<MSVehicle::LaneQ, std::allocator<MSVehicle::LaneQ> > const&, MSVehicle**, MSVehicle**) ()
 No symbol table info available.
 #2  0x0000557a1bcdc041 in MSLaneChanger::checkChange(int, MSLane const*, std::pair<MSVehicle* const, double> const&, std::pair<MSVehicle* const, double> const&, std::pair<MSVehicle* const, double> const&, std::vector<MSVehicle::LaneQ, std::allocator<MSVehicle::LaneQ> > const&) const ()
 No symbol table info available.
 #3  0x0000557a1bcdd367 in MSLaneChanger::checkChangeWithinEdge(int, std::pair<MSVehicle* const, double> const&, std::vector<MSVehicle::LaneQ, std::allocator<MSVehicle::LaneQ> > const&) const ()
 No symbol table info available.
 #4  0x0000557a1bce0538 in MSLaneChanger::change() ()
 No symbol table info available.
 #5  0x0000557a1bcdae19 in MSLaneChanger::laneChange(long long) ()
 No symbol table info available.
 #6  0x0000557a1bcaad7c in MSEdgeControl::changeLanes(long long) ()
 No symbol table info available.
 #7  0x0000557a1bc07a8e in MSNet::simulationStep() ()
 No symbol table info available.
 #8  0x0000557a1bc080a6 in MSNet::simulate(long long, long long) ()
 No symbol table info available.
 #9  0x0000557a1bbf0c4d in main ()
 No symbol table info available.
Title: sumo crashed with SIGSEGV in MSLCHelper::getRoundaboutDistBonus()
UnreportableReason:
 You have some obsolete package versions installed. Please upgrade the following packages and check if the problem still occurs:
 
 libp11-kit0
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True

The cause might be changes in the 1.7.0 release of SUMO. I am unsure why getRoundaboutDistBonus() is being called, since there is no roundabout in the loop scenario.

Gamenot · 2021-01-11T23:03:22Z

SUMO connection closed

- Fix the SUMO error logs (works with sumo-gui)
- Tie the SUMO version to the dependency list
- Compile a debug version of SUMO
- Dev fork of SUMO (if problems are fixed)

Adaickalavan · 2021-01-25T10:47:28Z

Hi @JianmingTONG,

Given the occurrence of the traci.exceptions.FatalTraCIError: connection closed by SUMO error, could you try running all your commands and experiments inside a Docker container and report back here whether the error still occurs?

I suspect the error does not happen when SMARTS is run inside a Docker container.

$ docker run --rm -it --network=host huaweinoah/smarts:v0.4.12

Do not map the source code using -v $PWD:/src when running the docker container.

dineshresearch · 2021-09-24T13:39:24Z

I am also currently facing the same issue. After 10 million training steps, the training process gets killed. I am using SMARTS version 0.4.16.

@Gamenot @Adaickalavan @JianmingTONG @BBDrive Is the issue fixed? If so, could you please mention the pull request that fixed it?

Also, would moving to version 0.4.18 or another branch solve this issue? If so, please mention which branch I can use.

Adaickalavan · 2021-09-27T09:23:36Z

Hi @dineshresearch,

Unfortunately, the traci.exceptions.FatalTraCIError: connection closed by SUMO error, which originates from SUMO, is not solved yet.

For the time being, if you do not need background traffic vehicles, you may consider setting traffic_sim=None when instantiating SMARTS. This sidesteps the error but removes the background traffic vehicles.

SMARTS/smarts/core/smarts.py, lines 68 to 78 in e3681e7:

    class SMARTS:
        def __init__(
            self,
            agent_interfaces,
            traffic_sim: SumoTrafficSimulation,
            envision: EnvisionClient = None,
            visdom: VisdomClient = None,
            timestep_sec=0.1,
            reset_agents_only=False,
            zoo_addrs=None,
        ):
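If background traffic is needed, a generic alternative to traffic_sim=None is to treat a SUMO crash as a lost episode rather than a fatal error. That way a long training run is not killed by a single crash. Below is a minimal sketch. It assumes a single-agent, gym-style environment whose reset() can bring the traffic simulation back up after the connection is lost. Whether SMARTS's own reset does this depends on the version. run_episodes is a hypothetical helper, not a SMARTS API and not necessarily what the linked PRs implement.

```python
try:
    # The exception type reported in this thread; traci is distributed with SUMO.
    from traci.exceptions import FatalTraCIError
except ImportError:
    # Stand-in so the sketch runs without SUMO installed.
    class FatalTraCIError(Exception):
        pass


def run_episodes(env, policy, n_episodes: int, max_crashes: int = 3):
    """Run n_episodes complete episodes, restarting any that die with FatalTraCIError.

    Assumes env.reset() returns an observation, env.step() returns
    (obs, reward, done, info), and env.reset() can recover after a crash.
    Returns (completed_episodes, crashes).
    """
    completed = 0
    crashes = 0
    while completed < n_episodes:
        obs = env.reset()
        done = False
        try:
            while not done:
                obs, reward, done, info = env.step(policy(obs))
        except FatalTraCIError:
            crashes += 1
            if crashes > max_crashes:
                raise  # SUMO keeps failing: surface the original error
            continue  # discard the crashed episode and start a fresh one
        completed += 1
    return completed, crashes
```

Capping the number of tolerated crashes keeps a persistently broken SUMO install from looping forever. With SMARTS's multi-agent environments, where step() returns per-agent dictionaries, the done check would need to change to something like dones["__all__"].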

Gamenot self-assigned this Dec 11, 2020

Gamenot mentioned this issue Dec 15, 2020

SMARTS Leaves Behind Zombie Processes #313 (Closed, 2 tasks)

Gamenot added this to the 0.5 milestone Dec 16, 2020

Gamenot assigned Adaickalavan Dec 16, 2020

Gamenot added this to To do in SMARTS v0.4.10 Dec 16, 2020

Gamenot added this to To do in SMARTS v0.4.11 Dec 23, 2020

Adaickalavan closed this as completed Jan 4, 2021

Gamenot reopened this Jan 4, 2021

Gamenot added this to To do in SMARTS v0.4.12 Jan 4, 2021

Gamenot moved this from To do to In progress in SMARTS v0.4.12 Jan 11, 2021

Gamenot removed this from In progress in SMARTS v0.4.12 Jan 11, 2021

Gamenot added this to In progress in SMARTS v0.4.13 Jan 18, 2021

Adaickalavan removed this from To do in SMARTS v0.4.10 Jan 19, 2021

Adaickalavan removed this from To do in SMARTS v0.4.11 Jan 19, 2021

Gamenot modified the milestones: 0.5, Backlog Jan 27, 2021

This was linked to pull requests Mar 3, 2021:

Bugtest SUMO Crashing on Turns #565 (Closed)

Bugtest sumo crash reproduction #619 (Open)

Gamenot mentioned this issue Nov 25, 2021

Graceful handle of TraCI connection errors #1138 (Merged)

Gamenot added this to To do in SMARTS v0.5.0 Dec 28, 2021

Gamenot removed this from In progress in SMARTS v0.4.13 Dec 28, 2021

Gamenot mentioned this issue Dec 28, 2021

Restore MARL Benchmark #1213 (Closed, 1 task)

Gamenot moved this from To do to In Review in SMARTS v0.5.0 Dec 28, 2021

Gamenot moved this from In Review to Done in SMARTS v0.5.0 Jan 10, 2022

Gamenot closed this as completed Jan 10, 2022

Gamenot linked a pull request Jan 11, 2022 that will close this issue

[Bugfix] Secondary TraCI error #1235 (Merged)


BBDrive commented Dec 10, 2020 (edited)

Gamenot commented Dec 11, 2020

BBDrive commented Dec 14, 2020

BBDrive commented Dec 14, 2020

Gamenot commented Dec 15, 2020

Gamenot commented Dec 15, 2020

JianmingTONG commented Dec 22, 2020

Gamenot commented Dec 22, 2020

JianmingTONG commented Dec 23, 2020 (edited)

Adaickalavan commented Dec 23, 2020

JianmingTONG commented Jan 4, 2021

Gamenot commented Jan 4, 2021

Gamenot commented Jan 4, 2021 (edited)

Adaickalavan commented Jan 4, 2021 (edited)

Adaickalavan commented Jan 4, 2021 (edited)

JianmingTONG commented Jan 5, 2021 (edited)

Gamenot commented Jan 6, 2021 (edited)

Gamenot commented Jan 11, 2021

Adaickalavan commented Jan 25, 2021

dineshresearch commented Sep 24, 2021

Adaickalavan commented Sep 27, 2021
