Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import hangs with Gym >0.21 #20

Open
callumtilbury opened this issue Feb 24, 2023 · 2 comments
Open

Import hangs with Gym >0.21 #20

callumtilbury opened this issue Feb 24, 2023 · 2 comments

Comments

@callumtilbury
Copy link

callumtilbury commented Feb 24, 2023

When installing lb-foraging with certain versions of gym, the import hangs for a (very) long time.

Consider a simple script, import_tests.py:

from time import time
start = time()
import lbforaging
print(f"time = {time() - start:.3f}s")

With gym==0.21.*:

$ python import_tests.py
time = 0.650s

vs. with gym==0.22.*:

$ python import_tests.py
time = 1188.446s

This massive bottleneck is caused by the way gym registers environments from v0.22 onwards. Consider the gym/envs/registrations.py::EnvRegistry::register() method in v0.21:

https://github.com/openai/gym/blob/c755d5c35a25ab118746e2ba885894ff66fb8c43/gym/envs/registration.py#L205-L217

    def register(self, id, **kwargs):
        if self._ns is not None:
            if "/" in id:
                namespace, id = id.split("/")
                logger.warn(
                    f"Custom namespace '{namespace}' is being overrode by namespace '{self._ns}'. "
                    "If you are developing a plugin you shouldn't specify a namespace in `register` calls. "
                    "The namespace is specified through the entry point key."
                )
            id = f"{self._ns}/{id}"
        if id in self.env_specs:
            logger.warn("Overriding environment {}".format(id))
        self.env_specs[id] = EnvSpec(id, **kwargs)

See how this is different in v0.22:

https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L542-L596

    def register(self, id: str, **kwargs) -> None:
        spec = EnvSpec(id, **kwargs)


        if self._ns is not None:
            if spec.namespace is not None:
                logger.warn(
                    f"Custom namespace `{spec.namespace}` is being overridden "
                    f"by namespace `{self._ns}`. If you are developing a "
                    "plugin you shouldn't specify a namespace in `register` "
                    "calls. The namespace is specified through the "
                    "entry point package metadata."
                )
            # Replace namespace
            spec.namespace = self._ns


        try:
            # Get all versions of this spec.
            versions = self.env_specs.versions(spec.namespace, spec.name)


            # We raise an error if the user is attempting to initialize an
            # unversioned environment when a versioned one already exists.
            latest_versioned_spec = max(
                filter(lambda spec: isinstance(spec.version, int), versions),
                key=lambda spec: cast(int, spec.version),
                default=None,
            )
            unversioned_spec = next(
                filter(lambda spec: spec.version is None, versions), None
            )


            # Trying to register an unversioned spec when versioned spec exists
            if unversioned_spec and spec.version is not None:
                message = (
                    "Can't register the versioned environment "
                    f"`{spec.id}` when the unversioned environment "
                    f"`{unversioned_spec.id}` of the same name already exists."
                )
                raise error.RegistrationError(message)
            elif latest_versioned_spec and spec.version is None:
                message = (
                    f"Can't register the unversioned environment `{spec.id}` "
                    f"when version `{latest_versioned_spec.version}` "
                    "of the same name already exists. Note: the default "
                    "behavior is that the `gym.make` with the unversioned "
                    "environment will return the latest versioned environment."
                )
                raise error.RegistrationError(message)
        # We might not find this namespace or name in which case
        # we should continue to register the environment.
        except (error.NamespaceNotFound, error.NameNotFound):
            pass
        finally:
            if spec.id in self.env_specs:
                logger.warn(f"Overriding environment {id}")
            self.env_specs[spec.id] = spec

Specifically, the issue here is the addition of:
https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L559

            versions = self.env_specs.versions(spec.namespace, spec.name)

which calls
https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L220

        self._assert_name_exists(namespace, name)

which eventually hits:
https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L293

            suggestions = difflib.get_close_matches(name, self.names(namespace), n=1)

Herein lies the problem: lbforaging attempts to register a large number of environments (9720 by default). With each new environment, gym checks whether any environment has already been registered with a similar name. Naturally, this fuzzy match scales really badly with a large number of environments being registered.

v0.23 of gym takes the same approach to registration as v0.22, and thus faces the same issue. In v0.24-v0.26, the registration no longer looks for a fuzzy match (with difflib), but there is still a bottleneck due to an iteration over all registered envs:
https://github.com/openai/gym/blob/dcd185843a62953e27c2d54dc8c2d647d604b635/gym/envs/registration.py#L379-L403

def _check_spec_register(spec: EnvSpec):
    """Checks whether the spec is valid to be registered. Helper function for `register`."""
    global registry
    latest_versioned_spec = max(
        (
            spec_
            for spec_ in registry.values()
            if spec_.namespace == spec.namespace
            and spec_.name == spec.name
            and spec_.version is not None
        ),
        key=lambda spec_: int(spec_.version),  # type: ignore
        default=None,
    )


    unversioned_spec = next(
        (
            spec_
            for spec_ in registry.values()
            if spec_.namespace == spec.namespace
            and spec_.name == spec.name
            and spec_.version is None
        ),
        None,
    )

thus still yielding bad performance (definitely not as bad as v0.22 & v0.23 though):

$ python import_tests.py
time = 8.575s

I'm not sure what the solution for lbforaging is here?

  • Advise users that gym v0.22 & v0.23 are incompatible
  • Don't pre-register loads of environment configs. Perhaps do a lazy registration only when user asks for a given set-up? Not sure if/why this isn't a preferred approach.

Note: I am aware that gym has officially moved to https://github.com/Farama-Foundation/Gymnasium, but the issue remains there with gymnasium v0.27:
https://github.com/Farama-Foundation/Gymnasium/blob/6f35e7f87fc5b455b8cc70e366016c463fa52850/gymnasium/envs/registration.py#L298-L321

@callumtilbury
Copy link
Author

PS: GitHub Markdown should renders code blocks that are linked from GitHub itself, but that isn't working here (weirdly). I've pasted the code manually.

@mateuslevisf
Copy link

+1. Basically makes the environment unusable unless you fork the repo and work from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants