Merge branch 'release/v0.5-release'

leap-ec · Feb 3, 2023 · 071b14e
2 parents 7bc7ce6 + f254d96
commit 071b14e
Showing 9 changed files with 144 additions and 87 deletions.
diff --git a/.gitignore b/.gitignore
@@ -2,6 +2,10 @@
 /local
 *.log
 *.pyc
+*.pdf
+*.pt
+*.gz
+*-ubyte
 __pycache__
 docker-nom/build/
 \#*\#
@@ -80,4 +84,4 @@ __MACOSX
 dask-worker-space/
 
 # Generated data
-*.csv
+*.csv
diff --git a/CHANGELOG b/CHANGELOG
@@ -1,3 +1,22 @@
+* `v0.5`, 1/27/23
+
+    Installed executable now `gremlin` instead of `gremlin.py`.  Compensated
+    for LEAP API changes. (Note that `gremlin` still depends on the LEAP
+    `develop` branch and not on the official LEAP `master` branch releases to
+    take advantage of more up to date LEAP features.)
+
+    Added optional `with_client` `async` config section for code to be executed
+    after Dask client is started.  This can be used to start worker plugins or
+    wait for a certain number of workers to become available.
+
+    `setup.py` now installs third party dependencies.  Please note that the
+    latest LEAP version in LEAP `develop` will have to be installed.
+
+    Exceptions raised in LEAP code are now caught so that errors that
+    propagate from there don't silently kill Gremlin.
+
+    Made a number of minor bug fixes and code format changes.
+
 * `v0.4`, 9/30/22
 
     Replaced `imports` with `preamble` in YAML config files thus giving more
@@ -11,19 +30,19 @@
 
 * `v0.3`, 3/9/22
 
-  Add support for config variable `algorithm` that denotes if using a
-  traditional by-generation EA or an asynchronous steady-state EA
+    Add support for config variable `algorithm` that denotes if using a
+    traditional by-generation EA or an asynchronous steady-state EA
 
 * `v0.2dev`, 2/17/22
 
-  Revamped config system and heavily refactored/simplified code
+    Revamped config system and heavily refactored/simplified code
 
 * Version `0.1dev` Migrated to github, 10/14/2021
 
-  This version moved from code.ornl.gov repository to github to facilitate
-  use as an open-source project.
+    This version moved from code.ornl.gov repository to github to facilitate
+    use as an open-source project.
 
 * Version `0.0` Migrated from internal repository, 7/13/2021
 
-  Migrated from internal git repository to code.ornl.gov, and generalized
-  source to be more readily applicable to new problems.
+    Migrated from internal git repository to code.ornl.gov, and generalized
+    source to be more readily applicable to new problems.
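The `with_client` entry in the `v0.5` notes holds code that runs once the Dask client exists. Below is a minimal sketch of that mechanism, assuming the configured string is handed to `exec()` with the client bound to the name `client`. The binding name is an assumption, and `FakeClient` and `run_with_client` are hypothetical stand-ins so the sketch runs without Dask; this is not Gremlin's actual implementation.

```python
class FakeClient:
    """Hypothetical stand-in for distributed.Client; records calls only."""

    def __init__(self):
        self.calls = []

    def wait_for_workers(self, n_workers):
        # The real Dask client blocks here until n_workers are available.
        self.calls.append(('wait_for_workers', n_workers))


def run_with_client(exec_str, client):
    """Run optional user code with `client` visible by name (assumed)."""
    if exec_str is not None:
        exec(exec_str, {'client': client})


client = FakeClient()

# What a `with_client` block from a YAML config might contain.
with_client = "client.wait_for_workers(n_workers=4)\n"

run_with_client(with_client, client)
run_with_client(None, client)  # a missing section is simply skipped
print(client.calls)
```

Because the string is executed verbatim, anything placed in `with_client` runs with the full privileges of the Gremlin process, so config files should be treated as trusted code.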
diff --git a/README.md b/README.md
@@ -11,17 +11,19 @@ perform better for those sets.
 ![2022 R&D 100 Award Winner](RD100_2022_Winner_Logo-small.png) Gremlin is a [2022 R&D 100 Award Winner!](https://www.rdworldonline.com/rd-100-winners-for-2022-are-announced/)
 
 ## Requires
-* Python 3.[78]
-* [LEAP https://github.com/AureumChaos/LEAP](https://github.com/AureumChaos/LEAP)
+* Python >= 3.7.0
+* [LEAP https://github.com/AureumChaos/LEAP/tree/develop](https://github.com/AureumChaos/LEAP/tree/develop)
+  -- **Note that this is for the LEAP
+  `develop` branch.**
 
 ## Installation
 
 1. Activate your conda or virtual environment
-2. cd into top-level gremlin directory
+2. `cd` into top-level gremlin directory
 3. `pip install .`
 
 ## Configuration
-Gremlin is essentially a thin convenience wrapper around [LEAP]
+Gremlin is a thin convenience wrapper around [LEAP]
 (https://github.com/AureumChaos/LEAP).  Instead of writing a script in LEAP, 
 one would instead point the `gremlin` executable at a YAML file that describes 
 what LEAP classes, subclasses, and functions to use, as well as other salient 
@@ -42,7 +44,7 @@ async: # parameters for asynchronous steady-state EA
   ind_file_probe: probe.log_ind # optional functor or function for writing ind_file
 
 pop_file: pop.csv # where we will write out each generation in CSV format
-problem: problem.QLearnerBalanceProblem("${env:GREMLIN_QLEARNER_CARTPOLE_MODEL_FPATH}")
+problem: problem.QLearnerBalanceProblem("${oc.env:GREMLIN_QLEARNER_CARTPOLE_MODEL_FPATH}")
 representation: representation.BalanceRepresentation()
 preamble: |
   import probe # need to import our probe.py so that LEAP sees our probe pipeline operator
@@ -86,8 +88,21 @@ This can be run simply by (must be in `examples/MNIST` directory):
 $ gremlin config.yml
 ```
 
+## Documentation
+The [wiki](https://github.com/markcoletti/gremlin/wiki) has more detailed 
+documentation, particularly on how the YAML config files can be set up for 
+Gremlin runs.
+
 ## Versions
 
+Note that more detailed explanations for version changes can be found in the 
+`CHANGELOG`.
+
+* `v0.6`, in progress on `develop`
+* `v0.5`, 2/3/23
+  * Main installed executable now `gremlin` and not `gremlin.py`. Added 
+    optional `async.with_client` config section. Improvements made to 
+    `setup.py`.
 * `v0.4`, 9/30/22
   * Added config variable `async.with_client` that allows for interacting 
     with Dask before the EA runs; e.g., `client.wait_for_workers()` or 
@@ -110,6 +125,5 @@ $ gremlin config.yml
 
 ## Main web site
 
-The `gremlin` github repository is [https://github.com/markcoletti/gremlin]
-(https://github.com/markcoletti/gremlin).  `main` is the release branch and 
-active work occurs on the `develop` branch.
+The `gremlin` github repository is https://github.com/markcoletti/gremlin.  
+`main` is the release branch and active work occurs on the `develop` branch.
diff --git a/examples/MNIST/broken.yml b/examples/MNIST/broken.yml
@@ -0,0 +1,4 @@
+# A configuration that intentionally breaks a gremlin run to observe how
+# it handles errors/exceptions.
+preamble: |
+  import doesnotexist
diff --git a/examples/MNIST/config_debug.yml b/examples/MNIST/config_debug.yml
@@ -9,19 +9,6 @@
 # Usage:
 #     $ gremlin.py config.yml config_debug.yml
 pop_size: 1
-algorithm: bygen
 bygen: # parameters that only make sense for a by-generation EA
   max_generations: 1
   k_elites: 0
-problem: problem.MNISTProblem()
-representation: representation.MNISTRepresentation()
-pop_file: pop.csv # where we will write out each generation in CSV format
-preamble: |
-  import probe # need to import our probe.py so that LEAP sees our probe pipeline operator
-pipeline:
-  - ops.tournament_selection
-  - ops.clone
-  - mutate_randint(expected_num_mutations=1, bounds=representation.MNISTRepresentation.genome_bounds)
-  - ops.evaluate
-  - probe.IndividualProbeCSV('inds.csv') # our own probe to see every single created offspring
-  - ops.pool(size=${pop_size})
diff --git a/gremlin/__init__.py b/gremlin/__init__.py
diff --git a/gremlin/__version__.py b/gremlin/__version__.py
@@ -1 +1 @@
-__version__ = 'v0.4'
+__version__ = 'v0.5'
diff --git a/gremlin/gremlin.py b/gremlin/gremlin.py
@@ -27,6 +27,7 @@
 
 from omegaconf import OmegaConf
 
+import rich
 from rich.logging import RichHandler
 
 # Create unique logger for this namespace
@@ -43,9 +44,13 @@
 
 pretty.install()
 
-from rich.traceback import install
+rich.traceback.install(show_locals=True)
+
+from rich.console import Console
+console = Console()
+
+
 
-install()
 
 from distributed import Client, LocalCluster
 
@@ -253,7 +258,7 @@ def run_async_ea(pop_size, init_pop_size, max_births, problem, representation,
             client.register_worker_plugin(WorkerLoggerPlugin())
 
             final_pop = asynchronous.steady_state(client,
-                                                  births=max_births,
+                                                  max_births=max_births,
                                                   init_pop_size=init_pop_size,
                                                   pop_size=pop_size,
 
@@ -285,7 +290,7 @@ def run_async_ea(pop_size, init_pop_size, max_births, problem, representation,
             client.register_worker_plugin(WorkerLoggerPlugin())
 
             final_pop = asynchronous.steady_state(client,
-                                                  births=max_births,
+                                                  max_births=max_births,
                                                   init_pop_size=init_pop_size,
                                                   pop_size=pop_size,
 
@@ -302,7 +307,7 @@ def run_async_ea(pop_size, init_pop_size, max_births, problem, representation,
             print([str(x) for x in final_pop])
 
 
-if __name__ == '__main__':
+def main():
     logger.info('Gremlin started')
 
     parser = argparse.ArgumentParser(
@@ -332,50 +337,68 @@ def run_async_ea(pop_size, init_pop_size, max_births, problem, representation,
 
     pop_size = int(config.pop_size)
 
-    if config.algorithm == 'async':
-        logger.debug('Using async EA')
-
-        scheduler_file = None if 'scheduler_file' not in config['async'] else \
-        config['async'].scheduler_file
-
-        ind_file = None if 'ind_file' not in config['async'] else \
-            config['async'].ind_file
-
-        ind_file_probe = None if 'ind_file_probe' not in config['async'] else \
-            config['async'].ind_file_probe
-
-        # This is for optional code to be executed after the Dask client has
-        # been established, but before execution of the EA.  This allows for
-        # things like client.wait_for_workers() or client.upload_file() or the
-        # registering of dask plugins.  This is a string that will be `exec()`
-        # later after a dask client has been connected.
-        with_client_exec_str = None if 'with_client' not in config['async'] else \
-            config['async'].with_client
-
-        run_async_ea(pop_size,
-                     int(config['async'].init_pop_size),
-                     int(config['async'].max_births),
-                     problem, representation, pipeline,
-                     config.pop_file,
-                     ind_file,
-                     ind_file_probe,
-                     scheduler_file,
-                     with_client_exec_str)
-    elif config.algorithm == 'bygen':
-        # default to by generation approach
-        logger.debug('Using by-generation EA')
-
-        # Then run leap_ec.generational_ea() with those classes while writing
-        # the output to CSV and other, ancillary files.
-        max_generations = int(config.bygen.max_generations)
-        k_elites = int(config.bygen.k_elites) if 'k_elites' in config else 1
-
-        run_generational_ea(pop_size, max_generations, problem, representation,
-                            pipeline,
-                            config.pop_file, k_elites,
-                            with_client_exec_str)
-    else:
-        logger.critical(f'Algorithm type {config.algorithm} not supported')
-        sys.exit(1)
+    try:
+        if config.algorithm == 'async':
+            logger.debug('Using async EA')
+
+            scheduler_file = None if 'scheduler_file' not in config['async'] else \
+            config['async'].scheduler_file
+
+            ind_file = None if 'ind_file' not in config['async'] else \
+                config['async'].ind_file
+
+            ind_file_probe = None if 'ind_file_probe' not in config['async'] else \
+                config['async'].ind_file_probe
+
+            # This is for optional code to be executed after the Dask client has
+            # been established, but before execution of the EA.  This allows for
+            # things like client.wait_for_workers() or client.upload_file() or the
+            # registering of dask plugins.  This is a string that will be `exec()`
+            # later after a dask client has been connected.
+            # TODO generalize this to be algorithm agnostic in config file
+            with_client_exec_str = None if 'with_client' not in config['async'] else \
+                config['async'].with_client
+
+            run_async_ea(pop_size,
+                         int(config['async'].init_pop_size),
+                         int(config['async'].max_births),
+                         problem, representation, pipeline,
+                         config.pop_file,
+                         ind_file,
+                         ind_file_probe,
+                         scheduler_file,
+                         with_client_exec_str)
+        elif config.algorithm == 'bygen':
+            # default to by generation approach
+            logger.debug('Using by-generation EA')
+
+            # Then run leap_ec.generational_ea() with those classes while writing
+            # the output to CSV and other, ancillary files.
+            max_generations = int(config.bygen.max_generations)
+            k_elites = int(config.bygen.k_elites) if 'k_elites' in config.bygen else 1
+
+            # This is for optional code to be executed after the Dask client has
+            # been established, but before execution of the EA.  This allows for
+            # things like client.wait_for_workers() or client.upload_file() or the
+            # registering of dask plugins.  This is a string that will be `exec()`
+            # later after a dask client has been connected.
+            # TODO LEAP does not (yet) support Dask for by-generation. Soon!
+            # with_client_exec_str = None if 'with_client' not in config['bygen'] else \
+            #     config['bygen'].with_client
+
+            run_generational_ea(pop_size, max_generations, problem, representation,
+                                pipeline,
+                                config.pop_file, k_elites)
+        else:
+            logger.critical(f'Algorithm type {config.algorithm} not supported')
+            sys.exit(1)
+    except Exception as e:
+        logger.critical(f'Caught {e!s} during run.  Exiting.')
+        console.print_exception()
 
     logger.info('Gremlin finished.')
+
+
+
+if __name__ == '__main__':
+    main()
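The `main()` refactor above wraps the whole run in a single `try`/`except Exception` so that errors from the EA are logged instead of vanishing. A minimal standard-library sketch of that pattern follows. The `run` helper and the returned status code are hypothetical additions for illustration: the commit itself prints the traceback through `rich` and does not return a status.

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('sketch')


def run(algorithm):
    """Hypothetical stand-in for dispatching to an EA implementation."""
    if algorithm not in ('async', 'bygen'):
        raise ValueError(f'Algorithm type {algorithm} not supported')
    return f'ran {algorithm}'


def main(algorithm):
    """Return 0 on success, 1 if anything raised during the run."""
    logger.info('started')
    status = 0
    try:
        run(algorithm)
    except Exception as e:
        # Log a one-line summary, then print the full traceback to stderr.
        logger.critical(f'Caught {e!s} during run.  Exiting.')
        traceback.print_exc()
        status = 1
    logger.info('finished')
    return status


ok_status = main('bygen')
bad_status = main('unknown')
print(ok_status, bad_status)
```

Note that `except Exception` does not intercept `SystemExit`, so a `sys.exit(1)` issued inside the `try` block still terminates the process as intended.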
diff --git a/setup.py b/setup.py
@@ -11,17 +11,23 @@
     version=__version__,
     packages=['gremlin'],
     scripts=['gremlin/gremlin.py'],
-    # entry_points={
-    #     'console_scripts': [
-    #         'gremlin = gremlin.gremlin:client'
-    #     ],
-    # },
+    python_requires=">=3.7.0",
     url='https://github.com/markcoletti/gremlin',
     license='MIT License',
     author='Mark Coletti',
     author_email='colettima@ornl.gov',
     long_description=long_description,
     long_description_content_type='text/markdown',
-    description=('Adversarial evolutionary algorithm for'
-                 'training data optimization')
+    description=('Adversarial evolutionary algorithm for training data '
+                 'optimization'),
+    entry_points={
+        "console_scripts": [
+            "gremlin = gremlin:main"
+        ]
+    },
+    install_requires=[
+        'leap-ec',
+        'omegaconf',
+        'tqdm',
+        'rich']
 )
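A `console_scripts` entry has the form `name = module:attribute`, and the installed command imports `module` and calls `attribute`. The sketch below imitates that lookup with `importlib`, using a standard-library target so it runs anywhere; `resolve_entry_point` is a hypothetical helper, not part of setuptools, and it handles only a single undotted attribute.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name = module:attr' and import the referenced object."""
    name, _, target = (part.strip() for part in spec.partition('='))
    module_name, _, attr = target.partition(':')
    module = importlib.import_module(module_name)
    return name, getattr(module, attr)


# Illustrative spec pointing at a standard-library callable.
name, func = resolve_entry_point('dumps-cli = json:dumps')
print(name, func([1, 2]))
```

Read the same way, `gremlin = gremlin:main` looks for `main` on the `gremlin` package itself. This diff defines `main()` in `gremlin/gremlin.py`, which would correspond to `gremlin.gremlin:main` unless `gremlin/__init__.py` re-exports it; the contents of `__init__.py` are not visible here, so this is worth checking rather than a confirmed defect. The `scripts=['gremlin/gremlin.py']` line is also still present alongside the new entry point.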