
Should I run setup again? #10

Closed
wn4github opened this issue Mar 2, 2021 · 9 comments

Comments

@wn4github

I have git cloned the repository and run ./setup.py install and ./setup-dataset.sh, but then I realized train_all.sh was not present. Later I found it in the 4.1.3 release. Do I need to set up once again in the 4.1.3 directory or just copy train_all.sh, train_others.sh and other script files? Thank you.

@guicho271828
Owner

I don't think regenerating the dataset is necessary.
Regarding setup.py, run git diff HEAD..refs/tags/4.1.3; if there is no diff, you should be good to go.

@wn4github
Author

Thank you for the suggestion. In the end, however, I couldn't get the code to run due to incompatibility issues between nvidia-tensorflow 1.15 and Keras. I have to use nvidia-tensorflow because the RTX 30 series does not support CUDA 10, while prebuilt TensorFlow 1.x does not support CUDA 11.

I'm wondering if you are currently porting the code to PyTorch?

@guicho271828
Owner

Yes, that aspect is also something I am struggling with. My lab cluster is also transitioning to CUDA 11, so I am attempting to rewrite part of the code, but development is slow.

@guicho271828
Owner

You could try building tf 1.15 for CUDA 11.

@guicho271828
Owner

I just learned that NVIDIA (not Google) provides a backward-compatible build of TensorFlow 1.15 that works on CUDA 11:
tensorflow/tensorflow#43629
https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
Its package name appears to be nvidia-tensorflow.

@wn4github
Author

Thank you so much for your help, Asai-san.

I too discovered that nvidia-tensorflow works with CUDA 11, and I managed to get a working tf 1.15 with:

  • nvidia-tensorflow 1.15.4+nv20.12
  • Keras 2.2.5
  • keras-adabound 0.6.0
  • Keras-Applications 1.0.8
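For reference, an environment like the one above could be recreated with a pip sketch along these lines (untested here; this assumes the nvidia-pyindex package, which registers NVIDIA's package index and must be installed before nvidia-tensorflow):

```shell
# Sketch: pin the versions listed above (nvidia-pyindex first, so that pip
# can find the nvidia-tensorflow build on NVIDIA's index).
pip install nvidia-pyindex
pip install nvidia-tensorflow==1.15.4+nv20.12
pip install Keras==2.2.5 keras-adabound==0.6.0 Keras-Applications==1.0.8
```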

Now I am using the 4.1.3 release source, and running ./setup-dataset.sh succeeds. However, I noticed a difference in setup-dataset.sh between 4.1.3 and the latest commit: in 4.1.3, no .npz files are downloaded.

Anyway, the problem I now run into is an error when executing ./train_all.sh. I uncommented only the first training line, task-planning learn_plot_dump_summary. The trace is as follows:

Fancy Traceback (most recent call last):
  File ./strips.py line 294 function <module> : main()
                    mode = 'learn_plot_dump_summary'
                sae_path = 'puzzle_mnist_3_3_5000_None_None_None_False_ConcreteDetNormalizedLogitAddEffectTransitionAE_planning'
      default_parameters = {'epoch': 200, 'batch_size': 500, 'optimizer': 'radam', 'max_temperature': 5.0, 'min_temperature': 0.7, 'M': 2, 'train_gumbel': True, 'train_softmax': True, 'test_gumbel': False, 'test_softmax': False, 'locality': 0.0, 'locality_delay': 0.0, 'aeclass': 'ConcreteDetNormalizedLogitAddEffectTransitionAE'}
              parameters = {'beta': [-0.3, -0.1, 0.0, 0.1, 0.3], 'lr': [0.1, 0.01, 0.001], 'N': [100, 200, 500, 1000], 'M': [2], 'layer': [1000], 'clayer': [16], 'dropout': [0.4], 'noise': [0.4], 'dropout_z': [False], 'activation': ['relu'], 'num_actions': [100, 200, 400, 800, 1600], 'aae_width': [100, 300, 600], 'aae_depth': [0, 1, 2], 'aae_activation': ['relu', 'tanh'], 'aae_delay': [0], 'direct': [0.1, 1.0, 10.0], 'direct_delay': [0.05, 0.1, 0.2, 0.3, 0.5], 'zerosuppress': [0.1, 0.2, 0.5], 'zerosuppress_delay': [0.05, 0.1, 0.2, 0.3, 0.5], 'loss': ['BCE'], 'type': ['mnist'], 'width': [3], 'height': [3], 'num_examples': [5000], 'stop_gradient': [False], 'aeclass': ['ConcreteDetNormalizedLogitAddEffectTransitionAE'], 'comment': ['planning']}

  File ./strips.py line 290 function main : globals()[task](*map(myeval,sys.argv))
                    task = 'puzzle'

  File ./strips.py line 208 function puzzle : show_summary(ae, train, test)
                    type = 'mnist'
                   width = 3
                  height = 3
            num_examples = 5000
                       N = None
             num_actions = None
                  direct = None
           stop_gradient = False
                 aeclass = 'ConcreteDetNormalizedLogitAddEffectTransitionAE'
                 comment = 'planning'
                    name = 'comment'
                   value = 'planning'
                    path = '/home/wn/workspace/latplan/latplan-4.1.3_original/latplan/puzzles/puzzle-mnist-3-3.npz'
                    data = <numpy.ndarray float32  (5000, 2, 42, 42)>
             pre_configs = <numpy.ndarray float64  (5000, 9)>
             suc_configs = <numpy.ndarray float64  (5000, 9)>
                    pres = <numpy.ndarray float32  (5000, 42, 42)>
                    sucs = <numpy.ndarray float32  (5000, 42, 42)>
             transitions = <numpy.ndarray float32  (2, 5000, 42, 42)>
                  states = <numpy.ndarray float32  (10000, 42, 42)>
                   train = <numpy.ndarray float32  (4500, 2, 42, 42)>
                     val = <numpy.ndarray float32  (250, 2, 42, 42)>
                    test = <numpy.ndarray float32  (250, 2, 42, 42)>
                      ae = None

  File ./strips.py line 180 function show_summary : ae.summary()
                      ae = None
                   train = <numpy.ndarray float32  (4500, 2, 42, 42)>
                    test = <numpy.ndarray float32  (250, 2, 42, 42)>

AttributeError: 'NoneType' object has no attribute 'summary'

I am also considering using the trained weights directly if training cannot be done, but I need some guidance.

@guicho271828
Owner

guicho271828 commented Mar 4, 2021

Now I am using the 4.1.3 release source, and running ./setup-dataset.sh succeeds. However, I noticed a difference in setup-dataset.sh between 4.1.3 and the latest commit: in 4.1.3, no .npz files are downloaded.

setup-dataset also downloads unrelated npz files that are not used in the ijcai paper (but are used in other papers). Sorry for the confusion; this entire repository is a kind of "lab environment" that sets up everything I use across all of my papers. The failed downloads for photorealistic-blocksworld are not used, so no worries. All datasets needed to reproduce the ijcai paper are instead rendered locally using a script included in this repo.

Anyway, the problem I now run into is an error when executing ./train_all.sh.

Since you already have the trained weights, running this script is not necessary. All results, including the csv dump and the PDDL domain file, are included in the archive.

AttributeError: 'NoneType' object has no attribute 'summary'

Here is what is happening: task-planning learn_plot_dump_summary tries to run the training. However, since samples/*/grid_search.log already has more entries than the specified limit of hyperparameter configurations (300), it did not run the training; the model instance (ae) is therefore None.
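A minimal sketch of that guard logic (hypothetical, not the actual strips.py code; maybe_train and train_fn are illustrative names):

```python
# Hypothetical illustration of why ae ends up None: the grid search skips
# training once the log already holds enough hyperparameter configurations.
LIMIT = 300  # the configuration budget mentioned above

def maybe_train(log_entries, train_fn, limit=LIMIT):
    """Run train_fn only while the hyperparameter budget is not exhausted."""
    if len(log_entries) >= limit:
        return None  # budget spent: skip training, return no model
    return train_fn()
```

With 300 or more log entries, maybe_train returns None, and the later call ae.summary() then raises exactly the AttributeError shown in the trace.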

If you want to regenerate the reconstructions etc., then task-planning plot_dump_summary will load the stored weights, produce a reconstruction plot, and dump the files necessary for generating PDDL files. Make sure the archive is decompressed into the correct directory: the samples/ directory should sit in the root of the repository.

The hyperparameter search is fully parallelized at the process level, so on a machine with 8 cores and 8 GPUs you can simply run 8 processes in parallel.
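One way to sketch that process-level parallelism (illustrative only; train_all.sh is the script from this repo, while the GPU pinning via CUDA_VISIBLE_DEVICES and the log-file names are a generic pattern, not something prescribed by Latplan):

```shell
# Sketch: one grid-search process per GPU, each pinned to its own device,
# with stdout/stderr captured in a per-GPU log file.
NGPU=8
for gpu in $(seq 0 $((NGPU - 1))); do
    CUDA_VISIBLE_DEVICES=$gpu ./train_all.sh > "gpu-$gpu.log" 2>&1 &
done
wait
```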

@guicho271828
Owner

guicho271828 commented Mar 4, 2021

If you do want to train the model, you may also want to prune some hyperparameters by looking at samples/*/grid_search.log. For example, this one is the best hyperparameter set for the mandrill 15-puzzle. You can then edit the dictionary in strips.py accordingly.
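To see why pruning matters, note that each key in the parameters dictionary multiplies the grid size by the length of its list, so pinning values read off grid_search.log shrinks the search space multiplicatively. A small sketch (num_configs is an illustrative helper, and the values below are placeholders, not the actual best hyperparameters):

```python
from functools import reduce

def num_configs(grid):
    """Number of configurations in a grid-search dictionary of lists."""
    return reduce(lambda n, values: n * len(values), grid.values(), 1)

# Before pruning: every key contributes a factor equal to its list length.
grid = {'lr': [0.1, 0.01, 0.001], 'N': [100, 200, 500, 1000], 'layer': [1000]}
print(num_configs(grid))    # 3 * 4 * 1 = 12

# After pruning: pin each hyperparameter to a single-element list.
pruned = {'lr': [0.001], 'N': [500], 'layer': [1000]}
print(num_configs(pruned))  # 1
```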

@wn4github
Author

My immediate goal is to use the Cube-Space AE to encode some MNIST 8-puzzle images. Then, to better understand Latplan, I plan to train the network and get my hands dirty with the implementation. But this is off-topic for this issue; maybe I should open a new one. Thank you for all your help.
