Assign requires shapes of both tensors to match #153

Closed
scottkelso opened this issue Jul 21, 2018 · 4 comments

@scottkelso

Description

Experiencing tensorflow.python.framework.errors_impl.InvalidArgumentError when running make eval_onelayer from commit cc17326 (July 18th) onwards. I have confirmed that the error does not occur on 1fd5e3e (July 17th) or before.

Environment

  • Git commit hash cc17326
  • Docker version 18.03.1-ce
  • Ubuntu 18.04

Output

The following output comes from a script that runs train_onelayer on a set of node-extracted IoT pcaps and then eval_onelayer on some similar node-extracted pcaps. The InvalidArgumentError above occurs in the evaluation step (second code snippet).

I have also included the training step (first snippet) because it produces the following warning: Warning: The least populated class in y has only 3 members, which is too few. The minimum number of groups for any class cannot be less than n_splits=5. I have seen this a number of times before and assumed it was caused by not having enough data. Do you think it is in any way related to this error?

Training

sh ../onelayer.sh

Sending build context to Docker daemon  578.4MB
Step 1/8 : FROM debian:stretch-slim
---> 3e235dbb0ba6
Step 2/8 : LABEL maintainer="Charlie Lewis <clewis@iqt.org>"
---> Using cache
---> 4c4a0b3a3e95
Step 3/8 : ENV BUILD_PACKAGES="        build-essential         linux-headers-4.9         python3-dev         cmake         tcl-dev         xz-utils         zlib1g-dev         git         curl"     APT_PACKAGES="        ca-certificates         openssl         python3         python3-pip         tcpdump"     PYTHON_VERSION=3.6.4     PATH=/usr/local/bin:$PATH     PYTHON_PIP_VERSION=9.0.1     LANG=C.UTF-8
---> Using cache
---> f21017329ec4
Step 4/8 : COPY requirements.txt requirements.txt
---> Using cache
---> 25674e6a26bd
Step 5/8 : RUN set -ex;     apt-get update -y;     apt-get upgrade -y;     apt-get install -y --no-install-recommends ${APT_PACKAGES};     apt-get install -y --no-install-recommends ${BUILD_PACKAGES};     ln -s /usr/bin/idle3 /usr/bin/idle;     ln -s /usr/bin/pydoc3 /usr/bin/pydoc;     ln -s /usr/bin/python3 /usr/bin/python;     ln -s /usr/bin/python3-config /usr/bin/python-config;     ln -s /usr/bin/pip3 /usr/bin/pip;     pip install -U -v setuptools wheel;     pip install -U -v -r requirements.txt;     apt-get remove --purge --auto-remove -y ${BUILD_PACKAGES};     apt-get clean;     apt-get autoclean;     apt-get autoremove;     rm -rf /tmp/* /var/tmp/*;     rm -rf /var/lib/apt/lists/*;     rm -f /var/cache/apt/archives/*.deb         /var/cache/apt/archives/partial/*.deb         /var/cache/apt/*.bin;     find /usr/lib/python3 -name __pycache__ | xargs rm -r;     rm -rf /root/.[acpw]*
---> Using cache
---> 71f74155053a
Step 6/8 : COPY . /poseidonml
---> e43665f19d7f
Step 7/8 : WORKDIR /poseidonml
Removing intermediate container 72a08abbe52a
---> e561ad0cb2fb
Step 8/8 : RUN pip uninstall -y poseidonml && pip install .
---> Running in 6fb4d2853c7a
Uninstalling poseidonml-0.1.4:
 Successfully uninstalled poseidonml-0.1.4
Processing /poseidonml
Requirement already satisfied: numpy==1.14.5 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: pika==0.12.0 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: redis==2.10.6 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: scikit-learn==0.18.2 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: scipy==1.1.0 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: tensorflow==1.9.0 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: protobuf>=3.4.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Collecting setuptools<=39.1.0 (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
 Downloading https://files.pythonhosted.org/packages/8c/10/79282747f9169f21c053c562a0baa21815a8c7879be97abd930dbcf862e8/setuptools-39.1.0-py2.py3-none-any.whl (566kB)
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: gast>=0.2.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: absl-py>=0.1.6 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: tensorboard<1.10.0,>=1.9.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.10.0,>=1.9.0->tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: werkzeug>=0.11.10 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.10.0,>=1.9.0->tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Installing collected packages: poseidonml, setuptools
 Running setup.py install for poseidonml: started
   Running setup.py install for poseidonml: finished with status 'done'
 Found existing installation: setuptools 40.0.0
   Uninstalling setuptools-40.0.0:
     Successfully uninstalled setuptools-40.0.0
Successfully installed poseidonml-0.1.5.dev0 setuptools-39.1.0
Removing intermediate container 6fb4d2853c7a
---> aa5fbe6644c0
Successfully built aa5fbe6644c0
Successfully tagged cyberreboot/poseidonml:base
~/workspace/PoseidonML/DeviceClassifier/OneLayer ~/workspace/PoseidonML
Sending build context to Docker daemon  209.9kB
Step 1/6 : FROM cyberreboot/poseidonml:base
---> aa5fbe6644c0
Step 2/6 : LABEL maintainer="Charlie Lewis <clewis@iqt.org>"
---> Running in 0be06a51cf98
Removing intermediate container 0be06a51cf98
---> 608b9fd721dc
Step 3/6 : COPY . /OneLayer
---> f81ecbea53ab
Step 4/6 : COPY models /models
---> 23be2cd4b02f
Step 5/6 : WORKDIR /OneLayer
Removing intermediate container 361e5f27c872
---> 3df5d8c3ed66
Step 6/6 : ENTRYPOINT ["python3", "eval_OneLayer.py"]
---> Running in 18594406f9ca
Removing intermediate container 18594406f9ca
---> 8f3a2444b630
Successfully built 8f3a2444b630
Successfully tagged poseidonml:onelayer
~/workspace/PoseidonML
Running OneLayer Train on PCAP files ~/workspace/traffic/SanitizeNodes/Training/
Reading data
Reading /pcaps/TribySpeaker-160925-7-18b79e022044.pcap as TribySpeaker
Reading /pcaps/SmartBabyMonitor-161003-2-0024e41118a8.pcap as SmartBabyMonitor
Reading /pcaps/TPLinkRouterBridgeLAN-160923-5-14cc205133ea.pcap as TPLinkRouterBridgeLAN
...
Reading /pcaps/SamsungGalaxyTab-160924-4-0821ef3bfce3.pcap as SamsungGalaxyTab
Reading /pcaps/TPLinkRouterBridgeLAN-160929-7-14cc205133ea.pcap as TPLinkRouterBridgeLAN
Reading /pcaps/NESTProtectSmokeAlarm-161009-7-18b43025bee4.pcap as NESTProtectSmokeAlarm
Making data splits
Normalizing features
Doing feature selection
/usr/local/lib/python3.5/dist-packages/sklearn/utils/__init__.py:54: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.
 if np.issubdtype(mask.dtype, np.int):
/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_split.py:581: Warning: The least populated class in y has only 3 members, which is too few. The minimum number of groups for any class cannot be less than n_splits=5.
 % (min_groups, self.n_splits)), Warning)
...
/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_split.py:581: Warning: The least populated class in y has only 3 members, which is too few. The minimum number of groups for any class cannot be less than n_splits=5.
 % (min_groups, self.n_splits)), Warning)
/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_split.py:581: Warning: The least populated class in y has only 3 members, which is too few. The minimum number of groups for any class cannot be less than n_splits=5.
 % (min_groups, self.n_splits)), Warning)
[0, 67, 68, 123, 443, 1024, 1046, 1077, 1091, 1092, 1104, 1147, 1467, 1489, 2017, 2048, 2070, 2101, 2115, 2116, 2128, 2185, 2491, 2595, 3041, 3072, 3125, 3139, 3140, 3152, 3195, 3209, 3515, 3537, 3618, 4065, 4096, 4097, 4098, 4099, 4100, 4101, 4102, 4103]
/usr/local/lib/python3.5/dist-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
 'precision', 'predicted', average, warn_for)
F1 score: 0.8813119197002154

Evaluating

Testing on ~/workspace/traffic/SanitizeNodes/Testing/AmazonEcho-160926-9-44650d56ccd3.pcap
Sending build context to Docker daemon  578.3MB
Step 1/8 : FROM debian:stretch-slim
---> 3e235dbb0ba6
Step 2/8 : LABEL maintainer="Charlie Lewis <clewis@iqt.org>"
---> Using cache
---> 4c4a0b3a3e95
Step 3/8 : ENV BUILD_PACKAGES="        build-essential         linux-headers-4.9         python3-dev         cmake         tcl-dev         xz-utils         zlib1g-dev         git         curl"     APT_PACKAGES="        ca-certificates         openssl         python3         python3-pip         tcpdump"     PYTHON_VERSION=3.6.4     PATH=/usr/local/bin:$PATH     PYTHON_PIP_VERSION=9.0.1     LANG=C.UTF-8
---> Using cache
---> f21017329ec4
Step 4/8 : COPY requirements.txt requirements.txt
---> Using cache
---> 25674e6a26bd
Step 5/8 : RUN set -ex;     apt-get update -y;     apt-get upgrade -y;     apt-get install -y --no-install-recommends ${APT_PACKAGES};     apt-get install -y --no-install-recommends ${BUILD_PACKAGES};     ln -s /usr/bin/idle3 /usr/bin/idle;     ln -s /usr/bin/pydoc3 /usr/bin/pydoc;     ln -s /usr/bin/python3 /usr/bin/python;     ln -s /usr/bin/python3-config /usr/bin/python-config;     ln -s /usr/bin/pip3 /usr/bin/pip;     pip install -U -v setuptools wheel;     pip install -U -v -r requirements.txt;     apt-get remove --purge --auto-remove -y ${BUILD_PACKAGES};     apt-get clean;     apt-get autoclean;     apt-get autoremove;     rm -rf /tmp/* /var/tmp/*;     rm -rf /var/lib/apt/lists/*;     rm -f /var/cache/apt/archives/*.deb         /var/cache/apt/archives/partial/*.deb         /var/cache/apt/*.bin;     find /usr/lib/python3 -name __pycache__ | xargs rm -r;     rm -rf /root/.[acpw]*
---> Using cache
---> 71f74155053a
Step 6/8 : COPY . /poseidonml
---> ac11edbe1037
Step 7/8 : WORKDIR /poseidonml
Removing intermediate container 40cf1ceda5d3
---> 3efaafb9b8d7
Step 8/8 : RUN pip uninstall -y poseidonml && pip install .
---> Running in 4854b1d4813f
Uninstalling poseidonml-0.1.4:
 Successfully uninstalled poseidonml-0.1.4
Processing /poseidonml
Requirement already satisfied: numpy==1.14.5 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: pika==0.12.0 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: redis==2.10.6 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: scikit-learn==0.18.2 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: scipy==1.1.0 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: tensorflow==1.9.0 in /usr/local/lib/python3.5/dist-packages (from poseidonml==0.1.5.dev0)
Requirement already satisfied: gast>=0.2.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: absl-py>=0.1.6 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Collecting setuptools<=39.1.0 (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
 Downloading https://files.pythonhosted.org/packages/8c/10/79282747f9169f21c053c562a0baa21815a8c7879be97abd930dbcf862e8/setuptools-39.1.0-py2.py3-none-any.whl (566kB)
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: protobuf>=3.4.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: tensorboard<1.10.0,>=1.9.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.5/dist-packages (from tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.10.0,>=1.9.0->tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Requirement already satisfied: werkzeug>=0.11.10 in /usr/local/lib/python3.5/dist-packages (from tensorboard<1.10.0,>=1.9.0->tensorflow==1.9.0->poseidonml==0.1.5.dev0)
Installing collected packages: poseidonml, setuptools
 Running setup.py install for poseidonml: started
   Running setup.py install for poseidonml: finished with status 'done'
 Found existing installation: setuptools 40.0.0
   Uninstalling setuptools-40.0.0:
     Successfully uninstalled setuptools-40.0.0
Successfully installed poseidonml-0.1.5.dev0 setuptools-39.1.0
Removing intermediate container 4854b1d4813f
---> ac5c5ef9eebc
Successfully built ac5c5ef9eebc
Successfully tagged cyberreboot/poseidonml:base
~/workspace/PoseidonML/DeviceClassifier/OneLayer ~/workspace/PoseidonML
Sending build context to Docker daemon  157.2kB
Step 1/6 : FROM cyberreboot/poseidonml:base
---> ac5c5ef9eebc
Step 2/6 : LABEL maintainer="Charlie Lewis <clewis@iqt.org>"
---> Running in 088e18cd7fb3
Removing intermediate container 088e18cd7fb3
---> 248e7e9bd59c
Step 3/6 : COPY . /OneLayer
---> b4fdff9c42e4
Step 4/6 : COPY models /models
---> 62902c460d8d
Step 5/6 : WORKDIR /OneLayer
Removing intermediate container 31d824807a86
---> 3dbbe25c2bec
Step 6/6 : ENTRYPOINT ["python3", "eval_OneLayer.py"]
---> Running in 3cd0fd77b21a
Removing intermediate container 3cd0fd77b21a
---> 53850224d110
Successfully built 53850224d110
Successfully tagged poseidonml:onelayer
~/workspace/PoseidonML
Running OneLayer Eval on PCAP file ~/workspace/traffic/SanitizeNodes/Testing/AmazonEcho-160926-9-44650d56ccd3.pcap
docker run -it -v "~/workspace/traffic/SanitizeNodes/Testing/AmazonEcho-160926-9-44650d56ccd3.pcap:/pcaps/eval.pcap" -e SKIP_RABBIT=true --entrypoint=python3 poseidonml:onelayer eval_OneLayer.py
Traceback (most recent call last):
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
   return fn(*args)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
   options, feed_dict, fetch_list, target_list, run_metadata)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
   run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [169,512] rhs shape= [141,400]
        [[Node: save/Assign_13 = Assign[T=DT_FLOAT, _class=["loc:@network/session_rnn/rnn/basic_lstm_cell/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](optimizer/network/session_rnn/rnn/basic_lstm_cell/kernel/Adam_1, save/RestoreV2:13)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "eval_OneLayer.py", line 431, in <module>
   abnormality = eval_pcap(pcap_path, conf_labels, time_const, label=labels[0], rnn_size=rnn_size)
 File "/usr/local/lib/python3.5/dist-packages/poseidonml/eval_SoSModel.py", line 29, in eval_pcap
   rnnmodel.load(os.path.join(working_set.find(Requirement.parse('poseidonml')).location, 'poseidonml/models/SoSmodel'))
 File "/usr/local/lib/python3.5/dist-packages/poseidonml/SoSmodel.py", line 204, in load
   self.saver.restore(self.sess, path)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1752, in restore
   {self.saver_def.filename_tensor_name: save_path})
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
   run_metadata_ptr)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
   feed_dict_tensor, options, run_metadata)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
   run_metadata)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
   raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [169,512] rhs shape= [141,400]
        [[Node: save/Assign_13 = Assign[T=DT_FLOAT, _class=["loc:@network/session_rnn/rnn/basic_lstm_cell/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](optimizer/network/session_rnn/rnn/basic_lstm_cell/kernel/Adam_1, save/RestoreV2:13)]]

Caused by op 'save/Assign_13', defined at:
 File "eval_OneLayer.py", line 431, in <module>
   abnormality = eval_pcap(pcap_path, conf_labels, time_const, label=labels[0], rnn_size=rnn_size)
 File "/usr/local/lib/python3.5/dist-packages/poseidonml/eval_SoSModel.py", line 27, in eval_pcap
   rnnmodel = SoSModel(rnn_size=rnn_size)
 File "/usr/local/lib/python3.5/dist-packages/poseidonml/SoSmodel.py", line 76, in __init__
   self._build_model()
 File "/usr/local/lib/python3.5/dist-packages/poseidonml/SoSmodel.py", line 129, in _build_model
   self.saver = tf.train.Saver()
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1284, in __init__
   self.build()
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1296, in build
   self._build(self._filename, build_save=True, build_restore=True)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1333, in _build
   build_save=build_save, build_restore=build_restore)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 781, in _build_internal
   restore_sequentially, reshape)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 422, in _AddRestoreOps
   assign_ops.append(saveable.restore(saveable_tensors, shapes))
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 113, in restore
   self.op.get_shape().is_fully_defined())
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/state_ops.py", line 219, in assign
   validate_shape=validate_shape)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
   use_locking=use_locking, name=name)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
   op_def=op_def)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
   op_def=op_def)
 File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
   self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [169,512] rhs shape= [141,400]
        [[Node: save/Assign_13 = Assign[T=DT_FLOAT, _class=["loc:@network/session_rnn/rnn/basic_lstm_cell/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](optimizer/network/session_rnn/rnn/basic_lstm_cell/kernel/Adam_1, save/RestoreV2:13)]]

Makefile:13: recipe for target 'eval_onelayer' failed
make: *** [eval_onelayer] Error 1

Thanks in advance.

@cglewis
Member

cglewis commented Jul 23, 2018

I changed the rnn size in DeviceClassifier/OneLayer/opts/config.json so that SoS and OneLayer would match, which might be the problem:
cc17326#diff-cc727211c401f3b2b1a0596ad1ea4e55L37

Did you also train against this commit, or a prior commit?

If you change the value back to 128 does the error go away?
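
The two shapes in the traceback can be read as a hint about which rnn sizes are involved. The sketch below assumes the kernel belongs to a standard TensorFlow BasicLSTMCell (as the node name network/session_rnn/rnn/basic_lstm_cell/kernel suggests), whose kernel has shape [input_size + num_units, 4 * num_units]. It also assumes that lhs is the variable in the freshly built graph and rhs is the tensor restored from the checkpoint. The helper name lstm_dims_from_kernel is made up for illustration and is not part of PoseidonML.

```python
def lstm_dims_from_kernel(kernel_shape):
    """Recover (input_size, num_units) from a BasicLSTMCell kernel shape.

    Assumes the kernel is laid out as [input_size + num_units, 4 * num_units]:
    rows cover the concatenated input and hidden state, columns cover the
    four LSTM gates.
    """
    rows, cols = kernel_shape
    if cols % 4 != 0:
        raise ValueError("column count %d is not a multiple of 4" % cols)
    num_units = cols // 4
    input_size = rows - num_units
    if input_size <= 0:
        raise ValueError("shape %r is not a plausible LSTM kernel" % (kernel_shape,))
    return input_size, num_units


# Shapes copied from the InvalidArgumentError message.
graph_dims = lstm_dims_from_kernel([169, 512])  # lhs: assumed to be the graph variable
ckpt_dims = lstm_dims_from_kernel([141, 400])   # rhs: assumed to be the checkpoint tensor

print("graph      (input_size, num_units):", graph_dims)
print("checkpoint (input_size, num_units):", ckpt_dims)
```

Under those assumptions both shapes share the same input size and differ only in the number of units (128 in the graph versus 100 in the checkpoint), which would point to a model built with one rnn size being restored from a checkpoint saved with the other.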

As for the warning about minimum groups, yes that means you don't have enough data samples in those buckets (classes).
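
A quick way to see which classes trigger that warning is to count the samples per label and compare each count with n_splits. This is only a sketch: the helper name classes_below_n_splits is hypothetical, and the per-class counts in the example are invented for illustration (only the device names come from the log above).

```python
from collections import Counter


def classes_below_n_splits(labels, n_splits=5):
    """Return {label: count} for every class with fewer than n_splits samples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_splits}


# Invented sample counts; in practice `labels` would be the y array
# handed to the cross-validation step.
labels = (
    ["TribySpeaker"] * 3
    + ["SmartBabyMonitor"] * 8
    + ["SamsungGalaxyTab"] * 5
)

small = classes_below_n_splits(labels, n_splits=5)
print(small)
```

Any class listed in the result needs more captures before a 5-fold stratified split can place it in every fold.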

@scottkelso
Author

I replace the config.json and label_assignments.json files each time, so every test I have done has used the same config file (where rnn=128). I think the error must have just been in a select few commits...

I have since run training and evals on 3e5a52d (latest), cc17326 and 1fd5e3e, and have found that the only runs throwing that error are evals on commit cc17326 with rnn=128, irrespective of which commit was used to train. See results below.

Train on 3e5a52d (latest)

          rnn=128  rnn=100
3e5a52d : PASS     PASS
cc17326 : ERROR    PASS
1fd5e3e : PASS     PASS

Train on cc17326 (18th July)

          rnn=128  rnn=100
3e5a52d : PASS     PASS
cc17326 : ERROR    PASS
1fd5e3e : PASS     PASS

Train on 1fd5e3e (17th July)

          rnn=128  rnn=100
3e5a52d : PASS     PASS
cc17326 : ERROR    PASS
1fd5e3e : PASS     PASS

Apologies if much time was wasted on this issue. Thanks!

@cglewis
Member

cglewis commented Jul 23, 2018

So it's working on the latest master now? Or am I misunderstanding?

@scottkelso
Author

Correct, feel free to close.

@cglewis cglewis closed this as completed Jul 24, 2018