BUG: Neptune log error for multiple dataloaders #643

Closed
stonelazy opened this issue Aug 5, 2021 · 15 comments
stonelazy commented Aug 5, 2021

Describe the bug

An error is thrown while logging a metric value. I am using the PyTorch Lightning integration with Neptune. The error occurs only with Neptune's latest client:

from neptune.new.integrations.pytorch_lightning import NeptuneLogger
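
For reference, the logger is wired into the Trainer roughly like this (a minimal sketch; the project and API token values are placeholders, not taken from my code):

import pytorch_lightning as pl
from neptune.new.integrations.pytorch_lightning import NeptuneLogger

# Placeholders below are illustrative; substitute your own workspace/project and token.
neptune_logger = NeptuneLogger(
    api_key="<YOUR_NEPTUNE_API_TOKEN>",
    project="<WORKSPACE>/<PROJECT>",
)

trainer = pl.Trainer(logger=neptune_logger, max_epochs=1)
# trainer.fit(model, train_loader, val_dataloaders=[val_loader_a, val_loader_b])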

Reproduction

https://colab.research.google.com/drive/13rRlztjGRQrv6Y3W-d21Dotoj8L2UtoZ?usp=sharing

Expected behavior

The experiment should keep running without any error.

Traceback

The following trace results from invoking self.logger.log_metrics:

    def __getattr__(self, attr):
>       raise AttributeError("{} has no attribute {}.".format(type(self), attr))
E       AttributeError: <class 'neptune.new.attributes.namespace.Namespace'> has no attribute log.
env/lib/python3.8/site-packages/neptune/new/attributes/attribute.py:35: AttributeError


If the value of attr is None, it passes the if condition and no error is raised; the issue occurs in the else branch.
The failing call is neptune.new.handler.Handler.log, where self._path = "val_loss".

Environment

The output of pip list:

neptune-contrib           0.27.2                   pypi_0    pypi
neptune-pytorch-lightning 0.9.7                    pypi_0    pypi

The operating system you're using:
Ubuntu
The output of python --version:
Python 3.8.10

Additional context
Logging works for all the other metrics; the error is thrown only for this particular 'val_loss' key.
It happens only after migrating to the new Neptune client; the previous version works fine.
The error is thrown only when there is more than one validation dataloader.

EDIT:
If we have multiple dataloaders, every metric that gets logged has the dataloader's index appended to its name.
For example, suppose my log call is self.log('loss', 0.2).
It gets logged once per dataloader, with the dataloader index appended to the log name and the corresponding value: loss/dataloader_idx_0 = 0.2, loss/dataloader_idx_1 = 0.4, and so on for every dataloader.
Since my metric to monitor is 'loss', PTL also expects the exact string 'loss' to be logged; otherwise it throws the error below:

  
    if not trainer.fit_loop.epoch_loop.val_loop._has_run:
        warning_cache.warn(m)
    else:
        raise MisconfigurationException(m)
E   pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='loss') not found in the returned metrics: ['train_loss', 'train_loss_step', 'loss/dataloader_idx_0', 'loss/dataloader_idx_1', 'validation_f1', 'validation_precision', 'validation_recall', 'validation_accuracy']. HINT: Did you call self.log('loss', value) in the LightningModule?

But according to Neptune, logging to 'loss' is no longer valid once 'loss/dataloader_idx_1' has already been logged (I guess)? If so, the two are contradicting each other.
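
To make the setup concrete, here is a minimal sketch of the kind of LightningModule involved (the model and loss are illustrative, not taken from my notebook); with two validation dataloaders, the single self.log('loss', ...) call surfaces as loss/dataloader_idx_0 and loss/dataloader_idx_1:

import torch
from torch import nn
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        x, y = batch
        loss = nn.functional.mse_loss(self.layer(x), y)
        # With a single val dataloader this is logged as "loss"; with multiple
        # val dataloaders Lightning logs it as "loss/dataloader_idx_0",
        # "loss/dataloader_idx_1", and so on.
        self.log("loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

Passing a list of validation dataloaders to Trainer.fit is what triggers the per-dataloader renaming.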

stonelazy changed the title to "BUG: Training terminates with neptune error" on Aug 5, 2021
Blaizzy (Contributor) commented Aug 6, 2021

Hi @stonelazy

I'm Prince Canuma, a Data Scientist and DevRel at Neptune.ai.

The error you are facing happens when you try to call .log() on a folder/namespace/path that is already populated with data.
Example Error ❌ :

neptune_logger.experiment['val_loss/results'].log([1,2,3])
neptune_logger.experiment['val_loss'].log([1,2,3])

Traceback:
AttributeError: <class 'neptune.new.attributes.namespace.Namespace'> has no attribute log.

Why: val_loss is now a folder, and folders don't work with .log(). Instead, you can create another piece of metadata (such as an item or file) inside the val_loss folder, as shown below.

Example Solution ✅ :

neptune_logger.experiment['val_loss/results'].log([1,2,3])
...
neptune_logger.experiment['val_loss/results_n'].log([1,2,3])

Blaizzy (Contributor) commented Aug 6, 2021

Please let me know if this solves your problem 😃 👍

stonelazy (Author) commented Aug 6, 2021

Thanks for getting back on this, but as a Lightning user I'm not invoking neptune_logger.experiment[...].log() myself; I'm only using Lightning's wrapper self.log('val_loss', 0.2) inside my pl.LightningModule subclass.
How can I make use of your suggestion to handle this case?

stonelazy (Author) commented:

I have added this as additional context to the original post as well.

If we have multiple dataloaders, every metric that gets logged has the dataloader's index appended to its name.
For example, suppose my log call is self.log('loss', 0.2).
It gets logged once per dataloader, with the dataloader index appended to the log name and the corresponding value: loss/dataloader_idx_0 = 0.2, loss/dataloader_idx_1 = 0.4, and so on for every dataloader.
Since my metric to monitor is 'loss', PTL also expects the exact string 'loss' to be logged; otherwise it throws the error below:

  
    if not trainer.fit_loop.epoch_loop.val_loop._has_run:
        warning_cache.warn(m)
    else:
        raise MisconfigurationException(m)
E   pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='loss') not found in the returned metrics: ['train_loss', 'train_loss_step', 'loss/dataloader_idx_0', 'loss/dataloader_idx_1', 'validation_f1', 'validation_precision', 'validation_recall', 'validation_accuracy']. HINT: Did you call self.log('loss', value) in the LightningModule?

But according to Neptune, logging to 'loss' is no longer valid once 'loss/dataloader_idx_1' has already been logged (I guess)? If so, the two are contradicting each other.

Blaizzy (Contributor) commented Aug 6, 2021

Alright, my email is prince.canuma@neptune.ai. I will reach you through Intercom.

Blaizzy (Contributor) commented Aug 6, 2021

But according to Neptune, logging to 'loss' is no longer valid once 'loss/dataloader_idx_1' has already been logged (I guess)?

Correct ✅. You can't log to loss because loss is now a folder that contains the results for each dataloader:

loss
└── dataloader_idx_n

stonelazy changed the title from "BUG: Training terminates with neptune error" to "BUG: Neptune log error for multiple dataloaders" on Aug 6, 2021
A comment from stonelazy has been minimized.

Blaizzy (Contributor) commented Aug 10, 2021

Hi @stonelazy

Is your problem solved, or is it still not working properly?

stonelazy (Author) commented:

The error is reproducible in this notebook; please have a look:
https://colab.research.google.com/drive/13rRlztjGRQrv6Y3W-d21Dotoj8L2UtoZ?usp=sharing

Blaizzy (Contributor) commented Aug 11, 2021 via email

Blaizzy (Contributor) commented Aug 11, 2021 via email

kamil-kaczmarek self-assigned this on Aug 18, 2021
Blaizzy (Contributor) commented Aug 23, 2021

Hi Sudharsan,

Regarding your multiple dataloaders issue, we found the following solution.

In the code you submitted, we noticed that you log a metric named "loss" after loss/dataloader_idx_0 and loss/dataloader_idx_1 have already been created for the validation step. This results in the error.

The fix is simply to rename the aggregate "loss" to something else, such as "loss_global"; this fixes the problem.

Link to colab solution: https://colab.research.google.com/drive/1APHu9qYVukdxBHmZBQFD35m1PAuDpLZ4
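
In LightningModule terms, the rename looks roughly like this (a sketch that assumes the aggregate is computed in validation_epoch_end; the model details are illustrative, not taken from your code):

import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("loss", loss)  # becomes loss/dataloader_idx_N per dataloader
        return loss

    def validation_epoch_end(self, outputs):
        # With multiple val dataloaders, `outputs` is a list of lists.
        flat = [o for per_loader in outputs for o in per_loader]
        # Aggregate under a name that does not collide with the loss/ namespace.
        self.log("loss_global", torch.stack(flat).mean())

# Point the monitoring callbacks at the renamed aggregate metric.
checkpoint_callback = ModelCheckpoint(monitor="loss_global", mode="min")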

Let me know if this helps,

Kind regards,

Blaizzy closed this as completed on Aug 23, 2021
stonelazy (Author) commented Oct 13, 2021

In the code you submitted, we noticed that you log a metric named "loss" after loss/dataloader_idx_0 and loss/dataloader_idx_1 have already been created for the validation step. This results in the error.

A slight correction: 'we' are not creating a metric named 'loss/dataloader_idx_0'; that is done by PyTorch Lightning whenever there are multiple dataloaders, and this issue was raised to address that very concern. I understand that changing the metric name would work, but it would only be a workaround.
Please correct me if I am wrong.

kamil-kaczmarek (Contributor) commented:

Hey @stonelazy,

I checked the colab that you initially pasted as reproduction info:
https://colab.research.google.com/drive/13rRlztjGRQrv6Y3W-d21Dotoj8L2UtoZ?usp=sharing

Here is a run that I made: https://app.neptune.ai/o/common/org/pytorch-lightning-integration/e/PTL-29/all

Here is what I did:

  1. I changed "loss" to "val_loss" in your code (line 49)
  2. I did the same to the EarlyStopping callback argument monitor="val_loss" (line 105)

The error was fixed by making sure that you log val_loss to a separate namespace.

Yes, PTL creates 'loss/dataloader_idx_N' paths when working with multiple dataloaders. In Neptune you can create a hierarchical structure within the run, but you cannot log values to both 'loss' and 'loss/dataloader_idx_N' at the same time.
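
The constraint can also be reproduced with the neptune.new API directly (a sketch; it assumes NEPTUNE_PROJECT and NEPTUNE_API_TOKEN are set in the environment, and the values are made up):

import neptune.new as neptune

run = neptune.init()  # reads NEPTUNE_PROJECT / NEPTUNE_API_TOKEN from the environment

run["loss/dataloader_idx_0"].log(0.2)  # "loss" becomes a namespace here
run["loss/dataloader_idx_1"].log(0.4)

# run["loss"].log(0.3)  # would fail: the "loss" Namespace has no attribute log

run["val_loss"].log(0.3)  # a separate path does not collide

run.stop()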

I will pass this info to the product team; for the time being, I recommend adjusting the loss names a bit.

Please let me know what you think.

stonelazy (Author) commented Oct 15, 2021

Appreciate your reply.

I will pass this info to the product team; for the time being, I recommend adjusting the loss names a bit.

Sure, Thanks.
