
Docs/modelling layers #1502

Merged
mattdangerw merged 4 commits into keras-team:master from mykolaskrynnyk:docs/modelling_layers on Mar 20, 2024

Conversation

mykolaskrynnyk (Contributor)

Improves the documentation in layers/modeling by:

  • Adding a missing argument description to TokenAndPositionEmbedding.
  • Aligning the usage of the name argument in the TransformerDecoder and TransformerEncoder layers with that in the FNetEncoder layer, as well as with their own docstrings.

mattdangerw (Member) left a comment:

First change looks good. Second I think we should skip.

@@ -108,14 +108,15 @@ def __init__(
         kernel_initializer="glorot_uniform",
         bias_initializer="zeros",
         normalize_first=False,
+        name=None,
mattdangerw (Member) commented on Mar 11, 2024:

I think this is a place where we are not fully consistent in KerasNLP, but I would say let's not do this, to avoid clutter. Core Keras Dense, for example, does not do this: https://github.com/keras-team/keras/blob/v3.0.5/keras/layers/core/dense.py#L33-L57

And it's not just name; it's also trainable, dtype, and autocast. It would be a pain to replicate these in each layer. It would be great on the docs side to figure out how to add these to all layers across the site, but let's not clutter the code.
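
To illustrate the point, a minimal sketch (hypothetical layer name and values) of how those base-class arguments already flow through **kwargs without being declared explicitly:

import keras

class MinimalBlock(keras.layers.Layer):
    """Hypothetical layer: base-class arguments are not declared here."""

    def __init__(self, units, **kwargs):
        # `name`, `trainable`, `dtype`, etc. are consumed by keras.layers.Layer.
        super().__init__(**kwargs)
        self.units = units

# Base-class keyword arguments still work without appearing in the signature:
block = MinimalBlock(units=8, name="block", trainable=False, dtype="float32")
print(block.name, block.trainable, block.dtype)  # block False float32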

mattdangerw (Member):

If you have time, feel free to remove this from other layers where we are doing it!

mykolaskrynnyk (Contributor, Author):

Thanks @mattdangerw. In this case, one thing we could do to increase consistency is to drop name (and, for that matter, other optional arguments passed explicitly to super().__init__) from all the docstrings and code, and to expand the definition of **kwargs instead. For example, changing FNetEncoder from this:

@keras_nlp_export("keras_nlp.layers.FNetEncoder")
class FNetEncoder(keras.layers.Layer):
    """FNet encoder.

    [...]

    Args:
        [...]
        name: string. The name of the layer. Defaults to `None`.
        **kwargs: other keyword arguments.

    [...]
    """

    def __init__(
        self,
        intermediate_dim,
        dropout=0,
        activation="relu",
        layer_norm_epsilon=1e-5,
        kernel_initializer="glorot_uniform",
        bias_initializer="zeros",
        name=None,
        **kwargs
    ):
        super().__init__(name=name, **kwargs)

to this:

@keras_nlp_export("keras_nlp.layers.FNetEncoder")
class FNetEncoder(keras.layers.Layer):
    """FNet encoder.

    [...]

    Args:
        [...]
        **kwargs: other keyword arguments passed to `keras.layers.Layer`, including `name`,
            `trainable`, `dtype`, `autocast`, etc.

    [...]
    """

    def __init__(
        self,
        intermediate_dim,
        dropout=0,
        activation="relu",
        layer_norm_epsilon=1e-5,
        kernel_initializer="glorot_uniform",
        bias_initializer="zeros",
        **kwargs
    ):
        super().__init__(**kwargs)

That way the code will be tidier, but the documentation will still make clear how users can leverage the additional keyword arguments.

What is your take?
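
For illustration, a short usage sketch assuming the refactored FNetEncoder signature above (argument values are made up):

from keras_nlp.layers import FNetEncoder

# `name` and `dtype` are no longer in the signature, but still reach
# `keras.layers.Layer` through `**kwargs`:
encoder = FNetEncoder(
    intermediate_dim=64,   # illustrative value
    dropout=0.1,
    name="f_net_encoder",  # forwarded via **kwargs
    dtype="float32",       # forwarded via **kwargs
)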

mattdangerw (Member):

That looks good to me! Thank you!

mattdangerw (Member):

One nit: maybe let's not even mention autocast; that's more of a power-user feature.

**kwargs: other keyword arguments passed to `keras.layers.Layer`, including `name`, `trainable`, and `dtype`.

mattdangerw (Member):

@mykolaskrynnyk, should I wait for the kwargs doc update, or just pull in the tie_weights docs for now and do the rest in a separate PR?

mykolaskrynnyk (Contributor, Author):

@mattdangerw, I've just pushed the changes that we discussed earlier. For ReversibleEmbedding, I referenced keras.layers.Embedding in the docstring, as that is the class it inherits from directly.

I think we can make similar changes to preprocessing layers. However, I can see that RandomSwap and RandomDeletion expect a specific dtype by default, so we might need to change the wording slightly. At any rate, that will be a different pull request. Cheers.
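
For context, a minimal sketch of that pattern with a hypothetical layer (the actual KerasNLP defaults may differ): a layer that overrides the base-class dtype default, which is why a generic **kwargs note mentioning `dtype` would need different wording there.

import keras

class ExampleAugmenter(keras.layers.Layer):
    """Hypothetical preprocessing-style layer with a non-float default dtype."""

    def __init__(self, **kwargs):
        # Default to an integer dtype unless the caller overrides it, since
        # this layer would operate on token IDs rather than float tensors.
        kwargs.setdefault("dtype", "int32")
        super().__init__(**kwargs)

layer = ExampleAugmenter()                # dtype defaults to "int32"
custom = ExampleAugmenter(dtype="int64")  # the caller can still override it
print(layer.dtype, custom.dtype)          # int32 int64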

mattdangerw (Member) left a comment:

lgtm!

mattdangerw (Member):

Thanks very much for the contribution!

mattdangerw merged commit bfcb0fc into keras-team:master on Mar 20, 2024
6 checks passed
mykolaskrynnyk deleted the docs/modelling_layers branch on March 23, 2024, 11:04
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request Apr 2, 2024
* Docs(layers): add a description for `tie_weights` argument

* Refactor(layers): make `name` an explicit argument for Transformer layers

* Refactor(layers): remove explicit usage of `name` in `__init__` calls

* Docs(layers): remove references to `name` and consistently documents `**kwargs`