PerNodeDrop: A Practical and Efficient Alternative to DropConnect – Exploring Node-Owned Stochasticity #23090

geleshChrsitUniversity · 2026-06-14T14:06:47Z


# Feedback Request: PerNodeDropDense – Exploring Node-Owned Stochasticity

Hello everyone,

I have been working on a stochastic regularization method called PerNodeDrop, and I would greatly appreciate feedback from the Keras/TensorFlow community regarding both the abstraction and the API design.

The work began with a simple question:

## Who owns the stochasticity?

Most stochastic regularization methods can be viewed as assigning perturbation ownership to a particular computational entity.

## Activation-Owned Stochasticity

Methods such as Dropout, GaussianDropout, and GaussianNoise assign stochasticity to activations through a dedicated regularization layer.

Dense(...)
Dropout(...)
Dense(...)

For a given sample, all downstream neurons observe the same perturbed activation value. The stochastic influence of that activation is therefore shared across the receiving layer.
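A minimal NumPy sketch of this sharing, assuming the usual inverted-dropout scaling. The sizes and variable names are illustrative only and are not taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out, rate = 4, 3, 5, 0.5

x = rng.normal(size=(batch, n_in))   # activations from the previous layer
w = rng.normal(size=(n_in, n_out))   # weights of the receiving Dense layer

# One mask entry per (sample, sending activation), with inverted-dropout scaling.
mask = (rng.random((batch, n_in)) >= rate) / (1.0 - rate)
x_dropped = x * mask

# Every receiving neuron j consumes the same perturbed value x_dropped[b, i].
y = x_dropped @ w
```

The mask has shape (batch, n_in), so a dropped activation is dropped for all n_out receivers at once.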

## Connection-Owned Stochasticity

DropConnect assigns stochasticity to individual connections.

Neuron A ── m11*w11 ──► Neuron B
Neuron A ── m12*w12 ──► Neuron C
Neuron A ── m13*w13 ──► Neuron D

Different downstream neurons may receive different stochastic realizations of the same upstream signal, creating a richer perturbation space.

However, the stochasticity now operates in connection space, so the number of random mask values needed per sample scales as:

O(N_in × N_out)

which becomes increasingly expensive as layer widths grow.
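To make the cost concrete, here is a rough NumPy sketch of a per-sample connection mask. It is a simplified illustration: rescaling and inference-time handling are omitted, and the sizes and names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out, rate = 4, 3, 5, 0.5

x = rng.normal(size=(batch, n_in))
w = rng.normal(size=(n_in, n_out))

# One mask entry per (sample, connection): batch * n_in * n_out random draws.
mask = rng.random((batch, n_in, n_out)) >= rate

# Each sample sees its own masked copy of the weight matrix.
y = np.einsum("bi,bij->bj", x, mask * w)
```

The mask tensor has one more axis than the weight matrix, which is where the memory and sampling cost comes from.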

## Node-Owned Stochasticity

PerNodeDrop explores a third ownership model.

Instead of assigning stochasticity to activations or connections, the receiving neuron owns the perturbation.

Activation-Owned  → Sender Perspective
Connection-Owned  → Edge Perspective
Node-Owned        → Receiver Perspective

Each receiving neuron generates and applies its own perturbation during computation.

The perturbation is local to the computational unit and does not require connection-level mask management.

This reduces the number of stochastic values per sample from:

O(N_in × N_out)

to:

O(N_out)

while preserving Dense-layer execution semantics.
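One possible reading of this, sketched in NumPy. The post does not state exactly where the perturbation enters the computation, so the placement below (a multiplicative mask on each receiving unit's pre-activation, with inverted-dropout scaling) is an assumption made for illustration and may differ from the actual PerNodeDrop implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out, rate = 4, 3, 5, 0.5

x = rng.normal(size=(batch, n_in))
w = rng.normal(size=(n_in, n_out))
b = np.zeros(n_out)

# One mask entry per (sample, receiving neuron): batch * n_out random draws.
node_mask = (rng.random((batch, n_out)) >= rate) / (1.0 - rate)

# ASSUMPTION: the perturbation multiplies the pre-activation of each receiving unit.
z = x @ w + b            # ordinary Dense matmul, no per-connection masks
y = z * node_mask
```

Under this assumption the layer still performs a single standard matmul, and the number of random draws is smaller than a connection-level mask by a factor of n_in.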

## Current API

Bernoulli mode:

PerNodeDropDense(
    units=256,
    rate=0.5,
    stir_type="bernoulli"
)

Gaussian mode:

PerNodeDropDense(
    units=256,
    rate=0.5,
    stir_type="gaussian"
)

The same abstraction supports both binary masking and continuous perturbation.
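How `rate` is interpreted in each mode is not spelled out above. As a hedged illustration, the sketch below borrows the conventions of Keras Dropout (keep probability 1 - rate with rescaling) and Keras GaussianDropout (multiplicative noise centered at 1 with standard deviation sqrt(rate / (1 - rate))). The helper `sample_node_noise` is hypothetical and is not part of PerNodeDropDense.

```python
import numpy as np

def sample_node_noise(rng, shape, rate, stir_type):
    """Hypothetical helper: multiplicative noise with mean 1 for either mode."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if stir_type == "bernoulli":
        # Keep with probability 1 - rate, then rescale survivors (inverted dropout).
        return (rng.random(shape) >= rate) / (1.0 - rate)
    if stir_type == "gaussian":
        # Same convention as Keras GaussianDropout: mean 1, variance rate / (1 - rate).
        stddev = np.sqrt(rate / (1.0 - rate))
        return rng.normal(loc=1.0, scale=stddev, size=shape)
    raise ValueError(f"unknown stir_type: {stir_type!r}")

rng = np.random.default_rng(0)
bern = sample_node_noise(rng, (10000, 8), rate=0.5, stir_type="bernoulli")
gauss = sample_node_noise(rng, (10000, 8), rate=0.5, stir_type="gaussian")
```

With these conventions both modes leave the expected value of the perturbed quantity unchanged, which is what lets a single `rate` argument serve both families.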

## Why I Find This Interesting

PerNodeDrop attempts to combine:

- Dense-layer simplicity
- Per-sample stochastic perturbation
- Standard forward/backward propagation semantics
- No graph rewiring
- No custom training loops
- No connection-level bookkeeping
- Support for both Bernoulli and Gaussian perturbation families

The implementation is currently being prepared for open-source release with improved testing, documentation, serialization support, and production-quality packaging.

An initial preprint is available here:

https://arxiv.org/abs/2512.12663

A substantially revised manuscript is currently under journal review.

## Feedback Requested

I would greatly appreciate thoughts on:

- Does "node-owned stochasticity" feel like a meaningful abstraction?
- Does a Dense-family layer seem like the natural implementation vehicle?
- Are there API, performance, serialization, or maintainability concerns that should be considered?
- Would this be better positioned as a standalone package, a Keras ecosystem extension, or a future framework contribution?
- Are there related methods or prior work that I should examine?

Any thoughts, criticism, implementation concerns, or references would be extremely valuable.

starkmarkus · 2026-06-16T15:56:24Z


Interesting idea.

The main appeal here is that the stochasticity is attached to the node rather than the full connection matrix, so it should be cheaper than DropConnect while still being more structured than plain Dropout.

If you want to take this further, I would make it a backend-agnostic Keras 3 layer and verify a few things early:

- keras.ops / keras.random only
- clean training behavior
- get_config() / serialization
- shape and seed tests
- comparisons against Dropout, GaussianDropout, and DropConnect-style baselines
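As a rough idea of what the shape, seed, and config checks could assert, here is a NumPy stand-in. `PerNodeDropSketch` is hypothetical, is not the author's layer, and does not use any Keras API; a real layer would replace the NumPy calls with keras.ops and keras.random. The mask placement on the pre-activation is also an assumption.

```python
import numpy as np

class PerNodeDropSketch:
    """Hypothetical NumPy stand-in, not the author's layer or a Keras API."""

    def __init__(self, units, rate=0.5, seed=None):
        self.units, self.rate, self.seed = units, rate, seed
        self._rng = np.random.default_rng(seed)

    def __call__(self, x, w, training=False):
        if w.shape[1] != self.units:
            raise ValueError("weight matrix does not match `units`")
        z = x @ w
        if not training:
            return z  # no noise at inference
        # ASSUMPTION: node-level mask applied to the pre-activation.
        mask = (self._rng.random(z.shape) >= self.rate) / (1.0 - self.rate)
        return z * mask

    def get_config(self):
        return {"units": self.units, "rate": self.rate, "seed": self.seed}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

x, w = np.ones((4, 3)), np.ones((3, 5))
layer = PerNodeDropSketch(units=5, rate=0.5, seed=42)
clone = PerNodeDropSketch.from_config(layer.get_config())

assert layer(x, w).shape == (4, 5)            # shape test
assert np.array_equal(layer(x, w), x @ w)     # inference is noise-free
# Seed test and config round trip: same seed, same first training-mode output.
assert np.array_equal(layer(x, w, training=True), clone(x, w, training=True))
```

The same three assertions translate directly into unit tests for a Keras layer, with the config round trip going through the framework's own serialization instead of a plain dict.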

I would also be careful to define exactly where the noise is applied, because that changes the behavior quite a bit.

My instinct would be to treat this as a standalone package first and only push upstream if the experiments are consistently strong.

3 replies

geleshChrsitUniversity Jun 16, 2026
Author

Thank you for your thoughtful feedback and encouragement.

You correctly identified the central idea: the stochasticity is attached to the neuron/node rather than the full connection matrix. The motivation is to retain some of the diversity associated with DropConnect while keeping the computational cost much closer to Dropout.

I have already carried out some preliminary comparisons against Dropout and Gaussian Dropout, and I am currently refining the implementation.

from tensorflow.keras import layers

class PerNeuronDropDense(layers.Layer):
    # Skeleton only; method bodies are not shown here.
    # def __init__(self, ...): ...
    # def call(self, ...): ...
    # def get_config(self): ...
    pass

Refactoring it into a backend-agnostic Keras 3 layer using keras.ops and keras.random is indeed my next step, along with proper serialization, shape validation, seed handling, and testing.

I also agree that carefully defining where the stochastic perturbation is applied is important, as different placements can lead to different behaviors. A more modular implementation should make it easier to analyze and compare these variants systematically.

Your suggestion of developing it as a standalone package first makes a lot of sense. That would provide a cleaner environment for experimentation, benchmarking, and studying the internal behavior before considering any upstream proposal.

Thanks again for taking the time to review the idea and for the practical guidance.

geleshChrsitUniversity Jun 16, 2026
Author

@starkmarkus, may I have your thoughts on the preprint as well? https://arxiv.org/abs/2512.12663

Thanks

starkmarkus Jun 16, 2026

I have not read the full manuscript in detail yet, but the core idea is clear and the framing is reasonable.

What I would pay the most attention to in the preprint is the empirical side:

- Does node-owned stochasticity beat Dropout and GaussianDropout at similar cost?
- Does it still help across multiple datasets and random seeds?
- Does the effect depend on where the perturbation is applied?
- How large is the runtime / parameter overhead compared with the baselines?

If those points are well covered, the paper should make a much stronger case than the API name alone. The terminology is interesting, but the ablation table and the cost/benefit story will probably matter most to readers.

If you want, I can also give more specific feedback on the manuscript structure or the experiments once you share the sections you most want reviewed.
