
Mixnet: Periodical topology update + Test refactoring #51

Closed
wants to merge 10 commits into master from mixnet-topology-entropy

Conversation

youngjoon-lee
Contributor

This PR is the last piece of the Mixnet specification. It covers the deterministic periodic topology reconstruction.

For simplicity, this Python spec doesn't describe how network connections are established and maintained. Instead, it focuses on showing how entropy is provided and used to build topologies. Managing network connections is an implementation detail.

Comment on lines 5 to 11
 class FisherYates:
     @staticmethod
-    def shuffle(elements: List, entropy: bytes) -> List:
+    def set_seed(seed: bytes) -> None:
+        random.seed(a=seed, version=2)
+
+    @staticmethod
+    def shuffle(elements: List) -> List:
Collaborator

I think the old style is better. IIRC from the specs, the entropy changes over time. In this case we would be setting the whole system's seed when calling set_seed, which is not what we want; it should be reset on every call to maintain integrity with the other random calls. (This is Python-specific, but it also makes explicit that the entropy is just for the shuffling.)
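To illustrate the Python-specific point, a small standalone snippet (illustrative values only, not code from this PR):

import random

# random.seed reseeds the module-level RNG, so every other user of the
# random module in the process becomes tied to the mixnet entropy:
random.seed(b"entropy-A", version=2)
unrelated_value = random.random()  # deterministic now, as a side effect

# Scoping the entropy to a dedicated RNG keeps it "just for the shuffling":
rng = random.Random(b"entropy-A")
deck = list(range(10))
rng.shuffle(deck)  # only this shuffle depends on the entropy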

Contributor Author

Previously, yes, the entropy was designed to change over time. But from the recent discussion, the entropy is going to be injected less frequently, e.g. every 5 days, while the topology update is going to be triggered more frequently, e.g. every 30 minutes. So the seed, which is set when the entropy is injected, should be maintained across subsequent topology updates until a new entropy is injected. According to the discussion, I've updated the spec: https://www.notion.so/Mixnet-Specification-807b624444a54a4b88afa1cc80e100c2?pvs=4#be6b9cafb4b94334b9f154d3a072204c
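As a usage sketch of that timing with the new-style API from the diff above (new_entropy and node_candidates are hypothetical names; the 5-day / 30-minute figures are just the examples given here):

# Entropy injected rarely (e.g. once every ~5 days):
FisherYates.set_seed(new_entropy)

# Topology rebuilt much more often (e.g. every 30 minutes). Each call keeps
# consuming the same seeded RNG state until the next set_seed:
layer_nodes = FisherYates.shuffle(node_candidates)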

Contributor Author

I think this is the topic we need to sync on in #51 (comment)

Collaborator

Yes. I think there is a misunderstanding here. set_seed should instead be set_entropy: we store the entropy and do not set the seed immediately. Then we use the old code. Otherwise it's going to be impossible to run simulations, for example.
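One way to read that suggestion, as a minimal self-contained sketch (not the code in this PR):

import random
from typing import List

class FisherYates:
    entropy: bytes = b""

    @classmethod
    def set_entropy(cls, entropy: bytes) -> None:
        # Only store the entropy; no RNG state is touched here.
        cls.entropy = entropy

    @classmethod
    def shuffle(cls, elements: List) -> List:
        # Seed a dedicated RNG from the stored entropy for this call only,
        # so simulations and other users of `random` are unaffected.
        rng = random.Random(cls.entropy)
        shuffled = list(elements)
        rng.shuffle(shuffled)
        return shuffled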

Contributor Author

Yes. If we remove the mixnet_epoch concept, we can use the old code. Also, not setting the seed immediately looks reasonable (#51 (comment)).

mixnet/mixnet.py (outdated; conversation resolved)
mixnet/mixnet.py Outdated
next_epoch_ts = datetime.now() + timedelta(seconds=self.mixnet_epoch_sec)

while True:
    time.sleep(1 / 1000)
Collaborator

Why is there a sleep here? Nitpick: I think the schedule is already defined when checking it below, so this could probably be abstracted into some scheduler.
Not necessary to change now, but we could have a black-boxed structure that awaits until the next epoch.
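A rough sketch of such a black-boxed structure (a synchronous version to match the current spec; the names are made up):

import time
from datetime import datetime, timedelta

class EpochTimer:
    """Blocks until the next epoch boundary instead of polling every millisecond."""

    def __init__(self, epoch_sec: float):
        self.epoch_sec = epoch_sec
        self.next_epoch_ts = datetime.now() + timedelta(seconds=epoch_sec)

    def wait_for_next_epoch(self) -> datetime:
        # Sleep exactly as long as needed, then advance the schedule.
        remaining = (self.next_epoch_ts - datetime.now()).total_seconds()
        if remaining > 0:
            time.sleep(remaining)
        started = self.next_epoch_ts
        self.next_epoch_ts += timedelta(seconds=self.epoch_sec)
        return started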

mixnet/mixnet.py (outdated; conversation resolved)
mixnet/mixnet.py (conversation resolved)
mixnet/mixnet.py Outdated
    self.next_topology = None

def run(self) -> None:
    next_epoch_ts = datetime.now() + timedelta(seconds=self.mixnet_epoch_sec)

Thinking out loud here.

I don't know if this is the best way to count epochs. It may work here when all node instances have their clocks synchronized. However, when a node connects later (or disconnects for some time) it might lose track of time here. It might be better to have something like self.genesis_timestamp from which all nodes calculate the current epoch:
epoch_number = ceiling((datetime.now() - params.genesis_timestamp) / params.epoch_len_in_sec)
and the slot number (relative to the epoch) as:
slot_number_in_epoch = ceiling((datetime.now() - params.genesis_timestamp - epoch_number * params.epoch_len_in_sec) / params.slot_len_in_sec)
Or something similar.

This way the calculation should be independent of the connectivity conditions of nodes and can only be influenced by clock drift.
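For illustration, a runnable version of that calculation (floor instead of ceiling so numbering starts at 0; the constants are placeholder values, not agreed parameters):

import math
from datetime import datetime, timezone
from typing import Tuple

GENESIS_TIMESTAMP = datetime(2024, 1, 1, tzinfo=timezone.utc)  # placeholder
EPOCH_LEN_IN_SEC = 30 * 60  # placeholder: 30-minute epoch
SLOT_LEN_IN_SEC = 20        # placeholder: 20-second slot

def current_epoch_and_slot(now: datetime) -> Tuple[int, int]:
    elapsed = (now - GENESIS_TIMESTAMP).total_seconds()
    epoch_number = math.floor(elapsed / EPOCH_LEN_IN_SEC)
    # Offset inside the current epoch, then the slot that offset falls into.
    offset_in_epoch = elapsed - epoch_number * EPOCH_LEN_IN_SEC
    slot_number_in_epoch = math.floor(offset_in_epoch / SLOT_LEN_IN_SEC)
    return epoch_number, slot_number_in_epoch

epoch, slot = current_epoch_and_slot(datetime.now(timezone.utc))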

Another thing is that the epoch and slot information should be confirmed by reading them from the consensus. But that might be out of scope for this specification effort.

Collaborator

Yes, related to this comment. It would be enough for this structure to have a channel (a Python coroutine that receives values could be used) that schedules the epochs, something like (note the syntax may be different):

while next_epoch := (yield):

or

while next_epoch := await epoch_notifier:

or

async for epoch in epoch_stream:
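For example, a runnable sketch of the last variant using an asyncio queue as the epoch notifier (all names here are hypothetical):

import asyncio

async def epoch_stream(notifier: asyncio.Queue):
    # Async generator over epoch numbers pushed by whatever drives the schedule.
    while True:
        yield await notifier.get()

async def run(notifier: asyncio.Queue) -> None:
    async for epoch in epoch_stream(notifier):
        # React to each epoch: rebuild and switch the topology here.
        print(f"epoch {epoch}: rebuilding topology")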

Contributor Author

Yes. I missed this point. I'll fix it.

Another thing is that the epoch and slot information should be confirmed by reading them from the consensus. But that might be out of scope for this specification effort.

I'm also thinking about this. Here, we only use mixnet_epoch_sec, which defines the interval of topology updates and is going to be shorter than consensus_epoch_sec. So, if we design the entropy to be injected from the consensus layer into the mixnet layer, I think the mixnet layer doesn't need to read anything (consensus epoch or slot info) from the consensus layer. In other words, we can design the mixnet independently of the upper layers. Did I understand the point you mentioned correctly?


Yes, I think we are thinking along the same lines here. I've left a more in-depth comment below in the next conversation (#51 (comment)).

mixnet/mixnet.py Outdated
    continue

if self.next_topology is None:
    self.next_topology = self.build_topology()

If I understand it correctly, in this version we are refreshing the topology only once per epoch?

Contributor Author

I designed the topology to be updated once per mixnet_epoch (e.g. 30 min), which is going to be shorter than a consensus_epoch (e.g. 5 days). Also, the mixnet_epoch is going to be independent of the consensus_slot (20 sec).

Contributor Author

I think we need to sync on this topic. It depends on how long the consensus_epoch is going to be. It would be much simpler to just remove mixnet_epoch and make the mixnet follow the consensus_epoch, if the consensus_epoch is not too long. Also, it may be okay even if the topology is updated less frequently. In Nym, the topology is updated once chain validators publish a new random beacon on the Nym blockchain (once a day, IIRC).

Although there will be dynamic optimization of the epoch duration, it is expected epochs will be in the order of hours or days

(from Nym whitepaper: 4.7 Mixnet epochs)

What do you think about these two approaches (having a shorter mixnet_epoch, or removing it)?

Contributor Author

To minimize misunderstanding, I just made a new commit assuming that we're going to remove the mixnet_epoch concept: 9a64242. Let's discuss which one would be reasonable for v1.

This is much simpler and doesn't rely tightly on the terms consensus epoch and consensus slot, thus loosening the architectural relationship between the mixnet and the consensus.

@danielSanchezQ In this commit, I didn't use async, to keep the diff prettier ;) In fact, the code after this change is very simple even in sync mode, and doesn't have the time-calculation issue that @madxor mentioned. Once we agree on a single approach, I'll make a separate PR for async.


I think it might be easier to think about mixnet functions as a tool used by the consensus. In my head, the consensus should be responsible for triggering mixnet functions, as the consensus should be treated as the source of truth. That is, what if we focus only on providing an API that could be used by the consensus, which would be responsible for synchronization of time and data, while the mixnet API would only respond (by setting the connectivity) according to the data provided by the consensus? This way we could greatly simplify the internal logic of the mixnet and relieve it from knowing anything about time or consensus.

The API could be very simple, something like:
topology = get_mixnet_topology(nodes_list, entropy) <-- returns topology configuration based on the entropy and the list of nodes.
set_mixnet_topology(topology) <-- activates the topology.

I think the mixnet should only care about the two calls above, and all synchronization/configuration issues should be abstracted away from the mixnet and handled on the consensus side. The consensus would therefore be responsible for calculating and providing synchronized data to the mixnet in a timely manner.

Technically, I think there should be another layer on top of the consensus that would do the above in conjunction with parsing governance transactions (a coordination layer?), through which some of the global parameters would be set. But we don't have it yet, so I'm using consensus here.
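A rough, self-contained sketch of that two-call API (the layer sizes and the deterministic-shuffle internals are assumptions for illustration, not something agreed in this thread):

import random
from typing import List

def get_mixnet_topology(nodes_list: List[str], entropy: bytes,
                        n_layers: int = 3, nodes_per_layer: int = 3) -> List[List[str]]:
    # Deterministic shuffle driven only by the provided entropy,
    # then slice the result into mix layers.
    rng = random.Random(entropy)
    shuffled = list(nodes_list)
    rng.shuffle(shuffled)
    picked = shuffled[: n_layers * nodes_per_layer]
    return [picked[i * nodes_per_layer:(i + 1) * nodes_per_layer]
            for i in range(n_layers)]

def set_mixnet_topology(topology: List[List[str]]) -> None:
    # Activate the topology: (re)establish connections to the listed nodes.
    # Connection management stays an implementation detail, as noted in the PR.
    pass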

Contributor Author

@madxor @danielSanchezQ I think we're almost on the same page and we can now come to a conclusion.

I made a commit 9a64242 for clarity and another commit 1a97064 to rename a variable. These commits have an API different from what Marcin suggested in #51 (comment), but I'll explain why at the end of this comment.

In these commits, the consensus layer just injects a new entropy when it considers the new entropy safe enough to use. Potentially, the consensus layer can add some delay before injecting the entropy, instead of injecting it as soon as the random beacon is updated, because of #51 (comment). But in any case, that's the responsibility of the consensus layer. The mixnet layer doesn't need to know about this detail.

From the perspective of the mixnet, it just uses a new entropy whenever one is injected. The mixnet doesn't need to care about the frequency of entropy injection; that is up to the consensus layer. Even if a new entropy is injected every 20 seconds, it's okay: the mixnet will just use it to build a new topology as soon as it arrives.

The one thing the mixnet needs to care about is adding some delay before switching to the new topology. As mentioned in #51 (comment), it takes some time for each mix node to establish new connections according to the new topology. Of course, we can ignore this by allowing some connections to be established on demand when packets based on the new topology arrive, even if that may increase the total latency of packet delivery. But I think we can minimize the impact of this transition period by adding some delay (e.g. 1 min) before switching to the new topology. Then each mix node gains 1 min to calculate the new topology and establish new connections in advance. This is what 1a97064 tries to describe. Still, this delay is not perfect, and it doesn't mitigate the time difference in the new entropy being injected into each mix node, because the network is not fully synchronized. But at least this delay can minimize the impact caused by connection re-establishment.

As I mentioned above, I designed an API different from what Marcin suggested, because I didn't want to expose the concept of "topology" to the consensus layer by providing a get/set_topology API. In that case, the consensus layer might be confused about how long it should hold the topology returned from get_topology(entropy) before calling set_topology(topology). Instead, I thought it's simpler to have a single API, inject_entropy(entropy), so that the consensus layer just injects a new entropy whenever it wants and then forgets about it.
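A minimal sketch of that single-call shape (class and helper names here are illustrative, not the PR's exact code):

import random
from typing import List, Optional

class Mixnet:
    def __init__(self, nodes: List[str]):
        self.nodes = nodes
        self.topology: Optional[List[List[str]]] = None

    def inject_entropy(self, entropy: bytes) -> None:
        # The consensus layer calls this whenever it decides a new entropy is
        # safe to use; the mixnet rebuilds and switches its topology itself.
        self.topology = self._build_topology(entropy)

    def _build_topology(self, entropy: bytes) -> List[List[str]]:
        rng = random.Random(entropy)
        shuffled = list(self.nodes)
        rng.shuffle(shuffled)
        # Three layers of three nodes each (arbitrary illustration values).
        return [shuffled[i * 3:(i + 1) * 3] for i in range(3)]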

What do you think about this? We have had many valuable discussions, and I think we are near the end of this one.

Contributor Author

I'm just attaching a diagram to help our discussion. Option 2 is what my code suggests, but let's discuss today.
[diagram: mixnet-topology-update.drawio (2)]
Even if we choose option 2, the first topology generation (when the process is started) should probably work like option 1.


I think we are on the right path. However, I think we are still missing another layer to make it complete. I fully agree that there should be a logical separation between the mixnet (and its time-awareness) and the consensus (and its network awareness). Therefore, it might be necessary to create another layer on top of the mixnet and consensus; we could call it a robustness (or privacy) agent/layer (or something). The robustness layer would be responsible for reading the consensus and setting the mixnet, and we could configure it separately from the mixnet and consensus. The flow would then always be from a consensus event to a mixnet action, with independently configured logic.

For example, if we change the length of the consensus epoch or slot time, we should not have to adjust the mixnet delay; that should be taken care of by the robustness abstraction, which would translate consensus timings into mixnet ones (in practice, I imagine the mixnet will not be aware of any specific timings but will react to robustness calls).

Collaborator

I agree as well. We need something in the middle to coordinate efforts. I want to understand more about the delays, as I am getting a bit lost on that; I feel the responsibility is kind of mixed. This system should be totally reactive IMO.

Contributor Author

This is a summary of our discussion today.

[diagram: mixnet-topology-update.drawio (3)]

  • We're going to have a separate "robustness" layer on top of the consensus layer and the mixnet layer, for a clear separation of responsibilities.
  • The robustness layer is going to know the parameters for building mixnet topologies.
    • After v1, the robustness layer will have more responsibilities and parameters (e.g. changing mix delays).
  • Whenever a new entropy is stabilized in the consensus layer, the consensus layer is going to emit the new entropy to the robustness layer, along with the updated list of nodes in the network. In general, this is going to happen once per consensus epoch, but strictly speaking it depends on the design of the consensus layer.
  • As soon as the new information arrives, the robustness layer "immediately" builds a new topology configuration based on the current parameters it knows. The topology configuration means the list of mix nodes for each layer (for now, in v1).
  • As soon as the new topology configuration is generated, the robustness layer injects it into the mixnet layer. There is going to be no time delay between these two events (see the sketch after this list).
  • The mixnet layer switches to the new topology configuration immediately, starts establishing new QUIC connections in the background, and starts serving packets generated based on the new topology.
    • Note that we're not going to have an explicit time delay before activating the new topology. The new topology is activated automatically as soon as it is injected by the robustness layer.
    • Packets based on the new topology can arrive at a mix node even before all new connections are established, but that is totally okay. A mix node always establishes a new connection on demand to forward the packet on a best-effort basis. Even if packets based on an unexpected topology arrive, a mix node never rejects them. This design is reasonable because we assume that all nodes in the network are ready to act as mix nodes (at least for v1).
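A sketch of that flow, reusing the get_mixnet_topology / set_mixnet_topology sketch from earlier in the thread (RobustnessLayer and its parameters are illustrative names, not an agreed interface):

from typing import List

class RobustnessLayer:
    def __init__(self, n_layers: int, nodes_per_layer: int):
        # Topology-building parameters live here, not in the mixnet.
        self.n_layers = n_layers
        self.nodes_per_layer = nodes_per_layer

    def on_new_entropy(self, entropy: bytes, node_list: List[str]) -> None:
        # Called by the consensus layer, roughly once per consensus epoch.
        topology = get_mixnet_topology(node_list, entropy,
                                       self.n_layers, self.nodes_per_layer)
        # Inject immediately: no delay between building and activating.
        set_mixnet_topology(topology)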

mixnet/mixnet.py Outdated
time.sleep(1 / 1000)

now = datetime.now()
if now < next_epoch_ts - timedelta(seconds=self.proactive_sec):

I hope I understand this correctly: the intention here is to update the topology before the next epoch, so that when the new epoch starts it can already use the new topology?

If so, it might be problematic due to the construction of the random beacon. The beacon's stability becomes more probable with time, i.e. it is better to wait longer for the beacon. In other words, can we consider that the new topology must become usable only after some delay? For example: the beacon is "stable" when it is calculated in slot 1 of every epoch, then the topology is updated with some delay, say it is calculated in slot 2 and usable from slot 3 onward of each epoch. This way, we are giving more time for network synchronization.

How does that sound?

Also, for me it is easier to reason when we switch from thinking in seconds to thinking in slots and epochs. I see fewer synchronization problems ;)
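A tiny sketch of that slot-based gating (the slot numbers follow the example above; everything else is a placeholder):

TOPOLOGY_ACTIVE_FROM_SLOT = 3  # beacon stable in slot 1, topology built in slot 2

def topology_for_slot(slot_number_in_epoch: int, old_topology, new_topology):
    # Keep serving the previous topology until the activation slot is reached.
    if slot_number_in_epoch >= TOPOLOGY_ACTIVE_FROM_SLOT:
        return new_topology
    return old_topology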

Contributor Author

I hope I understand this correctly: the intention here is to update the topology before the next epoch, so that when the new epoch starts it can already use the new topology?

Exactly

In other words, can we consider that the new topology must become usable only after some delay?

The strategy you suggested is reasonable if we design the topology to be updated once per consensus_epoch. But this PR assumed that the topology is updated once per mixnet_epoch, which is shorter than a consensus_epoch, because I thought the consensus_epoch is going to be very long (e.g. 5 days). I thought this is what we had discussed in Notion, but I probably didn't understand it correctly (#51 (comment)). Let me think about which of these two approaches is more reasonable.

Contributor Author

Ah, even if we have a mixnet_epoch shorter than the consensus_epoch, we still need to use the newly injected entropy with some delay, as you explained.

@youngjoon-lee youngjoon-lee changed the base branch from mixnet-client to master January 23, 2024 01:51
@youngjoon-lee
Contributor Author

Closing this, and I'm going to open another PR according to #51 (comment).

@youngjoon-lee youngjoon-lee deleted the mixnet-topology-entropy branch February 5, 2024 12:41