Sound Synchronization #14431

Closed
Warr1024 opened this issue Mar 3, 2024 · 9 comments
Labels
Client / Audiovisuals, Duplicate, Feature request, Sounds

Comments

@Warr1024
Contributor

Warr1024 commented Mar 3, 2024

Problem

When playing sounds in Minetest, the timing of sounds is entirely dependent on when the network packet requesting those sounds arrives at the client side. This means that:

  • Things that require tight timing, such as echoing effects, simultaneously played sounds with multiple sources, or music, are not reliable over a network.
  • Sounds can only be played at server step times, so e.g. if the server step size is 0.09 seconds, there is no way to play two sounds that are exactly 0.02 seconds apart.
  • If a lot of sounds need to be played at the same time, or in a very short time interval, those packets cannot be sent ahead of time to spread them out; they need to be crammed down the network all at once, which may be a problem for some connections.

Solutions

  • When the client and server negotiate a connection, at some point, a per-player "stopwatch" is started on both sides, which counts forward in realtime.
    • It's not important that the timers be exactly synchronized, but just that the client's timer is behind the server's timer by an amount about equal to the network delay.
  • sound_play specifications have a new "delay" number that can be specified (a usage sketch follows this list).
    • If the delay is nil, then the sound is played by the client as specified as soon as it receives and processes the packet, similar to the way things work now.
    • If the delay is a number (including zero) then the sound is played "synchronized" to the timer.
      • The server adds its stopwatch value to the delay value and includes this timestamp in the sound play packet.
      • When the client receives the packet, it will try to play the sound starting at the time specified in it, relative to its own timer.
        • If the requested time is after the current timestamp, then the sound will be delayed until that time.
        • If the requested time is before the current timestamp, it will be played immediately, and the early portion "cropped off", as if start_time was used to skip it (in addition to any existing start_time value).
      • It is the modder's responsibility to ensure that time-synchronized sounds are sent with adequate time for the client to receive them and play them as intended, to the extent possible (e.g. send music notes early with a positive delay to ensure the start of notes is not cut off).
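A rough idea of how the modder-facing side could look, assuming the new "delay" field simply lives in the parameter table of core.sound_play (the field is the proposal above, not something the engine supports today):

```lua
-- Hypothetical usage of the proposed "delay" field; it does not exist in the
-- current API, and its placement in the parameter table is an assumption.
local note = {name = "mymod_piano_c4"}

-- delay = nil: old behavior, play as soon as the packet is processed.
core.sound_play(note, {to_player = "singleplayer", gain = 0.8})

-- delay set: schedule a short melody against the shared stopwatch, so network
-- jitter no longer changes the spacing between the notes.
for i, pitch in ipairs({1.0, 1.12, 1.26, 1.5}) do
    core.sound_play(note, {
        to_player = "singleplayer",
        pitch = pitch,
        delay = 0.5 + (i - 1) * 0.25, -- seconds ahead of "now" on the shared clock
    })
end
```

The 0.5 second head start gives the packets time to arrive before the first note is due, so nothing gets cropped.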

Alternatives

None.

Additional context

Use-cases:

  • Modders/gamedevs and players alike have both wanted to program music by playing sounds in timed sequences instead of playing a single long sound file.
    • Procedural music playing would allow music to change dynamically during gameplay.
      • Example: Klots plays music in all levels except the "introductory" one (i.e. once you've entered space). This music is procedurally generated and infinite in length, but it needs to be played very slowly in order to hide network timing jank.
    • Sending the media for individual instrument notes and varying pitch can save a lot of media transfer time and CDB download time compared to pre-baked music.
      • Example: Klots has background music and some musical effects (e.g. a fanfare when you complete certain puzzles) that use only a single instrument file, keeping the download size small (1.6MB) compared to other games that have music, such as ColourHop (11MB) or Piranesi Restoration (13MB).
    • Procedural music can be created and edited by players during gameplay.
  • Environmental sound effects, such as reflections (echoes)
    • This is used to an extent in games like Citadel (for the ghost voice) or Velvet Crystal (some of the last boss' voice lines) to give speech by supernatural characters an otherworldly, space-filling effect.
    • I would like to be able to do some sound reflection simulation off nearby surfaces and play "accurate" reflections in order to make in-game objects more aurally "solid", but have not attempted this in a game yet because it's not feasible without this feature (a rough sketch of the idea follows this list).
      • This could have a huge benefit for a game like Veil of the Unknown, which has no visuals and relies heavily on sound cues.
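To illustrate the reflection use-case: a mod could derive an echo delay from the extra path length to a reflecting surface (treating one node as one metre, speed of sound roughly 343 m/s) and schedule a quieter copy. The "delay" parameter here is again the hypothetical one proposed above.

```lua
-- Rough echo sketch; "delay" is the proposed parameter, not an existing one.
local SPEED_OF_SOUND = 343 -- nodes (metres) per second

local function play_with_echo(spec, pos, wall_pos, player_name)
    -- Direct sound, given a small head start on the shared clock.
    core.sound_play(spec, {pos = pos, to_player = player_name, delay = 0.1})

    -- Reflection: the extra path is roughly twice the distance to the wall.
    local extra_path = 2 * vector.distance(pos, wall_pos)
    core.sound_play(spec, {
        pos = wall_pos,
        to_player = player_name,
        gain = 0.4, -- reflections are quieter than the direct sound
        delay = 0.1 + extra_path / SPEED_OF_SOUND,
    })
end
```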
@Warr1024 added the Feature request label Mar 3, 2024
@Desour
Member

Desour commented Mar 3, 2024

Duplicate of: #10306

But good description of the issue!

@Desour closed this as not planned Mar 3, 2024
@Warr1024
Contributor Author

Warr1024 commented Mar 3, 2024

#10306 doesn't cover the synchronization aspect of this issue, which is actually the most important part. It's mentioned only briefly in passing in #10306 (comment) but never made it into the issue proper.

@Desour
Member

Desour commented Mar 3, 2024

As I understood #10306, it's about precise delays between sound playbacks. If you use a delay of 0, so to speak, both sounds play at the same time, aka are synchronized. Idk what else that issue would be requesting. Did I misunderstand something?

@Warr1024
Contributor Author

Warr1024 commented Mar 4, 2024

If I play a sound with delay 0, and then 5 seconds later, I play another sound with delay 0, I want them to play 5 seconds apart on the client side regardless of the amount of time each packet took to traverse the network. I don't just want each sound to play when they arrive at the destination.

A key thing that the old issue misses is that it's possible to receive a sound packet AFTER the sound is supposed to start playing, in which case the sound needs to be started past the beginning of the sound file to avoid the rest of the sound being delayed (you can't make up for the portion you missed).
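Roughly, the client-side arithmetic would be (Lua-style pseudocode for illustration only; the real implementation would live in the C++ client and the helper name is a placeholder):

```lua
-- requested_ts: server stopwatch value at which playback should start
-- client_ts:    client stopwatch value when the packet is processed
local function schedule_synchronized_sound(spec, requested_ts, client_ts)
    local diff = requested_ts - client_ts
    if diff > 0 then
        -- Arrived early enough: wait, then play from the beginning.
        start_playback_after(spec, diff)
    else
        -- Arrived late: play immediately, but skip the part that was missed,
        -- on top of any start_time the mod itself requested.
        spec.start_time = (spec.start_time or 0) - diff
        start_playback_after(spec, 0)
    end
end
```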

I also proposed keeping compatibility, so that if a delay is nil, we keep the old behavior of playing sounds at the earliest possible opportunity rather than forcing them to sync by timestamp.

@Desour
Member

Desour commented Mar 4, 2024

If I play a sound with delay 0, and then 5 seconds later, I play another sound with delay 0, I want them to play 5 seconds apart on the client side regardless of the amount of time each packet took to traverse the network. I don't just want each sound to play when they arrive at the destination.

That's what the other issue is about.

A key thing that the old issue misses is that it's possible to receive a sound packet AFTER the sound is supposed to start playing, in which case the sound needs to be started past the beginning of the sound file to avoid the rest of the sound being delayed (you can't make up for the portion you missed).

That's an important detail. Still, I think it's essentially the same issue, so merging them makes sense.

I also proposed keeping compatibility, so that if a delay is nil, we keep the old behavior of playing sounds at the earliest possible opportunity rather than forcing them to sync by timestamp.

Keeping compatibility is required for all new features.

The other issue speaks more about relative time offsets than global timestamps, and keeps the solution more open; maybe that caused some confusion.
I'm fine with any good solution.

@Desour changed the title from "Sound Syncrhonization" to "Sound Synchronization" Mar 4, 2024
@sfan5
Member

sfan5 commented Mar 4, 2024

I was actually thinking about this just yesterday and the issue is way bigger than just sounds.

Minetest needs a general synchronized time between the server and clients.
The network protocol provides ordering guarantees for reliable packets but everything else in Minetest is written with the assumption that there is no latency and no reordering.

This affects:

  • client to server player position, interactions
  • animations of any kind that have a defined start point (bone, model, sprite, ...)
  • server to client object movement
  • time of day
  • server to client add_velocity
  • playing and fading sounds
  • particles
  • HUD changes

These packets should be timestamped so the other party can take network latency into account when handling these events.
This would have two purposes:

  • (In some cases) to provide ordering: First I get a HUD_CHANGE packet with ts=10, next I get one with ts=9. This one should be ignored silently (sketched after this list).
  • To correct for latency: If the server wanted to play a sound at t=10 and it is now t=11, the first second of it should be skipped.
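For the ordering case, the guard is tiny (placeholder names, not engine code):

```lua
-- Drop any timestamped HUD change that is older than the newest one applied.
local last_hud_ts = -math.huge

local function on_hud_change(packet)
    if packet.ts <= last_hud_ts then
        return -- stale packet: ignore silently
    end
    last_hud_ts = packet.ts
    apply_hud_change(packet) -- placeholder for the actual client-side handling
end
```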

The name I propose for this is "global timestamp" or "gts" for short. There's no need for it to be per-player either.

As for synchronization, we probably have to implement whatever algorithm NTP uses, because if it's off by too much the whole approach goes bad.
This is also the reason why this can't be naively implemented using the current RTT estimates our networking code provides.
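For reference, the part of NTP that actually matters here is the four-timestamp exchange used to estimate offset and round-trip time; everything beyond that (drift correction, server pools) we could skip:

```lua
-- Standard NTP-style estimate from a single request/response pair.
-- t0: client sends request    t1: server receives it
-- t2: server sends reply      t3: client receives it
local function estimate_clock(t0, t1, t2, t3)
    local offset = ((t1 - t0) + (t2 - t3)) / 2 -- server clock minus client clock
    local rtt    = (t3 - t0) - (t2 - t1)       -- network round-trip time
    return offset, rtt
end
```

Repeating the exchange a few times and keeping the sample with the smallest rtt would likely be enough to stay within the error budget.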


I also proposed keeping compatibility, so that if a delay is nil, we keep the old behavior of playing sounds at the earliest possible opportunity rather than forcing them to sync by timestamp.

Honestly I think calling this "backwards compatibility" is wrong. The old behavior is very broken.
I can see the need for disabling the lag compensation (for sounds!) so I would simply make it opt-out.

#10306

If general lag compensation is implemented you would also not need the complexity of core.sound_sync() proposed here.
If the global timestamp is a continuous timeline that follows e.g. get_us_time(), all you really need is a delay parameter and the modder can have his fun calculating the right offset.

If we want to facilitate bulk-playing of sounds (aka sheet music), it would still be more efficient to add a command that can transfer an entire list of sounds + delays to the client at once.
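Something along these lines, as a purely hypothetical API shape:

```lua
-- Hypothetical bulk-play call: one packet carries the whole phrase and the
-- client schedules every entry against the shared timeline. Nothing like this
-- exists in the engine today.
core.sound_play_batch({
    {spec = {name = "piano_c4"}, delay = 0.00},
    {spec = {name = "piano_e4"}, delay = 0.25},
    {spec = {name = "piano_g4"}, delay = 0.50},
    {spec = {name = "piano_c5"}, delay = 0.75},
}, {to_player = "singleplayer"})
```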

@paradust7
Contributor

paradust7 commented Mar 5, 2024

@sfan5
There's no need for global synchronization, because timestamps are only sent from server to client. If the server includes a timestamp[1] with every event, then the client can purposefully maintain an average delay between when events are received and when they are displayed, as if playing a buffered video feed. Only the relative spacing between timestamps matters to the client.

[1] or tick counter, for fixed interval ticks

EDIT: To clarify, you'll need two timestamps from the server when there's an event scheduled in the future rather than immediately. One for when the packet was sent, and one for when the event is scheduled.
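Rough shape of that idea, as Lua-ish pseudocode (all names are made up; the real logic would sit in the client):

```lua
local BUFFER = 0.2   -- constant delay the client deliberately maintains
local base_server_ts -- "sent" timestamp of the first packet seen
local base_local_ts  -- local time when that first packet arrived

-- Map a server-side event time onto the local clock using only the relative
-- spacing between server timestamps plus the fixed buffer.
local function local_play_time(sent_ts, event_ts)
    local now = core.get_us_time() / 1e6
    if not base_server_ts then
        base_server_ts, base_local_ts = sent_ts, now
    end
    return base_local_ts + (event_ts - base_server_ts) + BUFFER
end
```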

@Warr1024
Contributor Author

Warr1024 commented Mar 5, 2024

We don't need to account for clock drift over time; we should be able to assume that clocks on both ends of the connection tick at rates that are close enough for our purposes and manage their own time synchronization. Reimplementing any portion of NTP is probably out of scope for a game engine.

For music, echoes, and other timed effects, it's less important that the time between sounds is exact than that it's consistent: if I ask for an N second delay, it should never be perceptibly different from any other N second delay between any other sounds.

Just a zero-order model, where the client and server start their timers based on a single server to client packet, should be sufficient for an MVP of this. We should try to solve other problems only if there's evidence they exist, or else this will get scope-creeped into infeasibility before we even have an implementation, and we'll be stuck with sound-rich games being SP-only.
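Concretely, the zero-order version is just (placeholder code, not engine API):

```lua
-- Server side, when the "start your stopwatch" packet is sent:
server_stopwatch_start = core.get_us_time()

-- Client side, when that packet is received:
client_stopwatch_start = core.get_us_time()

-- The client's stopwatch now lags the server's by roughly the one-way network
-- delay, which is exactly the relationship the proposal relies on.
```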

@sfan5
Member

sfan5 commented Mar 5, 2024

There's no need for global synchronization, because timestamps are only sent from server to client.

So you wouldn't try to latency-compensate the movement the client sends to the server? Why not?

Reimplementing any portion of NTP is probably out of scope for a game engine.

To be clear, I was suggesting implementing exactly as many parts of (e.g.) NTP as are needed so the client and server can establish a common time that's not off by 1-300ms. Ignoring clock drift is okay.

Just a zero-order model, where the client and server start their timers based on a single server to client packet, should be sufficient for an MVP of this.

I bet that would work well enough for sounds but I'm not convinced that this is the proper way to do it if you wanted to also use it for the other points on my list.
