Mike Brady edited this page Oct 10, 2016 · 12 revisions

Shairport Sync

Shairport Sync emulates an AirPort Express for the purpose of streaming audio from iTunes, iPods, iPhones, iPads and Apple TVs. Unlike other versions of Shairport, audio played by a Shairport Sync-powered device stays synchronised with the source and hence with similar devices playing the same source. Thus, for example, synchronised multi-room audio is possible without difficulty. (Hence the name Shairport Sync, BTW.)

The project started when I saw reviews [Accessed May 14, 2014] for the Topping TP30, a low-cost hi-fi amplifier with USB input. The reviews are generally excellent and it's pretty cheap, so I thought it would be nice to build a fully digital multi-room hi-fi system around the TP30 and iTunes. I already have a couple of amplifiers connected to AirPort Expresses around the house, and wanted to try my luck with this approach.

To summarize, what I wanted was:

  • Fully digital end-to-end audio chain from iTunes to amplifier
  • Full audio synchronisation
  • Multiroom functionality
  • Low hassle – something that "Just Works"(TM)

My plan was to take an embedded Linux device, such as a Raspberry Pi [Accessed May 14, 2014] or an [old!] Linksys NSLU2 [Accessed May 14, 2014] (a "slug"), connect it over Ethernet or WiFi, connect it to the USB port of the TP30 and run Shairport to make it visible as an output device in iTunes. I have some experience with OpenWrt, so I went looking for Shairport on OpenWrt.

Shairport 1.0

I found a Shairport for OpenWrt at https://github.com/sm3rt/OpenWRT-ShairPort [Accessed May 14, 2014] which uses https://github.com/abrasive/shairport [Accessed May 14, 2014]. I managed to get this to compile against OpenWrt trunk with little difficulty — thanks guys! — and installed it on an NSLU2, connected up and was up and running.

It works pretty well, but has a few issues:

  • It doesn't synchronise with other audio devices. In fact it can't do audio synchronisation at all, so it can't reliably provide decent multi-room operation. This is a deal breaker: if it won't work with other AirPlay devices, it's no use. It offers an option you can mess around with, namely the number of buffers to wait for before it starts playing. You can estimate the setting so that Shairport will start more-or-less in sync with the other devices, but the thing is, it won't stay in sync. So this sort-of works, but really sort-of doesn't work; it doesn't "Just Work".
  • It has a laggy volume control. If you don't select the 'hardware' device type, Shairport 1.0's response to changes in volume is slow – alarmingly slow, annoyingly slow.
  • There is an annoying pause delay. If you pause a piece of music in iTunes, it can take an appreciable time for Shairport 1.0 to pause.
  • There is no true mute. Shairport emulates mute by setting the volume to its lowest value. On good stereos, however, you can still hear the sound, albeit faintly.
  • Volume adjustment is poor. As you adjust the volume control, you'd expect the volume of the sound to change fairly smoothly from a very low value to the highest, but Shairport 1.0 does a very poor job of this. Okay, I'm being a bit picky here.

Overall, the real deal breaker for me is the inability to have multi-room operation. Actually it's worse than that: if you have an iTunes machine in the same room as your Shairport speakers, you can't even turn on the computer's speakers as well and have a good single-room experience! The other issues detract from the overall experience, but would not be fatal on their own; they are in the "just not good enough" category.

Synchronisation in Shairport 1.0

Digging into the code of Shairport 1.0 reveals that Shairport 1.0 doesn't use the extra timing information Apple added to AirTunes. (I think this is because Shairport was originally written before the timing information was added in AirTunes 2.) The extra timing information can be used to synchronise playback accurately with the source, effectively "locking" playback to the source's timing.

Shairport 1.0 tries to get around this by measuring the average rate at which audio data arrives. If the audio seems to be arriving too quickly, it's a sign that the clock Shairport's computer is using is running too slow, and vice versa. This information is then used to decide whether to remove frames from, or insert frames into, the audio stream going to the output device to keep pace with the incoming audio. In fact, Shairport 1.0 has a pretty fancy low-pass filter, based on a biquad filter, for averaging out small variations in the rate of incoming audio data. It also has code for inserting and removing frames of audio in the audio stream.
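To make the filtering idea concrete, here is a minimal Python sketch of a second-order (biquad) low-pass filter smoothing a stream of noisy rate measurements. The coefficients follow the widely used RBJ "Audio EQ Cookbook" low-pass design. The class name, the `cutoff_ratio` parameter and the default Q of 1/√2 are illustrative assumptions, not Shairport 1.0's actual code or settings.

```python
import math


class BiquadLowPass:
    """Second-order low-pass filter (RBJ Audio EQ Cookbook design), Direct Form I."""

    def __init__(self, cutoff_ratio: float, q: float = 1 / math.sqrt(2)):
        # cutoff_ratio = cutoff frequency / measurement rate; must lie in (0, 0.5)
        if not 0 < cutoff_ratio < 0.5:
            raise ValueError("cutoff_ratio must be between 0 and 0.5")
        w0 = 2 * math.pi * cutoff_ratio
        cos_w0 = math.cos(w0)
        alpha = math.sin(w0) / (2 * q)
        a0 = 1 + alpha
        # Normalise every coefficient by a0 so the recurrence needs no division
        self.b0 = (1 - cos_w0) / 2 / a0
        self.b1 = (1 - cos_w0) / a0
        self.b2 = (1 - cos_w0) / 2 / a0
        self.a1 = -2 * cos_w0 / a0
        self.a2 = (1 - alpha) / a0
        # Previous two inputs and previous two outputs
        self.x1 = self.x2 = 0.0
        self.y1 = self.y2 = 0.0

    def step(self, x: float) -> float:
        """Filter one new measurement and return the smoothed value."""
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y
```

Fed one measured ratio of actual to expected arrival rate per packet, `BiquadLowPass(0.01).step(ratio)` returns a smoothed ratio whose deviation from 1.0 would suggest a fast or slow clock. A lower cutoff averages more aggressively but reacts more slowly.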

This approach has two problems:

  • First, while the [apparent] rate of incoming audio is related to the rate at which Shairport's computer's clock is running, it is not necessarily related to the rate at which the audio card or output device actually consumes the data, as the device may be running on its own independent clock. So even if you get the compensation exactly 'right' for Shairport's computer's clock, it's still useless. This undermines the viability of the approach, IMHO.
  • A less serious, but still bothersome, issue is that the 'slow' or 'fast' signal derived from the rate of incoming audio is multiplied by a control factor to govern how many frames to delete or insert. If the factor is too high, it overcompensates and you get less-than-fully-faithful audio; if too low, it undercompensates and you lose synchronisation. The real kicker is: How do you set the value of the control factor?
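Both problems can be seen in a toy model. The hypothetical `simulate` function below applies a proportional correction (control factor × apparent drift, truncated to whole frames) to each 352-frame packet and reports how much timing error builds up. It assumes drifts expressed in parts per million and is an illustration only, not Shairport 1.0's code. It also models the first problem: `true_drift_ppm` (the output device's own clock) may differ from `measured_drift_ppm` (what the incoming data suggests).

```python
from __future__ import annotations

FRAMES_PER_PACKET = 352  # packet size mentioned later in the text


def simulate(control_factor: float, measured_drift_ppm: float,
             true_drift_ppm: float, packets: int) -> tuple[float, int]:
    """Toy model of proportional drift correction (hypothetical, for illustration).

    Returns (drift in frames that the correction has not removed,
             total number of frames inserted or deleted).
    """
    error = 0.0    # accumulated uncorrected drift, in frames
    pending = 0.0  # fractional correction not yet applied (only whole frames can be dropped)
    applied = 0    # total number of frames inserted or deleted
    for _ in range(packets):
        # How far the output device really drifts during this packet
        error += true_drift_ppm * 1e-6 * FRAMES_PER_PACKET
        # What the controller decides to correct: control factor x apparent drift
        pending += control_factor * measured_drift_ppm * 1e-6 * FRAMES_PER_PACKET
        whole = int(pending)  # truncate toward zero to get whole frames
        pending -= whole
        error -= whole
        applied += abs(whole)
    return error, applied
```

With a control factor of 1.0 and matching drifts, the residual error stays under one frame. Halve the factor and about half the drift goes uncorrected (roughly 176 frames after 10,000 packets at 100 ppm). If the measured drift is wrong, as in the first problem, even a "perfect" factor leaves the output drifting.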

Shairport Sync

Shairport Sync is a fork of Shairport 1.0 and builds on it as follows:

  • It gets extra timing information from the iTunes / iOS source. A much-simplified version of the NTP clock synchronisation protocol is used to keep Shairport Sync's local clock in sync with the source to well within a millisecond. All timings are made relative to that local clock.
  • As well as getting more accurate timing information from the source, Shairport Sync gets accurate timing information from the output device (via ALSA in Linux), so it can work out the exact time difference between incoming and outgoing audio.
  • A true mute facility is used if it is available, again requiring facilities provided by ALSA.
  • A fairly sophisticated volume control attenuation profile is calculated so that volume rises rapidly from very low levels until it reaches about -30 dB, after which it rises more slowly to 0 dB. This attenuation profile is applied to the hardware mixer if it provides decibel-based attenuation, and is also applied to the software attenuator, with a -96 dB to 0 dB range (i.e. a multiplier of approximately 1 to 65535). This is modelled on the attenuators described in http://tangentsoft.net/audio/atten.html [Accessed May 14, 2014]. Once more, this requires facilities provided only by ALSA.
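The first two points above can be sketched in Python under stated assumptions. The offset and round-trip formulas are the standard NTP ones. Picking the exchange with the shortest round trip is a common NTP-style heuristic assumed here, not necessarily what Shairport Sync does. `output_delay_frames` stands for the number of frames still queued in the output device ahead of the next one written, the kind of figure ALSA's `snd_pcm_delay()` reports. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

FRAMES_PER_SECOND = 44100  # 44,100 frames per second, as in the text


@dataclass
class TimingExchange:
    """One NTP-style request/response exchange; all times in seconds."""
    t0: float  # local clock: request sent
    t1: float  # source clock: request received
    t2: float  # source clock: reply sent
    t3: float  # local clock: reply received


def offset_and_delay(x: TimingExchange) -> tuple[float, float]:
    """Standard NTP estimates: offset = source clock minus local clock; delay = round trip."""
    offset = ((x.t1 - x.t0) + (x.t2 - x.t3)) / 2
    delay = (x.t3 - x.t0) - (x.t2 - x.t1)
    return offset, delay


def best_offset(exchanges: list[TimingExchange]) -> float:
    """Assumed heuristic: trust the exchange with the shortest round trip."""
    offset, _ = min((offset_and_delay(x) for x in exchanges), key=lambda od: od[1])
    return offset


def current_latency_frames(frame_timestamp: float, local_now: float,
                           output_delay_frames: int, offset: float) -> float:
    """Latency, in frames, of the next frame written to the output device.

    frame_timestamp: the frame's timestamp on the source clock (seconds).
    output_delay_frames: frames still queued ahead of it in the device.
    offset: source clock minus local clock, e.g. from best_offset().
    """
    emit_local = local_now + output_delay_frames / FRAMES_PER_SECOND
    emit_source = emit_local + offset  # convert local time to source-clock time
    return (emit_source - frame_timestamp) * FRAMES_PER_SECOND
```

Combining the source-clock offset with the device's queue depth lets the latency of each outgoing frame be computed directly, rather than inferred from how fast packets happen to arrive.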
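The attenuation profile in the last point can be illustrated with a hypothetical two-segment curve: a steep rise from -96 dB to a knee at -30 dB, then a gentler rise to 0 dB. Only the -96 dB floor, the -30 dB knee and the 1 to 65535 multiplier range come from the text. The knee's position along the slider (`KNEE_POSITION`) and the straight-line segments are assumptions, and Shairport Sync's real profile may be shaped differently.

```python
FLOOR_DB = -96.0      # bottom of the software attenuator range (from the text)
KNEE_DB = -30.0       # level above which the volume rises more slowly (from the text)
KNEE_POSITION = 0.25  # HYPOTHETICAL: slider fraction at which the knee is reached


def slider_to_db(position: float) -> float:
    """Map a volume-slider position in [0, 1] to attenuation in dB."""
    position = min(max(position, 0.0), 1.0)
    if position < KNEE_POSITION:
        # Steep segment: -96 dB up to -30 dB over the first part of the travel
        return FLOOR_DB + (KNEE_DB - FLOOR_DB) * position / KNEE_POSITION
    # Gentle segment: -30 dB up to 0 dB over the rest of the travel
    return KNEE_DB * (1.0 - (position - KNEE_POSITION) / (1.0 - KNEE_POSITION))


def db_to_multiplier(db: float) -> int:
    """Convert attenuation in dB to a 16-bit software gain, clamped to 1..65535."""
    return max(1, min(65535, round(65536 * 10 ** (db / 20))))


def apply_gain(sample: int, multiplier: int) -> int:
    """Scale one 16-bit sample; the right shift divides by 65536, rounding toward -infinity."""
    return (sample * multiplier) >> 16
```

Here `db_to_multiplier(slider_to_db(0.25))` gives 2072, i.e. -30 dB of software attenuation, while the bottom and top of the slider travel map to multipliers of 1 and 65535.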

All these changes required a fairly extensive rewrite of Shairport 1.0. The changes are so radical – especially the architecture of the player function – that IMHO they would be difficult to integrate back.

A new option is offered: latency. Latency is the exact time difference between a sound sample's timestamp, as determined by the source, and the time at which it is played, as measured against the source's clock. In effect, it is the exact time from when the sound sample is sent by the source to when it is emitted by the loudspeaker. It is given in frames, where there are 44,100 frames to the second.

Shairport Sync monitors the exact times at which frames of audio are sent to the loudspeaker and compares them with their timestamps, making adjustments to the flow of frames to keep the latency as close to its nominal value as possible. At present the limits are ± 88 frames, i.e. plus or minus two milliseconds. For reference, sound travels at about 340 m/s, so it could travel about 68 cm (a little over 2 feet) in 2 milliseconds.

If the latency is greater than specified, frames are pseudorandomly deleted; if it is less, interpolated frames are pseudorandomly inserted. Alternatively, with SoX support, packets containing 352 frames may be resampled to contain one more or one less frame. These "corrections" are almost inaudible, and typically average from a few tens up to about 100 parts per million (ppm). One hundred ppm is about 8.6 seconds per day, and if left uncorrected, a drift of this amount would be distinctly audible after a few minutes of play.
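The correction step might look something like the following minimal sketch. It assumes that one frame is added or removed per 352-frame packet whenever the latency error falls outside the ±88-frame band, that the insertion or deletion point is chosen pseudorandomly, and that an inserted frame is the average of its neighbours. The function names are hypothetical, and the SoX resampling alternative is not shown. The two helper functions check the arithmetic quoted above.

```python
from __future__ import annotations

import random

FRAMES_PER_SECOND = 44100
TOLERANCE_FRAMES = 88  # ±88 frames (from the text)
SECONDS_PER_DAY = 86_400


def frames_to_ms(frames: float) -> float:
    """Convert a number of frames to milliseconds at 44,100 frames per second."""
    return 1000.0 * frames / FRAMES_PER_SECOND


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Express a clock drift in parts per million as seconds gained or lost per day."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def correct_packet(frames: list[tuple[int, int]], latency_error: float,
                   rng: random.Random) -> list[tuple[int, int]]:
    """Return a corrected copy of one packet of (left, right) frames (hypothetical sketch).

    latency_error is measured latency minus nominal latency, in frames.
    """
    out = list(frames)
    if abs(latency_error) <= TOLERANCE_FRAMES or len(out) < 2:
        return out  # within tolerance: play the packet unchanged
    i = rng.randrange(1, len(out))  # pseudorandom position, never the first frame
    if latency_error > 0:
        # Latency too long: drop one frame so later frames play one frame earlier
        del out[i]
    else:
        # Latency too short: insert the average of the two neighbouring frames
        left = (out[i - 1][0] + out[i][0]) // 2
        right = (out[i - 1][1] + out[i][1]) // 2
        out.insert(i, (left, right))
    return out
```

For example, `frames_to_ms(88)` is about 1.995 ms and `ppm_to_seconds_per_day(100)` is 8.64 s. Correcting at most one frame per packet keeps each change tiny, which is consistent with corrections of tens of ppm being almost inaudible.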