Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Port replayer to Linux #49

Draft
wants to merge 32 commits into
base: master
Choose a base branch
from
Draft

Conversation

PoroCYon
Copy link
Contributor

@PoroCYon PoroCYon commented Aug 31, 2020

This one is absolutely not ready to be merged yet, but, I want to discuss a few things as I go along:

  • First of all, for SongRenderer and CriticalSection, I've been using pthreads and std::atomic instead of the winnt.h stuff, would it be a good idea to use std::atomic and std::thread on all platforms, or should I leave it as-is? Or might having the option to not use any multithreading at all whatsoever be desirable?
    • Looks like std::thread will be quite a bit of work to use as its API isn't as elaborated as windows/pthreads. std::atomic looks promising, though.
  • Secondly, I've currently added GCC inline asm versions of the maths helper functions, and I'm not too satisfied about how the pow stuff works, would reworking this be a good idea? (cf. switch out fpuPow variants for fpuExp2 #37)
  • Thirdly: although I haven't reached this stage at all yet, would adding Vorbis or Opus or Speex sample support to Thunder and Specimen be a good idea? They aren't very bad at low bitrates, the decoding works a lot better than GSM on Linux, and the libraries for decoding these are installed by default on Ubuntu desktop.
    • I have reached this stage now.
  • Fourthly: what about Adultery/gm.dls? Should a similar source of samples be used that's available on default Ubuntu installs? Should it be made optionally included and use a user-provided gm.dls, and not included in compo versions of WaveSabre?

P.S. if needed, I can separate the non-Linux-specific patches (eg. using std::atomic, reworking the maths routines, etc.) into separate PRs, then rebase this one on them.


Roadmap:

  • Fix inline assembly maths functions
  • Get the WAV writer to work
  • Get the prerenderer player to work
  • Get the realtime player to work
  • Get multithreaded rendering to work
  • Synchronous/explicitely NON-threaded version
  • ALSA (aplay(1)) backend
  • SDL2_Audio backend
  • Get Specimen and Thunder to work (currently disabled) (works, sortof)
  • Get Adultery to work (currently disabled) (works, sortof)
  • Add Opus/Linux-specific sample device (very Specimen-like)
  • Add Linux-specific system sample device (very Adultery-like)
  • More Linux-specific devices? (something libfftw? sample 'generator' using nonsense libopus packets, like in scaleMARK?)
  • Make the output binary small
  • Non-sucky argument parsing in the standalone player

@emoon
Copy link

emoon commented Aug 31, 2020

I would really take a look at size generated when including stuff such as std::atomic and std::thread as it may add extra overhead that isn't really needed.

@PoroCYon
Copy link
Contributor Author

Very good point, but, I can't try this myself (I don't have any Windows machines myself), so, would anyone else be willing to test this?

(std::atomic is compiled directly to cmpxchg/lock xadd instructions, from what I've seen in gdb, but, I don't know about std::thread of course.)

@PoroCYon
Copy link
Contributor Author

Hm, turns out using std::thread will need a nontrivial amount of work (w. signalling between threads) to have it working properly, as it can't shut down threads forcefully by itself.

@PoroCYon
Copy link
Contributor Author

PoroCYon commented Sep 2, 2020

Seems that only FFmpeg and libgsm (the latter kinda) support the GSM 6.10 codec WaveSabre uses, and neither are available by default on Ubuntu. Opus, Vorbis and Speex are available, however. So I'm thinking of doing the following:

  • Optionally enable GSM 6.10 decoding using FFmpeg (disabled in compo builds, always enabled on Windows using ACM)
  • Add Opus (and maybe Vorbis and Speex) support to the device and VST exporter code
    • IME it's possible to lose a few hundered bytes/sample without losing (much) in quality, with Ogg container overhead, but without intro packer. (Just comparing two files on my filesystem that are playable by FFmpeg.)
    • Sample format can probably differentiated by using different values for the first two bytes in the sample data, which is the WAVE format number for GSM anyway.
    • Opus encoding and decoding seems to be supported by Windows' Media Foundataion API as well, as of Windows 10. As it seems to perform better (size-wise) than Vorbis and Speex as well, it probably makes sense to not use the latter two.
      • So, optionally enable Opus encoding/decoding on Windows using MF, and have libopus decoding enabled by default on Linux.
      • Maybe also provide a libopus encoding fallback for earlier Windows versions?

Thoughts?

@yupferris
Copy link
Member

yupferris commented Sep 2, 2020

Thanks for looking into this. Some thoughts:

I believe some of this work may overlap some of the work that's already been done/is being done by @kusma / @alkama / others. This topic comes up sometimes in the #development channel in our slack, usually under the broader umbrella of supporting both mac and linux to some degree. I got the impression from the demoscene discord that you weren't interested in slack, which is unfortunate, as it would be nice to try and coordinate these efforts. It's not crucial, though; I'm happy to accept helpful changes either way.

When it comes to linux support in general (really any platform that's not Windows), I'm actually a bit hesitant, because neither @djh0ffman nor I currently use these platforms, nor do we plan to in the future, so we cannot test them or ensure they're always maintained. Luckily we have some CI pipelines set up as well as interested folks that could take up this burden, but we'll have to communicate that this will likely be a lower "tier" in terms of support, and might need some TLC once in awhile. That said, the core parts of WS don't really change that often anymore, so maybe it's not that big of a deal. When I see a large diff like this with lots of conditional compilation, though, I must admit I feel a bit scared/concerned.

It's absolutely paramount that @djh0ffman (especially!) and I can work effectively making new devices and releasing intros. This is the #1 priority for the project still - that logicoma as a group is able to use our tool the way we intend to. Everything else is secondary. I think it's unlikely, but if a situation ever arose where we made a decision to merge a piece of code that broke x64/linux/mac support, even temporarily, we should not hesitate to do it if it serves one of our intros. We try to keep things backwards compatible and have generally done a good job of this, so it's not like we don't care about other users, but I feel this is an important reminder nonetheless. We will rely solely on the community to maintain non-Windows (really anything that's non-our-intros) - so my hope is that by contributing this, that you understand this and are open to being available for future maintenance if possible.

Anyways, what I see mostly looks like it's headed in the right direction, so thanks again for that. It will definitely be more consumable/easier to review and consider split out as separate PR's, but I totally get that it's easier to paint in broad strokes first, so we can do that later.

My thoughts so far on specific stuff:

I'm fine with inline asm where we need it even though it's clearly not ideal; historically this has only affected the win32 version of the player and we just use standard stuff for non-size-optimized builds (eg in the VST plugs, which previous cross-platform work has been mostly focused on). Since this work focuses mostly on the replayer, which adds complication in two ways - one being that now we need to support GCC, and the other being that we may need to support x64 as well. My gut feeling is that since this only really covers a couple functions it shouldn't be that too bad and I feel that small 2x2 matrix is manageable (technically 2x3 if we still want to fall back to standard stuff for some builds - which I'd actually like not to do if it can be avoided, again to reduce maintenance burden). However I'd like to see all impls match in the end as much as they can, especially if we rework the code to do pow better (which I'm in favor of btw, see my comments on #37). I also like the idea of using inline asm over the full body, it would be nice if that change were done for the windows builds too. If it makes things easier, we could consider only adding support for x86 first or something.

I'm concerned about the threading stuff. We know the current code ends up very small and efficient on Windows, and I assume you're at least able to do some kind of packing test on linux to see that that's also the case there, but if we choose to use std stuff we need to make sure it's still small and fast on Windows. If it is, and doesn't cause linker trouble etc, I wouldn't mind doing something cross-platform as the only impl, but this needs to be checked. I'd certainly prefer if only the threading-specific changes could be split out into a separate PR so those could be looked at/tested in isolation. I understand this might be difficult, so perhaps we can split out/land other stuff in the meantime and work towards that goal if possible.

GSM6.10 is a shit codec. Since Opus is supported on Windows 10, I'd much rather see Specimen deprecated in favor of a new device (eg a Specimen 2) that uses Opus instead. I believe this is the best path forward; I don't much like the idea of supporting multiple codecs in our devices if we can avoid it.

Similarly, Thunder is already a deprecated device; while I don't think we've formalized the deprecation process beyond letting the linker optimize dead code and some preliminary work to not build deprecated VSTs, I'd much rather not support it if we decide to change sample formats. I don't think it's worth getting this to work on other platforms either as it just increases maintenance burden, so I'd rather not have that in the main repo.

I've also experimented a bit with having our own custom codec which would obviously be supported on all platforms but this work is too early to count on at this point; it's just an experiment that may never work out. We'll see.

For Adultery, I'm actually leaning towards dropping it on platforms other than Windows. Perhaps an alternative device that's a linux-only replacement could make sense. In general I like the idea of having platform-specific devices where it makes sense; my hope with WS in general is not that folks would rely only on the standard devices, but that they'd add their own crazy stuff (on branches). There's loads of untapped potential here, and I imagine that's especially true for Linux!

I don't like the idea of conditionally supporting a non-threaded player. I think this just makes things more confusing with the conditionals, and most tracks produced nowadays are very CPU-heavy; this is not going to change anytime soon, so leaning into a threaded player is the best way forward.

@PoroCYon
Copy link
Contributor Author

PoroCYon commented Sep 2, 2020

I believe some of this work may overlap some […]

I see. My wording on the Demoscene discord might've been a bit blunt, but, I'd rather keep the number of accounts I have on random websites to a minimum, especially if it's owned by an Evil™ corporation. But of course this makes coordination harder. (Initially this was just for asking for a test track, but, now it's for a different matter.)

When it comes to linux support in general […]
It's absolutely paramount that […]

That's totally fair and understandable, and it's usually been like that with many projects in my experience. I'm used to cleaning up the mess to make things run on Linux again :P

I'm fine with inline asm where we need it […]

x64 support will be needed because on Linux, you typically don't have 32-bit libraries anymore, so either you have to use 64-bit code, or circumvent the library requirement somehow.

For the Pow2/Pow4 stuff, I think it's more efficient to just always implement it as an inline function that does just return x*x; or x=x*x; return x*x;, as compilers are likely able to optimize that in the best way possible.

I don't like the idea of conditionally[…]

In my own (4k) intro framework, the graphics code runs as a 64-bit process, while it forks into a 32-bit standalone program that produces audio and then sends everything to aplay (which is a tiny terminal program that just plays raw samples from a pipe), but, that's a bit hacky of course. But it's also why I initially made a multithreading-less option as well, as it runs in a completely separate process. Although this is still a reflection of my current 4k tools, I don't know how realistic this would be in practice, especially for 64ks.

(Furthermore, smol doesn't support TLS and some pthread init stuff out of the box yet (because nobody needs that in a 4k context, for which smol is made), but that's something I should fix on my end.)

So yeah, I guess I could remove this, at least for anything that isn't my private branch.

I'm concerned about the threading stuff […]

You can scrap the std::thread stuff already, it's not that viable on first sight due to its API anyway. std::atomic does seem interesting to look into, though, as it just compiles into the required assembly instructions without emitting any function calls, idk how that works with Windows' InterlockedCompareExchange etc. But someone (who is empathically not me) needs to test this first, of course.

GSM6.10 is a shit codec. Since Opus is […]

That's indeed not a bad idea, however, I'd like to at least be able to replay some tracks that aren't made specifically for Linux (or render them to a WAV), outside of a compo context. But that's not the main goal here of course.

Also, how many new features would there be needed in a Specimen 2, aside from Opus (and a bitrate slider maybe)? (I have no clue about the answer to this question, I'm definitely not a musician, at all.) If it's not much, then it might be easier to just keep using Specimen. But then there's the argument of still including the legacy GSM codepath, of course.

Similarly, Thunder is already a deprecated device […]

So that's why Renoise crashed when Molive tried to make a test track using it for me :P . I simply took Thunder first because its code looked a little simpler, but moved to Specimen for that reason.

For Adultery, I'm actually leaning towards dropping […]

A platform-specific device definitely makes sense here (along with maybe a libfftw-based effect or something. But, anyway,), but, again, I'd like to have some sort of playback functionaly for Windows tracks in a non-compo context, as a nice-to-have thing. (Eg. gm.dls is sometimes present in a wineprefix, so it can be stolen from there as a fallback or something.)

let's see if I messed up the Windows build
because of course it did
@emoon
Copy link

emoon commented Sep 3, 2020

Just going to chip in a bit here. What I want to do is to add support for Wavesabre in my music player so this work being done is essential for me to do so, but at the same time I fully understand what Ferris saying that hacking can be done on the synth as required for shipping intro which totally makes sense.

My use case is really at the bottom of the list of what Wavesabre targets and if that can't be achived that isn't the end of the world but I think it just would be cool. A reasonable person would say: "Why not use the generated FLAC/WAV/MP3 instead?" I would say: "Now where is the fun in that?" :) But really one can do way more cool visualizations having the original data (such as tracker style playback, pianos, scopes for all channels, etc)

And yes, it will eat lots of CPU but to me that is fine. If someone can't use it because of CPU that's how it is, and the rendered version can be used instead.

Now as backwards compability has been mentioned above that is of course something I care about for my use-case otherwise I will have to deal with several version of the code-base to play specific songs and while doable it becomes kinda annoying so if it's possible to keep that I would be happy :)

@PoroCYon
Copy link
Contributor Author

PoroCYon commented Sep 9, 2020

Ok so I've sent out #50 , #51 , #52 , #53 .

After this, I'll send a PR to disable all the Windows-only stuff on not-Windows, but this depends on #52 , so I'm first waiting for that one to be merged.

Then, I'd add the patches for pthread-based threading and an SDL2/aplay backend/renderthread. However, I'm not exactly sure in which order I'd send these, as these two are more or less interdependent. So what would be the preferred approach here? Send them chunk-by-chunk (and the first one then having a broken non-Windows compile), or both in one PR, or maybe first add the SDL2/aplay renderthread with only the single-core WAV writer enabled on non-Windows, and in a following PR, add the pthread stuff which reenables the prerender and realtime players, and multithreadd WAV writing? Either way, after those PRs, I'll send the ffmpeg and wineprefix fallbacks for Thunder/Specimen and Adultery.

(I'll be keeping this branch/PR for experimentation and WIP and misc stuff, and I'll rebase it on the current master every so often.)

@@ -184,13 +188,23 @@ namespace WaveSabrePlayerLib
TrackRenderState *trackRenderStates;

int numRenderThreads;
#if defined(WIN32) || defined(_WIN32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need both of these. _WIN32 should always be set by Windows compilers, but WIN32 gets set by Windows.h IIRC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, alright, thanks, didn't know about that.

(Though, I'm using tihs branch for random testing, please check out the 4 PRs I linked to above for code that's actually mergeable.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you mean... I am answering to one of the PRs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that #50 .. #53 are mergeable, not #49 (this one) (for now). I opened this PR for some discussion + WIP stuff, I'm sending out clean(er) and mergeable patches one by one and will rebase this one on them once they're merged in.

, writerStarted(false)
#endif
{
#if !defined(HAVE_PTHREAD) || !HAVE_PTHREAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need both here. Just checking #if !HAVE_PTHREAD should be enough, I believe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious though. When would it be reasonable to have ALSA but not pthread?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

And, the current Linux tooling for shrinking down dyanmically linked executables doesn't support threading yet, but that's mostly because those were made for 4ks, I just need to look into it and add the required code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but that's not the same semantics as HAVE_FOO is commonly used for in build-systems. HAVE_FOO is to say if the system has support for something or not, USE_FOO is probably what you want then ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants