WIP: Port replayer to Linux #49

PoroCYon · 2020-08-31T17:51:41Z

This one is absolutely not ready to be merged yet, but, I want to discuss a few things as I go along:

First of all, for SongRenderer and CriticalSection, I've been using pthreads and std::atomic instead of the winnt.h stuff, would it be a good idea to use std::atomic and std::thread on all platforms, or should I leave it as-is? Or might having the option to not use any multithreading at all whatsoever be desirable?
- Looks like std::thread will be quite a bit of work to use as its API isn't as elaborated as windows/pthreads. std::atomic looks promising, though.
Secondly, I've currently added GCC inline asm versions of the maths helper functions, and I'm not too satisfied about how the pow stuff works, would reworking this be a good idea? (cf. switch out fpuPow variants for fpuExp2 #37)
Thirdly: ~~although I haven't reached this stage at all yet,~~ would adding Vorbis or Opus or Speex sample support to Thunder and Specimen be a good idea? They aren't very bad at low bitrates, the decoding works a lot better than GSM on Linux, and the libraries for decoding these are installed by default on Ubuntu desktop.
- I have reached this stage now.
Fourthly: what about Adultery/gm.dls? Should a similar source of samples be used that's available on default Ubuntu installs? Should it be made optionally included and use a user-provided gm.dls, and not included in compo versions of WaveSabre?

P.S. if needed, I can separate the non-Linux-specific patches (eg. using std::atomic, reworking the maths routines, etc.) into separate PRs, then rebase this one on them.

Roadmap:

emoon · 2020-08-31T17:54:47Z

I would really take a look at size generated when including stuff such as std::atomic and std::thread as it may add extra overhead that isn't really needed.

PoroCYon · 2020-08-31T17:57:12Z

Very good point, but, I can't try this myself (I don't have any Windows machines myself), so, would anyone else be willing to test this?

(std::atomic is compiled directly to cmpxchg/lock xadd instructions, from what I've seen in gdb, but, I don't know about std::thread of course.)

PoroCYon · 2020-08-31T23:11:00Z

Hm, turns out using std::thread will need a nontrivial amount of work (w. signalling between threads) to have it working properly, as it can't shut down threads forcefully by itself.

…olutely unusable)

…zations, fix error printing in the standalone segfaulting (bad ferris!)

… renderthreads (unused yet, but will be useful for piping to alsa + select(2)/poll(2))

PoroCYon · 2020-09-02T16:59:24Z

Seems that only FFmpeg and libgsm (the latter kinda) support the GSM 6.10 codec WaveSabre uses, and neither are available by default on Ubuntu. Opus, Vorbis and Speex are available, however. So I'm thinking of doing the following:

Optionally enable GSM 6.10 decoding using FFmpeg (disabled in compo builds, always enabled on Windows using ACM)
Add Opus (and maybe Vorbis and Speex) support to the device and VST exporter code
- IME it's possible to lose a few hundered bytes/sample without losing (much) in quality, with Ogg container overhead, but without intro packer. (Just comparing two files on my filesystem that are playable by FFmpeg.)
- Sample format can probably differentiated by using different values for the first two bytes in the sample data, which is the WAVE format number for GSM anyway.
- Opus encoding and decoding seems to be supported by Windows' Media Foundataion API as well, as of Windows 10. As it seems to perform better (size-wise) than Vorbis and Speex as well, it probably makes sense to not use the latter two.
  - So, optionally enable Opus encoding/decoding on Windows using MF, and have libopus decoding enabled by default on Linux.
  - Maybe also provide a libopus encoding fallback for earlier Windows versions?

Thoughts?

yupferris · 2020-09-02T19:03:33Z

Thanks for looking into this. Some thoughts:

I believe some of this work may overlap some of the work that's already been done/is being done by @kusma / @alkama / others. This topic comes up sometimes in the #development channel in our slack, usually under the broader umbrella of supporting both mac and linux to some degree. I got the impression from the demoscene discord that you weren't interested in slack, which is unfortunate, as it would be nice to try and coordinate these efforts. It's not crucial, though; I'm happy to accept helpful changes either way.

When it comes to linux support in general (really any platform that's not Windows), I'm actually a bit hesitant, because neither @djh0ffman nor I currently use these platforms, nor do we plan to in the future, so we cannot test them or ensure they're always maintained. Luckily we have some CI pipelines set up as well as interested folks that could take up this burden, but we'll have to communicate that this will likely be a lower "tier" in terms of support, and might need some TLC once in awhile. That said, the core parts of WS don't really change that often anymore, so maybe it's not that big of a deal. When I see a large diff like this with lots of conditional compilation, though, I must admit I feel a bit scared/concerned.

It's absolutely paramount that @djh0ffman (especially!) and I can work effectively making new devices and releasing intros. This is the #1 priority for the project still - that logicoma as a group is able to use our tool the way we intend to. Everything else is secondary. I think it's unlikely, but if a situation ever arose where we made a decision to merge a piece of code that broke x64/linux/mac support, even temporarily, we should not hesitate to do it if it serves one of our intros. We try to keep things backwards compatible and have generally done a good job of this, so it's not like we don't care about other users, but I feel this is an important reminder nonetheless. We will rely solely on the community to maintain non-Windows (really anything that's non-our-intros) - so my hope is that by contributing this, that you understand this and are open to being available for future maintenance if possible.

Anyways, what I see mostly looks like it's headed in the right direction, so thanks again for that. It will definitely be more consumable/easier to review and consider split out as separate PR's, but I totally get that it's easier to paint in broad strokes first, so we can do that later.

My thoughts so far on specific stuff:

I'm fine with inline asm where we need it even though it's clearly not ideal; historically this has only affected the win32 version of the player and we just use standard stuff for non-size-optimized builds (eg in the VST plugs, which previous cross-platform work has been mostly focused on). Since this work focuses mostly on the replayer, which adds complication in two ways - one being that now we need to support GCC, and the other being that we may need to support x64 as well. My gut feeling is that since this only really covers a couple functions it shouldn't be that too bad and I feel that small 2x2 matrix is manageable (technically 2x3 if we still want to fall back to standard stuff for some builds - which I'd actually like not to do if it can be avoided, again to reduce maintenance burden). However I'd like to see all impls match in the end as much as they can, especially if we rework the code to do pow better (which I'm in favor of btw, see my comments on #37). I also like the idea of using inline asm over the full body, it would be nice if that change were done for the windows builds too. If it makes things easier, we could consider only adding support for x86 first or something.

I'm concerned about the threading stuff. We know the current code ends up very small and efficient on Windows, and I assume you're at least able to do some kind of packing test on linux to see that that's also the case there, but if we choose to use std stuff we need to make sure it's still small and fast on Windows. If it is, and doesn't cause linker trouble etc, I wouldn't mind doing something cross-platform as the only impl, but this needs to be checked. I'd certainly prefer if only the threading-specific changes could be split out into a separate PR so those could be looked at/tested in isolation. I understand this might be difficult, so perhaps we can split out/land other stuff in the meantime and work towards that goal if possible.

GSM6.10 is a shit codec. Since Opus is supported on Windows 10, I'd much rather see Specimen deprecated in favor of a new device (eg a Specimen 2) that uses Opus instead. I believe this is the best path forward; I don't much like the idea of supporting multiple codecs in our devices if we can avoid it.

Similarly, Thunder is already a deprecated device; while I don't think we've formalized the deprecation process beyond letting the linker optimize dead code and some preliminary work to not build deprecated VSTs, I'd much rather not support it if we decide to change sample formats. I don't think it's worth getting this to work on other platforms either as it just increases maintenance burden, so I'd rather not have that in the main repo.

I've also experimented a bit with having our own custom codec which would obviously be supported on all platforms but this work is too early to count on at this point; it's just an experiment that may never work out. We'll see.

For Adultery, I'm actually leaning towards dropping it on platforms other than Windows. Perhaps an alternative device that's a linux-only replacement could make sense. In general I like the idea of having platform-specific devices where it makes sense; my hope with WS in general is not that folks would rely only on the standard devices, but that they'd add their own crazy stuff (on branches). There's loads of untapped potential here, and I imagine that's especially true for Linux!

I don't like the idea of conditionally supporting a non-threaded player. I think this just makes things more confusing with the conditionals, and most tracks produced nowadays are very CPU-heavy; this is not going to change anytime soon, so leaning into a threaded player is the best way forward.

PoroCYon · 2020-09-02T20:25:38Z

I believe some of this work may overlap some […]

I see. My wording on the Demoscene discord might've been a bit blunt, but, I'd rather keep the number of accounts I have on random websites to a minimum, especially if it's owned by an Evil™ corporation. But of course this makes coordination harder. (Initially this was just for asking for a test track, but, now it's for a different matter.)

When it comes to linux support in general […]
It's absolutely paramount that […]

That's totally fair and understandable, and it's usually been like that with many projects in my experience. I'm used to cleaning up the mess to make things run on Linux again :P

I'm fine with inline asm where we need it […]

x64 support will be needed because on Linux, you typically don't have 32-bit libraries anymore, so either you have to use 64-bit code, or circumvent the library requirement somehow.

For the Pow2/Pow4 stuff, I think it's more efficient to just always implement it as an inline function that does just return x*x; or x=x*x; return x*x;, as compilers are likely able to optimize that in the best way possible.

I don't like the idea of conditionally[…]

In my own (4k) intro framework, the graphics code runs as a 64-bit process, while it forks into a 32-bit standalone program that produces audio and then sends everything to aplay (which is a tiny terminal program that just plays raw samples from a pipe), but, that's a bit hacky of course. But it's also why I initially made a multithreading-less option as well, as it runs in a completely separate process. Although this is still a reflection of my current 4k tools, I don't know how realistic this would be in practice, especially for 64ks.

(Furthermore, smol doesn't support TLS and some pthread init stuff out of the box yet (because nobody needs that in a 4k context, for which smol is made), but that's something I should fix on my end.)

So yeah, I guess I could remove this, at least for anything that isn't my private branch.

I'm concerned about the threading stuff […]

You can scrap the std::thread stuff already, it's not that viable on first sight due to its API anyway. std::atomic does seem interesting to look into, though, as it just compiles into the required assembly instructions without emitting any function calls, idk how that works with Windows' InterlockedCompareExchange etc. But someone (who is empathically not me) needs to test this first, of course.

GSM6.10 is a shit codec. Since Opus is […]

That's indeed not a bad idea, however, I'd like to at least be able to replay some tracks that aren't made specifically for Linux (or render them to a WAV), outside of a compo context. But that's not the main goal here of course.

Also, how many new features would there be needed in a Specimen 2, aside from Opus (and a bitrate slider maybe)? (I have no clue about the answer to this question, I'm definitely not a musician, at all.) If it's not much, then it might be easier to just keep using Specimen. But then there's the argument of still including the legacy GSM codepath, of course.

Similarly, Thunder is already a deprecated device […]

So that's why Renoise crashed when Molive tried to make a test track using it for me :P . I simply took Thunder first because its code looked a little simpler, but moved to Specimen for that reason.

For Adultery, I'm actually leaning towards dropping […]

A platform-specific device definitely makes sense here (along with maybe a libfftw-based effect or something. But, anyway,), but, again, I'd like to have some sort of playback functionaly for Windows tracks in a non-compo context, as a nice-to-have thing. (Eg. gm.dls is sometimes present in a wineprefix, so it can be stolen from there as a fallback or something.)

let's see if I messed up the Windows build

because of course it did

…aders, not exactly good practice

I hope the VSTs are finally happy now

emoon · 2020-09-03T15:15:13Z

Just going to chip in a bit here. What I want to do is to add support for Wavesabre in my music player so this work being done is essential for me to do so, but at the same time I fully understand what Ferris saying that hacking can be done on the synth as required for shipping intro which totally makes sense.

My use case is really at the bottom of the list of what Wavesabre targets and if that can't be achived that isn't the end of the world but I think it just would be cool. A reasonable person would say: "Why not use the generated FLAC/WAV/MP3 instead?" I would say: "Now where is the fun in that?" :) But really one can do way more cool visualizations having the original data (such as tracker style playback, pianos, scopes for all channels, etc)

And yes, it will eat lots of CPU but to me that is fine. If someone can't use it because of CPU that's how it is, and the rendered version can be used instead.

Now as backwards compability has been mentioned above that is of course something I care about for my use-case otherwise I will have to deal with several version of the code-base to play specific songs and while doable it becomes kinda annoying so if it's possible to keep that I would be happy :)

PoroCYon · 2020-09-09T16:25:51Z

Ok so I've sent out #50 , #51 , #52 , #53 .

After this, I'll send a PR to disable all the Windows-only stuff on not-Windows, but this depends on #52 , so I'm first waiting for that one to be merged.

Then, I'd add the patches for pthread-based threading and an SDL2/aplay backend/renderthread. However, I'm not exactly sure in which order I'd send these, as these two are more or less interdependent. So what would be the preferred approach here? Send them chunk-by-chunk (and the first one then having a broken non-Windows compile), or both in one PR, or maybe first add the SDL2/aplay renderthread with only the single-core WAV writer enabled on non-Windows, and in a following PR, add the pthread stuff which reenables the prerender and realtime players, and multithreadd WAV writing? Either way, after those PRs, I'll send the ffmpeg and wineprefix fallbacks for Thunder/Specimen and Adultery.

(I'll be keeping this branch/PR for experimentation and WIP and misc stuff, and I'll rebase it on the current master every so often.)

kusma · 2020-09-10T11:41:32Z

WaveSabrePlayerLib/include/WaveSabrePlayerLib/SongRenderer.h

@@ -184,13 +188,23 @@ namespace WaveSabrePlayerLib
 		TrackRenderState *trackRenderStates;

 		int numRenderThreads;
+#if defined(WIN32) || defined(_WIN32)


You shouldn't need both of these. _WIN32 should always be set by Windows compilers, but WIN32 gets set by Windows.h IIRC.

Ah, alright, thanks, didn't know about that.

(Though, I'm using tihs branch for random testing, please check out the 4 PRs I linked to above for code that's actually mergeable.)

I'm not sure I understand what you mean... I am answering to one of the PRs...

I meant that #50 .. #53 are mergeable, not #49 (this one) (for now). I opened this PR for some discussion + WIP stuff, I'm sending out clean(er) and mergeable patches one by one and will rebase this one on them once they're merged in.

kusma · 2020-09-10T11:43:15Z

WaveSabrePlayerLib/src/AplayRenderThread.cpp

+		, writerStarted(false)
+#endif
+	{
+#if !defined(HAVE_PTHREAD) || !HAVE_PTHREAD


You don't need both here. Just checking #if !HAVE_PTHREAD should be enough, I believe.

I'm curious though. When would it be reasonable to have ALSA but not pthread?

Ok.

And, the current Linux tooling for shrinking down dyanmically linked executables doesn't support threading yet, but that's mostly because those were made for 4ks, I just need to look into it and add the required code.

Right, but that's not the same semantics as HAVE_FOO is commonly used for in build-systems. HAVE_FOO is to say if the system has support for something or not, USE_FOO is probably what you want then ;)

PoroCYon force-pushed the try2 branch from 0eaf80e to 09a4b64 Compare August 31, 2020 22:24

PoroCYon added 14 commits September 1, 2020 01:39

move stuff around so that it at least compiles on linux (but it's abs…

110a5ef

…olutely unusable)

add inline assembly stuff (from earlier porting tries), enable optimi…

2ce9ca6

…zations, fix error printing in the standalone segfaulting (bad ferris!)

factor out the actual platform-specific renderthread/player stuff

31885d1

remove need for -fpermissive flag, optional foreground work stuff for…

c068941

… renderthreads (unused yet, but will be useful for piping to alsa + select(2)/poll(2))

w00t the WAV writer works now

d781da5

prerender player on linux (untested)

941f035

fix cmake error

78a9e68

fix more cmake errors

8e3c7ad

aplay prerender works, tackling multithread now

284eec2

aplay somewhat working, debugging misc issues now

b8cb3c3

asm maths issues debugging ep.1

f320448

asm maths issues debugging ep.2

0b5ebc6

right, windows...

3cc35a7

properly fix all aplay RT stuff

8743101

PoroCYon force-pushed the try2 branch from c74d60e to 8743101 Compare August 31, 2020 23:41

PoroCYon added 6 commits September 1, 2020 02:17

wow, an argument parser that's actually helpful!

4831bfc

proper thread stopping

1ca8809

SDL backend

b1bfd75

synchronous/non-threaded option

cd1a918

fix windows cmake once more

b39a67a

fix assembly maths functions

a8061ee

PoroCYon added 2 commits September 3, 2020 03:41

factor out sample loading logic from Thunder and Specimen

27d5277

let's see if I messed up the Windows build

it broke the Windows bulid

38d1a95

because of course it did

PoroCYon added 6 commits September 3, 2020 03:49

psh, vst sources relying on the core to include all the dependency he…

56344f6

…aders, not exactly good practice

more of the same

bd872fb

and move some stuff back to the headers

586aea0

I hope the VSTs are finally happy now

oops please don't look at the previous commit

876f0cb

saving unfinished horror code for now

fabbe20

oh no it's starting to work

e31e269

PoroCYon added 4 commits September 3, 2020 20:39

forgot the extra wavefmtex info

f9a24b2

extremely ugly gm.dls stuff

cd412c1

and fix the bugs

0c17f77

stuff

c23e940

kusma reviewed Sep 10, 2020

View reviewed changes

This was referenced Sep 11, 2020

Cleaner/nicer to use command-line interface (+ help text) for the standalone player #53

Closed

Use std::atomic instead of the windows-only primitives for atomic operations in the song renderer #50

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Port replayer to Linux #49

WIP: Port replayer to Linux #49

PoroCYon commented Aug 31, 2020 •

edited

Loading

emoon commented Aug 31, 2020

PoroCYon commented Aug 31, 2020

PoroCYon commented Aug 31, 2020

PoroCYon commented Sep 2, 2020 •

edited

Loading

yupferris commented Sep 2, 2020 •

edited

Loading

PoroCYon commented Sep 2, 2020

emoon commented Sep 3, 2020

PoroCYon commented Sep 9, 2020 •

edited

Loading

kusma Sep 10, 2020

PoroCYon Sep 10, 2020

kusma Sep 10, 2020

PoroCYon Sep 10, 2020

kusma Sep 10, 2020

kusma Sep 10, 2020

PoroCYon Sep 10, 2020

kusma Sep 10, 2020

WIP: Port replayer to Linux #49

Are you sure you want to change the base?

WIP: Port replayer to Linux #49

Conversation

PoroCYon commented Aug 31, 2020 • edited Loading

emoon commented Aug 31, 2020

PoroCYon commented Aug 31, 2020

PoroCYon commented Aug 31, 2020

PoroCYon commented Sep 2, 2020 • edited Loading

yupferris commented Sep 2, 2020 • edited Loading

PoroCYon commented Sep 2, 2020

emoon commented Sep 3, 2020

PoroCYon commented Sep 9, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PoroCYon commented Aug 31, 2020 •

edited

Loading

PoroCYon commented Sep 2, 2020 •

edited

Loading

yupferris commented Sep 2, 2020 •

edited

Loading

PoroCYon commented Sep 9, 2020 •

edited

Loading