Universe Generation Crash / Bad Version String when using Slackware 14.1 Custom Build Script #323

Closed
ghost opened this issue Aug 31, 2015 · 99 comments
Labels
category:bug The Issue/PR describes or solves a perceived malfunction within the game.

Comments

@ghost

ghost commented Aug 31, 2015

freeorion 0.4.5-RC1

The game doesn't work.

I start it by typing freeorion in command line. It launches, gets to the main menu. I select "Quick Start". After that the Messages window prints a "Creating AI Clients" message, then a "Generating Universe" message and after that an error window saying "The connection to the server has been lost".

Screenshot

@geoffthemedio
Member

What OS are you using? The version string / application title are oddly broken.

Please clear the contents of the directory described here:

http://freeorion.org/index.php/Config.xml

Then attempt to start a game with 1 AI and otherwise default settings.

After it crashes, go to the same directory and locate all the .log files, and post them somewhere (eg. pastebin) and link to them in a reply.

@ghost
Author

ghost commented Aug 31, 2015

I am using Slackware 14.1 x86_64 'stable'.

I deleted the folder ~/.freeorion and restarted the game, but the problem is the same.

These are the logs:
AI_X logs
Config.xml
freeorion.log
freeoriond.log

@geoffthemedio
Member

How did you build FreeOrion? Given the broken version string in the screenshot, you might not have run CMake properly.

Aside from that, much of the galaxy setup seems to have run OK, but it ends suddenly after adding starting buildings.

@ghost
Author

ghost commented Aug 31, 2015

I maintain this game for Slackware.

This is the build script

@geoffthemedio
Member

How are you running FreeOrion? From what directory?

The crash appears to be happening when the Python script setting up the player homeworlds tries to load "starting_buildings.txt" from the resource directory. Given that you have weird errors in the version string, I'm guessing you haven't built and installed FreeOrion the way it expects, so it can't find that file.
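
One quick way to test that guess is to check whether the file is actually present and readable in the resource directory. The sketch below is only an illustration: the function name `check_resource` is made up here, and the resource directory has to be supplied by hand, since nothing in this thread says where this particular build installs it.

```sh
# Hypothetical helper (not part of FreeOrion): report whether a content
# file such as starting_buildings.txt is readable in a resource directory.
# Usage: check_resource <resource_dir> <file_name>
check_resource() {
    if [ -r "$1/$2" ]; then
        echo "found: $1/$2"
    else
        echo "missing or unreadable: $1/$2" >&2
        return 1
    fi
}
```

Calling `check_resource "$RES_DIR" starting_buildings.txt`, with `RES_DIR` set to the installed resource directory, would confirm or rule out a missing file.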

@geoffthemedio
Member

Is that build script running CMake? If not, you might need to modify it to do everything that the CMake script is doing, including putting content files where they're expected.

Alternatively, adding the resource directory to the path so the Python scripts can find "starting_buildings.txt" and similar files might help. Apparently, for you, the in-Python path resolution isn't working the same way as the C++ equivalents.

@ghost
Author

ghost commented Aug 31, 2015

What solution would you suggest?

@Vezzra Vezzra added the category:bug The Issue/PR describes or solves a perceived malfunction within the game. label Aug 31, 2015
@geoffthemedio
Member

Use CMake to build and install instead of whatever that Slackware-specific script is doing.

Or try adding the resources directory to the system path. That might be the easiest / simplest, though might just reveal another related problem given the odd installation.

Or wait for someone else who knows more about the CMake build system and how it works on Linux or Slackware in particular to comment.

That all said, there should be some error reporting if that string list load fails, and I don't know why you wouldn't see that in the server log file, so there might be something else going on.

@geoffthemedio geoffthemedio changed the title The connection to the server has been lost Universe Generation Crash / Bad Version String when using Slackware Custom Build Script Aug 31, 2015
@Vezzra
Member

Vezzra commented Aug 31, 2015

Ok, a few things:

  • @dslackw: Don't use the source tarball GitHub creates automatically. For Linux distros we provide a manually created source tarball which contains an already correctly generated Version.cpp. The one for RC1 is FreeOrion_v0.4.5-RC1_2015-08-24.5fc20ff_Source.tar.gz (you may want to wait a few hours: this evening I will produce RC2, and I suggest trying again with that). When you use these source tarballs, you can remove the lines from the build script which copy util/Version.cpp.in to util/Version.cpp (or rather, you must; otherwise the pre-generated Version.cpp is overwritten with the template file!).
  • As far as I can glance from the build script, it definitely calls cmake, so the freeorion build scripts should be executed correctly.
  • I doubt that the issue here is that the Python script can't find and open this file. That should produce a respective error message in the log, not a crash like this. The problem is, we can't be sure that this operation is actually the one which causes the crash, because in the past there have been cases where some portion of the log file had been lost in case of such a crash. I know we use another logging library now, but still, I don't trust these things entirely. One way to check that is to try to run FO several times and examine freeoriond.log afterwards. If the last line in the log is always the same, that would indicate that indeed this is the operation where the crash occurs. If it varies, well, we've got a problem...

Well, whatever, first thing I'd suggest doing now is trying again with RC2 which is going to be released today, and this time using the manually created source tarball. Second, if FO still crashes (which is probably to be expected), launch it several times, check freeoriond.log if the last lines are always the same or if they vary, and report back.
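
The repeated-run check can be scripted. A minimal sketch, assuming each run's freeoriond.log has been copied aside under a name such as run1.log, run2.log and so on (hypothetical names), and using a made-up helper `last_lines_agree`:

```sh
# Hypothetical helper: succeed only if every given log file ends with the
# same final line (e.g. saved copies of freeoriond.log from several runs).
last_lines_agree() {
    # Print the last line of each file on its own line, then count the
    # distinct values; exactly one distinct value means the runs agree.
    distinct=$(
        for f in "$@"; do
            printf '%s\n' "$(tail -n 1 "$f")"
        done | sort -u | wc -l | tr -d ' '
    )
    [ "$distinct" -eq 1 ]
}
```

For example, `last_lines_agree run1.log run2.log run3.log && echo "crash point looks stable"` would print the message only when all three runs stopped at the same log line.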

@ghost
Author

ghost commented Aug 31, 2015

Ok... Thank you,
I will wait for the RC2 release and try to build again.

@geoffthemedio
Member

Boost::Log is being set to auto_flush = true which should result in logs being written immediately. ("should")

Another thing to try is adding
print "Loaded and added buildings"

to line 427 of empires.py to see if it crashes before or after finishing adding the buildings.
https://github.com/freeorion/freeorion/blob/master/default/python/universe_generation/empires.py#L427

@Vezzra
Member

Vezzra commented Aug 31, 2015

Hm, I think it would make more sense to add

print "...adding building", building

between line 425 and 426, as part of the loop, like this:

    for building in load_string_list(os.path.join(fo.get_resource_dir(), "starting_buildings.txt")):
        print "...adding building", building
        fo.create_building(building, homeworld, empire)

because I don't think adding a print statement after the loop has finished will help much - after all, there is a print statement on the next line that apparently didn't get executed, which already indicates that the loop never finished.

Adding the print statement within the loop will give us some more precise info on where exactly the crash happens - if the print statement never executes, then it's the load_string_list call that fails, otherwise it's the fo.create_building call, and we'll even get to know at which building this call fails.

@Vezzra
Member

Vezzra commented Aug 31, 2015

RC2 is up now.

@ghost
Author

ghost commented Aug 31, 2015

I tried to build without the rename step mv util/Version.cpp.in util/Version.cpp:

http://pastebin.com/gKRsDhRg

@ghost
Author

ghost commented Aug 31, 2015

Now I am trying to build from the SourceForge tarball: https://sourceforge.net/projects/freeorion/files/FreeOrion/FreeOrion%20Version%200.4.5/

@ghost
Author

ghost commented Aug 31, 2015

It builds normally from the SourceForge tarball, but the error is the same: "The connection to the server has been lost".
Screenshot

@geoffthemedio
Member

Wouldn't have expected otherwise. See above suggestions to try adding logging to the building adding part of the Python universe generation.

@ghost
Author

ghost commented Aug 31, 2015

freeoriond.log

@geoffthemedio
Member

What did you change to produce the new log?

@ghost
Author

ghost commented Aug 31, 2015

After building the RC2 release, I deleted the directory ~/.freeorion and ran FO about 7 times.

@ghost
Author

ghost commented Aug 31, 2015

This is the latest FO configuration: http://pastebin.com/5wpS7np8

@geoffthemedio
Member

Please try adding the logging line suggested by Vezzra above, to narrow down the location of the crash.

@Vezzra
Member

Vezzra commented Sep 1, 2015

@dslackw, I produced a special source tarball with the additional logging commands, you can download it here (please note: this link only works for a week).

Try again with that source tarball and post the contents of freeoriond.log after the crash.

@ghost
Author

ghost commented Sep 1, 2015

Thank you, I will try it now...

@ghost
Author

ghost commented Sep 1, 2015

@Vezzra
Member

Vezzra commented Sep 1, 2015

Apparently you didn't use the source tarball I linked to (the server log contains error messages that are caused by a bug that has been fixed in the version of this source tarball), and also didn't remove the lines from your build script that copy util/Version.cpp.in to util/Version.cpp.

Please, try again and use this source tarball: FreeOrion_2015-09-01.e3ab1c5_Source.tar.gz. This is a special tarball with additional logging commands, which are not contained in the tarballs you can download from github or sourceforge. You have to use the link I provided.

And please also remove these lines from your build script:

# Fix cmake target for version file
mv util/Version.cpp.in util/Version.cpp

Otherwise the version string isn't correctly set, and we can't verify if the log files you provided really have been generated by the correct build.
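
Before rebuilding, it may be worth checking mechanically that the rename step is really gone from the build script. A minimal sketch, assuming the script is a plain shell file; the helper name `has_version_rename` is made up for this example:

```sh
# Hypothetical helper: succeed if the given build script still contains an
# active (uncommented) line renaming util/Version.cpp.in to util/Version.cpp.
has_version_rename() {
    grep -Eq '^[[:space:]]*mv[[:space:]]+util/Version\.cpp\.in[[:space:]]+util/Version\.cpp' "$1"
}
```

Something like `has_version_rename freeorion.SlackBuild && echo "remove the mv line first"` would then flag a script that still overwrites the pre-generated file.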

@ghost
Author

ghost commented Sep 1, 2015

I did use it, maybe something went wrong... no problem, I will try again...

@ghost
Author

ghost commented Sep 1, 2015

I removed the /build folder, the source code and the package, then downloaded and built again. The results:

freeorion.log
freeoriond.log

@geoffthemedio
Member

try removing the line
description = "@1@"
?

@pitchforks
Contributor

@geoffthemedio vanilla shared_macros.txt without this line still crashes. But as said above, freeorion does not crash with this version (it's version 2 above) of the file. And this version has this line.

@geoffthemedio
Member

I know what you wrote above, but it doesn't make sense that this macro definition would cause problems just with parsing buildings.txt and only with the following macro also present. Particularly so since the DESCRIPTION_EFFECTSGROUP_MACRO macro isn't even used in buildings.txt or any of the files it includes. More so when nobody else has reported this problem except when also building/running on Slackware.

So, I'm trying to figure out if there's something else going on, like the fully-included buildings.txt being too long, or something unusual about the identified macros being a source of the problem, like the @1@ line.

I would again/still like to know what happens with a newer version of Boost.

@geoffthemedio
Member

Any effect from removing the comment starting with // This Effect is ?

@pitchforks
Contributor

@geoffthemedio

like the fully-included buildings.txt being too long

Do you mean the content of buildings.txt itself plus the content of the files it includes at the bottom? If so, I can see your point. Though, since I don't understand the inner workings of the parsing process: shouldn't content that is too big cause out-of-memory errors rather than segfaults? I'm not noticing any excessive memory usage when running freeorion the regular way. I did notice excessive memory usage when I was running freeoriond with gdb, but I'm not sure that's related; I suppose it happened because the debug symbols enabled in all freeorion code were generating lots of additional backtrace details?

I would again/still like to know what happens with a newer version of Boost.

Perhaps @dslackw can repeat all these tests in his Slackware -current instance; I can only do the tests for 14.1 and Boost 1.54.

Any effect from removing the comment starting with // This Effect is ?

vanilla shared_macros.txt with just this comment removed still produces a negative result

@Vezzra
Member

Vezzra commented Sep 11, 2015

Sorry for reporting back only now, but I've been trying to install a Slackware 14.1 system in a VM and build and run FO there. Me being the Linux noob I am, that took some trial and error, but right now FO is compiling (hopefully it will complete successfully, that remains to be seen). @dslackw, @pitchforks, assuming I get the same crashes as you, I might need some assistance/instructions with installing a newer Boost version.

@Vezzra
Member

Vezzra commented Sep 11, 2015

Good and bad news. Good news: FO built successfully. Bad news: when I launch FO, I get the following error messages on the command line:

ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
ALSA lib conf.c:4248:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
ALSA lib conf.c:4248:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
ALSA lib conf.c:4248:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4727:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM default
AL lib: (EE) ALCplaybackAlsa_open: Could not open playback device 'default': No such file or directory
Segmentation fault

I guess that might have to do with running inside a VM and not having sound? How can I work around that?

@geoffthemedio
Member

try blanking buildings.txt; perhaps the segfault is the topic of this issue, and the AL lib issues are unrelated?

@Vezzra
Member

Vezzra commented Sep 11, 2015

Oh, sorry, these error messages have nothing to do with the issue itself. They occur immediately when I try to launch the human client and are printed to the console, freeorion.log contains only two lines. I've been posting them here in hope that @dslackw or @pitchforks might have any ideas what's going wrong here and how I can work around it (as it looks like something doesn't work right here with the alsa lib installation on Slackware 14.1).

@pitchforks
Contributor

@Vezzra

Are you running freeorion as regular user that you eventually did not add to the audio group?

@Vezzra
Member

Vezzra commented Sep 12, 2015

@pitchforks: yes and yes, I did not add that user to the audio group. Doing this didn't help though, I still get the same error messages.

However, I think @geoffthemedio might be right after all, at least insofar as the segfault might not be related to the alsa lib issues. Unfortunately, as I already said, the failure occurs very early during program startup, freeorion.log only contains two lines which give no hint at all at the cause of the problem.

Which means I'm most likely screwed, as my know-how is far too limited to tackle this kind of issue on a Linux system. I can only tell you what I did to get FO built; maybe I did something wrong.

The main problem for me was to get the dependencies on SDL2 and OpenAL installed. slackpkg didn't get me far, because its search function didn't yield any packages for SDL2 or OpenAL. So I went to pkgs.org, where I found .txz packages for both libraries. Installing them with upgradepkg --install-new went absolutely smoothly, after that the cmake && make && make install procedure went without any hassles. Note: I ran cmake with the same options it's invoked with in the build script @dslackw posted above.

Running the resulting freeorion gives me the error messages I posted above.

@pitchforks
Contributor

@Vezzra diving into an unknown Linux distro without much prior experience is probably as challenging as trying to build freeorion without knowing the codebase and generally much programming - and hitting a nasty bug like this one :)

So I went to pkgs.org

This is certainly not the best approach, but I won't comment on it to avoid derailing the topic a lot.

Perhaps we can find a way to deal with these long delays between posts here by chatting interactively? I was pointed to a forum thread where you say you don't do IRC; it can be Jabber or tox then. If you're willing to push this further, perhaps we can catch each other online somehow?

I'm not thinking about any arranged meeting at certain hours, just that it would be nice if we can talk interactively when we happen to be both online.

@Vezzra
Member

Vezzra commented Sep 13, 2015

diving into an unknown Linux distro without much prior experience is probably as challenging as trying to build freeorion without knowing the codebase and generally much programming - and hitting a nasty bug like this one

And not to forget, only having so much time I can allocate to this problem... 😉

So I went to pkgs.org

This is certainly not the best approach, but I won't comment on it to avoid derailing the topic a lot.

Yeah, I've already been wondering how far I should continue discussing this side-issue here. You're probably right, we should continue this elsewhere. I can open a thread on our forum, you'll need to register there.

Perhaps we can find a way to deal with these long delays between posts here by chatting interactively? I was pointed to a forum thread where you say you don't do IRC, it can be Jabber or tox then.

Hm, the problem is, the only IM program I'm actually using is Skype, and even there I'm only online when I actually need it. Furthermore, for purposes like these I only use the account of my cyber-alter-ego Vezzra (I try to avoid revealing my real identity), with which I'm almost never online - practically only when making specific appointments (which does happen, I already had chat sessions for FO purposes).

But we should probably continue this discussion elsewhere too. As GitHub lacks PMs, either by email (a public email should be visible on my profile), or on our forum via PM?

@Vezzra
Member

Vezzra commented Sep 16, 2015

Ok, I've found a fix for the issue. @geoffthemedio was spot on:

you might have a compiler issue with the parser code

That's apparently exactly what's going on here, gcc somehow messes something up in the parser code.

Fortunately, Slackware 14.1 also ships with clang/llvm, and when you build FO with that toolchain everything works fine. The question now is, if it's mandatory for the build script to use gcc, or if switching to clang/llvm is ok. The following adjustments to the build script are required:

  1. Add the following lines somewhere at the beginning of the script, before the invocation of cmake:
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++

CMake honors those environment variables and consequently uses clang instead of gcc.

  2. Add the following option to the invocation of cmake: -D_CMAKE_TOOLCHAIN_PREFIX=llvm-, so the cmake invocation looks like this (important: please note the underscore between the '-D' and 'CMAKE' parts!):
  cmake \
    -D_CMAKE_TOOLCHAIN_PREFIX=llvm- \
    -DCMAKE_C_FLAGS:STRING="$SLKCFLAGS" \
    -DCMAKE_CXX_FLAGS:STRING="$SLKCFLAGS" \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DLIB_SUFFIX=${LIBDIRSUFFIX} \
    -DCMAKE_BUILD_TYPE=Release ..

This apparently is required to tell cmake to use the llvm toolchain. Building FO with these modifications to the build script produced a working build.
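
If the build script should fail early on a system without clang, the two exports could be guarded. The following is just a sketch: the helper name `use_clang` is invented for illustration, and the default directory /usr/bin is taken from the paths used in step 1.

```sh
# Hypothetical helper: point CC/CXX at clang only if both compiler drivers
# exist in the given directory (default: /usr/bin).
use_clang() {
    dir=${1:-/usr/bin}
    if [ -x "$dir/clang" ] && [ -x "$dir/clang++" ]; then
        CC="$dir/clang"
        CXX="$dir/clang++"
        export CC CXX
    else
        echo "clang not found in $dir" >&2
        return 1
    fi
}
```

Placing `use_clang || exit 1` before the cmake invocation would stop the build with a clear message instead of silently falling back to gcc.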

@dslackw, @pitchforks, can you test on your systems if this fix works for you?

If you can't use clang/llvm for building packages, we're out of luck, unfortunately. Fixing gcc issues is beyond our possibilities, sorry.

@geoffthemedio
Member

I would once again suggest trying a newer version of Boost. It's possible they noticed the issue and fixed it in a later release.

@ghost
Author

ghost commented Sep 17, 2015

Thanks everyone, for your patience and for the great job, works fine now.

@geoffthemedio at present I cannot use the latest version of Boost, as it is not available in the Slackware 14.1 -stable release.

So I have updated freeorion to version 0.4.5 in the SlackBuilds.org repository, and it will be available next week after the public update.

freeorion.SlackBuild

Best regards,
Dimitris

@geoffthemedio
Member

Are you unable to download and build Boost from source?

@Vezzra
Member

Vezzra commented Sep 17, 2015

According to what @dslackw said earlier:

I do not think it is a Boost issue. Ten days ago I built FO on Slackware 14.1 -current and got the same problem.

And @pitchforks mentioned:

However, the -current branch of Slackware - the one that will become the next stable Slackware version when it will be released (there's no release date fixed yet) - seems to have boost 1.58.

This would indicate that @dslackw already tried to build against boost 1.58, which didn't solve the issue. So it really looks like we're dealing with a compiler issue here (I mean, building with clang/llvm works after all...).

Anyway, we have a working solution, so I'm closing this issue.

@Vezzra Vezzra closed this as completed Sep 17, 2015
@Vezzra Vezzra added this to the Next release milestone Sep 17, 2015
@MagaTailor
Contributor

The solution probably works, but the slackbuild author must have messed up the clang/llvm architecture flags, and the alleged i586 build crashes with SIGILL on my non-SSE2 machine.

I contacted the author of the script about it and he mentioned this issue here. If there's a standard way of passing the clang march flags I'm sure he'll be adding it to his script.
Thx.

@Vezzra
Member

Vezzra commented Nov 5, 2015

Hm, the only thing that caught my attention when looking at the freeorion.SlackBuild script are the following lines (45-50):

if [ "$ARCH" = "i486" ]; then
  SLKCFLAGS="-O2 -march=i486 -mtune=i686"
  LIBDIRSUFFIX=""
elif [ "$ARCH" = "i686" ]; then
  SLKCFLAGS="-O2 -march=i686 -mtune=i686"
  LIBDIRSUFFIX=""

When $ARCH="i686", both the -march and the -mtune options are set to "i686", but for $ARCH="i486", only the -march option is also set to "i486", -mtune is set to "i686", maybe that's the problem here? @petevine, can you try and edit your freeorion.SlackBuild script and set -mtune to "i486" for the i486 architecture, then try to build and report back?

@MagaTailor
Contributor

I used a binary build from the slackonly repo and won't be able to build FO myself, but it's clear you've found the problem: cross-compiling, or even building natively on a newer SSE(x) machine, with -march=i686 leads to this. The correct way would be to use the pentium2 or pentiumpro flag if the build is meant for generic i686. The current 32-bit Slackware has switched to i586, and even though the package was labelled i586, the proper slackbuild march flag is nowhere to be found.

(the 14.1 version compiled for i486 didn't have the SIGILL issue, just the one that's been fixed here)
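
On Linux, whether a machine advertises a given CPU feature (sse2, cmov and so on) can be read from the flags line of /proc/cpuinfo, which helps when judging whether a prebuilt binary is likely to hit SIGILL. A small sketch; the helper name `cpu_has_flag` is hypothetical, and the optional second argument exists only so the function can be tried against a sample file:

```sh
# Hypothetical helper: succeed if the CPU advertises the given feature flag.
# Reads /proc/cpuinfo by default; another file can be passed for testing.
cpu_has_flag() {
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}
```

For instance, `cpu_has_flag sse2 || echo "this CPU cannot run SSE2 code"` prints the warning on a machine like the one described above.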

@Vezzra
Member

Vezzra commented Nov 5, 2015

Ok, if I understand correctly, you're using a prebuilt binary that has (apparently mistakenly) been built for i686 and later architectures, which caused the SIGILL when you tried to run it on your system.

So, whoever produced these prebuilt binaries will have to adjust the build script accordingly, or the maintainer of the slackbuild package has to do that for the build script hosted on slackbuilds.org (so anyone who uses it to produce prebuilt binaries will get ones that won't cause SIGILL crashes on i586 architectures).

@petevine, can you report those findings back to whoever needs to know to make the necessary adjustments? Anything else you need from me or I can assist you with?

@MagaTailor
Contributor

Thanks, that's exactly what's needed - the slackbuild author has redirected me here so I'm sure he's reading the solution as well. The script must have defaulted to i686 after encountering an unknown arch (i586 in the current slackware)

@Vezzra
Member

Vezzra commented Nov 5, 2015

the slackbuild author has redirected me here so I'm sure he's reading the solution as well

Well, I'm not entirely sure if he actively monitors the discussion here, it's a closed issue after all.

@dslackw, can you confirm that you've taken notice of this issue and our discussion about it here and implemented the fix? Just so I can remove this from my todo list... 😉

@ghost
Author

ghost commented Nov 5, 2015

Slackware -current is not supported by the SBo repository.
http://slackbuilds.org/faq/
FAQ 15

@MagaTailor
Contributor

I've notified the author.

We're talking about this repo, as you can see it has those two versions (14.1 and current) covered:

http://packages.slackonly.com/pub/packages/

The fix is simple, just add a new elif section for -march=i586

elif [ "$ARCH" = "i586" ]; then
  SLKCFLAGS="-O2 -march=i586 -mtune=i686"
  LIBDIRSUFFIX=""
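
Put together with the i486 and i686 branches quoted earlier in this thread, the flag selection could look like the sketch below. It is written as a function with a `case` statement purely for illustration (the real script uses an if/elif chain and also sets LIBDIRSUFFIX), and failing on an unknown ARCH instead of falling back to some default is an assumption of this sketch, not something the SlackBuild is known to do.

```sh
# Hypothetical helper: print the compiler flags for a 32-bit ARCH value.
# The three known branches mirror the flags quoted in this thread.
slk_cflags() {
    case "$1" in
        i486) echo "-O2 -march=i486 -mtune=i686" ;;
        i586) echo "-O2 -march=i586 -mtune=i686" ;;
        i686) echo "-O2 -march=i686 -mtune=i686" ;;
        *)
            # Assumption of this sketch: refuse unknown architectures
            # rather than guessing a default.
            echo "unsupported ARCH: $1" >&2
            return 1
            ;;
    esac
}
```

A build script could then use `SLKCFLAGS=$(slk_cflags "$ARCH") || exit 1`, so that a new architecture name stops the build instead of producing a binary with the wrong instruction set.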

@Vezzra
Member

Vezzra commented Nov 5, 2015

Ok, I see. So, if I understand correctly, the prebuilt binary for 14.1 has been working anyway, it was only the one for -current that was messed up (the build script not adjusted correctly). Well, I guess you guys have to take it from here.

Let me know if you need any further assistance from me (as far as I'm able to provide it, that is).

@ghost
Author

ghost commented Nov 10, 2015

The script has been made to work correctly on Slackware 14.1 -stable, as the repository (slackbuilds.org) requires. Unfortunately I cannot support -current because there are major changes in the distribution structure. If Slackware -current users want to run this script, they need to make the changes themselves. I consider this issue closed.
