some make problems in fedora20 - incl a way to get it to work, somehow #1
Comments
ps: i found that In CUDA 5.0 and CUDA 5.5, the CUBLAS routine SGEMM() for operations NN and NT can give wrong results on Kepler Architecture SM35 when the following conditions are met : |
The cstdlib dependency was added, thank you for spotting the problem. Both CUDA 5.0 and 5.5 work fine on Fermi architecture. Unfortunately I [...] if test [...] If you remove these lines, only Compute Capability 2.0 code will be [...] Thanks again. On 2014-03-18 01:41, standfest wrote:
|
thanks for your response. digging further into the problem (and changing my hardware back to a tesla c2070) i still struggle with this linking problem while compiling:
do you have any ideas what to do? maybe my configure output helps:
|
There was a logical flaw with the preprocessor statements in the |
Thanks, now it is compiling without complaining - but with a persistent linking flaw:
if i set
i cannot find libmpi.so.1 and vice versa. Maybe including the path in the ‘-rpath’ linker option could help - sadly i am a c++ noob and so far all my approaches in modifying the makefile fail. Any hints? |
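On the `-rpath` idea: a minimal sketch of how such linker flags are usually put together, so that the library directory is recorded in the binary and `LD_LIBRARY_PATH` is not needed at run time. The helper name `rpath_flags` and the directory `/opt/cuda/lib64` are assumptions for illustration, and passing `LDFLAGS` to `configure` only helps if this project's Makefile actually honors it, which has not been checked here.

```shell
# Build linker flags that add a library directory to the link-time search
# path (-L) and embed it in the binary's run-time search path (-rpath).
rpath_flags() {
    printf '%s' "-L$1 -Wl,-rpath,$1"
}

# Hypothetical location of the CUDA libraries; replace with the real one.
CUDA_LIB_DIR="/opt/cuda/lib64"
echo "LDFLAGS=$(rpath_flags "$CUDA_LIB_DIR")"

# Assumed usage (works only if the project's Makefile honors LDFLAGS):
#   ./configure LDFLAGS="$(rpath_flags "$CUDA_LIB_DIR")"
```

The same pattern would apply to the MPI library directory if MPI support is kept.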
As for the MPI dependency, do not worry about it, unless you have more [...] Not finding the CUDA libraries is more troubling. You said earlier that [...] If the error persists, please post again the parameters for the configure [...] Thanks and apologies for the delay. |
originally i had a symbolic link called CUDA pointing to CUDA-5.5, but now i deleted it and renamed CUDA-5.5 to CUDA. additionally i set the --without-mpi flag and was able to compile and run without the linking flaw - as long as i set
unfortunately i get this when testing it with -k 1 (0 and 2 are working, but there is a long waiting period after the final training iteration - i cannot imagine saving the data is taking so long, or is it?)
so back to square one. at least here my log:
thank you for thinking about it! |
I want to find out what goes wrong here. I will install a Fedora on my |
I cannot reproduce the problem. I started with a plain vanilla Fedora 20 install. Then I followed these instructions to get the proprietary driver working: http://www.if-not-true-then-false.com/2014/fedora-20-nvidia-guide/

yum update kernel* selinux-policy*
reboot
yum localinstall --nogpgcheck http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm http://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
yum install akmod-nvidia xorg-x11-drv-nvidia-libs kernel-devel acpid

Genuine weird stuff was going on with the initramfs, but eventually os-detect on Arch Linux figured out the correct boot configuration, blacklisted nouveau, and I had the Nvidia driver working:

lsmod|grep nvidia
nvidia 10686781 44
drm 283937 4 nvidia
i2c_core 38476 4 drm,i2c_i801,nvidia,videodev

Then I followed the instructions here: http://fedoraproject.org/wiki/Cuda

I installed the prerequisites, also adding git, automake, and perl-Env:

yum install wget make gcc-c++ freeglut-devel libXi-devel libXmu-devel mesa-libGLU-devel git perl-Env automake

Then I switched over to these instructions for CUDA 5.5: http://hobiger.org/blog/2013/12/19/fedora-20-and-cuda/ issuing the command

sh cuda_5.5.22_linux_64.run -override

I accepted the EULA, said yes to attempting the install on an unsupported configuration, did not install the drivers, said yes to installing, the path was /opt/cuda, and the CUDA samples were also installed to the default location ($HOME/NVIDIA_CUDA-5.5_Samples). After compiling deviceQuery, it complained that the driver did not support this CUDA version. I downloaded the latest driver and installed it:

systemctl stop gdm
sh NVIDIA-Linux-x86_64-331.49.run
reboot

After this, deviceQuery reported my GPU, an old 330M with Compute Capability 1.2. I cloned and compiled the git version of Somoclu:

git clone https://github.com/peterwittek/somoclu
cd somoclu
./autogen.sh
./configure --without-mpi
make -s
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda/lib64
src/somoclu -k 1 data/rgbs.txt data/gpu_test

A memory deallocation glitch crept in yesterday, I fixed it. Otherwise, it runs without problems. So I do not know what could be the issue on your machine. |
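The `export LD_LIBRARY_PATH=...` step above leaves an empty leading entry when the variable was previously unset, which the loader treats as the current directory. A small sketch of a more defensive way to do the same thing; the function name `append_ld_path` is made up for illustration, and `/opt/cuda/lib64` is simply the install path used in this thread.

```shell
# Append a directory to LD_LIBRARY_PATH without creating an empty entry
# and without adding the same directory twice.
append_ld_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${dir}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Path taken from the install described above; adjust to your CUDA location.
append_ld_path /opt/cuda/lib64
echo "$LD_LIBRARY_PATH"
```

Putting the function and the call into a shell startup file would make the setting persistent across sessions.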
thanks for trying. i will look into my machine the day after tomorrow, maybe i'm going to reset it. i will update you on any findings. |
after all i finally found the time to redo the whole installation again, and now it worked quite well. the only thing not found instantly was libcudart.so.6 (apparently others have this problem too http://stackoverflow.com/questions/10808958/why-cant-libcudart-so-4-be-found-when-compiling-the-cuda-samples-under-ubuntu ) but following line helped:
thank you again for all your help and of course for your library, |
I am glad it finally works. Peter |
hi,
in order to make it, it was necessary to edit io.cpp and add "#include <cstdlib>" because its dependency in iostream has been removed with gcc 4.3.
furthermore i commented out the line with setDevice because of the error "undefined reference to `setDevice'", which allowed me to make it. BUT now i have problems with CUDA. if i try the gpu kernel, i get the following error:
$somoclu -x 100 -y 200 file folder -e 20 -k 1
-->
nVectors: 417 nVectorsPerRank: 417 nDimensions: 0
Epoch: 0 Radius: 50
** On entry to SGEMM parameter number 8 had an illegal value
!!!! kernel execution error.
Aborted
terminate called after throwing an instance of 'thrust::system::system_error'
what(): unload of CUDA runtime failed
Aborted (core dumped)
would you have any suggestions?
thanks a lot!