CMake error [AMD A10] #25

Closed
skn123 opened this issue May 27, 2017 · 47 comments

@skn123

skn123 commented May 27, 2017

CMake Error at CMakeLists.txt:205 (add_library):
  Cannot find source file:

    src/CLBlast/src/database/database.cpp

  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
  .hxx .in .txx

CMake Error: CMake can not determine linker language for target: clblast
CMake Error: Cannot determine link language for target "clblast".
CMake Error: CMake can not determine linker language for target: clblast

Also, it would be nice if the third-party dependencies were either:
a.) downloaded when cloning the repo, or
b.) declared via find_package(xxx), so that users know these additional packages are needed.

(edited by Hugh for formatting)

@hughperkins
Owner

Can you do git submodule update --init --recursive, and try again please?

@skn123
Author

skn123 commented May 27, 2017

Yes, it worked. Can you please add this line to the README? I have the same problem with DeepCL :)

@skn123 skn123 closed this as completed May 27, 2017
@skn123 skn123 reopened this May 27, 2017
@skn123
Author

skn123 commented May 27, 2017

All right; now linker errors!

https://gist.github.com/hughperkins/baa7a942f5e02ea527a02cc2157b82b0

(edited by Hugh, to move to a gist)

@hughperkins
Owner

Good that git submodule update --init --recursive worked. Added to the docs in a610179. As for the linker issues, can you clarify:

  • what operating system and version are you using? (Ubuntu 16.04? Mac Sierra? Something else?)
  • what is the exact full path for your clang 4.0.0 download?

@skn123
Author

skn123 commented May 27, 2017

Ubuntu 15.10 (I need that for fglrx); AMD A10 processor

naths@naths-HP-Pavilion-15-Notebook-PC:~/build/coriander$ clang -v
clang version 4.0.0 (https://github.com/llvm-mirror/clang 12dcbf43701c142e8313d322c14b53a6c2957826) (https://github.com/llvm-mirror/llvm 386ab19245cb9b6bcb73a0209ac76e730125faf8)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/5
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/5.2.1
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.2.1
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.2.1
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

(edited by Hugh for formatting)

@hughperkins
Owner

Cool. Can you provide a screenshot of the ccmake .. screen please?

@skn123
Author

skn123 commented May 27, 2017

/home/naths/srcs/coriander/cmake/get-llvm-cxxflags.sh: line 5: /usr/local/opt/llvm-4.0/bin/llvm-config: No such file or directory

/home/naths/srcs/coriander/cmake/llvm-syslibs.sh: line 4: /usr/local/opt/llvm-4.0/bin/llvm-config: No such file or directory

/home/naths/srcs/coriander/cmake/get-llvm-libs.sh: line 6: /usr/local/opt/llvm-4.0/bin/llvm-config: No such file or directory

Configuring done
Generating done

(edited by Hugh for formatting)

@hughperkins
Owner

hughperkins commented May 27, 2017 via email

@skn123
Author

skn123 commented May 28, 2017

I am getting the same logs and the same error. Are there any flags/settings that I should use?

@hughperkins
Owner

Can you type ccmake .. and send me a screenshot? By the way, what I reckon is, there's a setting called CLANG_HOME, and you probably need to modify that.

@skn123
Author

skn123 commented May 28, 2017

I changed CLANG_HOME to /usr/local (the directory where the llvm/clang executables are installed). Those previous warnings disappeared. However, the linker errors still persist.
I am sending a verbose log. Let me know if I am missing any flags.

https://gist.github.com/hughperkins/e2c1826f9d691c1e96374225753d7f63

(edited by Hugh, to move into gist)

@hughperkins
Owner

hughperkins commented May 28, 2017 via email

@skn123
Author

skn123 commented May 29, 2017

I went a step further. I rebuilt my llvm toolchain from the latest and greatest on GitHub.
This makes it clang 5.0! (and not clang 4.0)

https://gist.github.com/hughperkins/feae9eb2adfee57fc5d40de8f8c1a924

(edited by Hugh, to move into gist)

@hughperkins
Owner

hughperkins commented May 29, 2017 via email

@skn123
Author

skn123 commented May 29, 2017

I thought as much! So let me get back to 4.0

@hughperkins
Owner

:-)

@skn123
Author

skn123 commented May 29, 2017

OK, I tried another thing: I went back to my archaic g++ compiler (in this case 5.2.1). And guess what? It built successfully! Which means I am now even more confused. Just what the hell is going on?

The problems seem to be related only to:
https://stackoverflow.com/questions/33394934/converting-std-cxx11string-to-stdstring

And I have tried both options with Clang and it does not work! Same old linker errors.
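
For background: the linked Stack Overflow question is about libstdc++'s dual string ABI. With GCC 5 and later, std::string can be built either as the newer std::__cxx11::basic_string or with the older layout, selected by the _GLIBCXX_USE_CXX11_ABI macro, and objects built with different settings may fail to link against each other. A minimal sketch for checking which setting a given compiler invocation uses (the helper name libstdcxxStringAbi is made up for this illustration):

```cpp
#include <string>  // any libstdc++ header pulls in the definition of _GLIBCXX_USE_CXX11_ABI

// Hypothetical helper, not part of Coriander or LLVM.
// Returns 1 for the newer C++11 string ABI, 0 for the older ABI,
// and -1 when the macro is not defined (for example with libc++ or a pre-5 GCC).
int libstdcxxStringAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}
```

Compiling this once with g++ and once with clang++ (optionally adding -D_GLIBCXX_USE_CXX11_ABI=0) and comparing the returned values is one way to see whether the two toolchains agree.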

(edited by Hugh, to move into gist)

@hughperkins
Owner

What were you using before, instead of g++? Do you mean you were linking using clang?

I think it's normal that you can't link with a g++ library using a clang linker. Or at least, it doesn't worry/surprise me too much.

I.e., llvm, on Ubuntu, is presumably compiled/linked itself using g++?

@skn123
Author

skn123 commented May 29, 2017

Yes, I was: a version of clang 4.0 built from GitHub, with gcc 5.2.1 as the compiler. All my other libraries work fine (including your other library DeepCL with all its dependencies). That's why I am surprised that this is happening for this particular library.

@hughperkins
Owner

Ah. I don't support using a non-native compiler/linker. You're on your own for that too ;-) . Which is not to say it won't work, but I think that you are breaking new ground, and will need to figure out ways to handle the challenges that you encounter.

@skn123
Author

skn123 commented May 29, 2017

DeepCL works flawlessly! So what could be the reason here? Can you check at your end whether clang 4.0 works on Ubuntu? If so, then I will use only that version going forward.

@hughperkins
Owner

I have a Mac. I use Ubuntu 16.04 by spinning up an AWS box. I guess this is something you could try too :-)

@hughperkins
Owner

(but yeah, my build process works on Ubuntu 16.04, using native compiler for the linking)

@hughperkins
Owner

hughperkins commented May 29, 2017 via email

@skn123
Author

skn123 commented May 29, 2017

Ok.. Some good news at last:

https://gist.github.com/hughperkins/a05057054f04b6e590bc87c1bc6f0b95

[100%] Linking CXX executable ir-to-opencl
[100%] Built target ir-to-opencl

(edited by Hugh to move into gist)

@skn123
Author

skn123 commented May 29, 2017

So this is LLVM fresh from GitHub (along with a host of other libraries like libcxx, libcxxabi, etc.) as of May 28th, 2017.
And I could build the code:

  1. Compiled the code with gcc5.2.1 (why???)
  2. Had to disable linking with libstdc++ (why??)

Now, the code builds fine, but there were some issues. These are primarily due to API changes in llvm 5.0.
If you can fix them at your end then the rest of the code should be fine:

  1. I had to disable patch_host. This one seems to be the culprit. But again, only a few instructions need to be changed to match llvm 5.0 API
  2. You had an #ifdef of llvm-4.0 somewhere in the code. I had to modify the "else" part of the code (new_instruction_dumper.cpp)
  3. "mutations.cpp" was the tricky one:
    1. There is this line of code
    Constant *structValues[] = {
        ConstantInt::get(IntegerType::get(M->getContext(), 32), 1000000),
        M->getOrInsertFunction(
            functionName,
            Type::getVoidTy(M->getContext()),
            NULL),
        ConstantPointerNull::get(PointerType::get(IntegerType::get(M->getContext(), 8), 0))

The function M->getOrInsertFunction is the culprit, and the change has to come from your end. But it is a minor change.
Apparently, clang 5.0 was complaining that an implicit typecast is not allowed. Unfortunately, I do not have the API from your end to make that change. So I went and changed the installed header file, and it built. Then I thought: what if I were to rebuild llvm with this change (as /IR/Module.h is a system file)? It looks like llvm does not like it, and I had to revert. So, I have a "non-compliant" installation of llvm, and it does the job.

(edited by Hugh for formatting)

@hughperkins
Owner

b.) You had an #ifdef of llvm-4.0 somewhere in the code. I had to modify the "else" part of the code (new_instruction_dumper.cpp)

haha :-) . Well spotted :-)

Unfortunately, I do not have the API from your end to make that change

What do you mean by 'the api from my end'? What information are you lacking?

@skn123
Author

skn123 commented May 29, 2017

  /// Look up the specified function in the module symbol table. If it does not
  /// exist, add a prototype for the function and return it. This function
  /// guarantees to return a constant of pointer to the specified function type
  /// or a ConstantExpr BitCast of that type if the named function has a
  /// different type. This version of the method takes a list of
  /// function arguments, which makes it easier for clients to use.
  template<typename... ArgsTy>
  Constant *getOrInsertFunction(StringRef Name,
                                AttributeList AttributeList,
                                Type *RetTy, ArgsTy... Args)
  {
    SmallVector<Type*, sizeof...(ArgsTy)> ArgTys{Args...};
    return getOrInsertFunction(Name,
                               FunctionType::get(RetTy, ArgTys, false),
                               AttributeList);
  }

This is the piece of code that llvm is complaining about. Apparently it likes the following:

  /// Look up the specified function in the module symbol table. If it does not
  /// exist, add a prototype for the function and return it. This function
  /// guarantees to return a constant of pointer to the specified function type
  /// or a ConstantExpr BitCast of that type if the named function has a
  /// different type. This version of the method takes a list of
  /// function arguments, which makes it easier for clients to use.
  template<typename... ArgsTy>
  Constant *getOrInsertFunction(StringRef Name,
                                AttributeList AttributeList,
                                Type *RetTy, ArgsTy... Args)
  {
    SmallVector<Type*, sizeof...(ArgsTy)> ArgTys{static_cast<size_t>(Args)...};
    return getOrInsertFunction(Name,
                               FunctionType::get(RetTy, ArgTys, false),
                               AttributeList);
  }

However, as this is a system file I also had to effect the change in the main trunk. Doing so broke the build. But it built your libraries. So, if we have some way to effect this change from your API (as this function is templated), then the problem would be solved. That will still leave the problematic library "patch_host"...

(edited by Hugh for formatting)

@hughperkins
Owner

If you mean, "what does this function appendGlobalConstructorCall do?", I don't remember :-P . Why don't you comment out the whole function entirely, and see what it complains about? I have a few theories, but I don't remember clearly without following the same process. Hypotheses:

  • function declarations?
  • adding global strings, like for appending the deviceside bytecode inside the hostside bytecode, as a string?
  • calling some static initializers, to register the deviceside bytecode globally? (Note that this isn't currently being used, hence it's entirely possible this function is not actually used :-P )

@hughperkins
Owner

(ps, I kind of think it's interesting you feel it's easier to modify clang/llvm, than to modify my own library :-P . I suppose it makes sense though: your goal is (partially) to become an expert in llvm/clang, is that right?)

@skn123
Author

skn123 commented May 29, 2017

Absolutely not. I want to port some cuda libraries into opencl. Sadly my working library is clang on Ubuntu and I have no choice but to dig deeper. But this is interesting if it works. Which makes me wonder. How did DeepCL build and not this?

@skn123
Author

skn123 commented May 29, 2017

This is the other major error and it has shown up once I enabled all the tests:

/home/naths/srcs/coriander/test/gtest/test_LocalNames.cpp:40:21: error: no matching constructor for initialization of 'llvm::AllocaInst'
    Value *v1 = new AllocaInst(IntegerType::get(context, 32));
                    ^          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(edited by Hugh for formatting)

@hughperkins
Owner

OK. Well, please consider commenting out appendGlobalConstructorCall, and either figuring out where it is used (if anywhere), or pasting the resulting build errors here (or in https://gist.github.com , and paste the link here; either is OK).

@hughperkins
Owner

hughperkins commented May 29, 2017

An AllocaInst allocates memory. In the C code, things like:

int foo;

... become allocas. Note that an alloca returns a pointer. So, intuitively, although in the opencl, there'd be:

int foo;

Alloca works more like:

int *foo = new int[1];

It's typically followed by a load to represent the value pointed at:

%1 = alloca i32, 1
%2 = load %1

(this is totally not syntax correct). So, here %1 is a pointer to int32, and %2 is the value of the int32 pointed to by %1.

To create an alloca you need:

  • what type of 'thing' do you want to create? (some struct type? an integer? etc ...?)
  • how many of them to create? (just 1? 10? ...?) . (actually, I can't remember if you need this, but you probably do, I guess. maybe)

You also need:

  • a thing called an LLVMContext, stored in the context variable in my code, which basically just holds all the llvm global variables, and which I store in a global variable in my code (I think)

You can optionally give some instructions names. The effect is to change the resulting bytecode. Without a name, eg:

%1 = alloca i32, 1

With a name, eg:

%foo1 = alloca i32, 1

Not sure if this gives you some background to help you figure out how to hack on the allocainst instruction?

@skn123
Author

skn123 commented May 29, 2017

Ok, I think the fix turns out to be easy. You seem to have a NULL as the last parameter of every function call. I deleted that and then voila!
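
A likely explanation for why deleting the trailing NULL works: the variadic-template getOrInsertFunction quoted earlier in this thread builds a SmallVector<Type*> from Args..., so every extra argument has to convert to Type*. On GCC and Clang, NULL is typically an integer-typed constant, so it deduces as an integer and the conversion fails. The sketch below reproduces that shape without LLVM; Type and countParamTypes are made-up stand-ins, not LLVM APIs:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::Type; not the real LLVM class.
struct Type { int id; };

// Imitates the shape of the variadic-template overload quoted above:
// each extra argument is used to initialize an element of type Type*.
template <typename... ArgsTy>
std::size_t countParamTypes(Type *retTy, ArgsTy... args) {
    (void)retTy;                          // the return type is not inspected in this sketch
    std::vector<Type *> argTys{args...};  // ill-formed if any argument is not convertible to Type*
    return argTys.size();
}

// countParamTypes(&voidTy, NULL);
//   typically rejected by the compiler: NULL deduces as an integer type,
//   and an integer variable does not convert implicitly to Type*.
// countParamTypes(&voidTy);
//   compiles: with no sentinel, the parameter list is simply empty.
```

With this style of API, the end of the argument list is known at compile time, so no terminating sentinel is needed.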

@hughperkins
Owner

Haha, awesome! :-)

@skn123
Author

skn123 commented May 29, 2017

But having said that, llvm also supports 3.9, 4.0 and 5.0. Maybe you should also do that via #ifdefs. Also, I am trying out some of the unit tests and will report failure cases on those too.

@skn123
Author

skn123 commented May 29, 2017

Next set of errors:

https://gist.github.com/hughperkins/43c5f12d3fe24ee1e8b91175a73199ab

(edited by Hugh, to move into gist)

@hughperkins
Owner

hughperkins commented May 29, 2017

But having said that; llvm also supports 3.9 4.0 and 5.0. Maybe you should also do that via #ifdefs.

So, I don't have the resources to support more than a single version of llvm at a time. I'm already struggling to get Tensorflow-cl working as it is :-) . So, I don't support 3.8, or 3.9, or 4.0.1, or 5.0. 4.0.0 only :-P

However, if you want to handle supporting 5.0.0, that sounds good to me :) And feel free to create a separate fork for that, like eg coriander-l5, or whatever you want to call it.

I'm happy to accept pull requests to upstream that help with 5.0.0 compatibility, as long as they don't affect readability or maintainability of the current 4.0.0 code. So, concretely:

  • I'm not keen on having #ifdefs scattered liberally around the code ;-)
  • if you can write the function calls in such a way that they work in both 4.0.0 and 5.0.0, that is my preferred solution :-)
  • otherwise, you could create a single wrapper function that encapsulates the #ifdef, and then call this function, instead of the underlying llvm getOrInsertFunction function, throughout the Coriander code.
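
The wrapper idea in the last bullet could look roughly like the sketch below. It does not use LLVM: DEMO_LLVM_MAJOR stands in for a real version macro (LLVM ships LLVM_VERSION_MAJOR in llvm/Config/llvm-config.h), and Type, getOrInsertFunctionDemo and declareVoidFunction are hypothetical names that only imitate the two call styles discussed in this thread.

```cpp
#include <cstdarg>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::Type.
struct Type { int id; };

// Assumption: stands in for LLVM_VERSION_MAJOR; override with -DDEMO_LLVM_MAJOR=4.
#ifndef DEMO_LLVM_MAJOR
#define DEMO_LLVM_MAJOR 5
#endif

#if DEMO_LLVM_MAJOR >= 5
// 5.0-style call: variadic template, no sentinel. Returns the parameter count.
template <typename... ArgsTy>
std::size_t getOrInsertFunctionDemo(Type *retTy, ArgsTy... args) {
    (void)retTy;
    std::vector<Type *> argTys{args...};
    return argTys.size();
}
#else
// 4.0-style call: C varargs terminated by a null pointer. Returns the parameter count.
inline std::size_t getOrInsertFunctionDemo(Type *retTy, ...) {
    va_list ap;
    va_start(ap, retTy);
    std::size_t n = 0;
    while (va_arg(ap, Type *) != nullptr) ++n;
    va_end(ap);
    return n;
}
#endif

// The single wrapper: the only place that knows about the version difference.
// It declares a function taking no parameters and returns the parameter count.
inline std::size_t declareVoidFunction(Type *voidTy) {
#if DEMO_LLVM_MAJOR >= 5
    return getOrInsertFunctionDemo(voidTy);
#else
    return getOrInsertFunctionDemo(voidTy, static_cast<Type *>(nullptr));
#endif
}
```

Call sites would then use declareVoidFunction everywhere, so the #ifdef lives in exactly one place.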

@skn123
Author

skn123 commented May 30, 2017

OK, to make life simple, I did an apt-get update from llvm and installed llvm 4.0.
The code builds fine. This is the error I am getting:

adminspin@adminspin-System-Product-Name:~/build/coriander$ ./test_char
./test_char: error while loading shared libraries: libcocl.so: cannot open shared object file: No such file or directory
adminspin@adminspin-System-Product-Name:~/build/coriander$ ls *.so
libclblast.so  libclew.so  libcocl_gtest.so  libcocl.so  libeasycl.so
adminspin@adminspin-System-Product-Name:~/build/coriander$ ldd test_char
        linux-vdso.so.1 =>  (0x00007ffcc1cc5000)
        libcocl.so => not found
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb1bf113000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb1bed83000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb1be9b3000)
        /lib64/ld-linux-x86-64.so.2 (0x00005622a67c4000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb1be6a3000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb1be48b000)
adminspin@adminspin-System-Product-Name:~/build/coriander$ ls -la *.so
-rwxrwxr-x 1 adminspin adminspin 27907528 May 30 17:39 libclblast.so
-rwxrwxr-x 1 adminspin adminspin    29224 May 30 17:32 libclew.so
-rwxrwxr-x 1 adminspin adminspin  2500704 May 30 17:32 libcocl_gtest.so
-rwxrwxr-x 1 adminspin adminspin  4840952 May 30 17:40 libcocl.so
-rwxrwxr-x 1 adminspin adminspin  2926008 May 30 17:32 libeasycl.so
adminspin@adminspin-System-Product-Name:~/build/coriander$ chmod 777 -R *
adminspin@adminspin-System-Product-Name:~/build/coriander$ ./test_char
./test_char: error while loading shared libraries: libcocl.so: cannot open shared object file: No such file or directory
adminspin@adminspin-System-Product-Name:~/build/coriander$

(Edited by Hugh, to add ` formatting marks)

@skn123
Author

skn123 commented May 30, 2017

Phew... I had to do a sudo make install and create soft links in /usr/lib, and then it works.
Now this is the result of a test:

adminspin@adminspin-System-Product-Name:~/build/coriander$ ./test_char 123
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Pitcairn
__internal__ build log: 
"/tmp/OCL2964T1.cl", line 16: warning: goto statement may cause irreducible
          control flow
          goto v2;
          ^

"/tmp/OCL2964T1.cl", line 18: warning: goto statement may cause irreducible
          control flow
          goto v3;
          ^

"/tmp/OCL2964T1.cl", line 26: warning: goto statement may cause irreducible
          control flow
      goto v3;
      ^

"/tmp/OCL2964T1.cl", line 10: warning: label "v1" was declared but never
          referenced
  v1:;
  ^


opencl execution error, code -51 CL_INVALID_ARG_SIZE
caught runtime error OpenCL error, code: CL_INVALID_ARG_SIZE
terminate called after throwing an instance of 'std::runtime_error'
  what():  OpenCL error, code: CL_INVALID_ARG_SIZE
Aborted (core dumped)

What may be the problem?

(edited by Hugh for formatting)

@hughperkins
Owner

Invalid argument size means that there is a mismatch between the kernel args declared in the opencl, and those passed by the hostside code.

So, we need to get both of these, and compare.

We can get the opencl by defining the environment variable COCL_DUMP_CL=1, and looking in /tmp for files with names like /tmp/0.cl, /tmp/1.cl. You can paste these into a https://gist.github.com , and I'll interpret them.

For the hostside calls ... hmmm... can you start by:

  • updating to very latest master, ie do a git pull
  • do ccmake .., and set COCL_SPAM to ON
  • do make -j 8 && sudo make install
  • then rerun, and paste the entire output into a https://gist.github.com , and I'll take a look

@hughperkins
Owner

(You can also find a bunch of .ll files in the directory where the .cu file was. I think it'd be useful to get those, as a gist, too)

@hughperkins
Owner

Background on the various .ll files:

  • xxx-hostraw.ll: the hostside bytecode, from running clang++ parser against your .cu file
  • xxx-device-noopt.ll: the deviceside bytecode, from running clang++ parser against your .cu file
  • xxx-hostpatched.ll: hostside bytecode, after processing by Coriander, to handle kernel launch, passing arguments to kernels, etc

@hughperkins
Owner

Hi skn123. Any updates on getting some of these dump/debug files?

@hughperkins hughperkins changed the title from "CMake error" to "CMake error [AMD A10]" on Jun 3, 2017
@hughperkins
Owner

(Note: I've taken the liberty of reformatting many of the posts above, so I can read it a bit more easily :-) Hope that's ok-ish?)

@hughperkins
Owner

Closing this for now, since fairly old-ish, and this issue contains a ton of different sub-issues in a sense. Let's open new issue(s) for any remaining issues.
