CacheLib won't build on Centos 8.1 with kernel 5.6.13-0_fbk6_4203_g4cb46d044bc6 #24

Closed
jeffreyalien opened this issue Jul 7, 2021 · 17 comments

Comments

@jeffreyalien
Contributor

I modified the wdcWriteBytes function in NandWrites.cpp to report the bytes written for WDC drives, and I am now having trouble building the CacheLib executable. The attached NandWrites.txt contains the changes made to NandWrites.cpp to support WDC drives.

The gmake is failing with the following errors:
CMakeFiles/cmTC_78ecd.dir/src.c.o: In function `main':
src.c:(.text+0x2f): undefined reference to `pthread_create'
src.c:(.text+0x3b): undefined reference to `pthread_detach'
src.c:(.text+0x47): undefined reference to `pthread_cancel'
src.c:(.text+0x58): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status

See the attached log and out files in the build-fail.zip file for more details.

build-fail.zip
NandWrites.txt

@agordon
Contributor

agordon commented Jul 7, 2021

Hello @jeffreyalien ,
I assume the pthread error messages came from the CMakeError.log file. However, they are an artifact of the way cmake works and not the direct culprit.
Could you delete the build-cachelib directory and rebuild with ./contrib/build-package.sh -O -v cachelib > build.log 2>&1, then (assuming it still fails) share build.log, CMakeError.log and CMakeOutput.log with us?
Thanks!
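
The command above is meant to capture both standard output and standard error in build.log. A minimal sketch of two common ways to do that follows. It uses a hypothetical fake_build function as a stand-in for the real ./contrib/build-package.sh invocation; the function and the temporary log path are illustrative assumptions, not part of CacheLib.

```shell
#!/bin/sh
# Hypothetical stand-in for the real build command: writes to stdout and stderr.
fake_build() {
    echo "configuring cachelib"
    echo "warning: something went wrong" >&2
}

# Illustrative log location; the thread simply uses build.log in the source tree.
log="${TMPDIR:-/tmp}/build.log"

# Option 1: send stdout to the file, then point stderr at the same destination.
fake_build > "$log" 2>&1

# Option 2: keep the output visible on the terminal while also saving it.
fake_build 2>&1 | tee "$log"
```

The order matters in the first form: writing 2>&1 before the file redirection would leave stderr attached to the terminal.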

@jeffreyalien
Contributor Author

@agordon ,
There is no -O option on our version of contrib/build-package.sh. We are building with the latest code from the master branch. Is there a different branch/tag that we should be using?

Here's the list of options we have available for the build-package.sh:
./contrib/build-package.sh -h
CacheLib dependencies builder

usage: build-package.sh [-BdhijStv] NAME

options:
  -B    skip build step
        (default is to build with cmake & make)
  -d    build with DEBUG configuration
        (default is RELEASE with debug information)
  -h    This help screen
  -i    install after build using 'sudo make install'
        (default is to build but not install)
  -j    build using all available CPUs ('make -j')
        (default is to use single CPU)
  -S    skip git-clone/git-pull step
        (default is to get the latest source)
  -t    build tests
        (default is to skip tests if supported by the package)
  -v    verbose build

@agordon
Contributor

agordon commented Jul 7, 2021

Apologies, my bad. No need for -O - just remove it (it belongs to the more general build.sh script).

@jeffreyalien
Contributor Author

@agordon
Here's the build output and log files with the -v option.
build-fail-070721.zip

@dreddy

dreddy commented Jul 16, 2021

Here is what I think the issue is: FindSodium.cmake is being included twice. There are two versions of it, one in fizz and one in cachelib. Adding the same protections that the fizz version of FindSodium has gets rid of the "Configuring Incomplete!" cmake error.
I found this by running cmake in trace mode; the log is attached.
cachelib_cmake.log
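
One quick way to spot this kind of clash is to look for find-module file names that appear in more than one directory of the source tree. Below is a minimal sketch, assuming the modules follow the usual Find<Name>.cmake naming. The function name and the demo directory layout are made up for illustration and do not reflect the actual CacheLib tree.

```shell
#!/bin/sh
# List CMake find-module file names that occur more than once under a directory.
list_duplicate_find_modules() {
    find "$1" -type f -name 'Find*.cmake' -exec basename {} \; | sort | uniq -d
}

# Demo on a throwaway tree that mimics the situation described above
# (hypothetical paths, created only for this example).
demo=$(mktemp -d)
mkdir -p "$demo/fizz/cmake" "$demo/cachelib/cmake"
touch "$demo/fizz/cmake/FindSodium.cmake" "$demo/cachelib/cmake/FindSodium.cmake"
touch "$demo/cachelib/cmake/FindGlog.cmake"

list_duplicate_find_modules "$demo"   # prints: FindSodium.cmake

rm -rf "$demo"
```

A duplicate name is only a hint; whether both copies are actually loaded depends on the module search path, which is what the cmake trace log shows.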

@agordon
Contributor

agordon commented Jul 16, 2021

Hello @dreddy,
Indeed, that is one of the issues (another is that the datatype library was not using suitable parameters, which caused the pthread linking error). I just sent an email to @jeffreyalien with an updated patch. I'm happy to forward it to you (shall I use your "@intel.com" email from GitHub)?

@dreddy

dreddy commented Jul 19, 2021

Yes agordon indeed. Kindly forward it to me.
Please use my @intel.com e-mail.


@jeffreyalien
Contributor Author

@agordon
I pulled the latest version of the CacheLib code and applied your patch. Now we're seeing the failure shown in the attached build log. Could you take a look and let me know if you need more info, or how to resolve it? Thanks.
build_error_during_glog.log

@jeffreyalien
Contributor Author

@agordon
With your patch applied, the build now fails during the cachelib build step. The build log is attached.
build_error_during_cachelib_build.log

@agordon
Contributor

agordon commented Jul 28, 2021

@dreddy @jeffreyalien we just pushed the patch mentioned above, plus a few other fixes, to the repository. Would you mind giving the latest git version a try and telling us whether it builds better?

@jeffreyalien
Contributor Author

@agordon We were finally able to build and install Cachelib with this update. We'll be doing some testing with cachebench now. I'll update you on how that goes when we're done.

@jeffreyalien
Contributor Author

@agordon
We are still hitting issues, now when trying to run CacheLib. Here's a summary of the problem:

After manually cloning the dependencies (folly, wangle, fbthrift, fizz) into CacheLib/cachelib/external, I am able to build CacheLib successfully from the latest CacheLib GitHub code. However, when attempting to run cachebench I receive the following error:
CacheLib/build-cachelib/cachebench/cachebench: error while loading shared libraries: libthriftcpp2.so.1.0.0: cannot open shared object file: No such file or directory

I am unfamiliar with cmake files, but I did attempt to add the option to build shared libraries for fbthrift. Adding BUILD_SHARED_LIBS=ON to CMakeCache.txt, regenerating the makefile, and then rebuilding fbthrift did not resolve the issue.

@agordon
Contributor

agordon commented Jul 29, 2021

Hello @jeffreyalien ,

> After manually cloning the dependencies (folly, wangle, fbthrift, fizz) into CacheLib/cachelib/external, I am able to build CacheLib successfully from the latest CacheLib GitHub code.

Question: why do you need to git-clone the dependencies manually? The ./contrib/build.sh script does it automatically.

> However, when attempting to run cachebench I receive the following error:
> CacheLib/build-cachelib/cachebench/cachebench: error while loading shared libraries: libthriftcpp2.so.1.0.0: cannot open shared object file: No such file or directory

Please check whether you have an opt/cachelib subdirectory. The new build scripts install all dependencies and the cachebench binary there.
When cachelib is built using ./contrib/build.sh, the binary is automatically configured to look for the libraries in the ../lib and ../lib64 relative directories (using a linker feature called RPATH).
So if you run ./opt/cachelib/bin/cachebench, it should automatically find all the required libraries in ./opt/cachelib/lib/.
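
To check what search path was actually recorded in the binary, the dynamic section can be inspected with readelf, and ldd reports any library the loader cannot resolve. This is a minimal sketch: the show_rpath helper name is made up, and the binary path is the one mentioned above and may differ on your system.

```shell
#!/bin/sh
# Print the RPATH/RUNPATH entries recorded in an ELF binary's dynamic section.
show_rpath() {
    readelf -d "$1" | grep -E 'RPATH|RUNPATH'
}

# Path taken from the comment above; adjust it to wherever the build was installed.
bin=./opt/cachelib/bin/cachebench

if [ -x "$bin" ]; then
    show_rpath "$bin"
    # Report shared libraries that the dynamic loader cannot find, if any.
    ldd "$bin" | grep 'not found' || echo "all shared libraries resolved"
fi
```

If the recorded path is relative to the binary's own location, copying only the binary to another directory would likely leave the loader unable to find the libraries.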

> I am unfamiliar with cmake files, but I did attempt to add the option to build shared libraries for fbthrift. Adding BUILD_SHARED_LIBS=ON to CMakeCache.txt, regenerating the makefile, and then rebuilding fbthrift did not resolve the issue.

That is correct (BUILD_SHARED_LIBS=ON). The new build script builds all dependencies with this flag, and all the shared libraries should be installed in the subdirectories ./opt/cachelib/lib and/or ./opt/cachelib/lib64.

Please let me know whether you see these files, and whether using them works for you.

@jeffreyalien
Contributor Author

@agordon
Here are my answers to your comments above.

  1. Why manually git-clone dependencies?
    See the attached log (build_error_during_folly.log). Without manually cloning the dependencies, I get “fatal: not a git repository (or any of the parent directories): .git” when running the build.sh script. I interpret that to mean the script attempts to clone but fails to do so.

  2. Check for the opt/cachelib subdirectory
    That subdirectory does exist. I had copied cachebench from there and run it from a different directory, where our automation looks for it by default. When running ./opt/cachelib/bin/cachebench directly, all of the dependencies do appear to be found. Now the model number is not recognized as a WDC device, as seen in Cachebench_run_error.log.
    build_error_during_folly.log
    Cachebench_run_error.log

@sathyaphoenix
Contributor

> Now the model number is not recognized as a WDC device. As seen in Cachebench_run_error.log

@jeffreyalien you can now apply your earlier patch and verify if the device is appropriately handled. Once you verify, please send out a PR to merge.

Can you confirm that your build issues are fixed so that I can close out this issue?

@jeffreyalien
Contributor Author

@agordon @sathyaphoenix
I've fixed the model number check, but we're hitting this exception when reading "WriteyBytes": Exception fetching nand writes for nvme1n1. I see how to specify the field number and factor in the getBytesWritten function, but I have a question about how that function knows which line of the vs smart log data to read. Here's what our data looks like. Is "Physical media units written" OK?

[root@fb-yv2-s3-n4 CacheLib]# nvme wdc vs-smart-add-log /dev/nvme1n1
NVMe Status:SUCCESS: The command completed successfully(0)
SMART Cloud Attributes :-

Physical media units written - 0 2068679535362048

Physical media units read - 0 2658736152236032
Bad user nand blocks - Raw 0
Bad user nand blocks - Normalized 100
Bad system nand blocks - Raw 0

@sathyaphoenix
Contributor

@jeffreyalien

> But have a question on how that function knows which line in vs smart log data to read. Here's what our data looks like. Is "Physical media units written" ok?

You have to write the logic to do that. You can use the getBytesWritten function by passing the right arguments. See some examples here: https://github.com/facebookincubator/CacheLib/blob/master/cachelib/cachebench/util/NandWrites.cpp#L154
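
The line-and-field selection itself can be prototyped outside of CacheLib before writing the C++ logic. The sketch below is not the CacheLib implementation: it is a small awk filter with a made-up function name, and it assumes the counter of interest is the last whitespace-separated field on the "Physical media units written" line, as in the output quoted above.

```shell
#!/bin/sh
# Print the last field of the first "Physical media units written" line on stdin.
physical_media_units_written() {
    awk '/Physical media units written/ { print $NF; exit }'
}

# Sample lines copied from the output quoted in this thread. On a real system the
# input would come from: nvme wdc vs-smart-add-log /dev/nvme1n1
physical_media_units_written <<'EOF'
SMART Cloud Attributes :-
Physical media units written - 0 2068679535362048
Physical media units read - 0 2658736152236032
Bad user nand blocks - Raw 0
EOF
```

Whether that value is already in bytes or needs a scaling factor is an open question here and should be checked against the drive's documentation before choosing the factor passed to getBytesWritten.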

I'll close this issue out since the build issue is resolved.
