Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Centos9 Stream + GCC12 #9903

Closed

Conversation

majetideepak
Copy link
Collaborator

@majetideepak majetideepak commented May 23, 2024

Resolves #9879

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 23, 2024
Copy link

netlify bot commented May 23, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 7da4072
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6667316247c40a00084db500

Copy link
Contributor

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to clean up scripts/ at some point ^^ The docker.yml workflow will need to be updated as well
LGTM so far, let's test it in CI like you suggested!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is anything using this, a quick grep didn't show anything? If not we should remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

scripts/setup-centos9.sh Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also defunct imo and should be removed :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Copy link
Collaborator

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for these changes!
I have only a few comments.

function install_cuda {
# See https://developer.nvidia.com/cuda-downloads
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
yum install -y cuda-nvcc-$(echo $1 | tr '.' '-') cuda-cudart-devel-$(echo $1 | tr '.' '-')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed the path on line 190 needs update to RHEL9.
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

Should probably use dnf. Also this doesn't work with how the install functions are called. $1 is never set. $1 is representing a version in the form major.minor (e.g. 12.5) which gets converted to major-minor (e.g. 12-5) which is the suffix for the packages. We don't have to fix it here but without a change to the file this function cannot be used as input to the script.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the github workflow file, I see that this is invoked along with the version.
I don't see why we can't include the CUDA version in this script via an ENV variable.
I will followup on this separately.

function install_velox_deps_from_dnf {
dnf_install libevent-devel \
openssl-devel re2-devel libzstd-devel lz4-devel double-conversion-devel \
libdwarf-devel curl-devel libicu-devel bison flex libsodium-devel zlib-devel
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, lets add elfutils-libelf-devel. This is needed to ensure that folly is able to turn on the symbolization for the stacktraces. It isn't present by default.

# See the License for the specific language governing permissions and
# limitations under the License.

# This script documents setting up a Centos8 host for Velox
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Centos8 -> Centos9.

# Use "n" to never wipe directories.
#
# You can also run individual functions below by specifying them as arguments:
# $ scripts/setup-centos8.sh install_googletest install_fmt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: centos8 -> centos9 or centos.

@majetideepak
Copy link
Collaborator Author

@assignUser CUDA 11.8 is not supported with GCC 12.2.1.
According to https://stackoverflow.com/questions/6622454/cuda-incompatible-with-my-gcc-version, we need to use CUDA > 12.1.
Do you know if it is okay to upgrade the CUDA version?

@assignUser
Copy link
Contributor

Do you know if it is okay to upgrade the CUDA version?

I think there was some discussion about this in the PR adding cuda to adapters but I don't know any specifics #9335

@wanglinsong
Copy link

We also need to upgrade the base image for the java build.

@assignUser
Copy link
Contributor

This should fix the abseil issue (it needs to be built with C++17 so absl::string_view == std::stringview):

diff --git a/scripts/setup-adapters.sh b/scripts/setup-adapters.sh
index 7a487e877..679729f99 100755
--- a/scripts/setup-adapters.sh
+++ b/scripts/setup-adapters.sh
@@ -61,7 +61,7 @@ function install_gcs-sdk-cpp {
   github_checkout abseil/abseil-cpp 20240116.2 --depth 1
   cmake_install \
     -DABSL_BUILD_TESTING=OFF \
-    -DCMAKE_CXX_STANDARD=14 \
+    -DCMAKE_CXX_STANDARD=17 \
     -DABSL_PROPAGATE_CXX_STD=ON \
     -DABSL_ENABLE_INSTALL=ON

@majetideepak
Copy link
Collaborator Author

majetideepak commented Jun 4, 2024

it needs to be built with C++17

@assignUser thanks for the pointer. I was searching for ABSL_USE_STRINGVIEW. I had a hunch that the CXX standard was different.

@kgpai
Copy link
Contributor

kgpai commented Jun 4, 2024

@majetideepak any reason to keep it in draft ?

@majetideepak
Copy link
Collaborator Author

@kgpai Multiple issues came up and are now fixed.
The last issue is the test failures

@majetideepak
Copy link
Collaborator Author

In Linux release with adapters

The following tests FAILED:
	328 - velox_gcsfile_test (Failed)
	330 - velox_hive_connector_test (Failed)

@majetideepak
Copy link
Collaborator Author

I see /opt/miniforge/envs/adapters/bin/python3: No module named testbench for velox_gcsfile_test failures.
@assignUser, @tigrux any clue?

@assignUser
Copy link
Contributor

It's a test dependency for gcs and should be installed in the env of the docker file. I can take a look

@tigrux
Copy link
Contributor

tigrux commented Jun 4, 2024

I see /opt/miniforge/envs/adapters/bin/python3: No module named testbench for velox_gcsfile_test failures. @assignUser, @tigrux any clue?

I see it is installed in:
https://github.com/facebookincubator/velox/blob/main/scripts/adapters.dockerfile#L34

However I don't really know how to add it to the image used in the current PR.

@majetideepak
Copy link
Collaborator Author

However I don't really know how to add it to the image used in the current PR.

@tigrux, I added a new adapters-cpp target in the docker-compose file in this PR. I used that to build the adapters image. The docker-compose build invokes the adapters.dockerfile.

@assignUser
Copy link
Contributor

@majetideepak I just rebuild the adapters image with the docker file from this PR and it does install testbench in the environment:

[root@48ce3479dcd7 /]# mamba run -n adapters /bin/bash -c "pip list"
Package                      Version
---------------------------- --------
certifi                      2024.6.2
charset-normalizer           3.3.2
click                        8.1.7
crc32c                       2.3
Flask                        2.2.4
googleapis-common-protos     1.59.0
googleapis-storage-testbench 0.33.0
[...]

But when I run the same command in the tmp container used in ci nothing is installed:

[root@6502d6f7f833 /]# mamba run -n adapters /bin/bash -c "pip list"
Package    Version
---------- -------
pip        24.0
setuptools 70.0.0
wheel      0.43.0

You probably need to just rebuild and push your container?

@majetideepak
Copy link
Collaborator Author

@assignUser thanks! I was using podman to build the image in a background shell. I now see the following log

WARN[0923] SHELL is not supported for OCI image format, [mamba run -n adapters /bin/bash -c] will be ignored. Must use `docker` format 
--> 5bc97a7b2e85
STEP 13/15: ENV HADOOP_HOME=/usr/local/hadoop     HADOOP_ROOT_LOGGER="WARN,DRFA"     LC_ALL=C     LIBHDFS3_CONF=/velox/scripts/hdfs-client.xml     PATH=/usr/local/hadoop/bin:${PATH}
WARN[0936] SHELL is not supported for OCI image format, [mamba run -n adapters /bin/bash -c] will be ignored. Must use `docker` format 
--> db9e7d78910f
STEP 14/15: ENTRYPOINT ["/bin/bash", "-c", "source /opt/rh/gcc-toolset-12/enable && exec \"$@\"", "--"]
WARN[0936] SHELL is not supported for OCI image format, [mamba run -n adapters /bin/bash -c] will be ignored. Must use `docker` format 
--> d71f700d8e86
STEP 15/15: CMD ["/bin/bash"]
COMMIT ghcr.io/facebookincubator/velox-dev:adapters
WARN[0936] SHELL is not supported for OCI image format, [mamba run -n adapters /bin/bash -c] will be ignored. Must use `docker` format 
--> 56d9ef7c366a

@majetideepak
Copy link
Collaborator Author

There is one last test failure velox_hive_connector_test which happens only in the release build. I am looking into that.

@majetideepak
Copy link
Collaborator Author

I opened a separate PR for the last commit #10079

@majetideepak majetideepak marked this pull request as ready for review June 6, 2024 03:51
@majetideepak
Copy link
Collaborator Author

All tests are now passing.

facebook-github-bot pushed a commit that referenced this pull request Jun 7, 2024
Summary:
I noticed in #9903 that the upload errors due to missing credentials. This PR adds a guard that allows the job to finish ✔️ even from a fork. It will still run the upload (for testing) when the PR is from within the main repo.

Pull Request resolved: #10080

Reviewed By: bikramSingh91

Differential Revision: D58244591

Pulled By: kevinwilfong

fbshipit-source-id: e75bf84292ad046dfc85be9f1a3b813af474cf11
facebook-github-bot pushed a commit that referenced this pull request Jun 7, 2024
Summary:
Pass by value to avoid strict aliasing issues for REAL and DOUBLE types.
We are seeing the HivePartitionFunctionTest failing with GCC12 here #9903

Pull Request resolved: #10079

Reviewed By: Yuhta

Differential Revision: D58294082

Pulled By: kevinwilfong

fbshipit-source-id: 765af472a35d1f8f2b619d5c58ee4399a8359ee2
Copy link
Contributor

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can mark for merge after fix of nit.

@@ -37,7 +37,7 @@ jobs:
name: Linux ${{ matrix.type }} with adapters
if: ${{ github.repository == 'facebookincubator/velox' }}
runs-on: ${{ matrix.runner }}
container: ghcr.io/facebookincubator/velox-dev:adapters
container: docker.io/majetideepak4/velox:adapters9
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I presume this was for testing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@kgpai kgpai added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Jun 12, 2024
@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kgpai merged this pull request in f9f6d82.

wypb pushed a commit to wypb/velox that referenced this pull request Jun 12, 2024
Summary:
Resolves facebookincubator#9879

Pull Request resolved: facebookincubator#9903

Reviewed By: kevinwilfong

Differential Revision: D58491851

Pulled By: kgpai

fbshipit-source-id: 5edbb6c9b5eff40adc0888291a9107c078d1125e
Copy link

Conbench analyzed the 1 benchmark run on commit f9f6d822.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

deepashreeraghu pushed a commit to deepashreeraghu/velox that referenced this pull request Jun 13, 2024
…bator#10080)

Summary:
I noticed in facebookincubator#9903 that the upload errors due to missing credentials. This PR adds a guard that allows the job to finish ✔️ even from a fork. It will still run the upload (for testing) when the PR is from within the main repo.

Pull Request resolved: facebookincubator#10080

Reviewed By: bikramSingh91

Differential Revision: D58244591

Pulled By: kevinwilfong

fbshipit-source-id: e75bf84292ad046dfc85be9f1a3b813af474cf11
deepashreeraghu pushed a commit to deepashreeraghu/velox that referenced this pull request Jun 13, 2024
Summary:
Pass by value to avoid strict aliasing issues for REAL and DOUBLE types.
We are seeing the HivePartitionFunctionTest failing with GCC12 here facebookincubator#9903

Pull Request resolved: facebookincubator#10079

Reviewed By: Yuhta

Differential Revision: D58294082

Pulled By: kevinwilfong

fbshipit-source-id: 765af472a35d1f8f2b619d5c58ee4399a8359ee2
deepashreeraghu pushed a commit to deepashreeraghu/velox that referenced this pull request Jun 13, 2024
Summary:
Resolves facebookincubator#9879

Pull Request resolved: facebookincubator#9903

Reviewed By: kevinwilfong

Differential Revision: D58491851

Pulled By: kgpai

fbshipit-source-id: 5edbb6c9b5eff40adc0888291a9107c078d1125e
@majetideepak majetideepak deleted the centos9-support branch June 21, 2024 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move to Centos9 Stream as Centos8 Stream is going EOL 2024-05-31
7 participants