feat(storage): support hdfs as non-s3 object store #7283

Merged
merged 56 commits into from Feb 10, 2023

Conversation

wcy-fdu
Contributor

@wcy-fdu wcy-fdu commented Jan 10, 2023

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support HDFS through OpenDAL, and use OpenDAL's memory engine for unit tests.
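
The idea of testing against a memory engine can be sketched as follows. This is a minimal illustration using only the standard library, not the code in this PR: the `ObjectStore` trait, `InMemStore`, and `roundtrip` are hypothetical names, and the real implementation goes through OpenDAL rather than a hand-written map.

```rust
use std::collections::HashMap;

/// Hypothetical minimal storage interface; not RisingWave's actual trait.
trait ObjectStore {
    fn upload(&mut self, path: &str, data: Vec<u8>);
    fn read(&self, path: &str) -> Option<Vec<u8>>;
    fn delete(&mut self, path: &str);
}

/// In-memory backend standing in for a real service (e.g. HDFS) in unit tests.
#[derive(Default)]
struct InMemStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for InMemStore {
    fn upload(&mut self, path: &str, data: Vec<u8>) {
        self.objects.insert(path.to_string(), data);
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }

    fn delete(&mut self, path: &str) {
        self.objects.remove(path);
    }
}

/// Code under test depends only on the trait, so any backend can be plugged in.
fn roundtrip(store: &mut dyn ObjectStore, path: &str, data: &[u8]) -> bool {
    store.upload(path, data.to_vec());
    store.read(path).as_deref() == Some(data)
}

fn main() {
    let mut store = InMemStore::default();
    assert!(roundtrip(&mut store, "risingwave/obj_1", b"hello"));
    store.delete("risingwave/obj_1");
    assert!(store.read("risingwave/obj_1").is_none());
}
```

Because the tested logic only sees the trait, the same tests can run without a Hadoop environment, which is the motivation for using a memory engine in unit tests.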

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

part of #7310

@wcy-fdu wcy-fdu marked this pull request as draft January 10, 2023 06:13
@neverchanje neverchanje added the user-facing-changes Contains changes that are visible to users label Jan 10, 2023
Collaborator

@hzxa21 hzxa21 left a comment

Thanks for the PR!

src/object_store/src/object/opendal_engine.rs Outdated Show resolved Hide resolved
src/risedevtool/src/task/utils.rs Outdated Show resolved Hide resolved
src/object_store/src/object/mod.rs Outdated Show resolved Hide resolved
risedev.yml Outdated Show resolved Hide resolved
risedev.yml Outdated Show resolved Hide resolved
src/object_store/src/object/opendal_engine.rs Outdated Show resolved Hide resolved
src/object_store/src/object/opendal_engine.rs Outdated Show resolved Hide resolved
src/object_store/src/object/opendal_engine.rs Outdated Show resolved Hide resolved
src/object_store/src/object/opendal_engine.rs Outdated Show resolved Hide resolved
src/object_store/src/object/opendal_engine.rs Outdated Show resolved Hide resolved
@wcy-fdu wcy-fdu requested a review from hzxa21 January 13, 2023 09:36
@wcy-fdu wcy-fdu marked this pull request as ready for review January 13, 2023 09:36
@codecov
codecov bot commented Feb 9, 2023

Codecov Report

Merging #7283 (40242a2) into main (91518f1) will decrease coverage by 0.03%.
The diff coverage is 57.95%.

@@            Coverage Diff             @@
##             main    #7283      +/-   ##
==========================================
- Coverage   71.75%   71.73%   -0.03%     
==========================================
  Files        1108     1109       +1     
  Lines      176521   176796     +275     
==========================================
+ Hits       126669   126824     +155     
- Misses      49852    49972     +120     
Flag Coverage Δ
rust 71.73% <57.95%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/object_store/src/object/mod.rs 51.19% <0.00%> (-0.33%) ⬇️
src/risedevtool/src/bin/risedev-compose.rs 0.43% <0.00%> (-0.01%) ⬇️
src/risedevtool/src/bin/risedev-dev.rs 0.26% <0.00%> (-0.02%) ⬇️
src/risedevtool/src/config.rs 0.00% <0.00%> (ø)
src/risedevtool/src/risectl_env.rs 0.00% <0.00%> (ø)
src/risedevtool/src/service_config.rs 0.00% <0.00%> (ø)
src/risedevtool/src/task/compactor_service.rs 0.00% <0.00%> (ø)
src/risedevtool/src/task/compute_node_service.rs 0.00% <0.00%> (ø)
src/risedevtool/src/task/utils.rs 0.00% <0.00%> (ø)
src/object_store/src/object/error.rs 47.22% <33.33%> (-4.63%) ⬇️
... and 9 more

@Xuanwo
Contributor

Xuanwo commented Feb 9, 2023

Congrats!

@wcy-fdu wcy-fdu requested a review from hzxa21 February 10, 2023 05:23
Collaborator

@hzxa21 hzxa21 left a comment

Rest LGTM! Great work!

Comment on lines +876 to +883
opendal:
  id: opendal
  engine: hdfs
  namenode: 127.0.0.1:9000
  root: risingwave
Collaborator

Let's add some documentation for each field, like the other sections have.

Contributor Author

Will integrate hdfs into risedev in the next PR.

Member

May remove this or at least add a TODO here.
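
To illustrate the per-field documentation requested in this thread, here is a minimal sketch of a typed view of the `opendal` section. The struct `OpendalConfig`, its `validate` method, and the field descriptions are assumptions made for illustration; they are not taken from risedev or from this PR.

```rust
/// Hypothetical typed view of the `opendal` section; field meanings are assumptions.
#[derive(Debug, PartialEq)]
struct OpendalConfig {
    /// Identifier of this service instance within the config.
    id: String,
    /// Storage backend (engine) to use, e.g. "hdfs".
    engine: String,
    /// Address of the HDFS NameNode, written as "host:port".
    namenode: String,
    /// Root directory under which objects are stored.
    root: String,
}

impl OpendalConfig {
    /// Checks that `namenode` looks like "host:port" with a numeric port.
    fn validate(&self) -> Result<(), String> {
        let (host, port) = self
            .namenode
            .rsplit_once(':')
            .ok_or_else(|| format!("namenode `{}` is missing a port", self.namenode))?;
        if host.is_empty() {
            return Err("namenode host is empty".to_string());
        }
        port.parse::<u16>()
            .map(|_| ())
            .map_err(|_| format!("namenode port `{}` is not a valid port number", port))
    }
}

fn main() {
    let cfg = OpendalConfig {
        id: "opendal".to_string(),
        engine: "hdfs".to_string(),
        namenode: "127.0.0.1:9000".to_string(),
        root: "risingwave".to_string(),
    };
    assert!(cfg.validate().is_ok());
}
```

A check like `validate` would also reject an address with a stray trailing character, since the port would no longer parse as a number.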

Comment on lines +128 to +133
# - use: prometheus
# - use: grafana
# - use: zookeeper
# persist-data: true
# - use: kafka
# persist-data: true
Collaborator

minor: remove if unused

@wcy-fdu wcy-fdu added the mergify/can-merge Indicates that the PR can be added to the merge queue label Feb 10, 2023
@mergify mergify bot merged commit dd364f4 into main Feb 10, 2023
@mergify mergify bot deleted the wcy/support_hdfs branch February 10, 2023 08:01
@@ -652,6 +671,9 @@ template:
# Minio instances used by this compute node
provide-minio: "minio*"

# AWS s3 bucket used by this compute node
Member

Should we provide correct documentation?

Contributor Author

Sorry, I missed this typo; will fix it in the next PR.

@@ -49,6 +49,12 @@ services:
image: public.ecr.aws/x5u3w5h6/rw-build-env:v20230208_05
volumes:
- ..:/risingwave

rw-build-env-hdfs:
Member

Why do we need another image? It should be okay for us to integrate the hdfs environment in the default image, just like Kafka, etc.

Contributor Author

Currently our build image is about 2.7G, and adding the hadoop environment would raise it to 3.7G, which is really heavy. After some offline discussion, we decided to build a separate image with hadoop that is used only in the CI check, and not to include hdfs in the e2e tests.

Member

IIUC, there were the steps of building the hdfs env in the Dockerfile on Feb.9 in this PR, so we got an image that can be used for hdfs tests by specifying this image tag here. However, pinning a fixed image tag here seems to prevent us from updating the image in the future, for example to bump the sqllogictest version. cc @xxchan

Contributor Author

@wcy-fdu wcy-fdu Feb 13, 2023

there were the steps of building the hdfs env in the Dockerfile on Feb.9 in this PR

Yes, that was to build the special hdfs image. If we keep the steps of building the hdfs env in the Dockerfile, every time someone changes this Dockerfile it will build a new image with hdfs, but we only use it in one step in pull requests. 🤔

Labels
mergify/can-merge Indicates that the PR can be added to the merge queue type/feature user-facing-changes Contains changes that are visible to users
5 participants