
[data] add hive catalog #51638

Closed
Jay-ju wants to merge 1 commit into ray-project:master from Jay-ju:support-hive-catalog

@Jay-ju
Contributor

@Jay-ju Jay-ju commented Mar 24, 2025

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@Jay-ju Jay-ju requested a review from a team as a code owner March 24, 2025 11:59
@Jay-ju Jay-ju force-pushed the support-hive-catalog branch 4 times, most recently from 7cb39a5 to ee38519 Compare March 24, 2025 12:44
@jcotant1 jcotant1 added the data Ray Data-related issues label Mar 24, 2025
@Jay-ju Jay-ju force-pushed the support-hive-catalog branch 5 times, most recently from a6032f4 to 27487e3 Compare March 28, 2025 05:56
@hainesmichaelc hainesmichaelc added the community-contribution Contributed by the community label Apr 4, 2025
@gvspraveen
Contributor

@Bye-legumes Thanks for the contribution. Can you please fix the test failures?


[2025-03-28T07:24:35Z] python/ray/data/tests/test_hive_catalog.py::test_file_formats[PARQUET-parquet] -- Test timed out at 2025-03-28 07:22:34 UTC --
--
  | [2025-03-28T07:24:35Z] ================================================================================
  | [2025-03-28T07:24:35Z] ==================== Test output for //python/ray/data:test_hive_catalog:
  | [2025-03-28T07:24:35Z] ============================= test session starts ==============================
  | [2025-03-28T07:24:35Z] platform linux -- Python 3.12.9, pytest-7.4.4, pluggy-1.3.0 -- /opt/miniconda/bin/python3
  | [2025-03-28T07:24:35Z] cachedir: .pytest_cache
  | [2025-03-28T07:24:35Z] rootdir: /root/.cache/bazel/_bazel_root/1df605deb6d24fc8068f6e25793ec703/execroot/com_github_ray_project_ray
  | [2025-03-28T07:24:35Z] configfile: pytest.ini
  | [2025-03-28T07:24:35Z] plugins: mock-3.14.0, repeat-0.9.3, anyio-3.7.1, fugue-0.8.7, aiohttp-1.1.0, asyncio-0.17.2, docker-tools-3.1.3, forked-1.4.0, httpserver-1.0.6, lazy-fixture-0.6.3, remotedata-0.3.2, rerunfailures-11.1.2, sphinx-0.5.1.dev0, sugar-0.9.5, timeout-2.1.0, typeguard-2.13.3
  | [2025-03-28T07:24:35Z] asyncio: mode=Mode.AUTO
  | [2025-03-28T07:24:35Z] timeout: 180.0s
  | [2025-03-28T07:24:35Z] timeout method: signal
  | [2025-03-28T07:24:35Z] timeout func_only: False
  | [2025-03-28T07:24:35Z] collecting ... collected 5 items
  | [2025-03-28T07:24:35Z]
  | [2025-03-28T07:24:35Z] python/ray/data/tests/test_hive_catalog.py::test_file_formats[PARQUET-parquet] -- Test timed out at 2025-03-28 07:23:34 UTC --
  | [2025-03-28T07:24:35Z] ================================================================================
  | [2025-03-28T07:24:35Z] ==================== Test output for //python/ray/data:test_hive_catalog:
  | [2025-03-28T07:24:35Z] ============================= test session starts ==============================
  | [2025-03-28T07:24:35Z] platform linux -- Python 3.12.9, pytest-7.4.4, pluggy-1.3.0 -- /opt/miniconda/bin/python3
  | [2025-03-28T07:24:35Z] cachedir: .pytest_cache
  | [2025-03-28T07:24:35Z] rootdir: /root/.cache/bazel/_bazel_root/1df605deb6d24fc8068f6e25793ec703/execroot/com_github_ray_project_ray
  | [2025-03-28T07:24:35Z] configfile: pytest.ini
  | [2025-03-28T07:24:35Z] plugins: mock-3.14.0, repeat-0.9.3, anyio-3.7.1, fugue-0.8.7, aiohttp-1.1.0, asyncio-0.17.2, docker-tools-3.1.3, forked-1.4.0, httpserver-1.0.6, lazy-fixture-0.6.3, remotedata-0.3.2, rerunfailures-11.1.2, sphinx-0.5.1.dev0, sugar-0.9.5, timeout-2.1.0, typeguard-2.13.3
  | [2025-03-28T07:24:35Z] asyncio: mode=Mode.AUTO
  | [2025-03-28T07:24:35Z] timeout: 180.0s
  | [2025-03-28T07:24:35Z] timeout method: signal
  | [2025-03-28T07:24:35Z] timeout func_only: False
  | [2025-03-28T07:24:35Z] collecting ... collected 5 items
  | [2025-03-28T07:24:35Z]
  | [2025-03-28T07:24:35Z] python/ray/data/tests/test_hive_catalog.py::test_file_formats[PARQUET-parquet] -- Test timed out at 2025-03-28 07:24:34 UTC --

@gvspraveen gvspraveen added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Apr 30, 2025
@Jay-ju
Contributor Author

Jay-ju commented May 21, 2025

> @Bye-legumes Thanks for the contribution. Can you please fix the test failures?

@gvspraveen The metastore address used in these tests is my local one, and the tests pass locally. The CI environment has no Hive Metastore (HMS) server, so the tests time out there. I have added a check to the unit tests that verifies whether the address is reachable and uses the result to decide whether to run the test cases.
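
For illustration only, here is a minimal sketch of the kind of reachability check described above. It is not the code from this PR, which is not shown in this thread. The helper name `is_reachable`, the host `localhost`, and the port `9083` (the conventional Hive Metastore Thrift port) are assumptions.

```python
import socket

# Hypothetical address: the metastore host and port used by the PR's tests
# are not shown in this thread.
HMS_HOST = "localhost"
HMS_PORT = 9083  # conventional Hive Metastore Thrift port (assumed here)


def is_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        # create_connection raises OSError (including timeouts and refused
        # connections) when the endpoint cannot be reached.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if is_reachable(HMS_HOST, HMS_PORT) else "unreachable"
    print(f"HMS at {HMS_HOST}:{HMS_PORT} is {state}")
```

In a pytest suite, the result could feed a marker such as `pytest.mark.skipif(not is_reachable(HMS_HOST, HMS_PORT), reason="HMS not reachable")`, so that the HMS-dependent tests are skipped quickly in CI rather than running into the 180 s timeout shown in the log.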

@Jay-ju Jay-ju force-pushed the support-hive-catalog branch from dde7547 to cf10c2a Compare May 21, 2025 02:10
Signed-off-by: jukejian <jukejian@bytedance.com>
@github-actions

github-actions bot commented Jun 7, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 7, 2025
@github-actions

This pull request has been automatically closed because there has been no more activity in the 14 days
since being marked stale.

Please feel free to reopen or open a new pull request if you'd still like this to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for your contribution!

@github-actions github-actions bot closed this Jun 22, 2025
