-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SERVER] Refresh dev branch with recent snowpark changes #1537
Merged
sfc-gh-lspiegelberg
merged 61 commits into
server-side-snowpark
from
ls-refresh-server-side-snowpark
May 8, 2024
Merged
[SERVER] Refresh dev branch with recent snowpark changes #1537
sfc-gh-lspiegelberg
merged 61 commits into
server-side-snowpark
from
ls-refresh-server-side-snowpark
May 8, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
There were a few issues which prevent the daily snowpark pandas API jenkins job to run successfully: - tox.ini had `MODIN_PYTEST_NO_COV_CMD` missing - the jenkins invocation script requires tox to be installed - the tox_install_cmd.sh was using sh, but should be bash and variables were not defined. this PR addresses these issues.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1348621 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Forgot to update the changelog in the PR with the fix, therefore updating pandas changelog in this PR. Previous PR fixed using DataFrame keys with `__getitem__`. --------- Co-authored-by: Andong Zhan <andong.zhan@snowflake.com>
…1453) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1355199 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Fix incorrect regex used in DataFrame/Series.replace.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1316977 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Remove fallback for all dataframe and series APIs. This list is primarily driven from https://docs.google.com/document/d/1uMwNgLqFhtoAFeEj59XjR3uKQ77QjA84aTmv8Erbvw0/edit#heading=h.qhdbmdhgqdh5. There still might be more fallback which are not documented. This will be cleaned up in https://snowflakecomputing.atlassian.net/browse/SNOW-1347394 NOTE: I have kept the existing tests mostly unmodified even though there is redundancy. This can be useful when we implement these APIs. NOTE: This also require updates in docs/*supported.rst files. These files are not yet ported from snowpandas. So will do a follow up PR to update the docs.
Stop warning the user about our internal usage of private preview dynamic pivot features. The user should have already been warned that Snowpark pandas is in public or private preview. They very likely don't know or care that we are using Snowpark DataFrame pivot() internally, let alone that we are using private preview features of Snowpark Python. I'm not adding a changelog note because we haven't released a version of Snowpark Python with this bug yet. --------- Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes #SNOW-13220543 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Move all the snowpark docs into `docs/source/snowpark/` to prepare for merging in Snowpark pandas documentation. Created a new `index.rst` to serve as the landing page for Snowpark docs and Snowpark pandas docs.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1348886 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Following steps result in AssertionError: row count column "__row_count__" not found in ['"__row_position__"', '"PRICE"'] df = pd.read_snowflake("LINEITEM") print(df) df["PRICE"].sort_values(ascending=True) Root cause: in ordered_dataframe.sort method we are passing row_count column to new ordered dataframe but didn't include row_count column in projection.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1355764 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Tested: https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPythonSnowPandasDailyRegressRunner/358/
…pe is unsigned int (#1441) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR xfails tests `test_unique_homogeneous_data` and `test_unique_post_sort_values` to unblock daily CI runs. I filed https://snowflakecomputing.atlassian.net/browse/SNOW-1356685 which describes the issue. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
…+ INFER_SCHEMA option causes an error if the filename has a whitespace (#1449)
…By APIs (#1460) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. SNOW-1347393 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR removes fallback and raises NotImplementedError for GroupBy APIs. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1320543 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Adding the modin/Snowpark pandas docs under `docs/source/modin`. The autogenerated files should reside under `docs/source/modin/pandas-api`. Added a section for Snowpark pandas in `docs/source/index.rst`. --------- Co-authored-by: Naren Krishna <naren.krishna@snowflake.com>
…1456) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1057861 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Investigated deviating behavior in pandas `df.loc` and `__setitem__` when a scalar column key is used to assign a single value DataFrame item. This issue is present in pandas 2.x.x, pandas 1.5.x has the correct behavior. I modified the xfail'd test to correctly test Snowpark pandas behavior and included the pandas GitHub issue for the bug: pandas-dev/pandas#58482
…TripData nb) (#1451) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1301084 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Adds VisualizingTaxiTripData.ipynb notebook to tests/notebooks/modin and runs the notebook in daily github workflow using nbmake via `daily_jupyter_nb_test.yml` --------- Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
…17 (#1477) skip all the tests in https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPythonSnowPandasDailyRegressRunner/363/console I'm skipping some extra tests because the failures sometimes occur for particular parameter combinations, and it's much easier to `xfail` all the parameter combinations. --------- Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1320543 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Change modin classpaths from `snowflake.snowpark.modin.pandas.<>` to `modin.pandas.<>`. To see the changes locally run: ``` cd docs make clean; make html open -a "Google Chrome" build/html/index.html ``` To see the docs in a nice format, change the variable `html_theme` in `docs/source/conf.py:70` to `"alabaster"`.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR removes Codecov/Combine Coverage from merge gate for PRs. Codecov/Combine Coverage will still be present in the daily precommit job. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
#1454) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1347394 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR removes our vendored copy of the `BaseQueryCompiler` class, inheriting the class from upstream Modin instead. Similarly, it removes all the operator registration classes defined in `snowflake.snowpark.modin.core.dataframe.algebra.default2pandas`, with one exception. Upstream Modin does not properly render the names of `property` objects (modin-project/modin#7233), so we should override `DataFrameDefault.register` to fix this until this issue is fixed upstream. This PR incidentally removes `Series.dt.week` + `Series.dt.weekofyear`, which were already removed in pandas 2.0. --------- Co-authored-by: Naren Krishna <naren.krishna@snowflake.com>
…sage.not_implemented. (#1472) Use decorators that share most of their implementation to replace "This functionality is not yet available in Snowpark pandas API" with more informative errors. Remove the ability to call `not_implemented()` without a custom `message`. Eventually we will expand these decorators so they can give useful messages for missing support for particular parameter values as well. --------- Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
The PR is introducing a new API (lineage.trace). Detailed description of the API is in the design doc : https://docs.google.com/document/d/1vDmOWWk5BRGUb74h0UgGScfTcJ4GsyGTT5b32llvJYo/edit#heading=h.hs3dx8soa4jr
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1347401 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Raise NotImplementedError for all Series.str API that use fallback pandas code.
Fixes the workaround logic for DGQL bug until 8.18 is rolled out.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1347387 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Raise not implemented error for Series.dt.* methods instead of falling back to native pandas. NOTE: I have kept the existing tests mostly unmodified even though there is redundancy. This can be useful when we implement these APIs. NOTE: This also require updates in docs/*supported.rst files. These files are not yet ported from snowpandas. So will do a follow up PR to update the docs.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1359379 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
…na (#1471) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1257115 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. The docstrings for the `isnull` and `isna` methods of `DataFrame`/`Series` previously used the top-level `pd.isna`/`isnull` documentation instead. This PR changes them to use the correct docstring.
…1) (#1470) Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1357700 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. pandas bug pandas-dev/pandas#57301 was fixed in 2.2.1, so we should remove the workaround for it.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
Description Make sure UDAF doc can be referenced in the right locations Testing Checked build/html Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1284684 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
…race (#1518) The PR adds support for handling case sensitive name of the objects. Constructs case sensitive names and versions Queries DGQL by Id for distance >= 2. This also helps in going beyond deleted objects.
Please answer these questions before submitting your pull requests. Thanks! 1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR. Fixes SNOW-1176072 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Add partial SeriesGroupBy.apply support.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Move Snowpark pandas modin import changelog to 1.15
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1345607 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Fix README/md pip install command.
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1357748 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR updates read_snowflake to use string matching for the order by warning.
#1527) 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. [SNOW-1368640](https://snowflakecomputing.atlassian.net/browse/SNOW-1368640) 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Reproduced customer issue where `qcut` caused a lot of queries. Upon inspection, a wrong type (`np.array`) was returned when the input is `Series`, and some check logic was executed relying on `to_pandas().to_numpy()`. Changes return type to be correct for input is `Series` and reduces query count by alternative check logic.
…xpression` to apply to Local Testing (#1520)
…#1535) There have been access issues in Azure (ubuntu) and GCP (windows) to access data in `SNOWFLAKE_SAMPLE_DATA`. Enable running test, which will pass if privileges are granted.
sfc-gh-azwiegincew
approved these changes
May 8, 2024
4305cf2
into
server-side-snowpark
13 of 14 checks passed
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Updates our server branch with recent snowpark changes.