Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SERVER] Refresh dev branch with recent snowpark changes #1537

Merged

Conversation

sfc-gh-lspiegelberg
Copy link
Contributor

Updates our server branch with recent snowpark changes.

sfc-gh-lspiegelberg and others added 30 commits April 29, 2024 20:01
   There were a few issues which prevent the daily snowpark pandas API jenkins job to run successfully:
      - tox.ini had `MODIN_PYTEST_NO_COV_CMD` missing
      - the jenkins invocation script requires tox to be installed
      - the tox_install_cmd.sh was using sh, but should be bash and variables were not defined.
   this PR addresses these issues.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1348621

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Forgot to update the changelog in the PR with the fix, therefore
updating pandas changelog in this PR. Previous PR fixed using DataFrame
keys with `__getitem__`.

---------

Co-authored-by: Andong Zhan <andong.zhan@snowflake.com>
…1453)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1355199

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   Fix incorrect regex used in DataFrame/Series.replace.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1316977

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Remove fallback for all dataframe and series APIs. This list is
primarily driven from
https://docs.google.com/document/d/1uMwNgLqFhtoAFeEj59XjR3uKQ77QjA84aTmv8Erbvw0/edit#heading=h.qhdbmdhgqdh5.
There still might be more fallback which are not documented. This will
be cleaned up in
https://snowflakecomputing.atlassian.net/browse/SNOW-1347394
NOTE: I have kept the existing tests mostly unmodified even though there
is redundancy. This can be useful when we implement these APIs.
NOTE: This also require updates in docs/*supported.rst files. These
files are not yet ported from snowpandas. So will do a follow up PR to
update the docs.
Stop warning the user about our internal usage of private preview
dynamic pivot features. The user should have already been warned that
Snowpark pandas is in public or private preview. They very likely don't
know or care that we are using Snowpark DataFrame pivot() internally,
let alone that we are using private preview features of Snowpark Python.
    
I'm not adding a changelog note because we haven't released a version of
Snowpark Python with this bug yet.

---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes #SNOW-13220543

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Move all the snowpark docs into `docs/source/snowpark/` to prepare for
merging in Snowpark pandas documentation. Created a new `index.rst` to
serve as the landing page for Snowpark docs and Snowpark pandas docs.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1348886

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.
Following steps result in AssertionError: row count column
"__row_count__" not found in ['"__row_position__"', '"PRICE"']
   df = pd.read_snowflake("LINEITEM")
   print(df)
   df["PRICE"].sort_values(ascending=True)

Root cause: in ordered_dataframe.sort method we are passing row_count
column to new ordered dataframe but didn't include row_count column in
projection.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1355764

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Tested:
https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPythonSnowPandasDailyRegressRunner/358/
…pe is unsigned int (#1441)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR xfails tests `test_unique_homogeneous_data` and
`test_unique_post_sort_values` to unblock daily CI runs. I filed
https://snowflakecomputing.atlassian.net/browse/SNOW-1356685 which
describes the issue.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
…+ INFER_SCHEMA option causes an error if the filename has a whitespace (#1449)
…By APIs (#1460)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   SNOW-1347393

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR removes fallback and raises NotImplementedError for GroupBy
APIs.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1320543

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Adding the modin/Snowpark pandas docs under `docs/source/modin`. The
autogenerated files should reside under `docs/source/modin/pandas-api`.
Added a section for Snowpark pandas in `docs/source/index.rst`.

---------

Co-authored-by: Naren Krishna <naren.krishna@snowflake.com>
…1456)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1057861

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Investigated deviating behavior in pandas `df.loc` and `__setitem__`
when a scalar column key is used to assign a single value DataFrame
item. This issue is present in pandas 2.x.x, pandas 1.5.x has the
correct behavior. I modified the xfail'd test to correctly test Snowpark
pandas behavior and included the pandas GitHub issue for the bug:
pandas-dev/pandas#58482
…TripData nb) (#1451)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1301084

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Adds VisualizingTaxiTripData.ipynb notebook to tests/notebooks/modin and
runs the notebook in daily github workflow using nbmake via
`daily_jupyter_nb_test.yml`

---------

Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
…17 (#1477)

skip all the tests in
https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPythonSnowPandasDailyRegressRunner/363/console

I'm skipping some extra tests because the failures sometimes occur for
particular parameter combinations, and it's much easier to `xfail` all
the parameter combinations.

---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1320543

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Change modin classpaths from `snowflake.snowpark.modin.pandas.<>` to
`modin.pandas.<>`.

To see the changes locally run:
```
cd docs
make clean; make html
open -a "Google Chrome" build/html/index.html
```

To see the docs in a nice format, change the variable `html_theme` in
`docs/source/conf.py:70` to `"alabaster"`.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR removes Codecov/Combine Coverage from merge gate for PRs.
Codecov/Combine Coverage will still be present in the daily precommit
job.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
#1454)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1347394

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR removes our vendored copy of the `BaseQueryCompiler` class,
inheriting the class from upstream Modin instead. Similarly, it removes
all the operator registration classes defined in
`snowflake.snowpark.modin.core.dataframe.algebra.default2pandas`, with
one exception. Upstream Modin does not properly render the names of
`property` objects (modin-project/modin#7233),
so we should override `DataFrameDefault.register` to fix this until this
issue is fixed upstream.

This PR incidentally removes `Series.dt.week` + `Series.dt.weekofyear`,
which were already removed in pandas 2.0.

---------

Co-authored-by: Naren Krishna <naren.krishna@snowflake.com>
…sage.not_implemented. (#1472)

Use decorators that share most of their implementation to replace "This
functionality is not yet available in Snowpark pandas API" with more
informative errors. Remove the ability to call `not_implemented()`
without a custom `message`.

Eventually we will expand these decorators so they can give useful
messages for missing support for particular parameter values as well.

---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1347401

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Raise NotImplementedError for all Series.str API that use fallback
pandas code.
Fixes the workaround logic for DGQL bug until 8.18 is rolled out.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1347387

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Raise not implemented error for Series.dt.* methods instead of falling
back to native pandas.
NOTE: I have kept the existing tests mostly unmodified even though there
is redundancy. This can be useful when we implement these APIs.
NOTE: This also require updates in docs/*supported.rst files. These
files are not yet ported from snowpandas. So will do a follow up PR to
update the docs.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1359379

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
…na (#1471)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1257115

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

The docstrings for the `isnull` and `isna` methods of
`DataFrame`/`Series` previously used the top-level `pd.isna`/`isnull`
documentation instead. This PR changes them to use the correct
docstring.
…1) (#1470)

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1357700

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

pandas bug pandas-dev/pandas#57301 was fixed
in 2.2.1, so we should remove the workaround for it.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
sfc-gh-sfan and others added 15 commits May 6, 2024 09:28
Description

Make sure UDAF doc can be referenced in the right locations

Testing

Checked build/html

Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1284684

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
…race (#1518)

The PR adds support for handling case sensitive name of the objects.

Constructs case sensitive names and versions
Queries DGQL by Id for distance >= 2. This also helps in going beyond deleted objects.
Please answer these questions before submitting your pull requests.
Thanks!

1. What GitHub issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1176072

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   Add partial SeriesGroupBy.apply support.
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Move Snowpark pandas modin import changelog to 1.15
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1345607

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   Fix README/md pip install command.
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1357748

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR updates read_snowflake to use string matching for the order by
warning.
#1527)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

[SNOW-1368640](https://snowflakecomputing.atlassian.net/browse/SNOW-1368640)

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Reproduced customer issue where `qcut` caused a lot of queries. Upon
inspection, a wrong type (`np.array`) was returned when the input is
`Series`, and some check logic was executed relying on
`to_pandas().to_numpy()`. Changes return type to be correct for input is
`Series` and reduces query count by alternative check logic.
…#1535)

There have been access issues in Azure (ubuntu) and GCP (windows) to
access data in `SNOWFLAKE_SAMPLE_DATA`. Enable running test, which will
pass if privileges are granted.
@sfc-gh-lspiegelberg sfc-gh-lspiegelberg marked this pull request as ready for review May 8, 2024 17:33
@sfc-gh-lspiegelberg sfc-gh-lspiegelberg requested review from a team as code owners May 8, 2024 17:33
@sfc-gh-lspiegelberg sfc-gh-lspiegelberg merged commit 4305cf2 into server-side-snowpark May 8, 2024
13 of 14 checks passed
@sfc-gh-lspiegelberg sfc-gh-lspiegelberg deleted the ls-refresh-server-side-snowpark branch May 8, 2024 17:34
@github-actions github-actions bot locked and limited conversation to collaborators May 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet