SNOW-1335071: add method DataFrame.transform #1400
Labels
feature
New feature or request
status-triage_done
Initial triage done, will be further handled by the driver team
Comments
github-actions bot changed the title from "add method DataFrame.transform" to "SNOW-1335071: add method DataFrame.transform" on Apr 18, 2024
thanks for your feedback! we will look into supporting this
sfc-gh-dszmolka added the status-triage_done (Initial triage done, will be further handled by the driver team) label on Apr 29, 2024
sfc-gh-mvashishtha added a commit that referenced this issue on May 4, 2024
) Currently the template says "What GitHub issue is this PR addressing", but we only want Jira numbers. We should always add a Snowflake JIRA number, even if a GitHub issue exists. If a user creates a GitHub issue and wants to reference it in a PR, a bot will create a SNOW-jira ticket for them, as in #1400.
---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
sfc-gh-vbudati pushed a commit that referenced this issue on May 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! --->
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. --->
   Fixes SNOW-NNNNNNN
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Please write a short description of how your code change solves the related issue.
   Move Snowpark pandas modin import changelog to 1.15
sfc-gh-vbudati added a commit that referenced this issue on May 7, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1345607
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Fix README/md pip install command.
sfc-gh-rdurrani added a commit that referenced this issue on May 7, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1357748
2. Fill out the following pre-review checklist:
   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This PR updates read_snowflake to use string matching for the order by warning.
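As a rough illustration of what "string matching for the order by warning" could look like, the sketch below scans a SQL string for an `ORDER BY` clause with a case-insensitive regular expression. The helper name `contains_order_by` and the pattern are assumptions made for this example; they are not taken from the Snowpark pandas code base, and the actual check inside `read_snowflake` may differ.

```python
import re

# Hypothetical pattern: the words ORDER and BY separated by any whitespace,
# matched case-insensitively and only on word boundaries.
_ORDER_BY_PATTERN = re.compile(r"\border\s+by\b", re.IGNORECASE)


def contains_order_by(sql: str) -> bool:
    """Return True if the SQL text appears to contain an ORDER BY clause.

    This is plain text matching, so it can report a false positive when
    the phrase occurs inside a string literal or a comment.
    """
    return _ORDER_BY_PATTERN.search(sql) is not None
```

A caller could use such a predicate to decide whether to emit a warning before running the query; a table named `orders` does not trigger it because of the word-boundary anchors.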
sfc-gh-nkrishna added a commit that referenced this issue on May 8, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-NNNNNNN
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This PR adds double quotes to the pip install message users see when installing Modin, to accommodate zsh.
Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
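The reason quoting matters is that zsh treats unquoted square brackets as a glob pattern, so a requirement with an extra such as `pkg[extra]` can fail with "no matches found" before pip ever runs. A minimal sketch of building a quoted hint is shown below; the function name `install_hint` and the default package and extra names are assumptions for illustration, not the wording used in the Snowpark source.

```python
def install_hint(package: str = "snowflake-snowpark-python", extra: str = "modin") -> str:
    """Build a pip command whose requirement is wrapped in double quotes.

    The quotes stop shells such as zsh from interpreting the square
    brackets of the extra as a filename pattern.
    """
    return f'pip install "{package}[{extra}]"'


if __name__ == "__main__":
    print(install_hint())
```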
sfc-gh-joshi added a commit that referenced this issue on May 8, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1370365
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This PR avoids UNION ALL operations for computing quantiles over 1-column datasets. This optimization has significant implications for `pd.qcut`, which frequently computes a large number of quantiles and previously would have had extremely high union counts in queries. In particular, `test_qcut.py::test_qcut_two_columns` goes from 90 unions -> 0 unions, 34 joins -> 14 joins; and `series/test_quantile.py::test_quantile_large` goes from ~80 queries -> 6 queries.
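To see why `qcut` leans so heavily on quantile computation, note that splitting data into `q` equal-frequency bins needs `q - 1` quantile cut points. The pure-Python sketch below illustrates that idea only; it is not the Snowpark pandas implementation, and the function name `qcut_codes` is hypothetical. It uses right-closed bins, so a value equal to a cut point falls into the lower bin.

```python
import bisect
import statistics


def qcut_codes(values: list[float], q: int) -> list[int]:
    """Assign each value a bin index in 0..q-1 using quantile cut points."""
    # q bins require q - 1 interior cut points, i.e. q - 1 quantiles.
    edges = statistics.quantiles(values, n=q, method="inclusive")
    # bisect_left returns the index of the first edge >= value, which is
    # the bin index for right-closed intervals.
    return [bisect.bisect_left(edges, v) for v in values]


if __name__ == "__main__":
    print(qcut_codes([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

If each quantile were computed by its own subquery and the results were stitched together, the number of set operations would grow with `q`, which is the cost the commit above describes removing.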
sfc-gh-vbudati added a commit that referenced this issue on May 9, 2024
)
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1348621
2. Fill out the following pre-review checklist:
   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Bug:
```py
# Performing the following loc operation would fail.
>>> df = pd.DataFrame(
...     {
...         "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
...         "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
...         "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
...     }
... )
>>> df2 = df.copy()
>>> df2.loc["a", "three"] = 1.0

# However, when you take a closer look the issue is not with loc set but with
# the way the DataFrame was being generated. Ignore the numbers inside since
# these are randomly generated.
>>> df2
        one       two     three  # <-- notice how there are two rows of column names instead of one row
        one       two     three
a -0.238524  0.900504       NaN
b -1.603478 -0.715938  0.786343
c -0.603704 -1.046051  0.371374
d       NaN -0.019357  0.353722
NotImplementedError: loc set for multiindex is not yet implemented

# Expected result:
>>> df2
        one       two     three  # <-- only one row
a  0.357285 -1.225845       NaN
b  0.709229  1.120475  1.551948
c -2.173472  0.682472 -0.738533
d       NaN -1.211516  0.222008
```
   This is a bug in concat on axis=1 when all the objects are Series.
sfc-gh-azhan added a commit that referenced this issue on May 9, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1357611 Fix all quarantined pandas tests for 8.18
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   remove all skipped tests from SNOW-1358681
sfc-gh-azhan added a commit that referenced this issue on May 9, 2024
…lready (#1547)
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1348919
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This bug has been fixed in #1533
sfc-gh-nkrishna added a commit that referenced this issue on May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1296779, SNOW-1254730
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This PR removes unused sproc fallback code from Snowpark pandas, now that all APIs using fallback have been replaced with NotImplementedError.
---------
Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
sfc-gh-azhan added a commit that referenced this issue on May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1347052 Update pandas API PuPr warning messages
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   - change the words from "private preview" to "public preview"
---------
Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-mvashishtha added a commit that referenced this issue on May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1374343
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   CI times for this PR:
   - [GCP](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489292?pr=1553): 25 minutes
   - [AWS](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489826?pr=1553): 20 minutes
   - [Azure](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800490421?pr=1553): 22 minutes

   These are a bit better than the usual times, but it's hard to tell because CI time is so variable (probably dependent on how many different jobs are running at the same time; see SNOW-1347210). Let's make this commit and see whether the warehouses can handle the extra load without too much queueing. We have set `MAX_CONCURRENCY_LEVEL` on the warehouse for each cloud provider.
sfc-gh-nkumar added a commit that referenced this issue on May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1375037
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Rewrite parts of the qcut implementation to avoid joins completely. This results in a significant performance improvement for qcut. With this change, the overall runtime of the benchmark notebook dropped from 297 seconds to 66 seconds, and the number of SQL queries dropped from 521 to 91.
sfc-gh-vbudati added a commit that referenced this issue on May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1361200
2. Fill out the following pre-review checklist:
   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   - The bug here is caused by the implementation of `create_udtf_for_groupby_apply`. Groupby apply/transform use this method to create a UDTF.
   - When multiple groupby apply/transform operations are performed on the same DataFrame, this method is called multiple times. However, it uses a fixed name for the column labels used to create the OrderedDataFrame used with the UDTF. This is what causes the issue: "ambiguous column name 'ROW_POSITION_WITHIN_GROUP'".
   - 'ROW_POSITION_WITHIN_GROUP' is a column label created and used by `create_udtf_for_groupby_apply`.
   - Similarly, this issue occurs with the 'ORIGINAL_ROW_POSITION' column label.
   - To solve this issue, I appended a random number at the end of these column labels to prevent the collision and eliminate the error.
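The collision-avoidance idea described above can be sketched in a few lines. This is an illustration under assumptions, not the code from the commit: the helper name `unique_label` is hypothetical, and it uses a random hex suffix from the standard library rather than whatever random-number scheme the actual fix applies.

```python
import secrets


def unique_label(base: str, existing: set[str], nbytes: int = 4) -> str:
    """Return `base` with a random suffix that is not already in `existing`.

    The chosen label is added to `existing`, so repeated calls with the
    same base name never hand out the same label twice.
    """
    while True:
        candidate = f"{base}_{secrets.token_hex(nbytes)}"
        if candidate not in existing:
            existing.add(candidate)
            return candidate


if __name__ == "__main__":
    used: set[str] = set()
    # Two operations on the same frame now get distinct helper columns.
    print(unique_label("ROW_POSITION_WITHIN_GROUP", used))
    print(unique_label("ROW_POSITION_WITHIN_GROUP", used))
```

Tracking the labels already handed out makes uniqueness a guarantee within one frame instead of merely very likely, at the cost of carrying the `existing` set around.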
sfc-gh-nkrishna added a commit that referenced this issue on May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1374306
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This PR adds a docstring for resample.
---------
Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
sfc-gh-stan added a commit that referenced this issue on May 10, 2024
…ock tests (#1561)
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1063738
2. Fill out the following pre-review checklist:
   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This PR changes `tox -e local` to run all tests in `tests/integ`, `tests/unit` and `tests/mock` (previously mock_unit) against Local Testing (except for modin tests specified in `SNOWFLAKE_PYTEST_IGNORE_MODIN_CMD`).
sfc-gh-vbudati added a commit that referenced this issue on May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1326280
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Update the links in the documentation for `to_snowpark_pandas` from the LIMITED-ACCESS version to public! These links will not work until all of the Snowpark pandas documentation is public.
sfc-gh-aalam added a commit that referenced this issue on May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-NNNNNNN
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Please write a short description of how your code change solves the related issue.
sfc-gh-joshi added a commit that referenced this issue on May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1373790
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   This PR prunes some duplicated + unsupported methods from Snowpark pandas API docs, and adds more comprehensive docstrings (mostly copied from pandas) for other methods. This PR does not add any doctests, nor does it change any meaningful code. All changes are listed below:

   _De-duplicated listings_:
   - Series
     - is_unique
     - duplicated
   - DataFrame
     - round

   Note that Series/DF head and tail are deliberately left duplicated; this matches pandas documentation, as they are mentioned under both the "Indexing, iteration" and "Reindexing / selection" headings.

   _Removed unimplemented listing_:
   - Series
     - kurtosis
   - SeriesGroupBy
     - apply
   - Resampler
     - groups
     - indices
     - get_group
     - apply
     - aggregate
     - transform
     - bfill
     - nearest
     - fillna
     - asfreq
     - nunique
     - first
     - last
     - interpolate
     - ohlc
     - pad
     - pipe
     - prod
     - quantile
     - sem
     - size
   - Rolling
     - aggregate
     - apply
     - corr
     - count
     - cov
     - kurt
     - median
     - quantile
     - rank
     - sem
     - skew

   _Added implemented listing_:
   I did not comprehensively look for implemented methods that were not listed, these were just a few methods that I noticed in the course of checking other APIs.
   - SeriesGroupBy
     - head
     - idxmax
     - idxmin
     - nunique
     - tail

   _Improved documentation_:
   - pd
     - qcut (I'm not sure why it wasn't inheriting pandas docs, but we should override them anyway since we don't implement all parameters)
   - BasePandasDataset
     - convert_dtypes
     - rename_axis
     - values
     - ffill/pad
   - Series
     - name
     - empty
     - hasnans
     - ndim
     - shape
     - rename_axis
     - quantile
   - DataFrame
     - empty
     - quantile
     - select_dtypes
   - GroupBy
     - std
     - var
     - rank
     - nunique
     - quantile
     - __iter__
---------
Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-lmukhopadhyay added a commit that referenced this issue on May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1373899
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Updates notebook testing workflow with increased cell timeout, and adds SnowparkPandasAPIDemo.ipynb notebook from customer demo and SnowflakeChainTesting.ipynb which was previously blocked.
---------
Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
sfc-gh-oplaton added a commit that referenced this issue on May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-0
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [x] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Update `ast_pb2.py` (already present in the repository). Add the `setuptools` dependencies required for development. Include the module path for `ast_pb2.py` in the manifest, so that the file makes it into the Snowpark wheel.
sfc-gh-vbudati added a commit that referenced this issue on May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1375263
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
3. Please describe how your code solves the related issue.
   Updating the changelog to reflect what was actually released with v1.15.0a1 and what is new.
sfc-gh-rdurrani added a commit that referenced this issue on Oct 29, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1748174
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)
3. Please describe how your code solves the related issue.
   Add support for `size` in `groupby.agg`.
sfc-gh-jdu added a commit that referenced this issue on Oct 29, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1770289
2. Fill out the following pre-review checklist:
   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)
3. Please describe how your code solves the related issue.
   We didn't have such a test before. The generated query looks like:
```
WITH SNOWPARK_TEMP_CTE_N7V3KVPKOM AS (
    with t as (select 1 as A) select * from t
)
SELECT count(1) AS "COUNT(LITERAL())"
FROM (
    SELECT * FROM (
        SELECT * FROM (
            SELECT "A" FROM (
                ( SELECT * FROM (SNOWPARK_TEMP_CTE_N7V3KVPKOM))
                UNION
                ( SELECT * FROM (SNOWPARK_TEMP_CTE_N7V3KVPKOM))
            )
        )
        WHERE True :: BOOLEAN
    )
    ORDER BY "A" ASC NULLS FIRST
)
LIMIT 1
```
sfc-gh-helmeleegy added a commit that referenced this issue on Oct 30, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
   Fixes SNOW-1773962
2. Fill out the following pre-review checklist:
   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)
3. Please describe how your code solves the related issue.
   Fix changelog for 1.25.0.
sfc-gh-jdu
added a commit
that referenced
this issue
Oct 31, 2024
…ntifiers (#2526) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1748403 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. We can use `Aggregate.aggregate_expressions` directly for quoted identifiers.
sfc-gh-helmeleegy
added a commit
that referenced
this issue
Oct 31, 2024
…le/read_html/read_xml (#2540) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1748104,SNOW-1748107, SNOW-1748108 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Add support for pd.read_pickle/read_html/read_xml.
sfc-gh-joshi
added a commit
that referenced
this issue
Oct 31, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1652384 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. `test_to_numpy` previously did not actually test `to_numpy`, and instead ran `Series.to_list`. This PR properly uses `Series.to_numpy`, and adds a new test for `Series.to_list`.
sfc-gh-jdu
added a commit
that referenced
this issue
Nov 1, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1764136 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 5, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1491199 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Make sure protobuf version compatible with streamlit and snowbook.
sfc-gh-lninobrijaldo
pushed a commit
that referenced
this issue
Nov 5, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> ADHOC: Fix a misspelling 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Just notice a misspelling of "ambiguous" , fixing it
sfc-gh-joshi
added a commit
that referenced
this issue
Nov 5, 2024
… general functions (#2532) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1646980 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. This PR allows separate declaration of docstrings in `modin/plugin/docstrings/io.py` and `modin/plugin/docstrings/general.py` for functions defined in the respective override files. Override functions in these namespaces should have the `_inherit_docstrings` decorator attached; see `modin/plugin/extensions/{io_overrides,general_overrides}.py` for examples of how to do this. `pd.read_excel` had an identical frontend implementation to upstream modin; with this docs change we can now override docstrings for top-level functions defined upstream, so this has been removed from our codebase. I've verified that the output of `help(pd.read_excel)` and the generated documentation for the function have not changed.
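The docstring-override pattern described in this commit message can be illustrated with a minimal sketch. The `inherit_docstring` decorator and the `read_excel_docs` holder below are hypothetical stand-ins written for illustration; they are not modin's `_inherit_docstrings`, whose actual signature and behavior may differ.

```python
# Hypothetical, simplified stand-in for a docstring-inheritance decorator.
# It copies the docstring of `source` onto the decorated function.
def inherit_docstring(source):
    def decorator(func):
        func.__doc__ = source.__doc__
        return func
    return decorator


# In the layout the commit describes, a docstring holder like this would
# live in a separate docstrings module (assumed name, for illustration).
def read_excel_docs():
    """Read an Excel file into a DataFrame."""


# The override keeps its own implementation but takes its documentation
# from the separately declared docstring holder.
@inherit_docstring(read_excel_docs)
def read_excel(path):
    return f"reading {path}"


print(read_excel.__doc__)
```

Keeping the documentation apart from the implementation means `help()` output can be customized without duplicating a frontend implementation that is otherwise identical to upstream.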
sfc-gh-jdu
added a commit
that referenced
this issue
Nov 6, 2024
…#2567) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1786772 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. We can track the lifetime of a temp table then
sfc-gh-yzou
added a commit
that referenced
this issue
Nov 6, 2024
…r x (#2568) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1764119 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Today, `np.where` with a scalar x requires 1) creating a temp table to broadcast the scalar x to the shape of cond, and 2) a join when doing pandas where. In this case, neither the extra temp table nor the join should be needed. This change removes the unnecessary temp table creation and join by using indexing for the scalar cast.
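For reference, the scalar-x case this commit message refers to looks like the following in plain NumPy. This is only a sketch of the call shape and its expected semantics; it does not use Snowpark pandas and says nothing about how the change is implemented there. The array values are made up for illustration.

```python
import numpy as np

# cond selects, element by element, between the scalar x and the array y.
cond = np.array([True, False, True])
y = np.array([10, 20, 30])

# x is the scalar 1; NumPy broadcasts it to the shape of cond.
result = np.where(cond, 1, y)
print(result.tolist())
```

Because x is a single value, every selected position receives the same constant, which is why materializing a broadcast copy of x should not be necessary.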
sfc-gh-lninobrijaldo
pushed a commit
that referenced
this issue
Nov 6, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> ADHOC: Fix a misspelling 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Just notice a misspelling of "ambiguous" , fixing it
sfc-gh-lninobrijaldo
pushed a commit
that referenced
this issue
Nov 6, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> ADHOC: Fix a misspelling 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Just notice a misspelling of "ambiguous" , fixing it
sfc-gh-jdu
added a commit
that referenced
this issue
Nov 6, 2024
) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1786772 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. The original atexit order is wrong because SnowflakeConnection from connector will be called before Snowpark session, so when we close the connection, it is already closed
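The ordering issue described in this commit message follows from how Python's `atexit` module works: handlers run in the reverse of their registration order. The sketch below demonstrates that rule in a child interpreter. The `close_connection` and `close_session` functions are hypothetical placeholders, not the connector's or Snowpark's actual handlers.

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter so that its atexit handlers really fire.
child_script = textwrap.dedent(
    """
    import atexit

    def close_connection():  # hypothetical low-level cleanup
        print("connection")

    def close_session():  # hypothetical high-level cleanup
        print("session")

    # Registered first, so it runs last.
    atexit.register(close_connection)
    # Registered last, so it runs first.
    atexit.register(close_session)
    """
)

completed = subprocess.run(
    [sys.executable, "-c", child_script],
    capture_output=True,
    text=True,
    check=True,
)
exit_order = completed.stdout.split()
print(exit_order)
```

With this registration order the session-level cleanup runs while the connection is still open; reversing the two `register` calls would reproduce the kind of problem the commit describes.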
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 6, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1491199 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. This PR removes the Protobuf-generated code from source and adds logic to generate it when needed. This pull request includes the addition of a script to install `protoc` and updates to multiple GitHub Actions workflows to use this script. Additionally, there is an update to the `CODEOWNERS` file. ### Installation and usage of `protoc`: * Added a new script `install_protoc.sh` to install `protoc` on GitHub Actions runners. This script handles downloading and installing the appropriate version of `protoc` based on the operating system and architecture.
### Code ownership: * Updated the `CODEOWNERS` file to add ownership for the `tests/unit/ast/` directory to the `@snowflakedb/snowpark-ir` team. ### Test passed Daily pandas precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645868643/job/32429396169 Daily notebook precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645871020 Daily precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645600647/job/32428853317 pandas Sproc precommit test: https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/185/console ### Development Behavior Change After this PR, developers who pip install from source or use tox need to make sure `protoc` is installed. When installing from a wheel file with pip, `protoc` is not required, since the generated code should already exist in the wheel file.
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 6, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1491199 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. This PR removes the Protobuf-generated code from source and adds logic to generate it when needed. This pull request includes the addition of a script to install `protoc` and updates to multiple GitHub Actions workflows to use this script. Additionally, there is an update to the `CODEOWNERS` file. * Added a new script `install_protoc.sh` to install `protoc` on GitHub Actions runners. This script handles downloading and installing the appropriate version of `protoc` based on the operating system and architecture. * Updated the `CODEOWNERS` file to add ownership for the `tests/unit/ast/` directory to the `@snowflakedb/snowpark-ir` team.
Daily pandas precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645868643/job/32429396169 Daily notebook precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645871020 Daily precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645600647/job/32428853317 pandas Sproc precommit test: https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/185/console After this PR, developers who pip install from source or use tox need to make sure `protoc` is installed. When installing from a wheel file with pip, `protoc` is not required, since the generated code should already exist in the wheel file.
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1491199 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Install `protoc` for the pandas Jenkins job.
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1491199 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. This PR removes the Protobuf-generated code from source and adds logic to generate it when needed. This pull request includes the addition of a script to install `protoc` and updates to multiple GitHub Actions workflows to use this script. Additionally, there is an update to the `CODEOWNERS` file. * Added a new script `install_protoc.sh` to install `protoc` on GitHub Actions runners. This script handles downloading and installing the appropriate version of `protoc` based on the operating system and architecture. * Updated the `CODEOWNERS` file to add ownership for the `tests/unit/ast/` directory to the `@snowflakedb/snowpark-ir` team.
Daily pandas precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645868643/job/32429396169 Daily notebook precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645871020 Daily precommit test: https://github.com/snowflakedb/snowpark-python/actions/runs/11645600647/job/32428853317 pandas Sproc precommit test: https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/185/console After this PR, developers who pip install from source or use tox need to make sure `protoc` is installed. When installing from a wheel file with pip, `protoc` is not required, since the generated code should already exist in the wheel file.
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. This cherry-picks #2518 and #2549 to sync the protobuf generation with the main branch and fix some paths. --------- Co-authored-by: Ovidiu Platon <ovidiu.platon@snowflake.com>
sfc-gh-helmeleegy
added a commit
that referenced
this issue
Nov 7, 2024
…2581) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1791556 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Use docstrings folder for Index methods and properties. Docstrings for DatetimeIndex and TimedeltaIndex will be treated similarly in follow-up PRs.
sfc-gh-oplaton
added a commit
that referenced
this issue
Nov 8, 2024
… utilities (#2552) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1491199 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k) 3. Please describe how your code solves the related issue. Introduce functionality that supports the AST collection logic. Add `batch.py` and `utils.py` to `src/snowflake/snowpark/_internal/ast`. `AstBatch` supports the collection of sequences of statements within the `Session` object. Statements can be either `Assign` or `Eval`. `AstBatch` accumulates statements and can then flush them into a `Request` object. `ast/utils.py` is the workhorse of the AST collection functionality. The job of the functions in this module is to traverse Python object graphs that might be used as inputs to Snowpark APIs. The `build_*` and `fill_*` functions are the core of the AST extraction logic. Follow-up work: * Enable `pyright` checks for the `ast` directory. 
* Improve test coverage when bringing in the rest of the AST collection changes, and remove the `no cover` pragmas. --------- Co-authored-by: Arthur Zwiegincew <arthur.zwiegincew@snowflake.com> Co-authored-by: Eric Vandenberg <eric.vandenberg@snowflake.com> Co-authored-by: Hemit Shah <hemit.shah@snowflake.com> Co-authored-by: Leonhard Spiegelberg <leonhard.spiegelberg@snowflake.com> Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
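The accumulate-then-flush behavior this commit message attributes to `AstBatch` can be sketched as follows. This is an illustrative toy only: `StatementBatch`, its methods, and the field names are all hypothetical, and the real `AstBatch`, `Assign`, `Eval`, and `Request` types in `src/snowflake/snowpark/_internal/ast` are likely protobuf-based and shaped differently.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Assign:  # hypothetical: bind an expression to a name
    name: str
    expr: str


@dataclass
class Eval:  # hypothetical: request evaluation of a bound name
    name: str


Statement = Union[Assign, Eval]


@dataclass
class Request:  # hypothetical container for one flushed batch
    body: List[Statement]


class StatementBatch:
    """Toy accumulator: collects statements, then flushes them in order."""

    def __init__(self) -> None:
        self._pending: List[Statement] = []

    def assign(self, name: str, expr: str) -> None:
        self._pending.append(Assign(name, expr))

    def eval(self, name: str) -> None:
        self._pending.append(Eval(name))

    def flush(self) -> Request:
        # Hand over everything collected so far and start a fresh batch.
        request = Request(body=list(self._pending))
        self._pending.clear()
        return request


batch = StatementBatch()
batch.assign("df1", "session.table('t')")
batch.eval("df1")
request = batch.flush()
print([type(stmt).__name__ for stmt in request.body])
```

Flushing copies the pending list before clearing it, so a returned request is not affected by statements recorded afterwards.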
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 8, 2024
<!--- Please answer these questions before creating your pull request. Thanks! --->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. --->

   Fixes SNOW-1792741

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   Please write a short description of how your code change solves the related issue.

   We pin the version of `protoc` to make sure the generated Python code works for all protobuf versions we claim to support. We also add tests to CI.
sfc-gh-yzou
added a commit
that referenced
this issue
Nov 9, 2024
…mp table (#2587)

<!--- Please answer these questions before creating your pull request. Thanks! --->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. --->

   SNOW-1791241

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   The scoped read-only table doesn't work with dynamic pivot in notebooks today; investigation is needed. We added a flag to turn the scoped temp read-only table off for Snowpark pandas.

   Follow-up: https://snowflakecomputing.atlassian.net/browse/SNOW-1793002 (enable the flag on the server side for native apps).
sfc-gh-azhan
added a commit
that referenced
this issue
Nov 11, 2024
<!--- Please answer these questions before creating your pull request. Thanks! --->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. --->

   Fixes SNOW-1791994

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   Please write a short description of how your code change solves the related issue.

   This PR generates typed Python code from protobuf and makes sure CI checks types in pre-commit.

   Copilot summary:

   This pull request includes several changes to improve type checking, streamline the build process, and enhance code quality. The most important changes include the addition of `mypy-protobuf` for generating typed Python code from protobuf, updates to the build and pre-commit configurations, and improvements to type annotations across various files.

   ### Build and Dependency Management:

   * Added `mypy-protobuf` installation in `.github/scripts/install_protoc.sh` and `scripts/jenkins_regress.sh` to generate typed Python code from protobuf.
   * Updated `.github/workflows/precommit.yml` to install `tox` and streamline the documentation build process using `tox`.
   * Modified `setup.py` to include `mypy-protobuf` as a dependency and updated the protobuf generation command to use `--mypy_out`.

   ### Pre-commit Configuration:

   * Added new files and dependencies to the `mypy` check in `.pre-commit-config.yaml`.

   ### Code Quality and Type Annotations:

   * Enhanced type annotations in `src/snowflake/snowpark/_internal/ast/batch.py` by specifying return types and dictionary types.
   * Improved type checking in `src/snowflake/snowpark/_internal/ast/utils.py` by adding `# type: ignore` comments and TODOs for better handling of type issues.

   Tested Jenkins jobs:

   - https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPythonSnowflakePythonClientRegressRunner/799/console
   - https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/193/console
sfc-gh-yzou
added a commit
that referenced
this issue
Nov 11, 2024
…dex behavior (#2564)

<!--- Please answer these questions before creating your pull request. Thanks! --->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. --->

   SNOW-1518791

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   concat with axis = 1 succeeds with align behavior when the dataframes/series come from the same dataframe. However, today we use join, which gives a different result when duplicate index values occur. For example, with the following dataframe

   ```
      C  A  D
   2  1  a  3
   1  2  b  2
   2  3  c  1
   ```

   pd.concat([df['C'], df['A']], axis = 1) gives

   ```
      C  A
   2  1  a
   1  2  b
   2  3  c
   ```

   but Snowpark pandas returns

   ```
      C  A
   2  1  a
   2  1  c
   1  2  b
   2  3  a
   2  3  c
   ```

   In this PR, we fix this by switching to align behavior. We also relaxed the align utility to allow customized sorting, which is needed to support the sort behavior of concat.
sfc-gh-helmeleegy
added a commit
that referenced
this issue
Nov 12, 2024
<!--- Please answer these questions before creating your pull request. Thanks! --->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. --->

   Fixes SNOW-1438001

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   Add support for list values in Series.str.len.
Apply a function/callable to Snowpark DF
What is the current behavior?
What is the desired behavior?
df = df.transform(func1).transform(func2)
How would this improve snowflake-snowpark-python?
Make Snowpark code more consistent
References, Other Background
.transform() is available in PySpark: https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.DataFrame.transform.html
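To make the request concrete, here is a minimal sketch of the chaining pattern being asked for. It assumes the method would mirror PySpark's `DataFrame.transform`, which calls the supplied function with the DataFrame (plus any extra arguments) and returns the result. `TinyFrame`, `add_total`, and `keep_large` are hypothetical stand-ins invented for this illustration; they are not Snowpark APIs, and an actual Snowpark implementation may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TinyFrame:
    """Hypothetical stand-in for a DataFrame: an immutable tuple of row dicts."""

    rows: tuple[dict[str, Any], ...]

    def transform(
        self, func: Callable[..., TinyFrame], *args: Any, **kwargs: Any
    ) -> TinyFrame:
        """Apply ``func`` to this frame and return its result, so calls can be chained."""
        result = func(self, *args, **kwargs)
        if not isinstance(result, TinyFrame):
            raise TypeError(
                f"func should return a TinyFrame, got {type(result).__name__}"
            )
        return result


def add_total(df: TinyFrame) -> TinyFrame:
    # Hypothetical step 1: add a derived "total" column to every row.
    return TinyFrame(tuple({**r, "total": r["price"] * r["qty"]} for r in df.rows))


def keep_large(df: TinyFrame, threshold: int) -> TinyFrame:
    # Hypothetical step 2: keep only rows whose total reaches the threshold.
    return TinyFrame(tuple(r for r in df.rows if r["total"] >= threshold))


df = TinyFrame(({"price": 2, "qty": 3}, {"price": 5, "qty": 4}))

# Same shape as the desired behavior: df.transform(func1).transform(func2)
result = df.transform(add_total).transform(keep_large, threshold=10)
print(result.rows)
```

Without such a method, the same pipeline has to be written inside-out as nested calls, for example `keep_large(add_total(df), threshold=10)`, which reads in the opposite order from the method-chaining style used by the rest of the DataFrame API.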