SNOW-1335071: add method DataFrame.transform #1400

Open
ChuliangXiao opened this issue Apr 18, 2024 · 1 comment
Labels
feature New feature or request
status-triage_done Initial triage done, will be further handled by the driver team

Comments

@ChuliangXiao
Contributor

ChuliangXiao commented Apr 18, 2024

Apply a function/callable to a Snowpark DataFrame.

What is the current behavior?

df = func1(df)
df = func2(df)
# or
df = func2(func1(df))

What is the desired behavior?

df = df.transform(func1).transform(func2)
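A minimal sketch of the requested method, using a hypothetical stand-in `Frame` class rather than the real Snowpark `DataFrame`. The signature `transform(func, *args, **kwargs)` is modelled loosely on PySpark's `DataFrame.transform`: it calls `func(self, *args, **kwargs)` and returns the result, which is what makes chaining possible. `add_one` and `double` are hypothetical example functions, and the return-type check is a design choice made here for illustration.

```python
from typing import Any, Callable, List


class Frame:
    """Hypothetical stand-in for a DataFrame: holds a list of numbers."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows

    def transform(self, func: Callable[..., "Frame"], *args: Any, **kwargs: Any) -> "Frame":
        # Apply func to this frame and return its result,
        # so calls can be chained: df.transform(f).transform(g).
        result = func(self, *args, **kwargs)
        if not isinstance(result, Frame):
            raise TypeError(f"func must return a Frame, got {type(result).__name__}")
        return result


def add_one(df: Frame) -> Frame:
    return Frame([x + 1 for x in df.rows])


def double(df: Frame, factor: int = 2) -> Frame:
    return Frame([x * factor for x in df.rows])


df = Frame([1, 2, 3])
# Equivalent to double(add_one(df)), but reads left to right.
chained = df.transform(add_one).transform(double)
print(chained.rows)  # [4, 6, 8]
# Extra keyword arguments are forwarded to the function.
print(df.transform(double, factor=10).rows)  # [10, 20, 30]
```

The original frame is never mutated; each step returns a new object, which matches how Snowpark DataFrame operations already behave.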

How would this improve snowflake-snowpark-python?

Make Snowpark code more consistent with PySpark and allow transformations to be chained in a single expression.

References, Other Background

`.transform()` is available in PySpark:
https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.DataFrame.transform.html

@ChuliangXiao ChuliangXiao added the feature New feature or request label Apr 18, 2024
@github-actions github-actions bot changed the title add method DataFrame.transform SNOW-1335071: add method DataFrame.transform Apr 18, 2024
@sfc-gh-aling
Contributor

Thanks for your feedback! We will look into supporting this `transform` function call.
cc: @sfc-gh-yixie @sfc-gh-jdu @sfc-gh-aalam, this is for feature parity with PySpark.

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage_done Initial triage done, will be further handled by the driver team label Apr 29, 2024
sfc-gh-mvashishtha added a commit that referenced this issue May 4, 2024

Currently the template says "What GitHub issue is this PR addressing",
but we only want Jira numbers.

We should always add a Snowflake JIRA number, even if a GitHub issue
exists.

If a user creates a GitHub issue and wants to reference it in a PR, a
bot will create a SNOW-jira ticket for them, as in #1400.
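A minimal sketch of how a check for this convention might look, assuming the PR description is available as a string. `jira_keys` is a hypothetical helper, not part of the repository. It strips HTML comments first, so the `SNOW-1335071` example inside the template's comment does not count, and the `SNOW-NNNNNNN` placeholder is not matched because it contains no digits.

```python
import re

# HTML comments such as the template's <!--- ... ---> instructions.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# A Snowflake Jira key: "SNOW-" followed by digits.
JIRA_KEY = re.compile(r"\bSNOW-\d+\b")


def jira_keys(pr_body: str) -> list:
    """Return the SNOW Jira keys mentioned outside HTML comments."""
    visible = COMMENT.sub("", pr_body)
    return JIRA_KEY.findall(visible)


template_only = """<!---
For example, for GitHub issue #1400, you should add "SNOW-1335071" here.
--->
Fixes SNOW-NNNNNNN"""

filled_in = template_only.replace("SNOW-NNNNNNN", "SNOW-1345607")

print(jira_keys(template_only))  # []
print(jira_keys(filled_in))  # ['SNOW-1345607']
```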

---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
sfc-gh-vbudati pushed a commit that referenced this issue May 7, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Move Snowpark pandas modin import changelog to 1.15
sfc-gh-vbudati added a commit that referenced this issue May 7, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1345607

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   Fix README.md pip install command.
sfc-gh-rdurrani added a commit that referenced this issue May 7, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1357748

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR updates read_snowflake to use string matching for the order by
warning.
sfc-gh-nkrishna added a commit that referenced this issue May 8, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR adds double quotes to the pip install message users see when
installing Modin, to accommodate zsh.
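Why the quotes matter can be sketched with Python's `fnmatch`, which implements shell-style glob patterns: unquoted, `[modin]` is a character class matching a single `m`, `o`, `d`, `i`, or `n`, so zsh treats the argument as a glob (and, with its default `nomatch` option, reports an error when nothing matches). This assumes the message recommends the `snowflake-snowpark-python[modin]` extra; the exact message text is not shown in this commit.

```python
from fnmatch import fnmatchcase

# Package spec with an extra, as a user might type it without quotes.
spec = "snowflake-snowpark-python[modin]"

# As a glob, "[modin]" matches exactly one character from {m, o, d, i, n} ...
print(fnmatchcase("snowflake-snowpark-pythonm", spec))  # True
# ... so the literal package spec does NOT match its own pattern.
print(fnmatchcase(spec, spec))  # False
```

Quoting the argument, as in `pip install "snowflake-snowpark-python[modin]"`, disables glob expansion in both bash and zsh.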

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
sfc-gh-joshi added a commit that referenced this issue May 8, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1370365

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR avoids UNION ALL operations when computing quantiles over
1-column datasets. This optimization has significant implications for
`pd.qcut`, which frequently computes a large number of quantiles and
previously produced queries with extremely high union counts.
In particular, `test_qcut.py::test_qcut_two_columns` goes from 90 unions
-> 0 unions and 34 joins -> 14 joins, and
`series/test_quantile.py::test_quantile_large` goes from ~80 queries ->
6 queries.
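A rough illustration of why single-column quantiles need no UNION ALL, using two hypothetical query builders (these are not the functions the PR changes, and the generated SQL is only a sketch). The union form issues one `SELECT` per quantile; the single-`SELECT` form computes every quantile as a separate aggregate column of one row, so the union count stays at zero however many quantiles `qcut` asks for.

```python
from typing import Sequence


def quantiles_with_union(col: str, table: str, qs: Sequence[float]) -> str:
    # One single-row SELECT per quantile, stitched together with UNION ALL.
    selects = [
        f"SELECT {q} AS Q, PERCENTILE_CONT({q}) WITHIN GROUP (ORDER BY {col}) AS V FROM {table}"
        for q in qs
    ]
    return " UNION ALL ".join(selects)


def quantiles_single_select(col: str, table: str, qs: Sequence[float]) -> str:
    # All quantiles as aggregate columns of a single SELECT; the one-row
    # result can be reshaped into one value per quantile afterwards.
    aggs = ", ".join(
        f'PERCENTILE_CONT({q}) WITHIN GROUP (ORDER BY {col}) AS "Q_{i}"'
        for i, q in enumerate(qs)
    )
    return f"SELECT {aggs} FROM {table}"


qs = [i / 10 for i in range(1, 10)]  # the nine interior decile cut points
print(quantiles_with_union('"A"', "T", qs).count("UNION ALL"))  # 8
print(quantiles_single_select('"A"', "T", qs).count("UNION ALL"))  # 0
```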
sfc-gh-vbudati added a commit that referenced this issue May 9, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1348621

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   Bug:
```py
# Performing the following loc operation would fail.
# (`pd` is the Snowpark pandas module; `np` is NumPy.)
>>> import numpy as np
>>> df = pd.DataFrame(
...     {
...         "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
...         "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
...         "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
...     }
... )
>>> df2 = df.copy()
>>> df2.loc["a", "three"] = 1.0
NotImplementedError: loc set for multiindex is not yet implemented

# However, on closer inspection the issue is not with loc set but with the way the
# DataFrame is constructed. Ignore the values, since they are randomly generated.
>>> df2
        one       two     three               # <-- notice the two rows of column names instead of one
        one       two     three
a -0.238524  0.900504       NaN
b -1.603478 -0.715938  0.786343
c -0.603704 -1.046051  0.371374
d       NaN -0.019357  0.353722

# Expected result:
>>> df2
        one       two     three              # <-- only one row
a  0.357285 -1.225845       NaN
b  0.709229  1.120475  1.551948
c -2.173472  0.682472 -0.738533
d       NaN -1.211516  0.222008
```

This is a bug in concat on axis=1 when all the objects are Series.
sfc-gh-azhan added a commit that referenced this issue May 9, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1357611 Fix all quarantined pandas tests for 8.18

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Remove all skipped tests from SNOW-1358681.
sfc-gh-azhan added a commit that referenced this issue May 9, 2024
…lready (#1547)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1348919

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This bug has been fixed in
#1533
sfc-gh-nkrishna added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1296779, SNOW-1254730

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR removes unused sproc fallback code from Snowpark pandas, now
that all APIs using fallback have been replaced with
NotImplementedError.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
sfc-gh-azhan added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1347052 Update pandas API PuPr warning messages

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

- change the words from "private preview" to "public preview"

---------

Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-mvashishtha added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1374343

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   CI times for this PR:
- [GCP](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489292?pr=1553): 25 minutes
- [AWS](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489826?pr=1553): 20 minutes
- [Azure](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800490421?pr=1553): 22 minutes

These are a bit better than the usual times, but it's hard to tell
because CI time is so variable (probably dependent on how many different
jobs are running at the same time; see SNOW-1347210).

Let's make this commit and see whether the warehouses can handle the
extra load without too much queueing. We have set
`MAX_CONCURRENCY_LEVEL` on the warehouse for each cloud provider.
sfc-gh-nkumar added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1375037

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Rewrite parts of the qcut implementation to avoid joins completely. This
results in a significant performance improvement for qcut.
With this change, the overall runtime of the benchmark notebook dropped from 297
seconds to 66 seconds, and the number of SQL queries dropped from 521 to 91.
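A sketch of the general idea behind a join-free bin assignment; this is an assumption about the approach, since the commit does not show the code. Once the cut points are known, each row's bin can be computed from its own value with a comparison chain, which in SQL is a `CASE WHEN` expression rather than a join against a table of bin edges. `qcut_codes` and `case_expr` are hypothetical helpers; intervals are right-closed like pandas' `qcut`, so a value equal to a cut point falls into the lower bin.

```python
import bisect
import statistics
from typing import List, Sequence


def qcut_codes(values: Sequence[float], q: int) -> List[int]:
    # q - 1 interior cut points, using linear interpolation between data points.
    edges = statistics.quantiles(values, n=q, method="inclusive")
    # bisect_left puts a value equal to an edge in the lower bin: (a, b] intervals.
    return [bisect.bisect_left(edges, v) for v in values]


def case_expr(col: str, edges: Sequence[float]) -> str:
    # The same per-row rule as a SQL expression: no join needed.
    whens = [f"WHEN {col} IS NULL THEN NULL"]
    whens += [f"WHEN {col} <= {e} THEN {i}" for i, e in enumerate(edges)]
    return "CASE " + " ".join(whens) + f" ELSE {len(edges)} END"


print(qcut_codes([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(case_expr('"A"', [2.75, 4.5]))
```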
sfc-gh-vbudati added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1361200

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

- The bug is caused by the implementation of
`create_udtf_for_groupby_apply`. Groupby apply/transform use this method
to create a UDTF.
- When multiple groupby apply/transform operations are performed on the
same DataFrame, this method is called multiple times. However, it uses a
fixed name for the column labels of the OrderedDataFrame used with the
UDTF. This causes the error "ambiguous column name
'ROW_POSITION_WITHIN_GROUP'".
- `ROW_POSITION_WITHIN_GROUP` is a column label created and used by
`create_udtf_for_groupby_apply`.
- The same issue occurs with the `ORIGINAL_ROW_POSITION` column label.
- To solve this, I appended a random number to the end of these
column labels to prevent the collision and eliminate the error.
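A minimal sketch of the fix described above, with a hypothetical helper `unique_label` (not the function in the codebase). It appends a random number to the base label, as the PR describes; the loop that re-draws on a clash with labels already in use is an extra safeguard added here for illustration.

```python
import random
from typing import Optional, Set


def unique_label(base: str, existing: Set[str], rng: Optional[random.Random] = None) -> str:
    # Append a random suffix so repeated calls on the same DataFrame
    # do not produce the same column label twice.
    if rng is None:
        rng = random.Random()
    while True:
        label = f"{base}_{rng.randint(0, 10**9)}"
        if label not in existing:
            existing.add(label)
            return label


used: Set[str] = set()
first = unique_label("ROW_POSITION_WITHIN_GROUP", used)
second = unique_label("ROW_POSITION_WITHIN_GROUP", used)
print(first != second)  # True
```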
sfc-gh-nkrishna added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1374306

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   This PR adds a docstring for resample.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
sfc-gh-stan added a commit that referenced this issue May 10, 2024
…ock tests (#1561)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1063738

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR changes `tox -e local` to run all tests in `tests/integ`,
`tests/unit` and `tests/mock` (previously mock_unit) against Local
Testing (except for modin tests specified in
`SNOWFLAKE_PYTEST_IGNORE_MODIN_CMD`).
sfc-gh-vbudati added a commit that referenced this issue May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1326280

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Update the links in the documentation for `to_snowpark_pandas` from the
LIMITED-ACCESS version to public! These links will not work until all of
the Snowpark pandas documentation is public.
sfc-gh-aalam added a commit that referenced this issue May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
sfc-gh-joshi added a commit that referenced this issue May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1373790

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR prunes some duplicated and unsupported methods from the Snowpark
pandas API docs, and adds more comprehensive docstrings (mostly copied
from pandas) for other methods. This PR does not add any doctests, nor
does it change any meaningful code.

All changes are listed below:
_De-duplicated listings_:
- Series
  - is_unique
  - duplicated
- DataFrame
  - round
Note that Series/DF head and tail are deliberately left duplicated; this
matches pandas documentation, as they are mentioned under both the
"Indexing, iteration" and "Reindexing / selection" headings.

_Removed unimplemented listing_:
- Series
  - kurtosis
- SeriesGroupBy
  - apply
- Resampler
  - groups
  - indices
  - get_group
  - apply
  - aggregate
  - transform
  - bfill
  - nearest
  - fillna
  - asfreq
  - nunique
  - first
  - last
  - interpolate
  - ohlc
  - pad
  - pipe
  - prod
  - quantile
  - sem
  - size
- Rolling
  - aggregate
  - apply
  - corr
  - count
  - cov
  - kurt
  - median
  - quantile
  - rank
  - sem
  - skew

_Added implemented listing_:
I did not comprehensively look for implemented methods that were not
listed, these were just a few methods that I noticed in the course of
checking other APIs.
- SeriesGroupBy
  - head
  - idxmax
  - idxmin
  - nunique
  - tail

_Improved documentation_:
- pd
  - qcut (I'm not sure why it wasn't inheriting pandas docs, but we should
    override them anyway since we don't implement all parameters)
- BasePandasDataset
  - convert_dtypes
  - rename_axis
  - values
  - ffill/pad
- Series
  - name
  - empty
  - hasnans
  - ndim
  - shape
  - rename_axis
  - quantile
- DataFrame
  - empty
  - quantile
  - select_dtypes
- GroupBy
  - std
  - var
  - rank
  - nunique
  - quantile
  - __iter__

---------

Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-lmukhopadhyay added a commit that referenced this issue May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1373899

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Updates notebook testing workflow with increased cell timeout, and adds
SnowparkPandasAPIDemo.ipynb notebook from customer demo and
SnowflakeChainTesting.ipynb which was previously blocked.

---------

Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
sfc-gh-oplaton added a commit that referenced this issue May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-0

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [x] I am adding a new dependency

3. Please describe how your code solves the related issue.

Update `ast_pb2.py` (already present in the repository).
Add the `setuptools` dependencies required for development.
Include the module path for `ast_pb2.py` in the manifest, so that the
file makes it into the Snowpark wheel.
sfc-gh-vbudati added a commit that referenced this issue May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1375263

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Updating the changelog to reflect what was actually released with
v1.15.0a1 and what is new.
sfc-gh-rdurrani added a commit that referenced this issue Oct 29, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1748174

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Add support for `size` in `groupby.agg`.
sfc-gh-jdu added a commit that referenced this issue Oct 29, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1770289

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   We didn't have such a test before. The generated query looks like this:
```
WITH SNOWPARK_TEMP_CTE_N7V3KVPKOM AS (with t as (select 1 as A) select * from t) SELECT count(1) AS "COUNT(LITERAL())" FROM ( SELECT  *  FROM ( SELECT  *  FROM ( SELECT "A" FROM (( SELECT  *  FROM (SNOWPARK_TEMP_CTE_N7V3KVPKOM)) UNION ( SELECT  *  FROM (SNOWPARK_TEMP_CTE_N7V3KVPKOM)))) WHERE True :: BOOLEAN) ORDER BY "A" ASC NULLS FIRST) LIMIT 1
```
sfc-gh-helmeleegy added a commit that referenced this issue Oct 30, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1773962

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   Fix changelog for 1.25.0.
sfc-gh-jdu added a commit that referenced this issue Oct 31, 2024
…ntifiers (#2526)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1748403

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

We can use `Aggregate.aggregate_expressions` directly for quoted identifiers.
sfc-gh-helmeleegy added a commit that referenced this issue Oct 31, 2024
…le/read_html/read_xml (#2540)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1748104, SNOW-1748107, SNOW-1748108

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   Add support for `pd.read_pickle`, `pd.read_html`, and `pd.read_xml`.
sfc-gh-joshi added a commit that referenced this issue Oct 31, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1652384

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

`test_to_numpy` previously did not actually test `to_numpy`, and instead
ran `Series.to_list`. This PR properly uses `Series.to_numpy`, and adds
a new test for `Series.to_list`.
sfc-gh-jdu added a commit that referenced this issue Nov 1, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1764136

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
sfc-gh-azhan added a commit that referenced this issue Nov 5, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1491199

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Make sure the protobuf version is compatible with Streamlit and snowbook.
sfc-gh-lninobrijaldo pushed a commit that referenced this issue Nov 5, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   ADHOC: Fix a misspelling

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Just noticed a misspelling of "ambiguous"; fixing it.
sfc-gh-joshi added a commit that referenced this issue Nov 5, 2024
… general functions (#2532)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1646980

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

This PR allows separate declaration of docstrings in
`modin/plugin/docstrings/io.py` and `modin/plugin/docstrings/general.py`
for functions defined in the respective override files. Override
functions in these namespaces should have the `_inherit_docstrings`
decorator attached; see
`modin/plugin/extensions/{io_overrides,general_overrides}.py` for
examples of how to do this.
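To illustrate the idea of keeping documentation and implementation in separate modules, here is a minimal sketch using a hypothetical `inherit_docstring_from` decorator. It is not modin's `_inherit_docstrings` (whose signature and behavior are not shown here); it only copies `__doc__` from a docs-only function onto the override, so that `help()` on the override shows the text declared in the docstrings module.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


def inherit_docstring_from(source: Callable) -> Callable[[F], F]:
    """Hypothetical decorator: copy ``source.__doc__`` onto the decorated function.

    Illustration only; this is NOT modin's ``_inherit_docstrings``.
    """

    def decorator(func: F) -> F:
        func.__doc__ = source.__doc__
        return func

    return decorator


# --- "docstrings module" (hypothetical): documentation only, no behavior ---
def read_excel_docs():
    """Read an Excel file into a DataFrame."""


# --- "overrides module" (hypothetical): behavior only, docs pulled from above ---
@inherit_docstring_from(read_excel_docs)
def read_excel(path):
    return f"reading {path}"
```

Because the decorator returns the original function object, its name and behavior are unchanged; only the docstring comes from the docs module.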

`pd.read_excel` had an identical frontend implementation to upstream
modin; with this docs change we can now override docstrings for
top-level functions defined upstream, so this has been removed from our
codebase. I've verified that the output of `help(pd.read_excel)` and
generated documentation for the function have not changed.
sfc-gh-jdu added a commit that referenced this issue Nov 6, 2024
…#2567)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1786772

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   With this change, we can track the lifetime of a temp table.
sfc-gh-yzou added a commit that referenced this issue Nov 6, 2024
…r x (#2568)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1764119

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Today, `np.where` with a scalar `x` requires:
1) temp table creation to broadcast the scalar `x` to the shape of `cond`
2) a join when performing the pandas `where`

However, in this case there is no need for the extra temp table
creation and join. This change removes them by using indexing for
the scalar cast.
sfc-gh-lninobrijaldo pushed a commit that referenced this issue Nov 6, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   ADHOC: Fix a misspelling

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Just noticed a misspelling of "ambiguous"; fixing it.
sfc-gh-jdu added a commit that referenced this issue Nov 6, 2024
)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1786772

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

The original atexit order was wrong: the connector's `SnowflakeConnection`
handler was called before the Snowpark session's, so by the time the session
tried to close the connection, it was already closed.
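Python's `atexit` module runs registered handlers in last-in, first-out order, so whichever cleanup is registered last runs first. The sketch below demonstrates this with hypothetical `Connection` and `Session` stand-ins (not the real connector or Snowpark classes). Because exit handlers only fire when an interpreter shuts down, it runs them in a child interpreter and captures the output.

```python
import subprocess
import sys
import textwrap

# Child script with two hypothetical cleanup handlers. atexit runs handlers
# LIFO: the session's handler is registered last, so it runs first and still
# sees an open connection.
CHILD = textwrap.dedent(
    """
    import atexit

    class Connection:  # hypothetical stand-in for the connector's connection
        def __init__(self):
            self.closed = False

        def close(self):
            if self.closed:
                print("connection.close skipped (already closed)")
            else:
                print("connection.close")
            self.closed = True

    class Session:  # hypothetical stand-in for a Snowpark session
        def __init__(self, conn):
            self.conn = conn

        def close(self):
            print("session.close sees closed=" + str(self.conn.closed))
            self.conn.close()

    conn = Connection()
    atexit.register(conn.close)     # registered first -> runs last
    session = Session(conn)
    atexit.register(session.close)  # registered last -> runs first
    """
)


def run_child() -> list:
    # Run CHILD in a fresh interpreter so its atexit handlers fire on exit.
    result = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_child():
        print(line)
```

If the two `atexit.register` calls were swapped, the connection's handler would run first and the session would see `closed=True`, which is the failure mode the PR describes.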
sfc-gh-azhan added a commit that referenced this issue Nov 6, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1491199

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This PR removes the Protobuf-generated code from the source tree and adds logic to
generate it when needed.
This pull request includes the addition of a script to install `protoc`
and updates to multiple GitHub Actions workflows to use this script.
Additionally, there is an update to the `CODEOWNERS` file.

### Installation and usage of `protoc`:
* Added a new script `install_protoc.sh` to install `protoc` on GitHub
Actions runners. This script handles downloading and installing the
appropriate version of `protoc` based on the operating system and
architecture.

### Code ownership:
* Updated the `CODEOWNERS` file to add ownership for the
`tests/unit/ast/` directory to the `@snowflakedb/snowpark-ir` team.

### Test passed
Daily pandas precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645868643/job/32429396169
Daily notebook precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645871020
Daily precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645600647/job/32428853317
pandas Sproc precommit test:
https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/185/console

### Development Behavior Change
After this PR, developers who pip install from source or use tox need to
make sure `protoc` is installed. Installing from a wheel file does not
require `protoc`, since the generated code should already be included in
the wheel.
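Before installing from source or running tox locally, a developer could check whether `protoc` is available. The helper below is a hypothetical pre-flight check, not part of this PR (CI uses the `install_protoc.sh` script instead). It relies only on `shutil.which` and on running `protoc --version`.

```python
import shutil
import subprocess
from typing import Optional


def find_protoc() -> Optional[str]:
    """Hypothetical helper: return `protoc --version` output, or None if missing."""
    path = shutil.which("protoc")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Treat a failing or silent protoc the same as a missing one.
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = find_protoc()
    if version is None:
        raise SystemExit(
            "protoc not found: install it before `pip install .` or running tox "
            "(not needed when installing from a wheel)."
        )
    print(f"found {version}")
```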
sfc-gh-azhan added a commit that referenced this issue Nov 6, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1491199

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This PR removes the Protobuf-generated code from the source tree and adds logic to
generate it when needed.
This pull request includes the addition of a script to install `protoc`
and updates to multiple GitHub Actions workflows to use this script.
Additionally, there is an update to the `CODEOWNERS` file.

* Added a new script `install_protoc.sh` to install `protoc` on GitHub
Actions runners. This script handles downloading and installing the
appropriate version of `protoc` based on the operating system and
architecture.

* Updated the `CODEOWNERS` file to add ownership for the
`tests/unit/ast/` directory to the `@snowflakedb/snowpark-ir` team.

Daily pandas precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645868643/job/32429396169
Daily notebook precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645871020
Daily precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645600647/job/32428853317
pandas Sproc precommit test:
https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/185/console

After this PR, developers who pip install from source or use tox need to
make sure `protoc` is installed. Installing from a wheel file does not
require `protoc`, since the generated code should already be included in
the wheel.
sfc-gh-azhan added a commit that referenced this issue Nov 7, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1491199

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Install `protoc` for the pandas Jenkins job.
sfc-gh-azhan added a commit that referenced this issue Nov 7, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1491199

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This PR removes the Protobuf-generated code from the source tree and adds logic to
generate it when needed.
This pull request includes the addition of a script to install `protoc`
and updates to multiple GitHub Actions workflows to use this script.
Additionally, there is an update to the `CODEOWNERS` file.

* Added a new script `install_protoc.sh` to install `protoc` on GitHub
Actions runners. This script handles downloading and installing the
appropriate version of `protoc` based on the operating system and
architecture.

* Updated the `CODEOWNERS` file to add ownership for the
`tests/unit/ast/` directory to the `@snowflakedb/snowpark-ir` team.

Daily pandas precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645868643/job/32429396169
Daily notebook precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645871020
Daily precommit test:
https://github.com/snowflakedb/snowpark-python/actions/runs/11645600647/job/32428853317
pandas Sproc precommit test:
https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/185/console

After this PR, developers who pip install from source or use tox need to
make sure `protoc` is installed. Installing from a wheel file does not
require `protoc`, since the generated code should already be included in
the wheel.
sfc-gh-azhan added a commit that referenced this issue Nov 7, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This cherry-picks #2518 and #2549 to sync the protobuf generation with
the main branch and fix some paths.

---------

Co-authored-by: Ovidiu Platon <ovidiu.platon@snowflake.com>
sfc-gh-helmeleegy added a commit that referenced this issue Nov 7, 2024
…2581)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1791556

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [ ] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Use the docstrings folder for Index methods and properties. Docstrings
for DatetimeIndex and TimedeltaIndex will be handled the same way in
follow-up PRs.
sfc-gh-oplaton added a commit that referenced this issue Nov 8, 2024
… utilities (#2552)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1491199

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [x] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Introduce functionality that supports the AST collection logic.
Add `batch.py` and `utils.py` to `src/snowflake/snowpark/_internal/ast`.

`AstBatch` supports the collection of sequences of statements within the
`Session` object. Statements can be either `Assign` or `Eval`.
`AstBatch` accumulates statements and can then flush them into a
`Request` object.
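To illustrate the accumulate-then-flush pattern described above, here is a toy sketch. It is not the real `AstBatch`, which works with protobuf messages; `ToyAstBatch` and the plain dataclasses standing in for `Assign`, `Eval`, and `Request` are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Assign:
    """Hypothetical stand-in for an `Assign` statement: binds a symbol to an expression."""
    var_id: int
    symbol: str
    expr: str


@dataclass
class Eval:
    """Hypothetical stand-in for an `Eval` statement: evaluates a previously bound value."""
    bind_id: int


Stmt = Union[Assign, Eval]


@dataclass
class Request:
    """Hypothetical stand-in for the `Request` message a batch is flushed into."""
    body: List[Stmt] = field(default_factory=list)


class ToyAstBatch:
    """Accumulates statements in order and flushes them into a Request."""

    def __init__(self) -> None:
        self._stmts: List[Stmt] = []
        self._next_id = 1

    def assign(self, symbol: str, expr: str) -> Assign:
        # Each assignment gets a fresh id so later statements can refer to it.
        stmt = Assign(var_id=self._next_id, symbol=symbol, expr=expr)
        self._next_id += 1
        self._stmts.append(stmt)
        return stmt

    def eval(self, target: Assign) -> Eval:
        stmt = Eval(bind_id=target.var_id)
        self._stmts.append(stmt)
        return stmt

    def flush(self) -> Request:
        # Move all pending statements into a new Request and start an empty batch.
        request = Request(body=self._stmts)
        self._stmts = []
        return request
```

Flushing hands off ownership of the pending list, so statements recorded after a flush land in the next request.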

`ast/utils.py` is the workhorse of the AST collection functionality. The
job of the functions in this module is to traverse Python object graphs
that might be used as inputs to Snowpark APIs. The `build_*` and
`fill_*` functions are the core of the AST extraction logic.
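As a rough illustration of the `build_*`/`fill_*` split, the sketch below recursively walks a Python value and fills a node in place. It uses plain dicts instead of protobuf messages, and `build_expr`/`fill_expr` and the node layout are hypothetical, not the functions in `ast/utils.py`.

```python
from typing import Any, Dict


def build_expr(value: Any) -> Dict[str, Any]:
    """Hypothetical `build_*`-style helper: create a node and fill it from `value`."""
    node: Dict[str, Any] = {}
    fill_expr(node, value)
    return node


def fill_expr(node: Dict[str, Any], value: Any) -> None:
    """Hypothetical `fill_*`-style helper: populate `node` in place by traversing `value`."""
    if value is None:
        node["kind"] = "null"
    elif isinstance(value, bool):  # checked before int: bool is a subclass of int
        node.update(kind="bool", value=value)
    elif isinstance(value, int):
        node.update(kind="int", value=value)
    elif isinstance(value, float):
        node.update(kind="float", value=value)
    elif isinstance(value, str):
        node.update(kind="string", value=value)
    elif isinstance(value, (list, tuple)):
        # Recurse into containers so nested inputs become nested nodes.
        node.update(kind="list", items=[build_expr(v) for v in value])
    elif isinstance(value, dict):
        node.update(
            kind="dict",
            entries=[(build_expr(k), build_expr(v)) for k, v in value.items()],
        )
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
```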

Follow-up work:

* Enable `pyright` checks for the `ast` directory.
* Improve test coverage when bringing in the rest of the AST collection
changes, and remove the `no cover` pragmas.

---------

Co-authored-by: Arthur Zwiegincew <arthur.zwiegincew@snowflake.com>
Co-authored-by: Eric Vandenberg <eric.vandenberg@snowflake.com>
Co-authored-by: Hemit Shah <hemit.shah@snowflake.com>
Co-authored-by: Leonhard Spiegelberg <leonhard.spiegelberg@snowflake.com>
Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-azhan pushed a commit that referenced this issue Nov 8, 2024
sfc-gh-azhan added a commit that referenced this issue Nov 8, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1792741

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [ ] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

We specify the version of `protoc` to make sure the generated Python
code works with all protobuf versions we claim to support. Also add
tests to CI.
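One way to enforce such a pin is to compare the output of `protoc --version` (which prints a line like `libprotoc 3.20.3`) against an expected version. In this sketch the pinned tuple is a placeholder, not the version the PR actually pins, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Placeholder value for illustration only; the real pinned version is not stated here.
PINNED_PROTOC_VERSION: Tuple[int, ...] = (3, 20, 3)


def parse_protoc_version(output: str) -> Tuple[int, ...]:
    """Parse output such as 'libprotoc 3.20.3' into (3, 20, 3)."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized protoc version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_protoc_version() -> Optional[Tuple[int, ...]]:
    """Return the version of the protoc on PATH, or None if it is missing."""
    if shutil.which("protoc") is None:
        return None
    result = subprocess.run(
        ["protoc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_protoc_version(result.stdout)


def check_pinned_protoc() -> None:
    """Fail fast in CI if the installed protoc differs from the pin."""
    version = installed_protoc_version()
    if version != PINNED_PROTOC_VERSION:
        raise RuntimeError(f"expected protoc {PINNED_PROTOC_VERSION}, found {version}")
```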
sfc-gh-yzou added a commit that referenced this issue Nov 9, 2024
…mp table (#2587)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

SNOW-1791241

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [x] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Scoped read-only temp tables currently don't work with dynamic pivot in
notebooks; investigation is needed.
We added a flag to turn off scoped read-only temp tables for Snowpark
pandas.

Follow-up (https://snowflakecomputing.atlassian.net/browse/SNOW-1793002):
enable the flag on the server side for native apps.
sfc-gh-azhan pushed a commit that referenced this issue Nov 10, 2024
sfc-gh-azhan added a commit that referenced this issue Nov 11, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1791994

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [x] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This PR generates typed Python code from protobuf and makes sure CI
checks types in precommit.
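For context, `--mypy_out` is handled by the `protoc-gen-mypy` plugin that ships with mypy-protobuf: `--python_out` emits the `*_pb2.py` module and `--mypy_out` emits matching `*_pb2.pyi` stubs. The sketch below builds such an invocation; the file and output paths are hypothetical, not the ones used in `setup.py`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def protoc_command(proto_file: Path, out_dir: Path) -> List[str]:
    """Build a protoc invocation that emits runtime code plus .pyi type stubs."""
    return [
        "protoc",
        f"--proto_path={proto_file.parent}",
        f"--python_out={out_dir}",  # *_pb2.py runtime module
        f"--mypy_out={out_dir}",  # *_pb2.pyi stubs via protoc-gen-mypy
        str(proto_file),
    ]


def generate(proto_file: Path, out_dir: Path) -> None:
    # protoc resolves --mypy_out by finding protoc-gen-mypy on PATH.
    if shutil.which("protoc-gen-mypy") is None:
        raise RuntimeError("mypy-protobuf is not installed (protoc-gen-mypy not on PATH)")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(protoc_command(proto_file, out_dir), check=True)
```

With the `.pyi` stubs next to the generated modules, mypy can type-check code that imports them.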

Copilot summary:
This pull request includes several changes to improve type checking,
streamline the build process, and enhance code quality. The most
important changes include the addition of `mypy-protobuf` for generating
typed Python code from protobuf, updates to the build and pre-commit
configurations, and improvements to type annotations across various
files.

### Build and Dependency Management:
* Added `mypy-protobuf` installation in
`.github/scripts/install_protoc.sh` and `scripts/jenkins_regress.sh` to
generate typed Python code from protobuf.
* Updated `.github/workflows/precommit.yml` to install `tox` and
streamline the documentation build process using `tox`.
* Modified `setup.py` to include `mypy-protobuf` as a dependency and
updated the protobuf generation command to use `--mypy_out`.

### Pre-commit Configuration:
* Added new files and dependencies to the `mypy` check in
`.pre-commit-config.yaml`.

### Code Quality and Type Annotations:
* Enhanced type annotations in
`src/snowflake/snowpark/_internal/ast/batch.py` by specifying return
types and dictionary types.
* Improved type checking in
`src/snowflake/snowpark/_internal/ast/utils.py` by adding
`# type: ignore` comments and TODOs for better handling of type issues.

Tested Jenkins jobs:
- https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPythonSnowflakePythonClientRegressRunner/799/console
- https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/193/console
sfc-gh-yzou added a commit that referenced this issue Nov 11, 2024
…dex behavior (#2564)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

SNOW-1518791

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [ ] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.
In native pandas, `concat` with `axis=1` succeeds and aligns the inputs
when the DataFrames/Series come from the same DataFrame. However,
Snowpark pandas currently uses a join, which gives a different result
when the index contains duplicates.
For example, with the following DataFrame:
```
   C  A  D
2  1  a  3
1  2  b  2
2  3  c  1
```
`pd.concat([df['C'], df['A']], axis=1)` gives:
```
   C  A
2  1  a
1  2  b
2  3  c
```
but Snowpark pandas returns:
```
   C  A
2  1  a
2  1  c
1  2  b
2  3  a
2  3  c
```
In this PR, we fix the behavior by switching to align behavior. We also
relax the align utility to allow customized sorting, so that it can
support the sort behavior of `concat`.
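The difference between aligning and joining can be reproduced with native pandas (assumed installed). This only illustrates the two semantics on the example DataFrame above; it is not the Snowpark pandas implementation.

```python
import pandas as pd

# The DataFrame from the example above; note the duplicated index label 2.
df = pd.DataFrame(
    {"C": [1, 2, 3], "A": ["a", "b", "c"], "D": [3, 2, 1]}, index=[2, 1, 2]
)

# Align: both Series share the same index, so concat pairs rows positionally.
aligned = pd.concat([df["C"], df["A"]], axis=1)

# Join: each duplicate label on the left matches every copy on the right,
# so the two rows labeled 2 fan out into four and the result has 5 rows.
joined = df[["C"]].join(df[["A"]])

print(aligned)
print(joined)
```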
sfc-gh-helmeleegy added a commit that referenced this issue Nov 12, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

Note that if a corresponding GitHub issue exists, you should still
include
   the Snowflake Jira issue number. For example, for GitHub issue
#1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1438001

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [ ] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   Add support for list values in Series.str.len.
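For reference, native pandas already accepts list elements here: `Series.str.len` returns `len()` of each element. A minimal check, assuming native pandas is installed (this does not exercise Snowpark pandas itself):

```python
import pandas as pd

# Each element is a list, so .str.len() reports the number of items in it.
s = pd.Series([[1, 2, 3], [4], []])
print(s.str.len().tolist())
```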