SNOW-664933: session.create_dataframe() crashes when the Pandas dataframe is empty. #503

Closed
elongl opened this issue Sep 19, 2022 · 2 comments
Labels: bug (Something isn't working), needs triage (Initial RCA is required)

elongl commented Sep 19, 2022

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.8

  1. What operating system and processor architecture are you using?

Snowflake Python UDF

  1. What are the component versions in the environment (pip freeze)?

Only snowflake-snowpark-python

  1. What did you do?

Using session.create_dataframe() does not work with an empty pandas.DataFrame object.

Here's the full code:

orders_table = session.table('orders')
pandas_df = orders_table.to_pandas()
filtered_pandas_df = pandas_df.loc[pandas_df['CUSTOMER_ID'] == '<DOESNT_EXIST>']
# filtered_pandas_df is an empty pandas DataFrame.
snowpark_df = session.create_dataframe(filtered_pandas_df)
# Here I got an exception.

Here it is:

File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/snowpark/session.py:1066, in Session.write_pandas(self, df, table_name, database, schema, chunk_size, compression, on_error, parallel, quote_identifiers, auto_create_table, create_temp_table, overwrite)
   1060     else:                                                                                                                     
   1061         location = (                                       
   1062             (database + "." if database else "")                                                                              
   1063             + (schema + "." if schema else "")
   1064             + (table_name)                                 
   1065         )                                                                                                                     
-> 1066     success, nchunks, nrows, ci_output = write_pandas(                                                                        
   1067         self._conn._conn,
   1068         df,            
   1069         table_name,                                                                                                           
   1070         database=database,                                                                                                    
   1071         schema=schema,   
   1072         chunk_size=chunk_size,                                                                                                
   1073         compression=compression,                                                                                              
   1074         on_error=on_error,                                                                                                    
   1075         parallel=parallel,                                                                                                    
   1076         quote_identifiers=quote_identifiers,                                                                                  
   1077         auto_create_table=auto_create_table,                                                                                  
   1078         create_temp_table=create_temp_table,
   1079         overwrite=overwrite,                                                                                                  
   1080     )                                                      
   1081 except ProgrammingError as pe:                             
   1082     if pe.msg.endswith("does not exist"):
                                                                                                                                      
File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py:183, in write_pandas(conn, df, table_name, database, schema, chunk_size, compression, on_error, parallel, quote_identifiers, auto_create_table, create_temp_table, overwrite, table_type)
    180         raise                                                                                                                 
    182 with TemporaryDirectory() as tmp_folder:                   
--> 183     for i, chunk in chunk_helper(df, chunk_size):                                                                             
    184         chunk_path = os.path.join(tmp_folder, f"file{i}.txt")      
    185         # Dump chunk into parquet file                                                                                        
                                 
File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py:37, in chunk_helper(lst, n)
     35 def chunk_helper(lst: T, n: int) -> Iterator[tuple[int, T]]:
     36     """Helper generator to chunk a sequence efficiently with current index like if enumerate was called on sequence."""       
---> 37     for i in range(0, len(lst), n):
     38         yield int(i / n), lst[i : i + n]                                                                                      
                                 
ValueError: range() arg 3 must not be zero

  1. What did you expect to see?

I expected to get an empty snowflake.snowpark.DataFrame with the defined columns so that I'd be able to write it to a table with no rows.
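The last frame of the traceback shows the failing expression: `range(0, len(lst), n)` is evaluated with a step `n` of zero. The sketch below copies that loop from the traceback and reproduces the error without any Snowflake dependency. Why `n` ends up as zero is an assumption here, not something the thread confirms: a plausible cause is that the chunk size falls back to the row count of the DataFrame, which is 0 when it is empty. `safe_chunk_helper` and `fails_on_zero_step` are hypothetical names used for illustration only and are not part of the connector.

```python
def chunk_helper(lst, n):
    """Same loop as the chunk_helper frame shown in the traceback."""
    for i in range(0, len(lst), n):
        yield int(i / n), lst[i : i + n]


def safe_chunk_helper(lst, n):
    """Hypothetical guarded variant: an empty input yields no chunks."""
    if len(lst) == 0:
        # Nothing to chunk, so stop before range() ever sees a zero step.
        return
    yield from chunk_helper(lst, n)


def fails_on_zero_step():
    """Return True if the unguarded helper raises ValueError for a zero step."""
    try:
        list(chunk_helper([], 0))
    except ValueError:
        return True
    return False
```

With this guard, an empty input produces an empty sequence of chunks, while non-empty input is chunked exactly as before. Whether the actual fix should live in the connector or in Snowpark is left open in the thread.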

elongl added the bug and needs triage labels on Sep 19, 2022.
github-actions bot changed the title from "session.create_dataframe() crashes when the Pandas dataframe is empty." to "SNOW-664933: session.create_dataframe() crashes when the Pandas dataframe is empty." on Sep 19, 2022.
sfc-gh-jdu (Collaborator) commented Sep 21, 2022

@elongl Thanks for your feedback! Ack it's a bug and we'll look into it. Meanwhile, if you want to have an empty Snowpark DataFrame with the defined columns, you can do:

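from snowflake.snowpark.types import IntegerType, StructField, StructType

# Empty local data plus an explicit schema yields an empty Snowpark DataFrame.
schema = StructType(
    [StructField("a", IntegerType()), StructField("b", IntegerType())]
)
df = session.create_dataframe([], schema=schema)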

cc @sfc-gh-stan
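Until the bug is fixed, the workaround above can be wrapped in a small helper that only takes the explicit-schema path when the pandas DataFrame has no rows. This is a minimal sketch, not part of the Snowpark API: `create_dataframe_allow_empty` is a hypothetical name, and the caller is assumed to supply a matching `StructType` schema by hand, since nothing here derives it from the pandas dtypes.

```python
from typing import Any


def create_dataframe_allow_empty(session: Any, pandas_df: Any, schema: Any) -> Any:
    """Hypothetical wrapper around session.create_dataframe().

    session   -- a Snowpark Session (anything with a create_dataframe method)
    pandas_df -- the pandas DataFrame to upload
    schema    -- a caller-supplied schema, used only when pandas_df has no rows
    """
    if len(pandas_df) == 0:
        # Empty input: skip the pandas upload path and build the
        # DataFrame from an empty list plus the explicit schema.
        return session.create_dataframe([], schema=schema)
    # Non-empty input: use the normal pandas path unchanged.
    return session.create_dataframe(pandas_df)
```

For the example in the original report, this would be called as `create_dataframe_allow_empty(session, filtered_pandas_df, schema)`, with `schema` listing the columns of the `orders` table.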

elongl (Author) commented Sep 21, 2022

@sfc-gh-jdu @sfc-gh-stan

Thank you very much for addressing the issue.
Did you get a chance to look at the other one I created?
It's a bit more important I believe.

Here it is.
