FEAT: Implement ibis.pandas.trace module to add time and call stack informa… #2233

Merged
merged 6 commits into from
Jun 9, 2020

icexelloss
Contributor

What is the change

This PR adds a way to log timing and call stack information for the pandas backend.

The information can be enabled by setting the logging level to DEBUG.

e.g.

import pandas as pd
import ibis.expr.datatypes as dt
import ibis.pandas
from ibis.udf.vectorized import elementwise

import logging

logging.basicConfig(level=logging.DEBUG)

df = pd.DataFrame(
    {
        'a': [1, 2, 3]
    }
)

con = ibis.pandas.connect({"table1": df})

@elementwise(
    input_type=[dt.double],
    output_type=dt.double
)
def add_one(v):
    import time
    time.sleep(5)
    return v + 1


table = con.table("table1")
table = table.mutate(b=add_one(table['a']))
table.execute()

Output:

DEBUG:root: main_execute Selection
DEBUG:root:   execute_until_in_scope Selection
DEBUG:root:     execute_until_in_scope PandasTable
DEBUG:root:       execute_database_table_client PandasTable
DEBUG:root:       execute_database_table_client PandasTable 0:00:00.000074
DEBUG:root:     execute_until_in_scope PandasTable 0:00:00.000396
DEBUG:root:     execute_selection_dataframe Selection
DEBUG:root:     main_execute ElementWiseVectorizedUDF
DEBUG:root:       execute_until_in_scope ElementWiseVectorizedUDF
DEBUG:root:         execute_until_in_scope TableColumn
DEBUG:root:           execute_until_in_scope PandasTable
DEBUG:root:           execute_until_in_scope PandasTable 0:00:00.000069
DEBUG:root:           execute_table_column_df_or_df_groupby TableColumn
DEBUG:root:           execute_table_column_df_or_df_groupby TableColumn 0:00:00.000324
DEBUG:root:         execute_until_in_scope TableColumn 0:00:00.000608
DEBUG:root:         execute_udf_node ElementWiseVectorizedUDF
DEBUG:root:         execute_udf_node ElementWiseVectorizedUDF 0:00:05.014354
DEBUG:root:       execute_until_in_scope ElementWiseVectorizedUDF 0:00:05.015450
DEBUG:root:     main_execute ElementWiseVectorizedUDF 0:00:05.015659
DEBUG:root:     execute_selection_dataframe Selection 0:00:05.017459
DEBUG:root:   execute_until_in_scope Selection 0:00:05.018348
DEBUG:root: main_execute Selection 0:00:05.020070

How is the change tested

This was tested manually. If you have ideas on how to test this automatically, I am happy to hear them.

@jreback
Contributor

jreback commented Jun 4, 2020

Looks pretty good.

I think it makes sense to hook this into config: https://github.com/ibis-project/ibis/blob/master/ibis/config_init.py

Then I think you can always instrument and just hide it behind the flag.
It could still go to the logger.
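
A rough sketch of the "always instrument, hide behind a flag" idea, for illustration only: the wrapper is installed unconditionally, but it checks a flag on every call and logs only when the flag is on. The `options` dict here is a hypothetical stand-in for a real config system, and the logger name and `double` function are likewise invented; none of this is the ibis API.

```python
import functools
import logging

logger = logging.getLogger("trace_flag_sketch")  # hypothetical logger name
options = {"enable_trace": False}  # hypothetical stand-in for a config option


def trace(func):
    """Always wrap ``func``, but only log when the flag is switched on."""

    @functools.wraps(func)
    def traced(*args, **kwargs):
        # The flag is read at call time, so it can be toggled at runtime
        # without re-decorating anything.
        if not options["enable_trace"]:
            return func(*args, **kwargs)
        logger.debug("%s start", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            logger.debug("%s end", func.__name__)

    return traced


@trace
def double(x):
    return 2 * x
```

With the flag off, the only cost per call is one dictionary lookup; with it on, the messages still go through the standard `logging` machinery, so the usual handlers and levels apply.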

@icexelloss
Contributor Author

Thanks @jreback. Makes sense, let me hook this into the config.

@emilyreff7
Contributor

LGTM

@icexelloss
Contributor Author

@jreback Updated! Can you take another look please?

@jreback jreback added the pandas The pandas backend label Jun 8, 2020
@jreback jreback added this to the Next Bugfix Release milestone Jun 8, 2020
'enable_trace',
False,
"""
Whether enable tracing for pandas execution. See ibis.panads.trace for details.
Contributor

sp? here

Contributor Author

Sorry, I don't quite get that. What do you mean by sp?

return traced_func


class TraceDispatcher(Dispatcher):
Contributor

Why do we need this additional complexity here? Can you just modify Dispatcher to add the tracing (if the config is on)? Isn't that the point of using the option?

Contributor Author

Dispatcher is from the multipledispatch library, so we can't modify it here; that is why it is subclassed.
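
To illustrate the subclassing approach in general terms: when the dispatcher class belongs to a third-party package, a subclass can override `register` so that every function is wrapped before being handed to the parent. The sketch below uses a tiny stand-in `Dispatcher` written for this example, not multipledispatch's class, and its `trace` helper, logger name, and `execute_int` function are equally hypothetical. It is not the `TraceDispatcher` from this PR.

```python
import functools
import logging

logger = logging.getLogger("trace_dispatch_sketch")  # hypothetical logger name


class Dispatcher:
    """Stand-in for a third-party dispatcher that cannot be edited."""

    def __init__(self, name):
        self.name = name
        self.funcs = {}

    def register(self, *types):
        def decorator(func):
            self.funcs[types] = func
            return func

        return decorator

    def __call__(self, *args):
        # Exact-type lookup only; a real dispatcher also resolves subclasses.
        return self.funcs[tuple(type(a) for a in args)](*args)


def trace(func):
    """Minimal tracing wrapper: log the function name on each call."""

    @functools.wraps(func)
    def traced(*args, **kwargs):
        logger.debug("%s", func.__name__)
        return func(*args, **kwargs)

    return traced


class TraceDispatcher(Dispatcher):
    """Dispatcher subclass that traces every function it registers."""

    def register(self, *types):
        parent_decorator = super().register(*types)

        def decorator(func):
            parent_decorator(trace(func))  # store the traced version
            return func  # hand the untouched function back to the caller

        return decorator


execute = TraceDispatcher("execute")


@execute.register(int)
def execute_int(x):
    return x + 1
```

Returning the original function from the decorator is a design choice in this sketch: calls routed through the dispatcher are traced, while direct calls to `execute_int` are not.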

@jreback
Contributor

jreback commented Jun 8, 2020

lgtm @icexelloss

@jreback
Contributor

jreback commented Jun 8, 2020

It's going to run the CI again because of a conflict in the release notes, but we can merge on green.

@jreback jreback merged commit 5fd37e1 into ibis-project:master Jun 9, 2020
@jreback
Contributor

jreback commented Jun 9, 2020

thanks @icexelloss
sp? == spelling

Labels
pandas The pandas backend
3 participants