Supporting dataframe with integer columns #203

dorisjlee · 2021-01-07T13:54:37Z

Addressing #202

modified all string concatenations to f-strings
fixed broken cases where a clause attribute is of integer type
improved __repr__ for Vis/VisList edge cases
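
To illustrate why the f-string change matters for integer column names, here is a minimal sketch. It is not taken from the Lux code base: the helper names `label_concat` and `label_fstring` and the `attribute=` label format are hypothetical, chosen only to show the failure mode. The `+` operator raises `TypeError` when a `str` is joined with an `int` attribute, while an f-string formats either type.

```python
# Hypothetical helpers for illustration only; these are not Lux functions.

def label_concat(attribute):
    # Old style: "+" only works when `attribute` is already a str.
    return "attribute=" + attribute


def label_fstring(attribute):
    # New style: f-string formatting converts any value, including ints.
    return f"attribute={attribute}"


# An integer column name, such as the default labels pandas assigns when
# a DataFrame is built without explicit column names.
column = 0

try:
    label_concat(column)
    concat_failed = False
except TypeError:
    # str + int is not defined, so the concatenation version breaks here.
    concat_failed = True

label = label_fstring(column)
print(concat_failed, label)
```

The same f-string helper keeps working for ordinary string column names, so nothing needs to branch on the attribute type.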

* Similarity as a default action (#182)
* similarity formatting fixed
* added another similarity test case; fixed bug where colored heatmap dimension is temporal (invalidate all 2 msr 1 temporal case)
* filter and similarity together
* filter and similarity together
* remove filter
* black line length
* file reorg and clean; change sim metric
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* bump numpy min version for travis
* Special character issue (#184)
* rename col
* broken
* fixed period replacement bug
* add tests
* refine tests
* refine tests
* remove cols
* fix tests
* add agg
* fixed tests
* clean up PR
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* Colored bar interestingness bug (#189)
* rewrote chi2 contingency with pd.crosstab
* catching KeyError issue with chi2 contingency
* padding interestingness with warning instead of error
* interestingness now reuses ndim and nmsr computed in Compiler
* bug fix for parser with int values
* improve Vis repr to better display inferred intent when data is absent but fully compiled intent (all clauses)
* Add sampling parameters as a global config (#192)
* update export tutorial to add explanation for standalone argument
* minor fixes and remove cell output in notebooks
* added contributing doc
* fix bugs and uncomment some tests
* remove raise warning
* remove unnecessary import
* split up rename test into two parts
* fix setting warning, fix data_type bugs and add relevant tests
* remove ordinal data type
* add test for small dataframe resetting index
* add loc and iloc tests
* fix attribute access directly to dataframe
* add small changes to code
* added test for qcut and cut
* add check if dtype is Interval
* added qcut test
* fix Record KeyError
* add tests
* take care of reset_index case
* small edits
* add data_model to column_group Clause
* small edits for row_group
* fixes to row group
* add config for start and cap for samples
* finish sampling config and tests
* black formatting
* add documentation for sampling config
* remove small added issues
* minor changes to docs
* implement heatmap flag and add tests
* black formatting and documentation edits
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* Coalesce all data_type attributes of frame into one (#185)
* coalesce data_types into data_type_lookup
* black reformat
* changed to better variable names
* lux not defined error
* fixed
* black format
* Update CONTRIBUTING.md
* Bug Fix: User-provided Index causes KeyError in Pandas Execution (#191)
* Moved Executor Parameters to Global Config
* Black formatting
* Moved table_name parameter to frame.py. Removed executor_type parameter executor_type parameter no longer necessary to maintain
* Fixed reference to table_name parameter table_name is now a parameter within frame.py
* Adjusted Functions to Set SQL Connection Moved set_SQL_connection function to config. Added set_SQL_table function within frame.py to let users specify which database table will be associated with their dataframe
* Update SQLExecutor name parameter
* Fix Executor Reference Update current_vis() to reference lux.config.executor
* Update frame.py
* Moved set functions to global config
* Fixed Index Issue in Pandas Executor Issue caused when user sets an index. The Pandas Executor was not correctly renaming this new index column to Record in execute_aggregate()
* Added tests for set_index functions
* Black formatting
* Update Pandas Executor to handle NA values Readded missing dropna parameter within execute_aggregate() groupby function call
* Updated Pandas Coverage Tests Commented out set_index case which has not been addressed yet
* Black Formatting
* Update to Pandas Executor Index Handling Cleaned up how execute_aggregrate renames index columns. Now retrieves the index name from vis.data instead of filtering out non-index columns. Created separate test function for when user specifies an index in read_csv.
Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* Initialize Config once only during __init__ (#194)
* basic matplotlib chart example
* migrate register default action to init
* config class
* move actions
* fixed tests
* changes
* alright
* fix plot_config
* black reformat
* black reformat
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>
* Update README.md
* Series Bugfix for describe and convert_dtypes (#197)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* Update Lux Docs (#195)
* add black to travis
* reformat all code and adjust test
* remove .idea
* fix contributing doc
* small change in contributing
* update
* reformat, update command to fix version
* remove dev dependencies
* first pass -- inline comments
* _config/config.py
* delete test notebook
* action
* line length 105
* executor
* interestingness
* processor
* vislib
* tests, travis, CONTRIBUTING
* .format () changed
* replace tabs with escape chars
* update using black
* more rewrites and merges into single line
* update pyproject.toml and makefile
* coalesce data_types into data_type_lookup
* black reformat
* changed to better variable names
* lux not defined error
* fixed
* black format
* config doc updated
* fix link for executor
* more links
* fixed overview
* more links fixed
* pandas methods no longer included
* updates to some docstrings
* black reformat
* minor fixes
* minor fix
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* Supporting dataframe with integer columns (#203)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* various fixes to support int columns
* fixed merge conflict issues. vis.data shows None DF.
* Override Pandas DataFrames created from I/O pandas operations (#207)
* update export tutorial to add explanation for standalone argument
* minor fixes and remove cell output in notebooks
* added contributing doc
* fix bugs and uncomment some tests
* remove raise warning
* remove unnecessary import
* split up rename test into two parts
* fix setting warning, fix data_type bugs and add relevant tests
* remove ordinal data type
* add test for small dataframe resetting index
* add loc and iloc tests
* fix attribute access directly to dataframe
* add small changes to code
* added test for qcut and cut
* add check if dtype is Interval
* added qcut test
* fix Record KeyError
* add tests
* take care of reset_index case
* small edits
* add data_model to column_group Clause
* small edits for row_group
* fixes to row group
* add config for start and cap for samples
* finish sampling config and tests
* black formatting
* add documentation for sampling config
* remove small added issues
* minor changes to docs
* implement heatmap flag and add tests
* black formatting and documentation edits
* add pd.io equalities for DataFrames
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* Merge master into sql-engine + minor mergeconflict fixes
* Removing the PYNB
* Cleaning up obsolete code
* Configuration for topk and sort order (#206)
* bugfix for describe and convert_dtypes
* added back metadata series test
* black
* default to pandas display when df.dtypes printed
* various fixes to support int columns
* skip series vis for df.iterrows series element
* config setting for modifying top K and sorting
* note about regenerated config
* Version lock for jupyter-client (#211)
* move to single requirements-dev without lux-widget install manually
* pin jedi version
* pin jupyter-client version
* add back old travis and requirement-dev
* Mixed dtype issue (#205)
* coalesce data_types into data_type_lookup
* merge fixed
* merge conflicts
* add warning and suggestion on how to fix
* formatting for warnings version
* change to internal data
* legibility update
* test added
* update test
* test updated
* xlrd in dev reqs
* black
* update link
* changes to test logic, minor string format for warning
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* Fixes issue where value_counts was not returning LuxSeries (#210)
* add series equality and value counts test
* black formatting
* fix old value counts test instead
* minor fix
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* bump version
* update README
Co-authored-by: Caitlyn Chen <caitlynachen@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Kunal Agarwal <32151899+westernguy2@users.noreply.github.com>
Co-authored-by: jinimukh <46768380+jinimukh@users.noreply.github.com>
Co-authored-by: thyneb19 <thyneboonmark@berkeley.edu>
Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>

dorisjlee added 6 commits January 6, 2021 12:02

bugfix for describe and convert_dtypes

c1944a2

added back metadata series test

5c8b284

black

49daeec

default to pandas display when df.dtypes printed

801b469

various fixes to support int columns

a8ab02e

merge upstream/master

7a203df

dorisjlee merged commit 3393b9f into lux-org:master Jan 7, 2021
