Small improvements in the Benford law tests
Fixing some bugs in check_sequence()
No new features, some refactoring and bug fixes of fuzzy matching and clean_string()
No feature changes; tweaked versions in pyproject.toml and requirements.txt due to a Dependabot warning
- Added lookup_values(), a sort of xlookup() for odd use cases, e.g. Airtable exports, where linked tables appear as lists with one (or more) elements
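As a sketch of the idea only (the helper name and signature below are illustrative, not pydit's actual API), a lookup that also matches keys buried inside list cells might look like:

```python
import pandas as pd

def lookup_first_match(df, key_col, value_col, key):
    """Return the value from value_col for the first row whose key_col
    equals `key`, matching inside list cells as well as scalar cells."""
    for k, v in zip(df[key_col], df[value_col]):
        if isinstance(k, list):
            if key in k:
                return v
        elif k == key:
            return v
    return None

# Airtable-style export: linked records arrive as lists of ids
records = pd.DataFrame(
    {"id": [["rec1"], ["rec2", "rec3"]], "name": ["Alice", "Bob"]}
)
print(lookup_first_match(records, "id", "name", "rec3"))  # Bob
```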
- Changed requirements to the latest version of everything (pandas etc.). GitHub Actions was failing tests that pass locally, so I will run against the latest versions and fix whatever GH doesn't like. As soon as I sort out why, I will pin particular versions again.
- Fixed check_blanks() and coalesce_values(), with some small refactoring and enhancements there too
- Added count_notna, count_isna, and has_different_values, which apply across several columns of a dataframe
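The idea behind these row-wise helpers can be shown with plain pandas (a generic sketch; pydit's real signatures may differ):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3], "b": [1, 2, None], "c": [2, 2, 4]})
cols = ["a", "b", "c"]

count_notna = df[cols].notna().sum(axis=1)                  # non-null values per row
count_isna = df[cols].isna().sum(axis=1)                    # null values per row
has_different = df[cols].nunique(axis=1, dropna=True) > 1   # more than one distinct value

print(count_notna.tolist())    # [3, 2, 2]
print(has_different.tolist())  # [True, False, True]
```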
- Added merge_smart(), replacing merge_force_suffix, offering more functionality, such as prefixes and optionally renaming the key columns or preserving them
- Added merge_outer_and_split to the library, which generates the inner join plus separate extracts for the NaNs and non-matching rows
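The underlying idea can be sketched with a plain pandas outer merge and its indicator column (illustrative only, not pydit's actual implementation):

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

# One outer merge, then split into matches and each side's leftovers
merged = left.merge(right, on="key", how="outer", indicator=True)
inner = merged[merged["_merge"] == "both"].drop(columns="_merge")
left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
right_only = merged[merged["_merge"] == "right_only"].drop(columns="_merge")

print(len(inner), len(left_only), len(right_only))  # 2 1 1
```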
- Fixed a bug in clean_string()
- Fixed the percentage count option in the count function
- Added a silent=False option to the cleanup, groupby and coalesce functions to reduce logging when they are used within other functions
- Improved the clean_string function to do Unicode decoding and preserve dashes
- Refactored coalesce columns to add more input validation, and also to accept columns that may not be in the dataframe, for cases where we loop over disparate dataframes
- Added get_latest_modif_file_from() to filemanager.py
- duplicates() now returns a log warning if there are no duplicates, so it is visible
- Fixed duplicates() to return all values if there are no duplicates, and clarified the logging of the various NaN cases. BREAKING CHANGES: new parameter dropna=True introduced, and add_indicator_column=False replaces indicator=False
- keyword_search_batch() fixed to work better when limiting the output to hits.
- Added map_values() function to map common values to numbers and the other way around, e.g. 1, 2, 3 to "red", "amber", "green" and so on
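The kind of two-way mapping described can be sketched in a few lines (the dict and names are illustrative, not pydit's API):

```python
# Forward mapping and its inverse, e.g. RAG status codes
rag = {1: "red", 2: "amber", 3: "green"}
rag_inverse = {v: k for k, v in rag.items()}

codes = [3, 1, 2]
labels = [rag[c] for c in codes]          # numbers -> labels
back = [rag_inverse[lab] for lab in labels]  # labels -> numbers

print(labels)  # ['green', 'red', 'amber']
print(back)    # [3, 1, 2]
```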
- Improving tests and docstrings
- cleanup_dataframe_columns_names now replaces $ £ € with usd, gbp and eur respectively
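A minimal sketch of that currency-symbol replacement applied to column names (the helper name is illustrative):

```python
def clean_currency_symbols(name: str) -> str:
    """Replace currency symbols in a column name with ASCII codes."""
    for sym, word in {"$": "usd", "£": "gbp", "€": "eur"}.items():
        name = name.replace(sym, word)
    return name

cols = ["price_$", "cost_£", "fee_€"]
print([clean_currency_symbols(c) for c in cols])
# -> ['price_usd', 'cost_gbp', 'fee_eur']
```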
- requirements.txt no longer pins a specific version of sphinx (used for the documentation), otherwise it doesn't install in Gitpod; no impact on the main library
- Added business hours calculator module
- Upgraded filemanager; it now uses a JSON file to store config, check the docs
- Refactored sequence checks and added grouping in output
- profile_dataframe() fixes
- filemanager now uses YAML instead of a singleton object; full rewrite
- cleanup_column_names() now accepts a list in addition to a dataframe, and cleans Unicode accents
- keyword_search accepts labels including rollup columns
- Logger now outputs in colour and to stdout
- count_values_in_col() has percentage option
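The percentage option can be illustrated with plain pandas (pydit's count_values_in_col() signature may differ):

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "a"])
counts = s.value_counts()                   # absolute counts
pct = s.value_counts(normalize=True) * 100  # same counts as percentages

print(counts.to_dict())  # {'a': 3, 'b': 1}
print(pct.to_dict())     # {'a': 75.0, 'b': 25.0}
```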
- keyword_search() refactoring and some bug fixes
- keyword_search allows label and rollup columns, so multiple patterns can cover variations of a single conceptual keyword
- keyword_search can return individual hits as a "thin and long" table
- coalesce_dataframe_columns supports keeping the "last" value
- group_by_text concatenates better, and added the option unique=True to retrieve only unique instances
- save() can now direct output to one of the channels
- Lots
- Fillna_smart() improved
- Implemented count_values_in_col(), which can do combined columns
- Implemented merge_smart() to improve handling of suffixes
- Many fixes and refactoring, most notably logging when saving and loading files
- Renamed count_related()
- Refactored check_blanks() to make the summary cleaner, with better tests and a change in the default behaviour (by default it is now restricted to just NaNs)
- Lots of refactoring. Generally ensuring that using inplace=True returns True to avoid confusion.
- Also worked on improving the test suite
- working on documentation across the board, thanks Autopilot for the help, couldn't have done it without you
- Expanded the function that creates test datasets to play with and to use in examples; these are not used in the core test suite
- Loads and loads
- profile_dataframe() : Added an option to return a dictionary
- Now hosting documentation on Read the Docs
- Working through all the docstrings to make them usable
- Improving the test suite, currently at 70%, with 3 out of 41 pending development
- Various improvements
- First release of pydit