Changelog

V0.1.07

V0.1.06

Small improvements in the Benford law tests

V0.1.05

Fixing some bugs in check_sequence()

V0.1.04

No new features; some refactoring and bug fixes in fuzzy matching and clean_string()

V0.1.03

No functional changes; adjusted versions in pyproject.toml and requirements.txt in response to a Dependabot warning

V0.1.02

Features

  • Added lookup_values(), a sort of xlookup() for odd use cases, e.g. Airtable,
    which represents linked tables as lists with one (or more) elements
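
A minimal sketch of the kind of lookup described above, written in plain pandas. It does not use lookup_values() itself, whose signature is not shown in this changelog; the helper first_linked_id and all column names are hypothetical.

```python
import pandas as pd


def first_linked_id(cell):
    """Return the first element of a list-like cell, or the cell itself if scalar."""
    if isinstance(cell, (list, tuple)):
        return cell[0] if cell else None
    return cell


# Hypothetical export where the link column holds lists of record ids
orders = pd.DataFrame(
    {"order": ["A", "B", "C"], "customer_link": [["rec1"], ["rec2", "rec3"], []]}
)
customers = pd.DataFrame(
    {"record_id": ["rec1", "rec2", "rec3"], "name": ["Ann", "Bob", "Cy"]}
)

# Build a record_id -> name lookup, then resolve the first linked id per row
lookup = customers.set_index("record_id")["name"]
orders["customer_name"] = orders["customer_link"].map(first_linked_id).map(lookup)
```

Rows with an empty link list end up with a missing value in customer_name rather than raising an error.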

V0.1.01

  • Changed requirements to the latest version of everything (pandas etc.). The tests fail on GitHub Actions while passing locally, so I will run against the latest versions and fix whatever GitHub Actions doesn't like. Once I work out why, I will pin a particular version.

V0.0.18

Fixes

  • Fixed check_blanks() and coalesce_values(), some small refactoring and enhancing there too

V0.0.17

Features

  • Added count_notna, count_isna, and has_different_values to apply to several columns in a dataframe
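
As a rough illustration of what these row-wise checks compute, here is a plain pandas equivalent. It is only a sketch: the real functions may differ in signature and in how they treat NaN, and the column names are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, None, 3], "b": [1, None, 4], "c": [None, None, 3]}
)
cols = ["a", "b", "c"]

# Number of populated / missing cells per row across the chosen columns
df["n_notna"] = df[cols].notna().sum(axis=1)
df["n_isna"] = df[cols].isna().sum(axis=1)

# True when a row holds more than one distinct non-null value
df["differs"] = df[cols].nunique(axis=1) > 1
```

In this sketch NaN is ignored when comparing values, so a row with one populated cell does not count as having different values.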

V0.0.16

Features

  • Added merge_smart(), replacing merge_force_suffix and offering more functionality, such as prefixes and the option to rename or preserve keys

V0.0.15

Features

  • Added merge_outer_and_split to the library, which generates the inner join plus separate extracts for the NaNs and the non-matches.
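
The idea of an outer merge split into its parts can be sketched with the indicator option of pandas merge. This is an assumption about the general technique, not the actual implementation or return format of merge_outer_and_split; the frames and column names are invented.

```python
import pandas as pd

left = pd.DataFrame({"key": ["k1", "k2", None], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["k2", "k3"], "y": [20, 30]})

# Rows with a missing key can never match, so set them aside first
left_nan = left[left["key"].isna()]

# Outer join on the remaining rows, tagging where each row came from
merged = left.dropna(subset=["key"]).merge(
    right, on="key", how="outer", indicator=True
)

inner = merged[merged["_merge"] == "both"]             # matched on both sides
left_only = merged[merged["_merge"] == "left_only"]    # no match on the right
right_only = merged[merged["_merge"] == "right_only"]  # no match on the left
```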

Fixes

  • fixed bug in clean_string
  • fixed percentage count option in count function

V0.0.14

Tweaks

  • added silent=False option in the cleanup, groupby and coalesce functions to reduce the logging when used within other functions
  • improved clean_string function to do unicode decoding and preserve dashes

Fixes

  • refactored coalesce columns to add more input validation but also accept columns that may not be in the dataframe, for cases where we are looping disparate dataframes

V0.0.13

Features

  • Added get_latest_modif_file_from() to the filemanager.py

Tweaks

  • Duplicates returns a log warning if there are no duplicates so it is visible

V0.0.12

Fixes

  • Fixed duplicates to return all values if there are no duplicates and to have clearer logging of various cases with nan. BREAKING CHANGES: New parameter dropna=True introduced and add_indicator_column=False replaces indicator=False
  • keyword_search_batch() fixed to work better when limiting the output to hits.

Features

  • Added map_values() function to map common values to numbers and the
    other way around. e.g. 1,2,3 to "red","amber","green" and so on.
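
The two-way mapping described above can be sketched with a dictionary and its inverse. This uses pandas Series.map directly rather than map_values(), whose parameters are not documented here; the variable names are illustrative.

```python
import pandas as pd

# Number -> label mapping, and its inverse for going the other way
rag = {1: "red", 2: "amber", 3: "green"}
inverse = {label: number for number, label in rag.items()}

scores = pd.Series([1, 3, 2, 1])
labels = scores.map(rag)          # numbers to labels
round_trip = labels.map(inverse)  # labels back to numbers
```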

V0.0.11

Fixes

  • Improving tests and docstrings
  • cleanup_dataframe_columns_names now replaces $ £ € with usd, gbp and eur respectively
  • requirements.txt no longer pins a specific version of sphinx (used for the documentation); otherwise it doesn't install in Gitpod. No impact on the main library
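
The currency-symbol replacement in column names mentioned in this release can be approximated as below. This is a simplified sketch and not the library's code: the helper clean_column_name and the exact normalisation rules (lowercasing, underscores) are assumptions.

```python
import re

# Symbol -> code mapping taken from the changelog entry
CURRENCY_NAMES = {"$": "usd", "£": "gbp", "€": "eur"}


def clean_column_name(name):
    """Replace currency symbols with codes, then normalise to snake_case."""
    for symbol, code in CURRENCY_NAMES.items():
        name = name.replace(symbol, f" {code} ")
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
    return name.strip("_")


columns = ["Price ($)", "Cost £", "Total€ net"]
cleaned = [clean_column_name(c) for c in columns]
```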

V0.0.10

Features

  • Added business hours calculator module
  • Upgraded filemanager; it now uses a JSON file to store config (check the docs)

Tweaks

  • Refactored sequence checks and added grouping in output

Fixes

  • profile_dataframe() fixes

V0.0.9 (20/08/2022)

Features

  • filemanager now has yaml instead of singleton object, full rewrite

Tweaks

  • cleanup_column_names() now accepts a list in addition to a dataframe and cleans unicode accents
  • keyword_search accepts labels including rollup columns
  • logger now in colour and to stdout
  • count_values_in_col() has percentage option

Fixes

  • keyword_search() refactoring and some bug fixes

V0.0.8 (16/07/2022)

Features

  • keyword_search allows columns with labels and rollups for multiple patterns for variations of a single conceptual keyword
  • keyword_search can bring individual hits as a "thin and long" table

Tweaks

  • coalesce_dataframe_columns supports "last" value to keep
  • group_by_text concatenates better, and also added option unique=True to retrieve only unique instances
  • save() now can direct to one of the channels

Fixes

  • Lots

v0.0.7 (3/07/2022)

Features

  • Fillna_smart() improved
  • Implemented a count_values_in_col(), can do combined columns
  • Implemented merge_smart() to improve handling of suffixes

Fix

  • Many fixes and refactoring, most notably logging when saving and loading files
  • renamed count_related()

v0.0.6 (19/06/2022)

Fix

  • Refactored check_blanks() so the summary is produced more cleanly, with better tests and a change in the default behaviour (by default it is restricted to just NaNs)

v0.0.5 (4/06/2022)

Fix

  • Lots of refactoring. Generally ensuring that using inplace=True returns True to avoid confusion.
  • Also working on test suite to improve it

Documentation

  • working on documentation across the board, thanks Autopilot for the help, couldn't have done it without you

Feature

  • Expanding the function that creates test datasets to play with and to use in examples; these are not used in the core test suite

v0.0.4 (4/06/2022)

Fix

  • Loads and loads

Feature

  • profile_dataframe() : Added an option to return a dictionary

Documentation

  • Now hosting documentation on Read the Docs
  • Working through all the docstrings to make them usable

Test suite

  • Improving test suite, currently at 70%, but 3 out of 41 pending development.

v0.0.2-3

  • Various improvements

v0.0.1 (14/05/2022)

  • First release of pydit