
v0.1.0


@Rickyy-Sam07 released this 12 Jun 09:59

📦 Puripy v0.1.0 – First Public Release

Welcome to the first release of Puripy, a modular Python package for cleaning and preprocessing messy data in text, categorical, numerical, and datetime columns.


Features

Text Cleaner

  • ✅ Contraction expansion, emoji/URL/HTML removal
  • ✅ Stopword removal, stemming, lemmatization
  • ✅ Spelling correction, profanity filtering, n-gram generation
  • ✅ Auto column detection & parallel processing
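
Puripy's own Text Cleaner API is not shown in these notes. As a rough illustration of what contraction expansion and URL/HTML removal involve, here is a minimal standard-library sketch; `clean_text` and its four-entry contraction map are hypothetical and not part of Puripy.

```python
import html
import re

# Hypothetical, deliberately tiny contraction map. A real cleaner would use a
# full dictionary (the release lists the `contractions` package for this).
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "don't": "do not"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Decode entities, strip HTML tags and URLs, lowercase, expand contractions."""
    text = html.unescape(text)          # turn entities such as &amp; into characters
    text = HTML_TAG_RE.sub(" ", text)   # drop tags such as <p> or </b>
    text = URL_RE.sub(" ", text)        # drop http(s):// and www. links
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        # Word boundaries keep the replacement from firing inside longer words.
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return WS_RE.sub(" ", text).strip()  # collapse leftover whitespace
```

For example, `clean_text("<p>I can't visit https://example.com today</p>")` yields `"i cannot visit today"`.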

🏷️ Categorical Cleaner

  • ✅ Fuzzy typo correction with thefuzz
  • ✅ Rare category grouping
  • ✅ OneHot, Ordinal, and Label encoding via sklearn
  • ✅ Text normalization and full reporting
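
A minimal sketch of fuzzy typo correction followed by rare-category grouping. To stay dependency-free it swaps thefuzz for the standard library's `difflib.get_close_matches`, which scores candidates with a different similarity ratio. `fix_typos`, `group_rare`, the 0.8 cutoff, and the `"other"` label are illustrative assumptions, not Puripy's API.

```python
from collections import Counter
from difflib import get_close_matches


def normalize(value: str) -> str:
    """Basic text normalization: trim whitespace and lowercase."""
    return value.strip().lower()


def fix_typos(values, vocabulary, cutoff=0.8):
    """Map each value to its closest known category when one is similar enough."""
    vocab = [normalize(v) for v in vocabulary]
    fixed = []
    for v in values:
        v = normalize(v)
        match = get_close_matches(v, vocab, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else v)  # keep unknown values unchanged
    return fixed


def group_rare(values, min_count=2, other="other"):
    """Replace categories seen fewer than min_count times with one shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Usage: "redd" and "blu" are corrected; "green" and "purple" end up rare.
raw = ["Red", "red ", "redd", "Blue", "blu", "Green", "purple"]
fixed = fix_typos(raw, ["red", "blue", "green"])
grouped = group_rare(fixed)
```

Correcting typos before grouping matters: otherwise misspellings like `"redd"` would each count as their own rare category.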

Numerical Cleaner

  • ✅ Missing value imputation (mean, median, mode)
  • ✅ Outlier handling (IQR method)
  • ✅ Type conversion and precision control
  • ✅ Duplicate detection and domain rule enforcement
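
Median imputation and the IQR method can be sketched with the standard `statistics` module. The helpers below are hypothetical. Capping values at Tukey's fences (k = 1.5) is one common way to handle IQR outliers; these notes do not say whether Puripy caps or drops them. `method="inclusive"` gives the same linear-interpolation quartiles as the NumPy/pandas defaults.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Usage: fill the gap first, then cap the extreme 100.
filled = impute_median([10, 12, None, 11, 13, 100])  # None -> 12
capped = clip_iqr(filled)                             # 100 -> 15.0
```

Imputing before computing quartiles keeps `None` out of `statistics.quantiles`, which only accepts numbers.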

Datetime Cleaner

  • ✅ Flexible datetime parsing and fuzzy matching
  • ✅ Timezone normalization
  • ✅ Missing date imputation using STL decomposition
  • ✅ Feature extraction (year, month, day, quarter, fiscal, etc.)
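
Timezone normalization and calendar/fiscal feature extraction reduce to a few `datetime` operations. This is a sketch under assumptions: `to_utc` and `date_features` are hypothetical names, and the fiscal year here starts in April and is labelled by the calendar year in which it begins, which is only one of several conventions. STL-based imputation is not sketched because it depends on statsmodels.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime) -> datetime:
    """Convert a timezone-aware timestamp to UTC."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return ts.astimezone(timezone.utc)


def date_features(ts: datetime, fiscal_start_month: int = 4) -> dict:
    """Extract calendar parts plus fiscal year and quarter.

    Assumption: the fiscal year starts in `fiscal_start_month` and is
    labelled by the calendar year in which it begins.
    """
    if not 1 <= fiscal_start_month <= 12:
        raise ValueError("fiscal_start_month must be in 1..12")
    months_into_fy = (ts.month - fiscal_start_month) % 12
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "quarter": (ts.month - 1) // 3 + 1,
        "weekday": ts.weekday(),  # Monday == 0
        "fiscal_year": ts.year if ts.month >= fiscal_start_month else ts.year - 1,
        "fiscal_quarter": months_into_fy // 3 + 1,
    }


# Usage: 22:30 at UTC-5 on 15 Feb 2024 becomes 03:30 UTC on 16 Feb 2024.
ts = to_utc(datetime(2024, 2, 15, 22, 30, tzinfo=timezone(timedelta(hours=-5))))
features = date_features(ts)
```

Normalizing to UTC before extracting features matters: here the calendar day changes from the 15th to the 16th.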

What's New in v0.1.0?

  • Initial release with full support for text, categorical, numerical, and datetime cleaning.
  • Built-in support for parallel processing and logging.
  • Highly customizable pipelines using configuration dictionaries.
  • Auto-generated cleaning reports for auditability.

Tech Stack

  • pandas, numpy, nltk, textblob, sklearn, emoji, contractions, better_profanity, tqdm, joblib, pytz, statsmodels, and more.

Known Notes

  • This is a pre-1.0 release: APIs and behavior may change in future versions.
  • Ideal for testing, experimentation, and feedback.

Contribute

Feedback, issues, and pull requests are welcome!
Star ⭐ the repo and help shape Puripy into a go-to tool for data cleaning.

