This repository has been archived by the owner on Aug 13, 2021. It is now read-only.

Public data science utilities used @ reBuy


rebuy-de/ds-pub-utils

Public Data Science Utilities @ reBuy

Warning / License

This package is under active development, and nothing here should be taken for granted. It is intended to be used as part of other, internal workflows, so changes are very likely. It is available under the MIT license.

Lastly, this is a public repository; DO NOT INCLUDE ANY BUSINESS LOGIC NOR DATA NOR ANYTHING CONFIDENTIAL!

Provided Modules

features_engineering

Following the design of scikit-learn, the classes in this module provide fit, transform and fit_transform methods. However, transforming data with these classes only adds new features; nothing is removed.
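The append-only behaviour can be illustrated with a minimal sketch. The class MeanDeviationFeature and the column names below are hypothetical and are not part of this package; they only assume the fit / transform / fit_transform convention described above, applied to a pandas DataFrame.

```python
import pandas as pd


class MeanDeviationFeature:
    """Hypothetical transformer (not from this package).

    Learns the mean of one column in fit() and appends a new column
    holding each row's deviation from that mean. No column is removed.
    """

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        # Learn the statistic from the training data and return self,
        # as scikit-learn estimators do.
        self.mean_ = X[self.column].mean()
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        out = X.copy()
        out[f"{self.column}_minus_mean"] = out[self.column] - self.mean_
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "cost": [5.0, 8.0, 12.0]})
result = MeanDeviationFeature("price").fit_transform(df)
print(list(result.columns))
```

The original columns survive unchanged, and the new feature is appended after them.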

preprocessing

Inspired by sklearn-pandas, this module provides preprocessing functionality for the columns of a DataFrame. In contrast to the features engineering module, it does not append columns to the data but replaces them.
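To contrast with the append-only behaviour of the features engineering module, here is a minimal sketch of a column-replacing preprocessor. ColumnStandardizer and the column names are hypothetical, not this package's API; the sketch only assumes that selected DataFrame columns are overwritten in place and all other columns pass through unchanged.

```python
import pandas as pd


class ColumnStandardizer:
    """Hypothetical preprocessor (not from this package).

    Replaces the selected columns with their standardized values
    (zero mean, unit variance) instead of appending new columns.
    """

    def __init__(self, columns):
        self.columns = list(columns)

    def fit(self, X, y=None):
        # Per-column statistics, stored as Series indexed by column name.
        self.means_ = X[self.columns].mean()
        self.stds_ = X[self.columns].std(ddof=0)
        return self

    def transform(self, X):
        out = X.copy()
        # Overwrite the selected columns; pandas aligns the Series of
        # means and standard deviations with the columns by name.
        out[self.columns] = (out[self.columns] - self.means_) / self.stds_
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "condition": ["a", "b", "c"]})
scaled = ColumnStandardizer(["price"]).fit_transform(df)
print(list(scaled.columns))
```

The set of columns is the same before and after; only the values in the selected columns change.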

data_fetch

Utilities for data fetching

Installation

  1. (Optional but recommended) Start a new virtual environment.
    1. Either create one with conda create --name test-this python=3 (the package needs Python 3.x).
    2. Or use the provided environment.yml.
  2. Clone the repository.
  3. Run pip install -e . from the directory of the package.
  4. (Optional) Run pytest from the root of the package to check that all tests pass.

Remark on pymssql

The function data_fetch.from_sql_sever uses pymssql, which in turn depends on freetds. If you want to use this function, make sure pymssql is installed. This SO thread might be helpful as well.
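A quick way to check whether pymssql is importable in the current environment, before relying on the SQL Server function, might look like the sketch below. The helper has_module is hypothetical and uses only the standard library; it is not part of this package.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


if not has_module("pymssql"):
    print("pymssql is not installed; install it (and freetds) first.")
```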

Uninstallation

In {virtualenv}/lib/python2.7/site-packages/ (or {system_dir}/lib/python2.7/dist-packages/ if you are not using a virtualenv), remove the egg file (e.g. pubdsutils-0.6.34-py2.7.egg), if there is one. Then, in the file easy-install.pth, remove the corresponding line (it should be the path to the source directory or to an egg file). Since the package needs Python 3.x, replace python2.7 in these paths with the Python version of your environment. Source: an SO answer.
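To locate the files mentioned above without guessing the path, a small standard-library sketch such as the following may help. The helper find_leftovers is hypothetical; it only lists candidate egg files and the lines of easy-install.pth, and deletes nothing. It assumes the active environment's site-packages directory is the one reported by sysconfig.

```python
import sysconfig
from pathlib import Path


def find_leftovers(package="pubdsutils", site_dir=None):
    """List egg files for `package` and the lines of easy-install.pth.

    Nothing is removed; the result is only meant for manual inspection.
    """
    site_dir = Path(site_dir or sysconfig.get_paths()["purelib"])
    if not site_dir.is_dir():
        return [], []

    # Egg files such as pubdsutils-0.6.34-py2.7.egg
    eggs = sorted(p.name for p in site_dir.glob(f"{package}*.egg"))

    # Every non-empty line of easy-install.pth; the relevant one is the
    # path to the source directory or to the egg file.
    pth = site_dir / "easy-install.pth"
    lines = []
    if pth.is_file():
        lines = [ln.strip() for ln in pth.read_text().splitlines() if ln.strip()]
    return eggs, lines


eggs, pth_lines = find_leftovers()
print("egg files:", eggs)
print("easy-install.pth lines:", pth_lines)
```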

Maintaining issues:

  • Use flake8 --exclude=build to check that the code is well styled
  • Use pytest --cov-report term-missing --cov=pubdsutils tests/ to check the test coverage
  • Execute sphinx-apidoc -f -o . ../pubdsutils/ from ./docs when adding or removing modules/packages
  • Documentation:
    • make html from ./docs will generate the documentation.
    • After building the docs, you can publish them (./docs/_build/html) to the gh-pages branch. The easiest way is to run ghp-import -n -p docs/_build/html from the project's root.
