mustachemo/data-runner

Data Clean-up Tool

[Screenshots: data enforcement feature; RegEx enforcement feature]

Table of Contents

Setup

To set up and run the Data Clean-up Tool, follow these steps:

  1. Clone this repository to your local machine:
git clone https://github.com/mustachemo/data-runners.git
  2. Change to the project directory:
cd data-cleanup-tool
  3. Create a conda environment from the provided environment.yml file:
conda env create -f environment.yml

Optionally, to update an existing conda environment from the same file:

conda env update --file environment.yml --prune
  4. Activate the newly created conda environment:
conda activate data_cleanup_env
  5. Run the application:

    • Open a terminal and execute the following command:

      python run.py
    • OR, in Visual Studio Code:

      • Open the Command Palette by pressing Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (Mac).

      • Type "Run Task" and select "Tasks: Run Task" from the dropdown list.

      • Alternatively, use the keyboard shortcut Ctrl+Shift+B (Windows/Linux) or Cmd+Shift+B (Mac), which runs the default build task.

Features

Optimization

Extras

Problem

The warehouse management system (WMS) contains large amounts of bad data: records that do not comply with the required format, are no longer relevant, or were entered incorrectly, and that cannot be used for any purpose. This data hinders many daily activities, becomes a hurdle when the company transitions to a new WMS, and, most importantly, occupies huge amounts of memory on the server systems. A tool that identifies this bad data, converts it to the required format, and deletes gaps where necessary can help resolve many of these issues.

Objectives

The objective is to design a working tool that helps the company identify, modify, fix, or delete bad data and gaps in data, and remove large amounts of it according to user requirements. This will reduce the manual work involved in fixing bad data.

  • Standard Features:

    • Read data in various formats (XML, CSV, PDF, etc.) and display it in rows and columns.
    • Let the user define rules for each row or column of data according to their preference, and modify or display the data that does not meet the defined parameters, preferably through a GUI that a layperson can use.
    • Combine different data sets of the same format into one set and customize it as per user requirements.
    • Export into different formats as per user needs.
  • Bonus Features:

    • Identify duplicate data in different formats and errors such as wrong address formats, punctuation, spelling, and address styles. Filter the data and display the rows and columns with these discrepancies.
    • Create visuals from the data.
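
To make the column-rule and duplicate-detection features above more concrete, here is a minimal sketch using only the Python standard library. It is not the tool's actual implementation: the column names, the regular expressions in RULES, the sample data, and the helper functions read_rows, find_violations, and find_duplicates are all hypothetical and chosen for illustration only.

```python
import csv
import io
import re

# Hypothetical per-column rules: column name -> regular expression that a
# valid cell must match in full. These names and patterns are illustrative.
RULES = {
    "sku": r"[A-Z]{3}-\d{4}",
    "quantity": r"\d+",
}


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def find_violations(rows, rules):
    """Return (row_index, column, value) for each cell that breaks its rule."""
    compiled = {col: re.compile(pattern) for col, pattern in rules.items()}
    violations = []
    for index, row in enumerate(rows):
        for col, regex in compiled.items():
            value = row.get(col) or ""
            if not regex.fullmatch(value):
                violations.append((index, col, value))
    return violations


def find_duplicates(rows):
    """Return indices of rows that repeat an earlier row exactly."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


# Made-up sample data: row 1 breaks both rules, row 2 repeats row 0.
SAMPLE = """sku,quantity
ABC-1234,10
abc-12,ten
ABC-1234,10
"""

rows = read_rows(SAMPLE)
print(find_violations(rows, RULES))
print(find_duplicates(rows))
```

Reporting violations as (row, column, value) tuples keeps the check separate from the display, so a GUI could highlight the offending cells or offer a fix without re-running the rules.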