This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Add batch_process tool #1625

Merged: jwalgran merged 1 commit into develop from feature/jcw/add-batch-process-tool on Feb 4, 2022

@jwalgran jwalgran commented Feb 3, 2022

Overview

A development helper to make processing a CSV a bit more convenient for developers.

Demo

$ ./tools/batch_process 16
[+] Running 1/0
 ⠿ Container open-apparel-registry-database-1  Running                                                                                  0.0s
parse: 2 successes
[+] Running 1/0
 ⠿ Container open-apparel-registry-database-1  Running                                                                                  0.0s
geocode: 2 successes
[+] Running 1/0
 ⠿ Container open-apparel-registry-database-1  Running                                                                                  0.0s
INFO:api.matching:Rebuilding gazetteer
INFO:dedupe.api:reading training from file
INFO:dedupe.training:Final predicate set:
INFO:dedupe.training:(SimplePredicate: (commonTwoTokens, address), SimplePredicate: (sameSevenCharStartPredicate, name))
INFO:dedupe.training:(SimplePredicate: (sameSevenCharStartPredicate, address), SimplePredicate: (wholeFieldPredicate, country))
INFO:dedupe.training:(SimplePredicate: (commonIntegerPredicate, address), SimplePredicate: (sameFiveCharStartPredicate, name))
INFO:rlr.crossvalidation:using cross validation to find optimum alpha...
INFO:rlr.crossvalidation:optimum alpha: 0.100000, score 0.4985716877424044
INFO:dedupe.training:Final predicate set:
INFO:dedupe.training:(SimplePredicate: (commonSixGram, name), SimplePredicate: (sameSevenCharStartPredicate, address))
INFO:dedupe.training:(SimplePredicate: (commonIntegerPredicate, address), SimplePredicate: (sameFiveCharStartPredicate, name))
INFO:dedupe.training:(SimplePredicate: (commonThreeTokens, name), SimplePredicate: (fingerprint, name))
INFO:api.matching:Indexing started
INFO:api.matching:Indexing finished (0:00:00.095603)
INFO:api.matching:Cleanup training
INFO:dedupe.api:0 records
match: 2 successes
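
The demo output above shows three stages running in order for one list id: parse, geocode, and match. A minimal sketch of that flow follows. It is not the script added in this PR: `run_action` is a hypothetical stand-in that only prints what it would do, and the way the real tool invokes each stage inside the containers is not shown in this conversation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the per-stage work. The real tool presumably
# runs each stage against the application containers; that invocation is
# an assumption and is deliberately left out here.
run_action() {
    local list_id="$1" action="$2"
    echo "[dry run] ${action} list ${list_id}"
}

# Run the three stages seen in the demo output, in order, for one list.
batch_process() {
    local list_id="${1:-}"
    # Reject anything that is not a plain non-negative integer id.
    if ! [[ "$list_id" =~ ^[0-9]+$ ]]; then
        echo "usage: batch_process LIST_ID" >&2
        return 1
    fi
    local action
    for action in parse geocode match; do
        run_action "$list_id" "$action"
    done
}

# Only run when a list id is passed on the command line.
if [ "$#" -gt 0 ]; then
    batch_process "$1"
fi
```

Saved under a hypothetical name such as `batch_process_sketch.sh`, running it with `16` as the argument prints one dry-run line per stage. Because `set -e` is active, a failing stage would stop the run before later stages start.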

Testing Instructions

  • Upload a list and note the list id
  • Run ./tools/batch_process {id} and verify that it completes without error.

Checklist

  • fixup! commits have been squashed
  • CI passes after rebase
  • CHANGELOG.md updated with summary of features or fixes, following Keep a Changelog guidelines

@TaiWilkin TaiWilkin left a comment

I am thrilled that this PR exists. It works perfectly.

I recommend adding instructions / information about this tool to the README before merging.

@TaiWilkin TaiWilkin assigned jwalgran and unassigned TaiWilkin Feb 3, 2022
@jwalgran jwalgran force-pushed the feature/jcw/add-batch-process-tool branch from 5d0841c to bf919f0 Compare February 4, 2022 16:53
jwalgran commented Feb 4, 2022

Thanks for the review. I added the script to the "Tools" section of the README in a rebase.

@jwalgran jwalgran merged commit 4bf6f60 into develop Feb 4, 2022
@jwalgran jwalgran deleted the feature/jcw/add-batch-process-tool branch February 4, 2022 17:10