Skip to content

Latest commit

 

History

History
13 lines (9 loc) · 822 Bytes

README.md

File metadata and controls

13 lines (9 loc) · 822 Bytes

Started out by reading Xavier Mertens' SANS ISC post on searching for newly registered domains: https://isc.sans.edu/forums/diary/Tracking+Newly+Registered+Domains/23127/

This approach seemed neat, but rather narrow since it relied on one-to-one dictionary matching in order to find results on interest.

Rather than limit myself to dictionary matches, I decided to look into ways of leveraging Python to find 'similarity' items via several algorithms:

  • Python difflib implementation of Ratcliff Obershelp sequence matching
  • Python editdistance implementation of Levenshtein distance
  • Jaccard similarity

These are all provided as command line switch arguments to the script (s, e, and j respectively) to be run against a provided word list, one record per line.

Result is a list of best-match to lowest-match.