Add data validation to prevent rogue entries #32
Sometimes entries from the third-party lists have errors, such as adlog..com (two periods instead of one).
This will probably require some
subdomain(s) (if applicable), a period (.), the domain, another period (.), and finally, the top level domain.
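The structure described above (optional subdomains, a period, the domain, another period, then the top-level domain) could be checked with a single pattern. This is only a minimal sketch in Python, not the regexp from this thread; the names `LABEL`, `DOMAIN_RE`, and `is_valid_domain` are hypothetical, and it assumes plain ASCII host names with an alphabetic top-level domain.

```python
import re

# Assumption: a label is 1-63 characters of a-z, 0-9, or '-',
# and may not start or end with a hyphen.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"

# One or more "label." groups (subdomains and the domain),
# followed by an alphabetic top-level domain.
DOMAIN_RE = re.compile(rf"(?:{LABEL}\.)+[a-z]{{2,63}}", re.IGNORECASE)


def is_valid_domain(entry: str) -> bool:
    """Return True if the whole entry looks like sub.domain.tld."""
    return DOMAIN_RE.fullmatch(entry.strip()) is not None
```

Because every period must be preceded by a non-empty label, an entry such as adlog..com is rejected while adlog.com and sub.adlog.com pass.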
The real challenge is to see if these can be combined into one; I'm not very good at simplification.
Yeah, simplifying regexp statements to optimize them is rough. Unless I really need the speed, I usually combine statements like this:
This way it's easier for a human to look and see what's going on.
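To illustrate the readability trade-off, here is a rough sketch that keeps several small patterns side by side instead of merging them into one dense expression. It assumes Python; `CHECKS` and `find_problems` are hypothetical names, and the individual rules are illustrative assumptions rather than the statements posted in this thread.

```python
import re
from typing import List

# Each check pairs a human-readable reason with one small pattern.
# The pattern matches when the entry HAS that problem.
CHECKS = [
    ("empty label (leading, trailing, or doubled period)",
     re.compile(r"^\.|\.\.|\.$")),
    ("character outside a-z, 0-9, '-' and '.'",
     re.compile(r"[^a-z0-9.-]", re.IGNORECASE)),
    ("label starts or ends with a hyphen",
     re.compile(r"(?:^|\.)-|-(?:\.|$)")),
]


def find_problems(entry: str) -> List[str]:
    """Return the reasons an entry fails, or an empty list if it passes."""
    return [reason for reason, pattern in CHECKS if pattern.search(entry)]
```

Running each pattern separately costs a few extra passes over the string, but a rejected entry reports exactly which rule it broke.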
I tested it with the following example:
I also fixed our regexp examples in this post (we both had some errors). I had an extra