Learn Regular Expressions by Building a Spam Filter #122

QuincyLarson · 2019-03-08T01:15:17Z

No description provided.

lionel-rowe · 2019-03-15T09:39:17Z

I'd be interested in contributing to this challenge.

QuincyLarson · 2019-03-21T22:36:31Z

@lionel-rowe Awesome! I think this will be a fun one. See if you can get an extremely simple demo that checks a line of input for spam words and the use of characters meant to mask spam words like v1agr@ or something like that 😄 - I'm excited to see what you come up with.

lionel-rowe · 2019-03-22T10:26:06Z

I'm actually wondering how best to approach this one. In order to make a truly extensible filter that would catch v1agr@, vi@gra, v|agRa, V I A G R A, etc. it'd really make more sense to have programmatically-generated regexes, rather than regex literals:

https://gist.github.com/lionel-rowe/0724546e4b5c1a71be29502aac4c825e

However, this approach would probably introduce too many disparate concepts at once, plus it has some gotchas (double-escaping, for starters). Any thoughts on how to simplify it while still guiding students toward writing extensible code?

Note that the regexes generated by this method are almost completely unreadable:

[
  /\bv\s*[i|]\s*[a@4]\s*[g69]\s*r\s*[a@4]\b/i,
  /\bf\s*r\s*[e3]\s*[e3]\s* \s*m\s*[o0]\s*n\s*[e3]\s*y\b/i,
  /\bw\s*[o0]\s*r\s*k\s* \s*f\s*r\s*[o0]\s*m\s* \s*h\s*[o0]\s*m\s*[e3]\b/i,
  /\b[s5]\s*[t7]\s*[o0]\s*[c{\[(]\s*k\s* \s*[a@4]\s*l\s*[e3]\s*r\s*[t7]\b/i,
  /\bd\s*[e3]\s*[a@4]\s*r\s* \s*f\s*r\s*[i|]\s*[e3]\s*n\s*d\b/i
]

Alternatively, we could just take the hard-coding approach and limit the test cases somewhat to avoid having completely unreadable regexes.
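
For anyone following along, here is a minimal sketch of how regexes like the ones listed above could be generated. It is not the code from the gist: `lookalikes`, `escapeForClass`, `charPattern`, and `buildSpamRegex` are hypothetical names, and the substitution table is an assumption inferred from the listed output. It also shows the double-escaping gotcha, since `\s` and `\b` have to be written as `\\s` and `\\b` inside a string.

```javascript
// Hypothetical lookalike table (an assumption inferred from the regexes above).
const lookalikes = {
  a: 'a@4', c: 'c{[(', e: 'e3', g: 'g69', i: 'i|', o: 'o0', s: 's5', t: 't7',
};

// Escape characters that are special inside a character class.
const escapeForClass = (ch) => ch.replace(/[\\\]\[^-]/g, '\\$&');

// Turn one character of the phrase into a pattern fragment.
const charPattern = (ch) => {
  const variants = lookalikes[ch];
  // No known lookalikes: escape the character so it is matched literally.
  if (!variants) return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return `[${[...variants].map(escapeForClass).join('')}]`;
};

// Join the fragments with optional whitespace and wrap in word boundaries.
// Note the doubled backslashes: this is a string, not a regex literal.
const buildSpamRegex = (phrase) => {
  const body = [...phrase.toLowerCase()].map(charPattern).join('\\s*');
  return new RegExp(`\\b${body}\\b`, 'i');
};

const viagra = buildSpamRegex('viagra');
console.log(viagra.source); // \bv\s*[i|]\s*[a@4]\s*[g69]\s*r\s*[a@4]\b
console.log(viagra.test('buy V I A G R A now')); // true
```

Keeping the table as data means a new spam phrase is one function call, at the cost of the readability problem described above.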

brandenbyers · 2019-03-22T14:55:18Z

Yes, programmatically might make the most sense in the real world but as you have pointed out, it really is not readable (nor does it teach much about regex). What about making the lesson slowly build on itself? Go from a normal spelling of a spam word. Next lesson is to add an alternative spelling. Next is more alternative spellings. Oh, wait...the spammers have gotten smarter and are now doing “x” things instead...how can we improve it from here?

I’m imagining a text based game of sorts. Outsmarting the spammers. And with each step learning a bit more regex. And how to alter regex when the requirements change.

It could include a bit of history too for those that aren’t familiar with how spam used to be regexed when spam was a relatively new concept. Let the user experience what it might have been like as a mail server admin back in the simpler days...

Ultimately, the final step could be acknowledging the limits of human readable spam filtering. But that’s ok. The user isn’t trying to become a regex master with these lessons. Mostly what they need to learn is how to read and write basic regex. They can search the internet, or use one of the visual regex tools, for anything more complex.

lionel-rowe · 2019-03-22T19:07:22Z

@brandenbyers yeah, that sounds like a good way to approach it. Here's an updated version:

https://gist.github.com/lionel-rowe/72a19bf346858ede6f406ad20e7c157a

As I've split out the logic of deleting intra-word spaces from the logic of de-mangling, this version has a clearer path for how to iterate:

Step 1 - plain words (can be solved with includes, doesn't even need regexes)
Step 2 - plain words with mixed case (can be solved with toLowerCase, but here we introduce regexes with i flag as an alternative)
Step 3 - mangled words
Step 4 - mangled words + intra-word spaces
Step 5 - the spammers have moved on to using images, now you're into the realm of OCR and NLP... but that's for another lesson 😉

The biggest challenge conceptually is probably the wordCondenser regex used in step 4: /(?:^|\s)\S(?:(\s+)\S)(?:\1\S)*(?:$|\s)/g, but the funkiest bits of that ((?:^|\s) and (?:$|\s)) can actually be introduced in step 2 or 3.
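
The steps above could be sketched roughly as follows. This is an illustration, not the code from the gist: the function names, the "free money" phrase, and the character-class substitutions are assumptions. The `wordCondenser` regex is quoted from this comment, but the replacement callback is only one guess at how it might be applied.

```javascript
// Step 1: plain words, no regex needed.
const step1 = (text) => text.includes('free money');

// Step 2: mixed case, using the i flag instead of toLowerCase.
const step2 = (text) => /free money/i.test(text);

// Step 3: mangled words, using character classes (substitutions are assumptions).
const mangled = /fr[e3][e3] m[o0]n[e3]y/i;
const step3 = (text) => mangled.test(text);

// Step 4: first collapse runs of single characters separated by the same
// whitespace (e.g. "F R E E"), then reuse the step 3 regex.
const wordCondenser = /(?:^|\s)\S(?:(\s+)\S)(?:\1\S)*(?:$|\s)/g;
const condense = (text) =>
  text.replace(wordCondenser, (match) => {
    // Keep one space on each side where the match consumed surrounding whitespace.
    const lead = /^\s/.test(match) ? ' ' : '';
    const trail = /\s$/.test(match) ? ' ' : '';
    return lead + match.replace(/\s+/g, '') + trail;
  });
const step4 = (text) => mangled.test(condense(text));

console.log(condense('buy V I A G R A now')); // buy VIAGRA now
console.log(step4('F R 3 3 m0ney today')); // true
```

Separating `condense` from the matching regex is what lets step 4 reuse step 3 unchanged.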

QuincyLarson · 2019-03-26T20:30:19Z

@lionel-rowe I agree with Branden that it's much more important that we teach these regex techniques even if real life approaches to spam filtering would be different.

Remember that the entire curriculum will be a series of individual tests, and getting one test to pass at a time. So we will only be testing one aspect of their regular expressions at a time. And any concepts we want to impart, we will need to do so in just a few words as part of a test description. There won't be any paragraphs of explainer text.

scissorsneedfoodtoo · 2019-04-03T06:58:43Z

@lionel-rowe, just wanted to check in and see how everything's going with the project. Your gist looks like a great start!

My only suggestion would be to keep things simple, teaching just one regex concept with some repetition/review of earlier concepts between. Your project will be replacing the lessons here, so the wordCondenser regex you have (while really cool!) is probably too advanced for students at this point in the curriculum.

Anyway, hope that helps! Please let us know if there's anything we can help out with.

lionel-rowe · 2019-04-04T08:44:17Z

@scissorsneedfoodtoo "Your project will be replacing the lessons here, so the wordCondenser regex you have (while really cool!) is probably too advanced for students at this point in the curriculum."

The idea will be to build each regex up incrementally (based on the second gist I posted, not the first, which I agree is overcomplicated). The only thing I'm concerned about is not covering enough concepts (e.g. \d, \w, etc. won't be covered). I think this is probably OK as long as the broader concept is covered. For example, if u flag isn't covered, the idea of flags in general is covered; if \W isn't covered, \S is, and so on.
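
To make the "broader concept" point concrete, here is a small sketch of how the shorthand classes and flags relate to each other. The sample subject line and variable names are made up for illustration.

```javascript
const subject = 'Win $100 NOW';

// Shorthand classes come in pairs: lowercase matches, uppercase negates.
const digits = subject.match(/\d+/g);       // ['100']
const wordRuns = subject.match(/\w+/g);     // ['Win', '100', 'NOW']
const nonWord = subject.match(/\W/g);       // [' ', '$', ' ']
const nonSpaceRuns = subject.match(/\S+/g); // ['Win', '$100', 'NOW']

// Flags all work the same way: a letter after the closing slash changes
// how the whole pattern is applied.
const ignoresCase = /now/i.test(subject);   // true
const oneCodePoint = /^.$/u.test('😄');      // true: u treats the emoji as one character
const withoutU = /^.$/.test('😄');           // false: without u it is two code units

console.log(digits, wordRuns, nonWord, nonSpaceRuns, ignoresCase, oneCodePoint, withoutU);
```

A learner who has seen `\S` and the `i` flag has the pattern needed to look up `\W` or `u` on their own.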

I'll work on making this into a proper lesson where the incremental approach is clearer.

scissorsneedfoodtoo · 2019-04-04T10:18:57Z

@lionel-rowe, okay, that sounds great. I don't think you need to worry about the coverage of your project. Like you said, teaching the broad concept of flags and using several in depth is better than covering all the flags. Looking forward to seeing your lessons!

scissorsneedfoodtoo · 2019-04-12T06:31:04Z

@lionel-rowe, just checking in to see how things are going. Did you start breaking this project down into steps?

lionel-rowe · 2019-04-13T02:36:19Z

I'll open a work-in-progress PR sometime this weekend.

scissorsneedfoodtoo · 2019-06-24T14:41:54Z

Hi @lionel-rowe, just wanted to check on the status of this project, too. Were you able to start on another draft by any chance?

lionel-rowe · 2019-06-24T20:21:23Z

Draft 2 should be coming shortly, though I'm somewhat swamped with work at the moment. Aiming for within the week.

scissorsneedfoodtoo · 2019-06-26T02:23:05Z

@lionel-rowe, thank you for the update! Looking forward to seeing it soon.

scissorsneedfoodtoo · 2019-07-17T10:36:16Z

@lionel-rowe, were you able to make any progress on your next draft? Looking forward to seeing it soon.

scissorsneedfoodtoo · 2019-08-15T13:28:52Z

Hi @lionel-rowe, were you able to start on a new draft?

Bam92 · 2019-11-09T15:09:11Z

Yo!

How far along is this project? Can I consider it as unclaimed?

scissorsneedfoodtoo · 2019-11-18T06:56:03Z

Hi @Bam92, thank you for your patience and sorry about the delay.

Yes, this project is unclaimed. Feel free to work on a prototype and post updates here as you go along.

CatalanCabbage · 2019-12-07T16:23:33Z

Hi @Bam92 and @scissorsneedfoodtoo, I'd be glad to contribute.
I'm back-end and not familiar with JS or Python, which rules out many of the listed projects; however, I believe I can contribute to this section, since it's regex based.
Do let me know if I can help!

Bam92 · 2019-12-09T02:35:46Z

Hi @CatalanCabbage
I'm busy for now, so go for it if you can.

scissorsneedfoodtoo · 2019-12-09T05:23:13Z

Hi @CatalanCabbage, thank you for picking this project up. There's already been some work done that's been merged into the repo, but please feel free to start from scratch if that's easier.

Though this project will focus on regex, we'll still be using JavaScript to teach the fundamentals. Please go ahead and start working on CodePen, CodeSandbox, or some other similar platform, and post a link to your prototype here whenever it's ready.

CatalanCabbage · 2019-12-12T18:06:45Z

@scissorsneedfoodtoo, first off, you're doing great work. :)
I have some questions, and I'd be grateful if you could point me in the right direction. Also, seeing how this seems to change hands so often, this could serve as a guide for anybody to pick it up if I fail to (touch wood!) :)

UI: You had asked me to make a mock-up on CodePen. With regex, the learner generally progresses in terms of regex complexity, so I'm at a loss as to how to improve the UI step by step (or maybe I can look into it later), since each step would basically just involve a regex and running tests. Is one common UI enough for now? How do you visualize this course?

Concept-first vs Product-first: The last person who took this task up did a pretty good job, actually. The regex is built step by step until it morphs into a spam filter. However, should we shift the focus onto the various concepts and treat the filter as a means to an end, even if that is a roundabout way? Should we take a concept-first approach and then try to integrate each concept into the filter, even if it's not how we'd actually build a filter? I think this will benefit the users more.

So assuming this is your line of reasoning, I'd like to first list out various general regex concepts and JS-based regex functions, so we can integrate them into the lessons. If we can't cover them all, we could mention the rest in hints or somewhere along the way, so learners are aware of them. It was stated that concepts need to be explained in a few words; do we provide links for further reading (and if so, are there approved sites)? Also, there are some extremely important concepts, such as catastrophic backtracking, that I'm not sure how to introduce in the lesson (but will look into later).
This will also be helpful as a future reference if I compile this list.
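
For context, catastrophic backtracking can be illustrated with a sketch like the one below. The patterns and the input are made up for illustration, and the input is kept deliberately short so the slow pattern still finishes quickly.

```javascript
// Nested quantifiers: (a+)+ can split a run of "a"s in exponentially many ways.
const risky = /^(a+)+$/;

// An equivalent pattern without nesting accepts and rejects the same strings.
const safer = /^a+$/;

// A run of "a"s followed by a character that forces the overall match to fail.
// With the risky pattern, the engine tries every possible split before giving up,
// so each extra "a" roughly doubles the work.
const almost = 'a'.repeat(16) + '!';

console.log(risky.test(almost), safer.test(almost)); // false false
console.log(risky.test('aaaa'), safer.test('aaaa')); // true true
```

Both patterns give the same answers; only the amount of work on near-miss input differs.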

My final question is documentation: Where do I place problem statements, hints, problem completion message and so on? Do I introduce them as comments now and we migrate them later into documentation? There are comments in the repo on the JS files themselves, just wanted to be sure.

scissorsneedfoodtoo · 2019-12-17T05:09:51Z

Hi @CatalanCabbage, thanks again for your patience. You have some great questions here, and I'll do my best to answer them one by one.

UI: I don't think there needs to be any sort of UI for this project. Looking at the current Regular Expressions section, learners just have the editor to focus on while they build up the code and learn concepts along the way. I see this project working similarly, where they start out with a blank editor and build up the code line by line. If we want them to see any output, we could prompt them to log something to the console.

Concept-first vs. Product-first: Great question, and this is something we've been trying to reconcile with a lot of these new projects. I agree that the concept-first approach is better in the long run, even if it's not how you would normally build a production ready spam filter. The RSA Cryptography project does something similar, where we explain several times that what we're teaching is not secure at all, and is for educational purposes only.

Ideally the spam filter will cover most of the concepts in the current Regular Expressions section. But we shouldn't go out of our way to introduce concepts or methods if they're not necessary for the spam filter. In some of the other projects we've introduced basic concepts like if/else statements, then later go back and refactor them into ternary operators. I could see doing that in this project as well.
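
As a rough sketch of that kind of refactor in a spam-filter setting (the pattern, labels, and function names here are placeholders, not taken from any existing project):

```javascript
// Placeholder pattern for illustration only.
const spamPattern = /free money/i;

// Earlier step: the check written with if/else.
function labelWithIfElse(subject) {
  if (spamPattern.test(subject)) {
    return 'spam';
  } else {
    return 'ok';
  }
}

// Later step: the same logic refactored into a ternary.
const labelWithTernary = (subject) => (spamPattern.test(subject) ? 'spam' : 'ok');

console.log(labelWithIfElse('FREE MONEY inside'), labelWithTernary('Meeting at 3pm')); // spam ok
```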

Your idea to list out the various concepts first before finishing the prototype sounds good. As for things like catastrophic backtracking, I fear it might be too early in the curriculum to introduce a concept like that. If we do, I would recommend keeping it as simple as possible since this will only be the third project where learners work with JavaScript if they start from the very beginning. The two projects that come before this are the Basic JS RPG game and the Intermediate JS Calorie Counter, both of which are pretty simple.

Also that's a very good question about the documentation. These new projects will be quite different than the current challenges, and won't include things like completion messages or hints, at least for the time being. Right now we're just focused on building the prototypes, then breaking them down into short individual steps. The commented out sections are the instructions for each step, so you can introduce them as comments for now.

CatalanCabbage · 2020-05-22T14:33:35Z

Hi!
I've built up a basic lesson plan here, please check it out; any feedback/discussion is appreciated. :)

Some comments:

Proceeded with the basic idea we had: start off small, add complexity to the filter
Lesson count: As of that commit, the number of lessons has already crossed 20, and we probably have 2 sections left.
Reduce the number of lessons? That might hurt practice/retention (too many concepts at once) and widen the jump in complexity between consecutive exercises. I don't want to make it too long either, so I'm trying to find the balance. Better to have a few more lessons that students can grasp than too few, leaving them exasperated or feeling inadequate, imo.
I could work it out after deciding on all the lessons, but if there are any pointers/concerns, feel free to share, so that iterations can be reduced and there's less redundant effort. :)
Any suggestions on appropriate spam words/witty examples/low-key plots are appreciated!
Right now it's just a framework with one generic example, but we could revise that after the basic plan to make it more engaging, and maybe even map out a kind of story to keep it flowing, like lionel-rowe had already envisioned.
I'd like to refrain from words such as 'viagra', which might be deemed distasteful in some cultures.
Are the concepts right? I understand that I've written this with my experience in learning regexes, i.e., the order I would like to take concepts in; it might not be the same for everyone.
If you have any concerns regarding the concepts introduced, their complexity or order of introduction please let me know so we can discuss and make this better for all who take it. :)
cc @scissorsneedfoodtoo

CatalanCabbage · 2020-05-22T15:34:29Z

A suggestion regarding the UI:
I believe the stance is:

I don't think there needs to be any sort of UI for this project
If we want them to see any output, we could prompt them to log something to the console.

However, imo, regex isn't like other lessons.
In other lessons, we have outputs/errors that we debug; however, here, we have 'matches' that we need to convey to the learner.
There are no errors/outputs per se; only expectations of matches.
Now, we can convey this by text or the UI; I find that the UI is very intuitive and helps grasp data quickly.

Example, current view, text (and I'm assuming, the current stance): From FCC Regex course

Contrast it to this overview: From regex101

For this extremely simple example, how long did it take to grasp the overall objective in both cases?
The second seems more intuitive and better suited to regexes than repetitions of "your regex should match...".

My idea:
A graphical output window, where we can see what needs to be matched, and what the current regex matches.
We're building a spam filter, so I modified my ZohoMail layout a bit. We're filtering mails by subject, with the count of mails we've flagged as spam shown on the left.

Here, yellow is what needs to be matched, red is what your regex matches (incorrectly) and green is what your regex matches (correctly).
Please note that this is a very crude idea, just used Paint, this is just a basic vision for this; please overlook colors/formats/layout etc.

So the overall screen could have the graphical output pane (again, please overlook colors/order/sizes/font size):

TLDR, my case for Graphical output:

More intuitive
There aren't native errors, and no direct textual output comparison is required (as in, doing console.log(outputArray) to check output), so a graphical view is better suited. Regex is fundamentally different from those lessons.
Encourages experimentation: We all know regex comes from trying things out (like everything else). The question might say, Can you match 'cccc'? The student can look at the text and say, Hey, 'cccc' is fine. How can I match 'p1p2p3'?
- Current Text output: Your regex does not match 'cccc'
- Graphical output: Indicates that your regex failed. AND, indicates what it matched. Student learns more.
Easier to debug: Can see what's being matched exactly, can work on even complex regexes with more ease.
Questions are grasped easily. Even when the question explains it well, like must match all words that... vs just understanding it at a glance. It's not a substitute, but a great supplement.
More engaging than repetitive "your regex does not match..." outputs, since it's dynamic.
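
The data behind such a view could be computed along these lines. This is only a sketch of the idea: `reportMatches` and the shape of the test cases are hypothetical, and a real pane would render the result as colors instead of logging it.

```javascript
// Hypothetical helper: for each test string, collect what the learner's regex
// actually matched and compare it with what was expected.
// The regex must have the g flag, because matchAll throws without it.
const reportMatches = (regex, cases) =>
  cases.map(({ text, expected }) => {
    const actual = [...text.matchAll(regex)].map((m) => m[0]);
    const ok =
      actual.length === expected.length &&
      actual.every((found, i) => found === expected[i]);
    return { text, expected, actual, ok };
  });

// The learner's current attempt, checked against the two strings from the example.
const report = reportMatches(/c+/g, [
  { text: 'cccc', expected: ['cccc'] },
  { text: 'p1p2p3', expected: ['p1p2p3'] },
]);

for (const { text, actual, ok } of report) {
  console.log(`${ok ? 'PASS' : 'FAIL'} ${text} matched: ${JSON.stringify(actual)}`);
}
```

The `actual` array is what a text-only "your regex does not match" message leaves out, and it is the part a graphical pane would highlight.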

As always, feedback is appreciated :)

scissorsneedfoodtoo assigned lionel-rowe Apr 3, 2019

lionel-rowe mentioned this issue Apr 14, 2019

feat: first draft of Learn Regular Expressions by Building a Spam Filter #151

Merged

scissorsneedfoodtoo unassigned lionel-rowe Aug 28, 2019

scissorsneedfoodtoo added the help wanted label Aug 28, 2019

scissorsneedfoodtoo removed the help wanted label Dec 9, 2019

scissorsneedfoodtoo assigned CatalanCabbage Dec 9, 2019

jdwilkin4 closed this as completed Jul 26, 2022
