Learn Regular Expressions by Building a Spam Filter #122
Comments
I'd be interested in contributing to this challenge. |
@lionel-rowe Awesome! I think this will be a fun one. See if you can get an extremely simple demo that checks a line of input for spam words and the use of characters meant to mask spam words like v1agr@ or something like that 😄 - I'm excited to see what you come up with. |
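A minimal sketch of the kind of demo described above: check one line of input against a few spam words, allowing for look-alike characters. The word list and the `isSpam` name are assumptions for illustration only; the thread does not specify either.

```javascript
// Hypothetical patterns; each character class lists look-alikes for one letter.
const spamPatterns = [
  /v[i1|][a@4]gr[a@4]/i, // viagra, v1agr@, vi@gra, v|agRa
  /fr[e3]{2} m[o0]n[e3]y/i, // free money, fr33 m0ney
];

// A line is flagged if any pattern matches anywhere in it.
const isSpam = (line) => spamPatterns.some((pattern) => pattern.test(line));

console.log(isSpam("Get v1agr@ today")); // true
console.log(isSpam("See you at lunch")); // false
```

None of the patterns use the `g` flag, so `test` keeps no state between calls.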
I'm actually wondering how best to approach this one. In order to make a truly extensible filter that would catch v1agr@, vi@gra, v|agRa, V I A G R A, etc. it'd really make more sense to have programmatically-generated regexes, rather than regex literals: https://gist.github.com/lionel-rowe/0724546e4b5c1a71be29502aac4c825e However, this approach would probably introduce too many disparate concepts at once, plus it has some gotchas (double-escaping, for starters). Any thoughts on how to simplify it while still guiding students toward writing extensible code? Note that the regexes generated by this method are almost completely unreadable: [
/\bv\s*[i|]\s*[a@4]\s*[g69]\s*r\s*[a@4]\b/i,
/\bf\s*r\s*[e3]\s*[e3]\s* \s*m\s*[o0]\s*n\s*[e3]\s*y\b/i,
/\bw\s*[o0]\s*r\s*k\s* \s*f\s*r\s*[o0]\s*m\s* \s*h\s*[o0]\s*m\s*[e3]\b/i,
/\b[s5]\s*[t7]\s*[o0]\s*[c{\[(]\s*k\s* \s*[a@4]\s*l\s*[e3]\s*r\s*[t7]\b/i,
/\bd\s*[e3]\s*[a@4]\s*r\s* \s*f\s*r\s*[i|]\s*[e3]\s*n\s*d\b/i
] Alternatively, we could just take the hard-coding approach and limit the test cases somewhat to avoid having completely unreadable regexes. |
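For reference, programmatic generation along these lines could look roughly like the sketch below. It is not the code from the linked gist: the `lookalikes` table, `escapeLiteral`, and `buildSpamRegex` are hypothetical names, and the boundaries use lookarounds instead of `\b` so that a phrase ending in a symbol such as `@` can still match at the end of a word. It also shows the double-escaping gotcha: inside a string passed to `new RegExp`, `\s` has to be written as `"\\s"`.

```javascript
// Hypothetical substitution table. The values contain no characters that are
// special inside a character class, so they can be dropped into [...] as-is.
const lookalikes = { a: "a@4", e: "e3", g: "g69", i: "i1|", o: "o0", s: "s5", t: "t7" };

// Escape regex metacharacters in letters that have no look-alikes.
const escapeLiteral = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function buildSpamRegex(phrase) {
  const words = phrase
    .toLowerCase()
    .split(/\s+/)
    .map((word) =>
      [...word]
        .map((ch) => (lookalikes[ch] ? `[${lookalikes[ch]}]` : escapeLiteral(ch)))
        .join("\\s*") // double-escaped: the resulting regex sees \s*
    );
  // Lookarounds stand in for \b so a trailing "@" still counts as a word end.
  return new RegExp(`(?<![a-z0-9])${words.join("\\s+")}(?![a-z0-9])`, "i");
}

console.log(buildSpamRegex("viagra").test("Buy v1agr@ now")); // true
console.log(buildSpamRegex("viagra").test("V I A G R A")); // true
console.log(buildSpamRegex("free money").test("fr33 m0ney")); // true
```

The lookbehind assertion requires a reasonably recent JavaScript engine.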
Yes, generating them programmatically might make the most sense in the real world, but as you have pointed out, the result really is not readable (nor does it teach much about regex). What about making the lesson slowly build on itself? Go from a normal spelling of a spam word. Next lesson is to add an alternative spelling. Next is more alternative spellings. Oh, wait...the spammers have gotten smarter and are now doing “x” things instead...how can we improve it from here? I’m imagining a text-based game of sorts. Outsmarting the spammers. And with each step learning a bit more regex. And how to alter regex when the requirements change. It could include a bit of history too for those that aren’t familiar with how spam used to be filtered with regex when spam was a relatively new concept. Let the user experience what it might have been like as a mail server admin back in the simpler days... Ultimately, the final step could be acknowledging the limits of human-readable spam filtering. But that’s ok. The user isn’t trying to become a regex master with these lessons. Mostly what they need to learn is how to read and write basic regex. They can search the internet, or use one of the visual regex tools, for anything more complex. |
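One way to picture the "build on itself" progression is as a series of regexes, each catching one more trick than the last. The steps, sample strings, and the `caughtBy` helper below are illustrative assumptions, not a proposed lesson plan.

```javascript
// Each step adds one concept to the previous regex.
const steps = [
  { note: "exact spelling", regex: /viagra/ },
  { note: "ignore case (i flag)", regex: /viagra/i },
  { note: "look-alike characters (character classes)", regex: /v[i1|][a@4]gr[a@4]/i },
  { note: "spaces between letters (\\s*)", regex: /v\s*[i1|]\s*[a@4]\s*g\s*r\s*[a@4]/i },
];

const samples = ["viagra", "ViAgRa", "v1agr@", "V I A G R A"];

// Which of the sample strings does a given regex catch?
const caughtBy = (regex) => samples.filter((sample) => regex.test(sample));

for (const { note, regex } of steps) {
  console.log(`${note}: ${caughtBy(regex).join(", ")}`);
}
```

With these samples, each step catches exactly one more string than the step before it, which is the "spammers got smarter" beat of the lesson.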
@brandenbyers yeah, that sounds like a good way to approach it. Here's an updated version: https://gist.github.com/lionel-rowe/72a19bf346858ede6f406ad20e7c157a As I've split out the logic of deleting intra-word spaces from the logic of de-mangling, this version has a clearer path for how to iterate:
The biggest challenge conceptually is probably the |
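A rough sketch of what separating the two concerns might look like: first join letters that were spaced out, then map look-alike characters back to letters, and only then test against a plain, readable regex. This is not the gist's code; `demangleMap`, `joinSpacedLetters`, `demangle`, and `normalize` are hypothetical names, and the map is deliberately tiny.

```javascript
// Hypothetical look-alike table: symbol or digit -> the letter it imitates.
const demangleMap = { "@": "a", "4": "a", "3": "e", "1": "i", "|": "i", "0": "o" };

// Step 1: collapse runs of three or more single characters separated by
// single spaces, e.g. "V I A G R A" -> "VIAGRA".
const joinSpacedLetters = (text) =>
  text.replace(/(?<!\S)(?:\S ){2,}\S(?!\S)/g, (run) => run.replace(/ /g, ""));

// Step 2: replace each look-alike character with its letter.
const demangle = (text) => text.replace(/[@431|0]/g, (ch) => demangleMap[ch]);

const normalize = (text) => demangle(joinSpacedLetters(text));

// Step 3: the spam regex itself can stay simple.
const spamWord = /\bviagra\b/i;

console.log(normalize("Buy V 1 A G R @ now")); // "Buy ViAGRa now"
console.log(spamWord.test(normalize("Buy V 1 A G R @ now"))); // true
```

A known limitation of this sketch is that demangling also rewrites genuine digits (for example "10 am"), so it only makes sense as a pre-processing step for matching, not for display.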
@lionel-rowe I agree with Branden that it's much more important that we teach these regex techniques even if real life approaches to spam filtering would be different. Remember that the entire curriculum will be a series of individual tests, and getting one test to pass at a time. So we will only be testing one aspect of their regular expressions at a time. And any concepts we want to impart, we will need to do so in just a few words as part of a test description. There won't be any paragraphs of explainer text. |
@lionel-rowe, just wanted to check in and see how everything's going with the project. Your gist looks like a great start! My only suggestion would be to keep things simple, teaching just one regex concept with some repetition/review of earlier concepts between. Your project will be replacing the lessons here, so the Anyway, hope that helps! Please let us know if there's anything we can help out with. |
The idea will be to build each regex up incrementally (based on the second gist I posted, not the first, which I agree is overcomplicated). The only thing I'm concerned about is not covering enough concepts (e.g. I'll work on making this into a proper lesson where the incremental approach is clearer. |
@lionel-rowe, okay, that sounds great. I don't think you need to worry about the coverage of your project. Like you said, teaching the broad concept of flags and using several in depth is better than covering all the flags. Looking forward to seeing your lessons! |
@lionel-rowe, just checking in to see how things are going. Did you start breaking this project down into steps? |
I'll open a work-in-progress PR sometime this weekend. |
Hi @lionel-rowe, just wanted to check on the status of this project, too. Were you able to start on another draft by any chance? |
Draft 2 should be coming shortly, though I'm somewhat swamped with work at the moment. Aiming for within the week. |
@lionel-rowe, thank you for the update! Looking forward to seeing it soon. |
@lionel-rowe, were you able to make any progress on your next draft? Looking forward to seeing it soon. |
Hi @lionel-rowe, were you able to start on a new draft? |
Yo! How far is this project? Can I consider it as unclaimed? |
Hi @Bam92, thank you for your patience and sorry about the delay. Yes, this project is unclaimed. Feel free to work on a prototype and post updates here as you go along. |
Hi @Bam92 and @scissorsneedfoodtoo , I'd be glad to contribute. |
Hi @CatalanCabbage |
Hi @CatalanCabbage, thank you for picking this project up. There's already been some work done that's been merged into the repo, but please feel free to start from scratch if that's easier. Though this project will focus on regex, we'll still be using JavaScript to teach the fundamentals. Please go ahead and start working on CodePen, CodeSandbox, or some other similar platform, and post a link to your prototype here whenever it's ready. |
@scissorsneedfoodtoo, first off, you're doing great work. :) UI: You had asked me to make a mock-up on CodePen; when we speak of regex, the learner generally progresses in terms of regex complexity; I'm at a loss how to improve on the UI step by step (or maybe look into it later), since it would just involve basically a regex, and running tests. Is one common UI enough for now? How do you visualize this course? Concept-first vs. Product-first: The last person who took this task up did a pretty good job, actually. The regex is built, say, step by step to morph into a spam filter; however, should we shift focus onto various concepts, and look at the filter as a means to an end, even if it's actually a roundabout way? Should we do a concept-first approach and then try to somehow integrate it into the filter even if it's not how we'd actually do a filter? I think this will benefit the users more. So assuming this is your line of reasoning, I'd like to first list out various general Regex concepts and JS-based regex functions, so we can integrate them into the lessons, and if not all, we could mention them in hints or somewhere along the way, so they're aware. It was stated that concepts need to be explained in a few words; do we provide links for further reading (and if so, are there approved sites)? Also, there are some concepts such as Catastrophic Backtracking which are extremely important, that I'm not sure how to introduce in the lesson (but will look into later). My final question is documentation: Where do I place problem statements, hints, problem completion message and so on? Do I introduce them as comments now and we migrate them later into documentation? There are comments in the repo on the JS files themselves, just wanted to be sure. |
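For readers unfamiliar with catastrophic backtracking, here is a small textbook illustration. The patterns are generic examples, not anything from the project: a nested quantifier such as `(a+)+` lets a backtracking engine like JavaScript's try exponentially many ways to split the input when the overall match fails, while an equivalent pattern with a single quantifier does not.

```javascript
// Nested quantifiers: the run of "a"s can be grouped in many different ways,
// and a failing match makes the engine try them all.
const risky = /^(a+)+$/;

// Matches the same strings with a single quantifier, so there is nothing
// to re-group on failure.
const safe = /^a+$/;

// A near miss: all "a"s, then one character that cannot match.
// Kept short on purpose; the work for `risky` roughly doubles per extra "a".
const nearMiss = "a".repeat(16) + "!";

console.log(risky.test(nearMiss)); // false
console.log(safe.test(nearMiss)); // false
console.log(risky.test("aaaa"), safe.test("aaaa")); // true true
```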
Hi @CatalanCabbage, thanks again for your patience. You have some great questions here, and I'll do my best to answer them one by one. UI: I don't think there needs to be any sort of UI for this project. Looking at the current Regular Expressions section, learners just have the editor to focus on while they build up the code and learn concepts along the way. I see this project working similarly, where they start out with a blank editor and build up the code line by line. If we want them to see any output, we could prompt them to log something to the console. Concept-first vs. Product-first: Great question, and this is something we've been trying to reconcile with a lot of these new projects. I agree that the concept-first approach is better in the long run, even if it's not how you would normally build a production ready spam filter. The RSA Cryptography project does something similar, where we explain several times that what we're teaching is not secure at all, and is for educational purposes only. Ideally the spam filter will cover most of the concepts in the current Regular Expressions section. But we shouldn't go out of our way to introduce concepts or methods if they're not necessary for the spam filter. In some of the other projects we've introduced basic concepts like if/else statements, then later go back and refactor them into ternary operators. I could see doing that in this project as well. Your idea to list out the various concepts first before finishing the prototype sounds good. As for things like catastrophic backtracking, I fear it might be too early in the curriculum to introduce a concept like that. If we do, I would recommend keeping it as simple as possible since this will only be the third project where learners work with JavaScript if they start from the very beginning. The two projects that come before this are the Basic JS RPG game and the Intermediate JS Calorie Counter, both of which are pretty simple. 
Also that's a very good question about the documentation. These new projects will be quite different than the current challenges, and won't include things like completion messages or hints, at least for the time being. Right now we're just focused on building the prototypes, then breaking them down into short individual steps. The commented out sections are the instructions for each step, so you can introduce them as comments for now. |
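As an illustration of the if/else-then-ternary refactor mentioned above, applied to this project, a step pair might look something like the following. The function names and the single hard-coded regex are placeholders, and the output is shown by logging to the console, as suggested.

```javascript
// Earlier step: explicit if/else.
function labelWithIfElse(line) {
  if (/viagra/i.test(line)) {
    return "spam";
  } else {
    return "ok";
  }
}

// Later step: the same logic refactored into a ternary.
const labelWithTernary = (line) => (/viagra/i.test(line) ? "spam" : "ok");

console.log(labelWithIfElse("Cheap VIAGRA here")); // spam
console.log(labelWithTernary("Lunch at noon?")); // ok
```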
Hi! Some comments:
|
A suggestion regarding the UI:
However, imo, regex isn't like other lessons. Example, current view, text (and I'm assuming, the current stance): From FCC Regex course Contrast it to this overview: From regex101 For this extremely simple example, how long did it take to grasp the overall objective in both cases? My idea: Here, yellow is what needs to be matched, red is what your regex matches (incorrectly) and green is what your regex matches (correctly). So the overall screen could have the graphical output pane (again, please overlook colors/order/sizes/font size): TLDR, my case for Graphical output:
As always, feedback is appreciated :) |
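The colour scheme proposed above could be computed along these lines. This is only a sketch of the underlying logic, with no rendering; `classifyCharacters` and the `[start, end)` range format for the expected matches are assumptions made for the example.

```javascript
// Label every character of `text`:
//   "green"  - should be matched and the regex matched it
//   "yellow" - should be matched but the regex missed it
//   "red"    - should not be matched but the regex matched it
//   "plain"  - neither
function classifyCharacters(text, regex, expectedRanges) {
  const shouldMatch = new Array(text.length).fill(false);
  for (const [start, end] of expectedRanges) {
    for (let i = start; i < end; i++) shouldMatch[i] = true;
  }

  // matchAll requires the g flag, so add it if the caller left it off.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const didMatch = new Array(text.length).fill(false);
  for (const match of text.matchAll(new RegExp(regex.source, flags))) {
    for (let i = match.index; i < match.index + match[0].length; i++) {
      didMatch[i] = true;
    }
  }

  return shouldMatch.map((should, i) => {
    if (should && didMatch[i]) return "green";
    if (should) return "yellow";
    if (didMatch[i]) return "red";
    return "plain";
  });
}

// The learner was supposed to match "cat" (indices 0-2) but wrote /hat/.
const labels = classifyCharacters("cat hat", /hat/, [[0, 3]]);
console.log(labels.join(" "));
// yellow yellow yellow plain red red red
```

A front end could then map each label to a highlight colour, one character at a time.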