No mention of the many regular expression engines in use nor the engine used in the lesson's examples #71

Closed
andrewrs opened this issue Dec 4, 2018 · 8 comments

Comments

@andrewrs

andrewrs commented Dec 4, 2018

I'd recommend mentioning, at least in passing, that there are many different regular expression engines in common use and that each engine has features and syntax that, while often quite similar, do differ from each other in meaningful ways.

Additionally, there is no mention of the specific engine used for the lesson's examples. Granted, the basic examples used in the lesson will work with any Perl-like engine (the lesson's suggested online tools mostly employ PCRE and JavaScript), but it would be a good idea to plant in students' minds that they might need to learn a particular tool's or language's regular expression implementation before using more advanced regex features.

A couple of links that describe the varying features of some of the many regular expression engines in use:
https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines (basic overview, as one would expect on Wikipedia)
https://www.regular-expressions.info/refflavors.html (nice, detailed reference)

@libcce
Contributor

libcce commented Dec 4, 2018

+1. I often note this in my workshops, but it would be great to reference one or two of the links you point to for more information. Can you see a point in the lesson where we could add this?

@andrewrs
Author

andrewrs commented Dec 4, 2018

I think adding 2-3 sentences in the lesson's opening paragraph would be sufficient and then maybe adding the links to the Key Points section.

@drjwbaker

@andrewrs Would you be willing and able to recommend some text? I'm aware that there are different flavours of regex, but you seem to know more about the details than I do.

@andrewrs
Author

andrewrs commented Dec 7, 2018

Here's my first crack at some text:

Most regular expression implementations employ comparable syntaxes (generally influenced by the Perl programming language's regex syntax) that behave similarly for simple pattern-matching operations. But there are differences, often subtle, in each, so it's always good practice to read the application's or language's documentation whenever available, especially if you want to start using more advanced regex features. Some programs, notably many UNIX command line programs, use an older regex standard (POSIX regular expressions), which is less feature-rich and employs different metacharacters than Perl-influenced implementations.

I'll see if I can write something a little tighter over the next couple of days when I'm not suffering from Friday afternoon brain fade.

I also came across another nice reference at: https://gist.github.com/CMCDragonkai/6c933f4a7d713ef712145c5eb94a1816

drjwbaker pushed a commit that referenced this issue Dec 10, 2018
add callout on regex engine per discussion at #71
@drjwbaker

Many thanks. I like it. Shall we go with:

Most regular expression implementations employ comparable syntaxes (generally influenced by the regex syntax of a programming language called Perl) that behave similarly for most pattern-matching operations. But there are differences, often subtle, in each, so it's always good practice to read the application's or language's documentation whenever available, especially if you want to start using more advanced regex features. Some programs, notably many UNIX command line programs (for more on UNIX see our 'Shell Lesson'), use an older regex standard (called 'POSIX regular expressions'), which is less feature-rich and employs different metacharacters than Perl-influenced implementations. For the purposes of our lesson, you don't need to worry too much about all this, but if you want to follow up on it, see this detailed engine comparison.

I suggest we add it between '..including markdown and HTML.' and 'A very simple use of a regular expression..' at https://librarycarpentry.org/lc-data-intro/04-regular-expressions/index.html as a pinned callout (like 'Tab for Auto-complete' at https://librarycarpentry.org/lc-shell/03-working-with-files-and-folders/index.html).

PR here #73

@andrewrs
Author

Yeah, I was also thinking of placing the new text between '..including markdown and HTML.' and 'A very simple use of a regular expression..'.

I forked the previously referenced regex feature comparison and separated the feature categories into their own tables to allow for easier scrolling:
https://gist.github.com/andrewrs/74ece75269f56d074408df216b3d9e77

I like your edits and additions, and I think the version you posted is good to go. Yesterday, I revised my initial draft to come up with:

Most regular expression implementations employ comparable syntaxes and metacharacters (generally influenced by the Perl programming language's regex syntax), and they behave similarly for the simple pattern-matching exercises in this lesson. But there are differences, often subtle, in each, so it's always good practice to read the application's or language's documentation whenever available, especially when you start using more advanced regex features. Some programs, notably many UNIX command line programs, use an older regex standard (POSIX regular expressions), which is less feature-rich and uses different metacharacters than Perl-influenced implementations. See the links at the end of this lesson for greater detail.
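As a rough sketch of the kind of subtle difference this text warns about, here is an example using Python's built-in `re` module (Python is our choice for illustration; the lesson itself relies on online tools). It assumes someone copies a POSIX-style bracket expression, `[[:digit:]]`, as accepted by POSIX tools such as `grep -E`, into Python. Python's engine is Perl-influenced but, unlike Perl and PCRE, does not recognise POSIX character classes, so it quietly reads the pattern as an ordinary character set rather than raising an error.

```python
import re
import warnings

text = "Call number: QA76 1984"

# Perl-style shorthand: \d matches any digit in Python's re module.
perl_style = re.findall(r"\d+", text)

# POSIX bracket expression, as it might be written for grep -E.
# Python's re does not support [[:digit:]]: it parses it as a set of the
# literal characters "[", ":", "d", "i", "g", "t" followed by one or more "]".
# Newer Python versions may emit a FutureWarning ("Possible nested set") here,
# which is silenced for this demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    posix_style = re.findall(r"[[:digit:]]+", text)

print(perl_style)   # ['76', '1984']
print(posix_style)  # []
```

The Perl-style pattern finds both numbers, while the POSIX-style pattern silently matches nothing: exactly the sort of engine-specific behaviour that checking the tool's documentation would catch.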

Sorry for the delay in posting the revision. GitHub apparently determined that my final edits to the regex table mentioned above weren't adequately human-like and locked my account for a day until I could get customer service to unlock it.

Feel free to pick and choose from either version. Again, I think the version you posted is perfectly sufficient.

drjwbaker pushed a commit that referenced this issue Dec 11, 2018
Light edit of text based on revision from @andrewrs at #71 (comment)
@drjwbaker

@andrewrs Woah, weird bot drama! I've incorporated your revised text into the PR. Ta!

@drjwbaker

Resolved by #73

zkamvar pushed a commit that referenced this issue May 3, 2023
add callout on regex engine per discussion at #71
zkamvar pushed a commit that referenced this issue May 3, 2023
Light edit of text based on revision from @andrewrs at #71 (comment)

3 participants