From f4e30e971f703a5cf911b6b4d1169532c49645ce Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:07:59 +0000 Subject: [PATCH 001/129] New translations code.py (Romanian) --- ro-RO/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 ro-RO/resources/code.py diff --git a/ro-RO/resources/code.py b/ro-RO/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ro-RO/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 0d0c38ec6f91bbd6a9db177dceb0953ed608b950 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:01 +0000 Subject: [PATCH 002/129] New translations step_1.md (Danish) --- da-DK/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 da-DK/step_1.md diff --git a/da-DK/step_1.md b/da-DK/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/da-DK/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because the `.*` in this regex is **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require that certain text comes **before** the match. + +- `?=` means **look ahead**. Use it to require that certain text comes **after** the match. + +- For these elements to work, wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From a39521d006723d5edf0644aeaeb055f19803e5f1 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:02 +0000 Subject: [PATCH 003/129] New translations code.py (Czech) --- cs-CZ/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 cs-CZ/resources/code.py diff --git a/cs-CZ/resources/code.py b/cs-CZ/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/cs-CZ/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 1bc2fc5b71987434f85c9eba2d26a8772a9b9f06 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:03 +0000 Subject: [PATCH 004/129] New translations .keep (Czech) --- cs-CZ/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 cs-CZ/solutions/.keep diff --git a/cs-CZ/solutions/.keep b/cs-CZ/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/cs-CZ/solutions/.keep @@ -0,0 +1 @@ + From 4f6020c0df360d73ae6841ceb3ffd59a56903f60 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:05 +0000 Subject: [PATCH 005/129] New translations meta.yml (Czech) --- cs-CZ/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 cs-CZ/meta.yml diff --git a/cs-CZ/meta.yml b/cs-CZ/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/cs-CZ/meta.yml @@ -0,0 +1,17 @@ 
+--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 9324eff05716e3f1d9e01acb433dbf8f68abb5cc Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:06 +0000 Subject: [PATCH 006/129] New translations step_1.md (Czech) --- cs-CZ/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 cs-CZ/step_1.md diff --git a/cs-CZ/step_1.md b/cs-CZ/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/cs-CZ/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because the `.*` in this regex is **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require that certain text comes **before** the match. + +- `?=` means **look ahead**. Use it to require that certain text comes **after** the match. + +- For these elements to work, wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 461f85f89d3e07dc39eb3af1c4e0ef63239d650b Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:07 +0000 Subject: [PATCH 007/129] New translations code.py (Danish) --- da-DK/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 da-DK/resources/code.py diff --git a/da-DK/resources/code.py b/da-DK/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/da-DK/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 54c331b920f02c8c3d50140d7b43e68f95b68724 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:08 +0000 Subject: [PATCH 008/129] New translations .keep (Danish) --- da-DK/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 da-DK/solutions/.keep diff --git a/da-DK/solutions/.keep b/da-DK/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/da-DK/solutions/.keep @@ -0,0 +1 @@ + From 35af01bca926cb7b6591b80437c32b09bf9c27b0 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:10 +0000 Subject: [PATCH 009/129] New translations meta.yml (Danish) --- da-DK/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 da-DK/meta.yml diff --git a/da-DK/meta.yml b/da-DK/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/da-DK/meta.yml @@ -0,0 +1,17 @@ 
+--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 224d3500d4e23334f9d3bc6267516d4e1c78f82f Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:11 +0000 Subject: [PATCH 010/129] New translations code.py (German) --- de-DE/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 de-DE/resources/code.py diff --git a/de-DE/resources/code.py b/de-DE/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/de-DE/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 4095e70aad6919eda532321168637dbb7a09bd2d Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:12 +0000 Subject: [PATCH 011/129] New translations meta.yml (Catalan) --- ca-ES/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 ca-ES/meta.yml diff --git a/ca-ES/meta.yml b/ca-ES/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ca-ES/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible 
values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From e55bf73d0a13ff9b3b84a064813f69f7a09ca6c3 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:13 +0000 Subject: [PATCH 012/129] New translations .keep (German) --- de-DE/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 de-DE/solutions/.keep diff --git a/de-DE/solutions/.keep b/de-DE/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/de-DE/solutions/.keep @@ -0,0 +1 @@ + From 3e74acf1a6210fa778a689ddef847e6299d37554 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:14 +0000 Subject: [PATCH 013/129] New translations meta.yml (German) --- de-DE/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 de-DE/meta.yml diff --git a/de-DE/meta.yml b/de-DE/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/de-DE/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 2aaef1438bc72d18e6bfa22641f27947de41539a Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:16 +0000 Subject: [PATCH 014/129] New translations step_1.md (German) --- de-DE/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 
de-DE/step_1.md diff --git a/de-DE/step_1.md b/de-DE/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/de-DE/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because the `.*` in this regex is **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. 
There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require that certain text comes **before** the match. + +- `?=` means **look ahead**. Use it to require that certain text comes **after** the match. + +- For these elements to work, wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 72beb62bea6bbe18a30eeb10859ed2bae86f443a Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:17 +0000 Subject: [PATCH 015/129] New translations code.py (Greek) --- el-GR/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 el-GR/resources/code.py diff --git a/el-GR/resources/code.py b/el-GR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/el-GR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 4b96952b13c9f0a9f83cce5bcfa4fc3b45e6670b Mon Sep 17 00:00:00 2001 From: ninaszymor 
<33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:18 +0000 Subject: [PATCH 016/129] New translations .keep (Greek) --- el-GR/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 el-GR/solutions/.keep diff --git a/el-GR/solutions/.keep b/el-GR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/el-GR/solutions/.keep @@ -0,0 +1 @@ + From 2eee3d4966413fb2f1c32e7ec5547a489e5f9aa4 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:19 +0000 Subject: [PATCH 017/129] New translations meta.yml (Greek) --- el-GR/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 el-GR/meta.yml diff --git a/el-GR/meta.yml b/el-GR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/el-GR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From d52947241996971141edcb7afc684f01242a669f Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:21 +0000 Subject: [PATCH 018/129] New translations code.py (Finnish) --- fi-FI/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 fi-FI/resources/code.py diff --git a/fi-FI/resources/code.py b/fi-FI/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/fi-FI/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + 
+match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 7198055bf989461cc71bea9f9c9def1514a4efec Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:22 +0000 Subject: [PATCH 019/129] New translations .keep (Romanian) --- ro-RO/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 ro-RO/solutions/.keep diff --git a/ro-RO/solutions/.keep b/ro-RO/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ro-RO/solutions/.keep @@ -0,0 +1 @@ + From 8a0c00622ed33f85f8ddda4feb7b22e554de0210 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:23 +0000 Subject: [PATCH 020/129] New translations step_1.md (Catalan) --- ca-ES/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 ca-ES/step_1.md diff --git a/ca-ES/step_1.md b/ca-ES/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ca-ES/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because the `.*` in this regex is **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require that certain text comes **before** the match. + +- `?=` means **look ahead**. Use it to require that certain text comes **after** the match. + +- For these elements to work, wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 1cac588a03eacdef497290a6465680e285721c86 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:24 +0000 Subject: [PATCH 021/129] New translations step_1.md (Greek) --- el-GR/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 el-GR/step_1.md diff --git a/el-GR/step_1.md b/el-GR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/el-GR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because the `.*` in this regex is **greedy**. 
That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require that certain text comes **before** the match. + +- `?=` means **look ahead**. Use it to require that certain text comes **after** the match. + +- For these elements to work, wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 2c690dd1532dd8ae50f4e582e783a9d21723491b Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:25 +0000 Subject: [PATCH 022/129] New translations .keep (Catalan) --- ca-ES/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 ca-ES/solutions/.keep diff --git a/ca-ES/solutions/.keep b/ca-ES/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ca-ES/solutions/.keep @@ -0,0 +1 @@ + From fa0b8f3f28fbad2e41e2104aa025a6bc9714bebf Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:27 +0000 Subject: [PATCH 023/129] New translations meta.yml (Spanish) --- es-ES/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 es-ES/meta.yml diff --git a/es-ES/meta.yml b/es-ES/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/es-ES/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 302b75c3afe77f71cf9dee7bb91b816bc20480da Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:28 +0000 Subject: [PATCH 
024/129] New translations step_1.md (Romanian) --- ro-RO/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 ro-RO/step_1.md diff --git a/ro-RO/step_1.md b/ro-RO/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ro-RO/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because the `.*` in this regex is **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. 
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text that comes immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text that comes immediately **after** the match. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From d32b872372353557e698a7b4cb1f8658e22b82b6 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:29 +0000 Subject: [PATCH 025/129] New translations meta.yml (Romanian) --- ro-RO/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 ro-RO/meta.yml diff --git a/ro-RO/meta.yml b/ro-RO/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ro-RO/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 5b17f75554038af1f6e640cf688e6c58ba3bb7d4 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:30 +0000 Subject: [PATCH 026/129] New translations code.py (French) --- fr-FR/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 fr-FR/resources/code.py diff --git a/fr-FR/resources/code.py b/fr-FR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/fr-FR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + 
+ From 8542238e04352a85ac51a761f5f5f073008ec83d Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:31 +0000 Subject: [PATCH 027/129] New translations .keep (French) --- fr-FR/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 fr-FR/solutions/.keep diff --git a/fr-FR/solutions/.keep b/fr-FR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/fr-FR/solutions/.keep @@ -0,0 +1 @@ + From 0f8555d2b4c6ce5aa80248d0273a29eba3c93ecd Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:32 +0000 Subject: [PATCH 028/129] New translations meta.yml (French) --- fr-FR/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 fr-FR/meta.yml diff --git a/fr-FR/meta.yml b/fr-FR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/fr-FR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From b2427abfb5e4f2673e4ac9b7594b39b00956e866 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:34 +0000 Subject: [PATCH 029/129] New translations step_1.md (French) --- fr-FR/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 fr-FR/step_1.md diff --git a/fr-FR/step_1.md b/fr-FR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/fr-FR/step_1.md @@ -0,0 +1,111 @@ +If 
you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and its `findall()` function. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**. That means it matches as many characters as possible, so the match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text that comes immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text that comes immediately **after** the match. 
+ +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 9bcfb60dffad4e40bf6a0c7efadc585394ae40b6 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:35 +0000 Subject: [PATCH 030/129] New translations .keep (Spanish) --- es-ES/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 es-ES/solutions/.keep diff --git a/es-ES/solutions/.keep b/es-ES/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/es-ES/solutions/.keep @@ -0,0 +1 @@ + From 037bc50ff78da747a45f980fee27190488d01c34 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:36 +0000 Subject: [PATCH 031/129] New translations code.py (Spanish) --- es-ES/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 es-ES/resources/code.py diff --git a/es-ES/resources/code.py b/es-ES/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/es-ES/resources/code.py @@ -0,0 +1,14 @@ 
+import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From ec324241c24e355088257965e3507e140a393a27 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:37 +0000 Subject: [PATCH 032/129] New translations step_1.md (Spanish) --- es-ES/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 es-ES/step_1.md diff --git a/es-ES/step_1.md b/es-ES/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/es-ES/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and its `findall()` function. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**. That means it matches as many characters as possible, so the match runs from the first `'start'` to the last `'end'`. 
+ +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text that comes immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text that comes immediately **after** the match. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From d197e5818ec92bff78bb50d8eb5a149b1c962b4c Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:39 +0000 Subject: [PATCH 033/129] New translations code.py (Arabic) --- ar-SA/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 ar-SA/resources/code.py diff --git a/ar-SA/resources/code.py b/ar-SA/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ar-SA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From deeb1e968c32f18b5c08efa4ef05f14af9e237a3 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:40 +0000 Subject: [PATCH 034/129] New translations step_1.md (Arabic) --- ar-SA/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 ar-SA/step_1.md diff --git a/ar-SA/step_1.md b/ar-SA/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ar-SA/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**. That means it matches as many characters as possible, so the match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text that comes immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text that comes immediately **after** the match. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 6162e3c6d7cbece90d1588a89421f98c78eee4b4 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:41 +0000 Subject: [PATCH 035/129] New translations code.py (Afrikaans) --- af-ZA/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 af-ZA/resources/code.py diff --git a/af-ZA/resources/code.py b/af-ZA/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/af-ZA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 9cbda9d451f53fd9af64024714aa5a92ed1d5477 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:42 +0000 Subject: [PATCH 036/129] New translations .keep (Arabic) --- ar-SA/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 ar-SA/solutions/.keep diff --git a/ar-SA/solutions/.keep b/ar-SA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ar-SA/solutions/.keep @@ -0,0 +1 @@ + From a6f4a0c98e3be38f81a8925611cec23cafdb1947 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:44 +0000 
Subject: [PATCH 037/129] New translations meta.yml (Arabic) --- ar-SA/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 ar-SA/meta.yml diff --git a/ar-SA/meta.yml b/ar-SA/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ar-SA/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From c84b2c58f27e1fe58093af75b0890ba04cb985a5 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:45 +0000 Subject: [PATCH 038/129] New translations step_1.md (Afrikaans) --- af-ZA/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 af-ZA/step_1.md diff --git a/af-ZA/step_1.md b/af-ZA/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/af-ZA/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**. That means it matches as many characters as possible, so the match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text that comes immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text that comes immediately **after** the match. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From d3a5a5ddb126f33b8cd13a1d888cd8b94c6d695e Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:46 +0000 Subject: [PATCH 039/129] New translations meta.yml (Afrikaans) --- af-ZA/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 af-ZA/meta.yml diff --git a/af-ZA/meta.yml b/af-ZA/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/af-ZA/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 6b13cd428fffaa5f14aa9464ed99d4793de2f5a4 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:47 +0000 Subject: [PATCH 040/129] New translations .keep (Afrikaans) --- af-ZA/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 af-ZA/solutions/.keep diff --git a/af-ZA/solutions/.keep 
b/af-ZA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/af-ZA/solutions/.keep @@ -0,0 +1 @@ + From 739d9c47f14d7d1fc1d2f7b24eb9872d793d1c84 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:48 +0000 Subject: [PATCH 041/129] New translations code.py (Catalan) --- ca-ES/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 ca-ES/resources/code.py diff --git a/ca-ES/resources/code.py b/ca-ES/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ca-ES/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 20d5ba072eb84475d194d5c8b611e60cb75b55eb Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:50 +0000 Subject: [PATCH 042/129] New translations code.py (Turkish) --- tr-TR/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 tr-TR/resources/code.py diff --git a/tr-TR/resources/code.py b/tr-TR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/tr-TR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From da9b96a179db6c95450c4c8f549ab242e6a0c35f Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:51 +0000 Subject: [PATCH 043/129] New translations meta.yml (Ukrainian) --- uk-UA/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 uk-UA/meta.yml diff --git a/uk-UA/meta.yml b/uk-UA/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/uk-UA/meta.yml @@ -0,0 +1,17 @@ +--- 
+title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From f645acde98f33feb54c961a03eb027a19cdaaa03 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:52 +0000 Subject: [PATCH 044/129] New translations .keep (Ukrainian) --- uk-UA/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 uk-UA/solutions/.keep diff --git a/uk-UA/solutions/.keep b/uk-UA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/uk-UA/solutions/.keep @@ -0,0 +1 @@ + From 78066cc69178fd0358683af39c9143e49fe3ad16 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:53 +0000 Subject: [PATCH 045/129] New translations code.py (Ukrainian) --- uk-UA/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 uk-UA/resources/code.py diff --git a/uk-UA/resources/code.py b/uk-UA/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/uk-UA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From fe17083b8de0a97a99fe950b6d85637aa66772f6 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:55 +0000 Subject: [PATCH 046/129] New translations step_1.md (Turkish) --- tr-TR/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file 
changed, 111 insertions(+) create mode 100644 tr-TR/step_1.md diff --git a/tr-TR/step_1.md b/tr-TR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/tr-TR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and its `findall()` function. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**. That means it matches as many characters as possible, so the match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. 
There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text that comes immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text that comes immediately **after** the match. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 867eedc321a6dbcff3f58d0050a1a4c6309458be Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:56 +0000 Subject: [PATCH 047/129] New translations meta.yml (Turkish) --- tr-TR/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 tr-TR/meta.yml diff --git a/tr-TR/meta.yml b/tr-TR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/tr-TR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 
#possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From f5997e79945faf3a1b635091542a3febee71ad7c Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:57 +0000 Subject: [PATCH 048/129] New translations .keep (Turkish) --- tr-TR/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 tr-TR/solutions/.keep diff --git a/tr-TR/solutions/.keep b/tr-TR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/tr-TR/solutions/.keep @@ -0,0 +1 @@ + From 6199a653f24e2adb8de7a8d239577023633e0fed Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:08:58 +0000 Subject: [PATCH 049/129] New translations step_1.md (Russian) --- ru-RU/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 ru-RU/step_1.md diff --git a/ru-RU/step_1.md b/ru-RU/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ru-RU/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means `.*` matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use a `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to match text that comes **after** a given pattern. + +- `?=` means **look ahead**. Use it to match text that comes **before** a given pattern. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 9a851a6a9ab364e4381a64124f47025c9df1acc7 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:00 +0000 Subject: [PATCH 050/129] New translations step_1.md (Swedish) --- sv-SE/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 sv-SE/step_1.md diff --git a/sv-SE/step_1.md b/sv-SE/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/sv-SE/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. 
That means `.*` matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use a `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to match text that comes **after** a given pattern. + +- `?=` means **look ahead**. Use it to match text that comes **before** a given pattern. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 3354620e025c1d7bb08dbbe3fd0ae96ee554917a Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:01 +0000 Subject: [PATCH 051/129] New translations meta.yml (Swedish) --- sv-SE/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 sv-SE/meta.yml diff --git a/sv-SE/meta.yml b/sv-SE/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/sv-SE/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From b61252ce74b1fc684d7d81d03397257fe2526764 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:02 +0000 Subject: [PATCH 052/129] New translations .keep (Swedish) --- sv-SE/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 sv-SE/solutions/.keep diff --git a/sv-SE/solutions/.keep b/sv-SE/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/sv-SE/solutions/.keep @@ -0,0 +1 @@ + From e6ea51070d190ed9586765edea6244a40a4db7ff Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:04 +0000 Subject: [PATCH 
053/129] New translations code.py (Swedish) --- sv-SE/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 sv-SE/resources/code.py diff --git a/sv-SE/resources/code.py b/sv-SE/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/sv-SE/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From abd908041e5f5f58fbc52e0bcb23b802679f0463 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:05 +0000 Subject: [PATCH 054/129] New translations step_1.md (Serbian (Cyrillic)) --- sr-SP/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 sr-SP/step_1.md diff --git a/sr-SP/step_1.md b/sr-SP/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/sr-SP/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means `.*` matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use a `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to match text that comes **after** a given pattern. + +- `?=` means **look ahead**. Use it to match text that comes **before** a given pattern. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 6d12ad3abf10c7edbd948032007ebeddde50cffb Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:06 +0000 Subject: [PATCH 055/129] New translations meta.yml (Serbian (Cyrillic)) --- sr-SP/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 sr-SP/meta.yml diff --git a/sr-SP/meta.yml b/sr-SP/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/sr-SP/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From e6d1ca774005cda34d80f05b4714238df3f86720 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:07 +0000 Subject: [PATCH 056/129] New translations .keep (Serbian (Cyrillic)) --- sr-SP/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 sr-SP/solutions/.keep diff --git a/sr-SP/solutions/.keep b/sr-SP/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/sr-SP/solutions/.keep @@ -0,0 +1 @@ + From af8cb864a34c2508e43da0a041c470b2f9972517 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:09 +0000 
Subject: [PATCH 057/129] New translations code.py (Serbian (Cyrillic)) --- sr-SP/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 sr-SP/resources/code.py diff --git a/sr-SP/resources/code.py b/sr-SP/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/sr-SP/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 34a700557cfada913a8698388e9e04a19ad2462f Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:10 +0000 Subject: [PATCH 058/129] New translations step_1.md (Ukrainian) --- uk-UA/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 uk-UA/step_1.md diff --git a/uk-UA/step_1.md b/uk-UA/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/uk-UA/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means `.*` matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use a `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to match text that comes **after** a given pattern. + +- `?=` means **look ahead**. Use it to match text that comes **before** a given pattern. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From b73b1d114837f46bd034ec6560887857a73126db Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:11 +0000 Subject: [PATCH 059/129] New translations .keep (English) --- en-US/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 en-US/solutions/.keep diff --git a/en-US/solutions/.keep b/en-US/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/en-US/solutions/.keep @@ -0,0 +1 @@ + From 74e52ae973e0f6e25dfb35d3960a1a2095d3b640 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:13 +0000 Subject: [PATCH 060/129] New translations code.py (Chinese Simplified) --- zh-CN/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 zh-CN/resources/code.py diff --git a/zh-CN/resources/code.py b/zh-CN/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/zh-CN/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From e35580e3f6d145bc18555e387a319424fb67f0f9 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:14 +0000 Subject: [PATCH 061/129] New translations code.py (Vietnamese) --- vi-VN/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 vi-VN/resources/code.py diff --git a/vi-VN/resources/code.py b/vi-VN/resources/code.py new file mode 100644 index 0000000..d7e72ed 
--- /dev/null +++ b/vi-VN/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From e9d8ce5ea65e745f272b5def868fb813f977edba Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:15 +0000 Subject: [PATCH 062/129] New translations .keep (Russian) --- ru-RU/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 ru-RU/solutions/.keep diff --git a/ru-RU/solutions/.keep b/ru-RU/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ru-RU/solutions/.keep @@ -0,0 +1 @@ + From cfbd7bd92f16fe54a645d7f388871ace17c5f6d7 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:16 +0000 Subject: [PATCH 063/129] New translations meta.yml (Portuguese, Brazilian) --- pt-BR/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 pt-BR/meta.yml diff --git a/pt-BR/meta.yml b/pt-BR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/pt-BR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 8b6d4f18da4572b1f3e8fe5e660bd206b71d3588 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:17 +0000 Subject: [PATCH 064/129] New translations .keep (Portuguese, Brazilian) --- pt-BR/solutions/.keep | 1 + 1 file 
changed, 1 insertion(+) create mode 100644 pt-BR/solutions/.keep diff --git a/pt-BR/solutions/.keep b/pt-BR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/pt-BR/solutions/.keep @@ -0,0 +1 @@ + From 27ea2ff3d79f754abfc3dc928bdb38b100eb95c6 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:18 +0000 Subject: [PATCH 065/129] New translations code.py (Portuguese, Brazilian) --- pt-BR/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 pt-BR/resources/code.py diff --git a/pt-BR/resources/code.py b/pt-BR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/pt-BR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From cdfbe0077b087a4462487086af71cef59d091e7e Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:20 +0000 Subject: [PATCH 066/129] New translations step_1.md (Vietnamese) --- vi-VN/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 vi-VN/step_1.md diff --git a/vi-VN/step_1.md b/vi-VN/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/vi-VN/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means `.*` matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use a `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to match text that comes **after** a given pattern. + +- `?=` means **look ahead**. Use it to match text that comes **before** a given pattern. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 374b2eef1bd5639df271d40615acc34e568ffd86 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:21 +0000 Subject: [PATCH 067/129] New translations meta.yml (Vietnamese) --- vi-VN/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 vi-VN/meta.yml diff --git a/vi-VN/meta.yml b/vi-VN/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/vi-VN/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From eb2a9a7885acbfe8996561deda292e6ae38c1921 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:22 +0000 Subject: [PATCH 068/129] New translations .keep (Vietnamese) --- vi-VN/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 vi-VN/solutions/.keep diff --git a/vi-VN/solutions/.keep 
b/vi-VN/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/vi-VN/solutions/.keep @@ -0,0 +1 @@ + From c4ce51c77025ce4adc6fe2d47b9102a5d69cddbb Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:23 +0000 Subject: [PATCH 069/129] New translations step_1.md (English) --- en-US/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 en-US/step_1.md diff --git a/en-US/step_1.md b/en-US/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/en-US/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means `.*` matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use a `.*?` rather than `.*`. 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to match text that comes **after** a given pattern. + +- `?=` means **look ahead**. Use it to match text that comes **before** a given pattern. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 275c1548ae4e2a4d3199685e5461bf607be94570 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:24 +0000 Subject: [PATCH 070/129] New translations .keep (Chinese Simplified) --- zh-CN/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 zh-CN/solutions/.keep diff --git a/zh-CN/solutions/.keep b/zh-CN/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/zh-CN/solutions/.keep @@ -0,0 +1 @@ + From 00755c1acc0462a2ef8b935167b4aba8f60712a0 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:26 +0000 Subject: [PATCH 071/129] New translations meta.yml (English) --- en-US/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 en-US/meta.yml diff --git a/en-US/meta.yml b/en-US/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/en-US/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 5f62cf6f10b0c180bcc3afa3b7fb7c5345a91194 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:27 +0000 Subject: 
[PATCH 072/129] New translations code.py (English) --- en-US/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 en-US/resources/code.py diff --git a/en-US/resources/code.py b/en-US/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/en-US/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 5b6599fe1bc256a2cd5048f6639caed5893708ca Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:28 +0000 Subject: [PATCH 073/129] New translations step_1.md (Chinese Traditional) --- zh-TW/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 zh-TW/step_1.md diff --git a/zh-TW/step_1.md b/zh-TW/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/zh-TW/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From fc526ef513c68646dc8c35b06149bc050138f748 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:29 +0000 Subject: [PATCH 074/129] New translations meta.yml (Chinese Traditional) --- zh-TW/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 zh-TW/meta.yml diff --git a/zh-TW/meta.yml b/zh-TW/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/zh-TW/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From d48f4f32783a39886238f40e866e82318362cee4 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:30 +0000 Subject: [PATCH 075/129] New translations .keep (Chinese Traditional) --- zh-TW/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 zh-TW/solutions/.keep diff --git a/zh-TW/solutions/.keep b/zh-TW/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/zh-TW/solutions/.keep @@ -0,0 +1 @@ + From efcd6f82d6270cd261440d2e1fdc7542248751b4 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:31 
+0000 Subject: [PATCH 076/129] New translations code.py (Chinese Traditional) --- zh-TW/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 zh-TW/resources/code.py diff --git a/zh-TW/resources/code.py b/zh-TW/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/zh-TW/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 156755d164790c9d357985119bda19a7f5345151 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:33 +0000 Subject: [PATCH 077/129] New translations step_1.md (Chinese Simplified) --- zh-CN/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 zh-CN/step_1.md diff --git a/zh-CN/step_1.md b/zh-CN/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/zh-CN/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 5b7a6c3f190d3735d8da2d2eb911c53881ecc499 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:34 +0000 Subject: [PATCH 078/129] New translations meta.yml (Chinese Simplified) --- zh-CN/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 zh-CN/meta.yml diff --git a/zh-CN/meta.yml b/zh-CN/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/zh-CN/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From d7f02d667313436d58c8c5fd9f723d0516bfd094 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:35 +0000 Subject: [PATCH 079/129] New translations meta.yml (Russian) --- ru-RU/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 ru-RU/meta.yml diff --git a/ru-RU/meta.yml b/ru-RU/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ru-RU/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: 
https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From b348b8b990b21c59e4cf3865fb549cb8ab3dbbf0 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:36 +0000 Subject: [PATCH 080/129] New translations code.py (Korean) --- ko-KR/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 ko-KR/resources/code.py diff --git a/ko-KR/resources/code.py b/ko-KR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ko-KR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 745ac3e393bc94e99e96cde249c45e6abd5961b2 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:37 +0000 Subject: [PATCH 081/129] New translations code.py (Russian) --- ru-RU/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 ru-RU/resources/code.py diff --git a/ru-RU/resources/code.py b/ru-RU/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ru-RU/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 7275acf69814d56c8fc82003231851de8646a9a6 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:38 +0000 Subject: [PATCH 082/129] New translations .keep (Hungarian) --- hu-HU/solutions/.keep | 1 + 1 file changed, 1 
insertion(+) create mode 100644 hu-HU/solutions/.keep diff --git a/hu-HU/solutions/.keep b/hu-HU/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/hu-HU/solutions/.keep @@ -0,0 +1 @@ + From 6b7d8f3d414dcac9a69c38972dd643dcf1008b9d Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:40 +0000 Subject: [PATCH 083/129] New translations code.py (Japanese) --- ja-JP/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 ja-JP/resources/code.py diff --git a/ja-JP/resources/code.py b/ja-JP/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ja-JP/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From f6a39f35884bf836710ea47177d37251bfbf9e81 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:43 +0000 Subject: [PATCH 084/129] New translations step_1.md (Italian) --- it-IT/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 it-IT/step_1.md diff --git a/it-IT/step_1.md b/it-IT/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/it-IT/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding the `re.DOTALL` flag to the search makes `.` match newlines as well: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From b7d252aaab181803514e27d5174abc7ddab0385b Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:44 +0000 Subject: [PATCH 085/129] New translations meta.yml (Italian) --- it-IT/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 it-IT/meta.yml diff --git a/it-IT/meta.yml b/it-IT/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/it-IT/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 8842595cc21f49781d6d18dd10b95db9998af740 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:45 +0000 Subject: [PATCH 086/129] New translations .keep (Italian) --- it-IT/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 it-IT/solutions/.keep diff --git a/it-IT/solutions/.keep 
b/it-IT/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/it-IT/solutions/.keep @@ -0,0 +1 @@ + From 9b947adc4ae061db2809fed148159f1eed57f5f1 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:46 +0000 Subject: [PATCH 087/129] New translations code.py (Italian) --- it-IT/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 it-IT/resources/code.py diff --git a/it-IT/resources/code.py b/it-IT/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/it-IT/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From b41fc21584c43bcb01b6cedfb101ff863094057c Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:48 +0000 Subject: [PATCH 088/129] New translations step_1.md (Hungarian) --- hu-HU/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 hu-HU/step_1.md diff --git a/hu-HU/step_1.md b/hu-HU/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/hu-HU/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding the `re.DOTALL` flag to the search makes `.` match newlines as well: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From abcd40fa46495df2d06bfadb08e6beb17ad61391 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:49 +0000 Subject: [PATCH 089/129] New translations meta.yml (Hungarian) --- hu-HU/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 hu-HU/meta.yml diff --git a/hu-HU/meta.yml b/hu-HU/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/hu-HU/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From bf23faa321bc0a1d7915b92392277f5e7d596332 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:50 +0000 Subject: [PATCH 090/129] New translations code.py (Hungarian) --- hu-HU/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 hu-HU/resources/code.py diff --git 
a/hu-HU/resources/code.py b/hu-HU/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/hu-HU/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 75684815e3fbae481b35737ac029364db4b188b0 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:51 +0000 Subject: [PATCH 091/129] New translations meta.yml (Japanese) --- ja-JP/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 ja-JP/meta.yml diff --git a/ja-JP/meta.yml b/ja-JP/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ja-JP/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 075eb585e142ede122587ab91dd8730ba1ad6eb0 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:53 +0000 Subject: [PATCH 092/129] New translations step_1.md (Hebrew) --- he-IL/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 he-IL/step_1.md diff --git a/he-IL/step_1.md b/he-IL/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/he-IL/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. 
+ +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match. 
+ +- For these elements to work, you need to wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding the `re.DOTALL` flag to the search makes `.` match newlines as well: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From b7207316d2d7a2a96a985ddf6a47cae6b720956a Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:54 +0000 Subject: [PATCH 093/129] New translations meta.yml (Hebrew) --- he-IL/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 he-IL/meta.yml diff --git a/he-IL/meta.yml b/he-IL/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/he-IL/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From b4177c2365683ba709ffdb0757060d3789192f74 
Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:55 +0000 Subject: [PATCH 094/129] New translations .keep (Hebrew) --- he-IL/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 he-IL/solutions/.keep diff --git a/he-IL/solutions/.keep b/he-IL/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/he-IL/solutions/.keep @@ -0,0 +1 @@ + From 67638835d4eb6da67ed3d68cee2a9a695c9777cf Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:56 +0000 Subject: [PATCH 095/129] New translations code.py (Hebrew) --- he-IL/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 he-IL/resources/code.py diff --git a/he-IL/resources/code.py b/he-IL/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/he-IL/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From cdeebbe314a07367dbec37a282db166860d7e0c3 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:58 +0000 Subject: [PATCH 096/129] New translations step_1.md (Finnish) --- fi-FI/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 fi-FI/step_1.md diff --git a/fi-FI/step_1.md b/fi-FI/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/fi-FI/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the text it checks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 2f46297eae70d8e2009bf2399729c64df73ca0cf Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:09:59 +0000 Subject: [PATCH 097/129] New translations meta.yml (Finnish) --- fi-FI/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 fi-FI/meta.yml diff --git a/fi-FI/meta.yml b/fi-FI/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/fi-FI/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 946f0b1b72490869f054e89824e8cd50ebf1bacb Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:00 +0000 Subject: [PATCH 098/129] New translations .keep (Finnish) --- fi-FI/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 fi-FI/solutions/.keep diff --git a/fi-FI/solutions/.keep
b/fi-FI/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/fi-FI/solutions/.keep @@ -0,0 +1 @@ + From a0b358e4443b0387becc46013d98b5b839256feb Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:02 +0000 Subject: [PATCH 099/129] New translations .keep (Japanese) --- ja-JP/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 ja-JP/solutions/.keep diff --git a/ja-JP/solutions/.keep b/ja-JP/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ja-JP/solutions/.keep @@ -0,0 +1 @@ + From 73fcbd6056c940b4d6d8c4cdd9d4b0cc508edc03 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:03 +0000 Subject: [PATCH 100/129] New translations step_1.md (Japanese) --- ja-JP/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 ja-JP/step_1.md diff --git a/ja-JP/step_1.md b/ja-JP/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ja-JP/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match without including that text in the result. + +- `?=` means **look ahead**. Use it to require text **after** the match without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 0077289304669e3ef75f9ac16008eae48ac2e0c7 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:05 +0000 Subject: [PATCH 101/129] New translations step_1.md (Portuguese) --- pt-PT/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 pt-PT/step_1.md diff --git a/pt-PT/step_1.md b/pt-PT/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/pt-PT/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**.
That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match without including that text in the result. + +- `?=` means **look ahead**. Use it to require text **after** the match without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 2fa845bda15b55f8f653cd4c4d6f98e1216780c0 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:06 +0000 Subject: [PATCH 102/129] New translations step_1.md (Norwegian) --- no-NO/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 no-NO/step_1.md diff --git a/no-NO/step_1.md b/no-NO/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/no-NO/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**.
That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match without including that text in the result. + +- `?=` means **look ahead**. Use it to require text **after** the match without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 00e51c9940d313b6724775b4aca3898a6a174fbf Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:07 +0000 Subject: [PATCH 103/129] New translations meta.yml (Portuguese) --- pt-PT/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 pt-PT/meta.yml diff --git a/pt-PT/meta.yml b/pt-PT/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/pt-PT/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From ffbb1ed2a43ff818854ae4ba32efcc6b940ca619 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:09 +0000 Subject: [PATCH 104/129] New translations .keep (Portuguese) --- pt-PT/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 pt-PT/solutions/.keep diff --git a/pt-PT/solutions/.keep b/pt-PT/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/pt-PT/solutions/.keep @@ -0,0 +1 @@ + From f5d7860cfea98d1bd4392df1740e187e895e60d3 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:10 +0000 Subject: [PATCH 
105/129] New translations code.py (Portuguese) --- pt-PT/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 pt-PT/resources/code.py diff --git a/pt-PT/resources/code.py b/pt-PT/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/pt-PT/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 1e6764d605d404f79bf9939c0a555e17c1bd4a09 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:11 +0000 Subject: [PATCH 106/129] New translations step_1.md (Polish) --- pl-PL/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 pl-PL/step_1.md diff --git a/pl-PL/step_1.md b/pl-PL/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/pl-PL/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? 
+ + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match without including that text in the result. + +- `?=` means **look ahead**. Use it to require text **after** the match without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 0b40325d2a08f63645fd9fa5e2f5dd73c1926af2 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:13 +0000 Subject: [PATCH 107/129] New translations meta.yml (Polish) --- pl-PL/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 pl-PL/meta.yml diff --git a/pl-PL/meta.yml b/pl-PL/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/pl-PL/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From 186f7ed01278000e29846b6a0c51f5c3babe802d Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:14 +0000 Subject: [PATCH 108/129] New translations .keep (Polish) --- pl-PL/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 pl-PL/solutions/.keep diff --git a/pl-PL/solutions/.keep b/pl-PL/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/pl-PL/solutions/.keep @@ -0,0 +1 @@ + From 8dfea5dedb821f1fd9e53a189d82dcdfcd4163cd Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:16 +0000 Subject: [PATCH 109/129] 
New translations code.py (Polish) --- pl-PL/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 pl-PL/resources/code.py diff --git a/pl-PL/resources/code.py b/pl-PL/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/pl-PL/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 2dcb54d72e988349ef175aa21c337f2f2839625e Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:17 +0000 Subject: [PATCH 110/129] New translations meta.yml (Norwegian) --- no-NO/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 no-NO/meta.yml diff --git a/no-NO/meta.yml b/no-NO/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/no-NO/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From df69ba56f59f5f9d99d4d4b57cb29eba75182444 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:18 +0000 Subject: [PATCH 111/129] New translations .keep (Korean) --- ko-KR/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 ko-KR/solutions/.keep diff --git a/ko-KR/solutions/.keep b/ko-KR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ko-KR/solutions/.keep @@ -0,0 +1 @@ + From 
3d7ca430d1590e679f8e8b3c1c60f01781a72382 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:19 +0000 Subject: [PATCH 112/129] New translations .keep (Norwegian) --- no-NO/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 no-NO/solutions/.keep diff --git a/no-NO/solutions/.keep b/no-NO/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/no-NO/solutions/.keep @@ -0,0 +1 @@ + From 0fadee359bc0d3be0e6d7eb72aa7844639be4c16 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:20 +0000 Subject: [PATCH 113/129] New translations code.py (Norwegian) --- no-NO/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 no-NO/resources/code.py diff --git a/no-NO/resources/code.py b/no-NO/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/no-NO/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From af82e7e6bb3463b40b2537b61ed787c94446a7d6 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:22 +0000 Subject: [PATCH 114/129] New translations step_1.md (Dutch) --- nl-NL/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 nl-NL/step_1.md diff --git a/nl-NL/step_1.md b/nl-NL/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/nl-NL/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. 
+ +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match without including that text in the result. + +- `?=` means **look ahead**. Use it to require text **after** the match without including that text in the result.
+ +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 166906fabd98a84e1317a20d11de88d438a83615 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:23 +0000 Subject: [PATCH 115/129] New translations meta.yml (Dutch) --- nl-NL/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 nl-NL/meta.yml diff --git a/nl-NL/meta.yml b/nl-NL/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/nl-NL/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From ce75237627e81c1d06b5c35fb1b226be10b28764
Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:24 +0000 Subject: [PATCH 116/129] New translations .keep (Dutch) --- nl-NL/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 nl-NL/solutions/.keep diff --git a/nl-NL/solutions/.keep b/nl-NL/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/nl-NL/solutions/.keep @@ -0,0 +1 @@ + From 4bf8c3fe1a4d4a2369cb4ecfae2409ea7344be3b Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:25 +0000 Subject: [PATCH 117/129] New translations code.py (Dutch) --- nl-NL/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 nl-NL/resources/code.py diff --git a/nl-NL/resources/code.py b/nl-NL/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/nl-NL/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 9515c9096a356aa36db01e79ebcb0957c0480363 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:26 +0000 Subject: [PATCH 118/129] New translations step_1.md (Korean) --- ko-KR/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 ko-KR/step_1.md diff --git a/ko-KR/step_1.md b/ko-KR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ko-KR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because this regex is described as **greedy**. That means it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match without including that text in the result. + +- `?=` means **look ahead**. Use it to require text **after** the match without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below?
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that newlines (`\n`) stop the regex search. Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 102d093d9b3ebc7fa57ee96605cdfec2027d3d63 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:28 +0000 Subject: [PATCH 119/129] New translations meta.yml (Korean) --- ko-KR/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 ko-KR/meta.yml diff --git a/ko-KR/meta.yml b/ko-KR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ko-KR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text From de2cdca59398aa44160f8d829e8230ef24800c33 Mon Sep 17 00:00:00 2001 From: ninaszymor <33628387+ninaszymor@users.noreply.github.com> Date: Mon, 2 Mar 2020 12:10:29 +0000 Subject: [PATCH 120/129] New translations step_1.md (Portuguese, Brazilian) --- pt-BR/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 pt-BR/step_1.md 
diff --git a/pt-BR/step_1.md b/pt-BR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/pt-BR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, you need to use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. 
It checks that the given text comes immediately **before** the match, without including it. + +- `?=` means **look ahead**. It checks that the given text comes immediately **after** the match, without including it. + +- For these elements to work, you need to wrap each of them, together with the text to check for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that newlines (`\n`) stop the regex search. Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + From 03ca9500489029f4cfd7cbba4299f595b8f0b555 Mon Sep 17 00:00:00 2001 From: majamanojlovic <49232422+majamanojlovic@users.noreply.github.com> Date: Sun, 26 Apr 2020 11:15:29 +0100 Subject: [PATCH 121/129] New translations meta.yml (Dutch) --- nl-NL/meta.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/nl-NL/meta.yml b/nl-NL/meta.yml index 2a5a817..27b50bb 100644 --- a/nl-NL/meta.yml +++ b/nl-NL/meta.yml @@ -1,7 +1,7 @@ --- -title: Finding text between patterns with regex and Python +title: Tekst zoeken tussen patronen met regex en Python hero_image: images/banner.png -description: Finding text between patterns with regex +description: Tekst zoeken tussen patronen met regex original_url: https://codeclubprojects.org/en-GB/scratch/rock-band theme: red #possible 
values: blue, green, navy, orange, red, turquoise, violet, yellow duration: 1 #possible values: 
1, 2 or 3 @@ -14,4 +14,4 @@ technologies: "python" site_areas: steps: - - title: Finding text + title: Tekst zoeken From a002f957c3d52ccb3409d8df2bbbb1e69eec2795 Mon Sep 17 00:00:00 2001 From: majamanojlovic <49232422+majamanojlovic@users.noreply.github.com> Date: Sun, 26 Apr 2020 11:15:31 +0100 Subject: [PATCH 122/129] New translations code.py (Dutch) --- nl-NL/resources/code.py | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/nl-NL/resources/code.py b/nl-NL/resources/code.py index d7e72ed..8a4dc50 100644 --- a/nl-NL/resources/code.py +++ b/nl-NL/resources/code.py @@ -1,14 +1,14 @@ import re -text = ''' +tekst = ''' start -Here is a line -end +Hier is een regel +einde start -and here is some more -end''' +en hier is nog wat meer +einde''' -match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) +match = re.findall(r'(?<=start).*?(?=einde)', tekst, flags=re.DOTALL) From 85d388164d5ae09d9afaa70c3ce98e62a2cdb4b9 Mon Sep 17 00:00:00 2001 From: majamanojlovic <49232422+majamanojlovic@users.noreply.github.com> Date: Sun, 26 Apr 2020 11:15:33 +0100 Subject: [PATCH 123/129] New translations step_1.md (Dutch) --- nl-NL/step_1.md | 102 ++++++++++++++++++++++++------------------------ 1 file changed, 51 insertions(+), 51 deletions(-) diff --git a/nl-NL/step_1.md b/nl-NL/step_1.md index d74c167..34c9a90 100644 --- a/nl-NL/step_1.md +++ b/nl-NL/step_1.md @@ -1,111 +1,111 @@ -If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. +Als je tekst tussen specifieke tekens of reeksen tekens wilt vinden, kun je de `re` module van Python en de methode `findall()` gebruiken. -- Let's suppose you have the following string: +- Stel dat je de volgende tekenreeks hebt: ```python - text = 'start Here is a line end' + tekst = 'start Hier is een regel einde' ``` -- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: +- Stel je voor dat je alle tekst tussen `'start'` en `'einde'` wilt vinden. Hier is de regex-zoekopdracht die je hiervoor zou kunnen gebruiken: ```python import re - text = 'start Here is a line end' - matches = re.findall(r'start.*end', text) +tekst = 'start Hier is een regel einde' +overeenkomsten = re.findall(r'start.*einde', tekst) ``` -- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: +- Als je nu de variabele `overeenkomsten` in de interpreter controleert, zul je zien dat het een lijst is met de overeenkomsten die Python heeft gevonden: ```python - >>> matches - ['start Here is a line end'] + >>> overeenkomsten +['start Hier is een regel einde'] ``` -- What happens if there is more than one match, like in the example below? +- Wat gebeurt er als er meer dan één overeenkomst is, zoals in het onderstaande voorbeeld? ```python import re - text = 'start Here is a line end start and here is some more end' - matches = re.findall(r'start.*end', text) +tekst = 'start Hier is een regel einde start en hier is wat meer einde' +overeenkomsten = re.findall(r'start.*einde', tekst) ``` ```python - >>> match - ['start Here is a line end start and here is some more end'] + >>> overeenkomsten +['start Hier is een regel einde start en hier is wat meer einde'] ``` -- That wasn't what we wanted. This is because this regex is described as **greedy**. That means it searches the entire string before returning the match, and then returns all characters between the first `'start'` and the last `'end'`. +- Dat was niet wat we wilden. Dit komt omdat deze regex wordt beschreven als **hebzuchtig**. Dat betekent dat het de hele reeks doorzoekt voordat de overeenkomsten worden geretourneerd en vervolgens alle tekens tussen de eerste `'start'` en de laatste `'einde'` retourneert. -- To make the **regex** non-greedy, you need to use a `.*?` rather than `.*`. 
+- Om de **regex** niet hebzuchtig te maken, moet je een `.*?` gebruiken in plaats van `.*`. ```python import re - text = 'start Here is a line end start and here is some more end' - matches = re.findall(r'start.*?end', text) +tekst = 'start Hier is een regel einde start en hier is wat meer einde' +overeenkomsten = re.findall(r'start.*?einde', tekst) ``` ```python - >>> match - ['start Here is a line end', 'start and here is some more end'] + >>> overeenkomsten +['start Hier is een regel einde', 'start en hier is wat meer einde'] ``` -- Now the list has two elements in it. +- Nu bevat de lijst twee elementen. -- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: +- Als je niet wilt dat Python de woorden `start` en `einde` in de resultaten opneemt, moet je de **regex** opdragen **vooruit te kijken** en **achteruit te kijken**. Er zijn twee regex-elementen die dat zullen doen: -- `?<=` means **look ahead**. Use it to search for text **after** the match. +- `?<=` betekent **achteruit kijken**. Het controleert of de opgegeven tekst direct **vóór** de overeenkomst staat, zonder die tekst in het resultaat op te nemen. -- `?=` means **look behind**. Use it to search for text **before** the match. +- `?=` betekent **vooruit kijken**. Het controleert of de opgegeven tekst direct **na** de overeenkomst staat, zonder die tekst in het resultaat op te nemen. -- For these elements to work, you need to surround them and the pattern you're looking for in brackets: +- Om deze elementen te laten werken, moet je ze en het patroon waarnaar je op zoek bent omringen door haakjes: ```python - match = re.findall(r'(?<=start).*?(?=end)', text) + overeenkomsten = re.findall(r'(?<=start).*?(?=einde)', tekst) ``` ```python - >>> match - [' Here is a line ', ' and here is some more '] + >>> overeenkomsten +[' Hier is een regel ', ' en hier is wat meer '] ``` -- What happens with strings spread across multiple lines, such as the one below? 
+- Wat gebeurt er met tekenreeksen verspreid over meerdere lijnen, zoals die hieronder? ```python import re - text = ''' - start - Here is a line - end - start - and here is some more - end''' - - match = re.findall(r'(?<=start).*?(?=end)', text) +tekst = ''' +start +Hier is een regel +einde +start +en hier is nog wat +einde''' + +overeenkomsten = re.findall(r'(?<=start).*?(?=einde)', tekst) ``` ```python - >>> match - [] + >>> overeenkomsten +[] ``` -- That's not what we wanted. The problem is that newlines (`\n`) stop the regex search. Adding a `flag` to the search can sort this out though: +- Dat is niet wat we wilden. Het probleem is dat nieuwe regels (`\n`) het zoeken naar regex stoppen. Het toevoegen van een `vlag` aan de zoekopdracht kan dit echter oplossen: ```python import re - text = ''' - start - Here is a line - end - start - and here is some more - end''' +tekst = ''' +start +Hier is een regel +einde +start +en hier is nog wat +einde''' - match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) +overeenkomsten = re.findall(r'(?<=start).*?(?=einde)', tekst, flags=re.DOTALL) ``` ```python - >>> match - ['\nHere is a line\n', '\nand here is some more\n'] + >>> overeenkomsten +['\nHier is een regel\n', '\nen hier is nog wat\n'] ``` From a59a846516b9dff76c1ce443c781b236ebba507e Mon Sep 17 00:00:00 2001 From: majamanojlovic <49232422+majamanojlovic@users.noreply.github.com> Date: Tue, 28 Jul 2020 12:04:13 +0100 Subject: [PATCH 124/129] New translations .keep (Spanish, Latin America) --- es-LA/solutions/.keep | 1 + 1 file changed, 1 insertion(+) create mode 100644 es-LA/solutions/.keep diff --git a/es-LA/solutions/.keep b/es-LA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/es-LA/solutions/.keep @@ -0,0 +1 @@ + From a437277a25c374438efb148f98aa2d7b087735c4 Mon Sep 17 00:00:00 2001 From: majamanojlovic <49232422+majamanojlovic@users.noreply.github.com> Date: Tue, 28 Jul 2020 12:04:16 +0100 Subject: [PATCH 125/129] New 
translations code.py (Spanish, Latin America) --- es-LA/resources/code.py | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 es-LA/resources/code.py diff --git a/es-LA/resources/code.py b/es-LA/resources/code.py new file mode 100644 index 0000000..5581f46 --- /dev/null +++ b/es-LA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Aquí hay una línea +end +start +y aquí hay otra +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + From 4b974f28fe6adcf6085cf63678881be0c69b274a Mon Sep 17 00:00:00 2001 From: majamanojlovic <49232422+majamanojlovic@users.noreply.github.com> Date: Tue, 28 Jul 2020 12:04:18 +0100 Subject: [PATCH 126/129] New translations meta.yml (Spanish, Latin America) --- es-LA/meta.yml | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 es-LA/meta.yml diff --git a/es-LA/meta.yml b/es-LA/meta.yml new file mode 100644 index 0000000..6e13860 --- /dev/null +++ b/es-LA/meta.yml @@ -0,0 +1,17 @@ +--- +title: Buscando texto entre patrones con regex y Python +hero_image: images/banner.png +description: Buscando texto entre patrones con regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Encontrar texto From 9542f380a1c7e33cac6ffcccd77d0127ec1c9224 Mon Sep 17 00:00:00 2001 From: majamanojlovic <49232422+majamanojlovic@users.noreply.github.com> Date: Tue, 28 Jul 2020 12:05:13 +0100 Subject: [PATCH 127/129] New translations step_1.md (Spanish, Latin America) --- es-LA/step_1.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 es-LA/step_1.md diff --git a/es-LA/step_1.md b/es-LA/step_1.md new file mode 100644 index 0000000..46ee7de --- 
/dev/null +++ b/es-LA/step_1.md @@ -0,0 +1,111 @@ +Si deseas encontrar texto ubicado entre caracteres específicos o secuencias de caracteres, puedes usar el módulo de Python `re` y el método `findall() `. + +- Supongamos que tienes la siguiente cadena: + + ```python + text = 'start Aqui hay una linea end' + ``` + +- Imagina que quieres encontrar todo el texto entre `'start'` y `'end'`. Aquí está la búsqueda de regex que puedes usar para hacerlo: + + ```python + import re + text = 'start Aqui hay una linea end' + coincidencias = re.findall(r'start.*end', text) + ``` + +- Si ahora revisas la variable `coincidencias` en el intérprete, verás que es una lista de las coincidencias que Python ha encontrado: + + ```python + >>> matches + ['start Aqui hay un fin de linea'] + ``` + +- ¿Qué pasa si hay más de una coincidencia, como en el ejemplo de abajo? + + ```python + import re + text = 'start Aqui hay una linea end start y aqui hay otra end' + coincidencias = re.findall(r'start.*end', text) + ``` + + ```python + >>> coincidencia + ['start Aqui hay una linea end start y aqui hay otra end'] + ``` + +- Eso no era lo que queríamos. Esto es porque esta expresión regular se describe como **codiciosa**. Esto significa que busca toda la cadena antes de devolver la coincidencia, y luego devuelve todos los caracteres entre el primer `'start'` y el último `'end'`. + +- Para hacer que el **regex** no sea codicioso, debes usar un `.*?` en lugar de `.*`. + + ```python + import re + text = 'start Aqui hay una linea end start y aqui hay otra end' + coincidencias = re.findall(r'start.*?end', text) + ``` + + ```python + >>> coincidencias + ['start Aqui hay una linea end', 'start y aqui hay otra end'] + ``` + +- Ahora la lista contiene dos elementos. + +- Si no quieres que Python incluya las palabras `start` y `end` en los resultados, entonces tienes que decirle al **regex** que **mire hacia adelante** y **mire hacia atras**. 
Hay dos elementos regex que harán eso: + +- `?<=` significa **mirar hacia adelante**. Úsalo para buscar texto **despues** de la coincidencia. + +- `?=` significa **mirar hacia atras**. Úsalo para buscar texto **antes** de la coincidencia. + +- Para que estos elementos funcionen, necesitas rodearlos y el patrón que estás buscando entre paréntesis: + + ```python + coincidencia = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> coincidencia + [' Aqui hay una linea ', ' y aqui hay otra '] + ``` + +- ¿Qué sucede con las cadenas distribuidas en múltiples líneas, como la de abajo? + + ```python + import re + text = ''' + start + Aqui hay una linea + end + start + y aqui hay otra + end''' + + coincidencia = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> coincidencia + [] + ``` + +- Eso no era lo que queríamos. El problema es que las nuevas líneas (`\n`) detienen la búsqueda de expresiones regulares. Sin embargo, añadir una `bandera` a la búsqueda puede resolverla: + + ```python + import re + + text = ''' + start + Aqui hay una linea + end + start + y aqui hay otra + end''' + + coincidencia = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> coincidencias + ['\nAqui hay una linea\n', '\ny aqui hay otra\n'] + ``` + From e3d5e4d02d15f0d6ea595d4ff64e50349b386c88 Mon Sep 17 00:00:00 2001 From: Sasha Mishcheriakova <135987917+sashamishcheriakova@users.noreply.github.com> Date: Fri, 9 Feb 2024 11:57:22 +0000 Subject: [PATCH 128/129] New translations meta.yml (Spanish, Latin America) --- es-LA/meta.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/es-LA/meta.yml b/es-LA/meta.yml index 6e13860..760abb7 100644 --- a/es-LA/meta.yml +++ b/es-LA/meta.yml @@ -1,7 +1,7 @@ --- -title: Buscando texto entre patrones con regex y Python +title: Buscar texto entre patrones con regex y Python4 hero_image: images/banner.png -description: Buscando texto entre patrones con regex +description: Buscar 
texto entre patrones con regex original_url: https://codeclubprojects.org/en-GB/scratch/rock-band theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow duration: 1 #possible values: 1, 2 or 3 From d446f930ceac1680036dea6fcc81fe8d885ff67f Mon Sep 17 00:00:00 2001 From: Sasha Mishcheriakova <135987917+sashamishcheriakova@users.noreply.github.com> Date: Fri, 9 Feb 2024 11:58:16 +0000 Subject: [PATCH 129/129] New translations step_1.md (Spanish, Latin America) --- es-LA/step_1.md | 76 ++++++++++++++++++++++++------------------------- 1 file changed, 38 insertions(+), 38 deletions(-) diff --git a/es-LA/step_1.md b/es-LA/step_1.md index 46ee7de..d86a2bc 100644 --- a/es-LA/step_1.md +++ b/es-LA/step_1.md @@ -1,57 +1,57 @@ -Si deseas encontrar texto ubicado entre caracteres específicos o secuencias de caracteres, puedes usar el módulo de Python `re` y el método `findall() `. +Si deseas encontrar texto ubicado entre caracteres específicos o secuencias de caracteres, puedes usar el módulo de Python `re` y el método `findall()`. - Supongamos que tienes la siguiente cadena: ```python - text = 'start Aqui hay una linea end' + texto = 'inicio Aquí hay una línea final' ``` -- Imagina que quieres encontrar todo el texto entre `'start'` y `'end'`. Aquí está la búsqueda de regex que puedes usar para hacerlo: +- Imagina que quieres encontrar todo el texto entre `'inicio'` y `'final'`. 
Aquí está la búsqueda de regex que puedes usar para hacerlo: ```python import re - text = 'start Aqui hay una linea end' - coincidencias = re.findall(r'start.*end', text) + texto = 'inicio Aquí hay una línea final' + coincidencias = re.findall(r'inicio.*final', texto) ``` - Si ahora revisas la variable `coincidencias` en el intérprete, verás que es una lista de las coincidencias que Python ha encontrado: ```python - >>> matches - ['start Aqui hay un fin de linea'] + >>> coincidencias + ['inicio Aquí hay una línea final'] ``` - ¿Qué pasa si hay más de una coincidencia, como en el ejemplo de abajo? ```python import re - text = 'start Aqui hay una linea end start y aqui hay otra end' - coincidencias = re.findall(r'start.*end', text) + texto = 'inicio Aquí hay una línea final inicio y aquí hay otra final' + coincidencias = re.findall(r'inicio.*final', texto) ``` ```python - >>> coincidencia - ['start Aqui hay una linea end start y aqui hay otra end'] + >>> coincidencias + ['inicio Aquí hay una línea final inicio y aquí hay otra final'] ``` -- Eso no era lo que queríamos. Esto es porque esta expresión regular se describe como **codiciosa**. Esto significa que busca toda la cadena antes de devolver la coincidencia, y luego devuelve todos los caracteres entre el primer `'start'` y el último `'end'`. +- Eso no era lo que queríamos. Esto es porque esta expresión regular se describe como **codiciosa**. Esto significa que busca toda la cadena antes de devolver la coincidencia, y luego devuelve todos los caracteres entre el primer `'inicio'` y el último `'final'`. - Para hacer que el **regex** no sea codicioso, debes usar un `.*?` en lugar de `.*`. 
```python import re - text = 'start Aqui hay una linea end start y aqui hay otra end' - coincidencias = re.findall(r'start.*?end', text) + texto = 'inicio Aquí hay una línea final inicio y aquí hay otra final' + coincidencias = re.findall(r'inicio.*?final', texto) ``` ```python >>> coincidencias - ['start Aqui hay una linea end', 'start y aqui hay otra end'] + ['inicio Aquí hay una línea final', 'inicio y aquí hay otra final'] ``` - Ahora la lista contiene dos elementos. -- Si no quieres que Python incluya las palabras `start` y `end` en los resultados, entonces tienes que decirle al **regex** que **mire hacia adelante** y **mire hacia atras**. Hay dos elementos regex que harán eso: +- Si no quieres que Python incluya las palabras `inicio` y `final` en los resultados, entonces tienes que decirle al **regex** que **mire hacia adelante** y **mire hacia atras**. Hay dos elementos regex que harán eso: - `?<=` significa **mirar hacia adelante**. Úsalo para buscar texto **despues** de la coincidencia. @@ -60,31 +60,31 @@ Si deseas encontrar texto ubicado entre caracteres específicos o secuencias de - Para que estos elementos funcionen, necesitas rodearlos y el patrón que estás buscando entre paréntesis: ```python - coincidencia = re.findall(r'(?<=start).*?(?=end)', text) + coincidencias = re.findall(r'(?<=inicio).*?(?=final)', texto) ``` ```python - >>> coincidencia - [' Aqui hay una linea ', ' y aqui hay otra '] + >>> coincidencias + [' Aquí hay una línea ', ' y aquí hay otra '] ``` - ¿Qué sucede con las cadenas distribuidas en múltiples líneas, como la de abajo? 
```python import re - text = ''' - start - Aqui hay una linea - end - start - y aqui hay otra - end''' - - coincidencia = re.findall(r'(?<=start).*?(?=end)', text) + texto = ''' + inicio + Aquí hay una línea + final + inicio + y aquí hay otra + final''' + + coincidencias = re.findall(r'(?<=inicio).*?(?=final)', texto) ``` ```python - >>> coincidencia + >>> coincidencias [] ``` @@ -93,19 +93,19 @@ Si deseas encontrar texto ubicado entre caracteres específicos o secuencias de ```python import re - text = ''' - start - Aqui hay una linea - end - start - y aqui hay otra - end''' + texto = ''' + inicio + Aquí hay una línea + final + inicio + y aquí hay otra + final''' - coincidencia = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + coincidencias = re.findall(r'(?<=inicio).*?(?=final)', texto, flags=re.DOTALL) ``` ```python >>> coincidencias - ['\nAqui hay una linea\n', '\ny aqui hay otra\n'] + ['\nAquí hay una línea\n', '\ny aquí hay otra\n'] ```
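
As a quick cross-check of the final es-LA strings, the following sketch runs the tutorial's patterns side by side. It is only an illustration: the names `greedy`, `lazy`, `inner`, and `no_flag` are made up here and are not part of the patch series, and the sample string copies the multi-line example without its list indentation. Note that `(?<=...)` is the lookbehind and `(?=...)` is the lookahead.

```python
import re

# Sample text modelled on the multi-line example in the es-LA tutorial.
texto = '''
inicio
Aquí hay una línea
final
inicio
y aquí hay otra
final'''

# Greedy: .* extends to the LAST 'final', so both blocks merge into one match.
greedy = re.findall(r'inicio.*final', texto, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST 'final' after each 'inicio'.
lazy = re.findall(r'inicio.*?final', texto, flags=re.DOTALL)

# (?<=inicio) is a lookbehind and (?=final) is a lookahead:
# both delimiters must be present but are left out of the result.
inner = re.findall(r'(?<=inicio).*?(?=final)', texto, flags=re.DOTALL)

# Without re.DOTALL, '.' does not match '\n', so nothing is found.
no_flag = re.findall(r'(?<=inicio).*?(?=final)', texto)

print(len(greedy), len(lazy))
print(inner)
print(no_flag)
```

Running all four variants on the same string makes it easier to see which part of the pattern is responsible for each change in the result.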