diff --git a/af-ZA/meta.yml b/af-ZA/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/af-ZA/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/af-ZA/resources/code.py b/af-ZA/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/af-ZA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/af-ZA/solutions/.keep b/af-ZA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/af-ZA/solutions/.keep @@ -0,0 +1 @@ + diff --git a/af-ZA/step_1.md b/af-ZA/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/af-ZA/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below?
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/ar-SA/meta.yml b/ar-SA/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ar-SA/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/ar-SA/resources/code.py b/ar-SA/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ar-SA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/ar-SA/solutions/.keep b/ar-SA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ar-SA/solutions/.keep @@ -0,0 +1 @@ + diff --git a/ar-SA/step_1.md b/ar-SA/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ar-SA/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters,
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result.
+ +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/ca-ES/meta.yml b/ca-ES/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ca-ES/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/ca-ES/resources/code.py b/ca-ES/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ca-ES/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/ca-ES/solutions/.keep
b/ca-ES/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ca-ES/solutions/.keep @@ -0,0 +1 @@ + diff --git a/ca-ES/step_1.md b/ca-ES/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ca-ES/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it.
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/cs-CZ/meta.yml b/cs-CZ/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/cs-CZ/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/cs-CZ/resources/code.py b/cs-CZ/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/cs-CZ/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/cs-CZ/solutions/.keep b/cs-CZ/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/cs-CZ/solutions/.keep @@ -0,0 +1 @@ + diff --git a/cs-CZ/step_1.md b/cs-CZ/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/cs-CZ/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below?
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/da-DK/meta.yml b/da-DK/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/da-DK/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/da-DK/resources/code.py b/da-DK/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/da-DK/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/da-DK/solutions/.keep b/da-DK/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/da-DK/solutions/.keep @@ -0,0 +1 @@ + diff --git a/da-DK/step_1.md b/da-DK/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/da-DK/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters,
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result.
+ +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/de-DE/meta.yml b/de-DE/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/de-DE/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/de-DE/resources/code.py b/de-DE/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/de-DE/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/de-DE/solutions/.keep
b/de-DE/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/de-DE/solutions/.keep @@ -0,0 +1 @@ + diff --git a/de-DE/step_1.md b/de-DE/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/de-DE/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it.
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/el-GR/meta.yml b/el-GR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/el-GR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/el-GR/resources/code.py b/el-GR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/el-GR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/el-GR/solutions/.keep b/el-GR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/el-GR/solutions/.keep @@ -0,0 +1 @@ + diff --git a/el-GR/step_1.md b/el-GR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/el-GR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to surround them and the pattern you're looking for in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below?
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newline characters (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/en-US/meta.yml b/en-US/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/en-US/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/en-US/resources/code.py b/en-US/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/en-US/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/en-US/solutions/.keep b/en-US/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/en-US/solutions/.keep @@ -0,0 +1 @@ + diff --git a/en-US/step_1.md b/en-US/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/en-US/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters,
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This is because `.*` is **greedy**: it matches as many characters as possible, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`, so that it matches as few characters as possible. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to check for text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to check for text immediately **after** the match, without including that text in the result.
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so the match cannot span lines. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/es-ES/meta.yml b/es-ES/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/es-ES/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/es-ES/resources/code.py b/es-ES/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/es-ES/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/es-ES/solutions/.keep 
b/es-ES/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/es-ES/solutions/.keep @@ -0,0 +1 @@ + diff --git a/es-ES/step_1.md b/es-ES/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/es-ES/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as possible, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. 
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so the match cannot span lines. 
Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/es-LA/meta.yml b/es-LA/meta.yml new file mode 100644 index 0000000..760abb7 --- /dev/null +++ b/es-LA/meta.yml @@ -0,0 +1,17 @@ +--- +title: Buscar texto entre patrones con regex y Python +hero_image: images/banner.png +description: Buscar texto entre patrones con regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Encontrar texto diff --git a/es-LA/resources/code.py b/es-LA/resources/code.py new file mode 100644 index 0000000..5581f46 --- /dev/null +++ b/es-LA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Aquí hay una línea +end +start +y aquí hay otra +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/es-LA/solutions/.keep b/es-LA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/es-LA/solutions/.keep @@ -0,0 +1 @@ + diff --git a/es-LA/step_1.md b/es-LA/step_1.md new file mode 100644 index 0000000..d86a2bc --- /dev/null +++ b/es-LA/step_1.md @@ -0,0 +1,111 @@ +Si deseas encontrar texto ubicado entre caracteres específicos o secuencias de caracteres, puedes usar el módulo de Python `re` y el método `findall()`. + +- Supongamos que tienes la siguiente cadena: + + ```python + texto = 'inicio Aquí hay una línea final' + ``` + +- Imagina que quieres encontrar todo el texto entre `'inicio'` y `'final'`. 
Aquí está la búsqueda de regex que puedes usar para hacerlo: + + ```python + import re + texto = 'inicio Aquí hay una línea final' + coincidencias = re.findall(r'inicio.*final', texto) + ``` + +- Si ahora revisas la variable `coincidencias` en el intérprete, verás que es una lista de las coincidencias que Python ha encontrado: + + ```python + >>> coincidencias + ['inicio Aquí hay una línea final'] + ``` + +- ¿Qué pasa si hay más de una coincidencia, como en el ejemplo de abajo? + + ```python + import re + texto = 'inicio Aquí hay una línea final inicio y aquí hay otra final' + coincidencias = re.findall(r'inicio.*final', texto) + ``` + + ```python + >>> coincidencias + ['inicio Aquí hay una línea final inicio y aquí hay otra final'] + ``` + +- Eso no era lo que queríamos. Esto sucede porque `.*` es **codicioso**: coincide con tantos caracteres como sea posible, así que la única coincidencia va desde el primer `'inicio'` hasta el último `'final'`. + +- Para hacer que el **regex** no sea codicioso, debes usar `.*?` en lugar de `.*`. + + ```python + import re + texto = 'inicio Aquí hay una línea final inicio y aquí hay otra final' + coincidencias = re.findall(r'inicio.*?final', texto) + ``` + + ```python + >>> coincidencias + ['inicio Aquí hay una línea final', 'inicio y aquí hay otra final'] + ``` + +- Ahora la lista contiene dos elementos. + +- Si no quieres que Python incluya las palabras `inicio` y `final` en los resultados, entonces tienes que decirle al **regex** que **mire hacia adelante** y **mire hacia atrás**. Hay dos elementos regex que harán eso: + +- `?<=` significa **mirar hacia atrás**. Úsalo para exigir texto inmediatamente **antes** de la coincidencia. + +- `?=` significa **mirar hacia adelante**. Úsalo para exigir texto inmediatamente **después** de la coincidencia. 
+ +- Para que estos elementos funcionen, necesitas rodear cada uno, junto con el patrón que busca, entre paréntesis: + + ```python + coincidencias = re.findall(r'(?<=inicio).*?(?=final)', texto) + ``` + + ```python + >>> coincidencias + [' Aquí hay una línea ', ' y aquí hay otra '] + ``` + +- ¿Qué sucede con las cadenas distribuidas en múltiples líneas, como la de abajo? + + ```python + import re + texto = ''' + inicio + Aquí hay una línea + final + inicio + y aquí hay otra + final''' + + coincidencias = re.findall(r'(?<=inicio).*?(?=final)', texto) + ``` + + ```python + >>> coincidencias + [] + ``` + +- Eso no era lo que queríamos. El problema es que, por defecto, `.` coincide con cualquier carácter excepto el salto de línea (`\n`), así que la coincidencia no puede abarcar varias líneas. Añadir la bandera `re.DOTALL` a la búsqueda lo resuelve: + + ```python + import re + + texto = ''' + inicio + Aquí hay una línea + final + inicio + y aquí hay otra + final''' + + coincidencias = re.findall(r'(?<=inicio).*?(?=final)', texto, flags=re.DOTALL) + ``` + + ```python + >>> coincidencias + ['\nAquí hay una línea\n', '\ny aquí hay otra\n'] + ``` + diff --git a/fi-FI/meta.yml b/fi-FI/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/fi-FI/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/fi-FI/resources/code.py b/fi-FI/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/fi-FI/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = 
re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/fi-FI/solutions/.keep b/fi-FI/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/fi-FI/solutions/.keep @@ -0,0 +1 @@ + diff --git a/fi-FI/step_1.md b/fi-FI/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/fi-FI/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as possible, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. 
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so the match cannot span lines. 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/fr-FR/meta.yml b/fr-FR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/fr-FR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/fr-FR/resources/code.py b/fr-FR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/fr-FR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/fr-FR/solutions/.keep b/fr-FR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/fr-FR/solutions/.keep @@ -0,0 +1 @@ + diff --git a/fr-FR/step_1.md b/fr-FR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/fr-FR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as possible, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so the match cannot span lines. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/he-IL/meta.yml b/he-IL/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/he-IL/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/he-IL/resources/code.py b/he-IL/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/he-IL/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/he-IL/solutions/.keep b/he-IL/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/he-IL/solutions/.keep @@ -0,0 +1 @@ + diff --git a/he-IL/step_1.md b/he-IL/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/he-IL/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, 
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as possible, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text immediately **after** the match. 
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so the match cannot span lines. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/hu-HU/meta.yml b/hu-HU/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/hu-HU/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/hu-HU/resources/code.py b/hu-HU/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/hu-HU/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/hu-HU/solutions/.keep 
b/hu-HU/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/hu-HU/solutions/.keep @@ -0,0 +1 @@ + diff --git a/hu-HU/step_1.md b/hu-HU/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/hu-HU/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as possible, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. 
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so the match cannot span lines. 
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/it-IT/meta.yml b/it-IT/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/it-IT/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/it-IT/resources/code.py b/it-IT/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/it-IT/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/it-IT/solutions/.keep b/it-IT/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/it-IT/solutions/.keep @@ -0,0 +1 @@ + diff --git a/it-IT/step_1.md b/it-IT/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/it-IT/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as possible, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text immediately **before** the match. + +- `?=` means **look ahead**. Use it to require text immediately **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so the match cannot span lines. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/ja-JP/meta.yml b/ja-JP/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ja-JP/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/ja-JP/resources/code.py b/ja-JP/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ja-JP/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/ja-JP/solutions/.keep b/ja-JP/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ja-JP/solutions/.keep @@ -0,0 +1 @@ + diff --git a/ja-JP/step_1.md b/ja-JP/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ja-JP/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, 
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. 
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so a match cannot span more than one line. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/ko-KR/meta.yml b/ko-KR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ko-KR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/ko-KR/resources/code.py b/ko-KR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ko-KR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/ko-KR/solutions/.keep 
b/ko-KR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ko-KR/solutions/.keep @@ -0,0 +1 @@ + diff --git a/ko-KR/step_1.md b/ko-KR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ko-KR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. 
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so a match cannot span more than one line. 
Adding the `re.DOTALL` flag to the search makes `.` match newlines too, which sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/nl-NL/meta.yml b/nl-NL/meta.yml new file mode 100644 index 0000000..27b50bb --- /dev/null +++ b/nl-NL/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/nl-NL/resources/code.py b/nl-NL/resources/code.py new file mode 100644 index 0000000..8a4dc50 --- /dev/null +++ b/nl-NL/resources/code.py @@ -0,0 +1,14 @@ +import re + +tekst = ''' +start +Hier is een regel +einde +start +en hier is nog wat meer +einde''' + +match = re.findall(r'(?<=start).*?(?=einde)', tekst, flags=re.DOTALL) + + + diff --git a/nl-NL/solutions/.keep b/nl-NL/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/nl-NL/solutions/.keep @@ -0,0 +1 @@ + diff --git a/nl-NL/step_1.md b/nl-NL/step_1.md new file mode 100644 index 0000000..34c9a90 --- /dev/null +++ b/nl-NL/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + tekst = 'start Hier is een regel einde' + ``` + +- Imagine you want to find all the text between `'start'` and `'einde'`. 
Here's the regex search you might use to do so: + + ```python + import re + tekst = 'start Hier is een regel einde' + overeenkomsten = re.findall(r'start.*einde', tekst) + ``` + +- If you now check the `overeenkomsten` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> overeenkomsten + ['start Hier is een regel einde'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + tekst = 'start Hier is een regel einde start en hier is wat meer einde' + overeenkomsten = re.findall(r'start.*einde', tekst) + ``` + + ```python + >>> overeenkomsten + ['start Hier is een regel einde start en hier is wat meer einde'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'einde'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + tekst = 'start Hier is een regel einde start en hier is wat meer einde' + overeenkomsten = re.findall(r'start.*?einde', tekst) + ``` + + ```python + >>> overeenkomsten + ['start Hier is een regel einde', 'start en hier is wat meer einde'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `einde` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. 
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + overeenkomsten = re.findall(r'(?<=start).*?(?=einde)', tekst) + ``` + + ```python + >>> overeenkomsten + [' Hier is een regel ', ' en hier is wat meer '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + tekst = ''' + start + Hier is een regel + einde + start + en hier is nog wat + einde''' + + overeenkomsten = re.findall(r'(?<=start).*?(?=einde)', tekst) + ``` + + ```python + >>> overeenkomsten + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so a match cannot span more than one line. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + tekst = ''' + start + Hier is een regel + einde + start + en hier is nog wat + einde''' + + overeenkomsten = re.findall(r'(?<=start).*?(?=einde)', tekst, flags=re.DOTALL) + ``` + + ```python + >>> overeenkomsten + ['\nHier is een regel\n', '\nen hier is nog wat\n'] + ``` + diff --git a/no-NO/meta.yml b/no-NO/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/no-NO/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/no-NO/resources/code.py b/no-NO/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/no-NO/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = 
re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/no-NO/solutions/.keep b/no-NO/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/no-NO/solutions/.keep @@ -0,0 +1 @@ + diff --git a/no-NO/step_1.md b/no-NO/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/no-NO/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. 
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so a match cannot span more than one line. 
Adding the `re.DOTALL` flag to the search makes `.` match newlines too, which sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/pl-PL/meta.yml b/pl-PL/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/pl-PL/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/pl-PL/resources/code.py b/pl-PL/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/pl-PL/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/pl-PL/solutions/.keep b/pl-PL/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/pl-PL/solutions/.keep @@ -0,0 +1 @@ + diff --git a/pl-PL/step_1.md b/pl-PL/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/pl-PL/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so a match cannot span more than one line. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/pt-BR/meta.yml b/pt-BR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/pt-BR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/pt-BR/resources/code.py b/pt-BR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/pt-BR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/pt-BR/solutions/.keep b/pt-BR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/pt-BR/solutions/.keep @@ -0,0 +1 @@ + diff --git a/pt-BR/step_1.md b/pt-BR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/pt-BR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, 
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. 
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so a match cannot span more than one line. Adding the `re.DOTALL` flag to the search sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/pt-PT/meta.yml b/pt-PT/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/pt-PT/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/pt-PT/resources/code.py b/pt-PT/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/pt-PT/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/pt-PT/solutions/.keep 
b/pt-PT/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/pt-PT/solutions/.keep @@ -0,0 +1 @@ + diff --git a/pt-PT/step_1.md b/pt-PT/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/pt-PT/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. 
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` matches any character except a newline (`\n`), so a match cannot span more than one line. 
Adding the `re.DOTALL` flag to the search makes `.` match newlines too, which sorts this out: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/ro-RO/meta.yml b/ro-RO/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ro-RO/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/ro-RO/resources/code.py b/ro-RO/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ro-RO/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/ro-RO/solutions/.keep b/ro-RO/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ro-RO/solutions/.keep @@ -0,0 +1 @@ + diff --git a/ro-RO/step_1.md b/ro-RO/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ro-RO/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the single match runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look behind** and **look ahead**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require specific text immediately **before** the match, without including that text in the result. + +- `?=` means **look ahead**. Use it to require specific text immediately **after** the match, without including that text in the result. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in brackets: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? 
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/ru-RU/meta.yml b/ru-RU/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/ru-RU/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/ru-RU/resources/code.py b/ru-RU/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/ru-RU/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/ru-RU/solutions/.keep b/ru-RU/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/ru-RU/solutions/.keep @@ -0,0 +1 @@ + diff --git a/ru-RU/step_1.md b/ru-RU/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/ru-RU/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters,
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match.
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/sr-SP/meta.yml b/sr-SP/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/sr-SP/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/sr-SP/resources/code.py b/sr-SP/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/sr-SP/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/sr-SP/solutions/.keep
b/sr-SP/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/sr-SP/solutions/.keep @@ -0,0 +1 @@ + diff --git a/sr-SP/step_1.md b/sr-SP/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/sr-SP/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it.
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/sv-SE/meta.yml b/sv-SE/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/sv-SE/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/sv-SE/resources/code.py b/sv-SE/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/sv-SE/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/sv-SE/solutions/.keep b/sv-SE/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/sv-SE/solutions/.keep @@ -0,0 +1 @@ + diff --git a/sv-SE/step_1.md b/sv-SE/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/sv-SE/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below?
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/tr-TR/meta.yml b/tr-TR/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/tr-TR/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/tr-TR/resources/code.py b/tr-TR/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/tr-TR/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/tr-TR/solutions/.keep b/tr-TR/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/tr-TR/solutions/.keep @@ -0,0 +1 @@ + diff --git a/tr-TR/step_1.md b/tr-TR/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/tr-TR/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters,
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match.
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/uk-UA/meta.yml b/uk-UA/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/uk-UA/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/uk-UA/resources/code.py b/uk-UA/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/uk-UA/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/uk-UA/solutions/.keep
b/uk-UA/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/uk-UA/solutions/.keep @@ -0,0 +1 @@ + diff --git a/uk-UA/step_1.md b/uk-UA/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/uk-UA/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it.
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`).
Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/vi-VN/meta.yml b/vi-VN/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/vi-VN/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/vi-VN/resources/code.py b/vi-VN/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/vi-VN/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/vi-VN/solutions/.keep b/vi-VN/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/vi-VN/solutions/.keep @@ -0,0 +1 @@ + diff --git a/vi-VN/step_1.md b/vi-VN/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/vi-VN/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. 
Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below?
+ + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/zh-CN/meta.yml b/zh-CN/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/zh-CN/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/zh-CN/resources/code.py b/zh-CN/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/zh-CN/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/zh-CN/solutions/.keep b/zh-CN/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/zh-CN/solutions/.keep @@ -0,0 +1 @@ + diff --git a/zh-CN/step_1.md b/zh-CN/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/zh-CN/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters,
you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it. + +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match.
+ +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` + diff --git a/zh-TW/meta.yml b/zh-TW/meta.yml new file mode 100644 index 0000000..2a5a817 --- /dev/null +++ b/zh-TW/meta.yml @@ -0,0 +1,17 @@ +--- +title: Finding text between patterns with regex and Python +hero_image: images/banner.png +description: Finding text between patterns with regex +original_url: https://codeclubprojects.org/en-GB/scratch/rock-band +theme: red #possible values: blue, green, navy, orange, red, turquoise, violet, yellow +duration: 1 #possible values: 1, 2 or 3 +listed: false +ingredient: true +copyedit: true +curriculum: +interests: +technologies: "python" +site_areas: +steps: + - + title: Finding text diff --git a/zh-TW/resources/code.py b/zh-TW/resources/code.py new file mode 100644 index 0000000..d7e72ed --- /dev/null +++ b/zh-TW/resources/code.py @@ -0,0 +1,14 @@ +import re + +text = ''' +start +Here is a line +end +start +and here is some more +end''' + +match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + + + diff --git a/zh-TW/solutions/.keep
b/zh-TW/solutions/.keep new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/zh-TW/solutions/.keep @@ -0,0 +1 @@ + diff --git a/zh-TW/step_1.md b/zh-TW/step_1.md new file mode 100644 index 0000000..d74c167 --- /dev/null +++ b/zh-TW/step_1.md @@ -0,0 +1,111 @@ +If you want to find text located between specific characters or sequences of characters, you can use Python's `re` module and the `findall()` method. + +- Let's suppose you have the following string: + + ```python + text = 'start Here is a line end' + ``` + +- Imagine you want to find all the text between `'start'` and `'end'`. Here's the regex search you might use to do so: + + ```python + import re + text = 'start Here is a line end' + matches = re.findall(r'start.*end', text) + ``` + +- If you now check the `matches` variable in the interpreter, you will see that it is a list of the matches Python has found: + + ```python + >>> matches + ['start Here is a line end'] + ``` + +- What happens if there is more than one match, like in the example below? + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*end', text) + ``` + + ```python + >>> matches + ['start Here is a line end start and here is some more end'] + ``` + +- That wasn't what we wanted. This happens because `.*` is **greedy**: it matches as many characters as it can, so the result runs from the first `'start'` to the last `'end'`. + +- To make the **regex** non-greedy, use `.*?` rather than `.*`. + + ```python + import re + text = 'start Here is a line end start and here is some more end' + matches = re.findall(r'start.*?end', text) + ``` + + ```python + >>> matches + ['start Here is a line end', 'start and here is some more end'] + ``` + +- Now the list has two elements in it.
+ +- If you don't want Python to include the `start` and `end` words in the results, then you need to tell the **regex** to **look ahead** and **look behind**. There are two regex elements which will do that: + +- `?<=` means **look behind**. Use it to require text **before** the match. + +- `?=` means **look ahead**. Use it to require text **after** the match. + +- For these elements to work, you need to wrap each one, together with the pattern it looks for, in parentheses: + + ```python + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [' Here is a line ', ' and here is some more '] + ``` + +- What happens with strings spread across multiple lines, such as the one below? + + ```python + import re + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text) + ``` + + ```python + >>> match + [] + ``` + +- That's not what we wanted. The problem is that, by default, `.` does not match newlines (`\n`). Adding a `flag` to the search can sort this out though: + + ```python + import re + + text = ''' + start + Here is a line + end + start + and here is some more + end''' + + match = re.findall(r'(?<=start).*?(?=end)', text, flags=re.DOTALL) + ``` + + ```python + >>> match + ['\nHere is a line\n', '\nand here is some more\n'] + ``` +
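The final pattern from `step_1.md` could be wrapped in a small reusable function. What follows is only a minimal sketch and is not part of the files added in this diff: `find_between` is a hypothetical helper name, and the use of `re.escape()` and `str.strip()` are assumptions added for illustration.

```python
import re


def find_between(text, start, end):
    """Return every piece of text found between `start` and `end`.

    `find_between` is a hypothetical helper used only for this sketch.
    """
    # re.escape() makes the delimiters match literally, even if they
    # contain characters that are special in a regex (such as '[' or '.').
    pattern = r'(?<={}).*?(?={})'.format(re.escape(start), re.escape(end))
    # re.DOTALL lets '.' match newlines, so a match can span several lines.
    return re.findall(pattern, text, flags=re.DOTALL)


text = '''
start
Here is a line
end
start
and here is some more
end'''

matches = find_between(text, 'start', 'end')

# Optionally strip the surrounding newlines from each match.
cleaned = [m.strip() for m in matches]
```

Escaping the delimiters keeps the look-behind a fixed-width literal, which Python's `re` module requires, and means delimiters such as `'['` and `']'` should work without extra care.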