Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name below.

In [1]:
NAME = ""

# Assignment 4 (Regular Expressions)
### Learning Objectives
For this assignment, the main learning objectives are:
1. __Get Comfortable with Regular Expressions:__ Learn how to create and apply regular expression patterns to find, match, split, and replace text in documents.
2. __Master Text Processing Techniques:__ Practice using `re.findall()`, `re.sub()`, and `re.split()` to effectively extract, modify, and organize text.
3. __Apply Your Skills:__ Use your knowledge in regular expressions to solve real-world problems--in this case, organizing and managing structured data like recipes!

# Cooking Activity
### Reduce Cooking Time for Time Efficiency!

In this assignment, you will apply regular expressions to a cooking activity. Daphne, who is happily married to her husband, Cole, and is the mother of their daughter, Lucy, is planning a Sunday family cooking session and wants to be fully prepared for it! Because Daphne is in a time crunch to make the dishes, you will use regular expressions to extract, modify, and organize the information in the recipe text effectively!

Note: Regular expressions will also be called "RE" or "regex" in this activity. Additionally, in the definitions below, `regex_pattern` is a variable referring to the regex pattern being applied, and `recipe_text` is a variable referring to the text being searched for the pattern.

| RE Methods | Meaning |
| --- | --- |
| `re.finditer()` | Finds all substrings where the regular expression matches and returns them as an iterator of match objects. <br> Example: __finditer_method = re.finditer(regex_pattern, recipe_text)__ </br>|
| `re.split()` | Returns a list where the string has been split at each match. <br> Example: `split_method = re.split(regex_pattern, recipe_text)`</br>|
| `re.sub()` | Replaces one or more matches with a string. <br> Example: __sub_method = re.sub(regex_pattern, replacement, recipe_text)__. In this case, replacement is the phrase that will replace the pattern within `recipe_text`. </br> |
| `re.findall()` | Finds all non-overlapping matches of the pattern and returns them as a list. <br> Example: __findall_method = re.findall(regex_pattern, recipe_text)__ </br>|
| `re.IGNORECASE` | Does case-insensitive matches. <br> Example: __regex_pattern = re.compile(r'hours', re.IGNORECASE)__--which matches "hours", "Hours", or "HOURS". </br>|

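As a rough illustration of how the methods in the table behave, here is a minimal sketch. The sample sentence and every variable name in it are made up for this example and are not part of the assignment data.

```python
import re

# Hypothetical sample text, not taken from the assignment recipes.
sample_text = "Bake 20 minutes, rest 5 Minutes; serve"
minute_pattern = re.compile(r'\d+ minutes?', re.IGNORECASE)

# findall returns a list of matched strings (when the pattern has no capture groups).
findall_result = minute_pattern.findall(sample_text)

# finditer yields match objects, which also expose where each match starts.
finditer_result = [(m.group(), m.start()) for m in minute_pattern.finditer(sample_text)]

# split cuts the text at every match of the delimiter pattern.
split_result = re.split(r'[,;]\s*', sample_text)

# sub replaces every match with the replacement string.
sub_result = minute_pattern.sub('a while', sample_text)

print(findall_result)   # ['20 minutes', '5 Minutes']
print(finditer_result)  # [('20 minutes', 5), ('5 Minutes', 22)]
print(split_result)     # ['Bake 20 minutes', 'rest 5 Minutes', 'serve']
print(sub_result)       # Bake a while, rest a while; serve
```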
| Python Method | Meaning |
| --- | --- |
| `strip()` | Remove any spaces at the beginning and end of a string. <br>Example: __strip_method = recipe_text.strip()__  </br> |
| `str()` | Converts an object into its string representation. <br> Example: __print(str(item))__ </br> |

| RE Character | Meaning |
| --- | --- |
| `*` | Matches zero or more of the previous expression. <br> Example: __regex_pattern = re.compile(r'[0-9]*')__--which matches a run of zero or more digits from 0 to 9. </br>|
| `?` | Matches zero or one of the previous expression. <br> Example: __regex_pattern = re.compile(r'colou?r')__--which matches the string "color" or "colour" because 'u' is optional. </br> |
| `+` | Matches one or more of the previous expression. <br> Example: __regex_pattern = re.compile(r'\d+')__--which matches with one or more digits. </br> |
| `\` | Used to escape a special character. <br> Example: __regex_pattern = re.compile(r'\.')__--which matches a literal period dot '.' </br>|
| `.` | Matches any character except a newline. <br> Example: __regex_pattern = re.compile(r'a.b')__--which matches any string where 'a' is followed by a single character and then 'b'. </br> |
| `()` | Creates capture groups to extract and refer to specific parts of a matched string. <br> Example: __regex_pattern = re.compile(r'(\d{3})')__--which captures any 3 consecutive digits. </br> |
| `(?:)` | Non-capturing group that groups parts of a pattern together but does not create a capturing group that you can refer to later. This is useful when you want to apply quantifiers to part of a pattern but don't need to capture the matched text for later. <br> Example: __regex_pattern = re.compile(r'\d+(?:-\d+)*')__--where `(?:-\d+)` is a non-capturing group that matches a hyphen followed by one or more digits, zero or more times. </br>  |
| `\|` | Serves as an "or" statement that matches the pattern on either the left or the right of the bar. <br> Example: `regex_pattern = re.compile(r'apple\|orange')`--which matches either "apple" or "orange". </br>  |
| `[]` | Matches any single character from a set or range of characters. <br> Example: __regex_pattern = re.compile(r'[0-9]')__--which matches any single digit. </br>  |
| `{}` | Matches a specific number of occurrences <br> Example: __regex_pattern = re.compile(r'a{3}')__--which matches exactly three consecutive 'a' characters. |
| `^` | Matches the start of a string. <br> Example: __regex_pattern = re.compile(r'^Hello')__--which matches 'Hello' only if it appears at the start of the string. |
| `$` | Matches the end of a string. <br> Example: __regex_pattern = re.compile(r'world!$')__--which matches 'world!' only if it appears at the end of the string.|
| `\d` | Matches a digit [0-9]. <br> Example: __regex_pattern = re.compile(r'\d')__--which matches to any single digit. |
| `\D` | Matches a non-digit <br> Example: __regex_pattern = re.compile(r'\D')__--which matches any character that is not a digit [0-9] |
| `\w` | Matches an alphanumeric character (letters [a-zA-Z] and digits [0-9]) or an underscore. <br> Example: __regex_pattern = re.compile(r'\w')__--which matches all alphanumeric characters and underscores in the text, ignoring spaces and punctuation. |
| `\W` | Matches a non-alphanumeric character. <br> Example: __regex_pattern = re.compile(r'\W')__--which matches punctuation and spaces in the text.  |
| `\s` | Matches a whitespace character. <br> Example: __regex_pattern = re.compile(r'\s')__--which matches all whitespace characters in the text, including spaces, tabs, and newlines. |
| `\S` | Matches a non-whitespace character <br> Example: __regex_pattern = re.compile(r'\S')__--which matches all characters in the text that are not whitespace characters (including letters, punctuation, and digits). |

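A few of the characters above are easier to see in action than to read about. The sketch below uses a made-up sentence (an assumption, not assignment data) and shows in particular how a capturing group changes what `re.findall()` returns, while a non-capturing group does not.

```python
import re

# Hypothetical sample text, not taken from the assignment recipes.
sample_text = "Mix 2 cups flour, 10 eggs, and 1-2-3 pinches of salt."

# \d+ matches each run of one or more digits.
digit_runs = re.findall(r'\d+', sample_text)

# With a capturing group, findall returns only the group's text, not the whole match.
captured = re.findall(r'(\d+) cups?', sample_text)

# A non-capturing group lets + apply to "-digits" while findall
# still returns the whole match.
hyphenated = re.findall(r'\d+(?:-\d+)+', sample_text)

# Alternation matches either word.
either = re.findall(r'flour|salt', sample_text)

print(digit_runs)   # ['2', '10', '1', '2', '3']
print(captured)     # ['2']
print(hyphenated)   # ['1-2-3']
print(either)       # ['flour', 'salt']
```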



Also, for all the regular expressions you define in the assignment, make sure to define them using [raw string notation](https://docs.python.org/3/library/re.html#raw-string-notation). This will make it easier for you to understand and debug your expressions, as discussed in the readings this week.

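To see why raw strings matter, consider this small sketch with a made-up sentence. Without the `r` prefix, Python turns `\b` into a backspace character before the regex engine ever sees it, so the pattern silently matches nothing.

```python
import re

# Hypothetical sample text, not taken from the assignment recipes.
sample_text = "1 cup sugar, 2 cupcakes"

# In a normal string, '\b' is the backspace character, so this pattern
# looks for backspaces around "cup" and finds nothing.
normal_pattern = re.compile('\bcup\b')

# In a raw string, r'\b' reaches the regex engine as a word boundary.
raw_pattern = re.compile(r'\bcup\b')

normal_matches = normal_pattern.findall(sample_text)
raw_matches = raw_pattern.findall(sample_text)

print(normal_matches)  # []
print(raw_matches)     # ['cup']
```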
RE Definitions Source Reference: https://www.python-engineer.com/posts/regular-expressions/

In [None]:
import re
from feedback import *

In [None]:
recipe_text = r'''
Recipe 1: 194_Cabbage_Kielbara_Supper
1) In a 5-qt. slow cooker, combine the cabbage, potatoes, onion, salt and pepper.
2) Pour broth over all for 2 HOURS.
3) Place sausage on top (slow cooker will be full, but cabbage will cook down).
4) Cover and cook on low for 4-5 hours or until vegetables are tender and sausage is heated through.
Recipe 2: 195_Chocolate_Chip_Cookie_Ice_Cream_Cake
1) Crush half the cookies (about 20 cookies) to make crumbs.
2) Combine crumbs with melted margarine and press into the bottom of a 9-inch springform pan or pie plate.
3) Stand remaining cookies around edge of pan.
4) Spread 3/4 cup fudge topping over crust.
5) Freeze 1.5 hours.
6) Meanwhile, soften 1 quart of ice cream in microwave or on countertop.
7) After crust has chilled, spread softened ice cream over fudge layer.
8) Freeze 30 minutes.
9) Scoop remaining quart of ice cream into balls and arrange over spread ice cream layer.
10) Freeze until firm, 3.5-4.5 Hours.
11) To serve, garnish with remainder of fudge topping, whipped cream and cherries.'''

# PART 1: Searching Within Recipe Text

In this section, you will work through a few more examples of regular expressions. Note that you want each regular expression to be an __*exact match*__ for the word(s) you are looking for. When writing a regular expression, be specific in the pattern you create so that it does not capture words you are not looking for.

So, in this section, you will use regular expressions to find the exact timeframes of more than an hour--which will later be replaced with a shorter timeframe. For this task, you will create regular expression patterns that find the phrases described below.

Here is a __quick summary of coding sections for Part 1.__
- __Recipe 1: Regex Pattern for *Only* Integer Hours:__ create a regex pattern that matches all integer hours--whether in the form of a single integer hour or a range of integer hours.
- __Recipe 2: Regex Pattern for *Only* Decimal Hours:__ create regex pattern(s) that match all decimal hours--whether in the form of a single decimal hour or a range of decimal hours.

## Task 1: Regex Pattern for Integer Hours
For Task 1, you want to create a regex pattern that matches all integer hours within the recipe text--whether in the form of a single integer hour (ex: "2 HOURS") or a range of integer hours (ex: "4-5 hours").

Things to Note: 
1. Create a variable called `integer_hour_pattern` and use the `re.compile()` method to store your regex pattern. Make sure that the regex pattern is written as a raw string by prefixing it with `r`.
    - Make sure that the regular expression is __*case-insensitive*__--so that the regex pattern matches both non-capitalized and capitalized words such as "hours" or "HOURS".
    - Additionally, make sure that the regular expression pattern __*only extracts integer hours*__ and does not match '5 Hours' when it is part of a decimal range like '3.5-4.5 Hours'.
2. Create a variable called `matches`. Within this variable, use the correct regex method to find all the matches.
3. Create a for-loop that goes through every `item` in `matches` and prints each `item`.

Based on this, your regex pattern should output EXACTLY the phrases below (no more and no fewer outputs):
* "2 HOURS"
* "4-5 hours"

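The three steps above follow a general compile, match, and loop workflow. The sketch below walks through that workflow on a made-up sentence that uses minutes rather than hours, so it is not a solution to the task. The pattern is deliberately naive: it shows the kind of partial match the note above warns about, and all names in it are hypothetical.

```python
import re

# Hypothetical sample text, not taken from the assignment recipes.
sample_text = "Stir for 3 minutes, then simmer 2.5 MINUTES."

# Step 1: compile a case-insensitive pattern from a raw string.
naive_minute_pattern = re.compile(r'\d+ minutes', re.IGNORECASE)

# Step 2: collect every match.
sample_matches = naive_minute_pattern.findall(sample_text)

# Step 3: print each match to inspect it.
for item in sample_matches:
    print(item)
# 3 minutes
# 5 MINUTES   <- partial match taken from "2.5 MINUTES"
```

Printing each match like this is a quick way to spot when a pattern is picking up the tail end of a decimal number instead of a true integer.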
In [None]:
# YOUR CODE HERE
raise NotImplementedError

In [None]:
# Collect the full text of each match; finditer works even if the pattern has capture groups.
found = [m.group().strip() for m in integer_hour_pattern.finditer(recipe_text)]
assert '2 HOURS' in found, "Assertion failed for '2 HOURS'."
assert '4-5 hours' in found, "Assertion failed for '4-5 hours'."
assert '5 hours' not in found, "Assertion failed because 5 hours in matches"
assert '5 Hours' not in found, "Assertion failed because 5 Hours in matches"

In [None]:
# these would be hidden from the student
positive_test_cases=['2 hours','4-5 hours','1 hour','2-3 hours','1 HOUR','24 hours']
negative_test_cases=['Overnight','Hour','two']

# this is all you need to generate feedback
feedback = return_feedback(integer_hour_pattern,positive_test_cases,negative_test_cases,NAME)
print(feedback)

## Task 2: Regex Pattern for Decimal Hours
For Task 2, you want to create a regex pattern that matches all decimal hours--whether in the form of a single decimal hour (ex: "1.5 hours") or a range of decimal hours (ex: "3.5-4.5 Hours").

Things to Note: 
1. Create a variable called `decimal_hour_pattern` and use the `re.compile()` method to store your regex pattern. Make sure that the regex pattern is written as a raw string by prefixing it with `r`.
    - Make sure that the regular expression is __*case-insensitive*__--so that the regex pattern matches both non-capitalized and capitalized words such as "hours" or "HOURS".
2. Create a variable called `matches`. Within this variable, use the correct regex method to match the decimal hours.
3. Create a for-loop that goes through every `item` in `matches` and prints it out as a string.

Based on this, your regex pattern should output EXACTLY the phrases below (no more and no fewer outputs):
- "1.5 hours"
- "3.5-4.5 Hours"

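One detail worth keeping in mind for decimals: an unescaped `.` matches any character, not just a period. The sketch below, on a made-up sentence with hypothetical names, compares an unescaped dot with an escaped one. It illustrates the escaping rule only and is not a solution to the task.

```python
import re

# Hypothetical sample text, not taken from the assignment recipes.
sample_text = "Add 1.5 cups milk to batch 125."

# The unescaped dot matches ANY character, so '125' also matches.
unescaped = re.findall(r'1.5', sample_text)

# The escaped dot matches only a literal period.
escaped = re.findall(r'1\.5', sample_text)

print(unescaped)  # ['1.5', '125']
print(escaped)    # ['1.5']
```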
In [None]:
# YOUR CODE HERE
raise NotImplementedError

In [None]:
# Collect the full text of each match; finditer works even if the pattern has capture groups.
found = [m.group().strip() for m in decimal_hour_pattern.finditer(recipe_text)]
assert '3.5-4.5 Hours' in found, "Assertion failed for '3.5-4.5 Hours'."
assert '1.5 hours' in found, "Assertion failed for '1.5 hours'."
assert '1 hour' not in found, "Assertion failed because 1 Hour (an integer hour) is in matches"
assert '4-5 hours' not in found, "Assertion failed because 4-5 Hours (an integer hour range) is in matches"

In [None]:
# these would be hidden from the student
positive_test_cases=['1.5 hours','3.5-4.5 Hours']
negative_test_cases=['Overnight','Hour','two','1 hour']

# this is all you need to generate feedback
feedback = return_feedback(decimal_hour_pattern,positive_test_cases,negative_test_cases,NAME)
print(feedback)

# PART 2: Improving the Recipe Text
Congrats on finding all of the recipe timeframes! Now that you have had more practice with regular expressions, you will use regular expression methods to make `recipe_text` much easier to read and to extract the necessary information! Since Daphne is most concerned about saving as much time as possible, the second part focuses on answering the questions that will save her the most time!

Here is a __quick summary of questions/coding sections for Part 2.__
- __Q1: How to make the recipe easier to read?__ divide `recipe_text` to have spacing
- __Q2: How to shorten the recipe timeframes to save time?__ replace the timeframes
- __Q3: Which recipe has the most number of steps?__ find the recipes with the most number of steps

## Q1: How to make the recipe_text easier to read?
For Question 1, you want to use regex method(s) to divide the `recipe_text` string so that there is a blank line between each recipe.

1. Create a variable called `spaced_regex_pattern` and use the `re.compile()` method to store your regex pattern. Make sure that the regex pattern is written as a raw string by prefixing it with `r`.
2. Create a variable called `spaced_recipe_text`. Within this variable, use the correct regex method to split the `recipe_text` string.
3. Use both `spaced_regex_pattern` and `spaced_recipe_text` to create a function called `split_recipes` (the name the test cell below calls), which, given a string of recipes, will split them.

Note: You can also print `spaced_recipe_text` to see if the necessary changes were made!

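As a general illustration of wrapping `re.split()` in a function, here is a minimal sketch on a made-up, semicolon-separated list. The function name `split_items`, the delimiter, and the sample string are all assumptions for this example; the recipe text needs a different pattern, and the output format the grader expects is not shown here.

```python
import re

def split_items(text):
    """Split a semicolon-separated list, ignoring spaces around each semicolon."""
    delimiter_pattern = re.compile(r'\s*;\s*')
    # strip() removes leading and trailing whitespace before splitting.
    return delimiter_pattern.split(text.strip())

# Hypothetical sample text, not taken from the assignment recipes.
items = split_items(" eggs ;milk;  flour ")

# Joining with two newlines puts a blank line between the pieces.
spaced_items = "\n\n".join(items)

print(items)  # ['eggs', 'milk', 'flour']
print(spaced_items)
```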
In [None]:
# YOUR CODE HERE
raise NotImplementedError

In [None]:
spaced_recipe_text = split_recipes(recipe_text)
is_Correct,feedback = test_split_function(split_recipes)
assert(is_Correct)

In [None]:
return_feedback_split(split_recipes,NAME)

## Q2: How to shorten the recipe timeframes to save time?
For Question 2, you want to use __one__ regex method to replace the timeframes in the `recipe_text` string with a shorter timeframe.

1. Create a variable called `replace_regex_pattern` and use the `re.compile()` method to store your regex pattern. Make sure that the regex pattern is written as a raw string by prefixing it with `r`.
2. Create a variable called `organized_recipe_text`. Within this variable, use __one__ correct regex method to substitute 'two hours' and '2 Hours' with '1.5 hours'.

Note: You can also print `organized_recipe_text` to see if the necessary changes were made!

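A single `sub()` call can replace several spellings at once when the pattern combines alternation with `re.IGNORECASE`. The sketch below shows the idea on a made-up sentence that uses minutes; the names and text are hypothetical and it is not a solution to this question.

```python
import re

# Hypothetical sample text, not taken from the assignment recipes.
sample_text = "Chill 30 Minutes, then bake thirty minutes."

# Alternation covers both spellings; IGNORECASE covers capitalization.
thirty_pattern = re.compile(r'30 minutes|thirty minutes', re.IGNORECASE)

# One sub() call replaces every match.
shortened_text = thirty_pattern.sub('20 minutes', sample_text)

print(shortened_text)  # Chill 20 minutes, then bake 20 minutes.
```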
In [None]:
# YOUR CODE HERE
raise NotImplementedError

In [None]:
assert 'two hours' not in organized_recipe_text
assert '2 Hours' not in organized_recipe_text
assert '1.5 hours' in organized_recipe_text