# Project

You're going to build a Streamlit app like the [Westminster Directory app](https://westminster-directory.streamlit.app/) or the [Recipe app](https://allrecipes.streamlit.app/) I showed in class.

You are expected to use what we have learned in class:

- numpy
- pandas
- matplotlib
- regular expressions
- web scraping
- streamlit
- etc.

## Step 1: Project Idea and Plan

You need to submit your project idea and plan by class time on Tuesday, 4/22 (next week).

Here are some example ideas:

- Weather Data: scrapes weather data from a weather website (e.g., Weather.com) for a specific location. Extract information such as temperature, humidity, wind speed, and weather condition.

- Job Listings: scrapes job listings from a job search website (e.g., Indeed.com or LinkedIn). Gather details such as job title, company name, location, and job description.

- News Headlines: scrapes headlines from a news website (e.g., CNN.com or BBC.com). Extract the title of the article, publication date, and a brief summary.

- Wikipedia: scrapes articles on a specific topic from Wikipedia.

- Movie: scrapes information about movies from a movie database website (e.g., IMDb or Rotten Tomatoes). Gather details such as movie title, release year, genre, cast, and ratings.

- Some professional websites related to your major. 

Here are some example plans:

- Recipe app: Let users choose recipes within a selected calorie range.

- Job app: Show trends in demand for programming languages in the job market.

- Movie: Show trends in review ratings, and analyze sentiment and genre information.

- News: Analyze how much attention a topic is getting.
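For the Job app plan, showing a "trend of programming languages" starts with counting how many scraped job descriptions mention each language. The sketch below is minimal and assumes the descriptions have already been scraped into a list of strings. The `LANGUAGES` list, the helper name, and the sample text are hypothetical.

```python
import re
from collections import Counter

# Hypothetical set of languages to track; extend as needed
LANGUAGES = ["Python", "Java", "JavaScript", "SQL", "C++"]


def count_language_mentions(descriptions, languages=LANGUAGES):
    """Count how many descriptions mention each language at least once."""
    counts = Counter()
    for text in descriptions:
        for lang in languages:
            # Lookarounds instead of \b: \b fails after the "+" in "C++",
            # and these also stop "Java" from matching inside "JavaScript"
            pattern = r'(?<![\w+])' + re.escape(lang) + r'(?![\w+])'
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[lang] += 1
    return counts


if __name__ == "__main__":
    sample = ["Python and SQL required", "Experience with JavaScript and Java"]
    print(count_language_mentions(sample))
```

Running the same count on descriptions scraped on different dates would give the trend. The resulting `Counter` could then be turned into a DataFrame and plotted with matplotlib or `st.bar_chart`.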

## Step 2: Project

You have about 2-3 weeks to build your project in the following steps:

1. Exploring ideas and making a project plan.

2. Scraping the data.

3. Designing and drafting the app's interface and functionality.

4. Building the Streamlit app on your laptop.

5. Publishing it publicly via GitHub and Streamlit Community Cloud.

In [138]:
import requests
from bs4 import BeautifulSoup
import re
import numpy as np
import pandas as pd

In [139]:
recipes = []
URL = "https://www.americastestkitchen.com/search?q=brownies"
page = requests.get(URL, timeout=30)
soup = BeautifulSoup(page.content, "html.parser")

# These class names look auto-generated (CSS modules), so they may change when
# the site is updated; if find() returns None, inspect the page and update them
results = soup.find('ul', class_="AlgoliaResults_grid__NXdqe")
recipe_urls = results.find_all('a', class_="ResultLink_resultLink__Duhpm")

for recipe_url in recipe_urls:
    recipe_url = "https://www.americastestkitchen.com" + recipe_url['href']

    try:
        recipe_page = requests.get(recipe_url, timeout=30)
        recipe_soup = BeautifulSoup(recipe_page.content, "html.parser")

        name_recipe = recipe_soup.find('h1').text.strip()
        rating_recipe = recipe_soup.find('span', id="recipe-header-rating-score").text.strip()
        count_recipe = recipe_soup.find('span', id="recipe-header-rating-count").text.strip()

        # Each ingredient appears to have its own checkbox, so counting checkboxes counts ingredients
        ingredient_list = recipe_soup.find_all('div', class_="RecipeIngredient_controlCheckbox__HszPl")
        num_ingredients = len(ingredient_list)

        # Keep the full text of the first span that starts with a time such as
        # "1¾ hours, plus 2 hours cooling" or "55 minutes, plus 3 hours cooling";
        # it is split into separate columns later
        time_text = None
        time_elements = recipe_soup.find_all('span', class_="typography typography-module_base__PkumT typography-module_open-sm__6cWJa typography-module_proxima__HDZ4V")
        for time_element in time_elements:
            candidate = time_element.text.strip()
            if re.match(r'^(?:\d+[¼½¾]?|[¼½¾])\s+(?:hours?|minutes?)', candidate):
                time_text = candidate
                break

        recipes.append([name_recipe, rating_recipe, count_recipe, num_ingredients, time_text])

    except (requests.RequestException, AttributeError):
        # The request failed, or the page is missing an expected element (find() returned None)
        recipes.append([None, None, None, None, None])
recipes




[['Ultimate Brookies', '4', '(37)', 21, '1¾ hours, plus 2 hours cooling'],
 ['Ultimate Brookies', '4', '(37)', 21, '1¾ hours, plus 2 hours cooling'],
 ["Millionaire's Shortbread",
  '4.5',
  '(842)',
  11,
  '1¾ hours, plus 1½ hours cooling'],
 ["Millionaire's Shortbread",
  '4.5',
  '(842)',
  11,
  '1¾ hours, plus 1½ hours cooling'],
 ['Ultimate Turtle Brownies',
  '4.5',
  '(18)',
  19,
  '1¾ hours, plus 1½ hours cooling and 2 hours chilling'],
 ['Ultimate Turtle Brownies',
  '4.5',
  '(18)',
  19,
  '1¾ hours, plus 1½ hours cooling and 2 hours chilling'],
 ['Lemon Bars', '4.5', '(134)', 11, '1½ hours, plus 2 hours cooling'],
 ['Lemon Bars', '4.5', '(134)', 11, '1½ hours, plus 2 hours cooling'],
 ['Chewy Brownies', '4', '(306)', 13, '1 hour, plus 2½ hours cooling'],
 ['Chewy Brownies', '4', '(306)', 13, '1 hour, plus 2½ hours cooling'],
 ['Strawberry Cheesecake Bars',
  '4.5',
  '(327)',
  12,
  '2¾ hours. plus 2½ hours cooling and 4 hours chilling'],
 ['Strawberry Cheesecake Bars',
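The repeated rows in the output above suggest the search page links to each recipe twice. As a result, every recipe page is downloaded twice, and the duplicates are only removed later with `drop_duplicates()`. The sketch below deduplicates the links before any requests are made, using `urllib.parse.urljoin` to build absolute URLs. The helper name and sample paths are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.americastestkitchen.com"


def unique_recipe_urls(hrefs, base=BASE_URL):
    """Turn relative hrefs into absolute URLs, dropping repeats but keeping page order."""
    absolute = (urljoin(base, href) for href in hrefs)
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(absolute))


if __name__ == "__main__":
    print(unique_recipe_urls(["/recipes/1-chewy-brownies", "/recipes/1-chewy-brownies"]))
```

In the scraping cell, it could be called as `unique_recipe_urls(a['href'] for a in recipe_urls)` before the loop. `urljoin` leaves hrefs that are already absolute unchanged.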

In [140]:
brownie_recipes = pd.DataFrame(recipes, columns=["Title", "Review Rating", "Review Count", "Ingredient Count", "Time"])
brownie_recipes.head()

|   | Title | Review Rating | Review Count | Ingredient Count | Time |
|---|---|---|---|---|---|
| 0 | Ultimate Brookies | 4.0 | (37) | 21 | 1¾ hours, plus 2 hours cooling |
| 1 | Ultimate Brookies | 4.0 | (37) | 21 | 1¾ hours, plus 2 hours cooling |
| 2 | Millionaire's Shortbread | 4.5 | (842) | 11 | 1¾ hours, plus 1½ hours cooling |
| 3 | Millionaire's Shortbread | 4.5 | (842) | 11 | 1¾ hours, plus 1½ hours cooling |
| 4 | Ultimate Turtle Brownies | 4.5 | (18) | 19 | 1¾ hours, plus 1½ hours cooling and 2 hours ch... |


In [141]:
brownie_recipes = brownie_recipes.drop_duplicates()
brownie_recipes.reset_index(drop=True, inplace=True)
brownie_recipes.head()

|   | Title | Review Rating | Review Count | Ingredient Count | Time |
|---|---|---|---|---|---|
| 0 | Ultimate Brookies | 4.0 | (37) | 21 | 1¾ hours, plus 2 hours cooling |
| 1 | Millionaire's Shortbread | 4.5 | (842) | 11 | 1¾ hours, plus 1½ hours cooling |
| 2 | Ultimate Turtle Brownies | 4.5 | (18) | 19 | 1¾ hours, plus 1½ hours cooling and 2 hours ch... |
| 3 | Lemon Bars | 4.5 | (134) | 11 | 1½ hours, plus 2 hours cooling |
| 4 | Chewy Brownies | 4.0 | (306) | 13 | 1 hour, plus 2½ hours cooling |


In [142]:
brownie_recipes.dtypes

Title               object
Review Rating       object
Review Count        object
Ingredient Count     int64
Time                object
dtype: object

In [143]:
brownie_recipes['Review Rating'] = brownie_recipes['Review Rating'].astype('float')
brownie_recipes.dtypes

Title                object
Review Rating       float64
Review Count         object
Ingredient Count      int64
Time                 object
dtype: object

In [144]:
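brownie_recipes['Review Count'] = pd.to_numeric(
    brownie_recipes['Review Count']
    .astype(str)
    .str.replace(r'[(),]', '', regex=True),  # "(842)" -> "842", "(1,234)" -> "1234"
    errors='coerce',                         # anything left non-numeric (e.g. "None") becomes NaN
).astype('Int64')                            # nullable integer dtype keeps missing values as <NA>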

brownie_recipes.dtypes

Title                object
Review Rating       float64
Review Count          Int64
Ingredient Count      int64
Time                 object
dtype: object
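The vectorized cell above strips parentheses and commas from the whole column. The same rule written as a plain function can be handy for checking individual labels, or for cleaning one value at a time in the app. The sketch below is minimal and uses a hypothetical helper name. It is slightly more permissive than the cell: it keeps only digits rather than removing just `(`, `)` and `,`.

```python
import re


def parse_review_count(text):
    """Extract an integer from a review-count label such as '(842)' or '(1,234)'.

    Returns None when the label contains no digits (e.g. a missing rating).
    """
    if text is None:
        return None
    digits = re.sub(r'\D', '', str(text))  # drop everything that is not a digit
    return int(digits) if digits else None


if __name__ == "__main__":
    print(parse_review_count("(1,234)"))
```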

In [145]:
brownie_recipes

|   | Title | Review Rating | Review Count | Ingredient Count | Time |
|---|---|---|---|---|---|
| 0 | Ultimate Brookies | 4.0 | 37 | 21 | 1¾ hours, plus 2 hours cooling |
| 1 | Millionaire's Shortbread | 4.5 | 842 | 11 | 1¾ hours, plus 1½ hours cooling |
| 2 | Ultimate Turtle Brownies | 4.5 | 18 | 19 | 1¾ hours, plus 1½ hours cooling and 2 hours ch... |
| 3 | Lemon Bars | 4.5 | 134 | 11 | 1½ hours, plus 2 hours cooling |
| 4 | Chewy Brownies | 4.0 | 306 | 13 | 1 hour, plus 2½ hours cooling |
| 5 | Strawberry Cheesecake Bars | 4.5 | 327 | 12 | 2¾ hours. plus 2½ hours cooling and 4 hours ch... |
| 6 | Lemon Cookie Bars | 4.5 | 138 | 14 | 1¼ hours, plus 3 hours cooling |
| 7 | Browned Butter Blondies | 4.0 | 218 | 11 | 1¼ hours, plus 2 hours cooling |
| 8 | Vegan Brownies | 4.5 | 89 | 11 | 55 minutes, plus 3 hours cooling |
| 9 | Ultranutty Pecan Bars | 4.5 | 389 | 11 | 1 hour, plus 1½ hours cooling |


In [146]:

test_string = '2¾ hours, plus 2½ hours cooling'

# Test the regex pattern
result = re.match(r'(\d{1,2}[½¼¾]?)\s?\w+,\s*plus\s*(\d{1,2}[½¼¾]?)\s\w+', test_string)

# If the match succeeds, print the extracted values
if result:
    print(result.groups())  # Should print the bake and cool times
else:
    print("No match found")

('2¾', '2½')
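The pattern above captures only the numbers, so the unit ("hours" vs. "minutes") is lost. A variant with named groups can keep each amount together with its unit. This is an alternative sketch, not the notebook's approach. The group names are hypothetical. The separator accepts a comma or the stray period seen in "2¾ hours. plus ...". `pandas.Series.str.extract` also names its output columns after named groups.

```python
import re

# Named groups for the amount and unit of both times
TIME_PATTERN = re.compile(
    r'(?P<prep>\d{1,2}[¼½¾]?)\s*(?P<prep_unit>hours?|minutes?)[,.]?\s*plus\s*'
    r'(?P<cool>\d{1,2}[¼½¾]?)\s*(?P<cool_unit>hours?|minutes?)'
)


def parse_times(text):
    """Return the named groups as a dict, or None if the text does not match."""
    match = TIME_PATTERN.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_times('55 minutes, plus 3 hours cooling'))
```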


In [147]:
brownie_recipes[['Bake and Prep Time','Cool Time']]=brownie_recipes['Time'].str.extract(r'(\d{1,2}[½¼¾]?)\s?\w+,?\.?\s*plus\s*(\d{1,2}[½¼¾]?)\s\w+')
brownie_recipes

|   | Title | Review Rating | Review Count | Ingredient Count | Time | Bake and Prep Time | Cool Time |
|---|---|---|---|---|---|---|---|
| 0 | Ultimate Brookies | 4.0 | 37 | 21 | 1¾ hours, plus 2 hours cooling | 1¾ | 2 |
| 1 | Millionaire's Shortbread | 4.5 | 842 | 11 | 1¾ hours, plus 1½ hours cooling | 1¾ | 1½ |
| 2 | Ultimate Turtle Brownies | 4.5 | 18 | 19 | 1¾ hours, plus 1½ hours cooling and 2 hours ch... | 1¾ | 1½ |
| 3 | Lemon Bars | 4.5 | 134 | 11 | 1½ hours, plus 2 hours cooling | 1½ | 2 |
| 4 | Chewy Brownies | 4.0 | 306 | 13 | 1 hour, plus 2½ hours cooling | 1 | 2½ |
| 5 | Strawberry Cheesecake Bars | 4.5 | 327 | 12 | 2¾ hours. plus 2½ hours cooling and 4 hours ch... | 2¾ | 2½ |
| 6 | Lemon Cookie Bars | 4.5 | 138 | 14 | 1¼ hours, plus 3 hours cooling | 1¼ | 3 |
| 7 | Browned Butter Blondies | 4.0 | 218 | 11 | 1¼ hours, plus 2 hours cooling | 1¼ | 2 |
| 8 | Vegan Brownies | 4.5 | 89 | 11 | 55 minutes, plus 3 hours cooling | 55 | 3 |
| 9 | Ultranutty Pecan Bars | 4.5 | 389 | 11 | 1 hour, plus 1½ hours cooling | 1 | 1½ |


In [148]:
brownie_recipes = brownie_recipes.drop(columns=['Time'])
brownie_recipes

|   | Title | Review Rating | Review Count | Ingredient Count | Bake and Prep Time | Cool Time |
|---|---|---|---|---|---|---|
| 0 | Ultimate Brookies | 4.0 | 37 | 21 | 1¾ | 2 |
| 1 | Millionaire's Shortbread | 4.5 | 842 | 11 | 1¾ | 1½ |
| 2 | Ultimate Turtle Brownies | 4.5 | 18 | 19 | 1¾ | 1½ |
| 3 | Lemon Bars | 4.5 | 134 | 11 | 1½ | 2 |
| 4 | Chewy Brownies | 4.0 | 306 | 13 | 1 | 2½ |
| 5 | Strawberry Cheesecake Bars | 4.5 | 327 | 12 | 2¾ | 2½ |
| 6 | Lemon Cookie Bars | 4.5 | 138 | 14 | 1¼ | 3 |
| 7 | Browned Butter Blondies | 4.0 | 218 | 11 | 1¼ | 2 |
| 8 | Vegan Brownies | 4.5 | 89 | 11 | 55 | 3 |
| 9 | Ultranutty Pecan Bars | 4.5 | 389 | 11 | 1 | 1½ |


In [149]:
brownie_recipes.dtypes

Title                  object
Review Rating         float64
Review Count            Int64
Ingredient Count        int64
Bake and Prep Time     object
Cool Time              object
dtype: object

In [151]:
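# Values for the Unicode fraction characters that appear in the scraped times
fraction_map = {'¼': 0.25, '½': 0.5, '¾': 0.75}

def convert_to_minutes(value):
    """Convert a scraped time amount such as '1¾' or '55' to minutes.

    The extraction regex drops the unit, so amounts of 12 or less are
    assumed to be hours and larger amounts are assumed to be minutes.
    """
    if pd.isnull(value):
        return None

    value = str(value).strip()

    # A whole number, a fraction character, or both, e.g. '1', '¾', '1¾'
    match = re.match(r'^(\d+)?([¼½¾]?)$', value)
    if match and value:
        whole, frac = match.groups()
        total = (int(whole) if whole else 0) + fraction_map.get(frac, 0)
    else:
        # Fall back to plain decimals such as '1.5'
        try:
            total = float(value)
        except ValueError:
            return None

    return round(total * 60) if total <= 12 else round(total)
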
brownie_recipes['Bake and Prep Time'] = brownie_recipes['Bake and Prep Time'].apply(convert_to_minutes)
brownie_recipes['Cool Time'] = brownie_recipes['Cool Time'].apply(convert_to_minutes)
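`convert_to_minutes` has to guess the unit: any amount of 12 or less is treated as hours. A short time such as "10 minutes" would therefore become 600 minutes. If the unit is captured along with the number (for example, with one more regex group), the guess is unnecessary. The sketch below is minimal and assumes the unit is available. The helper names are hypothetical. It uses `unicodedata.numeric`, so any Unicode vulgar fraction works without a hand-written lookup table.

```python
import unicodedata


def amount_to_number(amount):
    """Convert an amount such as '1¾', '¾', or '55' to a number.

    Leading decimal digits form the whole part; each remaining character is
    treated as a vulgar fraction and looked up with unicodedata.numeric().
    """
    amount = amount.strip()
    i = 0
    while i < len(amount) and amount[i].isdecimal():
        i += 1
    whole = int(amount[:i]) if i else 0
    # unicodedata.numeric('¾') == 0.75; raises ValueError for non-numeric characters
    fraction = sum(unicodedata.numeric(ch) for ch in amount[i:])
    return whole + fraction


def to_minutes(amount, unit):
    """Convert an amount plus an explicit unit ('hour(s)' or 'minute(s)') to whole minutes."""
    value = amount_to_number(amount)
    return round(value * 60) if unit.startswith('hour') else round(value)


if __name__ == "__main__":
    print(to_minutes('2¾', 'hours'), to_minutes('10', 'minutes'))
```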


In [152]:
brownie_recipes.dtypes

Title                  object
Review Rating         float64
Review Count            Int64
Ingredient Count        int64
Bake and Prep Time    float64
Cool Time             float64
dtype: object

In [153]:
brownie_recipes

|   | Title | Review Rating | Review Count | Ingredient Count | Bake and Prep Time | Cool Time |
|---|---|---|---|---|---|---|
| 0 | Ultimate Brookies | 4.0 | 37 | 21 | 105.0 | 120.0 |
| 1 | Millionaire's Shortbread | 4.5 | 842 | 11 | 105.0 | 90.0 |
| 2 | Ultimate Turtle Brownies | 4.5 | 18 | 19 | 105.0 | 90.0 |
| 3 | Lemon Bars | 4.5 | 134 | 11 | 90.0 | 120.0 |
| 4 | Chewy Brownies | 4.0 | 306 | 13 | 60.0 | 150.0 |
| 5 | Strawberry Cheesecake Bars | 4.5 | 327 | 12 | 165.0 | 150.0 |
| 6 | Lemon Cookie Bars | 4.5 | 138 | 14 | 75.0 | 180.0 |
| 7 | Browned Butter Blondies | 4.0 | 218 | 11 | 75.0 | 120.0 |
| 8 | Vegan Brownies | 4.5 | 89 | 11 | 55.0 | 180.0 |
| 9 | Ultranutty Pecan Bars | 4.5 | 389 | 11 | 60.0 | 90.0 |
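Minutes are convenient for sorting and filtering, but a label like "3h 45m" reads better in the app. Below is a small hypothetical helper for display. For example, Ultimate Brookies takes 105 minutes of bake and prep time plus 120 minutes of cooling, which is 225 minutes in total.

```python
def format_minutes(minutes):
    """Render a minute count as a compact label, e.g. 285 -> '4h 45m'."""
    hours, mins = divmod(int(minutes), 60)
    if hours and mins:
        return f"{hours}h {mins}m"
    return f"{hours}h" if hours else f"{mins}m"


if __name__ == "__main__":
    print(format_minutes(105.0 + 120.0))
```

Because the time columns are floats, the helper converts with `int()` first. It could be applied row by row, for example with `.apply(format_minutes)`, on a total-time column.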


In [154]:
brownie_recipes.dropna(inplace=True)
brownie_recipes.reset_index(drop=True, inplace=True)
brownie_recipes

|   | Title | Review Rating | Review Count | Ingredient Count | Bake and Prep Time | Cool Time |
|---|---|---|---|---|---|---|
| 0 | Ultimate Brookies | 4.0 | 37 | 21 | 105.0 | 120.0 |
| 1 | Millionaire's Shortbread | 4.5 | 842 | 11 | 105.0 | 90.0 |
| 2 | Ultimate Turtle Brownies | 4.5 | 18 | 19 | 105.0 | 90.0 |
| 3 | Lemon Bars | 4.5 | 134 | 11 | 90.0 | 120.0 |
| 4 | Chewy Brownies | 4.0 | 306 | 13 | 60.0 | 150.0 |
| 5 | Strawberry Cheesecake Bars | 4.5 | 327 | 12 | 165.0 | 150.0 |
| 6 | Lemon Cookie Bars | 4.5 | 138 | 14 | 75.0 | 180.0 |
| 7 | Browned Butter Blondies | 4.0 | 218 | 11 | 75.0 | 120.0 |
| 8 | Vegan Brownies | 4.5 | 89 | 11 | 55.0 | 180.0 |
| 9 | Ultranutty Pecan Bars | 4.5 | 389 | 11 | 60.0 | 90.0 |


In [155]:
brownie_recipes.to_csv('brownie_recipes.csv', index=False)
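Once `brownie_recipes.csv` is saved, the Streamlit app only needs to load and filter it. Below is a minimal standard-library sketch of that filtering, assuming the column names written above. The helper name and thresholds are hypothetical. In the app, the thresholds might come from `st.slider` widgets, and reading the file with `pd.read_csv` would work just as well.

```python
import csv


def filter_recipes(lines, min_rating=0.0, max_total_minutes=None):
    """Return titles rated at least min_rating whose bake/prep time plus cooling
    time is at most max_total_minutes, best-rated first.

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    selected = []
    for row in csv.DictReader(lines):
        rating = float(row["Review Rating"])
        total = float(row["Bake and Prep Time"]) + float(row["Cool Time"])
        if rating < min_rating:
            continue
        if max_total_minutes is not None and total > max_total_minutes:
            continue
        selected.append((rating, row["Title"]))
    # sort() is stable, so recipes with equal ratings keep their file order
    selected.sort(key=lambda item: item[0], reverse=True)
    return [title for _, title in selected]
```

To use it with the saved file, pass a file handle opened with `newline=''`, as the `csv` module documentation recommends.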