## Web Scraping
> Beautiful Soup

You will need to examine the page https://web.archive.org/web/20211009000648/https://www.marketwatch.com/investing/stock/aapl to understand its structure (tags and classes). For the sake of consistency, we will use a string copy of the page's HTML content. This means we will NOT send a GET request to the URL and will not need the requests library.

Check: [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)  

### Step by Step Explanation
This problem is all about web scraping with BeautifulSoup. Parts a - e are missing a few steps. Take some time to fill in the blanks and understand the operations. Part f is left for you to complete and asks you to combine the operations in parts a - e into one cohesive function.  

In [2]:
#Web scraping package
from bs4 import BeautifulSoup

In [3]:
content="""
<ul class="list list--kv list--col50">
                <li class="kv__item">
                    <small class="label">Open</small>
                    <span class="primary ">$144.03</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Day Range</small>
                    <span class="primary ">142.56 - 144.18</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">52 Week Range</small>
                    <span class="primary ">107.32 - 157.26</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Market Cap</small>
                    <span class="primary ">$2.37T</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Shares Outstanding</small>
                    <span class="primary ">17.34B</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Public Float</small>
                    <span class="primary ">16.51B</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Beta</small>
                    <span class="primary ">1.20</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Rev. per Employee</small>
                    <span class="primary ">$1.865M</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">P/E Ratio</small>
                    <span class="primary ">27.99</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">EPS</small>
                    <span class="primary ">$5.11</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Yield</small>
                    <span class="primary ">0.62%</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Dividend</small>
                    <span class="primary ">$0.22</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Ex-Dividend Date</small>
                    <span class="primary ">Aug 6, 2021</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Short Interest</small>
                    <span class="primary ">100.93M</span>
                    <span class="secondary ">09/15/21</span>
                </li>
                <li class="kv__item">
                    <small class="label">% of Float Shorted</small>
                    <span class="primary ">0.61%</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Average Volume</small>
                    <span class="primary ">80.12M</span>
                    <span class="secondary no-value"></span>
                </li>
        </ul>
"""

a. Using the HTML string above, create a BeautifulSoup object called "results_page", parsed with 'html.parser'.

In [4]:
# build the BeautifulSoup object and name it results_page
results_page = BeautifulSoup(content,'html.parser')
print(type(results_page))

<class 'bs4.BeautifulSoup'>
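Before searching, it helps to know how Beautiful Soup stores the `class` attribute. The markup above writes values such as `class="primary "` (with a trailing space), yet a search for `class_="primary"` still matches, and printed tags show `class="primary"`. A minimal check on a tiny made-up snippet, relying on Beautiful Soup 4 treating `class` as a multi-valued attribute:

```python
from bs4 import BeautifulSoup

# A tiny snippet shaped like one item of the markup above
snippet = '<li class="kv__item"><span class="primary ">$144.03</span></li>'
soup = BeautifulSoup(snippet, 'html.parser')

span = soup.find('span')
# "class" is multi-valued, so its value is split on whitespace into a list
print(span['class'])  # ['primary']
# class_ matches if any class in that list equals "primary"
print(soup.find('span', class_='primary') is span)  # True
# When printed, the list is joined back with single spaces
print(span)  # <span class="primary">$144.03</span>
```

This is why the expected output in the next parts shows `class="primary"` without the trailing space.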


b. On the webpage https://web.archive.org/web/20211009000648/https://www.marketwatch.com/investing/stock/aapl, look for the section called "Key Data".
![key_data](https://drive.google.com/uc?id=1ZMDc-Q000J5fksE28mMfyIJbFQHvlLUe)

Inspect the page and verify that the tag/class combination that uniquely identifies each item in this table is 'li'/'kv__item'. Find all elements with the tag 'li' and class_='kv__item'. PLEASE NOTE THE **TWO** UNDERSCORES in kv__item.

In [5]:
# find all elements with the tag "li" and class_="kv__item"
key_data = results_page.find_all('li', class_="kv__item")
# print the data type of the key_data object
print(type(key_data))

<class 'bs4.element.ResultSet'>
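A `ResultSet` is a subclass of Python's `list`, so it supports `len()`, indexing, and iteration. The same elements can also be selected with a CSS selector through `select()`. A short sketch on a two-item excerpt of the markup:

```python
from bs4 import BeautifulSoup

html = """
<ul class="list list--kv list--col50">
  <li class="kv__item"><small class="label">Open</small><span class="primary ">$144.03</span></li>
  <li class="kv__item"><small class="label">Beta</small><span class="primary ">1.20</span></li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

items = soup.find_all('li', class_='kv__item')
print(isinstance(items, list), len(items))  # True 2

# CSS selector for: tag "li" that has the class "kv__item"
same_items = soup.select('li.kv__item')
print(items == same_items)  # True
```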


c. The code for the previous part should have returned a list-like ResultSet. Print the first item in this list. You should get:
```
<li class="kv__item">
<small class="label">Open</small>
<span class="primary">$144.03</span>
<span class="secondary no-value"></span>
</li>
```

In [None]:
# print the first element in "key_data"
print(key_data[0])

<li class="kv__item">
<small class="label">Open</small>
<span class="primary">$144.03</span>
<span class="secondary no-value"></span>
</li>


d. Extract the label "Open" and the value "$144.03".

![open_price](https://drive.google.com/uc?id=1OLQ3lVUd84dVytXft1K1GPffcb-ejbxq)

Note:  values may differ.

In [None]:
# label=key_data[0].----(----, class_="label").get_text()
label = key_data[0].find('small', class_="label").get_text()
primary_val = key_data[0].find('span', class_="primary").get_text()
print(label)
print(primary_val)

Open
$144.03
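As an alternative to two separate `find` calls, a tag's `stripped_strings` generator yields every non-blank text fragment inside it, with surrounding whitespace removed. For an item shaped like the first `kv__item`, that returns the label and the value (the empty `secondary` span contributes nothing). This assumes the label precedes the value in the markup, which holds here but is an assumption about the page layout:

```python
from bs4 import BeautifulSoup

first_item_html = """
<li class="kv__item">
    <small class="label">Open</small>
    <span class="primary ">$144.03</span>
    <span class="secondary no-value"></span>
</li>
"""
item = BeautifulSoup(first_item_html, 'html.parser').find('li')

# Every non-whitespace string inside the <li>, stripped, in document order
parts = list(item.stripped_strings)
print(parts)  # ['Open', '$144.03']
```

An item with a non-empty `secondary` span, such as Short Interest (which carries the date `09/15/21`), would yield three strings, so the targeted `find` calls are the more precise choice for this page.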


e. We need to extract the text of every item in the key_data object. Write a for loop that extracts each label and primary value, building a list of the form [(label, value), (label, value), ...].

In [None]:
kd_list=[]
for item in key_data:
    # label=item.find('small', class_=----).----()
    label = item.find('small', class_="label").get_text()
    # value=item.----(----,class_="primary").get_text()
    value = item.find('span', class_="primary").get_text()
    # kd_list.----((label,value))
    kd_list.append((label,value))
print(kd_list)

[('Open', '$144.03'), ('Day Range', '142.56 - 144.18'), ('52 Week Range', '107.32 - 157.26'), ('Market Cap', '$2.37T'), ('Shares Outstanding', '17.34B'), ('Public Float', '16.51B'), ('Beta', '1.20'), ('Rev. per Employee', '$1.865M'), ('P/E Ratio', '27.99'), ('EPS', '$5.11'), ('Yield', '0.62%'), ('Dividend', '$0.22'), ('Ex-Dividend Date', 'Aug 6, 2021'), ('Short Interest', '100.93M'), ('% of Float Shorted', '0.61%'), ('Average Volume', '80.12M')]
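The same loop can be written as a list comprehension, and because each entry is a `(label, value)` pair, `dict()` turns the result into a lookup table keyed by label. A short sketch on a two-item excerpt:

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li class="kv__item"><small class="label">Open</small><span class="primary ">$144.03</span></li>
  <li class="kv__item"><small class="label">P/E Ratio</small><span class="primary ">27.99</span></li>
</ul>
"""
key_data = BeautifulSoup(html, 'html.parser').find_all('li', class_='kv__item')

# One (label, value) tuple per <li>, same result as the for loop
kd_list = [
    (item.find('small', class_='label').get_text(),
     item.find('span', class_='primary').get_text())
    for item in key_data
]
print(kd_list)  # [('Open', '$144.03'), ('P/E Ratio', '27.99')]

# Look up a value by its label
kd_dict = dict(kd_list)
print(kd_dict['P/E Ratio'])  # 27.99
```

If a label ever appeared twice, `dict()` would keep only the last value, so the list form is the safer return type.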


f. **BUILD THE FUNCTION.** Combine all of the above into a function that accepts HTML source code as an argument and returns the list of (label, value) pairs in the form:

```
[('Open', '$144.03'), ('Day Range', '142.56 - 144.18'), ('52 Week Range', '107.32 - 157.26'), ('Market Cap', '$2.37T'), ('Shares Outstanding', '17.34B'), ('Public Float', '16.51B'), ('Beta', '1.20'), ('Rev. per Employee', '$1.865M'), ('P/E Ratio', '27.99'), ('EPS', '$5.11'), ('Yield', '0.62%'), ('Dividend', '$0.22'), ('Ex-Dividend Date', 'Aug 6, 2021'), ('Short Interest', '100.93M'), ('% of Float Shorted', '0.61%'), ('Average Volume', '80.12M')]
```

Please make sure to define and reference the function parameter so that your program works correctly with multiple different inputs.

### Complete Code

In [6]:
# Web scraping package
from bs4 import BeautifulSoup

# def get_key_data(---): # add a parameter
def get_key_data(source):
    # Parse the HTML source using BeautifulSoup
    results_page = BeautifulSoup(source, 'html.parser')

    # Find all key data
    key_data = results_page.find_all('li', class_="kv__item")
    kd_list = []
    for item in key_data:
        label = item.find('small', class_="label").get_text()
        value = item.find('span', class_="primary").get_text()
        kd_list.append((label, value))
    return kd_list

In [7]:
aapl="""
<ul class="list list--kv list--col50">
                <li class="kv__item">
                    <small class="label">Open</small>
                    <span class="primary ">$144.03</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Day Range</small>
                    <span class="primary ">142.56 - 144.18</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">52 Week Range</small>
                    <span class="primary ">107.32 - 157.26</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Market Cap</small>
                    <span class="primary ">$2.37T</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Shares Outstanding</small>
                    <span class="primary ">17.34B</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Public Float</small>
                    <span class="primary ">16.51B</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Beta</small>
                    <span class="primary ">1.20</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Rev. per Employee</small>
                    <span class="primary ">$1.865M</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">P/E Ratio</small>
                    <span class="primary ">27.99</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">EPS</small>
                    <span class="primary ">$5.11</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Yield</small>
                    <span class="primary ">0.62%</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Dividend</small>
                    <span class="primary ">$0.22</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Ex-Dividend Date</small>
                    <span class="primary ">Aug 6, 2021</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Short Interest</small>
                    <span class="primary ">100.93M</span>
                    <span class="secondary ">09/15/21</span>
                </li>
                <li class="kv__item">
                    <small class="label">% of Float Shorted</small>
                    <span class="primary ">0.61%</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Average Volume</small>
                    <span class="primary ">80.12M</span>
                    <span class="secondary no-value"></span>
                </li>
        </ul>
"""

In [8]:
results_a = get_key_data(aapl)
print(results_a)

[('Open', '$144.03'), ('Day Range', '142.56 - 144.18'), ('52 Week Range', '107.32 - 157.26'), ('Market Cap', '$2.37T'), ('Shares Outstanding', '17.34B'), ('Public Float', '16.51B'), ('Beta', '1.20'), ('Rev. per Employee', '$1.865M'), ('P/E Ratio', '27.99'), ('EPS', '$5.11'), ('Yield', '0.62%'), ('Dividend', '$0.22'), ('Ex-Dividend Date', 'Aug 6, 2021'), ('Short Interest', '100.93M'), ('% of Float Shorted', '0.61%'), ('Average Volume', '80.12M')]
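On a live page, an item might lack a `label` or `primary` element; `find()` returns `None` in that case, and calling `.get_text()` on `None` raises `AttributeError`. A defensive variant, sketched under that assumption (the name `get_key_data_safe` is hypothetical, not part of the exercise), skips such items and strips stray whitespace:

```python
from bs4 import BeautifulSoup

def get_key_data_safe(source):
    """Hypothetical variant of get_key_data that tolerates incomplete items."""
    soup = BeautifulSoup(source, 'html.parser')
    kd_list = []
    for item in soup.find_all('li', class_='kv__item'):
        label_tag = item.find('small', class_='label')
        value_tag = item.find('span', class_='primary')
        # find() returns None when nothing matches; skip those items
        if label_tag is None or value_tag is None:
            continue
        # strip=True removes leading/trailing whitespace from the text
        kd_list.append((label_tag.get_text(strip=True),
                        value_tag.get_text(strip=True)))
    return kd_list

sample = """
<ul>
  <li class="kv__item"><small class="label">Open</small><span class="primary "> $144.03 </span></li>
  <li class="kv__item"><small class="label">Broken item</small></li>
</ul>
"""
print(get_key_data_safe(sample))  # [('Open', '$144.03')]
```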


In [3]:
goog="""
<ul class="list list--kv list--col50">
                <li class="kv__item">
                    <small class="label">Open</small>
                    <span class="primary ">$2,798.12</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Day Range</small>
                    <span class="primary ">2,788.59 - 2,806.34</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">52 Week Range</small>
                    <span class="primary ">1,489.45 - 2,936.41</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Market Cap</small>
                    <span class="primary ">$1.86T</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Shares Outstanding</small>
                    <span class="primary ">320.17M</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Public Float</small>
                    <span class="primary ">279.85M</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Beta</small>
                    <span class="primary ">1.08</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Rev. per Employee</small>
                    <span class="primary ">$1.348M</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">P/E Ratio</small>
                    <span class="primary ">30.39</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">EPS</small>
                    <span class="primary ">$92.24</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Yield</small>
                    <span class="primary is-na">N/A</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Dividend</small>
                    <span class="primary is-na">N/A</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Ex-Dividend Date</small>
                    <span class="primary is-na">N/A</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Short Interest</small>
                    <span class="primary ">2.64M</span>
                    <span class="secondary ">09/15/21</span>
                </li>
                <li class="kv__item">
                    <small class="label">% of Float Shorted</small>
                    <span class="primary ">0.94%</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Average Volume</small>
                    <span class="primary ">1.05M</span>
                    <span class="secondary no-value"></span>
                </li>
        </ul>
"""

In [4]:
results_g = get_key_data(goog)
print(results_g)

[('Open', '$2,798.12'), ('Day Range', '2,788.59 - 2,806.34'), ('52 Week Range', '1,489.45 - 2,936.41'), ('Market Cap', '$1.86T'), ('Shares Outstanding', '320.17M'), ('Public Float', '279.85M'), ('Beta', '1.08'), ('Rev. per Employee', '$1.348M'), ('P/E Ratio', '30.39'), ('EPS', '$92.24'), ('Yield', 'N/A'), ('Dividend', 'N/A'), ('Ex-Dividend Date', 'N/A'), ('Short Interest', '2.64M'), ('% of Float Shorted', '0.94%'), ('Average Volume', '1.05M')]
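The GOOG output shows `'N/A'` for Yield, Dividend, and Ex-Dividend Date. Those spans carry `class="primary is-na"`, and because `class` is multi-valued, `class_="primary"` still matches them. If missing values should become `None` rather than the string `'N/A'`, one option, assuming the page applies the `is-na` class consistently, is to check for that class:

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li class="kv__item"><small class="label">EPS</small><span class="primary ">$92.24</span></li>
  <li class="kv__item"><small class="label">Yield</small><span class="primary is-na">N/A</span></li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

pairs = []
for item in soup.find_all('li', class_='kv__item'):
    label = item.find('small', class_='label').get_text()
    value_tag = item.find('span', class_='primary')
    # value_tag['class'] is a list, e.g. ['primary', 'is-na']
    value = None if 'is-na' in value_tag['class'] else value_tag.get_text()
    pairs.append((label, value))

print(pairs)  # [('EPS', '$92.24'), ('Yield', None)]
```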


In [12]:
msft="""
<ul class="list list--kv list--col50">
                <li class="kv__item">
                    <small class="label">Open</small>
                    <span class="primary ">$296.22</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Day Range</small>
                    <span class="primary ">293.76 - 296.64</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">52 Week Range</small>
                    <span class="primary ">199.62 - 305.84</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Market Cap</small>
                    <span class="primary ">$2.22T</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Shares Outstanding</small>
                    <span class="primary ">7.51B</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Public Float</small>
                    <span class="primary ">7.4B</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Beta</small>
                    <span class="primary ">1.19</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Rev. per Employee</small>
                    <span class="primary ">$928.66K</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">P/E Ratio</small>
                    <span class="primary ">36.48</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">EPS</small>
                    <span class="primary ">$8.06</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Yield</small>
                    <span class="primary ">0.84%</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Dividend</small>
                    <span class="primary ">$0.62</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Ex-Dividend Date</small>
                    <span class="primary ">Nov 17, 2021</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Short Interest</small>
                    <span class="primary ">44.2M</span>
                    <span class="secondary ">09/15/21</span>
                </li>
                <li class="kv__item">
                    <small class="label">% of Float Shorted</small>
                    <span class="primary ">0.60%</span>
                    <span class="secondary no-value"></span>
                </li>
                <li class="kv__item">
                    <small class="label">Average Volume</small>
                    <span class="primary ">22.76M</span>
                    <span class="secondary no-value"></span>
                </li>
        </ul>
"""

In [13]:
results_m = get_key_data(msft)
print(results_m)

[('Open', '$296.22'), ('Day Range', '293.76 - 296.64'), ('52 Week Range', '199.62 - 305.84'), ('Market Cap', '$2.22T'), ('Shares Outstanding', '7.51B'), ('Public Float', '7.4B'), ('Beta', '1.19'), ('Rev. per Employee', '$928.66K'), ('P/E Ratio', '36.48'), ('EPS', '$8.06'), ('Yield', '0.84%'), ('Dividend', '$0.62'), ('Ex-Dividend Date', 'Nov 17, 2021'), ('Short Interest', '44.2M'), ('% of Float Shorted', '0.60%'), ('Average Volume', '22.76M')]


### Example: Function to Find Recipes Given Ingredients
Now that we've built the key data scraper step by step, we can apply the same pattern to a new page and write a complete function that finds recipes matching the user's ingredients.

**Summary:**

**Step 1:** Parse the HTML content using BeautifulSoup.   
**Step 2:** Find the elements that contain recipe data.   
**Step 3:** Extract relevant information (title, description, and ingredients).   
**Step 4:** Check if all of the user-provided ingredients are present in the recipe (case-insensitive, partial match).   

In [10]:
content = """
<html>
<body>
    <h1>Recipe Listings</h1>
    <div class="recipe">
        <p class="title">Apple Pie</p>
        <p class="description">A classic apple pie recipe with a flaky crust and sweet apple filling.</p>
        <div class="ingredients">
            <strong>Ingredients:</strong>
            <ul class="ingredient-list">
                <li>2 cups all-purpose flour</li>
                <li>1/2 cup sugar</li>
                <li>1 tsp cinnamon</li>
                <li>4 large apples, peeled and sliced</li>
                <li>1/4 cup butter</li>
                <li>1 egg, beaten</li>
            </ul>
        </div>
    </div>
    <div class="recipe">
        <p class="title">Spaghetti Carbonara</p>
        <p class="description">A rich and creamy pasta dish made with eggs, cheese, pancetta, and pepper.</p>
        <div class="ingredients">
            <strong>Ingredients:</strong>
            <ul class="ingredient-list">
                <li>200g spaghetti</li>
                <li>100g pancetta</li>
                <li>2 large eggs</li>
                <li>50g Parmesan cheese, grated</li>
                <li>2 garlic cloves, minced</li>
                <li>Freshly ground black pepper</li>
            </ul>
        </div>
    </div>
    <div class="recipe">
        <p class="title">Chocolate Cake</p>
        <p class="description">A moist and rich chocolate cake with a creamy chocolate frosting.</p>
        <div class="ingredients">
            <strong>Ingredients:</strong>
            <ul class="ingredient-list">
                <li>1 and 1/2 cups flour</li>
                <li>1 cup sugar</li>
                <li>1/2 cup cocoa powder</li>
                <li>1 tsp baking soda</li>
                <li>1/2 tsp salt</li>
                <li>1/3 cup vegetable oil</li>
                <li>1 cup water</li>
                <li>1 tsp vanilla extract</li>
            </ul>
        </div>
    </div>
</body>
</html>
"""

In [11]:
from bs4 import BeautifulSoup

def find_recipes_by_ingredients(source, ingredients):
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(source, 'html.parser')

    # Find all recipes on the page
    recipes = soup.find_all('div', class_='recipe')

    # Initialize an empty list to store matching recipes
    matching_recipes = []

    # Loop through each recipe to check if it matches the ingredients
    for recipe in recipes:
        # Extract the recipe title, description, and ingredients
        title = recipe.find('p', class_='title').text
        description = recipe.find('p', class_='description').text
        ingredients_list = []

        # Extract each ingredient and convert to lowercase
        for ingredient in recipe.find_all('li'):
            ingredients_list.append(ingredient.text.lower())

        # Check if all user-provided ingredients are in the recipe (using partial match)
        matching = True
        for ingredient in ingredients:
            found = False  # A flag to track if the current ingredient is found in the recipe
            for item in ingredients_list:  # Loop through each item in the recipe's ingredients list
                if ingredient.lower() in item:  # Check if the ingredient is a substring of the item
                    found = True  # Mark as found
                    break  # Stop searching once the ingredient is found in the list
            if not found:  # If the ingredient wasn't found in any item
                matching = False  # The recipe does not match
                break  # Exit the loop early since not all ingredients are found

        # If all ingredients match, add the recipe to the matching_recipes list
        if matching:
            matching_recipes.append({
                'title': title,
                'description': description,
                'ingredients': ingredients_list
            })

    # Return the matching recipes or a message if no recipes match
    if matching_recipes:
        return matching_recipes
    else:
        return "No recipes found with the given ingredients."

ingredients_to_search = ["butter", "flour", "apple"]
find_recipes_by_ingredients(content, ingredients_to_search)

[{'title': 'Apple Pie',
  'description': 'A classic apple pie recipe with a flaky crust and sweet apple filling.',
  'ingredients': ['2 cups all-purpose flour',
   '1/2 cup sugar',
   '1 tsp cinnamon',
   '4 large apples, peeled and sliced',
   '1/4 cup butter',
   '1 egg, beaten']}]
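Step 4's nested loop with the `found` and `matching` flags expresses "every requested ingredient appears in at least one recipe line". Python's built-in `all()` and `any()` state that directly and also stop early. A standalone sketch of just the matching test (the helper name `recipe_matches` is hypothetical):

```python
def recipe_matches(ingredients_list, ingredients):
    """True if every requested ingredient is a substring of some recipe line."""
    return all(
        any(wanted.lower() in line for line in ingredients_list)
        for wanted in ingredients
    )

# Lowercased lines, as built inside find_recipes_by_ingredients
apple_pie = ['2 cups all-purpose flour', '1/2 cup sugar', '1 tsp cinnamon',
             '4 large apples, peeled and sliced', '1/4 cup butter', '1 egg, beaten']

print(recipe_matches(apple_pie, ['butter', 'flour', 'apple']))  # True
print(recipe_matches(apple_pie, ['butter', 'cocoa']))           # False
```

Like the original loop, an empty ingredient list matches every recipe, since `all()` of nothing is `True`.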

## Exercises:

### Exercise: Web Scraping Job Listings


In this exercise, we will apply the concepts of web scraping using **BeautifulSoup** to extract job listing data from an HTML source. Web scraping is a critical skill in today's data-driven business environment, as it enables organizations to automate data collection from websites. This technique can be especially useful for gathering market intelligence, monitoring job trends, and analyzing competitive information.

**Business Relevance:**
In a competitive job market, businesses and recruitment agencies need up-to-date job market information. They often gather job listings across locations and industries to understand salary ranges, demand for specific roles, and emerging job markets. Automating the extraction of this data saves time, reduces errors, and allows for a broader collection of data points that can be used in business decisions.

For example:
- **Recruitment firms** may want to gather job openings in specific cities to better match candidates.
- **Businesses** may track competitors' job postings to gain insights into their hiring strategies or the expansion of their operations into new locations.
- **Job seekers** may want to track salaries and job openings in specific regions to guide their applications.

In this exercise, you will use BeautifulSoup to scrape job listing data such as job titles, companies, locations, and salaries, and filter the results based on user-provided locations.

**Steps:**
1. **Parse the HTML Content**: Use BeautifulSoup to parse the HTML structure and extract job listings.
2. **Extract Job Information**: For each job listing, extract key data points such as job title, company, location, and salary.
3. **Filter by Location**: Match job listings against a list of user-provided locations using case-insensitive, partial (substring) comparisons.
4. **Return the Results**: Return the matching jobs, or a message if no jobs match the provided locations.

This exercise will give you hands-on experience in building a function that automates the process of job data collection based on location, a task that is valuable for businesses and individuals alike.

---

**Example Usage:**
In the example below, the function scrapes job listings and filters them by the locations `"New York"` and `"Austin"`. You can adjust the list of locations to target other cities.

```python
locations_to_search = ['New York', 'Austin']
find_jobs_by_location(job_content, locations_to_search)
```
**Expected Output:**
```
[{'title': 'Business Intelligence Developer',
  'company': 'Data Insights LLC',
  'location': 'New York, NY',
  'salary': '$105,000 - $125,000'},
 {'title': 'Data Scientist',
  'company': 'AI Ventures',
  'location': 'Austin, TX',
  'salary': '$120,000 - $150,000'}]
```


In [12]:
job_content = """
<html>
<body>
    <h1>Job Listings</h1>

    <div class="listing">
        <p class="title">Data Analyst</p>
        <p class="company">TechCorp Inc.</p>
        <p class="location">San Francisco, CA</p>
        <p class="salary">$90,000 - $110,000</p>
    </div>

    <div class="listing">
        <p class="title">Business Intelligence Developer</p>
        <p class="company">Data Insights LLC</p>
        <p class="location">New York, NY</p>
        <p class="salary">$105,000 - $125,000</p>
    </div>

    <div class="listing">
        <p class="title">Product Manager</p>
        <p class="company">Innovate Solutions</p>
        <p class="location">Boston, MA</p>
        <p class="salary">$110,000 - $130,000</p>
    </div>

    <div class="listing">
        <p class="title">Data Scientist</p>
        <p class="company">AI Ventures</p>
        <p class="location">Austin, TX</p>
        <p class="salary">$120,000 - $150,000</p>
    </div>

</body>
</html>
"""

In [13]:
from bs4 import BeautifulSoup

def find_jobs_by_location(source, locations):
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(source, 'html.parser')

    # Find all job listings on the page
    listings = soup.find_all('div', class_='listing')

    # Initialize an empty list to store matching job listings
    matching_jobs = []

    # Loop through each job listing to check if it matches the location
    for listing in listings:
        # Extract the job title, company, location, and salary
        title = listing.find('p', class_='title').text
        company = listing.find('p', class_='company').text
        location = listing.find('p', class_='location').text
        salary = listing.find('p', class_='salary').text

        # Check if the job's location matches any user-provided location
        matching = False
        for loc in locations:
            if loc.lower() in location.lower():  # Case-insensitive match
                matching = True
                break  # If the location matches, stop checking

        # If the location matches, add the job to the matching_jobs list
        if matching:
            matching_jobs.append({
                'title': title,
                'company': company,
                'location': location,
                'salary': salary
            })

    # Return the matching jobs or a message if no jobs match
    if matching_jobs:
        return matching_jobs
    else:
        return "No jobs found for the given locations."

locations_to_search = ["New York", "Austin"]
find_jobs_by_location(job_content, locations_to_search)

[{'title': 'Business Intelligence Developer',
  'company': 'Data Insights LLC',
  'location': 'New York, NY',
  'salary': '$105,000 - $125,000'},
 {'title': 'Data Scientist',
  'company': 'AI Ventures',
  'location': 'Austin, TX',
  'salary': '$120,000 - $150,000'}]
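The location check above uses a flag variable and an inner `break`. Python's built-in `any()` expresses the same short-circuiting test in one line. The sketch below assumes the listings have already been extracted into dictionaries with the same `title`/`company`/`location`/`salary` keys that the loop above builds. `filter_by_location` is a hypothetical helper name, not part of the exercise. It also returns an empty list instead of a message string when nothing matches, so callers always get a list back.

```python
from typing import Dict, List


# Hypothetical helper: filters job dicts already extracted by the scraping loop.
def filter_by_location(jobs: List[Dict[str, str]], locations: List[str]) -> List[Dict[str, str]]:
    # Lower-case the search terms once instead of on every comparison
    wanted = [loc.lower() for loc in locations]
    # any() stops at the first matching location, like the flag-and-break loop
    return [job for job in jobs if any(loc in job["location"].lower() for loc in wanted)]


# Sample data mirroring two of the listings in job_content
jobs = [
    {"title": "Data Analyst", "company": "TechCorp Inc.",
     "location": "San Francisco, CA", "salary": "$90,000 - $110,000"},
    {"title": "Data Scientist", "company": "AI Ventures",
     "location": "Austin, TX", "salary": "$120,000 - $150,000"},
]

print(filter_by_location(jobs, ["austin"]))   # only the Data Scientist listing
print(filter_by_location(jobs, ["Chicago"]))  # []
```

Because an empty list is falsy, a caller can still write `if not filter_by_location(...)` and print a "No jobs found" message where one is needed.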

### Exercise: List Common Elements
Write a function that takes two lists and returns a list containing only the elements common to both (without duplicates). Make sure your function works on two lists of different sizes.

a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

print(common_elements(a,b)) # should output [1,2,3,5,8,13]

Hints:

```
x=[1,2,3,4]
if 4 in x:
 print("yes")

if 5 not in x:
 print("no")
```

In [14]:
def common_elements(a, b):
    c = []
    for i in a:
        # Keep i only if it also appears in b and has not been added to c yet
        if i in b and i not in c:
            c.append(i)
    return c
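In `common_elements`, both `i in b` and `i not in c` scan a list, so the work grows roughly with `len(a) * len(b)`. A sketch of a faster variant (the name `common_elements_fast` is hypothetical) builds a set from `b` once for average O(1) lookups and uses `dict.fromkeys` to drop duplicates from `a` while keeping first-seen order (dicts preserve insertion order from Python 3.7). Unlike the loop version, it assumes the elements are hashable.

```python
def common_elements_fast(a, b):
    # Build a set from b once so each membership test is O(1) on average
    b_set = set(b)
    # dict.fromkeys(a) removes duplicates from a but keeps first-seen order
    return [x for x in dict.fromkeys(a) if x in b_set]


a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(common_elements_fast(a, b))  # [1, 2, 3, 5, 8, 13]
```

Because the result follows the order of `a`, it matches the expected output exactly, whereas a plain `list(set(a) & set(b))` gives no ordering guarantee.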

In [26]:
# Intersection: elements present in both lists (the order of the result is not guaranteed)
def intersect_elements(a, b):
    return list(set(a) & set(b))

In [19]:
# Combines all elements from both sets, removing duplicates.
def union_elements(a, b):
    return list(set(a) | set(b))

In [20]:
# Finds elements in a that are not in b.
def difference_elements(a, b):
    return list(set(a) - set(b))

In [17]:
# Returns elements that are in either a or b, but not in both (i.e., removes common elements).
def symmetric_difference_elements(a, b):
    return list(set(a) ^ set(b))
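The symmetric difference can be checked against the other set operations: it is exactly the union with the intersection removed, or equivalently the two one-sided differences combined. A small worked check on the Test 1 lists, using the plain set operators rather than the wrapper functions so the block runs on its own:

```python
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
sa, sb = set(a), set(b)

# Union minus intersection leaves the elements that are in exactly one set
print(sa ^ sb == (sa | sb) - (sa & sb))  # True
# Combining the two one-sided differences gives the same set
print(sa ^ sb == (sa - sb) | (sb - sa))  # True
# Elements shared by both lists never appear in the symmetric difference
print((sa ^ sb) & (sa & sb))  # set()
```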

In [21]:
# is_subset: True if every element of a is also in b.
def is_subset(a, b):
    return set(a).issubset(set(b))

# is_superset: True if a contains every element of b.
def is_superset(a, b):
    return set(a).issuperset(set(b))
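Python sets also expose these checks as comparison operators: `<=` is `issubset`, `>=` is `issuperset`, and `<` / `>` test for a proper subset or superset. Note the argument order in `is_superset(a, b)`: it asks whether `a` contains all of `b`, so with the Test 3 lists `is_superset(e, f)` is False while `is_superset(f, e)` would be True. A short sketch on those lists, also showing the related `isdisjoint` method:

```python
e = [1, 2, 3, 5, 8, 13]
f = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
se, sf = set(e), set(f)

print(se <= sf)                 # True: same as se.issubset(sf)
print(se < sf)                  # True: proper subset, sf has extra elements
print(sf >= se)                 # True: same as sf.issuperset(se)
print(se <= se, se < se)        # True False: a set is a subset, not a proper subset, of itself
print(se.isdisjoint({21, 34}))  # True: no elements in common
```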

In [27]:
# Test 1
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

print(common_elements(a,b)) # should output [1,2,3,5,8,13]
print(intersect_elements(a,b))
print(union_elements(a,b))
print(difference_elements(a,b))
print(symmetric_difference_elements(a,b))
print(is_subset(a,b))
print(is_superset(a,b))

[1, 2, 3, 5, 8, 13]
[1, 2, 3, 5, 8, 13]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 21, 89, 34, 55]
[89, 34, 21, 55]
[34, 4, 6, 7, 9, 10, 11, 12, 21, 55, 89]
False
False
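The list results above come out in set iteration order, not sorted order: the union, for instance, ends with `21, 89, 34, 55`. Python makes no ordering promise for sets, and for strings (as in Test 2) the order can even change between runs because string hashing is randomized by default. When the elements are comparable, wrapping the result in `sorted()` gives a reproducible order:

```python
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

# sorted() accepts any iterable, including a set, and returns a new list
print(sorted(set(a) | set(b)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 21, 34, 55, 89]
print(sorted(set(a) - set(b)))  # [21, 34, 55, 89]
print(sorted(set(a) ^ set(b)))  # [4, 6, 7, 9, 10, 11, 12, 21, 34, 55, 89]
```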


In [24]:
# Test 2
x = ["Apple", "Pear", "Orange", "Peach", "Grapes", "Lemon", "Lime"]
y = ["Apple", "Tangerine", "Orange", "Strawberries", "Blackberries", "Lemon", "Lime"]

print(common_elements(x,y)) # should output ["Apple", "Orange", "Lemon", "Lime"]
print(intersect_elements(x,y))
print(union_elements(x,y))
print(difference_elements(x,y))
print(symmetric_difference_elements(x,y))
print(is_subset(x,y))
print(is_superset(x,y))

['Apple', 'Orange', 'Lemon', 'Lime']
['Apple', 'Grapes', 'Lemon', 'Lime', 'Orange', 'Strawberries', 'Pear', 'Tangerine', 'Blackberries', 'Peach']
['Peach', 'Pear', 'Grapes']
['Grapes', 'Strawberries', 'Pear', 'Tangerine', 'Blackberries', 'Peach']
False
False


In [28]:
# Test 3
e = [1, 2, 3, 5, 8, 13]
f = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

print(common_elements(e,f)) # should output [1,2,3,5,8,13]
print(intersect_elements(e,f))
print(union_elements(e,f))
print(difference_elements(e,f))
print(symmetric_difference_elements(e,f))
print(is_subset(e,f))
print(is_superset(e,f))

[1, 2, 3, 5, 8, 13]
[1, 2, 3, 5, 8, 13]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
[]
[4, 6, 7, 9, 10, 11, 12]
True
False


### Exercise: ATM Code Check
ATMs allow 4- or 6-digit PIN codes, and a PIN code cannot contain anything but exactly 4 digits or exactly 6 digits. Your task is to create a function that takes a string and returns True if the PIN is valid and False if it is not.

Hint: https://www.w3schools.com/python/ref_string_isdigit.asp

In [34]:
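def is_valid_PIN(pin):
    # Valid only if every character is a digit and the length is exactly 4 or 6
    if pin.isdigit() and (len(pin) == 4 or len(pin) == 6):
        return True
    return False

# Equivalent one-liner:
# def is_valid_PIN(pin):
#     return len(pin) in [4, 6] and pin.isdigit()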

In [35]:
# Tests

print(is_valid_PIN("1234")) # True
print(is_valid_PIN("12345")) # False
print(is_valid_PIN("a234")) # False
print(is_valid_PIN("")) # False
# Empty strings must return False.

True
False
False
False
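One subtlety: `str.isdigit()` is true for any Unicode digit character, not only `0`-`9`, so `is_valid_PIN` also accepts a string of four Arabic-Indic or full-width digits. If only ASCII digits should count (an assumption, since the exercise does not say), a regular expression with an explicit `[0-9]` class is a straightforward alternative. `is_valid_pin_ascii` is a hypothetical name for this sketch.

```python
import re

# Exactly four ASCII digits, optionally followed by two more (i.e. 4 or 6 digits).
# [0-9] is used instead of \d because \d also matches non-ASCII digits in str patterns.
_PIN_PATTERN = re.compile(r"[0-9]{4}(?:[0-9]{2})?")


def is_valid_pin_ascii(pin):
    # fullmatch requires the whole string to match, so "12345" and "1234\n" fail
    return _PIN_PATTERN.fullmatch(pin) is not None


arabic_indic = "\u0661\u0662\u0663\u0664"  # Arabic-Indic digits 1-4

print(is_valid_pin_ascii("1234"))      # True
print(is_valid_pin_ascii("123456"))    # True
print(is_valid_pin_ascii("12345"))     # False
print(is_valid_pin_ascii(arabic_indic), arabic_indic.isdigit())  # False True
```

Without a regex, `pin.isascii() and pin.isdigit() and len(pin) in (4, 6)` gives the same result on Python 3.7 and later, where `str.isascii()` is available.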
