# **Scraping a Site**
### Extract links for all countries from the website https://starbucksmenuprices.com/.


## **Step 1: Install Necessary Libraries**
Python libraries extend the functionality of Python. Here, we need:
*   requests: To fetch the webpage.
*   BeautifulSoup (from bs4): To parse and extract data from the HTML.
*   pandas: To store the results in a table and save them as CSV (used from Step 7 onward).
### **Command:**

In [None]:
# pip install requests beautifulsoup4 pandas

## **Step 2: Import Libraries**
We start the Python script by importing the libraries we installed. Think of this as unlocking tools we’ll need for our task.
*   **import requests**: Enables us to send a request to a webpage.
*   **from bs4 import BeautifulSoup**: Allows us to use BeautifulSoup for extracting data.
### **Command:**

In [None]:
# Importing necessary libraries
import requests  # For fetching the webpage
from bs4 import BeautifulSoup  # For parsing the webpage

## **Step 3: Fetch the Webpage Content**
Webpages are made of HTML (a markup language for displaying content). To analyze it, we first need to download the HTML using Python.
*  **url = '...'**: This variable stores the URL of the website we want to scrape.
*  **requests.get(url)**: Sends a request to the server to get the webpage's content.
*   **response.status_code**: The HTTP status code of the server's response. A code of 200 means the request was successful.

In [None]:
# URL of the webpage to scrape
url = 'https://starbucksmenuprices.com/'

# Send a request to the server
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    print("Successfully fetched the webpage!")
else:
    print(f"Failed to fetch webpage. Status code: {response.status_code}")


Successfully fetched the webpage!


**Common Status Codes:**
*   200 OK: The request was successful.
*   201 Created: The request has been fulfilled, resulting in the creation of a new resource.
*   204 No Content: The server successfully processed the request, but is not returning any content.
*   301 Moved Permanently: The requested resource has been permanently moved to a new location.
*   302 Found: The requested resource temporarily resides at a different URL (a temporary redirect).
*   400 Bad Request: The server cannot or will not process the request due to an apparent client error.
*   401 Unauthorized: The request requires user authentication.
*   403 Forbidden: The server understood the request, but refuses to fulfill it.
*   404 Not Found: The server cannot find the requested resource.
*   500 Internal Server Error: The server encountered an unexpected condition that prevented it from fulfilling the request.
*   503 Service Unavailable: The server is currently unable to handle the request due to a temporary overload or maintenance.
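
These codes can also be looked up in code instead of memorized. The sketch below uses the standard-library `http.HTTPStatus` enum; the helper name `describe_status` is made up for this illustration.

```python
from http import HTTPStatus

# First digit of the code -> broad category
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code is not a registered HTTP status
        phrase = "Unknown"
    category = CATEGORIES.get(code // 100, "unknown")
    return f"{code} {phrase} ({category})"

for code in (200, 301, 404, 503):
    print(describe_status(code))
```

Grouping by the first digit is often enough when deciding how to react: 2xx means the content can be parsed, while 4xx and 5xx mean there is nothing useful to parse.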

## **Step 4: Parse the HTML Content**
Now that we have the HTML, we need to make it readable for Python using BeautifulSoup.
*  **response.text**: The raw HTML text of the webpage.
*   **BeautifulSoup(..., 'html.parser')**: Converts the raw HTML into a structured format that Python can easily work with.
*   **soup.prettify()**: Returns the HTML as an indented string; pass it to print() to inspect the page structure.





In [None]:
# Parse the webpage content with BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Print the HTML content to understand its structure
# print(soup.prettify())


## **Step 5: Locate the Links**
To extract links, inspect the website’s structure using your browser’s developer tools (right-click > "Inspect").

*   Identify the < a > tags (used for links) within the < ul > list elements.
*  **soup.find_all('ul')**: Finds all < ul > (unordered list) elements on the page.
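
As a small offline illustration of why the choice of list matters, the sketch below parses a made-up HTML fragment and compares an unfiltered `find_all('ul')` with one restricted to the `sub-menu` class. The fragment and the `example.com` URLs are assumptions modeled loosely on a navigation menu, not the live page.

```python
from bs4 import BeautifulSoup

html = """
<ul class="menu">
  <li><a href="#">A-C</a>
    <ul class="sub-menu">
      <li><a href="https://example.com/au/">Australia</a></li>
      <li><a href="https://example.com/ca/">Canada</a></li>
    </ul>
  </li>
</ul>
<ul class="footer-links"><li><a href="https://example.com/about/">About</a></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

all_lists = soup.find_all("ul")                         # every <ul>, nested ones included
country_lists = soup.find_all("ul", class_="sub-menu")  # only the nested country lists

print(len(all_lists))      # 3
print(len(country_lists))  # 1

# Looping over every <ul> counts nested links once per enclosing list
looped = [a.text for ul in all_lists for a in ul.find_all("a")]
print(len(looped))         # 6, although the fragment has only 4 links
```

Filtering by class keeps unrelated lists out of the result and avoids collecting the same link twice when lists are nested.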






In [None]:
# Find all <ul> elements containing the country links
sections = soup.find_all('ul')  # Locate all <ul> elements

# Lists in Python: The result is a list, which can store multiple items.
print(sections[1])

<ul class="sub-menu">
<li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-32" id="menu-item-32"><a href="https://starbucksmenuprices.com/starbucks-au-prices/">Australia</a></li>
<li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-42" id="menu-item-42"><a href="https://starbucksmenuprices.com/starbucks-brasil-precos/">Brasil</a></li>
<li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-70" id="menu-item-70"><a href="https://starbucksmenuprices.com/starbucks-%d1%86%d0%b5%d0%bd%d0%b8/">Bulgaria</a></li>
<li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-58" id="menu-item-58"><a href="https://starbucksmenuprices.com/starbucks-canada-menu/">Canada</a></li>
</ul>


In [None]:
# Example of List:
my_list = ['apple', 'banana', 'cherry']
print(my_list[0],my_list[1],my_list[2])

apple banana cherry


## **Step 6: Extract Links**
Now, loop through each < ul > section to find < a > tags (anchors), which represent links.

*   **link.text.strip()**: Extracts the visible text (e.g., "Australia") and removes leading and trailing whitespace.
*  **link.get('href')**: Retrieves the URL from the < a > tag's href attribute, or None if the tag has no href.
*  **country_links.append({...})**: Adds a dictionary with country name and URL to the list.
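
The sketch below runs the same three calls on a made-up fragment so that their edge cases are visible: padded link text, a `#` placeholder, and an `<a>` tag without an `href`. Skipping placeholders at this stage is an optional variation, not what the notebook's loop does.

```python
from bs4 import BeautifulSoup

html = """
<ul class="sub-menu">
  <li><a href="https://example.com/au/">  Australia </a></li>
  <li><a href="#">A-C</a></li>
  <li><a>No link here</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

country_links = []
for link in soup.find_all("a"):
    name = link.text.strip()  # "  Australia " becomes "Australia"
    url = link.get("href")    # None when the tag has no href attribute
    if not url or url.startswith("#"):
        continue              # skip placeholders and links without a target
    country_links.append({"Country": name, "URL": url})

print(country_links)
```

Using `link.get('href')` rather than `link['href']` matters here: the subscript form raises `KeyError` on the third tag, while `get` returns `None`.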






In [None]:
# Extract links from the <ul> sections
country_links = []  # Empty list to store results

for section in sections:
    links = section.find_all('a')  # Find all <a> tags in each section
    for link in links:
        country_name = link.text.strip()  # Get the visible text of the link
        country_url = link.get('href')  # Get the href attribute (URL)
        country_links.append({'Country': country_name, 'URL': country_url})


In [None]:
# Example of Loop:
for fruit in ['apple', 'banana', 'cherry']:
    print(fruit)


apple
banana
cherry


## **Step 7: Store Results in a DataFrame**
DataFrames are tables provided by the **pandas** library, making it easy to organize and analyze data.
*   **pd.DataFrame()**: Converts a list of dictionaries into a tabular format.
*   **df.head()**: Displays the first 5 rows.
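
Because the loop in Step 6 visits every < ul > element, a link inside a nested list can be collected more than once. A minimal sketch of building the DataFrame and dropping exact duplicates, using made-up rows:

```python
import pandas as pd

country_links = [
    {"Country": "A-C", "URL": "#"},
    {"Country": "Australia", "URL": "https://example.com/au/"},
    {"Country": "Australia", "URL": "https://example.com/au/"},  # collected twice
]

df = pd.DataFrame(country_links)  # dictionary keys become column names
print(df.shape)                   # (3, 2)

deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped.shape)              # (2, 2)
```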



In [None]:
# Import pandas for working with data
import pandas as pd

# Convert the list of dictionaries into a DataFrame
df = pd.DataFrame(country_links)

# Display the first few rows of the DataFrame
print(df.head())


     Country                                                URL
0        A-C                                                  #
1  Australia  https://starbucksmenuprices.com/starbucks-au-p...
2     Brasil  https://starbucksmenuprices.com/starbucks-bras...
3   Bulgaria  https://starbucksmenuprices.com/starbucks-%d1%...
4     Canada  https://starbucksmenuprices.com/starbucks-cana...


## **Step 8: Save Results to a CSV File**
Finally, save the data to a CSV file, which can be opened in Excel or analyzed further.
*   **to_csv()**: Exports the DataFrame to a file.
*   **index=False**: Prevents saving row numbers in the CSV.
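
To see what `index=False` changes, the sketch below calls `to_csv()` without a file path, in which case pandas returns the CSV text as a string. The rows are made-up examples.

```python
import pandas as pd

df = pd.DataFrame([
    {"Country": "Australia", "URL": "https://example.com/au/"},
    {"Country": "Canada", "URL": "https://example.com/ca/"},
])

with_index = df.to_csv()                # row numbers become an unnamed first column
without_index = df.to_csv(index=False)  # only the data columns are written

print(with_index.splitlines()[0])       # ,Country,URL
print(without_index.splitlines()[0])    # Country,URL
```

Leaving the index in would add an unnamed column that shows up again as `Unnamed: 0` when the file is read back with `pd.read_csv()`.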



In [None]:
# Save the DataFrame to a CSV file
df.to_csv('starbucks_country_links.csv', index=False)
print("Saved country links to starbucks_country_links.csv")


Saved country links to starbucks_country_links.csv


## **Complete Script**

In [None]:
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Step 1: Fetch the webpage content
url = 'https://starbucksmenuprices.com/'
response = requests.get(url)
if response.status_code == 200:
    print("Successfully fetched the webpage!")
else:
    print(f"Failed to fetch webpage. Status code: {response.status_code}")

# Step 2: Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Step 3: Find all <ul> elements
sections = soup.find_all('ul')

# Step 4: Extract links
country_links = []
for section in sections:
    links = section.find_all('a')
    for link in links:
        country_name = link.text.strip()
        country_url = link.get('href')
        country_links.append({'Country': country_name, 'URL': country_url})

# Step 5: Convert to DataFrame
df = pd.DataFrame(country_links)
print(df.head())  # Display the first 5 rows

# Step 6: Save to CSV
df.to_csv('starbucks_country_links.csv', index=False)
print("Saved country links to starbucks_country_links.csv")


Successfully fetched the webpage!
     Country                                                URL
0        A-C                                                  #
1  Australia  https://starbucksmenuprices.com/starbucks-au-p...
2     Brasil  https://starbucksmenuprices.com/starbucks-bras...
3   Bulgaria  https://starbucksmenuprices.com/starbucks-%d1%...
4     Canada  https://starbucksmenuprices.com/starbucks-cana...
Saved country links to starbucks_country_links.csv


# **Extracting Starbucks Prices Data**
### Example of scraping hot coffee price data.

## **Step 1: Load the Links**
*   **pd.read_csv()**: Reads the CSV file containing country links into a DataFrame.
*   **links_file**: The file path of the CSV file.



In [None]:
links_file = 'starbucks_country_links.csv'  # File containing country links
country_links = pd.read_csv(links_file)  # Read the CSV file into a DataFrame

## **Step 2: Filter Valid Links**
*   **Filter Rows**: ~country_links['URL'].str.contains('#', na=False) excludes rows where the URL contains #.
*   **iloc[0]**: Selects the first valid row.
*   **Format Country Name**: Converts the country name to lowercase and replaces spaces with hyphens for file naming.
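
The filter can be tried on a small made-up table first. The sketch below also shows a side effect of `na=False`: a missing URL counts as "does not contain #", so it passes the filter unless it is dropped separately.

```python
import pandas as pd

links = pd.DataFrame({
    "Country": ["A-C", "Australia", "South Korea", "Broken"],
    "URL": ["#", "https://example.com/au/", "https://example.com/kr/", None],
})

has_hash = links["URL"].str.contains("#", na=False)  # missing URL -> False
valid = links[~has_hash]                             # ~ inverts the boolean mask
print(valid["Country"].tolist())  # ['Australia', 'South Korea', 'Broken']

valid = valid.dropna(subset=["URL"])                 # remove rows with no URL at all
slugs = valid["Country"].str.lower().str.replace(" ", "-", regex=False)
print(slugs.tolist())             # ['australia', 'south-korea']
```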

In [None]:
valid_links = country_links[~country_links['URL'].str.contains('#', na=False)]  # Filter rows without '#'
if valid_links.empty:
    # raise SystemExit stops the script without relying on the interactive exit() helper
    raise SystemExit("No valid links found in the file.")

first_link = valid_links.iloc[0]  # Select the first valid row
country_url = first_link['URL']  # Extract the URL from the first valid row
country_name = first_link['Country'].lower().replace(' ', '-')  # Extract and format the country name


## **Step 3: Fetch the Webpage**
*   **requests.get(url)**: Fetches the HTML content of the URL.
*   **Check Status Code**: Ensures the webpage was successfully fetched (200 status code).






In [None]:
response = requests.get(country_url)
if response.status_code == 200:
    print(f"Successfully fetched the page: {country_url}")
else:
    # raise SystemExit stops the script without relying on the interactive exit() helper
    raise SystemExit(f"Failed to fetch webpage. Status code: {response.status_code}")

Successfully fetched the page: https://starbucksmenuprices.com/starbucks-au-prices/


## **Step 4: Parse the HTML**
*   **BeautifulSoup(..., 'html.parser')**: Parses the HTML content into a structured format.

In [None]:
soup = BeautifulSoup(response.text, 'html.parser')

## **Step 5: Locate "h2" Section**
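*   **Find < h2 > Heading**: soup.find('h2') returns the first < h2 > on the page. It does not match on the heading text, so it only lands on "Hot Coffee" if that is the first heading.
*   **Find Parent Table**: If the heading is found, find_parent('table') returns the table that encloses it (or None). That table may span more than one menu section; the output in Step 7 begins with bakery items.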
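
If only one section is wanted, the heading can be matched by its text instead. The sketch below is one possible approach on a made-up fragment; the real page's markup may differ, and the table ids and the latte row are assumptions.

```python
from bs4 import BeautifulSoup

html = """
<table id="food"><tr><th><h2>Bakery</h2></th></tr>
  <tr class="item"><td>Banana Bread</td><td>$6.30</td></tr></table>
<table id="drinks"><tr><th><h2>Hot Coffee</h2></th></tr>
  <tr class="item"><td>Caffè Latte</td><td>$5.50</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

first_heading = soup.find("h2")  # first <h2>, whatever its text
by_text = soup.find(
    "h2",
    string=lambda s: s is not None and "hot coffee" in s.lower(),
)                                # <h2> whose text mentions "hot coffee"

print(first_heading.get_text())            # Bakery
print(by_text.get_text())                  # Hot Coffee
print(by_text.find_parent("table")["id"])  # drinks
```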

In [None]:
hot_coffee_heading = soup.find('h2')  # First <h2> on the page (not matched by its text)
if hot_coffee_heading:
    hot_coffee_table = hot_coffee_heading.find_parent('table')  # Enclosing <table>, or None if there is none
else:
    # raise SystemExit stops the script without relying on the interactive exit() helper
    raise SystemExit("'Hot Coffee' section not found on the page.")

## **Step 6: Extract Data**
*   **Find Rows**: Selects rows with class item.
*  **Extract Columns**: Extracts text from each column and cleans it.
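
One caveat when cleaning cell text: a tag's `.string` is `None` whenever the cell holds more than one child node, for example a nested `<strong>` followed by plain text. The sketch below shows the difference on a made-up row; `get_text()` is a more forgiving alternative.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr class="header"><td>Item</td><td>Price</td></tr>
  <tr class="item"><td>Caffè Latte</td><td><strong>$5.50</strong> AUD</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr", class_="item")  # the header row is excluded by its class
price_cell = rows[0].find_all("td")[1]

print(price_cell.string)  # None, because the cell has two child nodes

# get_text() gathers all nested text; " " joins the pieces, strip=True trims them
cells = [td.get_text(" ", strip=True) for td in rows[0].find_all("td")]
print(cells)              # ['Caffè Latte', '$5.50 AUD']
```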

In [None]:
hot_coffee_data = []  # List to store extracted data
if hot_coffee_table:
    rows = hot_coffee_table.find_all('tr', class_='item')  # Find all rows with class "item"
    for row in rows:
        cols = row.find_all('td')  # Find all columns in the row
        cols = [col.get_text(strip=True) for col in cols]  # Clean the text (also works when a cell has nested tags)
        if cols:  # Skip empty rows
            hot_coffee_data.append(cols)

## **Step 7: Save Data**
*   **Dynamic File Name**: Includes the country name in the file name.
*  **Save to CSV**: Exports the data to a CSV file.

In [None]:
if hot_coffee_data:
    df_hot_coffee = pd.DataFrame(hot_coffee_data, columns=['Item', 'Price'])  # Create a DataFrame
    output_file = f'hot_coffee_prices_{country_name}.csv'  # Generate file name with country name
    df_hot_coffee.to_csv(output_file, index=False)  # Save to CSV
    print(f"Saved 'Hot Coffee' prices to {output_file}")
    print(df_hot_coffee)  # Print the result
else:
    print("No 'Hot Coffee' data found.")


Saved 'Hot Coffee' prices to hot_coffee_prices_australia.csv
                              Item  Price
0                     Banana Bread  $6.30
1                 Butter Croissant  $5.78
2                 Almond Croissant  $5.78
3                Pain Au Chocolate  $5.50
4                 Pain Au Chocolat  $6.00
..                             ...    ...
74               Caramel Macchiato  $6.88
75           White Chocolate Mocha  $6.16
76                     Caffé Mocha  $6.35
77         Caramel Cloud Macchiato  $8.00
78  Honeycomb Salted Caramel Latte  $7.30

[79 rows x 2 columns]



# Analysis of Findings

## Comparison of Actual Exchange Rates and PPP Rates

The analysis reveals significant deviations between actual exchange rates and the PPP rates implied by Starbucks Latte prices. For Australia, the implied PPP rate is 1.53 AUD per USD, compared with a reported actual exchange rate of 0.64 AUD per USD, a gap of roughly 139% ((1.53 - 0.64) / 0.64). With both rates quoted as AUD per USD, an implied rate above the market rate means the Australian Dollar buys less at Starbucks than the exchange rate suggests, so the index would mark it as overvalued rather than undervalued. The 0.64 figure also looks like a USD-per-AUD quote, so both rates should be confirmed to use the same quoting direction before the deviation is interpreted.
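
The arithmetic behind this comparison can be sketched as follows. The function names and the two latte prices are hypothetical, chosen only so that their ratio reproduces the 1.53 figure; only 1.53 and 0.64 come from the text above.

```python
def implied_ppp(local_price: float, us_price: float) -> float:
    """Implied PPP rate in local currency units per USD."""
    return local_price / us_price

def deviation_pct(implied: float, actual: float) -> float:
    """Percentage by which the implied rate exceeds the market rate."""
    return (implied - actual) / actual * 100

# Hypothetical prices: 7.65 AUD in Australia, 5.00 USD in the United States
rate = implied_ppp(local_price=7.65, us_price=5.00)
print(round(rate, 2))                           # 1.53

# Both rates taken as AUD per USD, as stated in the text
print(round(deviation_pct(1.53, 0.64), 2))      # 139.06

# If 0.64 were instead a USD-per-AUD quote, it has to be inverted first
print(round(deviation_pct(1.53, 1 / 0.64), 2))  # -2.08
```

The small difference from the quoted 138.89% presumably comes from rounding in the inputs. The last line shows how sensitive the result is to the quoting direction of the market rate.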

## Starbucks Latte Index vs. Big Mac Index

The Starbucks Latte Index offers an alternative single-product measure of purchasing power, and it shows significant discrepancies from actual exchange rates. The Big Mac Index rests on a product whose price reflects a broad mix of inputs (ingredients, labor, rent), whereas the Starbucks Latte Index may be influenced more by local pricing strategies and cost structures specific to the coffee industry.

### Implications

These findings suggest that while established PPP measures like the Big Mac Index give a broad view of relative currency valuation, niche indices like the Starbucks Latte Index can reveal localized pricing behavior that differs significantly. The variation between index results may also reflect distinct consumer behavior patterns or business operating costs across markets.
