## **Q#** How can I calculate a score based on the materials? *(Jasmine)*

### Qualitative:
#### Problem -
- We have extracted the materials from 30 brands. How do we automatically calculate a score for those materials?
- Is there an existing tool that can calculate a score for materials?
#### Hypothesis & Assumptions -
- I am assuming that there are existing tools that can help calculate a sustainability score for materials.
#### Context, Motivation & Rationale -
- We want to use the materials of the clothing and products scraped from brand websites to build a material score that feeds into our overall sustainability score.
- By using an existing tool, we do not need to dive into the research behind calculating a material score and can focus on data collection, data science processes, and building our tool.
#### Definitions, Data, and Methods -
- [Material Score Calculator](https://www.selflessclothes.com/blog/sustainability-calculator/) - A calculator that takes materials and their percentages as input and returns a score based on the Higg Materials Sustainability Index (Higg MSI).
- [Higg Materials Sustainability Index (Higg MSI)](https://howtohigg.org/higg-msi/an-introduction-to-msi/) - "material assessment tool that calculates the environmental impacts of materials used in consumer goods industries."
- Higg MSI measures environmental impact in 5 areas:
    - Global warming
    - Nutrient pollution in water (eutrophication)
    - Water scarcity
    - Abiotic resource depletion, use of fossil fuels
    - Chemistry
- Method:
    - Build the calculator URL with string concatenation, then use Selenium web scraping to pull the score from the calculator page.
#### Biases & Assumptions
- We are assuming that Selfless Clothes' material score calculator is an accurate and trustworthy resource.
- The Selfless Clothes calculator and the Higg MSI have their own biases and limitations, which may carry over into our score.
- There is a possibility of selection bias with the brands that we decided to focus on.

### Quantitative:

Step 1: Import necessary libraries 

In [56]:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import re

Step 2: Scrape the clothing item link and extract the item's name and the material composition

In [57]:
clothing_item = "https://www.uniqlo.com/us/en/products/E468788-000/00?colorDisplayCode=01&sizeDisplayCode=003"

In [58]:
# Scrapes a Uniqlo item and returns the material makeup
def scrape_uniqlo(url, wait_time=2):
    try:
        user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
        # Set Chrome options for headless mode
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument(f"user-agent={user_agent}")
        interactive_element_xpath = '//*[@id="productMaterialDescription"]'
        loaded_content_xpath = '//*[@id="productMaterialDescription-content"]/dl/dd[1]/p'
        title_xpath = '//*[@id="root"]/div[4]/div/section/div[2]/div[2]/div[1]/div/ul/li[1]/h1'

        # Initialize the WebDriver with headless mode
        driver = webdriver.Chrome(options=chrome_options)
        
        # Open the webpage
        driver.get(url)

        # Wait for the specified time before clicking the interactive element
        time.sleep(wait_time)  # Wait for the specified time in seconds

        # Find the interactive element
        interactive_element = driver.find_element(By.XPATH, interactive_element_xpath)
        
        # Click the interactive element
        interactive_element.click()

        # Wait for the loaded content to be visible
        loaded_element = WebDriverWait(driver, wait_time).until(
            EC.visibility_of_element_located((By.XPATH, loaded_content_xpath))
        )

        # Once loaded, scrape the content
        dynamic_content = loaded_element.text
        
        # Scrape the title
        title_element = driver.find_element(By.XPATH, title_xpath)
        title = title_element.text
        
        # Extract all material percentages from the dynamic content
        material_pattern = r'(\d+)%\s*(\w+)'
        material_matches = re.findall(material_pattern, dynamic_content)
        
        # Create a dictionary to store the scraped data
        scraped_data = {"item": title}
        for percent, material in material_matches:
            # Convert percentage to integer
            percent = int(percent)
            # Update the dictionary with the material percentage
            scraped_data[material.lower()] = percent
        
        return scraped_data
        
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None
        
    finally:
        # Close the WebDriver only if it was created (webdriver.Chrome can itself raise)
        if "driver" in locals():
            driver.quit()

In [72]:
# Save the scraped dictionary to an item variable
item = scrape_uniqlo(clothing_item)
print(item)

{'item': 'Mesh Crew Neck Long-Sleeve Sweater', 'acrylic': 58, 'lyocell': 26, 'linen': 16}
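
The pattern `(\d+)%\s*(\w+)` used in `scrape_uniqlo` captures a single word after each percentage, so a label such as "Recycled Polyester" would be stored as `recycled`. Below is a minimal sketch of a parser that keeps multi-word names. It assumes the composition text looks like the comma-separated output above; the `parse_materials` helper and the second sample string are hypothetical and were not taken from a Uniqlo page.

```python
import re

# Hypothetical helper: keeps multi-word material names such as "Recycled Polyester".
# Assumes each entry looks like "<number>% <name>" and names contain only
# letters, spaces, and hyphens.
MATERIAL_PATTERN = re.compile(r"(\d+)\s*%\s*([A-Za-z][A-Za-z \-]*)")

def parse_materials(text):
    """Return {lowercase material name: integer percent} from a composition string."""
    materials = {}
    for percent, name in MATERIAL_PATTERN.findall(text):
        # Collapse repeated whitespace and drop trailing spaces before lowercasing
        key = " ".join(name.split()).lower()
        materials[key] = int(percent)
    return materials

print(parse_materials("58% Acrylic, 26% Lyocell, 16% Linen"))
print(parse_materials("60% Recycled Polyester, 40% Cotton"))
```

A key such as `recycled polyester` could then be turned into the calculator's `RECYCLED_POLYESTER` form with `key.upper().replace(" ", "_")`.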


Step 3: Create a function that accepts an item dictionary and builds a material calculator URL that can be scraped to obtain the material score

In [69]:
# This is the base calculator URL
calculator_url = "https://www.selflessclothes.com/blog/sustainability-calculator/?"

# This is the goal url to obtain the scores based on the materials,
# it uses query params like this:
# material=ACRYLIC&material=LYOCELL&material=LINEN&percentage=58&percentage=26&percentage=16
# to generate a score:
goal = "https://www.selflessclothes.com/blog/sustainability-calculator/?material=ACRYLIC&material=LYOCELL&material=LINEN&percentage=58&percentage=26&percentage=16"
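
As a possible alternative to manual string concatenation, the same query string can be produced with the standard library's `urllib.parse.urlencode`, which accepts repeated keys as a list of tuples. This is only a sketch: `build_calculator_url` is a hypothetical helper name, and it assumes the calculator expects all `material` parameters first, followed by all `percentage` parameters, as in the goal URL above.

```python
from urllib.parse import urlencode

CALCULATOR_URL = "https://www.selflessclothes.com/blog/sustainability-calculator/"

def build_calculator_url(materials):
    """Build the calculator URL: every material param first, then every percentage param."""
    params = [("material", name.upper()) for name in materials]
    params += [("percentage", percent) for percent in materials.values()]
    # urlencode keeps the tuple order and escapes any special characters
    return CALCULATOR_URL + "?" + urlencode(params)

url = build_calculator_url({"acrylic": 58, "lyocell": 26, "linen": 16})
print(url)
```

For the sweater scraped earlier, this should reproduce the goal URL, since dictionaries preserve insertion order.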

In [70]:
def get_material_score(item):
    # List of accepted parameters by the calculator
    material_params = ["COTTON", "RECYCLED_COTTON", "ORGANIC_COTTON", 
                    "POLYESTER", "RECYCLED_POLYESTER", "NYLON", 
                    "RECYCLED_NYLON", "ACRYLIC", "SPANDEX", 
                    "FLAX", "LINEN", "HEMP", "CUPRO", 
                    "LYOCELL", "TENCEL_LYOCELL_LENZING", 
                    "REFIBRA_TENCEL_LYOCELL_LENZING", "MODAL", 
                    "TENCEL_MODAL_LENZING", "VISCOSE", "VISCOSE_BAMBOO", 
                    "VISCOSE_ASIA_LENZING", "VISCOSE_EU_LENZING",
                    "SILK", "ALPACA", "WOOL", "RECYCLED_WOOL", 
                    "CASHMERE", "RECYCLED_CASHMERE"]
    
    # Base calculator URL
    calculator = "https://www.selflessclothes.com/blog/sustainability-calculator/?"

    # Extract the materials out of the item dictionary for calculator URL's query parameters
    materials_and_percents = {key: value for key, value in item.items() if key != 'item'}

    goal_params = []

    # Check if the materials are in the material parameters that the website uses
    for material, percent in materials_and_percents.items():
        if material.upper() in material_params:
            goal_params.extend([f"material={material.upper()}", f"percentage={percent}"])

    # Reorder the parameters so they match the score calculator
    reordered_params = [goal_params[i] for i in range(0, len(goal_params), 2)] + [goal_params[i] for i in range(1, len(goal_params), 2)]

    # String concatenation of the score calculator url
    score_calculator = calculator + "&".join(reordered_params)

    # Configure Chrome options for headless mode
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run in headless mode

    # Initialize Chrome WebDriver with configured options
    driver = webdriver.Chrome(options=chrome_options)

    try:
        # Load the score calculator page with the results
        driver.get(score_calculator)

        # Wait for the score element to be present
        score_element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "ml-2"))
        )

        # Extract the score text
        score_text = score_element.text.strip()

        # Extract the score from the parentheses and convert it to a float
        material_score = float(score_text[score_text.find('(') + 1: score_text.find(')')])

        # Returns the clothing item's name and material score
        return (item.get("item"), f"Material Score: {material_score}")
    except Exception as e:
        print("Error:", e)
        return None
    finally:
        # Close the WebDriver
        driver.quit()

In [71]:
# Usage
get_material_score(item)

('Mesh Crew Neck Long-Sleeve Sweater', 'Material Score: 0.9')
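
Note that `get_material_score` skips any material that is not in `material_params`, so a score may be based on less than 100% of a garment. The sketch below shows one way to check coverage before requesting a score. The `check_coverage` helper and the sample item are hypothetical, and `ACCEPTED` is only a subset of the parameter list above, kept short for illustration.

```python
# Subset of the accepted calculator parameters listed in get_material_score
ACCEPTED = {"COTTON", "POLYESTER", "RECYCLED_POLYESTER", "ACRYLIC",
            "SPANDEX", "LINEN", "LYOCELL", "WOOL"}

def check_coverage(item):
    """Split an item's materials into matched and unmatched, and total the matched percent."""
    matched, unmatched = {}, {}
    for name, percent in item.items():
        if name == "item":
            continue  # skip the product title
        param = name.upper().replace(" ", "_")
        if param in ACCEPTED:
            matched[param] = percent
        else:
            unmatched[param] = percent
    return matched, unmatched, sum(matched.values())

# Hypothetical item: "elastane" is not a calculator parameter, so it is reported as unmatched
matched, unmatched, covered = check_coverage(
    {"item": "Example Tee", "cotton": 95, "elastane": 5}
)
print(matched, unmatched, covered)
```

Items with a covered total below 100 could be flagged for manual review or mapped to a close equivalent before scoring.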

### Qualitative (pt. 2):
#### Answer/Update to Question/Claim
- We have extracted the materials from 30 brands. How do we automatically calculate a score for the materials? Is there an existing tool that can calculate a score for materials?
    - We can use an existing tool (Selfless Clothes' Material Sustainability Calculator) to calculate a score for the materials extracted by web scraping the brands. We can automate the process by building the calculator URL with string concatenation and scraping the resulting score.
- Domain Knowledge
    - Learned how to use a URL's query parameters to request the results I want and then scrape them.
#### Summary
- I can now extract scores from an existing tool through an automated process. This will simplify the sustainability score calculation for our final product.
#### Uncertainty, Limitations & Caveats
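- The code is slow: each item opens a headless Chrome session for the product page and another for the calculator page, and waits for content to load.
- The calculated score inherits the biases and limitations of Selfless Clothes' methodology and the Higg MSI.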
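
One possible way to reduce run time is to score each distinct material composition only once, since many items share the same makeup. The sketch below caches results with `functools.lru_cache`. The names `composition_key`, `make_cached_scorer`, and `fake_lookup` are hypothetical, and `fake_lookup` is a stand-in for the Selenium lookup that returns a placeholder value rather than a real score.

```python
from functools import lru_cache

def composition_key(item):
    """Order-independent, hashable key for an item's material makeup."""
    return tuple(sorted((k, v) for k, v in item.items() if k != "item"))

def make_cached_scorer(score_fn):
    """Wrap a slow scoring function so each distinct composition is scored once."""
    @lru_cache(maxsize=None)
    def score_for_key(key):
        return score_fn(dict(key))

    def scorer(item):
        return score_for_key(composition_key(item))

    return scorer

# Stand-in for the Selenium lookup, used only to show the caching behaviour
calls = []
def fake_lookup(materials):
    calls.append(materials)
    return 0.0  # placeholder value, not a real score

scorer = make_cached_scorer(fake_lookup)
scorer({"item": "A", "cotton": 100})
scorer({"item": "B", "cotton": 100})  # same composition, served from the cache
print(len(calls))
```

Reusing a single WebDriver instance across items, instead of launching Chrome per call, would be another option to explore.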
#### Next Steps
- Scrape all the YouTube data and generate scores for all of those videos so we can produce our final sustainability score.