# How do established organizations like Eco-Stylist and Sustainable Review rate fashion brand sustainability?

While our team can collect information about brands directly from their websites, the information required to make an accurate assessment of brand sustainability is often unavailable. However, established organizations like Eco-Stylist and Sustainable Review have already dedicated time and resources to collect and analyze the relevant data for rating brand sustainability. They are also transparent about their review methodologies. Therefore, we are motivated to create datasets of the brands, sustainability ratings, and relevant factors available on their websites. Examining the criteria and methodologies used by each organization will help us develop our own formula for calculating sustainability ratings as well.

Eco-Stylist serves as a guide to ethical and eco-friendly fashion, while Sustainable Review publishes sustainability content weekly. We collect data from both websites by web scraping with the Python library BeautifulSoup, which we chose because it is well documented. This approach relies on the assumption that the organizations make all of the brands they have rated publicly available on their websites.

In [13]:
# imports
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from bs4 import BeautifulSoup
import pandas as pd
import re
import csv

In [14]:
# create session
session = requests.Session()
# retry failed connections up to three times, with exponential backoff between attempts
retry = Retry(connect=3, backoff_factor=0.5)
# apply the retry policy to every HTTPS request made through the session
adapter = HTTPAdapter(max_retries=retry)
session.mount('https://', adapter)
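
To make the retry settings concrete, the sketch below computes the delays implied by `backoff_factor=0.5`. It assumes the formula documented by urllib3, `backoff_factor * 2 ** (n - 1)` for the n-th consecutive failure with no delay before the first retry. `backoff_delays` and its `backoff_max` default are hypothetical, not part of requests or urllib3, and the exact behavior may differ between urllib3 versions (for example when jitter is enabled).

```python
def backoff_delays(retries, backoff_factor, backoff_max=120.0):
    """Approximate sleep (in seconds) before each retry, following urllib3's documented formula."""
    delays = []
    for n in range(1, retries + 1):
        # urllib3 documents no delay before the first retry
        delay = 0.0 if n == 1 else backoff_factor * (2 ** (n - 1))
        # assumed cap on a single delay
        delays.append(min(delay, backoff_max))
    return delays

print(backoff_delays(retries=3, backoff_factor=0.5))  # [0.0, 1.0, 2.0]
```

Under these assumptions, the three connection retries wait roughly 0, 1, and 2 seconds.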

## Part 1. Scraping Brand Reviews from Eco-Stylist

In [15]:
# scrape urls to brand reviews
eco_stylist_brands = session.get("https://www.eco-stylist.com/sustainable-brands/")
content = BeautifulSoup(eco_stylist_brands.text, 'html.parser')
# all links that carry an href attribute (anchors without one would return None)
page_links = content.find_all("a", href=True)
# links to brand reviews, excluding the index page itself
brand_urls = set(link['href'] for link in page_links if ("https://www.eco-stylist.com/ethical-brand/" in link['href']) and (link['href'] != "https://www.eco-stylist.com/ethical-brand/"))

print(brand_urls)

{'https://www.eco-stylist.com/ethical-brand/taylor-stitch/', 'https://www.eco-stylist.com/ethical-brand/reprise/', 'https://www.eco-stylist.com/ethical-brand/unspun/', 'https://www.eco-stylist.com/ethical-brand/groceries-apparel/', 'https://www.eco-stylist.com/ethical-brand/po-zu/', 'https://www.eco-stylist.com/ethical-brand/kotn/', 'https://www.eco-stylist.com/ethical-brand/toadco/', 'https://www.eco-stylist.com/ethical-brand/hernest-project/', 'https://www.eco-stylist.com/ethical-brand/naadam/', 'https://www.eco-stylist.com/ethical-brand/edwin/', 'https://www.eco-stylist.com/ethical-brand/coalatree/', 'https://www.eco-stylist.com/ethical-brand/known-supply/', 'https://www.eco-stylist.com/ethical-brand/ten-thousand-villages/', 'https://www.eco-stylist.com/ethical-brand/isto/', 'https://www.eco-stylist.com/ethical-brand/kindom/', 'https://www.eco-stylist.com/ethical-brand/no-nasties/', 'https://www.eco-stylist.com/ethical-brand/beckett-simonon/', 'https://www.eco-stylist.com/ethical-br
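
The link filter can also be isolated into a small function, which makes it easy to test and to skip anchors that have no `href`. This is a minimal sketch: `filter_brand_urls` and the sample `hrefs` list are hypothetical stand-ins for the values returned by `link.get('href')`, and it uses `startswith` rather than the substring test in the cell above.

```python
def filter_brand_urls(hrefs, prefix):
    """Return the unique URLs that live under `prefix`, excluding `prefix` itself."""
    return {
        href
        for href in hrefs
        # link.get('href') returns None for anchors without an href attribute
        if href is not None and href.startswith(prefix) and href != prefix
    }

prefix = "https://www.eco-stylist.com/ethical-brand/"
hrefs = [
    None,
    prefix,
    prefix + "kotn/",
    prefix + "kotn/",  # duplicate link on the same page
    "https://www.eco-stylist.com/about/",  # hypothetical non-brand link
]
print(filter_brand_urls(hrefs, prefix))
```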

In [6]:
# create dataframe 
eco_stylist_df = pd.DataFrame(columns=['Brand', 'Overall', 'Transparency', 'Fair Labor', 'Sustainably Made', 'URL'])

In [8]:
# scrape for brand data
for url in brand_urls:
    try:
        # search for brand review (inside the try so a failed request is reported, not fatal)
        review = session.get(url)

        # content
        content = BeautifulSoup(review.text, 'html.parser')

        # brand name
        brand = content.find('h1').get_text()

        # overall rating
        overall = content.find(string=re.compile("Overall Rating:")).split(" ")[2]
        
        # transparency, fair labor, sustainably made
        ratings = content.find_all(string=re.compile("Rated:"))
        transparency = ratings[0].split(" ")[1]
        fair_labor = ratings[1].split(" ")[1]
        sustainably_made = ratings[2].split(" ")[1]

        # update dataframe
        eco_stylist_df.loc[len(eco_stylist_df.index)] = [brand, overall, transparency, fair_labor, sustainably_made, url]

    except Exception:
        print(url + " Failed to Load")
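
The positional `split(" ")` calls in the loop depend on exact spacing. As an alternative sketch, a regular expression can capture the word that follows a label. The sample strings are assumptions inferred from the patterns the loop searches for, and `extract_label` is a hypothetical helper.

```python
import re

def extract_label(text, label):
    """Return the first word after `label:` in `text`, or None if the label is absent."""
    match = re.search(re.escape(label) + r":\s*(\w+)", text)
    return match.group(1) if match else None

print(extract_label("Overall Rating: Silver", "Overall Rating"))  # Silver
print(extract_label("Rated:  Good", "Rated"))                     # Good
print(extract_label("No rating here", "Rated"))                   # None
```

Returning `None` for a missing label lets the caller decide whether to skip the brand or record a missing value.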

### Visualize the first 10 rows of the dataframe:

In [9]:
eco_stylist_df.head(10)

| | Brand | Overall | Transparency | Fair Labor | Sustainably Made | URL |
|---|---|---|---|---|---|---|
| 0 | Taylor Stitch | Silver | Good | Good | Excellent | https://www.eco-stylist.com/ethical-brand/tayl... |
| 1 | Reprise | Certified | Good | Good | Excellent | https://www.eco-stylist.com/ethical-brand/repr... |
| 2 | Unspun | Certified | Good | Fair | Excellent | https://www.eco-stylist.com/ethical-brand/unspun/ |
| 3 | Groceries Apparel | Certified | Good | Good | Good | https://www.eco-stylist.com/ethical-brand/groc... |
| 4 | Po-Zu | Certified | Good | Good | Excellent | https://www.eco-stylist.com/ethical-brand/po-zu/ |
| 5 | Kotn | Silver | Excellent | Good | Good | https://www.eco-stylist.com/ethical-brand/kotn/ |
| 6 | Toad&Co | Silver | Fair | Good | Excellent | https://www.eco-stylist.com/ethical-brand/toadco/ |
| 7 | Hernest Project | Certified | Excellent | Good | Good | https://www.eco-stylist.com/ethical-brand/hern... |
| 8 | NAADAM | Certified | Good | Good | Excellent | https://www.eco-stylist.com/ethical-brand/naadam/ |
| 9 | EDWIN | Certified | Excellent | Fair | Excellent | https://www.eco-stylist.com/ethical-brand/edwin/ |
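
As a quick check on these ten rows, the sketch below tallies the `Overall` column. The list is copied by hand from the output above; on the full dataset the same idea corresponds to `eco_stylist_df['Overall'].value_counts()`.

```python
from collections import Counter

# Overall ratings of rows 0 to 9, copied from the displayed output
overall = ["Silver", "Certified", "Certified", "Certified", "Certified",
           "Silver", "Silver", "Certified", "Certified", "Certified"]
counts = Counter(overall)
print(counts.most_common())  # [('Certified', 7), ('Silver', 3)]
```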


In [10]:
# export csv file
eco_stylist_df.to_csv('../data/eco_stylist_ratings.csv')

## Part 2. Scraping Brand Reviews from Sustainable Review

In [16]:
# get page range
first_page_url = "https://sustainablereview.com/brand-ratings/"
first_page = session.get(first_page_url)
content = BeautifulSoup(first_page.text, 'html.parser')
# last page number out of all page numbers
last_page_num = str(content.find_all('a', class_="page-numbers")[-1]).split('>')[1].split('<')[0]

print(last_page_num)

38
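
Taking the last `page-numbers` link works as long as that link is the final page number. If the pagination ever ends with a non-numeric link such as "Next", a sketch like the following, which takes the maximum of the numeric labels, may be more robust. The `labels` list is a hypothetical stand-in for the text of each pagination link, and `last_page_number` and `page_urls` are hypothetical helpers; the URL pattern is the one used in this notebook.

```python
def last_page_number(labels):
    """Return the largest numeric pagination label, defaulting to 1 if none is numeric."""
    numbers = [int(label) for label in labels if label.strip().isdigit()]
    return max(numbers, default=1)

def page_urls(base_url, last_page):
    """Build the listing URL for every page; page 1 is the base URL itself."""
    return [base_url] + [f"{base_url}?query-48-page={i}" for i in range(2, last_page + 1)]

labels = ["1", "2", "3", "...", "38", "Next"]  # hypothetical pagination link texts
last_page = last_page_number(labels)
urls = page_urls("https://sustainablereview.com/brand-ratings/", last_page)
print(last_page, len(urls), urls[1])
```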


In [17]:
# get brand links from all pages
brand_urls = []
# loop through all pages
for i in range(1, int(last_page_num) + 1):
    # scrape first page
    if i == 1:
        page = first_page
    # scrape remaining pages
    else:
        next_page_url = "https://sustainablereview.com/brand-ratings/?query-48-page="
        page = session.get(next_page_url + str(i))

    # content
    content = BeautifulSoup(page.text, 'html.parser')

    # links to brand reviews (only anchors with an href, excluding the index page itself)
    page_links = content.find_all("a", href=True)
    brands = set(link['href'] for link in page_links if ("https://sustainablereview.com/brand-ratings/" in link['href']) and (link['href'] != "https://sustainablereview.com/brand-ratings/"))

    # update brand urls list
    brand_urls.extend(brands)

print(brand_urls)

['https://sustainablereview.com/brand-ratings/division/', 'https://sustainablereview.com/brand-ratings/adidas/', 'https://sustainablereview.com/brand-ratings/adarche-clothing/', 'https://sustainablereview.com/brand-ratings/a-dam/', 'https://sustainablereview.com/brand-ratings/aestethic-london/', 'https://sustainablereview.com/brand-ratings/adelaide-c-ecoage/', 'https://sustainablereview.com/brand-ratings/337-brand/', 'https://sustainablereview.com/brand-ratings/a_c/', 'https://sustainablereview.com/brand-ratings/a-roege-hove/', 'https://sustainablereview.com/brand-ratings/adidas-by-stella-mccartney/', 'https://sustainablereview.com/brand-ratings/absolutely-bear/', 'https://sustainablereview.com/brand-ratings/acbc/', 'https://sustainablereview.com/brand-ratings/a-bch/', 'https://sustainablereview.com/brand-ratings/aeance/', 'https://sustainablereview.com/brand-ratings/afends/', 'https://sustainablereview.com/brand-ratings/a-happy-brand/', 'https://sustainablereview.com/brand-ratings/les
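
Because each page is deduplicated separately, a brand linked from more than one page would appear in `brand_urls` more than once. One way to remove such repeats while keeping first-seen order is `dict.fromkeys`. The helper `unique_in_order` and the two sample pages below are hypothetical.

```python
def unique_in_order(urls):
    """Drop repeated URLs while preserving the order of first appearance."""
    return list(dict.fromkeys(urls))

page_1 = ["https://sustainablereview.com/brand-ratings/adidas/",
          "https://sustainablereview.com/brand-ratings/afends/"]
page_2 = ["https://sustainablereview.com/brand-ratings/afends/",  # repeated across pages
          "https://sustainablereview.com/brand-ratings/acbc/"]
print(unique_in_order(page_1 + page_2))
```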

In [22]:
# create dataframe
sustainable_review_df = pd.DataFrame(columns=['Brand', 'Rating', 'Factors'])

In [26]:
# scrape for brand data
for url in brand_urls:
    try: 
        # search for brand review
        review = session.get(url)
        
        # content
        content = BeautifulSoup(review.text, 'html.parser')

        # brand
        brand = content.find('h1', class_='post-title').get_text()

        # rating
        information = content.find('div', class_='InfoBox')
        rating = information.find('p').get_text().split(" ")[3]
        
        # factors
        body = content.find('div', class_='col-md-12 col-lg-9')
        factors = str(body.find_all('h3')).split(', ')

        # clean list of factors
        cleaned_factors = []
        for factor in factors: 
            cleaned_factor = factor.replace("[","").replace("]","").replace('<h3>', '').replace('<strong>', '').replace('</h3>', '').replace('</strong>', '').replace("Similar brands:","")
            
            # drop ':' from factor
            if cleaned_factor.endswith(":"):
                cleaned_factor = cleaned_factor[:-1]

            # exclude headings with "Conclusion"
            if "Conclusion" in cleaned_factor:
                cleaned_factor = ""

            # append cleaned factor to list if the item is not empty
            if cleaned_factor != "":
                cleaned_factors.append(cleaned_factor)

        # update dataframe
        sustainable_review_df.loc[len(sustainable_review_df.index)] = [brand, rating, ", ".join(cleaned_factors)]
    
    except Exception:
        print(url + " Failed to Load")

https://sustainablereview.com/brand-ratings/the-social-studio/ Failed to Load
https://sustainablereview.com/brand-ratings/the-white-ribbon/ Failed to Load
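
The loop reports which URLs failed but not why. The sketch below keeps each failed URL together with its exception so failures can be inspected or retried later. `scrape_all` and `fake_scrape` are hypothetical; `fake_scrape` stands in for the request and parsing steps so the example runs without network access.

```python
def scrape_all(urls, scrape_one):
    """Apply `scrape_one` to each URL, collecting rows and (url, error) pairs separately."""
    rows, failures = [], []
    for url in urls:
        try:
            rows.append(scrape_one(url))
        except Exception as error:  # parsing errors such as AttributeError land here too
            failures.append((url, repr(error)))
    return rows, failures

def fake_scrape(url):
    # Simulate a page with a missing element, where .find(...) returned None
    if url.endswith("/missing/"):
        raise AttributeError("'NoneType' object has no attribute 'get_text'")
    return {"URL": url}

rows, failures = scrape_all(["https://example.com/ok/", "https://example.com/missing/"], fake_scrape)
print(len(rows), failures)
```

Recording the exception text helps distinguish a network problem from a page whose layout differs from the one the selectors expect.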


### Visualize the first 10 rows of the dataframe:

In [27]:
# check dataframe
sustainable_review_df.head(10)

| | Brand | Rating | Factors |
|---|---|---|---|
| 0 | Modibodi | 3 | |
| 1 | Natalie Perry | 4 | |
| 2 | (di)vision | 4 | |
| 3 | Adidas | 3 | Adidas’ Global Recognition for Sustainability ... |
| 4 | Adarche Clothing | 4 | Protecting the Planet with Adarche Clothing, E... |
| 5 | A-dam | 4 | |
| 6 | Aestethic London | 5 | |
| 7 | Adelaide C. Ecoage | 4 | |
| 8 | 337 BRAND | 4 | |
| 9 | A_C | 4 | |
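
The `Rating` column holds strings taken from the fourth space-separated token of the info box text. A hedged alternative is to search for a standalone digit from 1 to 5 and convert it to an integer, so unexpected text becomes `None` rather than a wrong value. The sample sentences are assumptions about the info box wording, and `parse_rating` is a hypothetical helper.

```python
import re

def parse_rating(text):
    """Return the first standalone integer from 1 to 5 in `text`, or None."""
    match = re.search(r"\b([1-5])\b", text)
    return int(match.group(1)) if match else None

print(parse_rating("Our overall rating: 4 out of 5"))  # 4
print(parse_rating("Rating pending"))                   # None
```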


In [28]:
# export csv file
sustainable_review_df.to_csv('../data/sustainable_review_ratings.csv')

Eco-Stylist assesses brand sustainability on three criteria (transparency, fair labor, and sustainably made) and then assigns an overall rating. Sustainable Review generates only an overall rating, but it describes the research that went into calculating the score. Eco-Stylist grades sustainability on a scale of bronze to gold, whereas Sustainable Review grades it numerically from 1 to 5. This method of collecting sustainability data is limited by the number of brands each organization has rated. Going forward, we will scrape additional organizations that review fashion brands to increase the number of ratings we can synthesize or compare ours against.
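
Because the two organizations use different scales, comparing them requires a common scale. One simple option is to map each rating to the interval [0, 1] by its position on its own scale. This is only a sketch: the tier order below is an assumption for illustration and should be replaced with Eco-Stylist's actual tiers, and `normalize_ordinal` and `normalize_numeric` are hypothetical helpers rather than our final formula.

```python
def normalize_ordinal(label, tiers):
    """Map a label to [0, 1] by its position in `tiers` (ordered lowest to highest)."""
    return tiers.index(label) / (len(tiers) - 1)

def normalize_numeric(score, low=1, high=5):
    """Map a numeric score on [low, high] to [0, 1]."""
    return (score - low) / (high - low)

tiers = ["Certified", "Silver", "Gold"]  # assumed order, for illustration only
print(normalize_ordinal("Silver", tiers))  # 0.5
print(normalize_numeric(4))                # 0.75
```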
