# Climbing Shoes Analysis Project

The primary goal of this project is to determine the most cost-efficient climbing shoes for an intermediate climber. My motivation comes from both my love for climbing and my need for new climbing shoes. Shoes are such an important part of climbing that they often make or break an attempt. I often joke that any time I fail a climb, it's a problem with the shoes.

My plan for this project is to scrape the men's climbing shoes section of the REI (Recreational Equipment Inc.) website. The information I want includes the shoe name, shoe brand, price, discount status, and ratings. Once I have it, I want to store the data in an SQL database and potentially create a prediction model that scores the shoes. Lastly, I want to summarize my findings in clean visualizations through Tableau and/or Python.

Let's get started! First things first, let's import some Python packages for web scraping and for connecting to SQL.

In [13]:
# Import necessary packages for webscraping and data collection
import bs4
import requests
from bs4 import BeautifulSoup
import pandas as pd
from sqlalchemy import create_engine
import matplotlib.pyplot as plt
from splinter.browser import Browser
import numpy as np
from requests_html import AsyncHTMLSession
from requests_html import HTMLSession
import json

# Load SQL extension
%load_ext sql

# Matplot frontend
%matplotlib inline

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


### Step 1: Web Scraping

I originally wanted to use only the requests and Beautiful Soup packages to scrape the REI site, but the products' ratings are dynamic HTML content, which Beautiful Soup can't access on its own. So I turned to the splinter package specifically for the ratings. This wasn't what I wanted: I'm unfamiliar with the process, and I needed to download chromedriver.exe, which opens a new browser window while the program runs. The extra software also means I'll eventually need to update these parts whenever Chrome or chromedriver updates roll out.

In [14]:
def scrape_rei(link):
    temp_df = pd.DataFrame(columns = ['Shoe_Name',
                                      'Shoe_Company',
                                      'Current_Price',
                                      'Original_Price',
                                      'Discount',
                                      'Product_Link'])
    
    # Splinter browser opens new chrome browser window
    br = Browser('chrome')
    br.visit(link)

    # Make a request to the primary url and use an html parser to gather data
    req = requests.get(link)
    soup = bs4.BeautifulSoup(req.text, 'html.parser')

    # Use find all to get a list of each product's html information
    page_lst = soup.find_all('li', {'class': 'pPe0GNuagvmEFURs1Q_vm'})

    # Loop through each product and use the find function
    # to acquire specific data observations
    for element in page_lst:
        shoe_name = element.find('div', {'class': 'r9nAQ5Ik_3veCKyyZkP0b'}).text
        if shoe_name[-8:] == " - Men's":
            shoe_name = shoe_name[:-8]

        shoe_company = element.find('div', {'class': '_1fwp3k8dh1lbhAAenp87CH'}).text

        if element.find('span', {'class': '_2xZVXKL4Bd0pJyQCumYi9P'}) is not None:
            current_price = element.find('span', {'class': '_2xZVXKL4Bd0pJyQCumYi9P'}).text
            original_price = element.find('span', {'class': 'Dm_X3ktyv_w_gPWuYlJf_'}).text
            original_price = original_price[11:]
        else:
            current_price = element.find('div', {'class': '_1zwqhlCzOK-xETXwFg_-iZ'}).text
            original_price = element.find('div', {'class': '_1zwqhlCzOK-xETXwFg_-iZ'}).text

        # A shoe is on sale when its current price differs from its original price
        on_sale = current_price != original_price
            
        product_link = element.find('a')['href']

        # Append each row to the shoes dataframe
        temp_row = [shoe_name, shoe_company, current_price, original_price, on_sale, 'https://www.rei.com' + product_link]
        temp_df.loc[len(temp_df)] = temp_row
        
    # Use the splinter browser to gather dynamic html data of ratings
    # Also clean the data as we scrape it
    ratings = []
    for item in br.find_by_tag('dt'):
        rating = item.text.strip()[6:]
        if not rating[0].isalnum():
            rating = rating[6:]
            rating = rating.split('s')[0].strip()
        else:
            rating = np.nan
        ratings.append(rating)        
    temp_df['Rating'] = ratings
    
    # Same as above, use splinter browser to gather dynamic html content
    rating_counts = []
    for item in br.find_by_tag('dd'):
        counts = item.text.strip('()')
        rating_counts.append(counts)
    temp_df['Rating_Count'] = rating_counts
    
    br.quit()
    
    return temp_df

In [15]:
# Create a dataframe to store data in
shoes_df = pd.DataFrame(columns = ['Shoe_Name',
                                   'Shoe_Company',
                                   'Current_Price',
                                   'Original_Price',
                                   'Discount',
                                   'Product_Link'])

# Define the primary url
master_url = 'https://www.rei.com/c/mens-climbing-shoes?page='
num = 1
link = master_url + str(num)

# Scrape pages until REI returns a 404, checking each page before scraping it
while requests.get(link).status_code != 404:
    print(shoes_df)
    shoes_df = pd.concat([shoes_df, scrape_rei(link)], axis = 0, ignore_index = True)
    num = num + 1
    link = master_url + str(num)

Empty DataFrame
Columns: [Shoe_Name, Shoe_Company, Current_Price, Original_Price, Discount, Product_Link]
Index: []
                        Shoe_Name   Shoe_Company    Current_Price  \
0        Rogue VCS Climbing Shoes       Five Ten           $39.93   
1          Quantum Climbing Shoes       Five Ten           $69.93   
2      Instinct VS Climbing Shoes         Scarpa          $141.69   
3         Momentum Climbing Shoes  Black Diamond           $46.73   
4           Origin Climbing Shoes         Scarpa           $74.19   
..                            ...            ...              ...   
25  Momentum Vegan Climbing Shoes  Black Diamond           $94.95   
26          Origin Climbing Shoes         Scarpa  $71.19 - $75.93   
27           Miura Climbing Shoes    La Sportiva          $165.00   
28          Genius Climbing Shoes    La Sportiva          $195.00   
29          Shaman Climbing Shoes          evolv          $170.00   

   Original_Price Discount                             
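
The page loop above can be sketched as a small generator that checks each page before handing it to the scraper, so the first missing page is never scraped. This is only a sketch: `iter_page_links`, `page_exists`, `max_pages`, and `fake_page_exists` are hypothetical names that are not part of the notebook, and the fake checker stands in for `requests.get(link).status_code != 404` so the example runs offline.

```python
def iter_page_links(base_url, page_exists, max_pages=50):
    """Yield page URLs in order, stopping at the first page that does not exist."""
    for num in range(1, max_pages + 1):
        link = base_url + str(num)
        if not page_exists(link):
            break
        yield link


def fake_page_exists(link):
    # Hypothetical stand-in for a real status check: pretend only pages 1 and 2 exist
    return link.endswith(('=1', '=2'))


links = list(iter_page_links('https://www.rei.com/c/mens-climbing-shoes?page=',
                             fake_page_exists))
for link in links:
    print(link)
```

In the notebook, each yielded link would be passed to `scrape_rei`, and `max_pages` acts as a safety cap in case the site never returns a 404.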

In [113]:
def scrape_product(link_lst):
    
    product_df = pd.DataFrame(columns = ['Description',
                'Climbing_Type',
                'Difficulty', 
                'Lasting',
                'Upper_Material', 
                'Lining_Material', 
                'Outsole_Material', 
                'Strap_Type', 
                'Resoleability', 
                'Gender',     
                'Weight'])
    
    for link in link_lst:
        
        temp_row = []
        
        # Make a request to the product url and use an html parser to gather data
        req = requests.get(link)
        soup = bs4.BeautifulSoup(req.text, 'html.parser')
        
        if 'rei-garage' in link:   
            temp_row.append(soup.find_all('meta')[12].attrs['content'])
            
            shoe_json = json.loads(soup.find('script', {'id': 'page-data'}).text)

            for json_str in shoe_json['product']['specifications']['specs']:
                print(json_str)
                temp_row.append(json_str['values'][0])      
                
        else:
        
            temp_row.append(soup.find_all('meta')[12].attrs['content'])

            shoe_json = json.loads(soup.find('script', {'data-client-store': 'product-details'}).text)
            for json_str in shoe_json['specs']:
                print(json_str)
                if json_str['name'] == 'Vegan':
                    continue
                temp_row.append(json_str['values'][0])
        
        display(product_df)
                
        
        print(len(temp_row))
        print(temp_row)
        
        product_df.loc[len(product_df)] = temp_row

        
    return product_df

In [114]:
scrape_product(shoes_df['Product_Link'])

{'name': 'Best Use', 'values': ['Rock Climbing'], 'displayOrder': 1, 'code': 50}
{'name': 'Climbing Shoe Type', 'values': ['Neutral'], 'displayOrder': 2, 'featuredSpecificationDisplayOrder': 1, 'code': 619}
{'name': 'Last', 'values': ['Slip-lasted'], 'displayOrder': 3, 'featuredSpecificationDisplayOrder': 2, 'code': 395}
{'name': 'Upper', 'values': ['Split-grain rubber'], 'displayOrder': 4, 'featuredSpecificationDisplayOrder': 4, 'code': 742}
{'name': 'Lining', 'values': ['Stretch arch lining'], 'displayOrder': 5, 'code': 419}
{'name': 'Outsole', 'values': ['Stealth® C4 rubber'], 'displayOrder': 6, 'code': 488}
{'name': 'Footwear Closure', 'values': ['Strap'], 'displayOrder': 7, 'code': 807}
{'name': 'Can Be Resoled', 'values': ['Yes'], 'displayOrder': 8, 'featuredSpecificationDisplayOrder': 3, 'code': 164}
{'name': 'Gender', 'values': ["Men's"], 'displayOrder': 10, 'code': 51}
{'name': 'Weight (Pair)', 'values': ['15.5 ounces'], 'displayOrder': 11, 'code': 767}


Unnamed: 0,Description,Climbing_Type,Difficulty,Lasting,Upper_Material,Lining_Material,Outsole_Material,Strap_Type,Resoleability,Gender,Weight


11
["Built with the basics in mind for beginning climbers, the men's Five Ten Rogue VCS climbing shoes will help you develop precise footwork and build foot strength for long practice sessions in the gym. Available at REI, 100% Satisfaction Guaranteed.", 'Rock Climbing', 'Neutral', 'Slip-lasted', 'Split-grain rubber', 'Stretch arch lining', 'Stealth® C4 rubber', 'Strap', 'Yes', "Men's", '15.5 ounces']
{'name': 'Best Use', 'values': ['Rock Climbing'], 'displayOrder': 1, 'code': 50}
{'name': 'Climbing Shoe Type', 'values': ['Aggressive'], 'displayOrder': 2, 'featuredSpecificationDisplayOrder': 1, 'code': 619}
{'name': 'Last', 'values': ['Slip-lasted'], 'displayOrder': 3, 'featuredSpecificationDisplayOrder': 2, 'code': 395}
{'name': 'Upper', 'values': ['Synthetic Clarino'], 'displayOrder': 4, 'featuredSpecificationDisplayOrder': 4, 'code': 742}
{'name': 'Lining', 'values': ['Synthetic Clarino'], 'displayOrder': 5, 'code': 419}
{'name': 'Outsole', 'values': ['4mm Stealth C4 rubber'], 'disp

Unnamed: 0,Description,Climbing_Type,Difficulty,Lasting,Upper_Material,Lining_Material,Outsole_Material,Strap_Type,Resoleability,Gender,Weight
0,Built with the basics in mind for beginning cl...,Rock Climbing,Neutral,Slip-lasted,Split-grain rubber,Stretch arch lining,Stealth® C4 rubber,Strap,Yes,Men's,15.5 ounces


11
['Five Ten worked with professional climbers Alex and Thomas Huber to redesign the classic Quantum Rock Shoes for all-around performance in just about any climbing setting. Available at REI, 100% Satisfaction Guaranteed.', 'Rock Climbing', 'Aggressive', 'Slip-lasted', 'Synthetic Clarino', 'Synthetic Clarino', '4mm Stealth C4 rubber', 'Lace-up', 'Yes', "Men's", '1 lb. 0.6 oz. ounces']
{'name': 'Best Use', 'values': ['Rock Climbing'], 'displayOrder': 1, 'code': 50}
{'name': 'Climbing Shoe Type', 'values': ['Aggressive'], 'displayOrder': 2, 'featuredSpecificationDisplayOrder': 1, 'code': 619}
{'name': 'Last', 'values': ['Slip-lasted'], 'displayOrder': 3, 'featuredSpecificationDisplayOrder': 2, 'code': 395}
{'name': 'Upper', 'values': ['Synthetic leather'], 'displayOrder': 4, 'featuredSpecificationDisplayOrder': 4, 'code': 742}
{'name': 'Lining', 'values': ['No'], 'displayOrder': 5, 'code': 419}
{'name': 'Outsole', 'values': ['Vibram XS Edge rubber'], 'displayOrder': 6, 'code': 488}
{'n

Unnamed: 0,Description,Climbing_Type,Difficulty,Lasting,Upper_Material,Lining_Material,Outsole_Material,Strap_Type,Resoleability,Gender,Weight
0,Built with the basics in mind for beginning cl...,Rock Climbing,Neutral,Slip-lasted,Split-grain rubber,Stretch arch lining,Stealth® C4 rubber,Strap,Yes,Men's,15.5 ounces
1,Five Ten worked with professional climbers Ale...,Rock Climbing,Aggressive,Slip-lasted,Synthetic Clarino,Synthetic Clarino,4mm Stealth C4 rubber,Lace-up,Yes,Men's,1 lb. 0.6 oz. ounces


11
['Built for sport climbers and boulderers who demand sensitivity and power, the Scarpa Instinct VS climbing shoes have a precise fit, tough synthetic uppers and grippy soles to help you send. Available at REI, 100% Satisfaction Guaranteed.', 'Rock Climbing', 'Aggressive', 'Slip-lasted', 'Synthetic leather', 'No', 'Vibram XS Edge rubber', 'Strap', 'Yes', "Men's", '1 lb. 0.5 oz.']
{'name': 'Best Use', 'values': ['Rock Climbing'], 'displayOrder': 1, 'code': 50}
{'name': 'Climbing Shoe Type', 'values': ['Neutral'], 'displayOrder': 2, 'featuredSpecificationDisplayOrder': 1, 'code': 619}
{'name': 'Last', 'values': ['Slip-lasted'], 'displayOrder': 3, 'featuredSpecificationDisplayOrder': 2, 'code': 395}
{'name': 'Upper', 'values': ['Synthetic knit'], 'displayOrder': 4, 'featuredSpecificationDisplayOrder': 4, 'code': 742}
{'name': 'Lining', 'values': ['Hemp'], 'displayOrder': 5, 'code': 419}
{'name': 'Outsole', 'values': ['Rubber'], 'displayOrder': 6, 'code': 488}
{'name': 'Footwear Closure'

Unnamed: 0,Description,Climbing_Type,Difficulty,Lasting,Upper_Material,Lining_Material,Outsole_Material,Strap_Type,Resoleability,Gender,Weight
0,Built with the basics in mind for beginning cl...,Rock Climbing,Neutral,Slip-lasted,Split-grain rubber,Stretch arch lining,Stealth® C4 rubber,Strap,Yes,Men's,15.5 ounces
1,Five Ten worked with professional climbers Ale...,Rock Climbing,Aggressive,Slip-lasted,Synthetic Clarino,Synthetic Clarino,4mm Stealth C4 rubber,Lace-up,Yes,Men's,1 lb. 0.6 oz. ounces
2,Built for sport climbers and boulderers who de...,Rock Climbing,Aggressive,Slip-lasted,Synthetic leather,No,Vibram XS Edge rubber,Strap,Yes,Men's,1 lb. 0.5 oz.


11
["With breathable fabric uppers and neutral lasts, Black Diamond Momentum men's climbing shoes deliver excellent comfort without sacrificing performance. They're a great choice for new climbers.", 'Rock Climbing', 'Neutral', 'Slip-lasted', 'Synthetic knit', 'Hemp', 'Rubber', 'Strap', 'Yes', "Men's", '15.6 ounces']
{'name': 'Best Use', 'values': ['Rock Climbing'], 'displayOrder': 1, 'code': 50}
{'name': 'Climbing Shoe Type', 'values': ['Neutral'], 'displayOrder': 2, 'featuredSpecificationDisplayOrder': 1, 'code': 619}
{'name': 'Last', 'values': ['Slip-lasted'], 'displayOrder': 3, 'featuredSpecificationDisplayOrder': 2, 'code': 395}
{'name': 'Upper', 'values': ['Suede leather'], 'displayOrder': 4, 'featuredSpecificationDisplayOrder': 4, 'code': 742}
{'name': 'Outsole', 'values': ['Vision rubber'], 'displayOrder': 6, 'code': 488}
{'name': 'Footwear Closure', 'values': ['Strap'], 'displayOrder': 7, 'code': 807}
{'name': 'Can Be Resoled', 'values': ['Yes'], 'displayOrder': 8, 'featuredSp

Unnamed: 0,Description,Climbing_Type,Difficulty,Lasting,Upper_Material,Lining_Material,Outsole_Material,Strap_Type,Resoleability,Gender,Weight
0,Built with the basics in mind for beginning cl...,Rock Climbing,Neutral,Slip-lasted,Split-grain rubber,Stretch arch lining,Stealth® C4 rubber,Strap,Yes,Men's,15.5 ounces
1,Five Ten worked with professional climbers Ale...,Rock Climbing,Aggressive,Slip-lasted,Synthetic Clarino,Synthetic Clarino,4mm Stealth C4 rubber,Lace-up,Yes,Men's,1 lb. 0.6 oz. ounces
2,Built for sport climbers and boulderers who de...,Rock Climbing,Aggressive,Slip-lasted,Synthetic leather,No,Vibram XS Edge rubber,Strap,Yes,Men's,1 lb. 0.5 oz.
3,With breathable fabric uppers and neutral last...,Rock Climbing,Neutral,Slip-lasted,Synthetic knit,Hemp,Rubber,Strap,Yes,Men's,15.6 ounces


10
["Offering an ideal introduction to climbing, the durable men's Scarpa Origin climbing shoes feature a flat-lasted profile, sticky rubber and plush suede uppers that give the feeling of padded luxury. Available at REI, 100% Satisfaction Guaranteed.", 'Rock Climbing', 'Neutral', 'Slip-lasted', 'Suede leather', 'Vision rubber', 'Strap', 'Yes', "Men's", '14.4 ounces']


ValueError: cannot set a row with mismatched columns
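
The error above appears when a product page omits a spec: the Scarpa Origin output has no `Lining` entry, so its row has 10 values for 11 columns. One possible way around it, sketched below, is to fill each row by spec name instead of by position and leave missing specs as `None`. The `SPEC_TO_COLUMN` mapping and the `specs_to_row` helper are assumptions inferred from the column order and the printed spec names, not code from the notebook.

```python
# Assumed mapping from REI spec names (as printed above) to dataframe columns
SPEC_TO_COLUMN = {
    'Best Use': 'Climbing_Type',
    'Climbing Shoe Type': 'Difficulty',
    'Last': 'Lasting',
    'Upper': 'Upper_Material',
    'Lining': 'Lining_Material',
    'Outsole': 'Outsole_Material',
    'Footwear Closure': 'Strap_Type',
    'Can Be Resoled': 'Resoleability',
    'Gender': 'Gender',
    'Weight (Pair)': 'Weight',
}


def specs_to_row(description, specs):
    """Build one row keyed by column name; specs missing from the page stay None."""
    row = {'Description': description}
    row.update({column: None for column in SPEC_TO_COLUMN.values()})
    for spec in specs:
        column = SPEC_TO_COLUMN.get(spec['name'])  # unknown specs (e.g. 'Vegan') are skipped
        if column is not None and spec['values']:
            row[column] = spec['values'][0]
    return row


# A few specs from the Scarpa Origin output above, which has no 'Lining' entry
sample_specs = [
    {'name': 'Best Use', 'values': ['Rock Climbing']},
    {'name': 'Upper', 'values': ['Suede leather']},
    {'name': 'Outsole', 'values': ['Vision rubber']},
]
row = specs_to_row('Scarpa Origin description', sample_specs)
print(row)
```

Because every row has the same keys, a list of these dictionaries can be passed to `pd.DataFrame` directly, and shoes without a lining simply get a missing value in `Lining_Material`.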

In [69]:
req = requests.get('https://www.rei.com/rei-garage/product/174923/black-diamond-momentum-climbing-shoes-mens')
soup = bs4.BeautifulSoup(req.text, 'html.parser')

detailed_df = pd.DataFrame({'Description': [],
                'Difficulty': [],
                'Lasting': [],
                'Upper_Material': [],
                'Lining_Material': [],
                'Outsole_Material': [],
                'Strap_Type': [],
                'Resoleability': [],
                'Weight': []})

#shoes_dict['Description'].append(soup.find_all('meta')[12].attrs['content'])

#print(soup.find('script', {'id': 'page-data'}).text)

shoe_json = json.loads(soup.find('script', {'id': 'page-data'}).text)

# Print the first value of each spec to inspect what the page provides
for json_str in shoe_json['product']['specifications']['specs']:
    print(json_str['values'][0])

print(shoe_json['product']['specifications']['specs'])


Rock Climbing
Neutral
Slip-lasted
Synthetic knit
Hemp
Rubber
Strap
Yes
Men's
15.6 ounces
[{'name': 'Best Use', 'values': ['Rock Climbing'], 'displayOrder': 1, 'code': 50}, {'name': 'Climbing Shoe Type', 'values': ['Neutral'], 'displayOrder': 2, 'featuredSpecificationDisplayOrder': 1, 'code': 619}, {'name': 'Last', 'values': ['Slip-lasted'], 'displayOrder': 3, 'featuredSpecificationDisplayOrder': 2, 'code': 395}, {'name': 'Upper', 'values': ['Synthetic knit'], 'displayOrder': 4, 'featuredSpecificationDisplayOrder': 4, 'code': 742}, {'name': 'Lining', 'values': ['Hemp'], 'displayOrder': 5, 'code': 419}, {'name': 'Outsole', 'values': ['Rubber'], 'displayOrder': 6, 'code': 488}, {'name': 'Footwear Closure', 'values': ['Strap'], 'displayOrder': 7, 'code': 807}, {'name': 'Can Be Resoled', 'values': ['Yes'], 'displayOrder': 8, 'featuredSpecificationDisplayOrder': 3, 'code': 164}, {'name': 'Gender', 'values': ["Men's"], 'displayOrder': 10, 'code': 51}, {'name': 'Weight (Pair)', 'values': ['15.

### Step 2: Connect to an SQL database

Here, I use SQLAlchemy's `create_engine` function to connect to my PostgreSQL database, then create an SQL table from the existing pandas dataframe.

In [None]:
# Create engine connected to sql database
engine = create_engine('postgresql://postgres:postgres@localhost:5432/Climbing_Shoes', echo=False)

In [None]:
# Convert pandas dataframe to SQL database
shoes_df.to_sql('shoes', engine, if_exists='replace')
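
To check that a dataframe survives the round trip, the table can be read back with `pd.read_sql`. The sketch below swaps in an in-memory SQLite database for PostgreSQL so it runs without a server, and `sample_df` holds made-up placeholder rows rather than scraped data; with the real setup, the `engine` above would take the place of `conn`.

```python
import sqlite3

import pandas as pd

# Made-up placeholder rows standing in for shoes_df
sample_df = pd.DataFrame({
    'Shoe_Name': ['Shoe A', 'Shoe B'],
    'Shoe_Company': ['Brand A', 'Brand B'],
    'Current_Price': ['$39.93', '$69.93'],
})

# In-memory SQLite connection used here instead of the PostgreSQL engine
conn = sqlite3.connect(':memory:')

# Write the dataframe to a table, replacing it if it already exists
sample_df.to_sql('shoes', conn, if_exists='replace', index=False)

# Read two columns back to confirm the table was created as expected
round_trip = pd.read_sql('SELECT Shoe_Name, Shoe_Company FROM shoes', conn)
conn.close()

print(round_trip)
```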

### Step 3: Data Cleaning

In order to perform data analysis, we need to decide what kind of questions we want to ask about the climbing shoes with the data that we have. Let's start with some basic questions, and utilize visuals to help us answer our questions.

But first things first, let's clean the data enough to perform some comparisons and groupings.

In [None]:
# Make a copy so we don't alter the original dataframe
clean_df = shoes_df.copy()

In [None]:
# Strip '$' from the Price columns
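# regex=False treats '$' as a literal character rather than a regex anchor
clean_df['Current_Price'] = clean_df['Current_Price'].str.replace('$', '', regex=False)
clean_df['Original_Price'] = clean_df['Original_Price'].str.replace('$', '', regex=False)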

At this point, I wanted to convert the `Current_Price` and `Original_Price` columns to float type while stripping the '\$' symbol, but I ran into an edge case: price ranges. For example, the Origin Climbing Shoes from Scarpa have a price range of \$71.19 - \$75.93, which can't be converted to a float even after removing the \$ symbols. I got around this by splitting the column value on the '-' symbol, so a price range becomes a list of two values and a single price becomes a list of one. Then I simply replaced each list with its average.

In [None]:
# Define a function I can apply to the Price Series to get average of a list
def avg_lst(lst):
    lst = list(map(float, lst)) 
    return sum(lst) / len(lst)

In [None]:
# Split the price value and apply the avg_lst function
clean_df['Current_Price'] = clean_df['Current_Price'].str.split("-", n = 1, expand = False)
clean_df['Current_Price'] = clean_df['Current_Price'].apply(avg_lst)
clean_df['Original_Price'] = clean_df['Original_Price'].str.split("-", n = 1, expand = False)
clean_df['Original_Price'] = clean_df['Original_Price'].apply(avg_lst)
clean_df.head()
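
The split-and-average step above can also be written as a single plain-Python helper, which makes the arithmetic easy to check on one value at a time. `parse_price` is a hypothetical name that is not used elsewhere in the notebook; this is a sketch of the same idea, not a replacement for the pandas code.

```python
def parse_price(price_text):
    """Convert '$39.93' or '$71.19 - $75.93' to a float, averaging a range."""
    parts = price_text.replace('$', '').split('-')
    values = [float(part) for part in parts]  # float() ignores surrounding spaces
    return sum(values) / len(values)


single = parse_price('$39.93')
ranged = parse_price('$71.19 - $75.93')
print(single, round(ranged, 2))
```

For the Scarpa Origin range, the result is (71.19 + 75.93) / 2 = 73.56, up to floating-point rounding.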

In [None]:
clean_df['Rating'] = clean_df['Rating'].str.split().str[0].astype(float)
clean_df['Rating_Count'] = clean_df['Rating_Count'].astype(int)
clean_df.head()

### Step 4: Exploratory Data Analysis!

Yay! Now that the boring data cleaning section is (mostly) over, let's think about some questions we're interested in.

- Which shoe brand is the most popular?
- Which shoe brand has the highest average rating?
- Which shoe is the most popular?
- What is the most expensive shoe?
- Which shoe brand charges the most on average?
- Do discounted shoes have higher or lower ratings?

In [None]:
# Use the groupby function to add the total Rating_Count of each shoe brand
sum_by_brand = clean_df.groupby('Shoe_Company').sum(numeric_only=True).sort_values('Rating_Count', ascending=False)
plt.title('Rating Counts by Brand')
plt.barh(sum_by_brand['Rating_Count'].index, sum_by_brand['Rating_Count'])

In [None]:
# Use value counts to see how many shoe listings there are by each brand
counts = clean_df['Shoe_Company'].value_counts().rename('Shoe_Counts')
plt.title('Number of Shoe Listings by Brand')
plt.barh(counts.index, counts)

In [None]:
# Use pandas concat function to join the two series together keyed on the index
ratings_per_shoe = pd.concat([counts, sum_by_brand['Rating_Count']], axis=1)
ratings_per_shoe['Ratings_per_shoe'] = ratings_per_shoe['Rating_Count'] / ratings_per_shoe['Shoe_Counts']
plt.title('Average Ratings per Shoe by Brand')
plt.barh(ratings_per_shoe.index, ratings_per_shoe['Ratings_per_shoe'])

Here, we can see that La Sportiva has significantly more ratings than other brands and also the most ratings per shoe listing. This suggests that La Sportiva either sells many more products than its competitors, which increases the number of customer reviews, or sells about as many as other brands but has customers who are more inclined to leave reviews. Since rating count is the only proxy for sales we have, this points to La Sportiva being the most popular brand.

In [None]:
# Groupby mean to identify the average ratings for each company
avg_by_brand = clean_df.groupby('Shoe_Company').mean(numeric_only=True).sort_values('Rating', ascending = False)

display(avg_by_brand)

# Create horizontal bar plot with labels
fig, ax = plt.subplots()
plt.title('Average Rating by Brand')
plt.barh(avg_by_brand.index, avg_by_brand['Rating'])

for i, v in enumerate(avg_by_brand['Rating']):
    ax.text(v + 0.1, i - 0.11, f'{v:.2f}', color='black', fontweight='bold')

With this horizontal bar graph, we can see that Metolius and Butora have relatively higher average ratings than the other brands. However, the dataframe above it shows that they average only 5 and 2.8 ratings per shoe, while brands like La Sportiva average 29.46. With small sample sizes (low rating counts), it's hard to judge the validity of a rating because it is determined by so few individuals.

Because of this, I think Butora, Metolius, and even TENAYA shoes are a riskier purchase, though they may turn out to be great shoes. For products with more reliable ratings, La Sportiva, evolv, and Scarpa are your go-to.

**We also didn't account for the shoes with 0 rating counts. Further analysis could involve imputation or additional webscraping from other websites.**
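
One possible way to soften the small-sample problem is a count-weighted average, where each shoe's rating counts in proportion to how many people rated it. This is a different statistic from the plain `groupby` mean used above, and `weighted_rating_by_brand` is a hypothetical helper; the sample rows are made up for illustration and are not scraped data.

```python
import pandas as pd

# Made-up sample rows standing in for clean_df
sample_df = pd.DataFrame({
    'Shoe_Company': ['Brand A', 'Brand A', 'Brand B', 'Brand B'],
    'Rating': [4.0, 5.0, 5.0, 3.0],
    'Rating_Count': [30, 10, 1, 3],
})


def weighted_rating_by_brand(df):
    """Average rating per brand, weighting each shoe by its number of ratings."""
    rated = df[df['Rating_Count'] > 0].copy()  # unrated shoes carry no information
    rated['Rating_Sum'] = rated['Rating'] * rated['Rating_Count']
    totals = rated.groupby('Shoe_Company')[['Rating_Sum', 'Rating_Count']].sum()
    totals['Weighted_Rating'] = totals['Rating_Sum'] / totals['Rating_Count']
    return totals[['Weighted_Rating', 'Rating_Count']]


weighted = weighted_rating_by_brand(sample_df)
print(weighted)
```

Keeping `Rating_Count` next to the weighted rating makes it easy to see which brand averages rest on only a handful of reviews.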

In [None]:
# Sort shoes by rating count to see which shoes have the most ratings
popular_shoes = clean_df.sort_values('Rating_Count', ascending = False).head()

# Create pie chart of the top 5 climbing shoes with highest rating counts
explode = (0.1, 0, 0, 0, 0)
fig1, ax1 = plt.subplots()
ax1.pie(popular_shoes['Rating_Count'], explode=explode, labels=popular_shoes['Shoe_Name'], autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()

Since we don't have the exact number of purchases of each shoe, the only way we can determine the popularity of each shoe is by its rating count. By this standard, the TC Pro Climbing Shoes were the most popular.

In [None]:
# Sort shoes by original and current price to identify the most expensive shoes
orig_price = clean_df.sort_values('Original_Price', ascending = False)
display(orig_price.head())

shoes_price = clean_df.sort_values('Current_Price', ascending = False)
display(shoes_price.head())

Here, we can see that Scarpa offers the two most expensive shoes, both originally priced at \$199.00. However, both are currently on sale, making them significantly cheaper. Four of the five most expensive shoes at current prices are from La Sportiva, and they are relatively well rated, excluding the Theory Climbing Shoes, which haven't been rated yet.

In [None]:
discounted = clean_df.groupby('Discount').mean(numeric_only=True)
discounted

From this dataframe, discounted shoes appear to have lower and fewer ratings on average. The difference isn't statistically significant, but it is plausible: businesses tend to discount products that aren't selling well, and lower-rated products don't sell as well as highly rated ones, so it makes sense for brands to discount them. However, another strategy is to discount hot products to boost their sales even more. Given both strategies and the lack of statistically significant evidence, we can't conclude that ratings differ based on discounts.

For future analysis, I think it would be really interesting to see the relationship between dollar amount discounted and the ratings.
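
A first pass at that follow-up might look like the sketch below, which derives the dollar discount from the two price columns and computes its Pearson correlation with the rating. `discount_vs_rating` and the `Discount_Amount` column are hypothetical names, and the sample rows are made up, so the resulting number says nothing about the real REI data.

```python
import pandas as pd

# Made-up sample rows standing in for clean_df
sample_df = pd.DataFrame({
    'Original_Price': [100.0, 150.0, 200.0, 80.0],
    'Current_Price': [100.0, 120.0, 140.0, 80.0],
    'Rating': [4.5, 4.0, 3.5, 4.8],
})


def discount_vs_rating(df):
    """Add a Discount_Amount column and return it with its correlation to Rating."""
    out = df.copy()
    out['Discount_Amount'] = out['Original_Price'] - out['Current_Price']
    rated = out.dropna(subset=['Rating'])  # skip shoes that have no rating yet
    correlation = rated['Discount_Amount'].corr(rated['Rating'])
    return out, correlation


with_discount, correlation = discount_vs_rating(sample_df)
print(with_discount)
print(correlation)
```

A scatter plot of `Discount_Amount` against `Rating` would be a natural companion, since a single correlation coefficient can hide outliers.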

## Decision Making
