Created By: Leah Nagy
The goal of this project was to predict the average user rating of rock-climbing routes in Kentucky using linear regression models. By web scraping the Mountain Project website, I collected information about each route that could be used to predict ratings for future routes. After collecting the data, I evaluated several types of regression models before arriving at a final model.
Kentucky has some of the best rock climbing in the world and is considered the climbing mecca of the East Coast, where guides are a vital part of the community. With over 3,000 routes to choose from, a rock-climbing guide company wants to better understand what makes some routes more desirable than others so it can provide the optimal experience for its clients.
After exploratory data analysis and feature engineering, the dataset contains 1,582 routes. I collected 17 features on each route, and the final model includes 12 of them. The data was collected from the Mountain Project website using Selenium and BeautifulSoup.
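Once Selenium has rendered a route page, the extraction step boils down to pulling fields out of the HTML. A minimal sketch of that parsing step is below; the class names (`route-name`, `difficulty`, `stars`) are hypothetical stand-ins, since Mountain Project's real markup differs.

```python
from bs4 import BeautifulSoup

# Stand-in for page source captured by Selenium; the selectors below are
# hypothetical, not Mountain Project's actual markup.
sample_html = """
<div class="route">
  <h1 class="route-name">Fuzzy Undercling</h1>
  <span class="difficulty">5.11b</span>
  <span class="stars">3.8</span>
</div>
"""

def parse_route(html):
    """Extract one route's fields from a rendered page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "name": soup.select_one(".route-name").get_text(strip=True),
        "difficulty": soup.select_one(".difficulty").get_text(strip=True),
        "avg_rating": float(soup.select_one(".stars").get_text(strip=True)),
    }

print(parse_route(sample_html))
```

In practice each parsed dictionary becomes one row of the routes table.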
- The route's share date was converted to the number of years on the app, so routes of different ages are comparable
- The numbers of ratings, comments, photos, and ticks were summed into a single feature, since these counts were highly correlated
- Categorical features were encoded
- Interaction variables were added:
  - Difficulty Rating × Route Length
  - Popularity / Route Age
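The feature-engineering steps above can be sketched in pandas. The column names and the two-row toy table are hypothetical, not the project's exact schema:

```python
import pandas as pd

# Toy route table with hypothetical column names.
routes = pd.DataFrame({
    "share_date": pd.to_datetime(["2015-06-01", "2019-03-15"]),
    "num_ratings": [120, 8],
    "num_comments": [14, 1],
    "num_photos": [30, 2],
    "num_ticks": [900, 40],
    "route_type": ["Sport", "Trad"],
    "difficulty_num": [11.2, 9.5],  # numeric encoding of the YDS grade
    "length_ft": [60, 45],
})

# Share date -> years on the app, relative to a fixed snapshot date.
snapshot = pd.Timestamp("2021-06-01")
routes["years_on_app"] = (snapshot - routes["share_date"]).dt.days / 365.25

# Collapse the highly correlated engagement counts into one popularity feature.
routes["popularity"] = routes[
    ["num_ratings", "num_comments", "num_photos", "num_ticks"]
].sum(axis=1)

# One-hot encode categorical features.
routes = pd.get_dummies(routes, columns=["route_type"], drop_first=True)

# Interaction variables.
routes["difficulty_x_length"] = routes["difficulty_num"] * routes["length_ft"]
routes["popularity_per_year"] = routes["popularity"] / routes["years_on_app"]

print(routes[["years_on_app", "popularity", "difficulty_x_length"]])
```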
I tried simple linear, polynomial, Ridge, and LASSO regression. The final model was a simple linear regression with features removed according to the LASSO regression results.
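The LASSO-guided feature selection can be sketched as follows, using synthetic data in place of the route table; the alpha value here is illustrative, not the project's tuned value:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 17 features, of which 12 carry signal.
X, y = make_regression(n_samples=300, n_features=17, n_informative=12,
                       noise=5, random_state=0)

# Fit LASSO on standardized features; the L1 penalty drives weak
# coefficients exactly to zero.
scaler = StandardScaler().fit(X)
lasso = Lasso(alpha=1.0).fit(scaler.transform(X), y)

# Keep only the features LASSO retained, then refit a plain linear model
# on that subset.
keep = np.flatnonzero(lasso.coef_)
final = LinearRegression().fit(X[:, keep], y)
print(len(keep), "features kept")
```

Refitting an unpenalized linear regression on the surviving features keeps the final coefficients unbiased by the L1 shrinkage.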
The entire dataset was split 60/20/20 into training/validation/testing sets. I used 5-fold cross-validation while testing the various models and scored them on the validation set. I then combined the training and validation sets for a final 80/20 training/testing split. The testing data was used only on the final model, with the same random state kept throughout so the test rows never changed.
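The splitting scheme above can be sketched with scikit-learn, again on synthetic data (the sizes and random state are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=500, n_features=12, noise=10, random_state=42)

# 60/20/20: hold out 20% for testing, then carve 20% of the total
# (i.e. 25% of the remainder) off as a validation set. A fixed
# random_state keeps the test rows identical across experiments.
X_rem, X_test, y_rem, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rem, y_rem, test_size=0.25, random_state=42)

# 5-fold cross-validation on the training portion while comparing models.
cv_mae = -cross_val_score(LinearRegression(), X_train, y_train,
                          scoring="neg_mean_absolute_error", cv=5)
print("CV MAE:", cv_mae.mean())

# For the final model, recombine train + validation (80/20 overall).
final_model = LinearRegression().fit(X_rem, y_rem)
```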
The metric I used to score my models was Mean Absolute Error (MAE), because it is in the same units as the target. Without a need to further penalize outliers, MAE keeps the model more interpretable to stakeholders. While I focused on MAE, I also worked to reduce multicollinearity, which slightly increased the MAE. In the future I would like to try more interaction terms to improve the model's performance further.
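Because MAE stays in the target's units, a score of 0.374 means predictions are off by about 0.37 stars on average. A tiny illustration with made-up ratings:

```python
from sklearn.metrics import mean_absolute_error

# Hypothetical true vs. predicted average ratings (in stars).
y_true = [3.0, 3.5, 4.0, 2.5]
y_pred = [3.4, 3.1, 3.8, 2.9]

# Mean of the absolute errors: (0.4 + 0.4 + 0.2 + 0.4) / 4 = 0.35 stars.
print(round(mean_absolute_error(y_true, y_pred), 2))
```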
- Accuracy: 0.566
- Accuracy: 0.572
- MAE: 0.374
- Selenium, BeautifulSoup & Requests for web scraping
- NumPy and pandas for data manipulation
- scikit-learn for modeling
- Matplotlib and seaborn for plotting