### Big Mountain Guided Capstone Documentation

#### Problem Identification and Context
Big Mountain Resort came to us suspecting that it was not capitalizing on its facilities as much as it could. Those facilities include 105 runs, 14 ski lifts, a longest run of 3.3 miles, a base elevation of 4,464 feet, a summit elevation of 6,817 feet, and a vertical drop of 2,353 feet. The resort recently installed a new chair lift at an operational cost of 1,540,000 this season. Big Mountain wanted guidance on how to price its tickets better and on which operational changes would either reduce costs without lowering the ticket price or support a higher ticket price.
#### Problem Statement
How can Big Mountain Resort better price their tickets based on the value provided by their facilities during the next ski season?

### Imports
Libraries needed for documentation.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import html5lib

from library.sb_utils import save_file

### Data Wrangling
Importing our ski resort data.

In [5]:
ski_data=pd.read_csv('../raw_data/ski_resort_data.csv')

#### Check for null values
We started data wrangling by using the .info() method to check how much data was missing.

In [6]:
ski_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 330 entries, 0 to 329
Data columns (total 27 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Name               330 non-null    object 
 1   Region             330 non-null    object 
 2   state              330 non-null    object 
 3   summit_elev        330 non-null    int64  
 4   vertical_drop      330 non-null    int64  
 5   base_elev          330 non-null    int64  
 6   trams              330 non-null    int64  
 7   fastEight          164 non-null    float64
 8   fastSixes          330 non-null    int64  
 9   fastQuads          330 non-null    int64  
 10  quad               330 non-null    int64  
 11  triple             330 non-null    int64  
 12  double             330 non-null    int64  
 13  surface            330 non-null    int64  
 14  total_chairs       330 non-null    int64  
 15  Runs               326 non-null    float64
 16  TerrainParks       279 non-null    float64
 ...

Another way to check how much data is missing is to compute the count and percentage of null values in each column.

In [8]:
# Count and percentage of missing values per column, largest count first.
missing = pd.concat([ski_data.isnull().sum(), 100 * ski_data.isnull().mean()], axis=1)
missing.columns = ['count', '%']
missing.sort_values(by='count', ascending=False)

| | count | % |
|---|---|---|
| fastEight | 166 | 50.30303 |
| NightSkiing_ac | 143 | 43.333333 |
| AdultWeekday | 54 | 16.363636 |
| AdultWeekend | 51 | 15.454545 |
| daysOpenLastYear | 51 | 15.454545 |
| TerrainParks | 51 | 15.454545 |
| projectedDaysOpen | 47 | 14.242424 |
| Snow Making_ac | 46 | 13.939394 |
| averageSnowfall | 14 | 4.242424 |
| LongestRun_mi | 5 | 1.515152 |
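
The missing-value summary above can be wrapped in a small reusable helper. What follows is a minimal sketch on a toy DataFrame: the function name `missing_summary` and the toy values are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, largest count first."""
    summary = pd.concat(
        [df.isnull().sum(), 100 * df.isnull().mean()],
        axis=1,
        keys=["count", "%"],  # name the two resulting columns
    )
    return summary.sort_values(by="count", ascending=False)


# Toy data with hypothetical values, standing in for ski_data.
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "fastEight": [np.nan, 0.0, np.nan, np.nan],
        "Runs": [10.0, np.nan, 25.0, 40.0],
    }
)

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(ski_data)` should reproduce the table shown for the real dataset, assuming `ski_data` has been loaded as in the earlier cell.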


The .info() output lists each column name with its count of non-null values, and the table above gives the count and percentage of missing values per column. Next, we check that Big Mountain Resort is in the data and is not missing any values.

In [7]:
ski_data[ski_data.Name=='Big Mountain Resort'].T

| | 151 |
|---|---|
| Name | Big Mountain Resort |
| Region | Montana |
| state | Montana |
| summit_elev | 6817 |
| vertical_drop | 2353 |
| base_elev | 4464 |
| trams | 0 |
| fastEight | 0.0 |
| fastSixes | 0 |
| fastQuads | 3 |


We find that Big Mountain Resort is present in the data (row 151) and has no null values.
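
Rather than reading the transposed row by eye, the null check for a single resort can also be done programmatically. This is a minimal sketch on toy data: the second row and the variable names are assumptions for illustration, and the real check would run against `ski_data`.

```python
import numpy as np
import pandas as pd

# Toy data standing in for ski_data; the second row is hypothetical.
toy = pd.DataFrame(
    {
        "Name": ["Big Mountain Resort", "Other Resort"],
        "summit_elev": [6817, 7012],
        "fastEight": [0.0, np.nan],
    }
)

# Select the resort's row(s), then count nulls across every column.
resort = toy[toy["Name"] == "Big Mountain Resort"]
null_count = int(resort.isnull().sum().sum())

# List any columns that contain a null for this resort.
columns_with_nulls = resort.columns[resort.isnull().any()].tolist()

print(null_count, columns_with_nulls)
```

A `null_count` of zero and an empty `columns_with_nulls` list would confirm that the resort's record is complete.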

#### Check that rows are unique
We want to prevent duplicate data in our dataset.

In [9]:
ski_data['Name'].value_counts().head()

Name
Crystal Mountain    2
Alyeska Resort      1
Brandywine          1
Boston Mills        1
Alpine Valley       1
Name: count, dtype: int64

In [10]:
ski_data[ski_data['Name']=='Crystal Mountain']

| | Name | Region | state | summit_elev | vertical_drop | base_elev | trams | fastEight | fastSixes | fastQuads | ... | LongestRun_mi | SkiableTerrain_ac | Snow Making_ac | daysOpenLastYear | yearsOpen | averageSnowfall | AdultWeekday | AdultWeekend | projectedDaysOpen | NightSkiing_ac |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 104 | Crystal Mountain | Michigan | Michigan | 1132 | 375 | 757 | 0 | 0.0 | 0 | 1 | ... | 0.3 | 102.0 | 96.0 | 120.0 | 63.0 | 132.0 | 54.0 | 64.0 | 135.0 | 56.0 |
| 295 | Crystal Mountain | Washington | Washington | 7012 | 3100 | 4400 | 1 | NaN | 2 | 2 | ... | 2.5 | 2600.0 | 10.0 | NaN | 57.0 | 486.0 | 99.0 | 99.0 | NaN | NaN |


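Although two rows share the name Crystal Mountain, they describe different resorts, one in Michigan and one in Washington, so Crystal Mountain is not duplicated. Every row is a unique record.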
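
The same conclusion can be reached without inspecting rows manually by testing for duplicates on the name combined with the region or state. This is a minimal sketch on toy data: the two Crystal Mountain rows mirror the output above, while the third row's region and the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy data standing in for ski_data; the third row's region is hypothetical.
toy = pd.DataFrame(
    {
        "Name": ["Crystal Mountain", "Crystal Mountain", "Alpine Valley"],
        "Region": ["Michigan", "Washington", "Michigan"],
        "state": ["Michigan", "Washington", "Michigan"],
    }
)

# Names alone repeat: keep=False flags every row whose name occurs more than once.
name_dupes = int(toy["Name"].duplicated(keep=False).sum())

# The (Name, Region) and (Name, state) pairs do not repeat.
region_dupes = int(toy.duplicated(subset=["Name", "Region"]).sum())
state_dupes = int(toy.duplicated(subset=["Name", "state"]).sum())

print(name_dupes, region_dupes, state_dupes)
```

If both pair counts are zero on the real dataset, each resort is uniquely identified by its name together with its location.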