# A Data Driven Approach to European Ski Resorts

Ski resorts come in all different shapes, sizes, and prices. There are thousands of European ski resorts known for their luscious mountains and elegant experiences. Today we'll take a closer look at one of the world's most prestigious sports.

We'll use a **manual, library-free** analysis to see what kind of basic insights we can extract.

Afterwards, a **second analysis and visualization** will be done using pandas, numpy, and seaborn.

## Our Dataset

Our dataset features a sample of 376 European ski resorts provided by ski-resort-stats.com, made available through Kaggle.

[Kaggle Dataset](https://www.kaggle.com/thomasnibb/european-ski-resorts)

[Data Source: Ski-resort-stats.com](Ski-resort-stats.com)

## Potential Questions

1. What's the min, max, and mean price of a day pass?
2. Is there any correlation between price and available slope difficulty?
3. How many resorts are available in a given price range?
4. What country has the most affordable prices at each difficulty level?
5. What slope difficulty range is most represented in Europe?

# Data Dictionary 

|Column Name| Description|
|-----------|-----------|
|**#**|Row number|
|**Resort**|The name of the ski & snowboard resort.|
|**Country**| The name of the country in which the resort is located.|
|**HighestPoint**|The highest mountain point at the ski resort.|
|**LowestPoint**|The lowest possible point to ski at the ski resort.|
|**DayPassPriceAdult**| The price shows what it costs for 1 adult for 1 day in the main season in Euros €.|
|**BeginnerSlope**|The total amount of “beginner” slopes in kilometers at the resort. “Beginner slopes” contains “children”, “blue” and “green” slopes.|
|**IntermediateSlope**| The total amount of “intermediate” slopes in kilometers at the resort. “Intermediate slopes” contains “red” slopes.|
|**DifficultSlope**| The total amount of “difficult” slopes in kilometers at the resort. “Difficult slopes” contains “black”, “advanced” and “expert” slopes.|
|**TotalSlope**| The sum of “beginner slopes” + “intermediate slopes” + “difficult slopes”|
|**Snowparks**| Does the resort have one or more snowparks, or not?|
|**NightSki**|Does the resort offer skiing on illuminated slopes?|
|**SurfaceLifts**| The amount of lifts in this category: T-bar, Sunkidslift, Rope lifts and people mover.|
|**ChairLifts**| The total amount of chairlifts.|
|**GondolaLifts**|The amount of lifts in this category: Gondola, Train lifts, Funicular, Combined gondola and chairlifts, Helicopter lifts, Snowcats and Aerial tramways.|
|**TotalLifts**| The sum of “surface lifts etc” + “gondola etc” + “chairlifts etc”|
|**LiftCapacity**| How many passengers can the lift system at the ski resort move in one hour?|
|**SnowCannons**| The total amount of snow cannons at the ski resort.|

### Imports & Initial Setup

We're keeping it simple today and trying our hand at an analysis without third-party libraries.

`reader` from Python's built-in csv module is used to import our CSV file. The file is opened, read, and converted into a list of rows. Lastly, we verify our import was successful by viewing the column names.

In [1]:
# to import our file
from csv import reader

In [2]:
# open the file, read it with csv.reader, and convert the rows to a list
# (the with block closes the file once the rows are loaded)
with open("European_Ski_Resorts.csv") as o_file:
    ski_data = list(reader(o_file))

# verify header/column names
ski_data[0]

['',
 'Resort',
 'Country',
 'HighestPoint',
 'LowestPoint',
 'DayPassPriceAdult',
 'BeginnerSlope',
 'IntermediateSlope',
 'DifficultSlope',
 'TotalSlope',
 'Snowparks',
 'NightSki',
 'SurfaceLifts',
 'ChairLifts',
 'GondolaLifts',
 'TotalLifts',
 'LiftCapacity',
 'SnowCannons']
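
To try the import step without downloading the file, here is a minimal sketch that runs the same `csv.reader` pattern on an in-memory sample. The `sample_csv` text is a hypothetical excerpt (the header and first resort from this notebook, trimmed to six columns), not the full dataset.

```python
import io
from csv import reader

# Hypothetical excerpt: header plus one resort, trimmed to six columns.
sample_csv = (
    ",Resort,Country,HighestPoint,LowestPoint,DayPassPriceAdult\n"
    "1,Alpendorf (Ski amedé),Austria,1980,740,52\n"
)

# io.StringIO lets csv.reader treat a string like an open file.
with io.StringIO(sample_csv) as sample_file:
    sample_rows = list(reader(sample_file))

header, first_resort = sample_rows
print(header)        # column names; the first label is an empty string
print(first_resort)  # every value is still a string at this point
```

Each row comes back as a list of strings, which is why the type conversions later in this notebook are needed.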

## Brief Overview of our dataset

Before we begin our quest for answers, it's a good idea to find out what the data we're working with generally looks like.

An examination of the first and last rows gives us a quick glimpse of potential data types, similar to a .head() or .tail() method.

In [3]:
# view first resort row/head
print("Head")
for num in range(len(ski_data[0])):
    print(ski_data[0][num], ":", ski_data[1][num])
    # print(type(ski_data[1][num])) all strings

print("\n")
print("Tail")

# view last resort row/tail
for num in range(len(ski_data[0])):
    print(ski_data[0][num], ": ", ski_data[376][num])

Head
 : 1
Resort : Alpendorf (Ski amedé)
Country : Austria
HighestPoint : 1980
LowestPoint : 740
DayPassPriceAdult : 52
BeginnerSlope : 30
IntermediateSlope : 81
DifficultSlope : 4
TotalSlope : 115
Snowparks : Yes
NightSki : No
SurfaceLifts : 22
ChairLifts : 16
GondolaLifts : 11
TotalLifts : 49
LiftCapacity : 75398
SnowCannons : 600


Tail
 :  376
Resort :  Zauchensee
Country :  Austria
HighestPoint :  2188
LowestPoint :  1000
DayPassPriceAdult :  52
BeginnerSlope :  23
IntermediateSlope :  16
DifficultSlope :  4
TotalSlope :  44
Snowparks :  Yes
NightSki :  No
SurfaceLifts :  9
ChairLifts :  6
GondolaLifts :  4
TotalLifts :  19
LiftCapacity :  25988
SnowCannons :  113
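
The head and tail view can also be wrapped in a small helper so the row index is not hard-coded. This is only a sketch: `show_row` is a hypothetical name, and `rows` is a shortened stand-in for `ski_data`.

```python
def show_row(rows, index):
    """Return 'label : value' lines for one row, pairing labels from rows[0]."""
    return [f"{label} : {value}" for label, value in zip(rows[0], rows[index])]

# Shortened stand-in for ski_data (header plus the first and last resort).
rows = [
    ["", "Resort", "Country", "DayPassPriceAdult"],
    ["1", "Alpendorf (Ski amedé)", "Austria", "52"],
    ["376", "Zauchensee", "Austria", "52"],
]

print("Head")
print("\n".join(show_row(rows, 1)))
print("\nTail")
print("\n".join(show_row(rows, -1)))  # -1 always points at the last row
```

Using `-1` for the tail keeps the code correct even if the number of resorts changes.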


## What do we know so far?

1. All of our data is stored as strings, regardless of the actual value type.
2. 14 of our 18 columns hold numeric values and should be converted to numbers.
3. Snowparks & NightSki could be converted into booleans.
4. HighestPoint and LowestPoint are measurements, but the unit is missing.
5. Resort and Country can stay as strings.
6. Our column names are not in snake_case.
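
Rather than eyeballing which columns are numeric, we could test every value with `str.isdigit()`. The sketch below assumes a hypothetical three-row sample, and `numeric_columns` is an illustrative helper name. Note that `isdigit()` only recognizes non-negative whole numbers, so decimals or negative values would need a different check.

```python
def numeric_columns(rows):
    """Return the labels of columns whose values are all digit-only strings."""
    header, data = rows[0], rows[1:]
    return [
        label
        for col, label in enumerate(header)
        if all(row[col].isdigit() for row in data)
    ]

# Hypothetical sample: header plus two resorts, trimmed to five columns.
sample = [
    ["", "Resort", "Country", "HighestPoint", "Snowparks"],
    ["1", "Alpendorf (Ski amedé)", "Austria", "1980", "Yes"],
    ["376", "Zauchensee", "Austria", "2188", "Yes"],
]

print(numeric_columns(sample))
```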

## Minor Cleaning Needs

We'll save our major data cleaning for the second run. For now, we'll do basic column name clean up, and type conversions.

1. Clean our first column label.
2. Update column labels to snake_case.
3. Correct the data type for each column.
4. Add units to the highest_point and lowest_point labels.
5. Add the currency to the daypass_price_adult label.

From the previous cell, we can see that the **label for the row number column is missing**. We can fix this with a simple assignment.

Next, we'll adjust each **cell's data type** to match its value. Finally, we'll add the **measurement unit** to the highest/lowest point labels and **euros** to the day pass price label.



### Snake_Case and Missing Label

[snake_case](https://peps.python.org/pep-0008/) is the recommended way to style variable names according to PEP 8, Python's widely adopted style guide.

It makes our columns easier to work with programmatically.

We also have a missing column label: the row number cell is empty. Both are easy fixes.

In [4]:
# clean up missing column label
ski_data[0][0] = "#"

# update column labels
ski_data[0] = [
    "#",
    "resort",
    "country",
    "highest_point",
    "lowest_point",
    "daypass_price_adult",
    "beginner_slope",
    "intermediate_slope",
    "difficult_slope",
    "total_slope",
    "snowparks",
    "night_ski",
    "surface_lifts",
    "chair_lifts",
    "gondola_lifts",
    "total_lifts",
    "lift_capacity",
    "snow_cannons",
]
ski_data[0]

['#',
 'resort',
 'country',
 'highest_point',
 'lowest_point',
 'daypass_price_adult',
 'beginner_slope',
 'intermediate_slope',
 'difficult_slope',
 'total_slope',
 'snowparks',
 'night_ski',
 'surface_lifts',
 'chair_lifts',
 'gondola_lifts',
 'total_lifts',
 'lift_capacity',
 'snow_cannons']
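
The labels above were typed by hand. As an alternative, here is a hedged sketch that derives snake_case from the original CamelCase labels with a regular expression. `to_snake_case` is a hypothetical helper, and its output differs slightly from the hand-written labels: it yields `day_pass_price_adult` rather than `daypass_price_adult`.

```python
import re

def to_snake_case(label):
    """Insert '_' before each capital letter (except the first), then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", label).lower()

for label in ["HighestPoint", "NightSki", "DayPassPriceAdult"]:
    print(label, "->", to_snake_case(label))
```

A rule like this scales better than manual renaming when a dataset has many columns, at the cost of less control over individual names.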

### Conversions

Here we convert each cell into its appropriate data type. 

This could be done as we come across them, but for ease of use, we'll convert them en masse now.

Our dataset contains **ints, floats, bools, and strings**.

In [5]:
# convert every data row in place (row 0 is the header)
for num in range(1, len(ski_data)):
    # print(num)
    ski_data[num][0] = int(ski_data[num][0]) # resort_number
    
    ski_data[num][3] = float(ski_data[num][3]) # highest_point
    ski_data[num][4] = float(ski_data[num][4]) # lowest_point
    

    ski_data[num][5] = int(ski_data[num][5]) # daypass_price_adult
    ski_data[num][6] = int(ski_data[num][6]) # beginner_slope
    ski_data[num][7] = int(ski_data[num][7]) # intermediate_slope
    ski_data[num][8] = int(ski_data[num][8]) # difficult_slope
    ski_data[num][9] = int(ski_data[num][9]) # total_slope
    
    # convert yes/no into True/False
    # snowparks = 10, night_ski = 11
    ski_data[num][10] = ski_data[num][10].lower() == "yes"
    ski_data[num][11] = ski_data[num][11].lower() == "yes"
    
    ski_data[num][12] = int(ski_data[num][12]) # surface_lifts
    ski_data[num][13] = int(ski_data[num][13]) # chair_lifts
    ski_data[num][14] = int(ski_data[num][14]) # gondola_lifts
    ski_data[num][15] = int(ski_data[num][15]) # total_lifts
    ski_data[num][16] = int(ski_data[num][16]) # lift_capacity
    ski_data[num][17] = int(ski_data[num][17]) # snow_cannons

    
# verify head & tail
print("Head")
for num in range(len(ski_data[0])):
    print(ski_data[0][num], ":", type(ski_data[1][num]))
    # each cell now reports its converted type

print("\n")
print("Tail")

# view last resort row/tail
for num in range(len(ski_data[0])):
    print(ski_data[0][num], ": ", type(ski_data[376][num]))

Head
# : <class 'int'>
resort : <class 'str'>
country : <class 'str'>
highest_point : <class 'float'>
lowest_point : <class 'float'>
daypass_price_adult : <class 'int'>
beginner_slope : <class 'int'>
intermediate_slope : <class 'int'>
difficult_slope : <class 'int'>
total_slope : <class 'int'>
snowparks : <class 'bool'>
night_ski : <class 'bool'>
surface_lifts : <class 'int'>
chair_lifts : <class 'int'>
gondola_lifts : <class 'int'>
total_lifts : <class 'int'>
lift_capacity : <class 'int'>
snow_cannons : <class 'int'>


Tail
# :  <class 'int'>
resort :  <class 'str'>
country :  <class 'str'>
highest_point :  <class 'float'>
lowest_point :  <class 'float'>
daypass_price_adult :  <class 'int'>
beginner_slope :  <class 'int'>
intermediate_slope :  <class 'int'>
difficult_slope :  <class 'int'>
total_slope :  <class 'int'>
snowparks :  <class 'bool'>
night_ski :  <class 'bool'>
surface_lifts :  <class 'int'>
chair_lifts :  <class 'int'>
gondola_lifts :  <class 'int'>
total_lifts :  <class 'int'>
lift_capacity :  <class 'int'>
snow_cannons :  <class 'int'>
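
The per-column conversions can also be expressed as a lookup table from column index to converter. The sketch below is one possible arrangement, not the method used in this notebook: `CONVERTERS`, `yes_no_to_bool`, and `convert_row` are hypothetical names, and the sample row is the first resort from the head output.

```python
def yes_no_to_bool(value):
    """Map 'Yes'/'No' text to True/False (anything other than 'yes' is False)."""
    return value.strip().lower() == "yes"

# Column index -> converter; columns not listed (resort, country) stay strings.
CONVERTERS = {0: int, 3: float, 4: float, 10: yes_no_to_bool, 11: yes_no_to_bool}
for col in (5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17):
    CONVERTERS[col] = int

def convert_row(row):
    """Return a new row with each cell converted according to CONVERTERS."""
    return [CONVERTERS.get(col, str)(cell) for col, cell in enumerate(row)]

# First resort from the head output, still as raw strings.
raw = ["1", "Alpendorf (Ski amedé)", "Austria", "1980", "740", "52", "30", "81",
       "4", "115", "Yes", "No", "22", "16", "11", "49", "75398", "600"]
converted = convert_row(raw)
print(converted)
```

Because `convert_row` builds a new list instead of modifying the row in place, it can be run repeatedly on the raw data without failing on already-converted values.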

### Value Adds

Increasing information density or otherwise adding additional detail is a way to add value to your analysis.

Monetary, measurement, and time units are easy value adds.

Most ski mountains are measured in meters, and the data dictionary lists day pass prices in euros.

In [6]:
# add measurement units and euros to labels
ski_data[0][3] = "highest_point_meters"
ski_data[0][4] = "lowest_point_meters"
ski_data[0][5] = "adult_daypass_euros"
ski_data[0]

['#',
 'resort',
 'country',
 'highest_point_meters',
 'lowest_point_meters',
 'adult_daypass_euros',
 'beginner_slope',
 'intermediate_slope',
 'difficult_slope',
 'total_slope',
 'snowparks',
 'night_ski',
 'surface_lifts',
 'chair_lifts',
 'gondola_lifts',
 'total_lifts',
 'lift_capacity',
 'snow_cannons']

## What resorts are we working with?

We will extract and store the names of all the resorts in a separate list.

This could be useful for later analysis. We'll then sort them alphabetically and view the first 50.

In [7]:
# collect every resort name (column 1), skipping the header row
resort_names = [row[1] for row in ski_data[1:]]

# view the first 50 names in sorted order
# (default string sorting is case-sensitive, so uppercase sorts before lowercase)
for name in sorted(resort_names)[:50]:
    print(name)

Abetone-​Val di Luce
Adelboden-​Lenk-Chuenisbärgli-​Silleren-​Hahnenmoos-​Metsch
Aiguille du Midi-Chamonix-
Aillons-Margériaz
Albiez-Montrond
Aletsch Arena-Riederalp-​Bettmeralp-​Fiesch Eggishorn
Alpe Lusia-Moena-​Bellamonte
Alpe d'Huez
Alpe di Siusi-Seiser Alm-
Alpe du Grand-Serre-La Morte
Alpendorf (Ski amedé)
Alpika Service
Alta Badia
Alto Sangro-Roccaraso-​Rivisondoli
Anzère
Aprica
Arabba
Arcalís-Ordino (Vallnord)
Arêches-Beaufort-
Arosa Lenzerheide
Auron-Saint-Etienne-de-Tinée-
Aussois
Avoriaz (Les Portes du Soleil)
Ax les Thermes
Axamer Lizum
Bad Gastein
Bad Kleinkirchheim-​St. Oswald
Balme-​Les Autannes-Vallorcine-​Le Tour
Bansko
Baqueira / ​Beret
Bardonecchia
Belalp-Blatten
Bellwald
Belvedere-​Col Rodella-​Ciampac-​Buffaure-Canazei-​Campitello-​Alba-​Pozza di Fassa
Berwang-​Bichlbach-​Rinnen
Białka Tatrzańska-Kotelnica-​Kaniówka-​Bania
Bonneval sur Arc
Bormio-Cima Bianca
Borovets
Brandnertal-Brand-​Bürserberg
Brauneck Lenggries ​Wegscheid
Brévent-​Flégère-Chamonix-
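
Notice that "Alpe Lusia" is listed before "Alpe d'Huez". Python's default string sort compares character codes, so uppercase letters come before lowercase ones. A minimal sketch of a case-insensitive alternative, using a hypothetical three-name sample and `str.casefold` as the sort key:

```python
# Hypothetical three-name sample based on the list above.
names = ["Alpe d'Huez", "Alpe Lusia-Moena-Bellamonte", "Alpe di Siusi-Seiser Alm-"]

default_order = sorted(names)                   # uppercase sorts before lowercase
folded_order = sorted(names, key=str.casefold)  # case-insensitive comparison

print(default_order)
print(folded_order)
```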


## How many resorts are we working with?

A basic count using the built-in len() function gives us the denominator for averages and other statistical calculations. It's also always a good idea to verify that the actual count matches the expected 376.

Accurate data = Accurate analysis.

In [8]:
# count of total resorts
print("Total Resorts in our Dataset:", len(resort_names))

Total Resorts in our Dataset: 376


## Next Steps

Now is the time to start our analysis to answer our questions.

1. What's the min, max, and mean price of a day pass?
2. Is there any correlation between price and available slope difficulty?
3. How many resorts are available in a given price range?
4. What country has the most affordable prices at each difficulty level?
5. What slope difficulty range is most represented in Europe?


## Total Price to visit all Resorts

Let's say you're someone who loves to ski. You might just want to visit ALL of Europe's ski resorts. How much would this realistically cost?

Using a simple for loop, we can sum all the prices in the day pass price column (originally DayPassPriceAdult).

In [9]:
# sum for all day passes
total_price = 0
for row in ski_data[1:]:
    total_price += int(row[5])  # price column

print(f"If we visited every ski resort it would cost us €{total_price}.")

If we visited every ski resort it would cost us €15333.


## Daypass Analysis

Our first question: What is the min, max, and mean price of a Daypass?

In [10]:
# finding the highest price of a day pass

highest_price, highest_resort, highest_country = 0, "", ""
for row in ski_data[1:]:
    price = int(row[5])
    if price >= highest_price:
        highest_resort = row[1]
        highest_country = row[2]
        highest_price = price

# print(highest_price, highest_resort, highest_country)

# finding the lowest price of a day pass
lowest_price, lowest_resort, lowest_country = highest_price,"", ""
for row in ski_data[1:]:
    price = int(row[5])
    # not storing zero outliers
    if price <= lowest_price and price != 0:
        lowest_resort = row[1]
        lowest_country = row[2]
        lowest_price = price

# print(lowest_price, lowest_resort, lowest_country)


# finding the average price of a day pass
avg_price = 0
for row in ski_data[1:]:
    price = int(row[5])
    avg_price += price

avg_price /= len(ski_data) - 1  # number of resorts (header row excluded)
print(avg_price)

average_priced = []
for row in ski_data[1:]:
    price = int(row[5])
    if price >= avg_price and price <= 45:
        average_priced.append([row[1],row[2], price])

40.77925531914894
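
The same three statistics can be computed with the built-ins `max`, `min`, and `sum`. The sketch below is an illustration on hypothetical `(resort, country, price)` rows, two of which are made up. Unlike the cell above, it leaves zero prices out of the mean as well as the minimum, which is a design choice rather than the notebook's method.

```python
# Hypothetical (resort, country, price) rows; the two "Made-up" rows are invented.
sample = [
    ("Zermatt - Matterhorn", "Switzerland", 81),
    ("Hoch Hylkedal – Kolding", "Denmark", 15),
    ("Made-up Resort A", "Nowhere", 0),   # zero price, treated as missing
    ("Made-up Resort B", "Nowhere", 44),
]

def price_of(row):
    """Return the price column of a (resort, country, price) row."""
    return row[2]

priced = [row for row in sample if price_of(row) != 0]  # skip zero outliers
highest = max(priced, key=price_of)
lowest = min(priced, key=price_of)
mean_price = sum(price_of(row) for row in priced) / len(priced)

print(highest, lowest, round(mean_price, 2))
```

On ties, `max` and `min` return the first matching row, whereas the loops above keep the last one.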


## High, Low, Average

Day passes range from €15 at the lowest (ignoring resorts listed at €0) to €81 at the highest, with an average price of €40.78 across all 376 resorts.

The first three resorts priced between the average and €45 are included in the table below.


|<div style="font-size: 20px">Type</div>| <div style="font-size: 20px">Resort</div>|<div style="font-size: 20px">Country</div>|<div style="font-size: 20px">Price</div>|
|-----------|-----------|-----------|-----------|
|<div style="text-align: center;font-size: 15px">**Highest**</div>|Zermatt - Matterhorn|Switzerland| €81|
|<div style="text-align: center;font-size: 15px">**Lowest**</div>|Hoch Hylkedal – Kolding|Denmark| €15|
|<div style="text-align: center;font-size: 15px">**Average 1**</div>|Dachstein West|Austria|  €42|
|<div style="text-align: center;font-size: 15px">**Average 2**</div>|Les Sybelles-Le Corbier...|France|  €44|
|<div style="text-align: center;font-size: 15px">**Average 3**</div>|Le Grand Domaine-Valmorel...|France|  €44|


### Next

We will separate our prices into **price brackets** in order to find out how many resorts fall within each.


In [11]:
# finding price brackets
price_brackets = {}
for row in ski_data[1:]:
    price = int(row[5])
    if price <= 20:
        bracket = "€0-€20"
    elif price <= 40:
        bracket = "€21-€40"
    elif price <= 60:
        bracket = "€41-€60"
    else:
        bracket = "€61+"
    # start each bracket at 0 the first time it appears
    price_brackets[bracket] = price_brackets.get(bracket, 0) + 1

print(price_brackets, "\n")

print(
    f"""There are {price_brackets['€0-€20']} resorts in the '€0-€20' range,\n{price_brackets['€21-€40']} resorts in the '€21-€40' range,\n{price_brackets['€41-€60']} resorts in the '€41-€60' range, and\n{price_brackets['€61+']} resorts in the '€61+' range."""
)

{'€41-€60': 191, '€21-€40': 144, '€0-€20': 22, '€61+': 19} 

There are 22 resorts in the '€0-€20' range,
144 resorts in the '€21-€40' range,
191 resorts in the '€41-€60' range, and
19 resorts in the '€61+' range.


|<div style="font-size: 20px">Name </div>| <div style="font-size: 20px">&nbsp; &nbsp;  &nbsp; &nbsp; &nbsp; &nbsp;€ Price Bracket </div>|<div style="font-size: 20px"> &nbsp; &nbsp;  &nbsp; &nbsp; &nbsp; &nbsp;# of Resorts</div>|
|-----------|-----------|-----------|
|<div style="text-align: center;font-size: 15px">Low</div>|<div style="text-align: center;font-size: 15px">0-20</div>|<div style="font-size: 15px">22</div>|
|<div style="text-align: center;font-size: 15px">Mid</div>|<div style="text-align: center;font-size: 15px">21-40</div>|<div style="font-size: 15px">144</div>|
|<div style="text-align: center;font-size: 15px">High</div>|<div style="text-align: center;font-size: 15px">41-60</div>|<div style="font-size: 15px">191</div>|
|<div style="text-align: center;font-size: 15px">Extreme</div>|<div style="text-align: center;font-size: 15px">61-81</div>|<div style="font-size: 15px">19</div>|
    
Our price brackets seem to fall in line with common assumptions. One would expect very few low-cost ski options, as well as a relatively small number of extremely costly ones.

About 89% of our ski resorts (335 of 376) fall within the bounds of €21-€60, as assumed. We will get a more exact estimate on our second run.
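
The bracket logic can be separated from the counting so the boundaries are defined in one place. Here is a minimal sketch under that assumption; `price_bracket` and `count_brackets` are hypothetical helper names, and the test prices are made up to hit every boundary.

```python
def price_bracket(price):
    """Return the bracket label for a day pass price in euros."""
    if price <= 20:
        return "€0-€20"
    if price <= 40:
        return "€21-€40"
    if price <= 60:
        return "€41-€60"
    return "€61+"

def count_brackets(prices):
    """Count how many prices fall into each bracket."""
    counts = {}
    for price in prices:
        label = price_bracket(price)
        counts[label] = counts.get(label, 0) + 1
    return counts

# Hypothetical prices chosen to hit every boundary.
print(count_brackets([15, 20, 21, 40, 41, 60, 61, 81]))
```

Because each check only runs when the previous one failed, the brackets cannot overlap, and a price of exactly 20, 40, or 60 lands in the lower bracket.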
    
## Next Up.

Is there any correlation between price and available slope difficulty?

#### BeginnerSlope

The total amount of “beginner” slopes in kilometers at the resort. “Beginner slopes” contains “children”, “blue” and “green” slopes.

#### IntermediateSlope

The total amount of “intermediate” slopes in kilometers at the resort. “Intermediate slopes” contains “red” slopes.

#### DifficultSlope

The total amount of “difficult” slopes in kilometers at the resort. “Difficult slopes” contains “black”, “advanced” and “expert” slopes.

#### Price Ranges

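Low: 0 - 25

Mid: 26 - 40

High: 41 - 60

Extreme: 61 - 81

These are the bounds the code below actually applies. Note that the low bracket here extends to €25, unlike the €0-€20 bracket counted earlier.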

In [12]:
# total slope kilometers per difficulty level, grouped by price bracket
price_brackets = {}

slopes = {}
price_difficulty_ratios = {}
price_difficulty_ratios["low"] = {"beginner": 0, "intermediate": 0,"difficult": 0 }
price_difficulty_ratios["mid"] = {"beginner": 0, "intermediate": 0,"difficult": 0 }
price_difficulty_ratios["high"] = {"beginner": 0, "intermediate": 0,"difficult": 0 }
price_difficulty_ratios["extreme"] = {"beginner": 0, "intermediate": 0,"difficult": 0 }

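# price brackets in euros; range() excludes its end value
low = range(0, 26)       # €0-€25
mid = range(26, 41)      # €26-€40
high = range(41, 61)     # €41-€60
extreme = range(61, 82)  # €61-€81 (handled by the else branch below)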


 
for row in ski_data[1:]:
    price = row[5]
    beginner = row[6]
    intermediate = row[7]
    difficult = row[8]
    
    if price in low:
        price_difficulty_ratios["low"]["beginner"] += beginner
        price_difficulty_ratios["low"]["intermediate"] += intermediate
        price_difficulty_ratios["low"]["difficult"] += difficult
    elif price in mid:
        price_difficulty_ratios["mid"]["beginner"] += beginner
        price_difficulty_ratios["mid"]["intermediate"] += intermediate
        price_difficulty_ratios["mid"]["difficult"] += difficult       
    elif price in high:
        price_difficulty_ratios["high"]["beginner"] += beginner
        price_difficulty_ratios["high"]["intermediate"] += intermediate
        price_difficulty_ratios["high"]["difficult"] += difficult
    else:
        price_difficulty_ratios["extreme"]["beginner"] += beginner
        price_difficulty_ratios["extreme"]["intermediate"] += intermediate
        price_difficulty_ratios["extreme"]["difficult"] += difficult
        
# print(price_difficulty_ratios)
print("Low", price_difficulty_ratios["low"])
print("Mid", price_difficulty_ratios["mid"])
print("High", price_difficulty_ratios["high"])
print("Extreme", price_difficulty_ratios["extreme"])

Low {'beginner': 260, 'intermediate': 220, 'difficult': 72}
Mid {'beginner': 2642, 'intermediate': 2548, 'difficult': 727}
High {'beginner': 7835, 'intermediate': 8642, 'difficult': 2584}
Extreme {'beginner': 2750, 'intermediate': 3009, 'difficult': 1045}


### Find Ratios

Now that we have the total slope kilometers for each bracket, we can find our ratios.

In [13]:
# store the total
low_total = sum(price_difficulty_ratios["low"].values())
mid_total = sum(price_difficulty_ratios["mid"].values())
high_total = sum(price_difficulty_ratios["high"].values())
extreme_total = sum(price_difficulty_ratios["extreme"].values())

# store the ratios
price_ratio_low = {
    "beginner": price_difficulty_ratios["low"]["beginner"] / low_total,
    "intermediate": price_difficulty_ratios["low"]["intermediate"] / low_total,
    "difficult": price_difficulty_ratios["low"]["difficult"]  / low_total
}

price_ratio_mid = {
    "beginner": price_difficulty_ratios["mid"]["beginner"] / mid_total,
    "intermediate": price_difficulty_ratios["mid"]["intermediate"] / mid_total,
    "difficult": price_difficulty_ratios["mid"]["difficult"] / mid_total
}

price_ratio_high = {
    "beginner": price_difficulty_ratios["high"]["beginner"] / high_total,
    "intermediate": price_difficulty_ratios["high"]["intermediate"] / high_total,
    "difficult": price_difficulty_ratios["high"]["difficult"]  / high_total
}

price_ratio_extreme = {
    "beginner": price_difficulty_ratios["extreme"]["beginner"] / extreme_total,
    "intermediate": price_difficulty_ratios["extreme"]["intermediate"] / extreme_total,
    "difficult": price_difficulty_ratios["extreme"]["difficult"] / extreme_total
}

# verify (rounded to 4 decimal places for readability)
for label, ratios in [
    ("Low:", price_ratio_low),
    ("Mid:", price_ratio_mid),
    ("High:", price_ratio_high),
    ("Extreme:", price_ratio_extreme),
]:
    print(label, {level: round(share, 4) for level, share in ratios.items()})

Low: {'beginner': 0.471, 'intermediate': 0.3986, 'difficult': 0.1304}
Mid: {'beginner': 0.4465, 'intermediate': 0.4306, 'difficult': 0.1229}
High: {'beginner': 0.411, 'intermediate': 0.4534, 'difficult': 0.1356}
Extreme: {'beginner': 0.4042, 'intermediate': 0.4422, 'difficult': 0.1536}
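
Each ratio is one difficulty level's slope kilometers divided by the bracket's total kilometers. A minimal sketch of that arithmetic as a reusable helper follows; `shares` is a hypothetical name, and the input is the low-bracket totals printed earlier in this section.

```python
def shares(totals):
    """Convert per-difficulty slope totals (km) into fractions of the bracket total."""
    bracket_total = sum(totals.values())
    return {level: km / bracket_total for level, km in totals.items()}

# Slope totals for the low bracket, copied from the totals printed earlier.
low_totals = {"beginner": 260, "intermediate": 220, "difficult": 72}
low_shares = shares(low_totals)

print({level: round(share, 4) for level, share in low_shares.items()})
```

Calling the same helper once per bracket avoids repeating the division by hand, which is where copy-and-paste slips tend to creep in.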


|Price Type|Beginner %|Intermediate %|Difficult %| Total Beginner| Total Intermediate| Total High| Total Extreme
|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|Low|47.10|39.85|13.04||||