# A Data-Driven Approach to European Ski Resorts 2


## Our Dataset

Our dataset features a sample of 376 European ski resorts provided by ski-resort-stats.com and made available through Kaggle.

[Kaggle Dataset](https://www.kaggle.com/thomasnibb/european-ski-resorts)

[Data Source: Ski-resort-stats.com](Ski-resort-stats.com)

## Potential Questions

# Data Dictionary 

|Column Name| Description|
|-----------|-----------|
|**#**|Row number|
|**Resort**|The name of the ski & snowboard resort.|
|**Country**| The name of the country in which the resort is located.|
|**HighestPoint**|The highest mountain point at the ski resort.|
|**LowestPoint**|The lowest possible point to ski at the ski resort.|
|**DayPassPriceAdult**| The price of a one-day pass for one adult in the main season, in euros (€).|
|**BeginnerSlope**| The total length of “beginner” slopes at the resort, in kilometers. “Beginner slopes” include “children”, “blue” and “green” slopes.|
|**IntermediateSlope**| The total length of “intermediate” slopes at the resort, in kilometers. “Intermediate slopes” include “red” slopes.|
|**DifficultSlope**| The total length of “difficult” slopes at the resort, in kilometers. “Difficult slopes” include “black”, “advanced” and “expert” slopes.|
|**TotalSlope**| The sum of “beginner slopes” + “intermediate slopes” + “difficult slopes”.|
|**Snowparks**| Does the resort have one or more snowparks, or not?|
|**NightSki**|Does the resort offer skiing on illuminated slopes?|
|**SurfaceLifts**| The number of lifts in this category: T-bar, Sunkidslift, Rope lifts and people mover.|
|**ChairLifts**| The total number of chairlifts.|
|**GondolaLifts**| The number of lifts in this category: Gondola, Train lifts, Funicular, Combined gondola and chairlifts, Helicopter lifts, Snowcats and Aerial tramways.|
|**TotalLifts**| The sum of “surface lifts etc” + “gondola etc” + “chairlifts etc”.|
|**LiftCapacity**| How many passengers can the lift system at the ski resort move in one hour?|
|**SnowCannons**| The total number of snow cannons at the ski resort.|
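
Because `TotalSlope` and `TotalLifts` are defined as sums of other columns, those definitions can double as a sanity check on each row. The following is a minimal sketch rather than part of the original notebook: `check_totals` is a hypothetical helper, the column positions assume the order listed in the table above, and it assumes the values are whole numbers, as they are in the two sample rows copied from the dataset.

```python
# Column positions follow the data dictionary order (index 0 is the row number).
BEGINNER, INTERMEDIATE, DIFFICULT, TOTAL_SLOPE = 6, 7, 8, 9
SURFACE, CHAIR, GONDOLA, TOTAL_LIFTS = 12, 13, 14, 15

# Two rows copied from the dataset; every value is still a string.
sample_rows = [
    ['1', 'Alpendorf (Ski amedé)', 'Austria', '1980', '740', '52',
     '30', '81', '4', '115', 'Yes', 'No', '22', '16', '11', '49', '75398', '600'],
    ['3', 'Oberau (Wildschönau)', 'Austria', '1130', '900', '30',
     '1', '0', '1', '2', 'No', 'No', '2', '0', '0', '2', '1932', '0'],
]

def check_totals(row):
    """Return True when both derived totals match the sum of their parts."""
    slopes = int(row[BEGINNER]) + int(row[INTERMEDIATE]) + int(row[DIFFICULT])
    lifts = int(row[SURFACE]) + int(row[CHAIR]) + int(row[GONDOLA])
    return slopes == int(row[TOTAL_SLOPE]) and lifts == int(row[TOTAL_LIFTS])

# Collect the names of resorts whose totals do not add up.
inconsistent = [row[1] for row in sample_rows if not check_totals(row)]
print(inconsistent)  # [] for these two rows
```

Rows that end up in `inconsistent` are candidates for a closer look during cleaning.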

# Data Cleaning

Data cleaning comes before the analysis.

Clean data leads to more accurate analysis.

A typical workflow is often described as roughly 80% cleaning and 20% analysis.

### Things to look for:
* wrong language
* wrong label
* missing information
* duplicates
* invalid inputs
* wrong criteria
    * free vs paid
    * country
    * type
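
Two of the checks above, missing information and duplicates, can be sketched in a few lines. This is an illustrative sketch, not the notebook's own method: `find_problems` is a hypothetical helper, and the three short rows are made up for the example rather than taken from the dataset.

```python
def find_problems(rows, key_index=1):
    """Return (rows_with_missing_cells, duplicate_keys) for a list of CSV rows."""
    missing = []
    duplicates = []
    seen = set()
    for row in rows:
        # A cell counts as missing when it is empty or only whitespace.
        if any(cell.strip() == '' for cell in row):
            missing.append(row)
        # Compare keys case-insensitively so 'Resort A' and 'resort a' collide.
        key = row[key_index].strip().lower()
        if key in seen:
            duplicates.append(row[key_index])
        seen.add(key)
    return missing, duplicates

# Illustrative rows only (row number, resort, country); not from the dataset.
rows = [
    ['1', 'Resort A', 'Austria'],
    ['2', 'Resort B', ''],
    ['3', 'resort a', 'Austria'],
]
missing, duplicates = find_problems(rows)
print(missing)      # [['2', 'Resort B', '']]
print(duplicates)   # ['resort a']
```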

In [1]:
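from csv import reader

# The with-block closes the file automatically once the rows are read.
# encoding='utf-8' is an assumption; resort names contain accented characters.
with open('European_Ski_Resorts.csv', encoding='utf-8', newline='') as o_file:
    ski_data = list(reader(o_file))

ski_data_head = ski_data[0]  # header row
ski_data = ski_data[1:]      # data rows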

In [2]:
ski_data_head

['',
 'Resort',
 'Country',
 'HighestPoint',
 'LowestPoint',
 'DayPassPriceAdult',
 'BeginnerSlope',
 'IntermediateSlope',
 'DifficultSlope',
 'TotalSlope',
 'Snowparks',
 'NightSki',
 'SurfaceLifts',
 'ChairLifts',
 'GondolaLifts',
 'TotalLifts',
 'LiftCapacity',
 'SnowCannons']
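
The header row makes it possible to look columns up by name instead of remembering positions such as "index 2 = country". A small sketch, where `col_index` is a hypothetical name introduced for illustration:

```python
ski_data_head = ['', 'Resort', 'Country', 'HighestPoint', 'LowestPoint',
                 'DayPassPriceAdult', 'BeginnerSlope', 'IntermediateSlope',
                 'DifficultSlope', 'TotalSlope', 'Snowparks', 'NightSki',
                 'SurfaceLifts', 'ChairLifts', 'GondolaLifts', 'TotalLifts',
                 'LiftCapacity', 'SnowCannons']

# Map each column name to its position in a row.
col_index = {name: i for i, name in enumerate(ski_data_head)}

print(col_index['Country'])    # 2
print(col_index['NightSki'])   # 11
```

With this mapping, `row[col_index['Country']]` reads the same value as `row[2]` but documents itself.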

In [3]:
ski_data

[['1',
  'Alpendorf (Ski amedé)',
  'Austria',
  '1980',
  '740',
  '52',
  '30',
  '81',
  '4',
  '115',
  'Yes',
  'No',
  '22',
  '16',
  '11',
  '49',
  '75398',
  '600'],
 ['2',
  'Soldeu-Pas de la Casa/\u200bGrau Roig/\u200bEl Tarter/\u200bCanillo/\u200bEncamp (Grandvalira)',
  'Andorra',
  '2640',
  '1710',
  '47',
  '100',
  '77',
  '33',
  '210',
  'Yes',
  'Yes',
  '37',
  '28',
  '7',
  '72',
  '99017',
  '1032'],
 ['3',
  'Oberau (Wildschönau)',
  'Austria',
  '1130',
  '900',
  '30',
  '1',
  '0',
  '1',
  '2',
  'No',
  'No',
  '2',
  '0',
  '0',
  '2',
  '1932',
  '0'],
 ['4',
  'Dachstein West',
  'Austria',
  '1620',
  '780',
  '42',
  '15',
  '33',
  '3',
  '51',
  'Yes',
  'Yes',
  '25',
  '8',
  '3',
  '36',
  '32938',
  '163'],
 ['5',
  'Rosa Khutor',
  'Southern Russia',
  '2320',
  '940',
  '22',
  '30',
  '26',
  '21',
  '77',
  'Yes',
  'No',
  '6',
  '11',
  '10',
  '27',
  '49228',
  '450'],
 ['6',
  'Białka Tatrzańska-Kotelnica-\u200bKaniówka-\u200bBania'
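
The output above shows two things worth cleaning: every value is a string, and some resort names contain zero-width spaces (`\u200b`). The sketch below handles both for a single row. It is an assumption-laden example: `clean_row` and the two constants are hypothetical names, and it assumes the numeric columns hold whole numbers, as they do in the rows shown (`int()` raises `ValueError` otherwise).

```python
ZERO_WIDTH_SPACE = '\u200b'
# Positions of the numeric columns, following the header order.
NUMERIC_COLUMNS = [3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17]

def clean_row(row):
    """Return a copy of row with tidy text and integer numeric columns."""
    cleaned = [cell.replace(ZERO_WIDTH_SPACE, '').strip() for cell in row]
    for i in NUMERIC_COLUMNS:
        cleaned[i] = int(cleaned[i])
    return cleaned

# Second row of the dataset, copied from the output above.
raw = ['2',
       'Soldeu-Pas de la Casa/\u200bGrau Roig/\u200bEl Tarter/\u200bCanillo/\u200bEncamp (Grandvalira)',
       'Andorra', '2640', '1710', '47', '100', '77', '33', '210',
       'Yes', 'Yes', '37', '28', '7', '72', '99017', '1032']

cleaned = clean_row(raw)
print(cleaned[1])               # name without zero-width spaces
print(cleaned[3] - cleaned[4])  # 930, highest point minus lowest point
```

Converting once, up front, means later arithmetic such as the height difference does not need repeated `int()` calls.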

In [4]:
# Named sample_dict so the built-in dict type is not shadowed
sample_dict = {'some data': 55, "other data": 83, "more data": 23}

# Frequencies & Percentages

Frequency is the number of times each unique value occurs in a column. Frequencies are useful for deriving proportions and percentages from your data.

Candidate columns:

* countries
* night ski
* snowparks

In [5]:
country_freq = {}
# tally how many resorts each country has
for row in ski_data:
    country = row[2]  # index 2 = country
    # if found increase, else add initial count
    if country in country_freq:
        country_freq[country] += 1
    else:
        country_freq[country] = 1

# check
print(country_freq)

{'Austria': 89, 'Andorra': 5, 'Southern Russia': 4, 'Poland': 2, 'Bulgaria': 4, 'Bosnia and Herzegovina': 1, 'Slovenia': 4, 'France': 83, 'Switzerland': 61, 'Germany': 24, 'Spain': 8, 'Italy': 44, 'Sweden': 6, 'Norway': 10, 'Ukraine': 1, 'Finland': 3, 'Serbia': 1, 'Slovakia': 5, 'Siberia': 1, 'Czech Republic': 2, 'Romania': 2, 'Greece': 1, 'United Kingdom': 4, 'Liechtenstein': 1, 'Lithuania': 1, 'Denmark': 8, 'Netherlands': 1}
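
The same tally can be written for any column with `collections.Counter` from the standard library. This is a sketch of an alternative, not the notebook's approach: `column_frequency` is a hypothetical helper, and the rows are shortened to three fields (resort, country, night ski) using values from the records shown earlier.

```python
from collections import Counter

def column_frequency(rows, index):
    """Count how often each distinct value appears in one column."""
    return Counter(row[index] for row in rows)

# Shortened rows: (resort, country, night ski), values from the dataset.
rows = [
    ('Alpendorf (Ski amedé)', 'Austria', 'No'),
    ('Oberau (Wildschönau)', 'Austria', 'No'),
    ('Dachstein West', 'Austria', 'Yes'),
    ('Rosa Khutor', 'Southern Russia', 'No'),
]

country_counts = column_frequency(rows, 1)
night_ski_counts = column_frequency(rows, 2)

print(country_counts.most_common(1))  # [('Austria', 3)]
print(night_ski_counts['Yes'])        # 1
```

A `Counter` behaves like a dictionary, so the percentage step that follows works on it unchanged.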


In [6]:
country_proportions = {}
country_percents = {}
total = len(ski_data)
for country in country_freq:
    # share of all resorts, as a fraction and as a percentage
    country_proportions[country] = country_freq[country] / total
    country_percents[country] = round(country_proportions[country] * 100, 2)

# print once, after the loop, so the dictionary is shown a single time
print(country_percents)

{'Austria': 23.67, 'Andorra': 1.33, 'Southern Russia': 1.06, 'Poland': 0.53, 'Bulgaria': 1.06, 'Bosnia and Herzegovina': 0.27, 'Slovenia': 1.06, 'France': 22.07, 'Switzerland': 16.22, 'Germany': 6.38, 'Spain': 2.13, 'Italy': 11.7, 'Sweden': 1.6, 'Norway': 2.66, 'Ukraine': 0.27, 'Finland': 0.8, 'Serbia': 0.27, 'Slovakia': 1.33, 'Siberia': 0.27, 'Czech Republic': 0.53, 'Romania': 0.53, 'Greece': 0.27, 'United Kingdom': 1.06, 'Liechtenstein': 0.27, 'Lithuania': 0.27, 'Denmark': 2.13, 'Netherlands': 0.27}
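
Percentages are easier to read when ranked. The sketch below sorts the country counts printed earlier and keeps the five largest shares; the names `ranked` and `top5` are introduced here for illustration only.

```python
# Country counts as printed by the frequency cell above.
country_freq = {'Austria': 89, 'Andorra': 5, 'Southern Russia': 4, 'Poland': 2,
                'Bulgaria': 4, 'Bosnia and Herzegovina': 1, 'Slovenia': 4,
                'France': 83, 'Switzerland': 61, 'Germany': 24, 'Spain': 8,
                'Italy': 44, 'Sweden': 6, 'Norway': 10, 'Ukraine': 1,
                'Finland': 3, 'Serbia': 1, 'Slovakia': 5, 'Siberia': 1,
                'Czech Republic': 2, 'Romania': 2, 'Greece': 1,
                'United Kingdom': 4, 'Liechtenstein': 1, 'Lithuania': 1,
                'Denmark': 8, 'Netherlands': 1}

total = sum(country_freq.values())  # 376 resorts

# Sort by count, largest first, then convert the top five to percentages.
ranked = sorted(country_freq.items(), key=lambda item: item[1], reverse=True)
top5 = [(country, round(count / total * 100, 1)) for country, count in ranked[:5]]

print(top5)
# [('Austria', 23.7), ('France', 22.1), ('Switzerland', 16.2), ('Italy', 11.7), ('Germany', 6.4)]
```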

In [7]:
ski_data[0][1]

'Alpendorf (Ski amedé)'

In [8]:
ski_data[0][1].split(" ")

['Alpendorf', '(Ski', 'amedé)']

In [12]:
ski_data[0][1][0:9]

'Alpendorf'
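
The fixed slice `[0:9]` only fits this one name. A more general sketch, assuming names follow the "Resort (Ski area)" pattern seen in the rows above, uses `str.partition`; `split_resort_name` is a hypothetical helper.

```python
def split_resort_name(name):
    """Split 'Resort (Ski area)' into ('Resort', 'Ski area')."""
    # partition returns the text before ' (', the separator, and the rest.
    resort, sep, rest = name.partition(' (')
    # Drop the closing parenthesis when a ski area was present.
    area = rest.rstrip(')') if sep else ''
    return resort, area

print(split_resort_name('Alpendorf (Ski amedé)'))  # ('Alpendorf', 'Ski amedé')
print(split_resort_name('Dachstein West'))         # ('Dachstein West', '')
```

Names without parentheses pass through unchanged, with an empty string for the ski area.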