# A Data Driven Approach to European Ski Resorts - Pandas

[Previously](https://github.com/nembdev/python_portfolio/blob/main/Data%20Analysis%20Projects/European%20Ski%20Resorts(Manual)/A%20Data%20Driven%20Approach%20to%20European%20Ski%20Resorts.ipynb), we took a library-free approach to data analysis. While manual analysis is tedious and limits our scope, it's always a good idea to practice the fundamentals.

Now we will use the full might of Pandas and Seaborn to go further in depth.

Pandas will handle our analysis, while Seaborn will help us create our visuals.


## Our Dataset

Our dataset features a sample of 376 European ski resorts provided by ski-resort-stats.com, made available through Kaggle.

[Kaggle Dataset](https://www.kaggle.com/thomasnibb/european-ski-resorts)

[Data Source: Ski-resort-stats.com](https://ski-resort-stats.com)

## Potential Avenues of Exploration

1. Top 5 most represented countries
2. Comparison of Price to Difficulty Options
3. Ratio of Lift Types 
4. Price to Elevation

These are just a few of the areas we could explore.
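
As a rough illustration of the first item, here is a minimal NumPy sketch of counting how often each country appears. The `countries` array is a hypothetical ten-row sample (the first ten country values that appear in this notebook's output), not the full dataset, so the counts it prints are not real results.

```python
import numpy as np

# Hypothetical sample: the first ten country values from the notebook output.
countries = np.array([
    "Austria", "Andorra", "Austria", "Austria", "Southern Russia",
    "Poland", "Bulgaria", "Poland", "Bosnia and Herzegovina", "Slovenia",
])

# Count occurrences of each distinct country (names come back sorted A-Z).
names, counts = np.unique(countries, return_counts=True)

# Order by count, largest first (stable, so ties stay alphabetical), keep five.
order = np.argsort(-counts, kind="stable")[:5]
top_countries = [(str(names[i]), int(counts[i])) for i in order]
print(top_countries)
```

On the full dataset the same pattern would be applied to the `Country` column once it is loaded.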

## Data Dictionary

|Column Name| Description|
|-----------|-----------|
|**#**|Row number.|
|**Resort**|The name of the ski & snowboard resort.|
|**Country**|The name of the country in which the resort is located.|
|**HighestPoint**|The highest mountain point at the ski resort.|
|**LowestPoint**|The lowest possible point to ski at the ski resort.|
|**DayPassPriceAdult**|The price of a one-day pass for one adult in the main season, in euros (€).|
|**BeginnerSlope**|The total length of “beginner” slopes at the resort, in kilometers. “Beginner slopes” covers “children”, “blue” and “green” slopes.|
|**IntermediateSlope**|The total length of “intermediate” slopes at the resort, in kilometers. “Intermediate slopes” covers “red” slopes.|
|**DifficultSlope**|The total length of “difficult” slopes at the resort, in kilometers. “Difficult slopes” covers “black”, “advanced” and “expert” slopes.|
|**TotalSlope**|The sum of “beginner slopes” + “intermediate slopes” + “difficult slopes”.|
|**Snowparks**|Does the resort have one or more snowparks, or not?|
|**NightSki**|Does the resort offer skiing on illuminated slopes?|
|**SurfaceLifts**|The number of lifts in this category: T-bar, Sunkidslift, rope lifts and people movers.|
|**ChairLifts**|The total number of chairlifts.|
|**GondolaLifts**|The number of lifts in this category: gondola, train lifts, funicular, combined gondola and chairlifts, helicopter lifts, snowcats and aerial tramways.|
|**TotalLifts**|The sum of “surface lifts etc” + “gondola etc” + “chairlifts etc”.|
|**LiftCapacity**|How many passengers can the lift system at the ski resort move in one hour?|
|**SnowCannons**|The total number of snow cannons at the ski resort.|
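
The two derived columns, `TotalSlope` and `TotalLifts`, are described as plain sums, which can be checked directly. The sketch below does so for the first two resorts only, with the values copied by hand from the rows printed later in this notebook. The array names `slopes` and `lifts` are illustrative, not part of the dataset.

```python
import numpy as np

# Columns: BeginnerSlope, IntermediateSlope, DifficultSlope, TotalSlope
slopes = np.array([
    [30, 81, 4, 115],    # resort 1 (Alpendorf)
    [100, 77, 33, 210],  # resort 2 (Grandvalira)
])

# Columns: SurfaceLifts, ChairLifts, GondolaLifts, TotalLifts
lifts = np.array([
    [22, 16, 11, 49],    # resort 1
    [37, 28, 7, 72],     # resort 2
])

# Sum the three component columns per row and compare with the stored total.
slopes_ok = slopes[:, :3].sum(axis=1) == slopes[:, 3]
lifts_ok = lifts[:, :3].sum(axis=1) == lifts[:, 3]
print(slopes_ok.all(), lifts_ok.all())
```

Both checks pass for these two rows. Whether they hold for all 376 resorts would need to be verified against the full file.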

In [1]:
import numpy as np
# import pandas as pd
# import seaborn as sns
from csv import reader
# import matplotlib.pyplot as plt

In [2]:
# using open & reader; the with-block closes the file when done
with open("European_Ski_Resorts.csv", newline="") as o_file:
    l_file = list(reader(o_file))
ski_data = np.array(l_file)
ski_data_head = ski_data[0]  # header row
ski_data = ski_data[1:]      # data rows

In [3]:
ski_data_head

array(['', 'Resort', 'Country', 'HighestPoint', 'LowestPoint',
       'DayPassPriceAdult', 'BeginnerSlope', 'IntermediateSlope',
       'DifficultSlope', 'TotalSlope', 'Snowparks', 'NightSki',
       'SurfaceLifts', 'ChairLifts', 'GondolaLifts', 'TotalLifts',
       'LiftCapacity', 'SnowCannons'], dtype='<U107')

In [4]:
# using numpy
ski_data = np.genfromtxt("European_Ski_Resorts.csv", delimiter=",", dtype=str, skip_header=True)

In [5]:
ski_data

array([['"1"', '"Alpendorf (Ski amedé)"', '"Austria"', ..., '49',
        '75398', '600'],
       ['"2"',
        '"Soldeu-Pas de la Casa/\u200bGrau Roig/\u200bEl Tarter/\u200bCanillo/\u200bEncamp (Grandvalira)"',
        '"Andorra"', ..., '72', '99017', '1032'],
       ['"3"', '"Oberau (Wildschönau)"', '"Austria"', ..., '2', '1932',
        '0'],
       ...,
       ['"374"', '"Gressoney - La-Trinite (Monterosa Ski)"', '"Italy"',
        ..., '30', '31984', '655'],
       ['"375"', '"Champoluc (Monterosa Ski)"', '"Italy"', ..., '30',
        '31984', '655'],
       ['"376"', '"Zauchensee"', '"Austria"', ..., '19', '25988', '113']],
      dtype='<U109')
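
Note that `np.genfromtxt` keeps the literal double quotes around text fields (`'"Austria"'`), whereas `csv.reader` removed them in the header output above. One possible way to clean them up, sketched on a small hand-copied sample rather than the full array, is `np.char.strip`. The names `raw` and `cleaned` are illustrative.

```python
import numpy as np

# Hand-copied sample of two rows (first three columns) from the output above.
raw = np.array([
    ['"3"', '"Oberau (Wildschönau)"', '"Austria"'],
    ['"4"', '"Dachstein West"', '"Austria"'],
])

# Strip leading and trailing double quotes from every element.
cleaned = np.char.strip(raw, '"')
print(cleaned)
```

Unquoted numeric strings such as `'49'` pass through unchanged, so the same call could be applied to the whole array.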

In [6]:
# rows and columns
ski_data.shape

(376, 18)

In [7]:
# grab an entire column
ski_data[:,2]

array(['"Austria"', '"Andorra"', '"Austria"', '"Austria"',
       '"Southern Russia"', '"Poland"', '"Bulgaria"', '"Poland"',
       '"Bosnia and Herzegovina"', '"Slovenia"', '"France"',
       '"Southern Russia"', '"Switzerland"', '"France"', '"Germany"',
       '"France"', '"France"', '"France"', '"France"', '"France"',
       '"Switzerland"', '"France"', '"France"', '"France"', '"France"',
       '"Austria"', '"Spain"', '"France"', '"Italy"', '"France"',
       '"France"', '"France"', '"France"', '"Switzerland"', '"France"',
       '"Switzerland"', '"Austria"', '"Italy"', '"Switzerland"',
       '"Austria"', '"Italy"', '"France"', '"France"', '"France"',
       '"France"', '"Switzerland"', '"Sweden"', '"Italy"', '"France"',
       '"France"', '"Switzerland"', '"Switzerland"', '"Austria"',
       '"France"', '"France"', '"Austria"', '"Switzerland"', '"Italy"',
       '"France"', '"Austria"', '"Austria"', '"Spain"', '"Italy"',
       '"France"', '"Italy"', '"Switzerland"', '"Norway"', 

In [8]:
# grab multiple rows
ski_data[0:2, :]

array([['"1"', '"Alpendorf (Ski amedé)"', '"Austria"', '1980', '740',
        '52', '30', '81', '4', '115', '"Yes"', '"No"', '22', '16', '11',
        '49', '75398', '600'],
       ['"2"',
        '"Soldeu-Pas de la Casa/\u200bGrau Roig/\u200bEl Tarter/\u200bCanillo/\u200bEncamp (Grandvalira)"',
        '"Andorra"', '2640', '1710', '47', '100', '77', '33', '210',
        '"Yes"', '"Yes"', '37', '28', '7', '72', '99017', '1032']],
      dtype='<U109')

In [9]:
# grab multiple entire columns
ski_data[:,0:2]

array([['"1"', '"Alpendorf (Ski amedé)"'],
       ['"2"',
        '"Soldeu-Pas de la Casa/\u200bGrau Roig/\u200bEl Tarter/\u200bCanillo/\u200bEncamp (Grandvalira)"'],
       ['"3"', '"Oberau (Wildschönau)"'],
       ['"4"', '"Dachstein West"'],
       ['"5"', '"Rosa Khutor"'],
       ['"6"',
        '"Białka Tatrzańska-Kotelnica-\u200bKaniówka-\u200bBania"'],
       ['"7"', '"Vitosha-Sofia"'],
       ['"8"', '"Szczyrk-Skrzyczne"'],
       ['"9"', '"Jahorina"'],
       ['"10"', '"Kobla-Bohinj"'],
       ['"11"', '"Aillons-Margériaz"'],
       ['"12"', '"Gornaya Karusel"'],
       ['"13"', '"Kaiseregg-\u200bRiggisalp-Schwarzsee"'],
       ['"14"', '"St. Pierre de Chartreuse-Le Planolet"'],
       ['"15"', '"Buchenberg-Buching-Halblech-"'],
       ['"16"', '"Méribel (Les 3 Vallées)"'],
       ['"17"', '"\u200bLes Menuires (Les 3 Vallées)"'],
       ['"18"',
        '"Les Sybelles-Le Corbier-\u200bLa Toussuire-\u200bLes Bottières-\u200bSt Colomban des Villards-\u200bSt Sorlin-\u200

In [10]:
# select columns by a list of indices
col_extract = [1, 2]
resorts_countries = ski_data[:,col_extract]
resorts_countries

array([['"Alpendorf (Ski amedé)"', '"Austria"'],
       ['"Soldeu-Pas de la Casa/\u200bGrau Roig/\u200bEl Tarter/\u200bCanillo/\u200bEncamp (Grandvalira)"',
        '"Andorra"'],
       ['"Oberau (Wildschönau)"', '"Austria"'],
       ['"Dachstein West"', '"Austria"'],
       ['"Rosa Khutor"', '"Southern Russia"'],
       ['"Białka Tatrzańska-Kotelnica-\u200bKaniówka-\u200bBania"',
        '"Poland"'],
       ['"Vitosha-Sofia"', '"Bulgaria"'],
       ['"Szczyrk-Skrzyczne"', '"Poland"'],
       ['"Jahorina"', '"Bosnia and Herzegovina"'],
       ['"Kobla-Bohinj"', '"Slovenia"'],
       ['"Aillons-Margériaz"', '"France"'],
       ['"Gornaya Karusel"', '"Southern Russia"'],
       ['"Kaiseregg-\u200bRiggisalp-Schwarzsee"', '"Switzerland"'],
       ['"St. Pierre de Chartreuse-Le Planolet"', '"France"'],
       ['"Buchenberg-Buching-Halblech-"', '"Germany"'],
       ['"Méribel (Les 3 Vallées)"', '"France"'],
       ['"\u200bLes Menuires (Les 3 Vallées)"', '"France"'],
       ['"Les Syb

In [11]:
# rows 1-2, columns 2-5 (the end of each slice is exclusive)
ski_data[1:3,2:6]

array([['"Andorra"', '2640', '1710', '47'],
       ['"Austria"', '1130', '900', '30']], dtype='<U109')
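
The result has two rows and four columns because NumPy slices are half-open: `start` is included and `stop` is not. A small sketch on a hypothetical grid, where each value encodes its own position as `row * 10 + column`, makes this easy to see:

```python
import numpy as np

# Hypothetical 5x6 grid; each value is row * 10 + column.
grid = np.array([[r * 10 + c for c in range(6)] for r in range(5)])

# Same slice as above: rows 1 and 2, columns 2 through 5.
sub = grid[1:3, 2:6]
print(sub)
# [[12 13 14 15]
#  [22 23 24 25]]
```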

In [12]:
# all rows, one column
ski_data[0:,2]

array(['"Austria"', '"Andorra"', '"Austria"', '"Austria"',
       '"Southern Russia"', '"Poland"', '"Bulgaria"', '"Poland"',
       '"Bosnia and Herzegovina"', '"Slovenia"', '"France"',
       '"Southern Russia"', '"Switzerland"', '"France"', '"Germany"',
       '"France"', '"France"', '"France"', '"France"', '"France"',
       '"Switzerland"', '"France"', '"France"', '"France"', '"France"',
       '"Austria"', '"Spain"', '"France"', '"Italy"', '"France"',
       '"France"', '"France"', '"France"', '"Switzerland"', '"France"',
       '"Switzerland"', '"Austria"', '"Italy"', '"Switzerland"',
       '"Austria"', '"Italy"', '"France"', '"France"', '"France"',
       '"France"', '"Switzerland"', '"Sweden"', '"Italy"', '"France"',
       '"France"', '"Switzerland"', '"Switzerland"', '"Austria"',
       '"France"', '"France"', '"Austria"', '"Switzerland"', '"Italy"',
       '"France"', '"Austria"', '"Austria"', '"Spain"', '"Italy"',
       '"France"', '"Italy"', '"Switzerland"', '"Norway"', 

In [13]:
# multi row multi col
ski_data[0:3,0:3]

array([['"1"', '"Alpendorf (Ski amedé)"', '"Austria"'],
       ['"2"',
        '"Soldeu-Pas de la Casa/\u200bGrau Roig/\u200bEl Tarter/\u200bCanillo/\u200bEncamp (Grandvalira)"',
        '"Andorra"'],
       ['"3"', '"Oberau (Wildschönau)"', '"Austria"']], dtype='<U109')

In [14]:
# extract a column and convert its type
# (column 9 is TotalSlope, in kilometers)
total_slope_int = ski_data[:,9].astype("int64")

In [15]:
total_slope_int.max()

600

In [16]:
total_slope_int.mean()

86.25797872340425
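
Hard-coded column positions are easy to mix up (position 9 is `TotalSlope`, while `TotalLifts` sits at position 15). One option is to look positions up by name from the header row. The sketch below assumes the header printed earlier in this notebook and uses two hand-copied rows. The `col` dictionary is an illustrative helper, and the second resort name is shortened to "Grandvalira".

```python
import numpy as np

# Header row as printed earlier in this notebook.
header = ['', 'Resort', 'Country', 'HighestPoint', 'LowestPoint',
          'DayPassPriceAdult', 'BeginnerSlope', 'IntermediateSlope',
          'DifficultSlope', 'TotalSlope', 'Snowparks', 'NightSki',
          'SurfaceLifts', 'ChairLifts', 'GondolaLifts', 'TotalLifts',
          'LiftCapacity', 'SnowCannons']

# Map each column name to its position.
col = {name: idx for idx, name in enumerate(header)}

# First two data rows, copied from the output above (second name shortened).
rows = np.array([
    ['"1"', '"Alpendorf"', '"Austria"', '1980', '740', '52', '30', '81',
     '4', '115', '"Yes"', '"No"', '22', '16', '11', '49', '75398', '600'],
    ['"2"', '"Grandvalira"', '"Andorra"', '2640', '1710', '47', '100', '77',
     '33', '210', '"Yes"', '"Yes"', '37', '28', '7', '72', '99017', '1032'],
])

# Select by name, then convert the strings to integers.
total_slope = rows[:, col["TotalSlope"]].astype("int64")
total_lifts = rows[:, col["TotalLifts"]].astype("int64")
print(col["TotalSlope"], total_slope)
print(col["TotalLifts"], total_lifts)
```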

In [17]:
# no dtype given: genfromtxt defaults to float, so text fields become nan
ski_data_nd = np.genfromtxt("European_Ski_Resorts.csv", delimiter=",")

In [18]:
ski_data_nd.dtype

dtype('float64')
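
With the default float dtype, any field that cannot be parsed as a number (quoted text such as country names, and the header row) comes back as `nan`. The sketch below shows this on a hypothetical four-column excerpt (Country, HighestPoint, LowestPoint, DayPassPriceAdult) fed through `io.StringIO` instead of the real file. The last step, dropping all-`nan` columns, is just one possible follow-up.

```python
from io import StringIO

import numpy as np

# Hypothetical excerpt built from two rows shown earlier in this notebook.
sample_csv = StringIO(
    '"Andorra",2640,1710,47\n'
    '"Austria",1130,900,30\n'
)

# Default dtype is float; unparseable text is replaced by nan.
parsed = np.genfromtxt(sample_csv, delimiter=",")
print(parsed.dtype)
print(parsed)

# Keep only the columns that are not entirely nan.
numeric_only = parsed[:, ~np.isnan(parsed).all(axis=0)]
print(numeric_only)
```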