# Sample Open Data Analysis Script
This sample script shows how an open dataset can be analysed and presented using [Jupyter Notebooks](http://jupyter.org). Similar examples can be found on the [Kaggle website](https://www.kaggle.com/kernels).

## Acknowledgments
1. Based on the Kaggle kernel: [Detail Analysis of various Hospital factors](https://www.kaggle.com/nirajvermafcb/d/cms/hospital-general-information/detail-analysis-of-various-hospital-factors)

In [22]:
# Load libraries
import pandas as pd # CSV file processing
import numpy as np # vector and matrix manipulation
import matplotlib.pyplot as plt # visualisation
import seaborn as sns # visualisation

In [43]:
# Load the csv file from the open data portal
# dataset description: https://www.data.gv.at/katalog/dataset/stadt-wien_anzahlderhundeprobezirkderstadtwien/resource/b8d97349-c993-486d-b273-362e0524f98c
data_path = 'https://www.wien.gv.at/finanzen/ogd/hunde-wien.csv'
# Specify the dataset format: semicolon delimiter, skip the first line of the file, Latin-1 encoding
data = pd.read_csv(data_path, delimiter=';', skiprows=1, encoding='latin-1')

In [39]:
# Check the top of the table to make sure the dataset is loaded correctly 
data.head()

| | NUTS1 | NUTS2 | NUTS3 | DISTRICT_CODE | SUB_DISTRICT_CODE | Postal_CODE | Dog Breed | Anzahl | Ref_Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AT1 | AT13 | AT113 | 90100 | . | 1010 | Afghanischer Windhund | 1 | 20161201 |
| 1 | AT1 | AT13 | AT113 | 90100 | . | 1010 | Amerikanischer Pit-Bullterrier | 1 | 20161201 |
| 2 | AT1 | AT13 | AT113 | 90100 | . | 1010 | Amerikanischer Staffordshire-Terrier | 4 | 20161201 |
| 3 | AT1 | AT13 | AT113 | 90100 | . | 1010 | Amerikanischer Staffordshire-Terrier / Mischling | 1 | 20161201 |
| 4 | AT1 | AT13 | AT113 | 90100 | . | 1010 | Australian Shepherd Dog | 3 | 20161201 |


In [48]:
# Check the column types to make sure the dataset is loaded correctly
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5740 entries, 0 to 5739
Data columns (total 9 columns):
NUTS1                5740 non-null object
NUTS2                5740 non-null object
NUTS3                5740 non-null object
DISTRICT_CODE        5740 non-null int64
SUB_DISTRICT_CODE    5740 non-null object
Postal_CODE          5740 non-null int64
Dog Breed            5740 non-null object
Anzahl               5740 non-null object
Ref_Date             5740 non-null int64
dtypes: int64(3), object(6)
memory usage: 403.7+ KB


Count (Anzahl) is not recognized as numeric data. We shall fix this!
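
One possible way to do the conversion is sketched below with `pd.to_numeric`. The sample values are hypothetical (the notebook does not show which entries of `Anzahl` are non-numeric), so treat this as an illustration rather than the final fix.

```python
import pandas as pd

# Hypothetical sample: counts stored as text, plus one entry that is not a plain integer
sample = pd.DataFrame({'Anzahl': ['1', '4', '3', 'n/a']})

# Convert to numbers; anything that cannot be parsed becomes NaN instead of raising an error
sample['Anzahl_num'] = pd.to_numeric(sample['Anzahl'], errors='coerce')

# List the raw entries that failed to convert, so they can be inspected before deciding what to do
unparsed = sample.loc[sample['Anzahl_num'].isna(), 'Anzahl'].unique()
print(unparsed)
print(sample['Anzahl_num'].dtype)
```

Looking at the unparsed entries first helps decide whether they should be cleaned (for example, stripped of separators) or dropped.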

In [53]:
# TODO: convert 'Anzahl' to a numeric type
# data['Anzahl'].unique()
# data.applymap(np.isreal)
# data['Anzahl'] = data['Anzahl'].astype(float)
# data['Anzahl'].mean()
# data['Anzahl'].count()

In [14]:
# Check the size of the dataset
data.shape

(5740, 9)

Our dataset contains 9 columns and 5740 rows.

In [26]:
# Check descriptive statistics
data.describe()
# data['Postal_CODE'].unique()

| | DISTRICT_CODE | Postal_CODE | Ref_Date |
|---|---|---|---|
| count | 5740.0 | 5740.0 | 5740.0 |
| mean | 91355.0 | 1135.5 | 20161201.0 |
| std | 668.274742 | 66.827474 | 0.0 |
| min | 90100.0 | 1010.0 | 20161201.0 |
| 25% | 90900.0 | 1090.0 | 20161201.0 |
| 50% | 91400.0 | 1140.0 | 20161201.0 |
| 75% | 92000.0 | 1200.0 | 20161201.0 |
| max | 92300.0 | 1230.0 | 20161201.0 |


The district codes range from 90100 to 92300 and the postal codes from 1010 to 1230, corresponding to the 23 districts of Vienna (Wiener Bezirke).
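
As a small sketch, the district number can be recovered from either code with integer arithmetic. This assumes the pattern suggested by the statistics above (postal code `1XX0`, district code `9XX00`, with `XX` the district number); the column names `district_from_postal` and `district_from_code` are made up for this example.

```python
import pandas as pd

# Code pairs taken from the min, median and max rows of the describe() output
codes = pd.DataFrame({
    'DISTRICT_CODE': [90100, 91400, 92300],
    'Postal_CODE': [1010, 1140, 1230],
})

# Assumed pattern: postal code 1XX0 and district code 9XX00, where XX is the district number
codes['district_from_postal'] = (codes['Postal_CODE'] - 1000) // 10
codes['district_from_code'] = (codes['DISTRICT_CODE'] - 90000) // 100

print(codes)
```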

All rows share a single reference date: 20161201. Since the date format is not explicitly specified, it could strictly speaking be read as 1 December 2016 (YYYYMMDD) or 12 January 2016 (YYYYDDMM), although the former is the far more common convention.
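
To make the two readings concrete, here is a minimal sketch that parses the same value with both format strings. Which one is correct is an assumption until the dataset description confirms it.

```python
import pandas as pd

# The single reference date found in the dataset, stored as an integer
ref_date = pd.Series([20161201, 20161201])

# Reading 1 (assumed more likely): year, month, day
as_ymd = pd.to_datetime(ref_date.astype(str), format='%Y%m%d')

# Reading 2: year, day, month
as_ydm = pd.to_datetime(ref_date.astype(str), format='%Y%d%m')

print(as_ymd.iloc[0].date())
print(as_ydm.iloc[0].date())
```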

Essentially, the dataset boils down to three pieces of information: District | Dog Breed | Dog Count.
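
A sketch of that reduction, using a few rows copied from the outputs in this notebook. It assumes the counts are already numeric, which is not yet the case for the real `Anzahl` column.

```python
import pandas as pd

# A handful of rows from the notebook outputs, with counts entered as integers for illustration
sample = pd.DataFrame({
    'Postal_CODE': [1010, 1010, 1100, 1030],
    'Dog Breed': ['Afghanischer Windhund', 'Australian Shepherd Dog',
                  'Zwergpinscher', 'Chihuahua kurzhaariger Schlag'],
    'Anzahl': [1, 3, 99, 98],
    'Ref_Date': [20161201] * 4,
})

# Keep only the three informative columns
reduced = sample[['Postal_CODE', 'Dog Breed', 'Anzahl']]

# Total number of dogs per district (identified here by postal code)
per_district = reduced.groupby('Postal_CODE')['Anzahl'].sum()
print(per_district)
```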

In [17]:
# Check unique values in one of the columns
unique_breeds = data['Dog Breed'].unique()
len(unique_breeds)

1061

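We have 1,061 distinct breed entries for dogs living in Vienna (mixed breeds such as "Amerikanischer Staffordshire-Terrier / Mischling" count as separate entries), how cool is that!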
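
To see how much of that number comes from mixed-breed entries, one could count the labels with and without them. The sketch below assumes that a slash in the label marks a mixed breed, as in the sample rows shown earlier; that convention is a guess, not something the dataset description confirms.

```python
import pandas as pd

# Breed labels taken from the head() output, with one label repeated on purpose
breeds = pd.Series([
    'Afghanischer Windhund',
    'Amerikanischer Staffordshire-Terrier',
    'Amerikanischer Staffordshire-Terrier / Mischling',
    'Australian Shepherd Dog',
    'Amerikanischer Staffordshire-Terrier',
], name='Dog Breed')

# Number of distinct labels, the same quantity as len(unique())
n_labels = breeds.nunique()

# Assumption: a slash in the label marks a mixed-breed entry
is_mixed = breeds.str.contains('/', regex=False)
n_without_mixed = breeds[~is_mixed].nunique()

print(n_labels, n_without_mixed)
```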

In [44]:
# Check the top counts
sorted_count = data.sort_values(['Anzahl'], ascending=False)
sorted_count[['Postal_CODE','Dog Breed', 'Anzahl']].head()

| | Postal_CODE | Dog Breed | Anzahl |
|---|---|---|---|
| 1934 | 1100 | Zwergpinscher | 99 |
| 456 | 1030 | Chihuahua kurzhaariger Schlag | 98 |
| 4073 | 1190 | Golden Retriever | 96 |
| 1700 | 1100 | Deutscher Schäferhund / Mischling | 95 |
| 5618 | 1230 | Malteser | 95 |


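Taken at face value, this ranking suggests that the inhabitants of the 10th district (1100) are huge fans of Pinschers (Zwergpinscher) and German Shepherds (Deutscher Schäferhund), while the 3rd district (1030) is full of Chihuahuas. Note, however, that `Anzahl` is still stored as text at this point, so the sort is lexicographic: a count such as "99" ranks above any three-digit count, and the true top entries may differ.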
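
The difference between sorting counts as text and as numbers can be seen in a tiny sketch. The third row is entirely hypothetical and only there to show the effect; the real dataset may or may not contain three-digit counts.

```python
import pandas as pd

sample = pd.DataFrame({
    'Dog Breed': ['Zwergpinscher', 'Golden Retriever', 'Hypothetical breed'],
    'Anzahl': ['99', '96', '120'],  # counts stored as text, as in the loaded dataset
})

# Sorting text compares character by character, so '99' ends up above '120'
as_text = sample.sort_values('Anzahl', ascending=False)

# After converting to numbers, the largest count really comes first
sample['Anzahl_num'] = pd.to_numeric(sample['Anzahl'], errors='coerce')
top_two = sample.nlargest(2, 'Anzahl_num')

print(as_text['Anzahl'].tolist())
print(top_two['Anzahl_num'].tolist())
```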

Now you know where to go if you enjoy dog-watching!