## Open Government Data, provided by **opendata.swiss**
*Autogenerated Python starter code for data set with identifier* **vm-liste@oevch**

## Dataset
# **Verkehrsmittel- und Hinweislisten**

## Description

The transport mode and hint lists (Verkehrsmittel- und Hinweislisten) relate to product 06 Harmonisierung Verkehrsmittel (harmonisation of means of transport) of regulation 580 – FIScommun, which is unfortunately available only in French and German. Technical details can be found directly in that regulation. The file Transportmodes corresponds to the Verkehrsmittelkategorie/Catégorie de moyen de transport (means-of-transport category), the file TransportSubmodes corresponds to the Angebotskategorie/Catégorie d’offre (service category), and the file Hints corresponds to the Beförderungshinweise, Angebotshinweise und Informationen/Indications de transport, indications d’offre et informations (transport notes, service notes and information). The transport mode and hint lists govern usage on all interfaces of the Systemführerschaft (system leadership).

## Data set links

[Direct link by opendata.swiss for dataset](https://opendata.swiss/de/dataset/verkehrsmittel-und-hinweislisten1)<br>
[Direct link by Geschäftsstelle Systemaufgaben Kundeninformation (SBB AG) for dataset](https://opentransportdata.swiss/dataset/vm-liste)

## Metadata
- **Publisher** `Alliance SwissPass`
- **Organization.display_name.de** `Geschäftsstelle Systemaufgaben Kundeninformation (SBB AG)`
- **Organization.url** `https://www.opentransportdata.swiss`
- **Maintainer** `Open Data Mobilität Schweiz`
- **Maintainer_email** `opendata@sbb.ch`
- **Keywords.de** `[]`
- **Issued** `2021-06-15T00:00:00`
- **Metadata_created** `2021-06-16T01:02:51.465884`
- **Metadata_modified** `2023-01-16T00:25:27.654966`


## Imports and helper functions

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')

params = {
    'text.color': (0.25, 0.25, 0.25),
    'figure.figsize': [18, 6],
}

plt.rcParams.update(params)

import pandas as pd 

In [None]:
# helper function for reading datasets with the proper separator
def get_dataset(url):
    # warn early if the URL does not look like a CSV file
    if not url.lower().endswith("csv"):
        print("The dataset URL has no 'csv' extension. Reading the dataset might not work as expected.\nPlease check the dataset link and adjust pandas' read_csv() parameters accordingly.")
    # first attempt: comma-separated
    data = pd.read_csv(url, sep=",", on_bad_lines='warn', encoding_errors='ignore', low_memory=False)
    # a single column (or none) suggests the data is not comma-separated, so retry with ";"
    if data.shape[1] <= 1:
        data = pd.read_csv(url, sep=';', on_bad_lines='warn', encoding_errors='ignore', low_memory=False)
        if data.shape[1] <= 1:
            print("The data wasn't imported properly. Very likely the correct separator couldn't be found.\nPlease check the dataset manually and adjust the code.")
    return data
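
The helper above tries `,` first and then `;`. As an alternative sketch, pandas can infer the delimiter itself when `sep=None` is combined with the Python parsing engine. The function name `read_with_sniffed_separator` and the sample data are illustrative assumptions and not part of the generated starter code.

```python
import io

import pandas as pd


def read_with_sniffed_separator(source):
    """Read a CSV and let pandas infer the delimiter (hypothetical helper)."""
    # sep=None only works with the slower Python engine; pandas then
    # detects the delimiter from the file content
    return pd.read_csv(source, sep=None, engine="python")


# Illustrative semicolon-separated sample, not taken from the dataset
sample = io.StringIO("code;name\nA;Alpha\nB;Beta\n")
sniffed = read_with_sniffed_separator(sample)
print(sniffed.shape)
```

Sniffing is convenient but slower than the default C engine, so for large files the explicit two-step fallback may still be the better choice.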

## Load data

- The dataset has **`3` distribution(s)** in CSV format.
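- All available CSV distributions are listed below and can be read into a pandas dataframe.
- Each call below assigns its result to the same variable `df`, so after the cell has run only the last distribution (`transportsubmodes210122.csv`) is available to the analysis cells. Comment out the calls you do not need, or assign each result to its own variable.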

In [None]:
# Distribution 0
# Package_id               : 2a57d4b8-bc7a-465e-a023-7d1b6e411f92
# Description              : 
# Issued                   : 2021-06-15T00:00:00
# Modified                 : 2022-10-24T00:00:00
# Rights                   : NonCommercialAllowed-CommercialAllowed-ReferenceRequired

df = get_dataset('https://opentransportdata.swiss/de/dataset/ea5ba0cb-b434-4f18-9220-e8edb2e1dd88/resource/66201b06-e34b-486b-bf7b-b7d826105edf/download/hints.csv')

# Distribution 1
# Package_id               : 2a57d4b8-bc7a-465e-a023-7d1b6e411f92
# Description              : 
# Issued                   : 2022-01-31T00:00:00
# Modified                 : 2022-01-31T00:00:00
# Rights                   : NonCommercialAllowed-CommercialAllowed-ReferenceRequired

df = get_dataset('https://opentransportdata.swiss/de/dataset/ea5ba0cb-b434-4f18-9220-e8edb2e1dd88/resource/59d4cf59-800e-4c8d-ae0f-b8e9936afe9e/download/transportmodes310122.csv')

# Distribution 2
# Package_id               : 2a57d4b8-bc7a-465e-a023-7d1b6e411f92
# Description              : 
# Issued                   : 2022-01-31T00:00:00
# Modified                 : 2022-01-31T00:00:00
# Rights                   : NonCommercialAllowed-CommercialAllowed-ReferenceRequired

df = get_dataset('https://opentransportdata.swiss/de/dataset/ea5ba0cb-b434-4f18-9220-e8edb2e1dd88/resource/cf87d908-0f4c-4bd6-badf-7ca14de82f04/download/transportsubmodes210122.csv')
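
Because every distribution above is assigned to `df`, only one of them survives the cell. One possible way to keep all three is to collect them in a dictionary, sketched below. The names `DISTRIBUTION_URLS` and `load_distributions` are hypothetical; the URLs are the ones listed above.

```python
import pandas as pd

# URLs copied from the three distributions listed above
DISTRIBUTION_URLS = {
    "hints": "https://opentransportdata.swiss/de/dataset/ea5ba0cb-b434-4f18-9220-e8edb2e1dd88/resource/66201b06-e34b-486b-bf7b-b7d826105edf/download/hints.csv",
    "transportmodes": "https://opentransportdata.swiss/de/dataset/ea5ba0cb-b434-4f18-9220-e8edb2e1dd88/resource/59d4cf59-800e-4c8d-ae0f-b8e9936afe9e/download/transportmodes310122.csv",
    "transportsubmodes": "https://opentransportdata.swiss/de/dataset/ea5ba0cb-b434-4f18-9220-e8edb2e1dd88/resource/cf87d908-0f4c-4bd6-badf-7ca14de82f04/download/transportsubmodes210122.csv",
}


def load_distributions(urls, reader=pd.read_csv):
    """Return a dict of DataFrames keyed by distribution name (hypothetical helper).

    `reader` is any callable that takes a URL and returns a DataFrame.
    """
    return {name: reader(url) for name, url in urls.items()}
```

In the notebook, `frames = load_distributions(DISTRIBUTION_URLS, reader=get_dataset)` would fetch all three files, and `df = frames["hints"]` would then select one of them for the analysis cells.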



## Analyze data

In [None]:
# drop columns that have no values
df.dropna(how='all', axis=1, inplace=True)
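
Before dropping, it can be useful to see which columns are entirely empty. The following is a minimal sketch on a made-up frame (`toy` and its column names are assumptions for illustration); it also shows the non-inplace form of the same `dropna` call.

```python
import pandas as pd

# Illustrative frame, not taken from the dataset
toy = pd.DataFrame({
    "code": ["A", "B"],
    "empty": [None, None],
    "partial": ["x", None],
})

# names of the columns in which every value is missing
empty_columns = toy.columns[toy.isna().all()].tolist()
print(empty_columns)

# non-inplace equivalent of the call above: returns a new frame
cleaned = toy.dropna(how="all", axis=1)
print(cleaned.columns.tolist())
```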

In [None]:
print(f'The dataset has {df.shape[0]:,.0f} rows (observations) and {df.shape[1]:,.0f} columns (variables).')
print(f'There seem to be {df.duplicated().sum()} exact duplicates in the data.')
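
The count above only says how many surplus rows exist. To look at the rows themselves, `duplicated(keep=False)` can be used as a mask, as sketched here on a made-up frame (`toy` is an assumption for illustration, not dataset content).

```python
import pandas as pd

# Illustrative frame, not taken from the dataset
toy = pd.DataFrame({
    "code": ["A", "B", "A"],
    "name": ["Alpha", "Beta", "Alpha"],
})

# duplicated() flags repeats after the first occurrence,
# so the sum counts surplus rows only
n_duplicates = toy.duplicated().sum()

# keep=False flags every member of a duplicate group,
# which makes the affected rows easy to inspect side by side
duplicate_rows = toy[toy.duplicated(keep=False)]
print(n_duplicates)
print(duplicate_rows)
```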

In [None]:
df.info(memory_usage='deep', verbose=True)

In [None]:
df.head()

In [None]:
# display a small random sample transposed in order to see all variables
df.sample(3).T

In [None]:
# describe non-numerical features
try:
    with pd.option_context('display.float_format', '{:,.2f}'.format):
        display(df.describe(exclude='number'))
# describe() raises ValueError when no column matches the selection
except ValueError:
    print("No categorical data in dataset.")

In [None]:
# describe numerical features
try:
    with pd.option_context('display.float_format', '{:,.2f}'.format):
        display(df.describe(include='number'))
# describe() raises ValueError when no column matches the selection
except ValueError:
    print("No numerical data in dataset.")

In [None]:
# check missing values with missingno
# https://github.com/ResidentMario/missingno
import missingno as msno
msno.matrix(df, labels=True, sort='descending');
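
If `missingno` is not installed, a rough numeric summary of missing values can be produced with pandas alone. This is a minimal sketch on a made-up frame (`toy` is an assumption for illustration); it complements the matrix plot rather than replacing it.

```python
import numpy as np
import pandas as pd

# Illustrative frame, not taken from the dataset
toy = pd.DataFrame({
    "code": ["A", "B", "C", "D"],
    "name": ["Alpha", None, None, "Delta"],
    "value": [1.0, np.nan, 3.0, 4.0],
})

# share of missing values per column, largest first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to the notebook's data, the same expression with `df` in place of `toy` gives the fraction of missing entries for each column.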

In [None]:
# plot a histogram for each numerical feature
try:
    df.hist(bins=25, rwidth=.9)
    plt.tight_layout()
    plt.show()
# hist() raises ValueError when there are no numerical columns to plot
except ValueError:
    print("No numerical data to plot.")

In [None]:
# continue your code here...

**Contact**: opendata@sbb.ch | Open Data Mobilität Schweiz