## Open Government Data, provided by **Statistisches Amt des Kantons Basel-Stadt - DCC Data Competence Center**
*Autogenerated Python starter code for data set with identifier* **100254**

## Dataset
# **Daily climate data from the NBCN station Basel-Binningen**

## Data set links

[Direct data shop link for dataset](https://data.bs.ch/explore/dataset/100254)

## Metadata
- **Dataset_identifier** `100254`
- **Title** `Tägliche Klimadaten der NBCN-Station Basel-Binningen` (Daily climate data from the NBCN station Basel-Binningen)
- **Description** Daily data from the NBCN (Swiss National Basic Climatological Network) station Basel-Binningen, operated by MeteoSchweiz. Complete, [non-homogenized](https://www.meteoschweiz.admin.ch/klima/klimawandel/entwicklung-temperatur-niederschlag-sonnenschein/homogene-messreihen-ab-1864/homogenisierung-von-klima-messreihen.html) series of the most important daily values since measurements began. Methodological note: the daily mean air temperature was calculated differently depending on the historical period. Until 1980, temperatures were recorded only three times a day, so the mean is based on those three readings. From 1981 the mean is based on hourly means, and from 2018 on 10-minute values. Until 1980, the daily minimum and maximum air temperature were recorded with minimum and maximum thermometers.
- **Contact_name** `Open Data Basel-Stadt`
- **Issued** `2023-02-10`
- **Modified** `2026-02-17T15:02:13+00:00`
- **Rights** `NonCommercialAllowed-CommercialAllowed-ReferenceRequired`
- **Temporal_coverage_start_date** `1863-12-31T23:30:14+00:00`
- **Temporal_coverage_end_date** `2026-02-15T23:00:00+00:00`
- **Themes** `['Raum und Umwelt']`
- **Keywords** `['Niederschlag', 'Strahlung', 'Druck', 'Luftdruck', 'Sonne', 'Klimatologie', 'Klima', 'Temperatur', 'Lufttemperatur', 'Sonnenschein', 'Sonnenscheindauer', 'Wetter', 'Luftfeuchtigkeit', 'Atmosphäre']`
- **Publisher** `MeteoSchweiz`
- **Reference** `None`
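
The methodological note in the description means that daily mean temperatures are not directly comparable across the whole series. As a minimal sketch, the periods named there can be turned into a label per year. The function name `temperature_mean_method` and the label strings are hypothetical and not part of the dataset; only the year boundaries come from the description.

```python
def temperature_mean_method(year: int) -> str:
    """Return a label for how the daily mean temperature was derived in `year`.

    The boundaries follow the dataset description; the label strings are
    illustrative assumptions and do not appear in the data.
    """
    if year <= 1980:
        # mean of three manual readings per day
        return "three_daily_readings"
    if year <= 2017:
        # mean based on hourly means
        return "hourly_means"
    # mean based on 10-minute values
    return "ten_minute_values"


for y in (1900, 1980, 1981, 2017, 2018):
    print(y, temperature_mean_method(y))
```

Such a label could be added as an extra column before comparing means across periods, once the name of the date column in the export is known.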


## Imports and helper functions

In [None]:
import os
import pandas as pd
import requests
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
plt.style.use("ggplot")

params = {
    "text.color": (0.25, 0.25, 0.25),
    "figure.figsize": [18, 6],
}

plt.rcParams.update(params)

In [None]:
# helper function: download the dataset as CSV, save a local copy, and load it
def get_dataset(url):
    # requests URL-encodes parameter values itself, so pass the plain timezone name
    r = requests.get(url, params={"format": "csv", "timezone": "Europe/Zurich"})
    # stop early on HTTP errors instead of saving an error page as CSV
    r.raise_for_status()
    data_path = os.path.join(os.getcwd(), "..", "data")
    os.makedirs(data_path, exist_ok=True)
    csv_path = os.path.join(data_path, "100254.csv")
    with open(csv_path, "wb") as f:
        f.write(r.content)
    # read the file that was just saved rather than requesting the URL a second time
    data = pd.read_csv(
        csv_path, sep=";", on_bad_lines="warn", encoding_errors="ignore", low_memory=False
    )
    # a single column (or none) means the data is not ";" separated
    if data.shape[1] <= 1:
        print(
            "The data wasn't imported properly. Very likely the correct separator couldn't be found.\nPlease check the dataset manually and adjust the code."
        )
    return data
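
The single-column check in the helper relies on how pandas behaves when the separator does not match the file. A small sketch on made-up in-memory data shows the effect; the sample text and its column names are assumptions and do not reflect the real export.

```python
import io

import pandas as pd

# Made-up sample text; the real export has different columns.
sample = "date;value\n2020-01-01;1.5\n2020-01-02;2.0\n"

# Matching separator: each field becomes its own column.
right = pd.read_csv(io.StringIO(sample), sep=";")

# Mismatched separator: every line collapses into a single column.
wrong = pd.read_csv(io.StringIO(sample), sep=",")

print(right.shape)
print(wrong.shape)
```

A frame with only one column is therefore a cheap signal that the separator should be checked by hand.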

## Load data

The dataset is downloaded and read into a pandas DataFrame.

In [None]:
# Read the dataset
df = get_dataset('https://data.bs.ch/explore/dataset/100254/download')

## Analyze data

In [None]:
# drop columns that have no values
df.dropna(how="all", axis=1, inplace=True)
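
To illustrate what `dropna(how="all", axis=1)` removes, here is a sketch on a toy frame. The frame `toy` and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: column "b" is entirely empty, "a" and "c" are only partly empty.
toy = pd.DataFrame(
    {
        "a": [1.0, np.nan],
        "b": [np.nan, np.nan],
        "c": ["x", None],
    }
)

# how="all" drops a column only if every value in it is missing.
cleaned = toy.dropna(how="all", axis=1)

print(list(cleaned.columns))
```

Columns with at least one value are kept, so partially filled measurement series survive this step.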

In [None]:
print(
    f"The dataset has {df.shape[0]:,.0f} rows (observations) and {df.shape[1]:,.0f} columns (variables)."
)
print(f"There seem to be {df.duplicated().sum()} exact duplicates in the data.")
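
The duplicate count above comes from `DataFrame.duplicated()`, which by default flags every row that repeats an earlier row and leaves the first occurrence unflagged. A sketch on made-up data (the frame `toy` and its columns are assumptions):

```python
import pandas as pd

# Toy frame: the second row repeats the first one exactly.
toy = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-01", "2020-01-02"],
        "value": [1.0, 1.0, 1.0],
    }
)

# Number of rows that repeat an earlier row.
n_dupes = int(toy.duplicated().sum())

# Keep the first occurrence of each row and drop the repeats.
deduped = toy.drop_duplicates()

print(n_dupes, len(deduped))
```

Whether exact duplicates should be dropped depends on the data; for a daily series, two identical rows for the same date are usually worth a manual look first.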

In [None]:
df.info(memory_usage="deep", verbose=True)

In [None]:
df.head()

In [None]:
# display a small random sample transposed in order to see all variables
df.sample(3).T

In [None]:
# describe non-numerical features
try:
    with pd.option_context("display.float_format", "{:,.2f}".format):
        display(df.describe(exclude="number"))
except ValueError:
    print("No categorical data in dataset.")

In [None]:
# describe numerical features
try:
    with pd.option_context("display.float_format", "{:,.2f}".format):
        display(df.describe(include="number"))
except ValueError:
    print("No numerical data in dataset.")

In [None]:
# check missing values with missingno
# https://github.com/ResidentMario/missingno
import missingno as msno

msno.matrix(df, labels=True, sort="descending");
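
If `missingno` is not installed, the share of missing values per column can be computed with pandas alone. This is a sketch on made-up data; the frame `toy` and its column names are assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame: "temp" is missing in half of the rows, "rain" is complete.
toy = pd.DataFrame(
    {
        "temp": [1.0, np.nan, 3.0, np.nan],
        "rain": [0.0, 0.2, 0.0, 1.1],
    }
)

# isna() gives a boolean frame; the column mean is the fraction of missing values.
missing_share = toy.isna().mean().sort_values(ascending=False)

print(missing_share)
```

This gives the same per-column information as the matrix plot in numeric form, without showing where in the series the gaps occur.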

In [None]:
# plot a histogram for each numerical feature
try:
    df.hist(bins=25, rwidth=0.9)
    plt.tight_layout()
    plt.show()
except ValueError:
    print("No numerical data to plot.")

In [None]:
# continue your code here...

**Questions about the data?** Open Data Basel-Stadt | opendata@bs.ch