# Importing the Data for PowerBI

Reads the Excel file from the *PowerBI* chapter into a SQL Server database so that it can be visualized with PowerBI.
A running SQL Server container is required:

*docker run -d -p 1433:1433  --name sqlserver2019 -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=SqlServer2019" mcr.microsoft.com/azure-sql-edge*

Source: http://www.griesmayer.com/?menu=Business%20Intelligence&semester=Semsester_6&topic=05_PowerBI

In [1]:
from pathlib import Path
import sqlalchemy, re
import pandas as pd

# Prompt for host, username and password. Values from an earlier run of this cell are reused.
host = (input("Host (IP), press Enter for localhost: ") or "localhost") if "host" not in locals() else host
username = (input("Username, press Enter for sa: ") or "sa") if "username" not in locals() else username
password = (input("Password, press Enter for SqlServer2019: ") or "SqlServer2019") if "password" not in locals() else password
database = "Sales"
connection_url = sqlalchemy.engine.URL.create("mssql+pyodbc", username=username,
    password=password, host=host, database=database,
    query={ "driver": "ODBC Driver 18 for SQL Server" })
# The Sales database does not exist yet, so we connect to tempdb in order to create it.
# Autocommit is necessary for CREATE DATABASE and other DDL statements.
tempdb_engine = sqlalchemy.create_engine(
    connection_url.set(database="tempdb"), isolation_level="AUTOCOMMIT", 
    connect_args={"TrustServerCertificate": "yes"})
# We drop the database just before connecting, so we set pool_pre_ping=True
engine = sqlalchemy.create_engine(
    connection_url, fast_executemany=True, pool_pre_ping=True,
    connect_args={"TrustServerCertificate": "yes"})

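The prompts above rely on the `input(...) or default` idiom: pressing Enter returns an empty string, which is falsy, so the default value applies. A minimal sketch of that idiom as a reusable helper could look like this. The name `prompt_default` and its `reader` parameter are hypothetical and introduced only for illustration; they are not part of the notebook.

```python
from typing import Callable


def prompt_default(message: str, default: str,
                   reader: Callable[[str], str] = input) -> str:
    """Ask for a value and fall back to `default` when the answer is empty.

    `reader` is injectable so the helper can be exercised without a terminal.
    """
    answer = reader(f"{message} (press Enter for {default}): ")
    # An empty string is falsy, so `or` selects the default.
    return answer or default
```

In the cell above, this would be used as `host = prompt_default("Host (IP)", "localhost")`.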

First we drop the database and recreate it.
This is for testing only, of course; otherwise dropping the database is not ideal.

In [9]:
with tempdb_engine.connect() as conn:
    # Disconnect other sessions so the drop can succeed. This fails if the database does not exist yet.
    try:
        conn.execute(sqlalchemy.text(f"ALTER DATABASE {database} SET SINGLE_USER WITH ROLLBACK IMMEDIATE"))
    except sqlalchemy.exc.DBAPIError:
        pass
    conn.execute(sqlalchemy.text(f"DROP DATABASE IF EXISTS {database}"))
    conn.execute(sqlalchemy.text(f"CREATE DATABASE {database}"))
with engine.connect() as conn:
    conn.execution_options(isolation_level="AUTOCOMMIT")
    conn.execute(sqlalchemy.text("""
        CREATE TABLE Country (
            CountryID   INTEGER       PRIMARY KEY,
            Country     VARCHAR(255)  NOT NULL,
            Continent   VARCHAR(32)   NOT NULL,
            Region      VARCHAR(255)  NOT NULL,
            Population  INTEGER       NOT NULL,
            Size        VARCHAR(8)    NOT NULL
        )
    """))
    conn.execute(sqlalchemy.text("""
        CREATE TABLE Product (
            ProduktID         INTEGER      PRIMARY KEY,
            ProductKategory   VARCHAR(255) NOT NULL,
            ShoppingBasket    INTEGER      NOT NULL,
            Price             DECIMAL(9,2) NOT NULL,
            Cost              DECIMAL(9,2) NOT NULL
        )
    """))
    conn.execute(sqlalchemy.text("""
        CREATE TABLE Customer (
            CustomerID     INTEGER      PRIMARY KEY,
            FirstName      VARCHAR(255) NOT NULL,
            Gender         VARCHAR(8)   NOT NULL,
            Income         VARCHAR(16)  NOT NULL,
            ShoppingBasket INTEGER      NOT NULL
        )
    """))
    conn.execute(sqlalchemy.text("""
        CREATE TABLE Sales (
            SalesID      INTEGER       PRIMARY KEY IDENTITY(1,1),
            Date         DATE          NOT NULL,
            CountryID    INTEGER       NOT NULL,
            ProduktID    INTEGER       NOT NULL,
            CustomerID   INTEGER       NOT NULL,
            Pieces       INTEGER       NOT NULL,
            Revenue      DECIMAL(9,2)  NOT NULL,
            Cost         DECIMAL(9,2)  NOT NULL,
            Margin       DECIMAL(9,2)  NOT NULL,
            FOREIGN KEY (CountryID)  REFERENCES Country(CountryID),
            FOREIGN KEY (ProduktID)  REFERENCES Product(ProduktID),
            FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
        )
    """))

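Statements such as `CREATE DATABASE` cannot take the database name as a bound parameter, which is why the cell above interpolates it with an f-string. That is safe here because `database` is a fixed constant. If the name ever came from user input, it would be worth validating first. The following is a minimal sketch under that assumption; `quote_identifier` is a hypothetical helper, not part of the notebook or of SQLAlchemy.

```python
import re

# Conservative pattern: a letter or underscore, then letters, digits or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Return `name` in T-SQL square brackets, or raise if it looks unsafe."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsafe identifier: {name!r}")
    return f"[{name}]"


# Build the statement text only from a validated name.
drop_statement = f"DROP DATABASE IF EXISTS {quote_identifier('Sales')}"
```

The pattern is deliberately stricter than what SQL Server accepts, so some valid names are rejected as well.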

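Next, the Excel file *Sales.xls* is read.
This requires the Python package *xlrd*, which is installed with the command *pip3 install xlrd --upgrade*.
It reads the legacy Excel format (xls). If the workbook is saved as xlsx, as in the file name used by the code below, pandas reads it with *openpyxl* instead.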
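
The reader package can also be chosen explicitly from the file extension and passed to `pd.read_excel` through its `engine` parameter. A minimal sketch, assuming only the two formats xls and xlsx matter here; `excel_engine` and `ENGINES` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File suffix mapped to the engine name understood by pandas.read_excel.
ENGINES = {".xls": "xlrd", ".xlsx": "openpyxl"}


def excel_engine(filename: str) -> str:
    """Return the pandas Excel engine that matches the file suffix."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ENGINES:
        raise ValueError(f"Unsupported Excel format: {suffix!r}")
    return ENGINES[suffix]
```

A call would then look like `pd.read_excel(path, engine=excel_engine(path))`, which fails early with a clear message when the matching package is missing.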

In [10]:
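# pip3 install xlrd --upgrade
# Columns to read per sheet. Sheets without an entry are read in full.
cols = {
    "Product": ["ProduktID", "ProductKategory", "ShoppingBasket", "Price", "Cost"],
    "Sales": ["Date", "CountryID", "ProduktID", "Pieces", "Revenue", "Cost", "Margin", "CustomerID"]
}
with engine.connect() as conn:
    # Load the referenced tables first so that the foreign keys of Sales are satisfied.
    for sheet in ["Product", "Country", "Customer", "Sales"]:
        data = pd.read_excel("Sales.xlsx", sheet_name=sheet, usecols=cols.get(sheet))
        # Append to the table created above instead of letting pandas create one.
        data.to_sql(sheet, conn, if_exists="append", index=False)
        conn.commit()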
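
The order of the sheets in the loop matters: *Sales* references *Product*, *Country* and *Customer* through foreign keys, so those tables have to be filled first. The effect can be reproduced without a SQL Server instance. The sketch below uses an in-memory SQLite database as a stand-in (an assumption made for illustration; the notebook itself targets SQL Server), with a reduced schema and a made-up sample row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Country (CountryID INTEGER PRIMARY KEY, Country TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Sales (
        SalesID   INTEGER PRIMARY KEY,
        CountryID INTEGER NOT NULL REFERENCES Country(CountryID)
    )
""")

# Loading the referencing row first violates the foreign key.
try:
    conn.execute("INSERT INTO Sales (CountryID) VALUES (1)")
    child_first_failed = False
except sqlite3.IntegrityError:
    child_first_failed = True

# Referenced table first, then the referencing table: both inserts succeed.
conn.execute("INSERT INTO Country (CountryID, Country) VALUES (1, 'Austria')")
conn.execute("INSERT INTO Sales (CountryID) VALUES (1)")
sales_rows = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
conn.close()
```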
