In [1]:
import getml
from challenge.utils.data import load_ctu_dataset

getml.set_project("db_transformer_northwind")

# Task: northwind
### Dataset Description
> <span style="font-weight: 500; color: #3b3b3b;">ⓘ️&nbsp; Generated by `gpt-4o`</span>
>
> *Data Model (Relational Schema)*
> 
> The Northwind dataset is structured into multiple tables, including:
> 
> - **Orders**: Contains details about customer orders.
> - **Order Details**: Provides specifics about each order, such as product and quantity.
> - **Products**: Lists products available for sale.
> - **Customers**: Contains customer information.
> - **Employees**: Details about employees.
> - **Suppliers**: Information about product suppliers.
> - **Categories**: Product categories.
> - **Shippers**: Shipping companies.
> - **Territories**: Sales territories.
> - **Region**: Geographic regions.
> 
> *Task and Target Column*
> 
> The primary task associated with this dataset is *regression*, with the target column being *Freight* in the *Orders* table.
> 
> *Types of the Columns*
> 
> - **Numeric**: Includes columns like `UnitPrice`, `Quantity`, `Freight`.
> - **String**: Includes `CustomerID`, `ProductName`, `CompanyName`.
> - **LOB (Large Object)**: Includes `Photo`, `Picture`.
> - **Temporal**: Includes `OrderDate`, `ShippedDate`, `HireDate`.
> 
> *Metadata about the Dataset*
> 
> - **Size**: 1.1 MB
> - **Number of Tables**: 29
> - **Number of Rows**: 3,308
> - **Number of Columns**: 191
> - **Missing Values**: Yes
> - **Compound Keys**: No
> - **Loops**: No
> - **Type**: Synthetic
> - **Instance Count**: 830
> 
> This dataset is a synthetic representation of a retail business, providing a comprehensive view of sales, products, and customer interactions.

### Tables
Population table: orders

<h4>
  <details open>
     <summary>ER Diagram</summary>
       <img src="https://relational.fel.cvut.cz/assets/img/datasets-generated/northwind.svg" alt="northwind ER Diagram">
   </details>
</h4>

To load the dataset, we use the `load_ctu_dataset` function from the `challenge.utils.data`
module. This function returns a tuple with the population table as the first
element and a dictionary of peripheral tables as the second element.

In [2]:
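orders, peripheral = load_ctu_dataset("northwind")

# `peripheral` is a dict: unpacking its values by position only binds the
# right table to each name if the names follow the dict's insertion order.
(
    suppliers,
    employee_territories,
    shippers,
    order_details,
    region,
    customer_demographics,
    territories,
    customers,
    employees,
    categories,
    customer_customer_demo,
    products,
) = peripheral.values()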
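Because `peripheral` is an ordinary `dict`, unpacking `peripheral.values()` binds each name purely by position; nothing checks that a name matches the table it receives. The pure-Python sketch below uses hypothetical string stand-ins instead of getML DataFrames (and demo names so it does not overwrite the notebook's variables) to show how a wrong order silently mislabels tables, and how lookup by key avoids it.

```python
# Hypothetical stand-in for the dict returned by load_ctu_dataset;
# the values are plain strings here, not getML DataFrames.
demo_tables = {
    "suppliers": "<suppliers rows>",
    "shippers": "<shippers rows>",
    "region": "<region rows>",
}

# Names listed in a different order than the dict: positional unpacking
# does not notice, so shippers_demo ends up holding the suppliers table.
shippers_demo, suppliers_demo, region_demo = demo_tables.values()

# Lookup by key binds each name to the table with that key,
# independent of insertion order (a typo raises KeyError instead).
by_key = {name: demo_tables[name] for name in ("shippers", "suppliers", "region")}
```

In the notebook, key lookup would read `peripheral["suppliers"]`, using the table names that appear in the container listing further down (note that `"order details"` contains a space).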


Now, we can inspect all tables and annotate the columns with [roles](https://getml.com/latest/user_guide/concepts/annotating_data/).

We start with the population table (`orders`). We already set the `target` role for the target column (`Freight`). If the task were a multiclass classification,
we would split the target column into multiple columns in a one-vs-all fashion, keeping the original target available as well. Since this is a regression task, `Freight` remains a single target column.
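For reference, a one-vs-all split turns one categorical target into one binary column per class. The minimal pure-Python sketch below illustrates the idea only; the function name `one_vs_all` and the `target=` column prefix are hypothetical and do not reproduce the loader's actual naming. It does not apply to `Freight`, which is a regression target.

```python
def one_vs_all(labels, prefix="target="):
    """Return one 0/1 column per distinct class in `labels`.

    Column names are prefix + class label (a hypothetical naming scheme).
    """
    classes = sorted(set(labels))
    return {
        f"{prefix}{cls}": [1 if label == cls else 0 for label in labels]
        for cls in classes
    }

ova_columns = one_vs_all(["a", "b", "a", "c"])
```

Each row has exactly one `1` across the generated columns, so the original label can always be recovered.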

In [3]:
# TODO: Annotate remaining columns with roles
orders

name,Freight,OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,split
role,target,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,32.38,10248,VINET,5,1996-07-04 00:00:00.000000,1996-08-01 00:00:00.000000,1996-07-16 00:00:00.000000,3,Vins et alcools Chevalier,59 rue de l-Abbaye,Reims,,51100,France,train
1.0,11.61,10249,TOMSP,6,1996-07-05 00:00:00.000000,1996-08-16 00:00:00.000000,1996-07-10 00:00:00.000000,1,Toms Spezialitten,Luisenstr. 48,Mnster,,44087,Germany,train
2.0,65.83,10250,HANAR,4,1996-07-08 00:00:00.000000,1996-08-05 00:00:00.000000,1996-07-12 00:00:00.000000,2,Hanari Carnes,"Rua do Pao, 67",Rio de Janeiro,RJ,05454-876,Brazil,train
3.0,41.34,10251,VICTE,3,1996-07-08 00:00:00.000000,1996-08-05 00:00:00.000000,1996-07-15 00:00:00.000000,1,Victuailles en stock,"2, rue du Commerce",Lyon,,69004,France,train
4.0,51.3,10252,SUPRD,4,1996-07-09 00:00:00.000000,1996-08-06 00:00:00.000000,1996-07-11 00:00:00.000000,2,Suprmes dlices,"Boulevard Tirou, 255",Charleroi,,B-6000,Belgium,train
,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
825.0,24.95,11073,PERIC,2,1998-05-05 00:00:00.000000,1998-06-02 00:00:00.000000,,2,Pericles Comidas clsicas,Calle Dr. Jorge Cash 321,Mxico D.F.,,5033,Mexico,train
826.0,18.44,11074,SIMOB,7,1998-05-06 00:00:00.000000,1998-06-03 00:00:00.000000,,2,Simons bistro,Vinbltet 34,Kobenhavn,,1734,Denmark,train
827.0,6.19,11075,RICSU,8,1998-05-06 00:00:00.000000,1998-06-03 00:00:00.000000,,2,Richter Supermarkt,Starenweg 5,Genve,,1204,Switzerland,train
828.0,38.28,11076,BONAP,4,1998-05-06 00:00:00.000000,1998-06-03 00:00:00.000000,,2,Bon app-,"12, rue des Bouchers",Marseille,,13008,France,train
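The TODO above asks for role annotations. Below is a hedged sketch of one plausible assignment for `orders`; which column gets which role is an assumption, not the reference solution. The role strings mirror the names of getML's role constants (e.g. `getml.data.roles.join_key`), and `DataFrame.set_role(cols, role)` is getML's method for setting a role. Grouping the plan by role keeps it to one `set_role` call per role; `apply_roles` only assumes that the object it receives has such a method.

```python
from collections import defaultdict

# Hypothetical role plan for the `orders` population table (an assumption).
# Freight already carries the `target` role and is therefore not listed.
ORDERS_ROLES = {
    "OrderID": "join_key",
    "CustomerID": "join_key",
    "EmployeeID": "join_key",
    "ShipVia": "join_key",        # refers to ShipperID in the shippers table
    "OrderDate": "time_stamp",
    "RequiredDate": "time_stamp",
    "ShipCity": "categorical",
    "ShipRegion": "categorical",
    "ShipCountry": "categorical",
    # Columns not listed (e.g. ShippedDate, ShipName) stay unused_string.
}


def group_by_role(plan):
    """Invert {column: role} into {role: [columns]}, preserving column order."""
    grouped = defaultdict(list)
    for column, role in plan.items():
        grouped[role].append(column)
    return dict(grouped)


def apply_roles(df, plan):
    """Call df.set_role(columns, role) once per role in the plan."""
    for role, columns in group_by_role(plan).items():
        df.set_role(columns, role)

# In the notebook this would be called as: apply_roles(orders, ORDERS_ROLES)
```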


Next, the peripheral tables:

In [4]:
# TODO: Annotate columns with roles
suppliers

name,SupplierID,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,HomePage
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,1,Exotic Liquids,Charlotte Cooper,Purchasing Manager,49 Gilbert St.,London,,EC1 4SD,UK,(171) 555-2222,,
1.0,2,New Orleans Cajun Delights,Shelley Burke,Order Administrator,P.O. Box 78934,New Orleans,LA,70117,USA,(100) 555-4822,,#CAJUN.HTM#
2.0,3,Grandma Kelly's Homestead,Regina Murphy,Sales Representative,707 Oxford Rd.,Ann Arbor,MI,48104,USA,(313) 555-5735,(313) 555-3349,
3.0,4,Tokyo Traders,Yoshi Nagase,Marketing Manager,9-8 Sekimai Musashino-shi,Tokyo,,100,Japan,(03) 3555-5011,,
4.0,5,Cooperativa de Quesos 'Las Cabra...,Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,
,...,...,...,...,...,...,...,...,...,...,...,...
24.0,25,Ma Maison,Jean-Guy Lauzon,Marketing Manager,2960 Rue St. Laurent,Montral,Qubec,H1J 1C3,Canada,(514) 555-9022,,
25.0,26,Pasta Buttini s.r.l.,Giovanni Giudici,Order Administrator,"Via dei Gelsomini, 153",Salerno,,84100,Italy,(089) 6547665,(089) 6547667,
26.0,27,Escargots Nouveaux,Marie Delamare,Sales Manager,"22, rue H. Voiron",Montceau,,71300,France,85.57.00.07,,
27.0,28,Gai pturage,Eliane Noz,Sales Representative,"Bat. B 3, rue des Alpes",Annecy,,74000,France,38.76.98.06,38.76.98.58,


In [5]:
# TODO: Annotate columns with roles
employee_territories

name,EmployeeID,TerritoryID
role,unused_string,unused_string
0.0,1,06897
1.0,1,19713
2.0,2,01581
3.0,2,01730
4.0,2,01833
,...,...
44.0,9,48075
45.0,9,48084
46.0,9,48304
47.0,9,55113


In [6]:
# TODO: Annotate columns with roles
shippers

name,ShipperID,CompanyName,Phone
role,unused_string,unused_string,unused_string
0,1,Speedy Express,(503) 555-9831
1,2,United Package,(503) 555-3199
2,3,Federal Shipping,(503) 555-9931


In [7]:
# TODO: Annotate columns with roles
order_details

name,OrderID,ProductID,UnitPrice,Quantity,Discount
role,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,10248,11,14,12,0
1.0,10248,42,9.8,10,0
2.0,10248,72,34.8,5,0
3.0,10249,14,18.6,9,0
4.0,10249,51,42.4,40,0
,...,...,...,...,...
2150.0,11077,64,33.25,2,0
2151.0,11077,66,17,1,0
2152.0,11077,73,15,2,0
2153.0,11077,75,7.75,4,0
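The order-details table above (columns `OrderID`, `ProductID`, `UnitPrice`, `Quantity`, `Discount`) relates to `orders` one-to-many: the five rows displayed cover only two orders. Using it for the population therefore means aggregating over each order's rows. getML's feature learners search over aggregations of this kind automatically; the hand-written sum below only illustrates one candidate, the discounted order value, computed from the first five rows shown above and assuming `Discount` is a fraction as in the classic Northwind schema.

```python
from collections import defaultdict

# First five rows of the order-details output above:
# (OrderID, ProductID, UnitPrice, Quantity, Discount)
order_detail_rows = [
    (10248, 11, 14.0, 12, 0.0),
    (10248, 42, 9.8, 10, 0.0),
    (10248, 72, 34.8, 5, 0.0),
    (10249, 14, 18.6, 9, 0.0),
    (10249, 51, 42.4, 40, 0.0),
]


def order_values(rows):
    """Sum UnitPrice * Quantity * (1 - Discount) per OrderID."""
    totals = defaultdict(float)
    for order_id, _product_id, unit_price, quantity, discount in rows:
        totals[order_id] += unit_price * quantity * (1.0 - discount)
    return dict(totals)


order_totals = order_values(order_detail_rows)
# order_totals[10248] is about 440.0, order_totals[10249] about 1863.4
```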


In [8]:
# TODO: Annotate columns with roles
region

name,RegionID,RegionDescription
role,unused_string,unused_string
0,1,Eastern ...
1,2,Westerns ...
2,3,Northern ...
3,3,Northern ...


In [9]:
# TODO: Annotate columns with roles
customer_demographics

name
0
1
2
3
4
6
7
8
9
10


In [10]:
# TODO: Annotate columns with roles
territories

name,TerritoryID,TerritoryDescription,RegionID
role,unused_string,unused_string,unused_string
0.0,01581,Westboro ...,1
1.0,01730,Bedford ...,1
2.0,01833,Georgetow ...,1
3.0,02116,Boston ...,1
4.0,02139,Cambridge ...,1
,...,...,...
48.0,95054,Santa Clara ...,2
49.0,95060,Santa Cruz ...,2
50.0,98004,Bellevue ...,2
51.0,98052,Redmond ...,2


In [11]:
# TODO: Annotate columns with roles
customers

name,CustomerID,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,030-0076545
1.0,ANATR,Ana Trujillo Emparedados y helad...,Ana Trujillo,Owner,Avda. de la Constitucin 2222,Mxico D.F.,,05021,Mexico,(5) 555-4729,(5) 555-3745
2.0,ANTON,Antonio Moreno Taquera,Antonio Moreno,Owner,Mataderos 2312,Mxico D.F.,,05023,Mexico,(5) 555-3932,
3.0,AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,,WA1 1DP,UK,(171) 555-7788,(171) 555-6750
4.0,BERGS,Berglunds snabbkp,Christina Berglund,Order Administrator,Berguvsvgen 8,Lule,,S-958 22,Sweden,0921-12 34 65,0921-12 34 67
,...,...,...,...,...,...,...,...,...,...,...
86.0,WARTH,Wartian Herkku,Pirkko Koskitalo,Accounting Manager,Torikatu 38,Oulu,,90110,Finland,981-443655,981-443655
87.0,WELLI,Wellington Importadora,Paula Parente,Sales Manager,"Rua do Mercado, 12",Resende,SP,08737-363,Brazil,(14) 555-8122,
88.0,WHITC,White Clover Markets,Karl Jablonski,Owner,305 - 14th Ave. S. Suite 3B,Seattle,WA,98128,USA,(206) 555-4112,(206) 555-4115
89.0,WILMK,Wilman Kala,Matti Karttunen,Owner/Marketing Assistant,Keskuskatu 45,Helsinki,,21240,Finland,90-224 8858,90-224 8858


In [12]:
# TODO: Annotate columns with roles
employees

name,EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,PostalCode,Country,HomePhone,Extension,Photo,Notes,ReportsTo,PhotoPath,Salary
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0,1,Davolio,Nancy,Sales Representative,Ms.,1948-12-08 00:00:00.000000,1992-05-01 00:00:00.000000,507 - 20th Ave. E.Apt. 2A,Seattle,WA,98122,USA,(206) 555-9857,5467,����,Education includes a BA in psych...,2.0,http://accweb/emmployees/davolio...,2954.55
1,2,Fuller,Andrew,"Vice President, Sales",Dr.,1952-02-19 00:00:00.000000,1992-08-14 00:00:00.000000,908 W. Capital Way,Tacoma,WA,98401,USA,(206) 555-9482,3457,����,Andrew received his BTS commerci...,,http://accweb/emmployees/fuller....,2254.49
2,3,Leverling,Janet,Sales Representative,Ms.,1963-08-30 00:00:00.000000,1992-04-01 00:00:00.000000,722 Moss Bay Blvd.,Kirkland,WA,98033,USA,(206) 555-3412,3355,����,Janet has a BS degree in chemist...,2.0,http://accweb/emmployees/leverli...,3119.15
3,4,Peacock,Margaret,Sales Representative,Mrs.,1937-09-19 00:00:00.000000,1993-05-03 00:00:00.000000,4110 Old Redmond Rd.,Redmond,WA,98052,USA,(206) 555-8122,5176,����,Margaret holds a BA in English l...,2.0,http://accweb/emmployees/peacock...,1861.08
4,5,Buchanan,Steven,Sales Manager,Mr.,1955-03-04 00:00:00.000000,1993-10-17 00:00:00.000000,14 Garrett Hill,London,,SW1 8JR,UK,(71) 555-4848,3453,����,Steven Buchanan graduated from S...,2.0,http://accweb/emmployees/buchana...,1744.21
5,6,Suyama,Michael,Sales Representative,Mr.,1963-07-02 00:00:00.000000,1993-10-17 00:00:00.000000,Coventry House Miner Rd.,London,,EC2 7JR,UK,(71) 555-7773,428,����,Michael is a graduate of Sussex ...,5.0,http://accweb/emmployees/davolio...,2004.07
6,7,King,Robert,Sales Representative,Mr.,1960-05-29 00:00:00.000000,1994-01-02 00:00:00.000000,Edgeham Hollow Winchester Way,London,,RG1 9SP,UK,(71) 555-5598,465,����,Robert King served in the Peace ...,5.0,http://accweb/emmployees/davolio...,1991.55
7,8,Callahan,Laura,Inside Sales Coordinator,Ms.,1958-01-09 00:00:00.000000,1994-03-05 00:00:00.000000,4726 - 11th Ave. N.E.,Seattle,WA,98105,USA,(206) 555-1189,2344,����,Laura received a BA in psycholog...,2.0,http://accweb/emmployees/davolio...,2100.5
8,9,Dodsworth,Anne,Sales Representative,Ms.,1966-01-27 00:00:00.000000,1994-11-15 00:00:00.000000,7 Houndstooth Rd.,London,,WG2 7LT,UK,(71) 555-4444,452,����,Anne has a BA degree in English ...,5.0,http://accweb/emmployees/davolio...,2333.33


In [13]:
# TODO: Annotate columns with roles
categories

name,CategoryID,CategoryName,Description,Picture
role,unused_string,unused_string,unused_string,unused_string
0,1,Beverages,"Soft drinks, coffees, teas, beer...",����
1,2,Condiments,"Sweet and savory sauces, relishe...",����
2,3,Confections,"Desserts, candies, and sweet bre...",����
3,4,Dairy Products,Cheeses,����
4,5,Grains/Cereals,"Breads, crackers, pasta, and cer...",����
5,5,Grains/Cereals,"Breads, crackers, pasta, and cer...",����
6,6,Meat/Poultry,Prepared meats,����
7,7,Produce,Dried fruit and bean curd,����
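Two of the small lookup tables above contain repeated key values: `RegionID` 3 appears twice in the table with `RegionID`/`RegionDescription`, and `CategoryID` 5 appears twice in the table with `CategoryID`/`CategoryName`. A table meant to be the "one" side of a many-to-one join needs unique keys; otherwise a matching row picks up several rows from it. The minimal check below uses the key values copied from those outputs. If a join is declared as many-to-one (getML's `join` has a `relationship` parameter; check its documentation for the accepted values), duplicates like these would violate that assumption.

```python
from collections import Counter


def duplicated_keys(values):
    """Return the key values that occur more than once, sorted."""
    return sorted(key for key, count in Counter(values).items() if count > 1)


# Key columns as displayed in the notebook outputs above.
region_ids = [1, 2, 3, 3]
category_ids = [1, 2, 3, 4, 5, 5, 6, 7]

region_dupes = duplicated_keys(region_ids)      # [3]
category_dupes = duplicated_keys(category_ids)  # [5]
```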


In [14]:
# TODO: Annotate columns with roles
customer_customer_demo

name
0
1
2
3
4
6
7
8
9
10


In [15]:
# TODO: Annotate columns with roles
products

name,ProductID,ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,1,Chai,1,1,10 boxes x 20 bags,18,39,0,10,0
1.0,2,Chang,1,1,24 - 12 oz bottles,19,17,40,25,0
2.0,3,Aniseed Syrup,1,2,12 - 550 ml bottles,10,13,70,25,0
3.0,4,Chef Anton's Cajun Seasoning,2,2,48 - 6 oz jars,22,53,0,0,0
4.0,5,Chef Anton's Gumbo Mix,2,2,36 boxes,21.35,0,0,0,1
,...,...,...,...,...,...,...,...,...,...
72.0,73,Rd Kaviar,17,8,24 - 150 g jars,15,101,0,5,0
73.0,74,Longlife Tofu,4,7,5 kg pkg.,10,4,20,5,0
74.0,75,Rhnbru Klosterbier,12,1,24 - 0.5 l bottles,7.75,125,0,25,0
75.0,76,Lakkalikri,23,1,500 ml,18,57,0,20,0


The next step is to define the data model. Refer to [https://relational.fel.cvut.cz/dataset/northwind](https://relational.fel.cvut.cz/dataset/northwind)
for a description of the dataset.

In [16]:
dm = getml.data.DataModel(population=orders.to_placeholder())
dm.add(getml.data.to_placeholder(**peripheral))

# TODO
# dm.population.join(...)
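The TODO asks for the joins. One plausible reading of the Northwind schema, stated here as an assumption rather than the reference solution, connects the tables through the key pairs below as a tree (consistent with "Loops: No"); the empty customer-demographics tables are left out. Note that `orders.ShipVia` refers to `shippers.ShipperID`, so that join uses different column names on each side. The pure-Python sketch checks every proposed key pair against the key columns shown in the outputs above before any `join` call is written.

```python
# Key columns per table, copied from the outputs above (other columns omitted).
KEY_COLUMNS = {
    "orders": {"OrderID", "CustomerID", "EmployeeID", "ShipVia"},
    "order details": {"OrderID", "ProductID"},
    "products": {"ProductID", "SupplierID", "CategoryID"},
    "categories": {"CategoryID"},
    "suppliers": {"SupplierID"},
    "customers": {"CustomerID"},
    "employees": {"EmployeeID"},
    "shippers": {"ShipperID"},
    "employee_territories": {"EmployeeID", "TerritoryID"},
    "territories": {"TerritoryID", "RegionID"},
    "region": {"RegionID"},
}

# Proposed joins (an assumption): (left table, right table, left key, right key).
PROPOSED_JOINS = [
    ("orders", "order details", "OrderID", "OrderID"),
    ("order details", "products", "ProductID", "ProductID"),
    ("products", "categories", "CategoryID", "CategoryID"),
    ("products", "suppliers", "SupplierID", "SupplierID"),
    ("orders", "customers", "CustomerID", "CustomerID"),
    ("orders", "employees", "EmployeeID", "EmployeeID"),
    ("orders", "shippers", "ShipVia", "ShipperID"),
    ("employees", "employee_territories", "EmployeeID", "EmployeeID"),
    ("employee_territories", "territories", "TerritoryID", "TerritoryID"),
    ("territories", "region", "RegionID", "RegionID"),
]


def invalid_joins(columns, joins):
    """Return the joins whose key column is missing on either side."""
    return [
        (left, right, left_key, right_key)
        for left, right, left_key, right_key in joins
        if left_key not in columns[left] or right_key not in columns[right]
    ]


problems = invalid_joins(KEY_COLUMNS, PROPOSED_JOINS)  # expected: []
```

In getML, each tuple would become a `join` call on the corresponding placeholder, for example `dm.population.join(dm.customers, on="CustomerID")`, with `on=("ShipVia", "ShipperID")` for the differently named keys; treat these calls as a sketch to verify against the `Placeholder.join` documentation. The key `"order details"` contains a space, so that placeholder cannot be written as a plain attribute.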

Now we can create the container and add the tables to it.

In [17]:
container = getml.data.Container(population=orders, split=orders.split)
container.add(**peripheral)

container

,subset,name,rows,type
0,train,orders,581,View
1,val,orders,249,View

,name,rows,type
0.0,suppliers,29,DataFrame
1.0,employee_territories,49,DataFrame
2.0,shippers,3,DataFrame
3.0,order details,2155,DataFrame
4.0,region,4,DataFrame
,...,...,...
7.0,customers,91,DataFrame
8.0,employees,9,DataFrame
9.0,categories,8,DataFrame
10.0,customer_customer_demo,0,DataFrame
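The container listing shows how `orders.split` partitions the population: 581 rows go to `train` and 249 to `val`. Together they account for all 830 orders (the instance count in the dataset description), which corresponds to a 70/30 split. A quick check of that arithmetic:

```python
# Subset sizes taken from the container listing above.
subset_rows = {"train": 581, "val": 249}

total = sum(subset_rows.values())  # 830
fractions = {name: rows / total for name, rows in subset_rows.items()}
# fractions: train 0.7, val 0.3 (up to floating-point rounding)
```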
