In [1]:
import getml
from challenge.utils.data import load_ctu_dataset

getml.set_project("db_transformer_classicmodels")

# Task: classicmodels
### Dataset Description
> <span style="font-weight: 500; color: #3b3b3b;">ⓘ️&nbsp; Generated by `gpt-4o`</span>
>
> *ClassicModels Dataset Description*
> 
> - *Data Model (Relational Schema)*:
>   - **orderdetails**: Contains columns `orderNumber` (int), `productCode` (varchar), `quantityOrdered` (int), `priceEach` (double), and `orderLineNumber` (smallint).
>   - **orders**: Contains columns `orderNumber` (int), `orderDate` (date), `requiredDate` (date), `shippedDate` (date), `status` (varchar), `comments` (text), and `customerNumber` (int).
>   - **payments**: Contains columns `customerNumber` (int), `checkNumber` (varchar), `paymentDate` (date), and `amount` (double).
>   - **products**: Contains columns `productCode` (varchar), `productName` (varchar), `productLine` (varchar), `productScale` (varchar), `productVendor` (varchar), `productDescription` (text), `quantityInStock` (smallint), `buyPrice` (double), and `MSRP` (double).
>   - **customers**: Contains columns `customerNumber` (int), `customerName` (varchar), `contactLastName` (varchar), `contactFirstName` (varchar), `phone` (varchar), `addressLine1` (varchar), `addressLine2` (varchar), `city` (varchar), `state` (varchar), `postalCode` (varchar), `country` (varchar), `salesRepEmployeeNumber` (int), and `creditLimit` (double).
>   - **employees**: Contains columns `employeeNumber` (int), `lastName` (varchar), `firstName` (varchar), `extension` (varchar), `email` (varchar), `officeCode` (varchar), `reportsTo` (int), and `jobTitle` (varchar).
>   - **offices**: Contains columns `officeCode` (varchar), `city` (varchar), `phone` (varchar), `addressLine1` (varchar), `addressLine2` (varchar), `state` (varchar), `country` (varchar), `postalCode` (varchar), and `territory` (varchar).
>   - **productlines**: Contains columns `productLine` (varchar), `textDescription` (varchar), `htmlDescription` (mediumtext), and `image` (mediumblob).
> 
> - *Task*: Regression
>   - *Target Column*: `amount` in the `payments` table.
> 
> - *Types of the Columns*:
>   - *Int*: Used for identifiers and quantities.
>   - *Varchar*: Used for names, codes, and descriptions.
>   - *Double*: Used for prices and amounts.
>   - *Date*: Used for dates.
>   - *Text/Mediumtext/Mediumblob*: Used for descriptions and images.
> 
> - *Metadata*:
>   - *Size*: 500 KB
>   - *Number of Tables*: 8
>   - *Number of Rows*: 3,864
>   - *Number of Columns*: 59
>   - *Missing Values*: Yes
>   - *Instance Count*: 273
>   - *Target Table*: `payments`
>   - *Target ID*: `checkNumber`
>   - *Target Timestamp*: `paymentDate`
> 
> This dataset represents business data for a retailer of scale models of classic cars, including customers, orders, products, and payments.

### Tables
Population table: payments

<h4>
  <details open>
     <summary>ER Diagram</summary>
       <img src="https://relational.fel.cvut.cz/assets/img/datasets-generated/classicmodels.svg" alt="classicmodels ER Diagram">
   </details>
</h4>

To load the dataset, we use the `load_ctu_dataset` function from the
`challenge.utils.data` module. This function returns a tuple with the population
table as the first element and a dictionary of peripheral tables as the second element.

In [2]:
payments, peripheral = load_ctu_dataset("classicmodels")

# Unpack in the order in which the loader returns the tables.
(
    orders,
    employees,
    customers,
    products,
    offices,
    productlines,
    orderdetails,
) = peripheral.values()

Analyzing schema:   0%|          | 0/8 [00:00<?, ?it/s]

Downloading tables:   0%|          | 0/8 [00:00<?, ?it/s]

Building data:   0%|          | 0/8 [00:00<?, ?it/s]
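
Because `peripheral` is a dictionary, unpacking `peripheral.values()` positionally depends on the order in which the loader inserted the tables. A minimal sketch of name-based lookup follows. The `peripheral` dict here is a hypothetical stand-in that holds strings instead of getml DataFrames, and it assumes the keys are the table names (the `container.add(**peripheral)` call later in this notebook suggests they are).

```python
# Stand-in for the dictionary returned by load_ctu_dataset (assumption:
# keys are table names; the real values would be getml DataFrames).
peripheral = {
    "orders": "<orders table>",
    "employees": "<employees table>",
    "customers": "<customers table>",
    "products": "<products table>",
    "offices": "<offices table>",
    "productlines": "<productlines table>",
    "orderdetails": "<orderdetails table>",
}

EXPECTED = {
    "orders", "employees", "customers", "products",
    "offices", "productlines", "orderdetails",
}

# Fail early if the loader did not return a table we rely on.
missing = EXPECTED - set(peripheral)
if missing:
    raise KeyError(f"missing peripheral tables: {sorted(missing)}")

# Look tables up by name so the result does not depend on dict order.
orders = peripheral["orders"]
customers = peripheral["customers"]
offices = peripheral["offices"]
```

Looking tables up by key keeps each variable name tied to the table it actually holds, even if the loader changes its insertion order.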

Now, we can inspect all tables and annotate the columns with [roles](https://getml.com/latest/user_guide/concepts/annotating_data/).

We start with the population table (`payments`). The `target` role is already set for the target column (`amount`). For multiclass classification tasks,
the target column is split into multiple columns in a one-vs-all fashion; in that case, the original target is still available as `amount`. This task is a regression, so no split applies here.
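
To illustrate the one-vs-all split mentioned above, here is a minimal pure-Python sketch. The class labels and the `target=<label>` column naming are hypothetical; the loader's actual naming may differ, and the regression task in this notebook does not need the split.

```python
def one_vs_all(labels):
    """Turn a list of class labels into one binary column per class."""
    classes = sorted(set(labels))
    return {
        f"target={cls}": [1 if label == cls else 0 for label in labels]
        for cls in classes
    }

# Hypothetical multiclass target (not the target of this notebook).
status = ["Shipped", "In Process", "Shipped", "Cancelled"]
columns = one_vs_all(status)

print(sorted(columns))            # one column per class
print(columns["target=Shipped"])  # [1, 0, 1, 0]
```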

In [3]:
# TODO: Annotate remaining columns with roles
payments

name,amount,customerNumber,checkNumber,paymentDate,split
role,target,unused_string,unused_string,unused_string,unused_string
0.0,6066.78,103,HQ336336,2004-10-19,train
1.0,14571.44,103,JM555205,2003-06-05,train
2.0,1676.14,103,OM314933,2004-12-18,val
3.0,14191.12,112,BO864823,2004-12-17,train
4.0,32641.98,112,HQ55022,2003-06-06,train
,...,...,...,...,...
268.0,59265.14,495,BH167026,2003-12-26,val
269.0,6276.6,495,FN155234,2004-05-14,train
270.0,30253.75,496,EU531600,2005-05-25,val
271.0,32077.44,496,MB342426,2003-07-16,val
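
One way to approach the TODO above is to write the role assignment down as a plain mapping first. The sketch below is an assumption about sensible roles for `payments`, not the notebook's chosen annotation: `customerNumber` as a join key and `paymentDate` as a time stamp, with `checkNumber` and `split` left unused. The role names are the strings getml uses for its roles. `apply_roles` is a hypothetical helper that only relies on the frame exposing `set_role(cols, role)`, and `_Recorder` stands in for a getml DataFrame so the sketch runs without the engine.

```python
from collections import defaultdict

# Assumed role plan for the payments table (column -> getml role name).
PAYMENTS_ROLES = {
    "customerNumber": "join_key",
    "paymentDate": "time_stamp",
}

def group_by_role(plan):
    """Invert a column -> role mapping into role -> [columns]."""
    grouped = defaultdict(list)
    for column, role in plan.items():
        grouped[role].append(column)
    return dict(grouped)

def apply_roles(frame, plan):
    """Call frame.set_role(cols, role) once per role in the plan."""
    for role, columns in group_by_role(plan).items():
        frame.set_role(columns, role)

class _Recorder:
    """Stand-in for a getml DataFrame; records the set_role calls."""

    def __init__(self):
        self.calls = []

    def set_role(self, cols, role):
        self.calls.append((role, list(cols)))

recorder = _Recorder()
apply_roles(recorder, PAYMENTS_ROLES)
print(recorder.calls)
```

With the getml engine running, `apply_roles(payments, PAYMENTS_ROLES)` should issue the same calls against the real frame.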


Next, the peripheral tables:

In [4]:
# TODO: Annotate columns with roles
orders

name,orderNumber,orderDate,requiredDate,shippedDate,status,comments,customerNumber
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,10100,2003-01-06,2003-01-13,2003-01-10,Shipped,,363
1.0,10101,2003-01-09,2003-01-18,2003-01-11,Shipped,Check on availability.,128
2.0,10102,2003-01-10,2003-01-18,2003-01-14,Shipped,,181
3.0,10103,2003-01-29,2003-02-07,2003-02-02,Shipped,,121
4.0,10104,2003-01-31,2003-02-09,2003-02-01,Shipped,,141
,...,...,...,...,...,...,...
321.0,10421,2005-05-29,2005-06-06,,In Process,Custom shipping instructions wer...,124
322.0,10422,2005-05-30,2005-06-11,,In Process,,157
323.0,10423,2005-05-30,2005-06-05,,In Process,,314
324.0,10424,2005-05-31,2005-06-08,,In Process,,141


In [5]:
# TODO: Annotate columns with roles
employees

name,employeeNumber,lastName,firstName,extension,email,officeCode,reportsTo,jobTitle
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,1002,Murphy,Diane,x5800,dmurphy@classicmodelcars.com,1,,President
1.0,1056,Patterson,Mary,x4611,mpatterso@classicmodelcars.com,1,1002,VP Sales
2.0,1076,Firrelli,Jeff,x9273,jfirrelli@classicmodelcars.com,1,1002,VP Marketing
3.0,1088,Patterson,William,x4871,wpatterson@classicmodelcars.com,6,1056,Sales Manager (APAC)
4.0,1102,Bondur,Gerard,x5408,gbondur@classicmodelcars.com,4,1056,Sale Manager (EMEA)
,...,...,...,...,...,...,...,...
18.0,1612,Marsh,Peter,x102,pmarsh@classicmodelcars.com,6,1088,Sales Rep
19.0,1619,King,Tom,x103,tking@classicmodelcars.com,6,1088,Sales Rep
20.0,1621,Nishi,Mami,x101,mnishi@classicmodelcars.com,5,1056,Sales Rep
21.0,1625,Kato,Yoshimi,x102,ykato@classicmodelcars.com,5,1621,Sales Rep


In [6]:
# TODO: Annotate columns with roles
customers

name,customerNumber,customerName,contactLastName,contactFirstName,phone,addressLine1,addressLine2,city,state,postalCode,country,salesRepEmployeeNumber,creditLimit
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,103,Atelier graphique,Schmitt,Carine,40.32.2555,"54, rue Royale",,Nantes,,44000,France,1370,21000
1.0,112,Signal Gift Stores,King,Jean,7025551838,8489 Strong St.,,Las Vegas,NV,83030,USA,1166,71800
2.0,114,"Australian Collectors, Co.",Ferguson,Peter,03 9520 4555,636 St Kilda Road,Level 3,Melbourne,Victoria,3004,Australia,1611,117300
3.0,119,La Rochelle Gifts,Labrune,Janine,40.67.8555,"67, rue des Cinquante Otages",,Nantes,,44000,France,1370,118200
4.0,121,Baane Mini Imports,Bergulfsen,Jonas,07-98 9555,Erling Skakkes gate 78,,Stavern,,4110,Norway,1504,81700
,...,...,...,...,...,...,...,...,...,...,...,...,...
117.0,486,Motor Mint Distributors Inc.,Salazar,Rosa,2155559857,11328 Douglas Av.,,Philadelphia,PA,71270,USA,1323,72600
118.0,487,Signal Collectibles Ltd.,Taylor,Sue,4155554312,2793 Furth Circle,,Brisbane,CA,94217,USA,1165,60300
119.0,489,"Double Decker Gift Stores, Ltd",Smith,Thomas,(171) 555-7555,120 Hanover Sq.,,London,,WA1 1DP,UK,1501,43300
120.0,495,Diecast Collectables,Franco,Valarie,6175552555,6251 Ingle Ln.,,Boston,MA,51003,USA,1188,85100


In [7]:
# TODO: Annotate columns with roles
products

name,productCode,productName,productLine,productScale,productVendor,productDescription,quantityInStock,buyPrice,MSRP
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,S10_1678,1969 Harley Davidson Ultimate Ch...,Motorcycles,1:10,Min Lin Diecast,This replica features working ki...,7933,48.81,95.7
1.0,S10_1949,1952 Alpine Renault 1300,Classic Cars,1:10,Classic Metal Creations,Turnable front wheels; steering ...,7305,98.58,214.3
2.0,S10_2016,1996 Moto Guzzi 1100i,Motorcycles,1:10,Highway 66 Mini Classics,Official Moto Guzzi logos and in...,6625,68.99,118.94
3.0,S10_4698,2003 Harley-Davidson Eagle Drag ...,Motorcycles,1:10,Red Start Diecast,"Model features, official Harley ...",5582,91.02,193.66
4.0,S10_4757,1972 Alfa Romeo GTA,Classic Cars,1:10,Motor City Art Classics,Features include: Turnable front...,3252,85.68,136
,...,...,...,...,...,...,...,...,...
105.0,S700_3505,The Titanic,Ships,1:700,Carousel DieCast Legends,Completed model measures 19 1/2 ...,1956,51.09,100.17
106.0,S700_3962,The Queen Mary,Ships,1:700,Welly Diecast Productions,Exact replica. Wood and Metal. M...,5088,53.63,99.31
107.0,S700_4002,American Airlines: MD-11S,Planes,1:700,Second Gear Diecast,Polished finish. Exact replia wi...,8820,36.27,74.03
108.0,S72_1253,Boeing X-32A JSF,Planes,1:72,Motor City Art Classics,"10"" Wingspan with retractable la...",4857,32.77,49.66


In [8]:
# TODO: Annotate columns with roles
offices

name,officeCode,city,phone,addressLine1,addressLine2,state,country,postalCode,territory
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0,1,San Francisco,+1 650 219 4782,100 Market Street,Suite 300,CA,USA,94080,
1,2,Boston,+1 215 837 0825,1550 Court Place,Suite 102,MA,USA,02107,
2,3,NYC,+1 212 555 3000,523 East 53rd Street,apt. 5A,NY,USA,10022,
3,4,Paris,+33 14 723 4404,43 Rue Jouffroy D'abbans,,,France,75017,EMEA
4,5,Tokyo,+81 33 224 5000,4-1 Kioicho,,Chiyoda-Ku,Japan,102-8578,Japan
5,6,Sydney,+61 2 9264 2451,5-11 Wentworth Avenue,Floor #2,,Australia,NSW 2010,APAC
6,7,London,+44 20 7877 2041,25 Old Broad Street,Level 7,,UK,EC2N 1HN,EMEA


In [9]:
# TODO: Annotate columns with roles
productlines

name,productLine,textDescription,htmlDescription,image
role,unused_string,unused_string,unused_string,unused_string
0,Classic Cars,Attention car enthusiasts: Make ...,,
1,Motorcycles,Our motorcycles are state of the...,,
2,Planes,"Unique, diecast airplane and hel...",,
3,Ships,The perfect holiday or anniversa...,,
4,Trains,Model trains are a rewarding hob...,,
5,Trucks and Buses,The Truck and Bus models are rea...,,
6,Vintage Cars,Our Vintage Car models realistic...,,


In [10]:
# TODO: Annotate columns with roles
orderdetails

name,orderNumber,productCode,quantityOrdered,priceEach,orderLineNumber
role,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,10100,S18_1749,30,136,3
1.0,10100,S18_2248,50,55.09,2
2.0,10100,S18_4409,22,75.46,4
3.0,10100,S24_3969,49,35.29,1
4.0,10101,S18_2325,25,108.06,4
,...,...,...,...,...
2991.0,10425,S24_2300,49,127.79,9
2992.0,10425,S24_2840,31,31.82,5
2993.0,10425,S32_1268,41,83.79,11
2994.0,10425,S32_2509,11,50.32,6


The next step is to define the data model. Refer to [https://relational.fel.cvut.cz/dataset/classicmodels](https://relational.fel.cvut.cz/dataset/classicmodels)
for a description of the dataset.

In [11]:
dm = getml.data.DataModel(population=payments.to_placeholder())
dm.add(getml.data.to_placeholder(**peripheral))

# TODO
# dm.population.join(...)
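
Before filling in the `# TODO` join step, it can help to write the relationships down explicitly. The sketch below lists an assumed join plan based on the column names in the dataset description. It then checks that every join key exists in both tables and that every table is reachable from `payments`. The plan itself, `check_joins`, and `reachable` are illustrative and not part of getml.

```python
# Key columns per table, taken from the dataset description above
# (only columns that can serve as join keys are listed).
KEY_COLUMNS = {
    "payments": {"customerNumber", "checkNumber"},
    "customers": {"customerNumber", "salesRepEmployeeNumber"},
    "orders": {"orderNumber", "customerNumber"},
    "orderdetails": {"orderNumber", "productCode"},
    "products": {"productCode", "productLine"},
    "productlines": {"productLine"},
    "employees": {"employeeNumber", "officeCode", "reportsTo"},
    "offices": {"officeCode"},
}

# Assumed join plan: (left table, right table, left key, right key).
JOINS = [
    ("payments", "customers", "customerNumber", "customerNumber"),
    ("customers", "orders", "customerNumber", "customerNumber"),
    ("orders", "orderdetails", "orderNumber", "orderNumber"),
    ("orderdetails", "products", "productCode", "productCode"),
    ("products", "productlines", "productLine", "productLine"),
    ("customers", "employees", "salesRepEmployeeNumber", "employeeNumber"),
    ("employees", "offices", "officeCode", "officeCode"),
]

def check_joins(joins, key_columns):
    """Return the joins whose keys are missing from either table."""
    return [
        join for join in joins
        if join[2] not in key_columns[join[0]]
        or join[3] not in key_columns[join[1]]
    ]

def reachable(joins, start):
    """Tables reachable from `start` by following joins left to right."""
    seen, frontier = {start}, [start]
    while frontier:
        table = frontier.pop()
        for left, right, _, _ in joins:
            if left == table and right not in seen:
                seen.add(right)
                frontier.append(right)
    return seen

assert not check_joins(JOINS, KEY_COLUMNS)
print(sorted(reachable(JOINS, "payments")))
```

Each edge would then correspond to one `join` call on the placeholders, for example `dm.population.join(dm.customers, on="customerNumber")`, once the key columns carry the `join_key` role.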

Now we can create the container and add the tables to it.

In [12]:
container = getml.data.Container(population=payments, split=payments.split)
container.add(**peripheral)

container

,subset,name,rows,type
0,train,payments,192,View
1,val,payments,81,View

,name,rows,type
0,orders,326,DataFrame
1,employees,23,DataFrame
2,customers,122,DataFrame
3,products,110,DataFrame
4,offices,7,DataFrame
5,productlines,7,DataFrame
6,orderdetails,2996,DataFrame
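
As a quick sanity check on the container, the subset sizes can be compared with the instance count from the dataset description (273). The sketch below hard-codes the row counts shown in the output above rather than reading them from getml.

```python
# Row counts reported by the container output above.
split_rows = {"train": 192, "val": 81}

total = sum(split_rows.values())
shares = {name: rows / total for name, rows in split_rows.items()}

print(total)  # 273
print({name: round(share, 3) for name, share in shares.items()})
```

The two subsets add up to the 273 instances listed in the metadata, with roughly 70% of the rows in `train`.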
