In [1]:
import getml
from challenge.utils.data import load_ctu_dataset

getml.set_project("db_transformer_musk_small")

# Task: MuskSmall
### Dataset Description
> <span style="font-weight: 500; color: #3b3b3b;">ⓘ️&nbsp; Generated by `gpt-4o`</span>
>
> *Data Model:*
> 
> The *MuskSmall* dataset consists of two tables: `conformation` and `molecule`. These tables provide information about molecular conformations and their classifications.
> 
> - **conformation**: Contains `conformation_name` (varchar), `molecule_name` (varchar), and features `f1` to `f166` (int). This table details the features of different molecular conformations.
> 
> - **molecule**: Includes `molecule_name` (varchar) and `class` (int). This table classifies molecules as either musk or non-musk.
> 
> *Task and Target Column:*
> 
> The primary task is *classification*, with the target column being `class` in the `molecule` table. The goal is to classify molecules based on their conformations.
> 
> *Column Types:*
> 
> - Varchar: `conformation_name`, `molecule_name`
> - Integer: `f1` to `f166`, `class`
> 
> *Metadata:*
> 
> - **Number of Tables**: 2
> - **Target Table**: `molecule`
> - **Target Column**: `class`
> 
> This dataset is typically used in cheminformatics to analyze and classify molecular structures based on their conformational features.

### Tables
Population table: molecule

<h4>
  <details open>
     <summary>ER Diagram</summary>
       <img src="https://relational.fel.cvut.cz/assets/img/datasets-generated/MuskSmall.svg" alt="MuskSmall ER Diagram">
   </details>
</h4>

To load the dataset, we use the `load_ctu_dataset` function from the
`challenge.utils.data` module. This function returns a tuple with the population
table as the first element and a dictionary of peripheral tables as the second element.

In [2]:
molecule, peripheral = load_ctu_dataset("MuskSmall")

(
    conformation,
) = peripheral.values()


Now, we can inspect all tables and annotate the columns with [roles](https://getml.com/latest/user_guide/concepts/annotating_data/).

First, the population table (`molecule`). The `target` role is already set for the target column (`class`). For multiclass classification tasks,
the target column is split into multiple columns in a one-vs-all fashion; in that case, the original target remains available as `class`.
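
The one-vs-all split mentioned above can be illustrated with a minimal pure-Python sketch. The helper `one_vs_all` and the `class=<label>` column naming are assumptions made for illustration, not the naming `load_ctu_dataset` actually uses. MuskSmall itself is a binary task, so the toy labels below are made up.

```python
def one_vs_all(labels, target="class"):
    """Expand a label column into one 0/1 indicator column per distinct label."""
    classes = sorted(set(labels))
    return {
        # Hypothetical naming scheme: "<target>=<label>"
        f"{target}={cls}": [1 if label == cls else 0 for label in labels]
        for cls in classes
    }


# Toy multiclass labels (not taken from MuskSmall, which has only two classes)
labels = ["a", "b", "c", "a"]
indicators = one_vs_all(labels)
```

Each indicator column can then serve as its own binary target, while the original label column stays untouched.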

In [3]:
# TODO: Annotate remaining columns with roles
molecule

name,class,molecule_name,split
role,target,unused_string,unused_string
0.0,0,MUSK-188,val
1.0,0,MUSK-190,train
2.0,0,MUSK-211,val
3.0,0,MUSK-212,train
4.0,0,MUSK-213,train
,...,...,...
87.0,1,NON-MUSK-j93,train
88.0,1,NON-MUSK-j96,train
89.0,1,NON-MUSK-j97,val
90.0,1,NON-MUSK-jp10,train


Next, the peripheral table (`conformation`):

In [4]:
# TODO: Annotate columns with roles
conformation

name,conformation_name,molecule_name,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72,f73,f74,f75,f76,f77,f78,f79,f80,f81,f82,f83,f84,f85,f86,f87,f88,f89,f90,f91,f92,f93,f94,f95,f96,f97,f98,f99,f100,f101,f102,f103,f104,f105,f106,f107,f108,f109,f110,f111,f112,f113,f114,f115,f116,f117,f118,f119,f120,f121,f122,f123,f124,f125,f126,f127,f128,f129,f130,f131,f132,f133,f134,f135,f136,f137,f138,f139,f140,f141,f142,f143,f144,f145,f146,f147,f148,f149,f150,f151,f152,f153,f154,f155,f156,f157,f158,f159,f160,f161,f162,f163,f164,f165,f166
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_
string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,188_1+1,MUSK-188,42,-198,-109,-75,-117,11,23,-88,-28,-27,-232,-212,-66,-286,-287,-300,-57,-75,-192,-184,-66,-18,-50,111,110,18,-18,-127,25,63,-117,-114,-47,9,-135,26,-175,73,-143,71,-177,-85,-30,-282,-280,-249,-135,-11,-139,-105,-142,-32,-9,-48,147,1,40,-170,35,33,-101,-195,26,-5,-144,48,-165,18,-133,15,-146,-148,-146,-246,-216,-181,-37,-212,-216,-174,-20,8,-120,-38,-7,11,-156,-39,-7,82,-202,-15,-115,-46,26,-49,-166,32,-141,76,-206,26,-257,-289,-304,-163,-117,-17,-247,-283,-244,-64,-35,-32,-10,57,110,25,6,-117,80,149,130,-110,-134,-14,35,51,11,-187,13,-138,-67,-163,-201,-19,45,-115,-11,-37,-100,77,78,60,-178,-102,-118,-33,-104,41,-77,-120,-111,-168,-54,-195,-238,-74,-129,-120,-38,30,48,-37,6,30
1.0,188_1+2,MUSK-188,42,-191,-142,-65,-117,55,49,-170,-45,5,-325,-115,-107,-281,-257,-303,54,-154,-101,-47,-31,-28,1,191,72,-38,50,-64,-63,98,-117,-113,-46,2,-135,25,-159,1,-143,38,-169,-85,-31,-323,-234,-334,-88,-73,-109,-4,-75,-31,-14,-137,105,-94,90,-132,3,7,-101,-195,26,109,-130,48,-165,-80,-133,-50,-153,-148,-297,-194,-96,-181,-5,-289,-107,-179,-21,4,-34,115,15,15,-74,-164,-73,131,-202,-15,-115,-308,26,-50,-33,25,-154,75,-191,26,-227,-309,-284,-266,-163,-122,-185,-234,-212,0,-3,-3,22,-12,156,36,82,31,82,70,111,-110,-133,-13,-26,75,-107,-187,13,-138,-77,-129,-224,-89,51,-70,-19,-35,-29,3,43,10,-178,-102,-119,-57,-70,53,-77,-123,-111,-168,-54,-195,-238,-302,60,-120,-39,31,48,-37,5,30
2.0,188_1+3,MUSK-188,42,-191,-142,-75,-117,11,49,-161,-45,-28,-278,-115,-67,-274,-285,-303,53,-154,-100,-183,-31,-28,1,110,110,-38,51,-64,25,63,-117,-113,-47,10,-135,26,-175,2,-143,38,-168,-85,-31,-293,-246,-326,-89,-73,-108,-105,-75,-31,-14,-117,148,-93,90,-132,35,33,-101,-195,26,-5,-144,49,-165,-80,-133,-50,-153,-148,-148,-194,-217,-181,-5,-278,-107,-163,-21,4,-34,-37,-8,15,-74,-164,-7,81,-202,-15,-115,-46,26,-49,-166,25,-154,76,-191,26,-254,-280,-291,-266,-164,-122,-247,-250,-233,0,-4,-3,-9,57,110,37,82,31,79,148,130,-110,-133,-13,35,51,11,-187,13,-138,-77,-129,-221,-89,52,-71,-19,-35,-29,3,43,10,-178,-102,-118,-57,-70,54,-77,-120,-111,-168,-54,-195,-238,-73,-127,-120,-38,30,48,-37,5,31
3.0,188_1+4,MUSK-188,42,-198,-110,-65,-117,55,23,-95,-28,5,-301,-212,-107,-280,-284,-301,-56,-74,-192,-46,-66,-19,-50,191,73,18,-19,-128,-63,98,-117,-113,-46,3,-135,25,-159,73,-143,70,-177,-86,-31,-286,-280,-330,-135,-11,-138,-4,-143,-32,-9,-136,105,1,40,-170,2,6,-101,-195,26,109,-130,48,-165,18,-133,15,-146,-148,-263,-246,-96,-181,-37,-230,-216,-162,-20,8,-120,116,14,10,-156,-40,-72,131,-202,-15,-115,-308,26,-50,-34,32,-141,75,-206,26,-227,-282,-300,-183,-116,-16,-206,-283,-244,-65,-35,-32,22,-11,157,24,6,-117,83,69,111,-110,-133,-13,-27,76,-108,-187,13,-138,-67,-163,-201,-19,45,-115,-10,-37,-100,76,78,59,-178,-102,-118,-33,-104,41,-77,-123,-111,-168,-54,-195,-238,-302,60,-120,-39,30,48,-37,6,30
4.0,190_1+1,MUSK-190,42,-198,-102,-75,-117,10,24,-87,-28,-28,-233,-212,-67,-286,-286,-299,-57,-74,-191,-182,-66,-18,-50,109,111,18,-18,-128,25,63,-117,55,-28,10,-131,66,-175,73,-128,71,-177,102,-31,-283,-280,-249,-135,-11,-137,-105,-143,-32,-9,-48,148,1,41,-170,34,33,-101,-156,24,-5,-144,58,-165,17,-123,15,-140,-113,-148,-246,-217,-181,-37,-212,-215,-173,-20,8,-120,-38,-8,10,-156,-40,-7,81,-202,70,-114,-46,28,-34,-167,32,-126,76,-206,134,-257,-289,-304,-162,-116,-16,-247,-282,-243,-64,-35,-32,-9,57,110,24,6,-117,80,148,130,-107,-108,-9,36,51,12,-187,17,-138,-67,-163,-201,-20,45,-116,-10,-37,-99,76,79,60,-177,-102,-118,-33,-104,41,-66,-120,-111,-120,97,-121,-238,-73,-127,51,128,144,43,-30,14,26
,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
471.0,jp13_1+4,NON-MUSK-jp13,49,-199,-161,29,-95,-86,-48,2,112,-79,-47,-43,-30,-134,-89,-244,96,-57,5,-58,-78,68,15,-58,-98,-64,-72,-157,10,-139,-117,-60,-8,-212,-36,91,-169,-4,-117,-50,-186,60,-41,-79,-100,-116,11,-45,-25,-87,-90,112,11,83,-51,-39,-29,-177,-18,-139,-16,-192,33,-190,-14,55,-155,-117,57,-115,-170,-141,-33,-104,-42,-129,148,-93,-11,56,-25,-11,-53,-21,-163,-72,-171,-166,28,-148,-202,31,3,-98,-3,116,-82,167,-155,-20,-208,100,-44,-54,-113,-87,111,-16,-67,-97,-65,-19,29,34,-173,-57,-94,-91,-62,-140,-117,-67,-102,7,-125,46,-102,-173,-83,-50,-56,-108,-80,-89,-98,-89,119,-38,2,-59,-41,-56,9,112,-177,-102,-122,-126,-152,-49,-25,0,6,-158,31,-188,-220,-246,-209,33,152,134,47,-43,-15,-10
472.0,jp13_2+1,NON-MUSK-jp13,38,-123,-139,30,-117,-88,214,-13,-74,-129,-40,-26,-59,-109,-106,-271,-112,-91,-18,-19,-52,104,-6,-60,-98,63,53,102,9,-144,-116,110,-43,-217,-36,73,-170,-42,-143,83,-124,107,-84,-64,-87,-87,-153,-64,-31,-98,-71,4,92,50,-50,-42,210,-65,-18,-168,-13,54,-12,-191,-15,61,-165,-114,-133,-53,-141,-123,-47,-70,-80,-183,-129,-48,-12,61,-2,174,-50,-17,-180,174,-50,97,28,-147,-160,97,4,-108,17,-41,-85,-34,-155,206,-120,135,-69,-61,-109,-89,-182,-72,-72,-97,-62,-3,29,47,-145,-22,-95,123,194,36,-116,-68,-102,-67,55,-6,-146,-180,-112,-51,35,-110,-95,-89,-111,-136,-51,-150,-22,-60,-64,31,185,110,-147,-97,-71,-33,-36,85,-25,0,7,-49,124,25,-236,-226,-210,20,55,119,79,-28,4,74
473.0,jp13_2+2,NON-MUSK-jp13,43,-102,-20,-101,-116,200,-166,66,-222,-49,-120,-80,-48,-82,-98,-99,-216,-22,-59,-2,-112,-183,-201,-59,49,-206,-163,-160,72,118,-117,-104,-18,37,-156,144,-93,-153,-28,-126,-49,-24,-52,-100,-100,-151,-151,-135,13,-35,-95,-200,-198,11,47,-100,-164,-153,159,29,-98,-129,27,197,-81,74,-46,-40,28,-35,-32,-34,-73,-67,-46,-145,-241,-42,-88,8,-176,-172,-127,-61,23,-195,-80,-143,-67,230,-194,18,-105,-84,-29,194,117,-184,-26,-152,-71,76,-93,-122,-114,-51,-169,-164,-4,-119,-62,-165,-195,-198,53,-2,28,-198,-187,-188,91,167,226,48,-135,55,-2,98,-55,-187,-87,-104,-68,-57,-34,-29,-175,-50,-206,-97,-130,-137,-155,-169,-178,-102,-122,-32,-51,-135,-99,-106,-111,-57,-66,-85,114,32,136,-15,143,121,55,-37,-19,-36
474.0,jp13_2+3,NON-MUSK-jp13,39,-58,27,31,-117,-92,85,21,-73,-68,-47,-27,-23,-112,-98,-291,-90,-63,-64,-65,-46,-17,-2,-67,-100,-33,127,147,9,-132,-116,113,-43,-198,-38,73,-171,114,-142,185,-42,105,-32,-89,-87,-113,-153,-98,-44,-79,-60,-35,2,56,-50,-42,114,17,-20,-125,-9,60,-12,-188,-17,60,-165,133,-133,159,-12,-48,-29,-47,-41,-181,-117,-97,-35,55,-4,29,-58,-11,-155,50,31,-10,29,-144,-78,98,5,-86,16,-49,-91,-55,-71,132,-73,134,-44,-58,-114,-104,-182,-136,-69,-105,-53,-1,30,18,-176,-94,-100,84,83,116,-116,-68,-102,-62,63,-2,-85,-161,-69,-54,35,-113,-88,-45,-99,56,-26,-160,-40,-44,-70,164,91,11,-127,-97,-87,134,81,174,-26,-1,7,-27,125,29,-228,-232,-206,13,45,116,79,-28,3,74
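
Since every column of `conformation` still carries the `unused_string` role, the pending annotation could be planned as in the sketch below. The helper `plan_roles` is hypothetical, and the choices it encodes are assumptions rather than anything the notebook prescribes: `molecule_name` as join key, `f1` to `f166` as numerical features, and `conformation_name` left unused.

```python
def plan_roles(columns, join_keys=("molecule_name",)):
    """Group column names by the role they would plausibly receive."""
    plan = {"join_key": [], "numerical": [], "unused_string": []}
    for col in columns:
        if col in join_keys:
            plan["join_key"].append(col)
        elif col[:1] == "f" and col[1:].isdigit():
            # Feature columns f1 ... f166 hold integer measurements
            plan["numerical"].append(col)
        else:
            # Identifiers such as conformation_name stay unused
            plan["unused_string"].append(col)
    return plan


# Column names as shown in the conformation table above
columns = ["conformation_name", "molecule_name"] + [f"f{i}" for i in range(1, 167)]
plan = plan_roles(columns)
```

With getml, each group could then presumably be applied through `DataFrame.set_role`, for example `conformation.set_role(plan["numerical"], getml.data.roles.numerical)`. Whether all 166 features are best treated as numerical is a modelling decision that the source leaves open.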


The next step is to define the data model. Refer to [https://relational.fel.cvut.cz/dataset/MuskSmall](https://relational.fel.cvut.cz/dataset/MuskSmall)
for a description of the dataset.

In [5]:
dm = getml.data.DataModel(population=molecule.to_placeholder())
dm.add(getml.data.to_placeholder(**peripheral))

# TODO
# dm.population.join(...)
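
One way to read the pending join: each row of `molecule` matches several rows of `conformation` through the shared `molecule_name` column. The pure-Python sketch below shows that one-to-many lookup on a few rows copied from the tables above; the helper `group_by_key` is hypothetical. In getml, the corresponding call is presumably `dm.population.join(dm.conformation, on="molecule_name")`, which is an assumption here, not something the notebook states.

```python
from collections import defaultdict


def group_by_key(rows, key):
    """Collect rows into lists keyed by the value of `key`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return dict(grouped)


# A few rows taken from the tables shown above (only f1 kept for brevity)
molecule_rows = [
    {"molecule_name": "MUSK-188", "class": 0},
    {"molecule_name": "MUSK-190", "class": 0},
]
conformation_rows = [
    {"conformation_name": "188_1+1", "molecule_name": "MUSK-188", "f1": 42},
    {"conformation_name": "188_1+2", "molecule_name": "MUSK-188", "f1": 42},
    {"conformation_name": "190_1+1", "molecule_name": "MUSK-190", "f1": 42},
]

by_molecule = group_by_key(conformation_rows, "molecule_name")

# Number of conformations matched to each molecule in the population table
matches = {
    m["molecule_name"]: len(by_molecule.get(m["molecule_name"], []))
    for m in molecule_rows
}
```

Because one molecule maps to many conformations, any feature built on this join has to aggregate over the matched rows (for instance a mean or maximum of `f1` per molecule).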

Now we can create the container and add the tables to it.

In [6]:
container = getml.data.Container(population=molecule, split=molecule.split)
container.add(**peripheral)

container

subset,name,rows,type
train,molecule,65,View
val,molecule,27,View

name,rows,type
conformation,476,DataFrame
