In [1]:
import getml
from challenge.utils.data import load_ctu_dataset

getml.set_project("db_transformer_mondial")

# Task: Mondial
### Dataset Description
> <span style="font-weight: 500; color: #3b3b3b;">ⓘ️&nbsp; Generated by `gpt-4o`</span>
>
> *Data Model:*
> 
> The *Mondial* dataset consists of 40 tables, providing a comprehensive view of geographical, political, and economic data across various countries. Key tables include `country`, `city`, `economy`, `population`, `politics`, and `religion`, among others.
> 
> - **country**: Contains `Name` (varchar), `Code` (varchar), `Capital` (varchar), `Province` (varchar), `Area` (float), and `Population` (int). This table provides basic information about countries.
> 
> - **city**: Includes `Name` (varchar), `Country` (varchar), `Province` (varchar), `Population` (int), `Longitude` (float), and `Latitude` (float). It details city-level data.
> 
> - **economy**: Contains `Country` (varchar), `GDP` (float), `Agriculture` (float), `Service` (float), `Industry` (float), and `Inflation` (float). This table provides economic indicators.
> 
> - **population**: Includes `Country` (varchar), `Population_Growth` (float), and `Infant_Mortality` (float). It provides demographic data.
> 
> - **politics**: Contains `Country` (varchar), `Independence` (date), `Dependent` (varchar), and `Government` (varchar). This table provides political information.
> 
> - **religion**: Includes `Country` (varchar), `Name` (varchar), and `Percentage` (float). It details religious demographics.
> 
> *Task and Target Column:*
> 
> The primary task is *classification*, with the target column being `Target` in the `target` table. The goal is to classify countries based on various attributes.
> 
> *Column Types:*
> 
> - Varchar: `Country`, `Name`, `Code`, `Capital`, `Province`, `Dependent`, `Government`
> - Float: `GDP`, `Agriculture`, `Service`, `Industry`, `Inflation`, `Population_Growth`, `Infant_Mortality`, `Percentage`, `Area`, `Longitude`, `Latitude`
> - Integer: `Population`
> - Date: `Independence`
> 
> *Metadata:*
> 
> - **Size**: 3.2 MB
> - **Number of Tables**: 40
> - **Number of Rows**: 21,497
> - **Number of Columns**: 167
> - **Missing Values**: Yes
> - **Instance Count**: 204
> - **Target Table**: `target`
> - **Target Column**: `Target`
> 
> This dataset is used in the geography domain to analyze and classify countries based on a wide range of geographical, economic, and political factors.

### Tables
Population table: target

<h4>
  <details open>
     <summary>ER Diagram</summary>
       <img src="https://relational.fel.cvut.cz/assets/img/datasets-generated/Mondial.svg" alt="Mondial ER Diagram">
   </details>
</h4>

To load the dataset, we use the `load_ctu_dataset` function from the
`challenge.utils.data` module. This function returns a tuple with the population
table as the first element and a dictionary of peripheral tables as the second element.

In [2]:
target, peripheral = load_ctu_dataset("Mondial")

(
    lake,
    island_in,
    geo_river,
    politics,
    river,
    is_member,
    ethnic_group,
    population,
    merges_with,
    continent,
    language,
    desert,
    borders,
    country,
    religion,
    geo_island,
    sea,
    located,
    mountain_on_island,
    geo_sea,
    geo_estuary,
    mountain,
    located_on,
    province,
    economy,
    geo_lake,
    encompasses,
    city,
    geo_desert,
    geo_mountain,
    organization,
    island,
    geo_source,
) = peripheral.values()

Analyzing schema:   0%|          | 0/34 [00:00<?, ?it/s]

Downloading tables:   0%|          | 0/34 [00:00<?, ?it/s]

Building data:   0%|          | 0/34 [00:00<?, ?it/s]
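
Note that the tuple unpacking above binds names purely by position: each variable only receives the matching table if `peripheral.values()` iterates in exactly the listed order. The previews below suggest that it does not (the variable `lake`, for instance, shows the columns `Mountain` and `Island`). The following is a minimal sketch of a name-based lookup, assuming the dictionary keys are the table names. The small `peripheral` dictionary here is a stand-in that holds column lists instead of getML data frames.

```python
# Stand-in for the dictionary returned by load_ctu_dataset("Mondial").
# Assumption: keys are table names; real values would be getML data frames.
peripheral = {
    "mountain_on_island": ["Mountain", "Island"],
    "continent": ["Name", "Area"],
    "lake": ["Name", "Area", "Depth", "Altitude", "Type", "River",
             "Longitude", "Latitude"],
}

# Positional unpacking depends on the iteration order of the dictionary,
# so a name can silently end up bound to a different table.
first, second, third = peripheral.values()

# Looking each table up by its key does not depend on the order.
lake = peripheral["lake"]
continent = peripheral["continent"]
mountain_on_island = peripheral["mountain_on_island"]

print(first is mountain_on_island)  # the first value is not the lake table
```

With the real dictionary, the same pattern (`peripheral["lake"]`, `peripheral["politics"]`, and so on) keeps variable names and table contents aligned regardless of the order in which the tables were loaded.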

Now, we can inspect all tables and annotate the columns with [roles](https://getml.com/latest/user_guide/concepts/annotating_data/).

We start with the population table (`target`). The `target` role is already set for the target column (`Target`). If the task is a multiclass classification,
the target column is split into multiple columns in a one-vs-all fashion. In that case, the original target is still available as `Target`.
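
To illustrate what a one-vs-all split means, here is a minimal sketch in plain Python. The helper `one_vs_all`, the class labels, and the `Target=<class>` column names are hypothetical; the names that the loader actually produces may differ. In the preview below, `Target` holds only 0 and 1, so no split appears to be needed for this dataset.

```python
def one_vs_all(labels):
    """Return one binary column per class: 1 where the row has that class."""
    classes = sorted(set(labels))
    return {
        f"Target={cls}": [1 if label == cls else 0 for label in labels]
        for cls in classes
    }


# Hypothetical multiclass labels, only for illustration.
labels = ["a", "b", "a", "c"]
columns = one_vs_all(labels)

for name, values in columns.items():
    print(name, values)
```

Each row has exactly one binary column set to 1, so the original label can always be recovered from the split columns.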

In [3]:
# TODO: Annotate remaining columns with roles
target

name,Target,Country,split
role,target,unused_string,unused_string
0.0,0,A,train
1.0,1,AFG,val
2.0,1,AL,train
3.0,0,AMSA,train
4.0,0,AND,train
,...,...,...
199.0,1,XMAS,val
200.0,0,YV,train
201.0,0,Z,val
202.0,0,ZRE,val
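
For the annotation TODOs, a first guess at each column's role can be derived from a sample of its values. The sketch below is a heuristic in plain Python, not part of getML: `suggest_role` is a hypothetical helper, and treating `Country` as the join key is an assumption. The returned strings are getML role names, and the sample values are taken from the table previews in this notebook.

```python
from datetime import datetime


def _is_float(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value, fmt="%Y-%m-%d"):
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def suggest_role(name, values, join_keys=("Country",)):
    """Suggest a role name from string values; empty strings count as missing."""
    present = [v for v in values if v]
    if name in join_keys:
        return "join_key"
    if not present:
        return "unused_string"
    if all(_is_float(v) for v in present):
        return "numerical"
    if all(_is_date(v) for v in present):
        return "time_stamp"
    return "categorical"


# Sample rows from the preview with Independence/Dependent/Government columns
# (the truncated fifth Government value is omitted).
politics_sample = {
    "Country": ["A", "AFG", "AG", "AL", "AMSA"],
    "Independence": ["1918-11-12", "1919-08-19", "1981-11-01", "1912-11-28", ""],
    "Dependent": ["", "", "", "", "USA"],
    "Government": ["federal republic", "transitional government",
                   "parliamentary democracy", "emerging democracy"],
}
roles = {name: suggest_role(name, vals) for name, vals in politics_sample.items()}
print(roles)

# With getML, such a mapping could then be applied per role, for example via
# DataFrame.set_role(cols, role); check the linked roles guide before relying on it.
```

The suggestions are only a starting point: a numeric-looking column such as a code or an ID may still be better treated as categorical or as a join key.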


Next, the peripheral tables:

In [4]:
# TODO: Annotate columns with roles
lake

name,Mountain,Island
role,unused_string,unused_string
0.0,Andringitra,Madagaskar
1.0,Asahi-Dake,Hokkaido
2.0,Barbeau Peak,Ellesmere Island
3.0,Ben Nevis,Great Britain
4.0,Blue Mountain Peak,Jamaica
,...,...
62.0,Tatamailau,Timor
63.0,Tsaratanana,Madagaskar
64.0,Tsiafajavona,Madagaskar
65.0,Yu Shan,Taiwan


In [5]:
# TODO: Annotate columns with roles
island_in

name,Name,Area
role,unused_string,unused_string
0,Africa,30254700
1,America,39872000
2,Asia,45095300
3,Australia/Oceania,8503470
4,Europe,9562490


In [6]:
# TODO: Annotate columns with roles
geo_river

name,Name,Country,Population,Area,Capital,CapProv
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Aali an Nil,SUD,1599605,238792,Malakal,Aali an Nil
1.0,Aberconwy and Colwyn,GB,110700,1130,Colwyn Bay,Aberconwy and Colwyn
2.0,Abruzzo,I,1263000,10794,LAquila,Abruzzo
3.0,Abu Dhabi,UAE,670000,67350,,
4.0,Acre,BR,483483,153149,Rio Branco,Acre
,...,...,...,...,...,...
1445.0,Ziguinchor,SN,394700,7339,Ziguinchor,Ziguinchor
1446.0,Zimbabwe,ZW,11271314,,Harare,Zimbabwe
1447.0,Zonguldak,TR,1073560,8629,Zonguldak,Zonguldak
1448.0,Zuid Holland,NL,3325064,2859,s Gravenhage,Zuid Holland


In [7]:
# TODO: Annotate columns with roles
politics

name,Desert,Country,Province
role,unused_string,unused_string,unused_string
0.0,Rub Al Chali,UAE,Abu Dhabi
1.0,Dascht-e-Margoh,AFG,Afghanistan
2.0,Rigestan,AFG,Afghanistan
3.0,Karakum,TM,Ahal
4.0,Syrian Desert,IRQ,Al Anbar
,...,...,...
149.0,TaklaMakan,TJ,Xinjiang Uygur
150.0,Dascht-e-Kavir,IR,Yazd
151.0,Dascht-e-Lut,IR,Yazd
152.0,Rub Al Chali,YE,Yemen


In [8]:
# TODO: Annotate columns with roles
river

name,Name,Code,Capital,Province,Area,Population
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Austria,A,Vienna,Vienna,83850,8023244
1.0,Afghanistan,AFG,Kabul,Afghanistan,647500,22664136
2.0,Antigua and Barbuda,AG,Saint Johns,Antigua and Barbuda,442,65647
3.0,Albania,AL,Tirane,Albania,28750,3249136
4.0,American Samoa,AMSA,Pago Pago,American Samoa,199,65628
,...,...,...,...,...,...
233.0,Yemen,YE,Sanaa,Yemen,527970,13483178
234.0,Venezuela,YV,Caracas,Distrito Federal,912050,21983188
235.0,Zambia,Z,Lusaka,Lusaka,752610,9159072
236.0,Zaire,ZRE,Kinshasa,Kinshasa,2345410,46498539


In [9]:
# TODO: Annotate columns with roles
is_member

name,River,Country,Province
role,unused_string,unused_string,unused_string
0.0,Bahr el-Djebel/Albert-Nil,SUD,Aali an Nil
1.0,Bahr el-Ghasal,SUD,Aali an Nil
2.0,Pibor,SUD,Aali an Nil
3.0,Sobat,SUD,Aali an Nil
4.0,White Nile,SUD,Aali an Nil
,...,...,...
846.0,Oder,PL,Zielonogorskie
847.0,Limpopo,ZW,Zimbabwe
848.0,Zambezi,ZW,Zimbabwe
849.0,Maas,NL,Zuid Holland


In [10]:
# TODO: Annotate columns with roles
ethnic_group

name,Name,Depth
role,unused_string,unused_string
0.0,Andaman Sea,3113
1.0,Arabian Sea,5203
2.0,Arctic Ocean,5608
3.0,Atlantic Ocean,9219
4.0,Baltic Sea,459
,...,...
30.0,South China Sea,5420
31.0,Sulawesi Sea,6218
32.0,Sunda Sea,7440
33.0,The Channel,175


In [11]:
# TODO: Annotate columns with roles
population

name,Island,Sea,Lake,River
role,unused_string,unused_string,unused_string,unused_string
0.0,Svalbard,Norwegian Sea,,
1.0,Svalbard,Barents Sea,,
2.0,Svalbard,Arctic Ocean,,
3.0,Greenland,Atlantic Ocean,,
4.0,Greenland,Norwegian Sea,,
,...,...,...,...
344.0,Olkhon,,Ozero Baikal,
345.0,Samosir,,Lake Toba,
346.0,Rene Levasseur Island,,Lake Manicouagan,
347.0,Manitoulin,,Lake Huron,


In [12]:
# TODO: Annotate columns with roles
merges_with

name,City,Province,Country,Island
role,unused_string,unused_string,unused_string,unused_string
0.0,Aberdeen,Grampian,GB,Great Britain
1.0,Aberystwyth,Ceredigion,GB,Great Britain
2.0,Adamstown,Pitcairn Islands,PITC,Pitcairn
3.0,Agana,Guam,GUAM,Guam
4.0,Ajaccio,Corse,F,Corse
,...,...,...,...
429.0,York,North Yorkshire,GB,Great Britain
430.0,Ystrad Fawr,Caerphilly,GB,Great Britain
431.0,Yungho,Taiwan,RC,Taiwan
432.0,Yuzhno Sakhalinsk,Sakhalinskaya oblast,R,Sachalin


In [13]:
# TODO: Annotate columns with roles
continent

name,Abbreviation,Name,City,Country,Province,Established
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,ABEDA,Arab Bank for Economic Developme...,Khartoum,SUD,al Khartum,1974-02-18
1.0,ACC,Arab Cooperation Council,,,,1989-02-16
2.0,ACCT,Agency for Cultural and Technica...,Paris,F,Ile de France,1970-03-21
3.0,ACP,"African, Caribbean, and Pacific ...",Brussels,B,Brabant,1976-04-01
4.0,AfDB,African Development Bank,Abidjan,CI,Cote dIvoire,1963-08-04
,...,...,...,...,...,...
148.0,WIPO,World Intellectual Property Orga...,Geneva,CH,GE,1967-07-14
149.0,WMO,World Meteorological Organizatio...,Geneva,CH,GE,1947-10-11
150.0,WToO,World Tourism Organization,Madrid,E,Madrid,1975-01-02
151.0,WTrO,World Trade Organization,,,,1994-04-15


In [14]:
# TODO: Annotate columns with roles
language

name,Island,Country,Province
role,unused_string,unused_string,unused_string
0.0,Great Britain,GB,Aberconwy and Colwyn
1.0,Honshu,J,Aichi
2.0,Honshu,J,Akita
3.0,Aland,SF,Aland
4.0,Tutuila,AMSA,American Samoa
,...,...,...
413.0,Great Britain,GB,Wrexham
414.0,Honshu,J,Yamagata
415.0,Honshu,J,Yamaguchi
416.0,Honshu,J,Yamanashi


In [15]:
# TODO: Annotate columns with roles
desert

name,Name,Islands,Area,Height,Type,Longitude,Latitude
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Aland,Aland Islands,650,,,20,60.1
1.0,Alicudi,Lipari Islands,5.2,675,volcanic,14.4,38.6
2.0,Ambon,Moluccan Islands,775,1225,,128.2,-3.7
3.0,Ameland,Westfriesische Inseln,57.6,,,5.75,53.5
4.0,Amrum,Nordfriesische Inseln,20.5,32,,8.3,54.65
,...,...,...,...,...,...,...
271.0,Vulcano,Lipari Islands,21.2,499,volcanic,15,38.4
272.0,Wangerooge,Ostfriesische Inseln,7.9,17,,7.9,53.8
273.0,West Falkland,Falkland Islands,4532,700,,-60.1,-51.8
274.0,Westray,Orkney Islands,47,,,-3,59.4


In [16]:
# TODO: Annotate columns with roles
borders

name,Name,Country,Province,Population,Longitude,Latitude
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Aachen,D,Nordrhein Westfalen,247113,,
1.0,Aalborg,DK,Denmark,113865,10,57
2.0,Aarau,CH,AG,,,
3.0,Aarhus,DK,Denmark,194345,10.1,56.1
4.0,Aarri,WAN,Nigeria,111000,,
,...,...,...,...,...,...
3106.0,Zug,CH,ZG,,,
3107.0,Zunyi,TJ,Guizhou,261862,,
3108.0,Zurich,CH,ZH,343106,,
3109.0,Zwickau,D,Sachsen,104921,,


In [17]:
# TODO: Annotate columns with roles
country

name,Country,Organization,Type
role,unused_string,unused_string,unused_string
0.0,A,AfDB,nonregional member
1.0,A,AG,observer
2.0,A,ANC,member
3.0,A,AsDB,nonregional member
4.0,A,BIS,member
,...,...,...
8003.0,ZW,WHO,member
8004.0,ZW,WIPO,member
8005.0,ZW,WMO,member
8006.0,ZW,WToO,member


In [18]:
# TODO: Annotate columns with roles
religion

name,Country,Population_Growth,Infant_Mortality
role,unused_string,unused_string,unused_string
0.0,A,0.41,6.2
1.0,AFG,4.78,149.7
2.0,AG,0.76,17.2
3.0,AL,1.34,49.2
4.0,AMSA,1.22,10.18
,...,...,...
233.0,YE,3.56,71.5
234.0,YV,1.89,29.5
235.0,Z,2.11,96.1
236.0,ZRE,1.67,108


In [19]:
# TODO: Annotate columns with roles
geo_island

name,Name,Mountains,Height,Type,Longitude,Latitude
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Aconcagua,Andes,6962,,-70,-32.65
1.0,Alpamayo,Cordillera Blanca,5947,,-77.7,-8.9
2.0,Ampato,Andes,6288,volcano,-71.9,-15.8
3.0,Andringitra,,2658,volcanic,47,-22.5
4.0,Annapurna,Himalaya,8091,,83.8,28.6
,...,...,...,...,...,...
236.0,Wheeler Peak,Rocky Mountains,4011,,-105.4,36.55
237.0,Yu Shan,,3950,,121,23.5
238.0,Zard Kuh,Zagros,4550,,50.1,32.35
239.0,Zhima,,1276,,107.4,53.15


In [20]:
# TODO: Annotate columns with roles
sea

name,Mountain,Country,Province
role,unused_string,unused_string,unused_string
0.0,Gran Sasso,I,Abruzzo
1.0,Tirich Mir,AFG,Afghanistan
2.0,Ararat,TR,Agri
3.0,Mt Blackburn,USA,Alaska
4.0,Mt Bona,USA,Alaska
,...,...,...
290.0,Pik Pobeda,TJ,Xinjiang Uygur
291.0,Fujisan,J,Yamanashi
292.0,Jabal Shuayb,YE,Yemen
293.0,Mt Logan,CDN,Yukon Territory


In [21]:
# TODO: Annotate columns with roles
located

name,Country,Continent,Percentage
role,unused_string,unused_string,unused_string
0.0,A,Europe,100
1.0,AFG,Asia,100
2.0,AG,America,100
3.0,AL,Europe,100
4.0,AMSA,Australia/Oceania,100
,...,...,...
237.0,YE,Asia,100
238.0,YV,America,100
239.0,Z,Africa,100
240.0,ZRE,Africa,100


In [22]:
# TODO: Annotate columns with roles
mountain_on_island

name,Name,Area,Depth,Altitude,Type,River,Longitude,Latitude
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Ammersee,46.6,81.1,533,,Ammer,11.6,48
1.0,Arresoe,40.2,5.6,,,,12.1,56
2.0,Atlin Lake,798,283,668,,Yukon River,-133.75,59.5
3.0,Balaton,594,12.5,104,,,17.6,46.8
4.0,Barrage de Mbakaou,,,,artificial,Sanaga,12.75,6.4
,...,...,...,...,...,...,...,...
125.0,Thunersee,48.3,217,558,,Aare,7.716,46.69
126.0,Vaenern,5648,106,44,,Goetaaelv,13.3,58.8
127.0,Vaettern,1900,119,88,,,14.5,58.3
128.0,Vierwaldstattersee,113.7,214,434,,Reuss,8.4,47


In [23]:
# TODO: Annotate columns with roles
geo_sea

name,Name,River,Lake,Sea,Length,SourceLongitude,SourceLatitude,Mountains,SourceAltitude,EstuaryLongitude,EstuaryLatitude
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Aare,Rhein,Brienzersee,,288,8.2,46.55,Alps,2310,8.22,47.61
1.0,Adda,Po,Lago di Como,,313,10.3,46.55,Alps,2235,9.88,45.13
2.0,Akagera,,Lake Victoria,,275,29.3,-2.5,East African Rift,2700,33,-1
3.0,Allegheny River,Ohio River,,,523,-77.9,41.9,Appalachian Mountains,759,-80,40.42
4.0,Aller,Weser,,,211,11.23,52.1,,130,9.18,52.94
,...,...,...,...,...,...,...,...,...,...,...
213.0,White Nile,Nile,,,,30.43,9.5,,,32.5,15.6
214.0,Wurm,Ammer,Starnberger See,,35,11.35,48,,596,11.5,48.3
215.0,Yukon River,,,Bering Sea,3185,-134.5,60.54,,668,-163.98,62.574
216.0,Zaire,,,Atlantic Ocean,4374,25.2,0.5,,,12,-6


In [24]:
# TODO: Annotate columns with roles
geo_estuary

name,Sea1,Sea2
role,unused_string,unused_string
0.0,Andaman Sea,Gulf of Bengal
1.0,Andaman Sea,Indian Ocean
2.0,Andaman Sea,Malakka Strait
3.0,Arabian Sea,Gulf of Aden
4.0,Arabian Sea,Gulf of Oman
,...,...
49.0,Sea of Japan,Sea of Okhotsk
50.0,Sea of Japan,Yellow Sea
51.0,South China Sea,Sulawesi Sea
52.0,South China Sea,Sunda Sea


In [25]:
# TODO: Annotate columns with roles
mountain

name,Sea,Country,Province
role,unused_string,unused_string,unused_string
0.0,Mediterranean Sea,I,Abruzzo
1.0,Persian Gulf,UAE,Abu Dhabi
2.0,Mediterranean Sea,TR,Adana
3.0,Pacific Ocean,J,Aichi
4.0,Persian Gulf,UAE,Ajman
,...,...,...
730.0,East China Sea,TJ,Zhejiang
731.0,Atlantic Ocean,SN,Ziguinchor
732.0,Black Sea,TR,Zonguldak
733.0,North Sea,NL,Zuid Holland


In [26]:
# TODO: Annotate columns with roles
located_on

name,River,Country,Province
role,unused_string,unused_string,unused_string
0.0,White Nile,SUD,Aali an Nil
1.0,Amudarja,AFG,Afghanistan
2.0,Pjandsh,AFG,Afghanistan
3.0,Murat,TR,Agri
4.0,Schatt al Arab,IRQ,Al Anbar
,...,...,...
214.0,Rhone,CH,VS
215.0,Elbe,CZ,Vychodocesky
216.0,Thames,GB,Wiltshire
217.0,Yukon River,CDN,Yukon Territory


In [27]:
# TODO: Annotate columns with roles
province

name,Country,GDP,Agriculture,Service,Industry,Inflation
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,A,152000,2,34,64,2.3
1.0,AFG,12800,65,15,20,
2.0,AG,425,3.5,19.3,77.2,3.5
3.0,AL,4100,55,,,16
4.0,AMSA,462.2,,,,
,...,...,...,...,...,...
233.0,YE,37100,21,24,55,71.3
234.0,YV,195500,5,41,54,57
235.0,Z,8900,32,22,46,55
236.0,ZRE,16500,,,,12


In [28]:
# TODO: Annotate columns with roles
economy

name,Lake,Country,Province
role,unused_string,unused_string,unused_string
0.0,Barrage de Mbakaou,CAM,Adamaoua
1.0,Lake Nicaragua,CR,Alajuela
2.0,Lake Ohrid,AL,Albania
3.0,Lake Prespa,AL,Albania
4.0,Lake Skutari,AL,Albania
,...,...,...
248.0,Zurichsee,CH,ZH
249.0,Ozero Balchash,KAZ,Zhambyl
250.0,Ozero Balchash,KAZ,Zhezkazghan
251.0,Lake Kariba,ZW,Zimbabwe


In [29]:
# TODO: Annotate columns with roles
geo_lake

name,City,Province,Country,River,Lake,Sea
role,unused_string,unused_string,unused_string,unused_string,unused_string,unused_string
0.0,Shkoder,Albania,AL,,Lake Skutari,
1.0,Durres,Albania,AL,,,Mediterranean Sea
2.0,Vlore,Albania,AL,,,Mediterranean Sea
3.0,Kavalla,Anatoliki Makedhonia kai Thraki,GR,,,Mediterranean Sea
4.0,Athens,Attiki,GR,,,Mediterranean Sea
,...,...,...,...,...,...
852.0,Mamoutzou,Mayotte,MAYO,,,Indian Ocean
853.0,Saint-Denis,Reunion,REUN,,,Indian Ocean
854.0,Jamestown,Saint Helena,HELX,,,Atlantic Ocean
855.0,Sao Tome,Sao Tome and Principe,STP,,,Atlantic Ocean


In [30]:
# TODO: Annotate columns with roles
encompasses

name,Country,Name,Percentage
role,unused_string,unused_string,unused_string
0.0,GE,Abkhaz,1.8
1.0,EAU,Acholi,4
2.0,DJI,Afar,35
3.0,ER,Afar,4
4.0,ETH,Afar,4
,...,...,...
535.0,BVIR,White,7
536.0,CAYM,White,20
537.0,HELX,White,25
538.0,PR,White,80.2


In [31]:
# TODO: Annotate columns with roles
city

name,Country,Name,Percentage
role,unused_string,unused_string,unused_string
0.0,AFG,Afghan Persian,50
1.0,NAM,Afrikaans,60
2.0,MK,Albanian,21
3.0,MNE,Albanian,5.3
4.0,IR,Arabic,1
,...,...,...
139.0,TM,Turkmen,72
140.0,PK,Urdu,8
141.0,TM,Uzbek,9
142.0,UZB,Uzbek,74.3


In [32]:
# TODO: Annotate columns with roles
geo_desert

name,Country1,Country2,Length
role,unused_string,unused_string,unused_string
0.0,A,CH,164
1.0,A,CZ,362
2.0,A,D,784
3.0,A,FL,37
4.0,A,H,366
,...,...,...
315.0,TCH,WAN,87
316.0,TJ,VN,1281
317.0,TM,UZB,1621
318.0,Z,ZRE,1930


In [33]:
# TODO: Annotate columns with roles
geo_mountain

name,Name,Area,Longitude,Latitude
role,unused_string,unused_string,unused_string,unused_string
0.0,Arabian Desert,50000,26,33
1.0,Atacama,181300,-69.25,-24.5
2.0,Azaouad,80000,0,20
3.0,Baja California Desert,30000,-116,31
4.0,Chihuahua,360000,-105,31
,...,...,...,...
58.0,Tanezrouft,160000,0,23
59.0,Tenere,600000,11,18
60.0,Thar,240000,72,27.5
61.0,Trarza,50000,-15,18


In [34]:
# TODO: Annotate columns with roles
organization

name,Country,Independence,Dependent,Government
role,unused_string,unused_string,unused_string,unused_string
0.0,A,1918-11-12,,federal republic
1.0,AFG,1919-08-19,,transitional government
2.0,AG,1981-11-01,,parliamentary democracy
3.0,AL,1912-11-28,,emerging democracy
4.0,AMSA,,USA,unincorporated and unorganized t...
,...,...,...,...
233.0,YE,1990-05-22,,republic
234.0,YV,1811-07-05,,republic
235.0,Z,1964-10-24,,republic
236.0,ZRE,1960-06-30,,republic with a strong president...


In [35]:
# TODO: Annotate columns with roles
island

name,Country,Name,Percentage
role,unused_string,unused_string,unused_string
0.0,BERM,African Methodist Episcopal,11
1.0,AUS,Anglican,26.1
2.0,AXA,Anglican,29
3.0,BERM,Anglican,23
4.0,BS,Anglican,20
,...,...,...
449.0,RC,Taoist,93
450.0,SLB,United,11
451.0,CAYM,United Church,11.8
452.0,CDN,United Church,12


In [36]:
# TODO: Annotate columns with roles
geo_source

name,River,Country,Province
role,unused_string,unused_string,unused_string
0.0,Bahr el-Djebel/Albert-Nil,SUD,Aali an Nil
1.0,Bahr el-Ghasal,SUD,Aali an Nil
2.0,Sobat,SUD,Aali an Nil
3.0,Pjandsh,AFG,Afghanistan
4.0,Aare,CH,AG
,...,...,...
260.0,Mur,H,Zala
261.0,Chire,MOC,Zambezia
262.0,Zambezi,MOC,Zambezia
263.0,Maas,NL,Zuid Holland


The next step is to define the data model. Refer to [https://relational.fel.cvut.cz/dataset/Mondial](https://relational.fel.cvut.cz/dataset/Mondial)
for a description of the dataset.

In [37]:
dm = getml.data.DataModel(population=target.to_placeholder())
dm.add(getml.data.to_placeholder(**peripheral))

# TODO
# dm.population.join(...)
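
One way to approach the join TODO is to look for columns that the population table shares with each peripheral table. The sketch below does this in plain Python using the column lists from the dataset description above. `candidate_joins` is a hypothetical helper, and excluding `Target` and `split` from the candidate keys is an assumption.

```python
population_columns = ["Target", "Country", "split"]

# Column lists as given in the dataset description.
peripheral_columns = {
    "economy": ["Country", "GDP", "Agriculture", "Service", "Industry", "Inflation"],
    "population": ["Country", "Population_Growth", "Infant_Mortality"],
    "politics": ["Country", "Independence", "Dependent", "Government"],
    "religion": ["Country", "Name", "Percentage"],
    "country": ["Name", "Code", "Capital", "Province", "Area", "Population"],
}


def candidate_joins(population_cols, peripheral_cols, ignore=("Target", "split")):
    """Map each peripheral table to the key columns it shares with the population."""
    keys = [col for col in population_cols if col not in ignore]
    joins = {}
    for table, cols in peripheral_cols.items():
        shared = [key for key in keys if key in cols]
        if shared:
            joins[table] = shared
    return joins


joins = candidate_joins(population_columns, peripheral_columns)
print(joins)

# Each entry might translate into a call along the lines of
# dm.population.join(dm.economy, on="Country"); treat this as an assumption
# and verify it against the getML data model documentation.
```

The `country` table does not show up because it stores the country code under `Code` rather than `Country`, so joining it would require matching differently named key columns.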

Now we can create the container and add the tables to it.

In [38]:
container = getml.data.Container(population=target, split=target.split)
container.add(**peripheral)

container

,subset,name,rows,type
0,train,target,143,View
1,val,target,61,View

,name,rows,type
0.0,mountain_on_island,67,DataFrame
1.0,continent,5,DataFrame
2.0,province,1450,DataFrame
3.0,geo_desert,154,DataFrame
4.0,country,238,DataFrame
,...,...,...
28.0,borders,320,DataFrame
29.0,desert,63,DataFrame
30.0,politics,238,DataFrame
31.0,religion,454,DataFrame
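
The `train` and `val` subsets listed above come from the `split` column of the population table. As a rough sketch of what this partitioning amounts to, the following plain-Python example groups the nine rows visible in the `target` preview by their split value; it does not use getML itself.

```python
from collections import Counter

# (Country, split) pairs visible in the target preview above.
rows = [
    ("A", "train"), ("AFG", "val"), ("AL", "train"), ("AMSA", "train"),
    ("AND", "train"), ("XMAS", "val"), ("YV", "train"), ("Z", "val"),
    ("ZRE", "val"),
]

# Number of rows per subset.
counts = Counter(split for _, split in rows)

# Countries per subset, keyed by the split value.
subsets = {name: [country for country, split in rows if split == name]
           for name in counts}

# Share of training rows according to the container summary (143 train, 61 val).
train_share = 143 / (143 + 61)

print(counts)
print(round(train_share, 2))
```

According to the container summary, roughly 70% of the population rows fall into `train` and the rest into `val`.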
