In [1]:
import getml
from challenge.utils.data import load_ctu_dataset

getml.set_project("fnhk")

# Task: FNHK
### Dataset Description
> <span style="font-weight: 500; color: #3b3b3b;">ⓘ️&nbsp; Generated by `gpt-4o`</span>
>
> The *FNHK* dataset contains anonymized medical data from a hospital in Hradec Kralove, Czech Republic, focusing on treatment and medication. The task is a *regression* task, with the target column being `Delka_hospitalizace` in the `pripady` table, which represents the length of hospitalization.
> 
> **Data Model:**
> - **Tables:** 3 (vykony, zup, pripady)
> - **Columns:**
>   - **vykony:** Details of medical procedures.
>   - **zup:** Information on items and their costs.
>   - **pripady:** Patient case details, including the target column `Delka_hospitalizace`.
> 
> **Task and Target:**
> - **Task:** Regression
> - **Target Column:** `Delka_hospitalizace` (in the pripady table)
> 
> **Metadata:**
> - **Size:** 130.8 MB
> - **Number of Rows:** 2,108,356
> - **Number of Columns:** 24
> - **Missing Values:** Yes
> - **Compound Keys:** No
> - **Loops:** No
> - **Type:** Real
> - **Instance Count:** 41,392
> 
> This dataset is used for analyzing hospital treatment data, providing insights into patient care and resource management in the medical field.

### Tables
Population table: pripady

<h4>
  <details open>
     <summary>ER Diagram</summary>
       <img src="https://relational.fel.cvut.cz/assets/img/datasets-generated/FNHK.svg" alt="FNHK ER Diagram">
   </details>
</h4>

To load the dataset, we use the `load_ctu_dataset` function from the
`challenge.utils.data` module. This function returns a tuple with the population
table as the first element and a dictionary of peripheral tables as the second element.

In [2]:
pripady, peripheral = load_ctu_dataset("FNHK")

(
    vykony,
    zup,
) = peripheral.values()

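Unpacking `peripheral.values()` positionally depends on the order in which the tables were inserted into the dictionary. The following is a minimal sketch of a more defensive alternative, assuming the dictionary is keyed by table name (which matches the names `vykony` and `zup` that the container reports later). `pick_tables` is a hypothetical helper, not part of `challenge.utils` or getML, and the demo dictionary holds stand-in strings rather than real tables.

```python
def pick_tables(peripheral, *names):
    """Return the tables for `names`, in that order, failing loudly on a missing key."""
    missing = [name for name in names if name not in peripheral]
    if missing:
        raise KeyError(f"missing peripheral tables: {missing}")
    return tuple(peripheral[name] for name in names)


# Stand-in values; in the notebook this would be the dict from load_ctu_dataset.
demo = {"zup": "zup-table", "vykony": "vykony-table"}
vykony_demo, zup_demo = pick_tables(demo, "vykony", "zup")
```

In the notebook this would read `vykony, zup = pick_tables(peripheral, "vykony", "zup")`.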

Now, we can inspect all tables and annotate the columns with [roles](https://getml.com/latest/user_guide/concepts/annotating_data/).

The population table (`pripady`):

The `target` role is already set for `Delka_hospitalizace`, the target column of this regression task.

In [3]:
# TODO: Annotate remaining columns with roles
pripady

| name | Delka_hospitalizace | Identifikace_pripadu | Identifikator_pacienta | Kod_zdravotni_pojistovny | DRG_skupina | Datum_prijeti | Datum_propusteni | Vekovy_Interval_Pacienta | Pohlavi_pacienta | Zakladni_diagnoza | Seznam_vedlejsich_diagnoz | PSC | split |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| role | target | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
| 0 | 61 | 1829493 | 104337 | 207 | 8111 | 2014-11-01 | 2014-12-31 | 50-60 | M | S8230 | | 50352 | train |
| 1 | 41 | 1840525 | 324424 | 111 | 88873 | 2013-11-22 | 2014-01-01 | 60-70 | F | C240 | K720 C767 I10 E46 E118 | 50346 | val |
| 2 | 3 | 1840526 | 30854 | 111 | 11342 | 2013-12-30 | 2014-01-01 | 80+ | F | N23 | N131 N201 | 50347 | val |
| 3 | 4 | 1840527 | 1343226 | 111 | 15751 | 2013-12-29 | 2014-01-01 | 0-10 | M | Z380 | | 28126 | train |
| 4 | 4 | 1840528 | 1343217 | 205 | 15751 | 2013-12-29 | 2014-01-01 | 0-10 | M | Z380 | | 50332 | train |
| | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 41514 | 2 | 1882040 | 260206 | 111 | 1443 | 2014-12-30 | 2014-12-31 | 70-80 | M | S0650 | I958 R001 R402 G936 | 50002 | train |
| 41515 | 5 | 1882041 | 1372657 | 111 | 15751 | 2014-12-27 | 2014-12-31 | 0-10 | F | Z380 | | 50801 | val |
| 41516 | 3 | 1882042 | 129640 | 207 | 8111 | 2014-12-29 | 2014-12-31 | 40-50 | F | S8270 | S8230S8280 | 50324 | val |
| 41517 | 34 | 1958501 | 1363126 | 611 | 15743 | 2014-08-30 | 2014-10-02 | 0-10 | F | P364 | G008 Z290 | 99999 | val |
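One possible way to resolve the TODO for `pripady` is sketched below. The role choices are assumptions made for illustration, not the notebook's reference solution. The role strings are meant to correspond to the constants in `getml.data.roles`, and `annotate_pripady` is a hypothetical wrapper around the data frame's `set_role` method. `Datum_propusteni` is deliberately left unused: in the rows shown, the discharge date minus the admission date is exactly one day less than the target, so using it would leak the answer.

```python
def annotate_pripady(pripady):
    """Assign roles to the population table. All role choices are assumptions."""
    # The case ID links pripady to the vykony and zup tables.
    pripady.set_role(["Identifikace_pripadu"], "join_key")

    # Admission date; the explicit format is an assumption based on values like 2014-11-01.
    pripady.set_role(["Datum_prijeti"], "time_stamp", time_formats=["%Y-%m-%d"])

    # Codes and intervals are treated as categories rather than quantities.
    pripady.set_role(
        [
            "Kod_zdravotni_pojistovny",
            "DRG_skupina",
            "Vekovy_Interval_Pacienta",
            "Pohlavi_pacienta",
            "Zakladni_diagnoza",
            "PSC",
        ],
        "categorical",
    )

    # Space-separated list of secondary diagnosis codes.
    pripady.set_role(["Seznam_vedlejsich_diagnoz"], "text")

    # Left unused on purpose: Datum_propusteni (target leakage),
    # Identifikator_pacienta, and split (consumed by the container).
    return pripady
```

Calling `annotate_pripady(pripady)` in the cell above would apply these roles in place, after which displaying `pripady` again shows the updated role row.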


The peripheral tables (`vykony` and `zup`):

In [4]:
# TODO: Annotate columns with roles
vykony

| name | Identifikace_pripadu | Typ_polozky | Kod_polozky | Pocet | Body | Datum_provedeni_vykonu |
|---|---|---|---|---|---|---|
| role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string |
| 0 | 1829493 | 0 | 38210 | 1 | 79 | 2014-11-01 |
| 1 | 1829493 | 0 | 51859 | 1 | 300 | 2014-11-01 |
| 2 | 1829493 | 0 | 53021 | 1 | 348 | 2014-11-01 |
| 3 | 1829493 | 0 | 53022 | 1 | 234 | 2014-11-01 |
| 4 | 1829493 | 0 | 66819 | 1 | 4067 | 2014-11-01 |
| | ... | ... | ... | ... | ... | ... |
| 1879332 | 1958505 | 0 | 63115 | 1 | 239 | 2014-08-30 |
| 1879333 | 1958505 | 0 | 63117 | 4 | 1508 | 2014-08-30 |
| 1879334 | 1958505 | 0 | 63120 | 1 | 2987 | 2014-08-30 |
| 1879335 | 1958505 | 0 | 602 | 1 | 1381 | 2014-08-31 |


In [5]:
# TODO: Annotate columns with roles
zup

| name | Identifikace_pripadu | Typ_polozky | Kod_polozky | Pocet | Cena | Datum_provedeni_vykonu |
|---|---|---|---|---|---|---|
| role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string |
| 0 | 1829493 | 3 | 2370 | 4 | 458.48 | 2014-11-01 |
| 1 | 1829493 | 1 | 58092 | 0.1 | 23 | 2014-11-01 |
| 2 | 1829493 | 3 | 73679 | 4 | 7395.48 | 2014-11-01 |
| 3 | 1829493 | 3 | 99861 | 6 | 11024.82 | 2014-11-01 |
| 4 | 1829493 | 3 | 99862 | 1 | 841.53 | 2014-11-01 |
| | ... | ... | ... | ... | ... | ... |
| 192414 | 1958501 | 1 | 83050 | 3 | 134.82 | 2014-09-21 |
| 192415 | 1958501 | 1 | 83050 | 3 | 134.82 | 2014-09-22 |
| 192416 | 1958501 | 1 | 83050 | 3 | 134.82 | 2014-09-23 |
| 192417 | 1958501 | 1 | 83050 | 1 | 44.94 | 2014-09-24 |
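The two peripheral tables share most of their columns, so their TODOs can be handled together. The sketch below is one assumed assignment, not a reference solution: item type and item code become categories, the quantity and the points or price columns become numerical, and the date of the procedure becomes a time stamp. `annotate_peripheral` is a hypothetical helper, and the role strings are meant to match the constants in `getml.data.roles`.

```python
def annotate_peripheral(vykony, zup):
    """Assign roles to both peripheral tables. All role choices are assumptions."""
    # The tables differ only in their value column: Body (points) vs. Cena (price).
    for table, numeric_cols in ((vykony, ["Pocet", "Body"]), (zup, ["Pocet", "Cena"])):
        # Case ID that links each row back to pripady.
        table.set_role(["Identifikace_pripadu"], "join_key")
        # Item type and item code are identifiers, not quantities.
        table.set_role(["Typ_polozky", "Kod_polozky"], "categorical")
        table.set_role(numeric_cols, "numerical")
        # The explicit format is an assumption based on values like 2014-11-01.
        table.set_role(
            ["Datum_provedeni_vykonu"], "time_stamp", time_formats=["%Y-%m-%d"]
        )
    return vykony, zup
```

Treating `Kod_polozky` as categorical is a judgment call: the column may have many distinct codes, so it is worth checking its cardinality before committing to that role.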


The next step is to define the data model. Refer to the
[FNHK dataset page](https://relational.fel.cvut.cz/dataset/FNHK) for a description of the dataset.

In [6]:
dm = getml.data.DataModel(population=pripady.to_placeholder())
dm.add(getml.data.to_placeholder(**peripheral))

# TODO
# dm.population.join(...)
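A minimal sketch of how the join TODO might be filled in, assuming both peripheral tables relate to the population table through the shared `Identifikace_pripadu` column, as the sample rows suggest. `join_peripheral` is a hypothetical helper. It relies on the peripheral placeholders being reachable as attributes of the data model under their table names and on the placeholder's `join` method accepting an `on` argument.

```python
def join_peripheral(dm, key="Identifikace_pripadu"):
    """Join both peripheral placeholders onto the population placeholder."""
    for name in ("vykony", "zup"):
        # One case in pripady can have many rows in each peripheral table.
        dm.population.join(getattr(dm, name), on=key)
    return dm
```

No `time_stamps` argument is passed here. Restricting peripheral rows to those dated no later than the admission date would presumably discard most procedures, since they take place during the stay; whether that restriction is wanted depends on how the prediction task is framed.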

Now we can create the container and add the tables to it.

In [7]:
container = getml.data.Container(population=pripady, split=pripady.split)
container.add(**peripheral)

container

| | subset | name | rows | type |
|---|---|---|---|---|
| 0 | train | pripady | 29064 | View |
| 1 | val | pripady | 12455 | View |

| | name | rows | type |
|---|---|---|---|
| 0 | vykony | 1879337 | DataFrame |
| 1 | zup | 192419 | DataFrame |
