diff --git a/README.md b/README.md
index e56d6f1..e3bd4f2 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
-
+
@@ -20,24 +20,24 @@
# Introduction
-This repository contains different [Jupyter Notebooks](https://jupyter.org/) to demonstrate the capabilities of [getML](https://www.getml.com/) in the realm of machine learning on relational data-sets in various domains. getML and its feature engineering algorithms ([FastProp](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#fastprop), [Multirel](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#multirel), [Relboost](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#relboost), [RelMT](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#relmt)), its [predictors](https://docs.getml.com/latest/user_guide/predicting/predicting.html#using-getml) (LinearRegression, LogisticRegression, XGBoostClassifier, XGBoostRegressor) and its [hyperparameter optimizer](https://docs.getml.com/latest/user_guide/hyperopt/hyperopt.html#hyperparameter-optimization) (RandomSearch, LatinHypercubeSearch, GaussianHyperparameterSearch), are benchmarked against competing tools in similar categories, like [featuretools](https://www.featuretools.com/), [tsfresh](https://tsfresh.com/), [prophet](https://facebook.github.io/prophet/). While [FastProp](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#fastprop) usually outperforms the competition in terms of runtime and resource requirements, the more sophisticated algorithms ([Multirel](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#multirel), [Relboost](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#relboost), [RelMT](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#relmt)), which are part of the [professional and enterprise feature-sets](https://www.getml.com/pricing), can lead to higher accuracy with lower resource requirements still then the competition. The demonstrations are done on publicly available data-sets, which are standardly used for such comparisons.
+This repository contains a collection of [Jupyter Notebooks](https://jupyter.org) that demonstrate the capabilities of [getML](https://www.getml.com) for machine learning on relational datasets across various domains. getML and its feature engineering algorithms ([FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop), [Multirel](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-multirel), [Relboost](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-relboost), [RelMT](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-relmt)), its [predictors](https://getml.com/latest/user_guide/concepts/predicting#using-getml) (LinearRegression, LogisticRegression, XGBoostClassifier, XGBoostRegressor), and its [hyperparameter optimizers](https://getml.com/latest/user_guide/concepts/hyperopt#hyperparameter-optimization) (RandomSearch, LatinHypercubeSearch, GaussianHyperparameterSearch) are benchmarked against competing tools in similar categories, such as [featuretools](https://www.featuretools.com/), [tsfresh](https://tsfresh.com/), and [prophet](https://facebook.github.io/prophet/). While [FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop) usually outperforms the competition in terms of runtime and resource requirements, the more sophisticated algorithms ([Multirel](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-multirel), [Relboost](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-relboost), [RelMT](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-relmt)), which are part of the [Enterprise edition](https://getml.com/latest/enterprise), often lead to even higher accuracy while maintaining low resource requirements. The demonstrations are based on publicly available datasets that are commonly used for such comparisons.
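+
+To give a flavour of the API exercised throughout the notebooks, a pipeline combining FastProp with an XGBoost predictor looks roughly like the following sketch. It assumes `population_train`, `population_test` and `peripheral` are getML DataFrames prepared as in the individual notebooks and joined on a hypothetical `join_key` column:
+
+```python
+import getml
+
+# Define the data model: a star schema with one peripheral table.
+star_schema = getml.data.StarSchema(train=population_train, test=population_test)
+star_schema.join(peripheral, on="join_key")
+
+# Feature engineering with FastProp, prediction with XGBoost.
+pipe = getml.pipeline.Pipeline(
+    data_model=star_schema.data_model,
+    feature_learners=[getml.feature_learning.FastProp()],
+    predictors=[getml.predictors.XGBoostClassifier()],
+)
+
+pipe.fit(star_schema.train)
+pipe.score(star_schema.test)
+```
+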
# Table of Contents
-* [Introduction](#introduction)
-* [Table of Contents](#table-of-contents)
-* [Usage](#usage)
- * [Reading Online](#reading-online)
- * [Experimenting Locally](#experimenting-locally)
- * [Using Docker](#using-docker)
- * [On the Machine (Linux/x64 & arm64)](#on-the-machine-linuxx64--arm64)
-* [Notebooks](#notebooks)
- * [Overview](#overview)
- * [Descriptions](#descriptions)
- * [Quick access by grouping by](#quick-access-by-grouping-by)
- * [Benchmarks](#benchmarks)
- * [FastProp-Benchmarks](#fastprop-benchmarks)
- * [Further Benchmarks in the Relational Dataset Repository](#further-benchmarks-in-the-relational-dataset-repository)
+- [Introduction](#introduction)
+- [Table of Contents](#table-of-contents)
+- [Usage](#usage)
+ - [Reading Online](#reading-online)
+ - [Experimenting Locally](#experimenting-locally)
+ - [Using Docker](#using-docker)
+ - [On the Machine (Linux/x64 \& arm64)](#on-the-machine-linuxx64--arm64)
+- [Notebooks](#notebooks)
+ - [Overview](#overview)
+ - [Descriptions](#descriptions)
+ - [Quick access by grouping by](#quick-access-by-grouping-by)
+ - [Benchmarks](#benchmarks)
+ - [FastProp Benchmarks](#fastprop-benchmarks)
+ - [Further Benchmarks in the Relational Dataset Repository](#further-benchmarks-in-the-relational-dataset-repository)
# Usage
@@ -55,14 +55,14 @@ To experiment with the notebooks, such as playing with different pipelines and p
There are a `docker-compose.yml` and a `Dockerfile` for easy usage provided.
-Simply clone this repository and command to start the `notebooks` service. The image, it depends on, will be build if it is not already available.
+Simply clone this repository and run the command below to start the `notebooks` service. The image it depends on will be built if it is not already available.
```
$ git clone https://github.com/getml/getml-demo.git
$ docker compose up notebooks
```
-To open Jupyter Lab in the browser, look for the following lines in the output and copy-paste it in your browser:
+To open Jupyter Lab in the browser, look for the following lines in the output and copy-paste one of the URLs into your browser:
```
Or copy and paste one of these URLs:
@@ -70,25 +70,25 @@ Or copy and paste one of these URLs:
http://localhost:8888/lab?token=
```
-After the first `getml.engine.launch(...)` is executed and the engine is started, its monitor can be opened in the browser under
+After the first `getml.engine.launch(...)` is executed and the Engine is started, the corresponding Monitor can be opened in the browser at
```
http://localhost:1709/#/token/token
```
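+
+For reference, launching the Engine from inside a notebook is a single call (a minimal sketch; `getml.engine.launch()` also accepts optional keyword arguments mirroring the flags visible in its launch output):
+
+```python
+import getml
+
+# Start the getML Engine in the background with default settings.
+getml.engine.launch()
+
+# The Monitor is then reachable at http://localhost:1709/#/token/token
+```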
> [!NOTE]
-> Using alternatives to [Docker Desktop](https://www.docker.com/products/docker-desktop/) like
-> * [Podman](https://podman.io/),
-> * [Podman Desktop](https://podman-desktop.io/) or
-> * [Rancher Desktop](https://rancherdesktop.io/) with a container engine like dockerd(moby) or containerd(nerdctl)
+> Using alternatives to [Docker Desktop](https://www.docker.com/products/docker-desktop) like
+> * [Podman](https://podman.io),
+> * [Podman Desktop](https://podman-desktop.io) or
+> * [Rancher Desktop](https://rancherdesktop.io) with a container engine like dockerd(moby) or containerd(nerdctl)
>
-> allows bind-mounting the notebooks in a user-writeable way (this might need to include `userns_mode: keep-id`) instead of having to `COPY` them in. In combination with volume-binding `/home/getml/.getML/logs` and `/home/getml/.getML/projects`, runs and changes can be persisted across containers.
+> allows bind-mounting the notebooks in a user-writeable way (this might require adding `userns_mode: keep-id`) instead of having to `COPY` them in. In combination with volume-binding `/home/user/.getML/logs` and `/home/user/.getML/projects`, runs and changes can be persisted across containers, as sketched below.
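+>
+> A minimal `docker-compose.override.yml` along these lines could look like the following sketch (the service name `notebooks` comes from the provided `docker-compose.yml`; the container paths are illustrative assumptions):
+>
+> ```yaml
+> services:
+>   notebooks:
+>     userns_mode: keep-id                              # may be needed for user-writeable bind mounts
+>     volumes:
+>       - ./:/home/user/notebooks                       # bind-mount the notebooks instead of COPYing them
+>       - ./getml-logs:/home/user/.getML/logs           # persist Engine logs across containers
+>       - ./getml-projects:/home/user/.getML/projects   # persist projects across containers
+> ```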
### On the Machine (Linux/x64 & arm64)
-Alternatively, getML and the notebooks can be run natively on the local Linux machine by having certain software installed, like Python and some Python libraries, Jupyter-Lab and the getML engine. The [getML Python library](https://github.com/getml/getml-community/) provides an engine version without [enterprise features](https://www.getml.com/pricing). But as those features are shown in the demonstration notebooks, the [trail of the enterprise version](https://www.getml.com/download) can be used for those cases.
+Alternatively, getML and the notebooks can be run natively on a local Linux machine, provided certain software is installed: Python with the required libraries, Jupyter-Lab, and the getML Engine. The [getML Python library](https://github.com/getml/getml-community) provides an Engine version without [Enterprise features](https://getml.com/latest/enterprise). Since those features are shown in the demonstration notebooks, you may obtain an [Enterprise trial version](https://getml.com/latest/enterprise/request-trial) to reproduce them.
-The following commands will set up a Python environment with necessary Python libraries and the trail of the getML enterprise version, and Jupyter-Lab
+The following commands set up a Python environment with the necessary Python libraries, the getML Enterprise trial version, and Jupyter-Lab:
```
$ git clone https://github.com/getml/getml-demo.git
@@ -101,7 +101,7 @@ $ jupyter-lab
```
> [!TIP]
-> Install the [trail of the enterprise version](https://www.getml.com/download) via the [Install getML on Linux guide](https://docs.getml.com/latest/home/installation/linux.html#install-getml-on-linux) to try the enterprise features.
+> Install the [Enterprise trial version](https://getml.com/latest/enterprise/request-trial) via the [Install getML on Linux guide](https://getml.com/latest/install/packages/linux#install-getml-on-linux) to try the Enterprise features.
-With the last command, Jupyter-Lab should automatically open in the browser. If not, look for the following lines in the output and copy-paste it in your browser:
+With the last command, Jupyter-Lab should automatically open in the browser. If not, look for the following lines in the output and copy-paste one of the URLs into your browser:
@@ -111,7 +111,7 @@ Or copy and paste one of these URLs:
http://localhost:8888/lab?token=
```
-After the first `getml.engine.launch(...)` is executed and the engine is started, its monitor can be opened in the browser under
+After the first `getml.engine.launch(...)` is executed and the Engine is started, the corresponding Monitor can be opened in the browser at
```
http://localhost:1709/#/token/token
@@ -446,7 +446,7 @@ relational data scheme involving many tables.
-An algorithm, that generates specific different features can only use columns for conditions, it is not allowed to aggregate columns – and it doesn't need to do so. That means, the computational complexity is linear instead of quadratic. For data sets with a large number of columns, this can make all the difference in the world. For instance, if you have 100 columns the size of the search space of the second approach is only 1% of the size of the search space of the first one.
+An algorithm that generates specific different features can only use columns for conditions; it is not allowed to aggregate columns – and it doesn't need to do so. That means the computational complexity is linear instead of quadratic. For data sets with a large number of columns, this can make all the difference in the world. For instance, if you have 100 columns, the size of the search space of the second approach is only 1% of the size of the search space of the first one.
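+
+The 1% figure is simple arithmetic over the two search-space sizes (an illustrative back-of-the-envelope check, not getML code):
+
+```python
+n_columns = 100
+first_approach = n_columns**2            # grows quadratically (pairs of columns)
+second_approach = n_columns              # grows linearly (conditions on single columns)
+print(second_approach / first_approach)  # 0.01, i.e. 1%
+```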
- getML features an algorithm called relboost, which generates features according to this principle and is therefore very suitable for data sets with many columns.
+ getML features an algorithm called Relboost, which generates features according to this principle and is therefore very suitable for data sets with many columns.
-To illustrate the problem, we use a data set related to robotics. When robots interact with humans, the most important thing is, that they don't hurt people. In order to prevent such accidents, the force vector on the robot's arm is measured. However, measuring the force vector is expensive. Therefore, we want consider an alternative approach, where we would like to predict the force vector based on other sensor data that are less costly to measure. To do so, we use machine learning. However, the data set contains measurements from almost 100 different sensors and we do not know which and how many sensors are relevant for predicting the force vector.
+To illustrate the problem, we use a data set related to robotics. When robots interact with humans, the most important thing is that they don't hurt people. In order to prevent such accidents, the force vector on the robot's arm is measured. However, measuring the force vector is expensive. Therefore, we want to consider an alternative approach, where we predict the force vector based on other sensor data that are less costly to measure. To do so, we use machine learning. However, the data set contains measurements from almost 100 different sensors and we do not know which and how many sensors are relevant for predicting the force vector.
diff --git a/air_pollution.ipynb b/air_pollution.ipynb
index 7908755..ff845d7 100644
--- a/air_pollution.ipynb
+++ b/air_pollution.ipynb
@@ -195,7 +195,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "First, we spilt our data. We introduce a [simple, time-based split](https://docs.getml.com/latest/api/split/getml.data.split.time.html) and use all data until 2013-12-31 for training and everything starting from 2014-01-01 for testing."
+ "First, we split our data. We introduce a [simple, time-based split](https://getml.com/latest/reference/data/split/#getml.data.split.time.time) and use all data until 2013-12-31 for training and everything starting from 2014-01-01 for testing."
]
},
{
@@ -4672,7 +4672,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "This is a typical [RelMT](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#relmt) feature, where the aggregation (`SUM` in this case) is applied conditionally – the conditions are learned by `RelMT` – to a set of linear models, whose weights are, again, learned by `RelMT`."
+ "This is a typical [RelMT](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-relmt) feature, where the aggregation (`SUM` in this case) is applied conditionally – the conditions are learned by `RelMT` – to a set of linear models, whose weights are, again, learned by `RelMT`."
]
},
{
diff --git a/atherosclerosis.ipynb b/atherosclerosis.ipynb
index 714ec0e..d040f58 100644
--- a/atherosclerosis.ipynb
+++ b/atherosclerosis.ipynb
@@ -207,8 +207,8 @@
"\n",
"The `getml.datasets.load_atherosclerosis` method took care of the entire data lifting:\n",
"* Downloads csv's from our servers in python\n",
- "* Converts csv's to getML [DataFrames](https://docs.getml.com/latest/api/getml.data.DataFrame.html#getml.data.DataFrame)\n",
- "* Sets [roles](https://docs.getml.com/latest/user_guide/annotating_data/annotating_data.html#roles) to columns inside getML DataFrames"
+ "* Converts csv's to getML [DataFrames](https://getml.com/latest/reference/data/data_frame#getml.data.DataFrame)\n",
+ "* Sets [roles](https://getml.com/latest/user_guide/concepts/annotating_data#roles) to columns inside getML DataFrames"
]
},
{
@@ -18729,7 +18729,7 @@
"source": [
"#### 1.3 Define relational model\n",
"\n",
- "To start with relational learning, we need to specify an abstract data model. Here, we use the [high-level star schema API](https://docs.getml.com/latest/api/getml.data.StarSchema.html) that allows us to define the abstract data model and construct a [container](https://docs.getml.com/latest/api/getml.data.Container.html) with the concrete data at one-go. While a simple `StarSchema` indeed works in many cases, it is not sufficient for more complex data models like schoflake schemas, where you would have to define the data model and construct the container in separate steps, by utilzing getML's [full-fledged data model](https://docs.getml.com/latest/api/getml.data.DataModel.html) and [container](https://docs.getml.com/latest/api/getml.data.Container.html) APIs respectively."
+ "To start with relational learning, we need to specify an abstract data model. Here, we use the [high-level star schema API](https://getml.com/latest/reference/data/star_schema) that allows us to define the abstract data model and construct a [container](https://getml.com/latest/reference/data/container) with the concrete data in one go. While a simple `StarSchema` indeed works in many cases, it is not sufficient for more complex data models like snowflake schemas, where you would have to define the data model and construct the container in separate steps by utilizing getML's [full-fledged data model](https://getml.com/latest/reference/data/data_model) and [container](https://getml.com/latest/reference/data/container) APIs, respectively."
]
},
{
diff --git a/dodgers.ipynb b/dodgers.ipynb
index ec8257f..85c0184 100644
--- a/dodgers.ipynb
+++ b/dodgers.ipynb
@@ -977,7 +977,7 @@
"source": [
"#### 1.3 Define relational model\n",
"\n",
- "To start with relational learning, we need to specify the data model. We manually replicate the appropriate time series structure by setting time series related join conditions (`horizon`, `memory` and `allow_lagged_targets`). This is done abstractly using [Placeholders](https://docs.getml.com/latest/user_guide/data_model/data_model.html#placeholders)\n",
+ "To start with relational learning, we need to specify the data model. We manually replicate the appropriate time series structure by setting time-series-related join conditions (`horizon`, `memory` and `allow_lagged_targets`). This is done abstractly using [Placeholders](https://getml.com/latest/user_guide/concepts/data_model#placeholders).\n",
"\n",
"The data model consists of two tables:\n",
"* __Population table__ `traffic_{test/train}`: holds target and the contemporarily available time-based components\n",
@@ -6484,7 +6484,7 @@
"\n",
"We have compared getML's feature learning algorithms to Prophet and tsfresh on a data set related to traffic on LA's 101 North freeway. We found that getML significantly outperforms both Prophet and tsfresh. These results are consistent with the view that relational learning is a powerful tool for time series analysis.\n",
"\n",
- "You are encouraged to reproduce these results. You will need [getML](https://getml.com/product) to do so. You can download it for free."
+ "You are encouraged to reproduce these results. You will need [getML](https://getml.com) to do so. You can download it for free."
]
}
],
diff --git a/fastprop_benchmark/air_pollution_prop.ipynb b/fastprop_benchmark/air_pollution_prop.ipynb
index fd154d5..a83c1a5 100644
--- a/fastprop_benchmark/air_pollution_prop.ipynb
+++ b/fastprop_benchmark/air_pollution_prop.ipynb
@@ -42,9 +42,9 @@
"\n",
"A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to columns of interest and perform a feature selection on the (possibly large) set of generated features afterwards. In academia, this approach is called _propositionalization._\n",
"\n",
- "getML's [FastProp](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
+ "getML's [FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
"\n",
- "As our example dataset, we use a publicly available dataset on air pollution in Beijing, China (https://archive.ics.uci.edu/dataset/381/beijing+pm2+5+data). For further details about the data set refer to [the full notebook](../air_pollution.ipynb)."
+ "As our example dataset, we use a publicly available dataset on air pollution in Beijing, China (https://archive.ics.uci.edu/dataset/381/beijing+pm2+5+data). For further details about the dataset, refer to [the full notebook](https://getml.com/latest/examples/enterprise-notebooks/air_pollution)."
]
},
{
diff --git a/fastprop_benchmark/dodgers_prop.ipynb b/fastprop_benchmark/dodgers_prop.ipynb
index deb18bc..1650d9f 100644
--- a/fastprop_benchmark/dodgers_prop.ipynb
+++ b/fastprop_benchmark/dodgers_prop.ipynb
@@ -38,9 +38,9 @@
"\n",
"A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to columns of interest and perform a feature selection on the (possibly large) set of generated features afterwards. In academia, this approach is called _propositionalization._\n",
"\n",
- "getML's [FastProp](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
+ "getML's [FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
"\n",
- "In this notebook, we use traffic data that was collected for the Glendale on ramp for the 101 North freeway in Los Angeles. For further details about the data set refer to [the full notebook](../dodgers.ipynb)."
+ "In this notebook, we use traffic data that was collected for the Glendale on-ramp to the 101 North freeway in Los Angeles. For further details about the dataset, refer to [the full notebook](https://getml.com/latest/examples/enterprise-notebooks/dodgers)."
]
},
{
@@ -894,7 +894,7 @@
"source": [
"#### 1.3 Define relational model\n",
"\n",
- "To start with relational learning, we need to specify the data model. We manually replicate the appropriate time series structure by setting time series related join conditions (`horizon`, `memory` and `allow_lagged_targets`). This is done abstractly using [Placeholders](https://docs.getml.com/latest/user_guide/data_model/data_model.html#placeholders)\n",
+ "To start with relational learning, we need to specify the data model. We manually replicate the appropriate time series structure by setting time-series-related join conditions (`horizon`, `memory` and `allow_lagged_targets`). This is done abstractly using [Placeholders](https://getml.com/latest/user_guide/concepts/data_model#placeholders).\n",
"\n",
"The data model consists of two tables:\n",
"* __Population table__ `traffic_{test/train}`: holds target and the contemporarily available time-based components\n",
diff --git a/fastprop_benchmark/interstate94_prop.ipynb b/fastprop_benchmark/interstate94_prop.ipynb
index ba96c6a..f39e02e 100644
--- a/fastprop_benchmark/interstate94_prop.ipynb
+++ b/fastprop_benchmark/interstate94_prop.ipynb
@@ -38,9 +38,9 @@
"\n",
"A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to columns of interest and perform a feature selection on the (possibly large) set of generated features afterwards. In academia, this approach is called _propositionalization._\n",
"\n",
- "getML's [FastProp](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
+ "getML's [FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
"\n",
- "In this notebook, we predict the hourly traffic volume on I-94 westbound from Minneapolis-St Paul. The analysis is built on top of a dataset provided by the [MN Department of Transportation](https://www.dot.state.mn.us), with some data preparation done by [John Hogue](https://github.com/dreyco676/Anomaly_Detection_A_to_Z/). For further details about the data set refer to [the full notebook](../interstate94.ipynb)."
+ "In this notebook, we predict the hourly traffic volume on I-94 westbound from Minneapolis-St Paul. The analysis is built on top of a dataset provided by the [MN Department of Transportation](https://www.dot.state.mn.us), with some data preparation done by [John Hogue](https://github.com/dreyco676/Anomaly_Detection_A_to_Z/). For further details about the dataset, refer to [the full notebook](https://getml.com/latest/examples/enterprise-notebooks/interstate94)."
]
},
{
diff --git a/fastprop_benchmark/occupancy_prop.ipynb b/fastprop_benchmark/occupancy_prop.ipynb
index 41f0d83..4c135cf 100644
--- a/fastprop_benchmark/occupancy_prop.ipynb
+++ b/fastprop_benchmark/occupancy_prop.ipynb
@@ -49,9 +49,9 @@
"\n",
"A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to columns of interest and perform a feature selection on the (possibly large) set of generated features afterwards. In academia, this approach is called _propositionalization._\n",
"\n",
- "getML's [FastProp](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
+ "getML's [FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
"\n",
- "Our use case here is a public domain data set for predicting room occupancy from sensor data. For further details about the data set refer to [the full notebook](../occupancy.ipynb)."
+ "Our use case here is a public domain dataset for predicting room occupancy from sensor data. For further details about the dataset, refer to [the full notebook](https://getml.com/latest/examples/enterprise-notebooks/occupancy)."
]
},
{
diff --git a/fastprop_benchmark/robot_prop.ipynb b/fastprop_benchmark/robot_prop.ipynb
index a091e10..ef7b2a7 100644
--- a/fastprop_benchmark/robot_prop.ipynb
+++ b/fastprop_benchmark/robot_prop.ipynb
@@ -38,7 +38,7 @@
"\n",
"A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to columns of interest and perform a feature selection on the (possibly large) set of generated features afterwards. In academia, this approach is called _propositionalization._\n",
"\n",
- "getML's [FastProp](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
+ "getML's [FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop) is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries [featuretools](https://www.featuretools.com/) and [tsfresh](https://tsfresh.readthedocs.io/en/latest/). Both of these libraries use propositionalization approaches for feature engineering.\n",
"\n",
"The data set has been generously provided by Erik Berger who originally collected it for his dissertation:\n",
"\n",
diff --git a/interstate94.ipynb b/interstate94.ipynb
index 56f8d82..1a634f5 100644
--- a/interstate94.ipynb
+++ b/interstate94.ipynb
@@ -895,8 +895,8 @@
"\n",
"The `getml.datasets.load_interstate94` method took care of the entire data preparation:\n",
"* Downloads csv's from our servers into python\n",
- "* Converts csv's to getML [DataFrames](https://docs.getml.com/latest/api/getml.data.DataFrame.html#dataframe)\n",
- "* Sets [roles](https://docs.getml.com/latest/user_guide/annotating_data/annotating_data.html#roles) & [units](https://docs.getml.com/latest/user_guide/annotating_data/annotating_data.html#annotating-units) to columns inside getML DataFrames"
+ "* Converts csv's to getML [DataFrames](https://getml.com/latest/reference/data/data_frame#dataframe)\n",
+ "* Sets [roles](https://getml.com/latest/user_guide/concepts/annotating_data#roles) & [units](https://getml.com/latest/user_guide/concepts/annotating_data#annotating-units) to columns inside getML DataFrames"
]
},
{
@@ -968,7 +968,7 @@
"source": [
"__Train/test split__\n",
"\n",
- "We use [getML's split functionality](https://docs.getml.com/latest/api/getml.data.split.html) to retrieve a lazily evaluated split column, that we can supply to the time series api below."
+ "We use [getML's split functionality](https://getml.com/latest/reference/data/split) to retrieve a lazily evaluated split column that we can supply to the time series API below."
]
},
{
@@ -1484,7 +1484,7 @@
"source": [
"### 1.3 Define relational model\n",
"\n",
- "To start with relational learning, we need to specify the data model. We manually replicate the appropriate time series structure by setting time series related join conditions (`horizon`, `memory` and `allow_lagged_targets`). We use the [high-level time series api](https://docs.getml.com/latest/api/getml.data.TimeSeries.html) for this.\n",
+ "To start with relational learning, we need to specify the data model. We manually replicate the appropriate time series structure by setting time-series-related join conditions (`horizon`, `memory` and `allow_lagged_targets`). We use the [high-level time series API](https://getml.com/latest/reference/data/time_series) for this.\n",
"\n",
"Under the hood, the time series api abstracts away a self cross join of the population table (`traffic`) that allows getML's feature learning algorithms to learn patterns from past observations."
]
diff --git a/kaggle_notebooks/cora_getml_vs_gnn.ipynb b/kaggle_notebooks/cora_getml_vs_gnn.ipynb
index 83f0e8a..6cc23d0 100644
--- a/kaggle_notebooks/cora_getml_vs_gnn.ipynb
+++ b/kaggle_notebooks/cora_getml_vs_gnn.ipynb
@@ -116,8 +116,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "Launching ./getML --allow-push-notifications=true --allow-remote-ips=false --home-directory=/home/jan-meyer --in-memory=true --install=false --launch-browser=true --log=false in /home/jan-meyer/.getML/getml-1.4.0-x64-linux...\n",
- "Launched the getML engine. The log output will be stored in /home/jan-meyer/.getML/logs/20240827150119.log.\n",
+ "Launching ./getML --allow-push-notifications=true --allow-remote-ips=false --home-directory=/home/user --in-memory=true --install=false --launch-browser=true --log=false in /home/user/.getML/getml-1.4.0-x64-linux...\n",
+ "Launched the getML engine. The log output will be stored in /home/user/.getML/logs/20240827150119.log.\n",
"Loading pipelines... 100% |██████████| [elapsed: 00:07, remaining: 00:00] \n",
"\n",
"Connected to project 'cora'\n"
diff --git a/kaggle_notebooks/epilepsy_recognition.ipynb b/kaggle_notebooks/epilepsy_recognition.ipynb
index 62d0f9f..16f26c4 100644
--- a/kaggle_notebooks/epilepsy_recognition.ipynb
+++ b/kaggle_notebooks/epilepsy_recognition.ipynb
@@ -179,7 +179,7 @@
"\n",
"### Start up getML\n",
"\n",
- "First, we import the necessary libraries and launch the [getML engine](https://docs.getml.com/latest/user_guide/getml_suite/engine.html). The engine runs in the background and takes care of all the heavy lifting for you. This includes things like our powerful database engine and efficient algorithms as well as the [getML monitor](https://docs.getml.com/latest/user_guide/getml_suite/monitor.html), which you can access by pointing your browser to http://localhost:1709/#/"
+ "First, we import the necessary libraries and launch the [getML engine](https://getml.com/latest/user_guide/concepts/getml_suite/#engine-concepts). The engine runs in the background and takes care of all the heavy lifting for you. This includes things like our powerful database engine and efficient algorithms as well as the [getML monitor](https://getml.com/latest/user_guide/concepts/getml_suite/#monitor-concepts), which you can access by pointing your browser to http://localhost:1709/#/"
]
},
{
@@ -223,7 +223,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "All your work is organized into [projects](https://docs.getml.com/latest/user_guide/project_management/project_management.html). You can easily set any name for your current project. The engine will create a new project or use an existing one if the project name already exists. It will also provide you with a direct link to the project within the monitor."
+ "All your work is organized into [projects](https://getml.com/latest/user_guide/concepts/project_management). You can easily set any name for your current project. The engine will create a new project or use an existing one if the project name already exists. It will also provide you with a direct link to the project within the monitor."
]
},
{
@@ -252,7 +252,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "You can manage your projects conveniently through the monitor web interface or directly using Python commands. For example, you can suspend your current project to free resources using [`getml.project.suspend()`](https://docs.getml.com/latest/api/project/getml.project.suspend.html), switch to another project on the fly using [`getml.project.switch('new project name')`](https://docs.getml.com/latest/api/project/getml.project.switch.html), or restart using [`getml.project.restart()`](https://docs.getml.com/latest/api/project/getml.project.restart.html) should something go wrong. You can even save your current project to disk using [`getml.project.save('filename')`](https://docs.getml.com/latest/api/project/getml.project.save.html) and load it with [`getml.project.load('filename')`](https://docs.getml.com/latest/api/project/getml.project.load.html)."
+ "You can manage your projects conveniently through the monitor web interface or directly using Python commands. For example, you can suspend your current project to free resources using [`getml.project.suspend()`](https://getml.com/latest/reference/project/#getml.project.attrs.suspend), switch to another project on the fly using [`getml.project.switch('new project name')`](https://getml.com/latest/reference/project/#getml.project.attrs.switch), or restart using [`getml.project.restart()`](https://getml.com/latest/reference/project/#getml.project.attrs.restart) should something go wrong. You can even save your current project to disk using [`getml.project.save('filename')`](https://getml.com/latest/reference/project/#getml.project.attrs.save) and load it with [`getml.project.load('filename')`](https://getml.com/latest/reference/project/#getml.project.attrs.load)."
]
},
{
@@ -1587,7 +1587,7 @@
"\n",
"Now that we have explored our data, let's do some machine learning. GetML uses a highly sophisticated engine that runs in the background and takes away a lot of hassle in machine learning applications. \n",
"\n",
- "Let's take a look at loading data into your getML project. First, let's learn how we work with data in getML. Data is represented by getML's custom [DataFrame](https://docs.getml.com/latest/api/data/getml.DataFrame.html) that behaves similarly to a pandas DataFrame. However, a [getML.DataFrame](https://docs.getml.com/latest/api/data/getml.DataFrame.html) is a representation of our data inside getML's highly efficient C++ database engine that runs in the background. We can [load data](https://docs.getml.com/latest/user_guide/importing_data/importing_data.html) from various sources such as pandas DataFrames ([`getml.DataFrame.from_pandas`](https://docs.getml.com/latest/api/data/DataFrame/getml.DataFrame.from_pandas.html)), from CSV files ([`getml.DataFrame.from_csv`](https://docs.getml.com/latest/api/data/DataFrame/getml.DataFrame.from_csv.html)), or load from remote databases ([`getml.DataFrame.from_db`](https://docs.getml.com/latest/api/data/DataFrame/getml.DataFrame.from_db.html)) or even S3 buckets ([`getml.DataFrame.from_s3`](https://docs.getml.com/latest/api/data/DataFrame/getml.DataFrame.from_s3.html)).\n",
+ "Let's take a look at loading data into your getML project. First, let's learn how we work with data in getML. Data is represented by getML's custom [DataFrame](https://getml.com/latest/reference/data/data_frame) that behaves similarly to a pandas DataFrame. However, a [getML.DataFrame](https://getml.com/latest/reference/data/data_frame) is a representation of our data inside getML's highly efficient C++ database engine that runs in the background. We can [load data](https://getml.com/latest/user_guide/concepts/importing_data) from various sources such as pandas DataFrames ([`getml.DataFrame.from_pandas`](https://getml.com/latest/reference/data/data_frame/#getml.data.DataFrame.from_pandas)), CSV files ([`getml.DataFrame.from_csv`](https://getml.com/latest/reference/data/data_frame/#getml.data.DataFrame.from_csv)), remote databases ([`getml.DataFrame.from_db`](https://getml.com/latest/reference/data/data_frame/#getml.data.DataFrame.from_db)), or even S3 buckets ([`getml.DataFrame.from_s3`](https://getml.com/latest/reference/data/data_frame/#getml.data.DataFrame.from_s3)).\n",
"\n",
"Let's create a population DataFrame that contains our main goal: classify a 1s window. This means that we only need a DataFrame that holds the class labels of each window and a unique id, which in this case can just be the `sample_index`."
]
@@ -2218,9 +2218,9 @@
"source": [
"As you can see, our data is now stored inside the engine and represented by a getML.DataFrame (data is of course the same). The Python API provides a link to the getML.DataFrame in the monitor, where you can conveniently explore your data.\n",
"\n",
- "Now we need to [annotate our data](https://docs.getml.com/latest/user_guide/annotating_data/annotating_data.html) so the engine knows what to do with it.\n",
+ "Now we need to [annotate our data](https://getml.com/latest/user_guide/concepts/annotating_data) so the engine knows what to do with it.\n",
"\n",
- "A key aspect of using getML.DataFrame are [roles](https://docs.getml.com/latest/api/getml.data.Roles.html). Every column with relevant data to our data model needs to have a certain role specified. As you can see, both of our columns have the `unused_float` role for now. One of the most important roles is [`getml.data.roles.target`](https://docs.getml.com/latest/api/roles/getml.data.roles.target.html), specifying that the data in this column is our target variable, the value that we want to train our machine learning model on. In our case, the column `y` containing the class label is our target. Let's tell the engine exactly that:"
+ "A key aspect of using a getML.DataFrame is [roles](https://getml.com/latest/reference/data/roles_obj). Every column with data relevant to our data model needs to have a certain role specified. As you can see, both of our columns have the `unused_float` role for now. One of the most important roles is [`getml.data.roles.target`](https://getml.com/latest/reference/data/roles/#getml.data.roles.target), specifying that the data in this column is our target variable, the value that we want to train our machine learning model on. In our case, the column `y` containing the class label is our target. Let's tell the engine exactly that:"
]
},
{
@@ -2639,7 +2639,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now we need to specify the column roles in the peripheral table. This getML.DataFrame contains the `sample_index` as well and we need to set it as our join key, as described above. Subsequently, as this table will contain our actual data in the form of EEG signal values, we specify the role of this column as numerical ([`getml.data.roles.numerical`](https://docs.getml.com/latest/api/roles/getml.data.roles.numerical.html)), something we can train our machine learning model on:"
+ "Now we need to specify the column roles in the peripheral table. This getML.DataFrame contains the `sample_index` as well and we need to set it as our join key, as described above. Subsequently, as this table will contain our actual data in the form of EEG signal values, we specify the role of this column as numerical ([`getml.data.roles.numerical`](https://getml.com/latest/reference/data/roles/#getml.data.roles.numerical)), something we can train our machine learning model on:"
]
},
{
@@ -3336,10 +3336,10 @@
"\n",
"Now that we have our data efficiently stored in getML.DataFrame, we continue to construct our data model.\n",
"\n",
- "This is very easily done by using one of getML's many [DataModels](https://docs.getml.com/latest/user_guide/data_model/data_model.html). We put our time-series data in a relational context and can utilze for example a simple [StarSchema](https://docs.getml.com/latest/api/getml.data.StarSchema.html) data model to accomplish this. Easily put, we see our windows (the time-series data) as splits into many individual samples that are joined onto the window labels. This way, we are effectively thinking of time series as relational data: we are identifying relevant information from our data and aggragate it into a single label. In fact, what we are doing is effectively a self join, because we are joining a table to itself. This allows for very efficient calculation.\n",
+ "This is very easily done by using one of getML's many [DataModels](https://getml.com/latest/user_guide/concepts/data_model). We put our time-series data in a relational context and can, for example, utilize a simple [StarSchema](https://getml.com/latest/reference/data/star_schema) data model to accomplish this. Simply put, we see our windows (the time-series data) as splits into many individual samples that are joined onto the window labels. This way, we are effectively thinking of time series as relational data: we are identifying relevant information from our data and aggregating it into a single label. In fact, what we are doing is effectively a self join, because we are joining a table to itself. This allows for very efficient calculation.\n",
"\n",
"\n",
- "First, we define a random data [split](https://docs.getml.com/latest/api/getml.data.split.html):"
+ "First, we define a random data [split](https://getml.com/latest/reference/data/split):"
]
},
{
@@ -3357,7 +3357,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Second, we create our data model. We create a StarSchema containing our population getML.DataFrame as the population table and specify the split of our dataset into train and test set. We then [join](https://docs.getml.com/latest/api/StarSchema/getml.data.StarSchema.join.html) our peripheral table to our time series on the join key, in this case `sample_index`:"
+ "Second, we create our data model. We create a StarSchema containing our population getML.DataFrame as the population table and specify the split of our dataset into train and test set. We then [join](https://getml.com/latest/reference/data/star_schema/#getml.data.StarSchema.join) our peripheral table to our time series on the join key, in this case `sample_index`:"
]
},
{
@@ -3697,7 +3697,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "This is an overview of your data model in the getML engine. At the top you can see a visual representation in the form of a diagram. Here you can easily see how your data and the specific joins is structured. Next you are presented the so called staging tables. This is a list of the relevant data frames and staging table names. At last, you can see an overview of all the data [containers](https://docs.getml.com/latest/api/getml.data.Container.html). This includes the split in train and test set of your population table as well as the peripheral tables.\n",
+ "This is an overview of your data model in the getML engine. At the top, you can see a visual representation in the form of a diagram, which makes it easy to see how your data and the specific joins are structured. Next, you are presented with the so-called staging tables, a list of the relevant data frames and staging table names. Finally, you can see an overview of all the data [containers](https://getml.com/latest/reference/data/container). This includes the split of your population table into train and test set as well as the peripheral tables.\n",
"\n",
"In this simple example, the diagram consists of a single join of the peripheral table onto the population table via the `sample_index` as a join key. The population table is split into 90% train and 10% test set. The peripheral talbe contains all the EEG signal values and has over 2 million rows."
]
@@ -3708,11 +3708,11 @@
"source": [
"### The getML machine learning pipeline\n",
"\n",
- "Complex machine learning models are represented by getML [pipelines](https://docs.getml.com/latest/api/pipeline/getml.pipeline.Pipeline.html). A pipeline contains the data model (including complex data relations), data [preprocessors](https://docs.getml.com/latest/api_reference/preprocessors.html), [feature learners](https://docs.getml.com/latest/api_reference/feature_learning.html), [predictors](https://docs.getml.com/latest/api_reference/predictors.html) and so on.\n",
+ "Complex machine learning models are represented by getML [pipelines](https://getml.com/latest/reference/pipeline/pipeline). A pipeline contains the data model (including complex data relations), data [preprocessors](https://getml.com/latest/reference/preprocessors), [feature learners](https://getml.com/latest/reference/feature_learning), [predictors](https://getml.com/latest/reference/predictors) and so on.\n",
"\n",
- "In our approach, we will use getML's very own [FastProp](https://docs.getml.com/latest/api/getml.feature_learning.FastProp.html) automatic feature learner for [feature engineering](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html). We specify a loss function suitable for classification. As we are only dealing with a univariate time-series, we want to use all possible aggregation functions.\n",
+ "In our approach, we will use getML's very own [FastProp](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-fastprop) automatic feature learner for [feature engineering](https://getml.com/latest/user_guide/concepts/feature_engineering). We specify a loss function suitable for classification. As we are only dealing with a univariate time series, we want to use all possible aggregation functions.\n",
"\n",
- "We use the highly efficient [XGBoost](https://docs.getml.com/latest/api/getml.predictors.XGBoostClassifier.html) classifier algorithm as a [predictor](https://docs.getml.com/latest/user_guide/predicting/predicting.html)."
+ "We use the highly efficient [XGBoost](https://getml.com/latest/reference/predictors/xgboost_classifier) classifier algorithm as a [predictor](https://getml.com/latest/user_guide/concepts/predicting)."
]
},
{
@@ -3830,7 +3830,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now, let's look at how well our model performs. Again, getML does everything for you. We [score](https://docs.getml.com/latest/api/pipeline/Pipeline/getml.pipeline.Pipeline.score.html) our pipeline on the test set:"
+ "Now, let's look at how well our model performs. Again, getML does everything for you. We [score](https://getml.com/latest/reference/pipeline/pipeline/#getml.pipeline.Pipeline.score) our pipeline on the test set:"
]
},
{
@@ -3993,7 +3993,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Let's have a look at some key [machine learning metrics](https://docs.getml.com/latest/api/pipeline/getml.pipeline.Scores.html): Accuracy and Area Under Curve (AUC):"
+ "Let's have a look at some key [machine learning metrics](https://getml.com/latest/reference/pipeline/scores_container): Accuracy and Area Under Curve (AUC):"
]
},
{
diff --git a/kaggle_notebooks/getml-and-gnns-a-natural-symbiosis.ipynb b/kaggle_notebooks/getml-and-gnns-a-natural-symbiosis.ipynb
index 622e364..b4a18bb 100644
--- a/kaggle_notebooks/getml-and-gnns-a-natural-symbiosis.ipynb
+++ b/kaggle_notebooks/getml-and-gnns-a-natural-symbiosis.ipynb
@@ -145,8 +145,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "Launching ./getML --allow-push-notifications=true --allow-remote-ips=false --home-directory=/home/jan-meyer --in-memory=true --install=false --launch-browser=true --log=false in /home/jan-meyer/.getML/getml-1.4.0-x64-linux...\n",
- "Launched the getML engine. The log output will be stored in /home/jan-meyer/.getML/logs/20240827150338.log.\n",
+ "Launching ./getML --allow-push-notifications=true --allow-remote-ips=false --home-directory=/home/user --in-memory=true --install=false --launch-browser=true --log=false in /home/user/.getML/getml-1.4.0-x64-linux...\n",
+ "Launched the getML engine. The log output will be stored in /home/user/.getML/logs/20240827150338.log.\n",
"Loading pipelines... 100% |██████████| [elapsed: 00:00, remaining: 00:00] \n",
"\n",
"Connected to project 'getml_gnn_cora'\n"
diff --git a/loans.ipynb b/loans.ipynb
index 0d9be58..4ed0abd 100644
--- a/loans.ipynb
+++ b/loans.ipynb
@@ -135,10 +135,10 @@
"\n",
"The `getml.datasets.load_loans` method took care of the entire data lifting:\n",
"* Downloads csv's from our servers in python\n",
- "* Converts csv's to getML [DataFrames](https://docs.getml.com/latest/api/getml.data.DataFrame.html#dataframe)\n",
- "* Sets [roles](https://docs.getml.com/latest/user_guide/annotating_data/annotating_data.html#roles) to columns inside getML DataFrames\n",
+ "* Converts csv's to getML [DataFrames](https://getml.com/latest/reference/data/data_frame#dataframe)\n",
+ "* Sets [roles](https://getml.com/latest/user_guide/concepts/annotating_data#roles) to columns inside getML DataFrames\n",
"\n",
- "The only thing left is to set [units](https://docs.getml.com/latest/user_guide/annotating_data/annotating_data.html#annotating-units) to columns that the relational learning algorithm is allowed to compare to each other."
+ "The only thing left is to set [units](https://getml.com/latest/user_guide/concepts/annotating_data#annotating-units) to columns that the relational learning algorithm is allowed to compare to each other."
]
},
{
@@ -2766,7 +2766,7 @@
"source": [
"### 1.3 Define relational model\n",
"\n",
- "To start with relational learning, we need to specify an abstract data model. Here, we use the [high-level star schema API](https://docs.getml.com/latest/api/getml.data.StarSchema.html) that allows us to define the abstract data model and construct a [container](https://docs.getml.com/latest/api/getml.data.Container.html) with the concrete data at one-go. While a simple `StarSchema` indeed works in many cases, it is not sufficient for more complex data models like schoflake schemas, where you would have to define the data model and construct the container in separate steps, by utilzing getML's [full-fledged data model](https://docs.getml.com/latest/api/getml.data.DataModel.html) and [container](https://docs.getml.com/latest/api/getml.data.Container.html) APIs respectively."
+ "To start with relational learning, we need to specify an abstract data model. Here, we use the [high-level star schema API](https://getml.com/latest/reference/data/star_schema) that allows us to define the abstract data model and construct a [container](https://getml.com/latest/reference/data/container) with the concrete data in one go. While a simple `StarSchema` indeed works in many cases, it is not sufficient for more complex data models like snowflake schemas, where you would have to define the data model and construct the container in separate steps by utilizing getML's [full-fledged data model](https://getml.com/latest/reference/data/data_model) and [container](https://getml.com/latest/reference/data/container) APIs, respectively."
]
},
{
diff --git a/occupancy.ipynb b/occupancy.ipynb
index e9f13e2..936f8a1 100644
--- a/occupancy.ipynb
+++ b/occupancy.ipynb
@@ -13,7 +13,7 @@
"source": [
"# Occupancy - A multivariate time series example\n",
"\n",
- "In this tutorial, you will learn how to apply getML to multivariate time series. It also demonstrates how to use getML's [high-level interface for hyperparameter tuning](https://docs.getml.com/latest/user_guide/hyperopt/hyperopt.html#tuning-routines).\n",
+ "In this tutorial, you will learn how to apply getML to multivariate time series. It also demonstrates how to use getML's [high-level interface for hyperparameter tuning](https://getml.com/latest/user_guide/concepts/hyperopt#tuning-routines).\n",
"\n",
"Summary:\n",
"\n",
@@ -1367,7 +1367,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We also assign roles to each column. To learn more about what roles do and why we need them, check out the [official documentation](https://docs.getml.com/latest/user_guide/annotating_data/annotating_data.html)."
+ "We also assign roles to each column. To learn more about what roles do and why we need them, check out the [official documentation](https://getml.com/latest/user_guide/concepts/annotating_data)."
]
},
{
@@ -2551,7 +2551,7 @@
"source": [
"#### 2.1 getML Pipeline\n",
"\n",
- "We use a [Multirel](https://docs.getml.com/latest/user_guide/feature_engineering/feature_engineering.html#multirel) for generating the features and a simple logistic regression for prediction.\n",
+ "We use [Multirel](https://getml.com/latest/user_guide/concepts/feature_engineering/#feature-engineering-algorithms-multirel) for generating the features and a simple logistic regression for prediction.\n",
"\n",
"We do not spend much effort on the hyperparameters and largely go with the default values. The only exception is that we add some regularization to the XGBoostClassifiers.\n",
"\n",
@@ -3037,7 +3037,7 @@
"source": [
"#### 2.2 Model training\n",
"\n",
- "We use a routine for automatic [hyperparameter optimization](https://docs.getml.com/latest/api_reference/hyperopt.html) to find the best parameters for the predictor:"
+ "We use a routine for automatic [hyperparameter optimization](https://getml.com/latest/reference/hyperopt) to find the best parameters for the predictor:"
]
},
{
@@ -3505,7 +3505,7 @@
"source": [
"#### 2.4 Studying the features\n",
"\n",
- "It is always a good idea to study the features the relational learning algorithm has extracted. We can do so in the [feature view](https://docs.getml.com/latest/user_guide/getml_suite/monitor.html#the-getml-monitor) of the getML monitor. Open the monitor and select the models tab in the sidebar. You will see an overview over the trained pipelines. Select a pipeline to see the most essential summary plots.\n",
+ "It is always a good idea to study the features the relational learning algorithm has extracted. We can do so in the [feature view](https://getml.com/latest/user_guide/concepts/getml_suite/#monitor-concepts) of the getML monitor. Open the monitor and select the models tab in the sidebar. You will see an overview of the trained pipelines. Select a pipeline to see the most essential summary plots.\n",
"\n",
"If you want to document them inside your notebook, here is how you can do that:"
]
@@ -4460,7 +4460,7 @@
"\n",
"This tutorial demonstrates that relational learning is a powerful tool for time series. We able to outperform the benchmarks for a scientific paper on a simple public domain time series data set using relatively little effort.\n",
"\n",
- "If you want to learn more about getML, check out the [official documentation](https://getml.com/product)."
+ "If you want to learn more about getML, check out the [official documentation](https://getml.com)."
]
}
],