diff --git a/RUNNING-DOCS-LOCALLY.md b/RUNNING-DOCS-LOCALLY.md index 0a338306..59c9d8e3 100644 --- a/RUNNING-DOCS-LOCALLY.md +++ b/RUNNING-DOCS-LOCALLY.md @@ -14,9 +14,9 @@ To run these docs locally you'll need: If you want to fully render all documentation locally you need to install the following plugins with `pip install`: -* [glightbox](https://pypi.org/project/mkdocs-glightbox/0.1.0/) -* [multirepo](https://pypi.org/project/mkdocs-multirepo/) -* [redirects](https://pypi.org/project/mkdocs-redirects/) +* [mkdocs-glightbox](https://pypi.org/project/mkdocs-glightbox/0.1.0/) +* [mkdocs-multirepo](https://pypi.org/project/mkdocs-multirepo/) +* [mkdocs-redirects](https://pypi.org/project/mkdocs-redirects/) You also need to sign up to the [Insiders Programme](https://squidfunk.github.io/mkdocs-material/insiders/). diff --git a/docs/apis/data-catalogue-api/intro.md b/docs/apis/data-catalogue-api/intro.md index 9dfefd24..89fc9d42 100644 --- a/docs/apis/data-catalogue-api/intro.md +++ b/docs/apis/data-catalogue-api/intro.md @@ -1,37 +1,25 @@ # Introduction -The Data Catalogue HTTP API allows you to fetch data stored in the Quix -platform. You can use it for exploring the platform, prototyping -applications, or working with stored data in any language with HTTP -capabilities. +The Data Catalogue HTTP API allows you to fetch data stored in the Quix platform. You can use it for exploring the platform, prototyping applications, or working with stored data in any language with HTTP capabilities. -The API is fully described in our [Swagger -documentation](get-swagger.md). Read on for -a guide to using the API, including real-world examples you can execute -from your language of choice, or via the command line using `curl`. +The API is fully described in our [Swagger documentation](get-swagger.md). Read on for a guide to using the API, including real-world examples you can invoke from your language of choice, or from the command line using `curl`.
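As a taste of what such a request looks like, here is a minimal Python sketch that builds (but does not send) an authenticated call to the `/streams` endpoint. The base URL and token are placeholders, and the assumption that `/streams` accepts a POST with a JSON filter body follows the Swagger documentation linked above; treat the details as illustrative, not authoritative:

```python
import json
import urllib.request

# Placeholders: your real workspace domain and PAT token come from the Quix portal
BASE_URL = "https://telemetry-query-myworkspace.platform.quix.ai"
TOKEN = "YOUR_PAT_TOKEN"

def build_streams_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the stream list."""
    return urllib.request.Request(
        f"{base_url}/streams",
        data=json.dumps({}).encode("utf-8"),  # empty filter body: ask for all streams
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_streams_request(BASE_URL, TOKEN)
# urllib.request.urlopen(req) would send it and return the JSON list of streams
```

The equivalent `curl` call simply passes the same URL, bearer token, and JSON body.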
## Preparation -Before using any of the endpoints, you’ll need to know how to -[authenticate your requests](authenticate.md) and -how to [form a typical request to the -API](request.md). +Before using any of the endpoints, you’ll need to know how to [authenticate your requests](authenticate.md) and how to [form a typical request to the API](request.md). -You’ll also need to have some data stored in the Quix platform for API -use to be meaningful. You can use any Source from our [Code Samples](../../platform/samples/samples.md) to do this using the Quix -portal. +You’ll also need to have some data stored in the Quix platform for API use to be meaningful. You can use any Source from our [Code Samples](../../platform/samples/samples.md) to do this using the Quix portal. ## Further documentation -| | | | -| ------------------------------------------------------------------ | ------------------ | ----------------------------------------- | -| Documentation | Endpoint | Examples | +| Documentation | Endpoint | Examples | +| -------------------------------------------- | ------------------ | ----------------------------------------- | | [Streams, paged](streams-paged.md) | `/streams` | Get all streams in groups of ten per page | | [Streams, filtered](streams-filtered.md) | `/streams` | Get a single stream, by ID | -| | | Get only the streams with LapNumber data | +| | | Get only the streams with LapNumber data | | [Streams & models](streams-models.md) | `/streams/models` | Get stream hierarchy | | [Raw data](raw-data.md) | `/parameters/data` | Get all the `Speed` readings | -| | | Get `Speed` data between timestamps | +| | | Get `Speed` data between timestamps | | [Aggregated data by time](aggregate-time.md) | `/parameters/data` | Downsample or upsample data | | [Aggregated by tags](aggregate-tags.md) | `/parameters/data` | Show average Speed by LapNumber | | [Tag filtering](filter-tags.md) | `/parameters/data` | Get data for just one Lap | diff --git 
a/docs/apis/streaming-writer-api/intro.md b/docs/apis/streaming-writer-api/intro.md index ebcf1906..777b8fdd 100644 --- a/docs/apis/streaming-writer-api/intro.md +++ b/docs/apis/streaming-writer-api/intro.md @@ -1,14 +1,9 @@ # Introduction -The Streaming Writer API allows you to stream data into the Quix -platform via HTTP endpoints or SignalR. It’s an alternative to using our -C\# and Python client libraries. You can use the Streaming Writer API from any -HTTP-capable language. - -The API is fully documented in our [Swagger -documentation](get-swagger.md). Read on for a -guide to using the API, including real-world examples you can execute -from your language of choice, or via the command line using curl. +The Streaming Writer API allows you to stream data into the Quix platform via HTTP endpoints or SignalR. It’s an alternative to using our C# and Python client libraries. You can use the Streaming Writer API from any HTTP-capable language. + +The API is fully documented in our [Swagger documentation](get-swagger.md). Read on for a guide to using the API, including real-world examples you can invoke +from your language of choice, or from the command line using `curl`. ## Preparation diff --git a/docs/index.md b/docs/index.md index 85d8580a..fe58da6f 100644 --- a/docs/index.md +++ b/docs/index.md @@ -117,7 +117,7 @@ Read more about the Quix Streams Client Library and APIs. --- - Query historic time-series data in Quix using HTTP interface. + Query historical time-series data in Quix using the HTTP interface. [:octicons-arrow-right-24: Learn more](./apis/data-catalogue-api/intro.md) diff --git a/docs/platform/MLOps.md b/docs/platform/MLOps.md index 0979c75f..0e9ff23b 100644 --- a/docs/platform/MLOps.md +++ b/docs/platform/MLOps.md @@ -19,18 +19,18 @@ a seamless journey from concept to production. The key steps are: Any member of any team can quickly access data in the Catalogue without support from software or regulatory teams.
-## Develop features in historic data +## Develop features in historical data Use Visualise to discover, segment, label and store significant features in the catalogue. -## Build & train models on historic data +## Build & train models on historical data Use Develop and Deploy to: - Write model code in Python using their favourite IDE. - - Train models on historic data. + - Train models on historical data. - Evaluate results against raw data and results from other models. diff --git a/docs/platform/definitions.md b/docs/platform/definitions.md index 0a09e94e..e38dff66 100644 --- a/docs/platform/definitions.md +++ b/docs/platform/definitions.md @@ -22,7 +22,7 @@ Workspaces are collaborative. Multiple users, including developers, data scienti ## Project -A set of code in Quix Platform that can be edited, compiled, executed, and deployed as one Docker image. Projects in Quix Platform are fully version controlled. You can also tag your code as an easy way to manage releases of your project. +A set of code in Quix Platform that can be edited, compiled, run, and deployed as one Docker image. Projects in Quix Platform are fully version controlled. You can also tag your code as an easy way to manage releases of your project. ## Deployment @@ -153,7 +153,7 @@ A [WebSockets API](../apis/streaming-reader-api/intro.md) used to stream any dat ### Data Catalogue API -An [HTTP API](../apis/data-catalogue-api/intro.md) used to query historic data in the Data Catalogue. Most commonly used for dashboards, analytics and training ML models. Also useful to call historic data when running an ML model, or to call historic data from an external application. +An [HTTP API](../apis/data-catalogue-api/intro.md) used to query historical data in the Data Catalogue. Most commonly used for dashboards, analytics and training ML models. Also useful to call historical data when running an ML model, or to call historical data from an external application. 
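To make the idea concrete, a historical-data query to the `/parameters/data` endpoint is just a JSON document posted over HTTP. The sketch below assembles one such body in Python; the stream ID is a placeholder and the field names are illustrative assumptions, so check them against the Data Catalogue API Swagger reference before use:

```python
import json

# Hypothetical query: fetch `Speed` readings from one stream between two
# timestamps (nanoseconds since the Unix epoch). Field names are illustrative.
payload = {
    "streams": ["your-stream-id"],  # placeholder stream ID
    "numericParameters": [{"parameterName": "Speed"}],
    "from": 1_612_000_000_000_000_000,
    "to": 1_612_000_060_000_000_000,  # 60 seconds later
}

body = json.dumps(payload)
# POST `body` to <workspace-url>/parameters/data with your bearer token to get
# the matching readings back as JSON, ready for a dashboard or model training.
```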
### Portal API diff --git a/docs/platform/how-to/jupyter-nb.md b/docs/platform/how-to/jupyter-nb.md index ccbe63df..8317aee8 100644 --- a/docs/platform/how-to/jupyter-nb.md +++ b/docs/platform/how-to/jupyter-nb.md @@ -52,7 +52,7 @@ You need to be logged into the platform for this: ![how-to/jupyter-wb/connect-python.png](../../platform/images/how-to/jupyter-wb/connect-python.png) -Copy the Python code to your Jupyter notebook and execute. +Copy the Python code to your Jupyter notebook and run it. ![how-to/jupyter-wb/jupyter-results.png](../../platform/images/how-to/jupyter-wb/jupyter-results.png) diff --git a/docs/platform/how-to/webapps/write.md b/docs/platform/how-to/webapps/write.md index 7e46be32..ad8d23fa 100644 --- a/docs/platform/how-to/webapps/write.md +++ b/docs/platform/how-to/webapps/write.md @@ -237,7 +237,7 @@ req.end(); ``` In the preceding example, tags in the event data request are optional. -Tags add context to your data points and help you to execute efficient +Tags add context to your data points and help you to run efficient queries over them on your data like using indexes in traditional databases. diff --git a/docs/platform/tutorials/data-science/data-science.md b/docs/platform/tutorials/data-science/data-science.md index 04d2c9eb..bfb0e044 100644 --- a/docs/platform/tutorials/data-science/data-science.md +++ b/docs/platform/tutorials/data-science/data-science.md @@ -14,7 +14,7 @@ In other words, you will complete all the typical phases of a data science proje - Store the data efficiently - - Train some ML models with historic data + - Train some ML models with historical data - Deploy the ML models into production in real time @@ -128,9 +128,9 @@ You now have a working real time stream of bike data. Now use the OpenWeather ac ## 4. View and store the data -With Quix it's easy to visualize your data in a powerful and flexible way, you can see the real-time data and view historic data.
+With Quix it's easy to visualize your data in a powerful and flexible way. You can see real-time data and view historical data. -At it's heart Quix is a real-time data platform, so if you want to see data-at-rest for a topic, you must turn on data persistence for that topic (You'll do this [below](#historic)). +At its heart, Quix is a real-time data platform, so if you want to see data-at-rest for a topic, you must turn on data persistence for that topic (you'll do this [below](#historical)). ### Real-time @@ -148,9 +148,9 @@ At it's heart Quix is a real-time data platform, so if you want to see data-at-r If you don't see any streams or parameters, just wait a moment or two. The next time data arrives these lists will be automatically populated. -### Historic +### Historical -In order to train a machine learning model we will need to store the data we are ingesting so that we start building a historic dataset. However topics are real time infrastructures, not designed for data storage. To solve this, Quix allows you to send the data going through a topic to an efficient real time database if you need it: +To train a machine learning model, we need to store the data we are ingesting so that we can start building a historical dataset. However, topics are real-time infrastructure, not designed for data storage. To solve this, Quix lets you send the data flowing through a topic to an efficient real-time database if you need it: 1. Navigate to the topics page using the left hand navigation @@ -172,7 +172,7 @@ Follow the along and we'll show you how to get data out of Quix so you can train We mentioned earlier in [Weather real time stream](#3-weather-real-time-stream) that free access to the OpenWeather API only allows us to consume new data every 30 minutes therefore, at this point you will have a limited data set.
-You can leave the data consumption process running overnight or for a few days to gather more data, but for the time being there's no problem in continuing with the tutorial with your limited historic data. +You can leave the data consumption process running overnight or for a few days to gather more data, but for the time being there's no problem continuing the tutorial with your limited historical data. #### Get the data @@ -204,7 +204,7 @@ You can leave the data consumption process running overnight or for a few days t ### Train the model -At this point, you are generating historic data and know how to query it. You can train your ML models as soon as you've gathered enough data. +At this point, you are generating historical data and know how to query it. You can train your ML models as soon as you've gathered enough data. !!! example "Need help?" @@ -212,7 +212,7 @@ At this point, you are generating historic data and know how to query it. You ca We walk you through the process of getting the code to access the data (as described above), running the code in a Jupyter notebook, training the model and uploading your pickle file to Quix. -However, it would take several weeks to accumulate enough historic data to train a model, so let's continue the tutorial with some pre-trained models we have provided. We've done it using the very same data flow you've just built, and can find the Jupyter notebook code we used [here](https://github.com/quixio/NY-bikes-tutorial/blob/1stversion/notebooks-and-sample-data/04%20-%20Train%20ML%20models.ipynb){target=_blank}. +However, it would take several weeks to accumulate enough historical data to train a model, so let's continue the tutorial with some pre-trained models we have provided.
We trained them using the very same data flow you've just built, and you can find the Jupyter notebook code we used [here](https://github.com/quixio/NY-bikes-tutorial/blob/1stversion/notebooks-and-sample-data/04%20-%20Train%20ML%20models.ipynb){target=_blank}. ## 6. Run the model diff --git a/docs/platform/tutorials/data-stream-processing/data-stream-processing.md b/docs/platform/tutorials/data-stream-processing/data-stream-processing.md index 4b685f94..890ec9d1 100644 --- a/docs/platform/tutorials/data-stream-processing/data-stream-processing.md +++ b/docs/platform/tutorials/data-stream-processing/data-stream-processing.md @@ -258,6 +258,6 @@ With the microservices for control and input processing deployed along with the ## Thanks -Thanks for following the tutorial, hopefully you learnt something about Quix and had some fun doing it! +Thanks for following the tutorial. Hopefully you learned something about Quix and had some fun doing it! If you need any help, got into difficulties or just want to say hi then please join our [Slack community](https://quix.io/slack-invite?_ga=2.132866574.1283274496.1668680959-1575601866.1664365365){target=_blank}. \ No newline at end of file diff --git a/docs/platform/tutorials/eventDetection/crash-detection.md b/docs/platform/tutorials/eventDetection/crash-detection.md index c93bfb51..6904a412 100644 --- a/docs/platform/tutorials/eventDetection/crash-detection.md +++ b/docs/platform/tutorials/eventDetection/crash-detection.md @@ -1,6 +1,6 @@ # 2. Event Detection -Our event detection pipeline is centered around this service, which executes an ML model to detect whether a vehicle has been involved in an accident. +Our event detection pipeline is centered around this service, which runs an ML model to detect whether a vehicle has been involved in an accident. In reality our ML model was trained to detect the difference between a phone being shaken versus just being used normally. You actually don’t have to use an ML model at all!
There are various ways this service could have been written, for example, you could detect a change in the speed or use the speed and another parameter to determine if an event has occurred. diff --git a/docs/platform/tutorials/train-and-deploy-ml/conclusion.md b/docs/platform/tutorials/train-and-deploy-ml/conclusion.md new file mode 100644 index 00000000..e359ce56 --- /dev/null +++ b/docs/platform/tutorials/train-and-deploy-ml/conclusion.md @@ -0,0 +1,23 @@ +# Conclusion + +In this tutorial you've learned how to use Quix to generate real-time data. You've also learned how to import that data into Jupyter Notebook using the Quix code generator. You then saw how to deploy your ML model to the Quix Platform, and visualize its output in real time. + +![Data explorer](./images/visualize-result.png) + +The objective of the tutorial was not to create the most accurate model, but to step you through the overall process, and show you some of the useful features of Quix. It showed one possible workflow, in which you train your ML model in Jupyter Notebook, and demonstrated the integration between Quix and Jupyter. It is also possible to train your ML model directly in Quix, on live data or on persisted data using the [replay functionality](../../how-to/replay.md). + +## Next Steps + +Here are some suggested next steps to continue on your Quix learning journey: + +* Visit the [Quix Code Samples GitHub](https://github.com/quixio/quix-samples){target=_blank}. If you decide to build your own connectors and apps, you can contribute something to the Quix Code Samples. Fork our Code Samples repo and submit your code, updates, and ideas. + +* [Sentiment analysis tutorial](../sentiment-analysis/index.md) - In this tutorial you learn how to build a sentiment analysis pipeline, capable of analyzing real-time chat. + +* [Data science tutorial](../data-science/data-science.md) - In this tutorial you use data science to build a real-time bike availability pipeline. + +What will you build?
Let us know! We’d love to feature your project or use case in our [newsletter](https://www.quix.io/community/). + +## Getting help + +If you need any assistance, we're here to help in [The Stream](https://join.slack.com/t/stream-processing/shared_invite/zt-13t2qa6ea-9jdiDBXbnE7aHMBOgMt~8g){target=_blank}, our free Slack community. Introduce yourself and then ask any questions in `quix-help`. diff --git a/docs/platform/tutorials/train-and-deploy-ml/connect-python.png b/docs/platform/tutorials/train-and-deploy-ml/connect-python.png deleted file mode 100644 index 193eca09..00000000 Binary files a/docs/platform/tutorials/train-and-deploy-ml/connect-python.png and /dev/null differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/create-data.md b/docs/platform/tutorials/train-and-deploy-ml/create-data.md new file mode 100644 index 00000000..1f87b866 --- /dev/null +++ b/docs/platform/tutorials/train-and-deploy-ml/create-data.md @@ -0,0 +1,43 @@ +# Get your data + +In this part of the tutorial, you obtain some real-time data to work with in the rest of the tutorial. You use a demo data source that generates Formula 1 race car data from a computer game. You use this data as the basis to build an ML model to predict braking patterns. + +## Create a persisted topic + +To make things a little easier, first create a **persisted topic** to receive the generated data. + +1. Log in to the Quix Portal. + +2. Click `Topics` on the left-hand menu. + +3. Click `Add new`, located at the top right. + +4. Enter a topic name of `f1-data`. + +5. Leave other values in the `Create new topic` dialog at their defaults. + +6. Click `Done`. Now wait while the topic is created for you. + +7.
Once the topic has been created, click the persistence slider button to ensure your data is persisted, as shown in the following screenshot: + + ![Enable topic persistence](./images/enable-topic-persistence.png) + +## Generate data from the demo data source + +Now generate the actual data for use later in the tutorial by completing the following steps: + +1. Click `Code Samples` on the left-hand sidebar. + +2. Find the `Demo Data` source. This service streams F1 Telemetry data into a topic from a recorded game session. + +3. Click the `Setup & deploy` button in the `Demo Data` panel. + +4. You can leave `Name` as the default value. + +5. Make sure `Topic` is set to `f1-data` and then click `Deploy`. + +Once this service is deployed it will run as a [job](../../definitions.md#job) and generate real-time data into the `f1-data` topic, and this data will be persisted. + +This data is retrieved later in the tutorial using Python code, generated for you by Quix, that calls the [Data Catalogue API](../../../apis/data-catalogue-api/intro.md). + +[Import data into Jupyter Notebook :material-arrow-right-circle:{ align=right }](./import-data.md) diff --git a/docs/platform/tutorials/train-and-deploy-ml/deploy-ml.md b/docs/platform/tutorials/train-and-deploy-ml/deploy-ml.md index 34aea613..78b2acaf 100644 --- a/docs/platform/tutorials/train-and-deploy-ml/deploy-ml.md +++ b/docs/platform/tutorials/train-and-deploy-ml/deploy-ml.md @@ -1,108 +1,76 @@ -# Run ML model in realtime environment +# Deploy your ML model -In this article, you will learn how to use pickle file trained on -historic data in a realtime environment. +In this part of the tutorial, you deploy the Pickle file [containing your ML model](./train-ml.md) to the Quix Platform. Your ML code then uses this model to predict braking in real time. -Ensure you have completed the previous stage first, if not find it [here](train-ml-model.md).
+![What you'll build](./images/run-live.png) -## Watch -If you prefer watching instead of reading, we've recorded a short video: - +You'll also use the Quix Data Explorer to visualize the braking prediction in real time. -## Why this is important +## Create a transform -With the Quix platform, you can run and deploy ML models to the leading -edge reacting to data coming from the source with milliseconds latency. +Ensure you are logged into the Quix Portal, then follow these steps to create a transform that uses your model: -## End result +1. Click `Code Samples` in the left-hand sidebar. -At the end of this article, we will end up with a **live model** using the **pickle file** from [How to train ML model](train-ml-model.md) to process live data on the edge. +2. Filter the Code Samples by selecting `Python` under `LANGUAGES` and `Transformation` under `PIPELINE STAGE`. -![What you'll build](run-live.png) +3. Locate the `Event Detection` item. -## Preparation +4. Click `Preview code` in the `Event Detection` panel. You can browse the files to ensure you have the correct sample. -You’ll need to complete the [How to train ML model](train-ml-model.md) article to get pickle file with trained model logic. +5. Click `Edit code`. -## Run the model +6. Change the name to `Prediction Model`. -Now let's run the model you created in the previous article. If you have your own model and already know how to run the Python to execute it then these steps might also be useful for you. +7. Ensure the input is `f1-data`. -Ensure you are logged into the Quix Portal +8. Leave output as `hard-braking` (its default value). -1. Navigate to the `Code Samples` - -2. Filter the Code Samples by selecting `Python` under languages and `Transformation` under pipeline stage - -3. Select the `Event Detection` item +9. Click `Save as Project`. The code is now saved to your workspace. !!! tip - - If you can't see `Event Detection` you can also use search to find it - !!! 
info - - Usually, after clicking on the `Event Detection` you can look at the code and the readme to ensure it's the correct sample for your needs. - -4. Now click Edit code - -5. Change the name to "Prediction Model" - -6. Ensure the input is "f1-data" + You can see a list of projects at any time by clicking `Projects` in the left-hand navigation. -7. Ensure the output is "brake-prediction" +## Upload the model - !!! info - - The platform will automatically create any topics that don't already exist - -!!! success - - The code from the Code Samples sample is now saved to your workspace. - - You can edit and run the code from here or clone it to your computer and work locally. - -### Upload the model +Now you need to upload your ML model and edit your transform code to run the model. -Now you need to upload the ML model created in the previous article and edit this code to run the model. +1. Click on `Projects` and select `Prediction Model` to display your project code. -1. Click the upload file icon at the top of the file list +2. Click the `Upload File` icon at the top of the file list, as shown in the following screenshot: -2. Find the file saved in the previous article. + ![Upload file to project](./images/upload-file-to-project.png) - !!! hint - - It's called 'decision_tree_5_depth.sav' and should be in "C:\Users\[USER]\" on Windows +3. Find the Pickle file containing your ML model. It's named `decision_tree_5_depth.sav` and is in the same directory as your Jupyter Notebook files. !!! warning - - When you click off the file e.g. onto quix_function.py, the editor might prompt you to save the .sav file. - - Click "Do not commit" + + When you click off the file, for example onto `quix_function.py`, the editor may prompt you to save the `.sav` file. **Click `Discard changes`**. -3. Click quix_function.py in the file list (remember do not commit changes to the model file) +4. 
Click `quix_function.py` in the file list (remember, **do not** commit changes to the model file). -### Modify the code +## Modify the transform code -1. Add the following statements to import the required libraries +You need to modify your transform code to work with the ML model in the Pickle file. In the file `quix_function.py`: - ``` py +1. Add the following statements to import the required libraries: + + ``` python import pickle import math ``` -2. In the `__init__` function add the following lines to load the model +2. In the `__init__` function add the following lines to load the model: - ``` py + ``` python ## Import ML model from file self.model = pickle.load(open('decision_tree_5_depth.sav', 'rb')) ``` -3. Under the `__init__` function add the following new function - - This will pre-process the data, a necessary step before passing it to the model. +3. Under the `__init__` function add the following new function to preprocess the data: - ``` py + ``` python ## To get the correct output, we preprocess data before we feed them to the trained model def preprocess(self, df): @@ -126,11 +94,11 @@ Now you need to upload the ML model created in the previous article and edit thi return df ``` -4. Delete the `on_pandas_frame_handler` function and paste this code in it's place. +4. Replace the `on_dataframe_handler` function with the following code: - ``` py + ``` python # Callback triggered for each new parameter data. - def on_pandas_frame_handler(self, df: pd.DataFrame): + def on_dataframe_handler(self, stream_consumer: qx.StreamConsumer, df: pd.DataFrame): # if no speed column, skip this record if not "Speed" in df.columns: @@ -144,79 +112,75 @@ Now you need to upload the ML model created in the previous article and edit thi features = ["Motion_WorldPositionX_cos", "Motion_WorldPositionX_sin", "Steer", "Speed", "Gear"] X = df[features] - # Lets shift data into the future by 5 seconds. (Note that time column is in nanoseconds). 
- output_df["time"] = df["time"].apply(lambda x: int(x) + int((5 * 1000 * 1000 * 1000))) + # Shift data into the future by 5 seconds. (Note that time column is in nanoseconds). + output_df["timestamp"] = df["timestamp"].apply(lambda x: int(x) + int((5 * 1000 * 1000 * 1000))) output_df["brake-prediction"] = self.model.predict(X) print("Prediction") print(output_df["brake-prediction"]) # Merge the original brake value into the output data frame - output_df = pd.concat([df[["time", "Brake"]], output_df]).sort_values("time", ascending=True) - + output_df = pd.concat([df[["timestamp", "Brake"]], output_df]).sort_values("timestamp", ascending=True) - self.output_stream.parameters.buffer.write(output_df) # Send filtered data to output topic + self.producer_stream.timeseries.buffer.publish(output_df) # Send filtered data to output topic ``` -### Update requirements +5. Commit your changes to the `quix_function.py` file by clicking `Commit` or using ++ctrl+s++ or ++command+s++. -Click on the requirements.txt file and add `sklearn` on a new line +## Update the `requirements.txt` file +Click on the `requirements.txt` file and add `scikit-learn` on a new line. Commit your change. !!! success You have edited the code to load and run the model. +## Run the code -### Run the code - -The fastest way to run the code is to click Run in the top right hand corner. +The simplest way to run the code is to click the `Run` button in the top right-hand corner. This will install any dependencies into a sandboxed environment and then run the code. -In the output console you will see the result of the prediction. +The output console displays the result of the prediction. -In the next few steps you will deploy the code and then see a visualization of the output. +In the next few steps you deploy the code and then see a visualization of the output. ## Deploy -1. Click Stop if you haven't already done so. - -2. To deploy the code, click Deploy. +To deploy your transform: -3. 
On the dialog that appears click Deploy. +1. Click `Stop` if you haven't already done so. -Once the code has been built, deployed it will be started automatically. +2. To deploy the code, click `Deploy`. -!!! success +3. In the dialog that appears, click `Deploy`. - Your code is now running in a fully production ready ecosystem. +Once the code has built, it is started automatically. -## Visualize whats happening +!!! success -To see the output of your model in real time you will use the Data explorer. + Your code is now running in a production-ready ecosystem. -1. Click the Data explorer button on the left hand menu. +## Visualize your data -2. If it's not already selected click the Live data tab at the top +To see the output of your model in real time you can use the Data Explorer. To use the Data Explorer: -3. Ensure the `brake-prediciton` topic is selected +1. Click the Data Explorer button on the left-hand sidebar. -4. Select a stream (you should only have one) +2. If it's not already selected, click the `Live` data tab at the top. -5. Select `brake-prediction` and `brake` from the parameters list +3. Ensure the `hard-braking` topic is selected from the `Select a topic` drop-down list. -!!! success +4. Select a stream (you should only have one). - You should now see a graphical output for the prediction being output by the model as well as the actual brake value +5. Select `brake-prediction` and `brake` from the parameters list. - ![Data explorer](visualize-result.png) +You now see a graphical output for the prediction being output by the model as well as the actual brake value, as illustrated in the following screenshot: +![Data explorer](./images/visualize-result.png) !!! note - Don't forget this exercise was to deploy an ML model in the Quix platform. - - We didn't promise to train a good model. So the prediciton may not always match the actual brake value.
- + Don't forget the purpose of this tutorial was to show you how to deploy an ML model in Quix Platform, rather than to train an accurate model. So the prediction may not always match the actual brake value. +[Conclusion and next steps :material-arrow-right-circle:{ align=right }](conclusion.md) diff --git a/docs/platform/tutorials/train-and-deploy-ml/experiments-comparison.png b/docs/platform/tutorials/train-and-deploy-ml/experiments-comparison.png deleted file mode 100644 index 31b10057..00000000 Binary files a/docs/platform/tutorials/train-and-deploy-ml/experiments-comparison.png and /dev/null differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/experiments.png b/docs/platform/tutorials/train-and-deploy-ml/experiments.png deleted file mode 100644 index e4e2f0c1..00000000 Binary files a/docs/platform/tutorials/train-and-deploy-ml/experiments.png and /dev/null differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/aggregation-slider-off.png b/docs/platform/tutorials/train-and-deploy-ml/images/aggregation-slider-off.png new file mode 100644 index 00000000..e641a763 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/aggregation-slider-off.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/brake-shifted.png b/docs/platform/tutorials/train-and-deploy-ml/images/brake-shifted.png similarity index 100% rename from docs/platform/tutorials/train-and-deploy-ml/brake-shifted.png rename to docs/platform/tutorials/train-and-deploy-ml/images/brake-shifted.png diff --git a/docs/platform/tutorials/train-and-deploy-ml/prediction.png b/docs/platform/tutorials/train-and-deploy-ml/images/braking-prediction.png similarity index 100% rename from docs/platform/tutorials/train-and-deploy-ml/prediction.png rename to docs/platform/tutorials/train-and-deploy-ml/images/braking-prediction.png diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/compare-experiments.png 
b/docs/platform/tutorials/train-and-deploy-ml/images/compare-experiments.png new file mode 100644 index 00000000..a3dc3e79 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/compare-experiments.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/connect-python.png b/docs/platform/tutorials/train-and-deploy-ml/images/connect-python.png new file mode 100644 index 00000000..839ff101 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/connect-python.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/enable-topic-persistence.png b/docs/platform/tutorials/train-and-deploy-ml/images/enable-topic-persistence.png new file mode 100644 index 00000000..abb61cd2 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/enable-topic-persistence.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-python-code.png b/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-python-code.png new file mode 100644 index 00000000..7fd4294d Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-python-code.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-python3-selection.png b/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-python3-selection.png new file mode 100644 index 00000000..d410ca62 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-python3-selection.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-results.png b/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-results.png new file mode 100644 index 00000000..3bfb3f86 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/jupyter-results.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/run-live.png b/docs/platform/tutorials/train-and-deploy-ml/images/run-live.png similarity 
index 100% rename from docs/platform/tutorials/train-and-deploy-ml/run-live.png rename to docs/platform/tutorials/train-and-deploy-ml/images/run-live.png diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/select-experiments.png b/docs/platform/tutorials/train-and-deploy-ml/images/select-experiments.png new file mode 100644 index 00000000..c0d89533 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/select-experiments.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/time-slider.png b/docs/platform/tutorials/train-and-deploy-ml/images/time-slider.png new file mode 100644 index 00000000..89979061 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/time-slider.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/train.png b/docs/platform/tutorials/train-and-deploy-ml/images/train.png similarity index 100% rename from docs/platform/tutorials/train-and-deploy-ml/train.png rename to docs/platform/tutorials/train-and-deploy-ml/images/train.png diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/upload-file-to-project.png b/docs/platform/tutorials/train-and-deploy-ml/images/upload-file-to-project.png new file mode 100644 index 00000000..2ed42004 Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/upload-file-to-project.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/images/visualize-result.png b/docs/platform/tutorials/train-and-deploy-ml/images/visualize-result.png new file mode 100644 index 00000000..1c40257b Binary files /dev/null and b/docs/platform/tutorials/train-and-deploy-ml/images/visualize-result.png differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/import-data.md b/docs/platform/tutorials/train-and-deploy-ml/import-data.md new file mode 100644 index 00000000..edc11b62 --- /dev/null +++ b/docs/platform/tutorials/train-and-deploy-ml/import-data.md @@ -0,0 +1,82 @@ +# Import data into 
Jupyter Notebook + +From a Jupyter Notebook, you retrieve the data that was generated in Quix in the [previous part](./create-data.md) and persisted into the [Quix Data Catalogue](../../../apis/data-catalogue-api/intro.md). + +## Run Jupyter Notebook + +Make sure you have reviewed the [prerequisites](./index.md#prerequisites), and have Jupyter Notebook already installed. + +1. Run Jupyter Notebook by entering the following command into your terminal: + + ``` shell + jupyter notebook + ``` + +2. Navigate to `http://localhost:8888` in your web browser. + +3. Select `New` and then `Python 3 (ipykernel)` from the menu, as shown here: + + ![Jupyter Notebook Python 3 selection](./images/jupyter-python3-selection.png) + +!!! tip + + If you don’t see **Python 3** in the `New` menu, run the following commands in your Python environment: + + ``` shell + pip install ipykernel + python -m ipykernel install --user + ``` + +## Obtain the training data + +In the [previous part](./create-data.md) you generated some real-time data in the Quix Portal. + +The Quix Portal has a code generator that can generate code to connect your Jupyter Notebook to Quix. To use the code generator to retrieve data from Quix and import it into your Notebook: + +1. Make sure you are logged into the Quix Portal. + +2. Select your Workspace (you probably only have one currently). + +3. Click `Data Explorer` in the left-hand sidebar. + +4. Click `Add Query` to add a query to visualize some data. + +5. Select the F1 Game stream in the `Add Query` wizard, and click `Next`. + +6. In the `Select parameters and events` step of the wizard, select the `Brake`, `WorldPositionX`, `Steer`, `Speed`, and `Gear` parameters. + +7. Click `Done`. + +8. Turn off aggregation using the slider button, as illustrated in the following screenshot: + + ![Turn aggregation slider button off](./images/aggregation-slider-off.png) + +9.
Use the time slider to select about ten minutes of data, as shown in the following screenshot: + + ![Time slider](./images/time-slider.png) + + This is a precaution, because if you try to import too much data into Jupyter Notebook you may get an `IOPub data rate exceeded` error. Alternatively, you can increase your capacity by setting the config variable `--NotebookApp.iopub_data_rate_limit` in Jupyter. + +10. Select the `Code` tab. + +11. Select `Python` from the `LANGUAGE` drop-down. + + ![Generated code to retrieve data](./images/connect-python.png) + +12. Copy all the code in the `Code` tab to your clipboard. + +13. Paste the Python code from your clipboard into your Jupyter Notebook: + + ![Python code in Jupyter Notebook](./images/jupyter-python-code.png) + +14. Click `Run`. + + The code prints out the pandas data frame containing the retrieved data, as shown in the following screenshot: + + ![Results from data fetch](./images/jupyter-results.png) + +!!! tip + + If you want to use this generated code for more than 30 days, replace the temporary token with a **PAT token**. See [authenticate your requests](../../../apis/data-catalogue-api/authenticate.md) for how to do that. + +[Train your ML model :material-arrow-right-circle:{ align=right }](./train-ml.md) \ No newline at end of file diff --git a/docs/platform/tutorials/train-and-deploy-ml/index.md b/docs/platform/tutorials/train-and-deploy-ml/index.md new file mode 100644 index 00000000..684839b0 --- /dev/null +++ b/docs/platform/tutorials/train-and-deploy-ml/index.md @@ -0,0 +1,35 @@ +# Real-time Machine Learning (ML) pipelines + +In this tutorial, you learn how to extract data from Quix to train your Machine Learning (ML) model in Jupyter Notebook. You then learn how to deploy this model to the Quix Platform, so ML can be used to process your data in real time.
+ +## Video + +If you'd like to watch a video before stepping through this tutorial, you can view the following video on the [Quix YouTube channel](https://www.youtube.com/@quix1570){target=_blank}: + + + +## Prerequisites + +To complete this tutorial you need the following: + +* A free [Quix account](https://portal.platform.quix.ai/self-sign-up/){target=_blank}. +* [Python3](https://www.python.org/downloads/){target=_blank} installed. +* [Jupyter Notebook](https://jupyter.org/){target=_blank} to train your model and load data for training. See [How to work with Jupyter Notebook](../../how-to/jupyter-nb.md) for further information. + +There are some other libraries that need to be installed, but instructions on how to do this are given when required. + +## The parts of the tutorial + +This tutorial is divided up into several parts, to make the learning experience more manageable. The parts are summarized here: + +1. **Create your data** - You learn how to create some data to work with in the rest of the tutorial. + +2. **Import data** - You learn how Quix makes it easy to import your data into Jupyter Notebook, by providing you with ready-to-use code. + +3. **Train your ML model** - You learn how to train an ML model. For this tutorial, this is done in Jupyter, but could also be done in Quix. + +4. **Deploy your ML model** - You learn how to deploy your ML model to the Quix Platform. + +5. **Summary** - In this [concluding](conclusion.md) part you are presented with a summary of the work you have completed, and also some next steps for more advanced learning about the Quix Platform. 
+ +[Get some data :material-arrow-right-circle:{ align=right }](./create-data.md) diff --git a/docs/platform/tutorials/train-and-deploy-ml/jupyter-results.png b/docs/platform/tutorials/train-and-deploy-ml/jupyter-results.png deleted file mode 100644 index f7d95dad..00000000 Binary files a/docs/platform/tutorials/train-and-deploy-ml/jupyter-results.png and /dev/null differ diff --git a/docs/platform/tutorials/train-and-deploy-ml/train-ml-model.md b/docs/platform/tutorials/train-and-deploy-ml/train-ml-model.md deleted file mode 100644 index ca5f90b7..00000000 --- a/docs/platform/tutorials/train-and-deploy-ml/train-ml-model.md +++ /dev/null @@ -1,301 +0,0 @@ -# How to train an ML model - -In this article, you will learn how to manage ML model training with -Quix. In this example, we will train a model to predict car braking on a -racing circuit 5 seconds ahead of time. - -## Watch -If you prefer watching instead of reading, we've recorded a short video: - - -## Why this is important - -With the Quix platform, you can leverage historic data to train your -model to react to data coming from source with milliseconds latency. - -## End result - -At the end of this article, we will end up with a **pickle file** -trained on historic data. - -![](train.png) - -## Preparation - -You will need Python3 installed. - -You’ll need some data stored in the Quix platform. You can use any of -our Data Sources available in the Code Samples, or just follow the -onboarding process when you [sign-up to Quix](https://portal.platform.quix.ai/self-sign-up/){target=_blank} - -!!! tip - - If in doubt, login to the Quix Portal, navigate to the `Code Samples` and deploy `Demo Data - Source`. - - This will provide you with some real-time data for your experiments. - -You’ll also need a Jupyter notebook environment to run your experiments -and load data for training. Please use ["How to work with Jupyter notebook"](../../how-to/jupyter-nb.md). 
- -### Install required libraries - -``` shell -python3 -m pip install seaborn -python3 -m pip install sklearn -python3 -m pip install mlflow -python3 -m pip install matplotlib -``` - -!!! note - - If you get an 'Access Denied' error installing mlflow try adding '--user' to the install command or run the installer from an Anaconda Powershell Prompt (with --user) - -!!! tip - - If you don’t see **Python3** kernel in your Jupyter notebook, execute the following commands in your python environment: - ``` python - python3 -m pip install ipykernel - python3 -m ipykernel install --user - ``` - -### Necessary imports - -To execute all code blocks below, you need to start with importing these -libraries. Add this code to the top of you Jupyter notebook. - -``` python -import math -import matplotlib.pyplot as plt -import mlflow -import numpy as np -import pandas as pd -import pickle -import seaborn as sns - -from sklearn import tree -from sklearn.model_selection import KFold -from sklearn.metrics import confusion_matrix, accuracy_score -from sklearn.tree import DecisionTreeClassifier -``` - -## Training ML model - -### Getting training data - -The Quix web application has a python code generator to help you connect your Jupyter notebook with Quix. - -You need to be logged into the platform for this: - -1. Select workspace (you likley only have one) - -2. Go to the Data Explorer - -3. Add a query to visualize some data. Select parameters, events, aggregation and time range - - !!! note - Select `Brake`, `Motion_WorldPositionX`, `Steer`, `Speed`, `Gear` parameters and turn off aggregation! - -4. Select the **Code** tab - -5. Ensure **Python** is the selected language - -![](connect-python.png) - -Copy the Python code to your Jupyter notebook and execute. - -![](jupyter-results.png) - -!!! tip - - If you want to use this generated code for a long time, replace the temporary token with a **PAT token**. 
See [authenticate your requests](../../../apis/data-catalogue-api/authenticate.md) for how to do that. - -### Preprocessing of features - -We will prepare data for training by applying some transformation on the downloaded data. - -Execute this in your notebook: - -``` python -## Convert motion to continuous values -df["Motion_WorldPositionX_sin"] = df["Motion_WorldPositionX"].map(lambda x: math.sin(x)) -df["Motion_WorldPositionX_cos"] = df["Motion_WorldPositionX"].map(lambda x: math.cos(x)) -``` - -#### Preprocessing of label - -Here we simplify braking to a boolean value. - -``` python -## Conversion of label -df["Brake_bool"] = df["Brake"].map(lambda x: round(x)) -``` - -### Generate advanced brake signal for training - -Now we need to shift breaking 5 seconds ahead to train the model to predict breaking 5 seconds ahead. - -``` python -## Offset dataset and trim it -NUM_PERIODS = -round(5e9/53852065.77281786) - -df["Brake_shifted_5s"] = df["Brake_bool"].shift(periods=NUM_PERIODS) -df = df.dropna(axis='rows') -``` - -Lets review it in plot: - -``` python -plt.figure(figsize=(15, 8)) -plt.plot(df["Brake_shifted_5s"]) -plt.plot(df["Brake_bool"]) -plt.legend(['Shifted', 'Unshifted']) -``` - -![](brake-shifted.png) - -### Fit, predict and score a model - -Calculate class weighting in case we gain any accuracy by performing -class balancing. - -``` python -Y = df["Brake_shifted_5s"] - -cw = {} -for val in set(Y): - cw[val] = np.sum(Y != val) - -print(cw) -``` - -#### Experiment - -In the following code snippet we are executing an experiment using -**MLflow**. Notice in last 3 lines that each experiment is logging -**MLflow** metrics for experiments comparison later. 
- -``` python -model_accuracy = pd.DataFrame(columns=[ - 'Baseline Training Accuracy', - 'Model Training Accuracy', - 'Baseline Testing Accuracy', - 'Model Testing Accuracy', -]) - -kfold = KFold(5, shuffle=True, random_state=1) - -with mlflow.start_run(): - class_weight = None - max_depth = 5 - features = ["Motion_WorldPositionX_cos", "Motion_WorldPositionX_sin", "Steer", "Speed", "Gear"] - - mlflow.log_param("class_weight", class_weight) - mlflow.log_param("max_depth", max_depth) - mlflow.log_param("features", features) - mlflow.log_param("model_type", "DecisionTreeClassifier") - - X = df[features] - decision_tree = DecisionTreeClassifier(class_weight=class_weight, max_depth=max_depth) - - for train, test in kfold.split(X): - X_train = X.iloc[train] - Y_train = Y.iloc[train] - X_test = X.iloc[test] - Y_test = Y.iloc[test] - - # Train model - decision_tree.fit(X_train, Y_train) - Y_pred = decision_tree.predict(X_test) - - # Assess accuracy - train_accuracy = round(decision_tree.score(X_train, Y_train) * 100, 2) - test_accuracy = round(decision_tree.score(X_test, Y_test) * 100, 2) - - Y_baseline_zeros = np.zeros(Y_train.shape) - baseline_train_accuracy = round(accuracy_score(Y_train, Y_baseline_zeros) * 100, 2) - Y_baseline_zeros = np.zeros(Y_test.shape) - baseline_test_accuracy = round(accuracy_score(Y_test, Y_baseline_zeros) * 100, 2) - - model_accuracy = model_accuracy.append({ - "Baseline Training Accuracy": baseline_train_accuracy, - "Model Training Accuracy": train_accuracy, - "Baseline Testing Accuracy": baseline_test_accuracy, - "Model Testing Accuracy": test_accuracy - }, ignore_index=True) - - mlflow.log_metric("train_accuracy", model_accuracy["Model Training Accuracy"].mean()) - mlflow.log_metric("test_accuracy", model_accuracy["Model Testing Accuracy"].mean()) - mlflow.log_metric("fit_quality", 1/abs(model_accuracy["Model Training Accuracy"].mean() - model_accuracy["Model Testing Accuracy"].mean())) -``` - -We review experiment model accuracy: - -``` 
python -model_accuracy -``` - -| | | | | | -| ----- | -------------------------- | ----------------------- | ------------------------- | ---------------------- | -| Depth | Baseline Training Accuracy | Model Training Accuracy | Baseline Testing Accuracy | Model Testing Accuracy | -| 0 | 88.97 | 97.93 | 86.49 | 86.49 | -| 1 | 87.59 | 97.24 | 91.89 | 83.78 | -| 2 | 89.04 | 96.58 | 86.11 | 88.89 | -| 3 | 88.36 | 97.95 | 88.89 | 83.33 | -| 4 | 88.36 | 97.95 | 88.89 | 80.56 | - -Table with model accuracy preview - -#### Prediction preview - -Let’s plot actual versus predicted braking using a trained model: - -``` python -f, (ax1, ax2) = plt.subplots(2, 1, sharey=True, figsize=(50,8)) -ax1.plot(Y) -ax1.plot(X["Speed"]/X["Speed"].max()) - -ax2.plot(decision_tree.predict(X)) -ax2.plot(X["Speed"]/X["Speed"].max()) -``` - -![](prediction.png) - -### Saving model - -When you are confident with the results, save the model into a file. - -``` python -pickle.dump(decision_tree, open('./decision_tree_5_depth.sav', 'wb')) -``` - -!!! tip - - Pickle file will be located in folder where jupyter notebook command was executed - -## MLflow - -To help you with experiments management, you can review experiments in -MLflow. - -!!! warning - - MLflow works only on MacOS, Linux or Windows linux subsystem (WSL). - -!!! tip - - To have some meaningful data, run the experiment with 3 different `max_depth` parameter. 
- -Let’s leave Jupyter notebook for now and go back to command line and run -MLflow server: - -``` python -mlflow ui -``` - -Select experiments to compare: - -![](experiments.png) - -Plot metrics from experiments: - -![](experiments-comparison.png) \ No newline at end of file diff --git a/docs/platform/tutorials/train-and-deploy-ml/train-ml.md b/docs/platform/tutorials/train-and-deploy-ml/train-ml.md new file mode 100644 index 00000000..e4667803 --- /dev/null +++ b/docs/platform/tutorials/train-and-deploy-ml/train-ml.md @@ -0,0 +1,256 @@ +# Train your ML model + +In this part you learn how to train an ML model in Jupyter Notebook, using the data you imported in the [previous part](./import-data.md). + +You write code to train your model on this data. You save your model to a Pickle file, which you then deploy in the [next part](./deploy-ml.md) of this tutorial. + +![](./images/train.png) + +This is just one approach that you might use if you are already familiar with Jupyter Notebook. You might also train your ML model directly in Quix using live data, or data played back using the [replay feature](../../how-to/replay.md). + +## Install the required libraries + +You now need to install some Python libraries on your system. This is the system where you are running Jupyter Notebook. If you do not have Jupyter Notebook installed, please refer to the [prerequisites](./index.md#prerequisites) for this tutorial. + +The following libraries are required to create the ML model and visualize your data. To install the libraries enter the following commands into a terminal on the system where you are running Jupyter Notebook: + +``` shell +pip install seaborn +pip install scikit-learn +pip install mlflow +pip install matplotlib +``` + +!!! note + + If you get an 'Access Denied' error when installing `mlflow`, try adding `--user` to the install command, for example `python3 -m pip install mlflow --user`. 
Alternatively, run the installer from an Anaconda Powershell Prompt with `--user`. + +## Include required imports + +To run the code in this tutorial, you need to import the installed libraries. + +1. Add the following code to your Jupyter Notebook: + + ``` python + import math + import matplotlib.pyplot as plt + import mlflow + import numpy as np + import pandas as pd + import pickle + import seaborn as sns + + from sklearn import tree + from sklearn.model_selection import KFold + from sklearn.metrics import confusion_matrix, accuracy_score + from sklearn.tree import DecisionTreeClassifier + ``` + +2. Click `Run` and ensure there are no errors. If you receive any errors, make sure you have installed the required libraries. + +You have now installed all the required libraries and tested their presence using the import statements. + +## Preprocessing of parameters + +You now need to prepare data for training by applying some transformations to the retrieved data. In this case you convert the world X position into continuous sine and cosine values. + +Run the following code in your Jupyter Notebook: + +``` python +## Convert motion to continuous values +df["Motion_WorldPositionX_sin"] = df["Motion_WorldPositionX"].map(lambda x: math.sin(x)) +df["Motion_WorldPositionX_cos"] = df["Motion_WorldPositionX"].map(lambda x: math.cos(x)) +``` + +## Convert the brake values to boolean + +Braking values range from fully off (`0`) to fully on (`1`), with values in between for partial braking. + +You can convert the braking value to a boolean value, using the Python `round()` function. Run the following code: + +``` python +## Conversion of label +df["Brake_bool"] = df["Brake"].map(lambda x: round(x)) +``` + +The lambda function applies rounding to all braking values in the data frame. + +## Generate advanced brake signal for training + +To train the model to predict braking 5 seconds ahead, you need to shift the braking values so that each row is labeled with the braking that occurs 5 seconds later.
Run the following code: + +``` python +## Offset dataset and trim it +NUM_PERIODS = -round(5e9/53852065.77281786)  # number of samples in 5 seconds (5e9 ns / average sample period in ns) + +df["Brake_shifted_5s"] = df["Brake_bool"].shift(periods=NUM_PERIODS) +df = df.dropna(axis='rows') # clean out null values +``` + +To plot the resulting data, run the following code in your Jupyter Notebook: + +``` python +plt.figure(figsize=(15, 8)) +plt.plot(df["Brake_shifted_5s"]) +plt.plot(df["Brake_bool"]) +plt.legend(['Shifted', 'Unshifted']) +``` + +The resulting plot is as follows: + +![](./images/brake-shifted.png) + +You can see from the plot that the blue data (shifted) has been shifted 5 seconds ahead of the orange data (unshifted). + +## Fit, predict, and score a model + +You now calculate class weights, in case class balancing improves accuracy: + +``` python +Y = df["Brake_shifted_5s"] + +cw = {} +for val in set(Y): + cw[val] = np.sum(Y != val) + +print(cw) +``` + +This prints a value such as: + +``` +{0.0: 2859, 1.0: 19944} +``` + +## Training the model + +In the following code snippet you run an experiment using **MLflow**. Notice that in the last three lines each experiment logs **MLflow** metrics, so that experiments can be compared later.
Run the following code in your Jupyter Notebook: + +``` python +model_accuracy = pd.DataFrame(columns=[ + 'Baseline Training Accuracy', + 'Model Training Accuracy', + 'Baseline Testing Accuracy', + 'Model Testing Accuracy', +]) + +kfold = KFold(5, shuffle=True, random_state=1) + +with mlflow.start_run(): + class_weight = None + max_depth = 5 + features = ["Motion_WorldPositionX_cos", "Motion_WorldPositionX_sin", "Steer", "Speed", "Gear"] + + mlflow.log_param("class_weight", class_weight) + mlflow.log_param("max_depth", max_depth) + mlflow.log_param("features", features) + mlflow.log_param("model_type", "DecisionTreeClassifier") + + X = df[features] + decision_tree = DecisionTreeClassifier(class_weight=class_weight, max_depth=max_depth) + + for train, test in kfold.split(X): + X_train = X.iloc[train] + Y_train = Y.iloc[train] + X_test = X.iloc[test] + Y_test = Y.iloc[test] + + # Train model + decision_tree.fit(X_train, Y_train) + Y_pred = decision_tree.predict(X_test) + + # Assess accuracy + train_accuracy = round(decision_tree.score(X_train, Y_train) * 100, 2) + test_accuracy = round(decision_tree.score(X_test, Y_test) * 100, 2) + + Y_baseline_zeros = np.zeros(Y_train.shape) + baseline_train_accuracy = round(accuracy_score(Y_train, Y_baseline_zeros) * 100, 2) + Y_baseline_zeros = np.zeros(Y_test.shape) + baseline_test_accuracy = round(accuracy_score(Y_test, Y_baseline_zeros) * 100, 2) + + model_accuracy_i = pd.DataFrame({ + "Baseline Training Accuracy": [baseline_train_accuracy], + "Model Training Accuracy": [train_accuracy], + "Baseline Testing Accuracy": [baseline_test_accuracy], + "Model Testing Accuracy": [test_accuracy]}) + model_accuracy = pd.concat([model_accuracy, model_accuracy_i]).reset_index(drop=True) + + mlflow.log_metric("train_accuracy", model_accuracy["Model Training Accuracy"].mean()) + mlflow.log_metric("test_accuracy", model_accuracy["Model Testing Accuracy"].mean()) + mlflow.log_metric("fit_quality", 1/abs(model_accuracy["Model Training 
Accuracy"].mean() - model_accuracy["Model Testing Accuracy"].mean())) +``` + +You can review the model accuracy of the experiment by typing the following Python into your Jupyter Notebook: + +``` python +model_accuracy +``` + +This displays data on the model accuracy: + +| Fold | Baseline Training Accuracy | Model Training Accuracy | Baseline Testing Accuracy | Model Testing Accuracy | +| ----- | -------------------------- | ----------------------- | ------------------------- | ---------------------- | +| 0 | 88.97 | 97.93 | 86.49 | 86.49 | +| 1 | 87.59 | 97.24 | 91.89 | 83.78 | +| 2 | 89.04 | 96.58 | 86.11 | 88.89 | +| 3 | 88.36 | 97.95 | 88.89 | 83.33 | +| 4 | 88.36 | 97.95 | 88.89 | 80.56 | + +## Prediction preview + +You can plot the actual braking against the predicted braking using the trained model with the following code: + +``` python +f, (ax1, ax2) = plt.subplots(2, 1, sharey=True, figsize=(50,8)) +ax1.plot(Y) +ax1.plot(X["Speed"]/X["Speed"].max()) + +ax2.plot(decision_tree.predict(X)) +ax2.plot(X["Speed"]/X["Speed"].max()) +``` + +Running this code in your Jupyter Notebook results in the following plot: + +![Braking prediction plot](./images/braking-prediction.png) + +The blue line is the predicted braking, and the orange line is the normalized speed of the car. You can see that the car slows in correlation with the predicted braking. + +## Saving the model + +When you are confident with the results, save the model into a Pickle file: + +``` python +pickle.dump(decision_tree, open('./decision_tree_5_depth.sav', 'wb')) +``` + +!!! tip + + The Pickle file is located in the folder where the `jupyter notebook` command was invoked. + +## MLflow + +To help you manage your experiments, you can review them in MLflow. + +!!! warning + + MLflow works only on macOS, Linux, or Windows Subsystem for Linux (WSL). + +!!! tip + + To have some meaningful data, run the experiment with three different values for the `max_depth` parameter.
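The tip above suggests rerunning the experiment with several `max_depth` values so that MLflow has runs to compare. A minimal sketch of such a sweep is shown below. Note the assumptions: a small synthetic dataset stands in for the tutorial's telemetry data frame, and the MLflow logging calls from the training snippet are shown only as comments, so treat this as an outline rather than the tutorial's exact code.

``` python
# Sketch only: synthetic data stands in for the telemetry data frame.
# In the real notebook you would reuse X and Y from the training snippet,
# and uncomment the MLflow calls to log each run.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))                  # five features, like the tutorial
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in "brake in 5 s" label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

results = {}
for max_depth in (3, 5, 10):                   # three runs to compare in MLflow
    # with mlflow.start_run():
    #     mlflow.log_param("max_depth", max_depth)
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=1)
    model.fit(X_train, y_train)
    test_accuracy = round(model.score(X_test, y_test) * 100, 2)
    #     mlflow.log_metric("test_accuracy", test_accuracy)
    results[max_depth] = test_accuracy

print(results)
```

Each pass would then appear as a separate run in the MLflow UI, so the effect of `max_depth` on test accuracy can be compared side by side.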
+ +In a terminal window, run the following command to launch the MLflow server: + +``` shell +mlflow ui +``` + +In the UI you can select the experiments to compare: + +![Select experiment to compare](./images/select-experiments.png) + +You can also plot metrics from experiments: + +![Experiments compared](./images/compare-experiments.png) + +[Deploy your Machine Learning (ML) model :material-arrow-right-circle:{ align=right }](deploy-ml.md) diff --git a/docs/platform/tutorials/train-and-deploy-ml/visualize-result.png b/docs/platform/tutorials/train-and-deploy-ml/visualize-result.png deleted file mode 100644 index 64c54ac4..00000000 Binary files a/docs/platform/tutorials/train-and-deploy-ml/visualize-result.png and /dev/null differ diff --git a/docs/platform/what-is-quix.md b/docs/platform/what-is-quix.md index 0789df1b..0e1f1d2b 100644 --- a/docs/platform/what-is-quix.md +++ b/docs/platform/what-is-quix.md @@ -60,7 +60,7 @@ Quix provides four APIs to help you work with streaming data. These include: * [**Stream Reader API**](../apis/streaming-reader-api/intro.md): enables you to push live data from a Quix topic to your application, ensuring low latency by avoiding any disk operations. -* [**Data Catalogue API**](../apis/data-catalogue-api/intro.md): enables you to query historic data streams in the data catalogue, in order to train ML models, build dashboards, and export data to other systems. +* [**Data Catalogue API**](../apis/data-catalogue-api/intro.md): enables you to query historical data streams in the data catalogue, in order to train ML models, build dashboards, and export data to other systems. * [**Portal API**](../apis/portal-api.md): enables you to automate Quix Portal tasks such as creating workspaces, topics, and deployments.
diff --git a/mkdocs.yml b/mkdocs.yml index 940cfbd8..0cd53df9 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -54,10 +54,14 @@ nav: - 'No code sentiment analysis': 'platform/tutorials/nocode-sentiment/nocode-sentiment-analysis.md' - 'RSS Processing': 'platform/tutorials/rss-tutorial/rss-processing-pipeline.md' - 'Currency Alerting': 'platform/tutorials/currency-alerting/currency-alerting.md' - - 'Train and deploy an ML model': - - 'How to train an ML model': 'platform/tutorials/train-and-deploy-ml/train-ml-model.md' - - 'Run ML model in realtime environment': 'platform/tutorials/train-and-deploy-ml/deploy-ml.md' - - 'Data Science': 'platform/tutorials/data-science/data-science.md' + - 'Real-time Machine Learning (ML) pipelines': + - platform/tutorials/train-and-deploy-ml/index.md + - 'Create your data': 'platform/tutorials/train-and-deploy-ml/create-data.md' + - 'Import data into Jupyter': 'platform/tutorials/train-and-deploy-ml/import-data.md' + - 'Train your ML model': 'platform/tutorials/train-and-deploy-ml/train-ml.md' + - 'Deploy your ML model': 'platform/tutorials/train-and-deploy-ml/deploy-ml.md' + - 'Conclusion': 'platform/tutorials/train-and-deploy-ml/conclusion.md' + - 'Data Science': 'platform/tutorials/data-science/index.md' - 'Data Stream Processing': 'platform/tutorials/data-stream-processing/data-stream-processing.md' - 'Slack Alerting': 'platform/tutorials/slack-alerting/slack-alerting.md' - 'Code Samples': 'platform/samples/samples.md' @@ -129,6 +133,7 @@ plugins: 'sdk/introduction.md': 'client-library-intro.md' 'platform/tutorials/imageProcessing/imageProcessing.md': 'platform/tutorials/image-processing/index.md' 'platform/intro.md': 'platform/what-is-quix.md' + 'platform/tutorials/train-and-deploy-ml/train-ml-model.md': 'platform/tutorials/train-and-deploy-ml/index.md' theme: name: 'material'