diff --git a/docs/platform/tutorials/currency-alerting/currency-alerting.md b/docs/platform/tutorials/currency-alerting/currency-alerting.md index 8c3e6e69..84229f1c 100644 --- a/docs/platform/tutorials/currency-alerting/currency-alerting.md +++ b/docs/platform/tutorials/currency-alerting/currency-alerting.md @@ -202,7 +202,6 @@ To learn more, try one of these tutorials: * [Build a live video processing pipeline using the Transport for London (TfL) traffic cameras and the YOLO ML model for object detection](../image-processing/index.md) * [Perform sentiment analysis on a stream of Tweets about a given subject](../sentiment-analysis/index.md) * [Gather and processes data from an RSS feed and get an alert when specific criteria are met](../rss-tutorial/rss-processing-pipeline.md) -* [Stream and visualize real-time telemetry data with an Android app and Streamlit](../telemetry-data/telemetry-data.md) !!! tip "Getting Help" diff --git a/docs/platform/tutorials/data-science/1-bikedata.md b/docs/platform/tutorials/data-science/1-bikedata.md new file mode 100644 index 00000000..8d5562d5 --- /dev/null +++ b/docs/platform/tutorials/data-science/1-bikedata.md @@ -0,0 +1,21 @@ +# CitiBike data + +Start by getting the real-time bicycle data. Use the Quix CitiBikes connector to get real-time bicycle availability data (it doesn't require a sign up or any keys). + +You won't need to write lots of code, as you will use the Quix Code Samples to deploy a prebuilt service that streams data from the New York CitiBikes API: + +1. Navigate to `Code Samples` using the left-hand menu and search for `New York` then select the `New York Bikes` tile. + + ![NY Bikes sample tile](./images/ny-bikes-library-tile.png){width=200px} + +2. Click `Setup and deploy`: + + a. Leave the `Name` as it is. + + b. Ensure the `output` is set to `bikes-topic`. + +3. Click `Deploy`. + + The precompiled service is deployed to your workspace and begins running immediately. 
+ +[Part 2 - Weather data :material-arrow-right-circle:{ align=right }](2-weatherdata.md) \ No newline at end of file diff --git a/docs/platform/tutorials/data-science/2-weatherdata.md b/docs/platform/tutorials/data-science/2-weatherdata.md new file mode 100644 index 00000000..8f50e6d0 --- /dev/null +++ b/docs/platform/tutorials/data-science/2-weatherdata.md @@ -0,0 +1,51 @@ +# Weather data + +You now have a working real-time stream of bicycle data. Next, you will integrate the data from a free weather API, adding current and forecasted weather data. + +## Create a free Visual Crossing account + +!!! info + + [Visual Crossing](https://www.visualcrossing.com/){target=_blank} is a leading provider of weather data and enterprise analysis tools to data scientists, business analysts, professionals, and academics. + +1. Go to the [Visual Crossing sign up page](https://www.visualcrossing.com/sign-up){target=_blank}. + +2. Follow the instructions to create your account. + +3. Go to the [Account](https://www.visualcrossing.com/account){target=_blank} page to copy your key. + + Keep it safe for later. + +## Weather real-time stream + +You can now deploy the VisualCrossing connector from the Quix Code Samples: + +1. Search the Code Samples for `weather` and select the `VisualCrossing Weather` tile. + +2. Click `Setup and deploy`. + +3. Leave the `Name` as it is. + +4. Ensure the `output` is set to `weather-topic`. + +5. Paste your API key into the `api_token` field. This is the API key you obtained from your Visual Crossing account page. + +6. Click `Deploy`. + + The precompiled service is deployed to your workspace and begins running immediately. + + !!! warning "Visual Crossing usage limitation" + + The Visual Crossing API has limits on how much data you can access for free and the real weather only changes in real-time (this means slowly). 
+ + The free Visual Crossing account is limited to 1000 records per day. To prevent your account from being rate limited, the connector is coded to consume data every 2 minutes; however, you can trigger collection of new data by restarting the service as needed. You will do this several times throughout this tutorial. + +## Summary + +At this stage you have two services running. + +One is publishing `New York CitiBike` data to a topic called `bikes-topic` and another is publishing `Visual Crossing` weather data to a topic called `weather-topic`. + +![Successfully deployed pipeline](./images/early-success.png) + +[Part 3 - Data views :material-arrow-right-circle:{ align=right }](3-data.md) diff --git a/docs/platform/tutorials/data-science/3-data.md b/docs/platform/tutorials/data-science/3-data.md new file mode 100644 index 00000000..49b8e66b --- /dev/null +++ b/docs/platform/tutorials/data-science/3-data.md @@ -0,0 +1,53 @@ +# View and store the data + +With the Quix Platform it's easy to visualize your data in a powerful and flexible way: you can see the data in real time, as well as view historical data. + +The Quix Platform was designed for real-time data, so if you want to see data-at-rest for any topic you must turn on data persistence for that specific topic. You'll do this in the [historical data](#historical-data) section. + +## Real-time data + +Follow these steps to view real-time data as it arrives in your topics: + +1. On the `Pipeline` page, click on the arrow coming out of the `New York Bikes` service. If there is data being emitted this arrow is green, otherwise it is gray. + +2. Click `Explore live data` on the context menu. + +3. Select a stream from the streams listed under `SELECT STREAMS`. + + !!! note + + If there are no streams under `SELECT STREAMS`, wait a few moments; the New York CitiBike API is queried every few seconds. + +4. Select the parameters in the `SELECT PARAMETERS OR EVENTS` list. + +5. 
Select the `Table` tab in the top middle of the page. + +6. After a few moments you will see data being shown in the table. + + ![CitiBike data](./images/data.png) + +!!! tip "Be patient" + + If you don't see any `streams`, `parameters` or data, just wait a moment or two. The next time data arrives these will be populated automatically. + +Now you know how to observe data arriving into your topics. You can also explore the `Waveform` tab to see numeric data in a graphical form and the `Messages` tab to see the raw messages in [JSON format](https://en.wikipedia.org/wiki/JSON){target=_blank}. + +## Historical data + +In order to train a machine learning model on historical data, the live real-time data being ingested needs to be stored. However, topics are designed for real-time data, not for data storage. To solve this, Quix allows you to store the data flowing through a topic in an efficient real-time database if you need it. + +Enable persistence on your topics: + +1. Navigate to the `Topics` page using the left-hand navigation. + +2. Locate the topic(s) you want to store data for (in this case `bikes-topic` and `weather-topic`). + +3. For each topic, click the toggle in the `Persistence` column to `on`. + +4. Finally, go back to the `Pipeline` page. + + Now, to ensure there is some historical data stored in the weather topic, stop and then start the `VisualCrossing Weather` service. This will force the service to collect and publish fresh data to the `weather-topic` without waiting for the next scheduled collection. + + You will need this historical data in the next section, where you will learn how to retrieve data for training a model. 
+ +[Part 4 - Get data to train a model :material-arrow-right-circle:{ align=right }](4-train.md) diff --git a/docs/platform/tutorials/data-science/4-train.md b/docs/platform/tutorials/data-science/4-train.md new file mode 100644 index 00000000..fe5674ae --- /dev/null +++ b/docs/platform/tutorials/data-science/4-train.md @@ -0,0 +1,100 @@ +# Training data + +Quix gives you the freedom to train the ML model your own way. If you already have tools and processes for doing that, you can train the model and then use it in the Quix Platform, where you can run it in real time. + +Follow along and learn how to retrieve historical data from your topics, so you can train your model. + +## Limited data set + +You've already read about the limitations of the free Visual Crossing API. It only allows 1000 requests for new data per day, so the Quix Code Sample only requests data every 2 minutes. Therefore, at this point in the tutorial you may have a limited data set. + +Continue following the tutorial to see how to access the accumulated historical data. After a few hours you will have more data to examine, at which time you can repeat these steps. + +## Get the data + +To access historical data: + +1. Click `Persisted data` in the left-hand navigation. + +2. Select the `bikes-topic` in the left-hand panel. + +3. Mouse over a stream name in the table and click the `Visualize stream` icon. + + The page will navigate to the `Data explorer`. + +4. You will see one of two scenarios represented in the `query builder` on the left-hand side. + + Select the tab most applicable to what you see: + 
+ + === "Pre-filled" + + If you see a prepopulated query builder: + + ![Populated query builder](./images/query-a.png){width=250px} + + Follow these steps: + + 1. Select the `+` under `SELECT (Parameters & Events)`. + + 2. Select `total_num_bikes_available` from the list. + + 3. Again select the `+` under `SELECT (Parameters & Events)`. + + 4. Select `num_docks_available` from the list. + + + === "Empty" + + If you see an empty query builder: + + ![Un-populated query builder](./images/query-b.png){width=250px} + + Follow these steps: + + 1. Click `Add Query`. + + 2. Select `bikes-topic` under `From topic`. + + 3. Select the `New York Total Bikes Real Time` stream. + + 4. Click `Next`. + + 5. Select both parameters, that is both `num_docks_available` and `total_num_bikes_available`. + + 6. Click `Done`. + +
+ + Whichever option you used, you should be looking at a visualization of the two selected parameters: + + ![Data explorer](./images/data-explorer.png){width=600px} + + Note that your data won't look the same as ours, so don't be concerned if your results differ. + +5. Switch off `aggregation` to see all of the data. + +6. Select the `Code` tab to view the code to access this data set from outside of Quix. + + ![Data explorer settings](./images/data-explorer-settings.png) + + !!! hint + + You can copy and paste this code into a [`Jupyter Notebook`](https://jupyter.org/){target=_blank} or [`Google Colab Notebook`](https://colab.research.google.com/){target=_blank} and run it to get your data there. + + ![Colab Notebook](./images/results.png){width=450px} + +## Train the model + +At this point, you are collecting historical data and you know how to query it for use outside the Quix Platform to train your ML models. + +???- example "Need help training a model?" + + Follow our "How to train an ML model" tutorial [here](../train-and-deploy-ml/train-ml-model.md). + + You are walked through the process of getting the code to access the data (as described above), running the code in a Jupyter notebook, training the model and uploading your pickle file to Quix. + +It would take several weeks to accumulate enough historical data to train a model, so you will continue the tutorial with some pre-trained models already built by the Quix team. This was done using the very same data flow you've just built. You can find the Jupyter notebook code used to train the model in the [Quix GitHub repo](https://github.com/quixio/NY-bikes-tutorial/blob/1stversion/notebooks-and-sample-data/04%20-%20Train%20ML%20models.ipynb){target=_blank}. 
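As a rough illustration of what such a training notebook does, here is a minimal sketch, assuming the queried parameters have been exported to a pandas DataFrame. The synthetic stand-in data, the one-step forecast horizon, and the model choice are assumptions for illustration only, not the Quix team's exact training code:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the parameters queried in the Data explorer;
# in practice this DataFrame would come from the Code tab's query code.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "total_num_bikes_available": 23000 + rng.integers(-500, 500, size=n),
    "num_docks_available": 30000 + rng.integers(-500, 500, size=n),
})

# Frame the task as forecasting: predict bike availability one step
# ahead by shifting the target column back by one row.
df["target_next"] = df["total_num_bikes_available"].shift(-1)
df = df.dropna()

X = df[["total_num_bikes_available", "num_docks_available"]]
y = df["target_next"]

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# The fitted model can then be pickled for use in a prediction service.
preds = model.predict(X.iloc[:5])
print(len(preds))  # 5
```

The pickled model file is the kind of artifact the tutorial's prediction service loads at runtime.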
+ +[Part 5 - Run the model :material-arrow-right-circle:{ align=right }](5-run.md) diff --git a/docs/platform/tutorials/data-science/5-run.md b/docs/platform/tutorials/data-science/5-run.md new file mode 100644 index 00000000..132a96ec --- /dev/null +++ b/docs/platform/tutorials/data-science/5-run.md @@ -0,0 +1,95 @@ +# 5. Run the model + +Quix has already trained model artifacts, and these are included as pickle files in the prediction code project. This project is included in the open source Code Samples. You will use the Code Sample to run the model. + +## Prediction service code + +Get the code for the prediction service: + +1. Click on `Code Samples` in the left-hand navigation. + +2. Search for `New York` and click the `New York Bikes - Prediction` tile. + +3. Click `Edit code`. + +4. Leave the `Name` as it is. + +5. Ensure the `bike_input` is set to `bikes-topic`. + +6. Ensure the `weather_input` is set to `weather-topic`. + +7. Ensure the `output` is set to `NY-bikes-prediction`. + +8. Click `Save as project`. + + This will save the code for this service to your workspace. + +!!! note "Free Models" + + Look in the `MLModels` folder for the Quix pretrained ML models. You can upload your own and compare them to ours. Let us know how they compare. + +## Run in the dev environment + +You can now run the prediction model from this 'dev' environment to make sure it's working before deploying it to an always-ready production environment. + +1. Click `Run` in the top right-hand corner. + +2. Observe the `Console` tab at the bottom of the screen. + + - Any packages that are needed will be installed. + + - Any topics that didn't previously exist will be created. + + - Then the code will run. + + - You will see a line similar to this in the console output. + + ```shell + Current n bikes: 23742 Forecast 1h: 23663 Forecast 1 day: 22831 + ``` + + !!! 
note "Note about data" + + For a new prediction to be generated, the service has to receive data from both the bikes feed (updated often) and the weather feed (updated less frequently). + + When you test the model, you may want to force the weather service to produce some new data (to avoid waiting for the next scheduled update) by restarting the service: stop it and then re-deploy it. By doing this, it will start generating predictions sooner. + +## Deploy the service + +Having verified that the code runs, you can now deploy it to the Quix serverless environment. Once deployed, it will run continuously, gathering data from the sources and producing predictions. + +1. Click `Running` to stop the code running. + +2. Click `Deploy` in the top right-hand corner near `Run`. + +3. On the `Deployment settings`, increase the memory to at least 1.5GB. + +4. Click `Deploy`. + + You will be redirected to the pipeline page and the code will be built, deployed and started. + +## See the model output + +Once the prediction service has started you can once more restart the `VisualCrossing Weather` service and view the data. + +You should be familiar with some of the following steps: + +1. Restart the `VisualCrossing Weather` service. + +2. Click `Persisted streams` in the left-hand menu. + +3. Click the toggle switch next to the `ny-bikes-prediction` topic to persist the data (wait for this to complete). + +4. Mouse over the `stream name` of one of the rows in the table. + +5. Click the `Visualize stream` button. + +6. Select both of the parameters (`timestamp_ny_prediction` and `forecast_1d`). + +7. You can select the `Waveform` tab to see a graphical representation of the forecast or select the `Table` tab to see the raw data. + +## Summary + +Congratulations, you have completed all the steps of this tutorial. The following page summarizes your learning and provides some suggestions for next steps to try. 
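The prediction service you deployed follows a common pattern: unpickle a trained model once at startup, then call `predict` on each new row of combined bike and weather features. A minimal sketch of that pattern follows; the file name, feature values, and the tiny stand-in model are hypothetical — the real service loads the pretrained models from its `MLModels` folder:

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Create and pickle a tiny stand-in model; in the tutorial, trained
# pickle files already exist in the prediction code project.
X_train = np.array([[23000.0, 30000.0], [22500.0, 30600.0], [24000.0, 29100.0]])
y_train = np.array([22900.0, 22400.0, 23900.0])
with open("model_1h.pkl", "wb") as f:
    pickle.dump(LinearRegression().fit(X_train, y_train), f)

# The service's core loop in miniature: load once, predict per message.
with open("model_1h.pkl", "rb") as f:
    model = pickle.load(f)

latest_features = np.array([[23742.0, 29500.0]])  # hypothetical latest row
forecast_1h = model.predict(latest_features)[0]
print(forecast_1h > 0)  # True
```

In the deployed service this prediction would be written to the output topic (`NY-bikes-prediction`) rather than printed.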
+ +[Conclusion and next steps :material-arrow-right-circle:{ align=right }](6-conclusion.md) diff --git a/docs/platform/tutorials/data-science/6-conclusion.md b/docs/platform/tutorials/data-science/6-conclusion.md new file mode 100644 index 00000000..b2c86a6f --- /dev/null +++ b/docs/platform/tutorials/data-science/6-conclusion.md @@ -0,0 +1,25 @@ +# 6. Conclusion + +In this tutorial you have learned how to ingest data using prebuilt code samples for real-time processing, how to view the data in different ways, and how to query the data to enable model training. + +## Code Samples used + +Here is a list of the open source Quix Code Samples used in this tutorial, with links to their code in GitHub: + +* [New York Bikes](https://github.com/quixio/quix-samples/tree/main/python/sources/NY-Citibikes){target=_blank} +* [VisualCrossing Weather](https://github.com/quixio/quix-samples/tree/main/python/sources/visualcrossing-weather){target=_blank} +* [New York Bikes - Prediction](https://github.com/quixio/quix-samples/tree/main/python/transformations/NY-Bikes-Predictions){target=_blank} + +## Next steps + +Here are some suggested next steps to continue on your Quix learning journey: + +* Try the [Event detection tutorial](../eventDetection/index.md). + +* If you decide to build your own connectors and apps, you can contribute something to the Code Samples. Visit the [GitHub Code Samples repository](https://github.com/quixio/quix-samples){target=_blank}. Fork our Code Samples repo and submit your code, updates, and ideas. + +What will you build? Let us know! Quix would like to feature your project or use case in our [newsletter](https://www.quix.io/community/){target=_blank}. + +## Getting help + +If you need any assistance, we're here to help in [The Quix Forum](https://forum.quix.io/){target=_blank}. Introduce yourself and then ask any questions in [Quix SaaS Platform](https://forum.quix.io/c/quix-saas-platform/6){target=_blank}. 
diff --git a/docs/platform/tutorials/data-science/data-explorer.png b/docs/platform/tutorials/data-science/data-explorer.png deleted file mode 100644 index 8adaa3c5..00000000 Binary files a/docs/platform/tutorials/data-science/data-explorer.png and /dev/null differ diff --git a/docs/platform/tutorials/data-science/data-science.md b/docs/platform/tutorials/data-science/data-science.md deleted file mode 100644 index bfb0e044..00000000 --- a/docs/platform/tutorials/data-science/data-science.md +++ /dev/null @@ -1,309 +0,0 @@ -# Data Science with Quix: NY Bikes - -Throughout this tutorial you will learn how to deploy a real-time data science project from scratch and into a scalable self-maintained solution. We will predict bike availability in New York by building the raw data ingestion pipelines, ETL and predictions. All in real time! - -## Introduction - -### Aim - -Quix allows you to create complex and efficient real time infrastructure in a simple and quick way. To show you that, you are going to build an application that uses real time New York bikes and weather data to predict the future availability of bikes in New York. - -In other words, you will complete all the typical phases of a data science project by yourself: - - - Build pipelines to gather bikes and weather data in real time - - - Store the data efficiently - - - Train some ML models with historical data - - - Deploy the ML models into production in real time - -This will typically take several people (Data Engineers, Data Scientists) and weeks of work, however you will complete this tutorial in under 90 minutes using Quix. - -### Prerequisites - -1. You will need to know how to train an ML model. - - ???- example "Want to learn it?" - - If you don't already know how to train an ML model, follow our "How to train an ML model" tutorial [here](../../tutorials/train-and-deploy-ml/train-ml-model.md). 
- - We walk you through the process of getting the code to access the data, running the code in a Jupyter notebook, training the model and uploading your pickle file to Quix. - - -2. You will need a Quix account and be logged into the [Portal](https://portal.platform.quix.ai/workspaces){target=_blank}. - - !!! tip - - Go [here](https://quix.io){target=_blank} to sign up if you need a free account. - - -### Overview - -This walk through covers the following steps: - -1. Create OpenWeather account (third party) - -2. Create a bikes data real time stream - -3. Create a weather forecast data real time stream - -4. Visualize the data - -5. Get data to train a model - -6. Deploy pre-trained ML models and produce predictions in real time - -7. See the models output - -## 1. Create OpenWeather account (free) - -!!! info - - [OpenWeather](https://openweathermap.org/){target=_blank} is a team of IT experts and data scientists that provides historical, current and forecasted weather data via light-speed APIs. - -1. Go to the [OpenWeather Sign Up page](https://home.openweathermap.org/users/sign_up/){target=_blank}. - -2. Click the "Sign Up" button and complete the dialog. Do the email and text message verifications. - -3. Then, go to the [OpenWeather API keys](https://home.openweathermap.org/api_keys){target=_blank} page to copy your key. Keep it safe for later. - -## 2. Bikes real time stream - -Start by getting the real time bikes stream. Use CityBikes to get real time bikes data (it doesn’t require a sign up or any keys). - -Instead of writing a lot of code you will use the Code Samples to deploy a pre-built service that streams data from the New York CitiBikes api. - -1. Navigate to `Code Samples` and search for `New York` and select the `New York Bikes - Source` tile. - - ![NY Bikes sample tile](ny-bikes-library-tile.png){width=400px} - - !!! tip - `Code Samples` is on the left hand menu - -2. Click `Setup and deploy` - - a. Leave the Name as it is - - b. 
Ensure the `output` is set to `bikes-topic` - -3. Click `Deploy` - - The pre-compiled service will be deployed to your workspace and will begin running immediately. - - -## 3. Weather real time stream - -You now have a working real time stream of bike data. Now use the OpenWeather account to create a real time weather stream. The procedure is almost the same, so you should have no problems! - -1. Search the Code Samples for `weather` and select the `Open Weather API` tile. - -2. Click `Setup and deploy` - - a. Leave the Name as it is - - b. Ensure the `output` is set to `weather-topic` - - c. Paste the key from your OpenWeather API keys page ([here](https://home.openweathermap.org/api_keys){target=_blank}) - - ![Open weather API page](open-weather-api.png){width=600px} - -3. Click `Deploy` - - The pre-compiled service will be deployed to your workspace and will begin running immediately. - - !!! note "OpenWeather limitation" - - The OpenWeather API has limits on how much data we can access for free and the real weather only changes in real-time (this means slowly). - - In order to prevent your account from being rate limited, we consume updated data every 30 minutes. (see more about this limitation later on in [training data](#training-data)) - -!!! success - - At this stage you should have two services running - - ![Successfully deployed pipeline](early-success.png) - - One service publishing New York CitiBike data to a topic and another publishing OpenWeather data. - -## 4. View and store the data - -With Quix it's easy to visualize your data in a powerful and flexible way, you can see the real-time data and view historical data. - -At it's heart Quix is a real-time data platform, so if you want to see data-at-rest for a topic, you must turn on data persistence for that topic (You'll do this [below](#historical)). - -### Real-time - -1. View the real-time data by clicking on the green arrow coming out of the `New York Bikes` service on the home page. - -2. 
Click `Explore live data` on the context menu - -3. Select a stream from the `select streams` list - -4. Select the parameters in the `select parameters or events` list - -5. Select the `Table` tab in the top middle of the screen - -!!! tip "Be patient" - - If you don't see any streams or parameters, just wait a moment or two. The next time data arrives these lists will be automatically populated. - -### Historical - -In order to train a machine learning model we will need to store the data we are ingesting so that we start building a historical dataset. However topics are real time infrastructures, not designed for data storage. To solve this, Quix allows you to send the data going through a topic to an efficient real time database if you need it: - -1. Navigate to the topics page using the left hand navigation - -2. Locate the topic(s) you want to store data for (in this case `bikes-topic` and `weather-topic`) - -3. Click the toggle in the Persistence column to `on` - -4. Finally, go back to the homepage. - - Now stop and then start the OpenWeather API service. This will collect and publish fresh data to the weather-topic. - -## 5. Train your model - -Quix gives you the freedom to train the ML model your way. If you already have tools and processes for doing that then great, you can train the model and import it into Quix so that you can run it in real-time. - -Follow the along and we'll show you how to get data out of Quix so you can train the model. - -### Training data - -We mentioned earlier in [Weather real time stream](#3-weather-real-time-stream) that free access to the OpenWeather API only allows us to consume new data every 30 minutes therefore, at this point you will have a limited data set. - -You can leave the data consumption process running overnight or for a few days to gather more data, but for the time being there's no problem in continuing with the tutorial with your limited historical data. - -#### Get the data - -1. 
Click `Persisted data` in the left hand navigation - -2. Select the `bikes-topic` in the left hand panel - -3. Mouse over a stream name in the table and click the `Visualize stream` button - - The page will navigate to the `Data explorer` - -4. In the query builder on the left hand side click the `+` under `SELECT (Parameters & Events)` - -5. Select `total_num_bikes_available` - -6. Click `+` again - -7. Select `num_docks_available` - -!!! success - - You should be looking at a visualization of the two selected parameters - - ![Data explorer](data-explorer.png){width=600px} - -8. Switch off `aggregation` to see all of the data - -9. Select the `Code` tab to view the code to access this data set from outside of Quix - -### Train the model - -At this point, you are generating historical data and know how to query it. You can train your ML models as soon as you've gathered enough data. - -!!! example "Need help?" - - Follow our "How to train an ML model" tutorial [here](../train-and-deploy-ml/train-ml-model.md) - - We walk you through the process of getting the code to access the data (as described above), running the code in a Jupyter notebook, training the model and uploading your pickle file to Quix. - -However, it would take several weeks to accumulate enough historical data to train a model, so let's continue the tutorial with some pre-trained models we have provided. We've done it using the very same data flow you've just built, and can find the Jupyter notebook code we used [here](https://github.com/quixio/NY-bikes-tutorial/blob/1stversion/notebooks-and-sample-data/04%20-%20Train%20ML%20models.ipynb){target=_blank}. - -## 6. Run the model - -We have included our trained model artifacts as pickle files in the prediction code project and uploaded it to the open source Code Samples, so let's use them. - -### Prediction service code - -Get the code for the prediction service. - -1. Click on `Code Samples` in the left hand navigation - -2. 
Search for `New York` and select the `New york Bikes - Prediction` tile - -3. Click `Edit code` - - a. Leave the Name as it is - - b. Ensure the `bike-input` is set to `bike-topic` - - c. Ensure the `weather-input` is set to `weather-topic` - - d. Ensure the `output` is set to `NY-bikes-prediction` - -4. Click `Save as project` - - This will save the code for this service to your workspace - -!!! note "Free Models" - - Look in the `MLModels` folder. We've stashed the pre-trained ML models here for you. You can upload your own and compare them to ours. (Let us know how they compare) - -### Run - -You can now run the prediction model from this 'dev' environment to make sure it's working before deploying it to an always ready, production environment. - -1. Click `run` in the top right hand corner. - -2. Observe the `console` tab at the bottom of the screen. - - Any packages that are needed will be installed. - - Any topics that didn't previously exist will be created. - - Then the code will run. - - You will see a line similar to this in the console output. - - ```shell - Current n bikes: 23742 Forecast 1h: 23663 Forecast 1 day: 22831 - ``` - - !!! note "Data" - - For a new prediction to be generated, the service has to receive data from both bikes (updated often) and weather feeds (only updated every 30 mins). - - When you test the model, you may want to force the weather service to produce some new data (to avoid waiting for 30 mins) by restarting the service: stop it and then re-deploy it. By doing this it will start generating predictions sooner. - -### Deploy - - With the code running we can deploy it to the Quix serverless environment. Here, it will run continuously, gathering data from the sources and producing predictions. - -1. Click stop if you haven't already done so. - -2. Click `Deploy` in the top right hand corner near `run` - -3. On the `Deployment settings`, increase the memory to at least 1.5GB - -4. 
Click `deploy` - - You will be redirected to the home page and the code will be built, deployed and started. - -## 7. See the models output - -Once the prediction service has started you can once more restart the 'Open Weather API' service and view the data. - -You should be familiar with some of the following steps, they are very similar to '[Get the data](#get-the-data)' above. - -1. Restart the 'Open Weather API' service - -2. Click `Persisted streams` in the left hand menu - -3. Click the toggle switch next to the `ny-bikes-prediction` topic to persist the data (wait for this to complete) - -4. Mouse over the `stream name` of one of the rows in the table - -5. Click the `Visualize stream` button - -6. Select both of the parameters (`timestamp_ny_prediction` and `forecast_1d`) - -7. You can select the `Waveform` tab to see a graphical representation of the forecast or select the `Table` tab to see the raw data. - -!!! success - - You made it to the end! Give yourself a high five. diff --git a/docs/platform/tutorials/data-science/early-success.png b/docs/platform/tutorials/data-science/early-success.png deleted file mode 100644 index d9054f51..00000000 Binary files a/docs/platform/tutorials/data-science/early-success.png and /dev/null differ diff --git a/docs/platform/tutorials/data-science/images/data-explorer-settings.png b/docs/platform/tutorials/data-science/images/data-explorer-settings.png new file mode 100644 index 00000000..7ba39490 Binary files /dev/null and b/docs/platform/tutorials/data-science/images/data-explorer-settings.png differ diff --git a/docs/platform/tutorials/data-science/images/data-explorer.png b/docs/platform/tutorials/data-science/images/data-explorer.png new file mode 100644 index 00000000..b0605d06 Binary files /dev/null and b/docs/platform/tutorials/data-science/images/data-explorer.png differ diff --git a/docs/platform/tutorials/data-science/images/data.png b/docs/platform/tutorials/data-science/images/data.png new file mode 100644 
index 00000000..05fc56bf
Binary files /dev/null and b/docs/platform/tutorials/data-science/images/data.png differ
diff --git a/docs/platform/tutorials/data-science/images/early-success.png b/docs/platform/tutorials/data-science/images/early-success.png
new file mode 100644
index 00000000..2fe68e9f
Binary files /dev/null and b/docs/platform/tutorials/data-science/images/early-success.png differ
diff --git a/docs/platform/tutorials/data-science/images/ny-bikes-library-tile.png b/docs/platform/tutorials/data-science/images/ny-bikes-library-tile.png
new file mode 100644
index 00000000..8b6eb783
Binary files /dev/null and b/docs/platform/tutorials/data-science/images/ny-bikes-library-tile.png differ
diff --git a/docs/platform/tutorials/data-science/open-weather-api.png b/docs/platform/tutorials/data-science/images/open-weather-api.png
similarity index 100%
rename from docs/platform/tutorials/data-science/open-weather-api.png
rename to docs/platform/tutorials/data-science/images/open-weather-api.png
diff --git a/docs/platform/tutorials/data-science/images/query-a.png b/docs/platform/tutorials/data-science/images/query-a.png
new file mode 100644
index 00000000..9b8e10c6
Binary files /dev/null and b/docs/platform/tutorials/data-science/images/query-a.png differ
diff --git a/docs/platform/tutorials/data-science/images/query-b.png b/docs/platform/tutorials/data-science/images/query-b.png
new file mode 100644
index 00000000..50892dd2
Binary files /dev/null and b/docs/platform/tutorials/data-science/images/query-b.png differ
diff --git a/docs/platform/tutorials/data-science/images/results.png b/docs/platform/tutorials/data-science/images/results.png
new file mode 100644
index 00000000..34ae27aa
Binary files /dev/null and b/docs/platform/tutorials/data-science/images/results.png differ
diff --git a/docs/platform/tutorials/data-science/index.md b/docs/platform/tutorials/data-science/index.md
new file mode 100644
index 00000000..4ed62585
--- /dev/null
+++ b/docs/platform/tutorials/data-science/index.md
@@ -0,0 +1,56 @@
+# Real-time Machine Learning (ML) predictions
+
+In this tutorial you will learn how to deploy a real-time data science project as a scalable, self-maintained solution. You create a service that predicts bicycle availability in New York by building the raw data ingestion pipelines, Extract Transform Load (ETL), and predictions.
+
+## Aim
+
+The Quix Platform enables you to harness complex, efficient real-time infrastructure quickly and simply. You are going to build an application that uses real-time New York bicycle data and weather data to predict the future availability of bikes in New York.
+
+You will complete all the typical phases of a data science project:
+
+ - Build pipelines to gather bicycle and weather data.
+
+ - Store the data efficiently.
+
+ - Train ML models with historical data.
+
+ - Deploy the ML models into production and create predictions in real time.
+
+This would traditionally take several people with a wide range of skills (Data Engineers, Data Scientists, and Developers) and weeks of work. However, you will complete this tutorial on your own in a fraction of the time using the Quix Platform.
+
+## Prerequisites
+
+This tutorial has the following prerequisites:
+
+1. You will need to know how to train an ML model.
+
+    ???- example "Want to learn it?"
+
+        If you don't already know how to train an ML model, follow our [How to train an ML model](../../tutorials/train-and-deploy-ml/index.md) tutorial.
+
+        We take you through the process of getting the code to access the data, running the code in a Jupyter notebook, training the model, and uploading your pickle file to the Quix Platform.
+
+
+2. You will need a Quix account and be logged into the [Quix Portal](https://portal.platform.quix.ai/workspaces){target=_blank}.
+
+    !!! tip
+
+        Go [here](https://quix.io){target=_blank} to sign up if you need a free account.
+
+## The parts of the tutorial
+
+This tutorial is divided into several parts, to make it a more manageable learning experience. The parts are summarized here:
+
+1. **Create a bikes data real-time stream**. Access real-time data from New York's CitiBikes API using a ready-made Code Sample.
+
+2. **Create a weather forecast data stream**. Add weather data using a free weather API.
+
+3. **Visualize the data**. View real-time and historical data in the visualization tools.
+
+4. **Get data to train a model**. Use the built-in tools to get training data.
+
+5. **Deploy pre-trained ML models and produce predictions in real time**. Use our pre-trained models to get CitiBike predictions based on historical bicycle availability and weather forecasts. You also use the built-in visualization tools to view the model's predictions.
+
+6. **Conclusion**. In this [concluding](6-conclusion.md) part you are presented with a summary of the work you have completed, and also some next steps for more advanced learning about the Quix Platform.
+
+[Part 1 - CitiBike data stream :material-arrow-right-circle:{ align=right }](1-bikedata.md)
diff --git a/docs/platform/tutorials/data-science/ny-bikes-library-tile.png b/docs/platform/tutorials/data-science/ny-bikes-library-tile.png
deleted file mode 100644
index 27211019..00000000
Binary files a/docs/platform/tutorials/data-science/ny-bikes-library-tile.png and /dev/null differ
diff --git a/docs/platform/tutorials/train-and-deploy-ml/conclusion.md b/docs/platform/tutorials/train-and-deploy-ml/conclusion.md
index e359ce56..f655a566 100644
--- a/docs/platform/tutorials/train-and-deploy-ml/conclusion.md
+++ b/docs/platform/tutorials/train-and-deploy-ml/conclusion.md
@@ -14,7 +14,7 @@ Here are some suggested next steps to continue on your Quix learning journey:
 
 * [Sentiment analysis tutorial](../sentiment-analysis/index.md) - In this tutorial you learn how to build a sentiment analysis pipeline, capable of analyzing real-time chat.
-* [Data science tutorial](../data-science/data-science.md) - In this tutorial you use data science to build a real-time bike availability pipeline.
+* [Real-time Machine Learning (ML) predictions](../data-science/index.md) - In this tutorial you use data science to build a real-time bike availability pipeline.
 
 What will you build? Let us know! We’d love to feature your project or use case in our [newsletter](https://www.quix.io/community/).
diff --git a/mkdocs.yml b/mkdocs.yml
index 48661bef..bff57d10 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,10 +1,10 @@
 site_name: 'Quix Docs'
 site_description: 'Quix Developer Documentation. Includes documentation (tutorials, reference guides, how-tos, concepts) for Quix Platform, Quix Streams client library, and REST and websocket APIs.'
 site_author: 'Quix.io'
-site_url: 'https://docs.quix.io'
+site_url: 'https://quix.io/docs'
 copyright: >
-    Copyright © 2020 - 2022 Quix Analytics, Ltd. –
+    Copyright © 2020 - 2023 Quix Analytics, Ltd. –
     Change cookie settings
 docs_dir: docs/
@@ -62,7 +62,14 @@ nav:
       - 'Train your ML model': 'platform/tutorials/train-and-deploy-ml/train-ml.md'
       - 'Deploy your ML model': 'platform/tutorials/train-and-deploy-ml/deploy-ml.md'
       - 'Conclusion': 'platform/tutorials/train-and-deploy-ml/conclusion.md'
-    - 'Data Science': 'platform/tutorials/data-science/index.md'
+    - 'Real-time Machine Learning (ML) predictions':
+      - 'platform/tutorials/data-science/index.md'
+      - '1. Bicycle data': 'platform/tutorials/data-science/1-bikedata.md'
+      - '2. Weather data': 'platform/tutorials/data-science/2-weatherdata.md'
+      - '3. Data views': 'platform/tutorials/data-science/3-data.md'
+      - '4. Get training data': 'platform/tutorials/data-science/4-train.md'
+      - '5. Run the model': 'platform/tutorials/data-science/5-run.md'
+      - '6. Conclusion': 'platform/tutorials/data-science/6-conclusion.md'
     - 'Data Stream Processing': 'platform/tutorials/data-stream-processing/data-stream-processing.md'
     - 'Slack Alerting': 'platform/tutorials/slack-alerting/slack-alerting.md'
   - 'Code Samples': 'platform/samples/samples.md'
@@ -105,11 +112,12 @@ nav:
     - 'Portal API': 'apis/portal-api.md'
 
 plugins:
+  - multirepo:
+      cleanup: true
   - search:
       separator: '[\s\-\.]'
   - social
   - glightbox
-  - multirepo
   - redirects:
       redirect_maps:
         'sdk-intro.md': 'client-library-intro.md'
@@ -135,6 +143,7 @@ plugins:
         'platform/tutorials/imageProcessing/imageProcessing.md': 'platform/tutorials/image-processing/index.md'
         'platform/intro.md': 'platform/what-is-quix.md'
         'platform/tutorials/train-and-deploy-ml/train-ml-model.md': 'platform/tutorials/train-and-deploy-ml/index.md'
+        "platform/tutorials/data-science/data-science.md": "platform/tutorials/data-science/intro.md"
 
 theme:
   name: 'material'
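
The new tutorial's core idea, as outlined in the added `index.md`, is joining a real-time bike-availability stream with weather data to feed a prediction model. The sketch below illustrates only the shape of that join; the field names and values are hypothetical, not the actual topic schemas produced by the Quix connectors.

```python
# Illustrative sketch (assumptions: field names and values are hypothetical,
# not the real schemas of the bikes-topic or weather-topic streams).
from datetime import datetime, timezone


def build_feature_row(bikes: dict, weather: dict) -> dict:
    """Join one bike-availability snapshot with current weather features
    into a single row, the kind of input a prediction model consumes."""
    return {
        "timestamp": bikes["timestamp"],
        "total_bikes_available": bikes["num_bikes_available"],
        "temperature_c": weather["temp_c"],
        "precipitation_mm": weather["precip_mm"],
    }


if __name__ == "__main__":
    # Hypothetical snapshot of city-wide availability plus a forecast reading.
    bikes = {
        "timestamp": datetime(2023, 1, 9, 12, 0, tzinfo=timezone.utc).isoformat(),
        "num_bikes_available": 4200,
    }
    weather = {"temp_c": 3.5, "precip_mm": 0.0}
    print(build_feature_row(bikes, weather))
```

In the tutorial itself this join happens continuously inside a deployed service, with each topic delivering its stream in real time.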