# A framework for solving business problems with algorithms

<style>
  a {
    color: #F95D2C !important;
  }
</style>

As businesses increasingly integrate algorithms into their operations, a structured framework for solving problems algorithmically becomes essential. From streamlining manual processes to delivering predictive insights, algorithms can transform operations at a speed that surpasses human capabilities.

```{figure} ./images/img1.png
---
scale: 35%
align: center
---
{cite}`prinofour2019a`.
```

This module introduces a **five-step framework** for leveraging algorithms to automate manual processes. By following these steps, you will gain the skills needed to make informed decisions and achieve better outcomes in a wide variety of business scenarios.


```{figure} ./images/step1.png
---
scale: 10%
align: left
---
```

## Identify the problem and define the task
``````{margin}
`````{admonition} Example: Pain points
:class: important, dropdown
A team may be spending a significant amount of time manually processing documents, or a client may be struggling with frequent errors in data entry, leading to delays and inaccuracies in their operations. 

These scenarios are prime candidates for algorithmic solutions.
`````
``````
---
To start, we need to translate the problem clearly into a compatible machine learning task. A common way to identify opportunities for algorithmic implementation is to assess **pain points** within one's own team or challenges faced by external stakeholders, such as clients.

Machine learning algorithms excel at handling tedious tasks that are relatively simple. Therefore, once a problem has been identified, it helps to isolate a specific sub-task that can be framed as a binary decision (true or false), a choice among multiple labels, a continuous outcome, or an unsupervised learning problem.

However, before we can proceed to implement a suitable algorithm, we need to consider the availability of the input data! Do we have enough data to model the problem? How many sources of data do we need? 
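
As a rough illustration of the task framings described above, the sketch below guesses a framing from a handful of example target values. The function name `suggest_task_type` and its rules are assumptions made for this example; in practice the framing is chosen with domain knowledge rather than by inspecting values.

```python
from typing import Optional, Sequence


def suggest_task_type(targets: Optional[Sequence[object]]) -> str:
    """Suggest a machine learning framing from example target values.

    Rough heuristic for illustration only (hypothetical helper).
    """
    if not targets:
        # No labelled outcomes available: only unsupervised methods apply.
        return "unsupervised learning"
    if all(isinstance(t, float) for t in targets):
        # Real-valued targets suggest a continuous outcome.
        return "regression (continuous outcome)"
    if len(set(targets)) == 2:
        # Exactly two distinct labels: a true/false style decision.
        return "binary classification"
    # Anything else is treated as a choice among several labels.
    return "multi-class classification"


print(suggest_task_type([True, False, True]))
print(suggest_task_type(["invoice", "receipt", "contract"]))
print(suggest_task_type([12.5, 80.0, 3.2]))
print(suggest_task_type(None))
```

Even a crude check like this can prompt the right conversation early: if no labelled outcomes exist yet, either labels need to be collected or the sub-task needs to be reframed.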

```{figure} ./images/step2.png
---
scale: 10%
align: left
---
```

## Collect and pre-process data
``````{margin}
`````{admonition} Who can help?
:class: seealso, dropdown
Liaise with **data engineers** or **cloud solution architects** to access the necessary data through either a manual export or an automated pipeline.

You may need to obtain credentials to gain access to cloud-based pipelines, or third-party APIs if external data is required.
`````
``````
---

Data serves as the foundation for any algorithmic solution, making it essential to gather relevant and high-quality data that accurately represents the problem domain. Depending on the nature of the problem, data can be sourced from various channels, including databases, APIs, or web scraping.

Once data is collected, it needs to be pre-processed, which involves cleaning, transforming, and organising the data so that it is ready for machine learning. For example, if you plan on using a neural network, you will need to convert any categorical data into a numeric form, such as binary (one-hot) indicator columns, before you can begin the training process.
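
To make the categorical conversion concrete, here is a minimal sketch of one-hot encoding in plain Python. The helper `one_hot_encode` and the `doc_types` column are invented for illustration; in a real project you would more likely rely on an established utility such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder`.

```python
from typing import Dict, List


def one_hot_encode(values: List[str]) -> Dict[str, List[int]]:
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical column: the document type attached to each record.
doc_types = ["invoice", "receipt", "invoice", "contract"]
encoded = one_hot_encode(doc_types)
for category, column in encoded.items():
    print(category, column)
# contract [0, 0, 0, 1]
# invoice [1, 0, 1, 0]
# receipt [0, 1, 0, 0]
```

Each record ends up with exactly one `1` across the new columns, which gives the model numeric inputs without implying any ordering between categories.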

```{figure} ./images/step3.png
---
scale: 10%
align: left
---
```

## Select the right algorithm
---
It is important to consider algorithms that are best suited to handle the input data and the complexity of the problem, while also ensuring scalability for the required volume of data.

``````{margin}
`````{admonition} Storing models
:class: note
To facilitate the next steps, consider storing your trained model(s) as a **pickle file**. Pickle files allow you to easily reload the trained model to make predictions on new data without having to retrain the model from scratch.

For more information on pickle files, see the [documentation](https://docs.python.org/3/library/pickle.html).

DataCamp also has a great tutorial on the topic [here](https://www.datacamp.com/tutorial/pickle-python-tutorial).
`````
``````
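
As a small sketch of storing a model with Python's built-in `pickle` module, the example below saves and reloads a stand-in "model". The dictionary of parameters and the file name `model.pkl` are assumptions for illustration; most Python model objects can be stored with the same two calls.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model: the fitted parameters of a straight line.
trained_model = {"slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"  # hypothetical file name

    # Save: pickle files must be opened in binary mode.
    with model_path.open("wb") as f:
        pickle.dump(trained_model, f)

    # Reload later without retraining.
    with model_path.open("rb") as f:
        reloaded_model = pickle.load(f)

# Use the reloaded parameters to predict for a new input of 3.0.
prediction = reloaded_model["slope"] * 3.0 + reloaded_model["intercept"]
print(prediction)  # 7.0
```

Only load pickle files from sources you trust, since the Python documentation notes that unpickling untrusted data is not secure.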

If uncertain, begin by experimenting with simpler models, such as traditional tree-based or error-based algorithms, and progressively explore more sophisticated options. However, higher complexity doesn't necessarily guarantee better performance; simpler models can often prove more effective and cost-efficient.

```{figure} ./images/linear_regression.png
---
scale: 50%
align: center
---
{cite}`hanson2016`.
```

Remember to evaluate the algorithm's performance using appropriate metrics and validation techniques. Assessing accuracy, computational efficiency, and interpretability will help you make an informed decision.
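
One way to put this into practice is to compare candidate models on data held back from training. The sketch below does so for a trivial mean baseline and a least-squares line, scored by mean absolute error (MAE). The data, the split point, and all function names are invented for illustration; a random split or cross-validation would be more usual than the fixed split used here to keep the example deterministic.

```python
from statistics import mean
from typing import Callable, List

Model = Callable[[float], float]


def fit_mean_baseline(xs: List[float], ys: List[float]) -> Model:
    """Simplest possible model: always predict the training mean."""
    average = mean(ys)
    return lambda x: average


def fit_linear(xs: List[float], ys: List[float]) -> Model:
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_absolute_error(model: Model, xs: List[float], ys: List[float]) -> float:
    """Average size of the prediction error on the given data."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Toy data, invented for illustration: roughly y = 2x + 1 with small noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9]

# Hold out the last quarter of the data for validation.
split = 6
train_x, train_y = xs[:split], ys[:split]
valid_x, valid_y = xs[split:], ys[split:]

scores = {
    "mean baseline": mean_absolute_error(
        fit_mean_baseline(train_x, train_y), valid_x, valid_y
    ),
    "linear regression": mean_absolute_error(
        fit_linear(train_x, train_y), valid_x, valid_y
    ),
}
for name, score in scores.items():
    print(f"{name}: validation MAE = {score:.2f}")
```

Including a trivial baseline makes the comparison meaningful: a more complex model earns its place only if it clearly beats the simplest alternative on unseen data.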

```{figure} ./images/step4.png
---
scale: 10%
align: left
---
```

## Implement and test in a development environment
``````{margin}
`````{admonition} Cloud-based technologies
:class: note

Cloud platforms offer a range of services that can be used to build a pipeline for training and testing the model. These technologies are well suited to scalable, flexible, and efficient data processing.

You can use a cloud platform like [Azure](https://azure.microsoft.com/en-au/) or [AWS](https://aws.amazon.com/) to set up a pipeline that automates the process of training and testing the model.
`````
``````
---
To bring the selected algorithm to life, the next step is to implement and test it in a development environment, which may involve building a pipeline using **cloud-based technologies**. As demonstrated in {ref}`Figure 3 <azure-ml>`, a machine learning pipeline involves setting up a virtual machine to train and test the model. For deep learning models, a GPU-powered virtual machine may be necessary to meet memory requirements.

```{figure} ./images/azure-ml.png
---
align: center
name: azure-ml
height: 225px
---
{cite}`ngyuyen2023`.
```

Once the pipeline is successfully set up, you will be able to access and utilise the trained machine learning model through an API endpoint.
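
To give a feel for what sits behind such an endpoint, the sketch below shows only the core request-handling logic: JSON in, predictions out. The request shape `{"inputs": [...]}`, the function name, and the stand-in model parameters are all assumptions for illustration; the actual contract depends on the cloud platform and how the endpoint is configured.

```python
import json

# Stand-in for a trained model loaded at start-up (hypothetical parameters).
MODEL = {"slope": 2.0, "intercept": 1.0}


def handle_prediction_request(body: str) -> str:
    """Turn a JSON request body into a JSON response body.

    Assumed request shape (hypothetical): {"inputs": [1.0, 2.5, ...]}
    """
    try:
        inputs = json.loads(body)["inputs"]
        predictions = [
            MODEL["slope"] * float(x) + MODEL["intercept"] for x in inputs
        ]
    except (KeyError, TypeError, ValueError):
        # Malformed JSON, a missing key, or non-numeric input.
        return json.dumps(
            {"error": "expected a JSON object with a numeric 'inputs' list"}
        )
    return json.dumps({"predictions": predictions})


print(handle_prediction_request('{"inputs": [0, 1.5]}'))
# {"predictions": [1.0, 4.0]}
print(handle_prediction_request("not json"))
```

Validating the request before predicting keeps malformed input from surfacing as an unhandled error for whoever calls the endpoint.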

```{figure} ./images/step5.png
---
scale: 10%
align: left
---
```

## Deploy to production environment and monitor performance
``````{margin}
`````{admonition} Who can help?
:class: seealso, dropdown
To enable practical usage, **front-end developers** can embed buttons or triggers within user-friendly interfaces, allowing users, clients, or stakeholders to generate predictions using the deployed model seamlessly.
`````
``````
---
With the algorithm developed and tested in the development environment, the next crucial step is to deploy it to production, where it can operate in real-world scenarios and provide value to the business. This transition must be carefully managed to ensure a smooth transfer and to minimise any disruptions to ongoing business operations.

To monitor the algorithm's performance effectively, you may also want to track its output and metrics through a real-time monitoring system. This proactive approach allows for swift action when performance degrades and minimises potential negative impacts on business operations.
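
As a minimal sketch of this kind of monitoring, the class below tracks the average absolute error over the most recent predictions and flags when it exceeds a threshold. The class name, window size, threshold, and observations are all hypothetical; a production system would typically use a dedicated monitoring service rather than in-process code like this.

```python
from collections import deque
from typing import Deque


class ErrorMonitor:
    """Track the average absolute error over the most recent predictions."""

    def __init__(self, window_size: int, alert_threshold: float) -> None:
        # A bounded deque discards the oldest error once the window is full.
        self.errors: Deque[float] = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, predicted: float, actual: float) -> bool:
        """Store one outcome; return True if the rolling error is too high."""
        self.errors.append(abs(predicted - actual))
        rolling_error = sum(self.errors) / len(self.errors)
        return rolling_error > self.alert_threshold


# Hypothetical settings: watch the last 3 predictions, alert above 1.0.
monitor = ErrorMonitor(window_size=3, alert_threshold=1.0)
observations = [(10.0, 10.2), (11.0, 10.9), (12.0, 15.0), (13.0, 17.0)]
for predicted, actual in observations:
    if monitor.record(predicted, actual):
        print(f"Alert: rolling error too high after predicting {predicted}")
```

Using a rolling window rather than a lifetime average means a recent drop in quality is not masked by a long history of good predictions.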

For additional insights on more advanced cloud deployment and monitoring considerations, refer to this [article](https://www.anyscale.com/blog/considerations-for-deploying-machine-learning-models-in-production) by Anyscale, the leading team behind [Ray](https://www.anyscale.com/ray-open-source), the distributed computing platform empowering OpenAI’s ChatGPT.

<span style="color: white;">...</span>

---

<!-- ```{figure} ./images/conclusion.png
---
scale: 12%
align: right
---
{cite}`prinofour2019b`.
```

## Conclusion

By following this comprehensive framework, you will gain the skills needed to leverage algorithms and communicate their value effectively in diverse business scenarios. 

Embrace the power of algorithms to unlock new levels of productivity, efficiency, and decision-making accuracy in the dynamic world of business! -->

## References

```{bibliography}
:style: unsrt
```
<br>