# Connected Patterns

Summary of all patterns covered:

![IMG_1865.jpg](attachment:c17d1758-2fa3-44ea-bc34-1b2a971dd3ab.jpg)![IMG_1862.jpg](attachment:6117d508-dfe5-49d8-bb44-3f0fe42f33e9.jpg)![IMG_1864.jpg](attachment:4fe31739-ab69-4727-8c6b-4b6a93a9072a.jpg)![IMG_1863.jpg](attachment:58f7e17f-7c39-4e2b-a0b5-73840e330e0c.jpg)

## Pattern Interactions

- Design patterns don’t exist in isolation. Many of them are closely related to one another, either directly or indirectly, and often complement one another.

## Patterns Within ML Projects
- Machine learning systems enable teams within an organization to build, deploy, and maintain machine learning solutions at scale. They provide a platform for automating and accelerating all stages of the ML life cycle: managing data, training models, evaluating performance, deploying models, serving predictions, and monitoring performance.

### ML Life Cycle
- Building a machine learning solution is a cyclical process that begins with a clear understanding of the business goals and ultimately leads to a machine learning model in production that serves those goals.
- Each of the stages is equally important, and failure to complete any one of these steps increases the risk in later stages of producing misleading insights or models of no value.

![Screenshot 2024-10-05 at 13.32.47.png](attachment:f04fce1c-ba6b-4717-aac6-00326c0e7222.png)

- The ML life cycle consists of three stages, as shown above: discovery, development, and deployment. There is a canonical order to the individual steps of each stage. However, these steps are completed in an iterative manner and earlier steps may be revisited depending on the outcomes and insights gathered from later stages.

#### Discovery

- Machine learning exists as a tool to solve a problem. The discovery stage of an ML project begins with defining the business use case.
- This is a crucial time for business leaders and ML practitioners to align on the specifics of the problem and develop an understanding of what ML can and cannot do to achieve that goal.
- It is important to keep sight of the business value through each stage of the life cycle. Many choices and design decisions must be made throughout the various stages, and often there is no single “right” answer. Rather, the best option is determined by how the model will be used in support of the business goal.
- For a production model built for a corporate organization, success is governed by factors more closely tied to the business, like improving customer retention, optimizing business processes, increasing customer engagement, or decreasing churn rates.
- There could also be indirect factors related to the business use case that influence development choices, like speed of inference, model size, or model interpretability. Any machine learning project should begin with a thorough understanding of the business opportunity and how a machine learning model can make a tangible improvement on current operations.


- A successful discovery stage requires collaboration between business domain experts and machine learning experts to assess the viability of an ML approach.
- It is crucial to have someone who understands the business and the data collaborating with teams that understand the technical challenges and the engineering effort that would be involved.
- If the overall investment of development resources outweighs the value to the organization, then it is not a worthwhile solution.
- During the discovery phase, it is important to outline the business objectives and scope for the task. This is also the time to determine which metrics will be used to measure or define success.
- Creating well-defined metrics and key performance indicators (KPIs) at the outset of an ML project can help to ensure everyone is aligned on the common goal.
- Machine learning is not the answer to all problems, and sometimes a rule-based heuristic is hard to beat. Development shouldn’t be done for development’s sake. A baseline model, no matter how simple, helps guide later design decisions and shows how each design choice moves the needle on the predetermined evaluation metric.


- As beneficial as a solution might be, if quality data is not available, then there is no project. Or perhaps the data exists, but for data privacy reasons it cannot be used, or it must be scrubbed of information the model needs.
- The data guides the process and it’s important to understand the quality of the data that is available. What are the distributions of the key features? How many missing values are there? How will missing values be handled? Are there outliers? Are any input values highly correlated? What features exist in the input data and which features should be engineered? Many machine learning models require a massive dataset for training. Is there enough data? How can we augment the dataset? Is there bias in the dataset?
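
Two of the questions above (how many values are missing, and whether any inputs are highly correlated) can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library. The column names and rows are hypothetical toy data, not taken from any real dataset, and a real project would more likely use a dataframe library.

```python
import math


def missing_counts(rows, columns):
    """Count missing (None or NaN) values per column."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                counts[col] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: each dict is one record.
rows = [
    {"age": 34, "income": 52000.0, "tenure_months": 12},
    {"age": None, "income": 61000.0, "tenure_months": 24},
    {"age": 45, "income": None, "tenure_months": 36},
    {"age": 29, "income": 48000.0, "tenure_months": 6},
    {"age": 52, "income": 75000.0, "tenure_months": 48},
]
columns = ["age", "income", "tenure_months"]

missing = missing_counts(rows, columns)

# Correlate only on records where both values are present.
complete = [r for r in rows if r["age"] is not None and r["income"] is not None]
corr = pearson([r["age"] for r in complete], [r["income"] for r in complete])

print(missing)
print(round(corr, 3))
```

Dropping incomplete records before computing the correlation is only one way to handle missing values. Whether to drop, impute, or flag them is itself one of the decisions this stage should surface.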


- Data exploration is a key step in answering the question of whether data of sufficient quality exists. Conversation alone is rarely a substitute for getting your hands dirty and experimenting with the data. 
- Visualization plays an important role during this step. Density plots and histograms are helpful to understand the spread of different input values. Box plots can help to identify outliers. Scatter plots are useful for discovering and describing bivariate relationships. 
- Percentiles can help identify the range for numeric data. Averages and medians describe central tendency, while standard deviations describe spread.
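
The summary statistics mentioned above can be computed directly before any plotting. This sketch uses Python's `statistics` module on a hypothetical list of values. The outlier check uses the common 1.5 × IQR convention associated with box plots, which is an assumption here rather than something these notes prescribe.

```python
import statistics

# Hypothetical numeric feature with one suspicious value.
values = [12, 15, 14, 10, 13, 16, 11, 95]

mean_value = statistics.mean(values)
median_value = statistics.median(values)
std_dev = statistics.stdev(values)  # sample standard deviation

# Quartiles: quantiles(n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Box-plot style fences: points outside them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"mean={mean_value:.2f} median={median_value} stdev={std_dev:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")
print(f"outliers={outliers}")
```

Note how far the mean sits from the median in this toy data: a single extreme value pulls the mean upward, which is one reason to look at both.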


- Within the discovery stage, it can be helpful to do a few modeling experiments to see if there really is “signal in the noise.” At this point, it could be beneficial to perform a machine learning feasibility study (step 3).
- Just as it sounds, this is typically a short technical sprint spanning only a few weeks whose goal is to assess the viability of the data for solving the problem.
- This provides a chance to explore options for framing the machine learning problem, experiment with algorithm selection, and learn which feature engineering steps would be most beneficial. The feasibility study step in the discovery stage is also a good point at which to create a Heuristic Benchmark.
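
As a rough illustration of what a Heuristic Benchmark comparison might look like, the sketch below scores a trivial heuristic (always predict the mean of the training labels) against a candidate model on the same metric. The labels and the candidate predictions are hypothetical placeholders, and mean absolute error is an assumed choice of metric; in practice the metric would be the one agreed on during discovery.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical regression labels.
train_labels = [10.0, 12.0, 14.0, 16.0]
test_labels = [11.0, 13.0, 15.0]

# Heuristic: always predict the mean of the training labels.
heuristic_value = mean(train_labels)
heuristic_preds = [heuristic_value] * len(test_labels)

# Placeholder outputs standing in for a candidate model's predictions.
candidate_preds = [11.5, 12.5, 14.0]

heuristic_mae = mean_absolute_error(test_labels, heuristic_preds)
candidate_mae = mean_absolute_error(test_labels, candidate_preds)

# Fraction of the heuristic's error that the candidate removes.
improvement = (heuristic_mae - candidate_mae) / heuristic_mae

print(f"heuristic MAE: {heuristic_mae:.3f}")
print(f"candidate MAE: {candidate_mae:.3f}")
print(f"relative improvement: {improvement:.0%}")
```

Reporting the model's result relative to the heuristic, rather than as a bare number, gives business stakeholders a reference point for judging whether the added complexity is worth it.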