# End-To-End Machine Learning Project

## 1. Definition of End-To-End Machine Learning Project

The term ***"End-To-End"*** refers to a complete process from beginning to end; in other words, it covers all the steps and procedures needed to complete a project, from idea to implementation.

Machine learning *(ML)* is ***"a subfield of artificial intelligence (AI) for the development and implementation of computer systems that can learn and adapt from data, using algorithms and statistical models to analyze patterns and draw conclusions from them without following explicit instructions"***.

Artificial intelligence systems are used to *mimic human perception and intelligence* in order to solve complex tasks.

***Therefore, it could be said that***:\
*The **machine learning project** starts with the **data** of the problem statement and ends with a **semi-human-intelligence model** that solves the related problem or task*.


To achieve the goals of the machine learning project, a *systematic approach* must be followed to ensure that every item on the ***checklist*** of the ML model building process has been fulfilled. This systematic, checklist-driven approach is called the ***Machine Learning Life-Cycle***.


## 2. Types of End-To-End Machine Learning Projects

To determine the type of end-to-end machine learning project, the type/category of machine learning system used to realize this project should be identified.\
There are several ***criteria*** for classifying the type of ***machine learning system*** as follows:
1. `IF` it <u>does</u> `OR` <u>does not</u> require ***Human Supervision/Intervention*** during the ***Modeling*** phase, `THEN` it is of the ***Supervised***, ***Unsupervised***, ***Semisupervised***, `OR` ***Reinforcement*** type.

2. `IF` it <u>can</u> `OR` <u>can not</u> ***learn incrementally on the fly***, `THEN` there is the ***Offline/Batch*** `OR` ***Online*** type.

3. `IF` the ***Learnability*** depends on the ***Comparison*** <u>principle</u> of the unknown patterns/instances with the known patterns/instances `OR` on the ***Modeling*** <u>principle</u> to accomplish the corresponding learning tasks, `THEN` there is the ***Instance-based*** `OR` ***Model-based*** type.

These criteria can be combined to classify a machine learning system; for example, a ***spam filtering*** system can be classified as ***online, model-based, and supervised***.

### 2.1 Criterion of Human Supervision/Intervention 

In general, the ***task*** of the machine learning system is to `map/relate` the input space, i.e., the ***data space***, to the output space, i.e., the ***target space***, in order to achieve the related tasks such as prediction.

Although a machine learning system can be understood as a *computer program that learns without being explicitly programmed*, this program *necessarily* needs ***mapping instructions*** to perform the corresponding tasks. These mapping instructions can be built through ***Human Supervision/Intervention*** in the form of either a ***<u>solution</u>*** `OR` an ***<u>action</u>***. In the case of solution-based mapping instructions, the concepts of ***Training*** and ***Labeling*** are used to realize the mapping process: the ***Training concept*** must <u>necessarily</u> be applied, while the ***Labeling concept*** is applied ***<u>to a varying degree</u>***, as in this scale:
* `IF` it is fully applied, `THEN` the type of machine learning is ***Supervised***.
* `IF` it is partially applied, `THEN` the type of machine learning is ***Semisupervised***.
* `IF` it is never applied, `THEN` the type of machine learning is ***Unsupervised***.

<img src="attachment:45d72e6a-48fd-48b6-9971-e2e0c856854b.png"  width="60%" height="60%">\
Fig 1 Labeling concept in the Supervised, Unsupervised and Semisupervised Types

The assignment of labels is realized by the ***Annotation*** procedure.

<img src="attachment:1c1fb981-3f0d-43bd-ab14-8752bcd820c6.png"  width="60%" height="60%">\
Fig 2 Annotation procedure of the considered data

In the ***Action*** form of mapping instructions, the learning system is viewed as an ***Agent*** that requires a ***Policy*** to perform the appropriate tasks. This policy is based on the ***Penalty-Reward*** principle: the agent acts in the considered ***Environment*** and <u>receives a penalty for a negative action and a reward for a positive action</u>. In this case, the type of machine learning is ***Reinforcement***.
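
The penalty-reward principle can be illustrated with a minimal sketch. The two-action environment, the step size and all names below are hypothetical and chosen only for illustration; real reinforcement learning problems have far richer environments and policies.

```python
# Hypothetical two-action environment: action 1 is rewarded (+1),
# action 0 is penalized (-1). All names here are illustrative only.
def environment(action):
    return 1.0 if action == 1 else -1.0

def train_agent(n_steps=20, step_size=0.5):
    values = [0.0, 0.0]   # the agent's estimate of how good each action is
    for step in range(n_steps):
        # Try each action once first, then follow the greedy policy.
        if step < 2:
            action = step
        else:
            action = 0 if values[0] > values[1] else 1
        reward = environment(action)
        # Move the estimate of the chosen action toward the observed reward.
        values[action] += step_size * (reward - values[action])
    return values

values = train_agent()
policy = 0 if values[0] > values[1] else 1   # the learned policy picks the best action
print(values, policy)
```

After a few interactions the agent prefers the rewarded action, because the penalty lowered its estimate of the other one.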

### 2.2 Criterion of Incrementally on the Fly

In the context of this criterion, the following question should be answered:
***Can a machine learning system learn incrementally from a stream of incoming data?***


1. In case of a `"Yes"` answer, <u>THEN</u> the type of machine learning system is ***online*** and the learning is ***incremental***: the system is fed the training data sequentially on the fly, either ***individually*** or ***in small groups called mini-batches***, so that it can learn from new data as it arrives. <u>Therefore, the resulting model can be updated incrementally with each new mini-batch of data</u>.

2. In case of a `"No"` answer, <u>THEN</u> the type of machine learning system is ***Batch*** (offline): the system must be trained with all available data at once. <u>Therefore, a new model is created for each batch without considering the previous ones</u>.
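
The difference between the two answers can be sketched with a deliberately tiny "model", here just the mean of a synthetic data stream. The data, the mini-batch size of 10 and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
stream = rng.normal(loc=5.0, scale=1.0, size=100)   # synthetic data

# Batch learning: the model (here, just a mean) is fitted on all data at once.
batch_model = stream.mean()

# Online learning: the same model is updated one mini-batch at a time.
online_model, n_seen = 0.0, 0
for start in range(0, len(stream), 10):
    mini_batch = stream[start:start + 10]
    n_seen += len(mini_batch)
    # Incremental update: shift the current estimate toward the new batch mean.
    online_model += (mini_batch.mean() - online_model) * len(mini_batch) / n_seen

print(batch_model, online_model)
```

Both approaches end with the same estimate in this simple case, but the online version never needs the whole dataset in memory and can keep learning as new mini-batches arrive.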


### 2.3 Criterion of Learnability Principle

This criterion provides the general framework for answering the question of how the machine learning system should learn from data.
The possible principles of learning capability are as follows:

1. `IF` the system learns the samples/instances of the training dataset by building a ***Similarity Measurement***, `THEN` the type of this machine learning is\
***Instance-based learning***; this measurement is used to classify/predict the new sample.

2. `IF` the system learns the samples/instances of the training dataset by building a ***Mathematical or Statistical Model***, `THEN` the type of this machine learning is ***Model-based learning***; this model is used to classify/predict the new sample.
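
The two principles can be contrasted on the same toy data. This is a minimal sketch under stated assumptions: the data are made up, the absolute difference serves as the similarity measure, and a straight line fitted with `np.polyfit` stands in for the mathematical model.

```python
import numpy as np

# Toy training data (made up): one feature x and a numeric target y = 2x.
X_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

def predict_instance_based(x_new):
    # Compare the new sample with every stored instance (distance as the
    # similarity measure) and return the target of the closest one.
    distances = np.abs(X_train - x_new)
    return y_train[np.argmin(distances)]

# Model-based: fit a line y = slope * x + intercept once; afterwards the
# training data are no longer needed for prediction.
slope, intercept = np.polyfit(X_train, y_train, deg=1)

def predict_model_based(x_new):
    return slope * x_new + intercept

print(predict_instance_based(2.4), predict_model_based(2.4))
```

The instance-based predictor answers with the target of the nearest known sample, whereas the model-based predictor interpolates with the fitted line.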

## 3. Machine Learning (ML) Terminology

In this chapter, the terminology of machine learning is introduced in general. This introduction gives a general overview of the terms used in the machine learning world. Of course, each term has a deep and broad world that is beyond the scope of this tutorial.

### 3.1 ML Dataset
Machine learning is characterized as a `practical tool/framework` of ***statistical learning theory***, which draws on statistics and functional analysis to find a predictive function, i.e., a model that draws conclusions from the considered <u>***`Dataset`***</u> in order to accomplish tasks such as classification, prediction, etc.




### 3.1.1 Definition

The ***dataset*** can be defined as follows:\
*"A collection of related sets of information consisting of individual elements but capable of being processed by a computer as a unit."*\
In the context of machine learning, the dataset is the framework from which learning can be done to perform the related tasks (see Fig 5).

The common types of datasets are:
1. Text data
2. Image data
3. Audio data
4. Video data
5. Numeric data



In general, for computers and the related manipulation and analysis software, the dataset is considered and stored as a matrix:
* Its ***Rows***: `Records`, `Samples`, `Instances`, or `Observations` captured about the use case/problem statement.
* Its ***columns***: `'Variables/Attributes'`, i.e., characteristics (such as name, age, pressure, etc.) of the input space, and characteristics of the output space such as (patient or non-patient), which are measured for each observation and may vary from one observation to another.

The variable or attribute of the output space is referred to as the ***target***. During the ***Modeling Phase*** of the machine learning system, its historical values are used to learn patterns and discover relationships between the features in the input space and the target in the output space, which is the final output to be predicted.

<u>It is important to note that the variables/attributes of the dataset are referred to as "***Features***" when ***they are coupled with their measurements***</u>.
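
The matrix view of a dataset can be sketched as follows. The patient records, column meanings and values are hypothetical and serve only to show how rows, feature columns and the target column are separated.

```python
import numpy as np

# Hypothetical patient records: the columns are age, pressure, and the
# target (1 = patient, 0 = non-patient). All values are made up.
dataset = np.array([
    [25, 120, 0],
    [47, 150, 1],
    [52, 135, 1],
    [33, 118, 0],
])

X = dataset[:, :-1]   # input space: one row per sample, one column per feature
y = dataset[:, -1]    # output space: the target column

n_samples, n_features = X.shape
print(n_samples, n_features, y)
```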


### 3.1.2 Splitting of the original dataset into Training, Validation and Testing (Holdout) sub-datasets

A dataset is a necessary precondition for the machine learning process; however, using the entire original dataset for modeling could lead to the following problems:

1. Overfitting: the model works very well during the modeling phase when mapping the input domain (features) to the output domain (targets), but this performance is not achieved in the production phase, i.e., the mapping fails when the model is applied to unknown data. Such a model is too complex and follows both the relevant and the irrelevant relationships within the dataset.
2. Underfitting: the model works well in neither the modeling nor the production phase. Such a model is too simple and follows only the easy relationships within the dataset.

Machine learning modeling aims to create a ***"Good Fit/Robust"*** model that captures the correct and relevant relationships in the dataset and is ***robust*** to noise; such a model can also perform the input-output mapping for any dataset generated from the same source as the modeling-phase dataset.

For this reason, the original dataset must be ***split*** into the following subsets:
1. Training dataset: it is used to design the model.
2. Validation dataset: it is used to refine the model.
3. Testing (Holdout) dataset: it is used to test the model.

The splitting techniques are as follows:
1. Fixed proportion:
It assigns selected percentages of the entire data to training, validation and testing (holdout), e.g., 60%, 20%, and 20%, respectively.

2. Cross Validation:
This technique divides the original dataset into K distinct subsets, called folds, with K-1 folds for training and validation and one fold for testing. The training, validation and testing process is repeated `K` times, and the roles of the folds are exchanged at each iteration so that every fold serves as the held-out fold once.

The cross validation technique can be:
* K-fold
* Repeated K-Fold
* Leave One Out (LOO)
* Leave P Out (LPO)
* Random permutations cross-validation a.k.a. Shuffle & Split
* Stratified k-fold
* Stratified Shuffle Split
* Group k-fold
* etc.
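
Both splitting techniques can be sketched on sample indices with NumPy. The helper names `fixed_split` and `k_fold` are hypothetical and written for this illustration only; scikit-learn provides production-ready equivalents such as `train_test_split` and `KFold`.

```python
import numpy as np

def fixed_split(n_samples, train=0.6, val=0.2, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def k_fold(n_samples, k):
    """Yield (train_idx, held_out_idx) pairs; each fold is held out exactly once."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

# Fixed proportion: 60% / 20% / 20% of ten samples.
train_idx, val_idx, test_idx = fixed_split(10)
print(len(train_idx), len(val_idx), len(test_idx))

# K-fold: with k=5, every pair of samples is held out in one iteration.
for fold_train_idx, held_out_idx in k_fold(10, k=5):
    print(held_out_idx)
```

The sketch returns indices rather than data so that the same split can be applied to the feature matrix and the target vector alike.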

### 3.2 ML Algorithm and Model

In machine learning, it is very important to distinguish between two terms, namely ***Algorithm*** and ***Model***.

### 3.2.1 Definitions

*A machine learning **algorithm** is a procedure that is run on data to find patterns and rules; these are stored in a machine learning **model**, which is like a program that can be used to make predictions.*

The `machine learning algorithm` can be mathematically defined as a statistical, correlative, mapping, and approximation tool.
Training/learning means adjusting and readjusting the parameters of this tool to obtain the best possible representation of the considered input-output (features-target) spaces.
A `Machine Learning Model` is the combination of the ***`Predictive algorithm`***, which is composed of the machine learning algorithm and the correlation rules of the input-output spaces, and the ***related trained Parameters***.

For example, in the case of the ***linear regression system***, the algorithm and the model are as follows:
1. `Algorithm`: it finds the set of (positive or negative) coefficients that minimizes the error on the training dataset.
2. `Model`: it consists of the vector of coefficients as the trained parameters and the predictive algorithm, which is the sum ($\sum$) of the products of the coefficients and the features/attributes of a new sample; this linear combination generates the corresponding numerical target in the output space.
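
The distinction can be made concrete with a minimal sketch. The training data are made up (generated from $y = 3x_1 - 2x_2$ without noise), and ordinary least squares via `np.linalg.lstsq` is assumed as the error-minimizing algorithm.

```python
import numpy as np

# Made-up training data generated from y = 3*x1 - 2*x2 (no noise).
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_train = X_train @ np.array([3.0, -2.0])

# Algorithm: search for the coefficients that minimize the squared training error.
coefficients, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Model: the learned coefficients plus the rule "sum of coefficient * feature".
def predict(x_new):
    return float(np.dot(coefficients, x_new))

print(coefficients, predict([2.0, 3.0]))
```

Once training is finished, only `coefficients` and `predict` are needed to make predictions; the least-squares procedure itself is no longer involved.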

In the machine learning landscape, there are two types of parameters as follows: 
1. ***Learning Parameter***, *also known as `Hyperparameter`*:
It belongs to the learning algorithm, not to the model; its value is set before training rather than estimated from the data.

2. ***Model Parameter***, *`Parameter` for short*:
It is a configuration variable internal to the model, whose value can be estimated from data.
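
A small gradient-descent sketch shows both kinds of parameters side by side. The data, the one-parameter model $y = w \cdot x$ and the chosen values for `learning_rate` and `n_epochs` are assumptions made for illustration.

```python
# Made-up data following y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def fit(learning_rate, n_epochs):
    # learning_rate and n_epochs are hyperparameters: chosen before training.
    w = 0.0   # w is the model parameter: estimated from the data
    for _ in range(n_epochs):
        # Gradient of the mean squared error of the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

w = fit(learning_rate=0.05, n_epochs=100)
print(w)
```

Changing the hyperparameters changes how the search proceeds, whereas the parameter `w` is the result of the search.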



### 3.2.2 Performance and Evaluation

By splitting the raw/original dataset into subsets for training, validation and testing, the problems of overfitting and underfitting can usually be detected and mitigated (see Section 3.1.2 and Fig 9); however, these problems can also occur during the learning phase of the ML algorithm using the training and validation subsets.\
The terms ***overfitting*** and ***underfitting*** should be understood in the context of the ***Complexity of the intended model*** and the following ***`Errors`***: ***Generalization/Test/Out-Of-Sample***, ***Bias***, ***Variance***, ***Irreducible*** and ***Training*** (see Fig 14):
1. Complexity of ML Model:
   * Definition:\
     *Model complexity describes how flexible a machine learning model is, i.e., how intricate the relationships it can represent are (for example, through its number of parameters). It influences how accurately the model can predict unseen data, as well as how much data the model needs to see in order to make good predictions*.
   * Importance:\
   It is a very important concept for answering `how generalizable a model is; namely, how well the model can be used to make predictions on new, unseen data.`
2. Generalization/Test/Out-Of-Sample Error:
   * Definition:\
   `A generalization error is the error (difference between actual and predicted values) that results from applying the designed model to a previously unseen data set`.
   * Components:\
   The components of the generalization error are as follows:
     1) Bias:\
     *It is a type of error that arises from incorrect assumptions made by the learning algorithm during training (e.g., assuming that the data are linear when in reality they are quadratic); it can be defined as the difference between the average predicted values and the actual or expected values when predicting new data.*
     2) Variance:\
     *It is a type of error caused by the sensitivity of a model to small variations in the training set; it indicates how much the predictions vary when different training data are used during the training phase.*
     
     3) Irreducible Error:\
     This type will always be present in the model due to the noise of the data itself.
3. Training Error:\
This type can be estimated by calculating the deviation between the actual and predicted values when the designed model is used to predict the same data set that was used to train it.\
As shown in Fig. 14, with simple models and a large amount of data, the generalization and training errors are more or less similar; with more complex models and fewer examples, the training error is expected to decrease while the generalization gap increases. This behaviour produces the ***Overfitting*** problem.
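
The relationship between model complexity, training error and test error can be explored with a small experiment. This is a sketch under stated assumptions: the data are synthetic (a quadratic relationship plus noise), and the polynomial degree is used as a stand-in for model complexity.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic quadratic relationship plus noise (an assumption for illustration).
    x = rng.uniform(-1, 1, n)
    return x, x**2 + rng.normal(0, 0.1, n)

x_train, y_train = make_data(15)    # few training examples
x_test, y_test = make_data(200)     # unseen data for the generalization error

def errors(degree):
    """Return (training MSE, test MSE) of a polynomial of the given degree."""
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return mse(x_train, y_train), mse(x_test, y_test)

for degree in (1, 2, 9):
    train_err, test_err = errors(degree)
    print(degree, round(train_err, 4), round(test_err, 4))
```

The training error can only shrink as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases; comparing it with the test error in the printed table shows how large the generalization gap is for each degree.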


The four possible combinations of bias and variance are as follows:
1. Low-Bias, Low-Variance:\
It is the ideal machine learning model; however, it is rarely achievable in practice.
2. Low-Bias, High-Variance:\
Its predictions are inconsistent but accurate on average. This combination appears when a large number of parameters is used during the learning phase. It leads to the ***Overfitting*** problem.
3. High-Bias, Low-Variance:\
Its predictions are consistent but inaccurate on average. This combination occurs when only a few parameters are used. It leads to the ***Underfitting*** problem.
4. High-Bias, High-Variance:\
Its predictions are inconsistent and also inaccurate on average.

`Since bias and variance are` ***`reducible`*** `errors, it is important to pay attention to both when building the machine learning model in order to avoid overfitting or underfitting. If the model is very simple and has few parameters, it may have low variance and high bias. On the other hand, if the model has a large number of parameters, it tends to have high variance and low bias. Thus, it is necessary to find a balance between bias and variance errors; this balance is called the` <u>***`bias-variance trade-off`***</u>.

In order to achieve the ***Bias-Variance Trade-off*** (green rectangle in Fig 13), the following techniques can be applied during the building phase of the ML model:
1. Data augmentation
2. Data Splitting
3. Feature selection
4. Regularization
5. Ensembling
6. Early stopping 
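
As one example from this list, regularization can be sketched with ridge regression, which adds a penalty on the size of the coefficients. The synthetic data, the penalty strength `alpha=10.0` and the helper name `ridge_fit` are assumptions made for illustration; ridge is only one of several regularization methods.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                       # synthetic features
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(0, 0.1, 20)

def ridge_fit(X, y, alpha):
    # Closed-form solution of min ||Xw - y||^2 + alpha * ||w||^2.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

w_plain = ridge_fit(X, y, alpha=0.0)    # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, alpha=10.0)   # the penalty shrinks the coefficients

print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

Shrinking the coefficients makes the model less sensitive to the particular training sample (lower variance) at the price of some additional bias.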

Comparing the performance of different candidate models in order to select the one that best achieves the desired goals of the machine learning system is called ***Model Selection***. The model selection phase can be performed using comparison metrics known as ***Evaluation Metrics***. The evaluation metrics for classification tasks are as follows:
1. Confusion Matrix:\
A ***Confusion Matrix*** is used to represent the performance of the machine learning model on classification tasks, especially binary classification. The term "confusion" refers to the classes that the model was not able to distinguish correctly.
For binary classification tasks, this evaluation matrix is a 2×2 square matrix.

The elements of the binary confusion matrix are:
1. Rows are the real/actual classes of the samples of the considered dataset.
2. Columns are the predicted classes (prediction results of the used machine learning model) of the samples of the considered dataset.

The terminology of the binary confusion matrix is:
* True: it indicates a "matching" case of the actual and predicted classes.
* False: it indicates a "mismatching" case of the actual and predicted classes.
* True Positive (*TP*): the number of times the model correctly classifies a positive sample as Positive.
* False Negative (*FN*): the number of times the model incorrectly classifies a positive sample as Negative.
* False Positive (*FP*): the number of times the model incorrectly classifies a negative sample as Positive.
* True Negative (*TN*): the number of times the model correctly classifies a negative sample as Negative.

The confusion matrix can also be applied to multiclass tasks.
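
The four counts can be computed with a few lines of plain Python. The function name `binary_confusion_matrix` and the example labels are hypothetical; the layout follows the convention above (rows are actual classes, columns are predicted classes, positive class first).

```python
def binary_confusion_matrix(actual, predicted, positive=1):
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # positive sample, predicted positive
        elif a == positive:
            fn += 1          # positive sample, predicted negative
        elif p == positive:
            fp += 1          # negative sample, predicted positive
        else:
            tn += 1          # negative sample, predicted negative
    # Rows are actual classes, columns are predicted classes.
    return [[tp, fn],
            [fp, tn]]

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
matrix = binary_confusion_matrix(actual, predicted)
print(matrix)   # [[3, 1], [1, 3]]
```

Note that scikit-learn's `confusion_matrix` sorts the classes by label, so for the labels 0 and 1 it returns the layout `[[TN, FP], [FN, TP]]` instead.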

2. Accuracy:\
***Accuracy*** answers the following evaluation question:\
`How correct is the machine learning model in general for the corresponding classification task?`\
The accuracy can be calculated from the confusion matrix using this formula:
\begin{equation}
\text{Accuracy}=\frac{TP+TN}{TP+FN+TN+FP}
\end{equation}\
***Accuracy*** is used *when the True Positives and True Negatives are more important*; `it is a better metric for Balanced Data`.
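
Why accuracy suits balanced data better can be seen from a short calculation. The counts below are made up for illustration.

```python
def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + tn + fp)

# Balanced case: 50 positives and 50 negatives.
print(accuracy(tp=45, fn=5, fp=10, tn=40))   # 0.85

# Imbalanced case: 5 positives, 95 negatives, and a model that always
# predicts "negative". Accuracy looks high although no positive is found.
print(accuracy(tp=0, fn=5, fp=0, tn=95))     # 0.95
```

In the imbalanced case the model is useless for finding positives, yet its accuracy is higher than in the balanced case.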

3. Precision:\
***Precision*** answers the following evaluation question:\
`How reliable is the machine learning model at predicting a specific category/class?`\
The precision can be calculated from the confusion matrix using this formula:
\begin{equation}
\text{Precision}=\frac{TP}{TP+FP}
\end{equation}\
***Precision*** is a useful metric in cases (e.g., spam detection or recommendation systems) where FP is a higher concern than FN; `in other words, whenever False Positives are much more costly, use Precision.`

4. Recall/Sensitivity/True Positive Rate (TPR):\
***Recall*** answers the following evaluation question:\
`How able is the machine learning model to detect all samples of a specific category/class?`\
The recall can be calculated from the confusion matrix using this formula:
\begin{equation}
\text{Recall}=\frac{TP}{TP+FN}
\end{equation}\
***Recall*** is a useful metric in cases (e.g., medical or security applications, or wherever decisions must be made more carefully) where as many positives as possible should be captured; `in other words, whenever False Negatives are much more costly, use Recall`.
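
Precision and recall can be compared on the same counts. The spam-filter numbers are hypothetical, and returning `0.0` for an empty denominator is one common convention, not the only one.

```python
def precision(tp, fp):
    # Of all samples predicted positive, how many are truly positive?
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Of all truly positive samples, how many did the model find?
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical spam filter: 30 spam mails caught, 10 missed, 5 false alarms.
tp, fn, fp = 30, 10, 5
print(precision(tp, fp), recall(tp, fn))
```

Here the filter is fairly reliable when it flags a mail (few false alarms), but it still misses a quarter of the spam, which is what the lower recall expresses.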

5. F1 Score:\
The F1 Score is the harmonic mean of precision and recall, as in the following formula:
\begin{equation}
\text{F1 Score}=2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}=\frac{TP}{TP+\frac{FN+FP}{2}}
\end{equation}
The F1 Score is used where a precision-recall balance should be achieved, because in these cases it is not clear which of the two, precision or recall, is more important.
The ***F1-Score*** is used `when the False Negatives and False Positives are important`. <u>The F1-Score is a better metric for Imbalanced Data.</u>
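
The two forms of the F1 formula can be checked against each other numerically. The counts are made up for illustration.

```python
def f1_from_rates(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

def f1_from_counts(tp, fn, fp):
    # Equivalent form written directly in terms of the confusion-matrix counts.
    return tp / (tp + (fn + fp) / 2)

tp, fn, fp = 30, 10, 5
p, r = tp / (tp + fp), tp / (tp + fn)
print(f1_from_rates(p, r), f1_from_counts(tp, fn, fp))
```

Both functions return the same value (up to floating-point rounding), and TN does not appear in either form, which is why the F1 Score is not inflated by a large number of easy negatives.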

6. Specificity/True Negative Rate (TNR):\
Specificity (TNR) is the ratio of the negative samples/instances that are correctly classified and predicted as negative.
The specificity can be calculated from the confusion matrix using this formula:
\begin{equation}
\text{Specificity}=\frac{TN}{TN+FP}=1-\text{False Positive Rate (FPR)}
\end{equation}

7. AUC-ROC (Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC)):\
The ***ROC*** curve plots the ***sensitivity (TPR)*** against the ***False Positive Rate (FPR = 1 - specificity)*** and thus shows the trade-off between sensitivity and specificity, which makes it possible to compare the performance of machine learning models. If the ROC of the model is ***closer*** to the ***upper left corner***, `it means better performance`, as shown in Fig 16.


The diagonal dashed line in Fig 16 is considered the ***baseline***, i.e., the simplest machine learning algorithm, which provides predictions without complex calculations (random guessing); since FPR = TPR along this line, it corresponds to an AUC of 50%.

`Whenever the ROC curve lies above this 45-degree diagonal, the model performs better than the baseline; the larger the area under the curve (AUC), the better the performance of the model`.
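
The construction of a ROC curve and its AUC can be sketched by sweeping a decision threshold over the model scores. The function names, labels and scores below are assumptions for illustration; scikit-learn offers `roc_curve` and `roc_auc_score` for real use.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """Return FPR and TPR for every threshold, from strictest to loosest."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    thresholds = np.sort(np.unique(scores))[::-1]
    n_pos, n_neg = np.sum(y_true == 1), np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]
    for t in thresholds:
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    # Area under the curve by the trapezoidal rule.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]    # made-up model scores
fpr, tpr = roc_curve_points(y_true, scores)
print(auc(fpr, tpr))   # 0.75
```

An AUC of 0.75 lies between the baseline (0.5) and a perfect ranking (1.0), which would be reached if every positive sample had a higher score than every negative one.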




### 3.3 ML Tasks and related ML Algorithms

### 3.3.1 Classification

Classification is a <u>supervised learning</u> task that involves predicting a class label.

*The output is the answer to whether the input instance belongs to a specific class (Yes/No), or the selection of one out of a finite number of choices*.\
The class/label is selected from the output/label space based on a prediction. The classification task can be ***Binary*** (if the size of the output space = 2) or ***Multiclass*** (if the size of the output space >= 3).\
Fig 18 shows the classification of emails as a binary task (left) and the classification of digit images into 10 classes (from 0 to 9) as a multiclass task (right).
    
<img src="attachment:cc5992cf-1ef2-41ba-a293-accd51717ef1.png"  width="80%" height="80%">\
Fig 18 Classification of Emails as binary class task (left) and classification of the digit image as multiclass task (right)

Examples of classification algorithms: 
* K-Nearest Neighbors
* Decision Trees
* Random Forest
* Support Vector Machine
* Neural Networks
* etc.


### 3.3.2 Regression

Regression is a <u>supervised learning</u> task that involves predicting a numerical/continuous target in the output space.

In the regression task, the model is a linear/nonlinear combination of the features/attributes of the input space, the so-called predictors (such as mileage, age, brand, etc.), which predicts the numerical/continuous target/label in the output space, e.g.:\
\begin{equation}
    y = \pm a\,x_1 \pm b\,x_2 \pm c\,x_3 \pm \cdots
\end{equation}
where:  
$y$= car price, $x_1$=mileage, $x_2$=brand, $x_3$=age, $...$\
$a, b, c, ...$ are coefficients.

Examples of regression algorithms:
* Linear Regression
* Polynomial Regression
* Exponential Regression
* Logistic Regression
* etc.
    
`It is important to know that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, since its output value corresponds to the probability of belonging to a given class (e.g., 20% chance of being banana)`.
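
How a regression-style output becomes a class decision can be sketched with the logistic (sigmoid) function. The coefficients below are hypothetical and hand-picked rather than learned from data, and the threshold of 0.5 is a common default, not a requirement.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, coefficients, intercept):
    # Linear combination of the features, squashed into the range (0, 1).
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return sigmoid(z)

def predict_class(features, coefficients, intercept, threshold=0.5):
    # The probability is turned into a class label by thresholding.
    return int(predict_proba(features, coefficients, intercept) >= threshold)

# Hypothetical, hand-picked coefficients (not learned from data).
coefficients, intercept = [1.5, -2.0], 0.0
print(predict_proba([0.0, 0.0], coefficients, intercept))   # 0.5
print(predict_class([2.0, 0.5], coefficients, intercept))   # 1
```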


### 3.3.3 Clustering

This unsupervised task groups the data into ***clusters*** according to the groupings inherent in the data.\
<img src="attachment:1b4b7df7-068d-4b94-9f71-515975fed41b.png"  width="75%" height="75%">\
Fig 19 Clustering task

Examples of clustering algorithms:
* Hierarchical
* K-means
* DBSCAN
* Gaussian Mixture Model
* etc.
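
As an example, the core loop of K-means can be sketched in a few lines of NumPy. The points and the initial centers are made up, and the fixed number of iterations is a simplification; library implementations such as scikit-learn's `KMeans` add smarter initialization and convergence checks.

```python
import numpy as np

def k_means(points, initial_centers, n_iterations=10):
    centers = np.array(initial_centers, dtype=float)
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Update step: each center moves to the mean of its assigned points.
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels, centers

# Two made-up, well-separated groups of 2-D points.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centers = k_means(points, initial_centers=[[0.0, 0.0], [6.0, 6.0]])
print(labels)   # [0 0 0 1 1 1]
```

No labels are given to the algorithm; the cluster indices emerge from the geometry of the data alone.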


### 3.3.4 Anomaly Detection

**Anomaly detection**, also known as `outlier detection`, is the `process of identifying unexpected items or events in data sets, which differ from the norm` (as illustrated in Fig 20).

<img src="attachment:1f5a8c4a-b16e-4726-b928-af0728bb9059.png"  width="80%" height="80%">\
Fig 20 Anomaly detection examples

Examples of anomaly detection algorithms:
* Bayesian Gaussian Mixture Models
* Principal Component Analysis (PCA)
* Fast-Minimum Covariance Determinant
* One-class SVM
* Isolation Forest
* Local Outlier Factor (LOF)
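
The idea of "differing from the norm" can be illustrated with a simple z-score rule. Note that this is a deliberately simpler technique than the algorithms listed above, used here only as a sketch; the sensor readings and the threshold of 2 standard deviations are assumptions.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    # Flag values whose distance from the mean exceeds `threshold` standard deviations.
    return [v for v in values if std > 0 and abs(v - mean) / std > threshold]

readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0, 10.1]   # made-up sensor data
print(z_score_outliers(readings))   # [25.0]
```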


### 3.3.5 Association Rule Learning

An **Association Rule Learning** problem is one where the goal is to discover rules that describe large portions of the considered data, such as "people who buy X also tend to buy Y". The associations are relationships between objects. The idea behind association rule mining is to determine rules that make it possible to identify which objects may be related to a given set of objects. In association rule mining terminology, each object is called an item.

A common example of association rule mining is basket analysis:

A shopper puts items from a store into a basket. Once there are some items in the basket, it is possible to recommend associated items that are available in the store to the shopper.

Examples of association algorithms:
* Apriori Algorithm
* Eclat Algorithm
* FP-Growth Algorithm
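
The strength of a rule such as "people who buy bread also tend to buy butter" is commonly measured with support and confidence. The sketch below computes both for made-up baskets; it illustrates the measures only and does not implement the Apriori, Eclat or FP-Growth search itself.

```python
# Made-up shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    # Number of baskets that contain every item of the itemset.
    return sum(itemset <= basket for basket in baskets)

def support(itemset):
    # Fraction of all baskets that contain the itemset.
    return count(itemset) / len(baskets)

def confidence(antecedent, consequent):
    # Among baskets with the antecedent, the fraction that also hold the consequent.
    return count(antecedent | consequent) / count(antecedent)

print(support({"bread", "butter"}))        # 0.6
print(confidence({"bread"}, {"butter"}))   # 0.75
```

A recommender could suggest butter to a shopper whose basket already contains bread, because three out of four bread baskets also contain butter.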


## 4. ML-Libraries, Distribution, Platform and Development Environment

Since ***Python*** is one of the most widely used programming languages in the machine learning landscape, all related ML libraries, platforms and distributions are presented in this chapter in the context of this language.

### 4.1 Libraries

1. ***P***ython ***Da***ta ***Ana***lysi***s*** Library (Pandas):\
*"pandas is a fast, powerful, flexible and easy to use open source **data analysis** and **manipulation tool**,
built on top of the Python programming language."*(https://pandas.pydata.org/)

<img src="attachment:1d746ff4-466b-4e38-a9bf-73ef8901bb87.png"  width="80%" height="80%">\
Fig 22 Logo of Pandas Library

2. ***Num***erical ***Py***thon (NumPy)\
*"NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more."* (https://numpy.org/doc/stable/)

<img src="attachment:2eff51d2-3077-430e-9fdc-8dd0f5d33125.png"  width="80%" height="80%">\
Fig 23 Logo of NumPy Library

3. ***Mat***lab-style ***Plot*** ***Lib***rary (Matplotlib)\
*"Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible"*. (https://matplotlib.org/)

<img src="attachment:48ecaded-d30f-4975-87d5-f30a7ec33276.png"  width="80%" height="80%">\
Fig 24 Logo of Matplotlib Library

4. Seaborn: statistical data visualization\
*"Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics"*. (https://seaborn.pydata.org/)

<img src="attachment:2762972e-845a-4303-a221-67b225df2395.png"  width="80%" height="80%">\
Fig 25 Logo of Seaborn Library

5. Scikit-learn (Sklearn):\
"Scikit-learn is an open source data analysis library, and the gold standard for Machine Learning (ML) in the Python ecosystem."(https://scikit-learn.org/stable/)

<img src="attachment:2016033d-8898-48de-a2a0-09cd2aed105d.png"  width="80%" height="80%">\
Fig 26 Logo of Scikit-Learn Library

6. Keras:\
"*Keras (***Deep learning for humans***) is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear & actionable error messages. It also has extensive documentation and developer guides.*"(https://keras.io/)

<img src="attachment:f1cae2ac-350e-47ff-a1db-245a640f9aa6.png"  width="80%" height="80%">\
Fig 27 Logo of Keras

### 4.2 Distribution, Platform and Development Environment

1. *"Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, `machine learning applications`, large-scale data processing, predictive analytics, etc.), that aims to simplify package management and deployment."* (https://www.anaconda.com/products/distribution)
<img src="attachment:0fa061b9-178e-4af6-ad2b-2084a1a0139a.png"  width="80%" height="50%">\
Fig 28 Logo of Anaconda Distribution

For "Getting started with **Anaconda Navigator**": (https://docs.anaconda.com/navigator/getting-started/)

<img src="attachment:81268b4f-369c-406a-bb95-d48ebb5935f7.png"  width="80%" height="80%">\
Fig 29 Home Page of Anaconda Navigator

2. *"The Jupyter Notebook is a web-based interactive computing platform. The notebook combines live code, equations, narrative text, visualizations, interactive dashboards and other media."* (https://jupyter.org/)

<img src="attachment:b57588ff-ad44-42d6-bf91-ad00d2e46c1f.png" width="60%" height="60%">\
Fig 30 Logos of JupyterLab and Jupyter Notebook in the Home Page of Anaconda Navigator

For "Getting started with **Jupyter Notebook**": (https://kiran-parte.github.io/aiforall/blog-post-2.html)

3. *"Spyder is a free and open source scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It features a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package."* (https://www.spyder-ide.org/)

<img src="attachment:961ecaeb-d328-4e64-ba8b-c3b7b65c1977.png" width="50%" height="40%">\
Fig 31 Logo of Spyder in the Home Page of Anaconda Navigator

<img src="attachment:af42e70a-5532-4e32-92c8-9af63b8e3886.png" width="60%" height="60%">\
Fig 32 Logo of Spyder and its related GUI

4. *"Orange Data Mining is Open source machine learning and data visualization. Build data analysis workflows visually, with a large, diverse toolbox"*. (https://orangedatamining.com/)

<img src="attachment:13a2d184-fb0c-439f-b474-909b1f75b833.png" width="50%" height="40%">\

Fig 33 Logo of Orange Data Mining in the Home Page of Anaconda Navigator

<img src="attachment:755b3551-87ef-4c6b-87dd-f87de20ab283.png" width="50%" height="40%">\

Fig 34 Logo of Orange Data Mining

For "Getting started with **Orange Data Mining**": (https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/index.html)

## 5. Life-Cycle of End-To-End Machine Learning Project

Fig 35 shows the steps and processes of the Life-Cycle of the End-To-End Machine Learning Project.

This lifecycle is an iterative, systematic approach with multiple steps/processes for building and executing the machine learning project.