## 1. Provide two examples in which supervised and unsupervised learning can be applied. 
### 1.1. Supervised Learning
#### 1.1.1 Battery lifetime prediction
Battery lifetime prediction estimates the remaining useful life of a battery from factors such as charge-discharge cycles, temperature, and usage patterns. It is a supervised task because batteries with known lifetimes provide the labeled examples to learn from. Accurate prediction is crucial for optimizing battery usage, preventing unexpected failures, and maximizing the overall lifespan of energy storage systems. The opportunities lie in developing advanced predictive algorithms, incorporating real-time data, and leveraging machine learning techniques to improve accuracy. Better predictions can lead to improved battery management strategies, reduced maintenance costs, and increased reliability in applications ranging from electric vehicles to renewable energy storage.

#### 1.1.2 Battery failure detection
Battery failure detection involves identifying abnormalities or malfunctions in a battery system that may lead to a loss of performance or capacity, or to safety issues. A machine learning algorithm trained on past failures can help detect early signs of battery degradation or faults. A robust failure detection system can enhance the safety and reliability of batteries in electric vehicles, ultimately extending the overall lifespan and performance of battery systems.
 
### 1.2 Unsupervised Learning
#### 1.2.1. Anomaly detection
Anomaly detection in electric vehicle (EV) batteries involves identifying abnormal behavior or deviations from expected performance in the battery system. Distinguishing between benign fluctuations and serious faults is critical. Unsupervised learning is essential in this context because anomalies in EV battery behavior can be diverse and may evolve over time, making it challenging to have a comprehensive labeled dataset.
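
As a rough illustration of flagging anomalies without labels, the sketch below marks readings that lie far from the mean of the data itself. The voltage values, the `flag_anomalies` name, and the z-score threshold of 2.0 are all assumptions made for this example; a real EV battery system would likely need a more robust method.

```python
import statistics

# Hypothetical cell-voltage readings in volts; no fault labels are available.
voltages = [3.70, 3.71, 3.69, 3.72, 3.70, 3.10, 3.71]

def flag_anomalies(values, threshold=2.0):
    """Return indices of values whose z-score exceeds `threshold` (assumed cutoff)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all readings identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

print(flag_anomalies(voltages))  # [5]
```

Only the 3.10 V reading is flagged, and no labeled fault was needed to find it.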

#### 1.2.2. Segmentation of battery operating condition
Electric vehicle batteries operate under varying conditions: environment (temperature, humidity, etc.), load (passengers, terrain such as mountains or plains), usage (long or short distances), and operating characteristics (discharge rate, state of health). Clustering methods are useful for identifying patterns in these operating conditions without requiring labels.

## 2. What is the difference between regression and classification tasks?
Regression and classification are both supervised learning tasks: the goal is to learn a mapping function from the input to the output based on the provided labeled examples.

Regression is a type of supervised learning where the goal is to predict a continuous output variable. In regression, the algorithm learns the relationship between input features and a continuous target variable. The goal is to minimize the difference between the predicted values and the actual values. 

In classification, by contrast, the goal is to predict the categorical class label of a new instance based on its features. The objective is to assign the correct label as often as possible.
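
The contrast can be sketched with a k-nearest-neighbour predictor applied to the same input feature twice: once averaging a continuous target (regression) and once taking a majority vote over class labels (classification). The data and function names below are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical training data: one feature (charge cycles) per battery.
cycles = [100, 200, 300, 800, 900, 1000]
capacity_pct = [98.0, 96.0, 94.0, 80.0, 78.0, 76.0]  # continuous target
status = ["healthy", "healthy", "healthy",
          "degraded", "degraded", "degraded"]         # categorical target

def k_nearest(x, k=3):
    """Return indices of the k training points closest to x."""
    return sorted(range(len(cycles)), key=lambda i: abs(cycles[i] - x))[:k]

def predict_regression(x, k=3):
    """Regression: average the neighbours' continuous targets."""
    idx = k_nearest(x, k)
    return sum(capacity_pct[i] for i in idx) / k

def predict_classification(x, k=3):
    """Classification: majority vote over the neighbours' labels."""
    idx = k_nearest(x, k)
    return Counter(status[i] for i in idx).most_common(1)[0][0]

print(predict_regression(250))      # 96.0
print(predict_classification(250))  # healthy
```

The input and the neighbour search are identical in both cases; only the type of the output, and how the neighbours' targets are combined, differs.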

## 3. Why do you have n-fold cross validation to evaluate models?
n-fold cross-validation is used to assess model performance. The data is divided into n equal folds; n models are then trained iteratively, each on n-1 folds, with the remaining fold used for testing. This provides a robust estimate of how well a model generalizes to new, unseen data, and thereby helps detect overfitting. It is also used to compare different machine learning algorithms and to tune the hyperparameters of a model.

It is well suited to limited data because every available sample is used for training in some fold and for testing in exactly one fold.
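
The procedure can be sketched in plain Python as follows. The helper names (`k_fold_indices`, `cross_validate`), the majority-class baseline, and the labels are assumptions made for this illustration; in practice the data is usually shuffled before it is split.

```python
from collections import Counter

def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, n_folds, fit, score):
    """Train on n-1 folds, test on the held-out fold, and average the scores."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model,
                            [X[i] for i in test_idx],
                            [y[i] for i in test_idx]))
    return sum(scores) / len(scores)

# Hypothetical baseline "model": always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def accuracy(model, X_test, y_test):
    return sum(label == model for label in y_test) / len(y_test)

X = list(range(9))
y = ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]
mean_accuracy = cross_validate(X, y, 3, fit_majority, accuracy)
print(round(mean_accuracy, 4))  # 0.7778
```

Each sample lands in the test fold exactly once, so the averaged score reflects performance on data the corresponding model never saw during training.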

## 4. Find the closest city to City 2 according to the Euclidean distance.

In [1]:
import pandas as pd 
from tasks.nearest_point import PointMap, Point  # module to calculate the Euclidean distance

#### 4.1 Load city coordinates

In [2]:
df = pd.read_csv('data/raw_data/sample_distance.csv')

In [3]:
points_obj = PointMap()
points_obj.parse_dataframe(df)
points_obj.print_points()

Point(City 1, (12, 8))
Point(City 2, (17, 11))
Point(City 3, (4, 8))
Point(City 4, (17, 14))
Point(City 5, (5, 3))
Point(City 6, (7, 9))


#### 4.2 Find the closest city to City 2

In [4]:
closest_point = points_obj.calculate_closest_point(Point('City 2', 17, 11))
closest_point.print_point()

Point(City 4, (17, 14))
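
The result can be cross-checked without the `tasks.nearest_point` module. The following stand-alone sketch is not that module's implementation; it applies the Euclidean distance $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ to the coordinates printed above, and the `closest_city` helper is a name introduced here for illustration.

```python
import math

# Coordinates as printed by points_obj.print_points() above.
cities = {
    "City 1": (12, 8),
    "City 2": (17, 11),
    "City 3": (4, 8),
    "City 4": (17, 14),
    "City 5": (5, 3),
    "City 6": (7, 9),
}

def closest_city(target, points):
    """Return (name, distance) of the point nearest to `target`, excluding itself."""
    tx, ty = points[target]
    distances = {
        name: math.hypot(x - tx, y - ty)
        for name, (x, y) in points.items()
        if name != target
    }
    name = min(distances, key=distances.get)
    return name, distances[name]

name, dist = closest_city("City 2", cities)
print(name, dist)  # City 4 3.0
```

City 4 shares its x-coordinate with City 2 and differs by 3 in y, giving a distance of 3, which is smaller than the distance to any other city (the next closest is City 1 at about 5.83).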


## 5. Specify two stopping criteria for training decision trees.
#### 5.1. Maximum tree depth
The complexity of a decision tree increases with its depth, and a very deep tree tends to overfit the training data. A criterion based on the depth of the tree can be used to stop growing the tree further.
#### 5.2. Minimum Samples per Leaf 
Providing a minimum number of samples per leaf helps control the granularity of the tree. Nodes with very few samples might capture noise and not represent the true underlying patterns in the data. This criterion helps prevent the creation of overly specific nodes with limited data, promoting more robust generalization.
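
Both criteria can be seen in a minimal tree builder for a single numeric feature. This is a sketch under simplifying assumptions (Gini impurity, one feature, hypothetical data and function names), not the implementation used in the next section.

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(xs, ys, depth=0, max_depth=2, min_samples_leaf=2):
    """Grow a tree on one numeric feature; return nested dicts or a leaf label."""
    # Stopping criterion 1: depth limit reached (or the node is already pure).
    if depth >= max_depth or len(set(ys)) == 1:
        return majority(ys)

    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Stopping criterion 2: reject splits that would create a too-small leaf.
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or impurity < best[0]:
            best = (impurity, threshold)

    if best is None:  # no admissible split remains, so this node becomes a leaf
        return majority(ys)

    _, t = best
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build_tree(*zip(*left), depth + 1, max_depth, min_samples_leaf),
        "right": build_tree(*zip(*right), depth + 1, max_depth, min_samples_leaf),
    }

def tree_depth(node):
    """Number of split levels below (and including) this node."""
    if not isinstance(node, dict):
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

# Hypothetical data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["no", "no", "yes", "yes", "no", "no", "yes", "yes"]
print(tree_depth(build_tree(xs, ys, max_depth=1)))
print(build_tree(xs, ys, min_samples_leaf=5))
```

With `max_depth=1` the tree stops after a single split. With `min_samples_leaf=5` no split of the eight samples can leave at least five on each side, so the root itself becomes a leaf.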

## 6. Extract classification rules and construct a decision tree.

In [5]:
import pandas as pd

from tasks.decision_tree import DecisionTreeTrainer, FeatureEnginerring
from tasks.constants import DECISION_TREE_TARGET

#### Load data

In [6]:
df = pd.read_csv('data/raw_data/buy_computer.csv', dtype=str)
df.head()

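| Unnamed: 0 | age | income | student | creditRating | buysComputer |
|---|---|---|---|---|---|
| 0 | <=30 | high | no | fair | no |
| 1 | <=30 | high | no | excellent | no |
| 2 | 30-40 | high | no | fair | yes |
| 3 | >40 | medium | no | fair | yes |
| 4 | >40 | low | yes | fair | yes |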


In [7]:
# Instantiate DecisionTreeTrainer
decision_tree_trainer = DecisionTreeTrainer()

X = decision_tree_trainer.preprocessing(df)
y = df[DECISION_TREE_TARGET]

# Train decision tree with cross-validation
accuracy = decision_tree_trainer.train_with_cross_validation(X, y)
print(f'Mean accuracy across all folds {round(accuracy, 4)}')

Mean accuracy across all folds 0.6667


In [8]:
decision_tree_trainer.print_rules(X, y)

Decision Tree Rules:
|--- no <= 0.50
|   |--- age <= 1.50
|   |   |--- class: yes
|   |--- age >  1.50
|   |   |--- creditRating <= 0.50
|   |   |   |--- class: yes
|   |   |--- creditRating >  0.50
|   |   |   |--- class: no
|--- no >  0.50
|   |--- age <= 0.50
|   |   |--- class: no
|   |--- age >  0.50
|   |   |--- age <= 1.50
|   |   |   |--- class: yes
|   |   |--- age >  1.50
|   |   |   |--- creditRating <= 0.50
|   |   |   |   |--- class: yes
|   |   |   |--- creditRating >  0.50
|   |   |   |   |--- class: no

