In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data.
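As a small illustration, assuming the iris toy dataset that ships with scikit-learn, the n samples are usually stored as a 2D array `X` of shape `(n_samples, n_features)`. The values to predict, if any, go in a 1D array `y` with one entry per sample:

```python
from sklearn.datasets import load_iris

# iris is used here only as an example dataset (assumption)
X, y = load_iris(return_X_y=True)
n_samples, n_features = X.shape
print(n_samples, n_features)  # 150 4
print(y.shape)                # (150,)
```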

Learning problems fall into a few categories:
- supervised learning, in which the data comes with additional attributes that we want to predict:
    - classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data.
    - regression: if the desired output consists of one or more continuous variables, then the task is called regression.
- unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, which is called clustering, or to determine the distribution of data within the input space, known as density estimation.
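Each of these categories maps onto different scikit-learn estimators. The sketch below is illustrative only. It assumes the iris dataset, and the choice of `LogisticRegression`, `LinearRegression`, `KMeans`, and `KernelDensity` is arbitrary, as is using petal width as the regression target:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans
from sklearn.neighbors import KernelDensity

X, y = load_iris(return_X_y=True)  # example dataset (assumption)

# Classification (supervised): y holds discrete class labels 0, 1, 2.
clf = LogisticRegression(max_iter=1000).fit(X, y)
class_pred = clf.predict(X[:5])

# Regression (supervised): predict a continuous value, here petal width
# (column 3) from the other three measurements.
reg = LinearRegression().fit(X[:, :3], X[:, 3])
width_pred = reg.predict(X[:5, :3])

# Clustering (unsupervised): no targets are passed to fit().
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
n_clusters_found = len(np.unique(km.labels_))

# Density estimation (unsupervised): log-density of samples under the fitted model.
kde = KernelDensity(bandwidth=0.5).fit(X)
log_density = kde.score_samples(X[:5])
```

The supervised estimators receive both `X` and a target in `fit`, while the unsupervised ones only receive `X`.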

Find out more in the scikit-learn tutorial: https://scikit-learn.org/stable/tutorial/basic/tutorial.html

Scikit-learn provides dozens of built-in machine learning algorithms and models, called estimators. The steps below walk through a typical workflow with one of them, the support vector classifier `SVC`:

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_log_error
```

# 1. Load the data and split it into training and test sets

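```python
from sklearn.datasets import load_iris

# example dataset (assumption): the iris data bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# hold out 30% of the samples as a test set
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```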
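To see what the split produces, here is a sketch assuming the iris data (150 samples). It also passes `random_state` and `stratify`, which the cell above does not use. `random_state` makes the shuffle reproducible, and `stratify=y` keeps the class proportions the same in both subsets. The variable names are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)  # example dataset (assumption)

# test_size=0.3 of 150 samples -> 45 test samples, 105 training samples
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)
print(X_tr.shape, X_te.shape)  # (105, 4) (45, 4)
```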

# 2. Instantiate a model
You can set hyperparameters (configuration values such as `C`, `kernel`, and `degree` for `SVC`) when you instantiate an estimator (i.e., a model).

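```python
# instantiate an SVM classifier with a degree-3 polynomial kernel and
# regularization parameter C=0.1 (a smaller C means stronger regularization)
svm = SVC(C=0.1, kernel='poly', degree=3)
```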
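Constructor arguments like these are stored on the estimator and can be read back or changed later. A minimal sketch using the standard `get_params` and `set_params` methods. `demo_svm` is a hypothetical name chosen so that the notebook's `svm` keeps its original settings:

```python
from sklearn.svm import SVC

# hypothetical name, separate from the notebook's `svm`
demo_svm = SVC(C=0.1, kernel='poly', degree=3)

# get_params() returns a dict of the constructor arguments
params = demo_svm.get_params()
print(params['C'], params['kernel'], params['degree'])  # 0.1 poly 3

# set_params() updates hyperparameters in place and returns the estimator
demo_svm.set_params(C=1.0, degree=2)
print(demo_svm.C, demo_svm.degree)  # 1.0 2
```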

# 3. Fit the model on the training data

```python
svm.fit(x_train, y_train)
```
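After `fit`, scikit-learn estimators expose what they learned as attributes whose names end in an underscore. The following sketch assumes the iris data and fits on the full dataset only to keep it short. `X_demo`, `y_demo`, and `model` are illustrative names:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)  # example dataset (assumption)

# fit() returns the fitted estimator itself, so calls can be chained
model = SVC(C=0.1, kernel='poly', degree=3).fit(X_demo, y_demo)

print(model.classes_)                   # [0 1 2]
print(model.n_support_)                 # number of support vectors per class
print(model.support_vectors_.shape[1])  # 4, one column per feature
```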

# 4. Get predictions of the trained model on the test data

```python
test_pred = svm.predict(x_test)
```
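`predict` expects a 2D input with one row per sample, even when there is only one sample. A sketch assuming the iris data. `X_demo`, `y_demo`, and `model` are illustrative names:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)  # example dataset (assumption)
model = SVC(C=0.1, kernel='poly', degree=3).fit(X_demo, y_demo)

# one predicted label per input row
batch_pred = model.predict(X_demo[:10])
print(batch_pred.shape)  # (10,)

# a single sample must keep shape (1, n_features); slicing with [0:1]
# instead of indexing with [0] preserves the 2D shape
single_pred = model.predict(X_demo[0:1])
print(single_pred.shape)  # (1,)
```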

# 5. Evaluate the model
Fitting a model to training data does not guarantee that it will predict well on unseen data, so its performance has to be measured directly on held-out data such as the test set.

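```python
from sklearn.metrics import accuracy_score

# SVC is a classifier, so score it with a classification metric such as
# accuracy (mean_squared_log_error is a regression metric)
accuracy_score(y_test, test_pred)
```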
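A single train/test split gives only one estimate, which can vary with how the data happened to be split. Cross-validation repeats the split several times and reports one score per split. A sketch with `cross_val_score`, assuming the iris data (variable names are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)  # example dataset (assumption)

# 5-fold cross-validation: train on 4/5 of the data, measure accuracy on the
# remaining 1/5, and rotate so every sample is held out exactly once
scores = cross_val_score(
    SVC(C=0.1, kernel='poly', degree=3), X_demo, y_demo, cv=5, scoring='accuracy'
)
print(scores.shape)  # (5,)
print(scores.mean(), scores.std())
```

For classifiers, an integer `cv` makes scikit-learn use stratified folds, so each fold keeps roughly the same class proportions.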