## Semi-Supervised Machine Learning
Semi-supervised learning combines supervised and unsupervised machine learning. It addresses a practical limitation of supervised learning, which requires labelled data: when the dataset is large, labelling every example can be costly. Semi-supervised methods therefore train on a small labelled set together with a larger unlabelled one.

One common idea is to cluster the data with an unsupervised method and then use the available labelled examples to assign labels to the remaining unlabelled data.
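The cluster-then-label idea can be sketched as follows. This is a minimal illustration and not the method used later in this notebook: the toy data from `make_blobs`, the choice of `KMeans`, and the majority-vote rule are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data (assumed for illustration): two well-separated blobs.
X, y_true = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5]],
                       cluster_std=1.0, random_state=0)

# Keep only 5 labels per class; -1 marks "no label".
rng = np.random.default_rng(0)
y = np.full(len(X), -1)
for c in (0, 1):
    keep = rng.choice(np.where(y_true == c)[0], size=5, replace=False)
    y[keep] = c

# Step 1: cluster all points, ignoring the labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: give every unlabelled point the majority label of its cluster.
y_filled = y.copy()
for c in np.unique(clusters):
    members = clusters == c
    known = y[members & (y != -1)]
    if known.size:  # skip clusters that contain no labelled point
        y_filled[members & (y == -1)] = np.bincount(known).argmax()

print("labels known before:", int((y != -1).sum()))
print("labels known after: ", int((y_filled != -1).sum()))
```

A majority vote per cluster only works well when the clusters line up with the classes, which is why the toy blobs here are deliberately far apart.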

Types of Semi-Supervised Learning Algorithms:
- Inductive Learning (The goal is to build a model that learns from a labelled training set and generalizes to new, unseen data.)
- Transductive Learning (The goal is to transfer label information from the labelled training data to the specific unlabelled data that is already available.)
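The difference between the two settings can be seen on a single fitted model. The sketch below is illustrative only: the toy blobs, the split into `X_seen` and `X_new`, and the `gamma=0.5` setting are assumptions chosen for the example. `transduction_` and `predict` are attributes of scikit-learn's `LabelPropagation`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

# Toy data (assumed for illustration): two well-separated blobs.
X, y_true = make_blobs(n_samples=120, centers=[[-5, -5], [5, 5]],
                       cluster_std=1.0, random_state=0)
X_seen, X_new = X[:100], X[100:]   # X_new is never shown to fit()

# Only 5 points per class keep their label; -1 marks "unlabelled".
y_seen = np.full(100, -1)
for c in (0, 1):
    y_seen[np.where(y_true[:100] == c)[0][:5]] = c

model = LabelPropagation(kernel="rbf", gamma=0.5).fit(X_seen, y_seen)

# Transductive use: labels for the exact points the model was fitted on.
transductive_labels = model.transduction_

# Inductive use: labels for points the model has not seen before.
inductive_labels = model.predict(X_new)

print(transductive_labels.shape, inductive_labels.shape)
```

The transductive result only covers `X_seen`, while the inductive call can be repeated on any later batch of data.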

Applications of Semi-Supervised Learning:
- Speech Analysis
- Internet Content Classification
- Protein Sequence Classification

Sources:
- https://www.geeksforgeeks.org/ml-semi-supervised-learning/
- https://towardsdatascience.com/semi-supervised-machine-learning-explained-c1a6e1e934c7
- https://machinelearningmastery.com/semi-supervised-learning-with-label-propagation/

In [15]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.semi_supervised import LabelPropagation

In [2]:
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=1)

In [3]:
X

array([[ 0.86341137, -0.91235445],
       [-0.53099717,  0.90118241],
       [ 0.98277596, -1.59111159],
       ...,
       [ 1.33019532,  3.72180951],
       [-1.01084076,  0.42633933],
       [-1.00873243,  1.24540194]])

In [7]:
# Hold out half of the data as a test set, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=1, stratify=y)

In [10]:
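# Split the training half again: one part keeps its labels (*_lab), the other is treated as unlabelled (*_unlab).
# Despite the "test" in their names, X_test_unlab and y_test_unlab are training data, not part of the test set.
X_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=.5, random_state=1, stratify=y_train)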

In [12]:
# Baseline: a supervised model trained on the labelled portion only.
lr = LogisticRegression()
lr.fit(X_train_lab, y_train_lab)

LogisticRegression()

In [13]:
lrPred = lr.predict(X_test)

In [14]:
lrScore = accuracy_score(y_test, lrPred)
lrScore

0.848

In [17]:
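# Stack the labelled and unlabelled training inputs into one array.
X_train_mixed = np.concatenate((X_train_lab, X_test_unlab))

In [18]:
# scikit-learn's semi-supervised estimators use -1 to mark an unlabelled sample.
noLabel = [-1 for _ in range(len(y_test_unlab))]

In [19]:
# Known labels first, then the -1 placeholders, in the same order as X_train_mixed.
y_train_mixed = np.concatenate((y_train_lab, noLabel))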

In [20]:
# Fit label propagation on the labelled and unlabelled training data together.
lp = LabelPropagation()
lp.fit(X_train_mixed, y_train_mixed)

LabelPropagation()

In [21]:
lpPred = lp.predict(X_test)

In [23]:
lpScore = accuracy_score(y_test, lpPred)
lpScore

0.856

In [24]:
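# transduction_ holds the label assigned to every training sample, including those that started as -1.
transLabels = lp.transduction_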

In [25]:
# Retrain logistic regression on all training inputs, using the propagated labels as targets.
lr = LogisticRegression()
lr.fit(X_train_mixed, transLabels)

LogisticRegression()

In [26]:
lrPred = lr.predict(X_test)

In [27]:
lrScore = accuracy_score(y_test, lrPred)