# **Machine Learning (Representation Learning)**
Machine learning, a core component of artificial intelligence, shapes everyday applications such as recommendation algorithms and facial recognition. A key element is representation learning, in which a model learns compact, meaningful representations of raw data that make downstream tasks such as classification and prediction easier. Representations can be learned in a supervised, unsupervised, or semi-supervised setting. Supervised learning uses labeled datasets to learn a mapping from inputs to outputs, unsupervised learning finds patterns in unlabeled data, and semi-supervised learning combines a small labeled set with a larger unlabeled one.

Practical applications include e-commerce and CRM systems that analyze user behavior to produce personalized recommendations. Because representation learning handles both structured and unstructured data, it supports automated insights and better-informed decisions across many domains.

In [1]:
# Importing all the necessary libraries and resources:
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelPropagation
from sklearn.metrics import accuracy_score

## **Loading data**

In [3]:
# Handwritten digits:
digits = load_digits()
X = digits.data # Features
y = digits.target # Labels

# Splitting data into train and test:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
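
Before learning a representation, it can help to confirm what the features are and how the classes are spread across the split. The following is a minimal sketch, not part of the original notebook; it reuses the same `load_digits` data and `train_test_split` settings and only prints shapes and per-class counts.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

# Each row of X is an 8x8 grayscale image flattened to 64 values.
print('Images:', digits.images.shape, '-> features:', X.shape)

# Same split settings as in the cell above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Per-class counts in each split. With ten roughly balanced classes,
# chance-level accuracy is about 1 / 10.
train_counts = np.bincount(y_train, minlength=10)
test_counts = np.bincount(y_test, minlength=10)
print('Train counts per digit:', train_counts)
print('Test counts per digit: ', test_counts)
```

Knowing the chance level makes it easier to judge the accuracy figures reported in the later cells.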

## **Unsupervised Representation Learning (PCA)**

In [4]:
pca = PCA(n_components=20) # Reducing 64 features to 20 latent features
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

print('Original feature dim:', X_train.shape[1])
print('Reduced feature dim:', X_train_pca.shape[1])

Original feature dim: 64
Reduced feature dim: 20
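
The choice of 20 components can be checked against the variance the components retain. The sketch below is an illustration rather than part of the original analysis: it fits a full PCA on the same training split and reads `explained_variance_ratio_`. The 95% target and the names `pca_full`, `cumulative`, and `n_needed` are assumptions introduced here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Same data and split as in the cells above.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit PCA with all components so the full variance spectrum is available.
pca_full = PCA().fit(X_train)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Share of variance retained by the first 20 components (the setting used above).
print('Variance retained by 20 components:', cumulative[19])

# Smallest number of components that reaches a hypothetical 95% target.
target = 0.95
n_needed = int(np.searchsorted(cumulative, target) + 1)
print('Components needed for 95% variance:', n_needed)
```

If the retained variance looks too low for the task, `n_components` can be raised, or passed as a float such as `PCA(n_components=0.95)` so that scikit-learn picks the count itself.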


## **Supervised Learning**

In [6]:
clf = LogisticRegression(max_iter=2000)
clf.fit(X_train_pca, y_train)

y_pred = clf.predict(X_test_pca)
print('Supervised PCA + LogReg accuracy:', accuracy_score(y_test, y_pred))

Supervised PCA + LogReg accuracy: 0.9481481481481482
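
To judge how much the 20-dimensional representation costs or gains, the same classifier can be trained on the raw 64 pixel features as a baseline. This is a hedged sketch and not a result from the original notebook; `raw_clf`, `pca_clf`, and the larger `max_iter` are choices made here for illustration, and no accuracy values are claimed.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data and split as in the cells above.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Baseline: logistic regression on the raw 64 pixel features.
raw_clf = LogisticRegression(max_iter=5000)
raw_clf.fit(X_train, y_train)
raw_acc = accuracy_score(y_test, raw_clf.predict(X_test))

# Same classifier on the 20-dimensional PCA representation.
pca = PCA(n_components=20).fit(X_train)
pca_clf = LogisticRegression(max_iter=5000)
pca_clf.fit(pca.transform(X_train), y_train)
pca_acc = accuracy_score(y_test, pca_clf.predict(pca.transform(X_test)))

print('Raw pixels accuracy:   ', raw_acc)
print('PCA (20 dims) accuracy:', pca_acc)
```

A small gap between the two numbers would suggest that the compressed representation keeps most of the information the classifier needs.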


## **Semi-supervised Learning**

In [7]:
# Masking roughly 90% of the training labels as -1 (scikit-learn's marker for unlabeled samples):
rng = np.random.RandomState(0)
mask = rng.rand(len(y_train)) < 0.1 # Keeping 10% of labels
y_train_partial = np.copy(y_train)
y_train_partial[~mask] = -1 # Unlabeled

label_prop = LabelPropagation()
label_prop.fit(X_train_pca, y_train_partial)

y_ss_pred = label_prop.predict(X_test_pca)
print('Semi-supervised accuracy:', accuracy_score(y_test, y_ss_pred))

Semi-supervised accuracy: 0.09814814814814815


  probabilities /= normalizer
