# PyTorch Tutorial, one-hot vectors:
#### Converting labels into one-hot vectors
- Download the iris dataset from the Internet, and extract its features and labels
- Define a callable class for target-transform of labels into one-hot vectors
- Use Lambda to define target-transforms

https://github.com/ostad-ai/PyTorch-Tutorial

In [1]:
# importing the necessary modules
import torch
from torchvision.transforms import Lambda
from torch.utils.data import TensorDataset
from urllib import request

We first download the Iris dataset from the specified address, as shown below:

In [2]:
# loading the Iris dataset
file_url='https://raw.githubusercontent.com/ostad-ai/Machine-Learning/main/iris.csv'
with request.urlopen(file_url) as file:
    iris=file.read().decode('utf-8').splitlines()
# to save in a file
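
The cell above keeps the downloaded lines only in memory. A minimal sketch of saving such lines to a local file and reading them back, so that later runs need no network access, might look as follows. The file name `iris_local.csv`, the header, and the two sample rows are assumptions for illustration; they stand in for the downloaded `iris` list.

```python
import os
import tempfile

# Stand-in for the downloaded `iris` list (hypothetical header and rows).
sample_lines = [
    'sepal.length,sepal.width,petal.length,petal.width,variety',
    '5.1,3.5,1.4,0.2,Setosa',
    '6.3,2.7,4.9,1.8,Virginica',
]

# Write one line per row into a temporary directory (the path is an assumption).
file_path = os.path.join(tempfile.mkdtemp(), 'iris_local.csv')
with open(file_path, 'w', encoding='utf-8') as f:
    f.write('\n'.join(sample_lines))

# Read the file back in the same list-of-lines form used in this tutorial.
with open(file_path, encoding='utf-8') as f:
    reloaded = f.read().splitlines()
```

The reloaded list has the same form as `iris`, so it could be passed to the parsing function in the next cell unchanged.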

Next, we separate the features and labels of the Iris dataset. <br>
We can pass the data pair (features, labels) to **TensorDataset** to obtain it in the form of a PyTorch **Dataset**.

In [3]:
def iris_features_labels(dataset):
    header=[]; rows=[]
    for line in dataset:
        if line.strip():
            if not header: header=line.split(',')
            else: rows.append(line.split(','))   
    features=[]; labels=[]
    classes=list(set([row[4] for row in rows]))
    for row in rows:
        features.append([float(item) for item in row[:4]])
        labels.append(classes.index(row[4]))
    return features,labels,classes,header
features,labels,classes,header=iris_features_labels(iris)
# converting to tensors
features_tn,labels_tn=torch.tensor(features),torch.tensor(labels)
# creating a dataset from the feature and label tensors
xy=TensorDataset(features_tn,labels_tn)
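
One detail of `iris_features_labels` is worth noting: `list(set(...))` does not guarantee any particular order for strings, so the integer assigned to each class name may differ from one Python session to the next. A small sketch of building a reproducible mapping by sorting the unique names first is given below; the class names are hypothetical examples, not read from the file.

```python
# Hypothetical class-name column, as it might appear in row[4].
names = ['Virginica', 'Setosa', 'Versicolor', 'Setosa', 'Virginica']

# Sorting the unique names fixes the class order across runs.
classes_sorted = sorted(set(names))
class_to_index = {name: i for i, name in enumerate(classes_sorted)}

# Integer labels that are the same in every session.
labels_sorted = [class_to_index[name] for name in names]
```

With this variant the labels depend only on the alphabetical order of the class names, at the cost of possibly differing from the numbering shown in the outputs of this tutorial.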

Let's examine the labels and count the unique values, which gives the number of classes.
<br> The label of each data pair $(features_i,label_i)$ comes from the set {0,1,2}.
<br> We want to convert each integer label with value $k$ into a vector of size three that is zero in all components except the $k$th component, which is one. This vector is called a **one-hot** vector.

In [4]:
xs,ys=xy[:]
print(f'The classes are: {ys.unique().tolist()}')
print(f'The number of classes: {len(ys.unique())}')

The classes are: [0, 1, 2]
The number of classes: 3


For converting an integer label into a **one-hot** vector (assuming labels begin from zero, and we have $m$ classes):
<br> if $label_i=k$ then one-hot$(label_i)=[c_0,c_1,...,c_k,...,c_{m-1}]^T$, such that $c_k=1$, and $c_j=0$ for all $j \neq k$
<br> We can write a **callable** class to define target-transform for converting labels into one-hot vectors.
<br>For an introduction to **callable classes**, you may see:
<br>https://raw.githubusercontent.com/ostad-ai/Python-Everything/main/P-E-callable%20instances%20with%20__call__%20method.ipynb
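
As a worked instance of the definition above, the following plain-Python sketch builds the one-hot vector for each label when $m=3$. The helper name `one_hot` is an assumption introduced only for this illustration.

```python
def one_hot(k, m):
    """Return a list of length m with 1 at position k and 0 elsewhere."""
    if not 0 <= k < m:
        raise ValueError(f'label {k} is outside the range 0..{m - 1}')
    return [1 if j == k else 0 for j in range(m)]

# One-hot vectors of the labels 0, 1, 2 for m = 3 classes.
vectors = [one_hot(k, 3) for k in range(3)]
```

Stacking these vectors as rows gives the 3x3 identity matrix, which is a quick way to check a one-hot implementation.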

In [5]:
# a callable class for defining target-transform
class TargetTrans1Hot:
    def __init__(self,Nclasses=0):
        self.Nclasses=Nclasses
    def __call__(self,y):
        oneHot=torch.zeros(self.Nclasses,dtype=torch.float)
        oneHot[y]=1
        return oneHot
#  number of classes in our dataset    
Nclasses=len(set(labels))
ttr1hot=TargetTrans1Hot(Nclasses)
y1hot=torch.zeros(len(xy),Nclasses)
#converting labels into one-hot vectors
for i, label in enumerate(labels):
    y1hot[i]=ttr1hot(label)
# creating a dataset from features and one-hot vectors for labels
xy1hot=TensorDataset(features_tn,y1hot)
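
The loop above converts one label at a time. As an alternative sketch, not the approach taken in this tutorial, PyTorch also offers `torch.nn.functional.one_hot`, which converts a whole tensor of integer labels in one call. The label values below are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical integer labels (must be an integer tensor of dtype int64).
labels_demo = torch.tensor([0, 2, 1, 2])

# F.one_hot returns an int64 tensor; cast to float to match the vectors built above.
one_hot_demo = F.one_hot(labels_demo, num_classes=3).float()
```

Passing `num_classes` explicitly avoids inferring the vector size from the largest label present, which matters when a batch does not contain every class.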

Now, we check the difference between features and labels before and after one-hot encoding of the labels.

In [6]:
index=torch.randint(0,len(xy),(1,)).item()
print(f' A sample of original (feature,label): {xy[index]}')
print(f' A sample of (feature,one-hot): {xy1hot[index]}')

 A sample of original (feature,label): (tensor([6.3000, 2.7000, 4.9000, 1.8000]), tensor(0))
 A sample of (feature,one-hot): (tensor([6.3000, 2.7000, 4.9000, 1.8000]), tensor([1., 0., 0.]))


Instead of defining a callable class for target-transform, we may use **Lambda** from **torchvision.transforms**.
<br> In the cells below, we repeat converting labels into one-hot vectors, using Lambda

In [7]:
# a Lambda target-transform: put 1. at position y of a zero vector
ttr1hot2=Lambda(lambda y:torch.zeros(Nclasses,dtype=torch.float).index_put([torch.tensor(y)],
                                                                    values=torch.tensor(1.)))
y1hot2=torch.zeros(len(xy),Nclasses)
#converting labels into one-hot vectors
for i, label in enumerate(labels):
    y1hot2[i]=ttr1hot2(label)
# creating a dataset from features and one-hot vectors for labels
xy1hot2=TensorDataset(features_tn,y1hot2)

In [9]:
index=torch.randint(0,len(xy),(1,)).item()
print(f' A sample of original (feature,label): {xy[index]}')
print(f' A sample of (feature,one-hot): {xy1hot2[index]}')

 A sample of original (feature,label): (tensor([6.2000, 2.9000, 4.3000, 1.3000]), tensor(2))
 A sample of (feature,one-hot): (tensor([6.2000, 2.9000, 4.3000, 1.3000]), tensor([0., 0., 1.]))
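
Both approaches above precompute the one-hot vectors and store them in a **TensorDataset**. Datasets in torchvision instead accept a `target_transform` and apply it each time an item is accessed. A minimal sketch of that pattern for tensor data is given below; the class `TransformedTensorDataset`, the function `to_one_hot`, and the sample values are hypothetical names and data introduced only for this illustration.

```python
import torch
from torch.utils.data import Dataset

class TransformedTensorDataset(Dataset):
    """Hypothetical dataset that applies target_transform on item access."""
    def __init__(self, features, labels, target_transform=None):
        self.features = features
        self.labels = labels
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        x, y = self.features[index], self.labels[index]
        if self.target_transform is not None:
            y = self.target_transform(y)  # label is transformed lazily
        return x, y

def to_one_hot(y, n_classes=3):
    """Convert an integer label into a float one-hot vector."""
    vec = torch.zeros(n_classes, dtype=torch.float)
    vec[int(y)] = 1.0
    return vec

# Two hypothetical samples with integer labels 0 and 2.
demo_x = torch.tensor([[5.1, 3.5, 1.4, 0.2], [6.3, 2.7, 4.9, 1.8]])
demo_y = torch.tensor([0, 2])
demo_ds = TransformedTensorDataset(demo_x, demo_y, target_transform=to_one_hot)
x0, y0 = demo_ds[1]
```

Here the integer labels stay in storage and only the returned item is one-hot, so a callable class such as `TargetTrans1Hot` or a `Lambda` could be passed as `target_transform` in the same way.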
