<div class="alert alert-block alert-info" style="margin-top: 20px">

      
| Name | Description | Date
| :- |-------------: | :-:
|Reza Hashemi| 1st PyTorch DataSets  | On 23rd of August 2019 |
</div>

# Generating Data in PyTorch
- Generating data from NumPy array
- Generating data using custom DataSet and DataLoaders

In [0]:
!pip3 install torch torchvision



In [0]:
import numpy as np
import pandas as pd
import torch, torchvision
torch.__version__

'1.1.0'

## 1. Generating data from NumPy array
- Import data using Pandas or NumPy and convert into Torch tensors

### Import data

In [0]:
# retrieve the iris dataset from the UCI repository using read_table() in Pandas
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
data = pd.read_table(url, header = None, sep = ",")
data.columns = ["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"]
data.head()

  


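| | sepal_len | sepal_wid | petal_len | petal_wid | class |
| -: | -: | -: | -: | -: | :- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |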


In [0]:
# first encode the class labels as integer codes (0, 1, 2)
data["class"] = data["class"].astype("category").cat.codes
data.head()

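| | sepal_len | sepal_wid | petal_len | petal_wid | class |
| -: | -: | -: | -: | -: | -: |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |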


In [0]:
X_data = data[["sepal_len", "sepal_wid", "petal_len", "petal_wid"]].values.astype("float32")
y_data = data["class"].values.astype("int32")

print(X_data.shape, y_data.shape)

(150, 4) (150,)


### Generating tensors
- Generating tensors directly

In [0]:
# using from_numpy(): the tensor inherits its data type from the NumPy array
# (float32 -> FloatTensor, int32 -> IntTensor)
X_tensor = torch.from_numpy(X_data)
y_tensor = torch.from_numpy(y_data)

print(X_tensor.type(), y_tensor.type())
print(X_tensor.size(), y_tensor.size())

torch.FloatTensor torch.IntTensor
torch.Size([150, 4]) torch.Size([150])
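
One detail worth knowing about `torch.from_numpy()`: the returned tensor shares memory with the source array instead of copying it, so an in-place change to one is visible in the other. The sketch below illustrates this with a small made-up array (`arr`, `shared`, and `copied` are illustrative names, not part of the iris example):

```python
import numpy as np
import torch

# small illustrative array (not the iris data)
arr = np.array([1.0, 2.0, 3.0], dtype="float32")
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 100.0                   # modify the NumPy array in place

print(shared[0].item())  # reflects the change made to arr
print(copied[0].item())  # still holds the original value
```

If the NumPy array is going to be modified later, `torch.tensor()` (or `.clone()` on the shared tensor) is the safer choice.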


In [0]:
# assigning data type explicitly (CrossEntropyLoss expects class labels as a LongTensor)
X_tensor = torch.FloatTensor(X_data)
y_tensor = torch.LongTensor(y_data)

print(X_tensor.type(), y_tensor.type())
print(X_tensor.size(), y_tensor.size())

torch.FloatTensor torch.LongTensor
torch.Size([150, 4]) torch.Size([150])
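
The type-specific constructors above are the legacy style. The same conversion can also be written with `torch.tensor()` and an explicit `dtype` argument, or by casting an existing tensor. A minimal sketch, using a tiny stand-in array (`y_small` is hypothetical and merely mimics `y_data`):

```python
import numpy as np
import torch

# stand-in for y_data: int32 class labels
y_small = np.array([0, 1, 2, 1], dtype="int32")

y_int = torch.from_numpy(y_small)                 # keeps int32
y_long = torch.tensor(y_small, dtype=torch.long)  # copies and casts to int64
y_cast = y_int.long()                             # casts an existing tensor to int64

print(y_int.dtype, y_long.dtype, y_cast.dtype)
```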


In [0]:
# multinomial logistic regression: a linear layer (4 features -> 3 classes)
# trained with cross-entropy loss
model = torch.nn.Linear(X_data.shape[-1], len(set(y_data)))
criterion = torch.nn.CrossEntropyLoss()  
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  
model

Linear(in_features=4, out_features=3, bias=True)
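
A single linear layer combined with `CrossEntropyLoss` amounts to multinomial logistic (softmax) regression, because the loss applies log-softmax to the raw outputs internally. The following check on random values is only a sketch (`logits` and `target` are made up), but it shows the equivalence:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)               # 5 samples, 3 classes (made-up values)
target = torch.tensor([0, 2, 1, 1, 0])   # class indices as int64

loss_ce = torch.nn.CrossEntropyLoss()(logits, target)
# the same quantity computed in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_ce, loss_manual))
```

This is why the model outputs raw scores and no softmax layer is added before the loss.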

In [0]:
# model training: full-batch updates (the whole dataset in every step)
for epoch in range(100):
  outputs = model(X_tensor)
  loss = criterion(outputs, y_tensor)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()
  
  if (epoch + 1) % 10 == 0:
    print("Epoch: {}, Loss: {:.5f}".format(epoch + 1, loss.item()))

Epoch: 10, Loss: 4.06050
Epoch: 20, Loss: 3.86774
Epoch: 30, Loss: 3.67591
Epoch: 40, Loss: 3.48540
Epoch: 50, Loss: 3.29671
Epoch: 60, Loss: 3.11048
Epoch: 70, Loss: 2.92754
Epoch: 80, Loss: 2.74892
Epoch: 90, Loss: 2.57589
Epoch: 100, Loss: 2.40987
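
The loss tracks optimization progress but does not say how often the model picks the right class. For reference, a model that assigns equal probability to all three classes would score ln 3 ≈ 1.0986, so a loss of 2.41 suggests the model is still far from converged. Accuracy can be measured by taking the `argmax` over the outputs. The sketch below uses made-up stand-ins (`X_demo`, `y_demo`, `model_demo`) so that it runs on its own; with the real data, `X_tensor`, `y_tensor`, and `model` would take their place:

```python
import torch

torch.manual_seed(0)
# made-up stand-ins for X_tensor / y_tensor: 150 samples, 4 features, 3 classes
X_demo = torch.randn(150, 4)
y_demo = torch.randint(0, 3, (150,))

model_demo = torch.nn.Linear(4, 3)

with torch.no_grad():                          # no gradients needed for evaluation
  predicted = model_demo(X_demo).argmax(dim=1) # index of the highest score per row
  accuracy = (predicted == y_demo).float().mean().item()

print("Accuracy: {:.3f}".format(accuracy))
```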


## 2. Generating data using custom DataSet and DataLoaders
- Using a custom `Dataset` together with a `DataLoader` makes the training process easier to manage (e.g., implementing mini-batch SGD)

In [0]:
class IrisDataset(torch.utils.data.Dataset):
  def __init__(self):
    # import and initialize dataset
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    data = pd.read_table(url, header = None, sep = ",")
    data.columns = ["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"]
    data["class"] = data["class"].astype("category").cat.codes
    
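    # features stay float64 (the pandas default), so the model must be cast to double
    self.X = data[["sepal_len", "sepal_wid", "petal_len", "petal_wid"]].values
    # labels are stored as an int32 column vector of shape (150, 1)
    self.Y = data["class"].values[:, np.newaxis].astype(np.int32)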
    
  def __getitem__(self, idx):
    # get item by index
    return self.X[idx], self.Y[idx]
  
  def __len__(self):
    # returns length of data
    return len(self.X)

In [0]:
# create dataset instance
irisdataset = IrisDataset()

print(type(irisdataset))
print(len(irisdataset))



<class '__main__.IrisDataset'>
150
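
When the data already fits in memory as tensors, `torch.utils.data.TensorDataset` is a possible shortcut to writing a custom class, since it provides `__getitem__` and `__len__` out of the box. A minimal sketch with made-up data (`X_demo`, `y_demo`, and `dataset_demo` are illustrative names):

```python
import torch
from torch.utils.data import TensorDataset

torch.manual_seed(0)
X_demo = torch.randn(150, 4)             # made-up features
y_demo = torch.randint(0, 3, (150,))     # made-up labels (int64)

dataset_demo = TensorDataset(X_demo, y_demo)

x0, y0 = dataset_demo[0]                 # indexing returns one (features, label) pair
print(len(dataset_demo), x0.shape, y0.dtype)
```

A custom class remains the better fit when samples need to be loaded lazily or transformed on access.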


In [0]:
# create dataloader instance
# set the batch size to 32 (mini-batch training) and reshuffle the data every epoch
# with batch_size = 1, every update would use a single sample (stochastic gradient descent)
dataloader = torch.utils.data.DataLoader(irisdataset, batch_size = 32, shuffle = True)
dataloader

<torch.utils.data.dataloader.DataLoader at 0x7f806a6f3f60>
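
With 150 samples and a batch size of 32, one pass over the loader yields five batches, the last of which holds only the remaining 22 samples. The sketch below confirms this on a placeholder dataset of zeros (`dataset_demo` and `loader_demo` are illustrative, not the iris objects):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# placeholder data with the same size as the iris set
dataset_demo = TensorDataset(torch.zeros(150, 4), torch.zeros(150, dtype=torch.long))
loader_demo = DataLoader(dataset_demo, batch_size=32, shuffle=True)

batch_sizes = [x.shape[0] for x, y in loader_demo]
print(len(loader_demo))  # number of batches per epoch
print(batch_sizes)       # size of each batch in iteration order
```

Passing `drop_last=True` to `DataLoader` discards the short final batch, which can be useful when every batch must have the same size.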

In [0]:
# logistic regression model, cast to double because the dataset yields float64 features
model = torch.nn.Linear(4, 3).double()
criterion = torch.nn.CrossEntropyLoss()  
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  
model

Linear(in_features=4, out_features=3, bias=True)
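
The `.double()` call is needed because the features come out of pandas as float64, while `Linear` parameters default to float32, and PyTorch does not mix the two in a matrix product. There are two ways to reconcile them, sketched here on a single made-up row (`x_np`, `model_f32`, and `model_f64` are illustrative names):

```python
import numpy as np
import torch

x_np = np.array([[5.1, 3.5, 1.4, 0.2]])     # float64 by default, like DataFrame.values
x = torch.from_numpy(x_np)
print(x.dtype)

model_f32 = torch.nn.Linear(4, 3)           # parameters are float32 by default
out_a = model_f32(x.float())                # option 1: cast the input down to float32

model_f64 = torch.nn.Linear(4, 3).double()  # option 2: cast the parameters up to float64
out_b = model_f64(x)

print(out_a.dtype, out_b.dtype)
```

Casting the inputs to float32 is the more common choice, since it halves memory use and matches the default parameter type.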

In [0]:
steps = len(dataloader)  # number of mini-batches per epoch: ceil(150 / 32) = 5

for epoch in range(100):
  for i, (x, y) in enumerate(dataloader):
    outputs = model(x)
    # CrossEntropyLoss expects the targets as a 1-d LongTensor of class indices,
    # so cast y from int32 and flatten it from (batch, 1) to (batch,)
    loss = criterion(outputs, y.type(torch.LongTensor).view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

  # note: this reports the loss of the last mini-batch only, so the values fluctuate
  if (epoch + 1) % 10 == 0:
    print("Epoch: {}, Loss: {:.5f}".format(epoch + 1, loss.item()))

Epoch: 10, Loss: 2.94175
Epoch: 20, Loss: 2.04952
Epoch: 30, Loss: 1.56159
Epoch: 40, Loss: 1.38190
Epoch: 50, Loss: 1.39200
Epoch: 60, Loss: 1.21061
Epoch: 70, Loss: 1.27424
Epoch: 80, Loss: 1.16360
Epoch: 90, Loss: 1.11826
Epoch: 100, Loss: 1.09301
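
The printed values are not strictly decreasing (for example, epochs 40 to 50 and 60 to 70) because each one is the loss of a single shuffled mini-batch. Averaging over all samples in the epoch gives a smoother signal. The sketch below shows one way to do this, run for only three epochs on made-up data (`dataset_demo`, `loader_demo`, and `model_demo` are stand-ins for the iris objects):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# made-up stand-in data: 150 samples, 4 features, 3 classes
dataset_demo = TensorDataset(torch.randn(150, 4), torch.randint(0, 3, (150,)))
loader_demo = DataLoader(dataset_demo, batch_size=32, shuffle=True)

model_demo = torch.nn.Linear(4, 3)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_demo.parameters(), lr=1e-3)

for epoch in range(3):
  running_loss = 0.0
  for x, y in loader_demo:
    loss = criterion(model_demo(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # weight each batch loss by its size so the short last batch counts correctly
    running_loss += loss.item() * x.size(0)
  epoch_loss = running_loss / len(dataset_demo)
  print("Epoch: {}, Mean loss: {:.5f}".format(epoch + 1, epoch_loss))
```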
