# **4.2 Load Data**

**Jonathan Choi 2021**

**[Deep Learning By Torch] End-to-end study scripts for deep learning, implemented as code practice with PyTorch.**

If you have any issues, please open a PR at the link below.

[[Deep Learning By Torch] - Github @JonyChoi](https://github.com/jonychoi/Deep-Learning-By-Torch)

## Loading Data from .csv file

In [51]:
import numpy as np

In [52]:
xy = np.loadtxt('../datasets/data-01-test-score.csv', delimiter=',', dtype=np.float32)
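As a minimal sketch of what ```np.loadtxt``` returns with these arguments, the snippet below parses a tiny in-memory CSV instead of the real file. The three rows are copied from the outputs printed later in this section, and the ```io.StringIO``` wrapper is only an assumed stand-in for the file path.

```python
import io
import numpy as np

# Stand-in for the CSV file: three feature columns followed by one target column.
csv_text = "73,80,75,152\n93,88,93,185\n89,91,90,180\n"

# loadtxt accepts a file-like object as well as a path.
sample = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=np.float32)

print(sample.shape)  # one row per line, one column per comma-separated field
print(sample.dtype)  # the dtype requested in the call
```

The result is a single 2-D array, which is why the next cells split it into features and targets by slicing columns.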

### Take a Moment!

```y_data = xy[:, [-1]]```

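selects every row of the last column. Wrapping the index in a list (```[-1]```) keeps the column axis, so the result is a 2-D array of shape ```(25, 1)``` rather than a 1-D array of shape ```(25,)```.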
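The difference between the two indexing styles can be sketched on a small made-up array (the array ```a``` and the placeholder ```pred``` are illustrative, not part of the dataset):

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(4, 3)

last_1d = a[:, -1]     # integer index drops the column axis
last_2d = a[:, [-1]]   # list index keeps the column axis

print(last_1d.shape)   # (4,)
print(last_2d.shape)   # (4, 1)

# Placeholder predictions with one column, like the output of x.matmul(W).
pred = np.zeros((4, 1), dtype=np.float32)
print((pred - last_1d).shape)  # (4, 4): broadcasting silently builds a matrix
print((pred - last_2d).shape)  # (4, 1): element-wise difference, as intended
```

Keeping the target as a column avoids the silent broadcasting shown above when the cost is computed from ```pred - y_train```.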

In [53]:
x_data = xy[:, 0: -1]
y_data = xy[:, [-1]]

In [54]:
print(x_data.shape)
print(len(x_data))
print(x_data[:5])

(25, 3)
25
[[ 73.  80.  75.]
 [ 93.  88.  93.]
 [ 89.  91.  90.]
 [ 96.  98. 100.]
 [ 73.  66.  70.]]


In [55]:
print(y_data.shape)
print(len(y_data))
print(y_data[:5])

(25, 1)
25
[[152.]
 [185.]
 [180.]
 [196.]
 [142.]]


## Imports

In [56]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [57]:
torch.manual_seed(1)

<torch._C.Generator at 0x2055c8eb8f0>

## Low-level Implementation

In [58]:
#Data
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

#Model Initialize
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

#Set optimizer
optimizer = optim.SGD([W, b], lr=1e-5)

nb_epochs = 20

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = x_train.matmul(W) + b

    #Cost
    cost = torch.mean((pred - y_train)**2)

    #Reduce cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    print('Epoch {:4d}/{} Hypothesis: {} Cost: {:.6f}'.format(epoch, nb_epochs, pred.squeeze().detach(), cost.item()))

Epoch    0/20 Hypothesis: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.]) Cost: 26811.960938
Epoch    1/20 Hypothesis: tensor([60.3300, 72.5121, 71.4468, 77.8114, 55.3021, 40.7728, 58.2450, 43.1799,
        67.7685, 62.7711, 56.1159, 55.3320, 73.8140, 61.3605, 58.5129, 73.5830,
        58.4375, 69.8998, 70.3709, 62.9651, 68.3015, 68.0264, 65.1199, 60.8261,
        75.1500]) Cost: 9920.530273
Epoch    2/20 Hypothesis: tensor([ 97.0136, 116.6032, 114.8901, 125.1249,  88.9286,  65.5651,  93.6612,
         69.4359, 108.9755, 100.9401,  90.2373,  88.9771, 118.6964,  98.6703,
         94.0921, 118.3256,  93.9699, 112.4028, 113.1596, 101.2509, 109.8326,
        109.3903, 104.7163,  97.8108, 120.8450]) Cost: 3675.298828
Epoch    3/20 Hypothesis: tensor([119.3189, 143.4130, 141.3056, 153.8940, 109.3752,  80.6404, 115.1964,
         85.4014, 134.0320, 124.1496, 110.9851, 109.4354, 145.9869, 121.3560,
        115.7265, 145.5315, 115

## High-level Implementation with ```nn.Module```

In [59]:
class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)
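To connect this model to the low-level version, here is a small sketch showing that ```nn.Linear(3, 1)``` computes the same affine map as ```x.matmul(W) + b```. The random input ```x``` and the seed are assumptions chosen only for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(3, 1)
x = torch.randn(5, 3)  # five made-up samples with three features each

# nn.Linear stores its weight as (out_features, in_features),
# so it has to be transposed to play the role of W in x.matmul(W) + b.
manual = x.matmul(linear.weight.t()) + linear.bias

print(linear.weight.shape)  # torch.Size([1, 3])
print(linear.bias.shape)    # torch.Size([1])
print(torch.allclose(linear(x), manual, atol=1e-6))
```

Unlike the zero-initialized ```W``` and ```b``` in the low-level cell, ```nn.Linear``` starts from random parameters, which is why the first-epoch predictions of the two versions differ.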

In [60]:
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

model = MultivariateLinearRegressionModel()

optimizer = optim.SGD(model.parameters(), lr=1e-5)

nb_epochs = 20

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = model(x_train)

    #cost
    cost = F.mse_loss(pred, y_train)

    #Reduce Cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    print('Epoch {:4d}/{} Hypothesis: {} Cost: {:.6f}'.format(epoch, nb_epochs, pred.squeeze().detach(), cost.item()))

Epoch    0/20 Hypothesis: tensor([-6.7933, -4.8968, -6.5155, -7.3361, -2.6660, -1.8403, -6.6781, -6.7331,
        -4.0525, -3.9151, -5.2111, -3.7514, -6.4568, -4.7845, -6.2377, -5.4874,
        -3.2482, -8.9763, -6.6201, -6.2942, -7.3238, -5.0026, -7.1896, -6.2176,
        -5.5024]) Cost: 28693.490234
Epoch    1/20 Hypothesis: tensor([55.6147, 70.1117, 67.3916, 73.1548, 54.5398, 40.3360, 53.5729, 37.9342,
        66.0489, 61.0169, 52.8371, 53.4856, 69.8990, 58.6889, 54.2903, 70.6290,
        57.2013, 63.3310, 66.1742, 58.8394, 63.3299, 65.3660, 60.1730, 56.7034,
        72.2351]) Cost: 10618.750000
Epoch    2/20 Hypothesis: tensor([ 93.5619, 115.7207, 112.3309, 122.0975,  89.3237,  65.9814,  90.2090,
         65.0951, 108.6743, 100.4994,  88.1336,  88.2888, 116.3270,  97.2834,
         91.0948, 116.9119,  93.9569, 107.2983, 110.4365,  98.4438, 106.2914,
        108.1538, 101.1332,  94.9621, 119.5033]) Cost: 3936.015381
Epoch    3/20 Hypothesis: tensor([116.6357, 143.4532, 139.6562, 151

## Dataset and DataLoader

Let's create our own dataset with ```Dataset``` from ```torch.utils.data```.

We can define a custom dataset class by subclassing ```Dataset```.

We need to implement two magic methods:

- ```__len__()```: return the total number of samples in the dataset

- ```__getitem__()```: return the sample at the given index, converted to a ```torch.Tensor```.

In [61]:
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, x_data, y_data):
        super().__init__()
        self.x_data = x_data
        self.y_data = y_data

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])

        return x, y

dataset = CustomDataset(xy[:, :-1], xy[:, [-1]])
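As a rough illustration of how the two magic methods are used, the sketch below builds a two-row dataset and calls ```len()``` and indexing on it. ```TinyDataset``` is a hypothetical name for a class that mirrors ```CustomDataset```, and the two rows are copied from the outputs printed earlier.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class TinyDataset(Dataset):  # hypothetical name; same structure as CustomDataset
    def __init__(self, x_data, y_data):
        self.x_data = x_data
        self.y_data = y_data

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y

toy = np.array([[73, 80, 75, 152],
                [93, 88, 93, 185]], dtype=np.float32)
ds = TinyDataset(toy[:, :-1], toy[:, [-1]])

print(len(ds))              # len() calls __len__
x0, y0 = ds[0]              # indexing calls __getitem__(0)
print(x0.shape, y0.shape)   # one feature vector and one single-element target
```

A ```DataLoader``` relies on exactly these two calls to decide how many samples exist and to fetch each one.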

- ```batch_size```: the size of each minibatch. By convention, we use a power of 2 (16, 32, 64, 128, 256, 512 ...)

- ```shuffle=True```: Reshuffle the dataset at every epoch so that the samples are visited in a different order each time.
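The effect of these two arguments can be sketched with 25 dummy samples, the same count as this dataset. ```TensorDataset``` (covered later in this section) is used here only to keep the sketch short, and the sample values are just the indices 0 to 24.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

n_samples, batch_size = 25, 2
ds = TensorDataset(torch.arange(n_samples, dtype=torch.float32).unsqueeze(1))

loader = DataLoader(ds, batch_size=batch_size, shuffle=True)

# With the default drop_last=False, the number of batches is ceil(n / batch_size).
print(len(loader), math.ceil(n_samples / batch_size))

# Each batch is a list holding one tensor, because the dataset wraps one tensor.
batches = [batch[0] for batch in loader]
print([b.shape[0] for b in batches])  # the final batch holds the single leftover sample

# Shuffling changes the order, but every sample still appears once per epoch.
seen = sorted(torch.cat(batches).squeeze(1).tolist())
print(seen == [float(i) for i in range(n_samples)])
```

This is where the ```13``` in the ```Batch n/13``` lines of the training output comes from.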

In [62]:
from torch.utils.data import DataLoader

dataloader = DataLoader(
    dataset,
    batch_size = 2,
    shuffle = True,
)

In [66]:
nb_epochs = 20

for epoch in range(nb_epochs+1):
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples

        #Hypothesis
        pred = model(x_train)

        #cost Function
        cost = F.mse_loss(pred, y_train)

        #Reduce cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        
        print()
        print('Epoch {:3d}/{} Batch {:3d}/{} \n y: {} Hypothesis: {} \n Cost: {:.6f}'.format(epoch, nb_epochs, batch_idx + 1, len(dataloader), y_train.squeeze(), pred.squeeze().detach(), cost.item()))


Epoch   0/20 Batch   1/13 
 y: tensor([175., 149.]) Hypothesis: tensor([165.5893, 148.0238]) 
 Cost: 44.757126

Epoch   0/20 Batch   2/13 
 y: tensor([141., 196.]) Hypothesis: tensor([144.7430, 201.2185]) 
 Cost: 20.621504

Epoch   0/20 Batch   3/13 
 y: tensor([164., 185.]) Hypothesis: tensor([163.1870, 187.0544]) 
 Cost: 2.440781

Epoch   0/20 Batch   4/13 
 y: tensor([115., 142.]) Hypothesis: tensor([108.5007, 143.2139]) 
 Cost: 21.856876

Epoch   0/20 Batch   5/13 
 y: tensor([177., 184.]) Hypothesis: tensor([174.5395, 189.0653]) 
 Cost: 15.855763

Epoch   0/20 Batch   6/13 
 y: tensor([175., 183.]) Hypothesis: tensor([175.0678, 176.7762]) 
 Cost: 19.369955

Epoch   0/20 Batch   7/13 
 y: tensor([148., 192.]) Hypothesis: tensor([150.3080, 190.7705]) 
 Cost: 3.419217

Epoch   0/20 Batch   8/13 
 y: tensor([152., 141.]) Hypothesis: tensor([157.8540, 144.4617]) 
 Cost: 23.1

Or we can use ```TensorDataset``` from ```torch.utils.data```.

A custom ```torch.utils.data.Dataset``` subclass is the better fit when loading needs custom logic. If we already have ```x_train``` and ```y_train``` as tensors, we can simply use ```TensorDataset```.

In [64]:
from torch.utils.data import TensorDataset

train_ds = TensorDataset(torch.FloatTensor(xy[:,:-1]), torch.FloatTensor(xy[:, [-1]]))

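# Unpacking the dataset directly fails: iterating a TensorDataset yields one
# (x, y) sample per row, so there are 25 values to unpack rather than 2.
# train_x, train_y = train_ds  # ValueError: too many values to unpack (expected 2)
# https://www.pythonpool.com/valueerror-too-many-values-to-unpack-expected-2-solved/

# zip(*...) transposes the samples into a tuple of all x rows and a tuple of all y rows
train_x, train_y = zip(*train_ds)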

train_dl = DataLoader(train_ds, batch_size = 2)
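The unpacking error mentioned in the comments above can be understood from how ```TensorDataset``` is indexed. The sketch below uses three rows copied from the earlier outputs and also shows the ```tensors``` attribute, which is one way (an alternative to ```zip```, not used in this notebook) to get the full tensors back.

```python
import torch
from torch.utils.data import TensorDataset

x = torch.tensor([[73., 80., 75.], [93., 88., 93.], [89., 91., 90.]])
y = torch.tensor([[152.], [185.], [180.]])
ds = TensorDataset(x, y)

print(len(ds))               # one entry per row, not one per tensor

x0, y0 = ds[0]               # a single sample does unpack into (x, y)
print(x0, y0)

all_x, all_y = ds.tensors    # the tensors passed to the constructor, unchanged
print(all_x.shape, all_y.shape)
```

Because the dataset has as many items as rows, ```train_x, train_y = train_ds``` only works by coincidence when there are exactly two samples.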



In [65]:
nb_epochs = 20

for epoch in range(nb_epochs+1):
    for batch_idx, samples in enumerate(train_dl):
        x_train, y_train = samples

        #Hypothesis
        pred = model(x_train)

        #cost Function
        cost = F.mse_loss(pred, y_train)

        #Reduce cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        
        print()
        print('Epoch {:3d}/{} Batch {:3d}/{} \n y: {} Hypothesis: {} \n Cost: {:.6f}'.format(epoch, nb_epochs, batch_idx + 1, len(train_dl), y_train.squeeze(), pred.squeeze().detach(), cost.item()))


Epoch   0/20 Batch   1/13 
 y: tensor([152., 185.]) Hypothesis: tensor([150.7479, 184.4272]) 
 Cost: 0.947925

Epoch   0/20 Batch   2/13 
 y: tensor([180., 196.]) Hypothesis: tensor([180.4276, 196.4123]) 
 Cost: 0.176412

Epoch   0/20 Batch   3/13 
 y: tensor([142., 101.]) Hypothesis: tensor([141.7346, 104.8539]) 
 Cost: 7.461353

Epoch   0/20 Batch   4/13 
 y: tensor([149., 115.]) Hypothesis: tensor([145.5017, 106.4138]) 
 Cost: 42.980682

Epoch   0/20 Batch   5/13 
 y: tensor([175., 164.]) Hypothesis: tensor([174.6315, 161.9280]) 
 Cost: 2.214495

Epoch   0/20 Batch   6/13 
 y: tensor([141., 141.]) Hypothesis: tensor([143.1907, 142.6220]) 
 Cost: 3.714942

Epoch   0/20 Batch   7/13 
 y: tensor([184., 152.]) Hypothesis: tensor([187.6170, 156.3239]) 
 Cost: 15.889662

Epoch   0/20 Batch   8/13 
 y: tensor([148., 192.]) Hypothesis: tensor([146.6165, 186.4181]) 
 Cost: 16.535614

Epoch   0/20 Batch   9/13 
 y: tensor([147., 183.]) Hypothesis: tensor([149.8710, 175.3498]) 
 Cost: 33.3840