<br>

**This Jupyter notebook touches on**
* **pandas DataFrame, NumPy array, PyTorch tensor**
* **PyTorch Dataset class**
* **PyTorch DataLoader class (batching and shuffling)**

<br>

## **Step 1: Import**

In [1]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

<br>

## **Step 2: Read**

In [2]:
data_df = pd.read_csv('step_1.csv')

In [3]:
data_df.head()

Unnamed: 0,X1,X2,y
0,1.1,11.1,21.1
1,1.2,11.2,21.2
2,1.3,11.3,21.3
3,1.4,11.4,21.4
4,1.5,11.5,21.5


<br>

## **Step 3: Convert from pandas DataFrame to NumPy array**

In [4]:
X = data_df[['X1','X2']].to_numpy()
y = data_df[['y']].to_numpy()

In [5]:
X[:5]

array([[ 1.1, 11.1],
       [ 1.2, 11.2],
       [ 1.3, 11.3],
       [ 1.4, 11.4],
       [ 1.5, 11.5]])

In [6]:
y[:5]

array([[21.1],
       [21.2],
       [21.3],
       [21.4],
       [21.5]])
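
The target array above is two-dimensional because `data_df[['y']]` selects with a list of column names. A minimal sketch of the difference, using a small hypothetical `demo_df` built inline as a stand-in for `step_1.csv`:

```python
import pandas as pd

# Stand-in for the first rows of step_1.csv (values copied from the output above).
demo_df = pd.DataFrame({'X1': [1.1, 1.2, 1.3],
                        'X2': [11.1, 11.2, 11.3],
                        'y': [21.1, 21.2, 21.3]})

y_2d = demo_df[['y']].to_numpy()  # list of labels -> DataFrame -> 2-D array
y_1d = demo_df['y'].to_numpy()    # single label   -> Series    -> 1-D array

print(y_2d.shape)  # (3, 1)
print(y_1d.shape)  # (3,)
```

Keeping the target as a column of shape (N, 1) means it lines up row by row with the feature array of shape (N, 2).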

<br>

## **Step 4: Convert from NumPy array to PyTorch tensor**

In [7]:
X = torch.tensor(X)
y = torch.tensor(y).reshape(-1,1)  # y is already shape (N, 1), so this reshape is a no-op

In [8]:
X[:5]

tensor([[ 1.1000, 11.1000],
        [ 1.2000, 11.2000],
        [ 1.3000, 11.3000],
        [ 1.4000, 11.4000],
        [ 1.5000, 11.5000]], dtype=torch.float64)

In [9]:
y[:5]

tensor([[21.1000],
        [21.2000],
        [21.3000],
        [21.4000],
        [21.5000]], dtype=torch.float64)
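
Two details of this conversion are worth seeing in isolation: `torch.tensor` copies the array, whereas `torch.from_numpy` shares memory with it, and the `float64` dtype of the NumPy array carries over to the tensor. The sketch below uses a small hypothetical array `arr` rather than the notebook's data:

```python
import numpy as np
import torch

arr = np.array([[1.1, 11.1], [1.2, 11.2]])  # float64, like the to_numpy() output

copied = torch.tensor(arr)      # independent copy of the data
shared = torch.from_numpy(arr)  # view onto the same memory as arr

arr[0, 0] = 99.0                # modify the NumPy array in place
print(copied[0, 0].item())      # 1.1  (copy is unaffected)
print(shared[0, 0].item())      # 99.0 (shared view sees the change)

as_float32 = copied.float()     # cast to float32
print(copied.dtype, as_float32.dtype)  # torch.float64 torch.float32
```

The cast to `float32` is shown only because `torch.nn` layers use `float32` parameters by default; the notebook itself keeps `float64` throughout.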

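**Steps 3 and 4 can also be combined: a pandas DataFrame can be converted to a PyTorch tensor in a single expression, without storing the intermediate NumPy array in its own variable.**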
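
One possible reading of that remark, sketched on a hypothetical `demo_df` built inline: pass the DataFrame's underlying values straight to `torch.tensor`. Which call the notebook author had in mind is not stated, so treat this as an assumption.

```python
import pandas as pd
import torch

# Stand-in for the first rows of step_1.csv.
demo_df = pd.DataFrame({'X1': [1.1, 1.2, 1.3],
                        'X2': [11.1, 11.2, 11.3],
                        'y': [21.1, 21.2, 21.3]})

# DataFrame -> tensor in one expression (the values still pass through NumPy).
X_direct = torch.tensor(demo_df[['X1', 'X2']].values)
y_direct = torch.tensor(demo_df[['y']].values)

print(X_direct.shape, y_direct.shape)  # torch.Size([3, 2]) torch.Size([3, 1])
```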

<br>

## **Step 5: Set up custom dataset using torch.utils.data.Dataset**

In [10]:
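class CustomDataset(Dataset):
    """Map-style dataset pairing each feature row in X with its target in y."""
    def __init__(self, X, y):
        self.y = y
        self.X = X
    def __len__(self):
        # Number of samples; DataLoader uses this to know which indices to draw
        return len(self.y)
    def __getitem__(self, idx):
        # Return the (features, target) pair at position idx
        y = self.y[idx]
        X = self.X[idx]
        return X, y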

In [11]:
df_dataset = CustomDataset(X, y)

In [12]:
for i, j in df_dataset:
    print(i.numpy(), '\t', j.numpy())

[ 1.1 11.1] 	 [21.1]
[ 1.2 11.2] 	 [21.2]
[ 1.3 11.3] 	 [21.3]
[ 1.4 11.4] 	 [21.4]
[ 1.5 11.5] 	 [21.5]
[ 1.6 11.6] 	 [21.6]
[ 1.7 11.7] 	 [21.7]
[ 1.8 11.8] 	 [21.8]
[ 1.9 11.9] 	 [21.9]
[ 2. 12.] 	 [22.]
[ 2.1 12.1] 	 [22.1]
[ 2.2 12.2] 	 [22.2]
[ 2.3 12.3] 	 [22.3]
[ 2.4 12.4] 	 [22.4]
[ 2.5 12.5] 	 [22.5]
[ 2.6 12.6] 	 [22.6]
[ 2.7 12.7] 	 [22.7]
[ 2.8 12.8] 	 [22.8]
[ 2.9 12.9] 	 [22.9]
[ 3. 13.] 	 [23.]
[ 3.1 13.1] 	 [23.1]
[ 3.2 13.2] 	 [23.2]
[ 3.3 13.3] 	 [23.3]
[ 3.4 13.4] 	 [23.4]
[ 3.5 13.5] 	 [23.5]
[ 3.6 13.6] 	 [23.6]
[ 3.7 13.7] 	 [23.7]
[ 3.8 13.8] 	 [23.8]
[ 3.9 13.9] 	 [23.9]
[ 4. 14.] 	 [24.]
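
The loop above works because the dataset supports integer indexing through `__getitem__`. As a rough sketch on hypothetical three-row tensors, the same class can also be indexed directly, and for plain tensors PyTorch's built-in `TensorDataset` behaves the same way:

```python
import torch
from torch.utils.data import Dataset, TensorDataset

class CustomDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
    def __len__(self):
        return len(self.y)
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Hypothetical stand-in data, not the full step_1.csv.
X_demo = torch.tensor([[1.1, 11.1], [1.2, 11.2], [1.3, 11.3]])
y_demo = torch.tensor([[21.1], [21.2], [21.3]])

custom = CustomDataset(X_demo, y_demo)
builtin = TensorDataset(X_demo, y_demo)

print(len(custom), len(builtin))  # 3 3
x0, y0 = custom[0]                # calls __getitem__(0)
bx0, by0 = builtin[0]
print(torch.equal(x0, bx0), torch.equal(y0, by0))  # True True
```

A hand-written class becomes useful once `__getitem__` has to do more than slice tensors, for example loading a file or applying a transform per sample.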


<br>

## **Step 6a: Set up custom dataloader using torch.utils.data.DataLoader - Default batch size of 1**

In [13]:
df_dataloader = DataLoader(df_dataset)

In [14]:
for i, j in df_dataloader:
    print(pd.DataFrame(zip(i.numpy(), j.numpy()), columns=['X','y']))

             X       y
0  [1.1, 11.1]  [21.1]
             X       y
0  [1.2, 11.2]  [21.2]
             X       y
0  [1.3, 11.3]  [21.3]
             X       y
0  [1.4, 11.4]  [21.4]
             X       y
0  [1.5, 11.5]  [21.5]
             X       y
0  [1.6, 11.6]  [21.6]
             X       y
0  [1.7, 11.7]  [21.7]
             X       y
0  [1.8, 11.8]  [21.8]
             X       y
0  [1.9, 11.9]  [21.9]
             X       y
0  [2.0, 12.0]  [22.0]
             X       y
0  [2.1, 12.1]  [22.1]
             X       y
0  [2.2, 12.2]  [22.2]
             X       y
0  [2.3, 12.3]  [22.3]
             X       y
0  [2.4, 12.4]  [22.4]
             X       y
0  [2.5, 12.5]  [22.5]
             X       y
0  [2.6, 12.6]  [22.6]
             X       y
0  [2.7, 12.7]  [22.7]
             X       y
0  [2.8, 12.8]  [22.8]
             X       y
0  [2.9, 12.9]  [22.9]
             X       y
0  [3.0, 13.0]  [23.0]
             X       y
0  [3.1, 13.1]  [23.1]
             X       y
0  [3.2, 13.2]  [23.2]
...
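
Each printed frame has a single row because `DataLoader` uses `batch_size=1` when none is given, so every sample still gets a leading batch dimension. A small sketch of the shapes involved, using `TensorDataset` on hypothetical data as a stand-in for `CustomDataset`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical 3-sample dataset with 2 features and 1 target per sample.
X_demo = torch.arange(6, dtype=torch.float64).reshape(3, 2)
y_demo = torch.arange(3, dtype=torch.float64).reshape(3, 1)
demo_dataset = TensorDataset(X_demo, y_demo)

sample_X, sample_y = demo_dataset[0]   # one sample straight from the dataset
demo_loader = DataLoader(demo_dataset) # default batch_size=1
batch_X, batch_y = next(iter(demo_loader))

print(sample_X.shape, batch_X.shape)   # torch.Size([2]) torch.Size([1, 2])
print(len(demo_loader))                # 3 (one batch per sample)
```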

<br>

## **Step 6b: Set up custom dataloader using torch.utils.data.DataLoader - Batch of 9**

In [15]:
df_dataloader = DataLoader(df_dataset, batch_size=9)

In [16]:
for i, j in df_dataloader:
    print(pd.DataFrame(zip(i.numpy(), j.numpy()), columns=['X','y']))

             X       y
0  [1.1, 11.1]  [21.1]
1  [1.2, 11.2]  [21.2]
2  [1.3, 11.3]  [21.3]
3  [1.4, 11.4]  [21.4]
4  [1.5, 11.5]  [21.5]
5  [1.6, 11.6]  [21.6]
6  [1.7, 11.7]  [21.7]
7  [1.8, 11.8]  [21.8]
8  [1.9, 11.9]  [21.9]
             X       y
0  [2.0, 12.0]  [22.0]
1  [2.1, 12.1]  [22.1]
2  [2.2, 12.2]  [22.2]
3  [2.3, 12.3]  [22.3]
4  [2.4, 12.4]  [22.4]
5  [2.5, 12.5]  [22.5]
6  [2.6, 12.6]  [22.6]
7  [2.7, 12.7]  [22.7]
8  [2.8, 12.8]  [22.8]
             X       y
0  [2.9, 12.9]  [22.9]
1  [3.0, 13.0]  [23.0]
2  [3.1, 13.1]  [23.1]
3  [3.2, 13.2]  [23.2]
4  [3.3, 13.3]  [23.3]
5  [3.4, 13.4]  [23.4]
6  [3.5, 13.5]  [23.5]
7  [3.6, 13.6]  [23.6]
8  [3.7, 13.7]  [23.7]
             X       y
0  [3.8, 13.8]  [23.8]
1  [3.9, 13.9]  [23.9]
2  [4.0, 14.0]  [24.0]


**You can see batches of 9 data points (the last batch holds only the remaining 3, since 30 is not a multiple of 9), with every point in the same order as in the original dataframe**
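
The batch count follows from simple arithmetic: 30 samples in batches of 9 give ceil(30 / 9) = 4 batches, the last one partial. The sketch below checks this on a hypothetical 30-row dataset of zeros and shows the `drop_last=True` option, which discards the partial batch:

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset with the same number of rows (30) as the notebook's data.
demo_dataset = TensorDataset(torch.zeros(30, 2), torch.zeros(30, 1))

keep_last = DataLoader(demo_dataset, batch_size=9)
drop_last = DataLoader(demo_dataset, batch_size=9, drop_last=True)

keep_sizes = [len(xb) for xb, _ in keep_last]
drop_sizes = [len(xb) for xb, _ in drop_last]

print(keep_sizes)  # [9, 9, 9, 3]
print(drop_sizes)  # [9, 9, 9]
print(len(keep_last) == math.ceil(30 / 9))  # True
```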

<br>

## **Step 6c: Set up custom dataloader using torch.utils.data.DataLoader - Batch of 9 with shuffle**

In [17]:
df_dataloader = DataLoader(df_dataset, batch_size=9, shuffle=True)

In [18]:
for i, j in df_dataloader:
    print(pd.DataFrame(zip(i.numpy(), j.numpy()), columns=['X','y']))

             X       y
0  [3.4, 13.4]  [23.4]
1  [3.3, 13.3]  [23.3]
2  [2.6, 12.6]  [22.6]
3  [1.6, 11.6]  [21.6]
4  [2.4, 12.4]  [22.4]
5  [3.2, 13.2]  [23.2]
6  [1.5, 11.5]  [21.5]
7  [4.0, 14.0]  [24.0]
8  [2.3, 12.3]  [22.3]
             X       y
0  [1.2, 11.2]  [21.2]
1  [1.4, 11.4]  [21.4]
2  [2.1, 12.1]  [22.1]
3  [2.7, 12.7]  [22.7]
4  [1.8, 11.8]  [21.8]
5  [3.5, 13.5]  [23.5]
6  [2.8, 12.8]  [22.8]
7  [2.2, 12.2]  [22.2]
8  [3.6, 13.6]  [23.6]
             X       y
0  [3.8, 13.8]  [23.8]
1  [3.0, 13.0]  [23.0]
2  [2.0, 12.0]  [22.0]
3  [2.5, 12.5]  [22.5]
4  [3.7, 13.7]  [23.7]
5  [1.3, 11.3]  [21.3]
6  [1.9, 11.9]  [21.9]
7  [3.1, 13.1]  [23.1]
8  [3.9, 13.9]  [23.9]
             X       y
0  [1.7, 11.7]  [21.7]
1  [1.1, 11.1]  [21.1]
2  [2.9, 12.9]  [22.9]


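**You can see batches of 9 data points (the last batch again holds the remaining 3), with the points <u>NOT</u> in the same order as in the original dataframe; with shuffle=True the order is redrawn every time the dataloader is iterated**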
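
Because the shuffled order changes from run to run, the output above will not reproduce exactly. One way to make it repeatable, sketched here under the assumption that a seeded `torch.Generator` is acceptable, is to pass it through the `generator` argument of `DataLoader`. The helper `epoch_order` and the integer-id dataset are hypothetical and serve only to make the order easy to inspect:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset whose samples are just their own indices 0..29.
demo_dataset = TensorDataset(torch.arange(30))

def epoch_order(seed):
    """Return the order in which samples are visited in one shuffled pass."""
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(demo_dataset, batch_size=9, shuffle=True, generator=g)
    return torch.cat([batch for (batch,) in loader]).tolist()

first = epoch_order(0)
second = epoch_order(0)

print(first == second)                   # True: same seed, same order
print(sorted(first) == list(range(30)))  # True: every sample appears exactly once
```

Shuffling only permutes the indices drawn from the dataset; no sample is repeated or dropped within a pass.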