# Using the TensorDataset class

Loading your data into a PyTorch dataset is one of the first steps in creating and training a neural network with PyTorch.

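The TensorDataset class is very helpful when your dataset can be loaded directly as a NumPy array. Recall that TensorDataset() takes one or more tensors as input, so NumPy arrays must first be converted to tensors (for example with torch.tensor()). All tensors must have the same size along the first dimension.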

In this exercise, you'll practice creating a PyTorch dataset using the TensorDataset class.

torch and numpy have already been imported for you, along with the TensorDataset class.

In [10]:
import torch.nn as nn
import pandas as pd
from torch.utils.data import TensorDataset, DataLoader

In [2]:
import numpy as np
import torch
from torch.utils.data import TensorDataset

np_features = np.random.rand(12, 8)
np_target = np.random.rand(12, 1)

# Convert arrays to PyTorch tensors
torch_features = torch.tensor(np_features)
torch_target = torch.tensor(np_target)
# Create a TensorDataset from two tensors
dataset = TensorDataset(torch_features, torch_target)
# Alternative: cast to float32, the default dtype of nn layer parameters
# dataset = TensorDataset(torch_features.float(), torch_target.float())

# Return the last element of this dataset
print(dataset[-1])

(tensor([0.0244, 0.8569, 0.0847, 0.9896, 0.8408, 0.2973, 0.0486, 0.8533],
       dtype=torch.float64), tensor([0.5312], dtype=torch.float64))
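
The `dtype=torch.float64` in the output above comes from NumPy's default double precision, while layers such as `nn.Linear` hold `float32` parameters by default. The following is a minimal sketch of the cast hinted at in the commented-out line. The names `np_x`, `np_y`, and `float_dataset` are hypothetical and chosen only for this illustration.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# Hypothetical arrays with the same shapes as in the cell above
np_x = np.random.rand(12, 8)
np_y = np.random.rand(12, 1)

# torch.tensor keeps NumPy's float64; .float() casts to float32
x64 = torch.tensor(np_x)
x32 = torch.tensor(np_x).float()
y32 = torch.tensor(np_y).float()

# Build the dataset from the float32 tensors and fetch the last sample
float_dataset = TensorDataset(x32, y32)
sample_x, sample_y = float_dataset[-1]
print(x64.dtype, sample_x.dtype, len(float_dataset))
```

Casting once when the dataset is built avoids repeating `.float()` on every batch later on.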


TensorDataset works well when your data is already in NumPy arrays (or can be converted to them). Sometimes, however, you need to write a custom dataset class.
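
As a rough sketch of what such a custom class could look like, the example below subclasses `torch.utils.data.Dataset` and implements `__len__` and `__getitem__`. The class name `ArrayDataset` and the random data are assumptions made for illustration and are not part of the exercise.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ArrayDataset(Dataset):
    """Hypothetical custom dataset wrapping a feature array and a target array."""

    def __init__(self, features, targets):
        # Store both arrays as float32 tensors
        self.features = torch.as_tensor(features, dtype=torch.float32)
        self.targets = torch.as_tensor(targets, dtype=torch.float32)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.features)

    def __getitem__(self, idx):
        # Return one (features, target) pair
        return self.features[idx], self.targets[idx]


custom_dataset = ArrayDataset(np.random.rand(12, 8), np.random.rand(12, 1))
loader = DataLoader(custom_dataset, batch_size=4)
first_x, first_y = next(iter(loader))
print(len(custom_dataset), first_x.shape, first_y.shape)
```

A custom class like this is mainly useful when samples need per-item work, such as reading from disk or applying transforms, which TensorDataset does not do.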



In [18]:
dataframe = pd.DataFrame({
    'ph': [7.0, 8.1, np.nan, 7.8],
    'Sulfate': [300, 320, 330, np.inf],
    'Solids': [20000, 21000, 22000, 23000],
    'Conductivity': [400, 420, 430, 440],
    'Chloramines': [3.1, 3.2, 3.3, 3.4],
    'Turbidity': [4.0, 4.1, 4.2, 4.3],
    'Hardness': [150, 160, 170, 180],
    'Organic_carbon': [10, 11, 12, 13],
    'Potability': [0, 1, 0, 1]
})
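
The frame above contains one `np.nan` and one `np.inf`. An infinite entry makes the column mean infinite, so a `(x - mean) / std` normalization produces non-finite values for that whole column. The sketch below shows one way to detect and repair such entries. The two-column frame `df` and the choice of mean imputation are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame mirroring the missing and infinite entries above
df = pd.DataFrame({
    'ph': [7.0, 8.1, np.nan, 7.8],
    'Sulfate': [300, 320, 330, np.inf],
})

# Count entries per column that are NaN or infinite
non_finite = (~np.isfinite(df)).sum()
print(non_finite)

# Treat inf as missing, then fill gaps with the column mean (one simple choice)
cleaned = df.replace([np.inf, -np.inf], np.nan)
cleaned = cleaned.fillna(cleaned.mean())
print(cleaned)
```

Dropping the affected rows with `dropna()` is another option, although with only four rows it would discard half of the data.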


In [23]:
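# Select the feature columns
features = dataframe[['ph', 'Sulfate', 'Solids', 'Conductivity', 'Chloramines', 'Turbidity', 'Hardness', 'Organic_carbon']]

# Replace infinite values with NaN, then fill missing values with the column mean
# (mean imputation is one simple choice); otherwise NaN/inf would propagate into the model
features = features.replace([np.inf, -np.inf], np.nan)
features = features.fillna(features.mean())

# Normalize the features
features = (features - features.mean()) / features.std()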

# Convert to PyTorch tensors
features_tensor = torch.tensor(features.to_numpy()).float()
target_tensor = torch.tensor(dataframe['Potability'].to_numpy()).float()

# Create a dataset from the two generated tensors
dataset = TensorDataset(features_tensor, target_tensor)

# Create a dataloader using the above dataset
dataloader = DataLoader(dataset, shuffle=True, batch_size=2)

# Create a model using the nn.Sequential API
model = nn.Sequential(
    nn.Linear(8, 16),  # Input dimension is 8 to match the number of features
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid()  # Sigmoid activation squashes output values into (0, 1)
)

# Define loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross-Entropy Loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    for features_batch, target_batch in dataloader:
        # Forward pass
        output = model(features_batch)
        
        # Debugging: print output values
        print(f"Output: {output.detach().numpy()}")
        
        # Ensure target shape matches output
        loss = criterion(output, target_batch.unsqueeze(1))
        
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# After training, let's print out the output of the trained model
output = model(features_tensor)
print(output)

Output: [[0.4731331 ]
 [0.45483366]]
Output: [[0.46132874]
 [0.49768358]]
Output: [[0.46951687]
 [0.49919876]]
Output: [[0.46271887]
 [0.45657685]]
Output: [[0.5022042 ]
 [0.45703876]]
Output: [[0.46380085]
 [0.46520767]]
Output: [[0.5044786]
 [0.4576963]]
Output: [[0.46440172]
 [0.4622614 ]]
Output: [[0.5064703]
 [0.4581445]]
Output: [[0.46491283]
 [0.45917076]]
Output: [[0.4650357 ]
 [0.45744115]]
Output: [[0.45845065]
 [0.5088982 ]]
Output: [[0.45377332]
 [0.4651148 ]]
Output: [[0.4585345]
 [0.5103406]]
Output: [[0.4586553]
 [0.5112361]]
Output: [[0.46570426]
 [0.44879502]]
Output: [[0.45899907]
 [0.51325816]]
Output: [[0.4662863 ]
 [0.44567487]]
Output: [[0.44397348]
 [0.4593102 ]]
Output: [[0.4666956]
 [0.5160333]]
tensor([[0.4405],
        [0.4596],
        [0.4669],
        [0.5169]], grad_fn=<SigmoidBackward0>)
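
The sigmoid outputs above are probabilities rather than class labels. As a small sketch of how they could be turned into predictions, the example below copies the four printed values and applies a decision threshold of 0.5. The threshold and the variable names are assumptions and are not part of the original notebook.

```python
import torch

# Probabilities copied from the printed output above
probs = torch.tensor([[0.4405], [0.4596], [0.4669], [0.5169]])
# Targets from the Potability column of the dataframe
targets = torch.tensor([0.0, 1.0, 0.0, 1.0])

# Assumed decision threshold of 0.5: values at or above it become class 1
preds = (probs >= 0.5).float().squeeze(1)

# Fraction of predictions that match the targets
accuracy = (preds == targets).float().mean().item()
print(preds, accuracy)
```

With only four samples and ten epochs, this number says little about model quality. It only illustrates the thresholding step.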
