##When you have a dataset in deep learning, you split it into two parts:

- Training set → for fitting the model's parameters
- Test set → for checking how well the trained model performs on unseen data

In PyTorch, this is usually done with one of:
- random_split() → for simple splitting
- SubsetRandomSampler together with DataLoader → for more control over which samples each loader draws

Let's start with the simplest method: random_split().

##Code to Split Dataset in PyTorch

In [1]:
# Import necessary libraries
import torch
from torch.utils.data import random_split, DataLoader, Dataset
import pandas as pd

In [2]:
# Load your CSV dataset using pandas
data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/carprices.csv')

In [3]:
# Create a custom Dataset class
class MyDataset(Dataset):
    def __init__(self, dataframe):
        self.data = dataframe

    def __len__(self):
        return len(self.data)    # Return total number of samples

    def __getitem__(self, idx):
        row = self.data.iloc[idx]    # Get one row
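        features = torch.tensor(row.iloc[:-1].values, dtype=torch.float32)  # All columns except the last are features
        # Use .iloc[-1] for positional access; row[-1] on a label-indexed Series raises a FutureWarning in recent pandas
        # torch.long suits integer class labels; use torch.float32 if the last column is a continuous target
        label = torch.tensor(row.iloc[-1], dtype=torch.long)               # Last column is the label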
        return features, label

In [4]:
# Create the dataset object
dataset = MyDataset(data)
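
The custom class above builds tensors one row at a time inside `__getitem__`. For a small, fully numeric table, one possible alternative is to convert everything once and wrap the result in `TensorDataset`. The sketch below assumes a made-up three-column DataFrame (the column names and values are hypothetical, not taken from carprices.csv) with the label in the last column.

```python
import pandas as pd
import torch
from torch.utils.data import TensorDataset

# Hypothetical stand-in for the CSV: two feature columns, label in the last column.
df = pd.DataFrame({
    "mileage": [69000, 35000, 57000, 22500],
    "age": [6, 3, 5, 2],
    "label": [0, 1, 0, 1],
})

# Convert the whole table once instead of once per __getitem__ call.
features = torch.tensor(df.iloc[:, :-1].to_numpy(), dtype=torch.float32)
labels = torch.tensor(df.iloc[:, -1].to_numpy(), dtype=torch.long)

# TensorDataset returns (features[i], labels[i]) for index i.
tensor_dataset = TensorDataset(features, labels)

x0, y0 = tensor_dataset[0]
print(len(tensor_dataset), x0.shape, y0)
```

A dataset built this way can be passed to random_split() and DataLoader exactly like the custom class.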

##Split the Dataset

In [5]:
# Decide the split sizes
train_size = int(0.8 * len(dataset))  # 80% of the data for training
test_size = len(dataset) - train_size # Remaining 20% for testing
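
The size arithmetic above can be checked in isolation. The helper below (`split_sizes` is a hypothetical name used only for this illustration) shows why the test size is computed as a remainder: `int()` truncates, so subtracting keeps the two sizes summing to the dataset length, which random_split() requires when it is given integer lengths.

```python
def split_sizes(n_samples, train_fraction=0.8):
    """Return (train_size, test_size) that always sum to n_samples."""
    train_size = int(train_fraction * n_samples)  # int() truncates toward zero
    test_size = n_samples - train_size            # remainder absorbs the truncation
    return train_size, test_size


for n in (20, 23, 101):
    train_size, test_size = split_sizes(n)
    print(n, "->", train_size, test_size)
```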

In [6]:
# Split the dataset randomly
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])
# train_dataset will contain 80% of the samples
# test_dataset will contain 20% of the samples
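
By default random_split() draws a different permutation on every run, so the test set changes between runs. A minimal sketch of a reproducible split follows, assuming a toy 20-sample `TensorDataset` in place of the CSV data; the seed value and the `seeded_split` helper are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 20 samples with 2 features each.
toy_dataset = TensorDataset(
    torch.arange(40, dtype=torch.float32).reshape(20, 2),
    torch.zeros(20, dtype=torch.long),
)


def seeded_split(ds, seed=42):
    # A dedicated generator makes the permutation repeatable
    # without touching the global random state.
    generator = torch.Generator().manual_seed(seed)
    return random_split(ds, [16, 4], generator=generator)


train_a, test_a = seeded_split(toy_dataset)
train_b, test_b = seeded_split(toy_dataset)

# random_split returns Subset objects; .indices lists the selected rows.
print(list(train_a.indices) == list(train_b.indices))
print(set(train_a.indices).isdisjoint(test_a.indices))
```

The two subsets never overlap, and reusing the same seed reproduces the same membership.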

##Create Dataloaders for Training and Testing

In [7]:
# Create DataLoader for training
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# shuffle=True reshuffles the training samples at the start of every epoch

In [8]:
# Create DataLoader for testing
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
# shuffle=False because sample order does not affect evaluation results

- train_loader → Use this for training your model

- test_loader → Use this for evaluating your model
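
To make the two roles concrete, here is a minimal sketch of a training pass followed by an evaluation pass. It is not the notebook's model: the synthetic regression data, the `nn.Linear` model, the MSE loss, the batch size, and the `demo_` loader names are all assumptions chosen only to keep the example self-contained.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Hypothetical stand-in data: 20 samples, 2 features, a float regression target.
X = torch.randn(20, 2)
y = X @ torch.tensor([2.0, -1.0]) + 0.5
demo_train, demo_test = random_split(TensorDataset(X, y), [16, 4])
demo_train_loader = DataLoader(demo_train, batch_size=8, shuffle=True)
demo_test_loader = DataLoader(demo_test, batch_size=8, shuffle=False)

model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# Training: only the training loader is used to update the weights.
for epoch in range(50):
    model.train()
    for xb, yb in demo_train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(1), yb)
        loss.backward()
        optimizer.step()

# Evaluation: no gradient tracking, and the test loader is never used for updates.
model.eval()
total, count = 0.0, 0
with torch.no_grad():
    for xb, yb in demo_test_loader:
        total += loss_fn(model(xb).squeeze(1), yb).item() * len(xb)
        count += len(xb)
test_mse = total / count
print(f"test MSE: {test_mse:.4f}")
```

For a classification task with integer labels, the loss and the output layer would need to change accordingly (for example to `nn.CrossEntropyLoss` with one output per class).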

In [9]:
# Check a batch of training data
for features, labels in train_loader:
    print("Train batch - features shape:", features.shape)
    print("Train batch - labels shape:", labels.shape)
    break  # Only print one batch


Train batch - features shape: torch.Size([16, 2])
Train batch - labels shape: torch.Size([16])

The batch holds 16 samples rather than the requested 32 because the whole training set is only 16 samples (80% of a 20-row CSV), so it fits in a single batch. The test batch printed further down holds the remaining 4 samples for the same reason.


In [10]:
# Check a batch of testing data
for features, labels in test_loader:
    print("Test batch - features shape:", features.shape)
    print("Test batch - labels shape:", labels.shape)
    break  # Only print one batch

Test batch - features shape: torch.Size([4, 2])
Test batch - labels shape: torch.Size([4])
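
The introduction names SubsetRandomSampler as the other common way to split. A minimal sketch of that approach follows, assuming a toy 20-sample dataset and an 80/20 split; the variable names and the seed are illustrative. Instead of creating two Subset objects, both loaders wrap the same dataset and each sampler restricts its loader to one set of indices.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Hypothetical toy dataset: 20 samples with 2 features each.
toy = TensorDataset(torch.randn(20, 2), torch.zeros(20, dtype=torch.long))

# Shuffle the row indices once, then cut the permutation at the 80% mark.
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(len(toy), generator=generator).tolist()
split = int(0.8 * len(toy))
train_idx, test_idx = perm[:split], perm[split:]

# The sampler decides which rows each loader sees.
# shuffle must stay at its default (False) when a sampler is supplied.
sampler_train_loader = DataLoader(toy, batch_size=8, sampler=SubsetRandomSampler(train_idx))
sampler_test_loader = DataLoader(toy, batch_size=8, sampler=SubsetRandomSampler(test_idx))

n_train = sum(len(xb) for xb, _ in sampler_train_loader)
n_test = sum(len(xb) for xb, _ in sampler_test_loader)
print(n_train, n_test)
```

Keeping the index lists around makes it easy to carve out further subsets later, such as a validation set, without copying the dataset.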


