Cleaned Automobile Dataset Documentation
**Dataset Overview:**
The cleaned automobile dataset is derived from the UCI Machine Learning Repository's ["Automobile Data Set"](https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data). It contains 13 continuous features describing automobile specifications and a target variable (price) giving the price of the car.

**File Details:**
* File Name: cleaned_automobile_data.pt
* File Format: PyTorch .pt file (serialized dictionary)
* Size: Varies based on the number of valid observations.
* Contents of the File: The .pt file contains the following keys:
    * features: PyTorch tensor of shape (n_samples, 13)
        * Description: Continuous numerical features related to automobile characteristics.
    * target: PyTorch tensor of shape (n_samples, 1)
        * Description: Price of the automobile (target variable).
    * feature_names: List of strings (length 13)
        * Description: Names of the feature columns: ['wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-size', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg']
    * target_name: String
        * Description: Name of the target variable (price).
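
The layout described above can be checked programmatically before the data is used. The following is a minimal sketch, not part of the original notebook: `check_layout` and `EXPECTED_KEYS` are hypothetical names introduced here, and the function only assumes that the two tensors expose a `.shape` attribute, as PyTorch tensors do.

```python
# Hypothetical helper: verifies that a loaded dictionary matches the
# documented layout (four keys, 13 feature columns, target of shape (n, 1)).
EXPECTED_KEYS = ("features", "target", "feature_names", "target_name")


def check_layout(dataset, n_features=13):
    """Raise ValueError if `dataset` does not match the documented layout."""
    missing = [key for key in EXPECTED_KEYS if key not in dataset]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    # torch.Size converts cleanly to a plain tuple for comparison.
    feature_shape = tuple(dataset["features"].shape)
    target_shape = tuple(dataset["target"].shape)

    if len(feature_shape) != 2 or feature_shape[1] != n_features:
        raise ValueError(f"unexpected features shape: {feature_shape}")
    if target_shape != (feature_shape[0], 1):
        raise ValueError(f"unexpected target shape: {target_shape}")
    if len(dataset["feature_names"]) != n_features:
        raise ValueError("feature_names length does not match n_features")
    if not isinstance(dataset["target_name"], str):
        raise ValueError("target_name should be a string")

    return feature_shape[0]  # number of samples
```

A typical call would be `check_layout(torch.load("cleaned_automobile_data.pt"))`, which returns the number of samples when every check passes.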

**Data Cleaning Details:**
* Missing Values Handling: Rows with missing values for any of the 13 continuous features or price were removed.
* Continuous Features: Only continuous features relevant to car specifications are retained.
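* Target Variable: The price column was converted to numeric values and reshaped to (n_samples, 1) for use in regression tasks.
* Dataset Shape: After cleaning, the dataset contains 195 samples, giving a features tensor of shape (195, 13) and a target tensor of shape (195, 1).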
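
The cleaning steps listed above could be reproduced roughly as follows. This is a sketch under stated assumptions rather than the script that produced the file: the column order is taken from the UCI attribute description, the raw file is assumed to mark missing values with "?", and the names `clean_rows` and `save_as_pt` as well as the float32 dtype are hypothetical choices made for illustration.

```python
import csv

# Column order as listed in the UCI attribute description for imports-85.data.
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
    "price",
]
FEATURE_NAMES = [
    "wheel-base", "length", "width", "height", "curb-weight", "engine-size",
    "bore", "stroke", "compression-ratio", "horsepower", "peak-rpm",
    "city-mpg", "highway-mpg",
]
TARGET_NAME = "price"


def clean_rows(lines):
    """Return (features, target) as nested lists, skipping incomplete rows."""
    features, target = [], []
    for row in csv.reader(lines):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, row))
        wanted = [record[name] for name in FEATURE_NAMES + [TARGET_NAME]]
        if "?" in wanted:
            continue  # assumption: "?" marks a missing value in the raw file
        *feature_values, price = (float(value) for value in wanted)
        features.append(feature_values)
        target.append([price])  # one-element rows give shape (n_samples, 1)
    return features, target


def save_as_pt(features, target, out_path):
    """Save the cleaned data under the four keys described above."""
    import torch  # imported here so clean_rows works without PyTorch

    torch.save(
        {
            "features": torch.tensor(features, dtype=torch.float32),
            "target": torch.tensor(target, dtype=torch.float32),
            "feature_names": FEATURE_NAMES,
            "target_name": TARGET_NAME,
        },
        out_path,
    )
```

With a local copy of the raw file, the two functions would be combined by opening the file, passing the file object to `clean_rows`, and handing the result to `save_as_pt` together with an output path. Missing values in columns outside the 13 features and price (for example normalized-losses) do not cause a row to be dropped.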





import torch

# === Step 2: Loading the Dataset === #
def load_dataset(file_path):
    """
    Loads the saved PyTorch .pt dataset from the specified file path.

    This function reads a dataset saved in PyTorch format and extracts its features, target values, 
    and relevant metadata. It prints the shapes of the feature matrix and target vector, 
    as well as the feature and target names.

    Parameters:
    -----------
    file_path : str
        The file path to the saved .pt dataset.

    Returns:
    --------
    tuple
        A tuple containing:
        - features (torch.Tensor): A tensor containing the feature values.
        - target (torch.Tensor): A tensor containing the target values.
        - feature_names (list of str): A list of names corresponding to the feature columns.
        - target_name (str): The name of the target variable.
    
    Example:
    --------
    >>> features, target, feature_names, target_name = load_dataset("cleaned_automobile_data.pt")
    Loaded dataset successfully!
    Features shape: torch.Size([195, 13]), Target shape: torch.Size([195, 1])
    Feature Names: ['wheel-base', 'length', 'width', ...], Target Name: price
    """
    dataset = torch.load(file_path)
    print("Loaded dataset successfully!")
    print(f"Features shape: {dataset['features'].shape}, Target shape: {dataset['target'].shape}")
    print(f"Feature Names: {dataset['feature_names']}, Target Name: {dataset['target_name']}")
    return dataset["features"], dataset["target"], dataset["feature_names"], dataset["target_name"]

if __name__ == "__main__":

    # Load the dataset and display information
    features, target, feature_names, target_name = load_dataset("cleaned_automobile_data.pt")
    print(f"First 2 rows of features:\n{features[:2]}")
    print(f"First 2 rows of target:\n{target[:2]}")


Loaded dataset successfully!
Features shape: torch.Size([195, 13]), Target shape: torch.Size([195, 1])
Feature Names: ['wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-size', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg'], Target Name: price
First 2 rows of features:
tensor([[8.8600e+01, 1.6880e+02, 6.4100e+01, 4.8800e+01, 2.5480e+03, 1.3000e+02,
         3.4700e+00, 2.6800e+00, 9.0000e+00, 1.1100e+02, 5.0000e+03, 2.1000e+01,
         2.7000e+01],
        [8.8600e+01, 1.6880e+02, 6.4100e+01, 4.8800e+01, 2.5480e+03, 1.3000e+02,
         3.4700e+00, 2.6800e+00, 9.0000e+00, 1.1100e+02, 5.0000e+03, 2.1000e+01,
         2.7000e+01]])
First 2 rows of target:
tensor([[13495.],
        [16500.]])
