**Reference**: *Dive into Deep Learning*, Aston Zhang et al.

https://drive.google.com/file/d/1bV_z9nx2dF2oSqM5ly82izKmWwgi9ylj/view?usp=drive_link

# Load Data

In [1]:
from pathlib import Path


# Write a tiny CSV dataset to disk; "NA" marks a missing entry.
data_file = Path("../data/house_tiny.pt-basics-02.csv")
data_file.parent.mkdir(parents=True, exist_ok=True)

with open(data_file, "w") as f:
    f.write(
        """NumRooms,RoofType,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000"""
    )

In [2]:
import pandas as pd

# read_csv parses the literal string "NA" as a missing value (NaN) by default.
data = pd.read_csv(data_file)
print(data)

   NumRooms RoofType   Price
0       NaN      NaN  127500
1       2.0      NaN  106000
2       4.0    Slate  178100
3       NaN      NaN  140000
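
Before cleaning the data, it can help to count how many entries are missing in each column. The following is a minimal sketch, assuming the same four rows as above; the names `csv_text` and `missing_per_column` are illustrative and do not appear in the original notebook, and the CSV is read from an in-memory string so the snippet runs on its own.

```python
import io

import pandas as pd

# Same rows as the CSV above, held in memory (illustrative, not the original file).
csv_text = """NumRooms,RoofType,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000"""

data = pd.read_csv(io.StringIO(csv_text))

# isna() flags each missing cell; summing the flags per column counts them.
missing_per_column = data.isna().sum()
print(missing_per_column)
```

A per-column count like this shows which columns need imputation (NumRooms, RoofType) and which are complete (Price).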


# Data Preparation

In [3]:
# Split the columns into inputs (NumRooms, RoofType) and the target (Price).
inputs, targets = data.iloc[:, 0:2], data.iloc[:, 2]

# For categorical input fields, NaN can be treated as a category of its own.
# RoofType takes the values Slate and NaN, so with dummy_na=True pandas
# expands it into two indicator columns, RoofType_Slate and RoofType_nan.
# A row whose roof type is Slate gets RoofType_Slate = 1 and RoofType_nan = 0;
# a row whose roof type is missing gets the reverse.
inputs = pd.get_dummies(inputs, dummy_na=True, dtype=int)

print(inputs)

   NumRooms  RoofType_Slate  RoofType_nan
0       NaN               0             1
1       2.0               0             1
2       4.0               1             0
3       NaN               0             1
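
To make the encoding concrete, the two indicator columns can also be built by hand and compared with the result of `pd.get_dummies`. This is only a sketch under the assumption that RoofType takes just the values Slate and NaN, as in this dataset; the names `csv_text`, `roof`, `manual`, and `auto` are illustrative and are not part of the original notebook.

```python
import io

import pandas as pd

# Same rows as the CSV above, held in memory (illustrative, not the original file).
csv_text = """NumRooms,RoofType,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000"""

data = pd.read_csv(io.StringIO(csv_text))
roof = data["RoofType"]

# Hand-built indicators: one column per category, with NaN counted as a category.
manual = pd.DataFrame(
    {
        "RoofType_Slate": (roof == "Slate").astype(int),
        "RoofType_nan": roof.isna().astype(int),
    }
)

# The same encoding produced by pandas.
auto = pd.get_dummies(data[["RoofType"]], dummy_na=True, dtype=int)

print(manual)
print(auto)
```

Each row has exactly one of the two indicators set, which is what makes this a one-hot encoding.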


In [4]:
# For missing numerical values, one common heuristic is to replace each NaN
# with the mean of its column. mean() skips NaN by default, so the NumRooms
# mean is computed from the observed values 2 and 4, giving 3.
inputs = inputs.fillna(inputs.mean())
print(inputs)

   NumRooms  RoofType_Slate  RoofType_nan
0       3.0               0             1
1       2.0               0             1
2       4.0               1             0
3       3.0               0             1
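
The imputation step can be checked in isolation. The sketch below assumes the encoded table shown above and rebuilds it directly as a DataFrame; `col_means` and `filled` are illustrative names that do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# The encoded inputs from the previous step, rebuilt by hand for a standalone check.
inputs = pd.DataFrame(
    {
        "NumRooms": [np.nan, 2.0, 4.0, np.nan],
        "RoofType_Slate": [0, 0, 1, 0],
        "RoofType_nan": [1, 1, 0, 1],
    }
)

# Column means ignore NaN (skipna=True is the default), so NumRooms averages 2 and 4.
col_means = inputs.mean()

# fillna with a Series fills each column with the value under the matching label.
filled = inputs.fillna(col_means)

print(col_means)
print(filled)
```

Only NumRooms changes here, because the indicator columns contain no missing values.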


# Conversion to the Tensor Format

In [5]:
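import torch

# inputs.values is a float64 NumPy array, so X is a float64 tensor;
# targets.values holds integers, so y is an integer tensor.
X, y = torch.tensor(inputs.values), torch.tensor(targets.values)
print(X, y, sep="\n")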

tensor([[3., 0., 1.],
        [2., 0., 1.],
        [4., 1., 0.],
        [3., 0., 1.]], dtype=torch.float64)
tensor([127500, 106000, 178100, 140000])
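
The tensors above inherit their dtypes from the NumPy arrays: float64 for `X` and an integer type for `y`. PyTorch's default floating-point dtype is float32, so a common follow-up, not part of the original notebook, is to cast both tensors before feeding them to a model. A minimal sketch, assuming the same values as above (the arrays `values` and `prices` and the tensor `X64` are illustrative names):

```python
import numpy as np
import torch

# Same numbers as the printed tensors above, rebuilt as NumPy arrays (illustrative).
values = np.array(
    [[3.0, 0.0, 1.0], [2.0, 0.0, 1.0], [4.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
)
prices = np.array([127500, 106000, 178100, 140000])

# Without an explicit dtype, the tensor keeps NumPy's float64.
X64 = torch.tensor(values)

# Passing dtype casts during construction.
X = torch.tensor(values, dtype=torch.float32)
y = torch.tensor(prices, dtype=torch.float32)

print(X64.dtype, X.dtype, y.dtype)
```

Whether float32 is appropriate depends on the downstream model; float64 can be kept when the extra precision matters.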
