**2.2.1 Reading the Dataset**

In [1]:
import os
os.makedirs(os.path.join('..', 'data'), exist_ok=True)
data_file = os.path.join('..', 'data', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('''NumRooms, RoofType, Price
NA, NA, 127500
2, NA, 106000
4, Slate, 178100
NA, NA, 140000''')

In [2]:
#@ Import the library and load the dataset
import pandas as pd

data = pd.read_csv(data_file)
print(data)

   NumRooms  RoofType   Price
0       NaN        NA  127500
1       2.0        NA  106000
2       4.0     Slate  178100
3       NaN        NA  140000
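
The file written in cell [1] has a space after every comma. pandas keeps that space, so the `RoofType` entries are read as the literal string `' NA'` instead of as missing values, and the last two column names start with a space. One way to avoid this without changing the file, sketched here under that assumption, is to pass `skipinitialspace=True` to `pd.read_csv`. The text is read from an in-memory buffer (`io.StringIO`) only to keep the example self-contained; `csv_text` and `clean` are illustrative names.

```python
import io
import pandas as pd

# Same contents as house_tiny.csv, including the space after each comma
csv_text = '''NumRooms, RoofType, Price
NA, NA, 127500
2, NA, 106000
4, Slate, 178100
NA, NA, 140000'''

# skipinitialspace=True drops whitespace that follows a delimiter,
# so 'NA' is recognised as missing and the headers carry no leading space
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(list(clean.columns))
print(clean['RoofType'].isna().sum())  # number of missing roof types
```

Writing the CSV without spaces after the commas in the first place would have the same effect.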


**2.2.2 Data Preparation**

In [3]:
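#@ Split inputs and targets, then one-hot encode the categorical column
#@ (dummy_na=True adds an indicator column for missing categories)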
inputs, targets = data.iloc[:, 0:2], data.iloc[:, 2]
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)

   NumRooms   RoofType_ NA   RoofType_ Slate   RoofType_nan
0       NaN              1                 0              0
1       2.0              1                 0              0
2       4.0              0                 1              0
3       NaN              1                 0              0
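
In the output above, `RoofType_nan` is all zeros because the roof-type entries reached `pd.get_dummies` as the string `' NA'` rather than as real `NaN` values. As a sketch of what `dummy_na=True` does when the values are truly missing, the hand-built frame below (`inputs_demo`, a hypothetical stand-in) uses `np.nan` directly. Passing `dtype=int` is an assumption made so the indicators print as 0/1 regardless of pandas version.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the inputs, with true NaN for missing entries
inputs_demo = pd.DataFrame({
    'NumRooms': [np.nan, 2.0, 4.0, np.nan],
    'RoofType': [np.nan, np.nan, 'Slate', np.nan],
})

# dummy_na=True adds an indicator column for missing categories;
# numeric columns such as NumRooms are passed through unchanged
encoded = pd.get_dummies(inputs_demo, dummy_na=True, dtype=int)
print(encoded)
```

Here the missing roof types land in `RoofType_nan`, and only `Slate` gets a category column of its own.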


In [4]:
#@ Replace missing numerical values with the column mean
inputs = inputs.fillna(inputs.mean())
print(inputs)

   NumRooms   RoofType_ NA   RoofType_ Slate   RoofType_nan
0       3.0              1                 0              0
1       2.0              1                 0              0
2       4.0              0                 1              0
3       3.0              1                 0              0
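
`mean()` skips `NaN` by default, so the fill value for `NumRooms` is the mean of the two observed entries, (2 + 4) / 2 = 3. The sketch below makes that step explicit and restricts the fill to one named column, which leaves any indicator columns untouched. The names `rooms`, `col_mean`, and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

rooms = pd.DataFrame({'NumRooms': [np.nan, 2.0, 4.0, np.nan]})

# mean() ignores NaN by default (skipna=True): (2 + 4) / 2
col_mean = rooms['NumRooms'].mean()

# A dict passed to fillna maps column names to their fill values
filled = rooms.fillna({'NumRooms': col_mean})
print(col_mean)
print(filled['NumRooms'].tolist())
```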


**2.2.3 Conversion to the Tensor Format**

In [5]:
#@ Loading the converted values into tensors
import torch

# to_numpy(dtype=float) also works when get_dummies returns boolean columns
X, y = torch.tensor(inputs.to_numpy(dtype=float)), torch.tensor(targets.values)
X, y

(tensor([[3., 1., 0., 0.],
         [2., 1., 0., 0.],
         [4., 0., 1., 0.],
         [3., 1., 0., 0.]], dtype=torch.float64),
 tensor([127500, 106000, 178100, 140000]))
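
The two tensors above have different dtypes: `X` is `float64`, inherited from the NumPy array, while `y` holds integers. PyTorch layers use `float32` parameters by default, so a common follow-up is to request the dtype explicitly. This is a sketch based on an assumption about how the tensors will be used, not something this section prescribes; `features`, `prices`, `X32`, and `y32` are illustrative names.

```python
import numpy as np
import torch

# Stand-in arrays with the same values as the inputs and targets above
features = np.array([[3., 1., 0., 0.],
                     [2., 1., 0., 0.],
                     [4., 0., 1., 0.],
                     [3., 1., 0., 0.]])
prices = np.array([127500, 106000, 178100, 140000])

# Ask for float32 explicitly instead of inheriting the NumPy dtypes
X32 = torch.tensor(features, dtype=torch.float32)
y32 = torch.tensor(prices, dtype=torch.float32)
print(X32.dtype, y32.dtype)
```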

**Note:** 

Real-world datasets are often plagued by outliers, faulty sensor measurements, and recording errors, all of which must be addressed before feeding the data into any model. Data visualization tools such as seaborn, Bokeh, or matplotlib can help you inspect the data manually and develop intuitions about which problems you may need to address.
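
Alongside plots, a quick numerical check can flag suspicious values. The sketch below applies the common 1.5 × IQR rule, which is not part of this section and is only one of several heuristics. The `prices` series is made up for illustration, and its last value is deliberately extreme.

```python
import pandas as pd

# Made-up values for illustration; the last entry is deliberately extreme
prices = pd.Series([127500, 106000, 178100, 140000, 1500000])

q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1

# Flag points lying more than 1.5 * IQR below Q1 or above Q3
is_outlier = (prices < q1 - 1.5 * iqr) | (prices > q3 + 1.5 * iqr)
print(prices[is_outlier])
```

Whether a flagged point is an error or a genuine extreme still needs a manual look, which is where the visualization tools mentioned above come in.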