# House Prices

In this lab you will work on the House Prices competition from Kaggle:
* [House Prices competition](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)

The purpose of the competition is to create a model that can predict house prices.

"Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence."

The Ames Housing dataset was compiled by Dean De Cock for use in data science education. 

### Variables

For this we have at our disposal a dataset with the following variables:

```
    SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict.
    MSSubClass: The building class
    MSZoning: The general zoning classification
    LotFrontage: Linear feet of street connected to property
    LotArea: Lot size in square feet
    Street: Type of road access
    Alley: Type of alley access
    LotShape: General shape of property
    LandContour: Flatness of the property
    Utilities: Type of utilities available
    LotConfig: Lot configuration
    LandSlope: Slope of property
    Neighborhood: Physical locations within Ames city limits
    Condition1: Proximity to main road or railroad
    Condition2: Proximity to main road or railroad (if a second is present)
    BldgType: Type of dwelling
    HouseStyle: Style of dwelling
    OverallQual: Overall material and finish quality
    OverallCond: Overall condition rating
    YearBuilt: Original construction date
    YearRemodAdd: Remodel date
    RoofStyle: Type of roof
    RoofMatl: Roof material
    Exterior1st: Exterior covering on house
    Exterior2nd: Exterior covering on house (if more than one material)
    MasVnrType: Masonry veneer type
    MasVnrArea: Masonry veneer area in square feet
    ExterQual: Exterior material quality
    ExterCond: Present condition of the material on the exterior
    Foundation: Type of foundation
    BsmtQual: Height of the basement
    BsmtCond: General condition of the basement
    BsmtExposure: Walkout or garden level basement walls
    BsmtFinType1: Quality of basement finished area
    BsmtFinSF1: Type 1 finished square feet
    BsmtFinType2: Quality of second finished area (if present)
    BsmtFinSF2: Type 2 finished square feet
    BsmtUnfSF: Unfinished square feet of basement area
    TotalBsmtSF: Total square feet of basement area
    Heating: Type of heating
    HeatingQC: Heating quality and condition
    CentralAir: Central air conditioning
    Electrical: Electrical system
    1stFlrSF: First Floor square feet
    2ndFlrSF: Second floor square feet
    LowQualFinSF: Low quality finished square feet (all floors)
    GrLivArea: Above grade (ground) living area square feet
    BsmtFullBath: Basement full bathrooms
    BsmtHalfBath: Basement half bathrooms
    FullBath: Full bathrooms above grade
    HalfBath: Half baths above grade
    BedroomAbvGr: Number of bedrooms above basement level
    KitchenAbvGr: Number of kitchens
    KitchenQual: Kitchen quality
    TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
    Functional: Home functionality rating
    Fireplaces: Number of fireplaces
    FireplaceQu: Fireplace quality
    GarageType: Garage location
    GarageYrBlt: Year garage was built
    GarageFinish: Interior finish of the garage
    GarageCars: Size of garage in car capacity
    GarageArea: Size of garage in square feet
    GarageQual: Garage quality
    GarageCond: Garage condition
    PavedDrive: Paved driveway
    WoodDeckSF: Wood deck area in square feet
    OpenPorchSF: Open porch area in square feet
    EnclosedPorch: Enclosed porch area in square feet
    3SsnPorch: Three season porch area in square feet
    ScreenPorch: Screen porch area in square feet
    PoolArea: Pool area in square feet
    PoolQC: Pool quality
    Fence: Fence quality
    MiscFeature: Miscellaneous feature not covered in other categories
    MiscVal: $Value of miscellaneous feature
    MoSold: Month Sold
    YrSold: Year Sold
    SaleType: Type of sale
    SaleCondition: Condition of sale
```

# Load Data
We start by downloading the data. It is available under the Data tab of the competition page.

* [House Prices competition](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)

We will need the following files that are in CSV format:
- train.csv
- test.csv

Then we have to upload the files to HDFS, to the `datasets/house-prices` directory.

We can now load the data and explore it:

In [1]:
houses = spark.read.csv('datasets/house-prices/train.csv', header=True,
                       inferSchema=True)

Let's see what the data looks like:

In [2]:
houses.printSchema()

root
 |-- Id: integer (nullable = true)
 |-- MSSubClass: integer (nullable = true)
 |-- MSZoning: string (nullable = true)
 |-- LotFrontage: string (nullable = true)
 |-- LotArea: integer (nullable = true)
 |-- Street: string (nullable = true)
 |-- Alley: string (nullable = true)
 |-- LotShape: string (nullable = true)
 |-- LandContour: string (nullable = true)
 |-- Utilities: string (nullable = true)
 |-- LotConfig: string (nullable = true)
 |-- LandSlope: string (nullable = true)
 |-- Neighborhood: string (nullable = true)
 |-- Condition1: string (nullable = true)
 |-- Condition2: string (nullable = true)
 |-- BldgType: string (nullable = true)
 |-- HouseStyle: string (nullable = true)
 |-- OverallQual: integer (nullable = true)
 |-- OverallCond: integer (nullable = true)
 |-- YearBuilt: integer (nullable = true)
 |-- YearRemodAdd: integer (nullable = true)
 |-- RoofStyle: string (nullable = true)
 |-- RoofMatl: string (nullable = true)
 |-- Exterior1st: string (nullable = true)
 |--

In [3]:
houses.select('YearBuilt', 'BedroomAbvGr', 'LotArea', '1stFlrSF', '2ndFlrSF', 'SalePrice').show()

+---------+------------+-------+--------+--------+---------+
|YearBuilt|BedroomAbvGr|LotArea|1stFlrSF|2ndFlrSF|SalePrice|
+---------+------------+-------+--------+--------+---------+
|     2003|           3|   8450|     856|     854|   208500|
|     1976|           3|   9600|    1262|       0|   181500|
|     2001|           3|  11250|     920|     866|   223500|
|     1915|           3|   9550|     961|     756|   140000|
|     2000|           4|  14260|    1145|    1053|   250000|
|     1993|           1|  14115|     796|     566|   143000|
|     2004|           3|  10084|    1694|       0|   307000|
|     1973|           3|  10382|    1107|     983|   200000|
|     1931|           2|   6120|    1022|     752|   129900|
|     1939|           2|   7420|    1077|       0|   118000|
|     1965|           3|  11200|    1040|       0|   129500|
|     2005|           4|  11924|    1182|    1142|   345000|
|     1962|           2|  12968|     912|       0|   144000|
|     2006|           3|

In [4]:
houses.count()

1460

## Feature engineering

To keep feature engineering simple we will select the following features:

Features related to age:
- YearBuilt: Original construction date
- YearRemodAdd: Remodel date

Features related to size:
- LotArea: Lot size in square feet
- 1stFlrSF: First Floor square feet
- 2ndFlrSF: Second floor square feet
- BedroomAbvGr: Number of bedrooms above basement level
- KitchenAbvGr: Number of kitchens
- TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
- GarageCars: Size of garage in car capacity

Features related to quality:
- KitchenQual: Kitchen quality
- Functional: Home functionality rating
- RoofMatl: Roof material
- RoofStyle: Type of roof
- Heating: Type of heating

To use the categorical variables (for example KitchenQual) we will need a StringIndexer Estimator for each of them, to convert its string values to a numeric index so we can use it as a feature:

In [5]:
cols = ['YearBuilt', 'YearRemodAdd', 'LotArea', '1stFlrSF', '2ndFlrSF', 
        'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'GarageCars',
        'KitchenQual', 'Functional', 'RoofMatl', 'RoofStyle', 'Heating', 'SalePrice']

In [6]:
data = houses.select(cols)

In [7]:
data.show()

+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+---------+
|YearBuilt|YearRemodAdd|LotArea|1stFlrSF|2ndFlrSF|BedroomAbvGr|KitchenAbvGr|TotRmsAbvGrd|GarageCars|KitchenQual|Functional|RoofMatl|RoofStyle|Heating|SalePrice|
+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+---------+
|     2003|        2003|   8450|     856|     854|           3|           1|           8|         2|         Gd|       Typ| CompShg|    Gable|   GasA|   208500|
|     1976|        1976|   9600|    1262|       0|           3|           1|           6|         2|         TA|       Typ| CompShg|    Gable|   GasA|   181500|
|     2001|        2002|  11250|     920|     866|           3|           1|           6|         2|         Gd|       Typ| CompShg|    Gable|   GasA|   223500|
|     1915|        1970|   9550|  

Let's explore the data and look at some statistics:

In [8]:
data.describe().toPandas()

Unnamed: 0,summary,YearBuilt,YearRemodAdd,LotArea,1stFlrSF,2ndFlrSF,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,GarageCars,KitchenQual,Functional,RoofMatl,RoofStyle,Heating,SalePrice
0,count,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460,1460,1460,1460,1460,1460.0
1,mean,1971.267808219178,1984.8657534246572,10516.828082191782,1162.626712328767,346.99246575342465,2.866438356164384,1.0465753424657531,6.517808219178082,1.7671232876712328,,,,,,180921.19589041092
2,stddev,30.202904042525294,20.64540680770938,9981.26493237915,386.5877380410744,436.528435886257,0.8157780441442279,0.2203381983840307,1.6253932905840511,0.7473150101111095,,,,,,79442.50288288663
3,min,1872.0,1950.0,1300.0,334.0,0.0,0.0,0.0,2.0,0.0,Ex,Maj1,ClyTile,Flat,Floor,34900.0
4,max,2010.0,2010.0,215245.0,4692.0,2065.0,8.0,3.0,14.0,4.0,TA,Typ,WdShngl,Shed,Wall,755000.0


Let's focus on the sale prices:

In [9]:
data.describe('SalePrice').show()

+-------+------------------+
|summary|         SalePrice|
+-------+------------------+
|  count|              1460|
|   mean|180921.19589041095|
| stddev| 79442.50288288663|
|    min|             34900|
|    max|            755000|
+-------+------------------+
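The `stddev` row reported by `describe()` is the sample standard deviation (dividing by n - 1). As a minimal plain-Python sketch of how these summary statistics are obtained, the snippet below applies them to the first five sale prices shown earlier; the helper name `summarize` is ours and is not part of Spark.

```python
import statistics

def summarize(values):
    """Return count, mean, sample stddev, min and max for a list of numbers."""
    return {
        'count': len(values),
        'mean': statistics.mean(values),
        'stddev': statistics.stdev(values),  # sample standard deviation (n - 1)
        'min': min(values),
        'max': max(values),
    }

# First five SalePrice values from the output of In [3]
sample_prices = [208500, 181500, 223500, 140000, 250000]
summary = summarize(sample_prices)
print(summary)
```

With only five rows the numbers differ from the full-dataset statistics above, but the definitions are the same.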



In [10]:
from pyspark.sql.functions import col

In [11]:
data.where('SalePrice > 700000').show()

+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+---------+
|YearBuilt|YearRemodAdd|LotArea|1stFlrSF|2ndFlrSF|BedroomAbvGr|KitchenAbvGr|TotRmsAbvGrd|GarageCars|KitchenQual|Functional|RoofMatl|RoofStyle|Heating|SalePrice|
+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+---------+
|     1994|        1995|  21535|    2444|    1872|           4|           1|          10|         3|         Ex|       Typ| WdShngl|    Gable|   GasA|   755000|
|     1996|        1996|  15623|    2411|    2065|           4|           1|          10|         3|         Ex|       Typ| CompShg|      Hip|   GasA|   745000|
+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+---------+



In [12]:
data.where('SalePrice < 50000').show()

+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+---------+
|YearBuilt|YearRemodAdd|LotArea|1stFlrSF|2ndFlrSF|BedroomAbvGr|KitchenAbvGr|TotRmsAbvGrd|GarageCars|KitchenQual|Functional|RoofMatl|RoofStyle|Heating|SalePrice|
+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+---------+
|     1920|        1950|   8500|     649|     668|           3|           1|           6|         1|         TA|       Typ| CompShg|  Gambrel|   GasA|    40000|
|     1920|        1950|   7879|     720|       0|           2|           1|           4|         0|         TA|       Typ| CompShg|    Gable|   GasA|    34900|
|     1946|        1950|   5000|     334|       0|           1|           1|           2|         0|         Fa|       Typ| CompShg|    Gable|   GasA|    39300|
|     1949|        1950|   9000|  

For the LinearRegression Estimator the 'label' must be of type double but 'SalePrice' is of type integer so we have to convert it:

In [13]:
input = data.withColumn('label', col('SalePrice').cast('double')).drop('SalePrice')

In [14]:
input.show()

+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+--------+
|YearBuilt|YearRemodAdd|LotArea|1stFlrSF|2ndFlrSF|BedroomAbvGr|KitchenAbvGr|TotRmsAbvGrd|GarageCars|KitchenQual|Functional|RoofMatl|RoofStyle|Heating|   label|
+---------+------------+-------+--------+--------+------------+------------+------------+----------+-----------+----------+--------+---------+-------+--------+
|     2003|        2003|   8450|     856|     854|           3|           1|           8|         2|         Gd|       Typ| CompShg|    Gable|   GasA|208500.0|
|     1976|        1976|   9600|    1262|       0|           3|           1|           6|         2|         TA|       Typ| CompShg|    Gable|   GasA|181500.0|
|     2001|        2002|  11250|     920|     866|           3|           1|           6|         2|         Gd|       Typ| CompShg|    Gable|   GasA|223500.0|
|     1915|        1970|   9550|     961

In [15]:
input.printSchema()

root
 |-- YearBuilt: integer (nullable = true)
 |-- YearRemodAdd: integer (nullable = true)
 |-- LotArea: integer (nullable = true)
 |-- 1stFlrSF: integer (nullable = true)
 |-- 2ndFlrSF: integer (nullable = true)
 |-- BedroomAbvGr: integer (nullable = true)
 |-- KitchenAbvGr: integer (nullable = true)
 |-- TotRmsAbvGrd: integer (nullable = true)
 |-- GarageCars: integer (nullable = true)
 |-- KitchenQual: string (nullable = true)
 |-- Functional: string (nullable = true)
 |-- RoofMatl: string (nullable = true)
 |-- RoofStyle: string (nullable = true)
 |-- Heating: string (nullable = true)
 |-- label: double (nullable = true)



Now we have to convert all categorical variables into numerical ones:

In [16]:
categorical_cols = ['KitchenQual', 'Functional', 'RoofMatl', 'RoofStyle', 'Heating']

In [17]:
from pyspark.ml.feature import StringIndexer

In [18]:
indexers = []
for c in categorical_cols:
    indexers.append(StringIndexer(inputCol=c, outputCol='{}_feature'.format(c), handleInvalid='keep'))

NOTE: If you do not use the `handleInvalid` option, then when the StringIndexer goes through the test data and finds a string that it did not see previously in the training set, it will abort execution.
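To make the effect of `handleInvalid='keep'` more concrete, here is a plain-Python sketch of the idea, not Spark's actual implementation. It assumes the default ordering (most frequent label gets index 0) and breaks ties alphabetically; the function names and the sample values are ours. Unseen labels are all mapped to one extra index after the known ones.

```python
from collections import Counter

def fit_string_indexer(values):
    """Map each label to an index, most frequent label first (ties alphabetical)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}

def transform_keep(mapping, values):
    """Index values; labels not seen during fit share one extra index."""
    unseen_index = len(mapping)
    return [mapping.get(v, unseen_index) for v in values]

# Illustrative training values for a quality column
train_quality = ['TA', 'Gd', 'TA', 'Ex', 'Gd', 'TA']
mapping = fit_string_indexer(train_quality)

# 'Fa' never appeared in the training values
indexed = transform_keep(mapping, ['Gd', 'Fa', 'TA'])
print(mapping)   # {'TA': 0, 'Gd': 1, 'Ex': 2}
print(indexed)   # [1, 3, 0]
```

The real StringIndexer produces the indices as doubles in the output column, but the mapping logic follows the same pattern.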

Now we have to assemble the selected features into a vector to use it with our model:

In [19]:
from pyspark.ml.feature import VectorAssembler

In [20]:
numerical_cols = ['YearBuilt', 'YearRemodAdd', 'LotArea', '1stFlrSF', '2ndFlrSF', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'GarageCars']

In [21]:
categorical_cols_names = ['{}_feature'.format(c) for c in categorical_cols]

In [22]:
col_names = numerical_cols + categorical_cols_names

In [23]:
col_names

['YearBuilt',
 'YearRemodAdd',
 'LotArea',
 '1stFlrSF',
 '2ndFlrSF',
 'BedroomAbvGr',
 'KitchenAbvGr',
 'TotRmsAbvGrd',
 'GarageCars',
 'KitchenQual_feature',
 'Functional_feature',
 'RoofMatl_feature',
 'RoofStyle_feature',
 'Heating_feature']

In [24]:
assembler = VectorAssembler(inputCols=col_names, outputCol='features')
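Conceptually, the assembler takes the listed columns of each row, in order, and concatenates them into a single numeric vector. A minimal plain-Python sketch of that idea follows; the `assemble` helper and the sample row are illustrative assumptions, not Spark code.

```python
def assemble(row, input_cols):
    """Concatenate the named columns of a row into one list of floats."""
    return [float(row[name]) for name in input_cols]

# Illustrative row: three numerical columns plus one indexed categorical column
row = {'YearBuilt': 2003, 'LotArea': 8450, 'GarageCars': 2,
       'KitchenQual_feature': 1.0}
features = assemble(row, ['YearBuilt', 'LotArea', 'GarageCars',
                          'KitchenQual_feature'])
print(features)  # [2003.0, 8450.0, 2.0, 1.0]
```

Note that `float('NA')` would raise an error in this sketch. The real VectorAssembler is similarly strict: it does not accept string columns as input, so every input column must already be numeric.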

Let's check whether we have to deal with any null values in the dataset:

In [25]:
from pyspark.sql.functions import isnan, when, count
input.select([count(when(col(c).isNull(), c)).alias(c) for c in input.columns]).toPandas()

Unnamed: 0,YearBuilt,YearRemodAdd,LotArea,1stFlrSF,2ndFlrSF,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,GarageCars,KitchenQual,Functional,RoofMatl,RoofStyle,Heating,label
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [26]:
input.count()

1460

## Training

In this case we are dealing with a regression problem so we will use a simple LinearRegression as the Estimator:

In [27]:
from pyspark.ml.regression import LinearRegression

In [28]:
lr = LinearRegression(maxIter=10, regParam=0.01)

We can now create the pipeline:

In [29]:
from pyspark.ml import Pipeline

In [30]:
pipeline = Pipeline(stages=indexers + [assembler, lr])

In [31]:
training, test = input.randomSplit([0.8, 0.2])
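Keep in mind that `randomSplit` does not cut the data at exactly 80/20: each row is assigned to a split at random, so the sizes vary from run to run (which is why the test set used below has 265 rows rather than 292). Passing a `seed` to `randomSplit` makes the split reproducible. The following plain-Python sketch illustrates the idea under that assumption; `random_split` is a hypothetical helper, not Spark's implementation.

```python
import random

def random_split(rows, weights, seed=None):
    """Assign each row independently to a split, with probability
    proportional to that split's weight."""
    rng = random.Random(seed)
    total = sum(weights)
    bounds = []
    cumulative = 0.0
    for w in weights:
        cumulative += w / total
        bounds.append(cumulative)
    bounds[-1] = 1.0  # guard against floating-point drift
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        for i, bound in enumerate(bounds):
            if draw < bound:
                splits[i].append(row)
                break
    return splits

train_rows, test_rows = random_split(list(range(1460)), [0.8, 0.2], seed=42)
print(len(train_rows), len(test_rows))
```

The two sizes always add up to 1460, but the test split is only approximately 20% of the rows.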

In [32]:
%%time
model = pipeline.fit(training)

CPU times: user 45.9 ms, sys: 13.5 ms, total: 59.4 ms
Wall time: 3.13 s


## Evaluation

In [33]:
from pyspark.ml.evaluation import RegressionEvaluator
evaluator = RegressionEvaluator()

In [34]:
test.count()

265

In [35]:
predictions = model.transform(test)

In [36]:
predictions.select('label', 'prediction').show(5)

+--------+------------------+
|   label|        prediction|
+--------+------------------+
|122000.0|107056.47361213807|
|107500.0|138808.13651330466|
|157500.0|176075.85258704377|
|135000.0|126246.83883265592|
|137000.0|135816.40516097378|
+--------+------------------+
only showing top 5 rows



In [37]:
rmse = evaluator.evaluate(predictions)

Root-mean squared error (RMSE):

In [38]:
rmse

46520.39217655911

So the RMSE on our test split is around 46.5k dollars. If we look at the predictions above and compare them with the labels, we see that they are in the right range, although individual errors can reach tens of thousands of dollars.
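RMSE is the square root of the mean squared difference between label and prediction. As a small worked example, the sketch below computes it in plain Python over only the five rows shown above, so the result differs from the evaluator's value for the whole test split; the `rmse` helper is ours.

```python
import math

def rmse(labels, predictions):
    """Root-mean-squared error between two equally long lists of numbers."""
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# The five label/prediction pairs printed by In [36]
labels = [122000.0, 107500.0, 157500.0, 135000.0, 137000.0]
predictions = [107056.47361213807, 138808.13651330466, 176075.85258704377,
               126246.83883265592, 135816.40516097378]
print(rmse(labels, predictions))  # roughly 18,000 for these five rows
```

Because the errors are squared before averaging, a few large misses weigh much more than many small ones.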

## Final note

This was just a quick and simple solution but you can definitely improve this!!

You can also upload your results to Kaggle and take part in the competition:
```
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. 

Submission File Format

The file should contain a header and have the following format:

    Id,SalePrice
    1461,169000.1
    1462,187724.1233
    1463,175221
    etc.
```
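Note that this competition metric is not the same as the plain RMSE we computed before: it is taken on the logarithms of the prices, so it measures relative rather than absolute errors. A minimal plain-Python sketch of the metric as described above follows; the function name `rmsle` is our own choice.

```python
import math

def rmsle(actual, predicted):
    """RMSE between log(predicted) and log(actual); prices must be positive."""
    squared = [(math.log(p) - math.log(a)) ** 2
               for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Two of the label/prediction pairs from our own test split
score = rmsle([122000.0, 107500.0], [107056.47361213807, 138808.13651330466])
print(score)
```

Since the logarithm is only defined for positive numbers, a linear model that predicts a zero or negative price for some house would make this computation fail.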

In [39]:
competition_test_raw = spark.read.csv('datasets/house-prices/test.csv', header=True, inferSchema=True)

In [40]:
competition_test_raw.count()

1459

Let's keep only the columns needed for our model:

In [41]:
competition_test = competition_test_raw.select(numerical_cols + categorical_cols + ['Id'])

We can now try to obtain the predictions, but to our surprise this will return an error (try to uncomment and run the command below):

In [42]:
#result = model.transform(competition_test)

Let's look at the GarageCars column that seems to contain non numeric values:

In [43]:
competition_test.select('GarageCars').filter(col('GarageCars').cast('integer').isNull()).show()

+----------+
|GarageCars|
+----------+
|        NA|
+----------+



So, in the competition test dataset, there is one row where GarageCars has the value "NA". Because of it the column is read as a string, and the conversion of that value to "integer" fails.
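The check above relies on the fact that casting a non-numeric string to integer gives null instead of a number. A rough plain-Python analogue of that behaviour is sketched below; `to_int_or_none` is a hypothetical helper, and Spark's exact parsing rules may differ in the details.

```python
def to_int_or_none(text):
    """Parse a string as an integer, returning None when it is not a number."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None

# Illustrative values, including the problematic one
garage_cars = ['1', '2', 'NA', '3']
parsed = [to_int_or_none(v) for v in garage_cars]
invalid = [v for v, p in zip(garage_cars, parsed) if p is None]
print(parsed)   # [1, 2, None, 3]
print(invalid)  # ['NA']
```

Filtering on the null results, as we did with `isNull()`, is therefore a simple way to list the values that cannot be converted.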

We can now confirm that this is the only numerical column where this happens:

In [44]:
competition_test.select(numerical_cols).select([count(when(col(c).cast('integer').isNull(), c)).alias(c) for c in numerical_cols]).toPandas()

Unnamed: 0,YearBuilt,YearRemodAdd,LotArea,1stFlrSF,2ndFlrSF,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,GarageCars
0,0,0,0,0,0,0,0,0,1


So we have to fix this issue before proceeding:

In [45]:
competition_test_fixed = competition_test.replace('NA', '0')

Let's check it has been fixed:

In [46]:
competition_test_fixed.select(numerical_cols).select([count(when(col(c).cast('integer').isNull(), c)).alias(c) for c in numerical_cols]).toPandas()

Unnamed: 0,YearBuilt,YearRemodAdd,LotArea,1stFlrSF,2ndFlrSF,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,GarageCars
0,0,0,0,0,0,0,0,0,0


So now we can proceed and get our predictions, but you would run into another issue if you ran the command:

In [47]:
#result = model.transform(competition_test_fixed)

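The problem is that 'GarageCars' is still of type 'string': because of the "NA" value it was not automatically detected as 'integer' when the file was loaded, and replacing the value does not change the column type: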

In [48]:
competition_test_fixed.printSchema()

root
 |-- YearBuilt: integer (nullable = true)
 |-- YearRemodAdd: integer (nullable = true)
 |-- LotArea: integer (nullable = true)
 |-- 1stFlrSF: integer (nullable = true)
 |-- 2ndFlrSF: integer (nullable = true)
 |-- BedroomAbvGr: integer (nullable = true)
 |-- KitchenAbvGr: integer (nullable = true)
 |-- TotRmsAbvGrd: integer (nullable = true)
 |-- GarageCars: string (nullable = true)
 |-- KitchenQual: string (nullable = true)
 |-- Functional: string (nullable = true)
 |-- RoofMatl: string (nullable = true)
 |-- RoofStyle: string (nullable = true)
 |-- Heating: string (nullable = true)
 |-- Id: integer (nullable = true)



As you can see now we have: `GarageCars: string`

To fix this we will do the cast manually:

In [49]:
competition_test_fixed2 = competition_test_fixed.withColumnRenamed('GarageCars', 'GarageCarsString').withColumn('GarageCars', col('GarageCarsString').cast('integer'))

In [50]:
competition_test_fixed2.printSchema()

root
 |-- YearBuilt: integer (nullable = true)
 |-- YearRemodAdd: integer (nullable = true)
 |-- LotArea: integer (nullable = true)
 |-- 1stFlrSF: integer (nullable = true)
 |-- 2ndFlrSF: integer (nullable = true)
 |-- BedroomAbvGr: integer (nullable = true)
 |-- KitchenAbvGr: integer (nullable = true)
 |-- TotRmsAbvGrd: integer (nullable = true)
 |-- GarageCarsString: string (nullable = true)
 |-- KitchenQual: string (nullable = true)
 |-- Functional: string (nullable = true)
 |-- RoofMatl: string (nullable = true)
 |-- RoofStyle: string (nullable = true)
 |-- Heating: string (nullable = true)
 |-- Id: integer (nullable = true)
 |-- GarageCars: integer (nullable = true)



In [51]:
competition_test_fixed2.select('GarageCars', 'GarageCarsString').show()

+----------+----------------+
|GarageCars|GarageCarsString|
+----------+----------------+
|         1|               1|
|         1|               1|
|         2|               2|
|         2|               2|
|         2|               2|
|         2|               2|
|         2|               2|
|         2|               2|
|         2|               2|
|         2|               2|
|         2|               2|
|         1|               1|
|         1|               1|
|         2|               2|
|         1|               1|
|         3|               3|
|         3|               3|
|         3|               3|
|         3|               3|
|         3|               3|
+----------+----------------+
only showing top 20 rows



So as we can see above the issue has been fixed and we can remove the 'GarageCarsString' column:

In [52]:
competition_test_fixed3 = competition_test_fixed2.drop('GarageCarsString')

And now we can proceed to obtain the predictions:

In [53]:
result = model.transform(competition_test_fixed3)

And finally we can save it in CSV format:

In [54]:
submit = result.select('Id', 'prediction').withColumnRenamed('prediction', 'SalePrice')

In [55]:
submit.show()

+----+------------------+
|  Id|         SalePrice|
+----+------------------+
|1461|107943.03079417977|
|1462|158408.90819984255|
|1463|189033.22889016382|
|1464| 207487.9997357831|
|1465|188271.48248962918|
|1466|189155.33859713562|
|1467|171976.71691800654|
|1468|182992.98591455957|
|1469| 194812.4015582949|
|1470|119442.33346839249|
|1471|198923.10175579996|
|1472|102000.08272453747|
|1473|104680.07841736497|
|1474|165491.23943012394|
|1475| 99101.06739765313|
|1476|332844.13643109216|
|1477|250435.63928739377|
|1478|  278808.778015197|
|1479| 299989.5512064288|
|1480|386667.51530223456|
+----+------------------+
only showing top 20 rows



Everything looks fine so we can now write it to HDFS:

In [56]:
submit.write.mode('overwrite').csv('results_house_prices', header=True)

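We can now download and review the final CSV output before submitting. Since we wrote it with `header=True`, each part file in the `results_house_prices` directory already includes the header row.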

```
hdfs dfs -get results_house_prices
```
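Spark writes the result as a directory that may contain several part files, and with `header=True` each of them starts with its own header row, while Kaggle expects a single file with a single header. A minimal sketch of how the parts could be merged in plain Python, working on the file contents as strings; the `merge_csv_parts` helper and the way the rows are distributed over the two parts are illustrative assumptions.

```python
def merge_csv_parts(parts):
    """Concatenate CSV texts that share a header, keeping the header once."""
    merged = []
    for part in parts:
        lines = part.splitlines()
        if not lines:
            continue  # skip empty part files
        if not merged:
            merged.append(lines[0])  # keep the first header only
        merged.extend(lines[1:])
    return '\n'.join(merged) + '\n'

# Contents of two hypothetical part files
part_0 = 'Id,SalePrice\n1461,107943.03079417977\n1462,158408.90819984255\n'
part_1 = 'Id,SalePrice\n1463,189033.22889016382\n'
submission = merge_csv_parts([part_0, part_1])
print(submission)
```

In practice you would read each downloaded part file, pass the contents to a helper like this one and write the result to a single CSV file for upload.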