# Housing prices in California


The original database is available from StatLib:

    http://lib.stat.cmu.edu/datasets/

The data contains 20,640 observations on 9 variables: 8 input features and 1 target.

This dataset contains the average house value as the target variable
and the following input variables (features):

- average income,
- housing average age,
- average rooms,
- average bedrooms,
- population,
- average occupation,
- latitude, and
- longitude.

## References

* Massaron, Luca and Alberto Boschetti, Regression Analysis with Python, Packt Publishing.
* Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions, Statistics and Probability Letters, 33 (1997) 291-297.

In [1]:
from sklearn.datasets import fetch_california_housing
import openturns as ot

In [2]:
california = fetch_california_housing()

In [3]:
p = california.data.shape[1]
p

8

In [4]:
n = california.data.shape[0]
n

20640

In [5]:
print(california.DESCR)

California housing dataset.

The original database is available from StatLib

    http://lib.stat.cmu.edu/datasets/

The data contains 20,640 observations on 9 variables.

This dataset contains the average house value as target variable
and the following input variables (features): average income,
housing average age, average rooms, average bedrooms, population,
average occupation, latitude, and longitude in that order.

References
----------

Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
Statistics and Probability Letters, 33 (1997) 291-297.




In [6]:
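# Allocate an n-by-(p + 1) sample: p feature columns plus one target column.
sample = ot.Sample(n, p + 1)
# Fill the first p columns with the features.
sample[:, 0:p] = california.data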

In [7]:
# Store the target values in the last column (index p).
sample[:, p] = ot.Sample(california.target, 1)
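
The layout built in the two cells above can be illustrated without any library. This is only a sketch of the indexing logic: the toy values below are invented stand-ins for `california.data` and `california.target`, and plain Python lists take the place of the OpenTURNS sample.

```python
# Toy stand-ins for california.data (2 rows, 3 features) and
# california.target. These numbers are invented for illustration only.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
target = [10.0, 20.0]

n = len(features)      # number of observations
p = len(features[0])   # number of input features

# Allocate an n-by-(p + 1) table of zeros, mirroring the shape used above.
table = [[0.0] * (p + 1) for _ in range(n)]

for i in range(n):
    table[i][0:p] = features[i]  # columns 0 .. p-1 hold the features
    table[i][p] = target[i]      # column p holds the target

print(table)
```

Each row ends up with `p + 1` entries, with the target in the last position, which is the same arrangement the exported file is expected to have.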

In [8]:
# Column names: the p feature names followed by the target name.
descr = list(california.feature_names)
descr.append("Average_house_value")
descr

['MedInc',
 'HouseAge',
 'AveRooms',
 'AveBedrms',
 'Population',
 'AveOccup',
 'Latitude',
 'Longitude',
 'Average_house_value']
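
As a small sketch, the pairing between these column names and the feature list in the dataset description can be written out explicitly. It assumes the description's statement that the features are listed "in that order". The label wording is taken from the description, even where a name such as `MedInc` hints at a median rather than an average. The names `FEATURE_LABELS` and `TARGET_NAME` are introduced here for illustration only.

```python
# Column names from the loader, paired with the wording used in the
# dataset description (assumed to be listed in the same order).
FEATURE_LABELS = {
    "MedInc": "average income",
    "HouseAge": "housing average age",
    "AveRooms": "average rooms",
    "AveBedrms": "average bedrooms",
    "Population": "population",
    "AveOccup": "average occupation",
    "Latitude": "latitude",
    "Longitude": "longitude",
}
TARGET_NAME = "Average_house_value"

# Dictionaries keep insertion order, so this reproduces the column order.
columns = list(FEATURE_LABELS) + [TARGET_NAME]

for name, label in FEATURE_LABELS.items():
    print(f"{name:<12} {label}")
```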

In [9]:
sample.setDescription(descr)

In [10]:
sample.exportToCSVFile("Housing-prices-California.csv")
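
A quick way to sanity-check the exported file is to read it back and compare the header and row count with what is expected (9 columns, 20,640 rows). The sketch below uses only the standard library. It assumes a semicolon delimiter, which is believed to be the OpenTURNS default but should be checked against the actual file. The helper name `summarize_csv` is hypothetical.

```python
import csv
import os

EXPECTED_COLUMNS = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms", "Population",
    "AveOccup", "Latitude", "Longitude", "Average_house_value",
]


def summarize_csv(path, delimiter=";"):
    """Return (header, row_count) for a delimited text file.

    The delimiter is an assumption; pass a different one if the
    exported file uses another separator.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = [name.strip() for name in next(reader)]
        # Count non-empty data rows after the header.
        row_count = sum(1 for row in reader if row)
    return header, row_count


csv_path = "Housing-prices-California.csv"
if os.path.exists(csv_path):
    header, row_count = summarize_csv(csv_path)
    print("header matches:", header == EXPECTED_COLUMNS)
    print("rows:", row_count)
```

If the header does not match, the first thing to check is the delimiter, since a wrong separator makes the whole header line parse as a single column.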