## Ingest the Data

This notebook contains the basic commands required to ingest the data for our work. All of these commands have also been added to the file `src/load_data-01.r`, so that subsequent notebooks can load the data via that script.

### Join the Data Sets

You will often receive data describing the same instances from multiple sources. The original Ames, Iowa housing data has been arbitrarily split into three files so that we can practice joining data from different sources.

In [None]:
zoning_df <- read.csv("data/zoning.csv")
listing_df <- read.csv("data/listing.csv")
sale_df <- read.csv("data/sale.csv")

In [None]:
head(zoning_df)

In [None]:
head(listing_df)

In [None]:
head(sale_df)

Here, we join the three data sets with the `merge` command, using the column `Id` as the key.

In [None]:
housing_df <- merge(zoning_df, listing_df, by = "Id")
housing_df <- merge(housing_df, sale_df, by = "Id")
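
By default, `merge` performs an inner join, so any `Id` missing from one of the inputs is dropped without warning. The following is a minimal sketch of one way to check for that, using small made-up data frames. The `toy_*` names and all values are illustrative assumptions, not the Ames data.

```r
# Toy stand-ins for the three files; names and values are invented for illustration.
toy_zoning  <- data.frame(Id = c(1, 2, 3), MSZoning = c("RL", "RM", "RL"))
toy_listing <- data.frame(Id = c(1, 2, 3), LotArea = c(8450, 9600, 11250))
toy_sale    <- data.frame(Id = c(1, 2), SalePrice = c(208500, 181500))

# Default merge() keeps only the Ids present in both inputs (inner join).
toy_inner <- merge(merge(toy_zoning, toy_listing, by = "Id"), toy_sale, by = "Id")

# all = TRUE keeps every Id and fills the gaps with NA (full outer join).
toy_outer <- merge(merge(toy_zoning, toy_listing, by = "Id"), toy_sale,
                   by = "Id", all = TRUE)

# Ids that the inner join silently dropped.
dropped_ids <- setdiff(toy_outer$Id, toy_inner$Id)

print(nrow(toy_inner))
print(nrow(toy_outer))
print(dropped_ids)
```

If `dropped_ids` is empty for the real data, the inner join lost no rows; otherwise those are the `Id` values worth inspecting.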

In [None]:
head(housing_df)

In [None]:
dim(housing_df)

In [None]:
str(Filter(is.numeric, housing_df))

In [None]:
# Use Id as the row label, then drop the now-redundant column
rownames(housing_df) <- housing_df$Id
housing_df$Id <- NULL
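
Row names in a data frame must be unique, so assigning `Id` as row names only works when no `Id` repeats. A small sketch of that check and of the resulting lookup by label, assuming a made-up `toy_df` rather than the real housing data:

```r
# Toy data frame; values are invented for illustration.
toy_df <- data.frame(Id = c(101, 102, 103),
                     SalePrice = c(208500, 181500, 223500))

# Row names must be unique, so confirm Id has no repeats first.
stopifnot(anyDuplicated(toy_df$Id) == 0)

rownames(toy_df) <- toy_df$Id   # Id becomes the row label (stored as character)
toy_df$Id <- NULL               # drop the now-redundant column

# Rows can now be looked up by their Id label.
price_102 <- toy_df["102", "SalePrice"]
print(price_102)
```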

### Typecast Categorical Features

Several features are categorical in nature even though they are stored as integer values. We must explicitly cast these features to the `factor` type.

In [None]:
housing_df$MSSubClass <- as.factor(housing_df$MSSubClass)
housing_df$OverallQual <- as.factor(housing_df$OverallQual)
housing_df$OverallCond <- as.factor(housing_df$OverallCond)
housing_df$BsmtFullBath <- as.factor(housing_df$BsmtFullBath)
housing_df$BsmtHalfBath <- as.factor(housing_df$BsmtHalfBath)
housing_df$FullBath <- as.factor(housing_df$FullBath)
housing_df$HalfBath <- as.factor(housing_df$HalfBath)
housing_df$BedroomAbvGr <- as.factor(housing_df$BedroomAbvGr)
housing_df$KitchenAbvGr <- as.factor(housing_df$KitchenAbvGr)
housing_df$TotRmsAbvGrd <- as.factor(housing_df$TotRmsAbvGrd)
housing_df$Fireplaces <- as.factor(housing_df$Fireplaces)
housing_df$GarageCars <- as.factor(housing_df$GarageCars)
housing_df$MoSold <- as.factor(housing_df$MoSold)
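
The repeated casts above could also be written more compactly. This is a sketch, not the notebook's method: it lists the column names once and applies `as.factor` to each with `lapply`. The `categorical_cols` vector and `toy_df` are hypothetical names introduced here for illustration.

```r
# Toy data frame with two integer-coded categorical columns and one true numeric.
toy_df <- data.frame(
  MSSubClass  = c(20, 60, 20),
  OverallQual = c(7, 6, 7),
  LotArea     = c(8450, 9600, 11250)
)

# Hypothetical helper vector naming the columns to convert.
categorical_cols <- c("MSSubClass", "OverallQual")

# Apply as.factor to each named column and assign the results back in place.
toy_df[categorical_cols] <- lapply(toy_df[categorical_cols], as.factor)

str(toy_df)
```

Keeping the names in a single vector may make it easier to see at a glance which features are treated as categorical, and to reuse the same list in later steps.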