#### House Prices: Advanced Regression Techniques
**This project is a regression analysis and prediction competition: the goal is to predict house prices from the provided housing data.**

The following techniques and analysis skills can help you reach better predictive performance:

- Handling missing values through data preprocessing
- Visualization
- Improving predictive performance with model ensembles
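The missing-value step in the list above can be sketched as follows, assuming a pandas DataFrame shaped like `train.csv`. In this dataset a missing value in some categorical columns (for example `Alley`, `PoolQC`, `Fence`) means the house simply lacks that feature, so an explicit `'None'` label is a natural fill, while numeric gaps such as `LotFrontage` can be filled with the median. The column lists and the helper names `missing_summary` and `fill_missing` are illustrative assumptions, not the author's method.

```python
import numpy as np
import pandas as pd

# Hypothetical column groups; extend them after inspecting the real data.
NONE_COLS = ["Alley", "PoolQC", "Fence", "FireplaceQu", "MiscFeature"]
MEDIAN_COLS = ["LotFrontage"]


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, only columns with gaps, largest first."""
    ratio = df.isnull().mean()
    return ratio[ratio > 0].sort_values(ascending=False)


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'absence' categoricals set to 'None' and numeric gaps set to the median."""
    out = df.copy()
    for col in NONE_COLS:
        if col in out.columns:
            out[col] = out[col].fillna("None")
    for col in MEDIAN_COLS:
        if col in out.columns:
            out[col] = out[col].fillna(out[col].median())
    return out


# Tiny synthetic frame standing in for train.csv
demo = pd.DataFrame({
    "Alley": [np.nan, "Grvl", np.nan],
    "LotFrontage": [60.0, np.nan, 80.0],
    "LotArea": [8450, 9600, 11250],
})
print(missing_summary(demo))
filled = fill_missing(demo)
print(filled)
```

On the real data, compute the medians on `train` only and reuse them for `test`, so no information leaks from the test set.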

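If you ask home buyers to describe their dream house, they probably won't mention the height of the basement ceiling or the proximity to an east-west railroad. Yet this dataset shows that such factors influence price negotiations far more than the number of bedrooms or a white picket fence.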

The task is to **predict the final sale price of each house based on 79 features that describe various aspects of the house**.

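#### Key tips for improving the model (score)
1. The data contains many missing values as well as outliers. As with any real-world data, these must be handled carefully to achieve a good score.
2. Ensembling several models can improve the score dramatically, and you can see the gain for yourself. Choose strong base models, then try ensemble techniques such as bagging, boosting, and stacking.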
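Tip 2 can be sketched with scikit-learn, assuming the features have already been preprocessed into a numeric matrix. Synthetic data from `make_regression` stands in for the real features, so the printed scores say nothing about House Prices itself; the model choices and hyperparameters are illustrative assumptions. The competition scores submissions by RMSE between the logarithms of the predicted and actual prices, so on the real data you would typically fit on `np.log1p(SalePrice)`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Synthetic stand-in for the preprocessed, numeric features
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    # bagging: average many trees, each fit on a bootstrap sample
    "bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0),
    # boosting: add trees sequentially, each correcting the current residuals
    "boosting": GradientBoostingRegressor(random_state=0),
    "ridge": Ridge(alpha=1.0),
}
# stacking: a Ridge meta-model learns to combine the base models' out-of-fold predictions
models["stacking"] = StackingRegressor(
    estimators=[(name, m) for name, m in models.items()],
    final_estimator=Ridge(alpha=1.0),
)

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = rmse(y_va, model.predict(X_va))
    print(f"{name:>9}: RMSE = {scores[name]:.2f}")
```

Comparing each base model against the stack on the same validation split is what makes any ensemble gain visible.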

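#### Data overview
- train.csv: the dataset used for training.
- test.csv: the dataset provided for testing. It has no SalePrice column; the prices for these houses are what you predict.
- submission.csv: write the predicted house prices into the submission file and submit it (the provided template is `sample_submission.csv`).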
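Writing the submission can be sketched as below, assuming the layout of `sample_submission.csv` (an `Id` column and a `SalePrice` column). The helper `make_submission` and its `log_target` flag are hypothetical; the flag maps predictions back to dollars with `expm1` when the model was trained on `log1p(SalePrice)`. The prices in the usage lines are dummy values.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_submission(test_ids, predictions, path="submission.csv", log_target=False):
    """Write an Id/SalePrice CSV without the pandas index and return the DataFrame."""
    preds = np.asarray(predictions, dtype=float)
    if log_target:
        preds = np.expm1(preds)  # undo a log1p transform of the target
    sub = pd.DataFrame({"Id": np.asarray(test_ids), "SalePrice": preds})
    sub.to_csv(path, index=False)
    return sub


# Usage with dummy predictions, written to a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
sub = make_submission([1461, 1462], [169000.0, 187000.0], path=out_path)
print(pd.read_csv(out_path))
```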

##### train.csv column descriptions
- SalePrice: the property's sale price in dollars. This is the target variable that you're trying to predict.
- MSSubClass: The building class
- MSZoning: The general zoning classification
- LotFrontage: Linear feet of street connected to property
- LotArea: Lot size in square feet
- Street: Type of road access
- Alley: Type of alley access
- LotShape: General shape of property
- LandContour: Flatness of the property
- Utilities: Type of utilities available
- LotConfig: Lot configuration
- LandSlope: Slope of property
- Neighborhood: Physical locations within Ames city limits
- Condition1: Proximity to main road or railroad
- Condition2: Proximity to main road or railroad (if a second is present)
- BldgType: Type of dwelling
- HouseStyle: Style of dwelling
- OverallQual: Overall material and finish quality
- OverallCond: Overall condition rating
- YearBuilt: Original construction date
- YearRemodAdd: Remodel date
- RoofStyle: Type of roof
- RoofMatl: Roof material
- Exterior1st: Exterior covering on house
- Exterior2nd: Exterior covering on house (if more than one material)
- MasVnrType: Masonry veneer type
- MasVnrArea: Masonry veneer area in square feet
- ExterQual: Exterior material quality
- ExterCond: Present condition of the material on the exterior
- Foundation: Type of foundation
- BsmtQual: Height of the basement
- BsmtCond: General condition of the basement
- BsmtExposure: Walkout or garden level basement walls
- BsmtFinType1: Quality of basement finished area
- BsmtFinSF1: Type 1 finished square feet
- BsmtFinType2: Quality of second finished area (if present)
- BsmtFinSF2: Type 2 finished square feet
- BsmtUnfSF: Unfinished square feet of basement area
- TotalBsmtSF: Total square feet of basement area
- Heating: Type of heating
- HeatingQC: Heating quality and condition
- CentralAir: Central air conditioning
- Electrical: Electrical system
- 1stFlrSF: First Floor square feet
- 2ndFlrSF: Second floor square feet
- LowQualFinSF: Low quality finished square feet (all floors)
- GrLivArea: Above grade (ground) living area square feet
- BsmtFullBath: Basement full bathrooms
- BsmtHalfBath: Basement half bathrooms
- FullBath: Full bathrooms above grade
- HalfBath: Half baths above grade
- BedroomAbvGr: Number of bedrooms above basement level
- KitchenAbvGr: Number of kitchens
- KitchenQual: Kitchen quality
- TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
- Functional: Home functionality rating
- Fireplaces: Number of fireplaces
- FireplaceQu: Fireplace quality
- GarageType: Garage location
- GarageYrBlt: Year garage was built
- GarageFinish: Interior finish of the garage
- GarageCars: Size of garage in car capacity
- GarageArea: Size of garage in square feet
- GarageQual: Garage quality
- GarageCond: Garage condition
- PavedDrive: Paved driveway
- WoodDeckSF: Wood deck area in square feet
- OpenPorchSF: Open porch area in square feet
- EnclosedPorch: Enclosed porch area in square feet
- 3SsnPorch: Three season porch area in square feet
- ScreenPorch: Screen porch area in square feet
- PoolArea: Pool area in square feet
- PoolQC: Pool quality
- Fence: Fence quality
- MiscFeature: Miscellaneous feature not covered in other categories
- MiscVal: Dollar value of miscellaneous feature
- MoSold: Month Sold
- YrSold: Year Sold
- SaleType: Type of sale
- SaleCondition: Condition of sale
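For the outliers mentioned in tip 1, plotting `GrLivArea` against `SalePrice` is a common first look. A minimal sketch of one generic rule, the interquartile-range (IQR) test, is shown below; it is a swapped-in technique, not necessarily what the author uses, and the helper names and the `k=1.5` threshold are assumptions to tune after inspecting the plot. Drop rows from `train` only: every `test` row still needs a prediction.

```python
import pandas as pd


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def drop_outliers(df: pd.DataFrame, col: str = "GrLivArea", k: float = 1.5) -> pd.DataFrame:
    """Return a copy of df without the rows whose `col` value is an IQR outlier."""
    return df.loc[~iqr_outlier_mask(df[col], k)].copy()


# Toy data: one very large house with a comparatively low price
demo = pd.DataFrame({
    "GrLivArea": [1200, 1500, 1600, 1700, 1800, 5600],
    "SalePrice": [150000, 180000, 190000, 200000, 210000, 160000],
})
print(iqr_outlier_mask(demo["GrLivArea"]).tolist())
print(drop_outliers(demo))
```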

```python
import os
import pickle
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline
warnings.filterwarnings('ignore')

PICKLE_DIR = './pickle'

# Save any Python object (e.g. a preprocessed DataFrame) to ./pickle/<filename>
def save_data(df, filename):
    os.makedirs(PICKLE_DIR, exist_ok=True)  # create the folder on first use
    filename = os.path.join(PICKLE_DIR, filename)
    with open(filename, "wb") as file:
        pickle.dump(df, file)

# Load an object previously saved with save_data
def load_data(filename):
    filename = os.path.join(PICKLE_DIR, filename)
    with open(filename, "rb") as file:
        return pickle.load(file)

# Competition files, expected in the working directory
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
sample_submission = pd.read_csv('sample_submission.csv')
```
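For DataFrames specifically, pandas also provides `DataFrame.to_pickle` and `pd.read_pickle`, which cover what `save_data`/`load_data` do without the manual file handling. A minimal round-trip sketch, using a temporary directory instead of `./pickle` so it runs anywhere; the toy frame is illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Id": [1, 2], "SalePrice": [208500, 181500]})  # toy frame

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.pkl")
    df.to_pickle(path)               # serialize the DataFrame to disk
    restored = pd.read_pickle(path)  # read it back into memory

print(restored.equals(df))
```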

```
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999
```



```python
# Drop the Id column, which is an identifier rather than a predictive feature.
# Recent pandas versions make `axis` keyword-only, so drop('Id', 1) raises a TypeError.
train = train.drop('Id', axis=1)
test = test.drop('Id', axis=1)
```
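Dropping `Id` from `test` also discards the identifiers that the submission file needs, so it is worth saving them first. A minimal sketch on toy frames (`train_demo`, `test_demo`, and `test_ids` are hypothetical names chosen so they do not overwrite the real `train`/`test`), using the keyword form `drop(columns=...)`:

```python
import pandas as pd

# Toy frames standing in for train.csv / test.csv
train_demo = pd.DataFrame({"Id": [1, 2], "LotArea": [8000, 9500], "SalePrice": [200000, 180000]})
test_demo = pd.DataFrame({"Id": [1461, 1462], "LotArea": [11000, 14000]})

test_ids = test_demo["Id"].copy()  # keep for the submission file
train_demo = train_demo.drop(columns="Id")
test_demo = test_demo.drop(columns="Id")

print(train_demo.columns.tolist(), test_demo.columns.tolist(), test_ids.tolist())
```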