# Modeling with Support Vector Regressor
- The support vector machine (SVM) is a data-classification algorithm that assigns new data elements to one of the labeled categories, it assumes that the data in question contains some possible target values. In Machine Learning or Data Science are quite familiar with the term Support Vector Machine, but Support Vector Regression (SVR) is a bit different from SVM. As the name suggests the SVR is a regression algorithm, so we can use SVR for working with continuous Values instead of Classification which is SVM. The basic idea of SVR is to find a function f(y) that has most ε deviation from the actually obtained target for the training data yi, and at the same time is as flat as possible. In other words, **we do not care about the errors as long as they are less than ε**. **This property determines the SVR to be less sensitive to outliers than the quadratic loss function.**

In [1]:
%matplotlib inline
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

import warnings
from scipy import stats
from sklearn.preprocessing import OneHotEncoder, StandardScaler, RobustScaler

In [2]:
df_train = pd.read_csv('../input/train_macro_without_outliers.csv', index_col=0)
df_train_augmented = pd.read_csv('../input/train_macro_with_outliers.csv', index_col=0)
df_test = pd.read_csv('../input/test_macro.csv', index_col=0)

In [3]:
df_train.tail(2)

Unnamed: 0_level_0,usdrub,full_sq,life_sq,floor,num_room,kitch_sq,state,product_type,area_m,preschool_education_centers_raion,school_education_centers_raion,school_education_centers_top_20_raion,hospital_beds_raion,healthcare_centers_raion,university_top_20_raion,sport_objects_raion,culture_objects_top_25,shopping_centers_raion,office_raion,thermal_power_plant_raion,incineration_raion,oil_chemistry_raion,radiation_raion,railroad_terminal_raion,big_market_raion,nuclear_reactor_raion,detention_facility_raion,0_17_all,raion_build_count_with_material_info,build_count_brick,build_count_monolith,raion_build_count_with_builddate_info,build_count_before_1920,metro_min_avto,kindergarten_km,school_km,park_km,railroad_station_walk_min,railroad_station_avto_min,public_transport_station_min_walk,water_1line,ttk_km,sadovoe_km,bulvar_ring_km,kremlin_km,big_road1_km,big_road1_1line,big_road2_km,railroad_1line,zd_vokzaly_avto_km,bus_terminal_avto_km,oil_chemistry_km,nuclear_reactor_km,radiation_km,power_transmission_line_km,thermal_power_plant_km,ts_km,market_shop_km,fitness_km,swim_pool_km,ice_rink_km,stadium_km,basketball_km,hospice_morgue_km,detention_facility_km,public_healthcare_km,university_km,workplaces_km,shopping_centers_km,office_km,additional_education_km,preschool_km,big_church_km,church_synagogue_km,mosque_km,theater_km,museum_km,exhibition_km,catering_km,ecology,office_count_500,office_sqm_500,cafe_count_500,cafe_count_500_price_1000,cafe_count_500_price_1500,leisure_count_500,office_count_1000,office_sqm_1000,cafe_count_1000,cafe_count_1000_na_price,cafe_count_1000_price_1000,cafe_count_1000_price_1500,cafe_count_1000_price_high,leisure_count_1000,sport_count_1000,office_count_1500,office_sqm_1500,trc_count_1500,cafe_count_1500,cafe_sum_1500_min_price_avg,cafe_sum_1500_max_price_avg,cafe_avg_price_1500,cafe_count_1500_na_price,cafe_count_1500_price_500,cafe_count_1500_price_1000,cafe_count_1500_price_1500,cafe_count_1500_price_2500,cafe_count_1500_price_high,mosque_count_1500,leisure_count_1500,sport_count_1500,green_part_2000,office_count_2000,office_sqm_2000,trc_count_2000,trc_sqm_2000,cafe_count_2000,cafe_sum_2000_max_price_avg,cafe_count_2000_na_price,cafe_count_2000_price_500,cafe_count_2000_price_1000,cafe_count_2000_price_1500,cafe_count_2000_price_2500,cafe_count_2000_price_high,mosque_count_2000,sport_count_2000,market_count_2000,green_part_3000,office_count_3000,office_sqm_3000,trc_count_3000,trc_sqm_3000,big_church_count_3000,church_count_3000,leisure_count_3000,sport_count_3000,market_count_3000,green_part_5000,office_count_5000,office_sqm_5000,trc_count_5000,trc_sqm_5000,big_church_count_5000,church_count_5000,mosque_count_5000,leisure_count_5000,sport_count_5000,market_count_5000,room_size,avg_price_sub_area,price_doc
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1
30472,55.2655,64,32.1,5.1,2.1,11.1,2.0,Investment,6050064.566,6.1,8.1,0.1,5469.953703,2.1,1.1,11.1,no,4.1,5.1,no,no,no,yes,no,no,no,no,10896,185.0,4.1,9.1,186.0,0.977191,3.477814,0.20302,0.230667,1.772506,82.75034,9.128624,3.13833,no,8.940313,11.752036,12.872535,13.622569,0.960608,no,2.174001,no,15.303338,5.45866,19.591574,8.011139,0.718679,1.971656,6.417997,3.881523,2.711199,0.512813,0.73128,2.730674,2.374106,2.2105,1.625064,24.788893,2.428096,1.98245,2.440429,1.208672,1.304798,1.440017,0.230667,1.644053,0.576021,2.748055,2.088193,4.119706,1.800186,0.134566,satisfactory,0.1,0.1,3.1,0.1,3.1,0.1,0.1,0.1,13.1,0.1,6.1,5.1,0.1,0.1,10.1,1.1,37800.1,1.1,42.1,646.34,1097.56,871.95,1.1,15.1,13.1,8.1,5.1,0.1,0.1,0.1,15.1,32.0,2.1,107800.1,10.1,136296.1,67.1,1195.31,3.1,17.1,23.1,15.1,9.1,0.1,0.1,18.1,2.1,30.31,15.1,473168.1,25.1,481350.1,2.1,17.1,2.1,33.1,4.1,30.36,39.1,1225712.1,45.1,1464521.1,6.1,31.1,1.1,4.1,65.1,7.1,16.1,13924200.0,13500000
30473,55.2655,43,28.1,1.1,2.1,6.1,2.0,Investment,4395332.782,4.1,4.1,0.1,3184.953703,2.1,0.1,7.1,no,5.1,1.1,no,no,no,yes,no,no,no,no,14994,304.0,105.1,4.1,303.0,1.977191,0.684636,0.093619,0.47895,0.848766,24.453053,3.604917,3.001814,no,6.809408,9.675169,10.228634,11.812614,1.920884,no,2.08923,no,12.243439,5.645123,3.261652,14.359141,0.999365,1.832979,5.239948,1.515294,1.902431,0.919001,2.470789,1.784764,3.641656,0.812511,0.633901,8.868202,1.064828,2.731394,3.165101,0.324601,2.208265,0.925811,0.47895,0.480531,0.967332,8.987913,0.688707,0.127867,2.477068,0.182831,poor,0.1,0.1,4.1,2.1,0.1,4.1,0.1,0.1,11.1,0.1,6.1,1.1,0.1,4.1,3.1,0.1,0.1,5.1,17.1,600.0,1058.82,829.41,0.1,4.1,9.1,3.1,1.1,0.1,0.1,4.1,6.1,23.49,0.1,0.1,5.1,39106.1,26.1,942.31,0.1,10.1,11.1,4.1,1.1,0.1,0.1,12.1,2.1,28.52,6.1,155237.1,13.1,545023.1,3.1,7.1,7.1,26.1,4.1,25.1,15.1,351244.1,22.1,646575.1,7.1,16.1,0.1,9.1,54.1,10.1,14.1,6168624.0,5600000


In [4]:
df_train_augmented.tail(2)

Unnamed: 0_level_0,usdrub,full_sq,life_sq,floor,num_room,kitch_sq,state,product_type,area_m,preschool_education_centers_raion,school_education_centers_raion,school_education_centers_top_20_raion,hospital_beds_raion,healthcare_centers_raion,university_top_20_raion,sport_objects_raion,culture_objects_top_25,shopping_centers_raion,office_raion,thermal_power_plant_raion,incineration_raion,oil_chemistry_raion,radiation_raion,railroad_terminal_raion,big_market_raion,nuclear_reactor_raion,detention_facility_raion,0_17_all,raion_build_count_with_material_info,build_count_brick,build_count_monolith,raion_build_count_with_builddate_info,build_count_before_1920,metro_min_avto,kindergarten_km,school_km,park_km,railroad_station_walk_min,railroad_station_avto_min,public_transport_station_min_walk,water_1line,ttk_km,sadovoe_km,bulvar_ring_km,kremlin_km,big_road1_km,big_road1_1line,big_road2_km,railroad_1line,zd_vokzaly_avto_km,bus_terminal_avto_km,oil_chemistry_km,nuclear_reactor_km,radiation_km,power_transmission_line_km,thermal_power_plant_km,ts_km,market_shop_km,fitness_km,swim_pool_km,ice_rink_km,stadium_km,basketball_km,hospice_morgue_km,detention_facility_km,public_healthcare_km,university_km,workplaces_km,shopping_centers_km,office_km,additional_education_km,preschool_km,big_church_km,church_synagogue_km,mosque_km,theater_km,museum_km,exhibition_km,catering_km,ecology,office_count_500,office_sqm_500,cafe_count_500,cafe_count_500_price_1000,cafe_count_500_price_1500,leisure_count_500,office_count_1000,office_sqm_1000,cafe_count_1000,cafe_count_1000_na_price,cafe_count_1000_price_1000,cafe_count_1000_price_1500,cafe_count_1000_price_high,leisure_count_1000,sport_count_1000,office_count_1500,office_sqm_1500,trc_count_1500,cafe_count_1500,cafe_sum_1500_min_price_avg,cafe_sum_1500_max_price_avg,cafe_avg_price_1500,cafe_count_1500_na_price,cafe_count_1500_price_500,cafe_count_1500_price_1000,cafe_count_1500_price_1500,cafe_count_1500_price_2500,cafe_count_1500_price_high,mosque_count_1500,leisure_count_1500,sport_count_1500,green_part_2000,office_count_2000,office_sqm_2000,trc_count_2000,trc_sqm_2000,cafe_count_2000,cafe_sum_2000_max_price_avg,cafe_count_2000_na_price,cafe_count_2000_price_500,cafe_count_2000_price_1000,cafe_count_2000_price_1500,cafe_count_2000_price_2500,cafe_count_2000_price_high,mosque_count_2000,sport_count_2000,market_count_2000,green_part_3000,office_count_3000,office_sqm_3000,trc_count_3000,trc_sqm_3000,big_church_count_3000,church_count_3000,leisure_count_3000,sport_count_3000,market_count_3000,green_part_5000,office_count_5000,office_sqm_5000,trc_count_5000,trc_sqm_5000,big_church_count_5000,church_count_5000,mosque_count_5000,leisure_count_5000,sport_count_5000,market_count_5000,room_size,avg_price_sub_area,price_doc
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1
30472,55.2655,64,32.1,5.1,2.1,11.1,2.0,Investment,6050064.566,6.1,8.1,0.1,5469.953703,2.1,1.1,11.1,no,4.1,5.1,no,no,no,yes,no,no,no,no,10896,185.0,4.1,9.1,186.0,0.977191,3.477814,0.20302,0.230667,1.772506,82.75034,9.128624,3.13833,no,8.940313,11.752036,12.872535,13.622569,0.960608,no,2.174001,no,15.303338,5.45866,19.591574,8.011139,0.718679,1.971656,6.417997,3.881523,2.711199,0.512813,0.73128,2.730674,2.374106,2.2105,1.625064,24.788893,2.428096,1.98245,2.440429,1.208672,1.304798,1.440017,0.230667,1.644053,0.576021,2.748055,2.088193,4.119706,1.800186,0.134566,satisfactory,0.1,0.1,3.1,0.1,3.1,0.1,0.1,0.1,13.1,0.1,6.1,5.1,0.1,0.1,10.1,1.1,37800.1,1.1,42.1,646.34,1097.56,871.95,1.1,15.1,13.1,8.1,5.1,0.1,0.1,0.1,15.1,32.0,2.1,107800.1,10.1,136296.1,67.1,1195.31,3.1,17.1,23.1,15.1,9.1,0.1,0.1,18.1,2.1,30.31,15.1,473168.1,25.1,481350.1,2.1,17.1,2.1,33.1,4.1,30.36,39.1,1225712.1,45.1,1464521.1,6.1,31.1,1.1,4.1,65.1,7.1,16.1,13924200.0,13500000
30473,55.2655,43,28.1,1.1,2.1,6.1,2.0,Investment,4395332.782,4.1,4.1,0.1,3184.953703,2.1,0.1,7.1,no,5.1,1.1,no,no,no,yes,no,no,no,no,14994,304.0,105.1,4.1,303.0,1.977191,0.684636,0.093619,0.47895,0.848766,24.453053,3.604917,3.001814,no,6.809408,9.675169,10.228634,11.812614,1.920884,no,2.08923,no,12.243439,5.645123,3.261652,14.359141,0.999365,1.832979,5.239948,1.515294,1.902431,0.919001,2.470789,1.784764,3.641656,0.812511,0.633901,8.868202,1.064828,2.731394,3.165101,0.324601,2.208265,0.925811,0.47895,0.480531,0.967332,8.987913,0.688707,0.127867,2.477068,0.182831,poor,0.1,0.1,4.1,2.1,0.1,4.1,0.1,0.1,11.1,0.1,6.1,1.1,0.1,4.1,3.1,0.1,0.1,5.1,17.1,600.0,1058.82,829.41,0.1,4.1,9.1,3.1,1.1,0.1,0.1,4.1,6.1,23.49,0.1,0.1,5.1,39106.1,26.1,942.31,0.1,10.1,11.1,4.1,1.1,0.1,0.1,12.1,2.1,28.52,6.1,155237.1,13.1,545023.1,3.1,7.1,7.1,26.1,4.1,25.1,15.1,351244.1,22.1,646575.1,7.1,16.1,0.1,9.1,54.1,10.1,14.1,6168624.0,5600000


In [5]:
df_test.tail(2)

Unnamed: 0_level_0,usdrub,full_sq,life_sq,floor,num_room,kitch_sq,state,product_type,area_m,raion_popul,preschool_education_centers_raion,school_education_centers_raion,school_education_centers_top_20_raion,hospital_beds_raion,healthcare_centers_raion,university_top_20_raion,sport_objects_raion,culture_objects_top_25,shopping_centers_raion,office_raion,thermal_power_plant_raion,incineration_raion,oil_chemistry_raion,radiation_raion,railroad_terminal_raion,big_market_raion,nuclear_reactor_raion,detention_facility_raion,young_all,work_all,ekder_all,0_17_all,raion_build_count_with_material_info,build_count_brick,build_count_monolith,raion_build_count_with_builddate_info,build_count_before_1920,metro_min_avto,kindergarten_km,school_km,park_km,railroad_station_walk_min,railroad_station_avto_min,public_transport_station_min_walk,water_1line,ttk_km,sadovoe_km,bulvar_ring_km,kremlin_km,big_road1_km,big_road1_1line,big_road2_km,railroad_1line,zd_vokzaly_avto_km,bus_terminal_avto_km,oil_chemistry_km,nuclear_reactor_km,radiation_km,power_transmission_line_km,thermal_power_plant_km,ts_km,market_shop_km,fitness_km,swim_pool_km,ice_rink_km,stadium_km,basketball_km,hospice_morgue_km,detention_facility_km,public_healthcare_km,university_km,workplaces_km,shopping_centers_km,office_km,additional_education_km,preschool_km,big_church_km,church_synagogue_km,mosque_km,theater_km,museum_km,exhibition_km,catering_km,ecology,office_count_500,office_sqm_500,cafe_count_500,cafe_count_500_price_1000,cafe_count_500_price_1500,leisure_count_500,office_count_1000,office_sqm_1000,cafe_count_1000,cafe_count_1000_na_price,cafe_count_1000_price_1000,cafe_count_1000_price_1500,cafe_count_1000_price_high,leisure_count_1000,sport_count_1000,office_count_1500,office_sqm_1500,trc_count_1500,cafe_count_1500,cafe_sum_1500_min_price_avg,cafe_sum_1500_max_price_avg,cafe_avg_price_1500,cafe_count_1500_na_price,cafe_count_1500_price_500,cafe_count_1500_price_1000,cafe_count_1500_price_1500,cafe_count_1500_price_2500,cafe_count_1500_price_high,mosque_count_1500,leisure_count_1500,sport_count_1500,green_part_2000,office_count_2000,office_sqm_2000,trc_count_2000,trc_sqm_2000,cafe_count_2000,cafe_sum_2000_max_price_avg,cafe_count_2000_na_price,cafe_count_2000_price_500,cafe_count_2000_price_1000,cafe_count_2000_price_1500,cafe_count_2000_price_2500,cafe_count_2000_price_high,mosque_count_2000,sport_count_2000,market_count_2000,green_part_3000,office_count_3000,office_sqm_3000,trc_count_3000,trc_sqm_3000,cafe_count_3000,cafe_count_3000_na_price,cafe_count_3000_price_500,cafe_count_3000_price_1000,cafe_count_3000_price_1500,cafe_count_3000_price_2500,cafe_count_3000_price_4000,cafe_count_3000_price_high,big_church_count_3000,church_count_3000,leisure_count_3000,sport_count_3000,market_count_3000,green_part_5000,office_count_5000,office_sqm_5000,trc_count_5000,trc_sqm_5000,cafe_count_5000,cafe_count_5000_na_price,cafe_count_5000_price_500,cafe_count_5000_price_1000,cafe_count_5000_price_1500,cafe_count_5000_price_2500,cafe_count_5000_price_4000,cafe_count_5000_price_high,big_church_count_5000,church_count_5000,mosque_count_5000,leisure_count_5000,sport_count_5000,market_count_5000,room_size,avg_price_sub_area
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1,Unnamed: 152_level_1,Unnamed: 153_level_1,Unnamed: 154_level_1,Unnamed: 155_level_1,Unnamed: 156_level_1,Unnamed: 157_level_1,Unnamed: 158_level_1,Unnamed: 159_level_1,Unnamed: 160_level_1,Unnamed: 161_level_1,Unnamed: 162_level_1,Unnamed: 163_level_1,Unnamed: 164_level_1,Unnamed: 165_level_1,Unnamed: 166_level_1,Unnamed: 167_level_1,Unnamed: 168_level_1,Unnamed: 169_level_1,Unnamed: 170_level_1
38134,65.6745,34.8,19.8,8.0,1.0,6.4,2.0,Investment,7128794.338,145576,7,7,1,1031.546494,1,0,7,no,6,0,no,no,no,no,no,yes,no,no,13595,104635,27346,14976,195.0,0.0,3.0,195.0,0.0,1.469263,0.073023,0.20854,2.950264,60.226932,6.16457,1.392524,no,11.538742,14.88307,16.528376,17.137752,1.425847,no,1.425847,no,17.433375,2.160649,6.997322,5.259407,3.258864,2.011375,6.365003,4.649813,1.28065,0.542683,1.27775,1.116351,6.449755,3.994409,2.596009,4.655112,1.350505,3.679888,1.795903,0.469357,1.467622,0.676312,0.20854,2.411682,0.331122,8.247379,12.564484,3.127103,3.618234,0.322872,poor,0,0,3,2,0,0,0,0,20,1,9,4,0,0,6,2,54500,6,27,630.77,1057.69,844.23,1,8,9,8,1,0,0,0,9,10.24,2,54500,11,779021,37,1161.76,3,10,12,9,2,0,0,13,3,21.96,2,54500,17,1399021,50,3,12,20,9,4,2,0,2,5,0,16,4,17.69,2,54500,30,1555688,89,10,20,34,16,7,2,0,5,11,0,2,43,10,19.8,6774592.0
38135,65.6745,63.0,43.8,5.0,3.0,7.1,3.0,Investment,6206098.885,111874,5,5,0,1132.811184,0,0,10,no,11,5,no,no,no,no,no,no,no,no,13326,73503,25045,14552,158.0,0.0,3.0,157.0,0.0,2.974336,0.179474,0.498553,3.013805,15.587705,7.176224,1.015379,no,7.784206,10.97325,12.427532,13.037478,0.418628,no,5.902435,no,12.410939,7.185025,12.515646,4.419469,1.6479,1.086247,6.085476,1.897355,4.298827,0.356533,0.566859,8.818686,9.295427,2.054947,2.967581,15.558481,3.284689,5.018347,0.498553,0.304042,0.150085,0.500342,0.498553,0.711156,0.732334,1.84446,3.808958,4.530642,3.07137,0.078852,poor,2,112562,2,1,1,0,2,112562,9,1,2,2,0,0,5,5,467562,11,29,603.7,1018.52,811.11,2,11,8,6,2,0,0,0,8,8.66,8,480362,13,462600,40,1121.62,3,13,8,13,3,0,1,9,0,17.38,12,591362,20,890503,76,5,26,22,17,6,0,0,2,8,0,24,6,25.88,35,1662474,41,1474430,137,11,41,42,31,11,1,0,6,26,1,4,42,11,14.6,7424558.0


## 1. Preprocessing
- Label Encoding
- Log nomalization
- Standard scaling

## 2. Support Vector Regreesor

In [6]:
cate_features = list(set(df_train.columns) - set(df_train._get_numeric_data().columns))
numeric_features = list(df_train.columns.drop(cate_features + ['price_doc']).values)

## 1. Preprocessing
- Label Encoding
- Log nomalization
- Standard scaling

### Label Encoding

In [7]:
df_train_cat = df_train[cate_features]
df_train_cat1 = df_train_cat
encode = OneHotEncoder(sparse=False)

for col in cate_features:
    encode.fit(df_train_cat[[col]])
    transform = encode.transform(df_train_cat[[col]])
    
    transform = pd.DataFrame(transform, 
                             columns=[(col+"_"+str(i)) for i in df_train_cat[col].value_counts().index])
    transform = transform.set_index(df_train_cat.index.values)
    
    df_train_cat1 = pd.concat([df_train_cat1, transform], axis=1)
    df_train_cat1 = df_train_cat1.drop(col, 1)

In [None]:
df_train_cat1.tail(2)

### Log nomalizaion

In [8]:
from scipy import stats

# Log Normalization of Numeric Features
for column in numeric_features + ['price_doc']:
    if stats.skew(df_train[column].values) > 1:
        df_train[column] = np.log(df_train[column] + 1)  
        df_train_augmented[column] = np.log(df_train_augmented[column] + 1)
        if column in df_test.columns.values:
            df_test[column]  = np.log(df_test[column] + 1)

### Standard scaling

In [9]:
# 평균 0 표준편차 1이 되도록 스케일링
train_scaler = StandardScaler()
train_scaler.fit(df_train[numeric_features])

scaled_numeric_train_X = train_scaler.transform(df_train[numeric_features])
df_scaled_numeric_train_X = pd.DataFrame(scaled_numeric_train_X, index=df_train.index, columns=numeric_features)
df_train = pd.concat([df_scaled_numeric_train_X, df_train_cat1, df_train['price_doc']], axis=1)

# scaled_numeric_test_X = train_scaler.transform(df_test[numeric_features])
# df_scaled_numeric_test_X = pd.DataFrame(scaled_numeric_test_X, index=df_test.index, columns=numeric_features)
# df_test = pd.concat([df_scaled_numeric_test_X, df_train[cate_features]], axis=1)

# scaled_numeric_train_X = train_scaler.transform(df_train_augmented[numeric_features])
# df_scaled_numeric_train_X = pd.DataFrame(scaled_numeric_train_X, index=df_train_augmented.index, columns=numeric_features)
# df_train = pd.concat([df_scaled_numeric_train_X, df_train_augmented[cate_features]],axis=1)

  return self.partial_fit(X, y)
  """


In [10]:
df_train.tail(2)

Unnamed: 0,usdrub,full_sq,life_sq,floor,num_room,kitch_sq,state,area_m,preschool_education_centers_raion,school_education_centers_raion,school_education_centers_top_20_raion,hospital_beds_raion,healthcare_centers_raion,university_top_20_raion,sport_objects_raion,shopping_centers_raion,office_raion,0_17_all,raion_build_count_with_material_info,build_count_brick,build_count_monolith,raion_build_count_with_builddate_info,build_count_before_1920,metro_min_avto,kindergarten_km,school_km,park_km,railroad_station_walk_min,railroad_station_avto_min,public_transport_station_min_walk,ttk_km,sadovoe_km,bulvar_ring_km,kremlin_km,big_road1_km,big_road2_km,zd_vokzaly_avto_km,bus_terminal_avto_km,oil_chemistry_km,nuclear_reactor_km,radiation_km,power_transmission_line_km,thermal_power_plant_km,ts_km,market_shop_km,fitness_km,swim_pool_km,ice_rink_km,stadium_km,basketball_km,hospice_morgue_km,detention_facility_km,public_healthcare_km,university_km,workplaces_km,shopping_centers_km,office_km,additional_education_km,preschool_km,big_church_km,church_synagogue_km,mosque_km,theater_km,museum_km,exhibition_km,catering_km,office_count_500,office_sqm_500,cafe_count_500,cafe_count_500_price_1000,cafe_count_500_price_1500,leisure_count_500,office_count_1000,office_sqm_1000,cafe_count_1000,cafe_count_1000_na_price,cafe_count_1000_price_1000,cafe_count_1000_price_1500,cafe_count_1000_price_high,leisure_count_1000,sport_count_1000,office_count_1500,office_sqm_1500,trc_count_1500,cafe_count_1500,cafe_sum_1500_min_price_avg,cafe_sum_1500_max_price_avg,cafe_avg_price_1500,cafe_count_1500_na_price,cafe_count_1500_price_500,cafe_count_1500_price_1000,cafe_count_1500_price_1500,cafe_count_1500_price_2500,cafe_count_1500_price_high,mosque_count_1500,leisure_count_1500,sport_count_1500,green_part_2000,office_count_2000,office_sqm_2000,trc_count_2000,trc_sqm_2000,cafe_count_2000,cafe_sum_2000_max_price_avg,cafe_count_2000_na_price,cafe_count_2000_price_500,cafe_count_2000_price_1000,cafe_count_2000_price_1500,cafe_count_2000_price_2500,cafe_count_2000_price_high,mosque_count_2000,sport_count_2000,market_count_2000,green_part_3000,office_count_3000,office_sqm_3000,trc_count_3000,trc_sqm_3000,big_church_count_3000,church_count_3000,leisure_count_3000,sport_count_3000,market_count_3000,green_part_5000,office_count_5000,office_sqm_5000,trc_count_5000,trc_sqm_5000,big_church_count_5000,church_count_5000,mosque_count_5000,leisure_count_5000,sport_count_5000,market_count_5000,room_size,avg_price_sub_area,railroad_terminal_raion_no,railroad_terminal_raion_yes,nuclear_reactor_raion_no,nuclear_reactor_raion_yes,detention_facility_raion_no,detention_facility_raion_yes,ecology_no data,ecology_good,ecology_poor,ecology_excellent,ecology_satisfactory,culture_objects_top_25_no,culture_objects_top_25_yes,thermal_power_plant_raion_no,thermal_power_plant_raion_yes,water_1line_no,water_1line_yes,oil_chemistry_raion_no,oil_chemistry_raion_yes,big_market_raion_no,big_market_raion_yes,incineration_raion_no,incineration_raion_yes,big_road1_1line_no,big_road1_1line_yes,radiation_raion_no,radiation_raion_yes,product_type_Investment,product_type_OwnerOccupier,railroad_1line_no,railroad_1line_yes,price_doc
30472,1.834105,0.752733,0.089728,-0.52174,0.179317,1.130801,-0.03416,-0.901343,0.658269,0.975016,-0.322377,1.762127,0.779084,2.514275,0.944188,0.418008,0.593391,-0.141997,0.035858,-1.258351,0.518236,0.042921,-0.507295,-0.031417,-0.677593,-0.831873,-0.244537,0.919986,0.920955,0.180626,-0.065891,-0.087813,-0.055495,-0.087672,-0.740215,-0.689143,0.006953,-0.588652,0.173815,-0.19938,-1.175996,-0.200835,0.034556,-0.031246,-0.222101,-0.677684,-1.57665,-0.880035,-1.372364,-0.445854,-0.365496,0.94332,-0.134618,-0.99782,-0.177768,-0.046449,-0.259476,0.237768,-0.85613,-0.115491,-0.840906,-1.13872,-1.630298,-0.407877,-0.987536,-0.94231,-0.430364,-0.48802,0.633537,-0.617938,1.91094,-0.205669,-0.680333,-0.882595,0.839032,-0.530255,1.108213,1.157861,-0.1904,-0.341651,1.759636,-0.254122,0.692382,-0.511738,1.134498,-0.347916,-0.374563,-0.365927,0.159789,1.42632,1.146115,0.956275,1.254134,-0.214512,-0.186549,-0.442363,1.363638,0.755442,-0.133698,0.698584,0.947134,0.54118,1.091935,0.084655,0.676177,1.112071,1.173211,1.061051,1.361857,-0.237628,-0.295607,1.021304,0.901275,0.495354,0.660608,0.736532,1.123585,0.536238,-0.043083,1.131769,0.480372,0.911944,0.796874,0.628507,0.636347,0.552294,0.732992,0.385565,0.025878,0.710339,1.115341,0.385265,0.678949,0.278354,-0.214551,2.678325,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,16.4182
30473,1.834105,-0.455776,-0.151525,-1.287503,0.179317,0.340487,-0.03416,-1.292956,0.006018,-0.164057,-0.322377,0.156325,0.779084,-0.298584,0.533404,0.619369,-0.390754,0.288841,0.417914,0.575246,-0.156562,0.417386,-0.161859,-1.446784,-0.865174,-0.501026,-0.863507,-0.646328,-0.564489,0.13889,-0.411362,-0.386793,-0.406891,-0.310343,-0.004659,-0.73444,-0.343327,-0.542269,-1.383538,0.629459,-0.97141,-0.268453,-0.268741,-1.112238,-0.626061,-0.146892,-0.422004,-1.379593,-0.907065,-1.208099,-1.259975,-0.295208,-1.004393,-0.695363,0.083941,-1.063486,0.308595,-0.288163,-0.523303,-1.145204,-0.057622,0.480687,-2.54709,-2.641963,-0.66027,-0.827246,-0.430364,-0.48802,0.873847,1.196468,-0.525933,9.593191,-0.680333,-0.882595,0.715783,-0.530255,1.108213,-0.072931,-0.1904,3.219009,0.484211,-0.846199,-1.181051,0.741033,0.525203,-0.607801,-0.518308,-0.556918,-0.721899,0.394415,0.837358,0.202303,0.043108,-0.214512,-0.186549,2.284614,0.506552,0.161613,-0.950317,-1.340393,0.327183,0.291807,0.46747,-0.984868,-0.846636,0.714833,0.606513,0.062037,-0.171339,-0.237628,-0.295607,0.667124,0.901275,0.370639,0.124136,0.539941,0.565871,0.564086,0.247331,0.150748,1.546428,0.720793,0.796874,0.167103,0.115585,0.252758,-0.263881,-0.458974,0.148493,0.024445,-0.732051,0.973582,0.527112,0.896021,-0.527256,-0.286631,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,15.538277


## 2. Support Vector Regreesor

In [11]:
from sklearn.svm import SVR
import numpy as np


y= df_train.iloc[:, -1]
X = df_train.iloc[:, :-1]
clf = SVR(gamma='scale', C=1.0, epsilon=0.2)
clf.fit(X, y) 
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.2, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)


SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.2, gamma='scale',
  kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [12]:
score

NameError: name 'score' is not defined