I studied how different random 35/65% splits of held-out data affect the score, and found that the differences between the two parts can be significant.  To demo this, I forked Reynaldo's kernel...

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn import model_selection, preprocessing, metrics
import xgboost as xgb
import datetime
#now = datetime.datetime.now()

otrain = pd.read_csv('../input/train.csv')

train = otrain.iloc[0:24000].copy()
test = otrain.iloc[24000:].copy()

id_test = test.id


y_train = train["price_doc"]
x_train = train.drop(["id", "timestamp", "price_doc"], axis=1)
x_test = test.drop(["id", "timestamp", "price_doc"], axis=1)

lbl = {}

# Fit ONE encoder per categorical column on the train and held-back values together.
# Separate encoders per frame can map the same category to different integers
# whenever the two frames do not contain exactly the same set of categories.
for c in x_train.columns:
    if x_train[c].dtype == 'object':
        lbl[c] = preprocessing.LabelEncoder()
        lbl[c].fit(list(x_train[c].values) + list(x_test[c].values))
        x_train[c] = lbl[c].transform(list(x_train[c].values))
        x_test[c] = lbl[c].transform(list(x_test[c].values))

xgb_params = {
    'eta': 0.05,
    'max_depth': 5,
    'subsample': 0.7,
    'colsample_bytree': 0.7,
    'objective': 'reg:linear',
    'eval_metric': 'rmse',
    'silent': 1
}

dtrain = xgb.DMatrix(x_train, y_train)
dtest = xgb.DMatrix(x_test)

cv_output = xgb.cv(xgb_params, dtrain, num_boost_round=1000, early_stopping_rounds=20,
    verbose_eval=50, show_stdv=True)
cv_output[['train-rmse-mean', 'test-rmse-mean']].plot()

In [None]:
num_boost_rounds = len(cv_output)
model = xgb.train(dict(xgb_params, silent=0), dtrain, num_boost_round= num_boost_rounds)

y_predict = model.predict(dtest)
output = pd.DataFrame({'id': id_test, 'price_doc': y_predict})
#output.head()

def rmsle(tgt, preds):
    return np.sqrt(sklearn.metrics.mean_squared_log_error(tgt, preds))


In [None]:
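# oops: rmsle() above calls sklearn.metrics, which was never imported under that name.
# Importing it here instead, since i don't feel like rerunning that code ;)
import sklearn.metrics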

# now take the predictions for the held-back rows and score several different random 35/65% splits of them...

sp = int(len(test) * .35)

# actual and predicted prices for the held-back rows, aligned on the original index
tmp = test[['price_doc', 'id']].copy()
tmp['preds'] = output.price_doc

for seed in [0, 1337, 31337, 71331, 12345, 54321]:
    np.random.seed(seed)
    ind = np.random.permutation(output.index)

    # the first 35% of the shuffled rows stand in for the public LB, the other 65% for the private LB
    shuffled = tmp.loc[ind]
    print(rmsle(shuffled.iloc[0:sp].price_doc, shuffled.iloc[0:sp].preds),
          rmsle(shuffled.iloc[sp:].price_doc, shuffled.iloc[sp:].preds))

In [None]:
# for comparison, let's also score the entire held-back (HB) set...
print(rmsle(tmp.price_doc, tmp.preds))

As you can see, the 35% and 65% scores are rarely close.  It is very likely (but NOT certain) that the leaderboard (LB) shakeup will be significant at the end!
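
Six seeds only give a rough picture. A minimal sketch of how the same experiment could be repeated over many random splits to see the spread is below. It runs on synthetic stand-in data (log-normal "prices" with noisy predictions, NOT the competition data), and `split_scores` is a hypothetical helper name, so treat the printed numbers as illustrative only.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared log error, using log1p as sklearn's mean_squared_log_error does."""
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

def split_scores(y_true, y_pred, frac=0.35, n_splits=500, seed=0):
    """Return an (n_splits, 2) array: RMSLE on the `frac` part and on the remainder."""
    rng = np.random.RandomState(seed)
    n = len(y_true)
    sp = int(n * frac)
    scores = np.empty((n_splits, 2))
    for i in range(n_splits):
        ind = rng.permutation(n)
        small, large = ind[:sp], ind[sp:]
        scores[i, 0] = rmsle(y_true[small], y_pred[small])
        scores[i, 1] = rmsle(y_true[large], y_pred[large])
    return scores

# Synthetic stand-in data (an assumption for illustration, not the real held-back set)
rng = np.random.RandomState(42)
y_true = np.exp(rng.normal(15.6, 0.6, size=6000))
y_pred = y_true * np.exp(rng.normal(0.0, 0.4, size=6000))

scores = split_scores(y_true, y_pred)
gap = scores[:, 0] - scores[:, 1]
print("overall RMSLE:", rmsle(y_true, y_pred))
print("std of 35% score:", scores[:, 0].std())
print("std of 65% score:", scores[:, 1].std())
print("5th/95th percentile of (35% - 65%) gap:", np.percentile(gap, [5, 95]))
```

To try it on the real held-back rows, the `price_doc` and `preds` columns of `tmp` could be passed in as NumPy arrays in place of the synthetic ones.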

A run on a home system using the same docker container, with an xgb seed of 12345, gave me these results (35% score, then 65% score, with the score on the entire HB set on the last line):

    0.397478374522 0.420745197489
    0.41342785749 0.412390991909
    0.399848227388 0.419535052541
    0.424963481769 0.406031620705
    0.406878538719 0.415881614054
    0.408207325049 0.41518027333
    0.412754054895
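
As a quick way to read those numbers, the sketch below copies the six reported pairs and summarizes how far apart the 35% and 65% scores are. It also runs a sanity check: because squared log errors add up across rows, weighting the squared scores by the split sizes should approximately recover the overall score. The 0.35/0.65 weights are approximations of the actual split sizes, and the variable names are my own.

```python
import numpy as np

# (35% score, 65% score) pairs copied from the reported run
pairs = np.array([
    [0.397478374522, 0.420745197489],
    [0.41342785749, 0.412390991909],
    [0.399848227388, 0.419535052541],
    [0.424963481769, 0.406031620705],
    [0.406878538719, 0.415881614054],
    [0.408207325049, 0.41518027333],
])
overall = 0.412754054895  # reported score on the entire held-back set

# Absolute difference between the two parts of each split
gaps = np.abs(pairs[:, 0] - pairs[:, 1])
print("mean |gap|:", gaps.mean())
print("max  |gap|:", gaps.max())

# Sanity check: recombine the two parts of each split into one score
recombined = np.sqrt(0.35 * pairs[:, 0] ** 2 + 0.65 * pairs[:, 1] ** 2)
print("recombined scores:", recombined)
print("max deviation from overall:", np.abs(recombined - overall).max())
```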

This suggests there is a ~25% chance that a model with a slightly better HB score (i.e. CV) will perform worse on the private leaderboard.
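
One way such a chance could be estimated empirically is sketched below: take two models' predictions on the same held-back rows, split them 35/65 many times, and count how often the model that wins on the 35% part loses on the 65% part. This is not how the ~25% figure above was obtained. The data is synthetic, the two "models" are just different noise levels, and `flip_rate` is a hypothetical helper, so the printed rate is illustrative only.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared log error, using log1p as sklearn's mean_squared_log_error does."""
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

def flip_rate(y_true, pred_a, pred_b, frac=0.35, n_splits=1000, seed=0):
    """Fraction of random splits where the winner on the `frac` part loses on the rest."""
    rng = np.random.RandomState(seed)
    n = len(y_true)
    sp = int(n * frac)
    flips = 0
    for _ in range(n_splits):
        ind = rng.permutation(n)
        small, large = ind[:sp], ind[sp:]
        a_wins_small = rmsle(y_true[small], pred_a[small]) < rmsle(y_true[small], pred_b[small])
        a_wins_large = rmsle(y_true[large], pred_a[large]) < rmsle(y_true[large], pred_b[large])
        flips += int(a_wins_small != a_wins_large)
    return flips / n_splits

# Synthetic stand-in data (an assumption): model B is only slightly noisier than model A
rng = np.random.RandomState(7)
y_true = np.exp(rng.normal(15.6, 0.6, size=6000))
pred_a = y_true * np.exp(rng.normal(0.0, 0.400, size=6000))
pred_b = y_true * np.exp(rng.normal(0.0, 0.405, size=6000))

print("overall RMSLE, model A:", rmsle(y_true, pred_a))
print("overall RMSLE, model B:", rmsle(y_true, pred_b))
print("flip rate over random 35/65 splits:", flip_rate(y_true, pred_a, pred_b))
```

The closer the two models are overall, the more often the ranking should flip between the two parts; a clearly worse model should essentially never win either part.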