# Tournament Info
- **Dataset** : C-Mechanics
- **Description** (as on Tournament page)
    - This strategy is a trend-following strategy based on the trend of idiosyncratic (an individualizing characteristic) return and volatility.

# To-Do's
- Download needed libraries required for Analysis, ML models
- Perform required cleaning/analysis, if desired
- Run the notebook, submit first submission
- Update analysis or model

-----
-----

# Update Code
## Existing Code
- Aggregate all library imports with try-except blocks
- Aggregate similar category code, such as nulls-check code, into single if-else block or try-except block into single cell
- Add print statements in aggregated blocks of code for debugging
- Aggregate Modeling section - all model initializations and model training into single cell
- Aggregate Preditions and Submission, if possible

## New Code
- Save dataset locally
- Save trained models
- Understand orchestration with 'mlflow'
- Implement orchestration with 'mlflow'

## Import/Install Libraries

In [1]:
#%pip install xgboost

In [2]:
# Lib & Dependencies
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error
import requests
from scipy import stats

print("Libraries import successful!")

Libraries import successful!


## Download & Explore data

    - Training_data will be use to train your model.
    - Hackathon_data will be use to make your prediciton.
    - Three targets to provide predictions : target_r, target_g, target_b.




In [3]:
# Data Download (may take a few minutes depending on your network)
train_datalink_X = 'https://tournament.crunchdao.com/data/X_train.csv'  
train_datalink_y = 'https://tournament.crunchdao.com/data/y_train.csv' 
hackathon_data_link = 'https://tournament.crunchdao.com/data/X_test.csv' 

# Data for training
train_data = pd.read_csv(train_datalink_X)
# Data for which you will submit your prediction
test_data = pd.read_csv(hackathon_data_link)
# Targets use for your supervised trainning
train_targets = pd.read_csv(train_datalink_y)

In [4]:
# If you don't want to work with time series
train_data = train_data.drop(columns=['id'])
test_data = test_data.drop(columns=['id'])

In [5]:
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)

In [6]:
if train_data.isnull().sum().any() > 0 | test_data.isnull().sum().any() > 0:
    print('No Nulls - Training & Testing dataset')
    
        



False

In [7]:
train_targets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1340815 entries, 0 to 1340814
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype  
---  ------    --------------    -----  
 0   target_r  1340815 non-null  float64
 1   target_g  1340815 non-null  float64
 2   target_b  1340815 non-null  float64
dtypes: float64(3)
memory usage: 30.7 MB


In [9]:
display(train_data)

Unnamed: 0,Moons,Feature_1,Feature_2,Feature_3,Feature_4,Feature_5,Feature_6,Feature_7,Feature_8,Feature_9,Feature_10,Feature_11,Feature_12,Feature_13,Feature_14,Feature_15,Feature_16,Feature_17,Feature_18,Feature_19,Feature_20,Feature_21,Feature_22,Feature_23,Feature_24,Feature_25,Feature_26,Feature_27,Feature_28,Feature_29,Feature_30,Feature_31,Feature_32,Feature_33,Feature_34,Feature_35,Feature_36,Feature_37,Feature_38,Feature_39,Feature_40,Feature_41,Feature_42,Feature_43,Feature_44,Feature_45,Feature_46,Feature_47,Feature_48,Feature_49,Feature_50,Feature_51,Feature_52,Feature_53,Feature_54,Feature_55,Feature_56,Feature_57,Feature_58,Feature_59,Feature_60,Feature_61,Feature_62,Feature_63,Feature_64,Feature_65,Feature_66,Feature_67,Feature_68,Feature_69,Feature_70,Feature_71,Feature_72,Feature_73,Feature_74,Feature_75,Feature_76,Feature_77,Feature_78,Feature_79,Feature_80,Feature_81,Feature_82,Feature_83,Feature_84,Feature_85,Feature_86,Feature_87,Feature_88,Feature_89,Feature_90,Feature_91,Feature_92,Feature_93,Feature_94,Feature_95,Feature_96,Feature_97,Feature_98,Feature_99,Feature_100,Feature_101,Feature_102,Feature_103,Feature_104,Feature_105,Feature_106,Feature_107,Feature_108,Feature_109,Feature_110,Feature_111,Feature_112,Feature_113,Feature_114,Feature_115,Feature_116
0,0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,1.0,1.00,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.00,0.00,0.00,0.00,1.0,1.0,1.0,0.0,0.00,1.00,0.00,0.0,0.00,1.0,1.0,1.00,0.00,0.00,1.00,1.00,0.0,1.0,0.00,0.00,1.00,1.00,0.0,0.00,0.0,1.00,0.00,0.00,1.00,1.0,1.00,0.00,0.0,1.0,1.00,0.75,0.75,0.50,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.00,0.75,0.00,0.75,0.00,0.50,0.00,0.50,0.00,0.50,0.00,0.50,0.00,0.00,0.00,0.50,0.00,0.00
1,0,0.75,0.00,0.25,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,1.0,1.00,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.00,0.00,0.00,0.00,1.0,1.0,1.0,0.0,0.00,1.00,0.00,0.0,0.00,1.0,1.0,1.00,0.00,0.00,1.00,1.00,0.0,1.0,0.00,0.00,1.00,1.00,0.0,0.00,0.0,1.00,0.00,0.00,1.00,1.0,1.00,0.00,0.0,1.0,1.00,0.50,0.50,0.75,0.25,0.25,0.50,0.75,0.50,0.50,0.25,0.50,0.50,0.25,0.25,0.00,0.50,0.00,0.25,0.00,0.25,0.00,0.25,0.00,0.50,0.00,0.50,0.00,0.00,0.00,0.25,0.00,0.00
2,0,0.00,1.00,1.00,1.00,1.00,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.50,0.75,1.00,0.75,0.75,0.75,0.75,0.75,0.5,0.50,0.75,1.00,1.00,0.50,0.50,0.25,0.50,0.50,0.50,0.50,0.0,1.0,1.00,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.00,0.00,0.00,0.00,1.0,1.0,1.0,0.0,0.00,1.00,0.00,0.0,1.00,0.0,0.0,1.00,0.00,0.00,1.00,1.00,0.0,1.0,0.00,0.00,1.00,1.00,0.0,0.00,0.0,1.00,0.00,0.00,1.00,1.0,1.00,0.00,0.0,1.0,1.00,0.75,0.75,0.75,0.25,0.25,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.75,1.00,0.75,1.00,0.75,1.00,0.75,0.75,0.75,0.75,0.50,0.75,0.75,1.00,0.75,0.75,0.25,0.50
3,0,0.25,0.25,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.25,0.25,0.50,0.25,0.25,0.50,0.25,0.25,0.25,0.0,1.0,1.00,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.00,0.00,0.00,0.00,1.0,1.0,1.0,0.0,0.00,1.00,0.00,0.0,1.00,0.0,0.0,1.00,0.00,0.00,1.00,1.00,0.0,1.0,0.00,0.00,1.00,1.00,0.0,0.00,0.0,1.00,0.00,0.00,1.00,1.0,1.00,0.00,0.0,1.0,1.00,0.25,0.25,0.00,0.50,0.50,1.00,1.00,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.00,0.75,0.00,0.50,0.00,0.50,0.00,0.50,0.50,0.25,0.25,0.25,0.25,0.75,0.75,0.75,0.25,0.00
4,0,0.75,0.00,0.25,0.25,1.00,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.50,0.75,0.75,0.75,0.75,0.75,0.50,0.50,0.5,0.75,1.00,0.75,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,0.0,1.0,1.00,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.00,0.00,0.00,0.00,1.0,1.0,1.0,0.0,0.00,1.00,0.00,0.0,1.00,0.0,0.0,1.00,0.00,0.00,0.00,0.00,1.0,0.0,1.00,1.00,0.00,0.00,1.0,0.25,0.0,1.00,1.00,1.00,1.00,0.0,0.00,1.00,0.5,1.0,1.00,0.75,0.75,0.25,0.25,0.25,0.00,0.00,0.25,0.00,0.25,0.25,0.25,0.25,0.25,0.75,0.75,0.50,0.75,0.75,0.75,1.00,0.50,1.00,0.75,1.00,0.75,0.25,0.50,0.25,0.25,0.00,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1340810,540,0.75,1.00,1.00,0.25,0.75,0.50,0.50,0.50,0.25,0.75,0.25,0.50,0.50,1.00,0.75,1.00,0.75,0.75,1.00,1.00,1.0,1.00,1.00,0.00,0.00,0.00,0.25,0.25,0.25,0.00,0.00,0.00,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.25,0.50,0.75,0.0,1.0,1.0,0.5,1.00,0.25,0.25,1.0,0.75,1.0,0.0,0.75,0.50,0.25,0.75,1.00,0.0,0.0,0.75,0.00,0.25,0.25,0.5,1.00,1.0,0.00,0.25,0.75,0.25,0.5,1.00,0.00,1.0,0.0,0.00,1.00,1.00,1.00,1.00,0.75,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.75,1.00,1.00,1.00,1.00,1.00,1.00,1.00,0.00,1.00,0.00,1.00,1.00,0.00,0.00,0.00,0.75,0.25
1340811,540,0.75,0.50,0.25,0.50,0.50,0.25,0.25,0.75,1.00,0.75,0.75,0.75,0.75,0.75,0.75,0.50,0.75,0.50,0.50,0.50,0.5,0.75,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.25,0.50,0.75,0.0,0.5,1.0,0.5,0.50,0.50,0.25,1.0,0.75,1.0,1.0,0.75,0.50,1.00,0.75,1.00,1.0,0.0,0.75,0.75,0.25,0.25,1.0,1.00,1.0,0.50,0.25,0.75,1.00,0.5,1.00,0.75,1.0,0.0,1.00,0.00,0.00,0.00,0.00,0.25,0.75,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,0.75,0.00,0.50,0.00,0.75,0.00,1.00,0.00,1.00,0.00,1.00,0.00,0.50,0.50,0.50,0.25,0.25,0.00
1340812,540,0.00,0.00,0.00,0.00,0.25,0.00,0.25,0.50,0.50,0.75,0.75,0.50,1.00,1.00,1.00,1.00,1.00,1.00,0.75,1.00,1.0,1.00,1.00,1.00,1.00,1.00,1.00,1.00,0.75,0.75,0.75,1.00,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.25,0.25,0.75,0.0,1.0,1.0,0.5,1.00,0.50,0.00,1.0,1.00,1.0,1.0,0.75,0.50,1.00,0.75,1.00,1.0,0.0,0.50,0.75,0.00,0.25,1.0,1.00,1.0,0.50,0.25,0.75,0.75,0.5,1.00,0.50,1.0,0.0,1.00,0.25,0.50,0.25,0.25,0.25,0.75,0.50,0.25,0.25,0.25,0.50,0.50,0.50,0.50,0.75,0.75,1.00,0.75,1.00,0.75,1.00,0.75,1.00,0.75,1.00,0.75,1.00,1.00,0.25,0.50,1.00,1.00
1340813,540,0.75,1.00,0.75,1.00,1.00,1.00,0.75,0.75,0.75,0.50,0.75,0.75,0.75,0.50,0.50,0.25,0.50,0.25,0.50,0.50,0.5,0.75,0.75,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.25,0.25,0.75,0.0,0.5,1.0,0.5,0.50,0.50,0.25,1.0,1.00,1.0,1.0,0.75,0.25,1.00,0.75,0.75,1.0,0.0,0.50,0.75,0.25,0.25,1.0,1.00,1.0,0.75,0.25,0.75,1.00,0.5,0.75,0.75,1.0,0.0,1.00,0.25,0.25,0.25,0.00,0.00,0.00,0.25,0.50,0.50,1.00,1.00,0.75,0.75,0.75,0.50,0.25,0.50,0.00,0.75,0.00,0.75,0.00,1.00,0.25,1.00,0.25,0.50,0.75,0.75,1.00,0.50,0.25


## Modeling

#### Models explored so far :
    - Random Forest Regressor
    - Extra Trees Regressor
    - XG Boost
    - Voting Regressor

In [10]:
# model trial
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.ensemble import VotingRegressor

In [11]:
SEED = 41

# Random Forest Regressor Initialization
rf_reg = RandomForestRegressor(n_estimators=1000, max_depth=5, min_samples_split=100, min_samples_leaf=50, min_weight_fraction_leaf=0.4, ccp_alpha=0.5, max_samples=0.8,   
                                n_jobs=-1, random_state=SEED)


In [12]:
# Extra Tress Regressor Initialization
et_reg = ExtraTreesRegressor(n_estimators=1000, max_depth=5, min_samples_split=100, min_samples_leaf=50, min_weight_fraction_leaf=0.4, ccp_alpha=0.5, max_samples=0.8,
                            bootstrap=True, n_jobs=-1, random_state=SEED)



#### XGB Parameters Update
Source : [XG Boost Documentation](https://xgboost.readthedocs.io/en/stable/)

* Additional XGBoostRegressor Parameters (tree_method='gpu_hist')
* Need to Figure out -- LEARNING TASK PARAMETERS
* Increase test_size to 0.2


In [13]:
# Xtra Gradient Boosting Regressor Initialization

xgb_reg = xgb.XGBRegressor(max_depth=10, min_child_weight=4, subsample=0.5, learning_rate=0.2, n_estimators=2000, colsample_bytree=0.5, colsample_bylevel=0.5, colsample_bynode=0.5,
                            tree_method="gpu_hist", num_parallel_tree=5, 
                            objective='reg:squaredlogerror', eval_metric='rmsle')
# sample_type='weighted', normalize_type='forest', feature_selector='thrifty', top_k=25

### Score Models using Spearman's rank correlation -> _predictions vs targets_

In [14]:
def scorer(y_test, y_pred):
    score = (stats.spearmanr(y_test, y_pred)*100)[0]
    print('Score as calculated for the leader board (っಠ‿ಠ)っ {}'.format(score))

In [15]:
# Voting Regressor Initialization
models = [('rf', rf_reg), ('et', et_reg), ('xgb', xgb_reg)]
vote_reg = VotingRegressor(estimators=models, verbose=True)

def best_reg(data, target):
    X, y = data, target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, shuffle=True, random_state=SEED)    

    vote_reg.fit(X_train, y_train)

    preds = vote_reg.predict(X_test)

    scorer(y_test, preds)

    return vote_reg

In [17]:
# Score Model

model_target_r = best_reg(train_data, train_targets.target_r)
model_target_g = best_reg(train_data, train_targets.target_g)
model_target_b = best_reg(train_data, train_targets.target_b)

[Voting] ....................... (1 of 3) Processing rf, total=14.3min
[Voting] ....................... (2 of 3) Processing et, total= 8.7min


KeyboardInterrupt: 

## Predictions

    * When model is accurate enough it's time to predict the target and submit results.
    * Repeat operation on the three targets, concatenate the answers and submit.

### MUST Do's :
1. **Keep raw order identical.**
2. **Be sure that columns are named as : target_r, target_g and target_b.**
3. **Prediction need to be between 0 and 1.**   
4. **Don't submit constant values.**

In [None]:
prediction = pd.DataFrame()
prediction['target_r'] = model_target_r.predict(test_data)
prediction['target_g'] = model_target_g.predict(test_data)
prediction['target_b'] = model_target_b.predict(test_data)

In [None]:
prediction

Unnamed: 0,target_r,target_g,target_b
0,0.477370,0.477370,0.477370
1,0.415001,0.415001,0.415001
2,0.432094,0.432094,0.432094
3,0.519635,0.519635,0.519635
4,0.489073,0.489073,0.489073
...,...,...,...
27323,0.524318,0.524318,0.524318
27324,0.562689,0.562689,0.562689
27325,0.526331,0.526331,0.526331
27326,0.517741,0.517741,0.517741


### Submission
    * Import API_KEY from .env
    * Possible error codes and their descriptions

In [None]:
# Import libraries required for submission

#%pip install python-dotenv 
import os
from dotenv import load_dotenv
import requests


In [None]:

# Import API_KEY from .env

load_dotenv('D:\GitHub_Projects\crunchdao\.env')

API_KEY =  os.getenv('SECRET_KEY')

if API_KEY != None:
    print('Key Import : Nailed It!')
else:
    print('Key Missing!')

Key Import : Nailed It!


In [None]:
# Push Predictions

r = requests.post("https://tournament.crunchdao.com/api/v2/submissions",
    files = {
        "file": ("x", prediction.to_csv().encode('ascii'))
    },
    data = {
        "apiKey": API_KEY
    },
)

if r.status_code == 200:
    print("Submission submitted.")
elif r.status_code == 423:
    print("ERR: Submissions are close")
    print("You can only submit during rounds eg: Friday 7pm GMT+1 to Sunday midnight GMT+1.")
    print("Or the server is currently crunching the submitted files, please wait some time before retrying.")
elif r.status_code == 422:
    print("ERR: API Key is missing or empty")
    print("Did you forget to fill the API_KEY variable?")
elif r.status_code == 400:
    print("ERR: The file must not be empty")
    print("You have send a empty file.")
elif r.status_code == 401:
    print("ERR: Your email hasn't been verified")
    print("Please verify your email or contact a cruncher.")
elif r.status_code == 409:
    print("ERR: Duplicate submission")
    print("Your work has already been submitted with the same exact results, if you think that this a false positive, contact a cruncher.")
    print("MD5 collision probability: 1/2^128 (source: https://stackoverflow.com/a/288519/7292958)")
elif r.status_code == 429:
    print("ERR: Too many submissions")
else:
    print("ERR: Server returned: " + str(r.status_code))
    print("Ouch! It seems that we were not expecting this kind of result from the server, if the probleme persist, contact a cruncher.")

Submission submitted.
