Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross Validation Added #407

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

prakrutisingh24
Copy link
Contributor

No description provided.

@@ -20,6 +20,9 @@
from slugify import slugify
from tornado.gen import coroutine
from tornado.web import HTTPError
from sklearn.metrics import get_scorer
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.model_selection import cross_val_predict, cross_val_score
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line appears twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra line is unnecessary.

# train the model
target = data[target_col]
train = data[[c for c in data if c != target_col]]
# cross validation
mod = cls.modelFunction()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required. The model is already present as cls.model, see line no: 116.

# cross validation
mod = cls.modelFunction()
CVscore = cross_val_score(mod, train, target)
CV = sum(CVscore)/len(CVscore)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Use CVscore.mean()
  2. Variable naming has to follow a specified style - do pip install flake8 and run the flake8 command against this file, i.e. flake8 mlhandler.py, and check the output.

@jaidevd
Copy link
Member

jaidevd commented May 13, 2021

@prakrutisingh24 In this PR, we are just computing the cross val score when the model is set up for the first time, and simply printing the CV score. What we need is:

  • When any scoring happens, especially if it is triggered by the user through an ?_action=train or ?_action=score - the result should be cross validated score
  • By default, cross validation should happen, but if the user wants, they should be able to turn it off through gramex.yaml, and get scores on the entire training dataset
  • sklearn's cross_val_scoire contains a parameter, cv, that can be set in many different ways. Check the docs for details. MLHandler users should be able to exercise all those options.

Thanks,

@@ -40,6 +44,8 @@
'nums': [],
'cats': [],
'target_col': None,
'CV': True,
'CVargs': []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a single argument, cv, which can take any value, i.e in gramex.yaml, users should be able to write any of the following.

cv: false   # disable cross validation
cv: 5        # Use 5 folds
cv:
  cv: 8   # Use 8 folds
  n_jobs: -1  # with an optional other parameter.

@@ -20,6 +20,9 @@
from slugify import slugify
from tornado.gen import coroutine
from tornado.web import HTTPError
from sklearn.metrics import get_scorer
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.model_selection import cross_val_predict, cross_val_score
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra line is unnecessary.

# cross validation
print('yayyy we are here')
cls.CrossValidation(train,target)
print('should have printed')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the prints.

Copy link
Member

@jaidevd jaidevd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The training is happening in def _fit. Cross validation should also happen there.

@@ -20,6 +20,9 @@
from slugify import slugify
from tornado.gen import coroutine
from tornado.web import HTTPError
from sklearn.metrics import get_scorer
from sklearn.model_selection import cross_val_predict, cross_val_score
from ast import literal_eval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be required.

@@ -40,6 +43,7 @@
'nums': [],
'cats': [],
'target_col': None,
'CV': True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it lowercase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to support three cases for the cv option:

  • If the user sets cv: false - then no cross validation happens
  • If the user sets cv: 4 (or some other integer) pass it straight to cross_val_score
  • The default should be cv: None, and in this case, the user should not have to write anything in gramex.yaml

# train the model
target = data[target_col]
train = data[[c for c in data if c != target_col]]
# cross validation
cls.CrossValidation(train,target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it lowercase.

mclass = model_kwargs.get('class', False)
if mclass:
model = search_modelclass(mclass)(**model_kwargs.get('params', {}))
return model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not required.

if CV:
CVscore = cross_val_score(mod, X=train, y=target, **literal_eval(json.dumps(CV)))
CVavg = sum(CVscore)/len(CVscore)
print('Cross Validation Score : ',CVavg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CV should take place within the train method only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if cv:
    cvscore = cross_val_score(mod, X=train, y=target, cv=cv)
else:
   # Do the usual .fit

# train the model
target = data[target_col]
train = data[[c for c in data if c != target_col]]
# cross validation
cls.cross_validation(train,target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants