GPU usage #3

vicbach · 2021-02-26T19:51:46Z

Dear Manu,

I've set up the whole model and sent it to our "super calculator", but one error remains. It is related to the number of GPUs requested. Here is the error:

Multi-Target Regression: using the first target(ds_serv_int) to encode the categorical columns
/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/category_encoders/utils.py:21: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
  elif pd.api.types.is_categorical(cols):
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
Traceback (most recent call last):
  File "/gpfs/workdir/bachv/Notebooks/DL_DS_Poitiers/poitiers_dureeSejour_DL.py", line 168, in <module>
    tabular_model.fit(train=train, 
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_tabular/tabular_model.py", line 440, in fit
    train_loader, val_loader = self._pre_fit(
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_tabular/tabular_model.py", line 389, in _pre_fit
    self._prepare_trainer(max_epochs, min_epochs)
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_tabular/tabular_model.py", line 328, in _prepare_trainer
    self.trainer = pl.Trainer(
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
    return fn(self, **kwargs)
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 333, in __init__
    self.accelerator_connector.on_trainer_init(
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 111, in on_trainer_init
    self.trainer.data_parallel_device_ids = device_parser.parse_gpu_ids(self.trainer.gpus)
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 76, in parse_gpu_ids
    gpus = _sanitize_gpu_ids(gpus)
  File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 134, in _sanitize_gpu_ids
    raise MisconfigurationException(f"""
pytorch_lightning.utilities.exceptions.MisconfigurationException: 
                You requested GPUs: [0]
                But your machine only has: []

My code is attached below.

Thank you in advance for any hint you might have!

Check that the following modules are installed

conda install pytorch torchvision -c pytorch


pip install pytorch_tabular[all]
or
git clone git://github.com/manujosephv/pytorch_tabular
+
python setup.py install


pip install torch_optimizer  # Not available on conda

conda install -c conda-forge scikit-learn 

conda install -c conda-forge pandas

conda install -c conda-forge seaborn 

conda install -c conda-forge numpy 

conda install -c conda-forge matplotlib

### Import the required libraries

#PyTorch Tabular
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig, NodeConfig
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig
from torch_optimizer import QHAdam


#Scikit Learn
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import PowerTransformer

#other
import random
import numpy as np
import pandas as pd
import os
import sys




'''
Note: check whether 1-minute stays in a service should be removed (error?), as well as discharge dates later than the extraction date (think about how to handle this)
'''




### Using PyTorch Tabular

''' Source https://pytorch-tabular.readthedocs.io/en/latest/ '''


## Utility function

def print_metrics(y_true, y_pred, tag):
    if isinstance(y_true, pd.DataFrame) or isinstance(y_true, pd.Series):
        y_true = y_true.values
    if isinstance(y_pred, pd.DataFrame) or isinstance(y_pred, pd.Series):
        y_pred = y_pred.values
    if y_true.ndim>1:
        y_true=y_true.ravel()
    if y_pred.ndim>1:
        y_pred=y_pred.ravel()
    mse = mean_squared_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    print(f"{tag} MSE: {mse} | {tag} MAE: {mae}")



## Data preparation

from preprocessing_poitiers_dureeSejour import profilage_sejour

'''We study category 1 here to determine whether analysing it is enough to get conclusive results; otherwise we will study the following categories '''

data = profilage_sejour(1) # Broad diagnosis category (level 1)

list_columns = list(data.columns)
target_cols = ['ds_serv_int', 'ds_tot_int']   # columns we are trying to predict
cat_col_names = ['id_service']             # to be confirmed
date_col_names = ['date_debut','date_entree_service','date_sortie_service']
col_not_num = cat_col_names + date_col_names + target_cols 
num_col_names = [x for x in list_columns if x not in col_not_num]

date_col_list = [('date_debut','T'),('date_entree_service','T'),('date_sortie_service','T')]

train, test = train_test_split(data, random_state=42)
train, val = train_test_split(train, random_state=42)


'''Since we will use a higher learning rate than in the basic PyTorch Tabular usage, we increase the number of epochs (from 20 to 50) '''

batch_size = 512
steps_per_epoch = int(train.shape[0]/batch_size)
epochs = 50


## Configurations

# Data configuration
data_config = DataConfig(
    target=target_cols, #target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
    date_columns= date_col_list,
    encode_date_columns = True, 
    validation_split = 0.2,         # 80% train, 20% validation (only used when no validation set is passed to fit)
    continuous_feature_transform="quantile_normal",

)

# Training configuration
trainer_config = TrainerConfig(
    auto_lr_find=False, # To check whether this is relevant
    batch_size=batch_size,
    max_epochs=epochs,
    early_stopping=None,        # To check whether this is useful
    accumulate_grad_batches=2,
    gpus=1, # number of GPUs to use (1 requests device [0]); 0 means CPU
)


# Learning-rate schedule configuration
optimizer_config = OptimizerConfig(
    lr_scheduler="OneCycleLR",  # PyTorch one-cycle learning-rate policy (changes at every batch)
    lr_scheduler_params={"max_lr":2e-3,     # maximum learning rate in the cycle
        "epochs": epochs, 
        "steps_per_epoch":steps_per_epoch}
)




# Model configuration
''' NODE here - source: "Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data" - 09/2020 - https://arxiv.org/abs/1909.06312 '''


model_config = NodeConfig(
    task="regression",
    num_layers=2, # Number of dense layers
    num_trees=1024, # Number of trees in each layer
    depth=5, # Depth of each tree
    embed_categorical=False, #If True, will use a learned embedding, else it will use LeaveOneOutEncoding for categorical columns
    learning_rate = 1e-3,
    target_range=None
)

# Using PyTorch Tabular
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)


## Model training

tabular_model.fit(train=train, 
                  validation=val, 
                  target_transform=PowerTransformer(method="yeo-johnson"), 
                  optimizer=QHAdam,         # Quasi-Hyperbolic Adam (see report - https://paperswithcode.com/method/qhadam)
                  optimizer_params={"nus": (0.7, 1.0), "betas": (0.95, 0.998)})


## Results

result = tabular_model.evaluate(test)   # Evaluate the dataframe using the loss and metrics set in the configuration

pred_df = tabular_model.predict(test)
pred_df.head()

print("Length of stay per service")
print_metrics(test['ds_serv_int'], pred_df["ds_serv_int_prediction"], tag="Holdout")
print("Total length of stay")
print_metrics(test['ds_tot_int'], pred_df["ds_tot_int_prediction"], tag="Holdout")




## Saving the model

model_folder = os.path.join( "/Users", "victoire", "CodingProjects", "ML_Hopia", "Projet3A", "Models") # change as needed

tabular_model.save_model(model_folder)


## Using the model on new data

''' 
new_data =

tabular_model.predict(new_data)

'''


## Loading the saved model

'''
model_folder = os.path.join( "/Users", "victoire", "CodingProjects", "ML_Hopia", "Projet3A", "Models") # change as needed

tabular_model = TabularModel.load_from_checkpoint(model_folder)
'''
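
A side note on the data split in the script above: calling `train_test_split` twice with its default `test_size` of 0.25 gives roughly 56% train, 19% validation, and 25% test. Below is a small sketch of that arithmetic, assuming scikit-learn's default rounding (the held-out part is rounded up). `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math

def split_sizes(n_rows, test_size=0.25):
    """Sizes after two successive splits with the same test_size.

    Assumption: the held-out part of each split is ceil(test_size * n),
    and the remainder goes to the training side.
    """
    n_test = math.ceil(test_size * n_rows)      # first split: test set
    remaining = n_rows - n_test
    n_val = math.ceil(test_size * remaining)    # second split: validation set
    n_train = remaining - n_val
    return n_train, n_val, n_test

print(split_sizes(1000))  # (562, 188, 250)
```

If a different ratio is wanted, passing `test_size` explicitly to each `train_test_split` call makes the proportions visible in the script.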

manujosephv · 2021-02-27T03:23:32Z

If you are trying to use GPU, have you installed the cuda version of PyTorch?

GPU available: False, used: False
TPU available: False, using: 0 TPU cores

This shows that PyTorch has not detected a GPU, so when you request one, it reports that no such GPU exists.

Can you check the output of this and get back?

torch.cuda.is_available()
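
A minimal sketch of that check, extended to fall back to CPU when no CUDA device is visible. The helper name `pick_gpus` is hypothetical (it is not part of PyTorch or PyTorch Tabular). Its return value is meant to be passed as `gpus=` in `TrainerConfig`, assuming that parameter is a GPU count, as the error above suggests (requesting 1 produced `[0]`).

```python
import importlib.util

def pick_gpus(requested=1):
    """Return how many GPUs to request, or 0 to run on CPU.

    Hypothetical helper: returns 0 when torch is missing or when
    torch.cuda.is_available() is False.
    """
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch
    if not torch.cuda.is_available():
        return 0
    # Never ask for more devices than PyTorch can see.
    return min(requested, torch.cuda.device_count())

print(pick_gpus(1))
```

On a cluster, this should be run inside the submitted job itself, since a login node and a compute node may not expose the same devices.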

vicbach · 2021-02-27T19:50:33Z

Thanks for your quick reply! Everything's fine regarding CUDA now. However, one error remains:

Generating Predictions...: 100%|██████████| 2/2 [00:00<00:00, 15.82it/s]
Traceback (most recent call last):
File "/gpfs/workdir/bachv/Notebooks/DL_DS_Poitiers/poitiers_dureeSejour_DL.py", line 196, in
tabular_model.save_model(model_folder)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_tabular/tabular_model.py", line 642, in save_model
joblib.dump(self.callbacks, os.path.join(dir, "callbacks.sav"))
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 480, in dump
NumpyPickler(f, protocol=protocol).dump(value)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 487, in dump
self.save(obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 931, in save_list
self._batch_appends(obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 955, in _batch_appends
save(x)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 692, in save_reduce
save(args)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 886, in save_tuple
save(element)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 713, in save_reduce
self._batch_setitems(dictitems)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 713, in save_reduce
self._batch_setitems(dictitems)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 717, in save_reduce
save(state)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 282, in save
return Pickler.save(self, obj)
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/pickle.py", line 1070, in save_global
raise PicklingError(
_pickle.PicklingError: Can't pickle <function at 0x2b1c5532d940>: it's not found as pytorch_tabular.models.node.utils.

Here's the source code for saving the model (slightly changed since yesterday):

model_folder = os.path.join( "/gpfs","users","bachv", "workdir", "bachv","Notebooks","DL_DS_Poitiers", "Trained_Model", "")
tabular_model.save_model(model_folder)
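
The final `PicklingError` in the traceback says that a function object could not be found under a module-level name. The standard-library sketch below reproduces that class of error with a lambda. It only illustrates the mechanism and is not a diagnosis of which PyTorch Tabular object is involved. The helper `can_pickle` is hypothetical.

```python
import pickle

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# Functions are pickled by reference (module plus qualified name), so a
# built-in such as len works, while a lambda has no importable name.
print(can_pickle(len))            # True
print(can_pickle(lambda x: x))    # False
```

Because `joblib.dump` goes through the same pickling machinery (as the traceback shows), any object reachable from the saved callbacks that holds such a function would trigger the same failure.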

manujosephv · 2021-03-01T06:41:44Z

This is a little weird. Can you send me the config that you have set to train the model?

vicbach · 2021-03-02T10:28:31Z

Sure! Please find it attached.

manujosephv · 2021-03-03T09:06:03Z

I'm not sure the attachment came through. Can you share the config directly in text?

manujosephv · 2021-03-15T01:02:33Z

I hope the issue has been resolved. Closing the issue. Feel free to reopen or open a new one if you still have problems.

vicbach · 2021-03-23T22:32:53Z

Dear Manu,

Please forgive me for that extremely long delay… I've been totally overwhelmed by other aspects of our startups and couldn't focus on the DL code recently. I'm getting back to it, and the code runs! I need to make some adjustments and might come back to you to keep talking about this great framework!

Many thanks,
Victoire Bach

manujosephv added the good first issue Good for newcomers label Mar 1, 2021

manujosephv closed this as completed Mar 15, 2021
