Fastai Tabular Paperspace.ipynb
Flairbit Challenge - Meetup Slides.pdf
LGBM Tabular 05_20_Fault_Clean_Count.ipynb


Dataset Hands-On Challenge

Marcello's working notebook on the Flairbit Challenge, presented at the February 7, 2019 event:

Dataset details and download instructions are at the link above, in the Flairbit section of the event.

The dataset

  • Data related to professional coffee machines
  • Dataset categories:
    • Counters
    • Cleanings
    • Faults
  • One file per category per day, named type_YYYYMMDDHHMMSS-an.csv (e.g., faults_20190103020001-an.csv)
  • Common dataset features
    • Machine serial number
    • Machine model
    • Timestamp (YYYY MM DD hh:mm:ss and week number)
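
The file-naming convention above can be split into its category and timestamp with a small helper. This is only a minimal sketch: the name `parse_dataset_filename` and the regular expression are assumptions derived from the pattern and the example shown in this README, not code taken from the notebooks.

```python
import re
from datetime import datetime

# Pattern inferred from the README: <type>_<YYYYMMDDHHMMSS>-an.csv
# (assumption: the category part is lowercase letters only).
FILENAME_RE = re.compile(r"^(?P<category>[a-z]+)_(?P<stamp>\d{14})-an\.csv$")


def parse_dataset_filename(name):
    """Return (category, timestamp) for a dataset file name, or raise ValueError."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    return match.group("category"), stamp


category, stamp = parse_dataset_filename("faults_20190103020001-an.csv")
print(category, stamp.isoformat())
```

A helper like this makes it easy to group the daily files by category before loading them.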


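| Serial  | YYYY | MM | dd | hh:mm:ss | Week | Model    | LabelCounter     | AbsoluteCounter | RelativeCounter |
|---------|------|----|----|----------|------|----------|------------------|-----------------|-----------------|
| 1535632 | 2016 | 11 | 29 | 04:00:07 | 49   | model 31 | numcaffegenerale | 179112          | 0               |
| 1535632 | 2016 | 11 | 29 | 05:00:07 | 49   | model 31 | numcaffegenerale | 179120          | 8               |
| 1535632 | 2016 | 11 | 29 | 06:00:07 | 49   | model 31 | numcaffegenerale | 179158          | 38              |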


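| Serial  | YYYY | MM | dd | hh:mm:ss | Week | Model    | Errorcode |
|---------|------|----|----|----------|------|----------|-----------|
| 1535632 | 2016 | 11 | 29 | 04:00:07 | 49   | model 10 | 1         |
| 1535632 | 2016 | 11 | 29 | 05:00:07 | 49   | model 9  | 1         |
| 1535632 | 2016 | 11 | 29 | 06:00:07 | 49   | model 31 | 1         |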


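| Serial  | YYYY | MM | dd | hh:mm:ss | Week | Model    | Errorcode | Critical |
|---------|------|----|----|----------|------|----------|-----------|----------|
| 1535632 | 2016 | 11 | 29 | 04:00:07 | 49   | model 62 | 185       | WARNING  |
| 1535632 | 2016 | 11 | 29 | 05:00:07 | 49   | model 20 | 185       | WARNING  |
| 1535632 | 2016 | 11 | 29 | 06:00:07 | 49   | model 20 | 185       | WARNING  |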

Warm up: Let’s query the CSV files

  • How many connected machines?
  • Counter types
  • Fault distribution per model
  • Cleaning-miss distribution per model
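
The warm-up questions above map onto a few pandas one-liners. The sketch below is an illustration only: it uses tiny in-memory frames instead of the real CSV files, assumes the column names match the sample headers, and includes a made-up second serial number (1535633) so the counts are not trivial.

```python
import pandas as pd

# Toy rows modelled on the samples above; serial 1535633 is made up.
counters = pd.DataFrame(
    {
        "Serial": [1535632, 1535632, 1535633],
        "Model": ["model 31", "model 31", "model 20"],
        "LabelCounter": ["numcaffegenerale"] * 3,
        "RelativeCounter": [8, 38, 5],
    }
)
faults = pd.DataFrame(
    {
        "Serial": [1535632, 1535632, 1535633],
        "Model": ["model 62", "model 20", "model 20"],
        "Errorcode": [185, 185, 185],
    }
)

# How many connected machines? Count distinct serial numbers.
n_machines = counters["Serial"].nunique()

# Counter types: distinct values of the counter label.
counter_types = sorted(counters["LabelCounter"].unique())

# Fault distribution per model: number of fault rows for each model.
faults_per_model = faults.groupby("Model").size()

print(n_machines, counter_types)
print(faults_per_model)
```

The same `groupby("Model").size()` pattern would apply to the cleanings data once it is loaded.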



  • Predict fault occurrences based on counter patterns and cleaning misses

Root cause analysis

  • Find correlations between machine usage (counters and cleaning misses) and faults
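
One simple starting point for this kind of analysis is the Pearson correlation between a usage measure and the fault count per machine. The sketch below is a hypothetical illustration, not the method used in the notebooks: the helper `pearson` and the per-machine totals are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up per-machine totals, for illustration only.
missed_cleanings = [0, 1, 2, 3, 4]
fault_counts = [1, 3, 5, 7, 9]

r = pearson(missed_cleanings, fault_counts)
print(round(r, 3))
```

A high coefficient only signals that the two quantities move together; it does not by itself establish a root cause.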


The code in the repo is organized as three Jupyter notebooks:

  • Exploration.ipynb: the first file to look at. It contains the data exploration of the dataset. Please note that the dataset itself is not in the repo: you have to look into the dataset presentation here and find the slide with the link to a Dropbox folder. The reason for this is to spread knowledge of the DataScienceSeed community and events. This notebook also generates the intermediate dataset files (the .csv files in the repo) used for the forecast.
  • LGBM Tabular 03_01_Fault_Clean_Count.ipynb: failure forecast based on the LightGBM algorithm. It also rearranges the data to experiment with different time windows on both the feature side and the target side. The results are fairly good: the models predict a machine failure within the next 5 days with an F1 score of 85%. There is also some feature interpretation based on SHAP.
  • Fastai Tabular Paperspace.ipynb: failure forecast based on deep learning, using tabular_learner. I attempted to apply what is described in Lesson 4 of the MOOC, 2018 edition. The results are worse than LGBM, but this is just a dumb attempt!
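
The time-window idea mentioned for the LightGBM notebook can be sketched in plain Python: features summarize a trailing window of usage, and the target says whether a fault occurs within the following days. This is not the notebook's code. The helper names, the 3-day feature window, and the daily series are made up for illustration; only the 5-day horizon comes from the description above.

```python
def trailing_sum(values, window):
    """Sum of the current day and the `window - 1` days before it."""
    return [sum(values[max(0, d - window + 1): d + 1]) for d in range(len(values))]


def fault_within(daily_faults, horizon=5):
    """1 if at least one fault occurs in the `horizon` days after each day, else 0."""
    return [
        int(any(daily_faults[d + 1: d + 1 + horizon]))
        for d in range(len(daily_faults))
    ]


# Made-up daily series for a single machine.
daily_coffees = [10, 12, 9, 30, 28, 31, 5, 0, 11]
daily_faults = [0, 0, 0, 0, 0, 0, 1, 0, 0]

features = trailing_sum(daily_coffees, window=3)   # feature side: past usage
target = fault_within(daily_faults, horizon=5)     # target side: future faults

print(features)
print(target)
```

Keeping the feature window strictly in the past and the target window strictly in the future avoids leaking the outcome into the inputs. Changing `window` and `horizon` is one way to experiment with different time windows.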


Thanks to Flairbit for providing this dataset to the DataScienceSeed community! This repository has no commercial purpose; it is provided for educational purposes only.

