Many of today's machines, engines, and mobile devices are equipped with multiple sensors that continuously produce valuable data. This data can often help predict a machine's Remaining Useful Life (RUL) and so prevent unexpected breakdowns and service interruptions. The large volume of acquired data, and the sometimes complicated relationship between that data and machine failure, motivates the use of machine learning to monitor machine condition and raise the alarm early enough to plan the necessary maintenance work.
Classic machine learning classifiers such as Gaussian Naïve Bayes, Bernoulli Naïve Bayes, K-Nearest Neighbors, Logistic Regression, Gradient Boosting, Random Forest, XGBoost, or a Support Vector Classifier can in many cases deliver very good results, indicating an increased probability of failure for certain sensor values. However, these algorithms do not detect significant changes in sensor data over time, which is crucial for predicting the future state of the machine. They have no "time memory" that would allow temporal trends to be learned first and recognized later. A good candidate with this capability is Long Short-Term Memory (LSTM). Without going into detail, an LSTM model adds a temporal "layer" on top of what the classic models offer. In our case, this memory is used to learn and then identify trends in time series.
However, this powerful advantage requires even more careful data cleaning and pre-processing. Most of this preparation work is about ensuring data consistency. To train a well-performing model, you need to review and annotate the data over a specific period of time. For repetitive events, and for samples that arrive with different time stamps, the rule of thumb is to follow the Nyquist rate, as in any sampling process: sample at least twice as fast as the fastest repetition you want to capture. If the event is not repeatable, the recording should be long enough to capture at least some "breakdown events".
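
The Nyquist rule of thumb mentioned above can be turned into a quick consistency check on the raw time stamps. The sketch below is only an illustration and is not taken from this repository: the function names and the example values are hypothetical, and it assumes time stamps in seconds and an event whose repetition period is known.

```python
from typing import Sequence


def max_sampling_interval(event_period_s: float) -> float:
    """Longest sampling interval (seconds) that still meets the Nyquist
    criterion for an event repeating every `event_period_s` seconds."""
    if event_period_s <= 0:
        raise ValueError("event_period_s must be positive")
    # Nyquist: sampling rate > 2 / period, i.e. interval < period / 2.
    return event_period_s / 2.0


def largest_gap(timestamps_s: Sequence[float]) -> float:
    """Largest gap (seconds) between consecutive time stamps."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two time stamps")
    ordered = sorted(timestamps_s)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def satisfies_nyquist(timestamps_s: Sequence[float], event_period_s: float) -> bool:
    """True if even the worst gap between samples is short enough."""
    return largest_gap(timestamps_s) < max_sampling_interval(event_period_s)


# Hypothetical example: an event repeating every 60 s, sampled irregularly.
stamps = [0.0, 10.0, 22.0, 31.0, 45.0, 58.0]
print(largest_gap(stamps), satisfies_nyquist(stamps, event_period_s=60.0))
```

Checking the largest gap rather than the average interval is a deliberately conservative choice: with irregular time stamps, a single long gap can hide exactly the change the model is supposed to learn.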
In predictive maintenance, unbalanced data is one of the most difficult conditions, because you get very little information about failure events. Most samples, sometimes 99.99%, are OK samples. Another important challenge is choosing the right "observation window". This sliding window should be long enough for the model to pick up the trend, but not so long that it leads to overfitting. In my study, I analyzed highly unbalanced machine telemetry datasets. Proper data cleaning combined with an LSTM improved precision and recall from 80% with the classic models to 98–99%.
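
To make the observation window and the imbalance problem concrete, here is a minimal sketch in plain Python. It is not the code from this repository: `make_windows`, `precision_recall`, the toy readings, and the choice to label each window with the failure flag of its last time step are all assumptions made for illustration. In practice the nested lists would be converted to a 3-D array of shape (windows, time steps, features) before being fed to an LSTM.

```python
from typing import List, Sequence, Tuple


def make_windows(
    readings: Sequence[Sequence[float]],
    failed: Sequence[int],
    window: int,
) -> Tuple[List[List[List[float]]], List[int]]:
    """Cut a telemetry series into overlapping windows of `window` steps.

    `readings[t]` holds the sensor values at time step t and `failed[t]`
    is 1 if the machine failed at step t, else 0. Each window is labeled
    with the flag of its last step (an assumption of this sketch).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(readings) != len(failed):
        raise ValueError("readings and failed must have the same length")
    X, y = [], []
    for end in range(window, len(readings) + 1):
        X.append([list(step) for step in readings[end - window:end]])
        y.append(failed[end - 1])
    return X, y


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the failure class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy series: one sensor, ten time steps, a single failure at the end.
readings = [[20.0 + i] for i in range(10)]
failed = [0] * 9 + [1]
X, y = make_windows(readings, failed, window=4)
print(len(X), y)

# A model that always predicts "OK" is right on 6 of 7 windows,
# yet it finds no failure at all: precision and recall are both 0.
print(precision_recall(y, [0] * len(y)))
```

The last two lines show why precision and recall, rather than accuracy, are the meaningful metrics when almost all samples are OK samples.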
rafalfirlejczyk/Predictive_Maintenance_LSTM