# Predictive Maintenance - Obtaining the Data (1 of 4)

### Environment Setup

In [1]:
import os

### Overview

Businesses that rely on equipment to operate need that equipment to have reliable uptime. This creates a need to predict whether a machine might fail in the near future because one of its components fails.

Predictive maintenance is a proactive maintenance strategy that tries to predict when a piece of equipment might fail, so that maintenance can be performed just before the failure occurs. This can save businesses substantial operational costs while achieving high asset utilization.

This series of notebooks follows the steps for implementing a predictive maintenance model described in the [Predictive Maintenance Modelling Guide](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Implementation-Guide-1) collection in the [Cortana Intelligence Gallery](https://gallery.cortanaintelligence.com/).

This is a Python adaptation of a [notebook](https://gallery.azure.ai/Notebook/Predictive-Maintenance-Modelling-Guide-Python-Notebook-1) originally created by [Mary Wahl](https://gallery.azure.ai/Home/Author?authorId=F617D473ACF16BEAC5242358F5BE2DF6DFCF35983A6B700F9BAE26DE20EB2F08), with notable extensions to fit the newer pandas and NumPy APIs and some additional exploratory data analysis.
 

### About the Dataset
The datasets used here are intended for a predictive maintenance use case in the energy industry. The input data are simulated (and simplified) telemetry and machine log readings from a hydropower turbine.

Common data sources for predictive maintenance problems are:

- Failure history: The failure history of a machine or component within the machine.
- Maintenance history: The repair history of a machine, e.g. error codes, previous maintenance activities or component replacements.
- Machine conditions and usage: The operating conditions of a machine, e.g. data collected from sensors.
- Machine features: The features of a machine, e.g. engine size, make and model, location.
- Operator features: The features of the operator, e.g. gender, past experience.

### Data Sources
The data used in this exercise comes from five different sources:

- real-time telemetry data collected from machines
- error messages
- failure history
- machine information such as type and age
- historical maintenance
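
These five sources correspond to the five CSV files fetched with `wget` in this notebook. Where `wget` is not available, a minimal Python sketch of the same download might look like the following. The URLs and local file names are copied from the `wget` commands in this notebook; `BASE_URL`, `DATASETS`, `build_download_plan` and `download_all` are hypothetical names introduced only for illustration, and the sketch assumes the blob storage URLs are still reachable.

```python
import os
import urllib.request

# Base URL copied from the wget commands in this notebook.
BASE_URL = "https://azuremlsampleexperiments.blob.core.windows.net/datasets/"

# Remote file name -> local file name (hypothetical helper mapping).
DATASETS = {
    "PdM_telemetry.csv": "telemetry.csv",
    "PdM_errors.csv": "errors.csv",
    "PdM_maint.csv": "maint.csv",
    "PdM_failures.csv": "failures.csv",
    "PdM_machines.csv": "machines.csv",
}


def build_download_plan(data_dir="data"):
    """Return a list of (url, local_path) pairs, one per dataset."""
    return [
        (BASE_URL + remote, os.path.join(data_dir, local))
        for remote, local in DATASETS.items()
    ]


def download_all(data_dir="data", overwrite=False):
    """Download every dataset into data_dir, skipping files already present."""
    # Create the target directory if it does not exist yet.
    os.makedirs(data_dir, exist_ok=True)
    for url, path in build_download_plan(data_dir):
        if overwrite or not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
```

Calling `download_all()` would then populate `data/`, skipping files that are already there. `build_download_plan()` can be inspected beforehand without touching the network.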

In [2]:
# Create the target directory first; wget -O does not create it.
!mkdir -p data
!wget https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_telemetry.csv -O data/telemetry.csv
!wget https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_errors.csv -O data/errors.csv
!wget https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_maint.csv -O data/maint.csv
!wget https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_failures.csv -O data/failures.csv
!wget https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_machines.csv -O data/machines.csv

--2019-05-04 07:23:06--  https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_telemetry.csv
Resolving azuremlsampleexperiments.blob.core.windows.net (azuremlsampleexperiments.blob.core.windows.net)... 13.65.107.32
Connecting to azuremlsampleexperiments.blob.core.windows.net (azuremlsampleexperiments.blob.core.windows.net)|13.65.107.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 80142329 (76M) [application/octet-stream]
Saving to: ‘data/telemetry.csv’


2019-05-04 07:25:39 (514 KB/s) - ‘data/telemetry.csv’ saved [80142329/80142329]

--2019-05-04 07:25:39--  https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_errors.csv
Resolving azuremlsampleexperiments.blob.core.windows.net (azuremlsampleexperiments.blob.core.windows.net)... 13.65.107.32
Connecting to azuremlsampleexperiments.blob.core.windows.net (azuremlsampleexperiments.blob.core.windows.net)|13.65.107.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Le

In [3]:
# Verify the downloaded files
print(os.listdir('data'))

['errors.csv', 'maint.csv', 'telemetry.csv', 'machines.csv', 'failures.csv']
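
Because `os.listdir` returns names in arbitrary order, comparing the directory contents against a set of expected names can be a slightly more robust check than reading the printed list. The sketch below is one way to do that. `EXPECTED_FILES` and `check_downloads` are hypothetical names, and the zero-byte check rests on the assumption that an interrupted download may leave an empty file behind.

```python
import os

# Local file names used by the wget commands in this notebook.
EXPECTED_FILES = frozenset(
    {"telemetry.csv", "errors.csv", "maint.csv", "failures.csv", "machines.csv"}
)


def check_downloads(data_dir="data", expected=EXPECTED_FILES):
    """Return (missing, empty) as two sorted lists of file names.

    missing: expected files that are not in data_dir.
    empty:   expected files that exist but have a size of zero bytes.
    """
    present = set(os.listdir(data_dir))
    missing = sorted(expected - present)
    empty = sorted(
        name
        for name in expected & present
        if os.path.getsize(os.path.join(data_dir, name)) == 0
    )
    return missing, empty
```

For example, `missing, empty = check_downloads('data')` returns two lists, and both should be empty if all five downloads completed.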
