# Task

This data was taken from an actual production line near Detroit, Michigan. The goal is to predict certain properties of the line's output from the various input data. The line is a high-speed, continuous manufacturing process with parallel and series stages.

- Content

The data comes from one production run spanning several hours. Liveline Technologies has a large quantity of this type of data from multiple production lines in various locations.

- Challenge

The data comes from a multi-stage continuous flow manufacturing process. In the first stage, Machines 1, 2, and 3 operate in parallel, and feed their outputs into a step that combines the flows. Output from the combiner is measured in 15 locations surrounding the outer surface of the material exiting the combiner.

Primary Goal: Predict measurements of output from first stage.

- Acknowledgements

The Liveline team would like to thank all the technicians and production personnel who assisted with the runs and data collection.

- Data description

Description of physical setup:		
The data comes from a continuous flow process.		
Sample rate is 1 Hz.		
In the first stage, Machines 1, 2, and 3 operate in parallel, and feed their outputs into a step that combines the flows.		
Output from the combiner is measured in 15 locations. These measurements are the primary measurements to predict.		
Next, the output flows into a second stage, where Machines 4 and 5 process in series.		
Measurements are made again in the same 15 locations. These are the secondary measurements to predict.		
		
Measurements are noisy.		
Each measurement also has a target or Setpoint (setpoints are included in the first row of data).		
The goal is to predict the measurements (or the error versus setpoints) for as many of the 15 measurements as possible.		
Some measurements will be more predictable than others!		
Prediction of measurements after the first stage is the primary interest.		
Prediction of measurements after the second stage is nice to have, but that data is much noisier.		
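
The error versus setpoint mentioned above can be computed by pairing each `.Actual` column with the `.Setpoint` column of the same name. This is a minimal sketch, assuming the column naming used in the CSV file (for example `Stage2.Output.Measurement10.U.Actual` and `Stage2.Output.Measurement10.U.Setpoint`). The helper name `setpoint_errors` and the `.Error` suffix are illustrative choices, not part of the data set.

```python
import pandas as pd


def setpoint_errors(df: pd.DataFrame) -> pd.DataFrame:
    """Return Actual minus Setpoint for every Actual/Setpoint pair in df."""
    errors = {}
    for col in df.columns:
        if col.endswith(".Actual"):
            # Same name with the final token swapped (assumed naming pattern).
            setpoint_col = col[: -len("Actual")] + "Setpoint"
            if setpoint_col in df.columns:
                base = col[: -len(".Actual")]
                errors[base + ".Error"] = df[col] - df[setpoint_col]
    return pd.DataFrame(errors, index=df.index)


# Tiny stand-in frame; in practice df would be the full CSV.
demo = pd.DataFrame({
    "Stage2.Output.Measurement10.U.Actual": [7.92, 8.02],
    "Stage2.Output.Measurement10.U.Setpoint": [7.93, 7.93],
})
errors = setpoint_errors(demo)
print(errors.round(2))
```

Columns without a matching setpoint are skipped, so the same helper could be applied to the whole frame.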
		
Note on variable naming conventions		
~.C.Setpoint		Setpoint for Controlled variable
~.C.Actual		Actual value of Controlled variable
~.U.Actual		Actual value of Uncontrolled variable
Others		Environmental or raw material variables, States / events, etc.
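
As a rough illustration of the naming convention above, a column name can be classified by its suffix. The function name `variable_kind` and the returned labels are made up for this sketch.

```python
def variable_kind(column: str) -> str:
    """Classify a column name using the suffix convention listed above."""
    if column.endswith(".C.Setpoint"):
        return "controlled setpoint"
    if column.endswith(".C.Actual"):
        return "controlled actual"
    if column.endswith(".U.Actual"):
        return "uncontrolled actual"
    # Environmental or raw material variables, states / events, etc.
    return "other"


for name in [
    "Machine1.Zone1Temperature.C.Actual",
    "AmbientConditions.AmbientHumidity.U.Actual",
    "Machine1.RawMaterial.Property1",
]:
    print(name, "->", variable_kind(name))
```

Note that the output setpoint columns in the file end in `.U.Setpoint`, a suffix the list above does not mention, so this sketch files them under "other".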
		
Start col	End col	Description
0	0	Time stamp
1	2	Factory ambient conditions
3	6	First stage, Machine 1, raw material properties (material going in to Machine 1)
7	14	First stage, Machine 1 process variables
15	18	First stage, Machine 2, raw material properties (material going in to Machine 2)
19	26	First stage, Machine 2 process variables
27	30	First stage, Machine 3, raw material properties (material going in to Machine 3)
31	38	First stage, Machine 3 process variables
39	41	Combiner stage process parameters. Here the outputs from Machines 1, 2, and 3 are combined.
42	71	PRIMARY OUTPUT TO CONTROL: Measurements of 15 features (in mm), along with setpoint or target for each
72	78	Second stage, Machine 4 process variables
79	85	Second stage, Machine 5 process variables
86	115	SECONDARY OUTPUT TO CONTROL: Measurements of 15 features (in mm), along with setpoint or target for each
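
The column ranges above can be turned into a lookup for slicing the data by group. This is a sketch, assuming the ranges are zero-based, inclusive, and counted from the time stamp column. The group names, the helper `select_group`, and its `offset` parameter are all hypothetical; `offset` is there because a frame loaded with an extra leading index column (such as `Unnamed: 0`) would have every position shifted by one.

```python
import pandas as pd

# Inclusive (start, end) column ranges copied from the table above.
COLUMN_GROUPS = {
    "time_stamp": (0, 0),
    "ambient": (1, 2),
    "machine1_raw_material": (3, 6),
    "machine1_process": (7, 14),
    "machine2_raw_material": (15, 18),
    "machine2_process": (19, 26),
    "machine3_raw_material": (27, 30),
    "machine3_process": (31, 38),
    "combiner": (39, 41),
    "stage1_output": (42, 71),
    "machine4_process": (72, 78),
    "machine5_process": (79, 85),
    "stage2_output": (86, 115),
}


def select_group(df: pd.DataFrame, group: str, offset: int = 0) -> pd.DataFrame:
    """Return the columns of one group; offset shifts all positions."""
    start, end = COLUMN_GROUPS[group]
    # iloc slices exclude the stop position, hence end + 1.
    return df.iloc[:, start + offset : end + 1 + offset]


# Stand-in frame with 116 numbered columns instead of the real CSV.
demo = pd.DataFrame([list(range(116))], columns=[f"col{i}" for i in range(116)])
stage1 = select_group(demo, "stage1_output")
print(stage1.shape)
```

With the frame loaded in the notebook, the call would presumably be `select_group(df, "stage1_output", offset=1)`, but that offset should be checked against the actual header before relying on it.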
		
		
		
		


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('./data/continuous_factory_process.csv')

In [10]:
df.iloc[1000:1005]

Unnamed: 0,time_stamp,AmbientConditions.AmbientHumidity.U.Actual,AmbientConditions.AmbientTemperature.U.Actual,Machine1.RawMaterial.Property1,Machine1.RawMaterial.Property2,Machine1.RawMaterial.Property3,Machine1.RawMaterial.Property4,Machine1.RawMaterialFeederParameter.U.Actual,Machine1.Zone1Temperature.C.Actual,Machine1.Zone2Temperature.C.Actual,...,Stage2.Output.Measurement10.U.Actual,Stage2.Output.Measurement10.U.Setpoint,Stage2.Output.Measurement11.U.Actual,Stage2.Output.Measurement11.U.Setpoint,Stage2.Output.Measurement12.U.Actual,Stage2.Output.Measurement12.U.Setpoint,Stage2.Output.Measurement13.U.Actual,Stage2.Output.Measurement13.U.Setpoint,Stage2.Output.Measurement14.U.Actual,Stage2.Output.Measurement14.U.Setpoint
1000,2019-03-06 11:09:11,16.73,23.93,11.54,200,963.0,247,1249.33,72.1,71.8,...,7.92,7.93,5.66,5.65,2.06,1.85,3.39,2.89,8.19,11.71
1001,2019-03-06 11:09:12,16.73,23.93,11.54,200,963.0,247,1258.85,72.1,71.7,...,8.02,7.93,5.71,5.65,2.12,1.85,3.42,2.89,8.05,11.71
1002,2019-03-06 11:09:13,16.73,23.93,11.54,200,963.0,247,1273.09,72.1,71.7,...,7.83,7.93,5.82,5.65,2.1,1.85,3.41,2.89,0.0,11.71
1003,2019-03-06 11:09:14,16.73,23.93,11.54,200,963.0,247,1297.11,72.1,71.7,...,7.83,7.93,5.82,5.65,2.1,1.85,3.41,2.89,0.0,11.71
1004,2019-03-06 11:09:15,16.73,23.93,11.54,200,963.0,247,1274.94,72.1,71.7,...,7.88,7.93,5.76,5.65,2.11,1.85,3.4,2.89,10.97,11.71


From the data and the description above, we can see that each row in the table is one sample taken on the line, with one sample taken every second. Each sample includes the properties of the material fed into the machines, the machine sensor values at the moment of data collection, and the measured values of the product. The goal is to predict the measured values from the material and machine sensor values. However, there is a problem here.

It is also one of the most common problems in quality control for manufactured products: establishing the correspondence between influencing factors and quality.

The values collected along the line at the same moment do not all refer to the same piece of product. At any given moment in the production process, a product can be at only one position on the line. In other words, the values recorded at that moment at every other position on the line do not describe that product and, in principle, have nothing to do with its quality. And because the documentation does not give the line speed, manual alignment is not possible.
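
To make the alignment problem concrete: if the transport delay between a machine and the measurement point were known, alignment would amount to shifting that machine's columns forward in time by the delay, so that each row pairs an output measurement with the conditions that piece of material actually saw. The sketch below works under that assumption only. The delay of 2 samples, the column names `machine_temp` and `measurement`, and the helper `align_with_delay` are all hypothetical, since the documentation gives no line speed.

```python
import pandas as pd


def align_with_delay(df: pd.DataFrame, input_cols: list, delay_samples: int) -> pd.DataFrame:
    """Shift input columns forward so row t holds the inputs from t - delay."""
    aligned = df.copy()
    aligned[input_cols] = df[input_cols].shift(delay_samples)
    # The first delay_samples rows have no matching inputs and are dropped.
    return aligned.dropna(subset=input_cols)


demo = pd.DataFrame({
    "machine_temp": [70.0, 71.0, 72.0, 73.0, 74.0],  # input, sampled at 1 Hz
    "measurement": [1.0, 1.1, 1.2, 1.3, 1.4],        # output, same clock
})
aligned = align_with_delay(demo, ["machine_temp"], delay_samples=2)
print(aligned)
```

In a real line each stage would need its own delay, and the delay would change with line speed, which is why the missing speed information matters.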

Because of this lack of alignment, we can only compute overall quality statistics for this data set; there is no way to predict product quality from the production process data.
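
As an example of the kind of overall quality statistic that is still possible, the sketch below summarizes the deviation from setpoint per measurement. It uses the five samples of Stage 2 Measurement10 and Measurement11 shown in the table output earlier; the shortened column names and the choice of statistics are illustrative.

```python
import pandas as pd

# Five consecutive samples copied from the table output shown earlier.
actual = pd.DataFrame({
    "Measurement10": [7.92, 8.02, 7.83, 7.83, 7.88],
    "Measurement11": [5.66, 5.71, 5.82, 5.82, 5.76],
})
setpoint = pd.Series({"Measurement10": 7.93, "Measurement11": 5.65})

# Subtracting a Series from a DataFrame matches the Series index to columns.
error = actual - setpoint

# One row per measurement, one column per statistic.
summary = error.agg(["mean", "std", "min", "max"]).T
print(summary.round(3))
```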