### Task:

**dataset: https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset**

1. Load the data: "ai4i2020.csv"
2. Profile the data.
3. Write your analysis of the data.
4. If there are NaN values, impute them.
5. If the data isn't normally distributed, handle it.
6. Check multicollinearity.
7. Build the model and save it. (Regress against Air temperature)
8. Compute the model accuracy.
9. Run 10 test cases and create a final report.

In [1]:
import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport
import os

In [2]:
pwd

'D:\\iNeuron\\12. Machine Learning'

In [3]:
os.chdir("files used or created in the process")

In [4]:
os.listdir()

['advertising.csv',
 'Advertizing_report.html',
 'ai4i2020.csv',
 'ai4i2020.log',
 'ai4i__test_cases.csv',
 'Boxcoxed_ai4i.csv',
 'myfirstmodel.sav',
 'predictive_maintenance.sav',
 'predictive_maintenance_report.html',
 'standardized_ai4i.csv']

In [5]:
### 1. Loading the data:

df = pd.read_csv("ai4i2020.csv")
df.head()

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,0
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,0
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,0
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,0
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,0


In [6]:
try:
    df.rename(columns = {'UDI': 'UID'}, inplace=True)
    df.set_index('UID', inplace=True)

except Exception as e:
    print(e)

In [7]:
df.head()

UID,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
1,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,0
2,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,0
3,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,0
4,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,0
5,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,0


In [8]:
### 2. Profiling the data:

pf_report = ProfileReport(df)

In [9]:
## saving the generated report locally:

pf_report.to_file("predictive_maintenance_report.html")


In [10]:
pf_report.to_widgets()


In [11]:
### 3. My Analysis:

1. - The dataset has 14 variables (columns); any one of them can serve as the label column, depending on what the client needs.
   - No record contains a NaN value.
   - Of the 14 variables, 8 are categorical and 6 are numerical.
   - With 10,000 records, the dataset is reasonably large.
   - Every record is unique; there are no duplicate rows.
<br><br>
2. - Air temperature and Process temperature roughly follow a normal distribution.
   - The distribution of Rotational speed is right-skewed, i.e. most of its values sit on the left and a long tail stretches to the right.
   - Torque follows a normal distribution.
   - Tool wear doesn't follow a normal distribution.
<br><br>
3. Correlations: (i.e. when one variable increases, the other tends to increase or decrease with it.)<br>
    - Air temp. and Process temp. appear to be positively correlated.
    - Air temp. shows no noticeable correlation with rotational speed or torque.
    - Process temp. is likewise uncorrelated with rotational speed and torque.
    - Rotational speed is negatively correlated with Torque, and vice versa.
    - Machine failure is positively correlated with HDF (heat dissipation failure), PWF (power failure) and OSF (overstrain failure).
    <br>**Note: As the dataset description states, if at least one of the failure modes above (HDF, PWF and OSF) is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method which failure mode caused the process to fail.**
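The headline facts in this analysis (missing values, duplicates, skew, correlations) can also be cross-checked directly in pandas. Below is a minimal sketch; `quick_checks` is a hypothetical helper and `toy` is a small made-up frame standing in for `df`, so the numbers it prints are illustrative only.

```python
import pandas as pd


def quick_checks(frame: pd.DataFrame) -> dict:
    """Collect the basic facts a profiling report surfaces (hypothetical helper)."""
    numeric = frame.select_dtypes(include="number")
    return {
        "n_rows": len(frame),
        "n_missing": int(frame.isna().sum().sum()),   # total NaN cells
        "n_duplicates": int(frame.duplicated().sum()),  # fully duplicated rows
        "skew": numeric.skew().to_dict(),              # > 0 means a longer right tail
        "corr": numeric.corr(),                        # Pearson correlation matrix
    }


# Toy stand-in for df; the values are made up for illustration.
toy = pd.DataFrame({
    "Rotational speed [rpm]": [1400, 1450, 1500, 1550, 2800],
    "Torque [Nm]": [50.0, 46.0, 42.0, 40.0, 12.0],
    "Type": ["L", "L", "M", "M", "H"],
})

checks = quick_checks(toy)
print(checks["n_missing"], checks["n_duplicates"])
print(checks["skew"])
print(checks["corr"])
```

On the real data, `quick_checks(df)` would be the call to try; a positive skew for rotational speed and a negative speed/torque correlation would match the observations listed above.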

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 1 to 10000
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Product ID               10000 non-null  object 
 1   Type                     10000 non-null  object 
 2   Air temperature [K]      10000 non-null  float64
 3   Process temperature [K]  10000 non-null  float64
 4   Rotational speed [rpm]   10000 non-null  int64  
 5   Torque [Nm]              10000 non-null  float64
 6   Tool wear [min]          10000 non-null  int64  
 7   Machine failure          10000 non-null  int64  
 8   TWF                      10000 non-null  int64  
 9   HDF                      10000 non-null  int64  
 10  PWF                      10000 non-null  int64  
 11  OSF                      10000 non-null  int64  
 12  RNF                      10000 non-null  int64  
dtypes: float64(3), int64(8), object(2)
memory usage: 1.1+ MB


In [13]:
### 4. If there are NAN vals, Imputation to be done:

In statistics, imputation is the process of replacing missing data with substituted values. Substituting for a whole data point is known as "unit imputation"; substituting for a component of a data point is known as "item imputation".
<br>**As there isn't a single NaN value in the whole dataset, no imputation is needed.**
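For reference, had the dataset contained gaps, item imputation with the column median could look like the sketch below. The column values are made up for illustration and are not taken from ai4i2020.csv.

```python
import numpy as np
import pandas as pd

# Toy column with two missing entries (made-up values).
toy = pd.DataFrame({"Torque [Nm]": [42.8, np.nan, 49.4, 39.5, np.nan]})

# median() skips NaN by default, so it is computed from the observed values only.
median = toy["Torque [Nm]"].median()

# fillna() returns a new Series with every NaN replaced by the median.
imputed = toy["Torque [Nm]"].fillna(median)

print(median)
print(int(imputed.isna().sum()))
```

The median is used here rather than the mean because it is less sensitive to outliers; that is a design choice, not a requirement.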

In [14]:
### 5. If the data isn't normal, then handle it.

**Refer: https://www.geeksforgeeks.org/box-cox-transformation-using-python/**

As I understand it, the non-normal columns should be transformed towards a normal distribution so that the model fits better, and the **Box-Cox Transform** is a standard way to do that.

**Box-Cox Transformation** comes down to choosing a value of **lambda**, typically searched in the range -5 to 5. The best **lambda** is the one for which the transformed data comes closest to a normal curve.

From the analysis above, the distributions clearly indicate that the following columns do not follow a normal distribution:

- **Rotational speed** 
- **Tool wear**

Let's transform them towards a normal distribution.
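To make the role of lambda concrete, the sketch below writes out the Box-Cox formula by hand, y = (x^lambda - 1) / lambda for lambda != 0 and y = ln(x) for lambda = 0, and compares it with `scipy.stats.boxcox`. The input is synthetic right-skewed data, and `boxcox_manual` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np
from scipy import stats


def boxcox_manual(x, lam):
    """Box-Cox transform written out by hand (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam


# Synthetic, strictly positive, right-skewed sample (not the real dataset).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=500)

# With no lambda given, scipy estimates it by maximum likelihood
# and returns both the transformed data and the fitted lambda.
transformed, lam = stats.boxcox(x)

same = np.allclose(transformed, boxcox_manual(x, lam))
print(lam, same)
print(stats.skew(x), stats.skew(transformed))
```

Comparing the skewness before and after is a quick way to judge whether the transform actually moved the column closer to a symmetric, bell-shaped distribution.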

In [15]:
from scipy import stats

In [16]:
# Creating a deep copy of the original dataset so that the changes we're about to make won't be reflected in the original.

df1 = df.copy(deep=True)
df1.head()

UID,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
1,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,0
2,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,0
3,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,0
4,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,0
5,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,0


In [17]:
# transforming the non-normal distribution Rotational Speed

df1['Rotational speed [rpm]'], lamb1 = stats.boxcox(df['Rotational speed [rpm]'])
print(lamb1)

-3.162410421918763


In [18]:
# transforming the non-normal distribution Tool wear

try:
    df1['Tool wear [min]'], lamb4 = stats.boxcox(df['Tool wear [min]'])
except Exception as e:
    print(e)

Data must be positive.


**Finding:** Box-Cox cannot be applied to **df['Tool wear [min]']** because the column contains non-positive values.<br><br>**As a workaround, I will replace the non-positive values with the median of this very column so that it, too, can be Box-Coxed.**
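Median replacement is the approach taken in this notebook. For comparison only, two common alternatives keep the zeros as real observations: shifting the column by a constant before Box-Cox, or using the Yeo-Johnson transform (`scipy.stats.yeojohnson`), which accepts zero and negative values. The sketch below is not what the cells that follow do, and the `wear` array is synthetic.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a tool-wear column that contains zeros (made-up values).
rng = np.random.default_rng(0)
wear = rng.integers(0, 254, size=1000).astype(float)
wear[:10] = 0.0  # make sure some zeros are present

# Alternative 1: shift by one so every value is strictly positive, then Box-Cox.
shifted, lam_shift = stats.boxcox(wear + 1.0)

# Alternative 2: Yeo-Johnson, which is defined for zero and negative inputs.
yj, lam_yj = stats.yeojohnson(wear)

print(lam_shift, lam_yj)
print(np.isfinite(shifted).all(), np.isfinite(yj).all())
```

For non-negative data, Yeo-Johnson with a given lambda coincides with Box-Cox applied to `x + 1`, so the two alternatives are closely related.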

In [19]:
# Calculating the median of the Tool wear column for imputation:

tool_wear_median = df1['Tool wear [min]'].median()
tool_wear_median

108.0

In [20]:
to_be_imputed = df1[df1['Tool wear [min]'] <= 0].index
to_be_imputed

Int64Index([   1,   79,  163,  251,  333,  419,  504,  594,  675,  763,
            ...
            9259, 9341, 9416, 9498, 9578, 9673, 9760, 9835, 9909, 9990],
           dtype='int64', name='UID', length=120)

In [21]:
df1['Tool wear [min]'] = df1['Tool wear [min]'].mask(df1['Tool wear [min]'] <= 0, tool_wear_median)

In [22]:
df1

# As we can see, all non-positive values are replaced by the median of df['Tool wear [min]'], so we're good to go

UID,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
1,M14860,M,298.1,308.6,0.316214,42.8,108,0,0,0,0,0,0
2,L47181,L,298.2,308.7,0.316214,46.3,3,0,0,0,0,0,0
3,L47182,L,298.1,308.5,0.316214,49.4,5,0,0,0,0,0,0
4,L47183,L,298.2,308.6,0.316214,39.5,7,0,0,0,0,0,0
5,L47184,L,298.2,308.7,0.316214,40.0,9,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9996,M24855,M,298.8,308.4,0.316214,29.5,14,0,0,0,0,0,0
9997,H39410,H,298.9,308.4,0.316214,31.8,17,0,0,0,0,0,0
9998,M24857,M,299.0,308.6,0.316214,33.4,22,0,0,0,0,0,0
9999,H39412,H,299.0,308.7,0.316214,48.5,25,0,0,0,0,0,0


In [23]:
### Finally, transforming the tool wear distribution

df1['Tool wear [min]'], lamb2 = stats.boxcox(df1['Tool wear [min]'])
print(lamb2)


In [24]:
df1

UID,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
1,M14860,M,298.1,308.6,0.316214,42.8,37.800123,0,0,0,0,0,0
2,L47181,L,298.2,308.7,0.316214,46.3,1.664740,0,0,0,0,0,0
3,L47182,L,298.1,308.5,0.316214,49.4,3.009166,0,0,0,0,0,0
4,L47183,L,298.2,308.6,0.316214,39.5,4.202091,0,0,0,0,0,0
5,L47184,L,298.2,308.7,0.316214,40.0,5.298889,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9996,M24855,M,298.8,308.4,0.316214,29.5,7.772507,0,0,0,0,0,0
9997,H39410,H,298.9,308.4,0.316214,31.8,9.130594,0,0,0,0,0,0
9998,M24857,M,299.0,308.6,0.316214,33.4,11.249532,0,0,0,0,0,0
9999,H39412,H,299.0,308.7,0.316214,48.5,12.453138,0,0,0,0,0,0


In [25]:
df1.describe()

Unnamed: 0,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,300.00493,310.00556,0.3162145,39.98691,36.464767,0.0339,0.0046,0.0115,0.0095,0.0098,0.0019
std,2.000259,1.483734,8.262172e-12,9.968934,17.043887,0.180981,0.067671,0.106625,0.097009,0.098514,0.04355
min,295.3,305.7,0.3162145,3.8,0.89571,0.0,0.0,0.0,0.0,0.0,0.0
25%,298.3,308.8,0.3162145,33.2,23.177807,0.0,0.0,0.0,0.0,0.0,0.0
50%,300.1,310.1,0.3162145,40.1,37.800123,0.0,0.0,0.0,0.0,0.0,0.0
75%,301.5,311.1,0.3162145,46.8,50.892932,0.0,0.0,0.0,0.0,0.0,0.0
max,304.5,313.8,0.3162145,76.6,70.386199,1.0,1.0,1.0,1.0,1.0,1.0


In [26]:
df1.to_csv("Boxcoxed_ai4i.csv")

In [27]:
ProfileReport(df1).to_widgets()


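Both non-normal columns have now been transformed. Note, however, that after Box-Cox the 'Rotational speed [rpm]' column is effectively constant: every value displays as 0.316214 and the standard deviation in the summary above is about 8.3e-12, so the column carries almost no usable variation in this form.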

### 6. Check Multicollinearity

**Feature columns:** Process temperature, Rotational speed, Torque, Tool wear, Machine failure, TWF, HDF, PWF, OSF, RNF.

**Label Column:** Air temperature

### Multicollinearity refers to correlation among the feature columns themselves, excluding the column treated as the label.

**So from our analysis of the profiling report, we can conclude that there is multicollinearity between the following feature columns:**

- Rotational speed and Torque **(negative correlation).**
- Machine failure and TWF **(positive correlation)**.
- Machine failure and HDF **(positive correlation)**.
- Machine failure and PWF **(positive correlation)**.
- Machine failure and OSF **(positive correlation)**.
- Machine failure and RNF **(positive correlation)**.
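The pairwise correlations above come from the profiling report. A complementary numeric check is the variance inflation factor (VIF), which for feature j equals 1 / (1 - R_j^2), where R_j^2 is obtained by regressing feature j on the other features. The sketch below uses the fact that the VIFs are the diagonal of the inverse correlation matrix; `vif` is a hypothetical helper and the data is synthetic, built so that speed and torque are negatively correlated.

```python
import numpy as np
import pandas as pd


def vif(frame: pd.DataFrame) -> pd.Series:
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    corr = frame.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns, name="VIF")


# Synthetic features (made-up values): torque depends on speed, wear is independent.
rng = np.random.default_rng(0)
speed = rng.normal(1500, 180, size=2000)
torque = 90 - 0.033 * speed + rng.normal(0, 3, size=2000)
wear = rng.uniform(0, 250, size=2000)

toy = pd.DataFrame({
    "Rotational speed [rpm]": speed,
    "Torque [Nm]": torque,
    "Tool wear [min]": wear,
})

vifs = vif(toy)
print(vifs)
```

A VIF close to 1 indicates little collinearity; rules of thumb often flag values above roughly 5 to 10. Columns that are exact functions of other columns would make the correlation matrix singular, so this check assumes no perfectly redundant features.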

### 7. Build the model and save it. (Regress against the Air temperature)

Since we have to regress Air temperature (our response, or label, column) on several explanatory variables rather than one, we use **Multiple Linear Regression**, which is simply linear regression extended to several explanatory variables.

In [28]:
df.columns

Index(['Product ID', 'Type', 'Air temperature [K]', 'Process temperature [K]',
       'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]',
       'Machine failure', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF'],
      dtype='object')

**Note:** From the analysis above, we know that 'Machine failure' is highly positively correlated with 'TWF', 'HDF', 'PWF', 'OSF' and 'RNF'.<br>
The dataset description itself emphasizes that if any one of the failures TWF, HDF, PWF, OSF and RNF is true, Machine failure is set to 1.<br>
<br>
So the predict method of the class LModel derives Machine failure itself: if at least one of the failures above is true, it sets Machine failure to 1. In other words, the class LModel's **predict_() method** does not accept Machine failure as an input.
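The same rule can be expressed for a whole table at once. The sketch below is a vectorised illustration of the rule on a made-up frame; `FAILURE_MODES` and `toy` are names introduced here, not part of the notebook's class.

```python
import pandas as pd

FAILURE_MODES = ["TWF", "HDF", "PWF", "OSF", "RNF"]

# Three made-up records: no failure, one failure mode, two failure modes.
toy = pd.DataFrame(
    [[0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [1, 0, 0, 1, 0]],
    columns=FAILURE_MODES,
)

# A row counts as a machine failure if any of its failure-mode flags is set.
toy["Machine failure"] = toy[FAILURE_MODES].any(axis=1).astype(int)

print(toy["Machine failure"].tolist())  # → [0, 1, 1]
```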

In [29]:
from sklearn.linear_model import LinearRegression
import pickle
import logging as lg

In [30]:
class LModel:
    """
    This is a class that is specific to build a regression model to regress against the label column 'Air Temperature [K]'
    """
    
    def __init__(self):
        self.lModel = None
    
    def _features(self):
        """
        A protected method that picks the feature columns from the dataset.
        """
        try:
            x = df[['Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]',
                    'Machine failure', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF']]
            return x
        
        except Exception as e:
            lg.error(e)
            
    def _label(self):
        """
        A protected method specific to select the label column that is to be regressed against.
        """
        try:
            y = df[['Air temperature [K]']]
            return y
        
        except Exception as e:
            lg.error(e)
            
    def build(self):
        """
        A method specific to build the desired model.
        """
        try:
            self.lModel = LinearRegression()
            lg.info("readying the model..")
            modelBuilt = self.lModel.fit(self._features(), self._label())
            lg.info("Model executed succesfully!")

            return modelBuilt

        except Exception as e:
            lg.error(e)
            
    def predict_(self, process_t, rot_speed, torque, tool_wear, twf, hdf, pwf, osf, rnf):
        """
        A method specific to yield prediction results.
        
        Note: If even one of the twf, hdf, pwf, osf and rnf failures is true, then Machine failure will be set to 1.
              For further clarification, refer to the dataset description.
        """
        try:
            machine_failure = 0
            failures = [twf, hdf, pwf, osf, rnf]
            for i in failures:
                if i == 1: # i.e if any of these failures become true
                    machine_failure = 1 # set the machine failure = true
                    break
                    
#             print("Our Model predicts the value of Air temperature to be: (in Kelvins)")
            # Note: build() refits the model on every call, so each prediction triggers a fresh fit.
            return float(self.build().predict([[process_t, rot_speed, torque, tool_wear, machine_failure, twf, hdf,
                                                pwf, osf, rnf]]))
        
        except Exception as e:
            lg.error(e)
            
    def accuracy(self):
        """
        This method reports the model's R² score (coefficient of determination) on the training data, as a percentage.
        """
        try:
            accuracy_ = self.lModel.score(self._features(), self._label())* 100
            
            lg.info(f"The model appears to be {accuracy_}% accurate.")
            return f"The model appears to be {round(accuracy_, 3)} % accurate."
            
        except Exception as e:
            lg.error(e)
        
    def save(self):
        """
        The method to save the model locally.
        """
        try:
            # The context manager makes sure the file is closed after writing.
            with open("predictive_maintenance.sav", 'wb') as model_file:
                pickle.dump(self.lModel, model_file)
            lg.info("The model is saved sucessfully!")
            
        except Exception as e:
            lg.error(e)

In [31]:
class Log:
    def __init__(self):
        try:
            self.logFile="ai4i2020.log"
            
            # removing the log file if already exists so as not to congest it.
            if os.path.exists(self.logFile):
                os.remove(self.logFile)
            lg.basicConfig(filename=self.logFile, level=lg.INFO, format="%(asctime)s %(levelname)s %(message)s")
            
            # Adding the StreamHandler to record logs in the console.
            self.console_log = lg.StreamHandler()
            
            # setting level to the console log.
            self.console_log.setLevel(lg.INFO) 
            
            # defining format for the console log.
            self.format = lg.Formatter("%(levelname)s %(asctime)s %(message)s")
            self.console_log.setFormatter(self.format) 
            
            # adding handler to the console log.
            lg.getLogger('').addHandler(self.console_log) 
        
        except Exception as e:
            lg.error(e)
            
        else:
            lg.info("Log Class successfully executed!")

In [32]:
Log()
linear_mod = LModel()

INFO 2022-08-22 19:06:18,579 Log Class successfully executed!


In [33]:
linear_mod.build()

INFO 2022-08-22 19:06:18,592 readying the model..
INFO 2022-08-22 19:06:18,761 Model executed succesfully!


LinearRegression()

In [34]:
### Saving the model.

linear_mod.save()

INFO 2022-08-22 19:06:18,788 The model is saved sucessfully!
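A saved model is only useful if it can be loaded back. The sketch below shows a pickle round trip on a small synthetic model written to a temporary file; the file name `demo_model.sav` and the data are assumptions for illustration, and the same `pickle.load` call should apply to `predictive_maintenance.sav`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Small synthetic regression problem with known coefficients (3, -2) and intercept 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0
model = LinearRegression().fit(X, y)

# Save the fitted model, then load it back from disk.
path = os.path.join(tempfile.gettempdir(), "demo_model.sav")
with open(path, "wb") as fh:
    pickle.dump(model, fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)

# The restored model should give the same predictions as the original.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.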


### 8. Compute the Model Accuracy 

In [35]:
linear_mod.accuracy()

INFO 2022-08-22 19:06:18,826 The model appears to be 77.5666013666106% accurate.


'The model appears to be 77.567 % accurate.'
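The figure above is the R² score computed on the same rows the model was fitted on. A common complement is to hold out part of the data and score the model on rows it has not seen. The sketch below illustrates this on synthetic data; the single feature, the noise level and the 80/20 split are assumptions for illustration, not the notebook's actual setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (made-up): air temperature loosely tracks process temperature.
rng = np.random.default_rng(0)
process_t = rng.normal(310.0, 1.5, size=1000)
air_t = process_t - 10.0 + rng.normal(0.0, 0.7, size=1000)
X = process_t.reshape(-1, 1)

# Keep 20% of the rows aside for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, air_t, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)              # same quantity as model.score(X_test, y_test)
mae = mean_absolute_error(y_test, pred)  # average absolute error, in Kelvin
print(round(r2, 3), round(mae, 3))
```

Reporting an error in the label's own units (here Kelvin) alongside R² usually makes the result easier to interpret than a single percentage.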

### 9. 10 test cases and create a final report.

In [36]:
# dataframe report for 10 test cases:

repo = df.iloc[:,3:][0:0].reset_index(drop=True)
repo

Unnamed: 0,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF


Here I'll generate the 10 test cases automatically using **np.random.randint(df[column].min(), df[column].max()+1)**.<br><br>
In layman's terms, for each column I draw random integer values (using np.random.randint()) from between its minimum and maximum values.

In [37]:
def generateAndMap(column, n=10):
    """
    This function generates random datapoints for a specific column of df by picking random integer values
    (using np.random.randint()) from between its minimum and maximum values,
    and then maps those datapoints to the same column of the test cases report "repo".
    
    By default, 10 datapoints are generated.
    """
    l = []
    for i in range(n):
        x = np.random.randint(df[column].min(), df[column].max()+1)
        l.append(x)
    
    repo[column] = pd.Series(l)
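The function above draws integers for every column independently. A variant worth considering is sketched below: it draws floats for float columns, uses a seeded generator so the cases are reproducible, and derives 'Machine failure' from the failure-mode flags so the two never disagree. `make_test_cases`, `FAILURE_MODES` and the toy frame are hypothetical names and made-up values introduced for this illustration.

```python
import numpy as np
import pandas as pd

FAILURE_MODES = ["TWF", "HDF", "PWF", "OSF", "RNF"]


def make_test_cases(frame: pd.DataFrame, n: int = 10, seed: int = 0) -> pd.DataFrame:
    """Draw n random rows within each column's observed range (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    cases = pd.DataFrame(index=range(n))
    for col in frame.columns:
        lo, hi = frame[col].min(), frame[col].max()
        if col in FAILURE_MODES:
            cases[col] = rng.integers(0, 2, size=n)          # binary flag: 0 or 1
        elif pd.api.types.is_integer_dtype(frame[col]):
            cases[col] = rng.integers(lo, hi + 1, size=n)    # inclusive integer range
        else:
            cases[col] = rng.uniform(lo, hi, size=n).round(1)  # floats, one decimal
    # Keep the overall flag consistent with the individual failure modes.
    cases["Machine failure"] = cases[FAILURE_MODES].any(axis=1).astype(int)
    return cases


# Toy stand-in for the feature columns of df (made-up values).
toy = pd.DataFrame({
    "Process temperature [K]": [305.7, 310.1, 313.8],
    "Rotational speed [rpm]": [1168, 1503, 2886],
    "TWF": [0, 0, 1], "HDF": [0, 1, 0], "PWF": [0, 0, 0],
    "OSF": [0, 0, 1], "RNF": [0, 0, 0],
})

cases = make_test_cases(toy)
print(cases)
```

Sampling each column independently still ignores correlations between features (for example between speed and torque), so such cases exercise the model's interface rather than represent realistic operating points.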

In [38]:
df.columns

Index(['Product ID', 'Type', 'Air temperature [K]', 'Process temperature [K]',
       'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]',
       'Machine failure', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF'],
      dtype='object')

In [39]:
for i in df.columns[3:]: # picking only the desired feature columns from df
    generateAndMap(i)

In [40]:
repo # (not yet containing the predicted values)

Unnamed: 0,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
0,312,2801,58,182,1,1,1,0,1,1
1,305,2708,7,38,1,1,0,1,1,0
2,312,1539,38,91,1,0,0,0,0,1
3,305,2872,11,25,1,1,1,0,1,1
4,306,1955,20,225,0,0,1,1,0,0
5,311,2439,46,197,1,1,1,0,1,0
6,310,2573,10,150,0,1,0,1,1,1
7,307,2729,36,249,1,1,0,1,1,1
8,313,2746,17,78,1,1,0,1,1,1
9,306,2431,6,225,0,0,1,1,0,0


In [41]:
# let's now add the column containing the predicted values of the Air temperature.

In [42]:
predicted_vals = []
for i in range(10):
    predicted_vals.append( linear_mod.predict_(repo['Process temperature [K]'].iloc[i], 
                                        repo['Rotational speed [rpm]'].iloc[i],
                                        repo['Torque [Nm]'].iloc[i],
                                        repo['Tool wear [min]'].iloc[i],
                                        repo['TWF'].iloc[i], repo['HDF'].iloc[i],
                                        repo['PWF'].iloc[i], repo['OSF'].iloc[i],
                                        repo['RNF'].iloc[i]) )

INFO 2022-08-22 19:06:18,988 readying the model..
INFO 2022-08-22 19:06:19,000 Model executed succesfully!
INFO 2022-08-22 19:06:19,007 readying the model..
INFO 2022-08-22 19:06:19,019 Model executed succesfully!
INFO 2022-08-22 19:06:19,020 readying the model..
INFO 2022-08-22 19:06:19,030 Model executed succesfully!
INFO 2022-08-22 19:06:19,032 readying the model..
INFO 2022-08-22 19:06:19,042 Model executed succesfully!
INFO 2022-08-22 19:06:19,044 readying the model..
INFO 2022-08-22 19:06:19,054 Model executed succesfully!
INFO 2022-08-22 19:06:19,056 readying the model..
INFO 2022-08-22 19:06:19,071 Model executed succesfully!
INFO 2022-08-22 19:06:19,073 readying the model..
INFO 2022-08-22 19:06:19,083 Model executed succesfully!
INFO 2022-08-22 19:06:19,084 readying the model..
INFO 2022-08-22 19:06:19,093 Model executed succesfully!
INFO 2022-08-22 19:06:19,094 readying the model..
INFO 2022-08-22 19:06:19,104 Model executed succesfully!
INFO 2022-08-22 19:06:19,106 readying

In [43]:
repo["predicted Air temperature [K]"] = pd.Series(predicted_vals)

In [44]:
repo # final report (10 test cases)

Unnamed: 0,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF,predicted Air temperature [K]
0,312,2801,58,182,1,1,1,0,1,1,304.371178
1,305,2708,7,38,1,1,0,1,1,0,294.531242
2,312,1539,38,91,1,0,0,0,0,1,302.177509
3,305,2872,11,25,1,1,1,0,1,1,296.142473
4,306,1955,20,225,0,0,1,1,0,0,297.19904
5,311,2439,46,197,1,1,1,0,1,0,303.183837
6,310,2573,10,150,0,1,0,1,1,1,300.327418
7,307,2729,36,249,1,1,0,1,1,1,296.851138
8,313,2746,17,78,1,1,0,1,1,1,303.877407
9,306,2431,6,225,0,0,1,1,0,0,297.285841


In [45]:
repo.to_csv("ai4i__test_cases.csv")