
# Gas sensor array temperature modulation Data Set
## Abstract
A chemical detection platform composed of 14 temperature-modulated metal oxide (MOX) gas sensors was exposed for 3 weeks to mixtures of carbon monoxide and humid synthetic air in a gas chamber.
## Data Set Information
A chemical detection platform composed of 14 temperature-modulated metal oxide semiconductor (MOX) gas sensors was exposed to dynamic mixtures of carbon monoxide (CO) and humid synthetic air in a gas chamber. 

The acquired time series of the sensors and the measured values of CO concentration, humidity and temperature inside the gas chamber are provided. 

a) Chemical detection platform: 
The chemical detection platform was composed of 14 MOX gas sensors that generate a time-dependent multivariate response to the different gas stimuli. 
The sensors are commercially available from Figaro Engineering (7 units of TGS 3870-A04) and FIS (7 units of SB-500-12). 
The operating temperature of the sensors was controlled by the built-in heater, whose voltage was modulated in the range 0.2-0.9 V in cycles of 20 and 25 s, following the manufacturer's recommendations (0.9 V for 5 s, 0.2 V for 20 s, 0.9 V for 5 s, 0.2 V for 25 s, ...). 
The sensors were pre-heated for one week before starting the experiments. 
The MOX read-out circuits consisted of voltage dividers with 1 MOhm load resistors, powered at 5 V. 
The output voltage of the sensors was sampled at 3.5 Hz using an Agilent HP34970A/34901A DAQ configured at 15 bits of precision and input impedance greater than 10 GOhm. 
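
The heater modulation pattern and the read-out circuit described above can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, and the divider formula assumes the output voltage is measured across the load resistor, which the description does not state explicitly.

```python
# Heater pattern from the description: 0.9 V for 5 s, 0.2 V for 20 s,
# 0.9 V for 5 s, 0.2 V for 25 s, then repeat (one full period = 55 s).
HEATER_PATTERN = [(0.9, 5.0), (0.2, 20.0), (0.9, 5.0), (0.2, 25.0)]  # (volts, seconds)
PERIOD_S = sum(duration for _, duration in HEATER_PATTERN)


def heater_voltage(t_seconds):
    """Return the nominal heater voltage at time t (hypothetical helper)."""
    t = t_seconds % PERIOD_S
    for volts, duration in HEATER_PATTERN:
        if t < duration:
            return volts
        t -= duration
    return HEATER_PATTERN[-1][0]  # fallback in case of floating-point rounding


def sensor_resistance_mohm(v_out, v_supply=5.0, r_load_mohm=1.0):
    """Sensor resistance (MOhm) recovered from a voltage divider.

    Assumption: v_out is measured across the load resistor, so
    v_out = v_supply * r_load / (r_sensor + r_load).
    """
    return r_load_mohm * (v_supply - v_out) / v_out


print(PERIOD_S)                     # 55.0
print(heater_voltage(2.0))          # 0.9
print(heater_voltage(10.0))         # 0.2
print(sensor_resistance_mohm(2.5))  # 1.0
```

Under that assumption, an output of half the supply voltage corresponds to a sensor resistance equal to the 1 MOhm load.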

b) Generator of dynamic gas mixtures: 
Dynamic mixtures of CO and humid synthetic air were delivered from high purity gases in cylinders to a small-sized polytetrafluoroethylene (PTFE) test chamber (250 cm3 internal volume), by means of a piping system and mass flow controllers (MFCs). 
Gas mixing was performed using the MFCs, which controlled three different gas streams (CO, wet air and dry air). These streams were delivered from high-quality pressurized gases in cylinders. 
The selected MFCs (EL-FLOW Select, Bronkhorst) had full scale flow rates of 1000 mln/min for the dry and wet air streams and 3 mln/min for the CO channel. 
The CO bottle contained 1600 ppm of CO diluted in synthetic air with 21 ± 1% O2. 
The relative uncertainty in the generated CO concentration was below 5.5%. 
The wet and dry air streams were both delivered from a synthetic air bottle with 99.995% purity and 21 ± 1% O2. 
Humidification of the wet stream was based on the saturation method using a glass bubbler (Drechsler bottles). 
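
The dilution arithmetic behind the three streams can be sketched as follows. This is a simplified model, not the authors' control software: it assumes ideal mixing, a fully saturated wet stream (100% r.h.), a moisture-free dry stream and CO stream, and the constant total flow of 240 mln/min quoted in the experimental protocol. The function name `mixture_flows` is hypothetical.

```python
CO_BOTTLE_PPM = 1600.0  # CO concentration in the bottle (from the description)
TOTAL_FLOW = 240.0      # mln/min, constant total flow used in the protocol


def mixture_flows(target_co_ppm, target_rh_percent, total_flow=TOTAL_FLOW):
    """Return (co_flow, wet_flow, dry_flow) in mln/min for a target mixture.

    Assumptions: ideal mixing, wet stream at 100% r.h., dry and CO streams
    at 0% r.h. The real set-up may deviate from all three.
    """
    co_flow = target_co_ppm * total_flow / CO_BOTTLE_PPM
    wet_flow = total_flow * target_rh_percent / 100.0
    dry_flow = total_flow - co_flow - wet_flow
    if dry_flow < 0:
        raise ValueError("target cannot be reached at this total flow")
    return co_flow, wet_flow, dry_flow


print(mixture_flows(20.0, 50.0))  # (3.0, 120.0, 117.0)
print(mixture_flows(10.0, 15.0))  # (1.5, 36.0, 202.5)
```

Under these assumptions the 20 ppm maximum needs 3 mln/min from the CO channel, which coincides with the full-scale flow rate quoted for that MFC.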

c) Temperature/humidity values: 
A temperature/humidity sensor (SHT75, from Sensirion) provided reference humidity and temperature values inside the test chamber with tolerance below 1.8% r.h. and 0.5 °C, respectively, every 5 s. 
The temperature variations inside the gas chamber, for each experiment, were below 3 °C. 

d) Experimental protocol: 
Each experiment consisted of 100 measurements: 10 experimental concentrations uniformly distributed in the range 0-20 ppm and 10 replicates per concentration. 
Each replicate had a relative humidity randomly chosen from a uniform distribution between 15% and 75% r.h. 
At the beginning of each experiment, the gas chamber was cleaned for 15 min using a stream of synthetic air at a flow rate of 240 mln/min. 
After that, the gas mixtures were released in random order at a constant flow rate of 240 mln/min for 15 min each. 
A single experiment lasted 25 hours (100 samples x 15 minutes/sample) and was replicated on 13 working days spanning a natural period of 17 days.
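
A randomized schedule of this shape can be generated with the standard library. The sketch below is not the authors' code: `build_schedule` is a hypothetical helper, and it assumes the 10 concentration levels include both endpoints (0 and 20 ppm), which is consistent with the CO quartiles of 4.44, 8.89 and 15.56 ppm in the `df.describe()` output further down.

```python
import random

N_LEVELS = 10       # CO concentrations per experiment
N_REPLICATES = 10   # replicates per concentration
STEP_MINUTES = 15   # exposure time per measurement


def build_schedule(seed=0):
    """Return 100 (CO, humidity) settings in random order."""
    rng = random.Random(seed)
    # Assumption: evenly spaced levels including both 0 and 20 ppm.
    levels = [20.0 * i / (N_LEVELS - 1) for i in range(N_LEVELS)]
    schedule = [
        {"co_ppm": level, "humidity_rh": rng.uniform(15.0, 75.0)}
        for level in levels
        for _ in range(N_REPLICATES)
    ]
    rng.shuffle(schedule)  # mixtures are released in random order
    return schedule


schedule = build_schedule()
total_hours = len(schedule) * STEP_MINUTES / 60
print(len(schedule), total_hours)  # 100 25.0
```

The 25-hour total matches the duration stated above and does not count the initial 15-minute cleaning step.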

## Attribute Information
The dataset is presented in 13 text files, where each file corresponds to a different measurement day. The filenames indicate the timestamp (yyyymmdd_HHMMSS) of the start of the measurements. 
Each file includes the acquired time series, presented in 20 columns: Time (s), CO concentration (ppm), Humidity (%r.h.), Temperature (°C), Flow rate (mL/min), Heater voltage (V), and the resistance of the 14 gas sensors: R1 (MOhm), R2 (MOhm), R3 (MOhm), R4 (MOhm), R5 (MOhm), R6 (MOhm), R7 (MOhm), R8 (MOhm), R9 (MOhm), R10 (MOhm), R11 (MOhm), R12 (MOhm), R13 (MOhm), R14 (MOhm). 
Resistance values R1-R7 correspond to FIGARO TGS 3870 A-04 sensors, whereas R8-R14 correspond to FIS SB-500-12 units. 
The time series are sampled at 3.5 Hz.
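
The column layout and the filename convention can be captured in a few constants and one helper. This is a convenience sketch: the names `META_COLUMNS`, `FIGARO_COLUMNS`, `FIS_COLUMNS` and `parse_start_time` are hypothetical, and the column labels follow the ones shown in the notebook output below (for example `CO (ppm)` and `Temperature (C)`).

```python
from datetime import datetime

META_COLUMNS = [
    "Time (s)", "CO (ppm)", "Humidity (%r.h.)",
    "Temperature (C)", "Flow rate (mL/min)", "Heater voltage (V)",
]
FIGARO_COLUMNS = [f"R{i} (MOhm)" for i in range(1, 8)]   # R1-R7: TGS 3870-A04
FIS_COLUMNS = [f"R{i} (MOhm)" for i in range(8, 15)]     # R8-R14: SB-500-12
ALL_COLUMNS = META_COLUMNS + FIGARO_COLUMNS + FIS_COLUMNS


def parse_start_time(filename):
    """Parse the yyyymmdd_HHMMSS timestamp encoded in a data filename."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    return datetime.strptime(stem, "%Y%m%d_%H%M%S")


print(len(ALL_COLUMNS))                         # 20
print(parse_start_time("20160930_203718.csv"))  # 2016-09-30 20:37:18
```

Keeping the two sensor families in separate lists makes it easy to select one manufacturer's sensors from a DataFrame, for example with `df[FIGARO_COLUMNS]`.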

In [0]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn import utils

In [0]:
df = pd.read_csv('20160930_203718.csv')

In [0]:
df.head()

Unnamed: 0,Time (s),CO (ppm),Humidity (%r.h.),Temperature (C),Flow rate (mL/min),Heater voltage (V),R1 (MOhm),R2 (MOhm),R3 (MOhm),R4 (MOhm),R5 (MOhm),R6 (MOhm),R7 (MOhm),R8 (MOhm),R9 (MOhm),R10 (MOhm),R11 (MOhm),R12 (MOhm),R13 (MOhm),R14 (MOhm)
0,0.0,0.0,49.7534,23.7184,233.2737,0.8993,0.2231,0.6365,1.1493,0.8483,1.2534,1.4449,1.9906,1.3303,1.448,1.9148,3.4651,5.2144,6.5806,8.6385
1,0.309,0.0,55.84,26.62,241.6323,0.2112,2.1314,5.3552,9.7569,6.3188,9.4472,10.5769,13.6317,21.9829,16.1902,24.278,31.1014,34.7193,31.7505,41.9167
2,0.618,0.0,55.84,26.62,241.3888,0.207,10.5318,22.5612,37.2635,17.7848,33.0704,36.316,42.5746,49.7495,31.7533,57.7289,53.6275,56.9212,47.8255,62.9436
3,0.926,0.0,55.84,26.62,241.1461,0.2042,29.5749,49.5111,65.6318,26.1447,58.3847,67.513,68.0064,59.2824,36.7821,66.0832,66.8349,66.9695,50.373,64.8363
4,1.234,0.0,55.84,26.62,240.9121,0.203,49.5111,67.0368,77.8317,27.9625,71.7732,79.9474,79.8631,62.5385,39.6271,68.1441,62.0947,49.4614,52.8453,66.8445


In [0]:
df.describe()

Unnamed: 0,Time (s),CO (ppm),Humidity (%r.h.),Temperature (C),Flow rate (mL/min),Heater voltage (V),R1 (MOhm),R2 (MOhm),R3 (MOhm),R4 (MOhm),R5 (MOhm),R6 (MOhm),R7 (MOhm),R8 (MOhm),R9 (MOhm),R10 (MOhm),R11 (MOhm),R12 (MOhm),R13 (MOhm),R14 (MOhm)
count,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0,295719.0
mean,45457.82366,9.900786,45.966022,26.476381,239.942669,0.355293,14.962249,17.396103,22.23308,18.42967,31.054371,28.773424,31.675358,25.705773,21.29542,25.224047,27.087685,24.833538,21.742677,27.978418
std,26242.177649,6.429229,12.315889,0.211647,1.947498,0.288663,22.368187,26.648398,28.695365,15.210454,26.675759,27.238787,27.594177,18.65203,16.050916,20.22483,20.191156,18.278996,16.783496,21.642733
min,0.0,0.0,17.5,23.7184,0.0,0.199,0.0315,0.056,0.054,0.0402,0.0489,0.0485,0.0534,0.033,0.0292,0.0366,0.031,0.0327,0.033,0.0314
25%,22727.7385,4.44,36.17,26.3,239.8958,0.2,0.4048,0.4841,0.5863,2.0683,1.7852,1.5828,1.8844,11.2915,8.3757,7.4485,10.2553,9.3847,7.5149,9.4998
50%,45460.834,8.89,46.67,26.46,239.9729,0.2,1.6441,1.3561,4.0554,18.5796,32.317,22.9903,31.4054,25.8081,20.6641,23.1211,26.8533,24.903,20.6614,26.2795
75%,68177.907,15.56,55.33,26.62,240.0462,0.207,25.1595,28.8601,45.0994,29.2098,50.562,49.6055,52.4174,39.0413,32.7781,39.3194,41.3344,38.1166,33.3976,43.4362
max,90909.778,20.0,71.96,26.94,275.1803,0.901,113.4868,154.629,182.3433,91.8226,124.1949,138.2019,151.012,102.8265,84.9877,134.999,108.8521,90.2894,75.4135,108.6633


In [0]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 295719 entries, 0 to 295718
Data columns (total 20 columns):
Time (s)              295719 non-null float64
CO (ppm)              295719 non-null float64
Humidity (%r.h.)      295719 non-null float64
Temperature (C)       295719 non-null float64
Flow rate (mL/min)    295719 non-null float64
Heater voltage (V)    295719 non-null float64
R1 (MOhm)             295719 non-null float64
R2 (MOhm)             295719 non-null float64
R3 (MOhm)             295719 non-null float64
R4 (MOhm)             295719 non-null float64
R5 (MOhm)             295719 non-null float64
R6 (MOhm)             295719 non-null float64
R7 (MOhm)             295719 non-null float64
R8 (MOhm)             295719 non-null float64
R9 (MOhm)             295719 non-null float64
R10 (MOhm)            295719 non-null float64
R11 (MOhm)            295719 non-null float64
R12 (MOhm)            295719 non-null float64
R13 (MOhm)            295719 non-null float64
R14 (MOhm)            295719 non-null float64

In [0]:
df.shape

(295719, 20)

In [0]:
len(df.columns)

20

In [0]:
models={
    "logit": LogisticRegression(solver="lbfgs", multi_class="auto")
}

In [0]:
#col = ['Temperature (C)']

In [0]:
#cols = ['Heater voltage (V)']

In [0]:
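print('[INFO] loading data...')
# Features: every column except the last; target: the last column, R14 (MOhm).
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Hold out 25% of the rows for evaluation; the fixed seed makes the split reproducible.
(trainX, testX, trainy, testy) = train_test_split(X, y, random_state=3, test_size=0.25)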

[INFO] loading data...


In [0]:
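# LogisticRegression is a classifier, so fitting it on the continuous
# target y (a resistance in MOhm) raises a ValueError.
clf = LogisticRegression()
clf.fit(X, y)

ValueError: ignored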

In [0]:
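# Inspect how scikit-learn classifies the target type.
# Label-encoding maps every distinct resistance value to its own integer class.
# That passes the classifier's type check, but the classifier then treats the
# values as unordered categories, so it does not make this a classification problem.
lab_enc = preprocessing.LabelEncoder()
training_scores_encoded = lab_enc.fit_transform(y)
print(training_scores_encoded)
print(utils.multiclass.type_of_target(y))
print(utils.multiclass.type_of_target(y.astype('int')))
print(utils.multiclass.type_of_target(training_scores_encoded))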
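
The effect of these three checks is easier to see on a tiny array. The values in `y_demo` below are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

y_demo = np.array([0.5, 1.5, 2.5, 1.5])  # made-up continuous target values

kind_raw = type_of_target(y_demo)               # non-integer floats
kind_int = type_of_target(y_demo.astype(int))   # truncated to integers
encoded = LabelEncoder().fit_transform(y_demo)  # one class per distinct value
kind_encoded = type_of_target(encoded)

print(kind_raw)      # continuous
print(kind_int)      # multiclass
print(encoded)       # [0 1 2 1]
print(kind_encoded)  # multiclass
```

A target reported as `continuous` is what a classifier such as `LogisticRegression` rejects. Casting or encoding changes the reported type, but on real sensor data it would produce one class per distinct reading, so a regression model is usually the more natural choice.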

In [0]:
#clf = LogisticRegression()
#clf.fit(X, training_scores_encoded)
#print("LogisticRegression")
#print(clf.predict(prediction_data_test))

In [0]:
#model = LinearRegression()

In [0]:
#model.fit(x, y)

In [0]:
#model = LinearRegression().fit(x, y)

In [0]:
#r_sq = model.score(x, y)

In [0]:
#print('coefficient of determination:', r_sq)

In [0]:
#print('intercept:', model.intercept_)

In [0]:
#print('slope:', model.coef_)

In [0]:
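from sklearn.metrics import mean_absolute_error

# The target is continuous, so fit a regressor; LogisticRegression
# (models["logit"]) would raise the same ValueError as above.
model = LinearRegression()
model.fit(trainX, trainy)

print("[INFO] evaluation...")
predictions = model.predict(testX)
# Mean absolute error on the held-out rows, in the target's units (MOhm).
print(mean_absolute_error(testy, predictions))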
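
A mean absolute error is easier to interpret next to a trivial baseline, such as always predicting the mean of the true values. The sketch below uses made-up numbers and a hand-written helper (`mean_absolute_error_py`, a hypothetical name) so that it runs without the dataset.

```python
from statistics import mean


def mean_absolute_error_py(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


y_true = [2.0, 4.0, 6.0, 8.0]   # made-up target values
y_model = [2.5, 3.5, 6.5, 7.0]  # made-up model predictions
y_baseline = [mean(y_true)] * len(y_true)  # always predict the mean

mae_model = mean_absolute_error_py(y_true, y_model)
mae_baseline = mean_absolute_error_py(y_true, y_baseline)
print(mae_model)     # 0.625
print(mae_baseline)  # 2.0
```

A model is only informative if its error is clearly below the baseline's. The same comparison can be made in the notebook by scoring a constant prediction of `trainy.mean()` against `testy`.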