# Exploratory Data Analysis
Here, I want to take some time to explore the smoke detection dataset.  


## 1. Information provided by publisher
First, let's take a look at the information the publisher gave on this dataset.


### 1.1. Context
A smoke detector is a device that senses smoke, typically as an indicator of fire. Smoke detectors are usually housed in plastic enclosures, typically shaped like a disk about 150 millimetres (6 in) in diameter and 25 millimetres (1 in) thick, but shape and size vary.

#### Types of Smoke Detectors
1. Photoelectric Smoke Detector:  
A photoelectric smoke detector contains a source of infrared, visible, or ultraviolet light, a lens, and a photoelectric receiver. In some types, the light emitted by the light source passes through the air being tested and reaches the photosensor. The received light intensity will be reduced due to scattering from particles of smoke, air-borne dust, or other substances; the circuitry detects the light intensity and generates an alarm if it is below a specified threshold, potentially due to smoke. Such detectors are also known as optical detectors.  

2. Ionization Smoke Detector:  
An ionization smoke detector uses a radioisotope to ionize air. If any smoke particles enter the open chamber, some of the ions will attach to the particles and not be available to carry the current in that chamber. An electronic circuit detects that a current difference has developed between the open and sealed chambers, and sounds the alarm.  
  
The author of this dataset has successfully created a smoke detection device with the help of IoT devices and an AI model.


### 1.2. About the dataset
The training data was collected with IoT devices, since the goal is to develop an AI-based smoke detector.
Many different environments and fire sources have to be sampled to ensure a good dataset for training. A short list of different scenarios which are captured:

- Normal indoor
- Normal outdoor
- Indoor wood fire, firefighter training area
- Indoor gas fire, firefighter training area
- Outdoor wood, coal, and gas grill
- Outdoor high humidity
- etc.

The dataset contains nearly 60,000 readings. The sample rate is 1 Hz for all sensors. To keep track of the data, a UTC timestamp is added to every sensor reading.

## 2. Own analysis
Next, we want to get some more insight into the data ourselves. Let's take a look.

### 2.1. Load the data
Let's load the data and take a first glance at it.

In [2]:
# all import statements
import pandas as pd

In [5]:
df = pd.read_csv("../../data/raw/smoke_detection_iot.csv", index_col=0)
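
The `index_col=0` argument tells pandas to use the first CSV column as the row index instead of loading it as a regular column. The sketch below shows the effect on a tiny in-memory CSV. The two rows and the empty leading header are assumptions about how the raw file is laid out, not a copy of it.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt; the leading column has no header name (assumption)
raw = io.StringIO(
    ",UTC,Temperature[C],Fire Alarm\n"
    "0,1654733331,20.0,0\n"
    "1,1654733332,20.015,0\n"
)

# With index_col=0 the unnamed first column becomes the row index
with_index = pd.read_csv(raw, index_col=0)

# Without it, pandas keeps the column and names it "Unnamed: 0"
raw.seek(0)
without_index = pd.read_csv(raw)

print(list(with_index.columns))
print(list(without_index.columns))
```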

In [7]:
df.columns

Index(['UTC', 'Temperature[C]', 'Humidity[%]', 'TVOC[ppb]', 'eCO2[ppm]',
       'Raw H2', 'Raw Ethanol', 'Pressure[hPa]', 'PM1.0', 'PM2.5', 'NC0.5',
       'NC1.0', 'NC2.5', 'CNT', 'Fire Alarm'],
      dtype='object')

Okay, so let's find an explanation for each of the columns.  

- `UTC`: Timestamp
- `Temperature[C]`: Air Temperature; measured in Celsius
- `Humidity[%]`: Air Humidity; measured in percent
- `TVOC[ppb]`: Total Volatile Organic Compounds; measured in parts per billion
- `eCO2[ppm]`: CO2 equivalent concentration; measured in parts per million
- `Raw H2`: raw molecular hydrogen; measured in ???
- `Raw Ethanol`: raw ethanol gas; measured in ???
- `Pressure[hPa]`: Air Pressure; measured in hectopascal
- `PM1.0`: Particulate matter with particle size < 1.0 µm
- `PM2.5`: Particulate matter with particle size between 1.0 µm and 2.5 µm
- `NC0.5`: Number concentration of particulate matter with particle size < 0.5 µm. This differs from PM because NC gives the actual number of particles in the air. Like PM, the raw NC is classified by particle size.
- `NC1.0`: Number concentration of particulate matter with particle size between 0.5 µm and 1.0 µm
- `NC2.5`: Number concentration of particulate matter with particle size between 1.0 µm and 2.5 µm
- `CNT`: Sample counter
- `Fire Alarm`: binary label indicating whether there is a fire (1) or not (0)
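
Several column names carry their unit in square brackets. As a small sketch, the names from `df.columns` can be split into a name and a unit, which makes it easy to see which columns come without a documented unit. The helper `split_unit` is a hypothetical name introduced only for this illustration.

```python
import re

# Column names exactly as printed by df.columns
columns = ['UTC', 'Temperature[C]', 'Humidity[%]', 'TVOC[ppb]', 'eCO2[ppm]',
           'Raw H2', 'Raw Ethanol', 'Pressure[hPa]', 'PM1.0', 'PM2.5', 'NC0.5',
           'NC1.0', 'NC2.5', 'CNT', 'Fire Alarm']

def split_unit(column):
    """Split 'Name[unit]' into ('Name', 'unit'); return (column, None) if no unit is given."""
    match = re.fullmatch(r"(.+?)\[(.+)\]", column)
    if match:
        return match.group(1), match.group(2)
    return column, None

# Map each bare column name to its unit (None where the name carries no unit)
units = dict(split_unit(column) for column in columns)

for name, unit in units.items():
    print(f"{name}: {unit}")
```

Columns such as `Raw H2` and `Raw Ethanol` map to `None` here, which matches the open question marks in the list above.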

Now let's look at the top and bottom of the dataset, just to get a feel for the data.

In [8]:
df.head()

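| Unnamed: 0 | UTC | Temperature[C] | Humidity[%] | TVOC[ppb] | eCO2[ppm] | Raw H2 | Raw Ethanol | Pressure[hPa] | PM1.0 | PM2.5 | NC0.5 | NC1.0 | NC2.5 | CNT | Fire Alarm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1654733331 | 20.0 | 57.36 | 0 | 400 | 12306 | 18520 | 939.735 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
| 1 | 1654733332 | 20.015 | 56.67 | 0 | 400 | 12345 | 18651 | 939.744 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 0 |
| 2 | 1654733333 | 20.029 | 55.96 | 0 | 400 | 12374 | 18764 | 939.738 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2 | 0 |
| 3 | 1654733334 | 20.044 | 55.28 | 0 | 400 | 12390 | 18849 | 939.736 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 0 |
| 4 | 1654733335 | 20.059 | 54.69 | 0 | 400 | 12403 | 18921 | 939.744 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4 | 0 |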
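
The `UTC` values look like Unix epoch seconds. As a minimal sketch, assuming that interpretation is correct, they can be converted to timezone-aware timestamps, and the gap between consecutive readings can be compared with the 1 Hz sample rate stated by the publisher. The frame `sample` only holds the five `UTC` values shown in the first rows; it is a stand-in for the full `df`.

```python
import pandas as pd

# The five UTC values from the first rows of the dataset
sample = pd.DataFrame(
    {"UTC": [1654733331, 1654733332, 1654733333, 1654733334, 1654733335]}
)

# Interpret the integers as seconds since the Unix epoch, in UTC (assumption)
sample["timestamp"] = pd.to_datetime(sample["UTC"], unit="s", utc=True)

# Gap between consecutive readings in seconds; a gap of 1.0 corresponds to 1 Hz
gaps = sample["timestamp"].diff().dt.total_seconds().dropna()

print(sample["timestamp"].iloc[0])
print(gaps.tolist())
```

Running the same check on the full dataset would show whether the 1 Hz rate holds everywhere or only within individual recordings.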


In [9]:
df.tail()

| Unnamed: 0 | UTC | Temperature[C] | Humidity[%] | TVOC[ppb] | eCO2[ppm] | Raw H2 | Raw Ethanol | Pressure[hPa] | PM1.0 | PM2.5 | NC0.5 | NC1.0 | NC2.5 | CNT | Fire Alarm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 62625 | 1655130047 | 18.438 | 15.79 | 625 | 400 | 13723 | 20569 | 936.67 | 0.63 | 0.65 | 4.32 | 0.673 | 0.015 | 5739 | 0 |
| 62626 | 1655130048 | 18.653 | 15.87 | 612 | 400 | 13731 | 20588 | 936.678 | 0.61 | 0.63 | 4.18 | 0.652 | 0.015 | 5740 | 0 |
| 62627 | 1655130049 | 18.867 | 15.84 | 627 | 400 | 13725 | 20582 | 936.687 | 0.57 | 0.6 | 3.95 | 0.617 | 0.014 | 5741 | 0 |
| 62628 | 1655130050 | 19.083 | 16.04 | 638 | 400 | 13712 | 20566 | 936.68 | 0.57 | 0.59 | 3.92 | 0.611 | 0.014 | 5742 | 0 |
| 62629 | 1655130051 | 19.299 | 16.52 | 643 | 400 | 13696 | 20543 | 936.676 | 0.57 | 0.59 | 3.9 | 0.607 | 0.014 | 5743 | 0 |
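
In the last rows the index has reached 62629 while `CNT` is only at 5743, so the sample counter does not simply count up across the whole file. One possible explanation, which is an assumption and not something the publisher confirms, is that `CNT` restarts for each recording session. The sketch below shows how such restarts could be detected. It uses a hypothetical toy frame, not the real data.

```python
import pandas as pd

# Hypothetical toy data: two sessions, with CNT restarting at 0 for the second one
toy = pd.DataFrame({"CNT": [0, 1, 2, 3, 0, 1, 2]})

# A restart is any row where CNT does not increase relative to the previous row.
# The first row has no predecessor, so its NaN diff is filled with a positive value.
is_restart = toy["CNT"].diff().fillna(1) <= 0

# The running count of restarts gives each row a session id
toy["session"] = is_restart.cumsum()

# Number of readings per detected session
session_sizes = toy.groupby("session").size()

print(toy["session"].tolist())
print(session_sizes.to_dict())
```

Applied to the full `df`, the same steps would reveal how many separate recordings the file contains, provided the assumption about `CNT` holds.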
