# Intel Berkeley Research Lab Sensor Data

Data collected from 54 sensors installed in the lab between February 28 and April 5, 2004

## About Dataset

This data was collected from 54 sensors deployed in the Intel Berkeley Research Lab between February 28 and April 5, 2004.
Mica2Dot sensors with weatherboards recorded timestamped topology information, along with humidity, temperature, light, and voltage values, once every 31 seconds. The data was gathered with TinyDB, an in-network query processing system built on the TinyOS platform.
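The 31-second sampling interval puts an upper bound on how many readings the deployment could have produced. The sketch below is a rough sanity check. It assumes the window runs from midnight on February 28 to midnight on April 5, 2004 (2004 is a leap year) and ignores lost or missing readings, so the result is a ceiling, not the actual row count of the dataset.

```python
from datetime import datetime

# Assumed collection window: midnight Feb 28 to midnight Apr 5, 2004.
start = datetime(2004, 2, 28)
end = datetime(2004, 4, 5)
SAMPLE_INTERVAL_S = 31  # one reading every 31 seconds, per the dataset description
NUM_MOTES = 54

window_s = (end - start).total_seconds()
# Floor division: only complete 31-second intervals produce a reading.
max_readings_per_mote = int(window_s // SAMPLE_INTERVAL_S)
max_total_readings = max_readings_per_mote * NUM_MOTES

print(f"window: {(end - start).days} days")
print(f"max readings per mote: {max_readings_per_mote}")
print(f"max readings for {NUM_MOTES} motes: {max_total_readings}")
```

Under these assumptions the window is 37 days, giving at most 103,122 readings per mote and 5,568,588 across all 54 motes. Any real count well below this reflects packet loss or motes that stopped reporting.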

The sensors were arranged in the lab according to the diagram shown below:

This data was originally collected by Peter Bodik, Wei Hong, Carlos Guestrin, Sam Madden, Mark Paskin, and Romain Thibaux. Intel Berkeley provided hardware.

The authors grant permission to use or reproduce this data in any format or venue, provided that appropriate acknowledgment of their work is given in any published work.

## 1. Data collection

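```python
input_file = "data.txt"
output_file = "data.csv"

# Convert the space-separated raw file into a comma-separated one.
# str.split() collapses runs of whitespace, so repeated or trailing spaces
# do not turn into empty CSV fields; blank lines are skipped.
with open(input_file, "r") as infile, open(output_file, "w") as outfile:
    for line in infile:
        fields = line.split()
        if fields:
            outfile.write(",".join(fields) + "\n")

print(f"{input_file} was converted to {output_file}")
```

```text
data.txt was converted to data.csv
```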
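The conversion only rewrites separators; it does not check that every row is complete. A minimal sketch for auditing the converted file with the standard `csv` module follows. It assumes a complete record has the eight columns declared in the Spark schema (date, time, epoch, moteid, temperature, humidity, light, voltage). The helper names `count_field_lengths` and `count_incomplete_rows` are hypothetical.

```python
import csv
from collections import Counter

# Assumption: a complete record has the 8 columns declared in the Spark schema
# (date, time, epoch, moteid, temperature, humidity, light, voltage).
EXPECTED_FIELDS = 8


def count_field_lengths(path):
    """Count rows in a CSV file by how many fields each row has."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.reader(f):
            counts[len(row)] += 1
    return counts


def count_incomplete_rows(counts, expected=EXPECTED_FIELDS):
    """Return the number of rows that have fewer fields than expected."""
    return sum(n for width, n in counts.items() if width < expected)
```

In the notebook this would be called as `count_incomplete_rows(count_field_lengths("data.csv"))`. With its default `PERMISSIVE` mode, Spark's CSV reader sets the missing trailing columns of short rows to null instead of rejecting them, so this count indicates how many rows will carry nulls after loading.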


```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType, IntegerType,
                               FloatType, DateType)

# Define the schema according to the dataset description
schema = StructType([
    StructField("date", DateType(), True),
    StructField("time", StringType(), True),  # Time is kept as a string here; convert it to another type if needed.
    StructField("epoch", IntegerType(), True),
    StructField("moteid", IntegerType(), True),
    StructField("temperature", FloatType(), True),
    StructField("humidity", FloatType(), True),
    StructField("light", FloatType(), True),
    StructField("voltage", FloatType(), True)
])

# Create a Spark session
spark = SparkSession.builder \
    .appName("Load TXT to DataFrame") \
    .getOrCreate()

# Name of the converted file
file_name = "data.csv"

# Read the converted file (originally a space-separated .txt) with the explicit schema
df = spark.read.csv(file_name, header=False, schema=schema)

# Show the first records of the DataFrame
df.show()
```

```text
+----------+---------------+-----+------+-----------+--------+-----+-------+
|      date|           time|epoch|moteid|temperature|humidity|light|voltage|
+----------+---------------+-----+------+-----------+--------+-----+-------+
|2004-03-31|03:38:15.757551|    2|     1|    122.153|-3.91901|11.04|2.03397|
|2004-02-28| 00:59:16.02785|    3|     1|    19.9884| 37.0933|45.08|2.69964|
|2004-02-28| 01:03:16.33393|   11|     1|    19.3024| 38.4629|45.08|2.68742|
|2004-02-28|01:06:16.013453|   17|     1|    19.1652| 38.8039|45.08|2.68742|
|2004-02-28|01:06:46.778088|   18|     1|     19.175| 38.8379|45.08|2.69964|
|2004-02-28|01:08:45.992524|   22|     1|    19.1456| 38.9401|45.08|2.68742|
|2004-02-28|01:09:22.323858|   23|     1|    19.1652|  38.872|45.08|2.68742|
|2004-02-28|01:09:46.109598|   24|     1|    19.1652| 38.8039|45.08|2.68742|
|2004-02-28|  01:10:16.6789|   25|     1|    19.1456| 38.8379|45.08|2.69964|
|2004-02-28|01:10:46.250524|   26|     1|    19.1456|  38.872|45.08|2.68742|
```
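The first row of this output (temperature 122.153, humidity -3.91901) shows that the raw readings include physically implausible values. The time column is also kept as a string with varying fractional-second precision. The sketch below handles both in plain Python rather than Spark so it runs without a cluster: it parses date and time into a single `datetime` and flags out-of-range readings. The bounds are hypothetical. Relative humidity is a percentage, so 0 to 100 follows from its definition, but the -10 to 60 °C temperature range is an illustrative assumption for an indoor lab, not something the dataset specifies.

```python
from datetime import datetime

# Hypothetical plausibility bounds, chosen for illustration only.
TEMP_RANGE_C = (-10.0, 60.0)       # assumed indoor temperature range
HUMIDITY_RANGE_PCT = (0.0, 100.0)  # relative humidity is a percentage


def parse_timestamp(date_str, time_str):
    """Combine 'YYYY-MM-DD' and 'HH:MM:SS[.ffffff]' into one datetime.

    %f accepts 1 to 6 fractional digits, so '00:59:16.02785' parses too.
    """
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in time_str else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(f"{date_str} {time_str}", fmt)


def is_plausible(temperature, humidity):
    """True if both readings fall inside the assumed bounds; missing values are not plausible."""
    if temperature is None or humidity is None:
        return False
    t_lo, t_hi = TEMP_RANGE_C
    h_lo, h_hi = HUMIDITY_RANGE_PCT
    return t_lo <= temperature <= t_hi and h_lo <= humidity <= h_hi


# Two rows copied from the df.show() output above.
rows = [
    ("2004-03-31", "03:38:15.757551", 122.153, -3.91901),
    ("2004-02-28", "00:59:16.02785", 19.9884, 37.0933),
]
for date_str, time_str, temp, hum in rows:
    ts = parse_timestamp(date_str, time_str)
    print(ts.isoformat(), "ok" if is_plausible(temp, hum) else "suspect")
```

In Spark itself, the same idea maps onto `pyspark.sql.functions.concat_ws` and `to_timestamp` for the timestamp, and `DataFrame.filter` for the range check.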

```python
# Stop the Spark session
spark.stop()
```