# Working with Pandas example

> This guide assumes that you have set up TimescaleDB with LTSS to export Home Assistant data. For more info check [here](https://nghome.dev/docs/implementation/data_collection/timescale).

This notebook shows how you can work with Pandas to manipulate and process data coming from Home Assistant.

In a real home automation setup you would retrieve the data you need directly from your TimescaleDB database. That is not possible from a Google Colab notebook, so I extracted some data from my database and made it available as a CSV file that we can load instead.
If you are interested in how to retrieve the data from the database, check the full example at the bottom of this notebook.

Note that this notebook is meant as a hands-on showcase. I will use a lot of pandas functions that can be a bit confusing; please refer to the official [pandas docs](https://pandas.pydata.org/docs/) for more info.

## Prerequisites

Before we can start using pandas we need to install the package. Run the code block below to install pandas through pip.

In [None]:
!pip install pandas

After that we import the `pandas` package by running the code block below. 

In [2]:
import pandas as pd

## Retrieving data

Now that we have all prerequisites set up we can retrieve the prepared data from the CSV file.

The data in this file has 3 columns:

* `time` - The time at which the state change was recorded.
* `entity_id` - The entity id of the specific sensor.
* `state` - The recorded state of the sensor.

We retrieve this CSV file by using the `read_csv()` function provided by pandas. This function reads the file and parses it into a `DataFrame` object.

We then call the `head()` function on this DataFrame to show its first 5 entries.

In [6]:
# Retrieve the data from CSV
df = pd.read_csv('https://raw.githubusercontent.com/moonen-home-automation/colab-notebooks/main/pandas-example/data/sensor_values.csv')
# Show the first 5 items
df.head()

Unnamed: 0,time,entity_id,state
0,2024-07-12 19:36:04.882570,sensor.indoor_lux,386.0
1,2024-07-12 19:36:28.115302,sensor.sun_azimuth,101.1
2,2024-07-12 19:36:40.420371,sensor.sun_azimuth,227.1
3,2024-07-12 19:36:49.616662,sensor.indoor_lux,515.0
4,2024-07-12 19:36:31.442474,sensor.outdoor_temp,19.6


You can now see that the data has been retrieved successfully and the first 5 entries are presented to you.
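
If you want to try `read_csv()` without a network connection, the same kind of data can be loaded from an in-memory string. The block below is a minimal sketch, not the notebook's actual data source: the three sample rows are copied from the output above, and the column layout (exactly `time`, `entity_id`, `state`) is an assumption about the file. The names `csv_text` and `sample_df` are made up for this illustration.

```python
import io

import pandas as pd

# Sample rows copied from the output shown above; the real file holds many more.
# The exact column layout of the real file is assumed here.
csv_text = """time,entity_id,state
2024-07-12 19:36:04.882570,sensor.indoor_lux,386.0
2024-07-12 19:36:28.115302,sensor.sun_azimuth,101.1
2024-07-12 19:36:40.420371,sensor.sun_azimuth,227.1
"""

# read_csv accepts a file-like object as well as a path or URL
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # column names taken from the header row
```

Working from a small inline sample like this can be handy for testing a transformation before running it on the full dataset.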

We can also retrieve a specific column of the DataFrame as follows:

In [None]:
# Only retrieve the `state` column
df["state"].head()
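
Building on single-column selection, the sketch below shows two closely related operations: selecting several columns at once and keeping only the rows of one sensor. It uses a small hand-written `sample_df` (a hypothetical stand-in, with values borrowed from the output above) so that it runs on its own.

```python
import pandas as pd

# Small hand-written sample in the same shape as the CSV data (illustrative only)
sample_df = pd.DataFrame(
    {
        "time": ["2024-07-12 19:36:04", "2024-07-12 19:36:28", "2024-07-12 19:36:49"],
        "entity_id": ["sensor.indoor_lux", "sensor.sun_azimuth", "sensor.indoor_lux"],
        "state": [386.0, 101.1, 515.0],
    }
)

# Passing a list of column names returns a DataFrame with just those columns
subset = sample_df[["entity_id", "state"]]

# A boolean mask keeps only the rows that belong to one sensor
lux = sample_df[sample_df["entity_id"] == "sensor.indoor_lux"]

print(subset)
print(lux["state"].mean())  # average of the two indoor lux readings
```

Note the double brackets in `sample_df[["entity_id", "state"]]`: a single string returns a `Series`, while a list of names returns a `DataFrame`.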

## Parsing the `time` column

It may not be noticeable at first glance, but the values in the `time` column are strings (`str`), not `datetime` objects. We need to convert them before we can use the `time` data effectively later on.

Luckily pandas has a function called `to_datetime()` that takes time-formatted strings and tries to convert them to `datetime` objects.

Once that is done we can use the `dt.round()` function to round the time values to the nearest minute. This makes the data easier to handle further on.
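
To make the "tries to convert" part concrete, here is a small sketch of what happens with a value that cannot be parsed. The second entry of `times` is a deliberately invalid, made-up value; `errors="coerce"` is an optional `to_datetime()` parameter that is not used elsewhere in this notebook.

```python
import pandas as pd

# One valid timestamp string and one deliberately invalid value
times = pd.Series(["2024-07-12 19:36:04.882570", "not a timestamp"])

# Before conversion the values are plain strings, not datetimes
print(pd.api.types.is_datetime64_any_dtype(times))

# errors="coerce" turns values that cannot be parsed into NaT instead of raising
parsed = pd.to_datetime(times, errors="coerce")

print(pd.api.types.is_datetime64_any_dtype(parsed))
print(parsed.isna().tolist())  # marks which entries could not be parsed
```

Without `errors="coerce"`, `to_datetime()` raises an error on the invalid value, which is often the safer default when you expect clean data.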

In [7]:
# Parse the string values of the time column into `datetime` objects
df["time"] = pd.to_datetime(df["time"])
# Round each timestamp to the nearest minute
df["time"] = df["time"].dt.round("min")
df.head()

Unnamed: 0,time,entity_id,state
0,2024-07-12 19:36:00,sensor.indoor_lux,386.0
1,2024-07-12 19:36:00,sensor.sun_azimuth,101.1
2,2024-07-12 19:37:00,sensor.sun_azimuth,227.1
3,2024-07-12 19:37:00,sensor.indoor_lux,515.0
4,2024-07-12 19:37:00,sensor.outdoor_temp,19.6


As you can see in the output, the time values are now formatted differently and are rounded to the nearest minute.
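
Rounding to the nearest minute means a timestamp can move either backward or forward. The sketch below compares `dt.round()` with its siblings `dt.floor()` and `dt.ceil()` on two hand-picked timestamps (illustrative values, not read from the CSV), in case you would rather always round down or always round up.

```python
import pandas as pd

# Two illustrative timestamps: one early and one late within the same minute
times = pd.Series(pd.to_datetime(["2024-07-12 19:36:04", "2024-07-12 19:36:40"]))

rounded = times.dt.round("min")  # nearest minute
floored = times.dt.floor("min")  # always down to the start of the minute
ceiled = times.dt.ceil("min")    # always up to the next minute

print(pd.DataFrame({"original": times, "round": rounded, "floor": floored, "ceil": ceiled}))
```

Which variant fits best depends on how you plan to align sensors later: `dt.floor()` keeps every reading in the minute it was actually recorded in, while `dt.round()` minimizes the shift applied to each timestamp.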