# Examples and Tasks with Pandas

Pandas is another Python library that provides convenient functions for working with data. Although it is very handy, it is not especially fast, so expect slow processing when working with large datasets. For data exploration and for testing functions, however, Pandas is very helpful.

## Loading data from files

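The following three examples show how to load response time data from three different file types: CSV, Excel (`.xlsx`), and OpenDocument (`.ods`). Reading `.xlsx` files requires the `openpyxl` package, and reading `.ods` files requires `odfpy`.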

In [None]:
import pandas as pd

data = pd.read_csv("data-samples/responsetimes.csv")
data

In [None]:
import pandas as pd

data = pd.read_excel("data-samples/responsetimes.xlsx")
data

In [None]:
import pandas as pd

data = pd.read_excel("data-samples/responsetimes.ods", engine="odf")
data
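To try the loading step without the sample files, here is a minimal sketch that reads CSV text from memory. The column names `measuretime`, `responsetime`, and `url` and all values are assumptions for illustration; the real `responsetimes.csv` may look different.

```python
import io

import pandas as pd

# Hypothetical sample data; the real responsetimes.csv may use other columns.
csv_text = """measuretime,responsetime,url
2024-01-01 12:00:00,120000,http://heise.de
2024-01-01 12:00:05,95000,http://heise.de
2024-01-01 12:00:10,143000,http://heise.de
"""

# parse_dates converts the timestamp column to datetime64 while loading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["measuretime"])

print(sample.dtypes)
print(sample["responsetime"].describe())
```

Parsing the timestamps at load time makes later steps such as sorting or resampling by time possible without an extra conversion.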

## Connecting to databases

Since we already stored data in Timescale, let's connect to it and see whether Pandas can work with query results too.

In [None]:
import psycopg2
import logging
from psycopg2.extras import LoggingConnection

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

db_settings = {
    "user": "postgres",
    "password": "password",
    "host": "localhost",
    "database": "sampledata",
}

conn = psycopg2.connect(connection_factory=LoggingConnection, **db_settings)
conn.initialize(logger)
cursor = conn.cursor()
# use the cursor to interact with your database
cursor.execute("SELECT * FROM public.responsetimes")
print(cursor.fetchone())

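# pandas was imported as pd in the cells above; it reads the query result into a DataFrame.
# Note: recent pandas versions warn about raw psycopg2 connections and recommend SQLAlchemy.
df = pd.read_sql_query("SELECT * FROM public.responsetimes", con=conn)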
df
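The same pattern can be tried without a running database server. The following sketch uses an in-memory SQLite database as a stand-in for Timescale; the table layout and the rows are assumptions made up for this example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Timescale database (assumption for this sketch).
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE responsetimes (measuretime TEXT, responsetime INTEGER, url TEXT)"
)
conn_demo.executemany(
    "INSERT INTO responsetimes VALUES (?, ?, ?)",
    [
        ("2024-01-01 12:00:00", 120000, "http://heise.de"),
        ("2024-01-01 12:00:05", 95000, "http://heise.de"),
        ("2024-01-01 12:00:10", 143000, "http://example.org"),
    ],
)
conn_demo.commit()

# Pass values as query parameters instead of formatting them into the SQL string.
slow = pd.read_sql_query(
    "SELECT * FROM responsetimes WHERE responsetime > ? ORDER BY measuretime",
    conn_demo,
    params=(100000,),
    parse_dates=["measuretime"],
)
conn_demo.close()

print(slow)
```

SQLite uses `?` as the parameter placeholder, while psycopg2 expects `%s`, so the query string needs adjusting when moving this to PostgreSQL.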

Now let's add data to the same database. First, let's define our response time function again and use it to generate some data.

In [8]:
from datetime import datetime
import requests
url = "http://heise.de"

def measureResponseTimes(url, attempts):
    timeticks = []
    values = []
    result = {}

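    for i in range(attempts):
        # A GET request is the usual way to fetch a page; a POST may be rejected by the server.
        response = requests.get(url)
        timeticks.append(datetime.now())
        # elapsed covers the time from sending the request until the response headers are parsed
        values.append(response.elapsed.total_seconds())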
    result["timeticks"] = timeticks
    result["values"] = values
    return result

result = measureResponseTimes(url, 10)

Now add the data to the database. TODO: debug the generated SQL script.

In [9]:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:password@localhost:5432/sampledata")

# Convert data to a pandas DataFrame
df = pd.DataFrame(result)

# Add a column for URL (assuming the URL is the same for all records)
df['url'] = url  # Replace with the actual URL or remove if not applicable

# Rename columns to match the table structure
df.rename(columns={'timeticks': 'measuretime', 'values': 'responsetime'}, inplace=True)

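# Convert responsetime from seconds to microseconds for the BIGINT column.
# Round before casting so floating-point noise is not truncated.
df['responsetime'] = (df['responsetime'] * 1000000).round().astype('int64')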

# Insert data into the PostgreSQL table
df.to_sql('responsetimes', engine, if_exists='append', index=False)

10
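To check what `to_sql` actually writes, a small round trip can help. This is a minimal sketch under stated assumptions: the measurements are made up, and an in-memory SQLite database replaces the PostgreSQL/Timescale target.

```python
import sqlite3

import pandas as pd

# Made-up measurements in seconds, like response.elapsed.total_seconds() returns them.
demo = pd.DataFrame(
    {
        "measuretime": ["2024-01-01 12:00:00", "2024-01-01 12:00:05"],
        "responsetime": [0.120, 0.095],
        "url": ["http://heise.de", "http://heise.de"],
    }
)

# Seconds -> microseconds; round before casting so float noise is not truncated.
demo["responsetime"] = (demo["responsetime"] * 1_000_000).round().astype("int64")

# In-memory SQLite stands in for the PostgreSQL/Timescale target (assumption).
conn_demo = sqlite3.connect(":memory:")
demo.to_sql("responsetimes", conn_demo, if_exists="append", index=False)
# Appending a second time adds rows instead of replacing the table.
demo.to_sql("responsetimes", conn_demo, if_exists="append", index=False)

stored = pd.read_sql_query("SELECT * FROM responsetimes", conn_demo)
conn_demo.close()

print(len(stored), stored["responsetime"].tolist())
```

With `if_exists='append'`, rerunning the insert cell duplicates rows, which is worth keeping in mind while experimenting. When working with SQLAlchemy, passing `echo=True` to `create_engine` logs the generated SQL statements, which may help with the debugging TODO above.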