# 🐍 1. In Python We Trust

You've just embraced kdb+ as a data platform and you've no idea about q. PyKX is here to help you!

In [None]:
import pykx as kx

## Traffic

You follow the documentation to load the traffic set using `csv`. However, it returns a rather peculiar PyKX object!

In [None]:
traffic = kx.q.read.csv("data/traffic.csv", "IPSJS", ",", True)
type(traffic)
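
The second argument, `"IPSJS"`, is a schema string in which each letter is a q type code for one column, in file order. As a rough aid for readers new to q, the sketch below decodes such a string. The helper `describe_schema` and the dictionary are hypothetical names introduced here, not part of PyKX, and only a handful of codes are listed.

```python
# A few q type codes as they appear in schema strings (not exhaustive).
Q_TYPE_CODES = {
    "B": "boolean",
    "I": "int",
    "J": "long",
    "F": "float",
    "S": "symbol",
    "P": "timestamp",
    "D": "date",
    "U": "minute",
}


def describe_schema(schema: str) -> list[str]:
    """Return the q type name for each column of a schema string (hypothetical helper)."""
    return [Q_TYPE_CODES[code] for code in schema]


print(describe_schema("IPSJS"))
# ['int', 'timestamp', 'symbol', 'long', 'symbol']
```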

No worries! You've read that calling `pd()` gives you back your beloved pandas DataFrame.

In [None]:
traffic = traffic.pd()
traffic

Let's clean the data and combine all the station information to gain an overall view of the traffic in Madrid.

In [None]:
traffic = traffic[traffic['error'] == 'N']
traffic = traffic.set_index('fecha')
traffic_mad = traffic[['carga']].groupby(['fecha']).mean()
traffic_mad
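
To make the cleaning step concrete, here is a minimal pure-Python sketch of the same idea: drop flagged readings, then average the load per timestamp. The rows are made up for illustration (including the `'S'` flag value), and the sketch deliberately avoids pandas so that each step is visible.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows standing in for the traffic table: (fecha, error, carga).
rows = [
    ("2023-01-01 00:00", "N", 10),
    ("2023-01-01 00:00", "N", 30),
    ("2023-01-01 00:00", "S", 99),  # flagged reading, will be dropped
    ("2023-01-01 00:15", "N", 40),
]

# Step 1: keep only readings whose error flag is 'N'.
valid = [(fecha, carga) for fecha, error, carga in rows if error == "N"]

# Step 2: collect the load values that share a timestamp.
by_fecha = defaultdict(list)
for fecha, carga in valid:
    by_fecha[fecha].append(carga)

# Step 3: average the load across stations for each timestamp.
traffic_mean = {fecha: mean(cargas) for fecha, cargas in by_fecha.items()}
print(traffic_mean)
```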

## Weather

Now, we follow similar steps to load the weather data and retain only the precipitation data.

In [None]:
weather = kx.q.read.csv("data/weather.csv", "DUIFFFFFFFF", ",", True)
weather = weather.pd()
weather['fecha'] = weather['fecha'] + weather['hora']
weather = weather[weather['precipitacion'].notnull()]
weather = weather.set_index(['fecha'])
weather_mad = weather[['precipitacion']].groupby(['fecha']).mean()
weather_mad
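
The addition `weather['fecha'] + weather['hora']` presumably works because, after `pd()`, the date column is a datetime and the minute column is a time offset. The standard-library sketch below shows the same arithmetic on a single made-up value; it illustrates the idea and says nothing about PyKX internals.

```python
from datetime import date, datetime, time, timedelta

day = date(2023, 1, 1)          # stands in for one `fecha` value
offset = timedelta(minutes=90)  # stands in for one `hora` value (01:30)

# Promote the date to midnight of that day, then add the offset.
timestamp = datetime.combine(day, time.min) + offset
print(timestamp)  # 2023-01-01 01:30:00
```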

## All Together

Just one final step remains to merge both tables...

In [None]:
import pandas as pd
traffic_weather = pd.merge_asof(traffic_mad, weather_mad, on='fecha', direction='backward')
traffic_weather
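
With `direction='backward'`, an as-of join pairs each traffic row with the most recent weather row whose timestamp is not later than the traffic timestamp. The pure-Python sketch below illustrates that rule on made-up integer keys; `asof_backward` is a hypothetical helper written for this explanation, not a pandas function.

```python
from bisect import bisect_right


def asof_backward(left_keys, right_keys, right_values):
    """For each left key, return the value of the last right key <= it.

    Both key lists must be sorted in ascending order. If no right key
    qualifies, the result for that left key is None.
    """
    matched = []
    for key in left_keys:
        pos = bisect_right(right_keys, key)  # count of right keys <= key
        matched.append(right_values[pos - 1] if pos > 0 else None)
    return matched


traffic_times = [5, 15, 30, 45]  # made-up traffic timestamps
weather_times = [10, 40]         # made-up weather timestamps
rain = [0.0, 1.2]                # precipitation at each weather timestamp

print(asof_backward(traffic_times, weather_times, rain))
# [None, 0.0, 0.0, 1.2]
```

The first traffic row has no earlier weather reading, so it gets no match; in pandas the corresponding cell would be `NaN`.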

## Model

The Python ecosystem is exceptionally rich, particularly in terms of data-related libraries: sklearn, tensorflow, matplotlib, etc.

In [None]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt

We leverage them to construct a neural network for predicting traffic.

<div class="alert alert-warning">
Using a classic feed-forward neural network to predict traffic is discouraged; an LSTM, as described in the original post, would have been the better choice. However, we chose to keep this snippet as simple as possible.
</div>

In [None]:
to_quarter = lambda x: int((x.hour * 60 + x.minute) / 15)
traffic_weather['hora'] = traffic_weather['fecha'].dt.time.apply(to_quarter)

X = traffic_weather[['hora', 'precipitacion']]
y = traffic_weather['carga']

scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)

model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), verbose=1)
predictions = model.predict(X_test)
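
The two preprocessing steps in the cell above are easy to check by hand. The sketch below reimplements them with the standard library only: the quarter-hour index computed by `to_quarter`, and the default 0-to-1 rescaling that `MinMaxScaler` applies to each column. `min_max_scale` is a hypothetical helper for a single column and assumes the column is not constant.

```python
from datetime import time


def to_quarter(t: time) -> int:
    """Index (0..95) of the 15-minute slot of the day that contains `t`."""
    return (t.hour * 60 + t.minute) // 15


def min_max_scale(values):
    """Rescale one column linearly to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


print(to_quarter(time(8, 47)))   # 35, since 08:47 is 527 minutes after midnight
print(to_quarter(time(23, 59)))  # 95, the last slot of the day

scaled = min_max_scale([0, 24, 95])
print(scaled)
```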

Now, we analyze the predictions to gain insights.

In [None]:
plt.scatter(y_test, predictions, color='blue')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], linestyle='--', color='red', linewidth=2)  # Diagonal line for reference
plt.xlabel('Actual Traffic Load')
plt.ylabel('Predicted Traffic Load')
plt.title('Actual vs. Predicted Traffic Load')
plt.show()
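
A scatter plot gives a visual impression; a single error figure can complement it. As a sketch, the helpers below compute the mean absolute error and the root mean squared error in plain Python. The input values are made up for illustration and are not results from the model above. With real data you would pass `y_test` and the flattened `predictions`.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up values, for illustration only.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

print(f"MAE:  {mae(actual, predicted):.3f}")
print(f"RMSE: {rmse(actual, predicted):.3f}")
```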

## What Next?
* Easy Maintenance for Current Process
* Not Using kdb+ to the Fullest