# Introduction
The sharing economy is becoming ever more present. For instance, many people now prefer to rent a car flexibly whenever they need one instead of owning a car that sits idle most of the time. The same applies to services such as bike and e-scooter sharing.
In this week's tutorial, we will use a [Bikesharing](https://www.kaggle.com/competitions/bike-sharing-demand/) dataset and take a closer look at it by visualizing the data. This will help us gain a better overview of the dataset, identify patterns and trends, and develop a more intuitive understanding of the underlying relationships before moving on to further analysis.

# Data Visualization with PyGWalker

**PyGWalker** is a Python library designed for exploratory **visual data analysis**. It takes a standard Pandas DataFrame and transforms it into an interactive, drag-and-drop style interface — similar to tools like Tableau — allowing analysts to visually explore their data without having to write extensive visualization code.

We’ll use it in this session because it makes it easy to quickly generate visualizations such as charts, histograms, scatter plots, heatmaps, and facet views simply by dragging variables into place. In addition, it provides a built-in data preview and profiling table, which helps us get an immediate sense of the data’s distributions, missing values, and variable types — all in one clear and interactive interface.

# Import necessary Libraries

After installing PyGWalker, we import the libraries we’ll need for our work. **os** helps us interact with the file system (e.g., loading files), **pandas** is used for handling and analyzing data, **numpy** allows us to efficiently work with numerical data and perform mathematical operations, and **pygwalker** provides an easy way to visually explore datasets. We also import **random** to generate random values when needed. Finally, we set a random seed (both for NumPy and for Python’s random module) so that any random processes are reproducible, meaning we’ll get the same results each time we run the code.

In [None]:
!pip install pygwalker -q

In [None]:
# Imports
import os
import pandas as pd
import numpy as np
import pygwalker as pyg
import random

# Set seed for reproducibility
np.random.seed(42)  # Set seed for NumPy
random.seed(42)  # Set seed for random module
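
To see what the seed does, here is a minimal sketch. The helper `draw_samples` is a hypothetical name introduced only for this illustration. It re-seeds both generators and draws a few values, so calling it twice with the same seed should give identical results.

```python
import random
import numpy as np

def draw_samples(seed):
    """Seed both generators, then draw three integers from each (illustrative helper)."""
    random.seed(seed)
    np.random.seed(seed)
    py_values = [random.randint(0, 100) for _ in range(3)]
    np_values = np.random.randint(0, 100, size=3).tolist()
    return py_values, np_values

first = draw_samples(42)
second = draw_samples(42)

# The same seed leads to the same sequence of draws
print(first == second)
```

Without re-seeding, consecutive draws would generally differ, which is why the seed is set once at the top of the notebook.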

# Load Data

Next, we will take a closer look at the BikeSharing dataset. To do this, we first need to load the dataset.

In [None]:
# Loading the data from a csv file
data = pd.read_csv("https://raw.githubusercontent.com/kbrennig/MODS_WS24_25/refs/heads/main/data/BikeSharing.csv")
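
As a side note, `pd.read_csv` can parse date columns while loading via its `parse_dates` parameter. The sketch below uses two made-up rows instead of the real file, so the values are purely illustrative. Only the column names `datetime`, `season` and `count` are taken from the dataset.

```python
import io
import pandas as pd

# Two made-up rows in the layout of the bike sharing data (values are illustrative only)
csv_text = """datetime,season,count
2011-01-01 00:00:00,1,10
2011-01-01 01:00:00,1,20
"""

# parse_dates converts the listed columns to datetime while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

print(sample.dtypes)
print(sample.shape)
```

In this tutorial the conversion is done later with `pd.to_datetime`, which leads to the same column type.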

# Explore Data
First, let’s have a look at the data.

We can use the `head()` function to display the first few rows of our data frame.

*Run the code below.*

In [None]:
data.head()

We can also use the `describe()` function to get basic descriptive statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numeric columns of our data frame.

*Run the code below.*

In [None]:
data.describe()
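
Beyond `head()` and `describe()`, it can help to check column types and missing values directly. The following is a small sketch on a made-up toy frame (the values, including the missing `temp` entry, are invented for illustration). The same calls could be applied to `data`.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; one temperature reading is missing on purpose
toy = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 01:00:00", "2011-01-01 02:00:00"],
    "temp": [9.8, np.nan, 9.0],
    "count": [10, 20, 30],
})

# Data type of every column
print(toy.dtypes)

# Number of missing values per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# include="all" also summarizes non-numeric columns
print(toy.describe(include="all"))
```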

# Exploratory Data Analysis with PyGWalker

Now we initialize PyGWalker on our dataset by calling `pyg.walk(data)`. This opens an interactive visualization interface directly inside the notebook, where we can explore the data without writing additional plotting code.

After executing the code cell below, the PyGWalker UI opens and shows the tabs "Data", "Visualization" and "Chat".

**"Data" tab:** This tab shows the raw data that makes up our dataset. Here we can also change the data type of a column. For example, click on the **blue icon** in front of **datetime** to open the options for that column. Because this column contains dates, we can change its data type to **temporal**.

**"Visualization" tab:** Now we can switch to the visualization tab and start exploring the distribution of the data and the relationship between the features.
Interesting plots might be:
1. A bar chart showing the total bike rentals per season.
2. A bar chart showing the total bike rentals per weather category colored by season.
3. Plots displaying the mean and total count of bike rentals depending on whether it is a holiday or not.
4. Plots displaying the mean and total count of bike rentals depending on whether it is a working day or not.
5. Find some visualizations that show interesting properties of the data on your own.


In [None]:
# Initialize pygwalker on data
walker = pyg.walk(data)
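
The numbers behind the suggested bar charts can also be cross-checked with a few lines of pandas. The sketch below uses a made-up toy frame; the column names `season`, `holiday` and `count` are assumed to match the dataset, and the values are invented.

```python
import pandas as pd

# Toy frame with invented values
toy = pd.DataFrame({
    "season": [1, 1, 2, 2],
    "holiday": [0, 1, 0, 0],
    "count": [10, 20, 30, 40],
})

# Total rentals per season (the numbers behind plot 1)
total_per_season = toy.groupby("season")["count"].sum()

# Mean and total rentals for holidays vs. non-holidays (plot 3)
by_holiday = toy.groupby("holiday")["count"].agg(["mean", "sum"])

print(total_per_season)
print(by_holiday)
```

Comparing such a table with the chart is a quick way to confirm that the aggregation selected in the UI (sum or mean) is the intended one.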

**Visualization of aggregated data to see a different level of detail**

Sometimes you might be interested in relationships that only become visible at a different level of detail. Below we aggregate the data per day and compute both the averages and the sums per day.
In the PyGWalker UI we can then, for example, display the time series of total bike rentals per day.
Feel free to experiment with different visualizations as well.

Further interesting patterns might be observable by aggregating per month or per weekday. For this, we have to write a bit of code ourselves before we can visualize the result in PyGWalker.


The code below takes the **hourly bike rental data** and turns it into a **daily summary** that’s easier to analyze. It first converts the 'datetime' column into a proper datetime type, then resamples the data by day to calculate total and average values. The **totals** capture how many **rides happened each day**, while the **averages** describe the day’s typical **weather** and **hourly rental counts**. Finally, it stores the calendar date in a 'date' column and resets the index so the dataset is neat and ready for use.

In [None]:
# Make sure the datetime column has a proper datetime type
data['datetime'] = pd.to_datetime(data['datetime'])

# Aggregate hourly data to daily totals of the rental counts
# (days without any rows in the data get a total of 0 and NaN averages)
daily_totals = data.resample('D', on='datetime')[["count", "casual", "registered"]].sum().rename(columns={"count": "total_count", "casual": "total_casual", "registered": "total_registered"})

# Aggregate hourly data to daily averages of weather and hourly rental counts
daily_means = data.resample('D', on='datetime')[["temp", "atemp", "humidity", "windspeed", "count", "casual", "registered"]].mean().rename(columns={"temp": "avg_temp", "atemp": "avg_atemp", "humidity": "avg_humidity", "windspeed": "avg_windspeed", "count": "hourly_avg_count", "casual": "hourly_avg_casual", "registered": "hourly_avg_registered"})

# Combine averages and totals (both are indexed by day)
data_day = daily_means.join(daily_totals)

# Get date from index
data_day['date'] = data_day.index.date
# Reset index
data_day = data_day.reset_index(drop=True)
walker = pyg.walk(data_day)
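
For the per-month and per-weekday aggregation mentioned above, one possible approach is to derive the grouping keys from the datetime column and use `groupby`. This is only a sketch on a made-up toy frame with invented values; the helper columns `month` and `weekday` are names chosen for this illustration.

```python
import pandas as pd

# Toy frame with invented values
toy = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-03 08:00", "2011-01-03 17:00",  # a Monday in January
        "2011-01-04 08:00",                      # a Tuesday in January
        "2011-02-07 08:00",                      # a Monday in February
    ]),
    "count": [10, 20, 30, 40],
})

# Derive grouping keys from the datetime column
toy["month"] = toy["datetime"].dt.month
toy["weekday"] = toy["datetime"].dt.day_name()

# Total rentals per month and average hourly rentals per weekday
total_per_month = toy.groupby("month")["count"].sum()
mean_per_weekday = toy.groupby("weekday")["count"].mean()

print(total_per_month)
print(mean_per_weekday)
```

Applied to `data`, the aggregated results could be turned back into regular data frames with `reset_index()` and then passed to `pyg.walk` in the same way as `data_day`.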