

<img src="assets/uk_railways.jpg" alt="UK Railways" width="600" height="200">

# UK Railways Ticket Price Dashboard
DTSA5304

**Stanislav Liashkov**  2024.07.05


## Introduction

This project starts with a basic exploratory data analysis to see what kind of data we have and which dimensions
are potentially the most interesting for *railway management and stakeholders*. The **goal** of this project is to *design and evaluate a dashboard for monitoring the ticket price distribution for different trips*. The **visual task** that the dashboard is optimized for is "*to discover business-valuable insights from data related to ticket price, classes of passengers and payment methods and to be able to quickly get the summary picture for any route*".

To achieve this goal and help users complete the *visual task*, I prepared a few **low-fidelity prototypes** of the dashboard and selected the most suitable one. Once I had decided on a prototype, I designed the dashboard in *more detail* and finally implemented it using **Altair and Streamlit**.



## Data

The dataset comes from **Kaggle** (https://www.kaggle.com/datasets/motsimaslam/national-rail-uk-train-ticket-data) and contains data about **UK Railways train tickets**, such as *ticket price, ticket type, payment method, and passenger class*, **spanning January to April 2024**. It includes detailed information on various aspects of train journeys, providing insights into travel patterns and pricing within this period. The dataset encompasses the following key attributes:

- **Ticket Type**: The category of the ticket, such as single, return, or season.
- **Journey Date & Time**: The date and time for each train journey, including both departure and arrival times.
- **Departure Station**: The station from which the journey originates.
- **Arrival Station**: The station at which the journey concludes.
- **Ticket Price**: The cost of the ticket in GBP.
- **Other Details**: Additional information that might include passenger class, train service provider, booking reference, and any applicable discounts or offers.





```python
import pandas as pd

df = pd.read_csv("railway.csv")
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31653 entries, 0 to 31652
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Transaction ID       31653 non-null  object
 1   Date of Purchase     31653 non-null  object
 2   Time of Purchase     31653 non-null  object
 3   Purchase Type        31653 non-null  object
 4   Payment Method       31653 non-null  object
 5   Railcard             10735 non-null  object
 6   Ticket Class         31653 non-null  object
 7   Ticket Type          31653 non-null  object
 8   Price                31653 non-null  int64 
 9   Departure Station    31653 non-null  object
 10  Arrival Destination  31653 non-null  object
 11  Date of Journey      31653 non-null  object
 12  Departure Time       31653 non-null  object
 13  Arrival Time         31653 non-null  object
 14  Actual Arrival Time  29773 non-null  object
 15  Journey Status       31653 non-null  object
 16  Reas
```

The **key dimensions** for our defined **visual task** are going to be:
- Ticket Price
- Purchase Type
- Ticket Class
- Ticket Type
- Source & Destination
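
As a minimal sketch of how these dimensions could be pulled out for a single route, the snippet below filters a DataFrame by departure and arrival station. The column names come from the `df.info()` output above; the helper `select_route`, the station names `A` and `B`, and the sample rows are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Column names below are taken from the df.info() output above.
KEY_COLUMNS = [
    "Price",
    "Purchase Type",
    "Ticket Class",
    "Ticket Type",
    "Departure Station",
    "Arrival Destination",
]


def select_route(df: pd.DataFrame, source: str, destination: str) -> pd.DataFrame:
    """Return the key columns for tickets travelling from source to destination."""
    mask = (df["Departure Station"] == source) & (df["Arrival Destination"] == destination)
    return df.loc[mask, KEY_COLUMNS].reset_index(drop=True)


# Tiny made-up sample for illustration only (not rows from the real dataset).
sample = pd.DataFrame(
    {
        "Price": [10, 25, 40],
        "Purchase Type": ["Online", "Station", "Online"],
        "Ticket Class": ["Standard", "Standard", "First Class"],
        "Ticket Type": ["Advance", "Off-Peak", "Anytime"],
        "Departure Station": ["A", "A", "B"],
        "Arrival Destination": ["B", "B", "A"],
    }
)

route = select_route(sample, "A", "B")
print(route["Price"].tolist())  # [10, 25]
```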

## Sketching prototypes and Designing dashboard

### Low-fidelity Prototypes
<figure>
  <img src="assets/prototype1.jpeg" alt="Prototype 1" width="500" height="300">
  <figcaption><b>Prototype 1:</b> Blue colors, barcharts </figcaption>
</figure>

<figure>
  <img src="assets/prototype2.jpeg" alt="Prototype 2" width="500" height="300">
  <figcaption><b>Prototype 2:</b> Green colors, smooth histogram + pies</figcaption>
</figure>

<figure>
  <img src="assets/prototype3.jpeg" alt="Prototype 3" width="500" height="300">
  <figcaption><b>Prototype 3:</b> Yellow colors, binned histogram + pies</figcaption>
</figure>



#### Discussion and selection
I've come up with 3 slightly different sketches that could each become a prototype for our dashboard. They all share the same goal: to show stakeholders the **distribution of ticket price along with additional categorical data (Ticket Class and Type, Purchase Type) and an option to select any Source and Destination.** The sketched dashboards differ in coloring, plot positions, and plot types. Let's discuss which prototype should be chosen.

Among all prototypes, I chose **the third (yellow) one** for several reasons:
- A yellow/golden color fits better because it is associated with money or value (in my opinion)
- An average manager will likely find a binned, discrete histogram easier to interpret than a density plot
- Pie charts fit better than bar plots because it is easier to see what fraction each category contributes to the whole

I have to point out that the selected prototype is likely not perfect and has its disadvantages, but for now we consider it a good starting point. One additional improvement might be an indicator that shows the **average ticket price**, because this is likely important information (it can be inferred from the histogram, but that takes some effort).
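
The average-price indicator mentioned above could be computed along these lines. This is only a sketch: the helper name `average_price` and the sample rows are hypothetical, and only the column names are taken from the dataset.

```python
import pandas as pd


def average_price(df: pd.DataFrame, source: str, destination: str):
    """Mean ticket price (GBP) for one route, or None if the route has no tickets."""
    mask = (df["Departure Station"] == source) & (df["Arrival Destination"] == destination)
    prices = df.loc[mask, "Price"]
    return None if prices.empty else float(prices.mean())


# Made-up rows for illustration only.
tickets = pd.DataFrame(
    {
        "Price": [10, 20, 60, 35],
        "Departure Station": ["A", "A", "A", "B"],
        "Arrival Destination": ["B", "B", "B", "A"],
    }
)

print(average_price(tickets, "A", "B"))  # 30.0
```

In a Streamlit app, a value like this could be displayed with `st.metric`, next to the histogram.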

Now, it is time to get to implementation of the dashboard.

### Implementation

The dashboard was implemented using the visualization library **Altair**, the data processing library **pandas**, and **Streamlit** for the UI and
deployment. The app was deployed via **Streamlit Cloud** and can be reached at *this link:* https://visualizationfinalprojectdtsa5304-6t3nftdumxkjcpwwcdt97n.streamlit.app/

The implementation closely follows the chosen prototype except for the location of the selectboxes. It turned out that placing
the selectboxes on top of each other makes them confusing to use when one of the option lists drops down. I also decided
to go with a **dark theme**, which makes the yellow colors more attractive and eye-catching than a light theme does.

Using the dashboard is fairly straightforward: you only need to select the *Source* and *Destination* stations to get a summary
of the tickets for that trip.


<img src="assets/uk_dashboard_screenshot.png" alt="Dashboard screenshot" width="900" height="500">


## Evaluation

#### Journaling
To evaluate my implementation and design, I decided to use a **journaling-style evaluation**. I prepared a **set of questions and simple tasks** for people to complete **using the dashboard**. I interviewed **4 people**, all of whom are expected to have some experience working with data and data visualization.




*Here's the list of questions and tasks that I used for the interview:*

   1. *Is it intuitively clear how to interact with the visualization (for example, how to select a specific route)?*

   2. *Try to answer the following question using the visualization and briefly describe your steps: On the Manchester - Liverpool route, by what method were most tickets purchased - at the station or online?*

   3. *Try to answer the following question using the visualization and briefly describe your steps: What is the approximate minimum ticket price for the Birmingham - London Paddington route?*

   4. *Try to answer the following question using the visualization and briefly describe your steps: What percentage of people chose "First Class" tickets on the Manchester - Liverpool route?*

   5. *Share what new knowledge/insights you were able to gain from this visualization?*

   6. *The color scheme of the visualization hinders/helps/does not affect me.*

   7. *Rate each aspect from 0 to 10: convenience, informativeness, visual appeal.*

#### People involved

The people interviewed are briefly described below:


* **Mikhail L.** - an experienced Data Scientist (6+ years of experience) with deep knowledge of ML, statistics, and computer science who is currently launching an AI startup. He has enough expertise to be able to evaluate a dashboard properly.

* **Anna B.** - an active CPO in one of the largest banks in Russia. She has seen plenty of presentations and data visualizations during her work experience.

* **Alina A.** - a graphic designer with experience in the GameDev industry. She's able to judge the visual aspect of the dashboard.

* **Yuri O.** - an active Computer Science student with experience in System Administration.

#### Interview Summary


**1. Dashboard Interaction**

As for the first question in our list, all interviewees answered that the interaction with the dashboard is quite intuitive and easy to understand. "You simply need to click two buttons and select Source and Destination stations," as one interviewee said.

**2. Task Performance**

- **First Task (Ticket Purchase Method)**: All interviewees were able to complete the first task successfully. They noted a few things:
  - It would be easier if the percentage was displayed on the sector of the pie.
  - The same color scheme for all three pie charts was slightly misleading.

- **Second Task (Minimum Ticket Price)**: In the first version of this dashboard, people struggled to precisely complete the second task due to the inconvenient granularity of the bars. One bar represented about 5 units, which is quite large on a scale where tickets cost around 10-15 pounds. However, all interviewees were able to select a route and identify the correct bar on the histogram.

- **Third Task (First Class Ticket Percentage)**: The third task was very similar to the first one and was easily completed by all users.
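
The granularity issue from the second task can be illustrated with a small calculation: with wide bins, the histogram only tells you which bin contains the cheapest ticket, not the price itself. The sketch below assumes bins aligned to multiples of the bin width, which may differ from how the charting library actually places them; the function name and the prices are made up.

```python
import numpy as np


def lowest_occupied_bin(prices, step):
    """Return (left, right) edges of the lowest non-empty bin for a given bin width."""
    prices = np.asarray(prices, dtype=float)
    left = np.floor(prices.min() / step) * step
    return float(left), float(left + step)


example_prices = [12, 13, 14, 18]  # made-up prices in GBP

print(lowest_occupied_bin(example_prices, step=5))  # (10.0, 15.0)
print(lowest_occupied_bin(example_prices, step=1))  # (12.0, 13.0)
```

With a bin width of 5, a reader can only say the minimum lies somewhere between 10 and 15 pounds; a width of 1 narrows that to between 12 and 13.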

**3. Insights Gained**

Generally, interviewees were able to build some knowledge about the proportion of online/offline payments, ticket classes, and ticket types. Unfortunately, some interviewees mentioned that the distribution of prices across the routes is not easy to follow and make sense of. While the variance in price for a route is clear, it is harder to discern more complex patterns.

**4. Color Scheme**

The yellow color scheme appeared not to affect the interviewees. It neither distracted nor helped them.

**5. Average Ratings**

- **Convenience**: 7.8
- **Informativeness**: 7.2
- **Visual Appeal**: 7.0




## Further steps


**What is good?**

The qualitative evaluation shows that although the dashboard provides some knowledge and options to explore
the data, it is far from being perfect. The initial choice of pie charts to represent categorical data was beneficial and worked well for our users, since pie charts show both the whole picture and the individual fractions. The tooltips for both the histogram and the pie charts are helpful and simplify interaction with the plots.


**What to improve?**

One fundamental disadvantage of the dashboard is that it offers no higher-level picture that aggregates all
routes and keeps the geospatial relationships between stations. One alternative that might work well is to represent the
data as a graph with stations as nodes and routes as edges (though this is likely much harder to implement using Streamlit).
Such a visualization would preserve both the whole picture and the geospatial data.
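
As a first step toward such a graph view, the tickets could be aggregated into an edge list with one row per route. This is only a sketch of the data preparation, not something implemented in this project; the function name `route_edges` and the sample rows are hypothetical.

```python
import pandas as pd


def route_edges(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate tickets into one row per (departure, arrival) pair."""
    return df.groupby(["Departure Station", "Arrival Destination"], as_index=False).agg(
        tickets=("Price", "size"),      # number of tickets sold on the route
        mean_price=("Price", "mean"),   # average ticket price on the route
    )


# Made-up rows for illustration only.
all_tickets = pd.DataFrame(
    {
        "Price": [10, 20, 30],
        "Departure Station": ["A", "A", "B"],
        "Arrival Destination": ["B", "B", "A"],
    }
)

edges = route_edges(all_tickets)
print(edges)
```

An edge table like this could then be passed to a graph library (for example `networkx.from_pandas_edgelist`). Placing the stations geographically would additionally require station coordinates, which do not appear among the columns listed above.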

Interviewees also mentioned that the histogram's x-axis is unstable: it varies a lot from trip to trip, which is misleading and
makes it harder to compare distributions. Finding a solution will take time; for now, I don't know how to stabilize the x-axis, because different routes have very different price scales.
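
One possible direction for the x-axis issue, not evaluated in this project, is to compute a single set of bin edges from the prices of all routes and reuse it for every route. The sketch below shows the idea with NumPy; the function names, the bin width of 5, and the sample rows are assumptions.

```python
import numpy as np
import pandas as pd


def shared_bin_edges(df: pd.DataFrame, step: float = 5.0) -> np.ndarray:
    """Bin edges covering the prices of all routes, aligned to multiples of `step`."""
    low = np.floor(df["Price"].min() / step) * step
    high = (np.floor(df["Price"].max() / step) + 1) * step  # strictly above the max price
    return np.arange(low, high + step / 2, step)


def binned_counts(route_df: pd.DataFrame, edges: np.ndarray) -> pd.Series:
    """Ticket counts per shared bin for one route; empty bins are kept as zeros."""
    counts, _ = np.histogram(route_df["Price"], bins=edges)
    return pd.Series(counts, index=edges[:-1], name="tickets")


# Made-up rows for illustration only.
priced = pd.DataFrame(
    {
        "Price": [10, 20, 60],
        "Departure Station": ["A", "A", "B"],
        "Arrival Destination": ["B", "B", "A"],
    }
)

edges_all = shared_bin_edges(priced)
cheap_route = binned_counts(priced[priced["Departure Station"] == "A"], edges_all)
pricey_route = binned_counts(priced[priced["Departure Station"] == "B"], edges_all)
```

Both routes now share the same bins, so their histograms can be compared directly. The trade-off is that cheap routes occupy only a small part of the axis. In Altair, a comparable effect might be achieved by fixing the x scale domain with `alt.Scale(domain=[...])` and the bin `extent`.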

**Conclusion**

To sum up, the journaling-based qualitative evaluation was very helpful for collecting feedback from users and building new hypotheses about how to improve the visualization. Now we can refine the implementation and redesign the dashboard to fit user needs.

