# Bus times project
A basic analysis of the arrival and departure times of buses on the 401 route in Galway city.

## Background
The 401 bus route in Galway city has a reputation for being unreliable, with buses sometimes arriving half an hour later than scheduled. I have chosen this as the subject of a basic data analysis project.
## Variables
 - The buses are scheduled to arrive at a stop every 20 minutes starting on the hour. I have chosen to represent the variation in arrival times as a Normal Distribution with a standard deviation of 15 minutes, or a quarter of an hour.
 - Passengers at each stop are modelled to arrive according to the Poisson Distribution with an average of 5.
 - Whether the bus is old or new is modelled as a Binomial Distribution with a single trial and a probability of 0.5.
 - The experience of the drivers is a categorical variable with the values Experienced or Inexperienced.

## Some code

In [5]:
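# The analysis is meant to start on 1st September, 2018, with buses running every 20 minutes
# Note: '1/7/2018' below is parsed by pandas as 7th January, 2018, and '0.3H' is 18 minutes, not 20 (20 minutes is 1/3 H)
# Choose a standard deviation of 15 minutes or 0.25 H
# 100 data points are required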

# import the packages

import pandas as pd
import numpy as np

In [24]:
rng = pd.date_range('1/7/2018', periods = 100, freq = '0.3H')


In [25]:
rng

DatetimeIndex(['2018-01-07 00:00:00', '2018-01-07 00:18:00',
               '2018-01-07 00:36:00', '2018-01-07 00:54:00',
               '2018-01-07 01:12:00', '2018-01-07 01:30:00',
               '2018-01-07 01:48:00', '2018-01-07 02:06:00',
               '2018-01-07 02:24:00', '2018-01-07 02:42:00',
               '2018-01-07 03:00:00', '2018-01-07 03:18:00',
               '2018-01-07 03:36:00', '2018-01-07 03:54:00',
               '2018-01-07 04:12:00', '2018-01-07 04:30:00',
               '2018-01-07 04:48:00', '2018-01-07 05:06:00',
               '2018-01-07 05:24:00', '2018-01-07 05:42:00',
               '2018-01-07 06:00:00', '2018-01-07 06:18:00',
               '2018-01-07 06:36:00', '2018-01-07 06:54:00',
               '2018-01-07 07:12:00', '2018-01-07 07:30:00',
               '2018-01-07 07:48:00', '2018-01-07 08:06:00',
               '2018-01-07 08:24:00', '2018-01-07 08:42:00',
               '2018-01-07 09:00:00', '2018-01-07 09:18:00',
               '2018-01-
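The comments in the first cell describe a start date of 1st September, 2018 and a 20-minute interval, while the output above steps in 18-minute intervals from 7th January. A minimal sketch of a range matching that description, assuming an ISO date string and the `'20min'` frequency alias (the name `schedule` is introduced here for illustration only):

```python
import pandas as pd

# Assumed start date: 1st September, 2018, written in ISO format to avoid
# any day/month ambiguity. '20min' gives an exact 20-minute spacing.
schedule = pd.date_range("2018-09-01", periods=100, freq="20min")

# Show the first few scheduled times
print(schedule[:4])
```

Writing the interval in minutes avoids rounding 20 minutes to a decimal fraction of an hour.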

## Table
A table is generated showing the bus arrival values, drawn with a standard deviation of 15 minutes, the number of passengers at each stop, and whether the bus is old or new. A method of assigning the categorical variable of Experienced/Inexperienced driver is still being sought.

In [46]:
ts = pd.DataFrame(np.random.normal(3,0.25,len(rng)), index=rng, columns=["Buses"])

In [38]:
ts

| | Buses |
|---|---|
| 2018-01-07 00:00:00 | 3.152347 |
| 2018-01-07 00:18:00 | 3.091456 |
| 2018-01-07 00:36:00 | 3.519285 |
| 2018-01-07 00:54:00 | 3.062930 |
| 2018-01-07 01:12:00 | 3.040520 |
| 2018-01-07 01:30:00 | 2.518914 |
| 2018-01-07 01:48:00 | 3.595138 |
| 2018-01-07 02:06:00 | 3.011155 |
| 2018-01-07 02:24:00 | 3.141787 |
| 2018-01-07 02:42:00 | 3.365883 |
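
The Buses column holds floating-point draws rather than clock times. One possible way to obtain arrival times, sketched here under the assumption that each delay is Normally Distributed around the scheduled time with a mean of 0 and a standard deviation of 15 minutes, is to add a random offset to each scheduled timestamp. The names `gen`, `schedule`, `delay_minutes` and `arrival_table`, the seed, the zero mean, and the 20-minute schedule starting on 1st September, 2018 are all assumptions made for this illustration:

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is arbitrary)
gen = np.random.default_rng(42)

# Assumed schedule: a bus every 20 minutes from 1st September, 2018
schedule = pd.date_range("2018-09-01", periods=100, freq="20min")

# Assumed delay model: mean 0, standard deviation 15 minutes
delay_minutes = gen.normal(loc=0, scale=15, size=len(schedule))

# Convert the delays to timedeltas and shift each scheduled time
arrivals = schedule + pd.to_timedelta(delay_minutes, unit="min")

arrival_table = pd.DataFrame(
    {
        "Scheduled": schedule,
        "Arrival": arrivals,
        "DelayMinutes": delay_minutes,
    }
)
print(arrival_table.head())
```

A negative delay in this sketch means the bus arrives early. A distribution that only allows late arrivals would need a different model.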


In [40]:
ts = pd.DataFrame(np.random.poisson(5,len(rng)), columns=["Passengers"])

In [41]:
ts

| | Passengers |
|---|---|
| 0 | 4 |
| 1 | 4 |
| 2 | 6 |
| 3 | 3 |
| 4 | 3 |
| 5 | 6 |
| 6 | 4 |
| 7 | 6 |
| 8 | 5 |
| 9 | 7 |


In [47]:
# If Column variable = 1, Bus is Old, if Column variable = 0, Bus is deemed New

In [48]:
ts = pd.DataFrame(np.random.binomial(1,0.5,len(rng)), columns=["Old/New"])

In [49]:
ts

| | Old/New |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 0 |
| 6 | 0 |
| 7 | 1 |
| 8 | 1 |
| 9 | 0 |
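
Each cell in this section assigns a new DataFrame to `ts`, so only the most recent column survives. A sketch of gathering the columns into a single table on a shared time index follows. The names `gen` and `table`, the seed, the `'20min'` frequency, and the extra `Bus age` label column are assumptions for illustration, not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is arbitrary)
gen = np.random.default_rng(0)

# Same start date string as the notebook, with an assumed 20-minute spacing
rng = pd.date_range("1/7/2018", periods=100, freq="20min")

# Build every column in one DataFrame instead of overwriting `ts`
table = pd.DataFrame(
    {
        "Passengers": gen.poisson(5, len(rng)),
        "Old/New": gen.binomial(1, 0.5, len(rng)),
    },
    index=rng,
)

# Readable label for the 1/0 coding: 1 is Old, 0 is New
table["Bus age"] = table["Old/New"].map({1: "Old", 0: "New"})

print(table.head(10))
```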


## Comment
 - The values produced for Buses are Normally Distributed and need further investigation. The column is meant to show the **times** at which the buses arrive, not a floating-point value for each bus.
 - The variable for Passengers is Poisson Distributed and shows the number of passengers embarking at each stop, with an average of 5, over 100 stops in the period.
 - Old/New buses are coded as 1 (Old) or 0 (New).
 - A way of assigning a categorical variable for Experienced/Inexperienced drivers is being sought.
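
One candidate for the driver variable, offered as a sketch rather than a settled choice, is to draw labels at random with NumPy's `Generator.choice` and store them as a pandas categorical. The equal probability of 0.5 for each label, the seed, and the names `gen`, `labels` and `drivers` are assumptions made here:

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is arbitrary)
gen = np.random.default_rng(1)

labels = ["Experienced", "Inexperienced"]

# Assumed: each driver is equally likely to be Experienced or Inexperienced
draws = gen.choice(labels, size=100, p=[0.5, 0.5])

# Store the labels as a categorical column with a fixed set of categories
drivers = pd.DataFrame({"Driver": pd.Categorical(draws, categories=labels)})

print(drivers["Driver"].value_counts())
```

Changing the values passed to `p` would model a fleet where one group of drivers is more common than the other.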