### Generating fake data for Assignments, Incidents and the Patient Transfers System (PTS)

We are generating three fake dataframes that will be used throughout the analysis and modelling for this project.


- __Assignments__: lists the individual ambulances assigned to each incident. It is used to understand the characteristics of ambulances that handed over one or more patients to Emergency Departments. The generated dataframe contains the following:
    - incident number (a unique identifier)
    - hospital
    - number of patients transported
    - the time the ambulance arrived at the hospital
    - the time the ambulance handed the patient over to the emergency department
    - the time the ambulance was clear and ready to take a new job
    - the time the ambulance left the queue
    - the skill levels of each crew member on that ambulance
    - the hospital's easting and northing
    - whether HALOing was carried out
    - the HALOing time in minutes
    - the handover delay in minutes

__Note:__ HALOing occurs when an ambulance crew leaves its patient with another ambulance. The handover has not been completed, but the first ambulance is clear to leave the scene.

- __Incidents__: Used to understand the effect of the age band of the patient and responding priority of the incident. We are generating a dataframe with the following columns:
    - incident number (a unique identifier)
    - age band of the patients involved in the incident
    - responding priority of the incident
    
    
- __Patient Transfers System (PTS)__: We are generating a dataframe in which each row records the hospital, the time, and whether a patient was admitted to that hospital (+1) or discharged from it (-1)

In [9]:
import sys
sys.path.append("../src")
from data.fake_data_generator import (
    generate_fake_assignments_data,
    generate_fake_incidents_data,
    generate_fake_pts_data,
)

Next we specify how many rows each dataframe should have and the range of the timestamps. Throughout the notebooks, each dataframe has 10,000 rows, and the random timestamps fall between 12 January 1953 and 19 January 1954.

In [10]:
number_of_rows = 10000
start_date = '1953-01-12 17:22:36'
end_date = '1954-01-19 03:30:00'
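
To illustrate what "random timestamps between two dates" can mean in practice, here is a minimal sketch that draws timestamps uniformly from the date range above. The `random_timestamps` helper and its `seed` parameter are hypothetical and written only for this illustration; the actual logic in `fake_data_generator.py` may differ.

```python
import numpy as np
import pandas as pd


def random_timestamps(start, end, n, seed=None):
    """Hypothetical helper: draw n timestamps uniformly between start and end."""
    rng = np.random.default_rng(seed)
    start_ts = pd.Timestamp(start)
    end_ts = pd.Timestamp(end)
    # Length of the window in whole seconds
    span_seconds = int((end_ts - start_ts).total_seconds())
    # Random whole-second offsets from the start, both ends inclusive
    offsets = rng.integers(0, span_seconds + 1, size=n)
    return start_ts + pd.to_timedelta(offsets, unit="s")


sample_times = random_timestamps("1953-01-12 17:22:36", "1954-01-19 03:30:00", 5, seed=0)
print(sample_times)
```

Because each timestamp is drawn independently, columns generated this way carry no guaranteed ordering relative to one another.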

In [11]:
# Using functions from fake_data_generator.py to create the three generated dataframes  
df_arrived = generate_fake_assignments_data(start_date, end_date, number_of_rows)
df_incidents = generate_fake_incidents_data(number_of_rows)
df_pts = generate_fake_pts_data(start_date, end_date, number_of_rows)

In [12]:
# Example of the assignments fake data
df_arrived.head()

Unnamed: 0,incident_number,hospital,num_patients_transported,time,time_destination,time_handover,time_clear,crew1_skill,crew2_skill,crew3_skill,dest_easting,dest_northing,haloing_done,haloing_time_mins,time_amb_left_queue,handover_delay_mins
0,0,hospital_A,2,1953-02-18 10:23:56,1953-06-02 16:21:36,1953-07-16 20:22:44,1953-03-03 20:13:14,,,"skill_A, skill_D",1401063.0,1877733.0,True,194409.5,1953-03-03 20:13:14,63586.14
1,1,hospital_E,1,1953-07-10 19:59:38,1953-12-22 14:10:04,1953-03-26 00:01:05,1953-04-22 20:03:08,,"skill_D, skill_A",,1713160.0,1345860.0,False,,1953-03-26 00:01:05,0.0
2,2,hospital_A,2,1953-02-04 21:14:28,1953-11-24 05:31:23,1953-02-01 02:00:47,1953-10-15 19:11:00,"skill_A, skill_B",,"skill_E, skill_D",1401063.0,1877733.0,False,,1953-02-01 02:00:47,0.0
3,3,hospital_A,1,1953-09-18 02:43:55,1953-05-06 00:58:40,1953-08-29 19:53:22,1953-06-23 19:33:46,,,"skill_D, skill_A",1401063.0,1877733.0,True,96499.6,1953-06-23 19:33:46,166719.69
4,4,hospital_D,3,1953-08-16 20:08:26,1953-10-15 14:29:56,1953-07-29 12:38:02,1953-09-03 19:50:13,,,"skill_E, skill_E",1150139.0,1525183.0,False,,1953-07-29 12:38:02,0.0


In [13]:
# Example of the incidents fake data
df_incidents.head()

Unnamed: 0,incident_number,age_band,responding_priority
0,0,37 to 54,2.0
1,1,73 to 90,2.0
2,2,19 to 36,7.0
3,3,1 to 18,3.0
4,4,37 to 54,1.0
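
Since both the assignments and incidents dataframes carry `incident_number`, one plausible way to bring age band and responding priority alongside each ambulance assignment is a left join on that column. The sketch below uses a few values hand-copied from the sample outputs above rather than the generated dataframes; the variable names are illustrative, and the join itself is an assumption about later analysis steps, not something this notebook performs.

```python
import pandas as pd

# Small hand-copied subset of the sample rows shown above (illustrative only)
assignments = pd.DataFrame({
    "incident_number": [0, 1, 2],
    "hospital": ["hospital_A", "hospital_E", "hospital_A"],
    "num_patients_transported": [2, 1, 2],
})
incidents = pd.DataFrame({
    "incident_number": [0, 1, 2],
    "age_band": ["37 to 54", "73 to 90", "19 to 36"],
    "responding_priority": [2.0, 2.0, 7.0],
})

# Left join keeps every assignment; validate raises an error if an
# incident_number appears more than once in the incidents frame
combined = assignments.merge(
    incidents, on="incident_number", how="left", validate="many_to_one"
)
print(combined)
```

The `validate="many_to_one"` argument makes the join fail early if `incident_number` turns out not to be unique in the incidents dataframe.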


In [14]:
# Example of the PTS fake data
df_pts.head()

hospital,time,flow
hospital_C,1953-05-01 00:17:49,1
hospital_B,1953-09-29 09:38:19,1
hospital_A,1953-12-22 08:29:02,-1
hospital_A,1953-07-06 18:56:33,-1
hospital_A,1953-03-26 00:39:59,1
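
Because `flow` is +1 for an admission and -1 for a discharge, a running total per hospital gives the net change in patients over time. The sketch below rebuilds the five sample rows above by hand and assumes the dataframe is indexed by `hospital` and `time`; the `net_flow` column name is made up for this example. The total is relative to an unknown starting occupancy, so it can go negative.

```python
import pandas as pd

# The five sample rows shown above, rebuilt by hand (illustrative only)
pts = pd.DataFrame({
    "hospital": ["hospital_C", "hospital_B", "hospital_A", "hospital_A", "hospital_A"],
    "time": pd.to_datetime([
        "1953-05-01 00:17:49",
        "1953-09-29 09:38:19",
        "1953-12-22 08:29:02",
        "1953-07-06 18:56:33",
        "1953-03-26 00:39:59",
    ]),
    "flow": [1, 1, -1, -1, 1],
}).set_index(["hospital", "time"])

# Sort by hospital and then chronologically, so the running total follows time order
pts_sorted = pts.sort_index()
# Cumulative sum of admissions (+1) and discharges (-1) within each hospital
pts_sorted["net_flow"] = pts_sorted.groupby(level="hospital")["flow"].cumsum()
print(pts_sorted)
```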


In [15]:
# Save all three dataframes as parquet files; replace these paths with your own output directory

df_arrived.to_parquet('../outputs/arrived.parquet')
df_incidents.to_parquet('../outputs/incidents.parquet')
df_pts.to_parquet('../outputs/pts.parquet')