<a href="https://colab.research.google.com/github/shammud/python/blob/main/Copy_of_Bus_data_challenges.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bus emissions challenge 
---


### Introduction

Kent and Medway have the highest proportion of old buses in the country (~40% of fleet). Old buses are detrimental to the environment: they only meet the Euro III emissions standard, so if they are used for many journeys they will dramatically affect the air quality of the area.

The client would therefore like us to find some information that could be used as evidence to make a case for improving bus emissions in the Kent and Medway area.

The datasets we will be using are publicly available. Gov.uk provides data on all bus journeys in the UK. Used in conjunction with Arriva's bus fleet emissions data (available from bustimes.org, download [here](https://drive.google.com/uc?export=download&id=1ywtiSwR27JYCC5Sf9G1ZCTOTWNxWBk9_)), it lets us build a pretty good picture of how many of these old buses are being used for bus journeys in Kent and Medway.

The gov.uk bus data is available in XML format via an API. The data refreshes every 10 seconds, so each download gives a snapshot of the buses in operation at that time. We have downloaded one snapshot of this data and converted it to JSON format, which you can download [here](https://drive.google.com/uc?export=download&id=1a9vMs0Kke7Nh4LuxCnKHkVIkFDr-az_Z).





### Load the data
---
#### **Please run the cell below to load the data required for this challenge.**  
The following code reads both the JSON file and the bus emissions CSV file and creates a list of dictionaries (`bus_journeys`) and two lists (`vehicle_refs`, `emission_standards`). It also defines `emissions_data`, a list of emission rates for each standard.


In [38]:
import pandas as pd
import json
import urllib.request

url_json = "https://drive.google.com/uc?export=download&id=1a9vMs0Kke7Nh4LuxCnKHkVIkFDr-az_Z"
csv = "https://drive.google.com/uc?export=download&id=1ywtiSwR27JYCC5Sf9G1ZCTOTWNxWBk9_"

def get_saved_data(url_json):
    if url_json is not None:
        try:
            with urllib.request.urlopen(url_json) as url:
                data = json.loads(url.read().decode())
                return data
        except Exception as error:
            # Report the failure; the function then returns None
            print("An error occurred while reading the file:", error)


def get_dicts_lists():
  df = pd.json_normalize(get_saved_data(url_json))
  regs = pd.read_csv(csv)

  bus = df[['MonitoredVehicleJourney.LineRef','MonitoredVehicleJourney.DirectionRef','MonitoredVehicleJourney.PublishedLineName','MonitoredVehicleJourney.OriginName','MonitoredVehicleJourney.DestinationName','MonitoredVehicleJourney.OriginAimedDepartureTime','MonitoredVehicleJourney.VehicleRef']]
  bus.columns = bus.columns.str.lstrip("MonitoredVehicleJourney.")
  bus_journeys = bus.to_dict('records')
  
  regs.rename({'Last tracked': 'VehicleRef'}, axis=1 , inplace=True)
  vehiclerefs = regs['VehicleRef'].to_list()
  emission_standards = regs['Emission Class'].to_list()
  return bus_journeys, vehiclerefs, emission_standards

def get_emissions_data():

  emissions_data = [
      {"Standard":"EURO III", "CO2":2.1, "Nox":5, "PM":0.1 },
      {"Standard":"EURO IV","CO2":1.5,"Nox":3.5,"PM":0.02 },
      {"Standard":"EURO V","CO2":1.5,"Nox":2,"PM":0.02},
      {"Standard":"EURO VI","CO2":1.5,"Nox":0.4,"PM":0.01}
  ]
  return emissions_data



bus_journeys, vehicle_refs, emission_standards = get_dicts_lists()
emissions_data = get_emissions_data()
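
A note on the column renaming in the cell above: `str.lstrip` treats its argument as a set of characters to strip rather than a literal prefix, which appears to be why the `VehicleRef` column shows up as `Ref` in the records below. The sketch illustrates this with plain Python strings. The `drop_prefix` helper is hypothetical and is not part of the loading code.

```python
PREFIX = "MonitoredVehicleJourney."
columns = [PREFIX + "LineRef", PREFIX + "VehicleRef"]

# lstrip removes every leading character that appears anywhere in PREFIX,
# so it carries on past the dot and also removes "Vehicle"
stripped = [name.lstrip(PREFIX) for name in columns]


def drop_prefix(name, prefix):
    """Hypothetical helper: remove a literal prefix only."""
    if name.startswith(prefix):
        return name[len(prefix):]
    return name


prefix_removed = [drop_prefix(name, PREFIX) for name in columns]

print(stripped)        # ['LineRef', 'Ref']
print(prefix_removed)  # ['LineRef', 'VehicleRef']
```

The tasks below rely on the `Ref` key, so switching the loading code to literal prefix removal would also mean updating those lookups.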


### Task 1 - investigate bus_journeys data
---


This is the data dictionary for records in the bus_journeys data (all fields are alphanumeric):  

LineRef--------------------------------------bus route number  
DirectionRef------------------------------current direction of travel, inbound or outbound   
PublishedLineName------------------timetabled service name (may be same as LineRef)  
OriginName-------------------------------start location of the current route  
DestinationName----------------------end destination on the current route  
OriginAimedDepartureTime------the time at which the bus was timetabled to leave its start location    
Ref---------------------------------------------a unique identifier for the bus vehicle  

The bus_journeys data contains a list of records with the fields shown above. This list contains a record for each bus that is currently on a bus route (assuming that all are transmitting their locations).
  
**Task**  
Take a look at the `bus_journeys` list of dictionaries

* Print the first record
* Print the last record
* How is an individual bus journey dictionary structured? 
* How many of these dictionary records are in the list?


**Expected Output**   
First record will have `LineRef` 177  
Last record will have `LineRef` 347  

In [6]:
def records_in_bus_journeys():

 print(bus_journeys[0])
 print(bus_journeys[-1])
 print(len(bus_journeys))
 
records_in_bus_journeys() 

{'LineRef': '177', 'DirectionRef': 'inbound', 'PublishedLineName': '177', 'OriginName': 'Village_Centre', 'DestinationName': 'Victoria_Street', 'OriginAimedDepartureTime': '2022-09-07T11:20:00+00:00', 'Ref': '1655'}
{'LineRef': '347', 'DirectionRef': 'anticlockwise', 'PublishedLineName': '347', 'OriginName': 'Bus_Hub', 'DestinationName': 'Coldharbour_Lane_East', 'OriginAimedDepartureTime': '2022-09-07T12:05:00+00:00', 'Ref': '1633'}
137


### Task 2 - investigate vehicle_refs and emission_standards data lists
---
Take a look at the `vehicle_refs` and `emission_standards` lists
* what is the length of each list?
* find how many unique items there are in the emission_standards list - (**hint** : you will need to create another list and use a for loop)
* print the unique emission_standards items 
* find how many unique items there are in the vehicle_refs list
* print the length of the unique vehicle_ref items 

In [23]:
def get_list_length():
   return(len(emission_standards),len(vehicle_refs))

def get_unique_items(items):
  # The parameter is named 'items' to avoid shadowing the built-in 'list'
  unique_items=[]
  for i in items:
    if i not in unique_items:
      unique_items.append(i)
  return (len(unique_items))


# Test for get list length
expected=(223,223)
actual=get_list_length()
if actual==expected:
  print("Test Passed",actual)
else:
  print("Test Failed",actual)  

# Test for unique emission standards
expected=4
actual=get_unique_items(emission_standards)
if actual==expected:
  print("Test Passed",actual)
else:
  print("Test Failed",actual)  


# Test for unique vehicle_refs
expected=223
actual=get_unique_items(vehicle_refs)
if actual==expected:
  print("Test Passed",actual)
else:
  print("Test Failed",actual)  


Test Passed (223, 223)
Test Passed 4
Test Passed 223
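
As an alternative to the loop used above, unique items can also be collected with built-in types. This is a minimal sketch on a hypothetical sample list shaped like `emission_standards`, not the notebook's real data.

```python
# Hypothetical sample in the same shape as emission_standards
sample_standards = ["EURO III", "EURO VI", "EURO III", "EURO IV", "EURO VI"]

# A set drops duplicates but does not keep the original order
unique_count = len(set(sample_standards))

# dict.fromkeys keeps the first occurrence of each item, in order
unique_in_order = list(dict.fromkeys(sample_standards))

print(unique_count)     # 3
print(unique_in_order)  # ['EURO III', 'EURO VI', 'EURO IV']
```

The set version is the shortest way to get a count; the `dict.fromkeys` version is handy when the unique items need to be printed in the order they first appear.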


### Task 3 
---
The client is only concerned about bus routes 116 and 132 specifically.

**Task**
Create a new list of dictionaries which contains only the records where the `LineRef` is either 116 or 132. 

*(**hint**: the datatype of the LineRef might not be what you expect - the values were read from a JSON file as text)*

**Expected output**
There will be 14 records in this list

In [33]:
def create_list_specific_lineref():

  new_Lineref=[]
  for bus_journey in bus_journeys:
    # LineRef values are strings, so compare against "116" and "132"
    if bus_journey['LineRef'] in ("116", "132"):
      if bus_journey not in new_Lineref:
        new_Lineref.append(bus_journey)
  return (len(new_Lineref))

# Test for create_list_specific_lineref  
expected=14
actual=create_list_specific_lineref()
if actual==expected:
  print("Test Passed",actual) 
else:
  print("Test Failed",actual)    

Test Passed 14
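
The same filter can be sketched as a list comprehension with a set of wanted routes. The records below are hypothetical and only mimic the shape of `bus_journeys` entries; the last one uses an integer `LineRef` to show why the string comparison matters.

```python
# Hypothetical records shaped like bus_journeys entries
sample_journeys = [
    {"LineRef": "116", "Ref": "6430"},
    {"LineRef": "177", "Ref": "1655"},
    {"LineRef": "132", "Ref": "6411"},
    {"LineRef": 132, "Ref": "0000"},  # integer LineRef: does not match the string "132"
]

wanted_routes = {"116", "132"}

# Keep only the journeys whose LineRef is one of the wanted route strings
selected = [journey for journey in sample_journeys
            if journey["LineRef"] in wanted_routes]

print(len(selected))  # 2
```

Unlike the loop above, this version does not skip duplicate records, so the counts would differ if the snapshot contained repeated entries.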


### Task 4 
---

The indexes of `vehicle_refs` match the indexes of `emission_standards`.   
Create a new list, which contains dictionaries.  Each dictionary will contain a vehicle_ref and its corresponding emission_class. 
*hint: you will need to use a for loop and indexing and should create dictionaries with two keys: vehicle_ref and emission_class*

In [30]:
def create_list_vehicleref_emissionclass():

  new_list=[]
  # Walk both lists with one shared index so each vehicle is paired with its own emission class
  for i in range(len(vehicle_refs)):
    new_dict={"vehicle_ref":vehicle_refs[i],"emission_class":emission_standards[i]}
    new_list.append(new_dict)
  return (len(new_list))


#Test for create_list_vehicleref_emissionclass
expected=223
actual=create_list_vehicleref_emissionclass()
if actual==expected:
  print("Test Passed",actual) 
else:
  print("Test Failed",actual)    

Test Passed 223
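
Pairing two lists by position can also be written with `zip`, which avoids manual indexing. A minimal sketch, using three vehicle refs and their classes as they appear in the Task 5 output; the `sample_` names are made up for illustration.

```python
# Small sample: refs and classes at matching positions
sample_refs = ["6260", "6139", "3906"]
sample_classes = ["EURO III", "EURO IV", "EURO V"]

# zip yields one (ref, class) pair per position in the two lists
paired = [
    {"vehicle_ref": ref, "emission_class": emission_class}
    for ref, emission_class in zip(sample_refs, sample_classes)
]

print(paired[0])  # {'vehicle_ref': '6260', 'emission_class': 'EURO III'}
```

Checking only the length of the result cannot tell a correct pairing from an incorrect one, so it is worth printing a record or two as well.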


### Task 5 
--- 

The list of dictionaries you created in the last exercise is very long. A more intuitive way to hold this data would be by collating data. 

Create a dictionary where each unique emission_class is a key and its corresponding value is a list of all vehicle_refs with that emission_class 

*(**hint**: you could think about using the unique_em list you created earlier)*

**Example Output**

{"EURO III": [1234, 4567, 8910], "EURO IV": [1028, 1283, 1234]}

In [64]:
from collections import defaultdict
def create_sorted_dict():

 sorted_dict=defaultdict(list)
 for group,vehicle in zip(emission_standards,vehicle_refs):
   sorted_dict[group].append(vehicle)
 print(sorted_dict) 
 return len(sorted_dict)


# Test for create_sorted_dict
expected=4
actual=create_sorted_dict()
if actual==expected:
  print("Test Passed",actual)
else:
  print("Test Failed",actual) 

defaultdict(<class 'list'>, {'EURO III': ['6260', '1607', '1609', '1616', '6401', '6402', '6404', '6405', '6406', '6407', '6408', '6410', '6411', '6412', '6413', '6417', '6418', '6419', '6420', '6421', '6422', '6423', '6424', '6425', '6427', '6428', '6429', '6430', '6431', '6432', '6436', '6437', '6438', '6439', '6440', '6441', '6442', '6443', '6444', '6445', '6446', '6447', '6448', '6449', '6005', '6007', '6136', '6132', '6135', '6129', '6124', '6125', '6126', '6127', '6152', '6154'], 'EURO IV': ['6139', '6138', '6143', '6141', '6144', '1633', '1634', '1635', '1636', '3984', '3987', '3988', '3994', '3995', '3996', '1637', '1638', '1639', '1640', '1641', '1642', '4007', '4006', '4005', '6146', '6147', '6150', '6151', '6149', '6148', '6237', '6238', '1523'], 'EURO V': ['3906', '3908', '6200', '6201', '6203', '6205', '4046', '4048', '4050', '4051', '4054', '4055', '4056', '4057', '4058', '4059', '4013', '4014', '4060', '4061', '4062', '4063', '4064', '4065', '4066', '4224', '4219', '4220

### Task 6
---
Find all the polluting buses that were running when the data was collected.   
Using the `bus_journeys` list, find all the records where a Euro III bus was used. 

You can find the `Ref` values of the polluting buses in the dictionary you created in the last task. 

* Create a new list of dictionaries which contains only the records from `bus_journeys` that use a polluting bus. 
* How many polluting buses were being used?


In [44]:
from collections import defaultdict
def find_polluting_buses(key):

 sorted_dict=defaultdict(list)
 for group,vehicle in zip(emission_standards,vehicle_refs):
   sorted_dict[group].append(vehicle)  
 reference_num=sorted_dict[key]
 #print("length_emission_class",key,len(reference_num))  
 polluting_bus=[]
 for bus_journey in bus_journeys:
  if bus_journey['Ref'] in reference_num:
   if bus_journey not in polluting_bus:  
    polluting_bus.append(bus_journey)
 print("length_emission_class_bus_journeys",key,len(polluting_bus))   
 return(len(polluting_bus))
 



#Test for find_polluting_buses(key)
expected=len(bus_journeys)
EURO_III=find_polluting_buses('EURO III') 
EURO_IV=find_polluting_buses('EURO IV') 
EURO_V=find_polluting_buses('EURO V') 
EURO_VI=find_polluting_buses('EURO VI') 
actual=EURO_III+EURO_IV+EURO_V+EURO_VI
if actual == expected:
  print("Test passed!\nExpected: {}\nActual: {}".format(expected, actual))
else:
  print("Test failed!\nExpected: {}\nActual: {}".format(expected, actual))
  print("Reason for failing : IN THE 'bus_journeys' records 3 buses 'Ref' is missing.so total length is 137,Ref length is 134")

length_emission_class_bus_journeys EURO III 30
length_emission_class_bus_journeys EURO IV 15
length_emission_class_bus_journeys EURO V 29
length_emission_class_bus_journeys EURO VI 60
Test failed!
Expected: 137
Actual: 134
Reason for failing : IN THE 'bus_journeys' records 3 buses 'Ref' is missing.so total length is 137,Ref length is 134
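
The three-record gap could come from journeys with no `Ref`, from `Ref` values that are absent from `vehicle_refs`, or from duplicate records being skipped; the output above does not say which. The sketch below shows one way to separate the first two cases. All data here is hypothetical, and `is_missing` is an illustrative helper that treats both `None` and NaN as missing, on the assumption that pandas may have filled absent values with NaN.

```python
# Hypothetical data shaped like the notebook's variables
sample_journeys = [
    {"LineRef": "132", "Ref": "6411"},
    {"LineRef": "116", "Ref": float("nan")},  # Ref missing
    {"LineRef": "177", "Ref": "9999"},        # Ref not in the emissions list
]
sample_vehicle_refs = ["6411", "6430"]


def is_missing(value):
    """Illustrative helper: True for None or NaN (NaN is not equal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


known_refs = set(sample_vehicle_refs)

# Journeys that cannot be matched to any emission class
unmatched = [j for j in sample_journeys if j["Ref"] not in known_refs]
missing_ref = [j for j in unmatched if is_missing(j["Ref"])]
unknown_ref = [j for j in unmatched if not is_missing(j["Ref"])]

print(len(unmatched), len(missing_ref), len(unknown_ref))  # 2 1 1
```

Running the same split on the real `bus_journeys` and `vehicle_refs` would show which explanation applies to the three unmatched records.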


# Challenge

Can we find out how much pollution one bus on the 116 route emits?

Can we find out how much pollution one bus on the 132 route emits?

Can we find out how much pollution, in total, all the buses on these routes at the recorded point in time will be emitting?

**Some numbers to play with:**  
NOTE: These are NOT fact checked but give a rough idea of some numbers we might be able to use for a rough first model

*  A typical old diesel bus gets about 5 miles per gallon, which is 2.126 km per litre (divide mpg by 2.352)
*  One litre of diesel fuel has an energy content of 10.8 kWh
*  If the bus's fuel consumption is 2.126 km per litre, this gives an energy use of 5.08 kWh/km (divide 10.8 by the fuel consumption)
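
The conversions in these notes can be checked with a short sketch. The constant and function names are made up for illustration; the numbers are the ones quoted above.

```python
MPG_PER_KM_PER_LITRE = 2.352  # divisor quoted in the notes above
KWH_PER_LITRE = 10.8          # energy content of diesel quoted above


def km_per_litre(mpg):
    """Convert miles per gallon to kilometres per litre."""
    return mpg / MPG_PER_KM_PER_LITRE


def kwh_per_km(mpg):
    """Energy used per kilometre, given fuel economy in mpg."""
    return KWH_PER_LITRE / km_per_litre(mpg)


print(round(km_per_litre(5), 3))  # 2.126
print(round(kwh_per_km(5), 2))    # 5.08
```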

Emissions data is in this variable: ***emissions_data***

**Emissions Data dictionary**

Field------------------------------Data Type------------Description  
Emission Standard-------Alphanumeric------Euro III, IV, V or VI  
CO2-------------------------------Float--------------------grams of CO2 emitted per kWh  
Nox-------------------------------Float--------------------grams of NOx emitted per kWh  
PM--------------------------------Float--------------------grams of particulate matter emitted per kWh  
			
**Route information**  
The 132 route is 12.5km from end to end  
The 116 route is 15.25km  

#### **Task**  

Write a function that takes the miles per gallon and the route (LineRef) as parameters and calculates the emissions of each of the 3 pollutants for a return journey on that route.
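
One possible shape for such a function is sketched below, under a few assumptions: a return journey is taken to be twice the one-way route length, the rates default to the EURO III entry of `emissions_data`, and no intermediate rounding is applied. The names `ROUTE_LENGTH_KM`, `EURO_III_G_PER_KWH` and `return_journey_emissions` are hypothetical.

```python
# One-way route lengths in km, from the route information above
ROUTE_LENGTH_KM = {"116": 15.25, "132": 12.5}

# EURO III rates in grams per kWh, copied from emissions_data
EURO_III_G_PER_KWH = {"CO2": 2.1, "Nox": 5, "PM": 0.1}


def return_journey_emissions(mpg, line_ref, rates=EURO_III_G_PER_KWH):
    """Grams of each pollutant for an out-and-back trip on one route."""
    km_per_litre = mpg / 2.352                 # mpg to km per litre
    distance_km = 2 * ROUTE_LENGTH_KM[line_ref]  # there and back
    energy_kwh = distance_km / km_per_litre * 10.8  # litres burned times kWh per litre
    return {pollutant: round(rate * energy_kwh, 2)
            for pollutant, rate in rates.items()}


result_116 = return_journey_emissions(5, "116")
print(result_116)
```

Because this covers both directions and rounds only at the end, its figures will not match one-way values computed with rounding at each step.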

#### **Extension**  

Find all the 116 and 132 buses in the data set (a snapshot of what is on the road at that particular point in time).  

Count how many of these buses are Euro III.  

Then calculate the total emissions for each pollutant for all the buses you have found.

In [94]:
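def find_pollution_onebus_emits(bus_route,route_length):

  # Covers a one-way trip; bus_route is accepted but not used in the calculation
  # Litres of diesel burned at 2.126 km per litre (5 mpg)
  total_fuel_consumption=round(((route_length)/2.126),2)
  # Energy used in kWh at 10.8 kWh per litre
  total_energy_content=round((total_fuel_consumption*10.8),2)
  # EURO III rates from emissions_data, in grams per kWh
  CO2_emission=round((total_energy_content*2.1),2)
  Nox_emission=round((total_energy_content*5),2)
  PM_emission=round((total_energy_content*0.1),2)
  return(CO2_emission, Nox_emission,PM_emission)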
  
find_pollution_onebus_emits(116,15.25)
find_pollution_onebus_emits(132,12.5)


#Test for find_pollution_onebus_emits(116,15.25)
expected=(162.62,387.2,7.74)
actual=find_pollution_onebus_emits(116,15.25)
if actual == expected:
  print("Test passed!\nExpected: {}\nActual: {}".format(expected, actual))
else:
  print("Test failed!\nExpected: {}\nActual: {}".format(expected, actual))

# Test for find_pollution_onebus_emits(132,12.5)
expected=(133.35,317.5,6.35)
actual=find_pollution_onebus_emits(132,12.5)
if actual == expected:
  print("Test passed!\nExpected: {}\nActual: {}".format(expected, actual))
else:
  print("Test failed!\nExpected: {}\nActual: {}".format(expected, actual))


Test passed!
Expected: (162.62, 387.2, 7.74)
Actual: (162.62, 387.2, 7.74)
Test passed!
Expected: (133.35, 317.5, 6.35)
Actual: (133.35, 317.5, 6.35)


In [100]:
from collections import defaultdict
def find_euro3_lineref(bus_route):    

  new_Lineref=[]
  for bus_journey in bus_journeys:
    if bus_journey['LineRef']== bus_route:
     if bus_journey not in new_Lineref:
      new_Lineref.append(bus_journey)
  print("Number of records for ",bus_route," in bus_journeys :",len(new_Lineref))
  sorted_dict=defaultdict(list)
  for group,vehicle in zip(emission_standards,vehicle_refs):
    sorted_dict[group].append(vehicle)  
  reference_num=sorted_dict['EURO III']  
  polluting_bus=[] 
  for Lineref in new_Lineref:
   if Lineref['Ref'] in reference_num:
    if Lineref not in polluting_bus:  
     polluting_bus.append(Lineref)
  print("bus_route ",bus_route,"has",len(polluting_bus),"record in 'EURO III'")   
  print(polluting_bus,bus_route)


  pollution1=find_pollution_onebus_emits(132,12.5)
  pollution2=find_pollution_onebus_emits(116,15.25)
  total_pollution=((pollution1[0]+pollution2[0]),(pollution1[1]+pollution2[1]),(pollution1[2]+pollution2[2]))
  print("TOTAL POLLUTION : ",total_pollution)

find_euro3_lineref('132')
find_euro3_lineref('116')     

Number of records for  132  in bus_journeys : 9
bus_route  132 has 1 record in 'EURO III'
[{'LineRef': '132', 'DirectionRef': 'inbound', 'PublishedLineName': '132', 'OriginName': 'Waterfront_Bus_Station', 'DestinationName': 'Hempstead_Valley_Shopping_Centre', 'OriginAimedDepartureTime': '2022-09-07T12:06:00+00:00', 'Ref': '6411'}] 132
TOTAL POLLUTION :  (295.97, 704.7, 14.09)
Number of records for  116  in bus_journeys : 5
bus_route  116 has 1 record in 'EURO III'
[{'LineRef': '116', 'DirectionRef': 'outbound', 'PublishedLineName': '116', 'OriginName': 'Waterfront_Bus_Station', 'DestinationName': 'Hempstead_Valley_Shopping_Centre', 'OriginAimedDepartureTime': '2022-09-07T11:12:00+00:00', 'Ref': '6430'}] 116
TOTAL POLLUTION :  (295.97, 704.7, 14.09)
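
The element-wise addition of the per-route results can also be written with `zip`, which scales to any number of buses. A minimal sketch using the two result tuples from the test output above; the variable names are illustrative.

```python
# (CO2, Nox, PM) in grams, taken from the test output above
route_132 = (133.35, 317.5, 6.35)
route_116 = (162.62, 387.2, 7.74)

# zip pairs up the values for each pollutant; sum adds them
total = tuple(round(a + b, 2) for a, b in zip(route_132, route_116))
print(total)  # (295.97, 704.7, 14.09)

# The same idea for a list of any length
all_buses = [route_132, route_116]
total_all = tuple(round(sum(values), 2) for values in zip(*all_buses))
```

With one Euro III bus found on each route, both forms give the same total as the indexed addition.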


# Summary

Having completed the challenges, what new information do we know? What more might we try to find?

Add some ideas in the box below (double click to open it)

In Task 6, the `Ref` is missing from some `bus_journeys` records, which needs investigating. In the extension, I added the tuples element by element using their indexes; I should check for other ways of doing this.