<a href="https://colab.research.google.com/github/futureCodersSE/python-programming-for-data/blob/main/Projects/Bus_data_dicts_lists_challenges.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bus emissions challenge 
---


### Introduction

Kent and Medway have the highest proportion of old buses in the country (about 40% of the fleet). Older buses meet only the Euro III emissions standard, so if they are used for many journeys they will significantly worsen the air quality of the area.

The client would therefore like us to gather information that could be used as evidence to make the case for improving bus emissions in the Kent and Medway area.

The datasets we will be using are publicly available. Gov.uk provides data on all bus journeys in the UK. Combined with Arriva's bus fleet emissions data (available from bustimes.org, download [here](https://drive.google.com/uc?export=download&id=1ywtiSwR27JYCC5Sf9G1ZCTOTWNxWBk9_)), it gives a good picture of how many of these old buses are being used for journeys in Kent and Medway.

The gov.uk bus data is available in XML format via an API. The data refreshes every 10 seconds, so each download is a snapshot of the buses in operation at that moment. We have downloaded a snapshot and converted it to JSON, which you can download [here](https://drive.google.com/uc?export=download&id=1a9vMs0Kke7Nh4LuxCnKHkVIkFDr-az_Z).





### Load the data
---
#### **Please run the cell below to load the data required for this challenge.**  
The following code reads both the JSON file and the bus emissions CSV file, then creates `bus_journeys` (a list of dictionaries, one per journey) and two lists (`vehicle_refs`, `emissions`).


In [None]:
import json
import urllib.request

import pandas as pd

# Snapshot of live bus journeys (gov.uk data, converted from XML to JSON)
url_json = "https://drive.google.com/uc?export=download&id=1a9vMs0Kke7Nh4LuxCnKHkVIkFDr-az_Z"
# Arriva fleet emissions data from bustimes.org
csv = "https://drive.google.com/uc?export=download&id=1ywtiSwR27JYCC5Sf9G1ZCTOTWNxWBk9_"

PREFIX = "MonitoredVehicleJourney."


def get_saved_data(url_json):
    """Download the JSON snapshot and parse it into Python objects."""
    if url_json is not None:
        try:
            with urllib.request.urlopen(url_json) as url:
                return json.loads(url.read().decode())
        except Exception as err:
            print(f"An error occurred while reading the file: {err}")
            raise


def get_dicts_lists():
    # Flatten the nested JSON so each journey becomes one row
    df = pd.json_normalize(get_saved_data(url_json))
    regs = pd.read_csv(csv)

    # Keep only the journey fields needed for the challenge
    columns = ["LineRef", "DirectionRef", "PublishedLineName", "OriginName",
               "DestinationName", "OriginAimedDepartureTime", "VehicleRef"]
    bus = df[[PREFIX + col for col in columns]].copy()
    # Remove the exact prefix from each column name (str.lstrip would strip
    # a *set of characters* instead, turning "VehicleRef" into "Ref")
    bus.columns = bus.columns.str.replace(PREFIX, "", regex=False)
    # One dictionary per journey
    bus_journeys = bus.to_dict("records")

    regs = regs.rename(columns={"Last tracked": "VehicleRef"})
    vehicle_refs = regs["VehicleRef"].to_list()
    emissions = regs["Emission Class"].to_list()
    return bus_journeys, vehicle_refs, emissions


bus_journeys, vehicle_refs, emissions = get_dicts_lists()
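
When trimming the `MonitoredVehicleJourney.` prefix from column names, note that Python's `str.lstrip` treats its argument as a set of characters, not as a prefix. The small sketch below compares it with `str.removeprefix` (Python 3.9+) on three of the column names selected above; the `columns` and `prefix` variables are just for illustration.

```python
columns = [
    "MonitoredVehicleJourney.LineRef",
    "MonitoredVehicleJourney.OriginName",
    "MonitoredVehicleJourney.VehicleRef",
]
prefix = "MonitoredVehicleJourney."

# lstrip removes leading characters that appear anywhere in `prefix`,
# so it keeps going into "Vehicle" because V, e, h, i, c, l are all in the set
stripped = [col.lstrip(prefix) for col in columns]
# removeprefix removes the exact prefix once, if present
trimmed = [col.removeprefix(prefix) for col in columns]

print(stripped)  # ['LineRef', 'OriginName', 'Ref']
print(trimmed)   # ['LineRef', 'OriginName', 'VehicleRef']
```

For a pandas `Index`, `.str.replace(prefix, "", regex=False)` gives the same result as `removeprefix` here, because the prefix only ever appears at the start of each name.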


### Task 1
---


Take a look at `bus_journeys`
* How is it structured?
* Print the first and last records



**Expected Output**   
first record will have `LineRef` 176  
last record will have `LineRef` 347  
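
One way to approach this is a minimal sketch on a small hypothetical sample shaped like the output of `to_dict("records")`. The field values (and storing `LineRef` as a string) are made up for illustration; run the same steps on the real `bus_journeys`.

```python
# Hypothetical sample: a list with one dictionary per journey
sample_journeys = [
    {"LineRef": "176", "OriginName": "Town A", "DestinationName": "Town B"},
    {"LineRef": "116", "OriginName": "Town C", "DestinationName": "Town D"},
    {"LineRef": "347", "OriginName": "Town E", "DestinationName": "Town F"},
]

# The outer container is a list, and each element is a dictionary
print(type(sample_journeys).__name__, type(sample_journeys[0]).__name__)
print(sample_journeys[0].keys())

# Index 0 is the first record; index -1 is the last, whatever the length
first_record = sample_journeys[0]
last_record = sample_journeys[-1]
print(first_record)
print(last_record)
```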

### Task 2 
---
Take a look at the `vehicle_refs` and `emissions` lists
* What is the length of each list?
* Find how many unique items there are in the `emissions` list (*hint: you will need to create another list and use a for loop*).
* Print the unique emissions items.
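
The for-loop approach from the hint can be sketched as follows. `sample_vehicle_refs` and `sample_emissions` are hypothetical stand-ins for the real lists, and the emission labels are assumed to follow the format in the Task 5 example.

```python
# Hypothetical stand-ins for vehicle_refs and emissions
sample_vehicle_refs = [101, 102, 103, 104, 105]
sample_emissions = ["EURO III", "EURO VI", "EURO III", "EURO V", "EURO VI"]

print(len(sample_vehicle_refs), len(sample_emissions))

# Build a second list, adding each emission class only the first time it appears
unique_emissions = []
for emission in sample_emissions:
    if emission not in unique_emissions:
        unique_emissions.append(emission)

print(len(unique_emissions))
print(unique_emissions)
```

`set(emissions)` would give the same items in one step, but without a guaranteed order. The loop keeps them in the order they are first seen.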

### Task 3 
---
The client is concerned only with bus routes 116 and 132.
Create a new list of dictionaries containing only the records where `LineRef` is either 116 or 132.

*hint: the datatype of the LineRef might not be what you expect*
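
A minimal filtering sketch on hypothetical records follows. Storing `LineRef` as a string is an assumption here. Converting each value with `str()` before comparing makes the filter work whether the value was loaded as `"116"` or `116`.

```python
# Hypothetical sample records (values made up for illustration)
sample_journeys = [
    {"LineRef": "176", "VehicleRef": "101"},
    {"LineRef": "116", "VehicleRef": "102"},
    {"LineRef": "132", "VehicleRef": "103"},
    {"LineRef": "347", "VehicleRef": "104"},
]

# Compare as strings so the check does not depend on how LineRef was loaded
target_lines = {"116", "132"}

selected_journeys = []
for record in sample_journeys:
    if str(record["LineRef"]) in target_lines:
        selected_journeys.append(record)

print(len(selected_journeys))
print(selected_journeys)
```

The loop is equivalent to the list comprehension `[r for r in sample_journeys if str(r["LineRef"]) in target_lines]`.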

### Task 4 
---

Each index in `vehicle_refs` lines up with the same index in `emissions`.   
Create a dictionary where each vehicle ref is a key and its corresponding emission class is the value.

*hint: you will need to use a for loop and indexing*
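
Following the hint, one possible sketch loops over the index positions of two hypothetical parallel lists (stand-ins for `vehicle_refs` and `emissions`).

```python
# Hypothetical parallel lists
sample_vehicle_refs = [101, 102, 103, 104, 105]
sample_emissions = ["EURO III", "EURO VI", "EURO III", "EURO V", "EURO VI"]

# Position i in one list lines up with position i in the other
emission_by_vehicle = {}
for i in range(len(sample_vehicle_refs)):
    emission_by_vehicle[sample_vehicle_refs[i]] = sample_emissions[i]

print(emission_by_vehicle[103])  # EURO III
```

`dict(zip(vehicle_refs, emissions))` builds the same dictionary without explicit indexing. With either approach, a vehicle ref that appears more than once keeps only its last emission value.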

### Task 5 
--- 

The dictionary you created in the last task is very long. A more compact way to hold this data is a dictionary of lists.

Create a dictionary where each unique emission class is a key and the value is a list of all the vehicle refs with that emission class.

**Example Output**

{"EURO III": [1234, 4567, 8910], "EURO IV": [1028, 1283, 1234]}
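
A minimal grouping sketch, starting from a hypothetical vehicle-to-emission dictionary like the one built in Task 4, could look like this.

```python
# Hypothetical output of Task 4
sample_emission_by_vehicle = {
    101: "EURO III", 102: "EURO VI", 103: "EURO III", 104: "EURO V", 105: "EURO VI",
}

vehicles_by_emission = {}
for vehicle_ref, emission in sample_emission_by_vehicle.items():
    # Start an empty list the first time an emission class is seen
    if emission not in vehicles_by_emission:
        vehicles_by_emission[emission] = []
    vehicles_by_emission[emission].append(vehicle_ref)

print(vehicles_by_emission)
```

The `if ... not in` check can be shortened with `vehicles_by_emission.setdefault(emission, []).append(vehicle_ref)`, or replaced entirely by `collections.defaultdict(list)`.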

### Task 6
---
Find all the polluting buses that were running when the data was collected.   
Using `bus_journeys`, find all the records where a Euro III bus was used.

The dictionary you created in the last task tells you which vehicle refs are polluting.

* Create a new list of dictionaries containing only the records from `bus_journeys` that used a polluting bus.
* How many polluting buses were being used?
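
The sketch below runs on hypothetical data and makes several assumptions. Each journey is assumed to store its vehicle reference under a `VehicleRef` key, so check the keys you printed in Task 1 and adjust if yours differ. The polluting class is assumed to be labelled `"EURO III"`. Refs are compared as strings in case one source stores them as numbers and the other as text.

```python
# Hypothetical inputs: a grouped dictionary like Task 5's and a few journeys
sample_vehicles_by_emission = {"EURO III": [101, 103], "EURO VI": [102, 105], "EURO V": [104]}
sample_journeys = [
    {"LineRef": "176", "VehicleRef": "101"},
    {"LineRef": "116", "VehicleRef": "102"},
    {"LineRef": "132", "VehicleRef": "103"},
    {"LineRef": "347", "VehicleRef": "104"},
]

# A set gives fast membership checks; refs are normalised to strings
polluting_refs = {str(ref) for ref in sample_vehicles_by_emission.get("EURO III", [])}

polluting_journeys = []
for record in sample_journeys:
    if str(record["VehicleRef"]) in polluting_refs:
        polluting_journeys.append(record)

# Number of journey records that used a polluting bus
print(len(polluting_journeys))
# Number of distinct polluting vehicles, in case one bus appears in several records
print(len({str(record["VehicleRef"]) for record in polluting_journeys}))
```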
