### Solid Waste Engineering 2018
#### Semester project: Testing the probability of garbage

Master's students in Environmental Engineering at the École Polytechnique Fédérale de Lausanne test the hypothesis that litter densities on Lac Léman are predictable. The current method is based on the probability density function derived from the logarithm of the pieces of trash per meter (pcs/m) from over 100 samples.

This is a refresher on the basic skills needed to transform the data, not an introductory course.

References

1. Python for data analysis
2. Think stats: exploratory data analysis
3. [https://pandas.pydata.org](https://pandas.pydata.org)

#### Using a notebook:

1. "shift" + "enter" runs the current cell
2. Remember to use the dropdown menu "Cell" => "Run All" at the very beginning
3. How to use a notebook: [official doc](https://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Notebook%20Basics.html)

In [2]:
# import what you need

In [3]:
import pandas as pd #<----- methods for handling data similar to "R"
import numpy as np #<------ library of standard mathematical methods
import matplotlib #<------- library for creating graphs and plots
import matplotlib.pyplot as plt 
import re #<-------- REGEX
import os #<------ library to use basic operating system commands
from scipy.stats import norm #<-------- the "norm" method will be used quite a bit so get it by name 
import scipy.stats #<----------- for "almost all" of our statistical computing needs
import statsmodels.api as sm #<-------- what's not in scipy.stats is in here
import requests #<------- getting data from the "internet" - http protocols etc..
import json #<----- most of the data from the API is in JSON format

### Getting data from the API

In [4]:
# all the data is available through the api @ https://mwshovel.pythonanywhere.com/dirt/api_home.html
# the data is in JSON format => {property:value, property-two:value-two}, like a Python dictionary
# the first thing to do is identify the URL that will give the right data set

url = "http://mwshovel.pythonanywhere.com/dirt/daily-total/Lac-Léman/?format=json"#<--- at the end of the url we insist on JSON


In [5]:
#  with the url in hand the data can be requested
# in the prior cell a variable is created that has the value of the desired URL, nothing has happened yet
# use the requests library to "get" the URL, then call .json() on the response to decode the JSON
# more about the requests library: http://docs.python-requests.org/en/master/
# more about Python's json module: https://docs.python.org/3/library/json.html

data = requests.get(url).json()

In [6]:
# data is a list of dictionaries
# the form is : [{dictionary-one}, {dictionary-two}, ... {dictionary-n}]
# call the first record

data[0]

{'location': 'Baye-de-Montreux-G',
 'date': '2015-11-23',
 'length': 61,
 'total': 349}
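
The `.json()` call above decodes the response body into ordinary Python objects. As a minimal offline sketch of that same conversion, the standard `json` module can decode a JSON string directly. The string `raw` is a stand-in for the API response and holds only the single record shown above; the variable names are illustrative.

```python
import json

# Stand-in for the API response: one record copied from the output above.
raw = '[{"location": "Baye-de-Montreux-G", "date": "2015-11-23", "length": 61, "total": 349}]'

# A JSON array becomes a Python list, a JSON object becomes a Python dict.
records = json.loads(raw)
first = records[0]

print(type(records).__name__, type(first).__name__)
print(first["location"], first["total"])
```

Numbers arrive as `int` or `float` and text as `str`, so the values can be used in calculations without further conversion.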

### Arrays and dictionaries

#### Dictionaries:

In [7]:
# basics on Python dictionaries: https://docs.python.org/3.7/tutorial/datastructures.html#dictionaries
aDict = data[0]

# Question: What information is stored in aDict?
aDictKeys = aDict.keys()
# a dictionary is a collection of matched pairs in this case: "location:location-name, date:date-of-survey etc..."

# calling:
aDictKeys
# gives the list of 'keys' or 'property-names' of the dictionary

dict_keys(['location', 'date', 'length', 'total'])

In [8]:
# notice that aDictKeys (the result of the .keys() method) is not a list
type(aDictKeys)

dict_keys

In [9]:
# which means you can't index the keys like this
# aDictKeys[0]
# it will throw a TypeError: "'dict_keys' object is not subscriptable"
# (older Python 3 versions word it as "'dict_keys' object does not support indexing")
# so if you need to use the keys as variables you need to get them another way
# or turn them into a list

In [10]:
# like this
aListDictKeys = list(aDictKeys)
aListDictKeys

['location', 'date', 'length', 'total']
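
Converting to a list is only needed when you want to index the keys by position. As a small sketch of the other common patterns, the example below loops over a dictionary directly and uses `.items()` to get keys and values together. The record is copied from the output above; the extra key `pcs_m` is hypothetical and is added only to show that the keys view tracks changes to the dictionary.

```python
# One record copied from the API output shown earlier.
aDict = {'location': 'Baye-de-Montreux-G', 'date': '2015-11-23', 'length': 61, 'total': 349}

# Looping over a dictionary yields its keys, no list() needed.
for key in aDict:
    print(key, '=>', aDict[key])

# .items() yields (key, value) pairs in one step.
pairs = [(key, value) for key, value in aDict.items()]

# The keys view is "live": it reflects later changes to the dictionary.
keys_view = aDict.keys()
aDict['pcs_m'] = aDict['total'] / aDict['length']  # hypothetical extra key, for illustration only
print('pcs_m' in keys_view)
```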

#### Using dictionaries

This is not a basic course. The idea is to show the likely use case in the current application (probability of garbage) and to refresh everybody's memory!

In [11]:
# extract all the results for one location from data:
# couple of ways to do that, first identify a location: "Veveyse"
# veveyse = {"Veveyse":[]}
results = [(x['date'],x['total'],x['length']) for x in data if x['location'] == 'Veveyse']
Veveyse = {"Veveyse":results}
Veveyse

{'Veveyse': [('2015-11-27', 216, 53),
  ('2015-12-01', 52, 53),
  ('2015-12-07', 193, 53),
  ('2015-12-14', 129, 53),
  ('2016-01-08', 147, 53),
  ('2016-01-15', 145, 53),
  ('2016-01-21', 144, 53),
  ('2016-02-09', 126, 53),
  ('2016-03-11', 245, 53),
  ('2016-04-02', 243, 53),
  ('2016-04-12', 172, 53),
  ('2016-04-19', 248, 53),
  ('2016-06-17', 285, 53),
  ('2016-11-14', 211, 62),
  ('2016-11-28', 303, 47),
  ('2016-12-05', 285, 47),
  ('2017-01-05', 292, 43)]}
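
The project works with the logarithm of pieces per meter (pcs/m), so a natural next step is to derive that value from each tuple. The sketch below uses the first three Veveyse records from the output above. It assumes that pcs/m is simply `total` divided by `length`, that `length` is in meters, and that the natural logarithm is the one intended; none of these is stated explicitly by the API output.

```python
import math

# First three Veveyse records copied from the output above: (date, total, length)
results = [('2015-11-27', 216, 53), ('2015-12-01', 52, 53), ('2015-12-07', 193, 53)]

# Assumption: pcs/m = total pieces counted / length of shoreline surveyed.
pcs_m = [total / length for (date, total, length) in results]

# Assumption: natural logarithm (base e).
log_pcs_m = [math.log(value) for value in pcs_m]

for (date, _, _), value, log_value in zip(results, pcs_m, log_pcs_m):
    print(date, round(value, 2), round(log_value, 2))
```

A survey with fewer than one piece per meter, such as the second record, gives a negative logarithm.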

In [12]:
# so from there we can do the same for all the data in our variable:
# first get all the location names into a dictionary:
locationData = {x['location']:[] for x in data}
# get all the location names in a list
locationKeys = list(locationData.keys())
# write a simple for loop to capture all that:

def makeLocationData():
    for a in locationKeys:
        results = [(x['date'],x['total'],x['length']) for x in data if x['location'] == a ]
        locationData.update({a:results})

makeLocationData()

In [13]:
# so now you can call the location data like this:
locationData['Baye-de-Montreux-G']

[('2015-11-23', 349, 61),
 ('2015-12-04', 511, 61),
 ('2015-12-10', 308, 61),
 ('2015-12-17', 257, 61),
 ('2015-12-30', 358, 61),
 ('2016-01-13', 388, 61),
 ('2016-01-19', 75, 61),
 ('2016-02-11', 114, 61),
 ('2016-03-10', 188, 61),
 ('2016-06-22', 220, 61),
 ('2016-07-15', 330, 61),
 ('2016-12-16', 384, 31)]
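
`makeLocationData()` scans the whole of `data` once for every location. An alternative, sketched here with `collections.defaultdict` from the standard library, builds the same kind of dictionary in a single pass. The three records in `data` are a small stand-in rebuilt from the outputs above, not the full API response.

```python
from collections import defaultdict

# Small stand-in for the API response, rebuilt from the outputs above.
data = [
    {'location': 'Baye-de-Montreux-G', 'date': '2015-11-23', 'length': 61, 'total': 349},
    {'location': 'Veveyse', 'date': '2015-11-27', 'length': 53, 'total': 216},
    {'location': 'Veveyse', 'date': '2015-12-01', 'length': 53, 'total': 52},
]

# A missing key starts as an empty list, so each record can be appended directly.
grouped = defaultdict(list)
for record in data:
    grouped[record['location']].append((record['date'], record['total'], record['length']))

# Convert back to a plain dict so a mistyped location raises KeyError.
locationData = dict(grouped)
print(locationData['Veveyse'])
```

The result has the same shape as before: location names as keys and lists of `(date, total, length)` tuples as values.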

### Your turn:

1. Use the list "locationKeys" to call some other values out of the dictionary "locationData"
2. Get data from Zurichsee using this URL: "https://mwshovel.pythonanywhere.com/dirt/daily-total/Zurichsee/?format=json"
3. Make a dictionary like locationData
4. Call a few values

In [14]:
# getting values from locationKeys
locationKeys[2]
# remember the value of locationKeys will change when you use the data from Zurichsee 

'Grand-Clos'