# Introduction

This notebook contains a basic primer on Python and associated computational tools.

## What is JupyterLab?

JupyterLab is a powerful, multi-purpose, web-based computational environment, particularly suited to working with Jupyter computational notebooks.

A Jupyter Notebook is both a format and a tool for working and experimenting with computational tools and libraries. It is composed of editable cells that can contain live code, equations, visualizations and narrative text.

A cell can be edited by clicking in it and then executed by pressing Shift+Enter.

## What is Python?

Python is a general-purpose, interpreted programming language.

### It can be used as a calculator right away

In [None]:
1000 / 25

Standard operators can be used:
- Addition and subtraction (+ and -)
- Multiplication and division (* and /)
- Exponentiation (**) and remainder operation (%)

As in standard mathematical notation, parentheses indicate operator precedence.

In [None]:
(2 + 7) * 4 - 7
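
As a small sketch of the operators listed above (the variable names are chosen only for illustration), note that `/` always returns a floating point number and that parentheses change the order of evaluation:

```python
# variable names here are illustrative only
quotient = 7 / 2          # division always returns a float
remainder = 7 % 2         # remainder of the division
power = 2 ** 10           # exponentiation

without_parentheses = 2 + 7 * 4      # multiplication is evaluated first
with_parentheses = (2 + 7) * 4       # the parenthesized sum is evaluated first

print(quotient, remainder, power)
print(without_parentheses, with_parentheses)
```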

### But also a fully-featured programming language

As tradition mandates, we can start with a [Hello World program](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program):

In [None]:
print("Hello World!")

Some basic features and syntax:
- indentation matters
- print(something) writes text to output
- lines that start with # are comments

In Jupyter notebooks, the value of the last expression in a cell is automatically displayed.
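
A minimal sketch of these three features together is shown here. The `temperature` value is a made-up number used only for illustration. The indented lines belong to the `if` or `else` branch above them, so changing the indentation changes the meaning of the program.

```python
# This line is a comment and is ignored by the interpreter.
temperature = 31  # hypothetical value, used only for illustration

if temperature > 30:
    # the indented lines form the body of the if branch
    message = "It is hot"
else:
    message = "It is not hot"

# print writes the text to the output
print(message)
```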

### Variable assignment

In [None]:
# values are assigned to a variable name with the equals sign
current_bitcoin_price = 39860.46
number_of_students = 15

print(current_bitcoin_price/number_of_students)

### Basic Types

In [None]:
# boolean
is_important = True

In [None]:
# integer and floating point
integer_a = 10
float_b = 2.71

In [None]:
# strings
string_c = "this is a string"

The rules are very similar to those of most other modern programming languages.

In [None]:
# comparison operators work as expected
integer_a > 20

In [None]:
# strings have some useful functionality built-in
print("The string has a length of " + str(len(string_c)) + " characters")
print("In uppercase, it would be written as " + string_c.upper())
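
Two related tools may be useful here: the built-in `type` function reports the type of a value, and an f-string is an alternative to concatenating with `str()`. The sketch below repeats the same string value so that it can run on its own.

```python
string_c = "this is a string"

# type() reports the type of any value
print(type(True), type(10), type(2.71), type(string_c))

# an f-string embeds expressions directly, so str() is not needed
length_message = f"The string has a length of {len(string_c)} characters"
print(length_message)
```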

### Data Structures

The most important data structures in Python are lists and dictionaries.

In [None]:
# a list stores multiple items in a single variable
fruit_list = ["apple", "banana", "cherry"]

In [None]:
# a Python list can hold a mix of values of different types
weird_list = [28, "Male", 1.93]

In [None]:
# lists have some useful built-in functionality
fruit_list.append("apricot")
fruit_list.sort(reverse=True)
fruit_list

In [None]:
# we can also use an index to access a specific value
# indexing starts at 0, so index 1 is the second item
fruit_list[1]
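
Beyond a single positive index, lists also accept negative indexes and slices. The sketch below uses a separate list, `example_fruits` (a name chosen only for this illustration), so it does not change `fruit_list`.

```python
# example_fruits is an illustrative list, separate from fruit_list
example_fruits = ["apple", "banana", "cherry"]

first_fruit = example_fruits[0]     # indexing starts at 0
last_fruit = example_fruits[-1]     # negative indexes count from the end
first_two = example_fruits[0:2]     # a slice includes the start index but not the end index

print(first_fruit, last_fruit, first_two)
```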

In [None]:
# dictionaries, on the other hand, are used to store data by key
# they are usually defined like this
car_dictionary = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}

In [None]:
# values can be assigned by key
car_dictionary["color"] = "red"
car_dictionary["plate"] = "2495GKS"
car_dictionary

In [None]:
# and also accessed by key
car_dictionary["model"]
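
Accessing a key that does not exist with square brackets raises a `KeyError`. A short sketch of some ways to work around that is shown here, using an illustrative dictionary named `example_car` so that `car_dictionary` is left untouched.

```python
# example_car is an illustrative dictionary, separate from car_dictionary
example_car = {"brand": "Ford", "model": "Mustang", "year": 1964}

# the in operator checks whether a key exists
has_color = "color" in example_car

# get returns a default value instead of raising an error for a missing key
color = example_car.get("color", "unknown")

# items lets us loop over keys and values together
for key, value in example_car.items():
    print(key, "->", value)
```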

### Control Structures and Functions

Additional power comes from the use of control structures and functions to create composable and general programs.

Python implements the usual control statements (e.g. if, if..else, for loop and while loop) and also the ability to define functions.


In [None]:
price = 200

if price < 0.01:
    print("It's too cheap")
elif price > 200:
    print("It's too expensive")
else:
    print("It's OK")

In [None]:
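# for loops iterate over the items of any iterable, such as a list
capitalized_fruit_list = []
for fruit in fruit_list:
    # capitalize the first letter and keep the rest of the word
    capitalized_fruit = fruit[0].upper() + fruit[1:]
    print(capitalized_fruit)
    capitalized_fruit_list.append(capitalized_fruit)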
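
The while loop mentioned above repeats its body for as long as a condition holds. A minimal sketch, with illustrative variable names:

```python
countdown = 3
visited = []

# the body runs until the condition becomes False
while countdown > 0:
    visited.append(countdown)
    countdown -= 1   # without this line the loop would never end

print(visited)
```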

In [None]:
# this is a function
def is_old_car(car_dictionary):
    
    if car_dictionary["year"] < 1992:
        return True
    else:
        return False

In [None]:
# data structures and functions can be combined to create very powerful abstractions
other_car_dictionary = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 14
}

is_old_car(other_car_dictionary)
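
Functions can also take parameters with default values, which makes them more general. The following is only a sketch: `is_older_than` and `threshold_year` are hypothetical names introduced for this illustration, not part of the notebook above.

```python
# is_older_than and threshold_year are hypothetical names for illustration
def is_older_than(car, threshold_year=1992):
    """Return True if the car was built before threshold_year."""
    return car["year"] < threshold_year

example_car = {"brand": "Ford", "model": "Mustang", "year": 1964}

uses_default = is_older_than(example_car)          # compares against 1992
uses_custom = is_older_than(example_car, 1950)     # compares against 1950
print(uses_default, uses_custom)
```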

### Libraries

Part of the popularity of Python comes from its large number of libraries, which support many use cases in different domains (data analytics, text processing, web development, etc).

Some of these libraries are included with Python (the standard library), while others are developed and maintained externally.

In [None]:
# this is how you import a library, in this case one from the Python standard library
import math

# calculate the tangent of 2 (in radians)
math.tan(2)

In [None]:
# another standard library to generate random numbers
import random

# generates a random float between 0 (included) and 1 (excluded)
random.random()
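
Random numbers change on every run. When reproducible results are wanted, one option is to set a seed first. In this sketch the seed value 42 is arbitrary.

```python
import random

random.seed(42)   # 42 is an arbitrary seed value
first_run = [random.random() for _ in range(3)]

random.seed(42)   # the same seed gives the same sequence again
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)

# randint returns a random integer, with both limits included
dice_roll = random.randint(1, 6)
print(dice_roll)
```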

In [None]:
# here are, for example, some commonly used external libraries for data analysis
import numpy as np
import matplotlib.pyplot as plt

# this generates 100 evenly spaced numbers from 0 to 10
x = np.linspace(0, 10, 100)
# we can calculate the sine of those numbers
y = np.sin(x)

# and plot it
plt.plot(x, y);

## Example Real Data Application

We are going to carry out a basic data analysis of waste containers in the city of Santander. Our final goal is to tell the people in charge of collecting the containers which ones should be picked up today and which should not.

We are going to use data from [Santander city open data](http://datos.santander.es/).

In [None]:
# this is a useful library for making HTTP requests
import requests

In [None]:
# this asks for the data
response = requests.get("https://datos.santander.es/rest/datasets/residuos_contenedores.json")
response

In [None]:
# example response
print(response.text[0:1000], "...")

In [None]:
# we can parse the response as JSON and read some of its attributes
response_summary = response.json()["summary"]
print(response_summary)
total_pages = response_summary["pages"]
items = response_summary["items"]

In [None]:
# because this is a paged API, we need to make several requests to get all the data

all_resources = []

for page_index in range(total_pages):
    page_number = page_index + 1
    response = requests.get("https://datos.santander.es/rest/datasets/residuos_contenedores.json",
                            params={"page": page_number})
    all_resources += response.json()["resources"]
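
The page-by-page logic can be sketched without network access. In the sketch below, `FAKE_PAGES` and `fetch_page` are hypothetical stand-ins for the real request. They only imitate the `summary` and `resources` keys used above and do not reflect the real dataset.

```python
# FAKE_PAGES and fetch_page are hypothetical stand-ins for the real API
FAKE_PAGES = {
    1: ["container-1", "container-2"],
    2: ["container-3", "container-4"],
    3: ["container-5"],
}

def fetch_page(page_number):
    """Imitate one paged response with 'summary' and 'resources' keys."""
    return {
        "summary": {"pages": len(FAKE_PAGES), "items": 5},
        "resources": FAKE_PAGES[page_number],
    }

# the first response tells us how many pages there are
sketch_total_pages = fetch_page(1)["summary"]["pages"]

sketch_resources = []
# range(1, n + 1) yields the page numbers 1 to n directly
for page_number in range(1, sketch_total_pages + 1):
    sketch_resources += fetch_page(page_number)["resources"]

print(len(sketch_resources), "items collected")
```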


In [None]:
print("A total of "+ str(len(all_resources)) + " containers found")
all_resources[0:3]

In [None]:
# we are going to use pandas, one of the most widely used libraries for data analysis
import pandas as pd

In [None]:
# the basic abstraction of the library is the DataFrame, which is like a table
data_frame = pd.DataFrame(all_resources)
data_frame

In [None]:
# this provides some information about the loaded data
data_frame.info()

In [None]:
# convert numeric columns manually to numeric types
data_frame["ayto:nivelLlenado"] = pd.to_numeric(data_frame["ayto:nivelLlenado"])
data_frame["ayto:temperatura"] = pd.to_numeric(data_frame["ayto:temperatura"])
data_frame["ayto:capacidad"] = pd.to_numeric(data_frame["ayto:capacidad"])
# also for latitude and longitude
data_frame["ayto:latitud"] = pd.to_numeric(data_frame["ayto:latitud"])
data_frame["ayto:longitud"] = pd.to_numeric(data_frame["ayto:longitud"])
# convert datetime columns to datetime types
data_frame["dc:modified"] = pd.to_datetime(data_frame["dc:modified"])
data_frame["ayto:fechaAlta"] = pd.to_datetime(data_frame["ayto:fechaAlta"])
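
By default `pd.to_numeric` raises an error when it finds a value it cannot parse. Whether the real dataset contains such values is not known here. The sketch below uses a small made-up table (the `level` column and its values are hypothetical) to show how `errors="coerce"` turns unparseable values into missing values instead.

```python
import pandas as pd

# a small made-up table; the column name and values are hypothetical
sample = pd.DataFrame({"level": ["10", "85.5", "n/a"]})

# errors="coerce" replaces values that cannot be parsed with NaN
sample["level"] = pd.to_numeric(sample["level"], errors="coerce")
print(sample["level"])

# count how many values could not be converted
missing_count = int(sample["level"].isna().sum())
print(missing_count, "value(s) could not be converted")
```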


In [None]:
# pandas has a lot of useful functionality built-in
# such as computing the max or the min
data_frame["ayto:temperatura"].max()

In [None]:
# or the mean
data_frame["ayto:nivelLlenado"].mean()

In [None]:
# we can also sort the table by any key
data_frame.sort_values("dc:modified", ascending=False)

In [None]:
# we can also group by certain variables and compute summary statistics
# selecting the column before describe() avoids summarizing columns we do not need
data_frame.groupby("ayto:residuo")["ayto:nivelLlenado"].describe()

In [None]:
# we can also create useful visualizations very simply
data_frame.hist(["ayto:nivelLlenado","ayto:temperatura"], figsize=(10,5))

In [None]:
data_frame.groupby(["ayto:residuo","ayto:capacidad"]).count()["dc:identifier"].plot.bar()

In [None]:
# we can filter using simple expressions
is_envases = data_frame["ayto:residuo"] == "Envases"
envases_data_frame = data_frame.loc[is_envases]

In [None]:
# then further filter for example for those that are filled more than 90%
filled_envases = envases_data_frame["ayto:nivelLlenado"] > 90
filled_envases_data_frame = envases_data_frame.loc[filled_envases]
filled_envases_data_frame
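
The two filtering steps above can also be combined into one using the `&` operator. The sketch below uses a small made-up table with simplified, hypothetical column names (`residuo` and `nivel`) rather than the real data. Each condition needs its own parentheses.

```python
import pandas as pd

# a small made-up table; column names and values are hypothetical
sample_containers = pd.DataFrame({
    "residuo": ["Envases", "Papel", "Envases", "Papel"],
    "nivel": [95.0, 97.0, 40.0, 10.0],
})

# & combines two conditions row by row
is_full_envases = (sample_containers["residuo"] == "Envases") & (sample_containers["nivel"] > 90)

full_envases = sample_containers.loc[is_full_envases]
print(full_envases)
```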

In [None]:
# there are many more visualizations and analyses possible
# another useful thing is visualizing the data on a map
# we can use, for example, the folium library
import folium

# this is an empty map centered on Santander
location = (43.471198, -3.801362)
m = folium.Map(location=location,
               zoom_start=13)
m

In [None]:
# we can create another map with the locations of the filled envases containers

# this is an empty map centered on Santander
location = (43.471198, -3.801362)
m = folium.Map(location=location,
               zoom_start=13)

for idx, filled_envase in filled_envases_data_frame.iterrows():
    latitude_longitude = (filled_envase["ayto:latitud"],filled_envase["ayto:longitud"])
    fill_level = filled_envase["ayto:nivelLlenado"]
    address = filled_envase["ayto:direccion"]
    text = f"{address} \n Fill level: {fill_level:.1f}"
    marker = folium.Marker(latitude_longitude, popup=text)
    marker.add_to(m)
    
m

### Exercise: Which Paper Containers Should Be Picked Up Today?

For example, let us assume we want to pick up all paper containers with an **ayto:nivelLlenado** greater than 95. Produce a map with all the containers that should be picked up.

In [None]:
# write your solution here

In [None]:
# several cells can be used