# Analyzing

In this jupyter notebook the dataframe obtained from 'code2_cleaning' will be analized. The objectives are:

- Mosquito per day obtention.

- API (Application Programming Interface) use to get climate data. API is a server that you can use to retrieve and send data to using code. APIs are most commonly used to retrieve data.

- Explain some basic results.

In [6]:
# Data treatment
# ------------------------------------------------------------------------------
import numpy as np
import pandas as pd
from datetime import date, datetime
import holidays
import requests #To check if an API link works
from IPython.core.interactiveshell import InteractiveShell #Show more than one output per cell
InteractiveShell.ast_node_interactivity = "all"
import ast

# API for accessing open weather and climate data
# ------------------------------------------------------------------------------
from meteostat import Point, Daily

# Graphs
# ------------------------------------------------------------------------------
import matplotlib.pyplot as plt
import seaborn as sns

# Warnings configuration
# ------------------------------------------------------------------------------
import warnings
warnings.filterwarnings('once')

# Library to create pickle files.
# ------------------------------------------------------------------------------
import pickle
import os

# Progress bar
# # ------------------------------------------------------------------------------
from tqdm import tqdm



In [7]:
# To show all the columns of our dataframe.
pd.options.display.max_columns=None

In [8]:
# Create the first dataframe containing the data from the csv obtained from Gbif.
df_0 = pd.read_csv('../data/mosquito1_clean.csv', index_col=0)

# Check the first three rows to see how this dataframe looks like.
df_0.head(3)

Unnamed: 0,event_date,year,month,day,country_code,latitude,longitude,witness,issue
0,2022-11-04,2022,11,4,ES,41.51019,2.24589,Roger Eritja,CONTINENT_DERIVED_FROM_COORDINATES
1,2021-08-27,2021,8,27,IT,44.40289,8.98775,Karin Bakran-Lebl;Ana Klobucar;UNIROMA1;Roger ...,CONTINENT_DERIVED_FROM_COORDINATES
3,2022-08-11,2022,8,11,IT,41.70922,12.78512,UNIROMA1;Eleonora Longo;Francesco Severini;Rog...,CONTINENT_DERIVED_FROM_COORDINATES


In [79]:
# Set time period
start = datetime(2018, 1, 1)
end = datetime(2018, 12, 31)

# Get daily data
data = Daily('10637', start, end)
data = data.fetch()

# Check if API works properly. 
response = requests.get(url=url)
state_code = response.status_code
state_reason = response.reason

if state_code == 200:
    print('The petition has been properly done, status code:',state_code,'reason:',state_reason)
elif state_code == 402:
    print('The user could not be authorized, status code:', state_code,'reason:',state_reason)
elif state_code == 404:
    print('Something went wrong, the request was not found, status code:', state_code,'reason:',state_reason)
else:
    print('Something unexpected happened, status code:', state_code,'reason:',state_reason)


Something unexpected happened, status code: 401 reason: 


In [164]:
def get_weather(date,latitude,longitude):
    # Function to obtain past weather and climate data

    #API use:
    # Set time period
    start = pd.to_datetime(date, format='%Y-%m-%d')
    end = pd.to_datetime(date, format='%Y-%m-%d')

    # Create Point (Automatically select weather stations by geographic location)
    location = Point(latitude, longitude)

    # Get daily data
    data = Daily(location, start, end)
    data = data.fetch().reset_index()

    return data[['tavg', 'tmin', 'tmax', 'prcp', 'snow', 'wdir', 'wspd', 'wpgt','pres', 'tsun']]

| Column  | Description                                                                         | Type    |
|---------|-------------------------------------------------------------------------------------|---------|
| station | The Meteostat ID of the weather station (only if query refers to multiple stations) | String  |
| start   | The first year (YYYY) of the reference period                                       | Integer |
| end     | The last year (YYYY) of the reference period                                        | Integer |
| month   | The month, represented as an integer                                                | Integer |
| tavg    | The mean air temperature in °C                                                      | Float64 |
| tmin    | The mean minimum air temperature in °C                                              | Float64 |
| tmax    | The mean maximum air temperature in °C                                              | Float64 |
| prcp    | The mean monthly precipitation total in mm                                          | Float64 |
| wspd    | The mean wind speed in km/h                                                         | Float64 |
| pres    | The mean sea-level air pressure in hPa                                              | Float64 |
| tsun    | The mean sunshine total in minutes (m)                                              | Float64 |

In [165]:
test_1=get_weather('2022-11-04',41.51019,2.24589)
test_1

Unnamed: 0,tavg,tmin,tmax,prcp,snow,wdir,wspd,wpgt,pres,tsun
0,15.4,11.0,20.0,0.0,,271.0,19.5,42.6,1013.8,


In [142]:
test_1.columns

Index(['time', 'tavg', 'tmin', 'tmax', 'prcp', 'snow', 'wdir', 'wspd', 'wpgt',
       'pres', 'tsun'],
      dtype='object')

In [128]:
df_test=df_0.head(10)
print(df_test.shape)
df_test.head(3)

(10, 9)


Unnamed: 0,event_date,year,month,day,country_code,latitude,longitude,witness,issue
0,2022-11-04,2022,11,4,ES,41.51019,2.24589,Roger Eritja,CONTINENT_DERIVED_FROM_COORDINATES
1,2021-08-27,2021,8,27,IT,44.40289,8.98775,Karin Bakran-Lebl;Ana Klobucar;UNIROMA1;Roger ...,CONTINENT_DERIVED_FROM_COORDINATES
3,2022-08-11,2022,8,11,IT,41.70922,12.78512,UNIROMA1;Eleonora Longo;Francesco Severini;Rog...,CONTINENT_DERIVED_FROM_COORDINATES


In [161]:
# Obtain dataframe with weather data.
df_test_meteo=df_test.apply(lambda x:get_weather(x['event_date'],x['latitude'],x['longitude']),axis=1)
df_test_meteo

0        tavg  tmin  tmax  prcp  snow   wdir  wspd  ...
1        tavg  tmin  tmax  prcp  snow  wdir  wspd  w...
3        tavg  tmin  tmax  prcp  snow   wdir  wspd  ...
4        tavg  tmin  tmax  prcp  snow  wdir  wspd  w...
5        tavg  tmin  tmax  prcp  snow  wdir  wspd  w...
7        tavg  tmin  tmax  prcp  snow  wdir  wspd  w...
8     Empty DataFrame
Columns: [tavg, tmin, tmax, pr...
9        tavg  tmin  tmax  prcp  snow   wdir  wspd  ...
10       tavg  tmin  tmax  prcp  snow  wdir  wspd  w...
12       tavg  tmin  tmax  prcp  snow   wdir  wspd  ...
dtype: object

In [124]:
# Obtain dataframe with weather data.
df_past_meteo=df_0.apply(lambda x:get_weather(x['event_date'],x['latitude'],x['longitude']),axis=1,result_type='expand')
df_past_meteo



KeyboardInterrupt: 

In [None]:
df_0[['tavg','tmin','tmax','prcp','snow','wdir','wspd','wpgt','pres','tsun']]=df_0.apply(lambda x:get_weather(x['event_date'],x['latitude'],x['longitude']),axis=1,result_type='expand')
df_0.head(3)

In [None]:
# Function to apply a progress bar to any Pandas function that supports the apply() method.
tqdm.pandas()

# Obtain dataframe with weather data.
df_past_meteo=df_0.progress_apply(lambda x:get_weather(x['event_date'],x['latitude'],x['longitude']),axis=1,result_type='expand')
df_past_meteo.head(3)

In [None]:
Nuestro df contiene las siguientes columnas:

registro = (instant), es el índice
fecha = (dteday)
estacion = (season) Hay que cambiarla, algunas estaciones no coinciden con la fecha.
año = (year) También hay que cambiarla, está en 0 y 1 (2018,2019)
mes = (month)
festivo = (holiday) 0: laborales, 1:festivos, creemos que tomará festivos.
dia_semana = (weekday)
no_laboral = (workingday) Cuenta como 0 el laboral y el 1 el no laboral
clima = ('weathersit')
'temperatura' = (temp)
'sens_termica'= ('atemp')
humedad = (hum)
viento = (windspeed)
ocasionales = (casual)
registrados = (registred)
total = (cnt)

In [None]:
# Check for outliers in month using another method.
mean_month=df_1['month'].mean()
std_month=df_1['month'].std()
upper=mean_month + std_month
lower=mean_month - std_month
ucb=mean_month + std_month * 3
lcb=mean_month - std_month * 3

In [None]:
# Start the graph.
month_graph=sns.histplot(x=df_1['month'],kde=True)
graph.axvline(x=mean_month, c='red',label='mean')

# Plot its standard deviation.
graph.axvline(x=upper,c='green',label='std')
graph.axvline(x=lower, c='green')

# Plot its confidence interval to 99.7%
graph.axvline(x=lcb, c='orange',label='99 lower')
graph.axvline(x=ucb, c='orange',label='99 upper')

plt.legend()