Loading libraries (best done at the top of a notebook) extends Python with functionality for specific tasks, such as data access (pandas) and visualization (seaborn).

In [None]:
import pandas as pd
import seaborn as sns

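These are some IPython magic functions: commands prefixed with `%` (line magics) or `%%` (cell magics) that simplify interaction with the notebook environment, e.g. inline plotting or loading extensions.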

https://ipython.readthedocs.io/en/stable/interactive/magics.html

In [None]:
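%matplotlib inline
# Apply seaborn's default theme and scale plot elements for presentations ("talk")
sns.set_theme()
sns.set_context('talk')
# SQL: load the ipython-sql extension, which provides the %sql and %%sql magics
%load_ext sql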

Load the data from the CSV file (semicolon-separated)

In [None]:
dmp_data = pd.read_csv('datasets/DMP_PAID_EXAMPLE_FINAL.csv', sep=";")
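
The `sep=";"` argument matters because the file uses semicolons as field separators, which is typical of spreadsheet exports in locales where the comma is the decimal separator. A minimal sketch on a small inline CSV shows what happens with and without it; the content is made up and only the column names `ID_SEX` and `NUM` come from this notebook.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content standing in for the real CSV file
raw = "ID_SEX;NUM\n1;10\n2;4\n"

# With the default sep=",", each line is read as one single text column
wrong = pd.read_csv(io.StringIO(raw))
# With sep=";", the two fields are split into separate columns
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.columns.tolist())  # ['ID_SEX;NUM']
print(right.columns.tolist())  # ['ID_SEX', 'NUM']
```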

Display the first 10 rows

In [None]:
dmp_data.head(10)

Check the column types

In [None]:
dmp_data.info()

Drop the columns not needed for the analysis

In [None]:
dmp_data.drop(['ID_PAID', 'DATA'], axis=1, inplace=True)
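
With `inplace=True`, `drop` modifies `dmp_data` directly and returns `None`; an equivalent style that many pandas users prefer is to reassign the result of `drop(columns=...)`. A small sketch on a made-up frame, where only the column names come from the notebook:

```python
import pandas as pd

# Tiny hypothetical frame with the same column names as the notebook
df = pd.DataFrame({
    "ID_PAID": [1, 2],
    "DATA": ["2020-01-01", "2020-01-02"],
    "NUM": [3, 5],
})

# inplace=True mutates the frame it is called on and returns None
result = df.copy().drop(["ID_PAID", "DATA"], axis=1, inplace=True)
print(result)  # None

# drop(columns=...) returns a new frame and leaves df untouched
trimmed = df.drop(columns=["ID_PAID", "DATA"])
print(trimmed.columns.tolist())  # ['NUM']
print(df.columns.tolist())       # ['ID_PAID', 'DATA', 'NUM']
```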

Plot pairwise relationships between the numeric columns of the dataset.

In [None]:
sns.pairplot(dmp_data)

Basic statistics

In [None]:
dmp_data.describe()

Compute the mean number of actions per gender

In [None]:
dmp_data[["ID_SEX", "NUM"]].groupby("ID_SEX").mean()
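
To see what the grouped mean computes, here is a sketch on hypothetical values: selecting the two columns first returns a one-column DataFrame indexed by `ID_SEX`, while selecting `NUM` after grouping returns a Series with the same numbers.

```python
import pandas as pd

# Hypothetical data: ID_SEX codes and number of actions (NUM)
df = pd.DataFrame({"ID_SEX": [1, 1, 2, 2, 2], "NUM": [10, 20, 3, 6, 9]})

# Same form as the notebook: a one-column DataFrame indexed by ID_SEX
as_frame = df[["ID_SEX", "NUM"]].groupby("ID_SEX").mean()

# Selecting the column after grouping gives a Series instead
as_series = df.groupby("ID_SEX")["NUM"].mean()

print(as_frame)
print(as_series.to_dict())  # {1: 15.0, 2: 6.0}
```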

Correlation matrix of the numeric columns

In [None]:
dmp_data.corr(numeric_only=True)
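
Since pandas 2.0, `DataFrame.corr()` no longer silently skips non-numeric columns and raises an error instead, so `numeric_only=True` (available since pandas 1.5) restricts the Pearson correlation to numeric columns. A sketch on a hypothetical frame; the `SPEND` and `LABEL` columns are invented for illustration.

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns
df = pd.DataFrame({
    "NUM": [1, 2, 3, 4],
    "SPEND": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * NUM
    "LABEL": ["a", "b", "c", "d"],  # non-numeric, excluded below
})

# Pearson correlation computed only over the numeric columns
corr = df.corr(numeric_only=True)
print(corr)
print(corr.loc["NUM", "SPEND"])  # perfectly correlated (1.0 up to rounding)
```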

# Database

![](images/upadb.png)

Connect to the DB

In [None]:
%sql sqlite:///datasets/upa.db

DB structure (SQLite-specific)

In [None]:
%sql SELECT * FROM sqlite_master WHERE type='table'
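
`sqlite_master` is SQLite's built-in catalog table, with one row per table, index, view or trigger; the `sql` column holds the original `CREATE` statement. The same query can be run from plain Python with the standard `sqlite3` module. This sketch uses an in-memory database with a hypothetical schema; against the real file you would connect to `datasets/upa.db` instead.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (uid INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE action (uid INTEGER, data INTEGER)")

# List the tables and the CREATE statements stored in the catalog
rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for name, sql in rows:
    print(name, "->", sql)

table_names = [name for name, _ in rows]
print(table_names)  # ['action', 'user']
conn.close()
```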

First 10 rows

In [None]:
%%sql
select * from user limit 10

In [None]:
%%sql
select uid, cookieid, sex, eta 
FROM user
where city='PD'
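
When the filter value comes from a Python variable rather than being typed into the query, it is safer to bind it as a parameter than to format it into the SQL string. A sketch with the standard `sqlite3` module and its `?` placeholder; the table contents are hypothetical and only the column names mirror the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical user table mirroring the columns used in the notebook
conn.execute("CREATE TABLE user (uid INTEGER, cookieid TEXT, sex TEXT, eta TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?, ?, ?)",
    [
        (1, "c1", "M", "25-34", "PD"),
        (2, "c2", "F", "35-44", "VE"),
        (3, "c3", "F", "18-24", "PD"),
    ],
)

city = "PD"
# The ? placeholder lets sqlite3 bind the value safely (no string formatting)
rows = conn.execute(
    "SELECT uid, cookieid, sex, eta FROM user WHERE city = ? ORDER BY uid",
    (city,),
).fetchall()
print(rows)  # [(1, 'c1', 'M', '25-34'), (3, 'c3', 'F', '18-24')]
conn.close()
```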


# Combining SQL and pandas

In [None]:
users = %sql select uid, sex, eta from user

In [None]:
users = users.DataFrame()
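
Outside the notebook magics, the same "query, then DataFrame" step can be done with `pandas.read_sql_query`, which accepts a standard `sqlite3` connection. A sketch on an in-memory database; the table and its rows are hypothetical, with upper-case column names chosen to match the `SEX` and `ETA` columns the notebook reads back.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical table with upper-case column names, like SEX and ETA in the notebook
conn.execute("CREATE TABLE user (UID INTEGER, SEX TEXT, ETA TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [(1, "M", "25-34"), (2, "F", "35-44"), (3, "F", "25-34")],
)

# read_sql_query runs the query and builds the DataFrame in one step
users = pd.read_sql_query("SELECT UID, SEX, ETA FROM user", conn)
conn.close()

print(users)
print(users.dtypes)
```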

Create new columns with categorical values

In [None]:
users['gender'] = users['SEX'].astype('category')
users['age-range'] = users['ETA'].astype('category')
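
A categorical column stores each distinct value once (its categories) plus one small integer code per row. A sketch on hypothetical `SEX` and `ETA` values shows what `astype('category')` produces; when the values are sortable, as these strings are, pandas sorts the inferred categories.

```python
import pandas as pd

# Hypothetical values for the SEX and ETA columns
users = pd.DataFrame({
    "SEX": ["M", "F", "F", "M", "F"],
    "ETA": ["25-34", "35-44", "25-34", "18-24", "25-34"],
})

users["gender"] = users["SEX"].astype("category")
users["age-range"] = users["ETA"].astype("category")

# Categories are the sorted distinct values; each row stores an integer code
print(users["age-range"].cat.categories.tolist())  # ['18-24', '25-34', '35-44']
print(users["age-range"].cat.codes.tolist())       # [1, 2, 1, 0, 1]
```

If the age ranges need a specific order in plots, an ordered `pd.CategoricalDtype(categories=[...], ordered=True)` can be passed to `astype` instead of the plain `'category'` string.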

In [None]:
users

In [None]:
users.info()

In [None]:
df=users[['gender','age-range']]
df

In [None]:
sns.countplot(hue='age-range',x='gender',data=df)
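
The bars drawn by `countplot` are just counts per (gender, age-range) pair, so the same numbers can be checked as a table with `pd.crosstab`. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical values shaped like the df used for the countplot
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F"],
    "age-range": ["25-34", "35-44", "25-34", "18-24", "25-34"],
})

# Rows: gender (the x axis); columns: age-range (the hue)
counts = pd.crosstab(df["gender"], df["age-range"])
print(counts)
```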

## Users by geography
Credit: https://opensource.com/article/20/4/python-map-covid-19

In [None]:
%%sql geo <<
select country, region, city, sum(data) as num
from user u
join action a on u.uid=a.uid
group by country, region, city
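
The query joins each action to its user's location and sums `data` per (country, region, city). Note that an inner `JOIN` drops actions whose `uid` has no matching user. A sketch with the standard `sqlite3` module on a hypothetical, in-memory version of the two tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: user holds the location, action holds one numeric value per row
conn.execute("CREATE TABLE user (uid INTEGER, country TEXT, region TEXT, city TEXT)")
conn.execute("CREATE TABLE action (uid INTEGER, data INTEGER)")
conn.executemany("INSERT INTO user VALUES (?, ?, ?, ?)", [
    (1, "IT", "Veneto", "PD"),
    (2, "IT", "Veneto", "PD"),
    (3, "FR", "IDF", "Paris"),
])
conn.executemany("INSERT INTO action VALUES (?, ?)", [
    (1, 5), (1, 2), (2, 3), (3, 4),
    (4, 100),  # uid 4 has no user row, so the inner join drops it
])

rows = conn.execute("""
    SELECT country, region, city, SUM(data) AS num
    FROM user u
    JOIN action a ON u.uid = a.uid
    GROUP BY country, region, city
    ORDER BY country
""").fetchall()
print(rows)  # [('FR', 'IDF', 'Paris', 4), ('IT', 'Veneto', 'PD', 10)]
conn.close()
```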


In [None]:
geo=geo.DataFrame()

In [None]:
geo.info()

In [None]:
geo.head()

Some of Plotly's built-in geometries require three-letter ISO 3166-1 alpha-3 country codes, while our data uses two-letter codes, so we need a mapping; we use pycountry for it.

In [None]:
import pycountry

In [None]:
list_countries = geo['COUNTRY'].unique().tolist()
d_country_code = {}  # maps each country value to its ISO 3166-1 alpha-3 code
for country in list_countries:
    try:
        # search_fuzzy returns a list of pycountry Country objects, best match first
        country_data = pycountry.countries.search_fuzzy(country)
        d_country_code[country] = country_data[0].alpha_3
    except (LookupError, AttributeError):
        # LookupError: no match found; AttributeError: missing or non-string value
        print('could not add ISO 3 code for ->', country)
        # Use ' ' as a placeholder code when the country cannot be found
        d_country_code[country] = ' '

# Add the alpha-3 codes as a new column in a single vectorized step
geo['iso_alpha'] = geo['COUNTRY'].map(d_country_code)

In [None]:
geo

In [None]:
import plotly.express as px

In [None]:
px.choropleth(data_frame=geo, locations='iso_alpha', color='num', hover_name='COUNTRY')
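
`geo` has one row per (country, region, city), so several rows can share the same ISO code and would overlap on the same country shape in the choropleth. A minimal sketch, assuming the columns built above (`COUNTRY`, `num`, `iso_alpha`), that sums `num` per country and drops rows whose lookup failed (stored as `' '`); `per_country` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def per_country(geo):
    """Hypothetical helper: sum `num` per country, dropping unmapped ISO codes."""
    # The lookup loop stores ' ' when no code is found; treat that (and NaN) as missing
    mapped = geo[geo["iso_alpha"].fillna("").str.strip() != ""]
    # One row per country, as a choropleth expects
    return mapped.groupby(["COUNTRY", "iso_alpha"], as_index=False)["num"].sum()


# Hypothetical rows shaped like the geo DataFrame
geo = pd.DataFrame({
    "COUNTRY": ["IT", "IT", "FR", "ZZ"],
    "CITY": ["PD", "VE", "Paris", "?"],
    "num": [10, 5, 4, 1],
    "iso_alpha": ["ITA", "ITA", "FRA", " "],
})
by_country = per_country(geo)
print(by_country)
```

The result can then be passed to the same call, e.g. `px.choropleth(data_frame=by_country, locations='iso_alpha', color='num', hover_name='COUNTRY')`.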