# [**SDMX1**](https://pypi.org/project/sdmx1/)

- https://sebastianpaulo.net/blog/working-with-data-from-official-providers-a-brief-tour-of-pandasdmx/

sdmx1 is the name used on PyPI, because `sdmx` was already taken by an unmaintained package.
To install it, run `pip install sdmx1`.
Once installed, use `import sdmx` instead of `import pandasdmx`.
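The distribution name (`sdmx1`) differs from the import name (`sdmx`), so it can be unclear which package is actually installed. A minimal sketch using only the standard library's `importlib.metadata` to check. `installed_version` is a hypothetical helper, not part of sdmx1.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Check the PyPI distribution names, not the import name "sdmx"
print("sdmx1:", installed_version("sdmx1"))
print("pandasdmx:", installed_version("pandasdmx"))
```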

In [None]:
import sdmx  # pip install sdmx1
# sdmx.Request is deprecated ("Request class will be removed in v3.0; use Client(…)"), so use sdmx.Client
istat = sdmx.Client("ISTAT")


In [2]:
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.1f' % x)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
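These `set_option` calls change the display settings for the rest of the session. To format a single output only, pandas' `option_context` applies the same options inside a `with` block and restores the previous values on exit. A minimal example with made-up values:

```python
import pandas as pd

values = pd.Series([1234.5678, 0.04])

# Inside the block, floats are shown with one decimal place
with pd.option_context("display.float_format", lambda x: "%.1f" % x):
    inside = values.to_string()

# Outside the block, the default formatting is back
outside = values.to_string()
print(inside)
print(outside)
```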

## List of ISTAT dataflows

In [4]:
flow_msg = istat.dataflow()  # download the definitions of all dataflows available from the chosen source

In [37]:
print(flow_msg.response.url) # see the URL that was queried 

https://esploradati.istat.it/SDMXWS/rest/dataflow/all/all/latest
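The queried URL follows the SDMX 2.1 REST pattern for structure queries, `{base}/{resource}/{agencyID}/{resourceID}/{version}`, where `all` and `latest` act as wildcards. A minimal sketch that rebuilds it; `structure_url` is a hypothetical helper (sdmx1 builds these URLs itself), and the base URL is the one printed above.

```python
# Base URL taken from the query printed above
BASE_URL = "https://esploradati.istat.it/SDMXWS/rest"


def structure_url(resource: str, agency_id: str = "all", resource_id: str = "all",
                  version: str = "latest", base: str = BASE_URL) -> str:
    """Return the REST URL of a structure query such as 'dataflow'."""
    return "/".join([base.rstrip("/"), resource, agency_id, resource_id, version])


print(structure_url("dataflow"))
# https://esploradati.istat.it/SDMXWS/rest/dataflow/all/all/latest
```

Passing an agency and a dataflow ID, e.g. `structure_url("dataflow", "IT1", "93_500", "1.0")`, narrows the query to a single dataflow.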


In [38]:
print(flow_msg.response.headers) # see the response headers

{'Cache-Control': 'no-store,no-cache', 'Pragma': 'no-cache', 'Transfer-Encoding': 'chunked', 'Content-Type': 'application/vnd.sdmx.structure+xml; version=2.1; charset=utf-8', 'Accept-Ranges': 'values', 'Vary': 'Accept,Accept-Encoding', 'Server': 'Microsoft-IIS/10.0, IIS', 'Strict-Transport-Security': 'max-age=2592000, max-age=31536000; includeSubDomains; preload', 'api-supported-versions': '1, 2', 'X-XSS-Protection': '1; mode=block', 'X-Frame-Options': 'SAMEORIGIN', 'X-Content-Type-Options': 'nosniff', 'X-Powered-By': 'ASP.NET', 'Server-Timing': 'dtSInfo;desc="0", dtRpid;desc="1691568029"', 'Set-Cookie': 'dtCookie=v_4_srv_4_sn_48EA650A7A0EE04E912F847C879A98B9_perc_100000_ol_0_mul_1_app-3A521fadc1a45e0215_1; Path=/; Domain=.istat.it', 'Date': 'Tue, 18 Jun 2024 22:00:45 GMT'}


In [39]:
print(flow_msg)  # the whole parsed response (SDMX data and metadata objects) is accessible from flow_msg

<sdmx.StructureMessage>
  <Header>
    id: 'IDREF260'
    prepared: '2024-06-19T00:00:44.872409+02:00'
    receiver: <Agency Unknown>
    sender: <Agency Unknown>
    source: 
    test: False
  response: <Response [200]>
  DataflowDefinition (3605): 101_1015 101_1015_DF_DCSP_COLTIVAZIONI_1 1...
  DataStructureDefinition (682): DCSP_COLTIVAZIONI DCSP_DOPIGP DCSP_MAC...


In [40]:
dataflows = sdmx.to_pandas(flow_msg.dataflow)
type(dataflows)

pandas.core.series.Series

In [41]:
print("The file contains", len(dataflows), "data flow definitions")

The file contains 3605 data flow definitions


In [42]:
dataflows.head()  # first five dataflow IDs and names

101_1015                                                                      Crops
101_1015_DF_DCSP_COLTIVAZIONI_1                 Areas and production - overall data
101_1015_DF_DCSP_COLTIVAZIONI_10                                    Sowing forecast
101_1015_DF_DCSP_COLTIVAZIONI_2     Areas and production - overall data - provinces
101_1030                                          PDO, PGI and TSG quality products
dtype: object

In [43]:
dataflows = sdmx.to_pandas(flow_msg.dataflow) # converting metadata to pandas.Series
print(dataflows.head())

101_1015                                                                      Crops
101_1015_DF_DCSP_COLTIVAZIONI_1                 Areas and production - overall data
101_1015_DF_DCSP_COLTIVAZIONI_10                                    Sowing forecast
101_1015_DF_DCSP_COLTIVAZIONI_2     Areas and production - overall data - provinces
101_1030                                          PDO, PGI and TSG quality products
dtype: object


In [44]:
dataflows[dataflows.str.contains('Gross', case=False)]  # case=False: case-insensitive match

121_393_DF_DCSC_TRAMAR_6               Goods, number and gross tonnage of vessel by i...
149_327                                Annual gross hours, net hours, holidays and ot...
149_327_DF_DCSC_RETRCONTR1O_1          Annual gross hours, net hours, holidays and ot...
155_374                                      Gross earnings - Enterprises with employees
155_374_DF_DCSC_RETRULAOROS_1_1        Gross earnings per full time equivalent unit (...
155_374_DF_DCSC_RETRULAOROS_1_2        Gross earnings per full time equivalent unit (...
155_374_DF_DCSC_RETRULAOROS_1_3        Gross earnings per full time equivalent unit (...
161_267_DF_DCSP_SBSNAZ_12                                        Gross operating surplus
161_267_DF_DCSP_SBSNAZ_9                                               Gross value added
163_156                                Gross domestic product, expenditure components...
163_156_DF_DCCN_SQCQ_1                 Gross domestic product and expenditure components
163_156_DF_DCCN_SQCQ_
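The search above can be wrapped in a small helper. A minimal sketch; `search_dataflows` is a hypothetical name, and `regex=False` is a deliberate choice so that characters such as `(` or `+` in the search term are matched literally instead of being read as regular-expression syntax. The stand-in Series reuses IDs and names from the output above.

```python
import pandas as pd


def search_dataflows(flows: pd.Series, term: str) -> pd.Series:
    """Return the dataflows whose name contains `term`, ignoring case."""
    mask = flows.str.contains(term, case=False, regex=False, na=False)
    return flows[mask]


# Small stand-in for the `dataflows` Series returned by sdmx.to_pandas
flows = pd.Series({
    "101_1015": "Crops",
    "161_267_DF_DCSP_SBSNAZ_9": "Gross value added",
    "163_156_DF_DCCN_SQCQ_1": "Gross domestic product and expenditure components",
})
print(search_dataflows(flows, "gross"))
```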

## Direct extraction

In [None]:
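istat = sdmx.Client("ISTAT")
# Annual (A) value added per person employed (B1G_B_W2_S1_R_PS) for Italy (IT) and the areas ITC to ITG
key = dict(FREQ="A", ITTER107="IT+ITC+ITD+ITE+ITF+ITG", TIPO_AGGR="B1G_B_W2_S1_R_PS", VAL="V", EDI="2023M12")
params = dict(startPeriod="2010", endPeriod="2022")
data_msg = istat.data("93_500", key=key, params=params)
data = data_msg.data[0]  # first (here the only) DataSet in the message
data_df = data_msg.to_pandas()  # whole message, default conversion
# Put TIME_PERIOD on the columns, converted to periods using the FREQ dimension
data_df2 = sdmx.to_pandas(data,
        datetime={'dim': 'TIME_PERIOD',
                'freq': 'FREQ',
                'axis': 1})
# Cross-section fixing the single-valued index levels, so only ITTER107 remains as the index
data_final = data_df2.xs(
    level=("TIPO_AGGR", "VAL", "EDI", "ADJUSTMENT", "BRANCA_ATTIV_REV2", "DCCN_COICOP_COFOG", "PRODOTTI1", "TIPPREZ"),
    key=("B1G_B_W2_S1_R_PS", "V", "2023M12", "N", "Z", "Z", "Z", "Z"))
data_final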
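With a dict key, sdmx1 builds the key string for the REST URL itself. The sketch below shows the shape of that string, assuming the SDMX 2.1 REST convention visible in the 404 URL further down (`.087015........`): one position per dimension in DSD order, separated by dots, `+` between alternative codes, and an empty position acting as a wildcard. `make_key` and the shortened dimension order are hypothetical; the real order comes from the DSD.

```python
def make_key(dimension_ids, key):
    """Build an SDMX REST key string from a dict of dimension ID -> code(s).

    dimension_ids: dimension IDs in DSD order.
    key: maps a dimension ID to one code (str) or several codes (list/tuple).
    Dimensions absent from `key` stay empty, which acts as a wildcard.
    """
    parts = []
    for dim in dimension_ids:
        value = key.get(dim, "")
        if isinstance(value, (list, tuple)):
            value = "+".join(value)  # '+' means "any of these codes"
        parts.append(value)
    return ".".join(parts)


# Hypothetical, shortened dimension order, for illustration only
dims = ["FREQ", "ITTER107", "TIPO_AGGR", "VAL", "EDI"]
print(make_key(dims, {"FREQ": "A", "ITTER107": ["IT", "ITC"], "TIPO_AGGR": "B1G_B_W2_S1_R_PS"}))
# A.IT+ITC.B1G_B_W2_S1_R_PS..
```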

In [None]:
istat = sdmx.Client("ISTAT")
# Same dataflow, Italy only; TIPO_AGGR is not in the key, so all aggregates are returned
key = dict(FREQ="A", ITTER107="IT", VAL="V", EDI="2023M12")
params = dict(startPeriod="2010", endPeriod="2023")
data_msg = istat.data("93_500", key=key, params=params)
data = data_msg.data[0]
data_df = data_msg.to_pandas()
data_df2 = sdmx.to_pandas(data,
        datetime={'dim': 'TIME_PERIOD',
                'freq': 'FREQ',
                'axis': 1})

# ITTER107 and TIPO_AGGR are not fixed by xs, so they remain as index levels (one row per aggregate)
data_final = data_df2.xs(
    level=("VAL", "EDI", "BRANCA_ATTIV_REV2", "ADJUSTMENT", "DCCN_COICOP_COFOG", "PRODOTTI1", "TIPPREZ"),
    key=("V", "2023M12", "Z", "N", "Z", "Z", "Z"))
data_final
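The `xs` calls above hard-code every index level to collapse, and the tuple of levels has to change whenever the key changes. A minimal alternative sketch (`drop_constant_levels` is a hypothetical helper): drop whichever index levels hold a single value in the returned data. Unlike `xs`, it does not filter rows; it only removes levels that are already constant, which is what happens to dimensions the key has fixed to one code. The toy frame and its values are made up.

```python
import pandas as pd


def drop_constant_levels(df: pd.DataFrame) -> pd.DataFrame:
    """Drop every row-index level that contains only one distinct value."""
    idx = df.index
    if not isinstance(idx, pd.MultiIndex):
        return df
    constant = [i for i in range(idx.nlevels) if idx.get_level_values(i).nunique() == 1]
    if len(constant) == idx.nlevels:
        constant = constant[1:]  # keep the first level so the frame still has an index
    return df.droplevel(constant) if constant else df


# Toy frame shaped like data_df2 (values are made up)
idx = pd.MultiIndex.from_tuples(
    [("IT", "V", "2023M12"), ("ITC", "V", "2023M12")],
    names=["ITTER107", "VAL", "EDI"],
)
df = pd.DataFrame({"2021": [1.0, 2.0]}, index=idx)
print(drop_constant_levels(df))
```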

In [None]:
data_df2 = sdmx.to_pandas(
    data,
    datetime={"dim": "TIME_PERIOD", "freq": "FREQ", "axis": 1},
)
data_df2

## 93_500 DataFlow ID - National Accounts regional main aggregates

- P3_D_W2_S1_R_POP	domestic final consumption per inhabitant
- B1GQ_B_W2_S1_R_POP	gross domestic product at market prices per inhabitant
- D1_D_W2_S1_R_PS	domestic compensation of employees per employee
- D1_D_W2_S1_R_HW	domestic compensation of employees per hour worked by employees
- D1_D_W2_S1_R_FT	domestic compensation of employees per employee labour unit (full-time equivalent)
- B6G_B_W2_S14A_R_POP	disposable income of consumer households per inhabitant
- D11_D_W2_S1_R_PS	gross domestic wages and salaries per employee
- D11_D_W2_S1_R_HW	gross domestic wages and salaries per hour worked by employees
- D11_D_W2_S1_R_FT	gross domestic wages and salaries per employee labour unit
- B1G_B_W2_S1_R_POP	value added per inhabitant
- **B1G_B_W2_S1_R_PS	value added per person employed**
- B1G_B_W2_S1_R_HW	value added per hour worked
- B1G_B_W2_S1_R_FT	value added per labour unit


- Create a Client that we will use to run several queries against this provider's SDMX-REST web service.
- Download a structure message containing the DSD and the other structural information it references. Together, these structural metadata fully describe the data available through this dataflow: concepts, measured items, dimensions, the code lists used to label each dimension, attributes, and so on.
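To see in code what such a structure message describes, the sketch below collects each dimension's allowed codes from a DSD. It assumes the sdmx1 object model: `dsd.dimensions.components` lists the dimensions, an enumerated dimension's `local_representation.enumerated` is its code list, and the code list's `items` dict is keyed by code ID. `dimension_codes` is a hypothetical helper, and the demo uses stand-in objects so it runs without a network call; with the real message it would be called as `dimension_codes(dfd.structure)`.

```python
from types import SimpleNamespace


def dimension_codes(dsd):
    """Map each dimension ID of a DSD to the set of code IDs it allows.

    Assumes the sdmx1 object model described above. Dimensions without a
    local code list (e.g. TIME_PERIOD, or a code list attached only through
    the concept) map to None.
    """
    result = {}
    for dim in dsd.dimensions.components:
        rep = getattr(dim, "local_representation", None)
        codelist = getattr(rep, "enumerated", None)
        result[dim.id] = set(codelist.items) if codelist is not None else None
    return result


# Stand-in objects with the same shape, so the sketch runs offline
def _dim(dim_id, codes=None):
    enumerated = SimpleNamespace(items=dict.fromkeys(codes)) if codes else None
    return SimpleNamespace(id=dim_id, local_representation=SimpleNamespace(enumerated=enumerated))


fake_dsd = SimpleNamespace(dimensions=SimpleNamespace(components=[
    _dim("FREQ", ["A", "Q"]),
    _dim("TIME_PERIOD"),
]))
print(dimension_codes(fake_dsd))
```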

In [48]:
istat = sdmx.Client("ISTAT")
flow_msg = istat.dataflow('93_500')
dfd = flow_msg.dataflow['93_500']
dfd

<DataflowDefinition IT1:93_500(1.0): National Accounts regional main aggregates>

In [49]:
flow_msg

<sdmx.StructureMessage>
  <Header>
    id: 'IDREF262'
    prepared: '2024-06-19T00:02:27.498747+02:00'
    receiver: <Agency Unknown>
    sender: <Agency Unknown>
    source: 
    test: False
  response: <Response [200]>
  Categorisation (1): CAT_93_500_99931
  CategoryScheme (1): SEP
  Codelist (16): CL_BASE_YEAR CL_BRANCA_ATTIVITAREV2 CL_CONTAB_NOTE CL_...
  ConceptScheme (2): CROSS_DOMAIN CONTAB_DOMAIN
  DataflowDefinition (1): 93_500
  DataStructureDefinition (1): DCCN_TNA

In [50]:
flow_msg.structure 

{'DCCN_TNA': <DataStructureDefinition IT1:DCCN_TNA(1.0): Principali aggregati territoriali di Contabilità Nazionale>}

In [51]:
dfd.structure 

<DataStructureDefinition IT1:DCCN_TNA(1.0): Principali aggregati territoriali di Contabilità Nazionale>

In [53]:
dsd = flow_msg.structure
print(dsd)  # flow_msg.structure is a dict mapping DSD IDs to the DataStructureDefinition(s) the dataflow refers to

{'DCCN_TNA': <DataStructureDefinition IT1:DCCN_TNA(1.0): Principali aggregati territoriali di Contabilità Nazionale>}


In [54]:
flow_msg.codelist.CL_BRANCA_ATTIVITAREV2.items

{'V': <Code V: total economic activities>,
 'VNM': <Code VNM: non market economic activities>,
 'VA': <Code VA: agriculture, forestry and fishing>,
 'V01_02': <Code V01_02: crop and animal production, hunting and related service activities, forestry>,
 'V01': <Code V01: crop and animal production, hunting and related service activities>,
 'V02': <Code V02: forestry and logging>,
 'V03': <Code V03: fishing and aquaculture>,
 'VB_F': <Code VB_F: mining and quarrying, manufacturing, electricity, gas, steam and air conditioning supply, water supply, sewerage, waste management and remediation activities, construction>,
 'VB_E': <Code VB_E: mining and quarrying, manufacturing, electricity, gas, steam and air conditioning supply, water supply, sewerage, waste management and remediation activities>,
 'VBEVCDEVDEVE': <Code VBEVCDEVDEVE: mining and quarrying, manufacture of coke and refined petroleum products, electricity, gas, steam and air conditioning supply, water supply, sewerage, waste man

In [55]:
flow_msg.codelist.CL_BASE_YEAR.items

{'0000': <Code 0000: unspecified>,
 '1980': <Code 1980: year 1980>,
 '1981': <Code 1981: year 1981>,
 '1982': <Code 1982: year 1982>,
 '1989': <Code 1989: year 1989>,
 '1990': <Code 1990: year 1990>,
 '1991': <Code 1991: year 1991>,
 '1992': <Code 1992: year 1992>,
 '1993': <Code 1993: year 1993>,
 '1994': <Code 1994: year 1994>,
 '1995': <Code 1995: year 1995>,
 '1996': <Code 1996: year 1996>,
 '1997': <Code 1997: year 1997>,
 '1998': <Code 1998: year 1998>,
 '1999': <Code 1999: year 1999>,
 '2000': <Code 2000: year 2000>,
 '2001': <Code 2001: year 2001>,
 '2002': <Code 2002: year 2002>,
 '2003': <Code 2003: year 2003>,
 '2004': <Code 2004: year 2004>,
 '2005': <Code 2005: year 2005>,
 '2005_12': <Code 2005_12: December 2005>,
 '2006': <Code 2006: year 2006>,
 '2007': <Code 2007: year 2007>,
 '2008': <Code 2008: year 2008>,
 '2009': <Code 2009: year 2009>,
 '2010': <Code 2010: year 2010>,
 '2011': <Code 2011: year 2011>,
 '2015': <Code 2015: year 2015>}

In [56]:
data_msg = istat.data('93_500', key={'REF_AREA': '087015'})

HTTPError: 404 Client Error: Not Found for url: https://esploradati.istat.it/SDMXWS/rest/data/93_500/.087015........

In [57]:
data_msg

NameError: name 'data_msg' is not defined

In [58]:
sdmx.to_pandas(data_msg)

NameError: name 'data_msg' is not defined
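The query fails with HTTP 404 (Not Found), and because the call raised, `data_msg` was never assigned in this session, hence the two `NameError`s. When a query fails like this, one thing worth checking is whether every dimension ID and code in the key is defined by the DSD: the queries above on the same dataflow select the territory through `ITTER107`, while this one uses `REF_AREA`. A minimal, library-independent sketch; `check_key` is a hypothetical helper, and the `allowed` mapping is made up for the example (in practice it would be built from the DSD and its code lists).

```python
def check_key(key, allowed):
    """List problems with an SDMX key dict before sending the query.

    key: dimension ID -> code string, with '+' separating alternative codes.
    allowed: dimension ID -> set of valid codes, or None if not enumerated.
    """
    problems = []
    for dim, value in key.items():
        if dim not in allowed:
            problems.append(f"unknown dimension {dim!r}")
            continue
        valid = allowed[dim]
        if valid is None:  # no code list to check against
            continue
        for code in value.split("+"):
            if code not in valid:
                problems.append(f"code {code!r} not valid for {dim}")
    return problems


# Made-up allowed codes, for illustration only
allowed = {"FREQ": {"A"}, "ITTER107": {"IT", "ITC", "ITD"}, "TIME_PERIOD": None}
print(check_key({"REF_AREA": "087015"}, allowed))  # ["unknown dimension 'REF_AREA'"]
print(check_key({"FREQ": "A", "ITTER107": "IT+ITX"}, allowed))
```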

## 161_267 - Economic results of enterprises

In [None]:
risultati = istat.dataflow('161_267')
print(risultati) # The response includes several classes of SDMX objects

In [None]:
dsd = risultati.structure
print(dsd)  # dict mapping DSD IDs to the DataStructureDefinition(s) the dataflow refers to

## 150_915 - Employment rate (DCCV_TAXOCCU1)

In [None]:
prova = istat.dataflow("150_915") # download the data flow definition with the ID 
print(prova) # The response includes several classes of SDMX objects. 

## 150_938 - [Employed persons (thousands) - Ateco 2007](https://esploradati.istat.it/databrowser/#/it/dw/categories/IT1,Z0500LAB,1.0/LAB_OFFER/LAB_OFF_EMPLOY/DCCV_OCCUPATIT1/DCCV_OCCUPATIT1_PROVDATA/IT1,150_938_DF_DCCV_OCCUPATIT1_22,1.0)

In [59]:
prova = istat.dataflow("150_938") # download the data flow definition with the ID 
print(prova) # The response includes several classes of SDMX objects. 

<sdmx.StructureMessage>
  <Header>
    id: 'IDREF264'
    prepared: '2024-06-19T00:02:57.652756+02:00'
    receiver: <Agency Unknown>
    sender: <Agency Unknown>
    source: 
    test: False
  response: <Response [200]>
  Categorisation (1): CAT_150_938_80800
  CategoryScheme (1): SEP
  Codelist (20): CL_ATECO2002 CL_ATECO_2007 CL_BASE_YEAR CL_CARATT_OCC ...
  ConceptScheme (2): CROSS_DOMAIN LABWAGES_DOMAIN
  DataflowDefinition (1): 150_938
  DataStructureDefinition (1): DCCV_OCCUPATIT1


In [None]:
istat = sdmx.Client("ISTAT")
cat_response = istat.categoryscheme()  # does not work with ISTAT

In [None]:
cat_response

In [None]:
sdmx.to_pandas(cat_response.category_scheme)

In [None]:
sdmx.to_pandas(cat_response.category_scheme.Z0400PRI)

In [None]:
dsd = prova.structure
print(dsd)