# D1
## ISTAT SDMX - Migration (Transfer of residence)

[See on IstatData](https://esploradati.istat.it/databrowser/#/it/dw/categories/IT1,POP,1.0/POP_MIGRATIONS/DCIS_MIGRAZIONI/IT1,28_185_DF_DCIS_MIGRAZIONI_3,1.0)

In [1]:
#!pip install pandasdmx requests requests_cache xmltodict

In [2]:
import pandas as pd
import pandasdmx as sdmx
import json
import requests
# from pandasdmx import Request
import xmltodict
from datetime import datetime
import os

We start by testing the API on immigrants from Africa only. Filters are specified in the final part of the URL, after the ID of the table (in our case `28_185`), with one dot-separated position per dimension. Specifically, we filter on the dimensions explored in the steps below. This exploratory part is easy to handle with a tool such as Postman.

## 1 - Explore datastructure
`http://sdmx.istat.it/SDMXWS/rest/datastructure/IT1/DCIS_POPRES1/`

## 2 - The meaning of the dimensions of the dataset
`http://sdmx.istat.it/SDMXWS/rest/codelist/IT1/CL_ETA1`
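
A codelist response maps each code to a human-readable label. The sketch below shows one way to read such a mapping with the standard library, assuming the response is an SDMX-ML 2.1 structure message. The helper name `codelist_labels` and the inline `SAMPLE` document (with made-up codes and labels) are illustrative assumptions, not content returned by ISTAT.

```python
import xml.etree.ElementTree as ET

# SDMX-ML 2.1 namespaces (the "common" one also appears in the data responses)
NS = {
    "str": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure",
    "com": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def codelist_labels(xml_text, lang="en"):
    """Return {code id: label} for the requested language."""
    root = ET.fromstring(xml_text)
    labels = {}
    for code in root.iter(f"{{{NS['str']}}}Code"):
        for name in code.findall("com:Name", NS):
            if name.get(XML_LANG) == lang:
                labels[code.get("id")] = (name.text or "").strip()
    return labels


# Hypothetical sample: the codes and labels are invented for illustration only
SAMPLE = """<message:Structure
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
    xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <message:Structures>
    <structure:Codelists>
      <structure:Codelist id="CL_EXAMPLE" agencyID="IT1" version="1.0">
        <structure:Code id="X1">
          <common:Name xml:lang="it">Etichetta di esempio uno</common:Name>
          <common:Name xml:lang="en">Example label one</common:Name>
        </structure:Code>
        <structure:Code id="X2">
          <common:Name xml:lang="it">Etichetta di esempio due</common:Name>
          <common:Name xml:lang="en">Example label two</common:Name>
        </structure:Code>
      </structure:Codelist>
    </structure:Codelists>
  </message:Structures>
</message:Structure>"""

labels_en = codelist_labels(SAMPLE)
labels_it = codelist_labels(SAMPLE, lang="it")
print(labels_en)
```

To use it on the real service, the text of the response to the codelist URL above could be passed in place of `SAMPLE`, provided the service returns the structure in this format.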

## 3 - Explore values in dimensions
`http://sdmx.istat.it/SDMXWS/rest/availableconstraint/28_185`

## 4 - Test query with filters
`http://sdmx.istat.it/SDMXWS/rest/data/28_185/.TOTAL.AFR.ITTOT.ITTOT....9.`
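
The key at the end of the URL has one position per dimension, separated by dots. An empty position leaves that dimension unfiltered, and several codes can be combined in one position with `+`. The following is a minimal sketch of a helper that builds such a key. The names `DIMENSIONS` and `build_key` are hypothetical, and the dimension order is an assumption taken from the series key that the service returns for this table.

```python
# Dimension order assumed from the series key returned by the test query
DIMENSIONS = [
    "FREQ", "ETA_NUM", "PAESE_CITTAD", "TERR_DEST", "REF_AREA_O",
    "STATO_EST_DEST", "STATO_EST_PROV", "TIPO_TRASF", "SESSO", "TIPO_INDDEM",
]


def build_key(filters):
    """Build the dot-separated SDMX key from a {dimension: code(s)} mapping."""
    unknown = set(filters) - set(DIMENSIONS)
    if unknown:
        raise KeyError(f"Unknown dimension(s): {sorted(unknown)}")
    parts = []
    for dim in DIMENSIONS:
        value = filters.get(dim, "")       # empty position = no filter
        if isinstance(value, (list, tuple)):
            value = "+".join(value)        # several codes combined with "+"
        parts.append(str(value))
    return ".".join(parts)


key = build_key({
    "ETA_NUM": "TOTAL",
    "PAESE_CITTAD": "AFR",
    "TERR_DEST": "ITTOT",
    "REF_AREA_O": "ITTOT",
    "SESSO": "9",
})
url = f"http://sdmx.istat.it/SDMXWS/rest/data/28_185/{key}"
print(url)
```

Passing a list, for example `"PAESE_CITTAD": ["AFR", "ASI"]`, produces `AFR+ASI` in that position, which is how a single request can cover several areas.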

In [3]:
# 4 - TEST QUERY WITH FILTERS
# Query all immigrants from Africa, both sexes, all ages, for Italy as a whole.
from xml.parsers.expat import ExpatError

url = 'http://sdmx.istat.it/SDMXWS/rest/data/28_185/.TOTAL.AFR.ITTOT.ITTOT....9.'
response = requests.get(url, timeout=60)
print(response.status_code)

json_string_data = None  # stays None if the request or the parsing fails

if response.status_code == 200:
    content = response.content

    if len(content) > 0:
        try:
            # The service answers in XML; xmltodict turns it into nested dictionaries
            xml_data = xmltodict.parse(content)
            # xmltodict returns every value as a string, so no NaN handling is needed
            json_string_data = json.dumps(xml_data,
                                          indent = 6) # Indentation is used for pretty-printing
            # Now you can work with the data as a JSON string
        except ExpatError as e:
            # Malformed XML makes xmltodict.parse raise ExpatError
            print("Error parsing XML:", e)
    else:
        print("Empty content received.")
else:
    print("Request failed with status code:", response.status_code)

print(json_string_data)
type(json_string_data)

200
{
      "message:GenericData": {
            "@xmlns:footer": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer",
            "@xmlns:generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic",
            "@xmlns:message": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
            "@xmlns:common": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
            "@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
            "@xmlns:xml": "http://www.w3.org/XML/1998/namespace",
            "message:Header": {
                  "message:ID": "IREF901d519168974be7871c842342763fef",
                  "message:Test": "true",
                  "message:Prepared": "2023-06-30T18:23:21",
                  "message:Sender": {
                        "@id": "SOME_NSI"
                  },
                  "message:Structure": {
                        "@structureID": "IT1_DCIS_MIGRAZIONI_1_1",
                        "@dimensionAt

str

In [4]:
nested_dict = json.loads(json_string_data)

# Accessing values in the nested dictionary
header_id = nested_dict['message:GenericData']['message:Header']['message:ID']
dataset_action = nested_dict['message:GenericData']['message:DataSet']['@action']
series_values = nested_dict['message:GenericData']['message:DataSet']['generic:Series']['generic:SeriesKey']['generic:Value']
obs_values = nested_dict['message:GenericData']['message:DataSet']['generic:Series']['generic:Obs']

# Printing the values
print("Header ID:", header_id)
print("Dataset Action:", dataset_action)
print("Series Values:", series_values)
print("Obs Values:")
for obs in obs_values:
    obs_dimension = obs['generic:ObsDimension']
    obs_value = obs['generic:ObsValue']
    print(obs_dimension['@id'], ":", obs_dimension['@value'], "-", obs_value['@value'])

print(type(nested_dict))

Header ID: IREF901d519168974be7871c842342763fef
Dataset Action: Information
Series Values: [{'@id': 'FREQ', '@value': 'A'}, {'@id': 'ETA_NUM', '@value': 'TOTAL'}, {'@id': 'PAESE_CITTAD', '@value': 'AFR'}, {'@id': 'TERR_DEST', '@value': 'ITTOT'}, {'@id': 'REF_AREA_O', '@value': 'ITTOT'}, {'@id': 'STATO_EST_DEST', '@value': 'X1033'}, {'@id': 'STATO_EST_PROV', '@value': 'X1033'}, {'@id': 'TIPO_TRASF', '@value': 'FREIGN'}, {'@id': 'SESSO', '@value': '9'}, {'@id': 'TIPO_INDDEM', '@value': 'TREG'}]
Obs Values:
TIME_PERIOD : 2002 - 33256
TIME_PERIOD : 2003 - 69794
TIME_PERIOD : 2004 - 70796
TIME_PERIOD : 2005 - 48001
TIME_PERIOD : 2006 - 43692
TIME_PERIOD : 2007 - 44164
TIME_PERIOD : 2008 - 71191
TIME_PERIOD : 2009 - 68833
TIME_PERIOD : 2010 - 75035
TIME_PERIOD : 2011 - 64283
TIME_PERIOD : 2012 - 65025
TIME_PERIOD : 2013 - 62827
TIME_PERIOD : 2014 - 57644
TIME_PERIOD : 2015 - 66491
TIME_PERIOD : 2016 - 78666
TIME_PERIOD : 2017 - 107193
TIME_PERIOD : 2018 - 89866
TIME_PERIOD : 2019 - 61055
TIM
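
The cell above indexes `generic:Series` as a single dictionary, which works because this query returns exactly one series. xmltodict returns a list instead whenever an element repeats, so the same code would fail on a query that returns several series. A small sketch of a normalising helper follows. The names `as_list` and `series_by_dimension` are hypothetical, and the `single` sample only reproduces the first two observations printed above.

```python
def as_list(node):
    """Return node as a list: [] for None, [node] for a single item."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def series_by_dimension(data, dimension="PAESE_CITTAD"):
    """Map each code of `dimension` to its {period: value} observations."""
    dataset = data["message:GenericData"]["message:DataSet"]
    result = {}
    for s in as_list(dataset.get("generic:Series")):
        key_values = as_list(s["generic:SeriesKey"]["generic:Value"])
        code = next((v["@value"] for v in key_values if v["@id"] == dimension), None)
        if code is None:
            continue  # series without the requested dimension is skipped
        result[code] = {
            o["generic:ObsDimension"]["@value"]: int(o["generic:ObsValue"]["@value"])
            for o in as_list(s.get("generic:Obs"))
        }
    return result


# Trimmed stand-in for the parsed response: one series, two observations
single = {"message:GenericData": {"message:DataSet": {"generic:Series": {
    "generic:SeriesKey": {"generic:Value": [
        {"@id": "FREQ", "@value": "A"},
        {"@id": "PAESE_CITTAD", "@value": "AFR"},
    ]},
    "generic:Obs": [
        {"generic:ObsDimension": {"@id": "TIME_PERIOD", "@value": "2002"},
         "generic:ObsValue": {"@value": "33256"}},
        {"generic:ObsDimension": {"@id": "TIME_PERIOD", "@value": "2003"},
         "generic:ObsValue": {"@value": "69794"}},
    ],
}}}}

parsed = series_by_dimension(single)
print(parsed)
```

The same function accepts a response with several series without changes, since both shapes pass through `as_list` first.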

In [5]:
# Since it is a nested dictionary, navigate to the key that holds the observations
observations = nested_dict['message:GenericData']['message:DataSet']['generic:Series']['generic:Obs']

# Transforming the observations into the desired dictionary structure
json_data = {}
continent = "Africa"  # Continent value for the dictionary key

continent_data = {}
for obs in observations:
    obs_dimension = obs['generic:ObsDimension']
    obs_value = obs['generic:ObsValue']
    continent_data[obs_dimension['@value']] = obs_value['@value']

json_data[continent] = continent_data

# Saving json_data as clean JSON
with open('data.json', 'w') as json_file:
    json.dump(json_data, json_file, indent=4)

json_data

{'Africa': {'2002': '33256',
  '2003': '69794',
  '2004': '70796',
  '2005': '48001',
  '2006': '43692',
  '2007': '44164',
  '2008': '71191',
  '2009': '68833',
  '2010': '75035',
  '2011': '64283',
  '2012': '65025',
  '2013': '62827',
  '2014': '57644',
  '2015': '66491',
  '2016': '78666',
  '2017': '107193',
  '2018': '89866',
  '2019': '61055',
  '2020': '45878',
  '2021': '57456'}}

In [6]:
# Same query for several citizenship areas at once: the codes are joined
# with "+" in the PAESE_CITTAD position. Both sexes, all ages, Italy as a whole.
from xml.parsers.expat import ExpatError

url = 'http://sdmx.istat.it/SDMXWS/rest/data/28_185/.TOTAL.EU27_FOR+EUR_NEU27+AFR+ASI+AME+OCE.ITTOT.ITTOT....9.'
response = requests.get(url, timeout=60)
print(response.status_code)

json_string_data = None  # stays None if the request or the parsing fails

if response.status_code == 200:
    content = response.content

    if len(content) > 0:
        try:
            # The service answers in XML; xmltodict turns it into nested dictionaries
            xml_data = xmltodict.parse(content)
            # xmltodict returns every value as a string, so no NaN handling is needed
            json_string_data = json.dumps(xml_data,
                                          indent = 6) # Indentation is used for pretty-printing
            # Now you can work with the data as a JSON string
        except ExpatError as e:
            # Malformed XML makes xmltodict.parse raise ExpatError
            print("Error parsing XML:", e)
    else:
        print("Empty content received.")
else:
    print("Request failed with status code:", response.status_code)

print(json_string_data)
type(json_string_data)

200
{
      "message:GenericData": {
            "@xmlns:footer": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer",
            "@xmlns:generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic",
            "@xmlns:message": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
            "@xmlns:common": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
            "@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
            "@xmlns:xml": "http://www.w3.org/XML/1998/namespace",
            "message:Header": {
                  "message:ID": "IREFf68f61ae84794965abc5c132acec6f37",
                  "message:Test": "true",
                  "message:Prepared": "2023-06-30T18:23:31",
                  "message:Sender": {
                        "@id": "SOME_NSI"
                  },
                  "message:Structure": {
                        "@structureID": "IT1_DCIS_MIGRAZIONI_1_1",
                        "@dimensionAt

str

In [7]:
data = json.loads(json_string_data)

continent_data = {}

series = data['message:GenericData']['message:DataSet']['generic:Series']

for s in series:
    series_key = s['generic:SeriesKey']['generic:Value']
    obs = s['generic:Obs']

    continent = None

    for key in series_key:
        if key['@id'] == 'PAESE_CITTAD':
            continent = key['@value']
            break

    if continent:
        if continent not in continent_data:
            continent_data[continent] = []

        for o in obs:
            year = int(o['generic:ObsDimension']['@value'])
            value = o['generic:ObsValue']['@value']

            # Set the year as a date type with the last day of the year
            year_date = datetime(year=year, month=12, day=31)

            continent_data[continent].append({"year": year_date.strftime("%Y-%m-%d"), "tot_immigrants": int(value)})

# Convert the continent data to JSON
continent_json = json.dumps(continent_data, indent=4)

print(continent_json)

{
    "AFR": [
        {
            "year": "2002-12-31",
            "tot_immigrants": 33256
        },
        {
            "year": "2003-12-31",
            "tot_immigrants": 69794
        },
        {
            "year": "2004-12-31",
            "tot_immigrants": 70796
        },
        {
            "year": "2005-12-31",
            "tot_immigrants": 48001
        },
        {
            "year": "2006-12-31",
            "tot_immigrants": 43692
        },
        {
            "year": "2007-12-31",
            "tot_immigrants": 44164
        },
        {
            "year": "2008-12-31",
            "tot_immigrants": 71191
        },
        {
            "year": "2009-12-31",
            "tot_immigrants": 68833
        },
        {
            "year": "2010-12-31",
            "tot_immigrants": 75035
        },
        {
            "year": "2011-12-31",
            "tot_immigrants": 64283
        },
        {
            "year": "2012-12-31",
            "tot_immigrants": 6

In [8]:
# Save the new file

# Specify the folder path to save the JSON file
folder_path = "../_datasets/Clean"

# Create the folder if it doesn't exist
os.makedirs(folder_path, exist_ok=True)

# Define the filename for the JSON file
filename = "continent_data.json"

# Generate the file path
file_path = os.path.join(folder_path, filename)

# Save the continent JSON to the file
with open(file_path, "w") as file:
    json.dump(continent_data, file, indent=4)

print(f"JSON data saved to: {file_path}")


JSON data saved to: ../_datasets/Clean/continent_data.json
