# D2
## ISTAT SDMX - Resident foreigners on 1st January

[See on IstatData](https://esploradati.istat.it/databrowser/#/it/dw/categories/IT1,POP,1.0/POP_FOREIGNIM/DCIS_POPSTRRES1/IT1,29_7_DF_DCIS_POPSTRRES1_1,1.0)

In [1]:
#!pip install pandasdmx requests requests_cache xmltodict

In [2]:
import pandas as pd
import pandasdmx as sdmx
import json
import requests
# from pandasdmx import Request
import xmltodict
from datetime import datetime
import os

In [3]:
# 1 -  EXPLORE DATASTRUCTURE
response = requests.get('http://sdmx.istat.it/SDMXWS/rest/datastructure/IT1/DCIS_RICPOPRES2011/')
print(response.status_code)

if response.status_code == 200:
    content = response.content
    
    if len(content) > 0:
        try:
            xml_data = xmltodict.parse(content)
            json_string_data = json.dumps(xml_data,
                                    allow_nan = True, # If we hadn't set allow_nan to
                                                      # true we would have got
                                                      # ValueError: Out of range float
                                                      # values are not JSON compliant
                                    indent = 6) # Indentation can be used for pretty-printing
            # Now you can work with the parsed JSON data
        except Exception as e:  # xmltodict raises an expat parsing error on malformed XML, not JSONDecodeError
            print("Error parsing XML:", e)
    else:
        print("Empty content received.")
else:
    print("Request failed with status code:", response.status_code)

# print(json_string_data)
type(json_string_data)

200


str

Querying the API returns an XML document that includes the `structure:DimensionList` tag, which lists the dimensions, i.e. the data schema of the dataset. In our case, the dimensions are `FREQ`, `SESSO`, `TIPO_DATO`, `CLASSE_ETA`, `CITTADINANZA` and `ITTER107`.
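
As a minimal sketch of how the dimension IDs could be pulled out of such a response, the snippet below uses the standard library instead of `xmltodict`. The `SAMPLE_XML` string is a hand-written, heavily shortened stand-in for the real response (an assumption, not the actual payload), and `local_name` and `list_dimension_ids` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hand-written, shortened stand-in for the datastructure response (assumption).
SAMPLE_XML = """<message:Structure
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure">
  <structure:DimensionList id="DimensionDescriptor">
    <structure:Dimension id="FREQ" position="1"/>
    <structure:Dimension id="SESSO" position="2"/>
    <structure:Dimension id="ITTER107" position="3"/>
  </structure:DimensionList>
</message:Structure>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit('}', 1)[-1]


def list_dimension_ids(xml_text):
    """Return the id of every Dimension found inside a DimensionList element."""
    root = ET.fromstring(xml_text)
    ids = []
    for element in root.iter():
        if local_name(element.tag) == 'DimensionList':
            for child in element:
                if local_name(child.tag) == 'Dimension':
                    ids.append(child.get('id'))
    return ids


dimension_ids = list_dimension_ids(SAMPLE_XML)
print(dimension_ids)
```

Matching on the local tag name keeps the sketch independent of the exact namespace URIs used by the service.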

To understand what these abbreviations mean, we can look at the `codelist` resource, which can be queried with the IDs discovered above. Let's explore `CITTADINANZA` as an example. Reading the XML above, we see that the corresponding `codelist` ID is `CL_CITTADINANZA`. Querying the URL `http://sdmx.istat.it/SDMXWS/rest/codelist/IT1/CL_CITTADINANZA` in Postman (some API responses are too long to be loaded in a Jupyter Notebook), we can see that this dimension describes the kind of citizenship. An example of one record is shown below:


`<structure:Code id="FRG" urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=IT1:CL_CITTADINANZA(1.2).FRG">`
<br>
` <common:Name xml:lang="it">straniero-a</common:Name>`
    <br>
` <common:Name xml:lang="en">foreign</common:Name>`
    <br>
`</structure:Code>`
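
A record like this one pairs a code ID with its labels in several languages. As a rough sketch, a codelist could be turned into an ID-to-label lookup as follows. The `SAMPLE_CODELIST` string is a hand-written excerpt with only two codes (an assumption about the overall shape of the response), and `code_labels` is a hypothetical helper that falls back to the first available label when the requested language is missing.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

# Hand-written excerpt with two codes only (assumption, not the full codelist).
SAMPLE_CODELIST = """<structure:Codelist id="CL_CITTADINANZA"
    xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
    xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <structure:Code id="FRG">
    <common:Name xml:lang="it">straniero-a</common:Name>
    <common:Name xml:lang="en">foreign</common:Name>
  </structure:Code>
  <structure:Code id="APO">
    <common:Name xml:lang="it">apolide</common:Name>
  </structure:Code>
</structure:Codelist>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit('}', 1)[-1]


def code_labels(xml_text, lang='en'):
    """Map each code id to its label in `lang`, else to its first label."""
    root = ET.fromstring(xml_text)
    labels = {}
    for element in root.iter():
        if local_name(element.tag) != 'Code':
            continue
        names = [child for child in element if local_name(child.tag) == 'Name']
        by_lang = {name.get(XML_LANG): name.text for name in names}
        if lang in by_lang:
            labels[element.get('id')] = by_lang[lang]
        elif names:
            labels[element.get('id')] = names[0].text
    return labels


labels = code_labels(SAMPLE_CODELIST)
print(labels)
```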

Since we are interested in foreign residents, we find that:
- `EXTEU27` is the ID for `of extra European Union country with 27 Member States`
- `FRG` is the ID for `foreign`
- `TOTAL` is the ID for `total`
- `EU27` is the ID for `of European Union country with 27 Member States`
- `FRGAPO` is the ID for `foreign/stateless`
- `ITL` is the ID for `italian`
- `APO` is the ID for `apolide` (stateless)
- `EU27_NOITL` is the ID for `of European Union country (except Italy)`

For this dimension, our API request will need `FRG`.

We also explored the `TIPO_DATO` dimension and found that we need to specify `JAN` in our query, since it means `population on 1st January`.

In [4]:
# 2 - Explore the meaning of the dimensions of the dataset

response = requests.get('http://sdmx.istat.it/SDMXWS/rest/codelist/IT1/CL_CITTADINANZA')
print(response.status_code)

if response.status_code == 200:
    content = response.content
    
    if len(content) > 0:
        try:
            xml_data = xmltodict.parse(content)
            json_string_data = json.dumps(xml_data,
                                    allow_nan = True, # If we hadn't set allow_nan to
                                                      # true we would have got
                                                      # ValueError: Out of range float
                                                      # values are not JSON compliant
                                    indent = 6) # Indentation can be used for pretty-printing
            # Now you can work with the parsed JSON data
        except Exception as e:  # xmltodict raises an expat parsing error on malformed XML, not JSONDecodeError
            print("Error parsing XML:", e)
    else:
        print("Empty content received.")
else:
    print("Request failed with status code:", response.status_code)

# print(json_string_data)

200


We explore all the other dimensions in the same way. Since our purpose requires all ages, we have to understand how they are divided and which values they take. To do that, we need the ID of our dataset. We can check it in the URL on IstatData, where we see that it is `164_164`.
We can query the REST API like this to obtain our result:

In [5]:
# 3 -  EXPLORE VALUES IN DIMENSIONS
response = requests.get('http://sdmx.istat.it/SDMXWS/rest/availableconstraint/164_164')
print(response.status_code)

if response.status_code == 200:
    content = response.content
    
    if len(content) > 0:
        try:
            xml_data = xmltodict.parse(content)
            json_string_data = json.dumps(xml_data,
                                    allow_nan = True, # If we hadn't set allow_nan to
                                                      # true we would have got
                                                      # ValueError: Out of range float
                                                      # values are not JSON compliant
                                    indent = 6) # Indentation can be used for pretty-printing
            # Now you can work with the parsed JSON data
        except Exception as e:  # xmltodict raises an expat parsing error on malformed XML, not JSONDecodeError
            print("Error parsing XML:", e)
    else:
        print("Empty content received.")
else:
    print("Request failed with status code:", response.status_code)

# Printing is disabled in the documentation because the response is too long to be shown here. Uncomment to see it.
# print(json_string_data)

200


We can now compose our final query to retrieve the values for all ages, for the whole of Italy and its macro-areas, with both sexes combined.
Our final URL will be: http://sdmx.istat.it/SDMXWS/rest/data/164_164/.9.JAN.TOTAL.FRG.IT+ITCD+ITD+ITC+ITE+ITF./

The filters we apply after the `data` request are:
- `TOTAL`, which requests all ages together rather than split by age class
- `IT+ITCD+ITD+ITC+ITE+ITF`, which covers Italy as a whole plus its macro-areas (north, north-east, north-west, centre and south)
- `9`, which requests the data not split into males and females.
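
The way the filters combine into the URL can be sketched as below: each dimension takes one position in a dot-separated key, an empty position means "no filter", and several values for the same dimension are joined with `+`. The helper name `build_data_url` is hypothetical, and the order of the positions is simply copied from the URL above.

```python
BASE_URL = 'http://sdmx.istat.it/SDMXWS/rest/data'


def build_data_url(flow_id, key_parts):
    """Join the per-dimension filters into a dot-separated SDMX key.

    Each item is either a string (one value, or '' for no filter)
    or a list of values, which are joined with '+'.
    """
    positions = []
    for part in key_parts:
        positions.append(part if isinstance(part, str) else '+'.join(part))
    return f"{BASE_URL}/{flow_id}/{'.'.join(positions)}/"


# Same filters as in the final query, in the same order as in the URL above.
key_parts = ['', '9', 'JAN', 'TOTAL', 'FRG',
             ['IT', 'ITCD', 'ITD', 'ITC', 'ITE', 'ITF'], '']
url = build_data_url('164_164', key_parts)
print(url)
```

Building the key from a list makes it easier to change a single filter, for example the list of territories, without retyping the whole URL.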

In [5]:
# 4 -  FINAL QUERY WITH FILTERS
response = requests.get('http://sdmx.istat.it/SDMXWS/rest/data/164_164/.9.JAN.TOTAL.FRG.IT+ITCD+ITD+ITC+ITE+ITF./')
print(response.status_code)

if response.status_code == 200:
    content = response.content
    
    if len(content) > 0:
        try:
            xml_data = xmltodict.parse(content)
            json_string_data = json.dumps(xml_data,
                                    allow_nan = True, # If we hadn't set allow_nan to
                                                      # true we would have got
                                                      # ValueError: Out of range float
                                                      # values are not JSON compliant
                                    indent = 6) # Indentation can be used for pretty-printing
            # Now you can work with the parsed JSON data
        except Exception as e:  # xmltodict raises an expat parsing error on malformed XML, not JSONDecodeError
            print("Error parsing XML:", e)
    else:
        print("Empty content received.")
else:
    print("Request failed with status code:", response.status_code)

# Printing is disabled in the documentation because the response is too long to be shown here. Uncomment to see it.
# print(json_string_data)

200


Now we create a well-formed JSON string from the response.

The code snippet performs the following tasks:

1. It takes a JSON string called `json_string_data` and creates a nested dictionary, `nested_dict`, using the `json.loads()` function. This step is essential to process and extract information from the JSON data.

2. It defines a translation dictionary, `sex_translation`, which maps numeric codes to the corresponding labels ('1' to 'Male', '2' to 'Female' and '9' to 'TOTAL'). This dictionary will be used to translate the sex values later.

3. It initializes an empty list, `result`, which will store the extracted information from the nested dictionary.

4. It iterates over the series data in the nested dictionary. Each series represents a set of observations for a specific combination of variables.

5. Within each series, it retrieves the territory and sex values by searching for specific keys ('ITTER107' and 'SESSO') in the series key. If found, it assigns the corresponding values to the `territory` and `sex` variables, respectively. The sex value is translated using the `sex_translation` dictionary.

6. It retrieves the observation values (`obs_values`) for each series and iterates over them. Each observation contains the year and the quantity, while the age class is read from the series key.

7. It creates an entry dictionary that contains the extracted information, including the territory, year, sex, age, and quantity.

8. The entry dictionary is appended to the `result` list.

9. Finally, the `result` list is converted to a JSON string, `immigrants_distribution_2018`, using `json.dumps()`. The type of the `immigrants_distribution_2018` variable is printed to verify that it is a string.

In summary, this code processes the nested dictionary, extracts specific information, translates values, and organizes the extracted data into a clean JSON format suitable for visualization or further analysis.
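
One caveat about the steps above: by default `xmltodict` returns a dict, not a list, when an element has a single child, so a series with only one observation would break a plain `for` loop. The sketch below shows one possible way to guard against this and to read the series key by dimension ID rather than by position. The helpers `as_list` and `series_key_to_dict` are hypothetical, and `sample_series` is a hand-written placeholder, not real data.

```python
def as_list(value):
    """Wrap a single item in a list; pass lists through unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def series_key_to_dict(series):
    """Map each dimension ID in a series key to its value."""
    values = as_list(series['generic:SeriesKey']['generic:Value'])
    return {item['@id']: item['@value'] for item in values}


# Hand-written sample shaped like one parsed series (placeholder values).
sample_series = {
    'generic:SeriesKey': {'generic:Value': [
        {'@id': 'SESSO', '@value': '9'},
        {'@id': 'ITTER107', '@value': 'IT'},
    ]},
    # A single observation arrives as a dict rather than a list of dicts.
    'generic:Obs': {
        'generic:ObsDimension': {'@value': '2018'},
        'generic:ObsValue': {'@value': '123'},
    },
}

key = series_key_to_dict(sample_series)
observations = as_list(sample_series['generic:Obs'])
print(key['ITTER107'], key['SESSO'], len(observations))
```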

In [12]:
# Create a nested dictionary from the response in order to build a clean JSON for our visualization
nested_dict = json.loads(json_string_data)

# Translation dictionary
sex_translation = {
    '1': 'Male',
    '2': 'Female',
    '9': 'TOTAL'
}

# Extracting information
result = []

series_data = nested_dict['message:GenericData']['message:DataSet']['generic:Series']
for series in series_data:
    series_key = series['generic:SeriesKey']
    territory = None
    sex = None

    for value in series_key['generic:Value']:
        if value['@id'] == 'ITTER107':
            territory = value['@value']
        elif value['@id'] == 'SESSO':
            sex_value = value['@value']
            sex = sex_translation.get(sex_value)

    obs_values = series['generic:Obs']
    for obs in obs_values:
        year = obs['generic:ObsDimension']['@value']
        age = series_key['generic:Value'][3]['@value']
        quantity = obs['generic:ObsValue']['@value']

        entry = {
            'Territory': territory,
            'Year': int(year),
            'Sex': sex,
            'Age': age,
            'Quantity': int(quantity)
        }
        result.append(entry)

# Convert result to JSON
immigrants_distribution_2018 = json.dumps(result)
print(type(immigrants_distribution_2018))
print(immigrants_distribution_2018)

<class 'str'>
[{"Territory": "IT", "Year": 2001, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 1334889}, {"Territory": "IT", "Year": 2002, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 1341414}, {"Territory": "IT", "Year": 2003, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 1483277}, {"Territory": "IT", "Year": 2004, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 1893927}, {"Territory": "IT", "Year": 2005, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 2269018}, {"Territory": "IT", "Year": 2006, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 2498411}, {"Territory": "IT", "Year": 2007, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 2692022}, {"Territory": "IT", "Year": 2008, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 3151553}, {"Territory": "IT", "Year": 2009, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 3558853}, {"Territory": "IT", "Year": 2010, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 3836349}, {"Territory": "IT", "Year": 2011, "Sex": "TOTAL", "Age": "TOTAL", "Quantity": 4101335}, {"Territory": "IT

In [17]:
# Load the JSON dataset
data = immigrants_distribution_2018

# Ensure data is parsed as a list of dictionaries
if isinstance(data, str):
    data = json.loads(data)

# Use a list comprehension to filter data for the year 2018
data_2018 = [entry for entry in data if entry["Year"] == 2018]

# Specify the folder path to save the JSON file
folder_path = "../_datasets/Clean"

# Create the folder if it doesn't exist
os.makedirs(folder_path, exist_ok=True)

# Define the file path for the output JSON file
filename = "filtered_immigrants_distribution_2018.json"

# Generate the file path
file_path = os.path.join(folder_path, filename)

# Save the filtered data as a JSON file with the same name
with open(file_path, "w") as json_file:
    json.dump(data_2018, json_file, indent=4)

print(f"Filtered data for 2018 saved as {filename}")

Filtered data for 2018 saved as filtered_immigrants_distribution_2018.json


The code snippet above performs the following tasks:

1. It converts the `immigrants_distribution_2018` string, which contains a JSON representation, into a list of dictionaries using `json.loads()`, and then keeps only the entries for the year 2018 in `data_2018`. This step allows easier manipulation and access to the data.

2. It specifies the folder path where the resulting JSON file will be saved. In this case, the folder path is "../_datasets/Clean".

3. It creates the specified folder if it does not already exist using `os.makedirs()`. This ensures that the folder is available to store the JSON file.

4. It defines the filename for the JSON file as "filtered_immigrants_distribution_2018.json".

5. It generates the complete file path by joining the folder path and filename using `os.path.join()`.

6. It saves the `data_2018` list to the file specified by the file path. This is achieved using `json.dump()` with the file opened in write mode ("w").

7. The JSON data is formatted with an indent of 4 spaces to improve readability within the file.

8. Finally, it prints a message confirming that the JSON data was saved, along with the name of the file.

In short, this code snippet takes the parsed JSON data, keeps the 2018 entries, saves them as a JSON file in the specified directory, and reports that the file was saved.

In [20]:
# Let's create a single json adding 2018's data with the other years

# Load the JSON file with data for other years
with open("../_datasets/Clean/immigrants_distribution.json", "r") as file:
    data_other_years = json.load(file)

# Combine the data from both files into a single list
combined_data = data_other_years + data_2018

sorted_data = sorted(combined_data, key=lambda x: x['Year'])

# Specify the folder path to save the combined JSON file
folder_path = "../_datasets/Clean"

# Define the file path for the output JSON file
filename = "combined_immigrants_distribution.json"

# Generate the file path
file_path = os.path.join(folder_path, filename)

# Save the combined data as a new JSON file
with open(file_path, "w") as json_file:
    json.dump(sorted_data, json_file, indent=4)

print(f"Combined data saved as {filename}")

Combined data saved as combined_immigrants_distribution.json
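
If the file with the other years already contained some 2018 entries, simple list concatenation would produce duplicates. A possible safeguard, sketched below, is to merge on the combination of `Territory`, `Year`, `Sex` and `Age` and keep the most recent record for each key. The helper `merge_records` is hypothetical, it assumes both lists use the same keys as the entries built earlier, and the quantities in the example are made-up placeholders rather than real figures.

```python
def merge_records(*record_lists):
    """Merge lists of records, keeping the last record seen for each key."""
    merged = {}
    for records in record_lists:
        for record in records:
            key = (record['Territory'], record['Year'],
                   record['Sex'], record['Age'])
            merged[key] = record  # later lists overwrite earlier ones
    return sorted(merged.values(), key=lambda r: (r['Year'], r['Territory']))


# Placeholder records (made-up quantities, for illustration only).
older = [
    {'Territory': 'IT', 'Year': 2017, 'Sex': 'TOTAL', 'Age': 'TOTAL', 'Quantity': 100},
    {'Territory': 'IT', 'Year': 2018, 'Sex': 'TOTAL', 'Age': 'TOTAL', 'Quantity': 110},
]
newer = [
    {'Territory': 'IT', 'Year': 2018, 'Sex': 'TOTAL', 'Age': 'TOTAL', 'Quantity': 120},
]

merged = merge_records(older, newer)
print(len(merged))
```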
