Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add geographic availability to services #30

Closed
ColinMaudry opened this issue Feb 20, 2023 · 8 comments · Fixed by #31
Closed

Add geographic availability to services #30

ColinMaudry opened this issue Feb 20, 2023 · 8 comments · Fixed by #31
Assignees

Comments

@ColinMaudry
Copy link
Contributor

ColinMaudry commented Feb 20, 2023

ODS PEP RES_PARTENARIALE table has a PERIMETRE_GEO_RSP column that indicates the range of the availability of the service. From the documentation:

0 - National
1 - Régional (liste des régions dans la table DD009_REGION_RESSOURCE)
2 - Direction territoriale OFII (liste des OFII dans la table DD009_DIR_TERRITORIALE_OFII)
3 - Département (liste des départements dans la table DD009_DEPARTEMENT_RESSOURCE)
4 - Bassin d'emploi (liste des bassins dans la table DD009_BASSIN_RESSOURCE)
5 - Commune (liste des communes dans la table DD009_COMMUNE_RESSOURCE)

Problem: the RES_PARTENARIALE doesn't mention directly the place that would map with zone_diffusion_code (commune, département, etc.) and zone_diffusion_nom, we must get it from the address of the structure.

To do:

  • a join with ADRESSE via the RES_PARTENARIALE.ID_ADR (in sources or intermediate?)
  • get the right place code/name depending on the value of PERIMETRE_GEO_RSP
@ColinMaudry
Copy link
Contributor Author

ColinMaudry commented Feb 20, 2023

The ODS PEP data comes with its own reference data, we're supposed to lookup in the following table using RES_PARTENARIALE.ID_RES as common key:

  • REGION_RESSOURCE_1
  • REGION_RESSOURCE_2
  • DEPARTEMENT_RESSOURCE
  • DIR_TERRITORIALE_OFII
  • BASSIN_RESSOURCE
  • COMMUNE_RESSOURCE

@ColinMaudry
Copy link
Contributor Author

ColinMaudry commented Feb 20, 2023

The region reference data is spread in REGION_RESSOURCE_1 and REGION_RESSOURCE_2.

When unioned, the total number of rows is 2 179.

Apparently, a given id_res (service) can appear several times in this table. That would mean a service can have several regions associated to it (to be confirmed).

@ColinMaudry
Copy link
Contributor Author

ColinMaudry commented Feb 21, 2023

Number of matches between RES_PARTENARIALE.ID_RES and ID_RES in the following tables:

Table Rows (unique `ID_RES`) `RES_PARTENARIALE.ID_RES` matches
REGION_RESSOURCE_1 335 (325) 325
REGION_RESSOURCE_2 1 844 (1 730) 0
DEPARTEMENT_RESSOURCE 12 653 (12 391) 12 391
BASSIN_RESSOURCE 597 (466) 466
COMMUNE_RESSOURCE 19 206 (15 037) 15 037
Total 28 219

The total number of RES_PARTENARIALE.ID_RES is 28 262, but it doesn't necessarily mean most of them have a match in the RESSOURCE tables. The same RES_PARTENARIALE.ID_RES sometimes appears several times in the same RESSOURCE table and may appear in several RESSOURCE tables.

@ColinMaudry
Copy link
Contributor Author

ColinMaudry commented Feb 21, 2023

With all the following tables mapped and joined to RES_PARTENARIALE via ID_RES, only 47 rows remain with an empty zone_diffusion_code (out of 32 838 services+zone_diffusion):

  • REGION_RESSOURCE_1
  • REGION_RESSOURCE_2
  • DEPARTEMENT_RESSOURCE
  • DIR_TERRITORIALE_OFII
  • BASSIN_RESSOURCE
  • COMMUNE_RESSOURCE

@ColinMaudry
Copy link
Contributor Author

Certain services are in many zone_diffusion:

id_res count
48818 103
48824 102
51054 86
51553 81
53501 81
48230 58
54019 55
48231 48
13319 47
48448 47

585 services are available in more than one zone_diffusion.

@ColinMaudry
Copy link
Contributor Author

Final step is to join the data from the OFII (Office français de l'immigration et de l'intégration) ressource table. It adds zone_diffusion to only 4 services (table DIR_TERRITORIALE_OFII has 4 rows).

@ColinMaudry ColinMaudry linked a pull request Feb 21, 2023 that will close this issue
@ColinMaudry
Copy link
Contributor Author

ColinMaudry commented Feb 21, 2023

✅ OFII resource table is joined

Number of rows with an empty zone_diffusion_code is down to 43.

@ColinMaudry
Copy link
Contributor Author

ColinMaudry commented Feb 22, 2023

Après avoir fait la jointure avec les données INSEE COG 2023, les communes restent sans libellé (zone_diffusion_nom) :

28093
28115
28204
30157
30190
60475
60649
60690
85060
85166
91182

La requête :

SELECT DISTINCT zone_diffusion_code, zone_diffusion_type, zone_diffusion_nom  FROM public_intermediate.int_odspep__services
WHERE (zone_diffusion_type = 'Département'
OR zone_diffusion_type = 'Région'
OR zone_diffusion_type = 'Commune')
AND zone_diffusion_nom is NULL

La requête retourne également les départements 1 et 7, mais je pense qu'il s'agit d'un bug de padding de zéros dans les données sources.

ColinMaudry added a commit that referenced this issue Feb 22, 2023
in order to have unique id in aggregate_ressource table #30
vmttn pushed a commit that referenced this issue Feb 24, 2023
* feature(pipeline): add odspep mapping for services #27

* feature(pipeline): add zone_diffusion to odspep mapping (only dep and reg) #27

* feature(pipeline): add zone_diffusion to odspep mapping (only com and bas) #27

* feature(pipeline): add zone_diffusion to odspep mapping (only com and bas) #27

* feature(pipeline): add zone_diffusion to odspep mapping (dir ofii) #27

* chore(lint): order select statements in ascending complexity

* doc(dbt): add table documentation in _odspep__models.yml

Co-authored-by: vmttn <taq@protonmail.ch>

* docs(dbt): rephrase table description

Co-authored-by: vmttn <taq@protonmail.ch>

* chore(odspep): use double quotes when renaming columns (vs. transformations)

Co-authored-by: vmttn <taq@protonmail.ch>

* feat(odspep): add surrogate key to *_ressource data

in order to have unique id in aggregate_ressource table #30

* feat(odspep): test id uniqueness in int services and int aggregated_ressource #30

* feat(odspep): enable unique service id using unique zone code #30

* fix(odspep): unique ressource and service id (dbt tested) #30

* feat(odspep): disabled dedup structures, removed aggr ressource id #30

* chore(odspep): renamed aggregated_ressource => zones_diffusion #30

* fix(odspep): readded odspep structures to int__structures (not deduplicated)

* chore(odspep): cleanup

* feat(odspep): basic dedup of structures id_res
@vmttn vmttn closed this as completed in #31 Feb 24, 2023
vmttn added a commit that referenced this issue Mar 7, 2023
Co-authored-by: Valentin Matton <taq@protonmail.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant