Municipality assignment for Vespa velutina nesten data #170

Open
mvarewyck opened this issue May 29, 2024 · 6 comments · Fixed by #183
Labels: bug (Something isn't working)

Comments

@mvarewyck
Collaborator

I think something goes wrong in the data processing for Vespa velutina.
When loading the nesten data, the column where the municipality should be (NAAM) seems to contain the province instead. I read the following columns from the data:

  • NAAM = gemeente
  • provincie = provincie
  • GEWEST = gewest

Also suspicious is that the counts differ between NAAM and provincie:

> nestenData = sf::st_read("~/Downloads/nesten.geojson")
Reading layer `nesten' from data source `/home/mvarewyck/Downloads/nesten.geojson' using driver `GeoJSON'
Simple feature collection with 8740 features and 36 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2.567021 ymin: 50.18892 xmax: 6.365131 ymax: 51.46739
Geodetic CRS:  WGS 84
> table(nestenData$NAAM)

           Antwerpen           Henegouwen HoofdstedelijkGewest 
                 983                   52                   65 
             Limburg                 Luik            Luxemburg 
                 243                    6                    2 
               Namen      Oost-Vlaanderen       Vlaams-Brabant 
                   9                 3214                 1689 
       Waals-Brabant      West-Vlaanderen 
                  19                 2458 
> table(nestenData$provincie)

           Antwerpen           Henegouwen HoofdstedelijkGewest 
                 986                   47                   25 
             Limburg                 Luik            Luxemburg 
                 244                    3                    1 
               Namen             onbekend      Oost-Vlaanderen 
                   5                   33                 3245 
      Vlaams-Brabant        Waals-Brabant      West-Vlaanderen 
                1661                   15                 2475 
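To see which records cause the difference, the two columns can be cross-tabulated instead of tabulated separately. A minimal sketch in base R, using a small made-up data frame in place of nestenData (the object names nesten, crossTab and mismatch and all values are illustrative, not taken from nesten.geojson):

```r
# Toy stand-in for nestenData; values are invented for illustration only.
nesten <- data.frame(
  NAAM      = c("Antwerpen", "Antwerpen", "Limburg", "Luik", "Namen"),
  provincie = c("Antwerpen", "onbekend",  "Limburg", "Limburg", "Namen"),
  stringsAsFactors = FALSE
)

# Cross-tabulation: cells where row and column label differ are disagreements.
crossTab <- table(NAAM = nesten$NAAM, provincie = nesten$provincie)
print(crossTab)

# Rows where the two columns disagree (missing values count as a mismatch).
mismatch <- nesten[is.na(nesten$NAAM) | is.na(nesten$provincie) |
                     nesten$NAAM != nesten$provincie, ]
print(mismatch)
```

Inspecting the mismatching rows, rather than only the marginal counts, shows directly which nests were assigned differently in the two columns.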

Although in the script I see some renaming of NAAM as gemeente.

Discovered by alien-species-portal PR#74

@SanderDevisscher
Collaborator

@soriadelva was already working on this, see #160. I expect it to be merged after the PR for the 160 branch.

@soriadelva
Contributor

The issue seems to be related to the iAsset data. Some locations have wrong coordinates and are manually corrected in the script, but only for column provincie (and not column NAAM), which explains the discrepancy. See:

    data_nest_iasset <- data_nest_iasset %>%
      left_join(iAsset_foutecoordinaten,
                by = c("insp_order" = "ID nest")) %>%
      mutate(provincie = case_when(!is.na(provincie_corrected) ~ provincie_corrected,
                                   !is.na(provincie) ~ provincie,
                                   !is.na(NAAM) ~ NAAM,
                                   TRUE ~ NA_character_))

A second problem follows from this: later in the script, an intersect and a left join of the iAsset data with communes.geojson assign a commune to each coordinate. This step does not exclude the wrong coordinates, so those records get the wrong commune. See:

    data_nest_final <- data_nest_final %>%
      mutate(NISCODE = as.character(NISCODE)) %>%
      left_join(gem_df,
                by = "NISCODE") %>%
      mutate(Gemeente = case_when(provincie == "unknown" | is.na(Gemeente) ~ "unknown",
                                  TRUE ~ Gemeente))

@jrhillae knows about the issues related to these data and will have a look at it.
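One way to address both problems is to carry a flag for records with known wrong coordinates, so that the correction is applied consistently and the spatially derived commune is not trusted for those records. A minimal sketch in base R under stated assumptions: the toy tables, the flag column coords_wrong and the object name joined are hypothetical and not part of the actual script; only the column names insp_order, "ID nest", provincie, provincie_corrected and Gemeente come from the excerpts in this comment.

```r
# Toy stand-ins; all values are invented for illustration.
data_nest_iasset <- data.frame(
  insp_order = c(1, 2, 3),
  provincie  = c("Antwerpen", "Limburg", NA),
  Gemeente   = c("Mechelen", "Hasselt", "Gent"),  # as if from the spatial join
  stringsAsFactors = FALSE
)
iAsset_foutecoordinaten <- data.frame(
  `ID nest` = 2,
  provincie_corrected = "West-Vlaanderen",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Left join on the nest ID (base R counterpart of dplyr::left_join).
joined <- merge(data_nest_iasset, iAsset_foutecoordinaten,
                by.x = "insp_order", by.y = "ID nest", all.x = TRUE)

# Hypothetical flag: TRUE when the record is listed as having wrong coordinates.
joined$coords_wrong <- !is.na(joined$provincie_corrected)

# Province: prefer the manual correction, otherwise keep the original value.
joined$provincie <- ifelse(joined$coords_wrong,
                           joined$provincie_corrected,
                           joined$provincie)

# Commune: do not trust the spatial join when the coordinates are known wrong.
joined$Gemeente <- ifelse(joined$coords_wrong | is.na(joined$Gemeente),
                          "unknown",
                          joined$Gemeente)
print(joined)
```

Keeping the flag as its own column means every later step (province, commune, region) can consult the same source of truth instead of re-deriving it from the coordinates.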

@SanderDevisscher
Collaborator

SanderDevisscher commented Jun 28, 2024

Personally I think it is cleaner to switch to using gemeente as input to the processing scripts, since it is more descriptive than NAAM. A similar, case-related issue is GEWEST, which should be changed to gewest for consistency.

@mvarewyck which changes would this imply on the app side? And on the backoffice side?
@soriadelva what do you think?

@soriadelva
Contributor

Personally I think it is cleaner to switch to using gemeente as input to the processing scripts, since it is more descriptive than NAAM. A similar, case-related issue is GEWEST, which should be changed to gewest for consistency.

@mvarewyck which changes would this imply on the app side? And on the backoffice side? @soriadelva what do you think?

I agree that this will be a lot clearer. In case we apply this, I think it's best to immediately apply this to all other datasets too so there is no confusion.
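Applying the same renaming to every dataset could be done with one small helper, so that the mapping lives in a single place. A minimal sketch in base R; the helper name harmoniseNames, the object nameMap and the toy data frames are hypothetical, and only the NAAM to gemeente and GEWEST to gewest mappings come from this thread.

```r
# Mapping from old to new column names (the two renames proposed in this thread).
nameMap <- c(NAAM = "gemeente", GEWEST = "gewest")

# Hypothetical helper: rename any mapped columns that are present, leave the rest.
harmoniseNames <- function(df, map = nameMap) {
  old <- intersect(names(map), names(df))
  # Refuse to rename when the target name already exists, to avoid duplicates.
  clash <- intersect(map[old], names(df))
  if (length(clash) > 0) {
    stop("Target column(s) already present: ", paste(clash, collapse = ", "))
  }
  names(df)[match(old, names(df))] <- map[old]
  df
}

# Toy datasets, invented for illustration.
datasets <- list(
  nesten = data.frame(NAAM = "Gent", provincie = "Oost-Vlaanderen", GEWEST = "flanders"),
  other  = data.frame(NAAM = "Leuven", year = 2024)
)
datasets <- lapply(datasets, harmoniseNames)
print(lapply(datasets, names))
```

Stopping on a name clash is a deliberate choice here: a dataset that already has both NAAM and gemeente needs a human decision about which column is correct.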

@jrhillae
Contributor

jrhillae commented Jun 28, 2024

@SanderDevisscher, @soriadelva: I am planning to clean up the coordinates in the iAsset file within three weeks. Wrong coordinates will be corrected based on the field 'adress', or left empty when the location is unknown.

@mvarewyck
Collaborator Author

@mvarewyck which changes would this imply on the app side? And on the backoffice side? @soriadelva what do you think?

Renaming the columns would imply only a minor change in the code of the app. Just inform me when this is done.
