This repository has been archived by the owner on May 5, 2022. It is now read-only.

Handle case where pre- and -post directional are same #784

Open
missinglink opened this issue Feb 1, 2021 · 4 comments

Comments

@missinglink

missinglink commented Feb 1, 2021

Heya,

I noticed a street name in the San Diego file today "S 39TH ST S" which has the "South" directional added twice:

cat us_ca_san_diego-addresses-county.geojson \
  | grep 'S 39TH ST S' \
  | jq '.properties.street'

"S 39TH ST S"

It seems that the error is caused by the source data including both pre (addrpdir) and post (addrpostd) directional columns with the value 'S':

ogr2ogr -f CSV /vsistdout/ addrapn_datasd.dbf \
  | xsv search -s 'objectid' '854155' \
  | xsv table

objectid  addrnmbr  addrfrac  addrpdir  addrname  addrpostd  addrsfx  addrunit  addrzip  add_type  roadsegid  apn         asource  plcmt_loc  community  parcelid  usng
854155    1261                S         39TH      S          ST                 92113              0          5512003800  M        C          SAN DIEGO  11648     11S MS 89683 17286

Would it be possible to add a check in machine that adds only one of these values to the street field when both are present and identical?

@iandees
Member

iandees commented Feb 1, 2021

🤔 Are these one-offs in the data set? Maybe we should ask the county to fix the data?

@missinglink
Author

It's definitely uncommon in OA; at least, I've never noticed it before.
Within this one file, though, it happens a lot:

ogr2ogr -f CSV /vsistdout/ addrapn_datasd.dbf \
  | awk -F, '{ if($4 && $4==$6) {print $0}  }' \
  | xsv count
  
3595
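The awk one-liner above locates the directionals by position ($4 and $6). The sketch below finds the columns by header name instead, so it keeps working if the column order changes. It is a hypothetical helper (`find_duplicate_directionals` is not part of machine or any tool) and, like the original, it splits on bare commas, so it assumes no field contains a quoted comma. When counting, note that `xsv count` treats the first line it receives as a header. Because the awk filter already drops the real header, that pipeline may report one fewer than the number of matching rows. Piping into `wc -l` avoids this.

```sh
# Hypothetical helper: read CSV (e.g. ogr2ogr output) on stdin and print the rows
# whose pre- and post-directional columns hold the same non-empty value.
# Assumes plain comma separation with no quoted commas inside fields.
find_duplicate_directionals() {
  awk -F, '
    # Header row: remember each column name -> column index, then skip it.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      pre = $(col["addrpdir"]); post = $(col["addrpostd"])
      if (pre != "" && pre == post) print
    }
  '
}
```

Usage: `ogr2ogr -f CSV /vsistdout/ addrapn_datasd.dbf | find_duplicate_directionals | wc -l`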

Looking at the source, could it also be that addrpdir isn't what we think it is?
The post field is named addrpostd, so I would expect the pre field to be called addrpred, but it's called addrpdir 🤷‍♂️.

@missinglink
Author

missinglink commented Feb 2, 2021

It might still be a good idea to add some logic in machine to catch this.

Shouldn't identical pre- and post-directionals always be considered an error?
Only one directional string should be added to the street string in this case.

[edit] If I were to choose which one, I'd favour keeping the post, since it's much easier for consumers of the data to detect post-directionals than pre-directionals.
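The proposed rule can be sketched as a small shell function. This is only an illustration, not machine's actual code: `build_street` is a hypothetical name, and the part order (pre-directional, name, suffix, post-directional) is inferred from the "S 39TH ST S" example. When both directionals are present and equal, it drops the pre-directional and keeps the post, as suggested above. It also joins only the non-empty parts, so a missing field does not leave a double space.

```sh
# Hypothetical helper: assemble a street string from its address parts.
# Arguments: pre-directional, street name, suffix (street type), post-directional.
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
build_street() (
  pre=$1 name=$2 suffix=$3 post=$4
  # Identical pre- and post-directionals: keep only the post one.
  if [ -n "$pre" ] && [ "$pre" = "$post" ]; then
    pre=
  fi
  # Join only the non-empty parts with single spaces.
  out=
  for part in "$pre" "$name" "$suffix" "$post"; do
    if [ -n "$part" ]; then
      out="${out:+$out }$part"
    fi
  done
  printf '%s\n' "$out"
)

build_street S 39TH ST S    # prints "39TH ST S"
build_street N MAIN ST S    # prints "N MAIN ST S" (different directionals are kept)
```

Keeping the post-directional means that a consumer only needs to look at the end of the string to find it.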

@missinglink
Copy link
Author

missinglink commented Feb 2, 2021

FWIW, there are other logical errors in the San Diego GeoJSON file, also caused by the messy source file.

One thing I noticed is that machine inserts the separating space even when a field is empty, so in cases where there is no addrsfx we see a double space:

cat us_ca_san_diego-addresses-county.geojson \
  | jq -r '.properties.street' \
  | grep -E '^[NSEW]\s.{1,3}\s\s[NSEW]$'

W E  W
W E  W
E AVE  E
W E  W
W E  W
E AVE  E
E AVE  E
W E  W
E AVE  E
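For files that have already been generated, a consumer could collapse the repeated spaces after extraction. In the sketch below, `naive_join` is a hypothetical stand-in for the behaviour described above: one space between every field, even an empty one. It is not machine's code. `squeeze_spaces` uses `tr -s` and `sed` to squeeze space runs and trim the ends. This fixes only the spacing. A duplicated directional such as "E AVE E" remains.

```sh
# Hypothetical model of the reported behaviour: a space between every field,
# even when a field is empty.
naive_join() {
  printf '%s %s %s %s\n' "$1" "$2" "$3" "$4"
}

# Consumer-side cleanup: squeeze runs of spaces, then trim a leading/trailing space.
squeeze_spaces() {
  tr -s ' ' | sed 's/^ //; s/ $//'
}

naive_join E AVE '' E                   # prints "E AVE  E"
naive_join E AVE '' E | squeeze_spaces  # prints "E AVE E"
```

Usage on the published file: `jq -r '.properties.street' us_ca_san_diego-addresses-county.geojson | squeeze_spaces`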
