okasi/swe-pii

Swedish Personally Identifiable Information (PII)

CURRENT version:

  • RegEx
  • Set Lookups from JSON files
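
The README does not show the patterns themselves. As a rough sketch of the regex side, the snippet below tests whether a whole string has the shape of a personnummer (10 or 12 digits, with an optional `-` or `+` before the last four) or of a Swedish postal code (five digits, optionally written `NNN NN`). The helper name `pii_shape` and both patterns are illustrative assumptions, not the repository's actual rules. Only the label names come from this README.

```shell
# Hypothetical helper: print a label when the whole input matches a known shape.
pii_shape() {
    if printf '%s\n' "$1" | grep -Eq '^([0-9]{2})?[0-9]{6}[-+]?[0-9]{4}$'; then
        # Shape only: a samordningsnummer or organisation number looks the same.
        echo "ID-PNR"
    elif printf '%s\n' "$1" | grep -Eq '^[0-9]{3} ?[0-9]{2}$'; then
        echo "ADDR-POSTAL"
    else
        echo "NONE"
    fi
}

pii_shape "811228-9874"
pii_shape "114 55"
```

A shape match alone says nothing about whether the date part or the check digit is valid, so a real detector would likely pair it with further checks.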

FUTURE version:


Dataset sources:

WIP extraction from OpenStreetMap data:

osmium tags-filter sweden-latest.osm.pbf -o addresses.osm.pbf n/addr:street n/addr:postcode n/addr:city n/admin_level=7 n/admin_level=4 w/addr:street w/addr:postcode w/addr:city w/admin_level=7 w/admin_level=4

osmium export addresses.osm.pbf -f geojson -o addresses.geojson

jq -r '.features[] | select(.properties["addr:street"] != null and .properties["addr:postcode"] != null and .properties["addr:city"] != null) | "\(.properties["addr:street"]),\(.properties["addr:postcode"]),\(.properties["addr:city"])"' addresses.geojson | sed -E 's/,([0-9]{3})([0-9]{2}),/,\1 \2,/' | sort -u > unique_addresses.txt

jq -r '.features[] | select(.properties.admin_level == "7" and .properties.name != null) | .properties.name' addresses.geojson | sort -u > municipalities.txt

jq -r '.features[] | select(.properties.admin_level == "4" and .properties.name != null) | .properties.name' addresses.geojson | sort -u > counties.txt
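
How these text lists feed the set lookups is not described here. As a minimal sketch, assuming a lookup directly against a one-name-per-line file such as `municipalities.txt`, an exact whole-line match can be done with `grep`. The helper name `in_list` is hypothetical.

```shell
# Hypothetical helper: succeed if $1 appears as a complete line in list file $2.
# -F treats the candidate as a fixed string, -x requires a whole-line match,
# -q suppresses output, and -e protects candidates that start with a dash.
in_list() {
    grep -Fxq -e "$1" "$2"
}
```

For example, `in_list "$candidate" municipalities.txt && echo "ADDR-MUNICIPALITY"` would label a candidate that matches a line exactly. The match is case-sensitive, so any normalization would have to happen before the lookup.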

Identifiers & labels

  • Person First Name (PER-FIRST)
  • Person Last Name (PER-LAST)
  • Personnummer (ID-PNR)
  • Samordningsnummer (ID-SNR)
  • Marital Status (MARITAL)
  • Biological Sex (SEX)
  • Nationality (NATION)
  • Education Program (EDU-PROGRAM)
  • Profession (PROF)
  • Disabilities (DISAB)
  • Ethnicity (ETHNIC)
  • Sexual Orientation (SEXOR)
  • Political Opinions (POL)
  • Religious Beliefs (REL)
  • Phone Number (PHONE)
  • Email (EMAIL)
  • Social Media Profiles (SOCM)
  • Street Address (ADDR-STREET)
  • Postal Code (ADDR-POSTAL)
  • Municipality (ADDR-MUNICIPALITY)
  • City (ADDR-CITY)
  • County (ADDR-COUNTY)
  • Bank Account Number (FIN-BANKNUM)
  • IBAN (FIN-IBAN)
  • BIC/SWIFT Code (FIN-BIC)
  • Credit Card Number (FIN-CC)
  • Organization Number (ORG-NUM)
  • Company Name (ORG-WORK)
  • Education Institute (ORG-EDU)
  • IP Address (IP)
  • MAC Address (MAC)
  • Date (DATE)
  • Time (TIME)
  • Vehicle Registration Number (VEH)
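
Several of the numeric identifiers above (personnummer, samordningsnummer, organisation numbers and card numbers) end in a Luhn check digit, which a detector can use to discard strings that only look right. The sketch below is one way to do that in POSIX shell. The function name `luhn_ok` is an assumption, and nothing here is taken from the repository's code.

```shell
# Hypothetical helper: return 0 if the digits in $1 pass the Luhn check.
# Spaces, '-' and '+' are ignored; any other non-digit fails the check.
# Note: POSIX sh has no local variables, so rest/sum/i/d are globals.
luhn_ok() {
    rest=$(printf '%s' "$1" | tr -d ' +-')
    case $rest in
        '' | *[!0-9]*) return 1 ;;
    esac
    sum=0
    i=0
    # Walk the digits from right to left, doubling every second one.
    while [ -n "$rest" ]; do
        d=${rest#"${rest%?}"}   # last remaining digit
        rest=${rest%?}
        if [ $((i % 2)) -eq 1 ]; then
            d=$((d * 2))
            if [ "$d" -gt 9 ]; then d=$((d - 9)); fi
        fi
        sum=$((sum + d))
        i=$((i + 1))
    done
    [ $((sum % 10)) -eq 0 ]
}

if luhn_ok "811228-9874"; then echo "checksum ok"; fi
```

For a personnummer the check digit is computed over the 10-digit form, so the century digits of a 12-digit value need to be dropped before calling the function. The sketch also does not verify that the date part is a real date.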