
macOS quarantine issue appearing as Unicode error #83

Closed
martin-ueding opened this issue Jan 13, 2024 · 3 comments
Labels
status: in progress type: bug Something isn't working

Comments

@martin-ueding
Owner

A macOS user has trouble opening GPX files. They sent me one of the files, and I can open it on Linux without problems, so something odd is going on. Here is an example traceback:

```
2024-01-03 21:41:10 geo_activity_playground.importers.directory ERROR Error while parsing file Activities/._route_2023-01-17_5.05pm.gpx:
Traceback (most recent call last):
  File "/home/ecki/.local/lib/python3.10/site-packages/geo_activity_playground/core/activity_parsers.py", line 32, in read_activity
    df = read_gpx_activity(path, opener)
  File "/home/ecki/.local/lib/python3.10/site-packages/geo_activity_playground/core/activity_parsers.py", line 133, in read_gpx_activity
    gpx = gpxpy.parse(f)
  File "/home/ecki/.local/lib/python3.10/site-packages/gpxpy/__init__.py", line 37, in parse
    parser = mod_parser.GPXParser(xml_or_file)
  File "/home/ecki/.local/lib/python3.10/site-packages/gpxpy/parser.py", line 70, in __init__
    self.init(xml_or_file)
  File "/home/ecki/.local/lib/python3.10/site-packages/gpxpy/parser.py", line 84, in init
    text = text.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ecki/.local/lib/python3.10/site-packages/geo_activity_playground/importers/directory.py", line 42, in import_from_directory
    timeseries = read_activity(path)
  File "/home/ecki/.local/lib/python3.10/site-packages/geo_activity_playground/core/activity_parsers.py", line 38, in read_activity
    raise ActivityParseError(f"Encoding issue with {path=}: {e}") from e
geo_activity_playground.core.activity_parsers.ActivityParseError: Encoding issue with path=PosixPath('Activities/._route_2023-01-17_5.05pm.gpx'): 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
```

In order to diagnose this further, I've added a bit more logging.
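A minimal sketch of this kind of diagnostic logging, assuming a hypothetical helper `read_text_with_diagnostics` (it is not the project's actual code): when a file fails to decode as UTF-8, it logs the leading bytes with `repr()` so that control characters stay readable, and then re-raises the original error.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


# Hypothetical helper, not part of geo_activity_playground: decode a file as
# UTF-8 and, if that fails, log its leading bytes to help diagnose the cause.
def read_text_with_diagnostics(path: Path, num_bytes: int = 100) -> str:
    raw = path.read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # %r renders the bytes like a Python literal, e.g. b'\x00\x05...'.
        logger.error(
            "Cannot decode %s as UTF-8; first %d bytes: %r",
            path,
            num_bytes,
            raw[:num_bytes],
        )
        raise
```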

@martin-ueding
Owner Author

I got some log output that contains the first 100 bytes of the file:

```
>>> b = b'\x00\x05\x16\x07\x00\x02\x00\x00Mac OS X        \x00\x02\x00\x00\x00\t\x00\x00\x002\x00\x00\x0e\xb0\x00\x00\x00\x02\x00\x00\x0e\xe2\x00\x00\x01\x1e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00ATTR\xff\xff\xef\x17\x00\x00\x0e\xe2\x00\x00\x00\x98'
```
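These bytes on their own reproduce the error from the traceback. UTF-8 accepts the leading control characters and ASCII text; the first byte at or above `0x80` is `0xb0` at offset 37, and because `0xb0` is a UTF-8 continuation byte it cannot begin a character. A small check using the first 50 of the logged bytes (the helper name `first_decode_error` is just for illustration):

```python
from typing import Optional, Tuple

# First 50 bytes of the logged value above.
header = (
    b"\x00\x05\x16\x07\x00\x02\x00\x00"
    + b"Mac OS X".ljust(16)  # "Mac OS X" padded with spaces to 16 bytes
    + b"\x00\x02\x00\x00\x00\t\x00\x00\x002\x00\x00\x0e\xb0"
    + b"\x00\x00\x00\x02\x00\x00\x0e\xe2\x00\x00\x01\x1e"
)


def first_decode_error(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (offset, reason) of the first UTF-8 decoding error, if any."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as e:
        return e.start, e.reason
    return None


print(first_decode_error(header))  # (37, 'invalid start byte')
```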

We can then take a look and try to detect the character encoding:

```
>>> import chardet
>>> chardet.detect(b)
{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}
```

This suggests that the file could be encoded as Windows-1252, although chardet is only 73 % confident. The user said that the files had been converted many times, so there might even be irrecoverable encoding errors that have garbled the data.

Since these are GPX files, the data of interest lies in the ASCII range, which most common encodings decode identically. So parsing may well work even if we don't pick exactly the right code page.

Version 0.17.4 contains some experimental code for this.
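A minimal sketch of such a fallback, using a hypothetical `decode_with_fallback` helper (this is not the code shipped in 0.17.4): try UTF-8 first, then Windows-1252 (Python's `cp1252` codec), and finally Latin-1, which maps every byte value to a character and therefore never fails. All three decode the ASCII range identically, so the GPX markup survives even if the guessed code page is wrong.

```python
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Return the decoded text and the name of the encoding that worked."""
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # cp1252 leaves a few byte values (e.g. 0x81) undefined, so it can fail too.
            pass
    # latin-1 maps every byte 0x00-0xff to a code point, so this cannot fail.
    return data.decode("latin-1"), "latin-1"
```

A successful decode only lets parsing continue; it does not guarantee that the text is valid GPX, so the XML parser may still reject it.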

@martin-ueding
Owner Author

I've let the program write the first 1000 bytes to the log, and they contain the string `com.apple.quarantine`, so some Apple-specific feature is involved. Interestingly, the file name is `Activities/._route_2023-01-17_5.05pm.gpx`, so it appears to be a hidden file. I'm not sure exactly what this means. Is there also a file `Activities/route_2023-01-17_5.05pm.gpx`, and can it be read without problems? Or is that one broken as well? I've asked the user to test a bit more.
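For context: `._` is the naming scheme macOS uses for AppleDouble files, which hold a file's extended attributes (such as `com.apple.quarantine`) when it is copied to a file system or archive that cannot store them natively. The 100 bytes logged earlier match the AppleDouble header: magic number `0x00051607`, version `0x00020000`, the 16-byte filler `Mac OS X`, and an entry count of 2. A minimal standard-library sketch for recognizing such files by their header; the helper names are hypothetical:

```python
import struct
from typing import List, Tuple

APPLE_DOUBLE_MAGIC = 0x00051607


def is_apple_double(data: bytes) -> bool:
    # The header starts with a big-endian 4-byte magic number.
    return len(data) >= 4 and struct.unpack(">I", data[:4])[0] == APPLE_DOUBLE_MAGIC


def apple_double_entries(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (entry_id, offset, length) for each entry descriptor."""
    # Layout: 4-byte magic, 4-byte version, 16-byte filler, 2-byte entry count,
    # followed by 12-byte descriptors of three big-endian uint32 values each.
    (count,) = struct.unpack_from(">H", data, 24)
    return [struct.unpack_from(">III", data, 26 + 12 * i) for i in range(count)]
```

Applied to the logged bytes, this yields the entries `(9, 50, 3760)` and `(2, 3810, 286)`; in the AppleSingle/AppleDouble format, entry ID 9 is Finder Info and ID 2 is the resource fork. Checking the header is stricter than looking for a leading `._` in the file name, but either identifies this file.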

martin-ueding changed the title from "Unicode error with GPX on macOS" to "macOS quarantine issue appearing as Unicode error" on Jan 14, 2024
@martin-ueding
Owner Author

Since the names of these quarantine files start with a period, we can simply skip every file whose name starts with one. That should make the import more robust.
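A minimal sketch of that filter, assuming the importer walks the activity directory with `pathlib`; the function name `iter_activity_files` is hypothetical and not taken from `importers/directory.py`. Any file or directory whose name starts with a period is skipped, which covers the `._` files as well as other hidden files such as `.DS_Store`.

```python
from pathlib import Path
from typing import Iterator


def iter_activity_files(root: Path) -> Iterator[Path]:
    """Yield regular files below root, skipping hidden files and directories."""
    for path in sorted(root.rglob("*")):
        # Only look at components below root, so a hidden parent of root does not matter.
        relative_parts = path.relative_to(root).parts
        # Skip hidden files and anything inside a hidden directory.
        if any(part.startswith(".") for part in relative_parts):
            continue
        if path.is_file():
            yield path
```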

1 participant