You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.
If I remember correctly, when we were discussing the output JSON structure, I also mentioned that this should be made inline with how other archiving related services work. They take URI as the last path parameter after every significant path prefix in the route. This eliminates the need of explicit URL encoding.
Correct me if i am wrong, but isn't that a desired behavior? To clean up
the url from parameters and find the source?
I think non-significant parameters/protocol/subdomain are removed as part of the canonicalization. This is done by most of the web archives, but we can do canonicalization on our end too to take advantage of it in non-archival sources. However, in this report, URL parameters were misses unintentionally, which is a bug.