carbon date truncates arguments with "&" in them #13
Comments
Hello Michael, and everyone,
Correct me if I am wrong, but isn't that the desired behavior: to strip
the parameters from the URL and find the source?
On Oct 18, 2017 9:13 PM, "Michael L. Nelson" <notifications@github.com> wrote:
http://carbondate.cs.odu.edu/cd?url=www.cs.odu.edu/foo.cgi&arg1=1&arg2=2
produces:
{
"self": "http://carbondate.cs.odu.edu/cd?url=www.cs.odu.edu/foo.cgi&arg1=1&arg2=2",
"uri": "http://www.cs.odu.edu/foo.cgi",
"estimated-creation-date": "2006-09-13T19:18:54",
...
}
I see what's happening: it's counting the arg1 and arg2 parameters as part of the request to carbondate.cs.odu.edu rather than as part of the URI specified. The parameters can make a difference in finding mementos for something like that URI. However, for something like youtube.com we definitely need those parameters. To correct this, I think I'll remove the "/cd=" parameter and create a route such as "/cd/". Open to other suggestions as well.
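The truncation described above can be reproduced with Python's standard `urllib.parse`, independent of whatever web framework the service actually uses. This is only a sketch: the variable names are illustrative, and the percent-encoding step is a client-side workaround, not something the thread settles on.

```python
from urllib.parse import urlsplit, parse_qs, quote

request = "http://carbondate.cs.odu.edu/cd?url=www.cs.odu.edu/foo.cgi&arg1=1&arg2=2"

# A standard query-string parser sees three separate parameters,
# so "url" ends at the first unencoded "&".
params = parse_qs(urlsplit(request).query)
print(params)
# {'url': ['www.cs.odu.edu/foo.cgi'], 'arg1': ['1'], 'arg2': ['2']}

# Workaround (assumption, not the fix chosen in this thread):
# percent-encode the target so "&" and "=" survive as part of "url".
target = "www.cs.odu.edu/foo.cgi&arg1=1&arg2=2"
encoded = "http://carbondate.cs.odu.edu/cd?url=" + quote(target, safe="")
round_trip = parse_qs(urlsplit(encoded).query)["url"][0]
print(round_trip == target)
```

With the target encoded, the parser returns it whole, which is why a query-parameter interface forces callers to URL-encode their input.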
If I remember correctly, when we were discussing the output JSON structure, I also mentioned that this should be brought in line with how other archiving-related services work. They take the URI as the last path parameter, after every significant path prefix in the route. This eliminates the need for explicit URL encoding.
thanks guys. yes, a structure like: http://carbondate.cs.odu.edu/cd/www.youtube.com/watch&v=Tnf_Brn-zdA would be better.
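A path-based route of this kind could recover the target roughly as follows. This is a minimal sketch using only the standard library: `target_from_request` is a hypothetical helper, and the `/cd/` prefix is the route proposed in this thread rather than a confirmed implementation.

```python
from urllib.parse import urlsplit

def target_from_request(request_url, prefix="/cd/"):
    """Return everything after the route prefix, query string included.

    Hypothetical helper: treats the rest of the request URL as the
    target URI instead of parsing it as the service's own parameters.
    """
    parts = urlsplit(request_url)
    if not parts.path.startswith(prefix):
        raise ValueError("request does not start with " + prefix)
    target = parts.path[len(prefix):]
    # Anything after "?" belongs to the target, not to the service.
    if parts.query:
        target += "?" + parts.query
    return target

print(target_from_request(
    "http://carbondate.cs.odu.edu/cd/www.cs.odu.edu/foo.cgi?arg1=1&arg2=2"))
# www.cs.odu.edu/foo.cgi?arg1=1&arg2=2
```

Because the service defines no query parameters of its own under this scheme, the whole query string can be handed back to the target without any encoding by the caller.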
Hey @HanySalahEldeen, it's great to hear from you. Hope you are doing good.
I think non-significant parameters, protocols, and subdomains are removed as part of canonicalization. Most web archives do this, but we can canonicalize on our end too, so that non-archival sources benefit as well. However, in this report the URL parameters were dropped unintentionally, which is a bug.
http://carbondate.cs.odu.edu/cd?url=www.cs.odu.edu/foo.cgi&arg1=1&arg2=2
produces:
{
"self": "http://carbondate.cs.odu.edu/cd?url=www.cs.odu.edu/foo.cgi&arg1=1&arg2=2",
"uri": "http://www.cs.odu.edu/foo.cgi",
"estimated-creation-date": "2006-09-13T19:18:54",
...
}