CDXJ: Error: no such capture field: method #106

Open
edsu opened this issue May 16, 2022 · 7 comments

Comments


edsu commented May 16, 2022

When posting a CDXJ file (generated with pywb 2.6.7) to the OutbackCDX image on DockerHub (v0.11.0?) like so:

curl -X POST --data-binary @index.cdxj http://localhost:8080/coll

I'm seeing the following error get printed to the console:

At line: com,google-analytics)/collect?__wb_method=post&__wb_post_data=dj0xjl92pwo5nizhaxa9mszhptc2ndcxodg1myz0pxbhz2v2awv3jl9zptemzgw9ahr0chmlm0elmkylmkzhcg9klm5hc2euz292jtjgyxbvzcuyrmfwmjiwmza3lmh0bwwmzha9jtjgyxbvzcuyrmfwmjiwmza3lmh0bwwmdww9zw4tdxmmzgu9vvrgltgmzhq9qvbprcuzqsuymdiwmjilmjbnyxjjacuymdclmjatjtiwqsuymexpb24lmjbpbiuyme9yaw9ujnnkpte2lwjpdczzcj0xmzywedewmjamdna9mta1mhg4odamamu9mczfdxrtyt0xmtm2otk1ndmumtc1ntcxnza2mi4xnjuymtq0nja0lje2ntixndq2mjaumty1mje0ndyymc4xjl91dg16ptexmzy5otu0my4xnjuymtq0njiwljeums51dg1jc3ilm0qozglyzwn0ksu3q3v0bwnjbiuzrchkaxjly3qpjtdddxrty21kjtnekg5vbmupjl91dg1odd0xnjuymtq0njk0mdg5jl91pvfbq0nbuufcfizqawq9jmdqawq9jmnpzd0xnzu1nze3mdyylje2ntixndq2mdqmdglkpvvbltmzntizmtq1ltemx2dpzd00ntc1ndm3mc4xnjuymtq0nja0jmnkmt1oqvnbjmnkmj1oqvnbjtiwlsuymgfwb2qubmfzys5nb3ymy2qzptiwmtgxmdewjtiwdjqumsuymc0lmjbvbml2zxjzywwlmjbbbmfsexrpy3mmy2q0pxvuc3bly2lmawvkjtnbyxbvzc5uyxnhlmdvdizjzdu9dw5zcgvjawzpzwqlm0fhcg9klm5hc2euz292jmnknj1odhrwcyuzqsuyriuyrmrhcc5kawdpdgfsz292lmdvdiuyrlvuaxzlcnnhbc1gzwrlcmf0zwqtqw5hbhl0awnzlu1pbi5qcyzjzdc9ahr0chmlm0emej0xmjc2mdq0mjew 20220510010455 
{"url":"https://www.google-analytics.com/collect","mime":"image/gif","status":"200","digest":"B5HJFHOVXMSWJ55LTR3DHDQE4KJKIKWO","length":"651","offset":"49132028","method":"POST","requestBody":"__wb_post_data=dj0xJl92PWo5NiZhaXA9MSZhPTc2NDcxODg1MyZ0PXBhZ2V2aWV3Jl9zPTEmZGw9aHR0cHMlM0ElMkYlMkZhcG9kLm5hc2EuZ292JTJGYXBvZCUyRmFwMjIwMzA3Lmh0bWwmZHA9JTJGYXBvZCUyRmFwMjIwMzA3Lmh0bWwmdWw9ZW4tdXMmZGU9VVRGLTgmZHQ9QVBPRCUzQSUyMDIwMjIlMjBNYXJjaCUyMDclMjAtJTIwQSUyMExpb24lMjBpbiUyME9yaW9uJnNkPTE2LWJpdCZzcj0xMzYweDEwMjAmdnA9MTA1MHg4ODAmamU9MCZfdXRtYT0xMTM2OTk1NDMuMTc1NTcxNzA2Mi4xNjUyMTQ0NjA0LjE2NTIxNDQ2MjAuMTY1MjE0NDYyMC4xJl91dG16PTExMzY5OTU0My4xNjUyMTQ0NjIwLjEuMS51dG1jc3IlM0QoZGlyZWN0KSU3Q3V0bWNjbiUzRChkaXJlY3QpJTdDdXRtY21kJTNEKG5vbmUpJl91dG1odD0xNjUyMTQ0Njk0MDg5Jl91PVFBQ0NBUUFCfiZqaWQ9JmdqaWQ9JmNpZD0xNzU1NzE3MDYyLjE2NTIxNDQ2MDQmdGlkPVVBLTMzNTIzMTQ1LTEmX2dpZD00NTc1NDM3MC4xNjUyMTQ0NjA0JmNkMT1OQVNBJmNkMj1OQVNBJTIwLSUyMGFwb2QubmFzYS5nb3YmY2QzPTIwMTgxMDEwJTIwdjQuMSUyMC0lMjBVbml2ZXJzYWwlMjBBbmFseXRpY3MmY2Q0PXVuc3BlY2lmaWVkJTNBYXBvZC5uYXNhLmdvdiZjZDU9dW5zcGVjaWZpZWQlM0FhcG9kLm5hc2EuZ292JmNkNj1odHRwcyUzQSUyRiUyRmRhcC5kaWdpdGFsZ292LmdvdiUyRlVuaXZlcnNhbC1GZWRlcmF0ZWQtQW5hbHl0aWNzLU1pbi5qcyZjZDc9aHR0cHMlM0Emej0xMjc2MDQ0MjEw","filename":"apod.warc.gz"}
java.lang.IllegalArgumentException: no such capture field: method
	at outbackcdx.Capture.put(Capture.java:548)
	at outbackcdx.Capture.fromCdxjLine(Capture.java:434)
	at outbackcdx.Capture.fromCdxLine(Capture.java:385)
	at outbackcdx.Webapp.post(Webapp.java:249)
	at outbackcdx.Webapp.lambda$new$3(Webapp.java:102)
	at outbackcdx.Web$Route.handle(Web.java:312)
	at outbackcdx.Web$Router.handle(Web.java:236)
	at outbackcdx.Webapp.handle(Webapp.java:594)
	at outbackcdx.Web$Server.serve(Web.java:50)
	at outbackcdx.NanoHTTPD$HTTPSession.execute(NanoHTTPD.java:848)
	at outbackcdx.NanoHTTPD$1$1.run(NanoHTTPD.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Other CDXJ files seem to work normally, however.
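
One way to compare a failing CDXJ file against one that works is to list the JSON field names each file uses. The following is a minimal Python sketch, assuming each record is a single line of the form `urlkey timestamp {json}` as in the sample above; `parse_cdxj_line` and `field_names` are hypothetical helpers, not part of pywb or OutbackCDX.

```python
import json


def parse_cdxj_line(line):
    """Split one 'urlkey timestamp {json}' record into its three parts."""
    # maxsplit=2 keeps any spaces inside the JSON payload intact.
    urlkey, timestamp, payload = line.rstrip("\n").split(" ", 2)
    return urlkey, timestamp, json.loads(payload)


def field_names(lines):
    """Collect every JSON field name used across the given CDXJ lines."""
    names = set()
    for line in lines:
        if line.strip():  # skip blank lines
            names.update(parse_cdxj_line(line)[2])
    return names
```

Running `field_names(open("index.cdxj"))` on both files and diffing the two sets should show which fields appear only in the failing file.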

Author

edsu commented May 16, 2022

If it's helpful to have the WARC and CDXJ files please let me know!

Member

ato commented May 16, 2022

OutbackCDX does not (yet) support storing arbitrarily named fields.

@ato ato changed the title Error: no such capture field: method CDXJ: Error: no such capture field: method May 16, 2022
Member

ato commented May 17, 2022

Note the "Things it doesn't do (yet): CDXJ" in the README. :-) While it can now map CDX11 fields to CDXJ for input/output, it doesn't actually support storing arbitrary CDXJ data.

I don't have any short-term plans to implement this myself, but I would be happy to accept a pull request.

Author

edsu commented May 17, 2022

Now I'm confused why another CDXJ file worked.

Author

edsu commented May 17, 2022

I think I understand now: current OutbackCDX can store CDXJ data of a known shape? And the method property is not something it is expecting?

Member

ato commented May 17, 2022

Yes, if the CDXJ input is limited to just the basic CDX11 fields, it works.
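
As a possible workaround on versions without arbitrary-field support, the extra fields could be stripped from each record before posting. This is a rough Python sketch, not a tested recipe: `strip_unknown_fields` is a hypothetical helper, and the set of accepted field names is an assumption taken from the fields that appear alongside `method` in the sample line, not from the OutbackCDX source.

```python
import json

# Assumption: these are the JSON field names that map onto CDX11 columns.
# They come from the sample record in this thread, not from OutbackCDX itself.
ASSUMED_KNOWN_FIELDS = {"url", "mime", "status", "digest", "length", "offset", "filename"}


def strip_unknown_fields(line, known=ASSUMED_KNOWN_FIELDS):
    """Return the CDXJ line with any JSON fields outside `known` removed."""
    urlkey, timestamp, payload = line.rstrip("\n").split(" ", 2)
    fields = json.loads(payload)
    kept = {name: value for name, value in fields.items() if name in known}
    return f"{urlkey} {timestamp} {json.dumps(kept)}"
```

The filtered lines could then be written to a new file and posted with the same `curl` command as before. Note that this drops `method` and `requestBody` from the JSON, although the urlkey generated by pywb still carries the `__wb_method` and `__wb_post_data` parameters.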

edsu added a commit to edsu/outbackcdx that referenced this issue May 17, 2022
Allow for a `method` property in the CDXJ, which is occasionally
emitted by pywb.

Fixes nla#106
Member

ato commented May 30, 2023

Commit 9d73df3 added support for storing arbitrary extra CDXJ fields using a CBOR-based record encoding, which can be enabled with --index-version 5. This is still experimental, and a little more work is needed to actually make use of the method and requestBody fields when constructing the urlkey for compatibility with pywb.
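
To illustrate what using method and requestBody in the urlkey might involve: the sample record at the top of this thread suggests that pywb appends `__wb_method=<method>` and the request body to the query string, then canonicalizes and lowercases the result. The Python sketch below reproduces only that shape. `post_urlkey` is a hypothetical helper, not pywb or OutbackCDX code, and it assumes a simplified SURT form that ignores ports, userinfo, and query normalization.

```python
from urllib.parse import urlsplit


def post_urlkey(url, method, request_body):
    """Build a pywb-style urlkey for a non-GET capture (simplified sketch)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: a leading "www." is dropped, as in the sample record.
    if host.startswith("www."):
        host = host[4:]
    # Reverse the host labels: google-analytics.com -> com,google-analytics
    surt_host = ",".join(reversed(host.split(".")))
    extra = f"__wb_method={method}&{request_body}"
    query = f"{parts.query}&{extra}" if parts.query else extra
    # The sample urlkey is entirely lowercase, including the encoded body.
    return f"{surt_host}){parts.path}?{query}".lower()
```

For the record in this issue, calling it with the `url`, `method`, and `requestBody` values from the JSON should yield a key beginning with `com,google-analytics)/collect?__wb_method=post&__wb_post_data=`.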

ato added a commit that referenced this issue Jun 9, 2023
This should improve compatibility with Pywb for POST and PUT requests.

#106