This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Fixed $extratInfo typo and added a brute fix for last modification date on PDFs #72

Closed
wants to merge 1 commit into from

Conversation

ScottConroy

I'm finding several variations on the creation and last modification dates on different file types, but this fix works for the specific PDFs I have on hand.

@ryangrimm
Contributor

I put in place an easy way to normalize last modification metadata into a corona:modDate element. I'm also running anything that looks like a date through the date parser to (hopefully) get out xs:dateTime values. Typos are also cleaned up.
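For readers following along, here is a rough Python sketch of that kind of date normalization. It is not Corona's actual parser: the function name `to_xs_datetime`, the regular expression, and the set of accepted formats are assumptions for illustration. It handles the PDF-style `D:YYYYMMDDHHmmSS` form with an optional offset, plus two ISO-like fallbacks, and returns `None` rather than guessing when nothing matches.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF-style date: D:YYYY[MM[DD[HH[mm[SS]]]]] plus an optional offset (Z or +HH'mm').
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)

# Fallback formats are illustrative assumptions, not an exhaustive list.
_FALLBACK_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def to_xs_datetime(raw):
    """Return an ISO 8601 string usable as xs:dateTime, or None if unparseable."""
    raw = raw.strip()
    m = _PDF_DATE.match(raw)
    if m:
        defaults = (0, 1, 1, 0, 0, 0)  # missing month/day default to 1, time to 0
        parts = [int(v) if v else d for v, d in zip(m.groups()[:6], defaults)]
        tz = None
        if m.group(7):
            tz = timezone.utc
        elif m.group(8):
            sign = 1 if m.group(8) == "+" else -1
            offset = timedelta(hours=int(m.group(9)), minutes=int(m.group(10) or 0))
            tz = timezone(sign * offset)
        try:
            return datetime(*parts, tzinfo=tz).isoformat()
        except ValueError:
            return None  # e.g. month 13: looked like a date but is not one
    for fmt in _FALLBACK_FORMATS:
        try:
            return datetime.strptime(raw, fmt).isoformat()
        except ValueError:
            continue
    return None


print(to_xs_datetime("D:20111215103900-05'00'"))  # 2011-12-15T10:39:00-05:00
```

Returning `None` for unrecognized input lets the caller keep the original string untouched instead of storing a value that would later fail a cast.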

Thanks!

@ryangrimm ryangrimm closed this Dec 15, 2011
@ScottConroy
Author

Your solution is obviously MUCH more graceful than mine! I very much
appreciate the fast turnaround. I'll be putting this to use right away.

Any thoughts about making this usable outside of Corona? I think folks
that are using other mechanisms to load binary content would benefit
greatly from this. The default result of an xdmp:document-filter doesn't
really cut it...

@ScottConroy
Author

I'm getting an invalid cast as dateTime when I attempt to upload PDFs.
I tried with more than one. I didn't check your parsing since I know you can
do it faster than I can. Here's an example doc.

@ryangrimm
Contributor

I noticed a couple more formats that the date parsing library wasn't handling and added those.

I suspect that the problem is in your range index. Is this an index that you created via Corona or the MarkLogic admin interface?

I'm putting the parsed date into a normalized-date attribute and leaving the original content as a text node. So make sure that the range index is pointing to the attribute and let me know if that gives you some success.
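To make the shape concrete, here is a small Python sketch of an element built that way. The namespace URI is a placeholder (Corona's real one is not given in this thread), and `mod_date_element` is a hypothetical helper, not part of Corona. The point is only that the original string stays in the text node while the xs:dateTime-compatible value sits in the `normalized-date` attribute. A dateTime range index on the element itself would presumably try to cast the raw text, which would explain the invalid cast error.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for illustration only; not Corona's real namespace.
CORONA_NS = "http://example.com/corona"


def mod_date_element(raw, normalized):
    """Build a modDate element: raw value as text, parsed value as an attribute."""
    ET.register_namespace("corona", CORONA_NS)
    el = ET.Element("{%s}modDate" % CORONA_NS)
    el.text = raw  # original metadata string, left untouched
    if normalized is not None:
        # Only set the attribute when parsing succeeded, so an index on it
        # never sees a value that cannot be cast to xs:dateTime.
        el.set("normalized-date", normalized)
    return el


el = mod_date_element("D:20111215103900-05'00'", "2011-12-15T10:39:00-05:00")
print(ET.tostring(el, encoding="unicode"))
```

With this layout, the range index would target the `normalized-date` attribute on `modDate` rather than the element's text.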

--Ryan

On Dec 15, 2011, at 10:19 AM, Scott Conroy wrote:

Forgot to mention that I have an index on modDate. Obviously the upload works if I get rid of the index. But as you can guess I'm trying to facet on modDate (across a variety of content).

On Thu, Dec 15, 2011 at 1:12 PM, Ryan Grimm wrote:
Doesn't look like Git allows attachments. Feel free to email me the PDF directly and I'll fix it up.

--Ryan

@ScottConroy
Author

Sorry, I just figured that out while you were emailing me. Much
appreciated.

@ryangrimm
Contributor

No worries.

I just created a new issue (#74) to make it easier to create range indexes on binary metadata without knowing all of the details.

--Ryan
