-
Notifications
You must be signed in to change notification settings - Fork 119
[MRG+1] Add dublincore metadata #101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Add dublincore metadata #101
Conversation
Codecov Report
@@ Coverage Diff @@
## master #101 +/- ##
==========================================
+ Coverage 87.3% 88.65% +1.34%
==========================================
Files 11 12 +1
Lines 457 520 +63
Branches 97 111 +14
==========================================
+ Hits 399 461 +62
Misses 52 52
- Partials 6 7 +1
Continue to review full report at Codecov.
|
|
What you think @lopuhin ? |
|
@joaquingx thanks for the PR! I had just a quick look, the PR looks great 👍 Lines 111 to 133 in 3ab5592
their goal is to convert the syntax as provided by the extractor into a more uniform syntax which is similar between different extractors, to make it easier to write code which supports all processors. Do you think such converter will make sense for dublincore? |
|
@lopuhin I add the uniform option, can you confirm that it works as intended 🙏 ? |
lopuhin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @joaquingx , looks good! Left a small comment regarding the code.
Also it would be great to update README.rst, adding dublincore at the start (at "Currently, extruct supports:"), and also adding it to the list of valid syntaxes https://github.com/scrapinghub/extruct#select-syntaxes and to https://github.com/scrapinghub/extruct#uniform. I don't think we should add it under https://github.com/scrapinghub/extruct#single-extractors (not that it's different from other syntaxes, but we should just refer to tests here instead of repeating them in README - this is unrelated to this PR).
Codecov Report
@@ Coverage Diff @@
## master #101 +/- ##
==========================================
+ Coverage 87.3% 88.65% +1.34%
==========================================
Files 11 12 +1
Lines 457 520 +63
Branches 97 111 +14
==========================================
+ Hits 399 461 +62
Misses 52 52
- Partials 6 7 +1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #101 +/- ##
==========================================
+ Coverage 89.23% 90.24% +1.00%
==========================================
Files 12 13 +1
Lines 539 605 +66
Branches 122 136 +14
==========================================
+ Hits 481 546 +65
Misses 52 52
- Partials 6 7 +1
Continue to review full report at Codecov.
|
|
Hey @lopuhin , thanks for feedback 🙏, I just have a doubt here:
How's the correct way to do that, should I just add a title |
|
@joaquingx thanks, the way you updated README looks perfect to me 👍 |
lopuhin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks 👍
| def _udublincore(extracted): | ||
| out = [] | ||
| for obj in extracted: | ||
| context = obj.pop('namespaces', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to not modify the passed object here, and do a copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by list(extracted) you're doing a shallow copy; changes in list elements still happen in-place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @joaquingx! Do you think you can address the comment above? It seems this is the only remaining issue, the PR has everything: a clean implementation, docs, tests, ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure ill work on that 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Aside for the comments left by @kmike this PR looks good to me. Thank you @joaquingx! |
|
@kmike is it good to merge now? |
|
Hey @joaquingx! Could you please check this comment: #101 (comment)? |
3dc5a69 to
e5b51a7
Compare
|
Finally changed the shallow copy to deep copy 🙈, review it @kmike please 🙏 |
e471d95 to
043a479
Compare
Gallaecio
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks!
|
Thanks @joaquingx for your persistence! |
This PR adds Dublincore schema to extruct. To implement parsing I used this document as guide: http://dublincore.org/documents/dcq-html/ (Specially on 3. that explains how a DC consumer should act).
More references:
Fixes #10