Save items data in df #75

Merged
merged 3 commits into from
May 2, 2019

@manycoding manycoding commented Apr 29, 2019

This pull request prepares some groundwork for #69.
I am getting rid of the dict so that it won't slow down implementing the new df API. The changes are big (I also included a small change that is unrelated to this PR), but I am going to comment on some of the code here to help you understand it.
Feel free to skip the review if you find it too complex :)

P.S. The data could be validated all at once without any iterations, but jsonschema is awfully slow for this, and it would require creating different schemas, which in turn would make them incompatible with the current spidermon validation. Thus, at this point I don't see this bottleneck as critical to address. It shouldn't be slower than it is now anyway.
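
To make the "different schemas" point concrete, here is a minimal sketch of what validating everything at once would need: each per-item schema wrapped in an array schema. The helper name `wrap_as_array_schema` and the sample schema are hypothetical and only for illustration; nothing like this exists in arche or is added by this PR.

```python
from copy import deepcopy
from typing import Any, Dict


def wrap_as_array_schema(item_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Build a schema for a whole list of items from a per-item schema.

    Hypothetical helper for illustration only; it is not part of arche.
    """
    item = deepcopy(item_schema)
    # "$schema" belongs at the document root, so move it to the wrapper.
    draft = item.pop("$schema", None)
    wrapper: Dict[str, Any] = {"type": "array", "items": item}
    if draft is not None:
        wrapper["$schema"] = draft
    return wrapper


# Made-up per-item schema, similar in shape to what an item validator expects.
item_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# The batch schema describes a list of items instead of a single item.
batch_schema = wrap_as_array_schema(item_schema)
```

The wrapped schema no longer describes a single item, so anything that expects a per-item schema could not reuse it as is. Root-relative references such as `#/definitions/...` inside the item schema would also stop resolving once it is nested under `items`, so a real version would need more care than this sketch.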

codecov bot commented Apr 29, 2019

Codecov Report

Merging #75 into master will increase coverage by 0.6%.
The diff coverage is 79.36%.


@@            Coverage Diff            @@
##           master      #75     +/-   ##
=========================================
+ Coverage   65.47%   66.08%   +0.6%     
=========================================
  Files          24       24             
  Lines        1596     1592      -4     
  Branches      278      274      -4     
=========================================
+ Hits         1045     1052      +7     
+ Misses        527      515     -12     
- Partials       24       25      +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/arche/tools/api.py | 54.54% <100%> (-4.11%) | ⬇️ |
| src/arche/data_quality_report.py | 31.31% <40%> (+1.01%) | ⬆️ |
| src/arche/rules/json_schema.py | 72.97% <50%> (ø) | ⬆️ |
| src/arche/readers/items.py | 81.11% <77.27%> (+4.28%) | ⬆️ |
| src/arche/arche.py | 69.56% <80%> (-0.65%) | ⬇️ |
| src/arche/tools/json_schema_validator.py | 96% <93.75%> (+17.27%) | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d43e8c...1e62d7a.

@@ -147,12 +148,6 @@ def data_quality_report(self, bucket: Optional[str] = None):
            raise ValueError("Collections are not supported")
        if not self.schema:
            raise ValueError("Schema is empty")
        if not self.report.results:

This should be dealt with in data_quality_report.py

@manycoding manycoding added this to the 0.4.0 milestone May 2, 2019

@ejulio ejulio left a comment

👌
