[WIP] Split large files with ijson #366

Merged: jpmckinney merged 17 commits into master from 154-fix-buenos-aires on Apr 30, 2020

yolile
Member

@yolile yolile commented Apr 24, 2020

ref #154

Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
…e into 154-fix-buenos-aires

Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
…e into 154-fix-buenos-aires

Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
@yolile yolile requested a review from jpmckinney April 27, 2020 19:10
kingfisher_scrapy/base_spider.py (resolved)
kingfisher_scrapy/base_spider.py (outdated, resolved)
kingfisher_scrapy/spiders/argentina_buenos_aires.py (outdated, resolved)
kingfisher_scrapy/base_spider.py (outdated, resolved)
kingfisher_scrapy/base_spider.py (outdated, resolved)
kingfisher_scrapy/base_spider.py (outdated, resolved)
kingfisher_scrapy/base_spider.py (outdated, resolved)
kingfisher_scrapy/base_spider.py (outdated, resolved)
kingfisher_scrapy/base_spider.py (outdated, resolved)
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
@yolile yolile requested a review from jpmckinney April 29, 2020 01:08
package = self.get_package(f_package, array_field_name)

for number, items in enumerate(util.grouper(ijson.items(f_list, '{}.item'.format(array_field_name)), size), 1):
    package[array_field_name] = [item for item in items if item is not None]
Member

@jpmckinney jpmckinney Apr 29, 2020

In what case is an item None? You can also do filter(None, items).

Member

@yolile Can you answer this question? Otherwise, PR looks good.

Member Author

Sorry, I forgot about this one! The grouper method pads the last group with null values so that it reaches the number of items set by the 'size' parameter.

Member

Aha, thanks!
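
For readers following this thread, the padding behaviour can be sketched as below. This is a minimal illustration that assumes util.grouper follows the standard itertools "grouper" recipe; the PR's actual implementation is not shown in this conversation, and the sample items are made up. It also shows one difference between the two filtering options discussed: filter(None, items) drops every falsy value, not just the None padding.

```python
import itertools


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length groups, padding the last one with fillvalue.

    Assumption: this mirrors the itertools recipe, not necessarily the PR's util.grouper.
    """
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)


# Hypothetical sample data: five items, one of which is an empty (falsy) dict.
items = [{"id": 1}, {"id": 2}, {"id": 3}, {}, {"id": 5}]
groups = list(grouper(items, 2))

# The final group is padded with None to reach the requested size.
last_group = groups[-1]

# Dropping only the padding keeps falsy-but-real items such as {}.
without_padding = [[item for item in group if item is not None] for group in groups]

# filter(None, ...) drops every falsy value, so the empty dict is lost as well.
truthy_only = [list(filter(None, group)) for group in groups]
```

If the array items are always non-empty objects, both approaches give the same result; the explicit `is not None` check is the safer choice when empty values might be legitimate data.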

kingfisher_scrapy/util.py (outdated, resolved)
kingfisher_scrapy/util.py (outdated, resolved)
kingfisher_scrapy/util.py (outdated, resolved)
kingfisher_scrapy/util.py (outdated, resolved)
kingfisher_scrapy/util.py (outdated, resolved)
Comment on lines 1 to 5
import itertools
import json
from decimal import Decimal

from ijson import utils, ObjectBuilder
Member

Run isort -w 119 to fix the test failure.

Member

Also, don't forget to add ijson>=3 to requirements.in and run:

pip-compile; pip-compile requirements_dev.in

Member Author

@yolile yolile Apr 30, 2020

I got errors running pip-compile with pip 20.1: the current pip-compile version doesn't support pip>=20, but the 5.5 version does. Also, can we document these development processes somewhere?

Member

Member Author

Great! I thought only the documentation about standard development was there. Should we add a reference to it somewhere in Kingfisher?

Member

I opened an issue to discuss what to do with that content: open-contracting/standard-development-handbook#225

Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
Signed-off-by: Yohanna Lisnichuk <yohanitalisnichuk@gmail.com>
@yolile yolile requested a review from jpmckinney April 30, 2020 19:24
@jpmckinney jpmckinney merged commit 4820717 into master Apr 30, 2020
@jpmckinney jpmckinney deleted the 154-fix-buenos-aires branch April 30, 2020 20:22
@yolile yolile mentioned this pull request May 7, 2020
@jpmckinney jpmckinney mentioned this pull request Feb 17, 2021
2 tasks