Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting up ISO 3166 2 per country #12

Closed
henriquemoody opened this issue Apr 1, 2019 · 6 comments
Closed

Splitting up ISO 3166 2 per country #12

henriquemoody opened this issue Apr 1, 2019 · 6 comments

Comments

@henriquemoody
Copy link
Contributor

First of all, thank you very much for maintaining this library, very much appreciated! I am very interested to incorporate this library into a library that I have been maintaining in a while called Respect\Validation because I don't want to deal with keeping this type of data in sync.

I noticed that you load all the subdivisions (ISO 3166 2) at once and then you iterate on them to fetch them by Country. Wouldn't it be optimal to have once file per country instead?

I get that this is the implementation of iso-codes, but they don't have to worry with web environment as most of PHP code has to deal with.

@sokil
Copy link
Owner

sokil commented Apr 2, 2019

Yes, that code is not optimal.

+-------------------+--------------------------+-----+------+------+------------+--------------+--------------+----------------+
| benchmark         | subject                  | set | revs | iter | mem_peak   | time_rev     | comp_z_value | comp_deviation |
+-------------------+--------------------------+-----+------+------+------------+--------------+--------------+----------------+
| SubdivisionsBench | benchGetAllByCountryCode | 0   | 100  | 0    | 5,392,224b | 43,757.890μs | -1σ          | -2.68%         |
| SubdivisionsBench | benchGetAllByCountryCode | 0   | 100  | 1    | 5,392,224b | 46,171.430μs | +1.00σ       | +2.68%         |
+-------------------+--------------------------+-----+------+------+------------+--------------+--------------+----------------+

Database file is large, and it needs to be loaded to memory. I thought to add some cache layer, because parsing of ~500KB JSON file and creading value objects not very optimal, but then decide that this layer must be on the client side.

Maybe there is a reason to split database json file to php files with one country's subdivisions inside. Opcache will help to cache and locate this parts.

@sunchayn
Copy link

sunchayn commented Apr 2, 2019

Why not adding both ( one file and isolated JSON files per country ) and make it configurable ? This will give people more choices. If the original data is begin delivered in one single file I think it's sane to create a little script that automate the extraction part to make life easier for future database updates.

@sokil
Copy link
Owner

sokil commented Apr 3, 2019

If there is profit with single file in some use cases, maybe both variants will stay. Not shure about configuration, choosing of strategy may be done dependin on concrete method without configuration, benchmarking will show better way. And of course splitting must be automated. There is also languages database (ISO 639-3) with 847KB file.

@henriquemoody
Copy link
Contributor Author

The cache layer would work for the loading but it wouldn't work for the searching, at least with the current implementation of Subdivisions.

I think keeping both variants is not a good idea: more options = more code = mode maintenance points = more hassle. I think there are not many use cases in which you need all subdivisions listed if not by country, and if that's really the case it makes sense to get them by country (as most of the interface of Subdivisions suggests).

And talking about really splitting it up, it will require breaking BC because both Subdivisions and AbstractDatabase would need to be updated (if we don't wanna do workaround), I suggest to make this change on version 3.0.

henriquemoody added a commit to henriquemoody/Validation that referenced this issue Apr 6, 2019
There is no much benefit from having individual rules for each country's
subdivision, quite the opposite. It increases the amount of code and
makes it hard to change the implementation of these rules. Right now,
the only sane way to change those rules is with a customized script.

This commit will remove the Subdivision Code rules per country and
instead will put that information into JSON files.

We both wouldn't like to keep this in this library anymore, and we are
considering having another library to deal with this data [1], but since
it seems like it may take some time, looks better to do it temporarily
here.

[1]: sokil/php-isocodes#12

Co-authored-by: Mazen Touati <mazen_touati@hotmail.com>
Signed-off-by: Henrique Moody <henriquemoody@gmail.com>
henriquemoody added a commit to henriquemoody/Validation that referenced this issue Apr 6, 2019
There is no much benefit from having individual rules for each country's
subdivision, quite the opposite. It increases the amount of code and
makes it hard to change the implementation of these rules. Right now,
the only sane way to change those rules is with a customized script.

This commit will remove the Subdivision Code rules per country and
instead will put that information into JSON files.

We both wouldn't like to keep this in this library anymore, and we are
considering having another library to deal with this data [1], but since
it seems like it may take some time, looks better to do it temporarily
here.

[1]: sokil/php-isocodes#12

Co-authored-by: Mazen Touati <mazen_touati@hotmail.com>
Signed-off-by: Henrique Moody <henriquemoody@gmail.com>
@sokil
Copy link
Owner

sokil commented May 4, 2019

+-------------------+--------------------------+-----------------+------+------+------------+--------------+--------------+----------------+
| benchmark         | subject                  | set             | revs | iter | mem_peak   | time_rev     | comp_z_value | comp_deviation |
+-------------------+--------------------------+-----------------+------+------+------------+--------------+--------------+----------------+
| CountriesBench    | benchIterator            | 0               | 500  | 0    | 1,592,912b | 1,897.402μs  | -1σ          | -0.57%         |
| CountriesBench    | benchIterator            | 0               | 500  | 1    | 1,592,912b | 1,919.286μs  | +1.00σ       | +0.57%         |
| SubdivisionsBench | benchIterator            | non_partitioned | 100  | 0    | 5,452,328b | 37,605.940μs | 0.00σ        | 0.00%          |
| SubdivisionsBench | benchIterator            | partitioned     | 100  | 0    | 5,441,320b | 35,458.310μs | 0.00σ        | 0.00%          |
| SubdivisionsBench | benchGetByCode           | non_partitioned | 100  | 0    | 5,959,096b | 39,724.930μs | +1.00σ       | +1.59%         |
| SubdivisionsBench | benchGetByCode           | non_partitioned | 100  | 1    | 5,959,096b | 38,479.290μs | -1σ          | -1.59%         |
| SubdivisionsBench | benchGetByCode           | partitioned     | 100  | 0    | 1,393,976b | 105.110μs    | -1σ          | -0.61%         |
| SubdivisionsBench | benchGetByCode           | partitioned     | 100  | 1    | 1,393,976b | 106.390μs    | +1.00σ       | +0.61%         |
| SubdivisionsBench | benchGetAllByCountryCode | non_partitioned | 100  | 0    | 5,959,112b | 40,804.560μs | +1.00σ       | +2.40%         |
| SubdivisionsBench | benchGetAllByCountryCode | non_partitioned | 100  | 1    | 5,959,112b | 38,892.030μs | -1σ          | -2.4%          |
| SubdivisionsBench | benchGetAllByCountryCode | partitioned     | 100  | 0    | 1,393,960b | 101.910μs    | -1σ          | -2.04%         |
| SubdivisionsBench | benchGetAllByCountryCode | partitioned     | 100  | 1    | 1,393,960b | 106.160μs    | +1.00σ       | +2.04%         |
+-------------------+--------------------------+-----------------+------+------+------------+--------------+--------------+----------------+

@sokil sokil closed this as completed Jun 3, 2019
@sokil
Copy link
Owner

sokil commented Jun 3, 2019

Done in 3.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants