Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The problem with the Chinese phone numbers. #44

Closed
olegsklyarov72 opened this issue Sep 26, 2014 · 17 comments
Closed

The problem with the Chinese phone numbers. #44

olegsklyarov72 opened this issue Sep 26, 2014 · 17 comments
Assignees
Labels

Comments

@olegsklyarov72
Copy link

Hello everybody. I get some problems with en/86.php. This file is too big compare with others (4,8Mb).
So, I must spend big amount of valuable memory (almost 75Mb) when use geocoding for Chinese phone numbers. When I add necessary size of memory by ini_set (although it is not good) I anyway get problems with geocoding some numbers. For example I try use geocoding for 86-157-9662-1289.
Actualy there is no such number and I should get empty result, instead I get fatal error, because it is need even more memory. Why that, I don't no. Anybody knows how to resolve this problem with memory wasting? May be should split the file 86.php into smaller files. Please, help me.

@giggsey
Copy link
Owner

giggsey commented Sep 26, 2014

Yikes, that array contains ~145,800 items.

I'll have a look at a way of solving this.

@giggsey giggsey added the bug label Sep 26, 2014
@giggsey giggsey self-assigned this Sep 26, 2014
giggsey added a commit that referenced this issue Sep 28, 2014
Reduces PHP memory usage when large PHP arrays are loaded
giggsey added a commit that referenced this issue Sep 28, 2014
* GeoLocationMemoryReduction:
  Split Chinese PrefixMapper files into more files (#44)
@giggsey
Copy link
Owner

giggsey commented Sep 28, 2014

I've split up the Chinese prefix mapper (geolocation & carrier mapper) files into smaller files, which has greatly reduced the memory usage when testing your number on my local PC.

Once Google release a new version, it'll get tagged, but for the time being, you can test it by using dev-master.

Demo Code

<?php

$number = PhoneNumberUtil::getInstance()->parse("86-157-9662-1289", "CN");
$geocoder = PhoneNumberOfflineGeocoder::getInstance();

$startMemory = memory_get_usage();
$location = $geocoder->getDescriptionForNumber($number, "en");
$endMemory = memory_get_usage();

$memoryUsed = $endMemory - $startMemory;

Before

Memory usage: 27260968 bytes (27 MB)

find . -name "*.php" -type f -not -path "./vendor/*" | xargs wc -l|sort -rn|grep -v ' total$' | head -n 10
  145802 ./src/libphonenumber/geocoding/data/zh/86.php
  145802 ./src/libphonenumber/geocoding/data/en/86.php
    9473 ./src/libphonenumber/carrier/data/en/55.php
    5584 ./src/libphonenumber/geocoding/data/en/33.php
    5210 ./src/libphonenumber/geocoding/data/en/49.php
    3330 ./Tests/libphonenumber/Tests/core/PhoneNumberUtilTest.php
    3180 ./src/libphonenumber/PhoneNumberUtil.php
    2653 ./src/libphonenumber/geocoding/data/en/91.php
    2093 ./src/libphonenumber/geocoding/data/en/359.php
    2093 ./src/libphonenumber/geocoding/data/bg/359.php

After

Memory Usage: 38288 bytes (0.03MB)

find . -name "*.php" -type f -not -path "./vendor/*" | xargs wc -l|sort -rn|grep -v ' total$' | head -n 10
   51459 ./src/libphonenumber/geocoding/data/zh/8613.php
   51459 ./src/libphonenumber/geocoding/data/en/8613.php
   41858 ./src/libphonenumber/geocoding/data/zh/8618.php
   41858 ./src/libphonenumber/geocoding/data/en/8618.php
   41269 ./src/libphonenumber/geocoding/data/zh/8615.php
   41269 ./src/libphonenumber/geocoding/data/en/8615.php
    9473 ./src/libphonenumber/carrier/data/en/55.php
    6986 ./src/libphonenumber/geocoding/data/zh/8614.php
    6986 ./src/libphonenumber/geocoding/data/en/8614.php
    5584 ./src/libphonenumber/geocoding/data/en/33.php

@olegsklyarov72
Copy link
Author

The problem with memory usage was solved. But another one appeared. When I use geocoding I receive only one result: China. That is all. There is no detalization as it was befor: no province, no city.
It seems, that we can not get information from splited files.

@giggsey
Copy link
Owner

giggsey commented Sep 29, 2014

Is it the same number as per the original report? If so, I tried that on
the demo before I started making changes, and it was only reporting "China"
then.

I just tried that number in Google's demo page, and that said "China" too.

The problem with memory usage was solved. But another one appeared. When I
use geocoding I receive only one result: China. That is all. There is no
detalization as it was befor: no province, no city.
It seems, that we can not get information from splited files.


Reply to this email directly or view it on GitHub
#44 (comment)
.

@olegsklyarov72
Copy link
Author

Yes, the original number does not description, so the answer China is correct. But I try many others, for
example: 86-150-3657-7264, 86-136-3323-3696, 86-131-2270-1411. They have detalization and previous version showed them. Now it does not work.
86-150-3657-7264 - Luoyang, Henan;
86-136-3323-3696 - Zhangjiakou, Hebei;

@giggsey
Copy link
Owner

giggsey commented Sep 29, 2014

Okay, I can see that's an issue.

I'll work on it today.

@giggsey
Copy link
Owner

giggsey commented Sep 29, 2014

Okay, I believe I've fixed that now.

@giggsey
Copy link
Owner

giggsey commented Sep 29, 2014

The tests are failing because it's now reporting a higher memory usage. This is because it's finding the files to load (and not finding an entry for your original number, but is for the second set of numbers).

Can you check to see if it's still causing you memory issues with the latest dev-master?

@olegsklyarov72
Copy link
Author

Yes, the problem with memory return.
Allowed memory size of 36700160 bytes exhausted (tried to allocate 7864320 bytes).
It works perfectly with existing numbers and required extra memory for non-existing.

@olegsklyarov72
Copy link
Author

I have the problem with existing number also.
86-131-2270-1411 - Shanghai. It required extra memory.
But for 86-150-3657-7264 - works good.

@olegsklyarov72
Copy link
Author

I am testing with many combiantion of area codes and phone numbers. The problem only with area-code 130 - 139. These codes stored in 8613.php. By the way this is the bigest file(1.7Mb) and contains many items.

@giggsey
Copy link
Owner

giggsey commented Sep 29, 2014

We might have to store them with 5 digits instead of 4 then.

Just a note that the default memory_limit for PHP is 128M, and yours is set to 32M. ( http://php.net/manual/en/ini.core.php#ini.memory-limit ).

@olegsklyarov72
Copy link
Author

Please, could you answer what do we do now? Is your recommendation of increasing memory_limit final? The resolving of this problem is very important for us.

@giggsey
Copy link
Owner

giggsey commented Sep 30, 2014

I'm looking at splitting up the files into the first 5 digits.

giggsey added a commit that referenced this issue Sep 30, 2014
Also fix a bug with PhoneNumberToCarrierMapper with getInstance()
@giggsey
Copy link
Owner

giggsey commented Sep 30, 2014

Can you give it another try?

Both the original number, and 86-131-2270-1411 now geo-locate using less than 5MB of memory.

@olegsklyarov72
Copy link
Author

It is working perfectly now. Thanks a lot. That is a good luck for us.

@giggsey
Copy link
Owner

giggsey commented Sep 30, 2014

Excellent.

Like I said in a previous comment, it'll be in a tagged release once Google release a new version upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants