8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,11 @@
## 3.0 (2019-05-04)
* PHP >=7.1 required.
* Cluster index is now lazy-loaded. If you use iterator methods directly to iterate through the whole database,
you must call `rewind` first.
* The factory now creates database instances but does not store them, so each call of a factory method creates a new instance.
* The subdivisions and languages databases are large, so they now have an optimisation to load
entries from separate files instead of one single file.
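
The `rewind` requirement can be illustrated with a minimal sketch. `LazyDatabase` and its sample codes are hypothetical stand-ins, not the library's classes; the sketch only assumes that the index is built on the first `rewind()` call. Note that `foreach` calls `rewind()` automatically, so only manual iteration is affected.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a database whose index is built lazily.
 * The class name and the sample data are illustrative only.
 */
final class LazyDatabase implements \Iterator
{
    /** @var string[]|null null until rewind() builds the index */
    private $index = null;

    /** @var int */
    private $position = 0;

    public function rewind(): void
    {
        // The index is built here, on first use, instead of in the constructor.
        if ($this->index === null) {
            $this->index = ['UA', 'SE', 'AZ'];
        }
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->index !== null && isset($this->index[$this->position]);
    }

    public function current(): string
    {
        return $this->index[$this->position];
    }

    public function key(): int
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }
}

/**
 * Manual iteration: without the rewind() call, valid() stays false
 * and the loop body never runs.
 */
function collectManually(\Iterator $database): array
{
    $entries = [];
    $database->rewind();
    while ($database->valid()) {
        $entries[] = $database->current();
        $database->next();
    }

    return $entries;
}
```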

## 2.2 (2018-11-29)
* Added possibility to configure directory with databases and messages, and manually update them without updating composer
* Constants `AbstractDatabase::DATABASE_PATH` and `AbstractDatabase::MESSAGES_PATH` now contain directory name instead of directory path
39 changes: 39 additions & 0 deletions README.md
@@ -120,6 +120,45 @@ $isoCodes = new \Sokil\IsoCodes\IsoCodesFactory($databaseBaseDir);
* [Currencies database (ISO 4217)](#currencies-database-iso-4217)
* [Languages database (ISO 639-3)](#languages-database-iso-639-3)

### Factory

All databases may be created through the factory:

```php
<?php
$isoCodes = new \Sokil\IsoCodes\IsoCodesFactory();
$languages = $isoCodes->getLanguages();
```

The subdivisions and languages databases are large.
Loading an entire database into memory may require a lot of RAM and time to create all entries in memory.

So there are two usage scenarios: one optimised for memory and one optimised for time (input-output).

#### Memory optimisation

The database is split into partition files.

Fetching an entry loads only a small part of the database.
Loaded entries are not stored statically.

This scenario may be useful when only a few entries need
to be loaded, for example on a web request where a single entry is fetched.

This may require a lot of file read operations.

#### Input-output optimisation

The entire database is loaded into memory from a single JSON file once.

All entries are created and stored in RAM. The next read of the same
entry just returns it, without file I/O or building objects.

This scenario may be useful for daemons to decrease file operations,
or when most entries will be fetched from the database.

This may require a lot of RAM to store all entries.
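
The trade-off between the two scenarios can be sketched without the library. `findInPartition` and `InMemoryIndex` are hypothetical names, and the one-file-per-first-letter layout and the `alpha_2` key are assumptions made for this illustration; the library's own classes may work differently.

```php
<?php
declare(strict_types=1);

/**
 * Memory-oriented lookup: read only the partition file that can contain the entry.
 * Assumes one JSON file per first letter of the code (illustrative layout).
 */
function findInPartition(string $partitionDir, string $alpha2): ?array
{
    $path = sprintf('%s/%s.json', $partitionDir, substr($alpha2, 0, 1));
    if (!is_file($path)) {
        return null;
    }

    // Every call reads and decodes the partition file again; nothing is cached.
    foreach (json_decode(file_get_contents($path), true) as $entry) {
        if ($entry['alpha_2'] === $alpha2) {
            return $entry;
        }
    }

    return null;
}

/**
 * I/O-oriented lookup: decode the single file once and keep an index in memory.
 */
final class InMemoryIndex
{
    /** @var array[] entries keyed by alpha_2 code */
    private $entries = [];

    public function __construct(string $databaseFile)
    {
        foreach (json_decode(file_get_contents($databaseFile), true) as $entry) {
            $this->entries[$entry['alpha_2']] = $entry;
        }
    }

    public function find(string $alpha2): ?array
    {
        // No file access here: the entry is served from RAM.
        return $this->entries[$alpha2] ?? null;
    }
}
```

The first function keeps memory flat at the cost of a file read per lookup; the class pays the full decoding cost once and then answers from memory.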

### Countries database (ISO 3166-1)

Get localized name of country by its alpha2 code:
104 changes: 104 additions & 0 deletions benchmarks/LanguagesBench.php
@@ -0,0 +1,104 @@
<?php
declare(strict_types=1);

namespace Sokil\IsoCodes\Databases;

use Sokil\IsoCodes\Database\Languages;
use Sokil\IsoCodes\Database\LanguagesPartitioned;

class LanguagesBench
{
public function databaseProvider(): array
{
return [
'non_partitioned' => [
'database' => Languages::class,
],
'partitioned' => [
'database' => LanguagesPartitioned::class,
],
];
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(1)
*/
public function benchIterator(array $params): void
{
/** @var Languages|LanguagesPartitioned $database */
$database = new $params['database'];

$database->toArray();
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(1)
*/
public function benchGetBySameAlpha2(array $params): void
{
/** @var Languages|LanguagesPartitioned $database */
$database = new $params['database'];

$database->getByAlpha2('sv');
$database->getByAlpha2('sv');
$database->getByAlpha2('sv');
$database->getByAlpha2('sv');
$database->getByAlpha2('sv');
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(1)
*/
public function benchGetByDiffAlpha2(array $params): void
{
/** @var Languages|LanguagesPartitioned $database */
$database = new $params['database'];

$database->getByAlpha2('sv');
$database->getByAlpha2('ku');
$database->getByAlpha2('ny');
$database->getByAlpha2('az');
$database->getByAlpha2('tw');
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(1)
*/
public function benchGetBySameAlpha3(array $params): void
{
/** @var LanguagesPartitioned|Languages $database */
$database = new $params['database'];

// alpha3 codes must be looked up through the alpha3 getter
$database->getByAlpha3('zpz');
$database->getByAlpha3('zpz');
$database->getByAlpha3('zpz');
$database->getByAlpha3('zpz');
$database->getByAlpha3('zpz');
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(1)
*/
public function benchGetByDiffAlpha3(array $params): void
{
/** @var LanguagesPartitioned|Languages $database */
$database = new $params['database'];

// alpha3 codes must be looked up through the alpha3 getter
$database->getByAlpha3('svx');
$database->getByAlpha3('kuz');
$database->getByAlpha3('nyy');
$database->getByAlpha3('azz');
$database->getByAlpha3('twy');
}

}
81 changes: 73 additions & 8 deletions benchmarks/SubdivisionsBench.php
@@ -3,28 +3,93 @@

namespace Sokil\IsoCodes\Databases;

-use Sokil\IsoCodes\IsoCodesFactory;
+use Sokil\IsoCodes\Database\Subdivisions;
+use Sokil\IsoCodes\Database\SubdivisionsPartitioned;

class SubdivisionsBench
{
public function databaseProvider(): array
{
$countries = \array_column(
\json_decode(
file_get_contents(__DIR__ . '/../databases/iso_3166-1.json'),
true
)['3166-1'],
'alpha_2'
);

return [
'non_partitioned' => [
'database' => Subdivisions::class,
'countries' => $countries,
],
'partitioned' => [
'database' => SubdivisionsPartitioned::class,
'countries' => $countries,
],
];
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(1)
*/
-public function benchIterator(): void
+public function benchIterator(array $params): void
{
-$isoCodes = new IsoCodesFactory();
-$isoCodes->getSubdivisions()->toArray();
+/** @var Subdivisions|SubdivisionsPartitioned $database */
+$database = new $params['database'];
+
+$database->toArray();
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(2)
*/
-public function benchGetAllByCountryCode(): void
+public function benchGetByCodeSameCode(array $params): void
{
-$isoCodes = new IsoCodesFactory();
-$subDivisionDatabase = $isoCodes->getSubdivisions();
-$subDivisionDatabase->getAllByCountryCode('UA');
+/** @var Subdivisions|SubdivisionsPartitioned $database */
+$database = new $params['database'];

$database->getByCode('UA-43');
$database->getByCode('UA-43');
$database->getByCode('UA-43');
$database->getByCode('UA-43');
$database->getByCode('UA-43');
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(2)
*/
public function benchGetAllByCountryCodeSameAlpha2CountryCode(array $params): void
{
/** @var Subdivisions|SubdivisionsPartitioned $database */
$database = new $params['database'];

$database->getAllByCountryCode('UA');
$database->getAllByCountryCode('UA');
$database->getAllByCountryCode('UA');
$database->getAllByCountryCode('UA');
$database->getAllByCountryCode('UA');
}

/**
* @ParamProviders({"databaseProvider"})
* @Revs(100)
* @Iterations(2)
*/
public function benchGetAllByCountryCodeDiffAlpha2CountryCode(array $params): void
{
/** @var Subdivisions|SubdivisionsPartitioned $database */
$database = new $params['database'];

foreach ($params['countries'] as $countryAlpha2) {
$database->getAllByCountryCode($countryAlpha2);
}
}

}
47 changes: 47 additions & 0 deletions bin/iso_3166-2_split.php
@@ -0,0 +1,47 @@
<?php

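// fail early with a readable message when the directory argument is missing
if (!isset($argv[1])) {
throw new \InvalidArgumentException('Usage: php iso_3166-2_split.php <databases-dir>');
}

define('DATABASES_DIR', rtrim($argv[1], '/'));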
define('SOURCE_DATABASE_PATH', DATABASES_DIR . '/iso_3166-2.json');
define('TARGET_DATABASE_DIR', DATABASES_DIR . '/iso_3166-2');

if (!is_dir(DATABASES_DIR)) {
throw new \InvalidArgumentException('Invalid databases dir specified');
}

if (!is_writable(DATABASES_DIR)) {
throw new \InvalidArgumentException('Databases dir is not writable');
}

// parse database
if (!file_exists(SOURCE_DATABASE_PATH)) {
throw new \InvalidArgumentException(sprintf(
'Database file %s not found. Please, update database',
SOURCE_DATABASE_PATH
));
}

$database = json_decode(file_get_contents(SOURCE_DATABASE_PATH), true);

$countryAlpha2ToSubdivisionsMap = [];

foreach ($database['3166-2'] as $countrySubdivision) {
// only the country prefix of the code (for example "UA" in "UA-43") is needed
[$countryAlpha2] = explode('-', $countrySubdivision['code'], 2);
$countryAlpha2ToSubdivisionsMap[$countryAlpha2][] = [
'code' => $countrySubdivision['code'],
'name' => $countrySubdivision['name'],
'type' => $countrySubdivision['type'],
];
}

// store split database
if (!file_exists(TARGET_DATABASE_DIR)) {
mkdir(TARGET_DATABASE_DIR, 0775);
}

foreach ($countryAlpha2ToSubdivisionsMap as $countryAlpha2 => $countrySubdivisions) {
// save JSON file
file_put_contents(
sprintf('%s/%s.json', TARGET_DATABASE_DIR, $countryAlpha2),
json_encode($countrySubdivisions)
);
}
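
The grouping step of this script can be tried in isolation. A minimal sketch, assuming entries shaped like the ones above; `groupByCountry` is a hypothetical helper and the names in the sample data are placeholders, not real database content.

```php
<?php
declare(strict_types=1);

/**
 * Group subdivision entries by the country prefix of their code ("UA-43" => "UA").
 *
 * @param array[] $subdivisions entries with at least a "code" key
 * @return array[][] entries keyed by country alpha2 code
 */
function groupByCountry(array $subdivisions): array
{
    $map = [];
    foreach ($subdivisions as $subdivision) {
        // A limit of 2 keeps any further hyphens inside the subdivision part.
        [$countryAlpha2] = explode('-', $subdivision['code'], 2);
        $map[$countryAlpha2][] = $subdivision;
    }

    return $map;
}

// Placeholder entries: codes follow the "<country>-<subdivision>" pattern.
$grouped = groupByCountry([
    ['code' => 'UA-43', 'name' => 'Sample A'],
    ['code' => 'UA-05', 'name' => 'Sample B'],
    ['code' => 'SE-AB', 'name' => 'Sample C'],
]);

echo implode(',', array_keys($grouped)), PHP_EOL; // prints "UA,SE"
```

Each key of the resulting map corresponds to one `<alpha2>.json` file written by the script.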
68 changes: 68 additions & 0 deletions bin/iso_639-3_split.php
@@ -0,0 +1,68 @@
<?php

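// fail early with a readable message when the directory argument is missing
if (!isset($argv[1])) {
throw new \InvalidArgumentException('Usage: php iso_639-3_split.php <databases-dir>');
}

define('DATABASES_DIR', rtrim($argv[1], '/'));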
define('SOURCE_DATABASE_PATH', DATABASES_DIR . '/iso_639-3.json');
define('TARGET_DATABASE_DIR', DATABASES_DIR . '/iso_639-3');

if (!is_dir(DATABASES_DIR)) {
throw new \InvalidArgumentException('Invalid databases dir specified');
}

if (!is_writable(DATABASES_DIR)) {
throw new \InvalidArgumentException('Databases dir is not writable');
}

// parse database
if (!file_exists(SOURCE_DATABASE_PATH)) {
throw new \InvalidArgumentException(sprintf(
'Database file %s not found. Please, update database',
SOURCE_DATABASE_PATH
));
}

$database = json_decode(file_get_contents(SOURCE_DATABASE_PATH), true);

$languages = [];

foreach ($database['639-3'] as $language) {
// the same entry is stored in both the alpha3 and the alpha2 partition
$entry = [
'name' => $language['name'],
'alpha_3' => $language['alpha_3'],
'scope' => $language['scope'],
'type' => $language['type'],
'inverted_name' => $language['inverted_name'] ?? null,
'alpha_2' => $language['alpha_2'] ?? null,
];

// alpha3: partition by the first two letters of the code
$languages['alpha3/' . substr($language['alpha_3'], 0, 2)][] = $entry;

// alpha2: partition by the first letter of the code, when the language has one
if (!empty($language['alpha_2'])) {
$languages['alpha2/' . substr($language['alpha_2'], 0, 1)][] = $entry;
}
}

// store partitioned database
// recursive mkdir also creates TARGET_DATABASE_DIR when it does not exist yet
if (!file_exists(TARGET_DATABASE_DIR . '/alpha2')) {
mkdir(TARGET_DATABASE_DIR . '/alpha2', 0775, true);
}

if (!file_exists(TARGET_DATABASE_DIR . '/alpha3')) {
mkdir(TARGET_DATABASE_DIR . '/alpha3', 0775, true);
}

foreach ($languages as $partitionFileName => $partitionEntries) {
// save JSON file
file_put_contents(
sprintf('%s/%s.json', TARGET_DATABASE_DIR, $partitionFileName),
json_encode($partitionEntries)
);
}
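
A reader of the partitioned database has to derive the same file name from a code. A minimal sketch of that mapping, following the layout this script writes (alpha2 codes grouped by first letter, alpha3 codes by first two letters); `partitionPathForLanguage` is a hypothetical helper, not a function of the library.

```php
<?php
declare(strict_types=1);

/**
 * Relative partition file for a language code, mirroring the split script:
 * alpha2 codes are grouped by first letter, alpha3 codes by first two letters.
 */
function partitionPathForLanguage(string $code): string
{
    switch (strlen($code)) {
        case 2:
            return 'alpha2/' . substr($code, 0, 1) . '.json';
        case 3:
            return 'alpha3/' . substr($code, 0, 2) . '.json';
        default:
            throw new \InvalidArgumentException('Expected a two- or three-letter code');
    }
}

echo partitionPathForLanguage('sv'), PHP_EOL;  // prints "alpha2/s.json"
echo partitionPathForLanguage('zpz'), PHP_EOL; // prints "alpha3/zp.json"
```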
11 changes: 11 additions & 0 deletions bin/update_iso_codes_db.sh
@@ -80,3 +80,14 @@ done
# add copyright notice
echo -e "This file is part of iso-codes library.\nSee license agreement at ${PKG_ISOCODES_REPO}" > "$DATABASES_DIR/LICENSE"
echo -e "This file is part of iso-codes library.\nSee license agreement at ${PKG_ISOCODES_REPO}" > "$MESSAGES_DIR/LICENSE"

# database postprocessing
echo -e "\033[0;32mDatabase post-processing\033[0m"

# Split ISO 3166-2 to per-country files
echo -e " * Split ISO 3166-2 database"
php "$CURRENT_DIR/iso_3166-2_split.php" "$DATABASES_DIR"

# Split ISO 639-3 to chunks
echo -e " * Split ISO 639-3 database"
php "$CURRENT_DIR/iso_639-3_split.php" "$DATABASES_DIR"