Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store 2-letter country codes in database #113

Merged
merged 3 commits into from Jul 15, 2020
Merged

Conversation

liZe
Copy link
Contributor

@liZe liZe commented May 14, 2020

This PR includes these changes:

  • DB migration allowing both alpha2 and alpha3 codes in PostgreSQL,
  • SQLAlchemy changes allowing only alpha2 alpha2 and alpha3 codes when inserting new records,
  • CLI commands added for country code migration in ping_v1 and activation_v1.

https://phabricator.endlessm.com/T29210

@liZe liZe requested review from wjt and adarnimrod May 14, 2020 14:33
@liZe liZe force-pushed the lize/country-alpha-2 branch 2 times, most recently from 2d77bb9 to b46f3e3 Compare May 14, 2020 15:31
@liZe liZe force-pushed the lize/country-alpha-2 branch 2 times, most recently from 6c6b0b9 to 57f74ee Compare May 20, 2020 08:20
@liZe
Copy link
Contributor Author

liZe commented May 20, 2020

This PR has been updated to allow both alpha2 and alpha3 codes for new rows. We can keep this version on production, update eos-activation-server to send alpha2 codes, and then merge the lize-country-alpha-2-only branch.

@liZe
Copy link
Contributor Author

liZe commented May 20, 2020

⚠️ We should wait for #112 to be merged before changing the Alembic deployment version and merging this PR. ⚠️

@liZe
Copy link
Contributor Author

liZe commented Jun 9, 2020

We should wait for #112 to be merged before changing the Alembic deployment version and merging this PR.

This PR includes migrations for ping and activation, and #112 includes a migration for metrics. It means that we don’t need to rebase these deployments, and that we can deploy #112 and #113 in any order.

Copy link
Contributor

@adarnimrod adarnimrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I tested it in the test environment (where there are no pings nor activations) and in the dev environments (with just 2 activations and 3 pings). My only concern is the migration command of ping_v1 in production where there are 23M records, but I don't think that trying to do that in raw SQL is better.

azafea/event_processors/endless/country.py Outdated Show resolved Hide resolved
azafea/event_processors/endless/country.py Outdated Show resolved Hide resolved
azafea/event_processors/endless/country.py Outdated Show resolved Hide resolved
Comment on lines 150 to 154
for chunk_number, chunk in enumerate(query, start=1):
for ping in chunk:
ping.country = tranform_alpha_3_into_2(ping.country)
dbsession.commit()
progress(chunk_number * args.chunk_size, num_records)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we have an issue with another similar migration recently where this proved extremely slow? Might it perform better to do something like (pseudocode):

for alpha_3, alpha_2 in COUNTRIES.items():
    dbsession.execute("UPDATE activation_v1 SET country = ? WHERE country = ?", alpha_2, alpha_3)

which would allow the work to live in the database, rather than having to stream all 23 million rows (in chunks) in and out of the database?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can try it and see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see @adarnimrod had the same observation above. I would naively expect the UPDATE to perform better. I could be wrong!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can try it and see.

I can try on the test database.

tranform_alpha_3_into_2

Oops, there’s a typo…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration takes about 3.5 days, it’s too long.

We can generate one request for each country value, that’s dirty but should be fast enough. I can try to find a better solution if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one UPDATE per country code is a clean solution, personally :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one UPDATE per country code is a clean solution, personally :)

😄

The new solution takes 10 minutes for pings. Tests pass, the result seems to be OK, but I’d prefer to have another review, just to be sure…

Countries are stored in ping_v1 and activation_v1 tables. As many third-party
tools use 2-letter ISO codes, we use these commands to store alpha2 instead of
alpha3 codes.
Copy link
Member

@wjt wjt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@adarnimrod adarnimrod merged commit 45714b4 into master Jul 15, 2020
@adarnimrod adarnimrod deleted the lize/country-alpha-2 branch July 15, 2020 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants