RFC: Remove country name from Location field #224

Open
jonafato opened this issue Jun 12, 2024 · 6 comments

@jonafato
Collaborator

jonafato commented Jun 12, 2024

I'd like to remove the country name from the Location field. This information is both redundant with the more specific Country field and inconsistently used (e.g. sometimes not used, sometimes with spelling variations like "USA" vs. "United States of America"). I'm opening this issue first to:

  1. get feedback in case there are cases where the country information cannot be expressed with the three-letter country code
  2. identify any tooling that uses this repository that would not handle this change gracefully or would need additional updates in order to do so
  3. see if this would actually be a net-negative change, e.g. because ISO 3166-1 alpha-3 country codes are machine-readable but are not necessarily obvious to people reading the CSV files directly
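For illustration, the proposed cleanup could look roughly like the sketch below. It is not the repository's actual tooling: the `COUNTRY_ALIASES` table, the `strip_country` helper, and the sample rows are all hypothetical, and the column headers `Location` and `Country` are assumed from the field names mentioned above. It only drops the last comma-separated component when it matches a known spelling for the row's country code, so state or province details stay intact.

```python
import csv
import io

# Hypothetical alias table: spellings of a country name that may trail a
# Location value, keyed by the three-letter code stored in the Country field.
COUNTRY_ALIASES = {
    "USA": {"USA", "United States", "United States of America"},
    "DEU": {"Germany"},
}


def strip_country(location: str, country_code: str) -> str:
    """Drop a trailing country name from a comma-separated location string."""
    parts = [part.strip() for part in location.split(",")]
    aliases = {alias.casefold() for alias in COUNTRY_ALIASES.get(country_code, set())}
    # Only remove the last component, and never the only component.
    if len(parts) > 1 and parts[-1].casefold() in aliases:
        parts = parts[:-1]
    return ", ".join(parts)


# Made-up sample rows, not taken from the real data set.
SAMPLE = """Subject,Location,Country
Example Conf A,"Lexington, Kentucky, USA",USA
Example Conf B,"Lexington, Massachusetts, United States of America",USA
Example Conf C,"Berlin, Germany",DEU
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = [strip_country(row["Location"], row["Country"]) for row in rows]
```

Matching against the row's own country code, rather than any country name, avoids stripping a city or region that happens to share its name with a different country.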
@invisibleroads
Member

invisibleroads commented Jun 12, 2024 via email

@invisibleroads
Member

I think my rationale in the past for being more specific about location in certain cases was to reduce ambiguity.

I agree that removing the country from the location might be a good idea since we already have the three-letter code.

I just want to make the point that the location should be specific enough to remove ambiguity, for example, if a country has two cities with the same name like Lexington, Kentucky vs Lexington, Massachusetts.

@jonafato
Collaborator Author

> I just want to make the point that the location should be specific enough to remove ambiguity, for example, if a country has two cities with the same name like Lexington, Kentucky vs Lexington, Massachusetts.

I agree with this, and I'm not suggesting that we remove state, province, or similar details, just the country information that's already stored in a dedicated field.

@invisibleroads
Member

Your third point is valid. Some of the three-letter codes are not immediately clear. It would take away from the readability of the CSV, especially if some people refer directly to the GitHub repository and not a third-party calendar or website.

@JesperDramsch
Contributor

As a downstream user of the CSVs, it would be a minor inconvenience because I'd have to adjust my scripts, but since I already have to do a bunch of data cleaning anyway, that's all it would take.

I agree with most points, but to add something of substance:

On the positive side, this would also circumvent "data problems" around the self-determination of countries, such as Turkiye asking not to be called Turkey and Czechia asking to rather not be called the Czech Republic.

On the negative side, PyCon DE, with the 3-letter code DEU, would be thoroughly confusing for most people who don't already know.

So, I have to say, as long as the data is consistent across the data set, it'd probably be okay. But if it changes halfway through the 2024-file, I'd probably struggle slightly downstream.

@jonafato
Collaborator Author

> On the positive side, this would also circumvent "data problems" around the self-determination of countries, such as Turkiye asking not to be called Turkey and Czechia asking to rather not be called the Czech Republic.

This would be another benefit of this change. A repository covering a set of global conferences is already going to encounter language and translation issues, so this would remove one point of confusion.

> On the negative side, PyCon DE, with the 3-letter code DEU, would be thoroughly confusing for most people who don't already know.

This is mostly a question of how end-users are consuming this data. Automated tooling can perform lookups (e.g. we use https://pypi.org/project/iso3166/ for some CI here), and I would imagine most conference participants are either familiar with their local events or fine with clicking through to the conference website. This is good feedback, though, and the reason that I'm opening this issue up for discussion.
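As a rough sketch of the kind of lookup a downstream tool might do to turn a code like DEU back into something readable: the snippet below uses a small hand-written table as a stand-in for a full ISO 3166-1 data source (such as the package linked above), and both `ALPHA3_TO_NAME` and `display_location` are hypothetical names used only for illustration.

```python
# Hand-written stand-in table (assumption): a real tool would use a complete
# ISO 3166-1 data source instead of these four entries.
ALPHA3_TO_NAME = {
    "DEU": "Germany",
    "USA": "United States of America",
    "CZE": "Czechia",
    "TUR": "Türkiye",
}


def display_location(location: str, country_code: str) -> str:
    """Build a human-readable location string from Location and Country values."""
    name = ALPHA3_TO_NAME.get(country_code.upper())
    if name is None:
        # Unknown code: fall back to showing the raw code rather than guessing.
        return f"{location} ({country_code})"
    return f"{location}, {name}"


label = display_location("Berlin", "DEU")
```

Keeping the display name in a lookup rather than in the CSV means a renamed country only has to be updated in one place.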

> So, I have to say, as long as the data is consistent across the data set, it'd probably be okay. But if it changes halfway through the 2024-file, I'd probably struggle slightly downstream.

Any change implemented here would be a global update in a single commit. As long as tools are able to deal with a new version of the data set, they shouldn't need to worry about supporting mixed formats.
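One way to keep the data from drifting back into a mixed format would be a small consistency check, sketched below under some assumptions: the `KNOWN_COUNTRY_NAMES` set and the `rows_with_country_in_location` function are hypothetical, and the CSV is assumed to have a `Location` column. It reports rows whose last Location component still looks like a country name.

```python
import csv
import io

# Hypothetical list of country spellings to look for, compared case-insensitively.
KNOWN_COUNTRY_NAMES = {"usa", "united states of america", "germany"}


def rows_with_country_in_location(csv_text: str) -> list[tuple[int, str]]:
    """Return (line_number, location) pairs whose last Location component is a country name."""
    offenders = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        last = row["Location"].split(",")[-1].strip().casefold()
        if last in KNOWN_COUNTRY_NAMES:
            # reader.line_num counts source lines read so far, header included.
            offenders.append((reader.line_num, row["Location"]))
    return offenders


# Made-up sample rows, not taken from the real data set.
SAMPLE = 'Subject,Location,Country\nA,"Berlin, Germany",DEU\nB,"Lexington, Kentucky",USA\n'
offenders = rows_with_country_in_location(SAMPLE)
```

A check like this could run in CI or in a downstream script; an empty result means every row already follows the new convention.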
