Fix number formatting for numeric columns #709
3d5bf83 to 8181eb3
Actually, I don't think we even need PercentageColumn and all those variations. Their magnitudes (and … The Ratio column for some reason had … From the numeric columns, I think we only need …
In the way I've implemented it, we also can't change the …
I actually like formatValueVeryShort, or maybe …
I think they are used in the Covid Explorer? But this is a very good point: we don't have a connection yet between the authored Explorer files in …
This one is a very interesting topic. I think we should go super granular ("type the world!", schema.org style) with an expanding hierarchy of types. I'm thinking we should be prepared for hundreds, if not thousands, of types. The reason is that training good program synthesis models becomes exponentially easier the more specific your types are. So in Grapher v3, someone could paste a CSV in, we would detect a "Population" type and a "GDP" type, and we could immediately suggest adding a "GDP Per Capita" column, along with good chart types, colors to use, etc. Authors don't have to get too specific, and if they stick with just "Number" that's fine; adding that extra bit of information would just help us train better models and provide better suggestions.
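As a rough illustration of how granular types could drive suggestions, here is a minimal sketch. The ColumnTypeName union, the ratioRules table and the suggestDerivedColumns helper are all hypothetical, not existing Grapher code. When a table has both a "GDP" and a "Population" column, the helper proposes a "GDP Per Capita" column; plain "Number" columns produce no suggestion, so an author who skips the extra type information loses nothing.

```typescript
// Hypothetical granular column types; "Number" is the generic fallback.
type ColumnTypeName = "Number" | "Population" | "GDP" | "GDPPerCapita"

interface ColumnDef {
    slug: string
    type: ColumnTypeName
}

interface DerivedColumnSuggestion {
    slug: string
    type: ColumnTypeName
    // Slugs of the existing columns the new column would be computed from.
    numeratorSlug: string
    denominatorSlug: string
}

// Hypothetical rule table: the ratio of two known types yields a third type.
const ratioRules: {
    numerator: ColumnTypeName
    denominator: ColumnTypeName
    result: ColumnTypeName
}[] = [{ numerator: "GDP", denominator: "Population", result: "GDPPerCapita" }]

function suggestDerivedColumns(columns: ColumnDef[]): DerivedColumnSuggestion[] {
    const suggestions: DerivedColumnSuggestion[] = []
    const existingTypes = new Set(columns.map((col) => col.type))
    for (const rule of ratioRules) {
        // Don't suggest a column type the table already contains.
        if (existingTypes.has(rule.result)) continue
        const numerator = columns.find((col) => col.type === rule.numerator)
        const denominator = columns.find((col) => col.type === rule.denominator)
        if (numerator && denominator) {
            suggestions.push({
                slug: `${numerator.slug}_per_${denominator.slug}`,
                type: rule.result,
                numeratorSlug: numerator.slug,
                denominatorSlug: denominator.slug,
            })
        }
    }
    return suggestions
}
```

With columns gdp (GDP) and pop (Population), this yields a single suggestion with slug "gdp_per_pop" and type GDPPerCapita. The more specific the types, the more rules like this can fire.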
I think (but need to check) that we could just ditch …
@danielgavrilov This looks good! What your PR still doesn't achieve, though, is that the author-provided … I've briefly looked into it, and the reason for this is that the map tooltip uses the axes' …
Yeah, it actually doesn't sound bad to add it.
Ah, good point, I totally forgot that these are in the config; I was just thinking from an implementation point of view. Will bring them back.
Yeah, again I missed the point. From a config point of view it's better to have as much semantic information as possible, so I agree we should make super-granular types. In the current setup, though, I don't think they scale very well: I'm already confused about what each type inherits from its ancestors, and the chain can be long. We could (and should) make these classes very "stupid" and predictable, so that you don't have to follow the inheritance chain.
There are many, many units. These are just some:
SELECT DISTINCT display->"$.unit"
FROM variables
WHERE display->"$.unit" IS NOT NULL
We'll always need chart-level overrides
On the chart level, I think we'll always need the option to override units. Often the units will be mentioned in the title, and the column/axis/formatting units only serve to disambiguate. E.g. on a ScatterPlot you might specify the unit in detail, but on a LineChart you'd try to be as concise as possible, as the unit is almost always in the title.
We should standardize dataset-level types
I am definitely in favour of making these more standardized on the dataset level. It doesn't make much sense that we have the same unit expressed in many different ways across datasets. It would massively help long-term to standardize them, as otherwise we're only creating a bigger and bigger mess. It would mostly be an effort by the authors, though, and by those who upload datasets. So I think we need to be able to add/remove/update/browse types very easily, without touching the code each time. I don't think the hierarchy as we've started it matters much; we'll definitely run into more and more cases where it becomes hard to decide what to inherit from, which would make the hierarchy less relevant.
Primitive types with behaviour in code, semantic types in external collection?
It feels to me like this might be the best way to go. We need Numeric, Percentage, Integer, String, Color, Boolean, etc. types in the codebase in order to specify different behaviour. But beyond that, the types can be semantic-only and specify some config defaults (like … If we want some associations between the semantic types, I think it should be possible to point to multiple parents, or we can just have "related types". I don't think a clear hierarchy is possible, mostly because the variables themselves are often derived from multiple variables. (I think. If we want to find out, we can go through the top 50 OWID units or so and try out strategies for associations.) So: …
But overall, I think that to build a semantic type library we'd need to research a bit more, find out what works and what doesn't, and think about what we want to get out of it. E.g. one idea I'm thinking about: some of the units in my screenshot are derived units, e.g. …
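The "primitive types in code, semantic types in an external collection" split could look roughly like the sketch below. Every name here (PrimitiveType, SemanticType, library, resolveDisplayDefaults) and every entry is hypothetical. Behaviour lives only in the small set of primitives; semantic types are plain data that could be loaded from a database or a file, carry config defaults, and may list several parents or loosely related types instead of sitting in one inheritance chain. Chart-level overrides are applied last, so they always win.

```typescript
// Primitive types: the only ones whose behaviour is implemented in code.
type PrimitiveType = "Numeric" | "Percentage" | "Integer" | "String" | "Color" | "Boolean"

// Display defaults a semantic type may provide; every field is optional.
interface DisplayDefaults {
    unit?: string
    shortUnit?: string
    numDecimalPlaces?: number
}

// Semantic types are plain data, so they could live outside the codebase.
interface SemanticType {
    name: string
    primitive: PrimitiveType
    defaults: DisplayDefaults
    parents: string[] // may be several, or none
    related: string[] // loose associations, no inheritance implied
}

// Hypothetical example entries.
const library: SemanticType[] = [
    { name: "GDP", primitive: "Numeric", defaults: { unit: "$", shortUnit: "$" }, parents: [], related: ["GDPPerCapita"] },
    { name: "Population", primitive: "Integer", defaults: { unit: "people", numDecimalPlaces: 0 }, parents: [], related: [] },
    // A derived variable points at both of its source types.
    { name: "GDPPerCapita", primitive: "Numeric", defaults: { unit: "$ per person", shortUnit: "$", numDecimalPlaces: 0 }, parents: ["GDP", "Population"], related: [] },
]

// Chart-level overrides are spread last, so they take precedence.
function resolveDisplayDefaults(
    typeName: string,
    chartOverrides: DisplayDefaults = {}
): DisplayDefaults {
    const semanticType = library.find((t) => t.name === typeName)
    return { ...(semanticType?.defaults ?? {}), ...chartOverrides }
}
```

For instance, resolveDisplayDefaults("GDPPerCapita", { numDecimalPlaces: 2 }) keeps the "$" short unit from the library entry but uses two decimal places from the chart. Adding a new semantic type only means adding a data entry, not a class.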
0de2bf8 to d7dd484
Thanks Marcel! Should be fixed now.
d7dd484 to 912b1dc
This is handled in the ScatterPlot transformTable() now.
We need super-short formatting, and this seems appropriate.
72392d3 to 5462417
I used … These overrides are still complicated to follow with all the …
No longer retrievable, since they are all merged together.
5462417 to 14cb8a9
I just noticed this chart has wrong data on this branch but is alright on … Looks like the …
Ah, great idea: I can already see a good library of types.
You can set the title to anything you want for any chart, right ("title = *")? So if you wanted to mention the units in the title, you could. One idea for a pattern we could implement is less "chart synthesis at runtime" and more "synthesis at write time". Instead of using switches to build things like titles at runtime, we could run N strategies while the author is editing and offer suggested titles, which the author then selects and hard-codes into the chart. Not too sure, just thinking out loud. The ability for an author to manually set the title, subtitle, footer, etc., coupled with a rich column type library and the ability to cheaply add columns via "transforms", might reduce the need for a lot of display params?
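"Synthesis at write time" could be sketched as a list of small title strategies that run while the author edits, rather than switches evaluated on every render. ChartContext, TitleStrategy, the three strategies and suggestTitles are hypothetical illustrations. Each strategy returns a candidate title or undefined; the author picks one, and it is stored verbatim as the chart's title, so nothing is recomputed at runtime.

```typescript
// Hypothetical context available while the author edits a chart.
interface ChartContext {
    columnNames: string[]
    unit?: string
    startYear: number
    endYear: number
}

// A strategy proposes one candidate title, or undefined if it does not apply.
type TitleStrategy = (ctx: ChartContext) => string | undefined

const strategies: TitleStrategy[] = [
    // Just the column name.
    (ctx) => (ctx.columnNames.length === 1 ? ctx.columnNames[0] : undefined),
    // Column name with the unit, when one is known.
    (ctx) =>
        ctx.columnNames.length === 1 && ctx.unit
            ? `${ctx.columnNames[0]} (${ctx.unit})`
            : undefined,
    // All column names plus the time span.
    (ctx) =>
        ctx.columnNames.length > 0
            ? `${ctx.columnNames.join(" vs. ")}, ${ctx.startYear} to ${ctx.endYear}`
            : undefined,
]

// Runs every strategy and returns the distinct candidates in order.
function suggestTitles(ctx: ChartContext): string[] {
    const candidates = strategies
        .map((strategy) => strategy(ctx))
        .filter((title): title is string => title !== undefined)
    return Array.from(new Set(candidates))
}
```

For a single "Life expectancy" column in years from 1950 to 2019, this offers "Life expectancy", "Life expectancy (years)" and "Life expectancy, 1950 to 2019". Adding a strategy only widens the menu; existing charts keep whatever title was chosen.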
I like the declarative idea you had: declared types and fewer superclasses. We could probably whip up a quick DSL for defining new types that authors could easily edit as well.
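To make the DSL idea concrete, here is a minimal sketch of a parser for a hypothetical line-based format. The syntax ("type Name: Primitive" followed by indented "key = value" lines) and the parseTypeDsl function are invented for illustration; a real format would need more thought about value types and validation. The point is that each definition is flat: a name, one primitive, and its own settings, with no superclass to chase.

```typescript
// Parsed form of one type definition.
interface TypeDefinition {
    name: string
    primitive: string
    settings: Record<string, string>
}

// Parses a hypothetical line-based format:
//   type <Name>: <Primitive>
//     <key> = <value>
// Blank lines and lines starting with "#" are ignored.
function parseTypeDsl(source: string): TypeDefinition[] {
    const definitions: TypeDefinition[] = []
    let current: TypeDefinition | undefined
    source.split("\n").forEach((rawLine, index) => {
        const line = rawLine.trim()
        if (line === "" || line.startsWith("#")) return
        // A header line starts a new definition.
        const header = /^type\s+(\w+)\s*:\s*(\w+)$/.exec(line)
        if (header) {
            current = { name: header[1], primitive: header[2], settings: {} }
            definitions.push(current)
            return
        }
        // A setting line belongs to the most recent definition.
        const setting = /^(\w+)\s*=\s*(.*)$/.exec(line)
        if (setting && current) {
            current.settings[setting[1]] = setting[2]
            return
        }
        throw new Error(`Line ${index + 1}: cannot parse "${line}"`)
    })
    return definitions
}
```

An author-editable file could then contain entries such as "type GDPPerCapita: Numeric" with "unit = $" and "numDecimalPlaces = 0" beneath it. Settings that appear before any "type" header are rejected with a line number rather than silently dropped.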
Great plan. Makes sense.
There are a few open-source projects we could tap into (https://fossies.org/linux/units/definitions.units). A cool, relevant project is typedefs (https://news.ycombinator.com/item?id=24972271).
I haven't looked at all the places where we use these things, but these methods are cheap to create and we don't have to worry about backwards compatibility, so I would say let's create more of them, look for any patterns, and then trim.
This is a good point. Moving away from classes to a more declarative style makes a lot of sense. And instead of extending, for DRY we can do something analogous to: …
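One way to stay DRY without extending classes is to compose plain objects with object spread. This is a minimal sketch under that assumption; NumericTypeSpec, numericBase, percentageSpec and currencySpec are hypothetical names, and the field set is only illustrative. Each spec is a base plus its own overrides, so any resolved value is at most one lookup away instead of somewhere up a superclass chain.

```typescript
// Plain-object type specs instead of a class hierarchy.
interface NumericTypeSpec {
    unit: string
    shortUnit: string
    numDecimalPlaces: number
    numberPrefixes: boolean // e.g. "1.2M" instead of "1,200,000"
}

// Shared base values, reused by spreading rather than by `extends`.
const numericBase: NumericTypeSpec = {
    unit: "",
    shortUnit: "",
    numDecimalPlaces: 2,
    numberPrefixes: true,
}

// Each spec copies the base and overrides only what differs.
const percentageSpec: NumericTypeSpec = {
    ...numericBase,
    unit: "%",
    shortUnit: "%",
    numberPrefixes: false,
}

const currencySpec: NumericTypeSpec = {
    ...numericBase,
    unit: "$",
    shortUnit: "$",
    numDecimalPlaces: 0,
}
```

Because spread creates a new object, the base is never mutated. Since the specs are plain data, they could later be generated from an external type library instead of being written in code.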
👍 I introduced … I think we can leave the declarative type library for 2021 unless anyone is feeling ambitious. I will merge this since it (hopefully) fixes more things than it breaks, but feel free to change it!
Fixes:
All formatting methods now accept overrides. I think these are necessary because the formatting depends on the context the number is formatted for. E.g. in the country picker we want to use short number prefixes (M = million, B = billion), but not in the tooltip. Both use formatValueShort, because formatValueLong uses a long unit. We could introduce something like formatValueVeryShort for the country picker use case, but that seems a bit much.
I removed the Population, PopulationDensity and Age columns. I don't think they are used anywhere? It probably doesn't make sense to go that granular? Age could be used in different contexts: age of the planet? of people? life expectancy? They all might need slightly different settings. I think the current numeric types plus the formatting config are enough.
Reverted. More comments below.