Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle null item times and adjust cost thresholds in getHeroItemPopularity #2664

Merged
merged 17 commits into from
Jul 3, 2023

Conversation

ff137
Copy link
Contributor

@ff137 ff137 commented Jun 23, 2023

getHeroItemPopularity can error if item.time is null. #2663

Also adjust cost thresholds used in defining early, mid, and late game items. See review comments

@ff137
Copy link
Contributor Author

ff137 commented Jun 23, 2023

Here is the existing logic:

    const startGameItems = countItemPopularity(
      items.filter((item) => item.time <= 0)
    );
    const earlyGameItems = countItemPopularity(
      items.filter(
        (item) => item.time > 0 && item.time < 60 * 10 && item.cost > 700
      )
    );
    const midGameItems = countItemPopularity(
      items.filter(
        (item) =>
          item.time >= 60 * 10 && item.time < 60 * 25 && item.cost > 2000
      )
    );
    const lateGameItems = countItemPopularity(
      items.filter((item) => item.time >= 60 * 25 && item.cost > 4000)
    );

My first proposition is that start game items should be redefined:

    const startGameItems = countItemPopularity(
      items.filter((item) => item.time <= 0 && item.cost <= 600)
    );

because, apparently null <= 0 will return true in js. So, in case a null time is recorded for a lategame item, we can add item.cost <= 600 to filter any of those potential edge cases. 600 is the starting gold.

My other proposition is to adjust the item.cost filters used in mid and late game items. Because, 2000 and 4000 feel like really high thresholds, so will discard a lot of other items, especially for supports. My recommendation would be:

    const midGameItems = countItemPopularity(
      items.filter(
        (item) =>
          item.time >= 60 * 10 && item.time < 60 * 25 && item.cost > 1200
      )
    );
    const lateGameItems = countItemPopularity(
      items.filter((item) => item.time >= 60 * 25 && item.cost > 1800)
    );

@howardchung what do you think?

@howardchung
Copy link
Member

I feel like if we want to check for null, we should explicitly do that check rather than relying on the 600 gold filter (since that could change)

store/queries.js Outdated
@@ -315,7 +315,7 @@ function getHeroItemPopularity(db, redis, heroId, options, cb) {
.flatMap((purchaseLog) => purchaseLog.purchase_log)

.map((item) => {
const time = parseInt(item.time, 10);
const time = item.time ? parseInt(item.time, 10) : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does item?.time work?

Also I think the very next line is going to encounter the same crash when we try item.key. So maybe we should be using filter(Boolean) to remove all null items before we try doing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item.time ? parseInt(item.time, 10) : 0 will default to 0 when item.time is null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree filtering any null would make sense. I'm just curious what would cause the null item.time. If it only ever happens for pre-game items, then defaulting to 0 would make sense. If it can happen to late game items, then the item.cost <= 600 would filter that. But would also make sense to just remove all nulls from the start, assuming it's not too prevalent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if item is null then item.key will crash. I think you want .filter(Boolean) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added .filter((item) => item !== null && item.key !== null)

Copy link
Contributor Author

@ff137 ff137 Jun 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null time will then still default to 0, with const time = item.time ? parseInt(item.time, 10) : 0;,
so if it was a pre-game purchase it'll be counted by :

    const startGameItems = countItemPopularity(
      items.filter((item) => item.time <= 0 && item.cost <= 600)
    );

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think .filter(Boolean) is simpler and does what we want there. But if you want to check item and item.key specifically then use != to include checking for undefined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add filtering to remove cases that have null item.time too. I just can't know what makes most sense, given I can't see how many cases this filters, and what causes those cases.

If item.time defaults to 0, then it can potentially only be counted in starting items, and only if it's less than 600 gold. That feels to me like an adequate solution. But I'm open to just remove null cases for time too, it won't change much

@ff137
Copy link
Contributor Author

ff137 commented Jun 23, 2023

So, I added a filter for null keys, just in case.

I'm trying to inspect cases where time is null, playing around with the /explorer. I finally got GPT to help write a query that runs, but it just times out, even with select 1.

It's basically like this:

SELECT purchase_log
FROM player_matches
WHERE purchase_log::text LIKE '%"time":null%'
LIMIT 1

which times out:
https://api.opendota.com/api/explorer?sql=SELECT%20purchase_log%20FROM%20player_matches%20WHERE%20purchase_log::text%20LIKE%20%27%25%22time%22:null%25%27%20LIMIT%201;

Anyway, sharing that in case you can query without timeout and see an example of a problematic row

store/queries.js Outdated
);
const earlyGameItems = countItemPopularity(
items.filter(
(item) => item.time > 0 && item.time < 60 * 10 && item.cost > 700
(item) => item.time > 0 && item.time < 60 * 10 && item.cost >= 600
Copy link
Contributor Author

@ff137 ff137 Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= 600 instead of >700

Essentially will now add: Bottle (675), Voodoo Mask, Ring of Health, Voice Stone (700)

Copy link
Contributor Author

@ff137 ff137 Jun 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardchung last thing I'm wondering about is if there's any justification to reducing this threshold further, maybe to >=500, to include items like boots of speed, bracer, null talisman, wraith band, chainmail.

Reducing to >= 400 would include things like magic wands, buckler, gloves of haste, blades of attack, belt of strength, etc.

So it's not a HUGE amount of items being added by reducing threshold, but it's up to you if it makes sense to include magic wand, blades of attack, etc as "early game items". I don't see why not. I think 500 threshold makes most sense to include bracers/etc. Maybe 450 / 400 if you're okay with including wand, band of elvenskin, robe of magi, etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like it may be interesting to see how common boots of speed is in first 10 minutes for a hero. Should be almost every game, but will be differences across heroes. And bracer/wraith/null builds are quite common too, would be interesting for stats to see that too.

store/queries.js Outdated
)
);
const midGameItems = countItemPopularity(
items.filter(
(item) =>
item.time >= 60 * 10 && item.time < 60 * 25 && item.cost > 2000
item.time >= 60 * 10 && item.time < 60 * 25 && item.cost >= 1200
Copy link
Contributor Author

@ff137 ff137 Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= 1200 instead of >2000

Essentially will now add: Cornucopia, Point Booster (1200), Arcane Boots, Talisman of Evasion (1300), Claymore (1350), Aghs Shard, Platemail, Perseverance, Pavise (1400), Ghost Scepter, Oblivion Staff, Phase Boots (1500), Veil of Discord (1525), Mithril Hammer (1600), Drum of Endurance (1650), Vanguard (1700), Mask of Madness, Mekansm (1775), Dragon Lance (1900), Crystalys (1950), Hyperstone (2000).

)
);
const lateGameItems = countItemPopularity(
items.filter((item) => item.time >= 60 * 25 && item.cost > 4000)
items.filter((item) => item.time >= 60 * 25 && item.cost >= 2000)
Copy link
Contributor Author

@ff137 ff137 Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= 2000 instead of >4000

Essentially ... will include every item between 2000 and 4000 which was previously ignored

@ff137 ff137 changed the title Handle null item purchase times Handle null item times and adjust cost thresholds in getHeroItemPopularity Jun 23, 2023
@howardchung
Copy link
Member

The LIKE query is going to be slow because there isn't a text index on that column.

@ff137 ff137 requested a review from howardchung June 25, 2023 08:59
@ff137
Copy link
Contributor Author

ff137 commented Jul 2, 2023

@howardchung thoughts on these changes?

Copy link
Contributor Author

@ff137 ff137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardchung I noticed that PublicMatchesResponse had some missing params in its definition. So, patching that in this PR

},
cluster: {
type: "integer",
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here*

@howardchung
Copy link
Member

My only blocking concern is changing the filter condition to:
.filter((item) => item && item.key)

@ff137
Copy link
Contributor Author

ff137 commented Jul 3, 2023

My only blocking concern is changing the filter condition to: .filter((item) => item && item.key)

How do you mean? I've added: .filter((item) => item !== null && item.key !== null)
Do you want to add item.time !== null?

@howardchung
Copy link
Member

That depends on the behavior we want if time is null/undefined. If we want to default it to 0, then we shouldn't add it.

But the change I'm asking for is either to just check for item && item.key or item != null && item.key != null to handle undefined (which may not be possible, but I figure it's safer)

.flatMap((purchaseLog) => purchaseLog.purchase_log)

.filter((item) => item && item.key && item.time)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardchung I edited as follows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to not need !== null check - I'm new to javascript so wasn't sure on diff between the two, but asserting truthiness is probably better than only asserting not null. Filtering by time also makes sense because should be an anomaly

store/queries.js Outdated
.flatMap((purchaseLog) => purchaseLog.purchase_log)

.filter((item) => item !== null && item.key !== null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use != here to catch undefined in addition to null.

@howardchung howardchung merged commit 50320c3 into odota:master Jul 3, 2023
@ff137
Copy link
Contributor Author

ff137 commented Jul 3, 2023

@howardchung thanks 👍

@ff137 ff137 deleted the fix/hero-item-popularity-query branch August 13, 2023 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants