This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

recent stories missing tags (not getting processed fully?) #725

Open
rahulbot opened this issue Sep 3, 2020 · 2 comments
Labels
bug data-quality Not urgent but good to keep track of

Comments

Contributor

rahulbot commented Sep 3, 2020

I noticed that recent stories don't have any tags on them. Perhaps some services aren't running as we transition?

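import datetime as dt
import mediacloud.tags

# assumes `mc` is an already-authenticated mediacloud.api.MediaCloud client
q = '*'
fq = mc.dates_as_query_clause(dt.date(2020, 8, 20), dt.date(2020, 8, 24))
tag_sets_id = mediacloud.tags.TAG_SET_NYT_THEMES_VERSION
# all stories in the date range
total = mc.storyCount(q, fq)['count']
# stories carrying an NYT-themes version tag, i.e. processed by the theme engine
with_themes = sum(t['count'] for t in mc.storyTagCount(q, fq, tag_sets_id=tag_sets_id))
print("{:.2%} stories have been processed for themes".format(with_themes / total))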

This reports that only 36% of stories dated between 8/20 and 8/24 have been processed by the theme engine. We can go back and reprocess them, but until then the gap will skew the results people see in certain Explorer and Topic Mapper widgets.

If I run the same thing with mediacloud.tags.TAG_SET_GEOCODER_VERSION to see how many have been run through CLIFF, I get the same result: 36%.
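
To make the two checks above easier to repeat for other date ranges, the same calculation could be wrapped in a small helper. This is only a sketch: `processed_fraction` is a hypothetical name, not part of the mediacloud client, and it assumes `mc` exposes `dates_as_query_clause`, `storyCount` and `storyTagCount` with the call shapes used in the snippet above.

```python
import datetime as dt


def processed_fraction(mc, tag_sets_id, start, end, q="*"):
    """Return the share of stories in [start, end] that carry a tag from tag_sets_id.

    Hypothetical helper: `mc` is assumed to be a client object offering the
    three methods called in the snippet above.
    """
    fq = mc.dates_as_query_clause(start, end)
    total = mc.storyCount(q, fq)["count"]
    if total == 0:
        # no stories in the range, so avoid dividing by zero
        return 0.0
    tagged = sum(t["count"] for t in mc.storyTagCount(q, fq, tag_sets_id=tag_sets_id))
    return tagged / total
```

Calling it once with mediacloud.tags.TAG_SET_NYT_THEMES_VERSION and once with mediacloud.tags.TAG_SET_GEOCODER_VERSION over dt.date(2020, 8, 20) to dt.date(2020, 8, 24) would repeat both measurements described in this comment.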

@rahulbot rahulbot added bug data-quality Not urgent but good to keep track of labels Sep 3, 2020
@rahulbot

This comment has been minimized.

Contributor

pypt commented Sep 29, 2020

Moved the second complaint to #729.


3 participants