Media Cloud: Topics: Entities
=============================

At this point you have created a topic in Media Cloud: a corpus of open-news web content related to an issue you want to investigate, discovered on multiple platforms across the internet. Topics don't expose special endpoints for examining the entities mentioned (people, places, and organizations). See the "entities" notebook for more about how to investigate entities overall; all of that information applies here.

To examine entities within a topic corpus, call `storyTagCount` with a query that names the `timespans_id` you are interested in. This shortcut is generally useful for experimenting: you can limit any non-topic call to a timespan by querying with a `timespans_id:12345` clause.
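
As a minimal sketch of that clause, the helpers below build the query string by hand. The function names are hypothetical (they are not part of the `mediacloud` package), and combining the clause with other search terms through a boolean `AND` is an assumption about the query syntax rather than something this notebook demonstrates.

```python
def timespan_clause(timespans_id):
    """Build the clause that restricts a query to one topic timespan."""
    return "timespans_id:{}".format(int(timespans_id))


def within_timespan(query, timespans_id):
    """Wrap an ordinary query so it only matches stories in the timespan.

    Assumption: the clause can be joined to other terms with a boolean AND.
    """
    return "({}) AND {}".format(query, timespan_clause(timespans_id))


# 12345 is the placeholder id used in the text above, not a real timespan
print(timespan_clause(12345))
print(within_timespan("starter", 12345))
```

The first form is the one used throughout the rest of this notebook; the second is only relevant if you want to narrow the timespan further with your own search terms.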

## Set Up a Connection and Some Constants

In [None]:
# Grab your API key from the environment variable and create a client for talking to Media Cloud
import os, mediacloud.api
from dotenv import load_dotenv
from IPython.display import JSON
load_dotenv()  # load config from .env file
mc = mediacloud.api.MediaCloud(os.getenv('MC_API_KEY'))
mediacloud.__version__

In [None]:
# we'll use this topic for the explanation
SOURDOUGH_TOPIC = 4138
# find the latest snapshot
snapshots = mc.topicSnapshotList(SOURDOUGH_TOPIC)
latest_snapshot_id = snapshots[0]['snapshots_id'] # grab the id of the latest snapshot
# pull out the automatically-generated monthly timespans, and the overall one
timespans = mc.topicTimespanList(SOURDOUGH_TOPIC)
overall_timespan = [t for t in timespans if t['period'] == 'overall'][0]
monthly_timespans = [t for t in timespans if t['period'] == 'monthly']
# grab a subtopic to work with as well
focal_sets = mc.topicFocalSetList(SOURDOUGH_TOPIC)
reddit_foci_id = focal_sets[0]['foci'][0]['foci_id']
# and some timespans in the reddit subtopic
reddit_timespans = mc.topicTimespanList(SOURDOUGH_TOPIC, foci_id=reddit_foci_id)
reddit_overall_timespan = [t for t in reddit_timespans if t['period'] == 'overall'][0]
reddit_monthly_timespans = [t for t in reddit_timespans if t['period'] == 'monthly']
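
The list comprehensions above index `[0]` directly, which raises an `IndexError` if a topic has no timespan for the requested period. A slightly more defensive version might look like the sketch below. The helper names are hypothetical, and the sample records are made up: they mimic only the `timespans_id` and `period` keys used in this notebook.

```python
def timespans_for_period(timespans, period):
    """Return every timespan dict whose 'period' matches, e.g. 'monthly'."""
    return [t for t in timespans if t.get('period') == period]


def single_timespan(timespans, period):
    """Return the first timespan for a period, failing loudly if none exists."""
    matches = timespans_for_period(timespans, period)
    if not matches:
        raise ValueError("no '{}' timespan found".format(period))
    return matches[0]


# made-up records standing in for the output of mc.topicTimespanList(...)
sample_timespans = [
    {'timespans_id': 1, 'period': 'overall'},
    {'timespans_id': 2, 'period': 'monthly'},
    {'timespans_id': 3, 'period': 'monthly'},
]
print(single_timespan(sample_timespans, 'overall')['timespans_id'])
print(len(timespans_for_period(sample_timespans, 'monthly')))
```

With the real data you would pass `timespans` or `reddit_timespans` in place of `sample_timespans`.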

## Who is Being Mentioned?

In [None]:
# let's see who is being mentioned most in the corpus
import mediacloud.tags
results = mc.storyTagCount("timespans_id:{}".format(overall_timespan['timespans_id']),
                           tag_sets_id=mediacloud.tags.TAG_SET_CLIFF_PEOPLE)
[t['tag'] for t in results[:10]]
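
The same call pattern repeats for each entity type, so it can be wrapped in a small function. This is a sketch, not part of the `mediacloud` package: `top_entities` and `FakeClient` are hypothetical names, and the fake client exists only so the example runs without an API key. It mirrors the `storyTagCount(query, tag_sets_id=...)` call used in the cell above and returns invented data.

```python
def top_entities(client, timespans_id, tag_sets_id, limit=10, field='tag'):
    """Return `field` from the first `limit` tag counts in one timespan."""
    query = "timespans_id:{}".format(timespans_id)
    results = client.storyTagCount(query, tag_sets_id=tag_sets_id)
    return [t[field] for t in results[:limit]]


class FakeClient:
    """Hypothetical offline stand-in for the real Media Cloud client."""

    def storyTagCount(self, query, tag_sets_id=None):
        self.last_query = query  # remember the query so it can be inspected
        return [
            {'tag': 'Person A', 'description': 'first made-up entry'},
            {'tag': 'Person B', 'description': 'second made-up entry'},
            {'tag': 'Person C', 'description': 'third made-up entry'},
        ]


fake = FakeClient()
# tag_sets_id=1 is a placeholder; use a mediacloud.tags constant in practice
print(top_entities(fake, 12345, tag_sets_id=1, limit=2))
```

With the real client, the equivalent call would be `top_entities(mc, overall_timespan['timespans_id'], mediacloud.tags.TAG_SET_CLIFF_PEOPLE)`. Pass `field='description'` to read the description instead of the tag name.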

## What Organizations are Being Mentioned?

In [None]:
# let's see which organizations are being mentioned most in the corpus
import mediacloud.tags
results = mc.storyTagCount("timespans_id:{}".format(overall_timespan['timespans_id']),
                           tag_sets_id=mediacloud.tags.TAG_SET_CLIFF_ORGS)
[t['tag'] for t in results[:10]]

## What Places are Being Talked About?

In [None]:
# let's see which places are being mentioned most in the corpus
import mediacloud.tags
results = mc.storyTagCount("timespans_id:{}".format(overall_timespan['timespans_id']),
                           tag_sets_id=mediacloud.tags.TAG_SET_CLIFF_PLACES)
[t['description'] for t in results[:10]]
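
The setup cell also collected monthly timespans and the timespans of a subtopic. Because a timespan is selected purely through the query clause, the same call can be repeated for each of them. The sketch below shows one way to do that. `top_entities_by_timespan` and `FakeClient` are hypothetical names, the fake client returns invented data so the example runs offline, and the real version would make one API request per timespan.

```python
def top_entities_by_timespan(client, timespans, tag_sets_id, limit=3, field='tag'):
    """Map each timespan id to `field` from its first `limit` tag counts."""
    summary = {}
    for t in timespans:
        query = "timespans_id:{}".format(t['timespans_id'])
        results = client.storyTagCount(query, tag_sets_id=tag_sets_id)
        summary[t['timespans_id']] = [r[field] for r in results[:limit]]
    return summary


class FakeClient:
    """Hypothetical offline stand-in for the real Media Cloud client."""

    def storyTagCount(self, query, tag_sets_id=None):
        # derive invented entity names from the timespan id in the query
        timespan = query.split(':')[1]
        return [{'tag': 'entity-{}-{}'.format(timespan, i)} for i in range(5)]


# made-up timespans; tag_sets_id=1 is a placeholder for a mediacloud.tags constant
months = [{'timespans_id': 101}, {'timespans_id': 102}]
summary = top_entities_by_timespan(FakeClient(), months, tag_sets_id=1, limit=2)
print(summary)
```

With the real client you could pass `monthly_timespans` to see how the most-mentioned entities shift over time, or `reddit_monthly_timespans` to do the same within the subtopic.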