Media Cloud: Measuring Attention
================================

In [None]:
# Grab your API key from the environment variable and create a client for talking to Media Cloud
import os
from dotenv import load_dotenv
from IPython.display import JSON
import mediacloud.api
load_dotenv()  # load config from .env file
mc = mediacloud.api.MediaCloud(os.getenv('MC_API_KEY'))
mediacloud.__version__
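
If `MC_API_KEY` is missing, `os.getenv` returns `None` without complaint. The client is then created with no key, and the failure only shows up at the first query. A small guard can surface the problem during setup. `require_env` below is a hypothetical helper, not part of the `mediacloud` package, and it uses only the standard library.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, raising if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            "{} is not set; add it to your .env file or export it in your shell".format(name)
        )
    return value


# usage, after load_dotenv():
# mc = mediacloud.api.MediaCloud(require_env('MC_API_KEY'))
```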

At this point you're ready to query Media Cloud for data. You can use boolean query syntax - [read our query guide](https://mediacloud.org/support/query-guide) for more details about the exact syntax (it runs a [SOLR search](https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#the-standard-query-parser) under the hood). **This notebook demonstrates how to quickly measure attention paid to a topic of interest**.

## Attention from a Single Media Source
You can start by looking at the attention a single media source pays to a topic you are interested in. We have almost a million media sources in our system, but we regularly collect stories from only about 100,000 of them, via their RSS feeds or, more recently, their sitemaps. To find the internal ID number for a source, search for it in our [Source Manager tool](https://sources.mediacloud.org/) and note the ID number shown just under the large title on its page.
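
To restrict a query to one source, you add a `media_id:` clause, as the next cell does by hand. If you build these strings in code, wrapping the topic in parentheses keeps a compound topic such as `modi OR bjp` grouped correctly when the source clause is attached. `scope_query` is a hypothetical sketch, not part of the Media Cloud client. It joins clauses with lowercase `and` to match the queries in this notebook, and it only builds strings without calling the API.

```python
def scope_query(topic, media_id=None, tags_id_media=None):
    """Restrict a boolean topic query to a single source and/or a collection.

    Hypothetical helper: it only assembles the query string and makes no API calls.
    """
    clauses = ["({})".format(topic)]  # parentheses keep "a OR b" grouped
    if media_id is not None:
        clauses.append("media_id:{}".format(media_id))
    if tags_id_media is not None:
        clauses.append("tags_id_media:{}".format(tags_id_media))
    return " and ".join(clauses)


print(scope_query("modi", media_id=39872))
# (modi) and media_id:39872
print(scope_query("modi OR bjp", tags_id_media=34412118))
# (modi OR bjp) and tags_id_media:34412118
```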

In [None]:
# check how many stories mention "modi" in the Hindustan Times (media id #39872)
my_query = 'modi and media_id:39872'
mc.storyCount(my_query)

In [None]:
# see how many stories we have mentioning "modi" from the Hindustan Times in 2019
# note that the publish_day clause *must* be in the second argument (because of the way we have set up our DB)
mc.storyCount(my_query, 'publish_day:[2019-01-01T00:00:00Z TO 2019-12-31T00:00:00Z]')

In [None]:
# but that date syntax is kind of ugly, so we have a helper to produce it for you from python dates
import datetime
start_date = datetime.date(2019,1,1)
end_date = datetime.date(2019,12,31)
date_range_2019 = mc.publish_date_query(start_date, end_date)
mc.storyCount(my_query, date_range_2019)
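
Printing `date_range_2019` shows the exact clause the helper produced. To make the syntax concrete, the hypothetical `day_range_clause` below rebuilds the literal string from the previous cell using only the standard library. It is not the client's implementation, which may treat the end date differently (for instance, as exclusive). Compare its output with `date_range_2019` before relying on either one.

```python
import datetime


def day_range_clause(start, end):
    """Build an inclusive publish_day range clause like the one typed by hand above.

    Hypothetical stand-in for illustration only; mc.publish_date_query may differ,
    for example in whether the end date is included.
    """
    fmt = "%Y-%m-%dT00:00:00Z"  # midnight UTC, in the ISO-8601 form Solr expects
    return "publish_day:[{} TO {}]".format(start.strftime(fmt), end.strftime(fmt))


clause = day_range_clause(datetime.date(2019, 1, 1), datetime.date(2019, 12, 31))
print(clause)
# publish_day:[2019-01-01T00:00:00Z TO 2019-12-31T00:00:00Z]
```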

In [None]:
# you can see this over time by using the `split` argument
# this defaults to daily counts, but you can pass a split_period of day, week, month, or year
mc.storyCount(my_query, date_range_2019, split=True, split_period='month')
# or render the same result as a collapsible tree with IPython's JSON display:
# JSON(mc.storyCount(my_query, date_range_2019, split=True, split_period='month'))
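
To work with the split counts in Python, such as finding the busiest month, you first need to flatten them. The sketch below **assumes** the split response contains a `counts` list of `{'date': ..., 'count': ...}` entries. This notebook does not show the output, so check the real result of the cell above and adjust the keys if they differ. `counts_by_period` and `peak_period` are hypothetical helpers, and the toy input uses made-up numbers only to show the shape.

```python
def counts_by_period(split_result):
    """Return (date, count) pairs sorted by date.

    ASSUMPTION: split_result looks like {'counts': [{'date': ..., 'count': ...}, ...]};
    verify against the real response from mc.storyCount(..., split=True).
    """
    rows = split_result.get("counts", [])
    return sorted((row["date"], row["count"]) for row in rows)


def peak_period(split_result):
    """Return the (date, count) pair with the most stories, or None if there are none."""
    pairs = counts_by_period(split_result)
    return max(pairs, key=lambda pair: pair[1]) if pairs else None


# toy input with made-up numbers, only to illustrate the assumed shape
toy = {"counts": [
    {"date": "2019-02-01 00:00:00", "count": 30},
    {"date": "2019-01-01 00:00:00", "count": 10},
    {"date": "2019-03-01 00:00:00", "count": 20},
]}
print(counts_by_period(toy)[0])  # ('2019-01-01 00:00:00', 10)
print(peak_period(toy))          # ('2019-02-01 00:00:00', 30)
```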

### Normalizing within a Source

Looking at absolute attention at the story level is intriguing, but you probably want to normalize this in some way to support comparisons between sources. To do this, we typically compare attention to the total number of stories we have from a source within that same timespan.

In [None]:
# check what fraction of 2019 stories from the Hindustan Times (media id #39872) mentioned "modi"
relevant_stories = mc.storyCount(my_query, date_range_2019)
total_stories = mc.storyCount('media_id:39872', date_range_2019)
source_ratio = relevant_stories['count'] / total_stories['count']
"{:.2%} of stories from the Hindustan Times in 2019 mentioned Modi".format(source_ratio)
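
If a source has no stories at all in the chosen timespan, the division in the cell above raises `ZeroDivisionError`. The hypothetical helper `attention_ratio` returns `None` in that case instead, which is easier to handle when you loop over many sources. It relies only on the `{'count': ...}` shape that the cell above already uses.

```python
def attention_ratio(relevant, total):
    """Fraction of stories matching a topic, from two storyCount-style results.

    Both arguments are dicts with a 'count' key, as used in the cell above.
    Returns None instead of raising ZeroDivisionError when the source has
    no stories in the timespan.
    """
    if total["count"] == 0:
        return None
    return relevant["count"] / total["count"]


print(attention_ratio({"count": 25}, {"count": 1000}))  # 0.025
print(attention_ratio({"count": 0}, {"count": 0}))      # None
```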

### Research Within a Country - using collections

[We have wide global coverage](https://sources.mediacloud.org/#/collections/country-and-state), with the sources published in each country grouped into collections. For many countries we also have collections of media sources published in individual states and provinces. Let's compare the attention Modi received in the Hindustan Times in 2019 with the attention he received across national-level Indian media.

In [None]:
# how many stories mentioned "modi" in our collection of country-level Indian media sources
india_query = 'modi and tags_id_media:34412118'
mc.storyCount(india_query, date_range_2019)

In [None]:
# let's normalize this attention the same way we did previously
relevant_stories = mc.storyCount(india_query, date_range_2019)
total_stories = mc.storyCount('tags_id_media:34412118', date_range_2019)
india_country_ratio = relevant_stories['count'] / total_stories['count']
"{:.2%} of stories from national-level Indian media sources in 2019 mentioned Modi".format(india_country_ratio)
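
The same two-query pattern (topic count divided by total count) has now appeared at both the source and country levels. A hypothetical `share_of_attention` function can wrap it. The function takes the counting function as a parameter, so you can pass `mc.storyCount` in the notebook or a fake for offline checks. It assumes `mc.storyCount(query, filter)` returns a dict with a `'count'` key, as the cells above use it. The fake's counts are made up.

```python
def share_of_attention(count_fn, topic, scope_clause, date_clause):
    """Fraction of stories in `scope_clause` that also match `topic` during `date_clause`.

    count_fn is expected to behave like mc.storyCount(query, filter) and return
    a dict with a 'count' key; passing a fake makes the logic testable offline.
    """
    relevant = count_fn("({}) and {}".format(topic, scope_clause), date_clause)
    total = count_fn(scope_clause, date_clause)
    if total["count"] == 0:
        return None  # nothing collected in this scope and timespan
    return relevant["count"] / total["count"]


# usage against the live API (not run here):
# share_of_attention(mc.storyCount, 'modi', 'tags_id_media:34412118', date_range_2019)


def fake_count(query, date_clause):
    """Offline stand-in with made-up counts: 50 topic stories out of 1,000."""
    return {"count": 50 if query.startswith("(") else 1000}


print(share_of_attention(fake_count, "modi", "tags_id_media:34412118",
                         "publish_day:[2019-01-01T00:00:00Z TO 2019-12-31T00:00:00Z]"))
# 0.05
```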

In [None]:
# now we can compare this to the source-level coverage
coverage_ratio = source_ratio / india_country_ratio
"Modi received {:.2f} times as much coverage in the Hindustan Times as you might expect based on national-level Indian media".format(coverage_ratio)
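
Written out, the coverage ratio in the cell above is a ratio of two shares. Here $n$ is the number of 2019 stories matching "modi" and $N$ is the number of all 2019 stories in the same scope:

$$
\text{coverage ratio} = \frac{n^{\mathrm{HT}}_{\mathrm{modi}} \,/\, N^{\mathrm{HT}}}{n^{\mathrm{India}}_{\mathrm{modi}} \,/\, N^{\mathrm{India}}}
$$

A value above 1 means the Hindustan Times devoted a larger share of its stories to Modi than the national collection did, and a value below 1 means a smaller share. If the Hindustan Times is itself part of the national collection, its stories also feed into the baseline, which pulls the ratio toward 1.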

In [None]:
# or compare with another country's leader, such as Trump in the US
relevant_stories = mc.storyCount('trump and tags_id_media:34412234', date_range_2019)
total_stories = mc.storyCount('tags_id_media:34412234', date_range_2019)
us_country_ratio = relevant_stories['count'] / total_stories['count']
print("{:.2%} of stories from national-level US media sources in 2019 mentioned Trump".format(us_country_ratio))
coverage_ratio = us_country_ratio / india_country_ratio
"Trump's share of national US coverage was {:.2f} times Modi's share of national Indian coverage".format(coverage_ratio)
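
To compare more countries, loop over (label, topic, collection) cases so that every share is normalized the same way. The hypothetical `compare_shares` below sketches this approach. It assumes `mc.storyCount(query, filter)` returns `{'count': n}` as in the cells above. The two collection ids are the ones used in this notebook, and you can add other countries' ids from the Source Manager. The offline check uses made-up counts.

```python
def compare_shares(count_fn, cases, date_clause):
    """Compute each case's share of attention and sort from highest to lowest.

    cases: list of (label, topic, collection_clause) tuples.
    count_fn: assumed to behave like mc.storyCount(query, filter) -> {'count': n}.
    Cases whose collection has no stories in the window are skipped.
    """
    results = []
    for label, topic, collection in cases:
        total = count_fn(collection, date_clause)["count"]
        if total == 0:
            continue  # avoid dividing by zero for an empty collection
        relevant = count_fn("({}) and {}".format(topic, collection), date_clause)["count"]
        results.append((label, relevant / total))
    return sorted(results, key=lambda item: item[1], reverse=True)


cases = [
    ("Modi in India", "modi", "tags_id_media:34412118"),
    ("Trump in the US", "trump", "tags_id_media:34412234"),
]
# live usage (not run here):
# for label, share in compare_shares(mc.storyCount, cases, date_range_2019):
#     print("{}: {:.2%}".format(label, share))

# offline check with made-up counts
toy_counts = {
    "tags_id_media:34412118": 1000, "(modi) and tags_id_media:34412118": 20,
    "tags_id_media:34412234": 1000, "(trump) and tags_id_media:34412234": 80,
}


def fake_count(query, date_clause):
    """Look up a made-up count for each query string."""
    return {"count": toy_counts[query]}


print(compare_shares(fake_count, cases, "2019"))
# [('Trump in the US', 0.08), ('Modi in India', 0.02)]
```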