# A3: Political Bias
In this assignment, you will try to calculate the average political bias and reliability of the tweets returned by a Twitter search, based on the news sites those tweets link to.

We have already provided code that goes through all the tweets in a search and looks up the political bias and reliability of the URLs (web addresses) mentioned in the tweets. You will need to add loop variables to calculate these averages (see the chapter 8 practice and demos).

Then you will try your code with a few different searches and answer some reflection questions about them.

First, we'll do our normal Twitter login steps (though we won't use fake-tweepy for this assignment, only real Twitter).

## Tweepy Setup

In [None]:
import tweepy

In [None]:
import bot_keys

In [None]:
# log into tweepy
client = tweepy.Client(
    bearer_token=bot_keys.bearer_token,
    consumer_key=bot_keys.consumer_key, consumer_secret=bot_keys.consumer_secret,                   
    access_token=bot_keys.access_token, access_token_secret=bot_keys.access_token_secret
)

## Load Bias and Reliability Info
Our measures of media website bias and reliability are based on the Media Bias Chart 9.0 (https://adfontesmedia.com/static-mbc).

I have made a file with some of the sites shown on that chart. I made my own simplification based on the grid in the graphic, with bias ranging from -4 to +4 and reliability ranging from -4 to +2, as follows:
![Media bias chart with grid, showing the range labels](./imgs/bias_chart_divisions.jpg)


Note: If you want to look at the file more closely, it is saved as `media_info.csv` (a comma-separated values file).
- Each row of text holds the info for one media site. For example:
`wsws.org,-3,-1`
means that the site wsws.org has a bias of -3 (Hyper-Partisan Left) and a reliability of -1 (Opinion OR High Variation in Reliability).
- You can open the file by right-clicking media_info.csv and selecting Open With -> Editor. You can then modify entries or add new ones for more sites (for example, from the interactive media bias chart here: https://adfontesmedia.com/interactive-media-bias-chart/). Then save the file and rerun the loading code below, along with the search code after it.
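To make the row format concrete, here is a small sketch that reads rows in this format with Python's built-in `csv` module and checks that each value falls inside the ranges described above (bias from -4 to +4, reliability from -4 to +2). It assumes the file's first line is a header naming the columns `site`, `bias`, and `reliability` (the column names the loading code below uses) and that the values are whole numbers, like the example row. The `parse_media_rows` helper is hypothetical, not part of the assignment code; something like it could catch typos after you add your own entries.

```python
import csv
import io

# Ranges from the simplified grid described above
BIAS_RANGE = (-4, 4)
RELIABILITY_RANGE = (-4, 2)


def parse_media_rows(csv_text):
    """Hypothetical helper: parse media_info.csv-style text into a list of
    (site, bias, reliability) tuples, raising ValueError for out-of-range values."""
    rows = []
    # DictReader uses the header line (site,bias,reliability) as the keys
    for row in csv.DictReader(io.StringIO(csv_text)):
        site = row['site'].strip()
        bias = int(row['bias'])
        reliability = int(row['reliability'])
        if not BIAS_RANGE[0] <= bias <= BIAS_RANGE[1]:
            raise ValueError(f"{site}: bias {bias} is outside {BIAS_RANGE}")
        if not RELIABILITY_RANGE[0] <= reliability <= RELIABILITY_RANGE[1]:
            raise ValueError(f"{site}: reliability {reliability} is outside {RELIABILITY_RANGE}")
        rows.append((site, bias, reliability))
    return rows


# The two example sites mentioned in this notebook
example_csv = "site,bias,reliability\nwsws.org,-3,-1\nnpr.org,-1,1\n"
print(parse_media_rows(example_csv))
# → [('wsws.org', -3, -1), ('npr.org', -1, 1)]
```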

The code below loads the bias and reliability info and creates three things that the code will use later:
- a `find_matching_site(url)` function, which, given the URL of an article or other page, tries to find the site it's from (e.g., "https://www.npr.org/sections/codeswitch/" is from the site "npr.org"), or returns `None` if no known site matches
- `media_bias_lookup`, a dictionary that, given a site (like "npr.org"), returns the bias for that site (for npr.org: -1)
- `media_reliability_lookup`, a dictionary that, given a site (like "npr.org"), returns the reliability for that site (for npr.org: 1)

In [None]:
# load the pandas code library, which helps us use the csv file
import pandas as pd

# load the media_info.csv file
media_info_df = pd.read_csv('media_info.csv')

# get a list of the sites we have information for in the media_info.csv
media_sites = media_info_df['site']

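def find_matching_site(url):
    # return the first known site whose name appears anywhere in the url
    for site in media_sites:
        if site in url:
            return site
    # no known site matched this url
    return None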

# make a lookup dictionary, where the key is the site, and the value is the bias for that site
media_bias_lookup = {m_info['site']: m_info['bias'] for i, m_info in media_info_df.iterrows()}

# make a lookup dictionary, where the key is the site, and the value is the reliability for that site
media_reliability_lookup = {m_info['site']: m_info['reliability'] for i, m_info in media_info_df.iterrows()}
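Note that `find_matching_site` uses a plain substring test (`site in url`), so it also reports a match when a site's name shows up elsewhere in the address, for example in `https://www.notnpr.org/` or in a link whose query string mentions `npr.org`. The sketch below shows one possible stricter variant, not the one this assignment uses, that compares only the URL's hostname, using `urllib.parse.urlparse` from the Python standard library. The function name `find_matching_site_strict` and the small `sites` list are made up for illustration.

```python
from urllib.parse import urlparse


def find_matching_site_strict(url, sites):
    """Hypothetical stricter version of find_matching_site: only match when the
    URL's hostname is the site itself or a subdomain of it (e.g. www.npr.org)."""
    # hostname is None when the string has no network location, so fall back to ""
    hostname = (urlparse(url).hostname or "").lower()
    for site in sites:
        if hostname == site or hostname.endswith("." + site):
            return site
    return None  # no known site matched


sites = ["wsws.org", "npr.org"]
print(find_matching_site_strict("https://www.npr.org/sections/codeswitch/", sites))  # → npr.org
print(find_matching_site_strict("https://www.notnpr.org/", sites))  # → None
print("npr.org" in "https://www.notnpr.org/")  # → True (the substring test matches)
```

Mismatches like this are one reason the measured bias and reliability may not line up with what you see in the tweets.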

## TODO: Modify the code below (Run Search)
The code below runs a search on Twitter and then loops through each tweet. For each tweet, it loops through all the URLs in that tweet, and if it can find a matching site in our media info, it looks up the bias and reliability for that site.

__Your job__ is to add loop variables to the code to calculate the number of matched URLs and the total bias and total reliability for those URLs. You can then use them at the end to calculate the average bias and average reliability.
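If the loop-variable (accumulator) pattern is new to you, here is a minimal sketch of it on a hard-coded list rather than on tweets. The `(bias, reliability)` pairs are just the two example sites from above (wsws.org and npr.org), not real search results. The variable names match the ones suggested in the TODO comments; the check for zero matches is an assumption about how you might handle a search where no URL could be measured, since dividing by zero would otherwise raise an error.

```python
# Hypothetical stand-in for the matched urls: (bias, reliability) pairs
matched_scores = [(-3, -1), (-1, 1)]

# Loop variables start at zero before the loop...
number_matched_urls = 0
total_url_bias = 0
total_url_reliability = 0

# ...and are updated once per matched url inside the loop
for url_bias, url_reliability in matched_scores:
    number_matched_urls = number_matched_urls + 1
    total_url_bias = total_url_bias + url_bias
    total_url_reliability = total_url_reliability + url_reliability

# After the loop, divide the totals by the count (guarding against zero matches)
if number_matched_urls > 0:
    average_bias = total_url_bias / number_matched_urls
    average_reliability = total_url_reliability / number_matched_urls
else:
    average_bias = None
    average_reliability = None

print("Total number of urls we could measure: " + str(number_matched_urls))  # → 2
print("Average bias: " + str(average_bias))  # → -2.0
print("Average reliability: " + str(average_reliability))  # → 0.0
```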


In [None]:
# The search query starts with the search term and then has options to
#   make sure it isn't a retweet and that there are url links in the tweet that aren't for twitter.com
query = 'senate   -is:retweet has:links -url:twitter.com'

max_results = 20 # You can get between 10 and 100 tweets at a time in your search
    
tweet_search_results = client.search_recent_tweets(query=query, max_results=max_results, tweet_fields=["entities"])


#### TODO: make three loop variables here: ####
#   number_matched_urls
#   total_url_bias
#   total_url_reliability


# go through all the tweets (data is None if the search found no tweets)
for tweet in tweet_search_results.data or []:
    
    print(tweet.text)

    # go through all the urls in the tweet
    for url in tweet.entities.get('urls', []):
        # get the "unwound" url, so it isn't a shortened url like t.co/Drh9zcdEQz;
        # fall back to the expanded url when Twitter didn't provide an unwound one
        unwound_url = url.get('unwound_url', url['expanded_url'])

        print("url: " + unwound_url)

        # try to find the url site from the sites in our media_info.csv file
        matching_site = find_matching_site(unwound_url)

        # if we found the matching site, then we have info for it
        if matching_site:
            
            # look up the bias and reliability for the site the url is from
            url_bias = media_bias_lookup[matching_site]
            url_reliability = media_reliability_lookup[matching_site]

            #### TODO: Update the three loop variables here #### 

            print("  bias: " + str(url_bias))
            print("  reliability: " + str(url_reliability))
        else:
            # We didn't have info on this site
            print("**did not recognize site!")

    print()
        

#### TODO: Use the loop variables to get the total number of urls ####
# we could measure, and calculate the average bias and reliability for those urls.
# Then display them with the print statements below

print("--------------------------------------")
print("Total number of urls we could measure: ")
print("Average bias: ")
print("Average reliability: ")

## Reflection tasks

Once you get the code above working and it finds an average bias and reliability, modify the search (replace the word "senate") to try at least three more searches. Look over the tweets and the final results, and then answer the questions below.

Note: For searches, you can search for political terms, or news terms, or even phrases associated with different communities, like "Soros", "cult", "#censorship", etc.
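Since only the search term at the start of `query` changes between searches, one option is to build the query from the term. In the sketch below, the `build_query` helper and the example terms (taken from the note above) are illustrative, not part of the provided code; the search options are copied from the query in the search code cell. Multi-word terms are wrapped in double quotes, which Twitter's search syntax treats as an exact-phrase match.

```python
def build_query(search_term):
    """Hypothetical helper: add the same options used in the assignment's query
    (no retweets, must have links, links not to twitter.com) to a search term."""
    search_term = search_term.strip()
    # Quote multi-word terms so they are searched as an exact phrase
    if " " in search_term:
        search_term = '"' + search_term + '"'
    return search_term + " -is:retweet has:links -url:twitter.com"


for term in ["Soros", "cult", "#censorship"]:
    print(build_query(term))
```

Each printed string could then be passed as `query=` to `client.search_recent_tweets`, as in the search code cell.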

1. What additional searches did you run (at least 3)?

TODO: Answer the question here

2. When doing those searches, what were your observations about the calculations of media bias and reliability? (For example: were there a lot of URLs that you couldn't measure? Do you feel like the final calculated bias and reliability match the search results?) Answer with at least 3 sentences.

TODO: Answer the question here with at least 3 sentences

3. If you could redesign the Media Bias Chart, what would you want to do (e.g., add some other dimension besides just bias/reliability, change how it is evaluated, add more news sources, consider different countries)? Answer with at least 3 sentences.

TODO: Answer the question here with at least 3 sentences

4. What might social media companies or advertisers (including political campaigns) want to do with information on a user's political views and susceptibility to conspiracy theories? Answer with at least 3 sentences.

TODO: Answer the question here with at least 3 sentences

5. Choose two ethics frameworks and use the frameworks to consider the different uses of the media bias and reliability information. Answer with at least 6 sentences.

TODO: Answer the question here with at least 6 sentences