# A4: Political Bias
In this assignment, you will be trying to calculate the average political bias and reliability of posts on Bluesky.

The code you are starting with already runs a search on Bluesky and looks up the reliability and political bias of the URLs (web addresses) posted there. You will need to add loop variables to calculate these averages (see chapter 8 practice and demos).

After you get the averages to work, you will try your code with other search terms and then answer some reflection questions.

First, we'll do our normal Bluesky login steps.

In [None]:
from atproto import Client
%run bluesky_keys.py
client = Client(base_url="https://bsky.social")
client.login(handle, password)

## Load Bias and Reliability Info

The code below saves some bias and reliability values. You can look at the code below if you are interested, but you are not required to.

The part you need to know is that the measures of website bias and reliability are based on the [Media Bias Chart](https://adfontesmedia.com/) (old version 9.0). We took their ratings of reliability and bias and simplified them into a scale with bias ranging from -4 (extremely liberal) to +4 (extremely conservative), and reliability from +2 (fact reporting) to -4 (fabricated or inaccurate). We then chose a few of the more common websites to let us look up info. 
![Media bias chart with grid, showing the range labels](./imgs/bias_chart_divisions.jpg)
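
As a small illustration of how these two scales are used together, the sketch below builds a two-entry copy of the lookup dictionaries (the values match the ones in the code cell that follows) and describes one site. The names `example_bias`, `example_reliability`, and `describe_site`, and the wording of the labels, are made up for this illustration and are not part of the assignment code.

```python
# Hypothetical two-entry versions of the lookup dictionaries
example_bias = {"pbs.org": -1, "infowars.com": 3}
example_reliability = {"pbs.org": 2, "infowars.com": -3}

def describe_site(site):
    # Look up both ratings for the same site
    bias = example_bias[site]
    reliability = example_reliability[site]

    # Negative bias is the liberal side of the scale, positive is conservative
    if bias < 0:
        leaning = "liberal-leaning"
    elif bias > 0:
        leaning = "conservative-leaning"
    else:
        leaning = "neutral"

    return f"{site}: bias {bias} ({leaning}), reliability {reliability}"

print(describe_site("pbs.org"))
```

Because both dictionaries use the same site names as keys, one site name is enough to find both ratings.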

In [4]:
#This code saves bias and reliability info in lookup dictionaries
# for different websites.
# If you want to, you can update the values to make
# them match the current chart, or you can add additional
# sites to the data (just make sure you add it to both
# the bias and reliability dictionaries)
media_bias_lookup = {
    "pbs.org": -1,
    "apnews.com": -1,
    "reuters.com": -1,
    "abc.com": -1,
    "abcnews.go.com": -1,
    "bbc.co.uk": -1,
    "bbc.com": -1,
    "npr.org": -1,
    "cnn.com": -2,
    "wsj.com": 1,
    "nbc.com": -2,
    "nbcnews.com": -2,
    "nytimes.com": -2,
    "theguardian.com": -2,
    "washingtonpost.com": -2,
    "vice.com": -2,
    "vox.com": -2,
    "usatoday.com": -1,
    "rasmussenreports.com": 2,
    "huffpost.com": -2,
    "msnbc.com": -2,
    "newrepublic.com": -3,
    "salon.com": -3,
    "theweek.com": -2,
    "freebeacon.com": 2,
    "foxnews.com": 2,
    "reason.com": 2,
    "nypost.com": 2,
    "dailywire.com": 2,
    "nationalreview.com": 2,
    "theblaze.com": 2,
    "dailykos.com": -3,
    "wsws.org": -3,
    "tyt.com": -3,
    "distractify.com": -2,
    "bipartisanreport.com": -3,
    "hartmannreport.com": -3,
    "palmerreport.com": -1,
    "veteranstoday.com": -2,
    "dailymail.co.uk": 1,
    "dailycaller.com": 2,
    "breitbart.com": 2,
    "newsmax.com": 2,
    "oann.com": 2,
    "aclj.org": 2,
    "tpusa.com": 2,
    "infowars.com": 3
}

media_reliability_lookup = {
    "pbs.org": 2,
    "apnews.com": 2,
    "reuters.com": 2,
    "abc.com": 1,
    "abcnews.go.com": 1,
    "bbc.co.uk": 1,
    "bbc.com": 1,
    "npr.org": 1,
    "cnn.com": 1,
    "wsj.com": 1,
    "nbc.com": 1,
    "nbcnews.com": 1,
    "nytimes.com": 1,
    "theguardian.com": 1,
    "washingtonpost.com": 1,
    "wapo": 1,
    "vice.com": 1,
    "vox.com": 1,
    "usatoday.com": 1,
    "rasmussenreports.com": 1,
    "huffpost.com": 0,
    "msnbc.com": 0,
    "newrepublic.com": 0,
    "salon.com": 0,
    "theweek.com": 0,
    "freebeacon.com": 0,
    "foxnews.com": 0,
    "reason.com": 0,
    "nypost.com": 0,
    "dailywire.com": 0,
    "nationalreview.com": 0,
    "theblaze.com": 0,
    "dailykos.com": -1,
    "wsws.org": -1,
    "tyt.com": -1,
    "distractify.com": -1,
    "bipartisanreport.com": -1,
    "hartmannreport.com": -1,
    "palmerreport.com": -1,
    "veteranstoday.com": -2,
    "dailymail.co.uk": -1,
    "dailycaller.com": -1,
    "breitbart.com": -1,
    "newsmax.com": -1,
    "oann.com": -1,
    "aclj.org": -2,
    "tpusa.com": -2,
    "infowars.com": -3
}

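# The sites we have ratings for (the keys of the bias dictionary)
media_sites = media_bias_lookup.keys()

def find_matching_site(url):
    # Return the first known site whose name appears in the url,
    # or None if we have no info on the site
    for site in media_sites:
        if site in url:
            return site
    return None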

In [6]:
# This helper function looks up full urls, since some posts come with shortened ones
# It sets timeouts for connecting and for reading the response, and if a request
# times out it tries again (up to 3 attempts in total)
# If it fails, it just gives up and gives you the short url
import requests
from time import sleep

def get_full_url(short_url, timeout=(2, 5), retries=3):
    for attempt in range(retries):
        try:
            response = requests.head(short_url, allow_redirects=True, timeout=timeout)
            return response.url
        except requests.Timeout:
            print(f"Timeout on attempt {attempt + 1} for URL: {short_url}")
            if attempt < retries - 1:
                sleep(1)  # Optional: wait before retrying
        except requests.RequestException as e:
            print(f"Error retrieving URL: {e}")
            return short_url
    return short_url  # Return the original if all attempts fail

## Get a list of search results from Bluesky

We will now get a list of results from Bluesky. To start with, we'll try to get 100 with the search term "Seattle."


In [16]:
# The "params" variable is what we send as a search request to Bluesky
# 'q' is the search query; edit this if you want to try different search terms
# 'limit' is the maximum number of posts you'll extract.
params = {
    'q': "Seattle",
    'limit': 100
}

posts = client.app.bsky.feed.search_posts(params=params)['posts']

## TODO: Modify the code below (Run Search)
The code below loops through each Bluesky post, and if the post includes a link to a website (a url), the program checks whether we have reliability and bias info for that site. If we do, it looks up the bias and reliability and displays them.
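
The check for whether we have info on a site works by looking for a known site name anywhere inside the url text. Here is a minimal sketch of that idea, using a made-up, shortened list of sites (`known_sites`) and a made-up function name (`find_site_in_url`) rather than the real dictionaries:

```python
# Hypothetical, shortened list of known sites (the real code uses the dictionary keys)
known_sites = ["nytimes.com", "apnews.com"]

def find_site_in_url(url):
    # Return the first known site whose name appears anywhere in the url
    for site in known_sites:
        if site in url:
            return site
    # No known site was found in this url
    return None

print(find_site_in_url("https://www.nytimes.com/section/science"))  # nytimes.com
print(find_site_in_url("https://example.org/some-article"))         # None
```

Because this is a plain text match, a url that happens to contain a known site name somewhere else in its text would also count as a match. That is worth keeping in mind when you look at your results.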

__Your job__ is to add loop variables to the code to calculate the number of urls we had info for (`number_matched_urls`) and then the total bias and total reliability for those urls. Then you can use that at the end to calculate the average bias and average reliability.
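
As a reminder of the loop-variable pattern, here is a small sketch on unrelated, made-up data (a list of quiz scores where some are missing). The variable names are invented for this example. Your code will follow the same general shape, but with the bias and reliability values from the posts.

```python
# Made-up quiz scores; None means the score is missing and should be skipped
scores = [4, None, 2, 3, None]

# Loop variables: created before the loop, updated inside it
number_counted = 0
total_score = 0

for score in scores:
    if score is not None:
        number_counted = number_counted + 1
        total_score = total_score + score

# Avoid dividing by zero when nothing was counted
if number_counted > 0:
    average_score = total_score / number_counted
else:
    average_score = None

print("Counted:", number_counted)
print("Average:", average_score)
```

The count and the total both start at 0 before the loop and only change for items that pass the check, so the average at the end covers only the items that were actually counted.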


In [None]:
#### TODO: Create your loop variables here

# Go through all the posts
for p in posts:
    embed = p['record']['embed']
    
    # Check if the post has an external link
    if hasattr(embed, 'external') and hasattr(embed['external'], 'uri'):
        url = embed['external']['uri']

        # Look up the full url (some posts use shortened ones), so our lookup is more accurate
        print("Checking for full url...")
        full_url = get_full_url(url)
        print(full_url)
        # try to find the source website in our dataset
        matching_site = find_matching_site(full_url)

        # if we found the matching site, then we have info for it
        if matching_site:
            
            # look up the bias and reliability for the site the url is from
            url_bias = media_bias_lookup[matching_site]
            url_reliability = media_reliability_lookup[matching_site]

            #### TODO: Update the three loop variables here #### 

            print("  bias: " + str(url_bias))
            print("  reliability: " + str(url_reliability))
        else:
            # We didn't have info on this site
            print("**did not recognize site!")

#### TODO: calculate the averages below and output the total and averages
# Note: It's ok if the code gives an error when no urls are found
#  (since trying to find the average might cause a divide by 0 error)    

print("--------------------------------------")
print("Total number of urls we could measure: ")
print("Average bias: ")
print("Average reliability: ")
        

## Reflection tasks

Once you get the code above working and finding an average bias and reliability, modify the search to try at least three different searches. Open up Bluesky separately, try the same searches there, and look at the results. Then answer the questions below.

Note: You can try search terms that might bring up different views and posts that link to news articles, like: "news", "science", "politics", "liberal", "conservative", "tech", "BlackLivesMatter", etc.

1. What additional searches did you run (at least 3)?


TODO: Answer question here

2. When doing those searches, what were your observations about the calculations of media bias and reliability? (For example: were there a lot of urls that you didn't measure? Do you feel like the final calculated bias and reliability match the search results?) Answer with at least 3 sentences.

TODO: Answer question here

3. If you could redesign the Media Bias Chart, what would you want to do (e.g., add some other dimension besides just bias/reliability like other [Political Spectrums](https://en.wikipedia.org/wiki/Political_spectrum), change how it is evaluated, add more news sources, consider different countries)? Answer with at least 3 sentences.

TODO: Answer question here

4. What might social media companies or advertisers (including political campaigns) want to do with information on a user's political views and susceptibility to conspiracy theories? Answer with at least 3 sentences.

TODO: Answer question here

5. Choose two ethics frameworks and use the frameworks to consider the different uses of the media bias and reliability information. Answer with at least 6 sentences total (e.g., 3 per framework).

TODO: Answer question here