<div class="alert alert-block alert-info"><b>IAB303</b> - Data Analytics for Business Insight</div>

# Finding border restrictions in the news

[The Guardian](https://www.theguardian.com/au) is a quality **open** news outlet with an easy-to-use [open-platform API](https://open-platform.theguardian.com).

* Explore and experiment with the [platform here](https://open-platform.theguardian.com/explore/)
* Get your own [developer API key here](https://bonobo.capi.gutools.co.uk/register/developer)

#### Load the key before anything else...

In [None]:
#load key
with open(???, 'r') as file:
    key = file.read().strip()
len(key) # check key loaded by reading its length - don't want to display the actual key!!

#### Import required libraries

In [None]:
#import required libraries
import requests
import json
import pandas as pd

#### Reuse useful functions from the studio session :)

In [None]:
# a function to build the URL

def buildUrl(search_text,office="",tag="",fromDate=""):
    baseUrl = 'https://content.guardianapis.com/search?q='
    # Only include office, tag and fromDate if they have values
    if office:
        office = '&production-office='+office
    if tag:
        tag = '&tag='+tag
    if fromDate:
        # The API's query parameter is 'from-date' (format YYYY-MM-DD)
        fromDate = '&from-date='+fromDate
    # Wrap the search text in double quotes to search for the exact phrase
    fullurl = baseUrl+'"'+search_text+'"'+office+tag+fromDate
    print(fullurl)
    return fullurl
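
`buildUrl` concatenates the search text into the URL as typed. The cell below is a minimal sketch of how the quoted phrase could be percent-encoded explicitly with the standard library, so that spaces and quotes are visible in their encoded form. The helper name `encodeSearchText` is hypothetical and is not part of the studio code.

```python
from urllib.parse import quote_plus

def encodeSearchText(search_text):
    # Hypothetical helper: wrap the phrase in double quotes (exact-phrase
    # search) and percent-encode it for use in a query string
    return quote_plus('"' + search_text + '"')

encoded = encodeSearchText("vaccine passports")
print(encoded)  # → %22vaccine+passports%22
```

The encoded value could then be appended to `baseUrl` in place of the raw quoted text.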

In [None]:
# a function to request the data and report how many records were found
def getData(url,key):
    response = requests.get(url+'&page-size=50'+'&api-key='+key)
    data = response.json()
    # Use .get() so an error payload without a 'response' key does not raise a KeyError
    if data.get('response', {}).get('status')=='ok':
        total = data['response']['total']
        pages = data['response']['pages']
        print("Found a total of {} records, returning first of {} pages.".format(total,pages))
        print("-------------------------------------------------------")
    else:
        print("ERROR:")
        print(response.status_code, data)
    return data
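
`getData` only returns the first page of up to 50 records. As a rough sketch, the remaining pages could be requested by adding the API's `page` parameter to the URL. The helper name `buildPageUrls` is an assumption made for illustration, and the cell only builds the URLs without calling the API.

```python
def buildPageUrls(url, pages):
    # Hypothetical helper: one URL per page, with pages numbered from 1
    return [url + '&page=' + str(page) for page in range(1, pages + 1)]

page_urls = buildPageUrls('https://content.guardianapis.com/search?q=test', 3)
for page_url in page_urls:
    print(page_url)
```

Each of these URLs could be passed to `getData` in turn, keeping an eye on the request limits of a developer key.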

## Business concern

There are several rumours about mandatory vaccination policy or vaccine passports.

The Tourism Business Association hires you to find insights in the news about mandatory vaccination and vaccine passports. Using The Guardian API, let's explore what relevant information we can extract about these two topics.

In [None]:
mandatory_vaccination_data = ???
mandatory_vaccination_data

In [None]:
vaccine_passports_data = ???
vaccine_passports_data

#### We want to focus only on the results that contain the news information. Let's find the results list

In [None]:
mandatory_vaccination_results = mandatory_vaccination_data[???][???]
mandatory_vaccination_results

In [None]:
vaccine_passports_results = vaccine_passports_data[???][???]
vaccine_passports_results
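
When it is not obvious where a list sits inside a nested response, listing the keys level by level usually helps. The cell below is a small sketch on a hand-written `sample_data` dictionary. Its values are made up for illustration and only imitate the general shape of a response; they are not real API output.

```python
# Made-up sample that imitates the shape of a response (not real data)
sample_data = {
    "response": {
        "status": "ok",
        "total": 2,
        "pages": 1,
        "results": [
            {"id": "example/1", "webTitle": "First example title"},
            {"id": "example/2", "webTitle": "Second example title"},
        ],
    }
}

# Look at the keys one level at a time
top_keys = list(sample_data.keys())
inner_keys = list(sample_data["response"].keys())
print(top_keys)
print(inner_keys)

# Report the type of each inner value to spot which one holds a list
for inner_key, value in sample_data["response"].items():
    print(inner_key, type(value).__name__)
```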

#### Let's merge the two results lists to combine the news into a dataframe

In [None]:
merge_results = ???
merge_results

In [None]:
results_df = pd.DataFrame(???)
results_df
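
The same article can match both searches, so a merged list may contain duplicates. The cell below is a minimal sketch, on made-up sample lists, of one way to combine two lists of dictionaries while keeping each `id` only once. The helper name `mergeUnique` is hypothetical.

```python
def mergeUnique(first, second):
    # Hypothetical helper: concatenate two lists of dicts and keep only
    # the first item seen for each 'id'
    seen = set()
    merged = []
    for item in first + second:
        if item["id"] not in seen:
            seen.add(item["id"])
            merged.append(item)
    return merged

# Made-up sample lists (not real API output)
list_a = [{"id": "example/1", "webTitle": "A"}, {"id": "example/2", "webTitle": "B"}]
list_b = [{"id": "example/2", "webTitle": "B"}, {"id": "example/3", "webTitle": "C"}]

merged_sample = mergeUnique(list_a, list_b)
print(len(merged_sample))  # → 3
```

A list of dictionaries like `merged_sample` can be passed directly to `pd.DataFrame`, where each key becomes a column.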

## A sentiment analysis would probably be helpful (ref Tutorial 5)

In [None]:
!pip install textblob

In [None]:
from textblob import TextBlob
import re

def cleanTitle(title):
    '''
    Utility function to clean the text of a news title by removing
    mentions, links and special characters using regex.
    '''
    # Raw string so the regex escapes (\w, \S) are not treated as Python string escapes
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", title).split())

def analyseSentiment(title):
    '''
    Utility function to classify the polarity of a news title
    using textblob: 1 = positive, 0 = neutral, -1 = negative.
    '''
    analysis = TextBlob(cleanTitle(title))
    if analysis.sentiment.polarity > 0:
        return 1
    elif analysis.sentiment.polarity == 0:
        return 0
    else:
        return -1
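
To see what the cleaning step does before any sentiment is computed, the cell below applies the same regex to a made-up title. It repeats the `cleanTitle` pattern so that it can run on its own. The sample title is invented for illustration.

```python
import re

def cleanTitle(title):
    # Same pattern as above: drop @mentions, links and any character
    # that is not a letter, digit, space or tab
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", title).split())

sample_title = "Vaccine passports: what's next? @guardian https://example.com/a"
cleaned = cleanTitle(sample_title)
print(cleaned)  # → Vaccine passports what s next
```

Note that the apostrophe is replaced by a space, so `what's` becomes `what s`. This is a side effect of the pattern worth keeping in mind when reading the results.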

#### Create a new column with the sentiment analysis

In [None]:
results_df[???] = results_df["webTitle"].apply(lambda a: ???)
results_df

#### Calculate the total of positive, neutral and negative sentiment and their percentages

In [None]:
positive = ???
neutral = ???
negative = ???
print("Total positive sentiment titles " + str(positive))
print("Total neutral sentiment titles " + str(neutral))
print("Total negative sentiment titles " + str(negative))

In [None]:
positivePercentage = ???
neutralPercentage = ???
negativePercentage = ???
print("Positive sentiment titles percentage " + str(positivePercentage) + "%")
print("Neutral sentiment titles percentage " + str(neutralPercentage) + "%")
print("Negative sentiment titles percentage " + str(negativePercentage) + "%")
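
As a sanity check on the arithmetic, the cell below counts labels and computes percentages on a small made-up list rather than on the dataframe column. The list `sample_sentiments` and the helper `percentage` are assumptions for illustration only.

```python
from collections import Counter

# Made-up labels: 1 = positive, 0 = neutral, -1 = negative
sample_sentiments = [1, 1, 0, -1]

counts = Counter(sample_sentiments)
total = len(sample_sentiments)

def percentage(count, total):
    # Hypothetical helper: share of the total as a percentage,
    # guarding against an empty list
    return round(100 * count / total, 1) if total else 0.0

sample_positive = counts[1]
sample_neutral = counts[0]
sample_negative = counts[-1]

print(sample_positive, percentage(sample_positive, total))  # → 2 50.0
print(sample_neutral, percentage(sample_neutral, total))    # → 1 25.0
print(sample_negative, percentage(sample_negative, total))  # → 1 25.0
```

The three percentages should add up to 100 (allowing for rounding), which is a quick way to check the dataframe version too.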

#### Let's visualise the sentiment in a pie chart

In [None]:
# For plotting and visualization
from IPython.display import display
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

labels = [???, ???, ???]
sizes = [???, ???, ???]

# Set different colors
colors = ['green', 'grey', 'red']

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)
plt.axis('equal')
plt.show()

## We could also include additional information by scraping the web page (ref Tutorial 8)

In [None]:
from bs4 import BeautifulSoup

def get_HTML(url):
    response = requests.get(url)
    return response.content

def get_subtitle(url):
    try:
        html = get_HTML(???)
        soup = BeautifulSoup(???, "html.parser")
        article = soup.find(???)
        div = article.find(???, attrs={???: ???})
        p = div.find(???)
        return p.text
    except Exception:
        return "Unable to find content"
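
The idea behind `get_subtitle` is to walk from a container element down to the first paragraph inside it. The cell below sketches that idea on a made-up HTML snippet using only the standard library's `html.parser` instead of BeautifulSoup, so it runs without fetching a page. The attribute `data-role="summary"`, the class `FirstParagraphParser` and the function `firstParagraph` are all hypothetical. The real page structure has to be found by inspecting an article in the browser.

```python
from html.parser import HTMLParser

class FirstParagraphParser(HTMLParser):
    """Collect the text of the first <p> inside the first <div>
    whose attribute attr_name equals attr_value."""

    def __init__(self, attr_name, attr_value):
        super().__init__()
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.div_depth = 0   # greater than 0 while inside the target <div>
        self.in_p = False    # True while inside the paragraph being collected
        self.done = False    # True once the first paragraph has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1   # nested <div> inside the target
            elif dict(attrs).get(self.attr_name) == self.attr_value:
                self.div_depth = 1    # entered the target <div>
        elif tag == "p" and self.div_depth > 0:
            self.in_p = True

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True
        elif tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.in_p and not self.done:
            self.chunks.append(data)

def firstParagraph(html, attr_name, attr_value):
    parser = FirstParagraphParser(attr_name, attr_value)
    parser.feed(html)
    text = "".join(parser.chunks).strip()
    return text or "Unable to find content"

# Made-up HTML (not taken from a real article page)
sample_html = """
<article>
  <div data-role="headline"><h1>Example headline</h1></div>
  <div data-role="summary"><p>Example <b>subtitle</b> text.</p></div>
  <div data-role="body"><p>Body text.</p></div>
</article>
"""

subtitle = firstParagraph(sample_html, "data-role", "summary")
print(subtitle)  # → Example subtitle text.
```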

#### Create a new column with the subtitle of the news

In [None]:
results_df[???] = ???
results_df

## Homework
Using the same principle, find the topic of each news article and include it in the dataframe.