# Tutorial :: Threats and opportunities in external data - the power of the news

**CONCERN**

You are working for a consultancy firm in charge of the Australian government's political image. In September 2021, the Australian government faced a high-profile dispute with France after a deal to buy French submarines was called off. A report based on the titles of news items has already been produced. Your job as an analyst is to create a more thorough report that takes into account the additional information inside each news item.

In particular, your clients want to be aware of **threats** and **opportunities** suggested by the news.

1. **Q**uestion
2. **D**ata
3. **A**nalysis
4. **V**isualisation
5. **I**nsight

<img src="graphics/QDAVI_cycle_sm.png" width="50%" />

### 1. Question

How has the news affected the image of the Australian government?

**Tip:** You can combine web scraping and APIs.

### 2. Data

You must use The Guardian API.

**Tip:** Check the studio session and tutorial notebooks from Week 3 for information about how to call The Guardian API.

In [1]:
# Libraries for the analysis
import pandas as pd
import requests
import json
from bs4 import BeautifulSoup

In [2]:
# Build a search URL
baseUrl = 'https://content.guardianapis.com/search?q=' # content search

searchString = "submarine"
office = "&production-office=aus"
tag = "&tag=politics/politics" # defined for reference, but not added to the URL built below
fromDate = "&from-date=2021-09-01"
toDate = "&to-date=2021-11-30"

url = baseUrl+'"'+searchString+'"'+office+fromDate+toDate+"&api-key=test"
print(url)

https://content.guardianapis.com/search?q="submarine"&production-office=aus&from-date=2021-09-01&to-date=2021-11-30&api-key=test
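
String concatenation works for a fixed query, but it does not encode special characters. As a minimal sketch, the same URL could be built from a dictionary of parameters with the standard library. The helper name `build_search_url` is hypothetical; the parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

BASE_URL = "https://content.guardianapis.com/search"

def build_search_url(query, api_key="test", **filters):
    """Return a content-search URL with every parameter percent-encoded."""
    params = {"q": f'"{query}"'}  # keep the double quotes used in the cell above
    params.update(filters)        # e.g. production-office, from-date, to-date
    params["api-key"] = api_key
    return f"{BASE_URL}?{urlencode(params)}"

# Parameter names contain hyphens, so they are passed as an unpacked dict
url = build_search_url(
    "submarine",
    **{"production-office": "aus", "from-date": "2021-09-01", "to-date": "2021-11-30"},
)
print(url)
```

The only visible difference from the concatenated URL is that the double quotes appear as `%22`, which is their percent-encoded form.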


In [3]:
# Call the API
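response = requests.get(url)
response.raise_for_status() # stop early if the server returned an HTTP error
data = response.json() # parse the JSON body into a dictionary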
results = data['response']['results']
results

[{'id': 'world/2021/nov/09/australia-promises-jobs-to-workers-stranded-by-scrapping-of-french-submarine-deal',
  'type': 'article',
  'sectionId': 'world',
  'sectionName': 'World news',
  'webPublicationDate': '2021-11-08T16:30:35Z',
  'webTitle': 'Australia promises jobs to workers stranded by scrapping of French submarine deal',
  'webUrl': 'https://www.theguardian.com/world/2021/nov/09/australia-promises-jobs-to-workers-stranded-by-scrapping-of-french-submarine-deal',
  'apiUrl': 'https://content.guardianapis.com/world/2021/nov/09/australia-promises-jobs-to-workers-stranded-by-scrapping-of-french-submarine-deal',
  'isHosted': False,
  'pillarId': 'pillar/news',
  'pillarName': 'News'},
 {'id': 'world/2021/oct/01/fears-australias-france-submarine-snub-could-scupper-closer-eu-economic-ties',
  'type': 'article',
  'sectionId': 'world',
  'sectionName': 'World news',
  'webPublicationDate': '2021-10-01T09:00:58Z',
  'webTitle': 'Fears Australia’s France submarine snub could scupper c
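
Before indexing into the parsed response, it can help to check that the call succeeded. The sketch below assumes the payload carries a `status` field next to `results` inside `response`; that field is not visible in the output above, so treat it as an assumption and check a real response first. The helper `extract_results` and the trimmed `sample` payload are illustrative only.

```python
def extract_results(payload):
    """Return the list of result items from a parsed API response."""
    response = payload.get("response", {})
    # Assumption: a successful call reports status "ok"
    if response.get("status") != "ok":
        raise ValueError(f"Unexpected API status: {response.get('status')!r}")
    return response.get("results", [])

# Stand-in payload with the same shape as the output above (one item, trimmed)
sample = {
    "response": {
        "status": "ok",
        "results": [
            {
                "webTitle": "Australia promises jobs to workers stranded by scrapping of French submarine deal",
                "sectionName": "World news",
            }
        ],
    }
}

items = extract_results(sample)
print(len(items), items[0]["sectionName"])
```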

The results contain the URL of each news item on the website. After inspecting a couple of pages, which information could be easily extracted from them?

In [4]:
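# Get HTML function
def get_HTML(url):
    # get data from server (give up after 10 seconds instead of waiting indefinitely)
    response = requests.get(url, timeout=10)
    response.raise_for_status() # fail loudly on an HTTP error instead of parsing an error page
    html = response.content
    return html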

In [5]:
# Beautiful soup function for subtitle
def extract_subTitle(HTML):
    soup = BeautifulSoup(HTML, "html.parser") # the html input and the parser name
    article = soup.find("article") # the tag that contains the article
    if article is None: # page without an <article> tag
        return ""
    div_element = article.find("div", attrs={"data-gu-name": "standfirst"}) # the tag that can be found using an attribute
    if div_element is None:
        return ""
    target_element = div_element.find("p")
    # the standfirst may not contain a <p> tag
    return target_element.text if target_element is not None else ""


In [6]:
# Beautiful soup function for body
def extract_body(HTML):
    soup = BeautifulSoup(HTML, "html.parser") # the html input and the parser name
    article = soup.find("article") # the tag that contains the article
    if article is None: # page without an <article> tag
        return ""
    div_element = article.find("div", attrs={"id": "maincontent"}) # the tag that can be found using an attribute
    if div_element is None:
        return ""
    target_elements = div_element.find_all("p") # every paragraph inside the main content
    # join with a space so the last word of one paragraph does not fuse with the first word of the next
    return " ".join(te.text for te in target_elements)
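
To see what the two selectors pick up without hitting the website, they can be tried on a small inline page. The HTML below is a hypothetical, heavily simplified stand-in for an article page; real pages are far larger and their structure may change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical page that mirrors only the tags the functions above look for
sample_html = """
<html><body><article>
  <div data-gu-name="standfirst"><p>Short summary of the story.</p></div>
  <div id="maincontent">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</article></body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
standfirst = soup.find("div", attrs={"data-gu-name": "standfirst"})
main = soup.find("div", attrs={"id": "maincontent"})

subtitle = standfirst.find("p").get_text(strip=True)
paragraphs = [p.get_text(strip=True) for p in main.find_all("p")]
print(subtitle)
print(paragraphs)
```

Keeping the paragraphs as a list until the last moment makes it easy to decide later how they should be joined.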

#### Clean/preprocess data

In [7]:
# Create a dataframe
df = pd.DataFrame(columns=["Date", "Section", "Title", "Subtitle", "Body"])
df

Unnamed: 0,Date,Section,Title,Subtitle,Body


In [8]:
# Populate the dataframe
rows = []
for news in results:
    html = get_HTML(news["webUrl"])
    rows.append({
        "Date": news["webPublicationDate"],
        "Section": news["sectionName"],
        "Title": news["webTitle"],
        "Subtitle": extract_subTitle(html),
        "Body": extract_body(html),
    })
# Build the dataframe once instead of concatenating inside the loop
df = pd.DataFrame(rows, columns=["Date", "Section", "Title", "Subtitle", "Body"])
df

| | Date | Section | Title | Subtitle | Body |
|---|---|---|---|---|---|
| 0 | 2021-11-08T16:30:35Z | World news | Australia promises jobs to workers stranded by... | Defence industry minister Melissa Price tells ... | “Each and every” skilled shipbuilding worker a... |
| 1 | 2021-10-01T09:00:58Z | World news | Fears Australia’s France submarine snub could ... | Opposition accuses Scott Morrison of failing ‘... | The postponement of trade talks between the Eu... |
| 2 | 2021-09-29T06:15:33Z | Australia news | Australia tore up French submarine contract ‘f... | Shipbuilding company maintains it ‘did not fai... | Australia scrapped the $90bn submarine deal wi... |
| 3 | 2021-10-29T04:32:28Z | Australia news | Macron’s anger over nuclear submarine deal lin... | Australian defence minister’s claim comes as F... | Peter Dutton says sustained expressions of out... |
| 4 | 2021-10-28T04:39:00Z | World news | Australia’s foreign minister to meet French am... | Marise Payne says she regrets France’s ‘deep d... | Australia’s foreign minister will meet with th... |
| 5 | 2021-11-18T16:30:10Z | World news | ‘Naughty guy’: top Chinese diplomat accuses Au... | Exclusive: Acting ambassador to Australia, Wan... | A top Chinese diplomat has likened Australia t... |
| 6 | 2021-09-29T08:19:43Z | Australia news | Malcolm Turnbull excoriates Scott Morrison ove... | Former PM also revealed he plans to go to Glas... | Malcolm Turnbull has revealed he has spoken to... |
| 7 | 2021-09-29T02:50:32Z | World news | Former US navy secretary now Scott Morrison’s ... | Prof Donald Winter, who advised the Australian... | A former US navy secretary who advised the Aus... |
| 8 | 2021-09-20T01:11:38Z | World news | ‘We felt fooled’: France still furious after A... | ‘Maybe we’re not friends,’ recalled ambassador... | French anger at the Morrison government’s deci... |
| 9 | 2021-09-17T08:35:30Z | Australia news | Game-changer or irresponsible? The known unkno... | Analysis: Stay tuned as we battle the notoriou... | It started, as defence stories so often do, wi... |



### 3. Analysis

Information extraction?

#### Inspect the data

Read a few articles at random to get a feel for what is important to analyse.
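
One way to pick articles at random is `DataFrame.sample`. The sketch below uses a stand-in dataframe (`df_demo`, with placeholder text) so it runs on its own; with the real data you would call `sample` on `df` instead.

```python
import pandas as pd

# Stand-in for the df built earlier; titles and bodies here are placeholders
df_demo = pd.DataFrame({
    "Title": [f"Article {i}" for i in range(10)],
    "Body": [f"Body text of article {i}" for i in range(10)],
})

# random_state makes the same three rows come back on every run
sampled = df_demo.sample(n=3, random_state=1)
for _, row in sampled.iterrows():
    print(row["Title"])
    print(row["Body"][:500])  # first 500 characters only
    print("-" * 40)
```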

#### One approach - a basic sentiment analysis that looks for positive and negative words in the text

In [10]:
# Define lists of positive and negative words
positive_words = ["good", "positive", "excellent", "success"] # add words you think are good indicators
negative_words = ["bad", "poor", "negative", "disappointing"]


# Function to calculate a basic sentiment score
def analyze_sentiment(article):
    positive_count = 0
    negative_count = 0
    
    # Convert article to lowercase and split into words
    words = article.lower().split()
    
    # Count occurrences of positive and negative words
    for word in words:
        if word in positive_words:
            positive_count += 1
        if word in negative_words:
            negative_count += 1
            
    # Compute sentiment score
    sentiment_score = positive_count - negative_count
    return sentiment_score
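
Splitting on whitespace leaves punctuation attached to words, so a token such as `good.` does not match `good`. The variant below is a sketch of one possible refinement, not a replacement for the function above: it extracts runs of letters with a regular expression before counting. The name `analyze_sentiment_tokens` and the example sentence are made up for illustration.

```python
import re

positive_words = {"good", "positive", "excellent", "success"}
negative_words = {"bad", "poor", "negative", "disappointing"}

def analyze_sentiment_tokens(article):
    """Return positive hits minus negative hits, ignoring punctuation."""
    # Keep runs of letters and apostrophes, so "good." becomes "good"
    words = re.findall(r"[a-z']+", article.lower())
    positive_count = sum(word in positive_words for word in words)
    negative_count = sum(word in negative_words for word in words)
    return positive_count - negative_count

text = "The outcome was good. Critics called the process bad, poor and disappointing."
print(analyze_sentiment_tokens(text))  # 1 positive word - 3 negative words = -2
```

On the same sentence, a whitespace split would only recognise `poor`, because the other three words carry trailing punctuation.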



In [11]:
# Analyze the articles

# Create a list of article bodies from the df column
article_body_texts = df["Body"].tolist()

# Loop through and run the analyze_sentiment function on each
for article_text in article_body_texts:
    
    score = analyze_sentiment(article_text)

    # Print sentiment score
    print(f"Sentiment Score: {score}")

Sentiment Score: 0
Sentiment Score: 0
Sentiment Score: -1
Sentiment Score: 0
Sentiment Score: 0
Sentiment Score: -1
Sentiment Score: -1
Sentiment Score: 0
Sentiment Score: 1
Sentiment Score: 0


#### Combine the scores and do some analysis

##### Tip: You could add them as a new column of your existing df
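
As a starting point, the sketch below attaches the scores as a column and summarises them by section. It uses a stand-in dataframe (`df_scores`) whose sections and scores are copied from the outputs above, in row order; with the real data, `df["Sentiment"] = df["Body"].apply(analyze_sentiment)` would produce the column directly.

```python
import pandas as pd

# Stand-in for df: one row per article, sections copied from the table above
df_scores = pd.DataFrame({
    "Section": ["World news", "World news", "Australia news", "Australia news", "World news",
                "World news", "Australia news", "World news", "World news", "Australia news"],
})
# Scores copied from the printed output above, in the same row order
scores = [0, 0, -1, 0, 0, -1, -1, 0, 1, 0]

df_scores["Sentiment"] = scores
print(df_scores["Sentiment"].describe())  # count, mean, spread
section_means = df_scores.groupby("Section")["Sentiment"].mean()
print(section_means)
```

With only ten articles and scores between -1 and 1, any difference between sections should be read with caution.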


In [None]:
???

### 4. Visualisation

In [None]:
???

### 5. Insights

What might be some limitations of how you analysed the data?


# Scrape some data from the web to include as background in your final report

Use code similar to that in this week's studio session to scrape some data from the web relevant to the submarine issue.

Suggestion: scrape a list of current Australian submarines with their names and launch dates, like the one at https://en.wikipedia.org/wiki/Collins-class_submarine#Submarines_in_class
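
A table like that can be turned into rows with Beautiful Soup. The sketch below runs on a hypothetical inline table with placeholder rows (not real submarine data) and assumes the target table carries the class `wikitable`; inspect the real page to confirm which table and columns you need. The helper name `table_to_rows` is made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical table: placeholder rows, not real submarine data
sample_html = """
<table class="wikitable">
  <tr><th>Name</th><th>Launched</th></tr>
  <tr><td>Boat A</td><td>1 January 1990</td></tr>
  <tr><td>Boat B</td><td>2 February 1991</td></tr>
</table>
"""

def table_to_rows(html):
    """Return one dict per table row, keyed by the header cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "wikitable"})
    if table is None:
        return []
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for tr in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if len(cells) == len(headers):  # skip rows with merged or missing cells
            records.append(dict(zip(headers, cells)))
    return records

records = table_to_rows(sample_html)
print(records)
```

For a live page, the HTML passed to `table_to_rows` would come from `get_HTML(url)`, and the resulting list of dictionaries can be handed to `pd.DataFrame`.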