# Analyzing Climate Talk
## Studying the Impact of Rhetorical Strategies in Environmentally-Related Discourse

In the complex fight against climate change, effective communication is key. Understanding how rhetoric shapes public perception and spurs action can make all the difference. I want to identify which types of language are most indicative of propaganda versus scientifically backed information, and to gauge their influence on public understanding of climate change.

By understanding how propaganda and scientifically backed works communicate their messages, we can learn how to better inform and engage the public. Analyzing the language used in each type of rhetoric will help us understand the divides that influence how climate change is perceived and acted upon. My research aims to bridge these divisions and promote a more informed and unified approach to understanding environmental issues.

This study addresses an urgent global challenge while using the tools of digital humanities to uncover the intricate ways in which language shapes our world. It aims to contribute to more effective climate communication strategies that raise public awareness and encourage action.

The primary motivation for pursuing this research question came from completing Assignment 05, where we learned how to use Natural Language Processing to create strong data visualizations. I wanted to apply these tools to a new dataset on a topic I feel deeply about. I believe that understanding the severity of global warming is crucial, and one of the issues is how we discuss it. The language we use often fails to convey the urgency of the situation, and I want to explore how this affects public perception and potential solutions.

All my data will be compiled from the online Climate section of [The Atlantic](https://www.theatlantic.com/), because after testing many reputable publications, I found that it has fewer restrictions on web scraping. I will use Python packages specializing in web scraping, such as Beautiful Soup and Requests, to parse HTML documents and extract links to various articles. Once I have access to these articles, I will use Pandas to organize the data into an analyzable format. For text processing, I will utilize NLTK, and for creating visualizations, I will use Matplotlib.

### DATA COLLECTION:

In [7]:
import requests
from bs4 import BeautifulSoup

The first step in our analysis was to set up the web scraping process. To retrieve data from the web, I imported two libraries, `requests` to retrieve the HTML code from The Atlantic's [Climate](https://www.theatlantic.com/category/climate/) webpage and `BeautifulSoup` to parse the HTML and find links to various articles on the page.

In [8]:
response = requests.get('https://www.theatlantic.com/category/climate/')
html_string = response.text

doc = BeautifulSoup(html_string, 'html.parser')
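article_list = []

# First pass: anchors nested inside list items
for li in doc.select('ul li a'):
    href = li.get('href')
    if href and '/archive' in href:
        article_list.append(href)

# Second pass: every anchor on the page that has an href
for a in doc.find_all('a', href=True):
    href = a['href']
    if '/archive' in href:
        article_list.append(href)

# The two passes overlap, so drop duplicate links while preserving order
article_list = list(dict.fromkeys(article_list))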

By searching for all occurrences of anchor tags within lists and also all anchor tags on the page, we created a variable `article_list` that contains a list of links to the latest Atlantic articles on climate. To avoid irrelevant links to images and other navigation, we filtered specifically for links that contain `/archive` in their `href` attribute.
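The collected `href` values can mix relative paths with absolute URLs, so the same article may appear under two spellings. Below is a minimal sketch of one way to normalize them before sampling. The helper name `normalize_links` and the sample paths are hypothetical and are not part of the original notebook.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.theatlantic.com'

def normalize_links(hrefs, base_url=BASE_URL):
    """Return absolute article URLs with duplicates removed, preserving order."""
    # urljoin leaves absolute URLs untouched and resolves relative paths against base_url
    absolute = [urljoin(base_url, href) for href in hrefs if '/archive' in href]
    # dict.fromkeys keeps only the first occurrence of each URL, in insertion order
    return list(dict.fromkeys(absolute))

# Hypothetical hrefs, shaped like the ones a category page might contain
sample_hrefs = [
    '/science/archive/2024/05/example-one/000001/',
    'https://www.theatlantic.com/science/archive/2024/05/example-one/000001/',
    '/newsletters/',
    '/science/archive/2024/06/example-two/000002/',
]
clean_links = normalize_links(sample_hrefs)
```

Here the relative and absolute versions of the first article collapse into one entry, and the newsletter link is filtered out.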

In [9]:
def getArticleText(article):
  response = requests.get(article)
  doc = BeautifulSoup(response.text, 'html.parser')
  
  article_body = doc.find_all(class_='ArticleParagraph_root__4mszW')
  article_text = ''
  
  for paragraph in article_body:
    article_text += paragraph.get_text() + '\n'
    
  return article_text

The `getArticleText` function opens a given URL and returns the article's paragraph text as a single string, without unnecessary headers or links.
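The suffix in `ArticleParagraph_root__4mszW` looks like a build-generated hash, which may change when the site is redeployed. As a hedged alternative, the sketch below matches on the stable-looking part of the class name with a CSS attribute selector. The helper `extract_paragraphs` and the sample markup are assumptions made for illustration, and the approach only helps if the prefix itself stays the same.

```python
from bs4 import BeautifulSoup

def extract_paragraphs(html, class_prefix='ArticleParagraph_root'):
    """Join the text of every element whose class attribute contains class_prefix."""
    doc = BeautifulSoup(html, 'html.parser')
    # [class*="..."] is a substring match, so the hashed suffix is free to change
    paragraphs = doc.select(f'[class*="{class_prefix}"]')
    return '\n'.join(p.get_text() for p in paragraphs)

# Hypothetical markup imitating the page structure; the suffixes are made up
sample_html = (
    '<article>'
    '<p class="ArticleParagraph_root__abc12">First paragraph.</p>'
    '<p class="ArticleParagraph_root__abc12">Second paragraph.</p>'
    '<p class="Caption_root__zz999">Photo credit</p>'
    '</article>'
)
sample_text = extract_paragraphs(sample_html)
```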

In [10]:
import random

sampled_articles = random.sample(article_list, 6)

Because [The Atlantic](https://www.theatlantic.com/) has an abundance of written works, I imported Python's `random` module to take a random sample of 6 article URLs from the list, which keeps the analysis to a manageable size.
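A random sample changes on every run, which makes results hard to reproduce. The sketch below shows one possible way to fix the sample with a seed and to guard against a list shorter than the requested size. The function name `sample_articles` and the seed value are arbitrary choices made for illustration.

```python
import random

def sample_articles(links, k=6, seed=42):
    """Draw up to k distinct links, reproducibly for a given seed."""
    rng = random.Random(seed)   # private generator, leaves the global random state alone
    k = min(k, len(links))      # random.sample raises ValueError if k exceeds the list length
    return rng.sample(links, k)

# Hypothetical link list standing in for article_list
links = [f'/archive/example-{i}/' for i in range(10)]
first = sample_articles(links)
second = sample_articles(links)   # same seed and input, so the same sample
```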

In [15]:
import pandas as pd

# One single-row DataFrame per article; they are concatenated after the loop
df_list = []

for article_url in sampled_articles:
  if article_url.startswith('/'):
    article_url = 'https://www.theatlantic.com' + article_url
  article_text = getArticleText(article_url)
    
  df_list.append(pd.DataFrame({'url': [article_url], 'text': [article_text]}))

articleDF = pd.concat(df_list, ignore_index=True)

I have now imported the `pandas` library to set up for data manipulation and analysis. I then set up a for-loop that retrieved the text using the `getArticleText` function defined earlier. To organize all the data and prevent overwriting variables with each iteration, I created a list to store individual data frames for each article's `article_text` and `article_url`. After collecting all the data frames, I concatenated them into a single data frame to store the information for all articles in the sample.
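An alternative to concatenating many one-row data frames is to collect plain dictionaries and build the data frame once at the end. The following is a minimal sketch of that pattern. `build_article_frame` and `fake_fetch` are hypothetical names, with `fake_fetch` standing in for `getArticleText` so the example runs without network access.

```python
import pandas as pd

def build_article_frame(urls, fetch_text):
    """Build a url/text DataFrame from a list of URLs and a text-fetching function."""
    # One dictionary per article; pandas turns the list into rows in a single step
    records = [{'url': url, 'text': fetch_text(url)} for url in urls]
    return pd.DataFrame(records, columns=['url', 'text'])

def fake_fetch(url):
    # Hypothetical stand-in for getArticleText
    return f'Placeholder text for {url}'

demo_df = build_article_frame(
    ['https://example.com/a', 'https://example.com/b'],
    fake_fetch,
)
```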

### TEXT PROCESSING:

In [3]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.sentiment import vader
from nltk.corpus import stopwords
from nltk.corpus import opinion_lexicon
from nltk.stem.porter import PorterStemmer

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('vader_lexicon')
nltk.download('opinion_lexicon')

[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /home/jovyan/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


True

To begin the data exploration portion of my final project, I have imported the Natural Language Toolkit (`NLTK`) and its various modules to help with starting a more focused analysis.

In [21]:
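stop_words = set(stopwords.words('english'))  # a set makes membership tests fast
senti = vader.SentimentIntensityAnalyzer()
stemmer = PorterStemmer()  # create the stemmer once and reuse it

def processText(text):
  # Split the raw string into word and punctuation tokens
  tokenized_text = word_tokenize(text)
  # Drop common function words such as "the" and "of"
  noStop_text = [word for word in tokenized_text if word.lower() not in stop_words]
  # Reduce each remaining word to its stem
  stemmed_text = [stemmer.stem(word) for word in noStop_text]
  return stemmed_text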

articleDF['processed'] = articleDF['text'].apply(processText)

Above, I began by loading NLTK's English stop words and creating a VADER sentiment analyzer for use in later cells. I then defined a function to process the raw text from [The Atlantic](https://www.theatlantic.com/) articles by tokenizing the string, removing stop words, and reducing each remaining word to its stem.

### ANALYSIS + VISUALIZATION:

In [31]:
import matplotlib.pyplot as plt

# compares total frequency of positive and negative words
def PosNeg(text):
  pos_count = 0
  neg_count = 0

  for word in text:
    score = senti.polarity_scores(word)
    if score['compound'] > 0:
      pos_count += 1
    elif score['compound'] < 0:
      neg_count += 1

  return pos_count, neg_count

The `PosNeg` function takes a tokenized text and counts its positive and negative words. It scores each word with the VADER sentiment analyzer, incrementing `pos_count` for positive compound scores and `neg_count` for negative ones. The function returns both counts, which give a rough picture of the general sentiment of the text.
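Raw counts tend to grow with article length, so a longer article can look more negative simply because it has more words. The sketch below illustrates the same counting logic and one possible normalization, the share of sentiment-bearing words that are negative. The tiny lexicon and its scores are invented for illustration and are not VADER's actual values.

```python
# Hypothetical mini-lexicon standing in for VADER's per-word scores
TOY_LEXICON = {'hope': 1.0, 'progress': 1.0, 'crisis': -1.0, 'disaster': -1.0}

def count_polarity(tokens, lexicon=TOY_LEXICON):
    """Count tokens with a positive score and tokens with a negative score."""
    pos_count = sum(1 for t in tokens if lexicon.get(t.lower(), 0) > 0)
    neg_count = sum(1 for t in tokens if lexicon.get(t.lower(), 0) < 0)
    return pos_count, neg_count

tokens = ['Crisis', 'brings', 'hope', 'despite', 'disaster']
pos_count, neg_count = count_polarity(tokens)

# Share of scored words that are negative; guard against texts with no scored words
scored = pos_count + neg_count
share_negative = neg_count / scored if scored else 0.0
```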

In [30]:
# compares 15 most common positive and negative words
def PosNegFreq(text):
  pos_words = []
  neg_words = []
  
  for word in text:
    score = senti.polarity_scores(word)
    if score['compound'] > 0:
      pos_words.append(word)
    elif score['compound'] < 0:
      neg_words.append(word)

  pos_common = nltk.FreqDist(pos_words).most_common(15)
  neg_common = nltk.FreqDist(neg_words).most_common(15)
  
  pos = []
  pos_freq = []
  for i in pos_common:
    pos.append(i[0])
    pos_freq.append(i[1])
    
  neg = []
  neg_freq = []
  for i in neg_common:
    neg.append(i[0])
    neg_freq.append(i[1])
    
  return pos, pos_freq, neg, neg_freq

The `PosNegFreq` function identifies the 15 most common positive and negative words in a given text. It classifies words based on their sentiment scores using the VADER analyzer, and then calculates their frequencies, returning four lists: the most common positive words, their frequencies, the most common negative words, and their frequencies.
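The step that splits `(word, frequency)` pairs into two parallel lists can also be written with `zip`. Here is a minimal sketch using `collections.Counter`, the class that `nltk.FreqDist` is built on. The helper name `top_words` and the sample words are hypothetical.

```python
from collections import Counter

def top_words(words, n=15):
    """Return two parallel lists: the n most common words and their frequencies."""
    common = Counter(words).most_common(n)
    if not common:
        # zip(*[]) would yield nothing to unpack, so handle empty input explicitly
        return [], []
    labels, freqs = zip(*common)   # unzip the (word, count) pairs
    return list(labels), list(freqs)

labels, freqs = top_words(['risk', 'threat', 'risk', 'loss', 'risk', 'threat'], n=2)
```

The two lists can be passed directly as the category labels and bar heights of a bar chart.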

In [32]:
def RankByNeg(df):
  neg_score = []

  # Count negative words in each article of the data frame
  for index, row in df.iterrows():
    tokens = processText(row['text'])
    pos_count, neg_count = PosNeg(tokens)
    neg_score.append((row['url'], neg_count))

  # Sort by negative-word count, highest first
  return sorted(neg_score, key=lambda pair: pair[1], reverse=True)

The `RankByNeg` function calculates negative scores for each article and sorts the list of scores in descending order.

In [34]:
def RankByPos(df):
  pos_score = []

  # Count positive words in each article of the data frame
  for index, row in df.iterrows():
    tokens = processText(row['text'])
    pos_count, neg_count = PosNeg(tokens)
    pos_score.append((row['url'], pos_count))

  # Sort by positive-word count, highest first
  return sorted(pos_score, key=lambda pair: pair[1], reverse=True)

The `RankByPos` function calculates positive scores for each article and sorts the list of scores in descending order.

This research matters because it highlights the central role of rhetoric in shaping climate change discussions. My findings can guide the development of more impactful communication strategies, inform media practices, and support advocacy efforts aimed at increasing climate action and policy.

This study brings us one step closer to confirming that the way climate change is discussed significantly impacts public response. When language effectively conveys urgency and seriousness, it can promote meaningful action and create a deeper understanding of the crisis. On the other hand, language that downplays the severity or misrepresents facts can impede efforts to address climate change.

Ultimately, how climate change is discussed in public can either mobilize or hinder efforts to address it.