<center>
<font color='DarkTurquoise'>

# Analysis of news topics all over the world

</font>
</center>

### Our project aims to analyze news coverage worldwide and examine the prevalence of reports *regarding the ongoing conflict in Ukraine*.

In a world driven by constant information flow, news plays a pivotal role in shaping public opinion and influencing international perceptions. In particular, the way the media covers conflicts is a crucial indicator of the magnitude and impact of such events on a global scale. We therefore studied the coverage of one of the most pressing geopolitical issues of our time: <span style="color: DarkTurquoise">the war in Ukraine</span>.

With countless news websites disseminating information around the clock, it becomes increasingly challenging to comprehend the true extent of media coverage regarding specific events. Our project takes on this challenge, using the <a href="https://scrapy.org" style="color: #55e0a3">Python Scrapy</a> framework to retrieve the data and then measuring the prevalence and distribution of reports related to the conflict in Ukraine.

We have analyzed <span style="color: DarkTurquoise">5 different countries</span> and their most widely read newspapers:
<p style="line-height: 25px;">
1. <b style="font-size: 20px">USA</b>
  <ul>
    <li><a href="https://www.ft.com/world" style="color: Orange">Financial Times</a></li>
  </ul>
2. <b style="font-size: 20px">Great Britain</b>
  <ul>
    <li><a href="https://subscription.theweek.co.uk" style="color: Orange">The Week</a></li>
  </ul>
3. <b style="font-size: 20px">France</b>
  <ul>
    <li><a href="https://www.lemonde.fr/en/" style="color: Orange">Le Monde</a></li>
  </ul>
4. <b style="font-size: 20px">South Africa</b>
  <ul>
    <li><a href="https://www.sanews.gov.za" style="color: Orange">South African Government News Agency</a></li>
    <li><a href="https://www.news24.com/news24" style="color: Orange">news24</a></li>
  </ul>
5. <b style="font-size: 20px">Japan</b>
  <ul>
    <li><a href="https://www.japantimes.co.jp" style="color: Orange">The Japan Times</a></li>
  </ul>
</p>

In [None]:
# importing python modules to work with data
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import re
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
import numpy as np

In [None]:
# keywords for finding articles about the war in Ukraine
keywords = [
    "ukraine",
    "ukrainian",
    "russia",
    "russian",
    "dpr",
    "lpr",
    "putin",
    "zelensky",
    "kyiv",
    "bakhmut"
]

In [None]:
# additional functions
def check(string: str):
    for word in keywords:
        if word in string.lower():
            return True
    return False

def get_month(date: datetime):
    return date.month
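
The `check` helper above matches keywords as substrings, so a short keyword such as `dpr` or `lpr` also matches inside a longer, unrelated word. Below is a minimal sketch of a stricter variant, assuming whole-word matching is the desired behaviour. The names `KEYWORD_PATTERN` and `check_whole_word` are hypothetical and not part of the original notebook.

```python
import re

# Same keyword list as in the notebook above.
keywords = ["ukraine", "ukrainian", "russia", "russian", "dpr", "lpr",
            "putin", "zelensky", "kyiv", "bakhmut"]

# Hypothetical name: one compiled pattern that matches any keyword as a whole word.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in keywords) + r")\b",
    flags=re.IGNORECASE,
)

def check_whole_word(title: str) -> bool:
    """Return True if the title contains at least one keyword as a whole word."""
    return KEYWORD_PATTERN.search(title) is not None

print(check_whole_word("Putin visits Minsk"))
print(check_whole_word("Firmware update for the XDPR9 router"))
```

The trade-off is that derived forms not listed in `keywords` (for example "Russians") are no longer matched, so the keyword list may need to be extended if this variant is used.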

<center>
<font color='DarkTurquoise'>

# USA

</font>
</center>

**"The Financial Times"** is one of the world’s leading news organisations, recognised internationally for its authority, integrity and accuracy.

In [None]:
df = pd.read_json('datasets/news_ft.json')
df.head()

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df = df[df['title'].apply(lambda x: check(x))]
war_df.head()

Calculating the percentage of news about war in Ukraine:

In [None]:
war_perc = []
sorted_dates = sorted(list(war_df.date.unique()))
for date in sorted_dates:
    news_this_date = df[df['date'] == date].shape[0]
    news_war = war_df[war_df['date'] == date].shape[0]
    war_perc.append(round(news_war/news_this_date * 100, 2))

str_dates = list(map(lambda x: str(x)[:10], sorted_dates))
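
The per-date loop above can also be written with a pandas `groupby`. This is only a sketch, assuming the dataset has the `title` and `date` columns used above. The rows are made up for illustration, the keyword list is shortened, and the `is_war` column name is hypothetical.

```python
import pandas as pd

keywords = ["ukraine", "russia", "putin", "kyiv"]  # shortened list, for illustration only

def check(string: str) -> bool:
    return any(word in string.lower() for word in keywords)

# Made-up sample rows, not taken from the real dataset.
df = pd.DataFrame({
    "title": ["Ukraine grain deal extended", "Markets rally", "Putin speech", "Tech earnings"],
    "date": pd.to_datetime(["2023-05-01", "2023-05-01", "2023-05-02", "2023-05-03"]),
})

# Flag war-related titles, then take the mean of the flag per date:
# the mean of a boolean column is the share of True values.
df["is_war"] = df["title"].apply(check)
war_share = (df.groupby("date")["is_war"].mean() * 100).round(2)

print(war_share)
```

Unlike the loop, this version also reports dates on which no war-related article was published (with a share of 0), which may or may not be wanted in the chart.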

In [None]:
plt.figure(figsize=(15, 8))
plt.bar(str_dates, war_perc, color ='CadetBlue')
plt.ylim(0, 100)
plt.xlabel('Date')
plt.ylabel('Percentage of news about war in Ukraine')
plt.title('News about war in Ukraine in Financial Times')
plt.show()

<center>
<font color='DarkTurquoise'>

# Great Britain

</font>
</center>

**"The Week"** news is a respected digital platform that offers a curated selection of news articles, opinion pieces, and cultural content. The website gathers news articles from various sources, including major national and international publications, and presents them in a concise and digestible format. This approach allows readers to quickly grasp multiple perspectives on a given topic and gain a comprehensive understanding of current events.

In [None]:
df = pd.read_json("datasets/theweek.json")
df.head()

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df = df[df['header'].apply(lambda x: check(x))]
war_df.head()

<font color='DarkTurquoise'>

## February 2022 - December 2022
</font>

Creating dataframe for period (February 2022 - December 2022)

In [None]:
df_2022 = df.loc[df['date'] < datetime(2023, 1, 1)].copy()
df_2022.loc[:, 'month'] = df_2022['date'].apply(lambda x: get_month(x))

df_2022.head()

Creating dataframe with news about Ukraine for period (February 2022 - December 2022)

In [None]:
war_df_2022 = war_df.loc[war_df['date'] < datetime(2023, 1, 1)].copy()
war_df_2022.loc[:, 'month'] = war_df_2022['date'].apply(lambda x: get_month(x))

war_df_2022.head()

Calculating the percentage of news about the war in Ukraine:

In [None]:
war_perc_2022 = []
months = list(range(2, 13))
for month in months:
    month_df = df_2022[df_2022['month'] == month].shape[0]
    month_war_df = war_df_2022[war_df_2022['month'] == month].shape[0]
    try:
        war_perc_2022.append(round(month_war_df/month_df * 100, 2))
    except ZeroDivisionError:
        war_perc_2022.append(0.0)
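
The same monthly calculation is repeated for several newspapers and periods in this notebook. One possible way to factor it out is sketched below, assuming both dataframes carry the `month` column created above. The helper name `monthly_share` and the sample data are hypothetical.

```python
import pandas as pd

def monthly_share(all_news: pd.DataFrame, war_news: pd.DataFrame, months: list) -> list:
    """Percentage of war-related articles per month; 0.0 when a month has no articles."""
    total = all_news["month"].value_counts().reindex(months, fill_value=0)
    war = war_news["month"].value_counts().reindex(months, fill_value=0)
    # Zero totals become NaN so the division is safe; those months are reported as 0.0.
    share = (war / total.where(total > 0) * 100).fillna(0.0).round(2)
    return share.tolist()

# Made-up data: only the 'month' column matters for this helper.
all_news = pd.DataFrame({"month": [2, 2, 2, 2, 3, 3, 5]})
war_news = pd.DataFrame({"month": [2, 3, 3]})

print(monthly_share(all_news, war_news, [2, 3, 4, 5]))
```

A call such as `monthly_share(df_2022, war_df_2022, list(range(2, 13)))` would then stand in for the loop, under the same assumptions.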

### Visualisation of 2022

In [None]:
months = ['Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
plt.figure(figsize = (10, 5))

plt.bar(months, war_perc_2022, color ='CadetBlue', width = 0.7)
plt.ylim(0, 100)

plt.xlabel("Month")
plt.ylabel("Percentage (%)")
plt.title("Percentage of news in The Week News about war in Ukraine in 2022")
plt.show()

<font color='DarkTurquoise'>

## January 2023 - May 2023
</font>

Creating dataframe for period (January 2023 - May 2023)

In [None]:
df_2023 = df.loc[df['date'] >= datetime(2023, 1, 1)].copy()
df_2023.loc[:, 'month'] = df_2023['date'].apply(lambda x: get_month(x))

df_2023.head()

Creating dataframe with news about Ukraine for period (January 2023 - May 2023)

In [None]:
war_df_2023 = war_df.loc[war_df['date'] >= datetime(2023, 1, 1)].copy()
war_df_2023.loc[:, 'month'] = war_df_2023['date'].apply(lambda x: get_month(x))

war_df_2023.head()

Calculating the percentage of news about the war in Ukraine:

In [None]:
war_perc_2023 = []
months = list(range(1, 6))
for month in months:
    month_df = df_2023[df_2023['month'] == month].shape[0]
    month_war_df = war_df_2023[war_df_2023['month'] == month].shape[0]
    try:
        war_perc_2023.append(round(month_war_df/month_df * 100, 2))
    except ZeroDivisionError:
        war_perc_2023.append(0.0)

### Visualisation of 2023

In [None]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
plt.figure(figsize = (10, 5))

plt.bar(months, war_perc_2023, color ='CadetBlue', width = 0.7)
plt.ylim(0, 100)

plt.xlabel("Month")
plt.ylabel("Percentage (%)")
plt.title("Percentage of news in The Week News about war in Ukraine in 2023")
plt.show()

<center>
<font color='DarkTurquoise'>

# France

</font>
</center>

<h3>Online magazine: <span style="color: #9797c3"><b>Le Monde</b></span></h3>

**Le Monde** is a French daily evening newspaper founded in 1944. The first issue of the newspaper was published on December 19, 1944. Since December 19, 1995, the newspaper has been available online. "Le Monde" should not be confused with "Le Monde diplomatique", a monthly publication devoted to issues of international relations.

#### Step 1: create a dataframe by using pandas

In [None]:
df = pd.read_json('fr_news.json')
df.head()

#### Step 2: keep only the articles whose title contains one of the keywords

In [None]:
ukraine_war_df = df[df['title'].apply(lambda x: check(x))]
ukraine_war_df.head()

#### Step 3: calculate the percentage of the news about war in Ukraine

##### The data covers specific days during the war (e.g. blackouts, the liberation of Kherson)

In [None]:
ukraine_war_per_cent = []
# sort the dates so the bars appear in chronological order
dates = sorted(ukraine_war_df.date.unique())
for date in dates:
    # number of all articles and of war-related articles published on this date
    news_this_date = df[df['date'] == date].shape[0]
    war_news_this_date = ukraine_war_df[ukraine_war_df['date'] == date].shape[0]

    try:
        ukraine_war_per_cent.append(round(war_news_this_date/news_this_date * 100, 2))
    except ZeroDivisionError:
        ukraine_war_per_cent.append(0.0)

string_dates = [np.datetime_as_string(date, unit='D') for date in dates]

#### Step 4: create the bar graph

In [None]:
plt.figure(figsize=(20, 10))

plt.bar(string_dates, ukraine_war_per_cent, color = '#9797c3', width = 0.9)
plt.ylim(0, 100)

plt.xlabel('Specific dates during the war')
plt.ylabel('Percentage (%) of news about war in Ukraine')
plt.title('News about war in Ukraine in France ("Le Monde")')
plt.show()

<center>
<font color='DarkTurquoise'>

# South Africa

</font>
</center>

<h3><b>Newspaper</b>: <span style="color: #79c2d0">News24</span></h3>

**News24** is owned by Media24, South Africa's leading media company, with interests in digital media and services, newspapers, magazines, e-commerce, book publishing, print and distribution. 

In [None]:
df = pd.read_json('datasets/news_24_dataset.json')
df.head()

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df = df[df['title'].apply(lambda x: check(x))]
war_df.head()

<font color='DarkTurquoise'>

## Jun - Dec 2022
</font>

Creating dataframe for period (Jun - Dec 2022)

In [None]:
df_2022 = df[df['date'] < datetime(2023, 1, 1)][1:].copy()
df_2022['month'] = df_2022['date'].apply(lambda x: get_month(x))
df_2022.head()

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df_2022 = war_df[war_df['date'] < datetime(2023, 1, 1)].copy()
war_df_2022['month'] = war_df_2022['date'].apply(lambda x: get_month(x))
war_df_2022.head()

Calculating the percentage of news about the war in Ukraine:

In [None]:
war_perc_2022 = []
months = list(range(6, 13))
for month in months:
    month_df = df_2022[df_2022['month'] == month].shape[0]
    month_war_df = war_df_2022[war_df_2022['month'] == month].shape[0]
    try:
        war_perc_2022.append(round(month_war_df/month_df * 100, 2))
    except ZeroDivisionError:
        war_perc_2022.append(0.0)

### Building plot 2022

In [None]:
months = ['Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
plt.figure(figsize = (10, 5))
 
# creating the bar plot
plt.bar(months, war_perc_2022, color ='CadetBlue', width = 0.7)
plt.ylim(0, 100)

plt.xlabel("Month")
plt.ylabel("Percentage (%)")
plt.title("Percentage of news in SA about war in Ukraine in 2022")
plt.show()

<font color='DarkTurquoise'>

## Jan - May 2023
</font>

Creating dataframe for period (Jan - May 2023)

In [None]:
df_2023 = df[df['date'] >= datetime(2023, 1, 1)].copy()
df_2023['month'] = df_2023['date'].apply(lambda x: get_month(x))

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df_2023 = war_df[war_df['date'] >= datetime(2023, 1, 1)].copy()
war_df_2023['month'] = war_df_2023['date'].apply(lambda x: get_month(x))
war_df_2023.head()

Calculating the percentage of news about the war in Ukraine:

In [None]:
war_perc_2023 = []
months = list(range(1, 6))
for month in months:
    month_df = df_2023[df_2023['month'] == month].shape[0]
    month_war_df = war_df_2023[war_df_2023['month'] == month].shape[0]
    try:
        war_perc_2023.append(round(month_war_df/month_df * 100, 2))
    except ZeroDivisionError:
        war_perc_2023.append(0.0)

### Building plot 2023

In [None]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
plt.figure(figsize = (10, 5))
 
# creating the bar plot
plt.bar(months, war_perc_2023, color ='CadetBlue', width = 0.7)
plt.ylim(0, 100)

plt.xlabel("Month")
plt.ylabel("Percentage (%)")
plt.title("Percentage of news in SA about war in Ukraine in 2023")
plt.show()

<h3><b>Newspaper</b>: <span style="color: #79c2d0">South African Government News Agency</span></h3>

**The SA Government News Agency** is a news service, published by the Government Communication and Information System (GCIS).

In [None]:
df = pd.read_json('datasets/sa_news_dataset.json')
df.head()

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df = df[df['title'].apply(lambda x: check(x))]
war_df.head()

<font color='DarkTurquoise'>

## Feb - Dec 2022
</font>

Creating dataframe for period (Feb - Dec 2022)

In [None]:
df_2022 = df[df['date'] < datetime(2023, 1, 1)].copy()
df_2022['month'] = df_2022['date'].apply(lambda x: get_month(x))

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df_2022 = war_df[war_df['date'] < datetime(2023, 1, 1)].copy()
war_df_2022['month'] = war_df_2022['date'].apply(lambda x: get_month(x))
war_df_2022.head()

In [None]:
war_perc_2022 = []
months = list(range(2, 13))
for month in months:
    month_df = df_2022[df_2022['month'] == month].shape[0]
    month_war_df = war_df_2022[war_df_2022['month'] == month].shape[0]
    try:
        war_perc_2022.append(round(month_war_df/month_df * 100, 2))
    except ZeroDivisionError:
        war_perc_2022.append(0.0)

### Building plot 2022

In [None]:
months = ['Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
plt.figure(figsize = (10, 5))
 
# creating the bar plot
plt.bar(months, war_perc_2022, color ='CadetBlue', width = 0.7)
plt.ylim(0, 100)

plt.xlabel("Month")
plt.ylabel("Percentage (%)")
plt.title("Percentage of news in SA about war in Ukraine in 2022")
plt.show()

<font color='DarkTurquoise'>

## Jan - May 2023
</font>

Creating dataframe for period (Jan - May 2023)

In [None]:
df_2023 = df[df['date'] >= datetime(2023, 1, 1)].copy()
df_2023['month'] = df_2023['date'].apply(lambda x: get_month(x))

Forming dataframe containing only news about war in Ukraine:

In [None]:
war_df_2023 = war_df[war_df['date'] >= datetime(2023, 1, 1)].copy()
war_df_2023['month'] = war_df_2023['date'].apply(lambda x: get_month(x))
war_df_2023.head()

In [None]:
war_perc_2023 = []
months = list(range(1, 6))
for month in months:
    month_df = df_2023[df_2023['month'] == month].shape[0]
    month_war_df = war_df_2023[war_df_2023['month'] == month].shape[0]
    try:
        war_perc_2023.append(round(month_war_df/month_df * 100, 2))
    except ZeroDivisionError:
        war_perc_2023.append(0.0)

### Building plot 2023

In [None]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
plt.figure(figsize = (10, 5))
 
# creating the bar plot
plt.bar(months, war_perc_2023, color ='CadetBlue', width = 0.7)
plt.ylim(0, 100)

plt.xlabel("Month")
plt.ylabel("Percentage (%)")
plt.title("Percentage of news in SA about war in Ukraine in 2023")
plt.show()

<center>
<font color='DarkTurquoise'>

# Japan

</font>
</center>

**"The Japan Times"** news website is a prominent and reputable online platform that provides comprehensive coverage of news and current affairs related to Japan. With a strong focus on both domestic and international news, the website serves as a valuable resource for readers seeking reliable and up-to-date information about Japan and its place in the world.

In [None]:
def create_dataframe():
    """
    Load the news dataset, parse the dates and keep only articles from 2022 onwards.
    """
    df = pd.read_json('datasets/news.json')
    date_format = '%Y-%m-%d'
    df['date'] = pd.to_datetime(df['date'], format=date_format)
    # df = df.sort_values(by='date')
    # copy the filtered rows so that columns can be added later without warnings
    df = df[df['date'] >= '2022-01-01'].copy()
    return df

df = create_dataframe()
df.head()

Forming dataframe containing only news about war in Ukraine:

In [None]:
ukr_df = df[df['title'].apply(lambda x: check(x))]
ukr_df = ukr_df.reset_index()
ukr_df['year'] = pd.DatetimeIndex(ukr_df['date']).year
ukr_df['month'] = pd.DatetimeIndex(ukr_df['date']).month
ukr_df.head()

In [None]:
months_ukr_df = ukr_df.pivot_table(columns=['year', 'month'], aggfunc='size')
months_ukr_df = months_ukr_df.reset_index()
months_ukr_df.columns = ['year', 'month', 'count']
months_ukr_df['date_plt'] = months_ukr_df['year'].apply(str) + '-' + months_ukr_df['month'].apply(str)
months_ukr_df.head()

In [None]:
df['year'] = pd.DatetimeIndex(df['date']).year
df['month'] = pd.DatetimeIndex(df['date']).month
months_df = df.pivot_table(columns=['year', 'month'], aggfunc='size')
months_df = months_df.reset_index()
months_df.columns = ['year', 'month', 'count']
months_df['date_plt'] = months_df['year'].apply(str) + '-' + months_df['month'].apply(str)
months_df.head()

In [None]:
plt.figure(figsize=(15, 7))
# the red bars are drawn on top of the green ones for the same months
plt.bar(months_df['date_plt'], months_df['count'], color='#34eb9e', label='All Japan Times news')
plt.bar(months_ukr_df['date_plt'], months_ukr_df['count'], color='#eb3434', label='Japan Times news about Ukraine')
plt.xlabel('Month')
plt.ylabel('Number of articles')
plt.legend()
plt.show()
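
The chart above shows absolute counts, while the other countries are reported as percentages. The sketch below shows how the two monthly tables could be combined into a percentage, assuming the `year`, `month` and `count` columns built above. The numbers are made up, and the `share`, `count_all` and `count_ukr` column names are hypothetical.

```python
import pandas as pd

# Made-up monthly counts in the same shape as months_df and months_ukr_df.
months_df = pd.DataFrame({"year": [2022, 2022, 2022], "month": [1, 2, 3], "count": [200, 180, 220]})
months_ukr_df = pd.DataFrame({"year": [2022, 2022], "month": [2, 3], "count": [18, 55]})

# A left join keeps months without any Ukraine-related article; their count becomes 0.
merged = months_df.merge(months_ukr_df, on=["year", "month"], how="left", suffixes=("_all", "_ukr"))
merged["count_ukr"] = merged["count_ukr"].fillna(0)
merged["share"] = (merged["count_ukr"] / merged["count_all"] * 100).round(2)

print(merged[["year", "month", "share"]])
```

Plotting `merged["share"]` with `plt.ylim(0, 100)` would make the Japan chart directly comparable with the charts for the other countries.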

<center>
<h2 style="color: #a5e9db">
The analysis of news coverage within the region of Donetsk, Ukraine, a significant epicenter of the ongoing conflict.
</h2>

</center>

The conflict in Eastern Ukraine, particularly in Donetsk, has been a focal point of international attention due to its far-reaching implications. As researchers dedicated to understanding the multifaceted nature of this conflict, we have directed our efforts towards examining the local news landscape within Donetsk to gain deeper insights into the narratives, perspectives, and impact of media reporting.

<center>
<font color='DarkTurquoise'>

# Ukraine. Donetsk

</font>
</center>

In [None]:
df = pd.read_json('datasets/news_dan.json')
df['year'] = pd.to_datetime(df['date'], format='%d.%m.%Y %H:%M').dt.year

# Russian stop words and very frequent regional terms to exclude from the counts
useless_words = ['что', 'это', 'для', 'как', 'так', 'и', 'в', 'над', 'к', 'до', 'не', 'на', 'но', 'за', 'то', 'с', 'ли', 'а', 'донбассе','донбасса']
colors = ['#0080ff', '#00ffff', '#00ff80']

In [None]:
word_frequencies_by_year = {}
for year, group in df.groupby('year'):
    titles = ' '.join(group['title'])
    word_tokens = word_tokenize(titles.lower())
    # keep alphabetic tokens longer than two characters that are not stop words
    word_tokens = [word for word in word_tokens if (word.isalpha() and len(word) > 2) and word not in useless_words]
    word_frequencies_by_year[year] = FreqDist(word_tokens)
word_frequencies_by_year
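
The counting step above can be illustrated without NLTK. The sketch below swaps `word_tokenize` and `FreqDist` for a simple regular expression and `collections.Counter`, assuming that level of tokenization is acceptable. The titles are made-up English examples, and the `top_words` helper and `stop_words` set are hypothetical.

```python
import re
from collections import Counter

# Made-up English titles grouped by year, for illustration only.
titles_by_year = {
    2022: ["Shelling damages school in the city", "City restores water supply after shelling"],
    2023: ["Aid convoy reaches the city", "New aid point opens"],
}
stop_words = {"the", "after", "new"}  # hypothetical stop-word list

def top_words(titles, n=3):
    """Count alphabetic tokens longer than two characters, skipping stop words."""
    # [^\W\d_]+ matches runs of letters, including Cyrillic ones.
    tokens = re.findall(r"[^\W\d_]+", " ".join(titles).lower())
    counts = Counter(t for t in tokens if len(t) > 2 and t not in stop_words)
    return counts.most_common(n)

for year, titles in titles_by_year.items():
    print(year, top_words(titles, n=2))
```

`Counter.most_common` plays the same role here as `FreqDist.most_common` in the cell above.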

In [None]:
plt.figure(figsize=(30, 16))

for year, word_frequencies in word_frequencies_by_year.items():
    most_common_words = word_frequencies.most_common(3)
    words, frequencies = zip(*most_common_words)

    for index, word in enumerate(words):
        plt.bar(f'{year} №{index+1}', frequencies[index], color=colors[index]) 
        plt.text(f'{year} №{index+1}', frequencies[index], word, ha='center', va='bottom', fontsize=12)

plt.title("Most Frequent Words in Titles", fontsize=28)
plt.xlabel("Words", fontsize=22)
plt.ylabel("Frequency", fontsize=22)
plt.show()

<center>
<font color='DarkTurquoise'>

# Conclusion

</font>
</center>

##### <center>As the graphs above show, after more than a year of war, people around the world continue to be interested in news about the war in Ukraine. Interest in foreign media is noticeably greater during intense fighting at the front (for example, around Bakhmut) or during events important to the course of the war, such as the sinking of the cruiser "Moskva" or the liberation of Kherson.</center>
