<a href="https://colab.research.google.com/github/marfrlv/news_corpus/blob/main/fin_cleaning_feature_ext.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Data preprocessing and cleaning


In [None]:
# libraries
import pandas as pd
import numpy as np
import string
import re
import spacy
nlp = spacy.load("en_core_web_sm")

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 1.1. Low-credibility sources

### 1.1.1. Russia Today

In [None]:
# paths to files on drive
rt_path1 = '/content/drive/MyDrive/rt_articles.csv'
rt_path2 = '/content/drive/MyDrive/rt_articles_mar_apr.csv'

# load both CSV files into DataFrames
rt_df1 = pd.read_csv(rt_path1)
rt_df2 = pd.read_csv(rt_path2)

# concatenate the two DataFrames
rt_df = pd.concat([rt_df1, rt_df2], axis=0)
# drop duplicates (some articles may have been scraped twice across scraping sessions)
rt_df = rt_df.drop_duplicates(subset='Headings')

# check if the column with articles' texts has any missing values
if rt_df['Texts'].isnull().any():
    print("Column has missing values.")
    print()
else:
    print("Column does not have missing values.")

# add the name of the source (the label, 1 for propaganda and 0 for not propaganda, is assigned later for the combined dataset)
rt_df['Source'] = 'Russia Today'
rt_df

Column does not have missing values.


Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today
...,...,...,...,...,...
295,Ukraine conflict being fueled by empires - Pop...,2023-03-10 11:32:00,The conflict in Ukraine is being fueled by the...,"The interests of several countries, not just R...",Russia Today
296,Two EU members offer jets to Ukraine,2023-03-09 18:56:00,Poland and Slovakia are ready to hand over the...,Poland and Slovakia will donate Soviet-era war...,Russia Today
297,Oscars reportedly say no to Zelensky again,2023-03-09 18:56:00,Ukrainian President Vladimir Zelensky’s bid to...,Ukrainian President Vladimir Zelensky’s bid to...,Russia Today
298,US to discourage African nations from doing bu...,2023-03-09 16:41:00,Deputy Treasury Secretary Wally Adeyemo will v...,The deputy treasury secretary will attempt to ...,Russia Today
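
The load, deduplicate, check and label steps above are repeated for each source in this notebook. A minimal sketch of how they could be wrapped in one helper is given below. The name `load_source` and the in-memory demo CSV are hypothetical and not part of the original notebook, and dropping rows with missing texts inside the helper is an assumption (the notebook only does this for Associated Press).

```python
import io
import pandas as pd


def load_source(csv_source, source_name, text_col="Texts", heading_col="Headings"):
    """Read one scraped CSV, drop duplicate headings, and tag rows with the source name."""
    df = pd.read_csv(csv_source)
    # articles scraped twice share the same heading
    df = df.drop_duplicates(subset=heading_col)
    # report and drop rows whose text is missing (assumption, see lead-in)
    n_missing = int(df[text_col].isnull().sum())
    if n_missing:
        print(f"{source_name}: dropping {n_missing} row(s) with missing '{text_col}'.")
        df = df.dropna(subset=[text_col])
    # copy before adding a column so the assignment never targets a view
    df = df.copy()
    df["Source"] = source_name
    return df


# hypothetical demo data standing in for a file on Drive
demo_csv = io.StringIO(
    "Headings,Texts\n"
    "Headline A,Body A\n"
    "Headline A,Body A again\n"
    "Headline B,\n"
    "Headline C,Body C\n"
)
demo_df = load_source(demo_csv, "Demo Source")
print(demo_df)
```

Because the DataFrame is passed through the helper, the missing-value check always runs on the frame that was just loaded.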


In [None]:
rt_texts = list(rt_df['Texts'])
print(f'There are {len(rt_texts)} in the list.')

There are 587 in the list.


## The main function for cleaning

In [None]:
def clean_texts(texts_list):
    print(f'The number of texts given for cleaning {len(texts_list)}.')
    print('Processing...')
    # create an empty list to store the cleaned texts
    cleaned_list = []

    # iterate over the input
    for text in texts_list:
        # convert the text to lowercase
        text = text.lower()

        # process the text with spaCy to create a doc object
        doc = nlp(text)

        # iterate over the tokens in the doc and perform cleaning operations
        clean_tokens = []
        for token in doc:
            # remove stop words and punctuation
            if not token.is_stop and token.is_alpha:
                # lemmatize the token
                clean_tokens.append(token.lemma_)

        # join the cleaned tokens into a string and append it to the list
        clean_text = ' '.join(clean_tokens)
        cleaned_list.append(clean_text)
    print(f'The number of texts cleaned is {len(cleaned_list)}.')
    # return the cleaned list
    return cleaned_list
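
`clean_texts` calls `text.lower()` on every item, so a non-string value such as the `NaN` that pandas uses for a missing cell would raise an `AttributeError`. The sketch below shows one way to separate usable texts from unusable ones before cleaning. `split_valid_texts` is a hypothetical helper, not part of the original notebook.

```python
def split_valid_texts(texts):
    """Return (valid_texts, skipped_positions) for a list of candidate texts."""
    valid, skipped = [], []
    for position, text in enumerate(texts):
        # keep only non-empty strings; NaN (a float) and None are skipped
        if isinstance(text, str) and text.strip():
            valid.append(text)
        else:
            skipped.append(position)
    return valid, skipped


valid, skipped = split_valid_texts(["Some text", float("nan"), "", None])
print(valid, skipped)
```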


In [None]:
# extract content words
rt_cleaned = clean_texts(rt_texts)
# delete the name of the source
rt_cleaned_fin = []
for text in rt_cleaned:
  text_wo_source_name = re.sub('russia today', '', text)
  rt_cleaned_fin.append(text_wo_source_name)

The number of texts given for cleaning 587.
Processing...
The number of texts cleaned is 587.
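
A plain `re.sub(name, '', text)` leaves a double space where the name stood and, for a short name, also removes it from inside longer words. The sketch below is one possible variant that matches whole words only and collapses the leftover whitespace. `strip_source_name` is a hypothetical helper, not the notebook's own code, and the example strings are made up.

```python
import re


def strip_source_name(text, source_name):
    """Remove a source name from already lowercased text, matching whole words only."""
    # \b anchors the match at word boundaries; re.escape guards against special characters
    pattern = r"\b" + re.escape(source_name.lower()) + r"\b"
    without_name = re.sub(pattern, " ", text)
    # collapse the whitespace left behind by the removal
    return " ".join(without_name.split())


print(strip_source_name("expert speak foreign policy say move", "foreign policy"))
print(strip_source_name("moscow february tass potassium export", "tass"))
```

In the second call the word "potassium" is kept intact, whereas an unanchored substitution of `tass` would cut it down to "poium".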


In [None]:
rt_df['Content words'] = rt_cleaned_fin
rt_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today,washington large financial contributor ukraine...
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today,washington focus counter china tackle taiwan i...
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today,moldova able protect join lead military bloc p...
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today,widening military operation depend weaponry ki...
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today,demonstrator city genoa milan demand end weapo...
...,...,...,...,...,...,...
295,Ukraine conflict being fueled by empires - Pop...,2023-03-10 11:32:00,The conflict in Ukraine is being fueled by the...,"The interests of several countries, not just R...",Russia Today,interest country russia drive hostility pontif...
296,Two EU members offer jets to Ukraine,2023-03-09 18:56:00,Poland and Slovakia are ready to hand over the...,Poland and Slovakia will donate Soviet-era war...,Russia Today,poland slovakia donate soviet era warplane kie...
297,Oscars reportedly say no to Zelensky again,2023-03-09 18:56:00,Ukrainian President Vladimir Zelensky’s bid to...,Ukrainian President Vladimir Zelensky’s bid to...,Russia Today,ukrainian president vladimir zelensky bid appe...
298,US to discourage African nations from doing bu...,2023-03-09 16:41:00,Deputy Treasury Secretary Wally Adeyemo will v...,The deputy treasury secretary will attempt to ...,Russia Today,deputy treasury secretary attempt pressure sta...


### 1.1.2. TASS

In [None]:
path = '/content/drive/MyDrive/tass_articles.csv'
tass_df = pd.read_csv(path)
tass_df = tass_df.drop_duplicates(subset = 'Headings')
if tass_df['Texts'].isnull().any():
    print("Column has missing values.")
    print()
else:
    print("Column does not have missing values.")
tass_df['Source'] = 'TASS'
tass_df

Column does not have missing values.


Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source
0,Russia calls for preventing large-scale Palest...,"MOSCOW, February 26.",Hindering the Quartet’s activities adversely i...,"MOSCOW, February 26. /TASS/. Russia calls for ...",TASS
1,Switzerland drops neutrality to support Ukrain...,"MOSCOW, February 26.","Apart from that, she drew attention to the fac...","MOSCOW, February 26. /TASS/. Bern has dropped ...",TASS
2,"US, EU seeking to divert int’l community’s att...","MOSCOW, February 26.",He stressed that Moscow deems such an approach...,"MOSCOW, February 26. /TASS/. The United States...",TASS
27,US is ‘confident’ China is considering supplie...,"WASHINGTON, February 26.","The US leadership thought it was ""important to...","WASHINGTON, February 26. /TASS/. The US is con...",TASS
28,Washington’s ‘peace narrative’ illusory due to...,"BEIJING, February 25.","According to the article, the West regards its...","BEIJING, February 25. /TASS/. American claims ...",TASS
...,...,...,...,...,...
1442,Arms supplies to Ukraine pushes Kiev toward mi...,"MOSCOW, February 20.",It was the fifth telephone contact between the...,"MOSCOW, February 20. /TASS/. Russian President...",TASS
1443,Zelensky's statements mean he is not going to ...,"MOSCOW, February 20.",Dmitry Peskov stressed that the settlement in ...,"MOSCOW, February 20. /TASS/. Statements by Ukr...",TASS
1444,Talks begin between Putin and Macron,"MOSCOW, February 20.",The Elysee Palace noted that today's conversat...,"MOSCOW, February 20. /TASS/. The telephone con...",TASS
1445,Putin plans to hold another telephone conversa...,"MOSCOW, February 20.",Macron reportedly intends to discuss aggravati...,"MOSCOW, February 20. /TASS/. On Sunday, Russia...",TASS


In [None]:
tass_texts = list(tass_df['Texts'])
print(f'There are {len(tass_texts)} in the list.')

There are 1421 in the list.


In [None]:
# extract content words
tass_cleaned = clean_texts(tass_texts)
# delete the name of the source
tass_cleaned_fin = []
for text in tass_cleaned:
  text_wo_source_name = re.sub('tass', '', text)
  tass_cleaned_fin.append(text_wo_source_name)

The number of texts given for cleaning 1421.
Processing...
The number of texts cleaned is 1421.


In [None]:
tass_df['Content words'] = tass_cleaned_fin
tass_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words
0,Russia calls for preventing large-scale Palest...,"MOSCOW, February 26.",Hindering the Quartet’s activities adversely i...,"MOSCOW, February 26. /TASS/. Russia calls for ...",TASS,moscow february russia call resume work middle...
1,Switzerland drops neutrality to support Ukrain...,"MOSCOW, February 26.","Apart from that, she drew attention to the fac...","MOSCOW, February 26. /TASS/. Bern has dropped ...",TASS,moscow february bern drop traditional neutrali...
2,"US, EU seeking to divert int’l community’s att...","MOSCOW, February 26.",He stressed that Moscow deems such an approach...,"MOSCOW, February 26. /TASS/. The United States...",TASS,moscow february united states european union s...
27,US is ‘confident’ China is considering supplie...,"WASHINGTON, February 26.","The US leadership thought it was ""important to...","WASHINGTON, February 26. /TASS/. The US is con...",TASS,washington february confident china consider s...
28,Washington’s ‘peace narrative’ illusory due to...,"BEIJING, February 25.","According to the article, the West regards its...","BEIJING, February 25. /TASS/. American claims ...",TASS,beijing february american claim importance glo...
...,...,...,...,...,...,...
1442,Arms supplies to Ukraine pushes Kiev toward mi...,"MOSCOW, February 20.",It was the fifth telephone contact between the...,"MOSCOW, February 20. /TASS/. Russian President...",TASS,moscow february russian president vladimir put...
1443,Zelensky's statements mean he is not going to ...,"MOSCOW, February 20.",Dmitry Peskov stressed that the settlement in ...,"MOSCOW, February 20. /TASS/. Statements by Ukr...",TASS,moscow february statement ukrainian president ...
1444,Talks begin between Putin and Macron,"MOSCOW, February 20.",The Elysee Palace noted that today's conversat...,"MOSCOW, February 20. /TASS/. The telephone con...",TASS,moscow february telephone conversation russian...
1445,Putin plans to hold another telephone conversa...,"MOSCOW, February 20.",Macron reportedly intends to discuss aggravati...,"MOSCOW, February 20. /TASS/. On Sunday, Russia...",TASS,moscow february sunday russian president vladi...


### 1.1.3. News Target

In [None]:
path = '/content/drive/MyDrive/news_target_articles.csv'
nt_df = pd.read_csv(path)
nt_df = nt_df.drop_duplicates(subset = 'Headings')
if nt_df['Texts'].isnull().any():
    print("Column has missing values.")
    print()
else:
    print("Column does not have missing values.")
nt_df['Source'] = 'News Target'
nt_df

Column does not have missing values.


Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source
0,Worldwide famine looms as Russia-Ukraine confl...,2022-06-08 00:43:36,An article published by British weekly newspap...,Worldwide famine looms as Russia-Ukraine confl...,News Target
1,Will the Russia-Ukraine conflict end with the ...,2022-03-11 12:38:31,Rumors are emerging to suggest that the endgam...,Will the Russia-Ukraine conflict end with the ...,News Target
2,Ongoing Russia-Ukraine conflict pushing global...,2022-05-08 20:55:29,The ongoing conflict between Russia and Ukrain...,Ongoing Russia-Ukraine conflict pushing global...,News Target
3,Russia-Ukraine conflict not helping American c...,2022-05-08 20:55:39,The United States has spent the past two years...,Russia-Ukraine conflict not helping American c...,News Target
4,"Nebraska farmland values climb 16%, but farmer...",2022-05-06 12:48:36,The value of farmland in Nebraska has hit hist...,"Nebraska farmland values climb 16%, but farmer...",News Target
...,...,...,...,...,...
1431,Biden intentionally destroying U.S. economy an...,2022-09-09 21:19:37,It’s become patently obvious that Joe Biden’s ...,Biden intentionally destroying U.S. economy an...,News Target
1432,USA targeting of Moskva ship is Russia’s “Pear...,2022-04-21 13:41:48,To understand the sinking of the Moskva and Pu...,USA targeting of Moskva ship is Russia’s “Pear...,News Target
1433,Democrats rooting for Ukraine to beat back Rus...,2022-03-02 00:32:07,"If Democrats are anything, they are massive hy...",Democrats rooting for Ukraine to beat back Rus...,News Target
1434,"US lags behind Russia, China in hypersonic wea...",2022-03-14 10:43:43,The United States has fallen behind Russia and...,"Report: US lags behind Russia, China in hypers...",News Target


In [None]:
# every News Target article ends with a list of the sources it used; the code below removes that list
nt_texts = nt_df['Texts']
nt_texts_wo_sources = []
for text in nt_texts:
  nt_text_wo_sources = text.split('Sources')[0]
  nt_texts_wo_sources.append(nt_text_wo_sources)

# sanity check: the number of texts should be unchanged
print(len(nt_texts_wo_sources))

1436
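
`text.split('Sources')[0]` cuts at the first occurrence of the word, so an article whose body mentions "Sources" earlier would be truncated too soon. The sketch below cuts only at the last line that begins with "Sources". It assumes that the source list starts on its own line, which the notebook does not state, and `strip_trailing_sources` is a hypothetical name.

```python
import re

# a line that begins with the word "Sources" (optionally indented)
SOURCES_HEADER = re.compile(r"^[ \t]*Sources\b", flags=re.MULTILINE)


def strip_trailing_sources(text):
    """Drop everything from the last line that starts with 'Sources' to the end."""
    matches = list(SOURCES_HEADER.finditer(text))
    if not matches:
        # no source list found: return the text unchanged
        return text
    return text[:matches[-1].start()].rstrip()


# made-up article text for illustration
article = (
    "Sources say the talks failed.\n"
    "More body text.\n"
    "Sources include:\n"
    "Example.com"
)
print(repr(strip_trailing_sources(article)))
print(repr(article.split("Sources")[0]))
```

For this made-up article the plain split returns an empty string, while the sketch keeps both body lines.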


In [None]:
# for cleaning use the list with texts WITHOUT SOURCES!
# extract content words
nt_cleaned = clean_texts(nt_texts_wo_sources)
# delete the name of the source
nt_cleaned_fin = []
for text in nt_cleaned:
  text_wo_source_name = re.sub('news target', '', text)
  nt_cleaned_fin.append(text_wo_source_name)

The number of texts given for cleaning 1436.
Processing...
The number of texts cleaned is 1436.


In [None]:
nt_df['Content words'] = nt_cleaned_fin
nt_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words
0,Worldwide famine looms as Russia-Ukraine confl...,2022-06-08 00:43:36,An article published by British weekly newspap...,Worldwide famine looms as Russia-Ukraine confl...,News Target,worldwide famine loom russia ukraine conflict ...
1,Will the Russia-Ukraine conflict end with the ...,2022-03-11 12:38:31,Rumors are emerging to suggest that the endgam...,Will the Russia-Ukraine conflict end with the ...,News Target,russia ukraine conflict end collapse western c...
2,Ongoing Russia-Ukraine conflict pushing global...,2022-05-08 20:55:29,The ongoing conflict between Russia and Ukrain...,Ongoing Russia-Ukraine conflict pushing global...,News Target,ongoing russia ukraine conflict push global ec...
3,Russia-Ukraine conflict not helping American c...,2022-05-08 20:55:39,The United States has spent the past two years...,Russia-Ukraine conflict not helping American c...,News Target,russia ukraine conflict help american company ...
4,"Nebraska farmland values climb 16%, but farmer...",2022-05-06 12:48:36,The value of farmland in Nebraska has hit hist...,"Nebraska farmland values climb 16%, but farmer...",News Target,nebraska farmland value climb farmer worry eff...
...,...,...,...,...,...,...
1431,Biden intentionally destroying U.S. economy an...,2022-09-09 21:19:37,It’s become patently obvious that Joe Biden’s ...,Biden intentionally destroying U.S. economy an...,News Target,biden intentionally destroy economy sabotage a...
1432,USA targeting of Moskva ship is Russia’s “Pear...,2022-04-21 13:41:48,To understand the sinking of the Moskva and Pu...,USA targeting of Moskva ship is Russia’s “Pear...,News Target,usa target moskva ship russia pearl harbor ret...
1433,Democrats rooting for Ukraine to beat back Rus...,2022-03-02 00:32:07,"If Democrats are anything, they are massive hy...",Democrats rooting for Ukraine to beat back Rus...,News Target,democrats root ukraine beat russian invasion s...
1434,"US lags behind Russia, China in hypersonic wea...",2022-03-14 10:43:43,The United States has fallen behind Russia and...,"Report: US lags behind Russia, China in hypers...",News Target,report lag russia china hypersonic weapon race...


### 1.1.4. Activist Post

In [None]:
path = '/content/drive/MyDrive/activist_post_articles.csv'
ap_df = pd.read_csv(path)
ap_df = ap_df.drop_duplicates(subset = 'Headings')
if ap_df['Texts'].isnull().any():
    print("Column has missing values.")
    print()
else:
    print("Column does not have missing values.")
ap_df['Source'] = 'Activist Post'
ap_df

Column does not have missing values.


Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source
0,What Happens Next In The Ukraine Proxy War? A ...,2023-02-17 13:08:27,The Ukraine event is just one part of a larger...,By Brandon Smith\nFrom the very beginning of t...,Activist Post
1,Zelensky Signs Agreement With JPMorgan on Ukra...,2023-02-15 01:51:43,"JPMorgan, America’s largest bank, discussed wi...",By Dave DeCamp\nBankers from JPMorgan Chase vi...,Activist Post
2,Canada’s Dangerous Escalation in the Russia-Uk...,2023-02-11 16:22:33,Unless Canada distances itself from this war f...,Op-Ed by Dan Fournier\nKey Takeaways:\nCanada ...,Activist Post
3,Russia Says The U.S. Has “Questions to Answer”...,2023-02-09 17:15:32,The Central Intelligence Agency and the White ...,By Mac Slavo\nRussia says the United States ha...,Activist Post
4,New $2.2 Billion Arms Package for Ukraine Incl...,2023-02-06 15:50:04,"According to the Pentagon fact sheet, the US h...",By Dave DeCamp\nThe Biden administration on Fr...,Activist Post
...,...,...,...,...,...
114,100+ Anti-War Groups Demand Biden End Brinkman...,2022-02-01 14:14:44,,By Jake Johnson\nMore than 100 advocacy organi...,Activist Post
115,"Washington Plans Fresh Sanctions on Russia, Ke...",2023-02-20 11:57:07,The sanctions’ targets include Russia’s financ...,By Connor Freeman\nThe White House is readying...,Activist Post
116,How Much Is US Aid to Ukraine Costing You?,2023-02-19 00:39:42,"The amount is piling up, and it's a lot to pay...","By David R. Henderson\nIn 2022, the U.S. gover...",Activist Post
117,Flashback: Rick Rozoff Warns Ukraine War is In...,2023-02-18 17:15:05,Rozoff warns that war is not a potential outco...,By The Corbett Report\nRick Rozoff of Stop NAT...,Activist Post


In [None]:
# remove the author's name at the beginning of each text and the closing information on how to support them
ap_texts = ap_df['Texts']
ap_texts_for_cleaning = []

for text in ap_texts:
  act_p_text_wo_name = text.split('\n', 1)[1]
  act_p_text_wo_info = act_p_text_wo_name.split('Become a Patron!')[0]

  # use this as an input for the cleaning function!
  ap_texts_for_cleaning.append(act_p_text_wo_info)

print(len(ap_texts_for_cleaning))

117
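
`text.split('\n', 1)[1]` raises an `IndexError` when a text contains no newline, and it drops the first line even if that line is not a byline. The sketch below is a more defensive variant. It assumes that bylines start with "By " or "Op-Ed by ", as in the rows shown above, and `strip_byline_and_footer` is a hypothetical helper rather than the notebook's code.

```python
def strip_byline_and_footer(text, footer_marker="Become a Patron!"):
    """Remove a leading byline line and everything from the footer marker onwards."""
    first_line, separator, rest = text.partition("\n")
    looks_like_byline = first_line.lower().startswith(("by ", "op-ed by "))
    # drop the first line only when a newline exists and the line looks like a byline
    body = rest if separator and looks_like_byline else text
    return body.split(footer_marker)[0].strip()


# made-up texts for illustration
print(strip_byline_and_footer("By Dave DeCamp\nBankers visited.\nBecome a Patron!\nSupport us"))
print(strip_byline_and_footer("No byline here"))
```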


In [None]:
# for cleaning use the list with texts WITHOUT AUTHORS' NAMES AND INFO ON HOW TO SUPPORT!
# extract content words
ap_cleaned = clean_texts(ap_texts_for_cleaning)
# delete the name of the source
ap_cleaned_fin = []
for text in ap_cleaned:
  text_wo_source_name = re.sub('activist post', '', text)
  ap_cleaned_fin.append(text_wo_source_name)

The number of texts given for cleaning 117.
Processing...
The number of texts cleaned is 117.


In [None]:
ap_df['Content words'] = ap_cleaned_fin
ap_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words
0,What Happens Next In The Ukraine Proxy War? A ...,2023-02-17 13:08:27,The Ukraine event is just one part of a larger...,By Brandon Smith\nFrom the very beginning of t...,Activist Post,beginning ukraine conflict follow development ...
1,Zelensky Signs Agreement With JPMorgan on Ukra...,2023-02-15 01:51:43,"JPMorgan, America’s largest bank, discussed wi...",By Dave DeCamp\nBankers from JPMorgan Chase vi...,Activist Post,banker jpmorgan chase visit ukraine week sign ...
2,Canada’s Dangerous Escalation in the Russia-Uk...,2023-02-11 16:22:33,Unless Canada distances itself from this war f...,Op-Ed by Dan Fournier\nKey Takeaways:\nCanada ...,Activist Post,key takeaway canada deceptively manipulate sup...
3,Russia Says The U.S. Has “Questions to Answer”...,2023-02-09 17:15:32,The Central Intelligence Agency and the White ...,By Mac Slavo\nRussia says the United States ha...,Activist Post,russia say united states question answer pipel...
4,New $2.2 Billion Arms Package for Ukraine Incl...,2023-02-06 15:50:04,"According to the Pentagon fact sheet, the US h...",By Dave DeCamp\nThe Biden administration on Fr...,Activist Post,biden administration friday announce new billi...
...,...,...,...,...,...,...
114,100+ Anti-War Groups Demand Biden End Brinkman...,2022-02-01 14:14:44,,By Jake Johnson\nMore than 100 advocacy organi...,Activist Post,advocacy organization represent million people...
115,"Washington Plans Fresh Sanctions on Russia, Ke...",2023-02-20 11:57:07,The sanctions’ targets include Russia’s financ...,By Connor Freeman\nThe White House is readying...,Activist Post,white house ready raft new sanction export con...
116,How Much Is US Aid to Ukraine Costing You?,2023-02-19 00:39:42,"The amount is piling up, and it's a lot to pay...","By David R. Henderson\nIn 2022, the U.S. gover...",Activist Post,government approve expenditure billion aid ukr...
117,Flashback: Rick Rozoff Warns Ukraine War is In...,2023-02-18 17:15:05,Rozoff warns that war is not a potential outco...,By The Corbett Report\nRick Rozoff of Stop NAT...,Activist Post,rick rozoff stop nato join context nato eu imf...


### 1.1.5. Low-credibility sources dataset

In [None]:
# concatenate all dfs
lc_df = pd.concat([rt_df, tass_df, ap_df, nt_df])
if lc_df['Content words'].isnull().any():
    print("Column has missing values.")
    print()
else:
    print("Column does not have missing values.")
lc_df

Column does not have missing values.


Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today,washington large financial contributor ukraine...
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today,washington focus counter china tackle taiwan i...
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today,moldova able protect join lead military bloc p...
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today,widening military operation depend weaponry ki...
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today,demonstrator city genoa milan demand end weapo...
...,...,...,...,...,...,...
1431,Biden intentionally destroying U.S. economy an...,2022-09-09 21:19:37,It’s become patently obvious that Joe Biden’s ...,Biden intentionally destroying U.S. economy an...,News Target,biden intentionally destroy economy sabotage a...
1432,USA targeting of Moskva ship is Russia’s “Pear...,2022-04-21 13:41:48,To understand the sinking of the Moskva and Pu...,USA targeting of Moskva ship is Russia’s “Pear...,News Target,usa target moskva ship russia pearl harbor ret...
1433,Democrats rooting for Ukraine to beat back Rus...,2022-03-02 00:32:07,"If Democrats are anything, they are massive hy...",Democrats rooting for Ukraine to beat back Rus...,News Target,democrats root ukraine beat russian invasion s...
1434,"US lags behind Russia, China in hypersonic wea...",2022-03-14 10:43:43,The United States has fallen behind Russia and...,"Report: US lags behind Russia, China in hypers...",News Target,report lag russia china hypersonic weapon race...
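
The combined table above keeps each source's original row labels, so the same label (for example `0`) occurs once per source and label-based lookups return several rows. The sketch below shows, on two made-up frames, how `ignore_index=True` in `pd.concat` gives a fresh unique index and how the per-source counts can be checked afterwards. The frame names and contents are hypothetical.

```python
import pandas as pd

# two tiny stand-ins for the per-source DataFrames
df_a = pd.DataFrame({"Source": ["A", "A"], "Content words": ["x y", "z"]})
df_b = pd.DataFrame({"Source": ["B", "B"], "Content words": ["u", "v w"]})

# default behaviour: the labels 0 and 1 appear twice
stacked = pd.concat([df_a, df_b])
print(stacked.index.is_unique)

# ignore_index=True renumbers the rows from 0 to n - 1
combined = pd.concat([df_a, df_b], ignore_index=True)
print(combined.index.is_unique)

# quick check that no source lost rows during concatenation
print(combined["Source"].value_counts())
```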


In [None]:
lc_df['Label'] = 1
lc_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words,Label
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today,washington large financial contributor ukraine...,1
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today,washington focus counter china tackle taiwan i...,1
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today,moldova able protect join lead military bloc p...,1
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today,widening military operation depend weaponry ki...,1
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today,demonstrator city genoa milan demand end weapo...,1
...,...,...,...,...,...,...,...
1431,Biden intentionally destroying U.S. economy an...,2022-09-09 21:19:37,It’s become patently obvious that Joe Biden’s ...,Biden intentionally destroying U.S. economy an...,News Target,biden intentionally destroy economy sabotage a...,1
1432,USA targeting of Moskva ship is Russia’s “Pear...,2022-04-21 13:41:48,To understand the sinking of the Moskva and Pu...,USA targeting of Moskva ship is Russia’s “Pear...,News Target,usa target moskva ship russia pearl harbor ret...,1
1433,Democrats rooting for Ukraine to beat back Rus...,2022-03-02 00:32:07,"If Democrats are anything, they are massive hy...",Democrats rooting for Ukraine to beat back Rus...,News Target,democrats root ukraine beat russian invasion s...,1
1434,"US lags behind Russia, China in hypersonic wea...",2022-03-14 10:43:43,The United States has fallen behind Russia and...,"Report: US lags behind Russia, China in hypers...",News Target,report lag russia china hypersonic weapon race...,1


## 1.2. High-credibility sources

### 1.2.1. Foreign Policy

In [None]:
path = '/content/drive/MyDrive/for_pol_articles.csv'
fp_df = pd.read_csv(path)
fp_df = fp_df.drop_duplicates(subset = 'Headings')
if fp_df['Texts'].isnull().any():
    print("Column has missing values.")
    print()
else:
    print("Column does not have missing values.")
fp_df['Source'] = 'Foreign Policy'
fp_df

Column does not have missing values.


Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source
0,China’s Taiwan Saber-Rattling Is the New Normal,2022-08-05 15:24:45,,Experts who spoke to Foreign Policy said the m...,Foreign Policy
1,A Realist Guide to World Peace,2022-12-23 08:04:45,,"For a realist like me, these developments aren...",Foreign Policy
2,"In Northern Kosovo, Tensions Threaten to Boil ...",2022-10-31 10:15:27,,"Underlying issues, however, remain unresolved....",Foreign Policy
3,How Egypt Doubled Down on Fossil Fuels by Stif...,2022-11-08 18:14:35,,But the undeniable failures of high-emission c...,Foreign Policy
4,Who Owns the Earth’s Lungs?,2022-12-09 06:00:12,,"The research station, called Camp 41, is a pin...",Foreign Policy
...,...,...,...,...,...
2288,Polish MiG-29 Jets for Ukraine Is a Deeply Fla...,2022-03-14 15:19:10,,Successful security assistance begins with an ...,Foreign Policy
2289,Russia and China Threats Are Not the Same,2022-10-13 15:44:14,,If you would like to receive Situation Report ...,Foreign Policy
2290,How Europe Can Slash Fossil Fuels and Frustrat...,2022-03-10 09:39:12,,"Yet, in the first two weeks of Russia’s invasi...",Foreign Policy
2291,Moscow Strikes Back at Countries That Cross It,2022-04-18 13:15:20,,"“Mr. President, we are sure that today’s speak...",Foreign Policy


In [None]:
fp_texts = list(fp_df['Texts'])
print(f'There are {len(fp_texts)} in the list.')

There are 2064 in the list.


In [None]:
# the Foreign Policy texts need no source-specific trimming, so the raw texts are cleaned directly
# extract content words
fp_cleaned = clean_texts(fp_texts)
# delete the name of the source
fp_cleaned_fin = []
for text in fp_cleaned:
  text_wo_source_name = re.sub('foreign policy', '', text)
  fp_cleaned_fin.append(text_wo_source_name)

The number of texts given for cleaning 2064.
Processing...
The number of texts cleaned is 2064.


In [None]:
fp_df['Content words'] = fp_cleaned_fin
fp_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words
0,China’s Taiwan Saber-Rattling Is the New Normal,2022-08-05 15:24:45,,Experts who spoke to Foreign Policy said the m...,Foreign Policy,expert speak say move indicate china try set ...
1,A Realist Guide to World Peace,2022-12-23 08:04:45,,"For a realist like me, these developments aren...",Foreign Policy,realist like development surprising realism ce...
2,"In Northern Kosovo, Tensions Threaten to Boil ...",2022-10-31 10:15:27,,"Underlying issues, however, remain unresolved....",Foreign Policy,underlie issue remain unresolved particularly ...
3,How Egypt Doubled Down on Fossil Fuels by Stif...,2022-11-08 18:14:35,,But the undeniable failures of high-emission c...,Foreign Policy,undeniable failure high emission country allev...
4,Who Owns the Earth’s Lungs?,2022-12-09 06:00:12,,"The research station, called Camp 41, is a pin...",Foreign Policy,research station call camp pinprick civilizati...
...,...,...,...,...,...,...
2288,Polish MiG-29 Jets for Ukraine Is a Deeply Fla...,2022-03-14 15:19:10,,Successful security assistance begins with an ...,Foreign Policy,successful security assistance begin identify ...
2289,Russia and China Threats Are Not the Same,2022-10-13 15:44:14,,If you would like to receive Situation Report ...,Foreign Policy,like receive situation report inbox thursday s...
2290,How Europe Can Slash Fossil Fuels and Frustrat...,2022-03-10 09:39:12,,"Yet, in the first two weeks of Russia’s invasi...",Foreign Policy,week russia invasion ukraine eu likely pay upw...
2291,Moscow Strikes Back at Countries That Cross It,2022-04-18 13:15:20,,"“Mr. President, we are sure that today’s speak...",Foreign Policy,mr president sure today speaker address lot ki...


### 1.2.2. Associated Press

In [None]:
path = '/content/drive/MyDrive/ass_press_articles.csv'
asp_df = pd.read_csv(path)
asp_df = asp_df.drop_duplicates(subset = 'Headings')
if asp_df['Texts'].isnull().any():
    print("Column has missing values.")
    print()
else:
    print("Column does not have missing values.")
asp_df['Source'] = 'Associated Press'
asp_df

Column has missing values.



Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source
0,"Word war: In Russia-Ukraine war, information b...",2023-02-23 05:07:51,WASHINGTON (AP) — Russia's invasion of Ukraine...,FILE - Destroyed Russian armored vehicles sit ...,Associated Press
1,Putin's Ukraine gamble seen as biggest threat ...,2023-02-20 07:43:46,Vladimir Putin says he learned from his boyhoo...,Putin's Ukraine gamble seen as biggest threat ...,Associated Press
2,"Yellen visits Ukraine, underscores US economic...",2023-02-27 17:52:46,"KYIV, Ukraine (AP) — U.S. Treasury Secretary J...","Ukrainian President Volodymyr Zelenskyy, left,...",Associated Press
3,G-20 meeting in India ends without consensus o...,2023-02-25 13:27:54,"BENGALURU, India (AP) — A meeting of finance c...",In this handout photo released by Indian Finan...,Associated Press
4,"Japan, other G-7 leaders step up Russia sanctions",2023-02-24 12:43:36,TOKYO (AP) — Japanese Prime Minister Fumio Kis...,Japanese Prime Minister Fumio Kishida speaks d...,Associated Press
...,...,...,...,...,...
66,Ukraine: Drone video shows cost of intense fig...,2023-02-24 08:44:03,"MARINKA, Ukraine (AP) — The hulking Russian ta...","New video footage shot on Feb. 19, 2023 from t...",Associated Press
67,Russia's sports exile persists 1 year after in...,2023-02-22 14:26:09,"One year after the invasion of Ukraine began, ...",FILE - A Russian flag is held above the Olympi...,Associated Press
68,Ukraine war saga unfolds across the lives of 5...,2023-02-16 08:30:40,"BUCHA, Ukraine (AP) — In the cemetery where Ol...",Ukraine war saga unfolds across the lives of 5...,Associated Press
69,Russian delegates defiant at hostile OSCE asse...,2023-02-24 21:03:56,"VIENNA, Austria (AP) — A contentious Organizat...",Head of the Russian OSCE PA delegation Petr To...,Associated Press


In [None]:
asp_df = asp_df.dropna(subset=['Texts'])
asp_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source
0,"Word war: In Russia-Ukraine war, information b...",2023-02-23 05:07:51,WASHINGTON (AP) — Russia's invasion of Ukraine...,FILE - Destroyed Russian armored vehicles sit ...,Associated Press
1,Putin's Ukraine gamble seen as biggest threat ...,2023-02-20 07:43:46,Vladimir Putin says he learned from his boyhoo...,Putin's Ukraine gamble seen as biggest threat ...,Associated Press
2,"Yellen visits Ukraine, underscores US economic...",2023-02-27 17:52:46,"KYIV, Ukraine (AP) — U.S. Treasury Secretary J...","Ukrainian President Volodymyr Zelenskyy, left,...",Associated Press
3,G-20 meeting in India ends without consensus o...,2023-02-25 13:27:54,"BENGALURU, India (AP) — A meeting of finance c...",In this handout photo released by Indian Finan...,Associated Press
4,"Japan, other G-7 leaders step up Russia sanctions",2023-02-24 12:43:36,TOKYO (AP) — Japanese Prime Minister Fumio Kis...,Japanese Prime Minister Fumio Kishida speaks d...,Associated Press
...,...,...,...,...,...
66,Ukraine: Drone video shows cost of intense fig...,2023-02-24 08:44:03,"MARINKA, Ukraine (AP) — The hulking Russian ta...","New video footage shot on Feb. 19, 2023 from t...",Associated Press
67,Russia's sports exile persists 1 year after in...,2023-02-22 14:26:09,"One year after the invasion of Ukraine began, ...",FILE - A Russian flag is held above the Olympi...,Associated Press
68,Ukraine war saga unfolds across the lives of 5...,2023-02-16 08:30:40,"BUCHA, Ukraine (AP) — In the cemetery where Ol...",Ukraine war saga unfolds across the lives of 5...,Associated Press
69,Russian delegates defiant at hostile OSCE asse...,2023-02-24 21:03:56,"VIENNA, Austria (AP) — A contentious Organizat...",Head of the Russian OSCE PA delegation Petr To...,Associated Press
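The check above only reports whether any text is missing. As a minimal sketch on a toy frame (the rows are invented for illustration; only the column names follow the notebook), the missing rows can be counted before they are dropped, and the result can be copied explicitly so that later column assignments operate on an independent frame:

```python
import pandas as pd

# Toy stand-in for asp_df; the rows are invented for illustration only.
toy_df = pd.DataFrame({
    "Headings": ["a", "b", "c"],
    "Texts": ["first text", None, "third text"],
})

# Count missing article texts instead of only testing for their presence.
n_missing = int(toy_df["Texts"].isna().sum())
print(f"{n_missing} of {len(toy_df)} rows have no text.")

# .copy() makes the result an independent frame, so adding columns later
# does not trigger pandas' SettingWithCopyWarning.
toy_clean = toy_df.dropna(subset=["Texts"]).copy()
toy_clean["Source"] = "Associated Press"
```

Note that `dropna` keeps the original row labels, so the cleaned frame can have gaps in its index.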


In [None]:
asp_texts = list(asp_df['Texts'])
print(f'There are {len(asp_texts)} texts in the list.')

There are 69 texts in the list.


In [None]:
# for cleaning use the list with texts WITHOUT AUTHORS' NAMES AND INFO ON HOW TO SUPPORT!
# extract content words
asp_cleaned = clean_texts(asp_texts)
# delete the name of the source
asp_cleaned_fin = []
for text in asp_cleaned:
  text_wo_source_name = re.sub('associated press', '', text)
  asp_cleaned_fin.append(text_wo_source_name)

The number of texts given for cleaning 69.
Processing...
The number of texts cleaned is 69.
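The plain `re.sub('associated press', '', text)` call matches lowercase text only and leaves a double space where the name stood. A possible variant is sketched below; `strip_source_name` is a hypothetical helper, not part of the notebook, and the sample string is made up:

```python
import re

# Hypothetical helper, not part of the notebook: removes a source name
# regardless of case and collapses the whitespace left behind.
def strip_source_name(text, source_name):
    pattern = re.compile(r"\b" + re.escape(source_name) + r"\b", flags=re.IGNORECASE)
    without_name = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", without_name).strip()

sample = "kyiv ukraine associated press report say  Associated Press journalist"
print(strip_source_name(sample, "associated press"))
```

The word boundaries (`\b`) keep the pattern from matching inside longer words, and `re.escape` makes the helper safe for source names that contain regex metacharacters.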


In [None]:
# asp_df comes from dropna(); take an explicit copy so that adding a column
# does not raise SettingWithCopyWarning
asp_df = asp_df.copy()
asp_df['Content words'] = asp_cleaned_fin
asp_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words
0,"Word war: In Russia-Ukraine war, information b...",2023-02-23 05:07:51,WASHINGTON (AP) — Russia's invasion of Ukraine...,FILE - Destroyed Russian armored vehicles sit ...,Associated Press,file destroy russian armored vehicle sit outsk...
1,Putin's Ukraine gamble seen as biggest threat ...,2023-02-20 07:43:46,Vladimir Putin says he learned from his boyhoo...,Putin's Ukraine gamble seen as biggest threat ...,Associated Press,putin ukraine gamble see big threat rule vladi...
2,"Yellen visits Ukraine, underscores US economic...",2023-02-27 17:52:46,"KYIV, Ukraine (AP) — U.S. Treasury Secretary J...","Ukrainian President Volodymyr Zelenskyy, left,...",Associated Press,ukrainian president volodymyr zelenskyy leave ...
3,G-20 meeting in India ends without consensus o...,2023-02-25 13:27:54,"BENGALURU, India (AP) — A meeting of finance c...",In this handout photo released by Indian Finan...,Associated Press,handout photo release indian finance ministry ...
4,"Japan, other G-7 leaders step up Russia sanctions",2023-02-24 12:43:36,TOKYO (AP) — Japanese Prime Minister Fumio Kis...,Japanese Prime Minister Fumio Kishida speaks d...,Associated Press,japanese prime minister fumio kishida speak pr...
...,...,...,...,...,...,...
66,Ukraine: Drone video shows cost of intense fig...,2023-02-24 08:44:03,"MARINKA, Ukraine (AP) — The hulking Russian ta...","New video footage shot on Feb. 19, 2023 from t...",Associated Press,new video footage shoot feb air drone show pa...
67,Russia's sports exile persists 1 year after in...,2023-02-22 14:26:09,"One year after the invasion of Ukraine began, ...",FILE - A Russian flag is held above the Olympi...,Associated Press,file russian flag hold olympic ring adler aren...
68,Ukraine war saga unfolds across the lives of 5...,2023-02-16 08:30:40,"BUCHA, Ukraine (AP) — In the cemetery where Ol...",Ukraine war saga unfolds across the lives of 5...,Associated Press,ukraine war saga unfold life friend bucha ukra...
69,Russian delegates defiant at hostile OSCE asse...,2023-02-24 21:03:56,"VIENNA, Austria (AP) — A contentious Organizat...",Head of the Russian OSCE PA delegation Petr To...,Associated Press,head russian osce pa delegation petr tolstoy a...


### 1.2.3. High-credibility sources dataset

In [None]:
# combine the high-credibility sources and label them 0 (not propaganda)
hc_df = pd.concat([n_df, asp_df, fp_df])
hc_df['Label'] = 0
hc_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words,Label
0,Ten Kazakhstan nationals charged with particip...,"04:42 PM, 16 April 2023",No Summary,Kazakhstan’s National Security Committee is in...,Novaya Gazeta Europe,kazakhstan national security committee investi...,0
1,Ukraine’s Defence Minister: ‘total losses of U...,"02:25 PM, 16 April 2023",No Summary,Ukraine’s Defence Minister Oleksiy Reznikov ha...,Novaya Gazeta Europe,ukraine defence minister oleksiy reznikov spea...,0
2,Putin’s MPs,"10:44 AM, 16 April 2023",No Summary,"Back in 2003, former Duma speaker Boris Gryzlo...",Novaya Gazeta Europe,duma speaker boris gryzlov coin catchphrase pa...,0
3,Vladimir Kara-Murza: portrait of Putin’s enemy,"01:40 PM, 15 April 2023",No Summary,Foundation\nVladimir Kara-Murza was born into ...,Novaya Gazeta Europe,foundation vladimir kara murza bear family res...,0
4,"Russian singer-songwriter Semyon Slepakov, jou...","08:12 PM, 14 April 2023",No Summary,"Russian singer-songwriter Semyon Slepakov, jou...",Novaya Gazeta Europe,russian singer songwriter semyon slepakov jour...,0
...,...,...,...,...,...,...,...
2288,Polish MiG-29 Jets for Ukraine Is a Deeply Fla...,2022-03-14 15:19:10,,Successful security assistance begins with an ...,Foreign Policy,successful security assistance begin identify ...,0
2289,Russia and China Threats Are Not the Same,2022-10-13 15:44:14,,If you would like to receive Situation Report ...,Foreign Policy,like receive situation report inbox thursday s...,0
2290,How Europe Can Slash Fossil Fuels and Frustrat...,2022-03-10 09:39:12,,"Yet, in the first two weeks of Russia’s invasi...",Foreign Policy,week russia invasion ukraine eu likely pay upw...,0
2291,Moscow Strikes Back at Countries That Cross It,2022-04-18 13:15:20,,"“Mr. President, we are sure that today’s speak...",Foreign Policy,mr president sure today speaker address lot ki...,0
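By default `pd.concat` keeps each input's row labels, so the combined frame can contain the same label several times (label 0 once per source, for instance). A small sketch with invented toy frames shows the difference that `ignore_index=True` makes:

```python
import pandas as pd

# Toy frames standing in for the per-source DataFrames; contents are invented.
first = pd.DataFrame({"Source": ["A", "A"]})
second = pd.DataFrame({"Source": ["B", "B"]})

kept = pd.concat([first, second])                            # labels 0, 1, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)   # labels 0, 1, 2, 3

print(kept.index.is_unique)
print(renumbered.index.is_unique)
```

Saving with `index=False` and reading the file back, as this notebook does later, also yields a fresh 0-based index, so either approach gives unambiguous labels for label-based operations such as `drop`.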


In [None]:
all_news_df = pd.concat([lc_df, hc_df])
all_news_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words,Label
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today,washington large financial contributor ukraine...,1
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today,washington focus counter china tackle taiwan i...,1
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today,moldova able protect join lead military bloc p...,1
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today,widening military operation depend weaponry ki...,1
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today,demonstrator city genoa milan demand end weapo...,1
...,...,...,...,...,...,...,...
2288,Polish MiG-29 Jets for Ukraine Is a Deeply Fla...,2022-03-14 15:19:10,,Successful security assistance begins with an ...,Foreign Policy,successful security assistance begin identify ...,0
2289,Russia and China Threats Are Not the Same,2022-10-13 15:44:14,,If you would like to receive Situation Report ...,Foreign Policy,like receive situation report inbox thursday s...,0
2290,How Europe Can Slash Fossil Fuels and Frustrat...,2022-03-10 09:39:12,,"Yet, in the first two weeks of Russia’s invasi...",Foreign Policy,week russia invasion ukraine eu likely pay upw...,0
2291,Moscow Strikes Back at Countries That Cross It,2022-04-18 13:15:20,,"“Mr. President, we are sure that today’s speak...",Foreign Policy,mr president sure today speaker address lot ki...,0


In [None]:
all_news_df.to_csv('all_news_final_apr.csv', index=False)

# 2. Feature extraction

In [None]:
path = '/content/drive/MyDrive/final_df.csv'
all_news_df = pd.read_csv(path)
all_news_df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words,Label,Adjectives only,Adjectives and adverbs only,Function words only,Punctuation,Polarity scores for adjectives and adverbs,Verbs only,Nouns only
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today,washington large financial contributor ukraine...,1,large financial robust foreign special appoint...,large financial robust foreign special appoint...,has been the to in its with must be in and fro...,", , ’ ( ) . , ’ , “ ” – “ . ” . “ , – , , , ” ...",-0.031027,prevent undermine assist oversee extend admit ...,contributor conflict safeguard place fraud cor...
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today,washington focus counter china tackle taiwan i...,1,ongoing prolong long possible presidential ear...,instead ongoing allegedly prolong long possibl...,should on and the the is being by in the betwe...,", “ ” , , . . , “ ” . . “ : . . : ‘ ’ , ‘ ’ , ...",0.056622,play seek announce want invade think look ukra...,issue conflict hostility entrepreneur candidac...
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today,moldova able protect join lead military bloc p...,1,able military parliamentary official large pri...,able military parliamentary official previousl...,must be itself but by the its the of is being ...,", - , , . “ ” . “ , , , ” . , , “ ” . “ , ” , ...",0.085969,say join consider say hint speak want tell rep...,protect join lead speaker possibility speaker ...
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today,widening military operation depend weaponry ki...,1,military military potential western main legis...,military military potential directly western m...,a of the on the from the has has the for a of ...,", , . “ , ” ‘ . . . ’ , , . “ , ” . “ , . . ” ...",0.004444,widening depend get say widening explain tie d...,operation outline condition advance arm delive...
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today,demonstrator city genoa milan demand end weapo...,1,italian national participant local medium coll...,italian national nearly participant local medi...,in the of and are an to to up for in the of an...,". - , , . , . ( ) , “ , . ” “ ” “ , - . ” , . ...",0.103306,end turn left breach send draw organize take s...,weapon supply kiev people peace demonstration ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6572,Moldova Feels the Shock Waves of Putin’s War,2022-04-26 13:40:09,,In light of the struggles that have beset the ...,Foreign Policy,light struggle beset russian military campaign...,0,light russian military far skeptical ukrainian...,light russian military far skeptical west ukra...,in of the that have the in are can along the t...,", - , . , . , , , - - — — “ - . ” , , , . , , ...",0.026159,beset push say renew speak send seek take spur...,struggle campaign analyst concern goal create ...
6573,Polish MiG-29 Jets for Ukraine Is a Deeply Fla...,2022-03-14 15:19:10,,Successful security assistance begins with an ...,Foreign Policy,successful security assistance begin identify ...,0,successful ukrainian unable russian absent pub...,successful ukrainian unable russian absent pub...,with an what is the are or are they for their ...,": ? , - ? ? , . , . , . - . , - . : “ . . , . ...",0.053228,begin identify achieve intercept risk fight fl...,security assistance requirement battlefield re...
6574,Russia and China Threats Are Not the Same,2022-10-13 15:44:14,,If you would like to receive Situation Report ...,Foreign Policy,like receive situation report inbox thursday s...,0,national national welcome clear sigh minor cle...,national newly national welcome closely clear ...,you would in your every up and to what on for ...,", . , . : , , . ! , . . , , ( ) , . ! , . . , ...",0.059497,receive release offer condemn grab follow anti...,situation report tap day diving security strat...
6575,How Europe Can Slash Fossil Fuels and Frustrat...,2022-03-10 09:39:12,,"Yet, in the first two weeks of Russia’s invasi...",Foreign Policy,week russia invasion ukraine eu likely pay upw...,0,actual essential russian offensive quiet russi...,likely actual essential russian offensive quie...,in the of of the upwards of to for what is in ...,", , . . , . . . - , , . , . , , , , , , , , . ...",0.069159,pay translate come undermine carry move accele...,week fossil fuel response mechanism action red...
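One thing to keep in mind when a frame is written to CSV and read back: with default settings `pd.read_csv` parses empty fields as `NaN`. If any text yields no tokens for a feature column, its empty string would therefore come back as a float, which is one way a non-string value can end up in a text column. A minimal sketch with invented data:

```python
import io
import pandas as pd

# Toy frame; the second row has an empty feature string (invented data).
toy = pd.DataFrame({"Headings": ["h1", "h2"], "Adjectives only": ["large foreign", ""]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default parsing turns the empty field into NaN (a float).
default_read = pd.read_csv(io.StringIO(buffer.getvalue()))
print(default_read["Adjectives only"].isna().tolist())

# keep_default_na=False keeps empty fields as empty strings.
kept_read = pd.read_csv(io.StringIO(buffer.getvalue()), keep_default_na=False)
print(kept_read["Adjectives only"].tolist())
```

Whether this matters here depends on the data; the missing-value and type checks in section 2.1 would surface such rows.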


In [None]:
df = all_news_df

## 2.1. Adjectives

In [None]:
# check for missing values
if df['Content words'].isnull().any():
    # print the rows where 'Content words' is missing
    print(df['Content words'][df['Content words'].isnull()])
else:
    print('No missing values in a column.')

No missing values in a column.


In [None]:
df = df.drop(2445)
# check for missing values
if df['Content words'].isnull().any():
    # print the rows where 'Content words' is missing
    print(df['Content words'][df['Content words'].isnull()])
else:
    print('No missing values in a column.')

No missing values in a column.
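`DataFrame.drop` removes rows by index label, not by position, and raises `KeyError` if the label is absent (for example when the cell is run a second time). A short sketch on an invented toy frame:

```python
import pandas as pd

# Toy frame with non-zero-based labels; contents are invented.
toy = pd.DataFrame({"Content words": ["a", "b", "c"]}, index=[10, 11, 12])

# drop() works on index labels, not positions: this removes the row labelled 11.
toy = toy.drop(11)

# A missing label raises KeyError by default; errors="ignore" makes the
# call a no-op instead, which is convenient when a cell is re-run.
toy = toy.drop(11, errors="ignore")
print(list(toy.index))
```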


In [None]:
# check that all values in the column have the same type as the first one;
# the loop only reports the first mismatch and does not delete anything
string_type = type(df.loc[0, 'Content words'])

# check if all elements are of the same type
for index, row in df.iterrows():
    if type(row['Content words']) != string_type:
        print(f"Type {type(row['Content words'])} differs from type {string_type}.")
        break
else:
    print(f"All elements in a column are of type {string_type}.")

All elements in a column are of type <class 'str'>.
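The loop above stops at the first mismatch. As an alternative sketch (toy data, invented for illustration), the types can be counted for the whole column at once, and the labels of the non-string rows can be listed:

```python
import pandas as pd

# Toy column with one non-string value; contents are invented.
toy = pd.DataFrame({"Content words": ["some text", 3.5, "more text"]})

# Map every value to its type name and count occurrences of each type.
type_counts = toy["Content words"].map(lambda v: type(v).__name__).value_counts()
print(type_counts.to_dict())

# Boolean mask of rows whose value is not a string.
not_str = ~toy["Content words"].map(lambda v: isinstance(v, str))
print(list(toy.index[not_str]))
```

The resulting labels could then be passed to `drop` if the inconsistent rows should be removed.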


In [None]:
adjectives = []
for text in list(df['Content words']):
  doc = nlp(text)
  adj = [token.text for token in doc if token.pos_ == 'ADJ']
  adjectives.append(adj)

print('All done!')
print('The number of texts processed is', len(adjectives))
print()
print('The example:')
print(adjectives[0])

In [None]:
adjectives_joint = []
for text in adjectives:
  adj_jnt = ' '.join(text)
  adjectives_joint.append(adj_jnt)

print(adjectives_joint[0])
df['Adjectives only'] = adjectives_joint

large financial robust foreign special appoint military significant allocate foreign bad steal accomplish intend ukrainian corrupt ukrainian necessary average ukrainian similar afghan recent military financial ukraine special ukrainian


## 2.2. Adjectives and adverbs

In [None]:
adjectives_and_adverbs = []
for text in list(df['Content words']):
  doc = nlp(text)
  adj_adv = [token.text for token in doc if token.pos_ in ['ADJ', 'ADV']]
  adjectives_and_adverbs.append(adj_adv)

print('All done!')
print('The number of texts processed is', len(adjectives_and_adverbs))
print()
print('The example:')
print(adjectives_and_adverbs[0])

All done!
The number of texts processed is 6578

The example:
['large', 'financial', 'robust', 'foreign', 'special', 'appoint', 'military', 'significant', 'principally', 'allocate', 'severely', 'foreign', 'bad', 'steal', 'accomplish', 'intend', 'ukrainian', 'corrupt', 'ukrainian', 'necessary', 'average', 'ukrainian', 'similar', 'afghan', 'recent', 'military', 'financial', 'ukraine', 'special', 'ukrainian']


In [None]:
adj_adv_joint = []
for text in adjectives_and_adverbs:
  adj_adv_jnt = ' '.join(text)
  adj_adv_joint.append(adj_adv_jnt)

print(adj_adv_joint[0])
df['Adjectives and adverbs only'] = adj_adv_joint

large financial robust foreign special appoint military significant principally allocate severely foreign bad steal accomplish intend ukrainian corrupt ukrainian necessary average ukrainian similar afghan recent military financial ukraine special ukrainian
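Sections 2.1, 2.2, 2.4 and 2.6 each run `nlp` over every text again with a different part-of-speech filter. A possible alternative, sketched here under assumptions, is to tag each text once and sort the tokens into several buckets in the same pass. `group_by_pos` and the `buckets` mapping are hypothetical names, and the tagged tokens are written by hand so the example runs without a spaCy model:

```python
# Hypothetical helper: groups already-tagged tokens into named buckets in
# one pass. `tagged` is any iterable of (token_text, pos_tag) pairs.
def group_by_pos(tagged, buckets):
    grouped = {name: [] for name in buckets}
    for token_text, pos_tag in tagged:
        for name, tags in buckets.items():
            if pos_tag in tags:
                grouped[name].append(token_text)
    return {name: " ".join(words) for name, words in grouped.items()}

# Bucket names mirror the notebook's column names.
buckets = {
    "Adjectives only": {"ADJ"},
    "Adjectives and adverbs only": {"ADJ", "ADV"},
    "Verbs only": {"VERB"},
    "Nouns only": {"NOUN"},
}

# Hand-tagged example; with spaCy the pairs would come from
# ((token.text, token.pos_) for token in nlp(text)).
tagged = [("large", "ADJ"), ("contributor", "NOUN"),
          ("severely", "ADV"), ("undermine", "VERB")]
features = group_by_pos(tagged, buckets)
print(features["Adjectives and adverbs only"])
```

Since the tagger is usually the slowest step, processing each text once would be expected to cut the runtime of these sections, though this has not been measured here.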


In [None]:
df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words,Label,Adjectives only,Adjectives and adverbs only
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today,washington large financial contributor ukraine...,1,large financial robust foreign special appoint...,large financial robust foreign special appoint...
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today,washington focus counter china tackle taiwan i...,1,ongoing prolong long possible presidential ear...,instead ongoing allegedly prolong long possibl...
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today,moldova able protect join lead military bloc p...,1,able military parliamentary official large pri...,able military parliamentary official previousl...
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today,widening military operation depend weaponry ki...,1,military military potential western main legis...,military military potential directly western m...
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today,demonstrator city genoa milan demand end weapo...,1,italian national participant local medium coll...,italian national nearly participant local medi...
...,...,...,...,...,...,...,...,...,...
6574,Polish MiG-29 Jets for Ukraine Is a Deeply Fla...,2022-03-14 15:19:10,,Successful security assistance begins with an ...,Foreign Policy,successful security assistance begin identify ...,0,successful ukrainian unable russian absent pub...,successful ukrainian unable russian absent pub...
6575,Russia and China Threats Are Not the Same,2022-10-13 15:44:14,,If you would like to receive Situation Report ...,Foreign Policy,like receive situation report inbox thursday s...,0,national national welcome clear sigh minor cle...,national newly national welcome closely clear ...
6576,How Europe Can Slash Fossil Fuels and Frustrat...,2022-03-10 09:39:12,,"Yet, in the first two weeks of Russia’s invasi...",Foreign Policy,week russia invasion ukraine eu likely pay upw...,0,actual essential russian offensive quiet russi...,likely actual essential russian offensive quie...
6577,Moscow Strikes Back at Countries That Cross It,2022-04-18 13:15:20,,"“Mr. President, we are sure that today’s speak...",Foreign Policy,mr president sure today speaker address lot ki...,0,sure contentious future long armed national ou...,sure contentious future instead long armed nat...


## 2.3. Function words and punctuation

In [None]:
# punctuation and function words extraction from raw texts
punctuation = []
function_words = []
function_word_tags = {"DET", "PRON", "ADP", "CCONJ", "AUX"}

for text in df['Texts']:
  doc = nlp(text)
  punct = [token.text for token in doc if token.is_punct]
  punctuation.append(punct)

  fw = [token.text for token in doc if token.pos_ in function_word_tags]
  function_words.append(fw)

# join tokens back into space-separated strings; function words are lowercased
fw_joint_low = [' '.join(words).lower() for words in function_words]
punct_joint = [' '.join(marks) for marks in punctuation]

df['Function words only'] = fw_joint_low
df['Punctuation'] = punct_joint
df

## 2.4. Verbs

In [None]:
verbs = []
for text in list(df['Content words']):
  doc = nlp(text)
  verb = [token.text for token in doc if token.pos_ == 'VERB']
  verbs.append(verb)

print('All done!')
print('The number of texts processed is', len(verbs))
print()
print('The example:')
print(verbs[0])

verbs_joint = []
for text in verbs:
  vrb_jnt = ' '.join(text)
  verbs_joint.append(vrb_jnt)

print(verbs_joint[0])
df['Verbs only'] = verbs_joint

All done!
The number of texts processed is 6578

The example:
['prevent', 'undermine', 'assist', 'oversee', 'extend', 'admit', 'rebuilding', 'predict', 'learn', 'limit', 'gets', 'provide', 'win', 'say', 'explain', 'bind', 'steal', 'add', 'intend', 'cite', 'conclude', 'lose', 'continue', 'announce', 'impose', 'send', 'say', 'call', 'oversee', 'find']
prevent undermine assist oversee extend admit rebuilding predict learn limit gets provide win say explain bind steal add intend cite conclude lose continue announce impose send say call oversee find


In [None]:
# delete the article about Scotland
df = df.drop(6578)

## 2.5. Polarity scores for adjectives and adverbs

In [None]:
# sentiment analysis with textblob
from textblob import TextBlob

polarity_scores = []
for text in df['Adjectives and adverbs only']:
  blob = TextBlob(text)
  polarity = blob.sentiment.polarity
  polarity_scores.append(polarity)
print(len(polarity_scores))
df['Polarity scores for adjectives and adverbs'] = polarity_scores

6578
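Assigning a plain list to a column only works when its length equals the number of rows. A sketch of an alternative is to compute the scores with `Series.apply`, which returns values aligned to the frame's index. The scorer below is a toy stand-in with an invented two-word lexicon so that the example runs without TextBlob; it is not TextBlob's algorithm:

```python
import pandas as pd

# Stand-in scorer so the sketch runs without TextBlob installed; in the
# notebook this would be: lambda text: TextBlob(text).sentiment.polarity
def toy_polarity(text):
    lexicon = {"good": 1.0, "bad": -1.0}  # invented two-word lexicon
    hits = [lexicon[w] for w in text.split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

toy = pd.DataFrame(
    {"Adjectives and adverbs only": ["good bad good", "large foreign", "bad"]},
    index=[0, 2, 5],  # non-contiguous labels, as after dropping rows
)

# apply() returns a Series with the same index, so no length bookkeeping is needed.
toy["Polarity"] = toy["Adjectives and adverbs only"].apply(toy_polarity)

# Sanity check: polarity values are expected to lie in [-1.0, 1.0].
print(toy["Polarity"].between(-1.0, 1.0).all())
```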


## 2.6. Nouns

In [None]:
nouns = []
for text in list(df['Content words']):
  doc = nlp(text)
  noun = [token.text for token in doc if token.pos_ == 'NOUN']
  nouns.append(noun)

print('All done!')
print('The number of texts processed is', len(nouns))
print()
print('The example:')
print(nouns[0])

nouns_joint = []
for text in nouns:
  nn_jnt = ' '.join(text)
  nouns_joint.append(nn_jnt)

print(nouns_joint[0])
df['Nouns only'] = nouns_joint

All done!
The number of texts processed is 6577

The example:
['contributor', 'conflict', 'safeguard', 'place', 'fraud', 'corruption', 'aid', 'package', 'conflict', 'role', 'presence', 'country', 'fund', 'divert', 'failure', 'mistake', 'impact', 'assistance', 'outcome', 'assistance', 'divert', 'way', 'purpose', 'case', 'weapon', 'money', 'wastage', 'element', 'host', 'government', 'government', 'money', 'oversight', 'waste', 'impact', 'aid', 'lead', 'loss', 'support', 'measure', 'experience', 'soldier', 'police', 'officer', 'confidence', 'government', 'hand', 'observation', 'corruption', 'increase', 'aid', 'package', 'month', 'year', 'anniversary', 'launch', 'operation', 'country', 'aid', 'package', 'sanction', 'tariff', 'increase', 'rubber', 'stamp', 'assistance', 'month', 'plan', 'committee', 'track', 'asset', 'exercise', 'inspector', 'appoint', 'aid', 'report', 'month', 'leader', 'fire', 'government', 'official', 'engage', 'bribery', 'corruption']
contributor conflict safeguard plac

In [None]:
df

Unnamed: 0,Headings,Publication dates,Summaries,Texts,Source,Content words,Label,Adjectives only,Adjectives and adverbs only,Function words only,Punctuation,Polarity scores for adjectives and adverbs,Verbs only,Nouns only
0,US' inspector general for Afghanistan warns ag...,2023-02-26 17:54:00,Failure to implement rigid oversight measures ...,Washington has been the largest financial cont...,Russia Today,washington large financial contributor ukraine...,1,large financial robust foreign special appoint...,large financial robust foreign special appoint...,has been the to in its with must be in and fro...,", , ’ ( ) . , ’ , “ ” – “ . ” . “ , – , , , ” ...",-0.031027,prevent undermine assist oversee extend admit ...,contributor conflict safeguard place fraud cor...
1,US presidential hopeful calls for end to Ukrai...,2023-02-26 17:11:00,The US should focus on China and Taiwan instea...,Washington should focus on countering China an...,Russia Today,washington focus counter china tackle taiwan i...,1,ongoing prolong long possible presidential ear...,instead ongoing allegedly prolong long possibl...,should on and the the is being by in the betwe...,", “ ” , , . . , “ ” . . “ : . . : ‘ ’ , ‘ ’ , ...",0.056622,play seek announce want invade think look ukra...,issue conflict hostility entrepreneur candidac...
2,Ukraine’s neighbor speaks out on NATO membership,2023-02-26 15:01:00,The possibility of joining NATO isn’t being ra...,"Moldova must be able to protect itself, but no...",Russia Today,moldova able protect join lead military bloc p...,1,able military parliamentary official large pri...,able military parliamentary official previousl...,must be itself but by the its the of is being ...,", - , , . “ ” . “ , , , ” . , , “ ” . “ , ” , ...",0.085969,say join consider say hint speak want tell rep...,protect join lead speaker possibility speaker ...
3,Defense minister explains conditions for Russi...,2023-02-26 14:26:00,Potential Russian military advances in Ukraine...,A widening of the military operation depends o...,Russia Today,widening military operation depend weaponry ki...,1,military military potential western main legis...,military military potential directly western m...,a of the on the from the has has the for a of ...,", , . “ , ” ‘ . . . ’ , , . “ , ” . “ , . . ” ...",0.004444,widening depend get say widening explain tie d...,operation outline condition advance arm delive...
4,Thousands rally for peace in Italy,2023-02-26 14:25:00,Several thousands have taken to the streets of...,Demonstrators in the cities of Genoa and Milan...,Russia Today,demonstrator city genoa milan demand end weapo...,1,italian national participant local medium coll...,italian national nearly participant local medi...,in the of and are an to to up for in the of an...,". - , , . , . ( ) , “ , . ” “ ” “ , - . ” , . ...",0.103306,end turn left breach send draw organize take s...,weapon supply kiev people peace demonstration ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6573,Moldova Feels the Shock Waves of Putin’s War,2022-04-26 13:40:09,,In light of the struggles that have beset the ...,Foreign Policy,light struggle beset russian military campaign...,0,light russian military far skeptical ukrainian...,light russian military far skeptical west ukra...,in of the that have the in are can along the t...,", - , . , . , , , - - — — “ - . ” , , , . , , ...",0.026159,beset push say renew speak send seek take spur...,struggle campaign analyst concern goal create ...
6574,Polish MiG-29 Jets for Ukraine Is a Deeply Fla...,2022-03-14 15:19:10,,Successful security assistance begins with an ...,Foreign Policy,successful security assistance begin identify ...,0,successful ukrainian unable russian absent pub...,successful ukrainian unable russian absent pub...,with an what is the are or are they for their ...,": ? , - ? ? , . , . , . - . , - . : “ . . , . ...",0.053228,begin identify achieve intercept risk fight fl...,security assistance requirement battlefield re...
6575,Russia and China Threats Are Not the Same,2022-10-13 15:44:14,,If you would like to receive Situation Report ...,Foreign Policy,like receive situation report inbox thursday s...,0,national national welcome clear sigh minor cle...,national newly national welcome closely clear ...,you would in your every up and to what on for ...,", . , . : , , . ! , . . , , ( ) , . ! , . . , ...",0.059497,receive release offer condemn grab follow anti...,situation report tap day diving security strat...
6576,How Europe Can Slash Fossil Fuels and Frustrat...,2022-03-10 09:39:12,,"Yet, in the first two weeks of Russia’s invasi...",Foreign Policy,week russia invasion ukraine eu likely pay upw...,0,actual essential russian offensive quiet russi...,likely actual essential russian offensive quie...,in the of of the upwards of to for what is in ...,", , . . , . . . - , , . , . , , , , , , , , . ...",0.069159,pay translate come undermine carry move accele...,week fossil fuel response mechanism action red...


In [None]:
df.to_csv('final_df.csv', index=False)