In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_colwidth', None)  # None = never truncate; -1 is deprecated since pandas 1.0

## Preparing the data in NER-tag format

The original input data has the following format:
* Each input article is given in a separate text file (with the article id in the file name), for both the train and dev sets.
* A label file with the same article id holds the annotations of the propaganda spans:
    * article_id propaganda_type start_span end_span

    * For example, article id 1111 is stored in article_1111.txt and its labels in article_1111_TC.txt
         * article_1111.txt : Trump the white president say he likes black people working from him .
         * article_1111_TC.txt : 
             * 1111 <propaganda type> 11 88

    
From this input we write one file per article id in which the span text is wrapped in start-of-span and end-of-span markers. This is done with the scripts provided by the competition organizers. For example:

 * the output after processing looks like 
    
    article_1111.txt : Trump the <span-7 white president say he likes black people working from him . 7-/span> 
    
    The id 7 identifies the type of propaganda; each propaganda type is assigned its own id.
    
From the above output, the following preprocessing turns each article into separate sentences, each with NER-like labels.
    
Example 
    
      file no : article_1111.txt
      
      Trump O
      the O
      white I 
      president I
      say I
      he I
      likes I
      black I
      people I
      working I
      from I
      him I
      . I
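
The O/I labelling illustrated above can be sketched as follows. This is only a minimal illustration, not the notebook's actual labelling code: it assumes whitespace tokenisation and the `<span-N ... N-/span>` marker shape seen in the tagged articles, and the name `tag_tokens` is hypothetical.

```python
import re

# Marker shapes as they appear in the tagged articles:
# "<span-7" opens a span and "7-/span>" closes it.
MARKER = re.compile(r"(<span-\d+|\d+-/span>)")

def tag_tokens(tagged_text):
    """Return (token, label) pairs: "I" inside any span, "O" outside."""
    # Pad markers with spaces so they always split off as their own tokens,
    # even when glued to a word as in "8-/span>of".
    padded = MARKER.sub(r" \1 ", tagged_text)
    depth = 0  # number of currently open spans (spans can be nested)
    pairs = []
    for token in padded.split():
        if MARKER.fullmatch(token):
            if token.startswith("<span-"):
                depth += 1
            else:
                depth = max(depth - 1, 0)
        else:
            pairs.append((token, "I" if depth > 0 else "O"))
    return pairs

for token, label in tag_tokens("Trump the <span-7 white president 7-/span> ."):
    print(token, label)
```

A depth counter is used instead of a boolean because the data contains nested spans, for example a `<span-9 ...>` inside a `<span-4 ...>`.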


In [2]:
import pandas as pd
import glob
import swifter
path = r'/data/semeval-2020/task-11/datasets/train-tagged_article/'
all_files = glob.glob(path + "/*.txt")

content_tuple = []

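for filename in all_files:
    # Read the whole article and flatten it onto a single line;
    # the context manager closes the file handle after reading.
    with open(filename) as file:
        content = file.read().replace("\n", " ")
    # Keep only the id found between "article" and ".txt" in the file name
    article_id = filename.split("article")[-1].split(".txt")[0]
    content_tuple.append((article_id,content))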
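
The id extraction in the loop above relies on string splitting, which silently returns odd values for unexpected file names. A stricter variant could look like the sketch below. The helper name `article_id_from_path` is hypothetical, and the optional underscore is an assumption made so that both `article1111.txt` and `article_1111.txt` style names are accepted.

```python
import re
from pathlib import Path

def article_id_from_path(filename):
    """Return the digits between "article" and ".txt", or None if the name does not match."""
    match = re.fullmatch(r"article_?(\d+)\.txt", Path(filename).name)
    return match.group(1) if match else None

print(article_id_from_path("/some/dir/article762956953.txt"))
```

Returning `None` for non-matching names makes it easy to skip or report stray files instead of storing a malformed id.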

In [3]:
content_df = pd.DataFrame(content_tuple,columns=["article_id","content"])

In [4]:
content_df.head(3)

Unnamed: 0,article_id,content
0,762956953,"Iran Admits To Aiding Al-Qaeda and Facilitating 9/11 Jihad Terror Attacks This has long been known, although the mainstream media dismissed it as a conspiracy theory. But now we have definitive confirmation. It was Iran Bush should have invaded after 9/11 , not Iraq. Now consider this: even though, as President of the United States, Barack Obama had access to information that the general public does not have, and certainly knew of Iran’s involvement in the 9/11 attacks, Obama still pursued the Iran nuclear deal and gave billions to the Islamic Republic. The Iran nuclear deal should never have proceeded — President Obama, <span-6 the worst president in American history 6-/span> . “Iran Admits To Facilitating 9/11 Terror Attacks,” by Adam Kredo, Washington Free Beacon, June 8, 2018: Iranian officials, in a first, have admitted to facilitating the 9/11 terrorist attacks in the U.S. by secretly aiding the free travel of al Qaeda operatives who eventually went on to fly commercial airliners into the Twin Towers in New York City, according to new remarks from a senior Iranian official. Mohammad-Javad Larijani, an international affairs assistant in the Iran’s judiciary, disclosed in Farsi-language remarks broadcast on Iran’s state-controlled television that Iranian intelligence officials secretly helped provide the al Qaeda attackers with passage and gave them refuge in the Islamic Republic, according to an English translation published by Al Arabiya. “Our government agreed not to stamp the passports of some of them because they were on transit flights for two hours, and they were resuming their flights without having their passports stamped. However their movements were under the complete supervision of the Iranian intelligence,” Larijani was quoted as saying. The remarks represent the first time senior Iranian officials have publicly admitted to aiding al Qaeda and playing a direct role in facilitating the 9/11 attacks. The U.S. 
government has long accused Iran of playing a role in the attacks and even fined the Islamic Republic billions as a result. The U.S. 9/11 Commission assembled to investigate the attacks concluded that Iran played a role in facilitating the al Qaeda terrorists. Larijani admitted that Iranian officials did not stamp the passports of the al Qaeda militants in order to obfuscate their movements and prevent detection by foreign governments. Al Qaeda operative also were given safe refuge in Iran…. Article posted with permission from Pamela Geller"
1,787529309,"The Last-Minute <span-11 Character Assassination 11-/span>of Judge Kavanaugh Using any <span-8 despicable tactic 8-/span>at hand to derail Judge Brett Kavanaugh’s Supreme Court confirmation less than a week before the Senate Judiciary Committee is scheduled to vote on whether to approve his nomination, Senate Democrats <span-8 have sunk to their lowest level 8-/span>of <span-11 character assassination 11-/span>yet. They have resorted to peddling an allegation of sexual misconduct against Judge Kavanaugh that supposedly occurred while the judge was in high school. The accuser had refused to identify herself before and during the Senate Judiciary Committee hearings. She conveniently waited until this Sunday to come forward via an on-the-record interview with the Washington Post. The accuser’s name is Christine Blasey Ford, a registered Democrat who is currently a California professor teaching clinical psychology. Judge Kavanaugh issued a statement on Friday in which he said, ""I <span-8 categorically and unequivocally deny 8-/span>this allegation. I did not do this back in high school or at any time."" Senator Dianne Feinstein (D-Calif.), the ranking Democrat on the Senate Judiciary Committee that heard Judge Kavanaugh’s public testimony earlier this month during his Supreme Court confirmation hearing, had received last July a copy of a letter written by the woman making the charge, who we now know was Ms. Ford. Even though Senator Feinstein had the letter in hand, she never brought up the charge during the public hearing, nor during her own meeting with the judge. Instead, Senator Feinstein sat on the letter until late last week, when she issued a cryptic release stating that she had received the letter but did not want to give more details in deference to the woman’s wish to keep the matter confidential. Senator Feinstein turned the letter over to the FBI. 
The FBI placed the letter in its background file on Judge Kavanaugh but decided not to pursue any further investigation. Senator Feinstein had initially resisted sharing the contents of the letter with her fellow Democrat members of the Senate Judiciary Committee or to go public with its existence because “the incident was too distant in the past to merit public discussion” and she had already “taken care of it,” according to a source quoted by The New Yorker. Nevertheless, Senator Feinstein evidently bowed to pressure from her <span-9 leftist colleagues 9-/span>to find a way to insert the allegation into the <span-8 cesspool of public gossip 8-/span>at the eleventh hour. The New Yorker article, written before Ms. Ford publicly identified herself, provided some details regarding her allegation. However, now Ms. Ford has decided to do what she called her “civic responsibility” and tell her own story publicly. <span-8 How convenient 8-/span>, coming just 4 days before the scheduled Senate Judiciary Committee vote! The whole sequence of events surrounding how this allegation has suddenly come to light <span-8 reeks 8-/span>of a set-up, reminiscent of how Anita Hill surfaced in a last-minute attempt to derail Justice Clarence Thomas's Supreme Court confirmation. Christine Blasey Ford claims, according to the Washington Post article, that “one summer in the early 1980s, Kavanaugh and a friend — both ‘<span-8 stumbling drunk 8-/span>,’ Ford alleges — corralled her into a bedroom during a gathering of teenagers at a house in Montgomery County. While his friend watched, she said, Kavanaugh pinned her to a bed on her back and groped her over her clothes, grinding his body against hers and clumsily attempting to pull off her one-piece bathing suit and the clothing she wore over it. When she tried to scream, she said, he put his hand over her mouth. ‘I thought he might inadvertently kill me,’ said Ford. ‘He was trying to attack me and remove my clothing.’” Ms. 
Ford said she was able to escape the room and go home without any apparent further incident after “Kavanaugh’s friend and classmate at Georgetown Preparatory School, Mark Judge, jumped on top of them, sending all three tumbling.” Here is where Ms. Ford’s story becomes quite murky and begins to fall apart. Although Ms. Ford believes the alleged incident occurred during the summer of 1982, she “said she does not remember some key details of the incident,” according to the Washington Post article. For example, Ms. Ford “said she does not remember how the gathering came together the night of the incident.” She also does not remember how she got home. Yet she claims to be absolutely certain that Kavanaugh, whom she presumably knew only as an acquaintance and said she had not spoken to since the night the incident allegedly occurred, was involved in the alleged incident. Ms. Ford admitted that she “told no one at the time what had happened to her.” In fact, she said she recalled thinking: “I’m not ever telling anyone this. This is nothing, it didn’t happen, and he didn’t rape me.” Even if one explains this behavior as the natural reaction of a frightened teenager to a highly traumatic incident, that does not explain why, by her own admission, she “told no one of the incident in any detail until 2012, when she was in couples therapy with her husband,” according to the Washington Post article. Most revealingly, the article reported on a <span-8 gaping hole in the therapist’s notes 8-/span>, portions of which were provided by Ms. Ford for the Washington Post’s review. The therapist’s notes “do not mention Kavanaugh’s name but say she reported that she was attacked by students ‘from an <span-9 elitist boys’ school 9-/span>’ who went on to become ‘highly respected and high-ranking members of society in Washington.’” In other words, the only written documentation Ms. 
Ford has offered in support of her allegation about the incident she said took place while she was in high school – a therapist’s notes of a couples therapy session occurring 30 years after the alleged incident – did not mention Judge Kavanaugh’s name. Judge Kavanaugh has had extensive background checks performed on him in the past for his various federal government positions, including for his current position as a federal appellate court judge, without the accusation ever having surfaced. <span-5 Ms. Ford may believe her story to be true, but the lack of any credible corroborating evidence, her partial memory of details surrounding the alleged incident, and the absence of any pattern of such sexual misconduct by Judge Kavanaugh undercut the reliability of her version of the incident 5-/span>. In a letter addressed to Senate Judiciary Committee Chairman Charles Grassley (R-Iowa) and Senator Feinstein, 65 women who said they knew Judge Kavanaugh in high school vouched for his character: We are women who have known Brett Kavanaugh for more than 35 years and knew him while he attended high school between 1979 and 1983. <span-0 For the entire time we have known Brett Kavanaugh, he has behaved honorably and treated women with respect 0-/span>… Brett attended Georgetown Prep, an all-boys high school in Rockville, Maryland. He was an outstanding student and athlete with a wide circle of friends. Almost all of us attended all-girls high schools in the area. We knew Brett well through social events, sports, church, and various other activities. Many of us have remained close friends with him and his family over the years. Through the more than 35 years we have known him, Brett has stood out for his friendship, character, and integrity. In particular, he has always treated women with decency and respect. That was true when he was in high school, and it has remained true to this day. The signers of this letter hold a broad range of political views. 
Many of us are not lawyers, but we know Brett Kavanaugh as a person. And he has always been a good person. Nevertheless, using <span-8 their standard contemptible, obstructionist tactics 8-/span>, the Democrats opposed to Judge Kavanaugh happily seized on the unsubstantiated allegation of teen sexual misbehavior in high school to <span-11 assassinate Judge Kavanaugh’s character 11-/span>. They have done so in the face of Judge Kavanaugh’s lifetime record of stellar public service, multiple background checks producing no evidence of sexual misconduct, and the letter written by the 65 women, who knew him when he was in high school and thereafter and who signed their names to a ringing endorsement of his good character. Feminists <span-4 gave the <span-9 serial sexual predator 9-/span>Bill Clinton a free pass because his policies were in line with their ideology 4-/span>. Senator Feinstein called <span-9 Ted “Chappaquiddick” Kennedy 9-/span>an ""inspiration and a friend,"" presumably also based on their compatible ideologies. Hypocritically exploiting an unsubstantiated allegation of decades-old purported teenage sexual mischief, Democrats <span-8 seeking to torpedo 8-/span>Judge Kavanaugh’s Supreme Court confirmation for ideological reasons have debased themselves with a <span-8 shameless smear campaign 8-/span>against an eminently qualified candidate for the Supreme Court. Predictably, Senate Minority Leader Charles Schumer (D-N.Y.) and other Democrats, including Senator Feinstein, have called for the Senate to postpone a vote on Judge Kavanaugh. ""Senator Grassley must postpone the vote until, at a very minimum, these serious and credible allegations are thoroughly investigated,” Senator Schumer said. If a thorough investigation was considered to be so important, <span-5 why didn't Senator Feinstein set the ball rolling back in July when she first received word of the allegation 5-/span>? 
The answer is that <span-4 this is all a ruse to block Judge Kavanaugh's confirmation by all means necessary 4-/span>. Senator Schumer is fulfilling his promise to oppose Judge Kavanaugh with ""everything I've got."" As of now, the Senate Judiciary Committee Republican majority plans to move forward with Judge Kavanaugh's nomination as scheduled. It is time for the Democrat obstructionists to <span-8 slink back into their shadowy corner 8-/span>."
2,999001296,"Altered Election Documents Tied To Florida Democrats Reviewed By Federal Prosecutors <span-8 It is high time 8-/span>that this begin. The Democrats are committing voter fraud <span-6 on a <span-8 massive 8-/span>scale 6-/span>, and not only in Florida, but in Arizona, Georgia, California and elsewhere. In Arizona, it is already over, and <span-9 a pro-jihad Marxist Democrat 9-/span>who lost the election will be going to the Senate. In Florida, <span-7 the Democrats are likewise working feverishly to overturn the will of the people 7-/span>. <span-6 They are <span-8 insane 8-/span>in their lust for power, and will destroy even our democratic system to get it 6-/span>. “Federal prosecutors reviewing altered election documents tied to Florida Democrats,” by Matt Dixon, Politico, November 14, 2018: TALLAHASSEE — The Florida Department of State last week asked federal prosecutors to investigate dates that were changed on official state election documents, the first voting “irregularities” it has flagged in the wake of the 2018 elections. take our poll - story continues below Should Jim Acosta have gotten his press pass back? Should Jim Acosta have gotten his press pass back? Should Jim Acosta have gotten his press pass back? * Yes, he should have gotten it back. No, you can't act like a child and keep your pass. Maybe? I'm not sure if he should have. Email * Name This field is for validation purposes and should be left unchanged. Completing this poll grants you access to Freedom Outpost updates free of charge. You may opt out at anytime. You also agree to this site's Privacy Policy and Terms of Use. The concerns, which the department says can be tied to the Florida Democratic Party, center around date changes on forms used to fix vote-by-mail ballots sent with incorrect or missing information. Known as “cure affidavits,” those documents used to fix mail ballots were due no later than 5 p.m. on Nov. 5 — the day before the election. 
But affidavits released on Tuesday by the DOS show that documents from four different counties said the ballots could be returned by 5 p.m. on Thursday, which is not accurate. Audio of a Florida Democratic Party caller leaving a voicemail message asking a Palm Beach County voter to fix their vote by mail ballot after Election Day, which is not allowed, was also sent to POLITICO separately. It was not part of the information turned over to federal prosecutors. Among the counties in question is Broward, which emerged as the epicenter of controversy as three statewide races and three local legislative races went into recounts following the Nov. 6 elections. <span-5 Republicans have pointed to <span-9 embattled Broward Elections chief Brenda Snipes9-/span>’ record of <span-8 past election gaffes 8-/span>in arguing that the largely Democratic country is tilted against them — perhaps fraudulently so 5-/span>. DOS officials have repeatedly told the media that the monitors they sent to Broward County saw no election fraud. It wasn’t until Tuesday that the office revealed publicly that it had turned over information to federal prosecutors. <span-0 The information was sent on Nov. 9 by Bradley McVay, DOS’ interim general counsel, who asked that the altered dates be investigated. “Altering a form in a manner that provides the incorrect date for a voter to cure a defect … imposes a burden on the voter significant enough to frustrate the voter’s ability to vote,” McVay wrote in a letter that was sent Nov. 9 and released publicly on Tuesday 0-/span>. The letter was sent to U.S. Attorneys Christopher P. Canova of the Northern District of Florida, Maria Chapa Lopez of the Middle District of Florida and Ariana Fajardo Orshan in the Southern District of Florida. The records released by DOS, which is part of Gov. Rick Scott’s administration, <span-8 point the finger 8-/span>at the Florida Democratic Party. 
Political parties can get daily lists of people who had their mail-in ballots rejected. Political parties — or anyone else — can also get the publicly available cure affidavits and send them to voters who had a mail-in ballot rejected to encourage them to fix the ballots. In an email chain released as part of the Department of State’s Tuesday document dump, Citrus County Supervisor of Elections Susan Gill last week told DOS officials that a voter who received one of the cure affidavits with the wrong date had also received a call from a number identified as the Tallahassee office of the Florida Democratic Party, an indication the party was reaching out about her vote by mail ballot. “When I called it, it is the Democratic Party of Florida,” she said in a Nov. 8 email to DOS officials. She went on to write that she thinks the incorrect date was used because whoever sent the cure affidavit mixed up the deadline for cure affidavits with the deadline for provisional ballots. But, she said, “<span-13 a bigger problem is the fact they actually changed one of the DOE forms 13-/span>.”…"


In [5]:
from spacy.tokenizer import Tokenizer
from spacy.lang.en import English
nlp = English()
# Rule-based sentence splitter (spaCy 2.x API; on spaCy 3.x use nlp.add_pipe("sentencizer"))
nlp.add_pipe(nlp.create_pipe('sentencizer'))
tokenizer = Tokenizer(nlp.vocab)

content_df[content_df["article_id"] == "735855251"].content.iloc[0]  # positional access; the row label depends on glob order

'ICE arrests 20 in Kansas City during 4-day operation targeting criminal aliens  KANSAS CITY, Mo. — Federal officers with U.S. Immigration and Customs Enforcement’s (ICE) Enforcement and Removal Operations (ERO) arrested 20 criminal aliens and immigration violators in the Kansas City metro area during a four-day enforcement operation, which ended Thursday. During this operation, ERO deportation officers made arrests in the following Missouri cities: St. Joseph (6), Belton (1), Blue Springs (1), Independence (2) and Kansas City (6). ICE officers also made arrests in the Kansas cities of Olathe (3) and Lawrence (1). Fifteen men and five women, ages ranging 18-61, were arrested. Aliens arrested during this operation are from the following countries: Brazil (1), El Salvador (3), Guatemala (6), Honduras (1), Mexico (7), Romania (1) and Sierra Leone (1). Several of the aliens targeted by ERO deportation officers during this operation had prior criminal histories that included driving under t

In [6]:
for i in nlp(content_df[content_df["article_id"] == "735855251"].content.iloc[0]).sents:
    print(i)
    print("--")

ICE arrests 20 in Kansas City during 4-day operation targeting criminal aliens  KANSAS CITY, Mo. — Federal officers with U.S. Immigration and Customs Enforcement’s (ICE) Enforcement and Removal Operations (ERO) arrested 20 criminal aliens and immigration violators in the Kansas City metro area during a four-day enforcement operation, which ended Thursday.
--
During this operation, ERO deportation officers made arrests in the following Missouri cities: St. Joseph (6), Belton (1), Blue Springs (1), Independence (2) and Kansas City (6).
--
ICE officers also made arrests in the Kansas cities of Olathe (3) and Lawrence (1).
--
Fifteen men and five women, ages ranging 18-61, were arrested.
--
Aliens arrested during this operation are from the following countries: Brazil (1), El Salvador (3), Guatemala (6), Honduras (1), Mexico (7), Romania (1) and Sierra Leone (1).
--
Several of the aliens targeted by ERO deportation officers during this operation had prior criminal histories that included d

In [7]:
def split_into_sentences(row):
    sentences = []
    for ix,sent in enumerate(nlp(row.content).sents):
        sentences.append((row.article_id,ix,sent.text))
    return sentences    

In [8]:
sentences_dataset = content_df.apply(split_into_sentences,axis=1).values.tolist()

In [9]:
len(sentences_dataset)

371

In [10]:
sentences_dataset = [j for i in sentences_dataset for j in i]
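
The nested comprehension above flattens the per-article lists into a single list of `(article_id, sentence_id, sentence)` tuples. An equivalent formulation with `itertools.chain.from_iterable` is sketched below on toy data; the values in `nested` are made up for illustration.

```python
from itertools import chain

# Toy stand-in for the per-article lists produced by split_into_sentences
nested = [
    [("a1", 0, "First sentence."), ("a1", 1, "Second sentence.")],
    [("a2", 0, "Only sentence.")],
]

# Concatenate the inner lists in order, without an explicit double loop
flat = list(chain.from_iterable(nested))
print(len(flat))
```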

In [11]:
len(sentences_dataset)

15565

In [12]:
sentences_dataset[:10]

[('762956953',
  0,
  'Iran Admits To Aiding Al-Qaeda and Facilitating 9/11 Jihad Terror Attacks  This has long been known, although the mainstream media dismissed it as a conspiracy theory.'),
 ('762956953', 1, 'But now we have definitive confirmation.'),
 ('762956953',
  2,
  'It was Iran Bush should have invaded after 9/11 , not Iraq.'),
 ('762956953',
  3,
  'Now consider this: even though, as President of the United States, Barack Obama had access to information that the general public does not have, and certainly knew of Iran’s involvement in the 9/11 attacks, Obama still pursued the Iran nuclear deal and gave billions to the Islamic Republic.'),
 ('762956953',
  4,
  'The Iran nuclear deal should never have proceeded — President Obama, <span-6 the worst president in American history 6-/span> . “'),
 ('762956953',
  5,
  'Iran Admits To Facilitating 9/11 Terror Attacks,” by Adam Kredo, Washington Free Beacon, June 8, 2018: Iranian officials, in a first, have admitted to facilitat

In [13]:
sentences_df = pd.DataFrame(sentences_dataset,columns=["article_id","sentence_id","sentence"])

In [14]:
sentences_df.head(4)

Unnamed: 0,article_id,sentence_id,sentence
0,762956953,0,"Iran Admits To Aiding Al-Qaeda and Facilitating 9/11 Jihad Terror Attacks This has long been known, although the mainstream media dismissed it as a conspiracy theory."
1,762956953,1,But now we have definitive confirmation.
2,762956953,2,"It was Iran Bush should have invaded after 9/11 , not Iraq."
3,762956953,3,"Now consider this: even though, as President of the United States, Barack Obama had access to information that the general public does not have, and certainly knew of Iran’s involvement in the 9/11 attacks, Obama still pursued the Iran nuclear deal and gave billions to the Islamic Republic."


In [15]:
sentences_df[sentences_df["article_id"] == "735855251"]

Unnamed: 0,article_id,sentence_id,sentence
1049,735855251,0,"ICE arrests 20 in Kansas City during 4-day operation targeting criminal aliens KANSAS CITY, Mo. — Federal officers with U.S. Immigration and Customs Enforcement’s (ICE) Enforcement and Removal Operations (ERO) arrested 20 criminal aliens and immigration violators in the Kansas City metro area during a four-day enforcement operation, which ended Thursday."
1050,735855251,1,"During this operation, ERO deportation officers made arrests in the following Missouri cities: St. Joseph (6), Belton (1), Blue Springs (1), Independence (2) and Kansas City (6)."
1051,735855251,2,ICE officers also made arrests in the Kansas cities of Olathe (3) and Lawrence (1).
1052,735855251,3,"Fifteen men and five women, ages ranging 18-61, were arrested."
1053,735855251,4,"Aliens arrested during this operation are from the following countries: Brazil (1), El Salvador (3), Guatemala (6), Honduras (1), Mexico (7), Romania (1) and Sierra Leone (1)."
1054,735855251,5,"Several of the aliens targeted by ERO deportation officers during this operation had prior criminal histories that included driving under the influence, child neglect, child abuse, drug offenses, fraud and larceny."
1055,735855251,6,"Four of these were arrested for illegally re-entering the United States after having been deported, which is a felony."
1056,735855251,7,Two overstayed lawful visits to the U.S. All were amenable to arrest and removal under the U.S. Immigration and Nationality Act.
1057,735855251,8,"The following are criminal summaries of some of the offenders arrested in the Kansas City area during this operation: A 55-year-old, Mexican citizen who overstayed a lawful visit to the U.S. by more than 12 years."
1058,735855251,9,"She was arrested Feb. 26, 2018 in Johnson County Kansas."


In [16]:
"Iran Admits To Aiding Al-Qaeda and Facilitating 9/11 Jihad Terror Attacks This has long been known, although the mainstream media dismissed it as a conspiracy theory.".find("span")

-1

In [17]:
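def has_propaganda(sentence):
    # Flag a sentence that contains an opening or a closing span marker.
    # `in` is used instead of `find(...) > 0`, which would miss a match at index 0.
    return "-/span" in sentence or "span-" in sentence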
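
`has_propaganda` inspects each sentence in isolation. If an annotated span ever stretches over three or more sentences, the middle sentences contain no marker and would be flagged `False`. The sketch below shows one possible way to carry the open-span state across the ordered sentences of a single article. It is an assumption about how this could be handled, not part of the original pipeline, and `flag_sentences` is a hypothetical name.

```python
import re

def flag_sentences(sentences):
    """Flag each sentence of ONE article (in order) that lies inside or touches a span."""
    depth = 0  # spans opened in earlier sentences and not yet closed
    flags = []
    for sentence in sentences:
        opens = len(re.findall(r"<span-\d+", sentence))
        closes = len(re.findall(r"\d+-/span>", sentence))
        # Inside a span carried over from before, or a marker occurs here
        flags.append(depth > 0 or opens > 0 or closes > 0)
        depth = max(depth + opens - closes, 0)
    return flags

print(flag_sentences(["A <span-1 b.", "c d.", "e 1-/span> f.", "g."]))
```

With a DataFrame, this would have to be applied per `article_id` group, because the counter must be reset at every article boundary.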

In [18]:
sentences_df["has_propaganda"] = sentences_df["sentence"].apply(has_propaganda)

In [19]:
sentences_df.head(4)

Unnamed: 0,article_id,sentence_id,sentence,has_propaganda
0,762956953,0,"Iran Admits To Aiding Al-Qaeda and Facilitating 9/11 Jihad Terror Attacks This has long been known, although the mainstream media dismissed it as a conspiracy theory.",False
1,762956953,1,But now we have definitive confirmation.,False
2,762956953,2,"It was Iran Bush should have invaded after 9/11 , not Iraq.",False
3,762956953,3,"Now consider this: even though, as President of the United States, Barack Obama had access to information that the general public does not have, and certainly knew of Iran’s involvement in the 9/11 attacks, Obama still pursued the Iran nuclear deal and gave billions to the Islamic Republic.",False


In [20]:
sentences_df.has_propaganda.value_counts()

False    10608
True     4957 
Name: has_propaganda, dtype: int64

In [21]:
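import re
def cleaned_sentence(sentence):
    # Strip closing markers such as "6-/span>" first, then opening markers such as "<span-6".
    # Raw strings avoid invalid-escape warnings for "\d".
    return re.sub(r"<span-\d+", "", re.sub(r"\d+-/span>", "", sentence))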
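
Removing the markers leaves the surrounding whitespace in place, so a cleaned sentence can contain doubled spaces, and a sentence made of a marker plus a space would not compare equal to `""`. A variant that also normalises whitespace is sketched below; `clean_and_normalise` is a hypothetical name and this is an optional refinement, not what the notebook does.

```python
import re

def clean_and_normalise(sentence):
    """Remove span markers, then collapse whitespace runs and trim the ends."""
    no_markers = re.sub(r"<span-\d+|\d+-/span>", "", sentence)
    return re.sub(r"\s+", " ", no_markers).strip()

print(clean_and_normalise("President Obama, <span-6 the worst president 6-/span> ."))
```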

In [22]:
sentences_df["cleaned_sentence"] = sentences_df["sentence"].apply(cleaned_sentence)

In [23]:
sentences_df.head()

Unnamed: 0,article_id,sentence_id,sentence,has_propaganda,cleaned_sentence
0,762956953,0,"Iran Admits To Aiding Al-Qaeda and Facilitating 9/11 Jihad Terror Attacks This has long been known, although the mainstream media dismissed it as a conspiracy theory.",False,"Iran Admits To Aiding Al-Qaeda and Facilitating 9/11 Jihad Terror Attacks This has long been known, although the mainstream media dismissed it as a conspiracy theory."
1,762956953,1,But now we have definitive confirmation.,False,But now we have definitive confirmation.
2,762956953,2,"It was Iran Bush should have invaded after 9/11 , not Iraq.",False,"It was Iran Bush should have invaded after 9/11 , not Iraq."
3,762956953,3,"Now consider this: even though, as President of the United States, Barack Obama had access to information that the general public does not have, and certainly knew of Iran’s involvement in the 9/11 attacks, Obama still pursued the Iran nuclear deal and gave billions to the Islamic Republic.",False,"Now consider this: even though, as President of the United States, Barack Obama had access to information that the general public does not have, and certainly knew of Iran’s involvement in the 9/11 attacks, Obama still pursued the Iran nuclear deal and gave billions to the Islamic Republic."
4,762956953,4,"The Iran nuclear deal should never have proceeded — President Obama, <span-6 the worst president in American history 6-/span> . “",True,"The Iran nuclear deal should never have proceeded — President Obama, the worst president in American history . “"


In [24]:
len(sentences_df)

15565

In [25]:
sentences_df[sentences_df["cleaned_sentence"] == ""]

Unnamed: 0,article_id,sentence_id,sentence,has_propaganda,cleaned_sentence
1068,735855251,19,6-/span>,True,
1656,697959084,19,5-/span>,True,
1802,999000147,24,5-/span>,True,
7750,721890296,43,8-/span>,True,
10547,705035735,21,8-/span>,True,


In [26]:
sentences_df = sentences_df[sentences_df["cleaned_sentence"] != ""]

In [27]:
sentences_df.to_csv("/data/semeval-2020/task-11/processed/sentence_dataset_v2.csv",index=False)