### News
Combine the news.tsv files from the training and validation splits into a single CSV file that serves as one reference for all news data.

#### Import Libraries and Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
# read the training and validation news.tsv files (they have no header row, so supply the column names)
columns = ['news_id', 'category', 'subcategory', 'title', 'abstract', 'url', 'title_entities', 'abstract_entities']
train_news = pd.read_csv("./train/news.tsv",
                         sep='\t',
                         names=columns)
val_news = pd.read_csv("./val/news.tsv",
                       sep='\t',
                       names=columns)
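The read step above can be tried on a tiny in-memory sample before pointing it at the real files. This is a minimal sketch that assumes the same headerless, tab-separated eight-column layout; the two sample rows are made up. It also passes `quoting=csv.QUOTE_NONE`, an assumption rather than something the notebook does: it keeps double quotes as ordinary characters, so a title that happens to start with a quote is read verbatim instead of as a quoted field.

```python
import csv
import io

import pandas as pd

# Column layout used in the notebook; the TSV files have no header row.
COLUMNS = ['news_id', 'category', 'subcategory', 'title', 'abstract',
           'url', 'title_entities', 'abstract_entities']

# Two hypothetical rows in the same tab-separated layout (not real data).
SAMPLE_TSV = (
    'N1\tnews\tnewsworld\t"Quoted" headline\tAn abstract.\thttps://example.com/1\t[]\t[]\n'
    'N2\thealth\tmedical\tPlain headline\t\thttps://example.com/2\t[]\t[]\n'
)


def read_news(source) -> pd.DataFrame:
    """Read a headerless news.tsv-style file into a DataFrame."""
    return pd.read_csv(
        source,
        sep='\t',
        names=COLUMNS,
        header=None,
        # Treat double quotes as literal characters instead of field delimiters.
        quoting=csv.QUOTE_NONE,
    )


sample = read_news(io.StringIO(SAMPLE_TSV))
print(sample[['news_id', 'title', 'abstract']])
```

The empty abstract in the second row comes back as `NaN`, which matches the blank abstract visible in the `head()` output further down.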

#### Combine news.tsv Files

In [4]:
# concatenate the news files together
news = pd.concat([train_news, val_news], ignore_index=True)

In [5]:
# check for duplicates
news.duplicated().sum()

np.int64(69399)
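Articles that appear in both splits become exact duplicate rows once the frames are concatenated, which is what `duplicated()` counts. A toy sketch, with two made-up "splits" that share one article, shows the count and the effect of `drop_duplicates()`:

```python
import pandas as pd

# Two hypothetical splits that share one article (N2).
train = pd.DataFrame({'news_id': ['N1', 'N2'], 'title': ['A', 'B']})
val = pd.DataFrame({'news_id': ['N2', 'N3'], 'title': ['B', 'C']})

combined = pd.concat([train, val], ignore_index=True)

# duplicated() flags every fully identical row after its first occurrence.
n_dupes = combined.duplicated().sum()
deduped = combined.drop_duplicates()
print(n_dupes, deduped.shape)  # 1 (3, 2)
```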

In [6]:
# remove duplicates
news = news.drop_duplicates()

In [7]:
news.shape

(104151, 8)
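`drop_duplicates()` compares whole rows, so if the same `news_id` ever appeared with slightly different content (for example, different entity annotations), both copies would survive. Whether that happens in this data is not shown above; the sketch below is one way to check, using a hypothetical helper `find_id_conflicts` on made-up rows.

```python
import pandas as pd


def find_id_conflicts(df: pd.DataFrame, id_col: str = 'news_id') -> pd.DataFrame:
    """Return all rows whose id occurs more than once (empty if ids are unique)."""
    return df[df.duplicated(subset=id_col, keep=False)]


# Hypothetical data: N2 appears twice with different titles,
# so full-row drop_duplicates() keeps both copies.
df = pd.DataFrame({'news_id': ['N1', 'N2', 'N2'],
                   'title': ['A', 'B', 'B (edited)']})
deduped = df.drop_duplicates()
conflicts = find_id_conflicts(deduped)
print(len(deduped), len(conflicts))  # 3 2
```

On the real frame, `news['news_id'].is_unique` gives the same answer as a single boolean.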

In [8]:
# remove the url column since the URLs have expired
news = news.drop(columns='url')

# convert news_id to int by stripping the leading 'N' (e.g. 'N88753' -> 88753)
news['news_id'] = news['news_id'].str.replace('N', '').astype(int)
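The id conversion assumes every id is an `N` followed by digits. A small sketch on a few sample ids validates that pattern first and uses `Series.str.removeprefix` (available in pandas 1.4 and later), which strips only a leading `N`. This is an alternative to the notebook's `str.replace`, not what the notebook runs.

```python
import pandas as pd

ids = pd.Series(['N88753', 'N45436', 'N23144'])

# Fail loudly if any id does not match the "N" + digits pattern.
assert ids.str.fullmatch(r'N\d+').all(), 'unexpected news_id format'

# removeprefix removes only a leading "N"; replace would remove every "N".
numeric_ids = ids.str.removeprefix('N').astype(int)
print(numeric_ids.tolist())  # [88753, 45436, 23144]
```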

In [9]:
news.head()

|  | news_id | category | subcategory | title | abstract | title_entities | abstract_entities |
|---|---|---|---|---|---|---|---|
| 0 | 88753 | lifestyle | lifestyleroyals | The Brands Queen Elizabeth, Prince Charles, an... | Shop the notebooks, jackets, and more that the... | [{"Label": "Prince Philip, Duke of Edinburgh",... | [] |
| 1 | 45436 | news | newsscienceandtechnology | Walmart Slashes Prices on Last-Generation iPads | Apple's new iPad releases bring big deals on l... | [{"Label": "IPad", "Type": "J", "WikidataId": ... | [{"Label": "IPad", "Type": "J", "WikidataId": ... |
| 2 | 23144 | health | weightloss | 50 Worst Habits For Belly Fat | These seemingly harmless habits are holding yo... | [{"Label": "Adipose tissue", "Type": "C", "Wik... | [{"Label": "Adipose tissue", "Type": "C", "Wik... |
| 3 | 86255 | health | medical | Dispose of unwanted prescription drugs during ... |  | [{"Label": "Drug Enforcement Administration", ... | [] |
| 4 | 93187 | news | newsworld | The Cost of Trump's Aid Freeze in the Trenches... | Lt. Ivan Molchanets peeked over a parapet of s... | [] | [{"Label": "Ukraine", "Type": "G", "WikidataId... |


#### Export to CSV

In [11]:
news.to_csv('full_news.csv', index=False)
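The entity columns are written to the CSV as JSON text, so code that reads `full_news.csv` back gets strings, not lists. A minimal round-trip sketch on a small hypothetical frame, using an in-memory buffer in place of the real file, shows how `json.loads` can restore them and how a missing abstract comes back as `NaN`:

```python
import io
import json

import pandas as pd

# A small frame shaped like the exported news data (hypothetical values).
news = pd.DataFrame({
    'news_id': [88753, 86255],
    'title': ['Example title', 'Another title'],
    'abstract': ['Some abstract.', None],
    'title_entities': ['[{"Label": "Example", "Type": "J"}]', '[]'],
})

# Write to an in-memory buffer instead of full_news.csv to stay self-contained.
buffer = io.StringIO()
news.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)

# Entity columns are JSON strings; json.loads turns them back into lists of dicts.
reloaded['title_entities'] = reloaded['title_entities'].apply(json.loads)

print(reloaded.shape)                                  # (2, 4)
print(reloaded.loc[0, 'title_entities'][0]['Label'])   # Example
print(reloaded['abstract'].isna().tolist())            # [False, True]
```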