## Introduction
The goal of the project was to systematically investigate several deep learning methods for text processing tasks and to benchmark them against classical methods where appropriate. The submission consists of a report, source code with a link to Google Colab (both contained in the accompanying .zip file), and models trained with Keras / TensorFlow. No alternative dataset or coding framework was used.

In [2]:
# Libraries
import random
import pandas as pd
import keras

## Dataset
The [Guardian News Articles dataset](https://www.kaggle.com/datasets/adityakharosekar2/guardian-news-articles) on Kaggle was used to perform genre (or, more precisely, section) analysis. Because the dataset is large (~150,000 articles, >700 MB), the full dataset was not used. Instead, a random subset of 10%-20% of the rows was used.

In [3]:
random.seed(4321)  # for reproducibility
# keep the header (row 0) and roughly 10% of the 149,828 data rows (about 14,983 rows)
df = pd.read_csv("guardian_articles.csv", skiprows=lambda x: x > 0 and random.random() >= .1)
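The sampling rule in the cell above can be tried on a small in-memory CSV. The following is a minimal sketch, not the notebook's own code: the toy data, the dedicated `rng` generator, and the `keep_fraction` name are assumptions made for illustration. Each data row is kept independently with probability 0.1, so the sample size is approximately, not exactly, 10% of the rows.

```python
import io
import random

import pandas as pd

# Toy CSV with a header and 1,000 data rows (illustrative data only).
n_rows = 1000
csv_text = "id,title\n" + "\n".join(f"{i},title {i}" for i in range(n_rows))

rng = random.Random(4321)  # dedicated generator, so other code cannot disturb the seed
keep_fraction = 0.1        # hypothetical parameter name

# Row 0 is the header and is always kept; every other row is skipped
# whenever the random draw is >= keep_fraction.
sample = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and rng.random() >= keep_fraction,
)

print(len(sample), round(len(sample) / n_rows * 100, 2))
```

Using a separate `random.Random` instance rather than the module-level functions is a design choice: the sample then depends only on the seed and the file, not on any other call to `random` made earlier in the session.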

In [6]:
#percentage of data imported from guardian dataset
round((len(df.index) /149828)*100, 2)

10.0

In [7]:
df.head()  # look at raw data

|   | article_id | sectionName | webTitle | webUrl | bodyContent | webPublicationDate | id |
|---|---|---|---|---|---|---|---|
| 0 | world/2016/jan/31/tanzania-britsh-helicopter-p... | World news | British pilot in Tanzania 'manoeuvred ​to save... | https://www.theguardian.com/world/2016/jan/31/... | A British pilot who was shot dead by an elepha... | 2016-01-31T23:43:48Z | 3 |
| 1 | football/2016/jan/31/jurgen-klopp-liverpool-yo... | Football | Jürgen Klopp hails Liverpool youngsters but re... | https://www.theguardian.com/football/2016/jan/... | Jürgen Klopp’s exasperation was understandable... | 2016-01-31T22:30:10Z | 7 |
| 2 | football/2016/jan/31/tommy-elphick-harry-arter... | Football | Tommy Elphick turns thoughts to Harry Arter af... | https://www.theguardian.com/football/2016/jan/... | Tommy Elphick paid tribute to his team-mate Ha... | 2016-01-31T22:30:10Z | 9 |
| 3 | football/2016/jan/31/chelsea-manchester-city-f... | Football | Chelsea draw Manchester City and Arsenal meet ... | https://www.theguardian.com/football/2016/jan/... | Manchester City will play Chelsea at Stamford ... | 2016-01-31T18:57:42Z | 47 |
| 4 | football/2016/jan/31/chelsea-john-terry-not-re... | Football | John Terry to leave Chelsea after refusal of f... | https://www.theguardian.com/football/2016/jan/... | John Terry has confirmed he is to leave Chelse... | 2016-01-31T18:48:09Z | 48 |


In [10]:
# reorder the data and keep only the news title and category label
df = df[['webTitle', 'sectionName']]
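Since only two columns are kept, the large `bodyContent` column need not be loaded at all. The sketch below shows one possible way to do this with the `usecols` parameter of `pd.read_csv`; it is an alternative under assumed toy data, not what the notebook does. Note that `usecols` ignores the order of the list it is given, so the explicit `[wanted]` indexing is still needed to put `webTitle` first.

```python
import io

import pandas as pd

# Toy CSV imitating the column layout of the dataset (illustrative data only).
csv_text = (
    "article_id,sectionName,webTitle,bodyContent\n"
    "a1,World news,Title one,Long body text\n"
    "a2,Football,Title two,Another long body\n"
)

wanted = ["webTitle", "sectionName"]

# usecols limits parsing to the listed columns; the result follows the
# file's column order, so reindex with [wanted] to get the desired order.
small = pd.read_csv(io.StringIO(csv_text), usecols=wanted)[wanted]

print(small.columns.tolist())
```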

In [11]:
df.head()

|   | webTitle | sectionName |
|---|---|---|
| 0 | British pilot in Tanzania 'manoeuvred ​to save... | World news |
| 1 | Jürgen Klopp hails Liverpool youngsters but re... | Football |
| 2 | Tommy Elphick turns thoughts to Harry Arter af... | Football |
| 3 | Chelsea draw Manchester City and Arsenal meet ... | Football |
| 4 | John Terry to leave Chelsea after refusal of f... | Football |


In [16]:
#check missing data
df.isnull().sum() #none missing

webTitle       0
sectionName    0
dtype: int64

In [17]:
#split into features and targets
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

In [19]:
print(X)
print("\n")
print(y)

[["British pilot in Tanzania 'manoeuvred \u200bto save colleague\u200b\u200b before death'"]
 ['Jürgen Klopp hails Liverpool youngsters but remains mindful of task in hand']
 ['Tommy Elphick turns thoughts to Harry Arter after Bournemouth march on']
 ...
 ['No 10 to set out sweeping plans to override power of human rights court']
 ['What would a British bill of rights look like?']
 ['Liam Livingstone settling down to his role among England’s entertainers']]


['World news' 'Football' 'Football' ... 'Law' 'Law' 'Sport']
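As the printed output shows, `df.iloc[:, :-1].values` yields a two-dimensional array with one title per row, while `y` is one-dimensional. If a later step expects a flat sequence of strings, the features can be flattened. The sketch below uses a hypothetical toy frame (`toy`) to show the shapes involved; whether flattening is needed depends on the downstream code, which is not shown here.

```python
import pandas as pd

# Toy frame with the same two columns as the notebook's df (illustrative data only).
toy = pd.DataFrame(
    {
        "webTitle": ["Title one", "Title two", "Title three"],
        "sectionName": ["World news", "Football", "Football"],
    }
)

X_2d = toy.iloc[:, :-1].values  # same slicing as the notebook: shape (3, 1)
y = toy.iloc[:, -1].values      # shape (3,)

# Flatten the single feature column into a 1-D array of strings.
X_1d = X_2d.ravel()

print(X_2d.shape, X_1d.shape, y.shape)  # (3, 1) (3,) (3,)
```

Selecting the column by name, `toy["webTitle"].to_numpy()`, gives the same one-dimensional array directly.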
