### **Analysis of Greek News Industry Coverage on Turkish Relations: A Data-driven Approach**

*Exploring sentiment, temporal trends, and textual similarities using web scraping and NLP techniques*

**INTRODUCTION**

Greek-Turkish relations have drawn considerable attention in Greek news media in recent years. This project presents a data-driven examination of that coverage. Using web scraping and natural language processing (NLP), I examine the sentiment expressed in news articles, how coverage changes over time, and how textually similar the articles are to one another. The aim is to identify the dominant narratives and trends in how the Greek news sector covers Greek-Turkish relations.

**1. Data Extraction and Preparation**

I used web scraping with Python's `requests` and `BeautifulSoup` (bs4) libraries to compile a sizable corpus. Focusing on well-known Greek news websites, I gathered over 500 news stories about Greek-Turkish relations. The retrieved data was organized and stored as .csv files.
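As a minimal sketch of the extraction step, the snippet below parses an inline HTML string rather than a live site. The markup (an `article` tag holding an `h2` title, a `time` tag with a `datetime` attribute, and `p` paragraphs) and the helper name `parse_articles` are assumptions for illustration; real news sites will need their own selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a news listing page.
SAMPLE_HTML = """
<article><h2>Talks resume in Athens</h2>
  <time datetime="2023-03-01">1 March 2023</time>
  <p>First paragraph.</p><p>Second paragraph.</p></article>
<article><h2>Aegean dispute</h2>
  <time datetime="2023-03-05">5 March 2023</time>
  <p>Only paragraph.</p></article>
"""


def parse_articles(html):
    """Return one dict per <article> tag with title, date and body text."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for art in soup.find_all("article"):
        title = art.find("h2")
        stamp = art.find("time")
        records.append({
            # Missing tags become None instead of raising an error.
            "title": title.get_text(strip=True) if title else None,
            "date": stamp.get("datetime") if stamp else None,
            "text": " ".join(p.get_text(strip=True) for p in art.find_all("p")),
        })
    return records


records = parse_articles(SAMPLE_HTML)
print(records[0])
```

A list of dicts in this shape can be passed directly to `pd.DataFrame(records)` before saving to CSV.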

The data was then loaded into pandas DataFrames for processing and analysis. I cleaned it by dropping irrelevant columns, removing rows with NaN values, and stripping extraneous text. To improve the quality of the textual data, I applied further preprocessing such as stopword removal, stemming, or lemmatization.
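The text-cleaning step can be sketched with the standard library alone. The stopword set, the function name `clean_text`, and the accent-stripping step are illustrative assumptions; stemming and lemmatization are not covered here because they need a Greek-specific tool.

```python
import re
import unicodedata

# Tiny illustrative stopword set; a real run would use a full Greek list.
STOPWORDS = {"η", "ο", "το", "και", "με"}


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase, strip accents and punctuation, then drop stopwords."""
    text = text.lower()
    # Decompose accented letters and discard the combining accent marks.
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(w for w in text.split() if w not in stopwords)


print(clean_text("Η Ελλάδα και η Τουρκία!"))
```

Because accents are removed before filtering, the stopword set is written without accents as well.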

**2. Temporal Analysis and Visualizations**

I set the date column as the DataFrame's index, which gives access to pandas' time series tools. The resample() method then groups the data by time period, such as day, month, quarter, or year, so that it can be aggregated per period.
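A small sketch of what resampling does, using made-up rows rather than the scraped corpus: with a date index in place, counting articles per month or per quarter is a single call. The toy titles and dates are assumptions for illustration.

```python
import pandas as pd

# Toy data: one row per article, indexed by publication date.
df = pd.DataFrame(
    {"title": ["a", "b", "c", "d"]},
    index=pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-10", "2023-04-01"]),
)

# "MS" and "QS" label each bin by the start of the month or quarter.
monthly_counts = df.resample("MS").size()
quarterly_counts = df.resample("QS").size()

print(monthly_counts)
print(quarterly_counts)
```

Months with no articles still appear in the result with a count of zero, which keeps gaps in coverage visible in a plot.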

Visualizations were used to identify temporal patterns and trends in the coverage of Greek-Turkish relations. Time series line plots, bar charts, or heatmaps depict the volume and intensity of news coverage over time. These plots make it easier to spot significant developments or shifts in the narrative in the Greek news sector.


**3. Sentiment Analysis**

To examine the sentiment expressed in the news stories, I applied lexicon-based sentiment analysis. Greek dictionaries or well-known lexicons such as EmoLex were used to score each article's sentiment, and the scores were stored in a new column of the DataFrame.

With these scores I can investigate the predominant attitudes toward Greek-Turkish relations in the Greek news sector. Aggregating the sentiment data with the resample() method shows how sentiment changes over time and helps spot shifts in response to important events or policy changes.
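The lexicon-based scoring described in this section can be sketched as follows. The four-word lexicon and the function name `lexicon_sentiment` are hypothetical; a real lexicon such as EmoLex is far larger, and this sketch uses a simple positive/negative polarity only. It is one possible stand-in for the scoring function applied in Step 7 of the code, not necessarily the one used there.

```python
# Toy polarity lexicon (hypothetical): +1 for positive words, -1 for negative ones.
TOY_LEXICON = {"ειρήνη": 1, "συνεργασία": 1, "κρίση": -1, "απειλή": -1}


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Mean polarity of the words found in the lexicon; 0.0 if none match."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


print(lexicon_sentiment("κρίση και απειλή αλλά συνεργασία"))
print(lexicon_sentiment("καιρός σήμερα"))
```

Averaging over matched words keeps scores between -1 and 1 regardless of article length, which makes articles of different sizes comparable.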

**4. Textual Similarity Analysis**

I used vectorization and cosine similarity to identify textual overlap between the news items. Vectorizing the articles represents each one as a numerical vector in a high-dimensional space. The similarity between two articles is then the cosine of the angle between their vectors.
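To make the calculation concrete, here is a minimal standard-library sketch. It uses raw term counts as the vectors, which is a simplification: the code section of this write-up uses TF-IDF weights instead. The function name `cosine_similarity_counts` is an assumption for illustration.

```python
import math
from collections import Counter


def cosine_similarity_counts(doc_a, doc_b):
    """Cosine similarity between raw term-count vectors of two texts."""
    a, b = Counter(doc_a.split()), Counter(doc_b.split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


print(cosine_similarity_counts("greece turkey talks", "greece turkey tension"))
```

Here the two texts share two of their three words, so the dot product is 2 and each vector has length √3, giving a similarity of 2/3.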

Using the similarity scores, I built a graph that links news stories by their textual similarity. Clusters of stories with a high degree of textual similarity point to common themes or narratives in the Greek news industry's coverage of Greek-Turkish relations.
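One simple way to read clusters off such a graph is to treat them as connected components: link two articles when their similarity exceeds a threshold, then collect every group of articles reachable from one another. The sketch below does this with the standard library on a made-up 4×4 similarity matrix. The function name `similarity_clusters` and the toy values are assumptions; the 0.5 threshold matches the one used in the code section.

```python
def similarity_clusters(sim, threshold=0.5):
    """Group item indices into connected components of the thresholded graph."""
    n = len(sim)
    # Adjacency list: link i and j when their similarity exceeds the threshold.
    neighbours = {
        i: [j for j in range(n) if j != i and sim[i][j] > threshold]
        for i in range(n)
    }
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


toy = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
print(similarity_clusters(toy))
```

Raising the threshold splits clusters apart and lowering it merges them, so the value is worth checking against a few articles read by hand.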

**CONCLUSION**

Using web scraping, NLP methods, and data analysis, I examined how the Greek news industry covers Greek-Turkish relations. The investigation combined temporal analysis, sentiment analysis, and textual comparison to give a picture of the dominant narratives, sentiments, and connections in that coverage.
This data-driven approach helps us understand the dynamics of Greek-Turkish relations as seen through the Greek news industry. Academics, policymakers, and anyone else interested in how these relations evolve and how Greek media depict them may find the results a helpful resource.
The same methods can help the Greek news sector understand its own coverage patterns, sentiment dynamics, and textual parallels, supporting more insightful and nuanced reporting on Greek-Turkish relations and other significant issues.

### **CODING**

**Step 1: Data Extraction and Preparation**

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

Web Scraping

In [2]:
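url = "https://example.com/greek-turkish-relations"  # placeholder URL
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')
articles = soup.find_all('article')  # assumes each story is wrapped in an article tag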

Saving Data as CSV

In [None]:
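data = [{'text': a.get_text(' ', strip=True)} for a in articles]  # one record per article; add 'date' and other fields as the site's markup allows
df = pd.DataFrame(data)
df.to_csv('greek_turkish_relations.csv', index=False)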

**Step 2: Import the Data into Python**

In [4]:
import pandas as pd

Reading the CSV file into a DataFrame

In [None]:
df = pd.read_csv('greek_turkish_relations.csv')

**Step 3: Data Cleaning**

In [6]:
import pandas as pd

Removing unnecessary columns

In [None]:
df = df.drop(['column1', 'column2'], axis=1)

Removing rows with NaN values

In [None]:
df = df.dropna()

Cleaning text data (remove stopwords, stemming, lemmatization, etc.)

In [None]:
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords', quiet=True)  # one-time download of the stopword lists
stop_words = set(stopwords.words('greek'))
df['text'] = df['text'].apply(lambda x: ' '.join(word for word in x.split() if word.lower() not in stop_words))

**Step 4: Create New Parameters**

In [8]:
import pandas as pd

Extracting month, time, and percentage

In [None]:
dates = pd.to_datetime(df['date'])  # parse once and reuse
df['month'] = dates.dt.month
df['time'] = dates.dt.time
df['percentage'] = df['count'] / df['total'] * 100  # assumes 'count' and 'total' columns exist in the data

**Step 5: Perform Analyses and Create Graphs**

In [10]:
import pandas as pd
import matplotlib.pyplot as plt

Wordcloud

In [None]:
from wordcloud import WordCloud
wordcloud = WordCloud().generate(' '.join(df['text']))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

Frequency chart of most frequent words

In [None]:
top_words = df['text'].str.split(expand=True).stack().value_counts().head(15)
top_words.plot(kind='bar')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('Top 15 Most Frequent Words')
plt.show()

Frequency chart of most frequent bigrams

In [None]:
from nltk import ngrams
bigrams = [bg for tokens in df['text'].str.split() for bg in ngrams(tokens, 2)]  # built per article, so no bigram spans two articles
top_bigrams = pd.Series(bigrams).value_counts().head(15)
top_bigrams.plot(kind='bar')
plt.xlabel('Bigrams')
plt.ylabel('Frequency')
plt.title('Top 15 Most Frequent Bigrams')
plt.show()

**Step 6: Time Series Analysis**

In [11]:
import pandas as pd

Setting date as index

In [None]:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

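Resampling data by month and calculating the mean sentiment (this needs the `sentiment` column created in Step 7, so run that step first)

In [None]:
monthly_sentiment = df['sentiment'].resample('M').mean()  # pandas 2.2+ prefers the alias 'ME' for month-end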

Plotting time series line plot

In [None]:
monthly_sentiment.plot()
plt.xlabel('Date')
plt.ylabel('Sentiment')
plt.title('Monthly Sentiment on Greek-Turkish Relations')
plt.show()

**Step 7: Sentiment Analysis**

In [13]:
import pandas as pd

In [None]:
df['sentiment'] = df['text'].apply(sentiment_analysis_function)  # sentiment_analysis_function is a placeholder for a scorer that maps a text to a number

**Step 8: Grouping and Resampling**

In [14]:
import pandas as pd

Grouping sentiment by month and calculating the mean sentiment

In [None]:
monthly_sentiment = df['sentiment'].resample('M').mean()

Grouping sentiment by day and calculating the mean sentiment

In [None]:
daily_sentiment = df['sentiment'].resample('D').mean()

**Step 9: Textual Similarity Analysis**

In [16]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Vectorizing the articles

In [None]:
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(df['text'])

Calculating cosine similarity between the vectors

In [None]:
similarity_matrix = cosine_similarity(vectors)


In [17]:
import networkx as nx
import matplotlib.pyplot as plt

In [None]:
G = nx.Graph()
n = len(df)
G.add_nodes_from(range(n))  # row positions as node ids, since several articles can share a date
for i in range(n):
    for j in range(i + 1, n):
        similarity = similarity_matrix[i, j]
        if similarity > 0.5:
            G.add_edge(i, j, weight=similarity)

In [None]:
pos = nx.spring_layout(G)
labels = {i: str(df.index[i].date()) for i in G.nodes()}  # label each node with its article's date
nx.draw_networkx(G, pos, labels=labels)
plt.title('Textual Similarity Graph')
plt.show()