# Introduction

For this second milestone in the project, we're going to perform the following preliminary tasks to evaluate the feasibility of our project:
- [x] Load and pre-process the data
- [x] Exploratory data analysis to confirm intuitive correlations between variables and define a potential main article
- [x] Define a primitive set of clichés, a performance metric and verify if the passing paths using clichés have better performance
- [x] Study a particular cliché and the categorization of the articles linked to it and our main article

In the code below, we'll mainly be using the graph data from the Wikispeedia dataset. 
In particular, we use the finished paths, the adjacency matrix and the theoretical shortest paths, and, to a lesser extent, the unfinished paths.


# Imports

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import csv
import os
from ast import literal_eval
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

from cliches import *
from data_augmentation import *
from data_quantity_analysis import *
from plot_helpers import *
from preprocessing import *

np.random.seed(127)

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/tudoroancea/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/tudoroancea/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/tudoroancea/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [3]:
#if the data has not been downloaded yet
if not os.path.exists('data'):
    %run download_data.py

# Preprocessing

In [4]:
(
    articles,
    categories,
    links,
    paths_finished,
    paths_unfinished,
) = import_and_clean_data()

# Data quantity analysis - exploratory data analysis

## General 

We start by exploring the data set to get an overview of the data and the information it provides. As a first step, we're trying to create graphs that could potentially give us information about how players play and think.

As the aim of our project is to look at the role of clichés in the choice of articles by users and, more generally, the role they play in navigating information, we begin by looking at the subsets of data that contain the most information.

Here we look at the 50 most visited articles by players and then the 50 most frequent targets, both among the finished paths.

In [None]:
top_50_visited_articles(paths_finished, categories, show=True, html_file=True)

In [None]:
top_50_target_articles(paths_finished, categories, show=True)

We then look at the distributions of the length and duration of the finished paths to see whether they follow a typical pattern or show particular features worth analysing.

In [None]:
path_duration_distribution(paths_finished, show=True)

In [None]:
path_length_distribution(paths_finished, show=True)

Now that we've looked at the distributions of single variables, we'll look at the relationships between pairs of variables: first the relationship between the length of a path and the duration of the game, then the relationship between the rating given to a path and its length.

In [None]:
path_length_vs_duration(paths_finished, show=True)

In [None]:
rating_vs_path_length(paths_finished, show=True)

Based on the previous two plots, we could hypothesize that the length of the path is correlated with both the duration of the game and its rating, simply based on the apparent monotonicity of the data.
Both interpretations are intuitive, but we should perform statistical regressions and tests to confirm them.

We may also be interested in a more objective metric. In other words, we compare how the difference between the length of the actual path taken and the length of the shortest (theoretical) path behaves with respect to the rating given.

In [None]:
ax = sns.catplot(
    data=paths_finished,
    y="diff_length",
    kind="bar",
    palette="coolwarm",
    hue="rating",
)
ax.despine(left=True)
plt.title("Difference between path length and the shortest path length, depending on the rating")
plt.xlabel("Rating (where 0 is no rating)")
plt.ylabel("Length difference")
plt.show()

Again, the difference in path length appears to vary between ratings. Compared with the raw path lengths, the gap between ratings 4 and 5 is more pronounced in the plot of the difference in length: the error bars for these two ratings do not overlap.

For this reason, we will choose this difference between the path length and the shortest path length as our performance metric.
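
As a minimal sketch of this metric: the `diff_length` column used above comes from the preprocessing module, whose code is not shown here, so the helper names below are hypothetical and the handling of back-clicks (`<`) is deliberately ignored. The sketch only assumes that a path is a list of article names and that the shortest path length is known.

```python
def path_length(path):
    """Number of clicks in a path given as a list of article names."""
    return len(path) - 1


def diff_length(path, shortest_length):
    """Extra clicks taken compared with the theoretical shortest path."""
    return path_length(path) - shortest_length


# Hypothetical example: the player needed 4 clicks where 2 would have sufficed
example_path = ["Cat", "Mammal", "Europe", "United_Kingdom", "London"]
extra_clicks = diff_length(example_path, shortest_length=2)
print(extra_clicks)
```

A value of 0 means the player found an optimal path; larger values mean a longer detour.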

## Specific example: the United Kingdom

Now that we've looked at the data as a whole, let's take a closer look at one article in particular, United Kingdom. This will allow us to define a set of clichés for this particular article. We chose United Kingdom as an example because it is one of the most visited articles in the dataset and, as it is one of our European neighbours, we can easily identify clichés that ring true.

In [None]:
main_article = "United_Kingdom"

In [None]:
count_in_out_neighbors(paths_finished, paths_unfinished, main_article)

In [None]:
distribution_position_percentage(paths_finished, main_article, show=True)

In general, we cannot infer any particular distribution from the data. However, in future analyses we could try to fit distributions to the position of the main article conditional on additional events (e.g. the path also contains a particular cliché).
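
To make the quantity being plotted concrete, here is a small sketch of how the relative position of an article in a path could be computed. It is an assumption about what `distribution_position_percentage` measures, not its actual implementation, and `position_percentage` is a hypothetical helper.

```python
def position_percentage(path, article):
    """Relative position of `article` in `path`, from 0 (start) to 100 (target).

    Returns None if the article is absent or the path has fewer than two articles.
    Only the first occurrence of the article is considered.
    """
    if article not in path or len(path) < 2:
        return None
    return 100 * path.index(article) / (len(path) - 1)


toy_path = ["Tea", "United_Kingdom", "London", "BBC", "Television"]
uk_position = position_percentage(toy_path, "United_Kingdom")
print(uk_position)
```

Conditioning on an additional event then amounts to computing this value only for the paths that also contain the chosen cliché.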

In [None]:
# define main_article for analysis: 
main_article = "United_Kingdom"

# only keep rows such that main_article in path
around_main_article = get_df_main_article(paths_finished, main_article=main_article).copy(deep=True)

# get category of the main_article
around_main_article["main_article_category"] = around_main_article["path"].apply(lambda path: get_category_main_article(main_article, path, categories))

# article just before/just after main_article
# TODO: function to be improved/discussed. What do we want to do with "<"?
around_main_article["around_path"] = around_main_article["path"].apply(
    lambda path: get_index_main_article_in(main_article, path)
)

# update initial/target_article, initial/target_category
around_main_article["around_IA"] = around_main_article["around_path"].apply(
    lambda path: path[0]
)
around_main_article["around_TA"] = around_main_article["around_path"].apply(
    lambda path: path[-1]
)
around_main_article["around_IC"] = around_main_article["around_IA"].apply(
    lambda article: categories[categories["article"] == article]["category1"].values
)
around_main_article["around_TC"] = around_main_article["around_TA"].apply(
    lambda article: categories[categories["article"] == article]["category1"].values
)

around_main_article

In [None]:
# most frequent articles just before main_article (normalized counts)
most_frequent_before = around_main_article["around_IA"].value_counts(normalize=True)
plot_most_frequent_articles(most_frequent_before, "precede");

# Define our cliches 

## Cliches preprocessing

In this section we will define and extract data from the clichés. This will enable us to carry out analyses to discover whether or not there are any links between our chosen subject and the clichés.

## Statistical tests for difference of path length for different cliches

Now, we want to find some statistical evidence of whether clichés influence the length of the path from the initial article to the target article. From the previous exploration, it seems that the difference of path length between the actual path taken and the shortest (theoretical) path is a "good" metric of the player's performance.

So the idea is to select all the rows in `paths_finished` that contain the main article `United_Kingdom`. The selection of clichés remains to be done in an unbiased way. For now, we select "cliché" articles from the plots just above (see the selected cliché articles below)[^1]. Next, it makes sense to compare the difference in length within a given rating. Otherwise, as stated earlier, "harder" articles tend to show a higher difference in length, and we would be comparing different categories.

To check whether there is a statistical difference, we use Welch's t-test. Given a rating, we assume that our observations are independent. Additionally, within a rating, the difference in length does not seem to have the same variance for the paths that went through cliché articles and for those that did not, which is why we prefer Welch's test to the standard equal-variance t-test.

[^1]: We are looking into different ways to retrieve cliché articles.
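
For reference, a minimal sketch of the statistic behind Welch's t-test, written with the standard library only. The two samples are made-up numbers, not values from the dataset, and `welch_t` is a hypothetical helper. In practice, `scipy.stats.ttest_ind(..., equal_var=False)` computes the same statistic together with its p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors, using the unbiased sample variances
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1)
    )
    return t, dof


# Made-up differences in length for paths with and without a cliché
with_cliche = [0, 1, 1, 2, 0, 1]
without_cliche = [1, 3, 0, 5, 2, 4, 6, 1]
t_stat, dof = welch_t(with_cliche, without_cliche)
print(f"t={t_stat:.3f}, dof={dof:.2f}")
```

Unlike Student's t-test, the two variances are never pooled, so a small low-variance group and a large high-variance group can be compared without assuming equal spread.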

In [None]:
fig, axes = plt.subplots(1, 5, figsize=(12, 6), sharey=True)
fig.tight_layout()

for i, ax in enumerate(axes):
    temp = paths_finished[paths_finished["rating"] == i + 1]["diff_length"]
    ax.hist(temp, bins=20, density=True)
    ax.set_xlabel("difference in length")
    if i == 0:
        ax.set_ylabel("density value")
    ax.set_title(f"Rating {i+1}")

# make room for suptitle
fig.subplots_adjust(top=0.9)
fig.suptitle(
    "Distribution of the difference in length between the path and the shortest path, depending on the rating"
)

plt.show()

We observe that the higher the rating, the higher the variance. Additionally, it is worth noting that the data is imbalanced among the ratings.

Above, we have compared the variance between different ratings but not among a rating between the paths that go through cliché articles and those that do not. 

In [None]:
# select some cliches
main_article_cliches = [
    "William_Shakespeare",
    "Harry_Potter",
    "BBC",
    "Winston_Churchill",
    "The_Beatles",
    "Elizabeth_II_of_the_United_Kingdom",
    "Flower",
    "British_monarchy",
    "British_Empire",
    "Prime_Minister_of_the_United_Kingdom",
    "Pound_sterling",
    "London",
]

ratings = np.arange(1, 6, dtype=int)

In [None]:

# create dataframe
test_cliche = pd.DataFrame(
    columns=[
        "main_article",
        "cliche",
        "rating",
        "mean_cliche",
        "mean_nocliche",
        "shape_cliche",
        "shape_nocliche",
        "statistic",
        "p_value",
    ]
)

# test different ratings and cliches
for rating in ratings:
    for cliche in main_article_cliches:
        (
            stat,
            p,
            mean_path_cliche,
            mean_path_nocliche,
            shape_cliche,
            shape_nocliche,
        ) = test_difference_path_length_cliche(
            paths_finished, rating, cliche, main_article, False, False
        )
        test_cliche.loc[test_cliche.shape[0]] = [
            main_article,
            cliche,
            rating,
            mean_path_cliche,
            mean_path_nocliche,
            shape_cliche,
            shape_nocliche,
            stat,
            p,
        ]

test_cliche

Note that in most cases, we do not have enough data to compare. However, the paths going through the article `London` perform *almost* statistically significantly (at $\alpha=0.05$) better than the others for ratings 1 and 2.

# Categories and clichés

We consider here the finished paths passing through our chosen main article "United_Kingdom".
We enumerate all the categories of the articles at most 3 steps away from the main article on these paths and construct a bar chart to visualize their number of occurrences.
We further color each category bar depending on whether "United_Kingdom" belongs to it or not.
Finally, arrows are used to indicate specific categories of a given cliché (here “William_Shakespeare”, drawn from the list created in the previous section). 
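
The counting step described above can be sketched as follows. This is an illustration under assumptions, not the code behind `combine_results`: `articles_within` is a hypothetical helper and the category mapping is a toy dictionary rather than the real `categories` table.

```python
from collections import Counter


def articles_within(path, article, max_steps=3):
    """Articles at most `max_steps` clicks before or after `article` in `path`.

    Only the first occurrence of `article` is used.
    """
    if article not in path:
        return []
    i = path.index(article)
    window = path[max(0, i - max_steps) : i + max_steps + 1]
    return [a for a in window if a != article]


# Toy path and toy categories (hypothetical labels)
toy_path = ["Tea", "India", "British_Empire", "United_Kingdom",
            "London", "BBC", "Television", "Electron"]
toy_categories = {
    "Tea": "Everyday_life", "India": "Geography", "British_Empire": "History",
    "London": "Geography", "BBC": "Business", "Television": "Technology",
    "Electron": "Science",
}

neighbours = articles_within(toy_path, "United_Kingdom")
category_counts = Counter(toy_categories[a] for a in neighbours)
print(category_counts.most_common())
```

Summing such counters over all paths through the main article gives the heights of the bars.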

In [None]:
all_categories, subcategories1, subcategories2, subcategories3 = separate_categories(main_article, categories)

In [None]:
combine_results(paths_finished, main_article, categories, all_categories, ["category1", "category2", "category3"])

This bar plot shows that:
- the categories that "United_Kingdom" belongs to are (among) the most frequent ones, which is not surprising given that the hyperlinks in an article should redirect to related articles, and given that the players often rely on semantic links to find the target article.
- if we look at the categories of our designated cliché "William_Shakespeare", they are quite frequent, although they do not coincide with the categories of "United_Kingdom". This is not surprising either, given the very nature of the two articles: one is a country, the other a playwright. The fact that the categories of "William_Shakespeare" are nevertheless quite frequent is a good sign for our project.

# Data augmentation pipeline

As seen previously, we do not have much data to extract meaningful results. So, we may want to augment our data.

The pipeline for augmenting our data is the following: we analyze the clichés around the UK in the SeeGull dataset as well as the content of the articles, and try to link them together. More precisely, we retrieve the "topics" of the clichés from the SeeGull dataset (e.g. alcohol, liquid, etc.) and do the same for each article.

So far, we have two sets of topics: one from the SeeGull dataset, and one for each article from the main dataset. We add an article to our list of cliché articles if there are "enough" topics in common.

The main part of this analysis is to make sure that this pipeline is sensible:
- does it make sense to compare the two sets of topics?
- does it actually augment the data (statistical significance for the t-tests above)?
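
The selection rule described above can be sketched as a simple set intersection with a threshold. The function name, the `min_common` threshold and the toy topics are all assumptions made for illustration; what counts as "enough" common topics is still to be decided.

```python
def select_cliche_candidates(article_topics, reference_topics, min_common=2):
    """Keep the articles sharing at least `min_common` topics with the reference."""
    reference = set(reference_topics)
    selected = {}
    for article, topics in article_topics.items():
        common = set(topics) & reference
        if len(common) >= min_common:
            selected[article] = common
    return selected


# Toy data (hypothetical topics, not taken from SeeGull or Wikispeedia)
toy_article_topics = {
    "Tea": ["drink", "liquid", "plant"],
    "Beer": ["alcohol", "liquid", "drink"],
    "Volcano": ["geology", "heat"],
}
toy_seegull_topics = ["alcohol", "liquid", "drink", "politeness"]

candidates = select_cliche_candidates(toy_article_topics, toy_seegull_topics)
print(sorted(candidates))
```

Raising `min_common` makes the selection stricter, at the cost of adding fewer articles to the cliché list.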

In [None]:
import spacy
from empath import Empath
import gensim
from gensim import corpora
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords, PlaintextCorpusReader
from nltk.stem import WordNetLemmatizer
import nltk

# download if first time running library
download = True

if download:
    nltk.download("punkt")
    nltk.download("stopwords")
    nltk.download("wordnet")

DATA_PATH = "data"
ARTICLES_PLAIN_TEXT_PATH = os.path.join(DATA_PATH, "articles_plain_text")

## SeeGull topics

In [None]:
from data_augmentation import *

In [None]:
uk_topics_seegull = generate_seegull_topics(["British", "English"], 10)

## Wikispeedia dataset topics

We do the same with the Wikispeedia dataset.

In [None]:
article_topics = load_articles_topics()

### Articles' content with proper noun

The idea is to look at the content of the articles and compare it to the reference article.

#### First attempt
The first attempt takes the content as a whole, without any filtering.
We first test the topics on a small subset of articles.

We compare 10 random articles:

In [None]:
np.random.seed(16)

uk_topics_row = article_topics[article_topics["article_name"] == "United_Kingdom"]
# Randomly select 10 articles
random_articles = article_topics.sample(n=10)

for index, row in random_articles.iterrows():
    common_topics = set(row["topics"]) & set(uk_topics_row["topics"].values[0])
    print(
        f"Common topics for {row['article_name']} and United Kingdom: {common_topics}"
    )

# Save 'topics' column of random_articles to a CSV file
# random_articles[["article_name", "topics"]].to_csv(
#     os.path.join(DATA_PATH, "random_articles_topics.csv"), index=False
# )

In [None]:
# /!\ dataset is stored in "top_50_common_topics.csv" file

# build the set of UK topics once instead of once per row
uk_topics = set(uk_topics_row.iloc[0]["topics"])

# Compare topics for all articles in the dataset
common_article_topics = []

for index, row in article_topics.iterrows():
    if row["article_name"] == "United_Kingdom":
        continue
    common_topics = set(row["topics"]) & uk_topics
    if common_topics:
        common_article_topics.append(
            {"article_name": row["article_name"], "common_topics": common_topics}
        )

# Sort comparisons based on the number of common topics
common_article_topics.sort(key=lambda x: len(x["common_topics"]), reverse=True)

common_article_topics = pd.DataFrame(common_article_topics)
display(common_article_topics.head(20))

# Print the top 5 comparisons
# print("Top 5 comparisons with most common topics:")
# for i in range(min(5, len(common_article_topics))):
#     print(
#         f"{common_article_topics[i]['article_name']} and United Kingdom: {common_article_topics[i]['common_topics']}"
#     )

# Save the top 50 comparisons to a CSV file
# top_50_comparisons = comparisons_with_common_topics[:50]
# if top_50_comparisons:
#     df_top_50_comparisons = pd.DataFrame(top_50_comparisons)
#     df_top_50_comparisons.to_csv(os.path.join(DATA_PATH , "top_50_common_topics.csv", index=False)

We notice that the common topics are very general and a lot of articles have them. So we need to filter the content of the articles.


In [None]:
# extract the common topics for articles 12th_century, Armand_Jean_du_Plessis%2C_Cardinal_Richelieu, Babylonia
concrete_examples = common_article_topics[
    common_article_topics["article_name"].isin(
        ["12th_century", "Armand_Jean_du_Plessis%2C_Cardinal_Richelieu", "Babylonia"]
    )
]
for _, row in concrete_examples.iterrows():
    print(f"{row['article_name']} and United Kingdom: {row['common_topics']}")

#### Second attempt
We look at the most common words and proper nouns in the articles' content.

In [None]:
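# strip the literal ".txt" suffix (regex=False so that "." is not treated as a wildcard)
tokens_and_pos["article_name"] = tokens_and_pos["article_name"].str.replace(
    ".txt", "", regex=False
)
tokens_and_pos.to_csv(os.path.join(DATA_PATH, "article_token_pos.csv"), index=False)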

In [None]:
# Load the preprocessed data
tokens_and_pos = load_token_pos()

# Filter rows related to the target article
target_article_name = "United_Kingdom"
target_article_data = tokens_and_pos[
    tokens_and_pos["article_name"] == target_article_name
]

# Extract the top 10 PROPN tokens for the target article
target_proper_nouns = [
    token[0]
    for tokens_pos_list in target_article_data["tokens_pos"]
    for token in tokens_pos_list
    if token[1] == "PROPN"
]
top_target_proper_nouns = [
    item[0] for item in Counter(target_proper_nouns).most_common(10)
]

# Initialize a list to store common proper nouns
common_proper_nouns = []

# Iterate through each row in the DataFrame
for index, row in tokens_and_pos.iterrows():
    # Extract the top 10 PROPN tokens for each article
    if row["article_name"] == target_article_name:
        continue
    article_proper_nouns = [
        token[0] for token in row["tokens_pos"] if token[1] == "PROPN"
    ]
    top_article_proper_nouns = [
        item[0] for item in Counter(article_proper_nouns).most_common(10)
    ]

    # Compare with the top PROPN tokens of the target article
    common_tokens = set(top_article_proper_nouns) & set(top_target_proper_nouns)

    # Store the results
    common_proper_nouns.append(
        {"article_name": row["article_name"], "common_propnouns": common_tokens}
    )

# Convert the results to a DataFrame
common_proper_nouns_df = pd.DataFrame(common_proper_nouns)

# Sort the DataFrame based on the number of common proper nouns
common_proper_nouns_df = common_proper_nouns_df.sort_values(
    by="common_propnouns", key=lambda x: x.str.len(), ascending=False
)

# Print the top 5 comparisons
print("Top 5 comparisons with most common proper nouns:")
for i in range(min(5, len(common_proper_nouns_df))):
    print(
        f"{common_proper_nouns_df.iloc[i]['article_name']} and {target_article_name}: {common_proper_nouns_df.iloc[i]['common_propnouns']}"
    )

# Save the top comparisons to a CSV file
top_comparisons = common_proper_nouns_df.head(50)
# if not top_comparisons.empty:
#     top_comparisons.to_csv(
#         os.path.join(data_path, "top_proper_noun_comparisons.csv"), index=False
#     )
#     print(
#         f"Top comparisons with most common proper nouns saved to {data_path}top_proper_noun_comparisons.csv"
#     )
# else:
#     print("No common proper nouns found.")

In [None]:
common_proper_nouns_df.head(20)

### Articles' content with links 
(pushed from Martin's work)

The idea is to compare the `linkTarget` values of the reference article (UK) with those of the other articles. Articles that satisfy "some conditions" (to be defined) are selected as cliché articles.
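
A minimal sketch of this comparison, assuming the links are available as (`linkSource`, `linkTarget`) pairs. The helper `common_links_with` is hypothetical and is not the code that produced the precomputed file loaded below; the link pairs are toy data.

```python
from collections import defaultdict


def common_links_with(reference, link_pairs):
    """Map each article to the outgoing links it shares with `reference`."""
    outgoing = defaultdict(set)
    for source, target in link_pairs:
        outgoing[source].add(target)
    reference_links = outgoing.get(reference, set())
    return {
        article: targets & reference_links
        for article, targets in outgoing.items()
        if article != reference and targets & reference_links
    }


# Toy (linkSource, linkTarget) pairs
toy_links = [
    ("United_Kingdom", "London"),
    ("United_Kingdom", "Scotland"),
    ("United_Kingdom", "England"),
    ("Red_Kite", "Scotland"),
    ("Red_Kite", "England"),
    ("Red_Kite", "Bird"),
    ("Magnesium", "Hydrogen"),
]

shared = common_links_with("United_Kingdom", toy_links)
print(shared)
```

Articles with no link in common with the reference are dropped, which mirrors the filtering on a non-zero number of common links done later in the notebook.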

In [21]:
# load the precomputed data
links_common = load_links_common()
links_common.head(20)

Unnamed: 0,article_name,common_links
0,Áedán_mac_Gabráin,"[Orkney, Isle_of_Man, Monarchy, Great_Britain,..."
1,Åland,"[Time_zone, World_War_II, Currency, European_U..."
2,Édouard_Manet,"[Germany, United_States_dollar, Italy, France]"
3,Éire,"[English_language, Ireland, Republic_of_Irelan..."
4,Óengus_I_of_the_Picts,"[Lion, Great_Britain, Ireland, Scotland, England]"
5,€2_commemorative_coins,"[United_Nations, Ireland, Italy, European_Unio..."
6,10th_century,"[Lion, Monarchy, Italy, India, Scotland, Germa..."
7,11th_century,"[England, France, India, Novel]"
8,12th_century,"[England, Ireland, France, India]"
9,13th_century,"[Europe, Islam, Scotland, Isle_of_Man]"


In [22]:
# find 3 random samples
random_samples = links_common.sample(n=3)
for _, row in random_samples.iterrows():
    print(type(row["common_links"]))
    print(
        f"Common links between {row['article_name']} and United Kingdom: {row['common_links']}"
    )

<class 'list'>
Common links between Red_Kite and United Kingdom: ['Europe', 'Wales', 'Scotland', 'Germany', 'England']
<class 'list'>
Common links between Magnesium and United Kingdom: ['DNA', 'Earth', 'Hydrogen', 'Scotland', 'Steel', 'Electron']
<class 'list'>
Common links between Honduras and United Kingdom: ['Time_zone', 'English_language', 'Currency', 'Television', 'United_States_dollar', 'List_of_countries_by_system_of_government', 'Spain']


In [23]:
links_common["nbr_common_links"] = links_common["common_links"].apply(len)
links_common.sort_values("nbr_common_links", ascending=False, inplace=True)
links_common.head(20)

Unnamed: 0,article_name,common_links,nbr_common_links
1379,England,"[Industrial_Revolution, Benjamin_Britten, Engl...",63
3642,Scotland,"[Industrial_Revolution, Parliament_of_the_Unit...",57
4282,United_States,"[Natural_gas, World_War_II, NATO, World_Herita...",42
700,British_monarchy,"[Parliament_of_the_United_Kingdom, World_War_I...",35
2530,London,"[Parliament_of_the_United_Kingdom, World_War_I...",33
1360,Elizabeth_II_of_the_United_Kingdom,"[British_monarchy, World_War_II, Northern_Irel...",32
698,British_Isles_(terminology),"[Scottish_Gaelic_language, Northern_Ireland, A...",32
1427,Europe,"[Industrial_Revolution, World_War_II, NATO, No...",32
1383,English_language,"[World_War_II, Northern_Ireland, Jersey, Briti...",30
1688,Germany,"[Natural_gas, Industrial_Revolution, World_War...",29


Lots of locations appear. We may want to remove them.

In [None]:
# create nlp object
nlp = spacy.load('en_core_web_sm')
doc = nlp(text_reference)

In [None]:
# extract countries and cities
locations = [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in ['GPE', 'LOC']]

# filter out duplicates
unique_locations1 = set(locations)

Notes for ourselves:
- normalize the scores
- how should we handle countries?
- combine the two methods (Anna/Martin)? How? Threshold for selecting the clichés
- look at the clichés selected by NLP and check whether they are used in the paths
- comment on the clichés that are never taken. Why?

Prompt ChatGPT: give me clichés about the UK using bullet points and at most 5 words by cliché
- Tea time obsession
- Polite queueing traditions
- Rainy weather stereotypes
- Double-decker buses iconic
- Sherlock Holmes detective legacy
- Royal family fascination
- Soccer (football) mania
- Pubs and fish & chips
- Mysterious foggy landscapes
- Love for proper etiquette
- Red phone booths everywhere
- Beatles' timeless musical influence

## Revisit of statistical tests with data augmentation/clichés selection

### Normalize number of links in common

In [None]:
# find number of links by article
df_nbr_links_by_articles = pd.pivot_table(
    links, values=["linkTarget"], index=["linkSource"], aggfunc="count"
).sort_values("linkTarget")
display(df_nbr_links_by_articles)

In [None]:
# only keep articles with at least one in common
no_zero_common_articles = common_links_reference[
    common_links_reference["nbr_common_articles"] != 0
].sort_values("nbr_common_articles")
display(no_zero_common_articles)

In [None]:
no_zero_common_articles_normalized_links = no_zero_common_articles.copy(deep=True)


def normalize_links(article, size_common_links, size_total_links):
    # normalize number of links in common by number of links in article
    total_links_article = size_total_links.loc[article].values[0]
    return size_common_links / total_links_article


# normalize number of links in common by number of links on article considered
no_zero_common_articles_normalized_links["normalized_common"] = no_zero_common_articles[
    ["article", "nbr_common_articles"]
].apply(
    lambda row: normalize_links(
        row["article"], row["nbr_common_articles"], df_nbr_links_by_articles
    ),
    axis=1,
)

display(no_zero_common_articles_normalized_links.sort_values("normalized_common", ascending=False))

Remove max/min values of normalization. How to keep the correct clichés?

### Perform the statistical tests

The results do not look good with normalization, so we run the statistical analysis without normalization, using the augmented set of clichés.

In [None]:
common_articles_normalized = no_zero_common_articles_normalized_links.sort_values("nbr_common_articles", ascending=False).reset_index(drop=True).copy(deep=True)
common_articles_normalized[1:50]

Try to remove countries?

In [None]:
paths_finished_use_cliche = paths_finished.copy(deep=True)

# define the cliches (converted to a set once, for fast intersection)
set_cliches = set(common_articles_normalized["article"][1:20])

# check which paths use cliches
paths_finished_use_cliche["uses_cliche"] = paths_finished["path"].apply(
    lambda path: len(set(path) & set_cliches) != 0
)
paths_finished_use_cliche

In [None]:
# do the statistical tests by rating by comparing difference of length for those using cliches and those who do not
# here only for rating 1 for test
rating = 1
df_statistics_test_rating1 = (
    paths_finished_use_cliche.groupby(["rating", "uses_cliche"])[["diff_length"]]
    .apply(lambda x: x)
    .loc[rating]
)
df_statistics_test_rating1

In [None]:
from scipy import stats

# Welch's t-test (equal_var=False) on the flattened diff_length values of both groups
stat, p = stats.ttest_ind(
    df_statistics_test_rating1.loc[True].to_numpy().ravel(),
    df_statistics_test_rating1.loc[False].to_numpy().ravel(),
    equal_var=False,
    alternative="two-sided",
)

print(f"stat={stat}, pvalue={p}")
print(
    "Result is significant at 0.05" if p < 0.05 else "Result is not significant"
)