In [1]:
import numpy as np
from sentence_transformers import SentenceTransformer, util



Reference: Olympics-related news article
https://www.scmp.com/opinion/china-opinion/article/3274583/how-achieve-olympic-success-china-offers-some-answers

In [2]:
news_article = '''
The Olympic Games hold different meanings for various nations, and the assessment of which country performs best often hinges on how medals are counted. In the 2024 Paris Olympics, China achieved its most successful performance since 2008, tying with the United States for the highest number of gold medals—40 each—marking the most golds China has ever secured at a Summer Olympics held outside its home country.
China excelled in table tennis and diving, and made significant strides in swimming and tennis, traditionally dominated by Western nations. Pan Zhanle’s astonishing victory in the men’s 100m freestyle sparked disbelief from coach Brett Hawke, who remarked that such a win seemed "not humanly possible." This came despite Pan undergoing doping tests 21 times between May and July.
China's swimming team began competing in the Olympics in 1988, winning its first gold in the women’s 100m freestyle in 1992. Since then, it has produced numerous gold medallists, primarily from Jiangsu and Zhejiang provinces, as well as Shanghai. The question arises: how did China achieve such remarkable success, and why do most gold medallists originate from coastal regions?
The secret lies in a combination of economic growth, a large population, and geographical advantages. Sociologist Wang Feng, in his book "China’s Age of Abundance," notes that China's economic reforms significantly improved nutrition. In the early 1970s, the average Chinese person consumed less than one egg per week, but by the early 1980s, egg consumption surged by 50%. As the nation grew wealthier, children became taller; boys in urban areas were 5.2 cm taller in 2002 than in 1992, and girls were 5.7 cm taller.
China's east coast, known for its wealth, has produced many Olympic gold medallists. The region's affluence contributes to better nutrition and fitness compared to other areas. Additionally, Jiangsu and Zhejiang provinces, with their lakes and rivers, foster a strong interest in swimming from a young age, supported by robust sporting infrastructure.
China's vast population allows it to identify athletes with the necessary attributes for various sports. Elite athletes are highly disciplined and endure rigorous training. For instance, southern Chinese athletes, typically smaller, have excelled in diving. Zhanjiang, a city in Guangdong, has produced three Olympic gold medallists in diving since 2004.
Seventeen-year-old Quan Hongchan, hailing from Zhanjiang, exemplifies the combination of a typical southern physique, intense training, and mental fortitude, winning three gold medals in diving at the Tokyo and Paris Olympics with perfect scores for her technique.
Hong Kong's Olympic journey began with Lee Lai-shan, who secured the territory’s first gold in windsurfing in 1996. After a long wait, Cheung Ka-long won Hong Kong's first gold in fencing 25 years later. With two golds in fencing at the Paris Olympics, Hong Kong ranks among the top 20 in gold medals per capita.
Ranked 37th in the medal table, Hong Kong's success is attributed to both the determination of its athletes and government support. In 2024-25, the government allocated HK$5.7 billion for community sports facilities and increased funding for elite athletes to HK$941 million.
Since Cheung’s fencing victory in 2021, the government and the Hong Kong Jockey Club launched a HK$300 million Sports Science and Research Funding Scheme to enhance athlete performance. This commitment to supporting athletes is commendable.
The achievements of Cheung and Vivian Kong Man-wai have inspired a surge in fencing interest among children. The ongoing success of Hong Kong athletes in various sports, including golf and tennis, has heightened enthusiasm for athletics. The upcoming Sports Park Sai Sha, developed by a private entity, will significantly enhance sports facilities for both elite and community athletes.
As top athletes retire, Hong Kong may face challenges in maintaining its Olympic success. However, the abundance of facilities, financial backing, and growing enthusiasm for sports bode well for the territory's long-term athletic development.
'''

In [3]:
sentences = news_article.split('.')
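Splitting on every '.' also cuts the text at decimal points such as "5.2 cm" and "HK$5.7 billion", and it leaves an empty fragment after the final period. A minimal sketch of a more careful splitter follows, assuming a sentence ends with '.', '!' or '?' followed by whitespace. The helper name split_sentences is hypothetical and not part of the original notebook.

```python
import re

def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation attached to its sentence.
    # Requiring whitespace after it leaves decimals such as 5.2 intact.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty fragments (for example, when the input is blank).
    return [p for p in parts if p]

sample = "Boys were 5.2 cm taller in 2002. Girls were 5.7 cm taller."
print(split_sentences(sample))
```

This rule is still a heuristic: it does not split after a period that is followed by a closing quotation mark, and it splits after abbreviations that end in a period.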

In [4]:
model = SentenceTransformer('all-MiniLM-L6-v2')



Summarization using Cosine Similarity

In [5]:
sentence_embeddings = model.encode(sentences)

# Initialize the similarity matrix
similarity_matrix = np.zeros((len(sentences), len(sentences)))

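# Calculate pairwise cosine similarity. The diagonal stays 0 so that a
# sentence's similarity to itself does not inflate its score.
for i in range(len(sentences)):
    for j in range(len(sentences)):
        if i != j:
            # util.cos_sim returns a 1x1 tensor; .item() extracts the float
            similarity_matrix[i][j] = util.cos_sim(sentence_embeddings[i], sentence_embeddings[j]).item()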

# Print the similarity matrix
print("Similarity Matrix:")
print(similarity_matrix)

# Score each sentence by its total similarity to all other sentences
sentence_scores = np.sum(similarity_matrix, axis=1)
print("Sentence Scores:")
print(sentence_scores)

Similarity Matrix:
[[ 0.          0.52786404  0.38496211 ...  0.46589595  0.2627154
  -0.03561613]
 [ 0.52786404  0.          0.45094928 ...  0.47985825  0.18692897
  -0.02499392]
 [ 0.38496211  0.45094928  0.         ...  0.46307892  0.39836401
  -0.07509203]
 ...
 [ 0.46589595  0.47985825  0.46307892 ...  0.          0.44085148
   0.06857216]
 [ 0.2627154   0.18692897  0.39836401 ...  0.44085148  0.
   0.1052688 ]
 [-0.03561613 -0.02499392 -0.07509203 ...  0.06857216  0.1052688
   0.        ]]
Sentence Scores:
[ 8.37293071 10.49509661 12.51742995  5.42930203  4.47388242  9.71469991
 11.35286103 12.69189056  7.25510142  5.44211456  5.42490995  4.82270341
  2.66498413 13.17065373  7.06187822 10.46502749 12.408704    8.96012358
 12.22459268 11.32911286 10.70705842  9.33171475  9.74709541 11.69738064
 13.04540006  6.11670022 10.90539877 12.35153531  6.91239204  9.02271495
 13.23800947  7.67975943 13.31993415  9.42931323  0.9706012 ]
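The nested loop makes one similarity call per sentence pair. The same matrix can be computed in a single step by normalizing the embeddings and taking a matrix product. The sketch below uses plain NumPy on toy 2-D vectors rather than the real sentence embeddings. The function name cosine_similarity_matrix is hypothetical, and it assumes no embedding is the zero vector.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Return pairwise cosine similarities with a zeroed diagonal."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Scale every row to unit length (assumes no all-zero rows).
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    # The dot product of unit vectors is their cosine similarity.
    sim = unit @ unit.T
    # Ignore each vector's similarity to itself.
    np.fill_diagonal(sim, 0.0)
    return sim

# Toy vectors standing in for sentence embeddings
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sim = cosine_similarity_matrix(toy)
scores = sim.sum(axis=1)  # same row-sum scoring as in the cell above
print(sim)
print(scores)
```

In the toy data the third vector lies between the other two, so it receives the highest score, which is the same "most central sentence" idea the summary relies on.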


In [6]:
# Select the 3 highest-scoring sentences, best first, without
# overwriting entries of sentence_scores
top_indices = np.argsort(sentence_scores)[::-1][:3]
summary_sentences = [sentences[i].strip() for i in top_indices]


In [7]:
# Join the summary sentences and restore the final period removed by split('.')
summary = '. '.join(summary_sentences) + '.'
print(summary)

As top athletes retire, Hong Kong may face challenges in maintaining its Olympic success. The ongoing success of Hong Kong athletes in various sports, including golf and tennis, has heightened enthusiasm for athletics. China's east coast, known for its wealth, has produced many Olympic gold medallists.
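The summary above lists sentences by descending score, so the closing sentence of the article appears first. One possible refinement, sketched here on toy data, is to pick the top-scoring sentences and then restore their original order. The function name top_k_in_document_order and the toy inputs are hypothetical and not part of the original notebook.

```python
import numpy as np

def top_k_in_document_order(sentences, scores, k=3):
    """Pick the k highest-scoring sentences, returned in original order."""
    # Indices of the k largest scores, best first
    top = np.argsort(scores)[::-1][:k]
    # Sort the chosen indices so sentences appear as they do in the text
    ordered = sorted(int(i) for i in top)
    return [sentences[i].strip() for i in ordered]

# Toy inputs standing in for the article sentences and their scores
toy_sentences = ["A first", "B second", "C third", "D fourth"]
toy_scores = np.array([2.0, 9.0, 1.0, 5.0])
print('. '.join(top_k_in_document_order(toy_sentences, toy_scores, k=2)) + '.')
```

Because the function only reads the scores, the score array stays intact for later inspection.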


Summarization using a BART-based Model

In [8]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")



In [9]:
# Check if the article is not empty
if news_article.strip():
    # Summarize the article
    summary = summarizer(news_article, max_length=130, min_length=30, do_sample=False, truncation=True)

    # Print the summary
    print(summary[0]['summary_text'])
else:
    print("The article is empty.")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


In the 2024 Paris Olympics, China achieved its most successful performance since 2008. The secret lies in a combination of economic growth, a large population, and geographical advantages. The east coast, known for its wealth, has produced many Olympic gold medallists.
